iter_lines gives different results depending on chunk_size #4121

Closed
a-p-f opened this issue May 31, 2017 · 3 comments

a-p-f commented May 31, 2017

When using iter_lines with chunk_size other than None, it can generate an extra blank line if the end of a chunk lands on a delimiter.

Reproduction Steps

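import StringIO  # Python 2 module; this report was made on CPython 2.7.13
import requests

def getTestFile():
    # In-memory stand-in for the raw response body
    f = StringIO.StringIO()
    f.write('a-b')
    f.seek(0)
    return f

# Default chunk_size: the whole payload fits in a single chunk
r = requests.Response()
r.raw = getTestFile()
print [l for l in r.iter_lines(delimiter='-')]

# chunk_size=2: the first chunk is 'a-', which ends on the delimiter
r = requests.Response()
r.raw = getTestFile()
print [l for l in r.iter_lines(delimiter='-', chunk_size=2)]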

Expected Result

['a', 'b']
['a', 'b']

Actual Result

['a', 'b']
['a', '', 'b']
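
As far as I can tell, the extra entry comes from how `split` treats a chunk that ends on the delimiter. A minimal sketch with plain strings, no requests involved; `chunks` is a hypothetical stand-in for what `iter_content(chunk_size=2)` would hand back for the payload above:

```python
# Hypothetical stand-in for the chunks iter_content(chunk_size=2)
# would produce for the payload 'a-b'.
chunks = ['a-', 'b']

# Splitting a chunk that ends on the delimiter leaves a trailing empty string.
first = chunks[0].split('-')
print(first)  # ['a', '']

# A pending check that requires a non-empty last element does not hold
# the '' back, so it gets yielded as if it were a complete line.
last = first[-1]
held_back = bool(last) and last[-1] == chunks[0][-1]
print(held_back)  # False
```

So the empty string after the delimiter is emitted before the next chunk arrives, which would explain the `''` between `'a'` and `'b'`.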

Proposed Fix

I modified iter_lines as follows, and the above test passes:

    def iter_lines(self, chunk_size=ITER_CHUNK_SIZE, decode_unicode=None, delimiter=None):
        """Iterates over the response data, one line at a time.  When
        stream=True is set on the request, this avoids reading the
        content at once into memory for large responses.

        .. note:: This method is not reentrant safe.
        """

        pending = None

        for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):

            if pending is not None:
                chunk = pending + chunk

            if delimiter:
                lines = chunk.split(delimiter)
            else:
                lines = chunk.splitlines()

            if lines and lines[-1] and chunk and lines[-1][-1] == chunk[-1]:
                pending = lines.pop()
            elif lines and lines[-1] == '':
                pending = lines.pop()
            else:
                pending = None

            for line in lines:
                yield line

        if pending is not None:
            yield pending
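
For anyone who wants to try the logic without patching requests, here is a rough standalone sketch of the same loop. `split_lines` is a hypothetical helper name, not part of requests, and it takes any iterable of text chunks in place of `iter_content`:

```python
def split_lines(chunks, delimiter=None):
    """Hypothetical free-function version of the patched loop above."""
    pending = None
    for chunk in chunks:
        # Prepend whatever was held back from the previous chunk.
        if pending is not None:
            chunk = pending + chunk

        lines = chunk.split(delimiter) if delimiter else chunk.splitlines()

        if lines and lines[-1] and chunk and lines[-1][-1] == chunk[-1]:
            # Chunk ended mid-line: hold the partial line back.
            pending = lines.pop()
        elif lines and lines[-1] == '':
            # Chunk ended on a delimiter: hold the empty tail back too.
            pending = lines.pop()
        else:
            pending = None

        for line in lines:
            yield line

    if pending is not None:
        yield pending


print(list(split_lines(['a-b'], delimiter='-')))      # ['a', 'b']
print(list(split_lines(['a-', 'b'], delimiter='-')))  # ['a', 'b']
```

One caveat: the `== ''` comparison assumes text chunks. On Python 3 with bytes payloads it would never match, so the check would probably need `b''` or a plain emptiness test there.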

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.3"
  }, 
  "cryptography": {
    "version": ""
  }, 
  "implementation": {
    "name": "CPython", 
    "version": "2.7.13"
  }, 
  "platform": {
    "release": "16.6.0", 
    "system": "Darwin"
  }, 
  "pyOpenSSL": {
    "openssl_version": "", 
    "version": null
  }, 
  "requests": {
    "version": "2.17.3"
  }, 
  "system_ssl": {
    "version": "100020bf"
  }, 
  "urllib3": {
    "version": "1.21.1"
  }, 
  "using_pyopenssl": false
}

This command is only available on Requests v2.16.4 and greater. Otherwise,
please provide some basic information about your system (Python version,
operating system, &c).

@nateprewitt
Member

Hey @a-p-f, thanks for putting together such a detailed issue. This is actually a duplicate of #3980 and addressed by #3984. This is a breaking change though, so it won't be available until 3.0.0 is released.

I'm going to close this out as a dupe, but please let us know if you have any further questions.

@a-p-f
Author

a-p-f commented May 31, 2017

Sorry about that. I searched the issues for 'iter_lines', but I had the 'is:open' filter on so it didn't pop up.

@nateprewitt
Member

No worries! 😄

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021