Why decode the response body of a redirect? #1939

Closed
mechanical-snail opened this Issue Mar 5, 2014 · 10 comments

Requests fails on the URL http://www.whatbird.com/forum/index.php?/gallery/image/291517-foo/, which is a 301 redirect to

http://www.whatbird.com/forum/index.php?/gallery/image/291517-title-paused-jewel-allens-hummingbird-a-backyard-bird-painting-in-oil-by-camille-engel/

The issue seems to be that the server's initial 301 response carries a header falsely claiming that the response body (a simple HTML page) is gzipped, when it is actually uncompressed.

When resolving redirects, Requests does (in requests.sessions.resolve_redirects):

resp.content  # Consume socket so it can be released

which attempts to decode the body.
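To illustrate the failure mode in isolation (a sketch independent of Requests): decoding a plain-text body as though it were gzipped raises `zlib.error`, which urllib3 then wraps as a `DecodeError`:

```python
import zlib

# A body the server falsely labels Content-Encoding: gzip
body = b"<html>plain, uncompressed HTML</html>"

# wbits = 16 + MAX_WBITS tells zlib to expect a gzip header,
# which is how urllib3 decodes gzip-encoded responses.
decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
try:
    decoder.decompress(body)
except zlib.error as e:
    print(e)  # Error -3 while decompressing data: incorrect header check
```

The body does not start with the gzip magic bytes (`\x1f\x8b`), so the decoder fails immediately with the same "incorrect header check" error seen in the traceback below.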

One could legitimately say this is the server's problem. However, conceptually, why decode the response body of a redirect, which won't get returned? Other programs (Chromium, Firefox, curl) don't do this. For example, curl gives an error, as expected, when not following redirects:

$ curl --compressed 'http://www.whatbird.com/forum/index.php?/gallery/image/291517-foo/'
curl: (61) Error while processing content unencoding: invalid code lengths set

whereas it works if you add the --location flag (follow redirects).

Example of error

Python 3.3.2+ (default, Oct  9 2013, 14:56:03) 
[GCC 4.8.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests ; requests.get('http://www.whatbird.com/forum/index.php?/gallery/image/291517-foo/')
Traceback (most recent call last):
  File "./requests/packages/urllib3/response.py", line 199, in read
    data = self._decoder.decompress(data)
zlib.error: Error -3 while decompressing data: incorrect header check

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./requests/models.py", line 629, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "./requests/packages/urllib3/response.py", line 236, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "./requests/packages/urllib3/response.py", line 204, in read
    e)
requests.packages.urllib3.exceptions.DecodeError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "./requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "./requests/sessions.py", line 393, in request
    resp = self.send(prep, **send_kwargs)
  File "./requests/sessions.py", line 496, in send
    r = adapter.send(request, **kwargs)
  File "./requests/adapters.py", line 391, in send
    r.content
  File "./requests/models.py", line 691, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "./requests/models.py", line 634, in generate
    raise ContentDecodingError(e)
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))
Collaborator

sigmavirus24 commented Mar 5, 2014

You answered your own question when you pasted that line: the comment explains exactly why we consume the content. If we do not, a user following a great many redirects will exhaust the available connections, since sockets will sit in the ready state waiting to be read from. It also has the potential to cause a memory issue if the reference count never reaches 0 and the socket is not garbage collected. We should, however, be catching that error in resolve_redirects.

Thank you for raising this issue! I'll throw together a PR to patch this.

mechanical-snail commented Mar 5, 2014

I agree that consuming the raw response data from the socket is needed. I'm asking why we should decode the data.

(And thanks for the quick response.)

Collaborator

Lukasa commented Mar 5, 2014

We decode the data because your assertion that it won't be read is false. You may read the response body from any redirect because we save it. Each redirect builds a full response object that can be used exactly like any other. This is a very good thing, and won't be changed. =)
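A minimal sketch of what "each redirect builds a full response object" means, using a hypothetical stand-in class (not requests' actual `Response`) with just the relevant fields:

```python
class Response:
    """Hypothetical stand-in for requests.models.Response,
    holding only the fields relevant to this discussion."""
    def __init__(self, status_code, url, content, history=()):
        self.status_code = status_code
        self.url = url
        self.content = content          # decoded body, saved for the user
        self.history = list(history)    # earlier responses in the chain

# The 301 hop is kept as a complete response on the final result:
redirect = Response(301, "http://example.com/old", b"<html>Moved</html>")
final = Response(200, "http://example.com/new", b"<html>OK</html>", [redirect])

assert final.history[0].status_code == 301
assert final.history[0].content == b"<html>Moved</html>"
```

Because the redirect's body is saved and may be read later via `history`, it has to be decoded at the time the redirect is consumed.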

The fix, as @sigmavirus24 has suggested, is simply to catch this error.
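A rough sketch of such a fix (the exception types, names, and fallback here are assumptions for illustration, not the actual patch): drain the body, and if decoding fails, fall back to reading the raw, undecoded bytes so the connection is still released:

```python
import zlib

class RawResponse:
    """Hypothetical stand-in for urllib3's HTTPResponse: the server
    claims gzip, but the body is actually plain text."""
    def __init__(self, body):
        self._body = body

    def read(self, decode_content=True):
        if decode_content and self._body:
            # urllib3 would attempt gzip decompression here and fail;
            # the raw bytes remain unread on the "socket"
            raise zlib.error("Error -3 while decompressing data: "
                             "incorrect header check")
        data, self._body = self._body, b""
        return data

def drain(raw):
    """Consume the redirect body so the connection can be reused,
    tolerating a bogus Content-Encoding header."""
    try:
        return raw.read(decode_content=True)
    except zlib.error:
        return raw.read(decode_content=False)

drained = drain(RawResponse(b"<html>Moved</html>"))
assert drained == b"<html>Moved</html>"  # raw bytes recovered, socket drained
```

The point is only that the decode error must not escape from redirect resolution; the redirect itself was understood, so following it should proceed.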

Contributor

schlamar commented Mar 7, 2014

Interesting. It's very likely that this is a duplicate of shazow/urllib3#206 / #1472, because there is also a 301 redirect in the curl output there. @shazow

Collaborator

Lukasa commented Mar 7, 2014

@schlamar That's very interesting.

However, this isn't a dupe, it's just related. The key is that we shouldn't really care even if we hit a legitimate decoding error when following redirects: we just want to do our best and then move on.

Collaborator

sigmavirus24 commented Mar 7, 2014

@Lukasa hit the nail on the head :)

Contributor

schlamar commented Mar 7, 2014

However, this isn't a dupe, it's just related.

Why do you think so?

On requests 1.2.3, I'm getting the same traceback as in #1472 with this URL:

>>> import requests ; requests.get('http://www.whatbird.com/forum/index.php?/gallery/image/291517-foo/')
Traceback (most recent call last):
  ...
  File "c:\Python27\lib\site-packages\requests\packages\urllib3\response.py", line 188, in read
    "failed to decode it." % content_encoding)
requests.packages.urllib3.exceptions.DecodeError: Received response with content-encoding: gzip, but failed to decode it
Collaborator

Lukasa commented Mar 7, 2014

@schlamar So the issues you linked cause the exception, but they aren't the problem being referred to. The key problem in this issue is that if we hit an error decoding the response body of a redirect, we'll stop following redirects. That shouldn't happen: we understood enough of the message to follow the redirect, so there's no reason to stop following them. =)

Fixing the bugs you linked fixes the specific case in question, but not the general one.

Contributor

schlamar commented Mar 7, 2014

Fixing the bugs you linked fixes the specific case in question, but not the general one.

Yes, but fixing this bug should resolve the linked issues, which is what I wanted to say. :)

Collaborator

Lukasa commented Mar 7, 2014

Ahhhhh, I see. =)
