requests seems to expect more content until it is forcibly closed by the remote host [Errno 10054] #1804

Closed
ChristopherHackett opened this Issue Dec 16, 2013 · 13 comments

Comments

Projects
None yet
4 participants
@ChristopherHackett

I have noticed that requests seems to have some sort of issue with www.sainsburysbank.co.uk

It will block for 2+ mins despite having already received the content. Other tools like curl have no issue.

Below is the console and I have taken a WireShark dump (screenshot attached). If you need the pcapng file then let me know.

Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win 32
>>>import requests
>>> requests.get('http://www.sainsburysbank.co.uk')

    Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\requests\api.py", line 55, in get
    return request('get', url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 279, in request
    resp = self.send(prep, stream=stream, timeout=timeout, verify=verify, cert=cert, proxies=proxies)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 374, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 222, in send
    r.content
  File "C:\Python27\lib\site-packages\requests\models.py", line 550, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "C:\Python27\lib\site-packages\requests\models.py", line 496, in generate
    chunk = self.raw.read(chunk_size)
  File "C:\Python27\lib\site-packages\requests\packages\urllib3\response.py", line 148, in read
    return self._fp.read(amt)
  File "C:\Python27\Lib\httplib.py", line 561, in read
    s = self.fp.read(amt)
  File "C:\Python27\Lib\socket.py", line 380, in read
    data = self._sock.recv(left)
socket.error: [Errno 10054] An existing connection was forcibly closed by the remote host

ws-dump

@Lukasa

This comment has been minimized.

Show comment Hide comment
@Lukasa

Lukasa Dec 16, 2013

Owner

Gah, well. There's all kinds of fun here. First, Sainsbury's Bank is using chunked encoding to return the response to us. Not the end of the world, but a potential source of fun. According to Wireshark they have terminated it properly though, so that's something. Also, their headers are ill-formatted:

Cache-Control: private
Date: Mon, 16 Dec 2013 11:03:51 GMT
Content-Type: text/html
: 
Set-Cookie: <..snip..>
Set-Cookie: <..snip..>
Content-Encoding: gzip
Vary: Accept-Encoding
Tranfer-Encoding: chunked

That's not a typo: they have an empty header key and value. Which is fun. I'm amazed the server allows them to send that.

Anyway, the problem appears to be that httplib doesn't spot that chunked encoding is being used, presumably because the header parse falls over on that empty header key-value pair. I'm digging into httplib's parsing code now.

Owner

Lukasa commented Dec 16, 2013

Gah, well. There's all kinds of fun here. First, Sainsbury's Bank is using chunked encoding to return the response to us. Not the end of the world, but a potential source of fun. According to Wireshark they have terminated it properly though, so that's something. Also, their headers are ill-formatted:

Cache-Control: private
Date: Mon, 16 Dec 2013 11:03:51 GMT
Content-Type: text/html
: 
Set-Cookie: <..snip..>
Set-Cookie: <..snip..>
Content-Encoding: gzip
Vary: Accept-Encoding
Tranfer-Encoding: chunked

That's not a typo: they have an empty header key and value. Which is fun. I'm amazed the server allows them to send that.

Anyway, the problem appears to be that httplib doesn't spot that chunked encoding is being used, presumably because the header parse falls over on that empty header key-value pair. I'm digging into httplib's parsing code now.

@Lukasa

This comment has been minimized.

Show comment Hide comment
@Lukasa

Lukasa Dec 16, 2013

Owner

Yup, the parsing breaks. httplib calls HTTPMessage.isheader, which resolves to rfc822.Message.isheader(), which has the following code:

"""Determine whether a given line is a legal header.

This method should return the header name, suitably canonicalized.
You may override this method in order to use Message parsing on tagged
data in RFC 2822-like formats with special header formats.
"""
i = line.find(':')
if i > 0:
    return line[:i].lower()
return None

Because the colon is the first character, line.find(':') returns 0, which means this function returns None, which means httplib concludes this isn't a header line. This leads to the header being thrown away and the header block being assumed to be over.

Owner

Lukasa commented Dec 16, 2013

Yup, the parsing breaks. httplib calls HTTPMessage.isheader, which resolves to rfc822.Message.isheader(), which has the following code:

"""Determine whether a given line is a legal header.

This method should return the header name, suitably canonicalized.
You may override this method in order to use Message parsing on tagged
data in RFC 2822-like formats with special header formats.
"""
i = line.find(':')
if i > 0:
    return line[:i].lower()
return None

Because the colon is the first character, line.find(':') returns 0, which means this function returns None, which means httplib concludes this isn't a header line. This leads to the header being thrown away and the header block being assumed to be over.

@Lukasa

This comment has been minimized.

Show comment Hide comment
@Lukasa

Lukasa Dec 16, 2013

Owner

Sainsbury's Bank is at fault here: that header line is not legal. RFC 2616 defines a header line like so:

message-header = field-name ":" [ field-value ]
field-name     = token
token          = 1*<any CHAR except CTLs or separators>

It's clear that a field-name must not be zero length, which is exactly what we've been sent.

Owner

Lukasa commented Dec 16, 2013

Sainsbury's Bank is at fault here: that header line is not legal. RFC 2616 defines a header line like so:

message-header = field-name ":" [ field-value ]
field-name     = token
token          = 1*<any CHAR except CTLs or separators>

It's clear that a field-name must not be zero length, which is exactly what we've been sent.

@Lukasa

This comment has been minimized.

Show comment Hide comment
@Lukasa

Lukasa Dec 16, 2013

Owner

The same behaviour exists in 3.3 as well, and doesn't appear to have been reported on the stdlib bug tracker. I'll do that now.

Owner

Lukasa commented Dec 16, 2013

The same behaviour exists in 3.3 as well, and doesn't appear to have been reported on the stdlib bug tracker. I'll do that now.

@Lukasa

This comment has been minimized.

Show comment Hide comment
@Lukasa

Lukasa Dec 16, 2013

Owner

Issue now being tracked here. We'll leave this open until we work out what we're doing in the core library.

Owner

Lukasa commented Dec 16, 2013

Issue now being tracked here. We'll leave this open until we work out what we're doing in the core library.

@ChristopherHackett

This comment has been minimized.

Show comment Hide comment
@ChristopherHackett

ChristopherHackett Dec 16, 2013

Thanks @Lukasa - quick and detailed follow up. I'll keep an eye on the httplib bug.

Thanks @Lukasa - quick and detailed follow up. I'll keep an eye on the httplib bug.

@sigmavirus24

This comment has been minimized.

Show comment Hide comment
@sigmavirus24

sigmavirus24 Dec 16, 2013

Owner

@Lukasa how can I get some beer to you? Seriously, that was incredibly fast and awesome.

Owner

sigmavirus24 commented Dec 16, 2013

@Lukasa how can I get some beer to you? Seriously, that was incredibly fast and awesome.

@Lukasa

This comment has been minimized.

Show comment Hide comment
@Lukasa

Lukasa Dec 16, 2013

Owner

Thanks guys. This was actually quite a fun bug to find. =D

@ChristopherHackett I forgot to mention, if you want to be able to work around this, it seems to be affecting only the gzip delivery of the page. It'll work fine if you unset the Accept-Encoding header we send, like this:

r = requests.get('http://www.sainsburysbank.co.uk/', headers={'Accept-Encoding': None})
Owner

Lukasa commented Dec 16, 2013

Thanks guys. This was actually quite a fun bug to find. =D

@ChristopherHackett I forgot to mention, if you want to be able to work around this, it seems to be affecting only the gzip delivery of the page. It'll work fine if you unset the Accept-Encoding header we send, like this:

r = requests.get('http://www.sainsburysbank.co.uk/', headers={'Accept-Encoding': None})
@Lukasa

This comment has been minimized.

Show comment Hide comment
@Lukasa

Lukasa Jan 19, 2015

Owner

There is a patch against upstream for this, but there seems to be no movement. Regardless, it's CPython's problem now.

Owner

Lukasa commented Jan 19, 2015

There is a patch against upstream for this, but there seems to be no movement. Regardless, it's CPython's problem now.

@Lukasa Lukasa closed this Jan 19, 2015

@Lukasa

This comment has been minimized.

Show comment Hide comment
@Lukasa

Lukasa Feb 18, 2015

Owner

Amazingly, this patch did actually get merged!

Owner

Lukasa commented Feb 18, 2015

Amazingly, this patch did actually get merged!

@sigmavirus24

This comment has been minimized.

Show comment Hide comment
@sigmavirus24

sigmavirus24 Feb 18, 2015

Owner

WOAH

Owner

sigmavirus24 commented Feb 18, 2015

WOAH

@kevinburke

This comment has been minimized.

Show comment Hide comment
@kevinburke

kevinburke Dec 29, 2015

Contributor

ah neat! i should update hamms to send back that bad header haha

Contributor

kevinburke commented Dec 29, 2015

ah neat! i should update hamms to send back that bad header haha

@kevinburke

This comment has been minimized.

Show comment Hide comment
@kevinburke

kevinburke Dec 29, 2015

Contributor

🍰

Contributor

kevinburke commented Dec 29, 2015

🍰

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment