Content-Length header not checked by requests if not enough data is sent #2275
This is a feature request.
In our application we've noticed that roughly 1 in 2500 GET requests is truncated early. The HTTP responses are typically 20 KB to 2 MB. This is certainly caused by network issues or dodgy HTTP servers and isn't a requests problem. By default, requests 2.3.0 gives no indication when the response body is shorter than the 'Content-Length' header declared. The response object is presented as if nothing is amiss.
Requests is already doing some checks on Content-Length. When a response body is larger than the Content-Length header describes, requests silently truncates the body. I have yet to find a way of detecting this case.
I would completely understand if the requests developers thought this type of check was outside the scope of the library - after all it is pretty easy to do this check in user code - but it seemed kind of anomalous given the super succinct high level api.
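As a rough illustration of the kind of user-code check mentioned above, a minimal sketch might look like the following. The function name `body_length_mismatch` is hypothetical and not part of requests. The sketch assumes the body bytes have not been content-decoded: when a `Content-Encoding` such as gzip is present, the decoded body is not expected to match `Content-Length`, so the check is skipped in that case.

```python
def body_length_mismatch(headers, body):
    """Return (expected, actual) if len(body) disagrees with Content-Length.

    Returns None when the lengths agree or when no meaningful comparison
    is possible. `headers` is any mapping with a .get() method; a plain
    dict is case-sensitive, so the keys must be spelled as below.
    """
    declared = headers.get("Content-Length")
    if declared is None:
        return None  # nothing to compare against

    # A compressed body is usually decoded before the caller sees it,
    # so its length no longer corresponds to Content-Length.
    if headers.get("Content-Encoding", "identity").lower() != "identity":
        return None

    # Chunked responses are not delimited by Content-Length.
    if "chunked" in headers.get("Transfer-Encoding", "").lower():
        return None

    try:
        expected = int(declared)
    except ValueError:
        return None  # malformed header; treat as not checkable

    actual = len(body)
    return None if actual == expected else (expected, actual)
```

With requests this could be called as `body_length_mismatch(response.headers, response.content)`. Responses to HEAD requests carry a `Content-Length` but no body, so they would need to be excluded by the caller.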
In a similar vein a Content-MD5 header check would be useful in our application but this is probably a lot more niche.
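A Content-MD5 check can also be done in user code. The sketch below assumes the header carries the base64-encoded MD5 digest of the body, as RFC 1864 defines it, and that `body` holds the bytes as sent, before any content decoding. The function name `content_md5_matches` is hypothetical.

```python
import base64
import hashlib


def content_md5_matches(headers, body):
    """Compare the Content-MD5 header against the MD5 digest of `body`.

    Returns True or False when the header is present, and None when it
    is absent. `headers` is any mapping with a .get() method.
    """
    declared = headers.get("Content-MD5")
    if declared is None:
        return None  # server did not send a digest

    # Content-MD5 is the base64 encoding of the raw 16-byte digest,
    # not of the hexadecimal string.
    computed = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    return computed == declared.strip()
```

With requests this might be called as `content_md5_matches(response.headers, response.content)`, with the caveat that `response.content` is already decompressed when the server used a `Content-Encoding`, in which case the digest would not be expected to match.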
Let me start off by saying that we've had this requested in the past (see: #1938) so you would do well to read that discussion to understand why we will not ever add content-length checking to requests. (This is also vaguely related to #2255 and requests/toolbelt#41.)
If you read the thread that resulted from the last time this was suggested, you'll see that we do not add arbitrary or niche features to requests. If there's a high likelihood of users needing it (something greater than 90% of our current userbase) then we'll add it. You might understand how this makes requests far more maintainable, especially given we don't do much in the way of header checks.
What I am going to address here is the following:
I'm fairly certain you're misunderstanding what you're receiving. Of course, if you can provide an example that reliably reproduces that, this would be most appreciated. I have never personally seen this behaviour, and I'd be surprised to hear that we're truncating large bodies. If that's happening anywhere, it might be httplib but I doubt it happens there either.
Thanks for your responses. #1938 has convinced me that a Content-Length check doesn't belong in requests, and that the check I'm currently doing in the application is broken.
I've got test cases for the missing IncompleteRead error and the silent truncation of the response bodies.
nodejs server 0.10.31 on Ubuntu server 12.04 LTS
Making a request with curl
In python doing
Same behaviour on Windows and Linux. The versions of the software installed are as follows.
Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:24:06) [MSC v.1600 32 bit (Intel)] on win32
Python 3.4.1 (default, May 25 2014, 22:33:14) [GCC 4.6.3] on linux
I'm pretty new to Python (though not programming in general) so it's possible I'm missing something in a Python config somewhere which causes the error to be suppressed. I'm just not very familiar with the Python environment. I didn't personally set up or configure the Python install on the Windows machine (this was scripted by another developer) but as far as I am aware it doesn't do anything exotic. I did set up the Ubuntu server though, and that's totally vanilla.
Silent Truncation of Excess Content
Changing the node server to output a Content-Length header which is too small, i.e.
Sometimes curl doesn't display the error message * Excess found in a non pipelined read ... This happens about 20% of the time. I don't know why it's not 100% reproducible. I've checked with Wireshark and I'm pretty sure node is always sending the excess data.
The same python script always produces the same output on both Windows and Ubuntu.
I've also tested some of the different ways of getting the data with stream=True and never been able to read the excess content.
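The two cases described in this thread can be approximated with the Python standard library alone. The sketch below is a stand-in, not the node server or the requests script used above: it assumes a throwaway localhost socket server and reads the response with `http.client`, so it shows what the standard library does and not necessarily what requests 2.3.0 does. The names `serve_once`, `fetch`, `EXCESS_BODY` and `SHORT_BODY` are hypothetical.

```python
import http.client
import socket
import threading

# Content-Length is smaller than the body actually sent.
EXCESS_BODY = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello world"
# Content-Length is larger than the body actually sent.
SHORT_BODY = b"HTTP/1.1 200 OK\r\nContent-Length: 20\r\n\r\nhello"


def serve_once(payload):
    """Answer a single connection on localhost with `payload`; return the port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def handle():
        conn, _ = listener.accept()
        with conn:
            conn.recv(65536)       # read and ignore the request
            conn.sendall(payload)  # send the raw response, then close
        listener.close()

    threading.Thread(target=handle, daemon=True).start()
    return port


def fetch(port):
    """GET / from the local server and return the whole body via read()."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/")
        return conn.getresponse().read()
    finally:
        conn.close()
```

Calling `fetch(serve_once(EXCESS_BODY))` returns only the first five bytes, with no indication that more data was sent. Calling `fetch(serve_once(SHORT_BODY))` raises `http.client.IncompleteRead`, whose `partial` attribute holds the bytes that did arrive.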
Hope this helps.