Content-Length header not checked by requests if not enough data is sent #2275

Closed
squareproton opened this Issue Oct 12, 2014 · 4 comments

squareproton commented Oct 12, 2014

This is a feature request.

In our application we've noticed that roughly 1 in 2,500 GET requests is truncated early. The HTTP responses are typically 20 KB to 2 MB. This is almost certainly caused by network issues or dodgy HTTP servers and isn't a requests problem. By default, requests 2.3.0 gives no indication that the response body is shorter than the Content-Length header said it would be; the response object is presented as if nothing is amiss.

Requests is already doing some checks on Content-Length: when a response body is larger than the Content-Length header describes, requests silently truncates the body, and I've yet to find a way of detecting this case.

I would completely understand if the requests developers considered this type of check outside the scope of the library (after all, it is pretty easy to do in user code; a sketch follows below), but it seemed kind of anomalous given the otherwise succinct, high-level API.
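For illustration, the kind of user-level check I mean is roughly the following (a minimal sketch; the URL is a placeholder, and note that Content-Length describes the bytes on the wire, so this naive comparison can misfire when the server compresses the body, because response.content has already been decoded):

import requests

response = requests.get("http://example.com/large-file")  # placeholder URL

expected = response.headers.get("Content-Length")
if expected is not None and len(response.content) != int(expected):
    # The body we received is not the length the server promised.
    raise IOError("truncated response: got %d bytes, Content-Length was %s"
                  % (len(response.content), expected))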

In a similar vein, a Content-MD5 header check would be useful in our application, but that is probably a lot more niche; again, a sketch follows below.
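Purely as illustration (a sketch, assuming the server actually sends the header; per RFC 1864, Content-MD5 carries the base64-encoded binary MD5 digest of the body):

import base64
import hashlib

import requests

response = requests.get("http://example.com/resource")  # placeholder URL

claimed = response.headers.get("Content-MD5")
if claimed is not None:
    # RFC 1864: base64 of the raw 16-byte MD5 digest of the body.
    digest = base64.b64encode(hashlib.md5(response.content).digest()).decode("ascii")
    if digest != claimed:
        raise IOError("Content-MD5 mismatch")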

sigmavirus24 (Member) commented Oct 12, 2014

Hi @squareproton

Let me start off by saying that we've had this requested in the past (see #1938), so you would do well to read that discussion to understand why we will never add Content-Length checking to requests. (This is also vaguely related to #2255 and requests/toolbelt#41.)

If you read the thread that resulted from the last time this was suggested, you'll see that we do not add arbitrary or niche features to requests. If there's a high likelihood of users needing it (something greater than 90% of our current userbase), then we'll add it. You might understand how this makes requests far more maintainable, especially given that we don't do much in the way of header checks.

What I am going to address here is the following:

Requests is already doing some checks on Content-Length: when a response body is larger than the Content-Length header describes, requests silently truncates the body.

I'm fairly certain you're misunderstanding what you're receiving. Of course, if you can provide an example that reliably reproduces it, that would be most appreciated. I have never personally seen this behaviour, and I'd be surprised to hear that we're truncating large bodies. If that's happening anywhere, it might be in httplib, but I doubt it happens there either.

Lukasa (Member) commented Oct 12, 2014

For that matter, it was my belief that short bodies threw an IncompleteRead error.

sigmavirus24 (Member) commented Oct 12, 2014

@Lukasa I was under the same impression, frankly.

squareproton commented Oct 13, 2014

Thanks for your responses. #1938 has convinced me: a Content-Length check doesn't belong in requests, and the check I'm currently doing in the application is broken.

I've got test cases for both the missing IncompleteRead error and the silent truncation of response bodies.

IncompleteRead error

Node.js 0.10.31 server on Ubuntu Server 12.04 LTS:

var http = require('http');

http.createServer(function (req, res) {
    var buffer = new Buffer(10);
    buffer.fill('A');
    // Promise twice as many bytes as we will actually send.
    res.writeHead(200, {'Content-Length': buffer.length * 2});
    res.write(buffer);
    // Aggressively trash the socket; calling res.end() doesn't work if node
    // believes there are still outstanding bytes to be sent.
    res.socket.destroy();
}).listen(4321);

Making a request with curl

$ curl -v http://localhost:4321

* About to connect() to localhost port 4321 (#0)
*   Trying 127.0.0.1... connected
> GET / HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:4321
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Length: 20
< Date: Mon, 13 Oct 2014 15:01:30 GMT
< Connection: keep-alive
< 
* transfer closed with 10 bytes remaining to read
* Closing connection #0
curl: (18) transfer closed with 10 bytes remaining to read
AAAAAAAAAA

In Python, doing

import requests

# The server advertises Content-Length: 20 but sends only 10 bytes;
# requests returns the short body without raising anything.
response = requests.get("http://192.168.2.19:4321")
print(response.content, len(response.content))

generates

b'AAAAAAAAAA' 10

Same behaviour on Windows and Linux. The versions of the software installed are as follows.

Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:24:06) [MSC v.1600 32 bit (Intel)] on win32
Requests v2.3.0

Python 3.4.1 (default, May 25 2014, 22:33:14) [GCC 4.6.3] on linux
Requests v2.4.1

I'm pretty new to Python (though not to programming in general), so it's possible I'm missing something in a Python config somewhere that causes the error to be suppressed; I'm just not very familiar with the Python environment. I didn't personally set up or configure the Python install on the Windows machine (that was scripted by another developer), but as far as I'm aware it doesn't do anything exotic. I did set up the Ubuntu server myself, though, and that's totally vanilla.
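For what it's worth, one way to check whether the standard library itself notices the short read is to hit the same test server with http.client directly (a sketch; my understanding is that an unbounded read() raises http.client.IncompleteRead when the connection closes before Content-Length bytes arrive, though I may be wrong about that):

import http.client

conn = http.client.HTTPConnection("192.168.2.19", 4321)
conn.request("GET", "/")
resp = conn.getresponse()
# The server promises 20 bytes but destroys the socket after 10, so this
# read() with no size argument should (I believe) raise IncompleteRead.
body = resp.read()
print(body, len(body))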

Silent Truncation of Excess Content

Changing the node server to send a Content-Length header that is too small (buffer.length / 2 instead of buffer.length * 2) and repeating the curl request:

* About to connect() to localhost port 4321 (#0)
*   Trying 127.0.0.1... connected
> GET / HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: localhost:4321
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Length: 5
< Date: Mon, 13 Oct 2014 15:28:56 GMT
< Connection: keep-alive
< 
* Excess found in a non pipelined read: excess = 5, size = 5, maxdownload = 5, bytecount = 0
* Connection #0 to host localhost left intact
* Closing connection #0
AAAAA

Sometimes curl doesn't display the * Excess found in a non pipelined read ... error message. This happens about 20% of the time, and I don't know why it's not 100% reproducible. I've checked with Wireshark and I'm pretty sure node is always sending the excess data.

The same Python script always produces the same output on both Windows and Ubuntu:

b'AAAAA' 5

I've also tested some of the different ways of reading the data with stream=True and have never been able to read the excess content; roughly what I tried is sketched below.
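For reference, the stream=True attempts were along these lines (a sketch; I varied the chunk sizes):

import requests

response = requests.get("http://192.168.2.19:4321", stream=True)
# Read the body a byte at a time; iteration still stops after the 5 bytes
# promised by Content-Length, and the excess never shows up.
body = b"".join(response.iter_content(chunk_size=1))
print(body, len(body))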

Hope this helps.
