When Content-Length differs from received message body length, an exception should be raised #1855

patricklaw · 2014-01-10T02:26:22Z

The HTTP 1.1 RFC specifies:

"When a Content-Length is given in a message where a message-body is allowed, its field value MUST exactly match the number of OCTETs in the message-body. HTTP/1.1 user agents MUST notify the user when an invalid length is received and detected."

See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4

Here is a simple repro. It seems like the call to requests.get should raise an exception rather than silently succeed. Note that python 2.x urllib2 has identical behavior to requests here, but I believe that constitutes a bug in urllib2. curl, on the other hand, outputs the message body but has a return code of 18 ("Partial file. Only a part of the file was transferred."). In python 3.3 (and presumably earlier releases of python 3.x) urllib.request raises http.client.IncompleteRead: IncompleteRead(5 bytes read, 15 more expected), which is in line with the spec.

import SocketServer as socketserver
import threading
import requests
import time

class MyTCPHandler(socketserver.BaseRequestHandler):
    def handle(self):
        self.data = self.request.recv(1024)
        self.request.sendall('HTTP/1.1 200 OK\r\n'
                             'Server: truncator/0.0\r\n'
                             'Content-Length: 20\r\n'
                             'Connection: close\r\n\r\n'
                             '12345')

server = None
def background_server():
    global server
    HOST, PORT = "localhost", 9999
    server = socketserver.TCPServer((HOST, PORT), MyTCPHandler)
    server.serve_forever()


if __name__ == "__main__":
    t = threading.Thread(target=background_server)
    t.daemon = True
    t.start()
    time.sleep(1)
    r = requests.get('http://localhost:9999')
    print(r.content)
    server.shutdown()

The text was updated successfully, but these errors were encountered:

Lukasa · 2014-01-10T08:24:57Z

Thanks for raising this issue!

I have mixed feelings. Firstly, I'd argue that Requests is not technically a user-agent, it's a library. This frees us from some of the constraints of user-agent behaviour (and in fact we take that liberty elsewhere in the library, like with our behaviour on redirects).

Secondly, if we throw an exception we irrevocably destroy the data we read. It becomes impossible to access. This means that situations where the user might want to 'muddle through', taking as much of the data as they were able to read and keeping hold of it, becomes a little bit harder. (They'd have to use the streaming API, on which we should not enforce this logic.)

Finally, even if we did want this logic we'd need to implement it in urllib3. Content-Length refers to the number of bytes on the wire, not the decoded length, so if we get a gzipped (or DEFLATEd) response, we'd need to know how many bytes there were before decoding. This is not typically information we have at the Requests level. So if you're still interested in having this behaviour, I suggest you open an issue over on shazow/urllib3.

With that said, I'm not totally opposed to doing it, it just feels excessive to throw an exception for what is a minor and easily detectable error.

sigmavirus24 · 2014-01-10T13:40:16Z

@Lukasa I agree wholeheartedly with everything you said. Your reservations (regarding data loss), however, make me wonder if we shouldn't be attaching the response object to the exceptions we throw. That's a different discussion which should take place on a different issue but I thought I might raise that idea before I forget it.

On the topic of this issue, I can understand that people may want this and that there are corner cases where the Content Length is not exactly a match for the response body (and in those cases it is okay). I can especially see the benefit where users use requests to perform tests on their servers and web apps. With that in mind, would something similar to raise_for_status be a reasonable compromise? We won't break backwards compatibility for existing users who do not realize their depending on our decision to not be strict and we give these extraordinary users (that I'm probably completely imagining) a way to reasonably achieve their goal.

One other alternative is to provide this functionality via the toolbelt. This way users have confidence in the implementation and they just need to import one extra thing.

Lukasa · 2014-01-10T14:47:38Z

We do explicitly attach the response to a raise_for_status()-caused HTTPError, but for nothing else. It's possible that we could do that, but there are a number of ways that things could go wrong that would lead to that being a bad idea. In general I think that when exceptions are hit data should be lost: after all, the exception will cause execution to abort in a way that can leave the state ill-defined. (raise_for_status() is an obvious exception [heh] to that statement.)

Additionally, I'm -1 on all extensions to the requests interface without good justification. This one doesn't have one, so a new raise_for_status()-like thing is out unless someone can strongly convince me of its value. The toolbelt is likely to be the best place for something. I'm not going to discuss implementation too much here, but either of these could work:

import requests
import requests_toolbelt

r = requests.get('http://stupid_buggy_server.com/')
requests_toolbelt.validate_response(r)  # Throws exception, or returns some relevant return code.

import requests
import requests_toolbelt

requests_toolbelt.use_checked_responses()  # Monkeypatches requests.response
r = requests.get('http://stupid_buggy_server.com/')
r.raise_if_invalid()  # Throws exception.

patricklaw · 2014-01-10T19:42:29Z

urllib3 does the correct thing on python 2.7:

*** MaxRetryError: HTTPConnectionPool(host='localhost', port=9999): Max retries exceeded with url: / (Caused by <class 'httplib.IncompleteRead'>: IncompleteRead(5 bytes read, 15 more expected))

Is this being suppressed somewhere in requests?

Lukasa · 2014-01-10T19:45:49Z

Nope, we don't stop MaxRetryErrors being thrown. What we do do is use the .stream() method of the HTTPResponse instead of the .read() method. Give that a shot and see if you still hit the problem: I'm prepared to believe you don't.

patricklaw · 2014-01-10T20:49:49Z

I get that exception during the call to request(), before I get back an HTTPResponse object. Is there a way to force urllib3 to give me back an HTTPResponse?

Lukasa · 2014-01-10T21:23:45Z

Yeah, set the preload_content keyword argument to False.

On 10 Jan 2014, at 20:50, patricklaw notifications@github.com wrote:

I get that exception during the call to request(), before I get back an HTTPResponse object. Is there a way to force urllib3 to give me back an HTTPResponse?

—
Reply to this email directly or view it on GitHub.

patricklaw · 2014-01-10T21:29:11Z

Got it. You're right, the exception isn't raised when using stream(). I'll file an issue on urllib3.

Lukasa · 2014-01-29T21:03:30Z

I'm closing this for the same reason we closed urllib3/urllib3#311.

patricklaw mentioned this issue Jan 10, 2014

Message body length not matching Content-Length should raise httplib.IncompleteRead even when using stream() urllib3/urllib3#311

Closed

Lukasa closed this as completed Jan 29, 2014

eight04 mentioned this issue Jan 20, 2018

Verify Content-Length eight04/ComicCrawler#91

Closed

kwlzn mentioned this issue May 17, 2018

pex resolver can end up with truncated reads of binary dists, causing an opaque untranslateable error pex-tool/pex#475

Open

debugmiller mentioned this issue Feb 1, 2019

Content-Length is not checked, resulting in short reads with no error #4956

Closed

github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

When Content-Length differs from received message body length, an exception should be raised #1855

When Content-Length differs from received message body length, an exception should be raised #1855

patricklaw commented Jan 10, 2014

Lukasa commented Jan 10, 2014

sigmavirus24 commented Jan 10, 2014

Lukasa commented Jan 10, 2014

patricklaw commented Jan 10, 2014

Lukasa commented Jan 10, 2014

patricklaw commented Jan 10, 2014

Lukasa commented Jan 10, 2014

patricklaw commented Jan 10, 2014

Lukasa commented Jan 29, 2014

When Content-Length differs from received message body length, an exception should be raised #1855

When Content-Length differs from received message body length, an exception should be raised #1855

Comments

patricklaw commented Jan 10, 2014

Lukasa commented Jan 10, 2014

sigmavirus24 commented Jan 10, 2014

Lukasa commented Jan 10, 2014

patricklaw commented Jan 10, 2014

Lukasa commented Jan 10, 2014

patricklaw commented Jan 10, 2014

Lukasa commented Jan 10, 2014

patricklaw commented Jan 10, 2014

Lukasa commented Jan 29, 2014