Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When Content-Length differs from received message body length, an exception should be raised #1855

Closed
patricklaw opened this issue Jan 10, 2014 · 9 comments

Comments

@patricklaw
Copy link

@patricklaw patricklaw commented Jan 10, 2014

The HTTP 1.1 RFC specifies:

"When a Content-Length is given in a message where a message-body is allowed, its field value MUST exactly match the number of OCTETs in the message-body. HTTP/1.1 user agents MUST notify the user when an invalid length is received and detected."

See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.4

Here is a simple repro. It seems like the call to requests.get should raise an exception rather than silently succeed. Note that python 2.x urllib2 has identical behavior to requests here, but I believe that constitutes a bug in urllib2. curl, on the other hand, outputs the message body but has a return code of 18 ("Partial file. Only a part of the file was transferred."). In python 3.3 (and presumably earlier releases of python 3.x) urllib.request raises http.client.IncompleteRead: IncompleteRead(5 bytes read, 15 more expected), which is in line with the spec.

import SocketServer as socketserver
import threading
import requests
import time

class MyTCPHandler(socketserver.BaseRequestHandler):
    def handle(self):
        self.data = self.request.recv(1024)
        self.request.sendall('HTTP/1.1 200 OK\r\n'
                             'Server: truncator/0.0\r\n'
                             'Content-Length: 20\r\n'
                             'Connection: close\r\n\r\n'
                             '12345')

server = None
def background_server():
    global server
    HOST, PORT = "localhost", 9999
    server = socketserver.TCPServer((HOST, PORT), MyTCPHandler)
    server.serve_forever()


if __name__ == "__main__":
    t = threading.Thread(target=background_server)
    t.daemon = True
    t.start()
    time.sleep(1)
    r = requests.get('http://localhost:9999')
    print(r.content)
    server.shutdown()
@Lukasa
Copy link
Member

@Lukasa Lukasa commented Jan 10, 2014

Thanks for raising this issue!

I have mixed feelings. Firstly, I'd argue that Requests is not technically a user-agent, it's a library. This frees us from some of the constraints of user-agent behaviour (and in fact we take that liberty elsewhere in the library, like with our behaviour on redirects).

Secondly, if we throw an exception we irrevocably destroy the data we read. It becomes impossible to access. This means that situations where the user might want to 'muddle through', taking as much of the data as they were able to read and keeping hold of it, becomes a little bit harder. (They'd have to use the streaming API, on which we should not enforce this logic.)

Finally, even if we did want this logic we'd need to implement it in urllib3. Content-Length refers to the number of bytes on the wire, not the decoded length, so if we get a gzipped (or DEFLATEd) response, we'd need to know how many bytes there were before decoding. This is not typically information we have at the Requests level. So if you're still interested in having this behaviour, I suggest you open an issue over on shazow/urllib3.

With that said, I'm not totally opposed to doing it, it just feels excessive to throw an exception for what is a minor and easily detectable error.

@sigmavirus24
Copy link
Contributor

@sigmavirus24 sigmavirus24 commented Jan 10, 2014

@Lukasa I agree wholeheartedly with everything you said. Your reservations (regarding data loss), however, make me wonder if we shouldn't be attaching the response object to the exceptions we throw. That's a different discussion which should take place on a different issue but I thought I might raise that idea before I forget it.

On the topic of this issue, I can understand that people may want this and that there are corner cases where the Content Length is not exactly a match for the response body (and in those cases it is okay). I can especially see the benefit where users use requests to perform tests on their servers and web apps. With that in mind, would something similar to raise_for_status be a reasonable compromise? We won't break backwards compatibility for existing users who do not realize their depending on our decision to not be strict and we give these extraordinary users (that I'm probably completely imagining) a way to reasonably achieve their goal.

One other alternative is to provide this functionality via the toolbelt. This way users have confidence in the implementation and they just need to import one extra thing.

@Lukasa
Copy link
Member

@Lukasa Lukasa commented Jan 10, 2014

We do explicitly attach the response to a raise_for_status()-caused HTTPError, but for nothing else. It's possible that we could do that, but there are a number of ways that things could go wrong that would lead to that being a bad idea. In general I think that when exceptions are hit data should be lost: after all, the exception will cause execution to abort in a way that can leave the state ill-defined. (raise_for_status() is an obvious exception [heh] to that statement.)

Additionally, I'm -1 on all extensions to the requests interface without good justification. This one doesn't have one, so a new raise_for_status()-like thing is out unless someone can strongly convince me of its value. The toolbelt is likely to be the best place for something. I'm not going to discuss implementation too much here, but either of these could work:

import requests
import requests_toolbelt

r = requests.get('http://stupid_buggy_server.com/')
requests_toolbelt.validate_response(r)  # Throws exception, or returns some relevant return code.
import requests
import requests_toolbelt

requests_toolbelt.use_checked_responses()  # Monkeypatches requests.response
r = requests.get('http://stupid_buggy_server.com/')
r.raise_if_invalid()  # Throws exception.
@patricklaw
Copy link
Author

@patricklaw patricklaw commented Jan 10, 2014

urllib3 does the correct thing on python 2.7:

*** MaxRetryError: HTTPConnectionPool(host='localhost', port=9999): Max retries exceeded with url: / (Caused by <class 'httplib.IncompleteRead'>: IncompleteRead(5 bytes read, 15 more expected))

Is this being suppressed somewhere in requests?

@Lukasa
Copy link
Member

@Lukasa Lukasa commented Jan 10, 2014

Nope, we don't stop MaxRetryErrors being thrown. What we do do is use the .stream() method of the HTTPResponse instead of the .read() method. Give that a shot and see if you still hit the problem: I'm prepared to believe you don't.

@patricklaw
Copy link
Author

@patricklaw patricklaw commented Jan 10, 2014

I get that exception during the call to request(), before I get back an HTTPResponse object. Is there a way to force urllib3 to give me back an HTTPResponse?

@Lukasa
Copy link
Member

@Lukasa Lukasa commented Jan 10, 2014

Yeah, set the preload_content keyword argument to False.

On 10 Jan 2014, at 20:50, patricklaw notifications@github.com wrote:

I get that exception during the call to request(), before I get back an HTTPResponse object. Is there a way to force urllib3 to give me back an HTTPResponse?


Reply to this email directly or view it on GitHub.

@patricklaw
Copy link
Author

@patricklaw patricklaw commented Jan 10, 2014

Got it. You're right, the exception isn't raised when using stream(). I'll file an issue on urllib3.

@Lukasa
Copy link
Member

@Lukasa Lukasa commented Jan 29, 2014

I'm closing this for the same reason we closed urllib3/urllib3#311.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.