New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check Content-Length when present. #2833

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
4 participants
@causton81

causton81 commented Oct 16, 2015

This is pretty hard to reproduce, but I have seen truncated responses in the presence of connection resets that are silently ignored by requests.

This link describes how the vagaries of TCP and socket API may return an error or 0 (end-of-file) when the stars align just right:
http://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable

This patch checks that the number of bytes read matches the declared Content-Length (when declared). I'm not an HTTP expert, so I may be missing something, but I hope this is useful. This also only covers the urllib3 code path.

Christopher F. Auston
@Lukasa

This comment has been minimized.

Member

Lukasa commented Oct 16, 2015

@causton httplib should do this check by itself already. Can you try to reproduce this to see if that happens? If it doesn't, can you post your repro code here?

@causton81

This comment has been minimized.

causton81 commented Oct 16, 2015

I used netcat to send a 1 MiB response that's one byte short of declared Content-Length. The script below shows the problem for me with the output shown. It fails silently and writes a file shorter than Content-Length.

#!/usr/bin/env bash

set -e

# 1 MiB
size=1048576

_16=1234567890123456

size_div_16=$(( $size / 16 ))

echo "creating $size byte file"
truncate -s0 body
for i in `seq $size_div_16`; do
    echo -n $_16 >> body
done

cat >hdr <<EOF
HTTP/1.1 200 OK
Content-Length: $size

EOF

cat hdr body >response

wc -c hdr body response


cat >rst.py <<"SCRIPT"
import requests

res = requests.get('http://localhost:1234')

with open('body.rst', 'w') as fp:
    fp.write(res.content)
SCRIPT

one_less=$(( size - 1 ))

dd if=response bs=$one_less count=1 | nc -l -p 1234 &>/dev/null &

job=$!

sleep 3

python rst.py &

sleep 3

kill $job

wait

wc -c body body.rst

Unpatched output:

(requests2710)[causton@causton requests]$ bash shoddy-server.sh 
creating 1048576 byte file
     41 hdr
1048576 body
1048617 response
2097234 total
1+0 records in
1+0 records out
1048575 bytes (1.0 MB) copied, 3.08301 s, 340 kB/s
1048576 body
1048534 body.rst
2097110 total
(requests2710)[causton@causton requests]$ 

Output after this PR:

(requests2710)[causton@causton requests]$ bash shoddy-server.sh 
creating 1048576 byte file
     41 hdr
1048576 body
1048617 response
2097234 total
1+0 records in
1+0 records out
1048575 bytes (1.0 MB) copied, 3.10435 s, 338 kB/s
Traceback (most recent call last):
  File "rst.py", line 3, in <module>
    res = requests.get('http://localhost:1234')
  File "/home/causton/git/requests/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/home/causton/git/requests/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/home/causton/git/requests/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/causton/git/requests/requests/sessions.py", line 608, in send
    r.content
  File "/home/causton/git/requests/requests/models.py", line 747, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/home/causton/git/requests/requests/models.py", line 677, in generate
    expected_len, actual_len))
requests.exceptions.TruncatedContentError: Expected: 1048576, got: 1048534
1048576 body
wc: body.rst: No such file or directory
1048576 total
(requests2710)[causton@causton requests]$ 
@Lukasa

This comment has been minimized.

Member

Lukasa commented Oct 17, 2015

Iiiinteresting. So @causton81, I think you're right. Diving into the httplib source code shows this:

s = self.fp.read(amt)
if not s and amt:
    # Ideally, we would raise IncompleteRead if the content-length
    # wasn't satisfied, but it might break compatibility.
    self.close()

So, I think you're right and this doesn't work. I think this would be best implemented in urllib3, frankly: in urllib3 they can take advantage of the HTTPResponse.length parameter that httplib maintains (but is weirdly refusing to use) to make sure this works in almost all cases. @shazow does that sound reasonable to you?

@shazow

This comment has been minimized.

Contributor

shazow commented Oct 17, 2015

Could give it a try in urllib3. I've found that we can't rely on content-length being correct, though. Sometimes it'll be smaller than the response, other times it will be larger, and I'd prefer not to explode when it doesn't match up (but rather let the user handle it how they like).

@sigmavirus24

This comment has been minimized.

Member

sigmavirus24 commented Oct 17, 2015

Sometimes it'll be smaller than the response,

It'll be smaller when the it's compressed and we're decoding the content for the user right? Couldn't we do a pre-decompression check in that case? Or are there other cases that I'm not aware of?

I'd prefer not to explode when it doesn't match up (but rather let the user handle it how they like).

Perhaps a util function (to avoid yet another option/flag) that will take the other parameters for a call to stream as well as handling logic to check the size of the response and raise an exception? This then becomes a very explicit opt-in.

@shazow

This comment has been minimized.

Contributor

shazow commented Oct 17, 2015

It'll be smaller when the it's compressed and we're decoding the content for the user right? Couldn't we do a pre-decompression check in that case? Or are there other cases that I'm not aware of?

I mean roflservers. There are lots of roflservers out there that yoloreturn incorrect lolvalues.

Perhaps a util function...

Umm not sure what that might look like. How about we start a urllib3 issue thread with some specific API proposals?

@Lukasa

This comment has been minimized.

Member

Lukasa commented Oct 17, 2015

Sounds like a plan, let's move to urllib3.

@Lukasa Lukasa closed this Oct 17, 2015

@causton81

This comment has been minimized.

causton81 commented Oct 19, 2015

Should I start an issue on urllib3 then?

@Lukasa

This comment has been minimized.

Member

Lukasa commented Oct 19, 2015

Yes please!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment