Timeouts do not occur when stream == True. #1803

Closed
tylercrompton opened this Issue Dec 16, 2013 · 19 comments

Comments

Projects
None yet
9 participants

As stated in the docs:

timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).

With that said, it should still be possible to timeout when stream == True.

The following command times out as expected.

$ python3 -c "import requests; response = requests.request('GET', 'http://httpbin.org/delay/3', timeout=1); print(response)"

The following command does not timeout. A timeout is expected.

$ python3 -c "import requests; response = requests.request('GET', 'http://httpbin.org/delay/3', stream=True, timeout=1); print(response)"
Owner

Lukasa commented Dec 16, 2013

Thanks for raising this issue! Currently in stream mode we only apply the timeout to the connection attempt, not to the read. This will probably change in future (for instance, #1801 removes this limitation), but there's no strict timescale on this.

Owner

sigmavirus24 commented Dec 16, 2013

Once again @Lukasa is 100% correct. This isn't an issue or a bug in requests, just in your expectations of how it behaves.

I had the same issue today, and I try to work around it, but I can't find a clean way.

If the request is stream=True and the server just hangs, it will hang forever.

I also tried with Pyhton 3 futures, it still hangs forever.

from concurrent.futures import ThreadPoolExecutor
import requests


def make_request():
    response = requests.get('http://example.com', stream=True, timeout=5)
    response_text = ''
    chunks = response.iter_content(chunk_size=2048, decode_unicode=True)
    for chunk in chunks:
        response_text += chunk
    return response_text

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(make_request)
    response_text = future.result(timeout=5)
    print(response_text)

The timeout should be applied to every wait of data to be received on the socket, because the server hang can happen at anytime, before sending the headers, during, or after (when it is sending the body).

The only ways I see that could work are to start another process and kill after a certain time (which comes with IPC just to make an HTTP request), or to use signals, which I find not clean either.

Anorov commented Dec 18, 2013

@antoineleclair You could use multithreading, or the gevent library.

Owner

Lukasa commented Dec 19, 2013

So, I chatted briefly via email with @sigmavirus24 about this. Actually changing the library so that timeouts apply to streaming data is a trivial change with a negative diff. The problem is that it's a backwards incompatible change.

So I'm in favour of fixing this, but not right now. Maybe as part of 2.2. I'm going to schedule this issue for that release unless the BDFL or @sigmavirus24 complains.

Lukasa was assigned Dec 19, 2013

Makes sense. Thanks.

Owner

sigmavirus24 commented Dec 19, 2013

I have no complaint. I just want to be sure that you meant 2.2 and not 3.0.

Owner

Lukasa commented Dec 19, 2013

I did mean 2.2, not 3.0. I'm open to arguments that it should be pushed back to 3.0, especially if they're well-reasoned and/or provided alongside a beverage/food based bribe.

Contributor

zackw commented Feb 10, 2014

Hm, this might explain some mysterious hangs-forever cases in an application I have. Is there something I could do at the application level to cause code like this ...

def load_enough_content(resp):
    """Load no more than 16KB of a response, in 1K chunks.
       Allow this process to take no more than 5 seconds in total.
       These numbers are arbitrarily chosen to defend against
       teergrubes (intentional or not) while still allowing us a
       useful amount of data for anomaly post-mortem."""
    body = b""
    start = time.time()
    for chunk in resp.iter_content(chunk_size=1024):
        body += chunk
        if len(body) > 16*1024 or time.time() - start > 5:
            resp.close()
            break
    return body

... to fulfill its contract of returning in no more than 5 seconds no matter what the server does?

This issue was supposed to be fixed before 2.2. I'm not sure what's the state of it (requests==2.2.1 as of now).

If it is still not fixed, you could use another process and kill it if it times out.

@sigmavirus24 sigmavirus24 assigned sigmavirus24 and unassigned Lukasa Feb 18, 2014

Owner

Lukasa commented Mar 12, 2014

#1935 should fix this. =)

Lukasa added the Fixed label Mar 12, 2014

Just wondering if there's any general idea on when this might be released. I just hit this bug in my code and it's happening in a daemon that's attempting to download a large file, so the daemon just sits there forever.

For anyone else who might be hung up on this, a friend mentioned to me that I might be able to use signals to workaround this problem. Sure enough you can, and it's pretty simple. See the example here: http://docs.python.org/2/library/signal.html#example

Owner

Lukasa commented Mar 20, 2014

@andynemzek We're not on any release schedule, so it'll be released when we next release requests. =)

RuKeBo referenced this issue in chrippa/livestreamer Apr 2, 2014

Open

Issues with UStream's streaming protocol #265

Owner

sigmavirus24 commented Oct 5, 2014

This was fixed and never closed. Woops! 🍰

OK posting on an old issue but it still applies to master branch. The test does not set stream=True, so I don't think it is actually testing what it should test. Correct?

Owner

Lukasa commented Feb 7, 2017

@julienaubert Sorry, can you provide more explanation about what is going on here? The read timeout should still apply.

@Lukasa afaic, the test that was meant to test that timeout works when stream=True, is not correct, as it is not setting stream=True and the default value is stream=False. See: https://github.com/kennethreitz/requests/blob/master/tests/test_requests.py#L1943

Owner

nateprewitt commented Feb 8, 2017

@julienaubert The reason there isn't an explicit stream argument is due to the patch that fixed this (and added this test). #1935 removed the use of a different timeout object in send depending on the value of stream. That means Requests now always hands the read timeout so the stream flag isn't relevant in this regard.

Are you experiencing issues with a current version of Requests?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment