Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP/2 higher number of timeouts compared with 1.1 #841

Closed
victoraugustolls opened this issue Mar 2, 2020 · 25 comments
Closed

HTTP/2 higher number of timeouts compared with 1.1 #841

victoraugustolls opened this issue Mar 2, 2020 · 25 comments
Labels
bug Something isn't working http/2 Issues and PRs related to HTTP/2

Comments

@victoraugustolls
Copy link

I recently observed that the number of timeouts with http2 enabled on Async client are higher than with it disabled. Made a gist with an example:

https://gist.github.com/victoraugustolls/01002ae218366b453e962446c9c8a274

@florimondmanca
Copy link
Member

Thanks, the results in the gist are very enlightening as to the magnitude of this issue... As you noted in #832, there might be an outstanding race condition or bug we fixed for HTTP/1.1 in the past but not HTTP/2. I guess we’ll have to dig into the HTTP/2 implementation to find out... If you’d like you can turn on TRACE logs and see if there’s anything odd there:

HTTPX_LOG_LEVEL=trace python program.py

@florimondmanca florimondmanca added http/2 Issues and PRs related to HTTP/2 bug Something isn't working labels Mar 2, 2020
@victoraugustolls
Copy link
Author

@florimondmanca updated gist with results from running with httpxprof of yours! Will enable trace and post it there.

@victoraugustolls
Copy link
Author

It seems to get stuck here:
TRACE: release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))

@victoraugustolls
Copy link
Author

victoraugustolls commented Mar 2, 2020

Traces are too long and I can't upload files to gist, so I will upload them here:

With HTTP/2 enabled -> trace.txt

With HTTP/2 disabled -> trace_2.txt

@victoraugustolls
Copy link
Author

victoraugustolls commented Mar 2, 2020

But I think the interesting bits are before and after the exceptions:

TRACE:    receive_event stream_id=1 event=<StreamEnded stream_id:1>
TRACE [2020-03-02 13:13:09] httpx.dispatch.http2 - receive_event stream_id=0 event=<PingReceived ping_data:0000000000000000>
TRACE:    receive_event stream_id=0 event=<PingReceived ping_data:0000000000000000>
TRACE [2020-03-02 13:13:09] httpx.dispatch.connection_pool - release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
TRACE:    release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
TRACE [2020-03-02 13:13:09] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 13:13:09] httpx.dispatch.connection_pool - release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
TRACE:    release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
TRACE [2020-03-02 13:13:09] httpx.dispatch.connection - close_connection
TRACE:    close_connection
[2020-03-02 13:13:13] <class 'httpx.exceptions.ReadTimeout'>:
[2020-03-02 13:13:18] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 13:13:23] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 13:13:28] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 13:13:33] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 13:13:38] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 13:13:43] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 13:13:48] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
Count: 2
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection_pool - acquire_connection origin=Origin(scheme='https' host='google.com' port=443)
TRACE:    acquire_connection origin=Origin(scheme='https' host='google.com' port=443)
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection_pool - acquire_connection origin=Origin(scheme='https' host='google.com' port=443)
TRACE:    acquire_connection origin=Origin(scheme='https' host='google.com' port=443)
TRACE [2020-03-02 13:13:48] httpx.dispatch.connection_pool - acquire_connection origin=Origin(scheme='https' host='google.com' port=443)
TRACE:    acquire_connection origin=Origin(scheme='https' host='google.com' port=443)

@victoraugustolls
Copy link
Author

I tried to find something hanging, but no luck so far...

@florimondmanca
Copy link
Member

I notice the KeyError’s are exactly 5s apart from one another. This is both the default connect timeout and the keepalive period for connections. So are they caused by our keepalive checking mechanism? Or is it a retry mechanism? Or is it what should have been a ConnectTimeout whose handling resulted in another error?

@victoraugustolls
Copy link
Author

I think the KeyError's are due to #817 and should have been timeouts? Don't know for sure. But looks like all the connections actually raised an error, but due to #817 KeyError was raised

@florimondmanca
Copy link
Member

Wait. 5 is also the number of requests fired off in a batch. I notice the snippet fires them off perfectly in parallel via asyncio.gather(). I remembered this has caused issues in the past for HTTP/2.

Can you:

  • Add a random jitter delay to each request and see if that has any effect?
  • Change the batch size and connect timeout to distinct values that are not 5? Eg 3 and 4. That way we can better see what error is caused by what mechanism (parallel Requests, host connection, keepalive expiry, etc.)

@florimondmanca
Copy link
Member

As for #817, does #817 (comment) suppress the KeyError for you as well? If so, what do TRACE logs look like with this patch?

@victoraugustolls
Copy link
Author

Tried with asyncio.gather() with 10 requests in parallel and this was the output:

DEBUG:    HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
TRACE [2020-03-02 17:03:54] httpx.dispatch.http2 - receive_event stream_id=5 event=<DataReceived stream_id:5, flow_controlled_length:5454, data:1f8b08000000000002ffc55b5b73dbc6927ef7af>
TRACE:    receive_event stream_id=5 event=<DataReceived stream_id:5, flow_controlled_length:5454, data:1f8b08000000000002ffc55b5b73dbc6927ef7af>
TRACE [2020-03-02 17:03:54] httpx.dispatch.http2 - receive_event stream_id=5 event=<DataReceived stream_id:5, flow_controlled_length:96, data:>
TRACE:    receive_event stream_id=5 event=<DataReceived stream_id:5, flow_controlled_length:96, data:>
TRACE [2020-03-02 17:03:54] httpx.dispatch.http2 - receive_event stream_id=5 event=<StreamEnded stream_id:5>
TRACE:    receive_event stream_id=5 event=<StreamEnded stream_id:5>
TRACE [2020-03-02 17:03:54] httpx.dispatch.http2 - receive_event stream_id=0 event=<PingReceived ping_data:0000000000000004>
TRACE:    receive_event stream_id=0 event=<PingReceived ping_data:0000000000000004>
TRACE [2020-03-02 17:03:54] httpx.dispatch.connection_pool - release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
TRACE:    release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
Client returned
[2020-03-02 17:03:59] <class 'httpx.exceptions.ReadTimeout'>: 
[2020-03-02 17:04:04] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:04:09] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:04:14] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:04:19] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:04:24] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:04:29] <class 'httpx.exceptions.ReadTimeout'>: 
[2020-03-02 17:04:34] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
Count: 2
TRACE [2020-03-02 17:04:34] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:04:34] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:04:34] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:04:34] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:04:34] httpx.dispatch.connection - close_connection

@victoraugustolls
Copy link
Author

With 3 requests in parallel:

DEBUG [2020-03-02 17:06:15] httpx.client - HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
DEBUG:    HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
TRACE [2020-03-02 17:06:15] httpx.dispatch.http2 - receive_event stream_id=3 event=<DataReceived stream_id:3, flow_controlled_length:5496, data:1f8b08000000000002ffc55b5973dbc6967ef7af>
TRACE:    receive_event stream_id=3 event=<DataReceived stream_id:3, flow_controlled_length:5496, data:1f8b08000000000002ffc55b5973dbc6967ef7af>
TRACE [2020-03-02 17:06:15] httpx.dispatch.http2 - receive_event stream_id=3 event=<DataReceived stream_id:3, flow_controlled_length:126, data:>
TRACE:    receive_event stream_id=3 event=<DataReceived stream_id:3, flow_controlled_length:126, data:>
TRACE [2020-03-02 17:06:15] httpx.dispatch.http2 - receive_event stream_id=3 event=<StreamEnded stream_id:3>
TRACE:    receive_event stream_id=3 event=<StreamEnded stream_id:3>
TRACE [2020-03-02 17:06:15] httpx.dispatch.http2 - receive_event stream_id=0 event=<PingReceived ping_data:0000000000000002>
TRACE:    receive_event stream_id=0 event=<PingReceived ping_data:0000000000000002>
TRACE [2020-03-02 17:06:15] httpx.dispatch.connection_pool - release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
TRACE:    release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
Client returned
[2020-03-02 17:06:20] <class 'httpx.exceptions.ReadTimeout'>: 
[2020-03-02 17:06:25] <class 'httpx.exceptions.ReadTimeout'>: 
Count: 2
TRACE [2020-03-02 17:06:25] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:06:25] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:06:25] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:06:25] httpx.dispatch.connection - close_connection

@victoraugustolls
Copy link
Author

Add a random jitter delay to each request and see if that has any effect?

added await asyncio.sleep(random.random()) before the client.get() and the result (for 10 parallel requests) was:

TRACE [2020-03-02 17:11:30] httpx.dispatch.http2 - receive_event stream_id=31 event=<DataReceived stream_id:31, flow_controlled_length:5527, data:1f8b08000000000002ffc55b5b73db46967ef7af>
TRACE:    receive_event stream_id=31 event=<DataReceived stream_id:31, flow_controlled_length:5527, data:1f8b08000000000002ffc55b5b73db46967ef7af>
TRACE [2020-03-02 17:11:30] httpx.dispatch.http2 - receive_event stream_id=31 event=<DataReceived stream_id:31, flow_controlled_length:189, data:>
TRACE:    receive_event stream_id=31 event=<DataReceived stream_id:31, flow_controlled_length:189, data:>
TRACE [2020-03-02 17:11:30] httpx.dispatch.http2 - receive_event stream_id=31 event=<StreamEnded stream_id:31>
TRACE:    receive_event stream_id=31 event=<StreamEnded stream_id:31>
DEBUG [2020-03-02 17:11:30] httpx.client - HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
DEBUG:    HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
TRACE [2020-03-02 17:11:30] httpx.dispatch.connection_pool - release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
TRACE:    release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
Client returned
[2020-03-02 17:11:34] <class 'httpx.exceptions.ReadTimeout'>: 
[2020-03-02 17:11:39] <class 'httpx.exceptions.ReadTimeout'>: 
Count: 2
TRACE [2020-03-02 17:11:40] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:11:40] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:11:40] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:11:40] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:11:40] httpx.dispatch.connection - close_connection

@victoraugustolls
Copy link
Author

Will try #817 fix next

@victoraugustolls
Copy link
Author

If I did this right:

async def request_multiple(client: httpx.AsyncClient):
    count = 0

    while count <= 2:
        print("Count:", count)
        todo = [
            asyncio.create_task(do_request(client=client)),
            asyncio.create_task(do_request(client=client)),
            asyncio.create_task(do_request(client=client)),
            asyncio.create_task(do_request(client=client)),
            asyncio.create_task(do_request(client=client)),
            asyncio.create_task(do_request(client=client)),
            asyncio.create_task(do_request(client=client)),
            asyncio.create_task(do_request(client=client)),
            asyncio.create_task(do_request(client=client)),
            asyncio.create_task(do_request(client=client))
        ]
        try:
            await asyncio.gather(*todo)
        finally:
            for t in todo:
                if not t.done():
                    t.cancel()
        count += 1
    print("Finished")

The output was (without jitter):

DEBUG [2020-03-02 17:16:05] httpx.client - HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
DEBUG:    HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
Client returned
TRACE [2020-03-02 17:16:05] httpx.dispatch.http2 - receive_event stream_id=7 event=<DataReceived stream_id:7, flow_controlled_length:5578, data:1f8b08000000000002ffc55b5b779bc8967ecfaf>
TRACE:    receive_event stream_id=7 event=<DataReceived stream_id:7, flow_controlled_length:5578, data:1f8b08000000000002ffc55b5b779bc8967ecfaf>
TRACE [2020-03-02 17:16:05] httpx.dispatch.http2 - receive_event stream_id=7 event=<DataReceived stream_id:7, flow_controlled_length:227, data:>
TRACE:    receive_event stream_id=7 event=<DataReceived stream_id:7, flow_controlled_length:227, data:>
TRACE [2020-03-02 17:16:05] httpx.dispatch.http2 - receive_event stream_id=7 event=<StreamEnded stream_id:7>
TRACE:    receive_event stream_id=7 event=<StreamEnded stream_id:7>
TRACE [2020-03-02 17:16:05] httpx.dispatch.http2 - receive_event stream_id=0 event=<PingReceived ping_data:0000000000000004>
TRACE:    receive_event stream_id=0 event=<PingReceived ping_data:0000000000000004>
DEBUG [2020-03-02 17:16:05] httpx.client - HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
DEBUG:    HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
TRACE [2020-03-02 17:16:05] httpx.dispatch.connection_pool - release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
TRACE:    release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
Client returned
[2020-03-02 17:16:10] <class 'httpx.exceptions.ReadTimeout'>: 
[2020-03-02 17:16:15] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:16:20] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:16:25] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:16:30] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:16:35] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:16:40] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
Count: 2
TRACE [2020-03-02 17:16:40] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:16:40] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:16:40] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:16:40] httpx.dispatch.connection - close_connection
TRACE:    close_connection

@victoraugustolls
Copy link
Author

With a timeout value of 10.0:

TRACE [2020-03-02 17:21:43] httpx.dispatch.http2 - receive_event stream_id=11 event=<DataReceived stream_id:11, flow_controlled_length:5331, data:1f8b08000000000002ffc55b5973dbc6967ef7af>
TRACE:    receive_event stream_id=11 event=<DataReceived stream_id:11, flow_controlled_length:5331, data:1f8b08000000000002ffc55b5973dbc6967ef7af>
TRACE [2020-03-02 17:21:43] httpx.dispatch.http2 - receive_event stream_id=11 event=<DataReceived stream_id:11, flow_controlled_length:184, data:>
TRACE:    receive_event stream_id=11 event=<DataReceived stream_id:11, flow_controlled_length:184, data:>
TRACE [2020-03-02 17:21:43] httpx.dispatch.http2 - receive_event stream_id=11 event=<StreamEnded stream_id:11>
TRACE:    receive_event stream_id=11 event=<StreamEnded stream_id:11>
DEBUG [2020-03-02 17:21:43] httpx.client - HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
DEBUG:    HTTP Request: GET https://www.google.com/ "HTTP/2 200 OK"
TRACE [2020-03-02 17:21:43] httpx.dispatch.connection_pool - release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
TRACE:    release_connection connection=HTTPConnection(origin=Origin(scheme='https' host='www.google.com' port=443))
Client returned
[2020-03-02 17:21:53] <class 'httpx.exceptions.ReadTimeout'>: 
[2020-03-02 17:22:03] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:22:13] <class 'httpx.exceptions.ReadTimeout'>: 
[2020-03-02 17:22:23] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
[2020-03-02 17:22:33] <class 'KeyError'>: HTTPConnection(origin=Origin(scheme='https' host='google.com' port=443))
Count: 2
TRACE [2020-03-02 17:22:33] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:22:33] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:22:33] httpx.dispatch.connection - close_connection
TRACE:    close_connection
TRACE [2020-03-02 17:22:33] httpx.dispatch.connection - close_connection
TRACE:    close_connection

@victoraugustolls
Copy link
Author

victoraugustolls commented Mar 2, 2020

It appears that the time between errors is indeed the timeout time (read timeout)

@florimondmanca florimondmanca changed the title HTTP/2 higher timeouts compared with 1.1 HTTP/2 higher number of timeouts compared with 1.1 Mar 5, 2020
@victoraugustolls
Copy link
Author

@florimondmanca @tomchristie excellent news here! I had 0 timeouts after retrying the tests here multiple times with the httpcore-interface branch!

@tomchristie
Copy link
Member

Fantastic! 👍

@victoraugustolls
Copy link
Author

Just deployed a version of the service using the branch with httpcore interface. So far so good, 0 errors. Will post and update in the end of the day.

@victoraugustolls
Copy link
Author

Update: now with errors :/ both connect timeout and <class 'httpcore._exceptions.ProtocolError'>: can't handle event type ConnectionClosed when role=SERVER and state=SEND_RESPONSE

@victoraugustolls
Copy link
Author

victoraugustolls commented Mar 9, 2020

And now: <class 'httpcore._exceptions.WriteError'>: [Errno 104] Connection reset by peer to a central microservice that everyone connects, and only the one with the httpcore interface receive this error.

Also: ProtocolError: Invalid input ConnectionInputs.RECV_HEADERS in state ConnectionState.CLOSED

Also: ProtocolError: <ConnectionTerminated error_code:ErrorCodes.NO_ERROR, last_stream_id:1999, additional_data:None>

@tomchristie
Copy link
Member

@victoraugustolls Okay, so let's take these one at a time, in a bit more detail. They're generally looking roughly like valid behaviors that just need some neater error handling.

I'd suggest starting by opening an issue on httpcore with one of these, complete with a full traceback, plus a bit of information about what server you're using and other potentiatially useful infor for replication, so that we can work through the case comprehensively.

Once we've done that let's have open issues for each of these seperately on httpcore, and work through them fully one at a time.

@yeraydiazdiaz
Copy link
Contributor

Closing as issues are reflected in httpcore encode/httpcore#45 and encode/httpcore#46.

For the record, I just tested the original gist against httpx==0.13.0.dev2 and got the same result for http1.1 and http2.

@victoraugustolls
Copy link
Author

Nice! Will update on my end (and my application) to see the result!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working http/2 Issues and PRs related to HTTP/2
Projects
None yet
Development

No branches or pull requests

4 participants