Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KeyError: HTTPConnection(origin=...) on ReadTimeout in connection_pool #817

Closed
ojii opened this issue Feb 19, 2020 · 8 comments
Closed

KeyError: HTTPConnection(origin=...) on ReadTimeout in connection_pool #817

ojii opened this issue Feb 19, 2020 · 8 comments
Labels
bug Something isn't working concurrency Issues related to concurrency and usage of async libraries pooling Issues and PRs relative to connection pooling

Comments

@ojii
Copy link

ojii commented Feb 19, 2020

In some cases, when httpx._dispatch.connection_pool:ConnectionPool.send fails, the call to httpx._dispatch.connection_pool:ConnectionStore.remove fails because the connection somehow is no longer in ConnectionStore.all.

This is in an http2-only context.

Python: 3.8.1
httpx: 0.11.1

ReadTimeout: null
  File "httpx/dispatch/connection_pool.py", line 153, in send
    response = await connection.send(request, timeout=timeout)
  File "httpx/dispatch/connection.py", line 44, in send
    return await self.connection.send(request, timeout=timeout)
  File "httpx/dispatch/http2.py", line 68, in send
    return await stream.send(request, timeout)
  File "httpx/dispatch/http2.py", line 214, in send
    status_code, headers = await self.receive_response(timeout)
  File "httpx/dispatch/http2.py", line 271, in receive_response
    event = await self.connection.wait_for_event(self.stream_id, timeout)
  File "httpx/dispatch/http2.py", line 138, in wait_for_event
    await self.receive_events(timeout)
  File "httpx/dispatch/http2.py", line 145, in receive_events
    data = await self.socket.read(self.READ_NUM_BYTES, timeout)
  File "httpx/backends/asyncio.py", line 134, in read
    raise ReadTimeout() from None
KeyError: HTTPConnection(origin=Origin(scheme='https' host='api.push.apple.com' port=443))
  File "app.py", line 240, in func
    await self.aapns_client.send_notification(
  File "aapns/api.py", line 89, in send_notification
    response = await self.client.post(
  File "httpx/client.py", line 1305, in post
    return await self.request(
  File "httpx/client.py", line 1096, in request
    response = await self.send(
  File "httpx/client.py", line 1117, in send
    response = await self.send_handling_redirects(
  File "httpx/client.py", line 1147, in send_handling_redirects
    response = await self.send_handling_auth(
  File "httpx/client.py", line 1184, in send_handling_auth
    response = await self.send_single_request(request, timeout)
  File "httpx/client.py", line 1208, in send_single_request
    response = await dispatcher.send(request, timeout=timeout)
  File "httpx/dispatch/connection_pool.py", line 155, in send
    self.active_connections.remove(connection)
  File "httpx/dispatch/connection_pool.py", line 72, in remove
    del self.all[connection]
@ojii
Copy link
Author

ojii commented Feb 19, 2020

I've made a repo that can usually reproduced the issue: https://github.com/HENNGE/httpx-h2-testing

When the client tries to send ~1000 requests the KeyError usually happens.

@tomchristie
Copy link
Member

Okay, thanks for the report.

I've been taking a refresh onto the connection pool handling in httpcore... https://github.com/encode/httpcore/blob/master/httpcore/_async/connection_pool.py

Might be that I need to prioritize working on that again, in order to get this, #283, and #514 resolved.

@ojii
Copy link
Author

ojii commented Feb 20, 2020

The example I made indeed also shows #514. I'd expect it to use a single connection (or maybe 1 connection per 100 requests since that's the stream limit). Also with different parameters (eg issue 10k requests) it also exhausts the per-connection stream limit. In our real-world scenario, we don't currently hit the second issue, but the first two (KeyError and lots of connections) do happen and we're also seeing some sort of memory leak (but haven't been able to track this down to httpx or our app yet).

Let me know if there's more information you need or if there are other ways I can help.

@dimaqq
Copy link

dimaqq commented Feb 28, 2020

I've been poking at @ojii 's test and I think (not fully sure yet) that KeyError only happens in a very peculiar circumstance:

  1. many requests are send in parallel
  2. one of the requests fails (Timeout or ConnectionReset), which is legal
  3. await gather(all those requests) propagates the exception
  4. asyncio.run finishes
  5. asyncio attempts to clean up outstanding tasks (via cancellation if I get it right)
  6. during that, KeyError is hit

If that's correct, then the problem is in cancellation and not on the happy path.

@dimaqq
Copy link

dimaqq commented Feb 28, 2020

Having instrumented connection pool / connection store:

diff --git a/httpx/_dispatch/connection_pool.py b/httpx/_dispatch/connection_pool.py
index 61d323c..20c8f31 100644
--- a/httpx/_dispatch/connection_pool.py
+++ b/httpx/_dispatch/connection_pool.py
@@ -62,6 +62,7 @@ class ConnectionStore:
         return connection

     def add(self, connection: HTTPConnection) -> None:
+        __import__("logging").warning("add called on %s", id(self))
         self.all[connection] = 0.0
         try:
             self.by_origin[connection.origin][connection] = 0.0
@@ -69,12 +70,14 @@ class ConnectionStore:
             self.by_origin[connection.origin] = {connection: 0.0}

     def remove(self, connection: HTTPConnection) -> None:
+        __import__("logging").warning("remove called on %s", id(self))
         del self.all[connection]
         del self.by_origin[connection.origin][connection]
         if not self.by_origin[connection.origin]:
             del self.by_origin[connection.origin]

     def clear(self) -> None:
+        __import__("logging").warning("clear called on %s", id(self))
         self.all.clear()
         self.by_origin.clear()

I get log:

WARNING:root:add called on 140611732770096
WARNING:root:add called on 140611732770096
...
WARNING:root:remove called on 140611732770096
WARNING:root:clear called on 140611732770144
WARNING:root:remove called on 140611732770096
...

This is the traceback when .clear() was is called:

Traceback (most recent call last):
  File "asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "tests/stress/test_cancellation.py", line 25, in run_client
    await asyncio.gather(*todo)
  File "httpx/_client.py", line 1401, in __aexit__
    await self.aclose()
  File "httpx/_client.py", line 1390, in aclose
    await self.dispatch.close()
  File "httpx/_dispatch/connection_pool.py", line 211, in close
    self.keepalive_connections.clear()
  File "httpx/_dispatch/connection_pool.py", line 81, in clear

So, basically, the async with statement finishes before individual requests.

@florimondmanca
Copy link
Member

florimondmanca commented Mar 13, 2020

Hi here, the httpcore-based implementation has come along nicely now; has anyone tried to reproduce this against an HTTPX installation pinned on #804? Is it still an issue on that branch?

@florimondmanca florimondmanca added bug Something isn't working concurrency Issues related to concurrency and usage of async libraries pooling Issues and PRs relative to connection pooling labels Mar 13, 2020
@tomchristie
Copy link
Member

I'm going to close this on the assumption that its resolved in the 0.13.dev0 pre-release, given that we've totally refreshed the pool handling, and it's a much more careful implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working concurrency Issues related to concurrency and usage of async libraries pooling Issues and PRs relative to connection pooling
Projects
None yet
Development

No branches or pull requests

4 participants