feat: share HTTP connection pool across SDK instances; refactor polling#797
Conversation
- Add hasattr guard to AsyncAPIClient.close() matching sync version - Track per-instance _closed flag so is_closed() works correctly for shared pool clients (underlying transport stays open for others) - Add shared_http_pool param to copy()/with_options() to match __init__ - Add 19 tests covering sharing, refcounting, copy propagation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
james-rl
left a comment
There was a problem hiding this comment.
Apparently there is some weird behavior in python that does not work the way that other languages do.
This will cause issues for the async client (but not the sync client):
import asyncio
from runloop_api_client import AsyncRunloop
async def run_once():
client = AsyncRunloop(bearer_token="token")
await client.devboxes.retrieve("devbox-id")
# imagine this client is not closed, or cleanup does not fully await close
asyncio.run(run_once()) # creates/uses shared async pool on loop A
asyncio.run(run_once()) # reuses same global AsyncClient on loop B
asyncio.run()creates a fresh event loop each time and closes it when done. On the second call, the global_shared_async_clientmay still point at the client first used on loop A. When loop B tries to use it, failures can show up as runtime errors like “event loop is closed”, “bound to a different event loop”, “attached to a different loop”, hanging cleanup, or flaky transport errors depending on exactly what statehttpx/anyiohas cached.
Python wants you to scope-manage the async pool instead of having one consolidated async pool. I find this extremely counterintuitive but it sorta makes sense given how the async client binds to the transport layer:
The reason that can break is that an httpx.AsyncClient internally uses async primitives and transports from httpcore/anyio. After it has opened connections, those internals can be tied to the async backend/loop that created or used them. If you later drive the same client from a different loop, Python may complain because loop-owned objects are being awaited or closed from the wrong loop.
The suggestion is to add an extra bit of loop awareness so that requests don't accidentally tread on each other. Something a bit like:
_async_pools: weakref.WeakKeyDictionary[asyncio.AbstractEventLoop, PoolState]
loop = asyncio.get_running_loop()
pool = _async_pools.get(loop)
if pool is None or pool.client.is_closed:
pool = PoolState(client=AsyncHttpxClientWrapper(...), refcount=0)
_async_pools[loop] = poolThe garbage collector should mop up resources, but this does also imply a need to change the close to ensure aclose and close play together nicely. So, the suggested changes:
- async client: shared pool keyed by current event loop, created lazily on first request
- explicit async close awaits final aclose()
- add tests that create/use clients across two separate asyncio.run(...) calls
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address review feedback: async shared pool is now keyed by the running event loop using weakref.WeakKeyDictionary, so clients in different asyncio.run() calls get separate pools. close() properly awaits aclose() on the last release. __del__ uses a sync-only release path. Async tests converted to async def so they run inside an event loop and can exercise per-loop pool sharing correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Addressed all three points from your review in 8c24dd0:
|
Fixes three bugs in the shared HTTP pool: (1) async clients created without a running event loop leaked connections, (2) __del__ could drop clients without closing them, (3) sharing httpx.Client shared cookie jars across SDK instances. Now shares only the transport (connection pool) via refcounted wrappers, giving each SDK instance its own httpx client with isolated mutable state. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
james-rl
left a comment
There was a problem hiding this comment.
This looks good to me. One potential edge case for user-supplied http clients in comments.
I also saw this changed:
DEFAULT_CONNECTION_LIMITS = httpx.Limits(max_connections=20, max_keepalive_connections=10)
I think this is intentional, but since it was 100 previously I just want to call it out in case it was a temporary change.
| if http_client is not None: | ||
| resolved_shared = False |
There was a problem hiding this comment.
i think there is an edge case where a passed in http_client with custom options will have those options dropped since we're no longer using a client if they pass one in:
The previous version had this:
http_client = http_client or self._client
Yep. With the move to HTTP2 we don't really benefit from lots of connections / backend. Trimming it down reduces risk of running out of FDs. |
After close(), Python can reuse the freed memory address for the next transport, making id() values match across separate asyncio.run() calls. Keep the first transport reference alive and use `is not` instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Runloop/AsyncRunloopinstances now share a singlehttpx.Client(orAsyncClient) by default, with thread-safe refcounting to close the pool when the last SDK client releases it. This enables HTTP/2 multiplexing across concurrent SDK instances and avoidsConnectTimeoutstorms under high concurrency. Users can opt out viashared_http_pool=Falseor by providing a customhttp_client.DefaultHttpxClientandDefaultAsyncHttpxClientnow sethttp2=True, enabling multiplexing over a single TCP connection.DEFAULT_CONNECTION_LIMITSfrom 100/20 to 20/10 max/keepalive connections — appropriate for a shared pool with HTTP/2 multiplexing.wait_for_status: Extracted the long-poll retry loop from devbox/execution resources intolib/wait_for_status.py, eliminating duplicated timeout/retry/placeholder logic across four call sites._sleep_for_retrynow logs the HTTP status code or exception type/message on retries, instead of a generic message.AsyncAPIClient.close()now has the samehasattr(self, "_client")guard as the sync version, preventingAttributeErrorif__init__fails before_clientis assigned.is_closed()now returnsTrueafterclose()is called on a shared-pool client, even if the underlying transport is still open for other clients.copy()/with_options()now accepts an explicitshared_http_poolparameter, matching the__init__signature.Test plan
tests/test_shared_pool.pycovering:http_clientbypasses sharingcopy()inheriting and overriding shared pool state🤖 Generated with Claude Code