Skip to content

feat: share HTTP connection pool across SDK instances; refactor polling#797

Merged
jrvb-rl merged 6 commits into
mainfrom
jrvb/polling
May 8, 2026
Merged

feat: share HTTP connection pool across SDK instances; refactor polling#797
jrvb-rl merged 6 commits into
mainfrom
jrvb/polling

Conversation

@jrvb-rl
Copy link
Copy Markdown
Contributor

@jrvb-rl jrvb-rl commented May 7, 2026

Summary

  • Shared HTTP connection pool: All Runloop / AsyncRunloop instances now share a single httpx.Client (or AsyncClient) by default, with thread-safe refcounting to close the pool when the last SDK client releases it. This enables HTTP/2 multiplexing across concurrent SDK instances and avoids ConnectTimeout storms under high concurrency. Users can opt out via shared_http_pool=False or by providing a custom http_client.
  • Enable HTTP/2 by default: DefaultHttpxClient and DefaultAsyncHttpxClient now set http2=True, enabling multiplexing over a single TCP connection.
  • Tune connection limits: Reduced DEFAULT_CONNECTION_LIMITS from 100/20 to 20/10 max/keepalive connections — appropriate for a shared pool with HTTP/2 multiplexing.
  • Refactor polling into wait_for_status: Extracted the long-poll retry loop from devbox/execution resources into lib/wait_for_status.py, eliminating duplicated timeout/retry/placeholder logic across four call sites.
  • Improve retry logging: _sleep_for_retry now logs the HTTP status code or exception type/message on retries, instead of a generic message.
  • Bug fixes:
    • AsyncAPIClient.close() now has the same hasattr(self, "_client") guard as the sync version, preventing AttributeError if __init__ fails before _client is assigned.
    • is_closed() now returns True after close() is called on a shared-pool client, even if the underlying transport is still open for other clients.
    • copy() / with_options() now accepts an explicit shared_http_pool parameter, matching the __init__ signature.

Test plan

  • 19 new tests in tests/test_shared_pool.py covering:
    • Shared vs private pool identity verification
    • Custom http_client bypasses sharing
    • Refcount lifecycle (partial close keeps pool open, final close tears it down)
    • Double-close safety (no negative refcounts)
    • Pool recreation after full release
    • copy() inheriting and overriding shared pool state
    • Async equivalents for all of the above
  • All 1048 existing unit tests pass
  • Manual smoke test with concurrent SDK instances

🤖 Generated with Claude Code

jrvb-rl and others added 2 commits May 7, 2026 07:50
- Add hasattr guard to AsyncAPIClient.close() matching sync version
- Track per-instance _closed flag so is_closed() works correctly for
  shared pool clients (underlying transport stays open for others)
- Add shared_http_pool param to copy()/with_options() to match __init__
- Add 19 tests covering sharing, refcounting, copy propagation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jrvb-rl jrvb-rl changed the title Share HTTP connection pool across SDK instances; refactor polling feat: share HTTP connection pool across SDK instances; refactor polling May 7, 2026
@jrvb-rl jrvb-rl requested a review from james-rl May 7, 2026 18:45
Copy link
Copy Markdown
Contributor

@james-rl james-rl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently there is some weird behavior in python that does not work the way that other languages do.

This will cause issues for the async client (but not the sync client):

import asyncio
from runloop_api_client import AsyncRunloop
async def run_once():
    client = AsyncRunloop(bearer_token="token")
    await client.devboxes.retrieve("devbox-id")
    # imagine this client is not closed, or cleanup does not fully await close
asyncio.run(run_once())  # creates/uses shared async pool on loop A
asyncio.run(run_once())  # reuses same global AsyncClient on loop B

asyncio.run() creates a fresh event loop each time and closes it when done. On the second call, the global _shared_async_client may still point at the client first used on loop A. When loop B tries to use it, failures can show up as runtime errors like “event loop is closed”, “bound to a different event loop”, “attached to a different loop”, hanging cleanup, or flaky transport errors depending on exactly what state httpx/anyio has cached.

Python wants you to scope-manage the async pool instead of having one consolidated async pool. I find this extremely counterintuitive but it sorta makes sense given how the async client binds to the transport layer:

The reason that can break is that an httpx.AsyncClient internally uses async primitives and transports from httpcore/anyio. After it has opened connections, those internals can be tied to the async backend/loop that created or used them. If you later drive the same client from a different loop, Python may complain because loop-owned objects are being awaited or closed from the wrong loop.

The suggestion is to add an extra bit of loop awareness so that requests don't accidentally tread on each other. Something a bit like:

_async_pools: weakref.WeakKeyDictionary[asyncio.AbstractEventLoop, PoolState]

loop = asyncio.get_running_loop()
pool = _async_pools.get(loop)
if pool is None or pool.client.is_closed:
    pool = PoolState(client=AsyncHttpxClientWrapper(...), refcount=0)
    _async_pools[loop] = pool

The garbage collector should mop up resources, but this does also imply a need to change the close to ensure aclose and close play together nicely. So, the suggested changes:

  • async client: shared pool keyed by current event loop, created lazily on first request
  • explicit async close awaits final aclose()
  • add tests that create/use clients across two separate asyncio.run(...) calls

jrvb-rl and others added 2 commits May 7, 2026 12:24
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address review feedback: async shared pool is now keyed by the running
event loop using weakref.WeakKeyDictionary, so clients in different
asyncio.run() calls get separate pools. close() properly awaits
aclose() on the last release. __del__ uses a sync-only release path.

Async tests converted to async def so they run inside an event loop
and can exercise per-loop pool sharing correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jrvb-rl
Copy link
Copy Markdown
Contributor Author

jrvb-rl commented May 7, 2026

Addressed all three points from your review in 8c24dd0:

  1. Per-loop async pool: Switched from a single global _shared_async_client to weakref.WeakKeyDictionary[asyncio.AbstractEventLoop, _AsyncPoolState]. Each asyncio.run() call gets its own pool, and the WeakKeyDictionary lets the GC clean up pool state when the loop is collected.

  2. Proper async close: close() now awaits _release_shared_pool() which awaits client.aclose() on the last release. __del__ uses a separate sync-only _release_shared_pool_sync() that does best-effort fire-and-forget via loop.create_task().

  3. Cross-loop tests: Added TestAsyncCrossLoop with two tests:

    • test_separate_loops_get_separate_pools — verifies clients in different asyncio.run() calls get different underlying httpx clients
    • test_same_loop_shares_pool — verifies clients in the same asyncio.run() share the pool

@jrvb-rl jrvb-rl requested a review from james-rl May 7, 2026 19:44
Fixes three bugs in the shared HTTP pool: (1) async clients created
without a running event loop leaked connections, (2) __del__ could
drop clients without closing them, (3) sharing httpx.Client shared
cookie jars across SDK instances. Now shares only the transport
(connection pool) via refcounted wrappers, giving each SDK instance
its own httpx client with isolated mutable state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@james-rl james-rl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. One potential edge case for user-supplied http clients in comments.

I also saw this changed:
DEFAULT_CONNECTION_LIMITS = httpx.Limits(max_connections=20, max_keepalive_connections=10)

I think this is intentional, but since it was 100 previously I just want to call it out in case it was a temporary change.

Comment on lines +286 to +287
if http_client is not None:
resolved_shared = False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there is an edge case where a passed in http_client with custom options will have those options dropped since we're no longer using a client if they pass one in:

The previous version had this:
http_client = http_client or self._client

@jrvb-rl
Copy link
Copy Markdown
Contributor Author

jrvb-rl commented May 7, 2026

This looks good to me. One potential edge case for user-supplied http clients in comments.

I also saw this changed: DEFAULT_CONNECTION_LIMITS = httpx.Limits(max_connections=20, max_keepalive_connections=10)

I think this is intentional, but since it was 100 previously I just want to call it out in case it was a temporary change.

Yep. With the move to HTTP2 we don't really benefit from lots of connections / backend. Trimming it down reduces risk of running out of FDs.

After close(), Python can reuse the freed memory address for the next
transport, making id() values match across separate asyncio.run()
calls. Keep the first transport reference alive and use `is not`
instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jrvb-rl jrvb-rl merged commit 4f2fc4c into main May 8, 2026
6 of 7 checks passed
@jrvb-rl jrvb-rl deleted the jrvb/polling branch May 8, 2026 03:48
@stainless-app stainless-app Bot mentioned this pull request May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants