feat: share HTTP connection pool across SDK instances; refactor polling by jrvb-rl · Pull Request #797 · runloopai/api-client-python

jrvb-rl · 2026-05-07T18:44:17Z

Summary

Shared HTTP connection pool: All Runloop / AsyncRunloop instances now share a single httpx.Client (or AsyncClient) by default, with thread-safe refcounting to close the pool when the last SDK client releases it. This enables HTTP/2 multiplexing across concurrent SDK instances and avoids ConnectTimeout storms under high concurrency. Users can opt out via shared_http_pool=False or by providing a custom http_client.
Enable HTTP/2 by default: DefaultHttpxClient and DefaultAsyncHttpxClient now set http2=True, enabling multiplexing over a single TCP connection.
Tune connection limits: Reduced DEFAULT_CONNECTION_LIMITS from 100/20 to 20/10 max/keepalive connections — appropriate for a shared pool with HTTP/2 multiplexing.
Refactor polling into wait_for_status: Extracted the long-poll retry loop from devbox/execution resources into lib/wait_for_status.py, eliminating duplicated timeout/retry/placeholder logic across four call sites.
Improve retry logging: _sleep_for_retry now logs the HTTP status code or exception type/message on retries, instead of a generic message.
Bug fixes:
- AsyncAPIClient.close() now has the same hasattr(self, "_client") guard as the sync version, preventing AttributeError if __init__ fails before _client is assigned.
- is_closed() now returns True after close() is called on a shared-pool client, even if the underlying transport is still open for other clients.
- copy() / with_options() now accepts an explicit shared_http_pool parameter, matching the __init__ signature.

Test plan

19 new tests in tests/test_shared_pool.py covering:
- Shared vs private pool identity verification
- Custom http_client bypasses sharing
- Refcount lifecycle (partial close keeps pool open, final close tears it down)
- Double-close safety (no negative refcounts)
- Pool recreation after full release
- copy() inheriting and overriding shared pool state
- Async equivalents for all of the above
All 1048 existing unit tests pass
Manual smoke test with concurrent SDK instances

🤖 Generated with Claude Code

- Add hasattr guard to AsyncAPIClient.close() matching sync version - Track per-instance _closed flag so is_closed() works correctly for shared pool clients (underlying transport stays open for others) - Add shared_http_pool param to copy()/with_options() to match __init__ - Add 19 tests covering sharing, refcounting, copy propagation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

james-rl

Apparently there is some weird behavior in python that does not work the way that other languages do.

This will cause issues for the async client (but not the sync client):

import asyncio
from runloop_api_client import AsyncRunloop
async def run_once():
    client = AsyncRunloop(bearer_token="token")
    await client.devboxes.retrieve("devbox-id")
    # imagine this client is not closed, or cleanup does not fully await close
asyncio.run(run_once())  # creates/uses shared async pool on loop A
asyncio.run(run_once())  # reuses same global AsyncClient on loop B

asyncio.run() creates a fresh event loop each time and closes it when done. On the second call, the global _shared_async_client may still point at the client first used on loop A. When loop B tries to use it, failures can show up as runtime errors like “event loop is closed”, “bound to a different event loop”, “attached to a different loop”, hanging cleanup, or flaky transport errors depending on exactly what state httpx/anyio has cached.

Python wants you to scope-manage the async pool instead of having one consolidated async pool. I find this extremely counterintuitive but it sorta makes sense given how the async client binds to the transport layer:

The reason that can break is that an httpx.AsyncClient internally uses async primitives and transports from httpcore/anyio. After it has opened connections, those internals can be tied to the async backend/loop that created or used them. If you later drive the same client from a different loop, Python may complain because loop-owned objects are being awaited or closed from the wrong loop.

The suggestion is to add an extra bit of loop awareness so that requests don't accidentally tread on each other. Something a bit like:

_async_pools: weakref.WeakKeyDictionary[asyncio.AbstractEventLoop, PoolState]

loop = asyncio.get_running_loop()
pool = _async_pools.get(loop)
if pool is None or pool.client.is_closed:
    pool = PoolState(client=AsyncHttpxClientWrapper(...), refcount=0)
    _async_pools[loop] = pool

The garbage collector should mop up resources, but this does also imply a need to change the close to ensure aclose and close play together nicely. So, the suggested changes:

async client: shared pool keyed by current event loop, created lazily on first request
explicit async close awaits final aclose()
add tests that create/use clients across two separate asyncio.run(...) calls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Address review feedback: async shared pool is now keyed by the running event loop using weakref.WeakKeyDictionary, so clients in different asyncio.run() calls get separate pools. close() properly awaits aclose() on the last release. __del__ uses a sync-only release path. Async tests converted to async def so they run inside an event loop and can exercise per-loop pool sharing correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jrvb-rl · 2026-05-07T19:34:38Z

Addressed all three points from your review in 8c24dd0:

Per-loop async pool: Switched from a single global _shared_async_client to weakref.WeakKeyDictionary[asyncio.AbstractEventLoop, _AsyncPoolState]. Each asyncio.run() call gets its own pool, and the WeakKeyDictionary lets the GC clean up pool state when the loop is collected.
Proper async close: close() now awaits _release_shared_pool() which awaits client.aclose() on the last release. __del__ uses a separate sync-only _release_shared_pool_sync() that does best-effort fire-and-forget via loop.create_task().
Cross-loop tests: Added TestAsyncCrossLoop with two tests:
- test_separate_loops_get_separate_pools — verifies clients in different asyncio.run() calls get different underlying httpx clients
- test_same_loop_shares_pool — verifies clients in the same asyncio.run() share the pool

Fixes three bugs in the shared HTTP pool: (1) async clients created without a running event loop leaked connections, (2) __del__ could drop clients without closing them, (3) sharing httpx.Client shared cookie jars across SDK instances. Now shares only the transport (connection pool) via refcounted wrappers, giving each SDK instance its own httpx client with isolated mutable state. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

james-rl

This looks good to me. One potential edge case for user-supplied http clients in comments.

I also saw this changed:
DEFAULT_CONNECTION_LIMITS = httpx.Limits(max_connections=20, max_keepalive_connections=10)

I think this is intentional, but since it was 100 previously I just want to call it out in case it was a temporary change.

james-rl · 2026-05-07T21:06:10Z

+        if http_client is not None:
+            resolved_shared = False


i think there is an edge case where a passed in http_client with custom options will have those options dropped since we're no longer using a client if they pass one in:

The previous version had this:
http_client = http_client or self._client

jrvb-rl · 2026-05-07T22:15:49Z

This looks good to me. One potential edge case for user-supplied http clients in comments.

I also saw this changed: DEFAULT_CONNECTION_LIMITS = httpx.Limits(max_connections=20, max_keepalive_connections=10)

I think this is intentional, but since it was 100 previously I just want to call it out in case it was a temporary change.

Yep. With the move to HTTP2 we don't really benefit from lots of connections / backend. Trimming it down reduces risk of running out of FDs.

After close(), Python can reuse the freed memory address for the next transport, making id() values match across separate asyncio.run() calls. Keep the first transport reference alive and use `is not` instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jrvb-rl and others added 2 commits May 7, 2026 07:50

cp

bdf843a

jrvb-rl changed the title ~~Share HTTP connection pool across SDK instances; refactor polling~~ feat: share HTTP connection pool across SDK instances; refactor polling May 7, 2026

jrvb-rl requested a review from james-rl May 7, 2026 18:45

james-rl requested changes May 7, 2026

View reviewed changes

jrvb-rl and others added 2 commits May 7, 2026 12:24

Fix pyright lint errors in shared pool tests

01b73fa

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jrvb-rl requested a review from james-rl May 7, 2026 19:44

james-rl approved these changes May 7, 2026

View reviewed changes

jrvb-rl merged commit 4f2fc4c into main May 8, 2026
6 of 7 checks passed

jrvb-rl deleted the jrvb/polling branch May 8, 2026 03:48

stainless-app Bot mentioned this pull request May 8, 2026

release: 1.20.3 #796

Merged

jrvb-rl mentioned this pull request May 14, 2026

fix(upload_file): unstick concurrent uploads stalling at 30s #803

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: share HTTP connection pool across SDK instances; refactor polling#797

feat: share HTTP connection pool across SDK instances; refactor polling#797
jrvb-rl merged 6 commits into
mainfrom
jrvb/polling

jrvb-rl commented May 7, 2026

Uh oh!

james-rl left a comment

Uh oh!

jrvb-rl commented May 7, 2026

Uh oh!

james-rl left a comment

Uh oh!

james-rl May 7, 2026

Uh oh!

jrvb-rl commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jrvb-rl commented May 7, 2026

Summary

Test plan

Uh oh!

james-rl left a comment

Choose a reason for hiding this comment

Uh oh!

jrvb-rl commented May 7, 2026

Uh oh!

james-rl left a comment

Choose a reason for hiding this comment

Uh oh!

james-rl May 7, 2026

Choose a reason for hiding this comment

Uh oh!

jrvb-rl commented May 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants