Description
Problem Statement
Many users need to supply an alternative, interface-compatible implementation of the OpenAI async client (e.g., a `GuardrailsAsyncOpenAI` wrapper to implement OpenAI Guardrails). Today the SDK creates a new `AsyncOpenAI` client per request to avoid HTTPX connection sharing across event loops. This makes it impossible to:
- Inject a pre-configured, guardrails-enabled client.
- Reuse connection pools efficiently within a single event loop/worker.
- Centralise observability, retries, timeouts and networking policy on one client.
Separately, Strands currently has very limited support for guardrails outside of Bedrock Guardrails, so users commonly reach for OpenAI-side guardrails via a wrapper client. Without support for injecting a fixed client, that approach is impossible.
Proposed Solution
Allow OpenAIModel to accept a fixed, injected AsyncOpenAI-compatible client, created once per worker/event loop at application startup and closed at shutdown. Continue to support current behaviour when no client is provided (backwards compatible).
Key changes (additive, non-breaking):
- Constructor injection: `OpenAIModel(client: Optional[Client] = None, client_args: Optional[dict] = None, …)`
  - If `client` is provided, reuse it and do not create/close a new client internally.
  - If `client` is `None`, retain current behaviour (construct an ephemeral client).
- Lifecycle guidance in docs
  - Recommend creating one client per worker/event loop (e.g., FastAPI lifespan startup/shutdown).
  - Emphasise that clients should not be shared across event loops, but can be safely reused across tasks within a loop.
- Acceptance criteria
  - Works with any `AsyncOpenAI`-compatible interface (e.g., `GuardrailsAsyncOpenAI`, custom proxies, instrumentation wrappers).
  - Streaming and structured output paths both reuse the injected client.
  - Clear examples for FastAPI and generic asyncio.
Code sketch (constructor + reuse):
```python
from typing import Any, Optional, Protocol

from openai import AsyncOpenAI


class Client(Protocol):
    @property
    def chat(self) -> Any: ...


class OpenAIModel(Model):
    def __init__(self, client: Optional[Client] = None, client_args: Optional[dict] = None, **config):
        self.client = client
        self._owns_client = client is None
        self.client_args = client_args or {}
        self.config = dict(config)

    async def stream(...):
        request = self.format_request(...)
        if self.client is not None:
            # Reuse injected client
            response = await self.client.chat.completions.create(**request)
            ...
        else:
            # Back-compat
            async with AsyncOpenAI(**self.client_args) as c:
                response = await c.chat.completions.create(**request)
                ...
```

Example usage (FastAPI lifespan, per-worker client):
```python
from fastapi import FastAPI
from openai import AsyncOpenAI
# from my_guardrails import GuardrailsAsyncOpenAI

app = FastAPI()


@app.on_event("startup")
async def startup():
    base = AsyncOpenAI()
    app.state.oai = base  # or GuardrailsAsyncOpenAI(base)
    app.state.model = OpenAIModel(client=app.state.oai, model_id="gpt-4o")


@app.on_event("shutdown")
async def shutdown():
    await app.state.oai.close()
```
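For the generic asyncio case called out in the acceptance criteria, a minimal sketch assuming the proposed `client=` parameter; the `OpenAIModel` wiring mirrors the FastAPI example and is illustrative only:

```python
import asyncio

from openai import AsyncOpenAI


async def main():
    # One client per event loop, created once at startup by the application.
    client = AsyncOpenAI()
    try:
        model = OpenAIModel(client=client, model_id="gpt-4o")
        # ... run tasks/agents that all reuse `client` within this loop ...
    finally:
        # Closed once at shutdown by whoever owns the client (not by OpenAIModel).
        await client.close()


asyncio.run(main())
```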
Use Case
- Guardrails: Wrap the OpenAI client with `GuardrailsAsyncOpenAI` to enforce content filters, schema validation, and redaction before responses reach application code (see the wrapper sketch below).
- Observability & policy: Centralise timeouts, retries, logging, tracing, and network egress policy (e.g., a custom `httpx.AsyncClient`).
- Performance: Reuse keep-alive connections and connection pools within a worker/event loop for lower latency and higher throughput.
- Multi-model routing: Swap the injected client to target proxies or gateways without touching model code (e.g., toggling `base_url`, auth, or headers).
This would help with:
- Meeting compliance requirements where guardrails must run before responses are consumed.
- Reducing tail latency by avoiding per-request client construction.
- Simplifying integration with enterprise networking and telemetry.
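To make the guardrails/observability use cases above concrete, here is a minimal, hypothetical wrapper that satisfies the `Client` protocol from the code sketch by delegating `chat` to an underlying `AsyncOpenAI`; `LoggingAsyncOpenAI` is illustrative only, not an existing library class:

```python
import logging
from typing import Any

from openai import AsyncOpenAI

logger = logging.getLogger(__name__)


class LoggingAsyncOpenAI:
    """Hypothetical instrumentation wrapper exposing the same `chat` surface as AsyncOpenAI."""

    def __init__(self, inner: AsyncOpenAI):
        self._inner = inner

    @property
    def chat(self) -> Any:
        # Delegate to the wrapped client; a real guardrails wrapper would intercept
        # chat.completions.create(...) here to filter/validate requests and responses.
        logger.debug("chat accessed through LoggingAsyncOpenAI")
        return self._inner.chat

    async def close(self) -> None:
        await self._inner.close()


# With the proposed injection point (illustrative):
# model = OpenAIModel(client=LoggingAsyncOpenAI(AsyncOpenAI()), model_id="gpt-4o")
```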
Alternative Solutions
- Create a new client per request
  - Pros: Safe with respect to event-loop boundaries; current behaviour.
  - Cons: Loses pooling; higher latency and allocation overhead; hard to apply cross-cutting concerns (guardrails, tracing) consistently.
- Global client shared across event loops
  - Pros: Simple in theory.
  - Cons: Unsafe; HTTPX pools cannot be shared across loops; leads to intermittent runtime errors.
- Disable pooling (force `Connection: close`)
  - Pros: Avoids cross-loop sharing issues.
  - Cons: Sacrifices performance; still doesn't enable easy injection of guardrails wrappers.
Additional Context
- Rationale: HTTPX connection pools are not shareable across asyncio event loops; reuse is safe within a loop.
- Need: Strands’ current guardrails support focuses on Bedrock; many users need OpenAI-side guardrails today.
- The OpenAI Python SDK supports async reuse and custom HTTP clients (`http_client=`), making injection straightforward.
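As an illustration of the `http_client=` hook, a sketch of building a single per-worker client on top of a tuned `httpx.AsyncClient`; the timeout and pool-limit values are arbitrary examples, not recommendations:

```python
import httpx
from openai import AsyncOpenAI

# A custom httpx.AsyncClient centralises networking policy (timeouts, pool limits,
# proxies, egress rules) for the one client this worker/event loop will reuse.
http_client = httpx.AsyncClient(
    timeout=httpx.Timeout(30.0, connect=5.0),
    limits=httpx.Limits(max_connections=100, max_keepalive_connections=20),
)
client = AsyncOpenAI(http_client=http_client)

# Injected once at startup per the proposal (illustrative):
# model = OpenAIModel(client=client, model_id="gpt-4o")
# ...and closed once at shutdown: await client.close()
```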
If useful, I’m happy to contribute a PR with the constructor change, a small `_stream_with_client` helper, tests, and docs.