[FEATURE] Support a fixed, injected AsyncOpenAI client to enable alternative interface-compatible clients #1103

@dg-bain

Description

Problem Statement

Many users need to supply an alternative, interface-compatible implementation of the AsyncOpenAI client (e.g., a GuardrailsAsyncOpenAI wrapper that applies OpenAI Guardrails). Today the SDK creates a new AsyncOpenAI client per request to avoid sharing HTTPX connections across event loops. This makes it impossible to:

  • Inject a pre-configured, guardrails-enabled client.
  • Reuse connection pools efficiently within a single event loop/worker.
  • Centralise observability, retries, timeouts and networking policy on one client.

Separately, Strands currently has very limited support for guardrails outside of Bedrock Guardrails, so users commonly reach for OpenAI-side guardrails via a wrapper client. Without a way to inject a fixed client, that approach is impossible.
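
To make "interface-compatible" concrete, a hypothetical wrapper might look like the sketch below. GuardedAsyncOpenAI and the guardrail hook functions are illustrative placeholders, not the real GuardrailsAsyncOpenAI API; the point is only that the wrapper exposes the same surface the SDK calls (client.chat.completions.create) and forwards everything else:

from types import SimpleNamespace
from typing import Any

from openai import AsyncOpenAI


def apply_input_guardrails(kwargs: dict) -> dict:
    return kwargs  # placeholder: inspect/redact the outgoing request here


def apply_output_guardrails(response: Any) -> Any:
    return response  # placeholder: validate/filter the response here


class _GuardedChatCompletions:
    def __init__(self, inner: Any):
        self._inner = inner

    async def create(self, **kwargs: Any) -> Any:
        response = await self._inner.create(**apply_input_guardrails(kwargs))
        return apply_output_guardrails(response)


class GuardedAsyncOpenAI:
    """Drop-in replacement exposing the surface the SDK uses (client.chat.completions.create)."""

    def __init__(self, inner: AsyncOpenAI):
        self._inner = inner
        self.chat = SimpleNamespace(completions=_GuardedChatCompletions(inner.chat.completions))

    def __getattr__(self, name: str) -> Any:
        return getattr(self._inner, name)  # everything else (close, embeddings, ...) passes through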

Proposed Solution

Allow OpenAIModel to accept a fixed, injected AsyncOpenAI-compatible client, created once per worker/event loop at application startup and closed at shutdown. Continue to support current behaviour when no client is provided (backwards compatible).

Key changes (additive, non-breaking):

  1. Constructor injection

    • OpenAIModel(client: Optional[Client] = None, client_args: Optional[dict] = None, …)
    • If client is provided, reuse it and do not create/close a new client internally.
    • If client is None, retain current behaviour (construct ephemeral client).
  2. Lifecycle guidance in docs

    • Recommend creating one client per worker/event loop (e.g., FastAPI lifespan startup/shutdown).
    • Emphasise that clients should not be shared across event loops, but can be safely reused across tasks within a loop.
  3. Acceptance criteria

    • Works with any AsyncOpenAI-compatible interface (e.g., GuardrailsAsyncOpenAI, custom proxies, instrumentation wrappers).
    • Streaming and structured output paths both reuse the injected client.
    • Clear examples for FastAPI and generic asyncio.

Code sketch (constructor + reuse):

from typing import Any, Optional, Protocol
from openai import AsyncOpenAI

# "Model" below is the Strands model base class (import omitted in this sketch).

class Client(Protocol):
    @property
    def chat(self) -> Any: ...

class OpenAIModel(Model):
    def __init__(self, client: Optional[Client] = None, client_args: Optional[dict] = None, **config):
        # If a client is injected, reuse it and never create/close one internally.
        self.client = client
        self._owns_client = client is None
        self.client_args = client_args or {}
        self.config = dict(config)

    async def stream(self, *args: Any, **kwargs: Any):  # full signature elided in this sketch
        request = self.format_request(*args, **kwargs)
        if self.client is not None:
            # Reuse the injected client (pooling, guardrails, tracing come along with it).
            response = await self.client.chat.completions.create(**request)
            ...
        else:
            # Back-compat: construct an ephemeral client per request.
            async with AsyncOpenAI(**self.client_args) as c:
                response = await c.chat.completions.create(**request)
                ...
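
The _stream_with_client helper mentioned at the end of this issue could collapse those two branches into a single shared path. A minimal sketch (methods on OpenAIModel, assuming format_request sets stream=True so the response is an async iterator of chunks):

    async def _stream_with_client(self, client: Client, request: dict):
        # One code path shared by the injected-client and ephemeral-client branches.
        response = await client.chat.completions.create(**request)
        async for chunk in response:  # streaming response (stream=True assumed)
            yield chunk

    async def stream(self, *args: Any, **kwargs: Any):
        request = self.format_request(*args, **kwargs)
        if self.client is not None:
            async for chunk in self._stream_with_client(self.client, request):
                yield chunk
        else:
            async with AsyncOpenAI(**self.client_args) as c:
                async for chunk in self._stream_with_client(c, request):
                    yield chunk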

Example usage (FastAPI lifespan, per-worker client):

from contextlib import asynccontextmanager

from fastapi import FastAPI
from openai import AsyncOpenAI

from strands.models.openai import OpenAIModel  # assumed import path, with the proposed client= parameter
# from my_guardrails import GuardrailsAsyncOpenAI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # One client per worker process/event loop, created at startup.
    base = AsyncOpenAI()
    app.state.oai = base  # or GuardrailsAsyncOpenAI(base)
    app.state.model = OpenAIModel(client=app.state.oai, model_id="gpt-4o")
    yield
    # Closed once at shutdown; the SDK never closes an injected client.
    await app.state.oai.close()

app = FastAPI(lifespan=lifespan)
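
The acceptance criteria above also ask for a generic asyncio example. A minimal sketch of the same per-event-loop lifecycle (the agent/request work inside the loop is elided; the OpenAIModel import path is assumed):

import asyncio

from openai import AsyncOpenAI

from strands.models.openai import OpenAIModel  # assumed import path

async def main():
    client = AsyncOpenAI()  # or a GuardrailsAsyncOpenAI-style wrapper
    try:
        model = OpenAIModel(client=client, model_id="gpt-4o")
        ...  # run agents/requests that share this client within the event loop
    finally:
        # The application owns the injected client's lifecycle.
        await client.close()

asyncio.run(main())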

Use Case

  • Guardrails: Wrap the OpenAI client with GuardrailsAsyncOpenAI to enforce content filters, schema validation, and redaction before responses reach application code.
  • Observability & policy: Centralise timeouts, retries, logging, tracing, and network egress policy (e.g., custom httpx.AsyncClient).
  • Performance: Reuse keep-alive connections and connection pools within a worker/event loop for lower latency and higher throughput.
  • Multi-model routing: Swap the injected client to target proxies or gateways without touching model code (e.g., toggling base_url, auth, or headers); see the sketch below.

This would help with:

  • Meeting compliance requirements where guardrails must run before responses are consumed.
  • Reducing tail latency by avoiding per-request client construction.
  • Simplifying integration with enterprise networking and telemetry.
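
For the multi-model routing case, swapping targets then reduces to constructing differently configured clients and injecting them. A sketch using standard AsyncOpenAI constructor arguments, with OpenAIModel as above (the gateway URL, key, and header values are placeholders):

from openai import AsyncOpenAI

# Direct OpenAI traffic.
direct = AsyncOpenAI()

# Same interface, routed through an internal gateway/proxy (placeholder values).
gateway = AsyncOpenAI(
    base_url="https://llm-gateway.internal/v1",
    api_key="gateway-issued-key",
    default_headers={"x-team": "platform"},
)

primary_model = OpenAIModel(client=direct, model_id="gpt-4o")
routed_model = OpenAIModel(client=gateway, model_id="gpt-4o")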

Alternative Solutions

  1. Create a new client per request

    • Pros: Safe with respect to event-loop boundaries; this is the current behaviour.
    • Cons: Loses pooling; higher latency and allocation overhead; hard to apply cross-cutting concerns (guardrails, tracing) consistently.
  2. Global client shared across event loops

    • Pros: Simple in theory.
    • Cons: Unsafe; HTTPX pools cannot be shared across loops; leads to intermittent runtime errors.
  3. Disable pooling (force Connection: close)

    • Pros: Avoids cross-loop sharing issues.
    • Cons: Sacrifices performance; still doesn’t enable easy injection of guardrails wrappers.

Additional Context

  • Rationale: HTTPX connection pools are not shareable across asyncio event loops; reuse is safe within a loop.
  • Need: Strands’ current guardrails support focuses on Bedrock; many users need OpenAI-side guardrails today.
  • The OpenAI Python SDK supports async reuse and custom HTTP clients (http_client=), making injection straightforward.
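
For instance, timeouts, pooling limits, and egress policy can be centralised on one httpx.AsyncClient and handed to the SDK via http_client= (values are illustrative; OpenAIModel as above with the proposed client= parameter):

import httpx
from openai import AsyncOpenAI

http_client = httpx.AsyncClient(
    timeout=httpx.Timeout(30.0, connect=5.0),
    limits=httpx.Limits(max_connections=100, max_keepalive_connections=20),
    # proxies, custom transports, tracing hooks, etc. can be configured here
)

client = AsyncOpenAI(http_client=http_client)
model = OpenAIModel(client=client, model_id="gpt-4o")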

If useful, I’m happy to contribute a PR with the constructor change, a small _stream_with_client helper, tests, and docs.
