# Add custom streaming to OpenAI Agents SDK and Temporal integration (#144)
Merged: danielmillerp merged 1 commit into `main` from `dm/add-custom-streaming-oai-agents-temporal` on Oct 31, 2025.
jasonyang101 approved these changes on Oct 31, 2025.
# Temporal + OpenAI Agents SDK Streaming Implementation
## TL;DR
We use Temporal interceptors to add real-time streaming to Redis/UI while maintaining workflow determinism with the **standard** OpenAI Agents plugin. The key challenge was threading `task_id` (only known at runtime) through a plugin system initialized at startup. We solved this using Temporal's interceptor pattern to inject `task_id` into activity headers, making it available via context variables in the model.

**What we built:** Real-time streaming of LLM responses to users while preserving Temporal's durability guarantees.

**How:** Interceptors thread `task_id` → the model reads it from context → chunks stream to Redis during the activity → the complete response is returned for determinism.

**The win:** No forked plugin needed; we use the standard `temporalio.contrib.openai_agents.OpenAIAgentsPlugin`!
## Background: How the OpenAI Agents SDK Works
Before diving into Temporal integration, let's understand the basic OpenAI Agents SDK flow:
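The basic flow can be sketched with stand-in classes (illustrative only; these are not the real SDK types): the runner delegates to the agent's model, and everything funnels through `model.get_response()`.

```python
from dataclasses import dataclass

class Model:
    def get_response(self, prompt: str) -> str:
        # In the real SDK, this is where the LLM API call happens.
        return f"response to: {prompt}"

@dataclass
class Agent:
    name: str
    model: Model

class Runner:
    @staticmethod
    def run(agent: Agent, prompt: str) -> str:
        # The runner delegates to the agent's model; every LLM call
        # funnels through model.get_response().
        return agent.model.get_response(prompt)

agent = Agent(name="assistant", model=Model())
result = Runner.run(agent, "hello")
print(result)  # response to: hello
```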
The key insight: `model.get_response()` is where the actual LLM call happens.

## How Temporal's OpenAI Plugin Works
The Temporal plugin intercepts this flow to make LLM calls durable by converting them into Temporal activities. Here's how:
### 1. Plugin Setup and Runner Override
When you create the Temporal plugin and pass it to the worker:
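A minimal setup sketch, assuming a recent `temporalio` version where plugins are passed at client connection time; the `ModelActivityParameters` values and task-queue name are illustrative, not the project's actual configuration:

```python
# Illustrative wiring; exact parameter names may vary across temporalio versions.
from datetime import timedelta

from temporalio.client import Client
from temporalio.worker import Worker
from temporalio.contrib.openai_agents import ModelActivityParameters, OpenAIAgentsPlugin

async def run_worker() -> None:
    client = await Client.connect(
        "localhost:7233",
        plugins=[
            OpenAIAgentsPlugin(
                model_params=ModelActivityParameters(
                    start_to_close_timeout=timedelta(seconds=120),
                ),
            ),
        ],
    )
    worker = Worker(
        client,
        task_queue="agents-task-queue",  # illustrative queue name
        workflows=[],  # add your workflow classes here
    )
    await worker.run()
```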
### 2. Model Interception Chain
Here's the clever interception that happens:
### 3. The Model Stub Trick
The `TemporalOpenAIRunner` replaces the agent's model with `_TemporalModelStub`:

### 4. Activity Creation
The `_TemporalModelStub` doesn't call the LLM directly. Instead, it creates a Temporal activity:

### 5. Actual LLM Call in Activity
Finally, inside the activity, the real LLM call happens:
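The two interception levels can be mimicked in a toy sketch (hypothetical names; no real Temporal or OpenAI calls): the stub packages the request and "executes an activity", and the activity performs the real call and returns the complete response.

```python
# Toy sketch of the two-level interception (no real Temporal or OpenAI calls).

class RealModel:
    def get_response(self, prompt: str) -> str:
        # Level 2: inside the activity, the real LLM call happens here.
        return f"LLM says: {prompt}"

def invoke_model_activity(request: dict) -> str:
    # The activity body: only this final, complete response is recorded
    # in workflow history, which is what keeps replay deterministic.
    return RealModel().get_response(request["prompt"])

class ModelStub:
    """Level 1: replaces the agent's model; never calls the LLM directly."""

    def get_response(self, prompt: str) -> str:
        request = {"prompt": prompt}
        # In the real plugin this is workflow.execute_activity(...);
        # here we just call the activity function directly.
        return invoke_model_activity(request)

print(ModelStub().get_response("hi"))  # LLM says: hi
```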
Summary: the plugin intercepts at TWO levels: the runner level (swapping the agent's model for `_TemporalModelStub`) and the model level (executing the real LLM call inside a Temporal activity).
## The Streaming Challenge
### Why Temporal Doesn't Support Streaming by Default
Temporal's philosophy is that activities should be atomic and retryable, with a single recorded result that replays deterministically. Streaming breaks these guarantees: partial output reaches users before the result is recorded, so a retried activity re-emits chunks they have already seen.
### Why We Need Streaming Anyway
For Scale/AgentEx customers, latency is critical: users expect to see tokens as they are generated, not only after the full response completes.
Our pragmatic decision: Accept the tradeoff. If streaming fails midway, we restart from the beginning. This may cause a brief UX hiccup but enables the streaming experience users expect.
## Our Streaming Solution
### The Key Insight: Where We Can Hook In
When we instantiate the OpenAI plugin for Temporal, we can pass in a model provider:
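A sketch of that hook; `StreamingModelProvider` and the import paths are our illustrative assumptions, not the SDK's canonical names:

```python
# Illustrative; import paths and class names are assumptions.
from agents.models.interface import Model, ModelProvider  # openai-agents SDK
from temporalio.contrib.openai_agents import OpenAIAgentsPlugin

from my_project.streaming_model import StreamingModel  # hypothetical module

class StreamingModelProvider(ModelProvider):
    def get_model(self, model_name: str | None) -> Model:
        # Return the model that actually calls OpenAI, wrapped so it can
        # also publish chunks to Redis as they arrive.
        return StreamingModel(model_name or "gpt-4o")

plugin = OpenAIAgentsPlugin(model_provider=StreamingModelProvider())
```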
IMPORTANT: this model provider returns the ACTUAL model that makes the LLM call; this is the final layer, NOT the stub. This is where `model.get_response()` actually calls OpenAI's API. By providing our own model here, we can make the call with `stream=True` and publish chunks as they arrive, while still returning the complete response. Our `StreamingModel` implementation does exactly this.

## The Task ID Problem
Here's the critical issue we had to solve:
**The problem:** The model provider is configured before we know the `task_id`, but streaming requires the `task_id` to route chunks to the correct Redis channel.
### Our Solution: Temporal Interceptors + Context Variables
Instead of forking the plugin, we use Temporal's interceptor pattern to thread task_id through the system. This elegant solution uses standard Temporal features and requires NO custom plugin components!
Here's exactly how task_id flows through the interceptor chain:
## Implementation Details
### The Interceptor Approach - Clean and Maintainable
Instead of forking components, we use Temporal's interceptor system. Here's what we built:
### 1. StreamingInterceptor - The Main Component
### 2. Task ID Flow - Using Standard Components

Here's exactly how `task_id` flows through the system without any forked components:
**Step 1:** The workflow stores `task_id` in an instance variable.
**Step 2:** The outbound interceptor injects `task_id` into the activity headers.
**Step 3:** The inbound interceptor extracts it from the headers and sets the context variable.
**Step 4:** `StreamingModel` reads the context variable.
### 3. Worker Configuration - Simply Add the Interceptor
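Worker wiring is a one-line addition. A sketch (the `StreamingInterceptor` import path here is hypothetical):

```python
# Illustrative worker wiring; only the interceptors line is new.
from temporalio.client import Client
from temporalio.worker import Worker

from my_project.interceptors import StreamingInterceptor  # hypothetical path

async def run_worker(client: Client) -> None:
    worker = Worker(
        client,
        task_queue="agents-task-queue",
        workflows=[],  # your workflow classes
        interceptors=[StreamingInterceptor()],  # the only addition
    )
    await worker.run()
```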
### 4. The Streaming Model - Where Magic Happens
This is where the actual streaming happens. Our `StreamingModel` is what gets called inside the activity:

### 5. Redis and AgentEx Streaming Infrastructure
Here's what happens under the hood with AgentEx's streaming system:
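A self-contained sketch of the streaming path, using an in-memory stand-in for Redis (message shapes are simplified; `StreamTaskMessageDelta` and `StreamTaskMessageFull` are the real type names, everything else is illustrative):

```python
import asyncio

class FakeRedis:
    """In-memory stand-in for the Redis stream (illustrative only)."""
    def __init__(self):
        self.streams: dict[str, list[dict]] = {}

    async def xadd(self, stream: str, message: dict) -> None:
        self.streams.setdefault(stream, []).append(message)

async def fake_llm_stream():
    # Stand-in for the OpenAI API called with stream=True.
    for chunk in ["Hel", "lo", "!"]:
        yield chunk

async def get_response_streaming(redis: FakeRedis, task_id: str) -> str:
    """Publish each delta to stream:{task_id}, then return the full text."""
    stream = f"stream:{task_id}"
    parts: list[str] = []
    async for chunk in fake_llm_stream():
        parts.append(chunk)
        # StreamTaskMessageDelta: one message per text chunk.
        await redis.xadd(stream, {"type": "StreamTaskMessageDelta", "delta": chunk})
    full = "".join(parts)
    # StreamTaskMessageFull: the complete message at the end.
    await redis.xadd(stream, {"type": "StreamTaskMessageFull", "content": full})
    return full  # only this value enters workflow history

redis = FakeRedis()
full = asyncio.run(get_response_streaming(redis, "task-123"))
print(full)  # Hello!
```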
#### Redis Implementation Details

- `stream:{task_id}`: each task gets its own Redis stream
- `StreamTaskMessageDelta`: for text chunks (token by token)
- `StreamTaskMessageFull`: for complete messages (tool calls, reasoning)

#### What Gets Streamed
#### UI Subscription

The frontend subscribes to `stream:{task_id}` and receives each message as it is published. This decoupling means we can stream anything we want through Redis!
### 6. Workflow Integration
## Usage
### Installation
This plugin is included in the agentex-python package. No additional installation needed.
### Basic Setup
### In Your Workflow
## Comparison with Original Temporal Plugin

| Original plugin | Streaming version |
| --- | --- |
| `invoke_model_activity` | `invoke_model_activity_streaming` |
| `_TemporalModelStub` | `StreamingTemporalModelStub` |
| `TemporalOpenAIRunner` | `StreamingTemporalRunner` |

## Benefits of the Interceptor Approach
### Major Advantages Over Forking

- **No code duplication:** uses the standard `temporalio.contrib.openai_agents` plugin
- **Clean architecture:** runtime context travels through interceptors instead of forked components
- **Simplicity:** only the interceptor and a custom model provider need to be maintained
### Minimal Limitations

- **Streaming semantics (unchanged):** if the activity fails midway, streaming restarts from the beginning
- **Worker configuration:** the streaming interceptor must be added to every worker
## Future Improvements

- Contribute back
- Enhanced features
## Alternative Approaches Considered
## Key Innovation
The most important innovation is using interceptors for runtime context threading. Instead of forking the plugin to pass `task_id` through custom components, we use Temporal's interceptor system with Python's `ContextVar`. This allows runtime context to reach the model without touching any plugin internals.
## Troubleshooting
- **No streaming visible in UI:** ensure the workflow passes `context = {"task_id": params.task.id}`
- **Import errors:** run `uv add agentex-sdk openai-agents temporalio`
- **Activity not found:** verify `invoke_model_activity_streaming` is registered

## Testing
### Running Tests
The streaming model implementation has comprehensive tests in `tests/test_streaming_model.py` that verify all configurations, tool types, and edge cases.

#### From Repository Root

#### From Test Directory
### Test Coverage
The test suite covers:
Note: tests run faster without parallel execution (the `-n0` flag) and avoid potential state pollution between test workers. All 29 tests pass individually; parallel execution may show 4-6 intermittent failures due to shared mock state.

## Conclusion
This implementation uses Temporal interceptors to thread `task_id` through the standard OpenAI plugin, enabling real-time streaming while maintaining workflow determinism. The key innovation is using interceptors with Python's `ContextVar` to propagate runtime context without forking any Temporal components.
This approach provides the optimal user experience: real-time token streaming with Temporal's durability guarantees intact.
The interceptor pattern demonstrates how to extend Temporal plugins without forking, setting a precedent for future enhancements.