## Summary
When using `stream_async()` with AgentCore (following the official example), events yielded to the caller contain non-JSON-serializable objects (`Agent`, OpenTelemetry `Span`, etc.) merged into the event dict by `prepare()`. This causes AgentCore to fall back to `repr()` serialization, producing massive Python repr strings on the SSE wire that the consumer cannot parse.
## Observed behavior
On long-running agent invocations (~13 tool-calling turns with extended thinking enabled), the SSE response stream hits the AgentCore 100 MB response payload limit and is truncated with HTTP/2 `INTERNAL_ERROR` before the final `structured_result` can be emitted.
Measured breakdown of a typical stream:
| Content | Events | Size | % of stream |
|---|---|---|---|
| ModelStreamEvent subclasses (repr serialized) | 782 | 99.8 MB | 99.8% |
| ModelStreamChunkEvent (valid JSON) | 956 | 200 KB | 0.2% |
The 782 repr-serialized events average 131 KB each because they contain the full chain-of-thought reasoning text. They appear on the wire as Python repr (single quotes, `<... object at 0x...>` placeholders) instead of JSON, e.g.:

```
data: {"reasoningText": "Let me think...", "delta": {"reasoningContent": {"text": "Let me think..."}}, "reasoning": true, "agent": <strands.agent.agent.Agent object at 0x...>, "event_loop_cycle_span": <opentelemetry.trace.Span object at 0x...>, ...}
```
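As a back-of-envelope check on those measurements (the counts and average size come from the table above), the repr-serialized events alone account for essentially the entire 100 MB payload limit:

```python
# Sanity check: 782 repr-serialized events averaging ~131 KB each.
repr_events = 782
avg_size_kb = 131

total_mb = repr_events * avg_size_kb / 1024
print(f"{total_mb:.1f} MB")  # ~100 MB, consistent with the measured 99.8 MB
```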
## Root cause
`ModelStreamEvent.prepare()` (`src/strands/types/_events.py`) merges `invocation_state` directly into the event dict:

```python
class ModelStreamEvent(TypedEvent):
    def prepare(self, invocation_state: dict) -> None:
        if "delta" in self:
            self.update(invocation_state)
```

`invocation_state` contains non-serializable objects (the `Agent` instance, the `Span`, etc.). Since all `ModelStreamEvent` subclasses (`TextStreamEvent`, `ReasoningTextStreamEvent`, `ToolUseStreamEvent`, `ReasoningSignatureStreamEvent`) have a `"delta"` key, they all get these fields merged in.
`ModelStreamChunkEvent` is unaffected because it has no `prepare()` override.
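A minimal standalone sketch of why the merged dict defeats `json.dumps` and forces the repr fallback (the `FakeAgent`/`FakeSpan` classes are stand-ins for illustration, not the real strands or OpenTelemetry types):

```python
import json


class FakeAgent:
    """Stand-in for strands.agent.agent.Agent (not JSON-serializable)."""


class FakeSpan:
    """Stand-in for opentelemetry.trace.Span (not JSON-serializable)."""


# A delta event roughly as yielded by the model stream...
event = {"delta": {"reasoningContent": {"text": "Let me think..."}}}

# ...after prepare() merges invocation_state into it.
invocation_state = {"agent": FakeAgent(), "event_loop_cycle_span": FakeSpan()}
event.update(invocation_state)

try:
    json.dumps(event)
except TypeError as exc:
    # json.dumps fails on the merged objects, so a consumer that cannot
    # JSON-encode the event falls back to str()/repr() on the wire.
    print(f"not JSON-serializable: {exc}")
```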
## Reproduction
Follow the AgentCore streaming example with an agent that uses extended thinking and makes multiple tool calls. The stream will contain both valid JSON events and Python repr events. On long invocations the repr events accumulate past 100 MB and the stream is truncated.
## Current workaround
We filter events before yielding to AgentCore:

```python
try:
    json.dumps(event)
except (TypeError, ValueError):
    continue
yield event
```

This works, but it feels like a workaround for something the SDK should handle.
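Expanded into a self-contained sketch for context (the helper name `filtered_stream` and the plain-dict event source are ours for illustration; in the real handler the source would be the agent's `stream_async()` generator):

```python
import json
from typing import Any, AsyncIterator


async def filtered_stream(
    events: AsyncIterator[dict[str, Any]],
) -> AsyncIterator[dict[str, Any]]:
    """Yield only events that survive json.dumps().

    Hypothetical helper: drops whole events whose dicts carry
    non-serializable values (Agent, Span, ...) merged in by prepare().
    """
    async for event in events:
        try:
            json.dumps(event)
        except (TypeError, ValueError):
            continue  # skip events AgentCore would repr-serialize
        yield event
```

The downside is coarse granularity: an entire event is dropped even when only one of its values is non-serializable.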
## Possible fixes (open question; we may be missing something)
We are not sure whether there is a reason `prepare()` needs to merge into the dict itself, or whether there is a recommended pattern for filtering events before yielding to AgentCore that we have missed. If there is, we would appreciate guidance.
Some ideas if this is indeed unintended:
- Store internal state separately, e.g. `event._context = invocation_state` instead of `self.update(invocation_state)`, keeping the dict serializable
- Strip internal fields in `stream_async()` before yielding to the caller
- Document the filtering requirement in the AgentCore streaming example
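The field-stripping idea could look like the following sketch (`strip_unserializable` is a hypothetical name; the real fix would presumably live in `stream_async()` or `prepare()`):

```python
import json
from typing import Any


def strip_unserializable(event: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the event with non-JSON-serializable values removed.

    Hypothetical helper illustrating field-level stripping, so the
    serializable parts of an event (delta, reasoningText, ...) survive
    even when internal objects were merged in.
    """
    clean: dict[str, Any] = {}
    for key, value in event.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            continue  # drop e.g. the Agent instance or the Span
        clean[key] = value
    return clean
```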
## Environment
- strands-agents >= 0.1.0
- AgentCore Runtime (us-east-1)
- Agent uses Bedrock Claude with extended thinking and structured output
- ~13 tool-calling turns per invocation