Skip to content

feat(openai-agents): Support span streaming#6404

Open
alexander-alderman-webb wants to merge 12 commits into
masterfrom
webb/openai-agents/span-first-2
Open

feat(openai-agents): Support span streaming#6404
alexander-alderman-webb wants to merge 12 commits into
masterfrom
webb/openai-agents/span-first-2

Conversation

@alexander-alderman-webb
Copy link
Copy Markdown
Contributor

@alexander-alderman-webb alexander-alderman-webb commented May 26, 2026

Description

Use sentry_sdk.traces.start_span, replace Span.set_data() with StreamedSpan.set_attribute() and Span.set_status(SPANSTATUS.INTERNAL_ERROR) with StreamedSpan.status = SpanStatus.ERROR when in span streaming mode.

Parametrize tests on the trace lifecycle option.

Issues

Closes #6041

Reminders

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

Codecov Results 📊

282 passed | Total: 282 | Pass Rate: 100% | Execution Time: 44.17s

All tests are passing successfully.

❌ Patch coverage is 6.48%. Project has 14819 uncovered lines.

Files with missing lines (9)
File Patch % Lines
agent_run.py 4.76% ⚠️ 140 Missing
models.py 0.00% ⚠️ 105 Missing
runner.py 0.00% ⚠️ 98 Missing
utils.py 16.16% ⚠️ 83 Missing
invoke_agent.py 0.00% ⚠️ 46 Missing
execute_tool.py 0.00% ⚠️ 34 Missing
ai_client.py 20.00% ⚠️ 28 Missing
handoff.py 0.00% ⚠️ 17 Missing
agent_workflow.py 50.00% ⚠️ 6 Missing

Generated by Codecov Action

Base automatically changed from webb/openai-agents/remote-tool to master May 26, 2026 07:31
@alexander-alderman-webb alexander-alderman-webb marked this pull request as ready for review May 26, 2026 07:41
@alexander-alderman-webb alexander-alderman-webb requested a review from a team as a code owner May 26, 2026 07:41
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit bf37fa2. Configure here.

Comment thread sentry_sdk/integrations/openai_agents/patches/runner.py
Comment thread sentry_sdk/integrations/openai_agents/utils.py
Copy link
Copy Markdown
Member

@ericapisani ericapisani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good but there's a few things around the potential for some bugs with a couple of the conditionals that were updated, and the inconsistent way that we're processing spans in tests (some aren't asserting the total amount of spans and the order in which they appear, others are).

Comment on lines +118 to +122
span is None
or isinstance(span, StreamedSpan)
and span.end_timestamp is not None
or not isinstance(span, StreamedSpan)
and span.timestamp is not None
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit difficult to read and I think could lead to an accidental bug because of orders or precedence with the and and or keywords (the former taking evaluation precedence over the latter).

Reading this, I'm not sure if the intention was to have the statement evaluate as isinstance(span, StreamedSpan) and span.end_timestamp is not None rather than (span is None or isinstance(span, StreamedSpan)) and ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point about lack of intentionality.
The intention is for the condition to evaluate to true if the end timestamp has not yet been set, indicating that the span is still active. The complex conditional is used because Span.timestamp and StreamedSpan.end_timestamp represent the end timestamp (i.e., because the instance variable was renamed).
I've added brackets in e243b86

Comment thread sentry_sdk/integrations/openai_agents/patches/agent_run.py Outdated
Comment thread sentry_sdk/integrations/openai_agents/spans/ai_client.py
name=f"chat {model_name}",
origin=SPAN_ORIGIN,
)
# TODO-anton: remove hardcoded stuff and replace something that also works for embedding and so on
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how useful this comment is anymore - any chance we can remove it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still a bug 😬.
Created #6417 to track.

span_streaming = has_span_streaming_enabled(sentry_sdk.get_client().options)
if span_streaming:
with sentry_sdk.traces.start_span(
name=f"handoff from {from_agent.name} to {to_agent_name}",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a gen_ai.agent.name attribute and I'm thinking this should maybe be added as an attribute with a value of from_agent.name (since it's the originator of the span in terms of the action being taken)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have the same instinct to add an attribute to avoid users matching on the name.

However, we're removing the workflow span entirely per the RFC.

Comment thread sentry_sdk/integrations/openai_agents/spans/invoke_agent.py
assert result.final_output == "Hello, how can I help you?"

sentry_sdk.flush()
spans = [item.payload for item in items]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we're only interested in the invoke agent span and ai client span, we should still assert the number of expected spans here since this should be stable. If there are more/less spans, we'd want this test to fail

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider this out of scope for this PR because the existing test did not assert the total number of spans the goal of the PR is to port the integration to support span streaming (and make the same test guarantees as before).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's always been the intention to open another pull request to introduce the stricter checks, but if so, please add that in the pull request descriptions going forward so it's clear if it's an intentional decision or an accidental oversight.

As we've been introducing these stricter checks as part of other streamed span integration migrations and code review comments have been left when it's not been done, it appears to me that these stricter checks are part of the migration process, not out of scope.

In the case of this change set it's especially noticeable because it's being applied in some places and not others.

Comment on lines +1093 to +1188
if "string" in param_id and instructions is None: # type: ignore
assert "gen_ai.system_instructions" not in ai_client_span["data"]

assert invoke_agent_span["data"][
"gen_ai.request.messages"
] == safe_serialize(
[
{
"content": [{"text": "Test input", "type": "text"}],
"role": "user",
},
]
)

elif "string" in param_id:
assert ai_client_span["data"][
"gen_ai.system_instructions"
] == safe_serialize(
[
{
"type": "text",
"content": "You are a coding assistant that talks like a pirate.",
},
]
)
elif "blocks_no_type" in param_id and instructions is None: # type: ignore
assert ai_client_span["data"][
"gen_ai.system_instructions"
] == safe_serialize(
[
{"type": "text", "content": "You are a helpful assistant."},
]
)
elif "blocks_no_type" in param_id:
assert ai_client_span["data"][
"gen_ai.system_instructions"
] == safe_serialize(
[
{
"type": "text",
"content": "You are a coding assistant that talks like a pirate.",
},
{"type": "text", "content": "You are a helpful assistant."},
]
)
elif "blocks" in param_id and instructions is None: # type: ignore
assert ai_client_span["data"][
"gen_ai.system_instructions"
] == safe_serialize(
[
{"type": "text", "content": "You are a helpful assistant."},
]
)
elif "blocks" in param_id:
assert ai_client_span["data"][
"gen_ai.system_instructions"
] == safe_serialize(
[
{
"type": "text",
"content": "You are a coding assistant that talks like a pirate.",
},
{"type": "text", "content": "You are a helpful assistant."},
]
)
elif "parts_no_type" in param_id and instructions is None:
assert ai_client_span["data"][
"gen_ai.system_instructions"
] == safe_serialize(
[
{"type": "text", "content": "You are a helpful assistant."},
{"type": "text", "content": "Be concise and clear."},
]
)
elif "parts_no_type" in param_id:
assert ai_client_span["data"][
"gen_ai.system_instructions"
] == safe_serialize(
[
{
"type": "text",
"content": "You are a coding assistant that talks like a pirate.",
},
{"type": "text", "content": "You are a helpful assistant."},
{"type": "text", "content": "Be concise and clear."},
]
)
elif instructions is None: # type: ignore
assert ai_client_span["data"][
"gen_ai.system_instructions"
] == safe_serialize(
[
{"type": "text", "content": "You are a helpful assistant."},
{"type": "text", "content": "Be concise and clear."},
]
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a large conditional block that's not the easiest to read because of there sometimes being only param_id, sometimes both param_id and instructions, and the one case with instructions.

Can we look to re-organize this/break it up a bit? Maybe have a general conditional grouping to cover cases where instructions are None with the relevant combos with param_id in there, and the other cases involving param_id as a separate grouping?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's hard to read and needs improvement.
The underlying logic is complex because instructions can be provided both in the input array and in the dedicated instructions parameter when using the Responses API.
I would like to avoid increasing indentation though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just split up some tests and make better use of pytest.mark.parametrize (only tomorrow though).

Comment on lines +1365 to +1372
invoke_agent_span = next(
span
for span in spans
if span["attributes"]["sentry.op"] == OP.GEN_AI_INVOKE_AGENT
)
ai_client_span = next(
span for span in spans if span["attributes"]["sentry.op"] == OP.GEN_AI_CHAT
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I'm reviewing, I'm noticing some test cases with this pattern, and other test cases with this pattern.

Is there a reason why some are implemented one way vs the other? Is it to avoid assigning spans to variables that won't be asserted on?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what the authors of the integration were thinking. I have not made a conscious decision in favor of one or the other style.

Comment thread tests/integrations/openai_agents/test_openai_agents.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Migrate openai_agents to span first

2 participants