-
Notifications
You must be signed in to change notification settings - Fork 571
fix(openai-agents): Avoid double span exit on exception #5174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix(openai-agents): Avoid double span exit on exception #5174
Conversation
…webb/openai-agents-span-management
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #5174 +/- ##
==========================================
+ Coverage 83.94% 83.95% +0.01%
==========================================
Files 181 181
Lines 18419 18451 +32
Branches 3281 3286 +5
==========================================
+ Hits 15462 15491 +29
- Misses 1944 1946 +2
- Partials 1013 1014 +1
|
| # Handled in _run_single_turn() patch. | ||
| raise exc.original from None | ||
| except Exception as exc: | ||
| # Invoke agent span is not finished in this case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could add a get_current_span() again in this case and we would never lose spans.
You also can't guarantee that the SDK does not crash if you try to exit a span you obtain with get_current_span(), so I decided the just drop the span for now.
Much of AgentRunner.run() that is not spent in _run_single_turn() is before the agent invocation span is created anyway. The only cases in which we lose a span is that any of the post-turn steps like output guardrails raise an exception that is not an AgentsException.
| _record_exception_on_span(span, exc) | ||
| end_invoke_agent_span(context_wrapper, agent) | ||
|
|
||
| raise _SingleTurnException(exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created _SingleTurnException to avoid calling capture_exception() twice.
Otherwise, exceptions raised in _run_single_turn() that are not of type AgentsException would be captured in both run() and _run_single_turn().
| _capture_exception(exc) | ||
| raise exc from None | ||
|
|
||
| end_invoke_agent_span(run_result.context_wrapper, agent) |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate of #5174 (comment)
| # Invoke agent span is not finished in this case. | ||
| # This is much less likely to occur than other cases because | ||
| # AgentRunner.run() is "just" a while loop around _run_single_turn. | ||
| _capture_exception(exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's hard for me to follow the flow, but my recommendation would be this:
- we should not call
capture_exceptionhere at all, just have afinallyblock that takes care of the span management - the exception should be captured only in the
runner.pypatch - the library internally already decides what to re-raise and what to swallow and we don't need to care about those, only what the library finally emits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I meant that we should only have this one and not the one in agent_run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason there are special cases for exceptions, and for handling exceptions at different levels is that
openai-agentsonly attaches the context wrapper whenAgentsExceptionis raised; and- we always have a handle on the
context_wrapperinagent_run.py, and only sometimes inrunner.py.
Regarding having a finally block, I did not go with that approach for different reasons in agent_run.py and runner.py.
In agent_run.py, one agent invocation can span multiple calls to the function we patch, so it's intentional to only finish the span in the except block. The broad Exception is caught here so we can finish the span even when an exception which does not subclass AgentsException is raised. In runner.py we do not have a handle on the span if it is not an AgentsException.
In runner.py, we do not have a finally block because you access the context wrapper from different places depending on the code path. In the non-exception case, the context wrapper is on the return value of the function, and in the exception case, the context wrapper is attached to the exception object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it's intentional to only finish the span in the except block.
in that case keeping the span management in the except block is fine.
But I still think the capture_exception should only be there on the outermost layer. We should not be coupling exception handling and span management this tightly.
Regarding handling missing state on our end, we should simply write logic with presence checks so we don't rely on the actual underlying implementation of the lib so much. Currently, much of the design is very brittle to internals changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the span context manager __exit__ already responds to exceptions, but instead whoever wrote this decided to pass (None, None, None) here which makes everything extra complicated.
| span.__exit__(None, None, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved capture_exception to the outer layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also realized we don't need this _SingleTurnException stuff so that's removed now.
Hopefully that makes it slightly less brittle to internals changing.
Description
With #5165 we have a handle on the agent invocation span, as the span is now attached to the context wrapper returned from the
AgentRunner.run()method.Exceptions of type
AgentsExceptionare now caught onAgentRunner.run()instead ofAgentRunner._run_single_turn(), becauseopenai-agentsattaches the context wrapper to these exceptions in the method.Removing
sentry_sdk.get_current_span()avoids double exit scenarios when the active span is not an agent invocation span. This is the case when theasynciointegration is used.Previously, the script below results in a double span exit unhandled exception.
Issues
Reminders
tox -e linters.feat:,fix:,ref:,meta:)