Description
Checks
- I have updated to the latest minor and patch version of Strands
- I have checked the documentation and this is not expected behavior
- I have searched ./issues and there are no duplicates of my issue
Strands Version
1.30.0 (regression from 1.21.0)
Python Version
3.12
Operating System
macOS (observed from local app, underlying service in Docker)
Installation Method
pip
Steps to Reproduce
- Install strands-agents==1.30.0.
- Create an agent execution that enters an execute_event_loop_cycle, performs tool use, and then continues through recurse_event_loop(...).
- Observe exported OpenTelemetry spans for execute_event_loop_cycle.
- Compare with strands-agents==1.21.0.
Source-level regression:
- In 1.21.0, end_event_loop_cycle_span() called _end_span(...), ending the native span before recursion continued.
- In 1.30.0, execute_event_loop_cycle is wrapped in with trace_api.use_span(cycle_span, end_on_exit=True):, and end_event_loop_cycle_span() only sets attributes such as gen_ai.event.end_time without ending the native span.
- In the tool-use path, Strands sets the logical end and then immediately recurses:
    tracer.end_event_loop_cycle_span(span=cycle_span, message=message, tool_result_message=tool_result_message)
    events = recurse_event_loop(...)
    async for event in events:
        yield event

Because recursion happens inside the same use_span(cycle_span, end_on_exit=True) block, the native OTel parent span remains open until recursive child work completes.
Minimal timing repro we ran against OTel behavior:
    parent_native_duration_ns 66952805
    child_native_duration_ns 45204568
    parent_logical_end_offset_ns 21620105
    parent_overhang_after_logical_end_ns 45332700
That overhang approximately matches the child duration, which is the cumulative behavior we see in practice.
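The mechanism can be illustrated without Strands or the OTel SDK. The sketch below is a stdlib-only simulation, not our actual repro: FakeClock, FakeSpan, run_cycle, and this local use_span are hypothetical stand-ins that only mimic the end_on_exit=True behavior described above, and the work durations (20 and 45) are arbitrary units chosen for illustration.

```python
from contextlib import contextmanager


class FakeClock:
    """Deterministic clock so the timings are reproducible (hypothetical helper)."""

    def __init__(self):
        self.now = 0

    def advance(self, ns):
        self.now += ns


class FakeSpan:
    """Hypothetical stand-in for a native span: start/end timestamps plus attributes."""

    def __init__(self, name, clock):
        self.name = name
        self.clock = clock
        self.start_ns = clock.now
        self.end_ns = None
        self.attributes = {}

    def end(self):
        # Only the first call records the end timestamp.
        if self.end_ns is None:
            self.end_ns = self.clock.now

    @property
    def duration(self):
        return self.end_ns - self.start_ns


@contextmanager
def use_span(span, end_on_exit=False):
    """Mimics the shape of trace_api.use_span: the span ends only when the block exits."""
    try:
        yield span
    finally:
        if end_on_exit:
            span.end()


def run_cycle(clock, work_ns, spans):
    """One cycle: do own work, record the logical end, then recurse inside the block."""
    span = FakeSpan(f"cycle_{len(spans)}", clock)
    spans.append(span)
    with use_span(span, end_on_exit=True):
        clock.advance(work_ns[0])
        # Mirrors setting gen_ai.event.end_time without ending the native span.
        span.attributes["logical_end_ns"] = clock.now
        if len(work_ns) > 1:
            run_cycle(clock, work_ns[1:], spans)  # recursion while parent is still open
    return spans


parent, child = run_cycle(FakeClock(), [20, 45], [])
overhang = parent.end_ns - parent.attributes["logical_end_ns"]
print(parent.duration, child.duration, overhang)  # 65 45 45
```

In this simulation the parent's overhang equals the child's duration exactly; in the real measurement above the two differ slightly because of scheduling and span bookkeeping overhead.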
Expected Behavior
Each execute_event_loop_cycle span should reflect that cycle's own duration only.
If a cycle logically ends before recursion begins, its native OTel span duration should also end there rather than continuing to accumulate child recursive cycle time.
Actual Behavior
After upgrading from strands-agents==1.21.0 to 1.30.0, execute_event_loop_cycle spans appear with cumulative bottom-up duration instead of per-cycle duration when a cycle performs tool use and then recurses.
The logical metadata on the span is correct:
- gen_ai.event.start_time is correct
- gen_ai.event.end_time is correct
However, the rendered span duration appears to be computed from the native OTel span timestamps, and the native end timestamp of the parent cycle span is delayed until after recursive child cycle work finishes.
This causes the displayed span duration to include child recursive cycle work, producing cumulative bottom-up latency instead of per-step latency.
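One way to spot affected spans in exported data is to compare each span's native end timestamp with its logical end. The helper below is a rough sketch under assumptions: SpanRecord and find_overhanging_spans are hypothetical names, timestamps are assumed to be already normalized to nanosecond offsets, and the 1 ms tolerance is arbitrary. The cycle_0 numbers come from the timing repro in this report; the cycle_1 start offset is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpanRecord:
    """Hypothetical flattened view of an exported span (all values in ns offsets)."""

    name: str
    start_ns: int
    end_ns: int          # native OTel end timestamp
    logical_end_ns: int  # derived from gen_ai.event.end_time


def find_overhanging_spans(records, tolerance_ns=1_000_000):
    """Return (name, overhang_ns) for spans whose native end trails the logical end."""
    flagged = []
    for record in records:
        overhang = record.end_ns - record.logical_end_ns
        if overhang > tolerance_ns:
            flagged.append((record.name, overhang))
    return flagged


records = [
    # Parent cycle: values taken from the timing repro in this report.
    SpanRecord("cycle_0", 0, 66_952_805, 21_620_105),
    # Child cycle: start offset is illustrative; native and logical end coincide.
    SpanRecord("cycle_1", 21_700_000, 66_904_568, 66_904_568),
]

flagged = find_overhanging_spans(records)
print(flagged)  # [('cycle_0', 45332700)]
```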
Additional Context
We observed this in Langfuse, but the underlying issue appears to be in the native OTel span lifecycle rather than in backend-specific parsing.
Relevant exported span metadata looked like:
- correct gen_ai.event.start_time
- correct gen_ai.event.end_time
- service version 1.30.0
We also verified by comparing the 1.21.0 and 1.30.0 Strands sources directly.
This seems related to the span lifecycle refactor where cycle/model spans are managed via use_span(..., end_on_exit=True) and end_event_loop_cycle_span() no longer calls _end_span(...).
Possible Solution
A likely fix would be to ensure the native execute_event_loop_cycle span is ended at the logical end of the cycle before recursive child cycles start, rather than relying on the outer use_span(..., end_on_exit=True) context to close it later.
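The shape of that fix can be sketched with the same kind of stdlib-only simulation. This is not a patch against Strands: FakeSpan, run_cycle_fixed, and the local use_span are hypothetical stand-ins, and the sketch assumes that a second end() call on an already-ended span is a no-op. Our understanding is that the OpenTelemetry Python SDK ignores such a call with a warning, but that should be verified before relying on it, since the outer end_on_exit=True would otherwise still fire on exit.

```python
from contextlib import contextmanager


class FakeSpan:
    """Hypothetical stand-in for a native span; `clock` is a one-element list of ns."""

    def __init__(self, name, clock):
        self.name = name
        self.clock = clock
        self.start_ns = clock[0]
        self.end_ns = None

    def end(self):
        # Assumed idempotent: only the first call sets the end timestamp.
        if self.end_ns is None:
            self.end_ns = self.clock[0]


@contextmanager
def use_span(span, end_on_exit=False):
    """Mimics the shape of trace_api.use_span for this simulation only."""
    try:
        yield span
    finally:
        if end_on_exit:
            span.end()  # harmless here because the span was already ended


def run_cycle_fixed(clock, work_ns, spans):
    """One cycle that ends its span at the logical end, before recursing."""
    span = FakeSpan(f"cycle_{len(spans)}", clock)
    spans.append(span)
    with use_span(span, end_on_exit=True):
        clock[0] += work_ns[0]
        span.end()  # end natively at the logical end of the cycle
        if len(work_ns) > 1:
            run_cycle_fixed(clock, work_ns[1:], spans)
    return spans


fixed = run_cycle_fixed([0], [20, 45], [])
durations = [s.end_ns - s.start_ns for s in fixed]
print(durations)  # [20, 45]
```

With the early end() in place, each simulated cycle reports only its own duration. Whether recursive child spans should still be parented to an already-ended cycle span is a separate design question that this sketch does not address.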
Related Issues
Possibly related conceptually to #1876, but this report is about cycle duration becoming cumulative across recursive cycles rather than missing common attributes.