
[BUG] Regression: event_loop_cycle span duration becomes cumulative across recursive cycles #1930

@schakraborty-staclline

Description

Checks

  • I have updated to the latest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

1.30.0 (regression from 1.21.0)

Python Version

3.12

Operating System

macOS (observed from a local app; the underlying service runs in Docker)

Installation Method

pip

Steps to Reproduce

  1. Install strands-agents==1.30.0.
  2. Run an agent invocation that enters execute_event_loop_cycle, performs tool use, and then continues through recurse_event_loop(...).
  3. Observe exported OpenTelemetry spans for execute_event_loop_cycle.
  4. Compare with strands-agents==1.21.0.

Source-level regression:

  • In 1.21.0, end_event_loop_cycle_span() called _end_span(...), ending the native span before recursion continued.
  • In 1.30.0, the body of execute_event_loop_cycle runs inside a with trace_api.use_span(cycle_span, end_on_exit=True) block, and end_event_loop_cycle_span() only sets attributes such as gen_ai.event.end_time; it no longer ends the native span.
  • In the tool-use path, Strands sets the logical end and then immediately recurses:
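```python
# Sets logical-end attributes such as gen_ai.event.end_time; does not end the native span.
tracer.end_event_loop_cycle_span(span=cycle_span, message=message, tool_result_message=tool_result_message)
# Recurses while still inside use_span(cycle_span, end_on_exit=True).
events = recurse_event_loop(...)
async for event in events:
    yield event
```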

Because recursion happens inside the same use_span(cycle_span, end_on_exit=True) block, the native OTel parent span remains open until recursive child work completes.
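To make that lifecycle concrete without depending on Strands or the OTel SDK, here is a stdlib-only toy model. Everything in it is hypothetical: FakeClock, ToySpan, toy_use_span and run_cycle are illustrative stand-ins, not Strands or OpenTelemetry APIs. Each cycle spends 20 ns on its own work, records only a logical end attribute, and then recurses inside a block that ends the span on exit, mirroring the 1.30.0 structure described above.

```python
import contextlib
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


class FakeClock:
    """Deterministic clock (hypothetical helper) so durations are exact."""

    def __init__(self) -> None:
        self.now_ns = 0

    def advance(self, ns: int) -> None:
        self.now_ns += ns


@dataclass
class ToySpan:
    """Hypothetical stand-in for a native span: start/end timestamps plus attributes."""

    name: str
    start_ns: int
    end_ns: Optional[int] = None
    attributes: Dict[str, int] = field(default_factory=dict)

    def end(self, now_ns: int) -> None:
        if self.end_ns is None:  # only the first end() counts
            self.end_ns = now_ns

    def native_duration_ns(self) -> int:
        assert self.end_ns is not None, "span was never ended"
        return self.end_ns - self.start_ns

    def logical_duration_ns(self) -> int:
        return self.attributes["gen_ai.event.end_time"] - self.start_ns


@contextlib.contextmanager
def toy_use_span(span: ToySpan, clock: FakeClock, end_on_exit: bool) -> Iterator[ToySpan]:
    # Mimics end_on_exit=True: the span ends only when the with-block exits.
    try:
        yield span
    finally:
        if end_on_exit:
            span.end(clock.now_ns)


def run_cycle(depth: int, clock: FakeClock, spans: List[ToySpan]) -> None:
    span = ToySpan(f"execute_event_loop_cycle[{depth}]", clock.now_ns)
    spans.append(span)
    with toy_use_span(span, clock, end_on_exit=True):
        clock.advance(20)  # this cycle's own model call and tool use
        # Like end_event_loop_cycle_span() in 1.30.0: record the logical end only.
        span.attributes["gen_ai.event.end_time"] = clock.now_ns
        if depth < 2:
            run_cycle(depth + 1, clock, spans)  # recursion stays inside the with-block


clock = FakeClock()
spans: List[ToySpan] = []
run_cycle(0, clock, spans)
for s in spans:
    print(s.name, s.native_duration_ns(), s.logical_duration_ns())
```

With three cycles this prints native durations of 60, 40 and 20 ns while every logical duration is 20 ns: each span absorbs all of its descendants' time, which is the cumulative bottom-up pattern.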

Minimal timing repro we ran to confirm the OTel span behavior:

  • parent_native_duration_ns: 66952805 (≈ 66.95 ms)
  • child_native_duration_ns: 45204568 (≈ 45.20 ms)
  • parent_logical_end_offset_ns: 21620105 (≈ 21.62 ms)
  • parent_overhang_after_logical_end_ns: 45332700 (≈ 45.33 ms)

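That overhang (about 45.33 ms) exceeds the child duration (about 45.20 ms) by only 128,132 ns, and the logical end offset plus the overhang equals the parent's native duration exactly (21,620,105 + 45,332,700 = 66,952,805 ns). In other words, almost all of the parent's extra native time is the child's work, which is the cumulative behavior we see in practice.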
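The arithmetic behind that comparison can be checked directly. The values below are copied from the repro; the script only confirms that the logical part plus the overhang adds up to the parent's native duration and measures how far the overhang is from the child's own duration.

```python
# Values copied from the timing repro above (nanoseconds).
parent_native_duration_ns = 66_952_805
child_native_duration_ns = 45_204_568
parent_logical_end_offset_ns = 21_620_105
parent_overhang_after_logical_end_ns = 45_332_700

# The parent's native duration splits exactly into logical part + overhang.
assert (
    parent_logical_end_offset_ns + parent_overhang_after_logical_end_ns
    == parent_native_duration_ns
)

# How much longer the overhang is than the child's own native duration.
gap_ns = parent_overhang_after_logical_end_ns - child_native_duration_ns
ratio = parent_overhang_after_logical_end_ns / child_native_duration_ns
print(f"gap: {gap_ns} ns ({gap_ns / 1e6:.2f} ms)")  # gap: 128132 ns (0.13 ms)
print(f"overhang / child duration: {ratio:.4f}")
```

The remaining ~0.13 ms is small next to the child's ~45.2 ms, so nearly all of the overhang is child time.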

Expected Behavior

Each execute_event_loop_cycle span should reflect that cycle's own duration only.

If a cycle logically ends before recursion begins, its native OTel span duration should also end there rather than continuing to accumulate child recursive cycle time.

Actual Behavior

After upgrading from strands-agents==1.21.0 to 1.30.0, execute_event_loop_cycle spans appear with cumulative bottom-up duration instead of per-cycle duration when a cycle performs tool use and then recurses.

The logical metadata on the span is correct:

  • gen_ai.event.start_time is correct
  • gen_ai.event.end_time is correct

However, the rendered span duration appears to be computed from the native OTel span timestamps, and the native end timestamp of the parent cycle span is delayed until after recursive child cycle work finishes.
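One way to spot affected spans in exported data is to compare each span's native end timestamp with its logical gen_ai.event.end_time. The sketch below is stdlib-only and rests on two assumptions: that the attribute holds an ISO-8601 string (the exact format Strands writes is an assumption here), and that native timestamps are integer nanoseconds since the Unix epoch, which is what start_time/end_time hold on OTel Python SDK ReadableSpan objects. The helper names iso_to_epoch_ns and overhang_ns are hypothetical.

```python
from datetime import datetime, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_ns(value: str) -> int:
    """Convert an ISO-8601 string to integer ns since the Unix epoch.

    Assumes the attribute is ISO-8601; values without an offset are treated as UTC.
    """
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - _EPOCH
    # Integer arithmetic avoids the float rounding of dt.timestamp() * 1e9.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000


def overhang_ns(native_end_ns: int, logical_end_iso: str) -> int:
    """Time the native span stayed open after its logical gen_ai.event.end_time."""
    return native_end_ns - iso_to_epoch_ns(logical_end_iso)


# Hypothetical span: native end is 66,952,805 ns after midnight UTC,
# and the logical end was recorded 21.620 ms after the same instant.
base_ns = iso_to_epoch_ns("2025-01-01T00:00:00+00:00")
print(overhang_ns(base_ns + 66_952_805, "2025-01-01T00:00:00.021620+00:00"))  # 45332805
```

Because datetime only carries microseconds, the result can be off by up to 1 µs; an overhang in the tens of milliseconds on a cycle that recursed is far outside that error.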

This causes the displayed span duration to include child recursive cycle work, producing cumulative bottom-up latency instead of per-step latency.

Additional Context

We observed this in Langfuse, but the underlying issue appears to be in native OTel span lifecycle rather than backend-specific parsing.

Relevant exported span metadata looked like:

  • correct gen_ai.event.start_time
  • correct gen_ai.event.end_time
  • service version 1.30.0

We also verified by comparing the 1.21.0 and 1.30.0 Strands sources directly.

This seems related to the span lifecycle refactor where cycle/model spans are managed via use_span(..., end_on_exit=True) and end_event_loop_cycle_span() no longer calls _end_span(...).

Possible Solution

A likely fix would be to ensure the native execute_event_loop_cycle span is ended at the logical end of the cycle before recursive child cycles start, rather than relying on the outer use_span(..., end_on_exit=True) context to close it later.
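A toy sketch of that idea follows. It reuses no Strands code: FakeClock, ToySpan, toy_use_span and run_cycle are hypothetical stand-ins, not the Strands or OpenTelemetry API. The only change from the 1.30.0 shape is that each cycle ends its span explicitly at the logical end, before recursing, while the context manager's end-on-exit stays as a safety net for early exits. Because end() is idempotent in this model, the later end-on-exit is harmless.

```python
import contextlib
from typing import Iterator, List, Optional


class FakeClock:
    """Deterministic clock (hypothetical helper)."""

    def __init__(self) -> None:
        self.now_ns = 0


class ToySpan:
    """Hypothetical stand-in for a native span; not the OpenTelemetry API."""

    def __init__(self, name: str, start_ns: int) -> None:
        self.name = name
        self.start_ns = start_ns
        self.end_ns: Optional[int] = None

    def end(self, now_ns: int) -> None:
        if self.end_ns is None:  # the first end() wins; later calls are ignored
            self.end_ns = now_ns


@contextlib.contextmanager
def toy_use_span(span: ToySpan, clock: FakeClock) -> Iterator[ToySpan]:
    # Safety net like end_on_exit=True: ends the span if nothing ended it earlier
    # (for example when the cycle raises before reaching its logical end).
    try:
        yield span
    finally:
        span.end(clock.now_ns)


def run_cycle(depth: int, clock: FakeClock, spans: List[ToySpan]) -> None:
    span = ToySpan(f"execute_event_loop_cycle[{depth}]", clock.now_ns)
    spans.append(span)
    with toy_use_span(span, clock):
        clock.now_ns += 20  # this cycle's own model call and tool use
        span.end(clock.now_ns)  # proposed change: end the native span at the logical end
        if depth < 2:
            run_cycle(depth + 1, clock, spans)  # child work no longer extends this span


clock = FakeClock()
spans: List[ToySpan] = []
run_cycle(0, clock, spans)
print([s.end_ns - s.start_ns for s in spans])  # [20, 20, 20]
```

In the real OpenTelemetry Python SDK, calling end() on a span that has already ended is ignored but logs a warning, so an actual fix would more likely pass end_on_exit=False to use_span and end the span explicitly (including on error paths) than rely on a double end. Child spans can still reference an ended parent's span context, so nesting in the trace tree does not depend on the parent staying open.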

Related Issues

Possibly related conceptually to #1876, but this report is about cycle duration becoming cumulative across recursive cycles rather than missing common attributes.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
