Description
Checks
- I have updated to the latest minor and patch version of Strands
- I have checked the documentation and this is not expected behavior
- I have searched ./issues and there are no duplicates of my issue
Strands Version
1.18.0
Python Version
3.13
Operating System
macOS 26.1
Installation Method
pip
Steps to Reproduce
from strands import Agent
from strands.models import BedrockModel

bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-west-2",
    streaming=True,
)
agent = Agent(model=bedrock_model)

# Run the agent with multiple cycles (tool calls trigger multiple model invocations)
agent("What is the answer to all questions?")

# Observe the traces in Langfuse via OpenTelemetry:
# token counts and costs appear doubled.
- Configure OpenTelemetry to export traces to Langfuse
- Run an agent that performs multiple event loop cycles (e.g., with tool calls)
- View the trace in Langfuse
- Observe that token counts and costs are approximately 2x the expected values
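For the first step, here is a minimal sketch of pointing the standard OpenTelemetry OTLP environment variables at Langfuse. Several details are assumptions, not confirmed by this report: the `/api/public/otel` ingestion path, Basic auth built from a Langfuse public/secret key pair, the default cloud host, and the hypothetical helper name `configure_langfuse_otlp`. The Strands call shown in the trailing comments, `StrandsTelemetry().setup_otlp_exporter()`, is expected to pick these variables up through the OTLP exporter, but check it against the Strands observability docs.

```python
import base64
import os


def configure_langfuse_otlp(public_key: str, secret_key: str,
                            host: str = "https://cloud.langfuse.com") -> None:
    """Hypothetical helper: set OTLP exporter env vars for Langfuse.

    Assumes Langfuse ingests OTLP at <host>/api/public/otel and accepts
    HTTP Basic auth built from the project's public and secret keys.
    """
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    # The OTLP/HTTP span exporter appends /v1/traces to this base endpoint
    os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = f"{host.rstrip('/')}/api/public/otel"
    os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {token}"


configure_langfuse_otlp("pk-lf-example", "sk-lf-example")  # placeholder keys

# Then, before constructing the Agent (assumed Strands telemetry helper):
# from strands.telemetry import StrandsTelemetry
# StrandsTelemetry().setup_otlp_exporter()
```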
Expected Behavior
Token usage and costs should be reported once per trace, reflecting the actual usage from all model invocations without duplication.
Actual Behavior
Token usage and costs are double-counted because the SDK reports gen_ai.usage.* attributes at two levels:
- Per-cycle: end_model_invoke_span (tracer.py:331-335) reports usage on each chat child span
- Accumulated: end_agent_span (tracer.py:669-681) reports accumulated_usage (the sum of all cycles) on the parent invoke_agent span
When Langfuse aggregates costs from all spans with gen_ai.usage.* attributes, it sums:
- cycle1 + cycle2 + ... + cycleN (from child spans)
- accumulated_usage (from parent span) = cycle1 + cycle2 + ... + cycleN
Result: 2x actual cost
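Written as an equation, with c_i standing for the usage recorded on the chat span of cycle i (notation introduced here for illustration), the total a backend gets by summing every span is:

```latex
\text{reported}
  \;=\; \underbrace{\sum_{i=1}^{N} c_i}_{\text{chat spans}}
  \;+\; \underbrace{\sum_{i=1}^{N} c_i}_{\texttt{accumulated\_usage}}
  \;=\; 2\sum_{i=1}^{N} c_i
```

This holds for any N ≥ 1, so even a single-cycle run is reported at twice its actual usage.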
Additional Context
Span hierarchy:
invoke_agent span (parent)
├── gen_ai.usage.* = accumulated_usage (sum of ALL cycles) ← DUPLICATE
│
├── chat span (cycle 1)
│   └── gen_ai.usage.* = cycle 1 tokens
├── chat span (cycle 2)
│   └── gen_ai.usage.* = cycle 2 tokens
└── chat span (cycle N)
    └── gen_ai.usage.* = cycle N tokens
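The hierarchy above can be modeled in a few lines of Python to show where the doubling comes from. This sketch assumes a backend that sums gen_ai.usage.* over every span in the trace, as described earlier. It is not Langfuse's actual aggregation code. The `Span` class, helper names, and token counts are hypothetical, and only two of the usage attributes are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical stand-in for an exported OpenTelemetry span."""
    name: str
    attributes: dict[str, int] = field(default_factory=dict)
    children: list["Span"] = field(default_factory=list)


def usage(input_tokens: int, output_tokens: int) -> dict[str, int]:
    # Illustrative subset of the gen_ai.usage.* attributes
    return {
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }


def build_trace(cycles: list[tuple[int, int]]) -> Span:
    # One chat child span per event loop cycle, each with its own usage
    chats = [Span(f"chat (cycle {i + 1})", usage(*c)) for i, c in enumerate(cycles)]
    # The parent invoke_agent span carries the accumulated usage of all cycles
    accumulated = usage(sum(c[0] for c in cycles), sum(c[1] for c in cycles))
    return Span("invoke_agent", accumulated, chats)


def naive_total(span: Span, key: str) -> int:
    # Sum the attribute over every span in the tree, parent included
    return span.attributes.get(key, 0) + sum(naive_total(c, key) for c in span.children)


trace = build_trace([(1200, 80), (1500, 150), (1900, 300)])
print(naive_total(trace, "gen_ai.usage.input_tokens"))  # 9200, twice the 4600 actually used
```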
Relevant code locations:
- src/strands/telemetry/tracer.py:311-364 - end_model_invoke_span (per-cycle reporting)
- src/strands/telemetry/tracer.py:669-681 - end_agent_span (accumulated reporting)
Possible Solution
Remove the gen_ai.usage.* attributes from end_agent_span (lines 669-681 in tracer.py). The child chat spans already capture all per-cycle usage details, so the parent span doesn't need to duplicate the totals.
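The effect of the proposed change can be checked with a flat list of per-span attribute dicts. This sketch again assumes a backend that sums the attribute over all spans in the trace. The helper names and token counts are hypothetical, and "fixed" here simply means the parent span carries no gen_ai.usage.* attributes.

```python
def usage_attrs(input_tokens: int, output_tokens: int) -> dict[str, int]:
    # Illustrative subset of the gen_ai.usage.* attributes
    return {
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }


def sum_over_spans(spans: list[dict[str, int]], key: str) -> int:
    # Models a backend that sums the attribute over every span in the trace
    return sum(span.get(key, 0) for span in spans)


cycles = [(1200, 80), (1500, 150), (1900, 300)]  # (input, output) tokens per cycle
chat_spans = [usage_attrs(i, o) for i, o in cycles]

# Current behavior: invoke_agent also carries accumulated_usage
parent_now = usage_attrs(sum(i for i, _ in cycles), sum(o for _, o in cycles))
# Proposed fix: end_agent_span sets no gen_ai.usage.* attributes on the parent
parent_fixed: dict[str, int] = {}

key = "gen_ai.usage.input_tokens"
print(sum_over_spans([parent_now, *chat_spans], key))    # 9200
print(sum_over_spans([parent_fixed, *chat_spans], key))  # 4600
```

One consequence to weigh: a consumer that reads the agent-level total directly from the invoke_agent span would have to sum the child chat spans instead.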
Related Issues
No response