[BUG] OpenTelemetry token/cost metrics double-counted in Langfuse due to duplicate reporting on parent and child spans #1267

@schakraborty-staclline

Description

Checks

  • I have updated to the latest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

1.18.0

Python Version

3.13

Operating System

macOS 26.1

Installation Method

pip

Steps to Reproduce

from strands import Agent
from strands.models import BedrockModel

bedrock_model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
    region_name="us-west-2",
    streaming=True,
)

agent = Agent(model=bedrock_model)

# Run agent with multiple cycles (tool calls trigger multiple model invocations)
agent("What is the answer to all questions?")

# Observe traces in Langfuse via OpenTelemetry
# Token counts and costs appear doubled

  1. Configure OpenTelemetry to export traces to Langfuse
  2. Run an agent that performs multiple event loop cycles (e.g., with tool calls)
  3. View the trace in Langfuse
  4. Observe that token counts and costs are approximately 2x the expected values
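For step 1, a minimal sketch of the exporter configuration, assuming Langfuse's OTLP endpoint lives at `/api/public/otel` on the Langfuse host and accepts HTTP Basic auth built from the project's public and secret keys. The helper name `langfuse_otlp_env` and the key values are placeholders, not part of Strands or Langfuse; only the two `OTEL_EXPORTER_OTLP_*` variable names are standard OpenTelemetry.

```python
import base64
import os


def langfuse_otlp_env(public_key: str, secret_key: str, host: str) -> dict[str, str]:
    """Build standard OTLP exporter environment variables for a Langfuse host.

    Assumption: the host exposes an OTLP endpoint at /api/public/otel and
    authenticates with Basic auth over "public_key:secret_key".
    """
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"{host.rstrip('/')}/api/public/otel",
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic {token}",
    }


# Placeholder credentials, for illustration only.
env = langfuse_otlp_env("pk-lf-example", "sk-lf-example", "https://cloud.langfuse.com")
os.environ.update(env)
```

These variables need to be set before the agent is created, and the OTLP exporter itself still has to be enabled in the SDK, which is not shown here.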

Expected Behavior

Token usage and costs should be reported once per trace, reflecting the actual usage from all model invocations without duplication.

Actual Behavior

Token usage and costs are double-counted because the SDK reports gen_ai.usage.* attributes at two levels:

  1. Per-cycle: end_model_invoke_span (tracer.py:331-335) reports usage on each chat child span
  2. Accumulated: end_agent_span (tracer.py:669-681) reports accumulated_usage (sum of all cycles) on the parent invoke_agent span

When Langfuse aggregates costs from all spans with gen_ai.usage.* attributes, it sums:

  • cycle1 + cycle2 + ... + cycleN (from child spans)
  • accumulated_usage (from parent span) = cycle1 + cycle2 + ... + cycleN

Result: 2x actual cost
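The arithmetic can be checked with a small sketch. The per-cycle token counts below are made up for illustration, and the `total` helper is hypothetical; it only mimics a backend that sums usage across every span carrying usage attributes.

```python
# Hypothetical per-cycle usage numbers, for illustration only.
cycles = [
    {"input_tokens": 1200, "output_tokens": 150},
    {"input_tokens": 1500, "output_tokens": 90},
    {"input_tokens": 1700, "output_tokens": 300},
]


def total(usages):
    """Sum input and output tokens over a list of usage records."""
    return {
        key: sum(usage[key] for usage in usages)
        for key in ("input_tokens", "output_tokens")
    }


actual = total(cycles)                 # what the model invocations consumed
accumulated = total(cycles)            # what the parent span reports
naive = total(cycles + [accumulated])  # a backend summing every span

print(actual)
print(naive)
```

With any set of cycle values, `naive` comes out at exactly twice `actual`, which matches the roughly 2x figure seen in Langfuse.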

Additional Context

Span hierarchy:

invoke_agent span (parent)
├── gen_ai.usage.* = accumulated_usage (sum of ALL cycles)  ← DUPLICATE
│
├── chat span (cycle 1)
│   └── gen_ai.usage.* = cycle 1 tokens
├── chat span (cycle 2)
│   └── gen_ai.usage.* = cycle 2 tokens
└── chat span (cycle N)
    └── gen_ai.usage.* = cycle N tokens

Relevant code locations:

  • src/strands/telemetry/tracer.py:311-364 - end_model_invoke_span (per-cycle reporting)
  • src/strands/telemetry/tracer.py:669-681 - end_agent_span (accumulated reporting)

Possible Solution

Remove the gen_ai.usage.* attributes from end_agent_span (lines 669-681 in tracer.py). The child chat spans already capture all per-cycle usage details, so the parent span doesn't need to duplicate the totals.
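The effect of this change can be sketched on a toy model of the trace. The `Span` class below is a stand-in written for this example, not the OpenTelemetry SDK class, and `without_agent_usage` is a hypothetical helper that mimics what the trace would look like if end_agent_span stopped setting gen_ai.usage.* attributes. It is an illustration of the proposal, not the patch itself.

```python
from dataclasses import dataclass, field

USAGE_PREFIX = "gen_ai.usage."


@dataclass
class Span:
    """Toy stand-in for an exported span; not the OpenTelemetry SDK class."""
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def sum_usage(span: Span, key: str) -> int:
    """Sum one usage attribute over a span and all of its descendants."""
    own = span.attributes.get(USAGE_PREFIX + key, 0)
    return own + sum(sum_usage(child, key) for child in span.children)


def without_agent_usage(span: Span) -> Span:
    """Copy the tree, dropping gen_ai.usage.* from invoke_agent spans only."""
    attrs = dict(span.attributes)
    if span.name.startswith("invoke_agent"):
        attrs = {k: v for k, v in attrs.items() if not k.startswith(USAGE_PREFIX)}
    return Span(span.name, attrs, [without_agent_usage(c) for c in span.children])


# Two cycles of 100 and 200 input tokens; the parent repeats their sum.
trace = Span(
    "invoke_agent",
    {"gen_ai.usage.input_tokens": 300},
    [
        Span("chat", {"gen_ai.usage.input_tokens": 100}),
        Span("chat", {"gen_ai.usage.input_tokens": 200}),
    ],
)

before = sum_usage(trace, "input_tokens")
after = sum_usage(without_agent_usage(trace), "input_tokens")
print(before, after)
```

In this model the summed input tokens drop from 600 to 300 once the parent span no longer repeats the totals, while the per-cycle values on the chat spans are untouched.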

Related Issues

No response
