feat: adopt current OTEL cached token conventions by bml1g12 · Pull Request #5447 · livekit/agents

bml1g12 · 2026-04-14T13:52:17Z

Context

Today, LiveKit already calculates gen_ai.usage.input_tokens correctly when emitting it (as defined by https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/). However, two gaps remain:

The cached text token count is emitted to a custom namespace not recognised by OTEL or by major providers like Langfuse or Pydantic (see the "Removed trace type constants (not in the OTEL spec)" section below).
Audio token counting is not part of the official OTEL conventions, but Langfuse ("fix(otel): normalize gen ai usage details", langfuse/langfuse#13110) and Pydantic (comment on "More detailed token usage span attributes and metrics", open-telemetry/semantic-conventions#1959) have recently adopted a convention for it. Following that convention would be useful for LiveKit, as it allows cost estimation of gpt-realtime in Langfuse. Langfuse has recently merged a fix on their side that maps values stored in the convention used in this PR to their internal cost modelling.

Summary

Aligns realtime and related LLM spans with current OpenTelemetry GenAI usage conventions and Langfuse OTEL ingestion (langfuse#13110): top-level gen_ai.usage.input_tokens / output_tokens represent text counts; cache read uses the standard dotted key; audio breakdowns use gen_ai.usage.details.* so backends can normalize usage without double-counting cache or mixing modalities.

Internal metrics (RealtimeModelMetrics, ModelUsageCollector, Prometheus counters) are unchanged—only trace attributes and trace_types constants are updated.

Attribute mapping

Realtime spans (record_realtime_metrics) now set:

| Intent | OTEL / Langfuse-oriented attribute | Source on `RealtimeModelMetrics` |
| --- | --- | --- |
| Top-level input (text-only, inclusive of cached text) | `gen_ai.usage.input_tokens` | `input_token_details.text_tokens` |
| Top-level output (text-only) | `gen_ai.usage.output_tokens` | `output_token_details.text_tokens` |
| Cached text (cache read) | `gen_ai.usage.cache_read.input_tokens` | `cached_tokens_details.text_tokens` (if non-zero) |
| Input audio (non-cached) | `gen_ai.usage.details.input_audio_tokens` | `input_token_details.audio_tokens` |
| Cached audio read | `gen_ai.usage.details.cache_audio_read_tokens` | `cached_tokens_details.audio_tokens` |
| Output audio | `gen_ai.usage.details.output_audio_tokens` | `output_token_details.audio_tokens` |

gen_ai.usage.cache_creation.input_tokens is not set until a provider exposes cache-creation counts in our metrics model.
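
The mapping above can be sketched as a small pure function. This is an illustration only, not the actual `record_realtime_metrics` from this PR: `TokenDetails`, `RealtimeUsage`, and `usage_attributes` are hypothetical stand-ins that mirror the field names in the table. The sketch assumes the two top-level keys are always emitted and that cache-read and `details.*` breakdowns are dropped when zero.

```python
from dataclasses import dataclass, field


@dataclass
class TokenDetails:
    # Hypothetical stand-in mirroring the field names used in the table.
    text_tokens: int = 0
    audio_tokens: int = 0


@dataclass
class RealtimeUsage:
    # Hypothetical stand-in for the usage fields on RealtimeModelMetrics.
    input_token_details: TokenDetails = field(default_factory=TokenDetails)
    output_token_details: TokenDetails = field(default_factory=TokenDetails)
    cached_tokens_details: TokenDetails = field(default_factory=TokenDetails)


def usage_attributes(usage: RealtimeUsage) -> dict[str, int]:
    """Build span attributes from usage counts, following the table above."""
    # Top-level counts are text-only (assumption: emitted even when zero).
    attrs = {
        "gen_ai.usage.input_tokens": usage.input_token_details.text_tokens,
        "gen_ai.usage.output_tokens": usage.output_token_details.text_tokens,
    }
    # Cache-read and per-modality breakdowns are only set when non-zero.
    optional = {
        "gen_ai.usage.cache_read.input_tokens": usage.cached_tokens_details.text_tokens,
        "gen_ai.usage.details.input_audio_tokens": usage.input_token_details.audio_tokens,
        "gen_ai.usage.details.cache_audio_read_tokens": usage.cached_tokens_details.audio_tokens,
        "gen_ai.usage.details.output_audio_tokens": usage.output_token_details.audio_tokens,
    }
    attrs.update({key: value for key, value in optional.items() if value > 0})
    return attrs
```

Returning a plain dict keeps the mapping easy to assert on in unit tests, independent of any tracer setup.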

Removed trace type constants (not in the OTEL spec)

These ATTR_GEN_AI_USAGE_* aliases were removed because their string keys were informal (LiveKit/Langfuse-style flat names) and not defined in OpenTelemetry GenAI semantic conventions:

| Removed constant | Removed attribute key |
| --- | --- |
| `ATTR_GEN_AI_USAGE_INPUT_TEXT_TOKENS` | `gen_ai.usage.input_text_tokens` |
| `ATTR_GEN_AI_USAGE_INPUT_AUDIO_TOKENS` | `gen_ai.usage.input_audio_tokens` |
| `ATTR_GEN_AI_USAGE_INPUT_CACHED_TOKENS` | `gen_ai.usage.input_cached_tokens` |
| `ATTR_GEN_AI_USAGE_OUTPUT_TEXT_TOKENS` | `gen_ai.usage.output_text_tokens` |
| `ATTR_GEN_AI_USAGE_OUTPUT_AUDIO_TOKENS` | `gen_ai.usage.output_audio_tokens` |

Top-level gen_ai.usage.input_tokens and gen_ai.usage.output_tokens remain; they now map to text token fields only on realtime spans, consistent with semconv + Langfuse normalization.

Other call sites

Cascaded LLM spans (llm.py): set gen_ai.usage.cache_read.input_tokens when prompt_cached_tokens > 0.
Eval judge spans (run_result.py): same cache-read key; redundant duplicate text keys removed.
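
The conditional cache-read attribute on these call sites might look roughly like the sketch below. It is not the code in llm.py or run_result.py: `record_cache_read`, `SpanLike`, and `RecordingSpan` are hypothetical names, and the only thing assumed about the span is a `set_attribute(key, value)` method like the one on OpenTelemetry spans.

```python
from typing import Protocol

ATTR_CACHE_READ = "gen_ai.usage.cache_read.input_tokens"


class SpanLike(Protocol):
    # Minimal interface this sketch needs from a span.
    def set_attribute(self, key: str, value: int) -> None: ...


def record_cache_read(span: SpanLike, prompt_cached_tokens: int) -> None:
    """Set the cache-read attribute only when cached tokens were reported."""
    if prompt_cached_tokens > 0:
        span.set_attribute(ATTR_CACHE_READ, prompt_cached_tokens)


class RecordingSpan:
    """Test double that stores attributes in a dict instead of exporting them."""

    def __init__(self) -> None:
        self.attributes: dict[str, int] = {}

    def set_attribute(self, key: str, value: int) -> None:
        self.attributes[key] = value
```

Skipping the attribute at zero keeps spans free of keys that carry no information, which matches the "omission of zero breakdowns" behaviour described for realtime spans.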

Tests

Unit tests assert the attribute dict produced by record_realtime_metrics (including omission of zero breakdowns).

Breaking note: External code that imported the five removed ATTR_GEN_AI_USAGE_* names must switch to the new constants (ATTR_GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS, ATTR_GEN_AI_USAGE_DETAILS_*, etc.).
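
For downstream code that reads these attribute keys as strings (dashboards, exporters, ad-hoc queries), a rename helper along these lines may ease the transition. The old-to-new correspondence is inferred from the two tables in this description and is not stated explicitly by the PR, so treat it as an assumption; `LEGACY_TO_CURRENT` and `migrate_usage_keys` are hypothetical names.

```python
# Assumed old -> new correspondence, inferred from the tables in this PR.
LEGACY_TO_CURRENT = {
    "gen_ai.usage.input_text_tokens": "gen_ai.usage.input_tokens",
    "gen_ai.usage.input_audio_tokens": "gen_ai.usage.details.input_audio_tokens",
    "gen_ai.usage.input_cached_tokens": "gen_ai.usage.cache_read.input_tokens",
    "gen_ai.usage.output_text_tokens": "gen_ai.usage.output_tokens",
    "gen_ai.usage.output_audio_tokens": "gen_ai.usage.details.output_audio_tokens",
}


def migrate_usage_keys(attrs: dict[str, int]) -> dict[str, int]:
    """Return a copy of attrs with the removed keys renamed to current ones.

    Assumption: when both a legacy key and its replacement are present,
    the legacy value wins, because the top-level counts are text-only
    under the new convention.
    """
    migrated = {k: v for k, v in attrs.items() if k not in LEGACY_TO_CURRENT}
    for old_key, new_key in LEGACY_TO_CURRENT.items():
        if old_key in attrs:
            migrated[new_key] = attrs[old_key]
    return migrated
```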

feat: adopt current OTEL cached token conventions

9bcb5e9

bml1g12 mentioned this pull request Apr 14, 2026

Improve gpt-realtime token counting/Fix cost tracking via OTEL spans #5121

Closed

refactor: linting

90f81f6

bml1g12 wants to merge 2 commits into livekit:main from bml1g12:cached_otel_token_counts
