Problem Statement
When comparing model performance across providers (Bedrock, Ollama, OpenAI, etc.), there's no built-in way to measure generation throughput. Tools like Ollama display tokens/second after runs, which is valuable for performance tuning and provider comparison. Currently, Strands tracks outputTokens and latencyMs separately, but doesn't compute the rate.
Proposed Solution
Add tokens per second metrics to the existing metrics system:
- Add `output_tokens_per_second` to `EventLoopCycleMetric` (per-turn)
- Add an `average_output_tokens_per_second` computed property on `EventLoopMetrics` (across all turns)
- Export via an OpenTelemetry histogram (`strands.event_loop.output_tokens_per_second`)
Calculation: `output_tokens_per_second = outputTokens / (latencyMs / 1000)`
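For illustration, a minimal sketch of how the two values could be computed. The class shapes and field names below are simplified stand-ins for the real `EventLoopCycleMetric` / `EventLoopMetrics` definitions in `src/strands/telemetry/metrics.py`, not the actual Strands code:

```python
# Illustrative sketch only -- field names are simplified stand-ins for the
# actual Strands metrics classes. Assumes outputTokens and latencyMs are
# already recorded for each event-loop cycle.
from dataclasses import dataclass, field


@dataclass
class EventLoopCycleMetric:
    output_tokens: int = 0
    latency_ms: int = 0

    @property
    def output_tokens_per_second(self) -> float:
        """Per-turn throughput: outputTokens / (latencyMs / 1000)."""
        if self.latency_ms <= 0:
            return 0.0
        return self.output_tokens / (self.latency_ms / 1000)


@dataclass
class EventLoopMetrics:
    cycles: list[EventLoopCycleMetric] = field(default_factory=list)

    @property
    def average_output_tokens_per_second(self) -> float:
        """Throughput across all turns, weighted by total latency."""
        total_tokens = sum(c.output_tokens for c in self.cycles)
        total_latency_ms = sum(c.latency_ms for c in self.cycles)
        if total_latency_ms <= 0:
            return 0.0
        return total_tokens / (total_latency_ms / 1000)
```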
Use Case
- Compare generation speed across model providers
- Identify performance regressions when switching models
- Monitor throughput in production deployments
- Benchmark different model configurations
Alternative Solutions
Users can manually compute this from existing usage and metrics data in AgentResult, but having it built-in provides consistency and enables OpenTelemetry-based monitoring dashboards.
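For reference, a rough example of the manual computation users do today. The `accumulated_usage` / `accumulated_metrics` attribute names are assumptions about the result's metrics object and may not match the actual API exactly:

```python
# Manual computation from an AgentResult today -- the metrics attribute names
# below are illustrative and may not match the actual API exactly.
from strands import Agent

agent = Agent()  # default model provider/configuration
result = agent("Summarize this document.")

output_tokens = result.metrics.accumulated_usage["outputTokens"]
latency_ms = result.metrics.accumulated_metrics["latencyMs"]

tokens_per_second = output_tokens / (latency_ms / 1000) if latency_ms else 0.0
print(f"{tokens_per_second:.1f} output tokens/s")
```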
Additional Context
Related issues:
- #1503 - [FEATURE] Per-tool token counts
- #1294 - [FEATURE] Token Estimation API
- #1197 - [FEATURE] Track agent.messages token size
The building blocks already exist in the codebase:
- `Usage` type in `src/strands/types/event_loop.py` tracks `outputTokens`
- `Metrics` type tracks `latencyMs`
- `EventLoopMetrics` in `src/strands/telemetry/metrics.py` already accumulates both
- OpenTelemetry integration is in place for exporting new metrics
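A sketch of how the exported histogram could be recorded with the OpenTelemetry Python API; only the instrument name comes from this proposal, while the meter name, attributes, and recording point are assumptions:

```python
# Sketch of the OpenTelemetry export -- the meter name, attribute key, and
# recording point are assumptions; only the instrument name is from the proposal.
from opentelemetry import metrics

meter = metrics.get_meter("strands")
tokens_per_second_histogram = meter.create_histogram(
    name="strands.event_loop.output_tokens_per_second",
    unit="{token}/s",
    description="Output token generation throughput per event-loop cycle",
)

# Record one measurement per completed cycle, tagged with the model provider.
tokens_per_second_histogram.record(42.7, attributes={"model.provider": "bedrock"})
```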