Extend the Token Cost Observatory to compute per-run Effective Tokens using GitHub's normalized ET formula (ET = m × (I + 0.1×C + 4×O)), track a rolling average per workflow, and alert when a run's ET exceeds its workflow's baseline by more than 2σ. Add an optional auto-cancel mechanism for sessions that exceed a hard ET ceiling, so that runaway agent loops cannot burn through token budgets.
Market Signal
GitHub's engineering team uses the Effective Token metric to normalize costs across model tiers with model multipliers (Haiku 0.25×, Sonnet 1.0×, Opus 5.0×), output tokens weighted 4× (most expensive token type), and cache reads weighted 0.1× (Improving token efficiency in GitHub Agentic Workflows). Their Daily Token Usage Auditor aggregates consumption by workflow and flags anomalous runs, achieving sustained 37-62% savings across production workflows. Industry consensus in 2026 is that reactive weekly cost reporting is insufficient — proactive per-run anomaly detection is the standard for production LLM operations.
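As a worked illustration of the formula, take a single call with 1,000 input tokens, 5,000 cache-read tokens, and 200 output tokens. These counts are invented for the example and do not come from any real run; reading I as input tokens is also an assumption, since the text names only the cache-read (C) and output (O) weights explicitly.

```latex
\begin{align*}
ET &= m \times (I + 0.1\,C + 4\,O) \\
ET_{\text{Sonnet}} &= 1.0 \times (1000 + 0.1 \times 5000 + 4 \times 200) = 2300 \\
ET_{\text{Opus}} &= 5.0 \times 2300 = 11500 \\
ET_{\text{Haiku}} &= 0.25 \times 2300 = 575
\end{align*}
```

The same raw token counts therefore differ by a factor of 20 between Haiku and Opus once the model multiplier is applied.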
User Signal
The Token Cost Observatory (#332) provides weekly reporting but does not detect anomalies in real time. Issue #466 (dev-lead lone cancelled check cycles 30-minute retries forever) demonstrates that runaway agent sessions are a real operational risk: an infinite retry loop burns tokens until the 60-minute job timeout kills it. The token_report.sh and model-pricing.tsv infrastructure already collects the raw data needed; it only lacks real-time analysis.
Technical Opportunity
scripts/token_report.sh already aggregates token usage data and scripts/lib/model-pricing.tsv provides per-model pricing with effective dates. Adding ET computation requires:
1. ET formula in token-metrics.sh: Apply m × (I + 0.1×C + 4×O) to per-call JSONL records
2. Rolling baseline: Store per-workflow ET averages as a GitHub Actions cache artifact or gist
3. Post-run check: A ~30-line shell step after each agent workflow computes the run's total ET, compares it against the rolling baseline, and posts an issue comment or Slack alert if the run exceeds the baseline by more than 2σ
4. Auto-cancel (opt-in): For truly runaway sessions (10× baseline), use gh run cancel to terminate
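The post-run check and the opt-in ceiling could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: the function name `classify_run` and the history format (a plain file with one past ET value per line) are assumptions made for illustration, and the baseline here is a simple mean and sample standard deviation over whatever history the file holds.

```shell
#!/usr/bin/env bash
set -euo pipefail

# classify_run HISTORY_FILE RUN_ET [SIGMA] [CEILING_FACTOR]
# HISTORY_FILE is a hypothetical baseline store: one past ET value per line.
# Prints one of: ok, alert, cancel, insufficient-baseline.
classify_run() {
  local history="$1" run_et="$2" sigma="${3:-2}" factor="${4:-10}"
  awk -v x="$run_et" -v k="$sigma" -v f="$factor" '
    NF { n++; sum += $1; sumsq += $1 * $1 }   # accumulate count, sum, sum of squares
    END {
      # A standard deviation needs at least two samples.
      if (n < 2) { print "insufficient-baseline"; exit }
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
      if (var < 0) var = 0                        # guard against rounding error
      sd = sqrt(var)
      if (x > f * mean)           print "cancel"  # hard ceiling, e.g. 10x baseline
      else if (x > mean + k * sd) print "alert"   # more than k sigma above the mean
      else                        print "ok"
    }' "$history"
}
```

In a workflow step, a `cancel` result could be followed by `gh run cancel "$GITHUB_RUN_ID"` when the workflow has opted in, while an `alert` result would only post the issue comment or Slack message. Returning `insufficient-baseline` rather than guessing keeps the check quiet while the first data points are still being collected.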
The ET metric also unlocks better cross-model cost comparison in the weekly Token Cost Observatory report — comparing Haiku and Opus runs on a normalized scale rather than raw token counts.
Assessment
Dimension | Score | Rationale
Feasibility | high | Raw token data already collected; ET formula is ~10 lines of computation; rolling average is a standard pattern
 | | Proactive cost control; value increases as agent volume and model diversity grow
Adversarial Review
Strongest objection: False positives on legitimately complex reviews (large PRs, infrastructure changes) could trigger unnecessary alerts and auto-cancellations, interrupting valid work.
Rebuttal: The 2σ alerting threshold is configurable per workflow, and auto-cancel should be opt-in with a hard ceiling much higher than the alert threshold (e.g., 10× baseline). Legitimately complex PRs produce high ET but within expected variance for their workflow. The kill switch targets truly runaway sessions (30-minute retry loops per issue #466, compaction spirals) — not normal high-complexity reviews. Starting with alerts-only (no auto-cancel) lets the team calibrate thresholds for 2 weeks before enabling any automation.
Suggested Next Step
Add ET computation to scripts/lib/token-metrics.sh using the formula and model multipliers from model-pricing.tsv. Instrument one workflow (pr-review) with a post-run ET check step. Collect 2 weeks of baseline data before enabling anomaly alerts.
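A first cut of the per-run ET computation might look like the sketch below. Both file layouts are assumptions for illustration: the multiplier table is taken to be a two-column TSV (model, multiplier), which may not match the real columns of model-pricing.tsv, and the per-call records are taken to be already flattened from JSONL into a TSV of model, input tokens, cache-read tokens, and output tokens. The function name `run_effective_tokens` is likewise hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_effective_tokens MULTIPLIERS_TSV CALLS_TSV
# MULTIPLIERS_TSV (assumed layout): model<TAB>multiplier
# CALLS_TSV (assumed layout):       model<TAB>input<TAB>cache_read<TAB>output
# Prints the run's total ET, or fails if a call uses an unlisted model.
run_effective_tokens() {
  local multipliers="$1" calls="$2"
  awk -F'\t' '
    !NF { next }                              # skip blank lines in either file
    NR == FNR { m[$1] = $2; next }            # first file: model -> multiplier
    !($1 in m) {
      print "unknown model: " $1 > "/dev/stderr"
      bad = 1
      next
    }
    { total += m[$1] * ($2 + 0.1 * $3 + 4 * $4) }   # ET = m * (I + 0.1*C + 4*O)
    END {
      if (bad) exit 1
      printf "%.2f\n", total
    }
  ' "$multipliers" "$calls"
}
```

Failing on an unknown model, instead of silently defaulting to a multiplier of 1.0, is a deliberate choice in this sketch: a missing multiplier would otherwise understate Opus-class runs and skew the baseline collected during the first two weeks.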