Skip to content

feat: add StatVar for numerically stable statistics in audit and logs commands#26479

Merged
pelikhan merged 4 commits intomainfrom
copilot/update-audit-logs-statistics
Apr 15, 2026
Merged

feat: add StatVar for numerically stable statistics in audit and logs commands#26479
pelikhan merged 4 commits intomainfrom
copilot/update-audit-logs-statistics

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 15, 2026

Summary

Adds a pkg/stats.StatVar statistical variable type and uses it throughout the audit and logs commands to compute richer, more statistically correct metrics.

Changes

New: pkg/stats.StatVar

  • Accumulates float64 observations incrementally
  • Uses Welford's online algorithm for numerically stable mean and variance (avoids catastrophic cancellation with large, tightly-clustered values)
  • Computes: count, sum, min, max, mean, population variance/stddev, sample variance/stddev (Bessel's correction), and exact median
  • Memory usage is documented: suitable for tens–hundreds of observations (typical agentic workflow scale)

pkg/workflow/metrics.goLogMetrics

  • Added MedianTimeBetweenTurns time.Duration
  • Added StdDevTimeBetweenTurns time.Duration (sample stddev)

pkg/cli/copilot_events_jsonl.go

  • Replaced manual TBT accumulation loop with stats.StatVar
  • Now sets AvgTimeBetweenTurns, MaxTimeBetweenTurns, MedianTimeBetweenTurns, and StdDevTimeBetweenTurns

pkg/cli/audit_cross_run.goMetricsTrendData

  • Added MedianCost, StdDevCost, MedianTokens, StdDevTokens, MedianTurns, StdDevTurns, MedianDurationNs, StdDevDurationNs
  • buildMetricsTrend() now uses StatVar for all four metric dimensions (cost, tokens, turns, duration), replacing verbose manual min/max/sum loops

pkg/cli/health_metrics.go

  • CalculateWorkflowHealth() now uses StatVar internally for duration, token, and cost averages

Test coverage

  • pkg/stats/statvar_test.go: 9 test cases covering empty, single, two-observation, odd/even count, known variance (Wikipedia example), numerical stability, median idempotency, and all-identical inputs

Copilot AI and others added 2 commits April 15, 2026 18:35
… commands

- Add pkg/stats.StatVar using Welford's online algorithm (mean, variance,
  stddev, median, min, max, count, sum)
- Add MedianTimeBetweenTurns and StdDevTimeBetweenTurns to workflow.LogMetrics
- Use StatVar in copilot_events_jsonl.go for TBT computation
- Add median/stddev fields to MetricsTrendData in audit_cross_run.go
- Use StatVar in buildMetricsTrend() replacing manual accumulation
- Use StatVar in health_metrics.go CalculateWorkflowHealth()

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/30f4a60c-5387-4216-b4a2-56643a7c1967

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
… memory usage

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/30f4a60c-5387-4216-b4a2-56643a7c1967

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI requested a review from pelikhan April 15, 2026 18:39
@pelikhan pelikhan marked this pull request as ready for review April 15, 2026 18:43
Copilot AI review requested due to automatic review settings April 15, 2026 18:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a new pkg/stats.StatVar utility for numerically stable, incremental descriptive statistics (Welford) and wires it into CLI audit/log health metrics to compute richer aggregates (median/stddev in addition to avg/min/max).

Changes:

  • Added pkg/stats.StatVar (mean/variance/stddev + exact median) with unit tests.
  • Extended workflow.LogMetrics with median + sample stddev for time-between-turns and populated these in copilot_events_jsonl parsing.
  • Refactored cross-run audit and workflow health computations to use StatVar for cost/tokens/turns/duration aggregation.
Show a summary per file
File Description
pkg/stats/statvar.go New streaming stats accumulator (Welford) + exact median support.
pkg/stats/statvar_test.go Unit tests for StatVar behavior and numerical stability.
pkg/workflow/metrics.go Adds median/stddev fields for time-between-turns in LogMetrics.
pkg/cli/copilot_events_jsonl.go Uses StatVar to compute avg/max/median/stddev TBT from timestamps.
pkg/cli/audit_cross_run.go Uses StatVar to compute cross-run avg/median/stddev for key metrics.
pkg/cli/health_metrics.go Uses StatVar internally to compute workflow health averages.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 6/6 changed files
  • Comments generated: 3

Comment thread pkg/cli/health_metrics.go Outdated
Comment on lines +86 to +99
@@ -94,6 +96,9 @@ func CalculateWorkflowHealth(workflowName string, runs []WorkflowRun, threshold
totalDuration += run.Duration
totalTokens += run.TokenUsage
totalCost += run.EstimatedCost
durationStats.Add(float64(run.Duration))
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totalDuration is accumulated but never used in this function (there is no TotalDuration field on WorkflowHealth). This will fail compilation due to an unused variable. Remove totalDuration and the totalDuration += run.Duration line, or add a corresponding output field if the total duration is intended to be reported.

Copilot uses AI. Check for mistakes.
Comment thread pkg/stats/statvar.go
Comment on lines +37 to +41
if v < s.min {
s.min = v
}
if v > s.max {
s.max = v
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment says NaN and ±Inf "will propagate through all derived statistics", but Min()/Max() may not propagate when NaN is added after initial finite values (comparisons with NaN are false, leaving min/max unchanged). Either update the comment to reflect actual behavior, or explicitly detect NaN/Inf in Add and set min/max accordingly (e.g., if math.IsNaN(v) then force min/max to NaN).

Suggested change
if v < s.min {
s.min = v
}
if v > s.max {
s.max = v
if math.IsNaN(v) {
s.min = v
s.max = v
} else {
if v < s.min {
s.min = v
}
if v > s.max {
s.max = v
}

Copilot uses AI. Check for mistakes.
Comment on lines +50 to +51
// MetricsTrendData contains aggregated cost, token, turn, and duration statistics
// across multiple runs, with spike detection for anomalous runs.
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MetricsTrendData type comment is duplicated twice back-to-back, which makes the docs noisy and harder to read. Remove one of the duplicate two-line blocks so the doc comment appears only once.

Suggested change
// MetricsTrendData contains aggregated cost, token, turn, and duration statistics
// across multiple runs, with spike detection for anomalous runs.

Copilot uses AI. Check for mistakes.
@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot review all comments

… comment, clarify NaN doc

- health_metrics.go: remove unused totalDuration variable
- audit_cross_run.go: remove duplicate MetricsTrendData comment block
- statvar.go: clarify Add() doc to accurately reflect NaN/Min/Max behavior

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/b393d657-9b75-407b-8c0e-e9dcd97e4bbe

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 15, 2026

@copilot review all comments

Addressed all three review comments in 39f35b1:

  1. health_metrics.go — removed unused totalDuration variable and accumulation line
  2. audit_cross_run.go — removed the duplicate MetricsTrendData comment block
  3. statvar.go — updated the Add() doc comment to accurately state that NaN will propagate through sum/mean/variance but that Min/Max may not update correctly for NaN inputs due to IEEE 754 comparison semantics, and notes that in practice all inputs are finite (time.Duration, token counts, costs)

@pelikhan pelikhan merged commit 01c6be5 into main Apr 15, 2026
51 checks passed
@pelikhan pelikhan deleted the copilot/update-audit-logs-statistics branch April 15, 2026 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants