You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Snapshot of agentic workflow activity in github/gh-aw. The logs tool times out (60s server deadline) above ~40 runs, so the fleet-wide figures below are a 40-run window (2026-06-29 09:13–10:31, ~8.1h of compute), and the reliability deep-dive is a true 7-day window for the only workflow that failed.
Key metrics (40-run window)
Metric
Value
Runs
40
Action-minutes
508
Avg duration
12.2 min/run
GitHub API calls
447
Failures
1 (2.5%)
Write-capable safe outputs
0 — all 40 read-only
Engines
copilot 30 · claude 7 · pi 2 · codex 1
Reliability — one failure hotspot
PR Description Updater: 2 failures / 20 runs over 7 days (10%). Both failures landed exclusively on signed/jsweep/* branches (unified-timeline, ai-credits-context); all 18 non-jsweep runs passed. This is a deterministic branch-class pattern, not random flake. Both failed runs were read-only (no safe outputs emitted) and burned ~10k tokens / ~7–10 min before erroring — wasted spend.
18/20 other runs over the week succeeded. Root-cause error text wasn't retrievable (network-restricted), but the 100% correlation with signed/jsweep/ strongly suggests the workflow mishandles auto-generated/signed bot branches.
Performance / resource usage
Top action-minute consumers (review-class workflows dominate):
Workflow
Min
Runs
PR Code Quality Reviewer
65
4
Matt Pocock Skills Reviewer
63
4
Impeccable Skills Reviewer
51
4
Design Decision Gate
33
4
Three skills/quality reviewers consume ~180 action-min combined — the largest cost center. PR Description Updater alone averages ~9 min/run.
Optimization opportunities
Skip jsweep branches in PR Description Updater — add a branch filter (signed/jsweep/*) so it stops failing and wasting ~8 min + tokens per jsweep PR. Fixes the only recurring failure.
Model downgrade for read-only maintenance — audit flagged PR Description Updater (repo-maintenance, read-only, moderate profile) as not needing a frontier model; claude-haiku-4-5 / gpt-4.1-mini would cut per-run cost.
Reviewer consolidation — 3 skills reviewers run on every PR (~60+ min total); consider merging or gating to relevant changes.
Tooling: logs --count > ~40 exceeds the 60s deadline, blocking true week-wide fleet analysis — raise the timeout or add pagination.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Snapshot of agentic workflow activity in
github/gh-aw. Thelogstool times out (60s server deadline) above ~40 runs, so the fleet-wide figures below are a 40-run window (2026-06-29 09:13–10:31, ~8.1h of compute), and the reliability deep-dive is a true 7-day window for the only workflow that failed.Key metrics (40-run window)
Reliability — one failure hotspot
PR Description Updater: 2 failures / 20 runs over 7 days (10%). Both failures landed exclusively on
signed/jsweep/*branches (unified-timeline,ai-credits-context); all 18 non-jsweep runs passed. This is a deterministic branch-class pattern, not random flake. Both failed runs were read-only (no safe outputs emitted) and burned ~10k tokens / ~7–10 min before erroring — wasted spend.Failure detail
signed/jsweep/unified-timelinesigned/jsweep/ai-credits-contextsigned/jsweep/strongly suggests the workflow mishandles auto-generated/signed bot branches.Performance / resource usage
Top action-minute consumers (review-class workflows dominate):
Three skills/quality reviewers consume ~180 action-min combined — the largest cost center. PR Description Updater alone averages ~9 min/run.
Optimization opportunities
signed/jsweep/*) so it stops failing and wasting ~8 min + tokens per jsweep PR. Fixes the only recurring failure.claude-haiku-4-5/gpt-4.1-miniwould cut per-run cost.logs --count > ~40exceeds the 60s deadline, blocking true week-wide fleet analysis — raise the timeout or add pagination.References: §28363364071 · §28318796220 · §28365689771
Beta Was this translation helpful? Give feedback.
All reactions