Daily audit of agentic workflow runs (evening window; log download capped at the ~60s bridge timeout, so 84 of 196 run directories had complete run_summary.json and were analyzed).
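The "complete run_summary.json" filter that reduced 196 downloaded runs to 84 can be sketched as follows. This is a minimal illustration that assumes each run was downloaded into its own subdirectory and that a summary counts as complete when it parses as a JSON object. The directory layout and both function names are hypothetical, not the audit tool's actual implementation.

```python
import json
from pathlib import Path


def is_complete_summary(text: str) -> bool:
    """True if `text` parses as a JSON object; a download cut off mid-file will not."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def complete_run_dirs(root: Path) -> list[Path]:
    """Run directories under `root` whose run_summary.json is present and complete.

    Hypothetical layout: one subdirectory per downloaded run. Directories the
    download never reached (no file) or cut off mid-write (invalid JSON) are
    skipped, which is how a large window shrinks to fewer analyzable runs.
    """
    complete = []
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary = run_dir / "run_summary.json"
        if summary.is_file() and is_complete_summary(
            summary.read_text(encoding="utf-8", errors="replace")
        ):
            complete.append(run_dir)
    return complete
```

Treating unparseable files as "not downloaded" keeps a timeout-truncated run out of the success and failure counts instead of misclassifying it.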
Overview
| Metric | Value |
|---|---|
| Runs analyzed | 84 |
| Success | 72 (85.7%) |
| Failure | 12 (14.3%) |
| Total AIC (credits) | 5,703.6 |
| Action-minutes | 1,037 |
| Missing tools / data / MCP failures | 0 / 0 / 0 |
| Tokens | Unreported (artifact empty since 2026-06-19); AIC is the primary usage metric |
TokenUsage remains 0 across all runs (a known empty-artifact condition since 2026-06-19), so cost and usage are tracked via AIC instead.
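The overview figures can be reproduced from per-run summaries roughly as below. The field names (conclusion, aic, action_minutes, token_usage) are assumptions for illustration, not the real run_summary.json schema. The point is that while tokens are stuck at 0, AIC is summed as the usage metric and tokens are reported as unavailable rather than as zero.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    # Hypothetical fields; the real run_summary.json schema may differ.
    conclusion: str      # "success"; anything else is counted as a failure here
    aic: float           # credits consumed by the run
    action_minutes: int
    token_usage: int     # 0 across all runs while the token artifact is empty


def overview(runs: list[RunSummary]) -> dict:
    """Aggregate the Overview table from per-run summaries."""
    n = len(runs)
    ok = sum(r.conclusion == "success" for r in runs)
    failed = n - ok
    total_tokens = sum(r.token_usage for r in runs)
    return {
        "runs": n,
        "success": ok,
        "success_pct": round(100 * ok / n, 1) if n else 0.0,
        "failure": failed,
        "failure_pct": round(100 * failed / n, 1) if n else 0.0,
        "aic": round(sum(r.aic for r in runs), 1),
        "action_minutes": sum(r.action_minutes for r in runs),
        # An all-zero token total means "not reported", not "free".
        "tokens": total_tokens if total_tokens > 0 else None,
    }
```

With 72 successes out of 84 runs this yields 85.7% and 14.3%, matching the table.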
Failure Clusters (12 failures, 3 root-cause groups)
All 12 failures were 0-turn / 0-token: the agent never produced output, so the failures occurred in setup/CLI-launch or post-agent steps, not during reasoning.
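Because every failure had zero turns and zero tokens, the failing step is what separates the root causes, not anything in an agent transcript. A minimal grouping sketch follows. The Failure record and its fields are hypothetical, and grouping on (job, failing step) is an assumed clustering rule, not necessarily the one the audit uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    # Hypothetical record extracted from a failed run's logs.
    run_id: int
    engine: str
    job: str     # e.g. "agent" or "safe_outputs"
    step: str    # name of the first failing step
    turns: int
    tokens: int


def cluster_by_step(failures: list[Failure]) -> dict[tuple[str, str], list[Failure]]:
    """Group 0-turn/0-token failures by (job, failing step), largest group first."""
    groups: dict[tuple[str, str], list[Failure]] = defaultdict(list)
    for f in failures:
        if f.turns == 0 and f.tokens == 0:  # the agent never produced output
            groups[(f.job, f.step)].append(f)
    return dict(sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True))
```

Keying on the job as well as the step keeps a post-agent failure (such as safe_outputs) from merging with agent-setup failures that happen to share a step name.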
Cluster A: copilot "Execute GitHub Copilot CLI" 0-turn (8 fails), CHRONIC
The agent job fails at the Execute GitHub Copilot CLI step after a few minutes with Turns=0 / Tokens=0 / ErrorCount=0. This maps to the known issues copilot-sdk-driver-failures (recurrence now 20) and smoke-ci-copilot-cli-100pct-fail-on-push (recurrence now 2; this is the 2nd consecutive 100%-fail window).
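The "2nd consecutive 100%-fail window" count for Smoke CI can be derived from per-window results. The sketch below assumes the windows are ordered oldest to newest and that each holds the conclusions of one workflow's runs in that window. The function name is hypothetical.

```python
def consecutive_full_fail_windows(windows: list[list[str]]) -> int:
    """Count trailing windows (newest last) in which every run failed.

    A window with no runs breaks the streak, since it gives no evidence
    either way (an assumed policy).
    """
    streak = 0
    for conclusions in reversed(windows):
        if conclusions and all(c == "failure" for c in conclusions):
            streak += 1
        else:
            break
    return streak
```

For example, a success window followed by two all-failure windows yields 2.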
Cluster B: NEW cross-engine "Install Playwright CLI" failures (3 fails)
All three failed at the Install Playwright CLI agent-setup step in the same window, across three different engines. The cross-engine signature strongly implies an infrastructure/network cause (Playwright browser download, npm registry, or firewall allowlist) rather than an engine bug. New known issue: playwright-install-cli-failure.
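The reasoning that a step failing across several engines in the same window points at infrastructure can be expressed as a simple heuristic. This sketch assumes a threshold of three distinct engines. The (engine, step) tuples and the function name are illustrative only.

```python
from collections import defaultdict


def likely_infra_steps(failures: list[tuple[str, str]], min_engines: int = 3) -> set[str]:
    """Return step names that failed under at least `min_engines` distinct engines.

    `failures` holds (engine, failing_step) pairs from a single audit window.
    A step shared by many engines is unlikely to be an engine bug, so it is
    flagged as a probable infra/network cause (download, registry, firewall).
    """
    engines_per_step: dict[str, set[str]] = defaultdict(set)
    for engine, step in failures:
        engines_per_step[step].add(engine)
    return {step for step, engines in engines_per_step.items() if len(engines) >= min_engines}
```

Under this rule, Cluster A is not flagged even with 8 failures, because they all come from one engine.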
Cluster C: codex "Process Safe Outputs" failure (1 fail)
Changeset Generator (codex): the agent job succeeded, but the safe_outputs job failed at the Process Safe Outputs step. This is consistent with the chronic safe-output-partial-failure-intolerance pattern (recurrence now 8), where a single failing safe-output item reds the whole job.
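One way to make a single bad item non-fatal is to process safe-output items independently and record failures per item. The sketch below assumes items are dicts with a type field and uses a hypothetical handler registry. It is not the actual Process Safe Outputs implementation, and the "fail only if every item failed" policy is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable

Handler = Callable[[dict], None]


@dataclass
class SafeOutputReport:
    succeeded: list[int] = field(default_factory=list)
    failed: dict[int, str] = field(default_factory=dict)  # item index -> error text

    @property
    def job_should_fail(self) -> bool:
        # Assumed policy: fail the job only if every item failed.
        return bool(self.failed) and not self.succeeded


def process_safe_outputs(items: list[dict], handlers: dict[str, Handler]) -> SafeOutputReport:
    """Apply each item's handler independently so one failure cannot abort the rest."""
    report = SafeOutputReport()
    for i, item in enumerate(items):
        handler = handlers.get(item.get("type", ""))
        if handler is None:
            report.failed[i] = f"no handler for type {item.get('type')!r}"
            continue
        try:
            handler(item)
        except Exception as exc:  # isolate per-item errors
            report.failed[i] = f"{type(exc).__name__}: {exc}"
        else:
            report.succeeded.append(i)
    return report
```

The per-item failures are still surfaced in the report, so they can be posted as a warning without turning an otherwise-green run red.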
30-Day Trends
Workflow Health: success/failure counts with a success-rate overlay (dashed line = 85% baseline). Volume swings reflect the audit's variable download window; today's 85.7% sits right on the long-run baseline, with the failure share dominated by the copilot Execute-CLI cluster.
Token Usage: daily tokens (M) plus a 7-day moving average. The shaded region marks the ongoing gap: the token artifact has been empty since 2026-06-19, so per-run token accounting is unavailable and AIC has become the effective usage metric (5,703.6 today). Restoring token reporting would re-enable this chart.
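A trailing average that skips missing days illustrates both the 7-day moving average in the token chart and why the gap is shaded rather than plotted as zero. This sketch assumes daily totals in millions, with None marking days where the token artifact was empty. The min_points threshold is an illustrative choice.

```python
from typing import Optional


def trailing_moving_average(
    daily: list[Optional[float]], window: int = 7, min_points: int = 4
) -> list[Optional[float]]:
    """Trailing `window`-day mean that ignores missing (None) days.

    Returns None where fewer than `min_points` real values fall inside the
    window, so an empty-artifact gap stays a gap instead of dragging the
    average toward zero.
    """
    out: list[Optional[float]] = []
    for i in range(len(daily)):
        values = [v for v in daily[max(0, i - window + 1) : i + 1] if v is not None]
        out.append(sum(values) / len(values) if len(values) >= min_points else None)
    return out
```

Recording the empty days as 0 instead of None would make the average slide toward zero, which is the misleading picture the shaded region avoids.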
Recommended Actions
[HIGH] Escalate Smoke CI. Smoke CI (copilot/claude-sonnet-4.6) is now 100% red across 2+ consecutive windows at the Execute GitHub Copilot CLI step (0-turn). If Smoke CI is the copilot-engine canary, its constant failure on push likely signals a real copilot CLI startup regression on main. Add a fast-fail diagnostic that surfaces why the CLI aborts with 0 turns.
[MEDIUM] Harden Playwright install. Confirm the Playwright download/npm hosts are on the firewall allowlist for playwright-enabled workflows, add retry-with-backoff, and cache browser binaries so a transient download failure doesn't red the whole agent job.
[MEDIUM] Isolate safe-output failures. The Process Safe Outputs step continues to red otherwise-green runs (Changeset Generator today). Make per-item safe-output failures non-fatal to the job.
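The retry-with-backoff part of the Playwright recommendation might look like the wrapper below. It is a generic sketch, not the workflow's actual setup step. The attempt count and delays are illustrative, and the sleep parameter exists only so the delays can be observed or stubbed.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_backoff(
    cmd: Sequence[str],
    attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Run `cmd`, retrying non-zero exits with exponential backoff (5s, 10s, 20s, ...).

    Raises CalledProcessError after the final failed attempt, so the step
    still fails loudly if the download never succeeds.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    raise subprocess.CalledProcessError(result.returncode, cmd, result.stdout, result.stderr)
```

A browser download such as `npx playwright install chromium` could be wrapped this way. Whether these workflows invoke Playwright with that exact command is an assumption; the audit names only the Install Playwright CLI step. Caching the downloaded browsers is complementary, because it removes the network fetch entirely on cache hits.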
Positives
pi engine: 16/16 = 100% success.
Zero missing-tools, missing-data, and MCP-failure reports across all 84 runs; the tool/permission surface is healthy.
No new categories of failure beyond the Playwright infra cluster; the rest are known, tracked recurrences.
Repo memory updated
audit-history.jsonl, known-issues.json (+1 new: playwright-install-cli-failure; bumped copilot-sdk, smoke-ci, safe-output recurrences), anomalies.json, recommendations.json, metrics-summary.json, workflow-trends.json. Files compacted to fit the 60 KB memory budget (50.8 KB total, push validated).
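The known-issues bookkeeping and the 60 KB budget check can be sketched as below. The known-issues.json shape used here (a mapping from issue id to a record with a recurrence counter) and the helper names are assumptions. Only the file names and the 60 KB limit come from the audit.

```python
from pathlib import Path

MEMORY_BUDGET_BYTES = 60 * 1024  # "60 KB" read as KiB here (an assumption)


def bump_recurrences(known: dict, seen_ids: list[str], date: str) -> list[str]:
    """Increment `recurrence` for each issue seen today and create missing ones.

    Assumed schema: {issue_id: {"recurrence": int, "last_seen": str, ...}}.
    Returns the ids that were newly added.
    """
    new_ids = []
    for issue_id in seen_ids:
        entry = known.get(issue_id)
        if entry is None:
            known[issue_id] = {"recurrence": 1, "first_seen": date, "last_seen": date}
            new_ids.append(issue_id)
        else:
            entry["recurrence"] = entry.get("recurrence", 0) + 1
            entry["last_seen"] = date
    return new_ids


def memory_size_bytes(paths: list[Path]) -> int:
    """Total on-disk size of the repo-memory files that exist."""
    return sum(p.stat().st_size for p in paths if p.exists())


def within_budget(paths: list[Path], budget: int = MEMORY_BUDGET_BYTES) -> bool:
    """True if the memory files fit the budget; compact before pushing otherwise."""
    return memory_size_bytes(paths) <= budget
```

Applied to today's run, this would move copilot-sdk-driver-failures from 19 to 20 and safe-output-partial-failure-intolerance from 7 to 8, and add playwright-install-cli-failure with a recurrence of 1.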
References:
§28544490596: Smoke CI (Execute Copilot CLI 0-turn)
§28542293845: Smoke Claude (Playwright install fail)
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
awmgmcpg
See Network Configuration for more information.