[audit-workflows] Agentic Workflow Audit — 2026-07-03: pi engine collapse (PR Sous Chef 80% fail, 0-turn) #43267
🔎 Agentic Workflow Audit — 2026-07-03
Window: ~5.4h evening cluster (15:51–21:16Z). Partial as usual: the `logs` MCP bridge hit its 60s cap; 47 of 118 run dirs completed full download (70 download-incomplete). Numbers are biased toward the active failure cluster and undercount total volume.

🚨 Headline: pi engine collapsed to 30.8%
The pi engine crashed from 100% (06-30/07-01) → 30.8% (4/13) today, driven almost entirely by a sustained PR Sous Chef incident. This is the escalation of the NEW pi-0turn signal first seen 07-02 (then just 1/2).
- PR Sous Chef (pi → `copilot/gpt-5.4`): 8/10 = 80% fail. The 2 successes were early; the following 8 runs failed consecutively, which points to a sustained mid-window incident, not scatter. All are 0-turn / 0-tok failures at the `agent` step (agent never produced output).
- Since pi runs on the `copilot/gpt-5.4` backend, this is the chronic `copilot-sdk-driver-failures` family now surfacing dominantly through the pi engine.

Every one of the 15 failures was a 0-turn pre-agent driver failure: no agent logic ran, no missing tools, no MCP errors.
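The "sustained incident, not scatter" reading can be checked mechanically. The sketch below is illustrative only: it assumes run outcomes are available as a chronological list of booleans (`True` = success), and the helper names `failure_rate` and `longest_failure_streak` are hypothetical, not part of the audit tooling.

```python
from typing import List, Tuple


def failure_rate(outcomes: List[bool]) -> float:
    """Fraction of runs that failed; True means the run succeeded."""
    if not outcomes:
        raise ValueError("no runs to evaluate")
    return sum(1 for ok in outcomes if not ok) / len(outcomes)


def longest_failure_streak(outcomes: List[bool]) -> Tuple[int, int]:
    """Return (start_index, length) of the longest run of consecutive failures."""
    best_start, best_len = 0, 0
    cur_start, cur_len = 0, 0
    for i, ok in enumerate(outcomes):
        if ok:
            cur_len = 0  # a success ends the current streak
            continue
        if cur_len == 0:
            cur_start = i  # a new streak begins here
        cur_len += 1
        if cur_len > best_len:
            best_start, best_len = cur_start, cur_len
    return best_start, best_len


# Outcomes as described in the report: 2 early successes, then 8 failures.
pr_sous_chef = [True, True] + [False] * 8

print(f"{failure_rate(pr_sous_chef):.0%}")   # 80%
print(longest_failure_streak(pr_sous_chef))  # (2, 8)
```

A streak length equal to the total failure count (8 of 8 here) is what separates a sustained incident from scattered failures, which would yield several short streaks.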
All 15 failures (all 0-turn / 0-tok)
- `copilot-sdk-driver-failures` (ESCALATING)
- `smoke-ci-copilot-cli-100pct-fail-on-push` (chronic)
- `doc-unbloat-empty-output`, now on pi engine
- `codex-gh-aw-binary-not-found-for-mcp` (chronic, unfixed since 06-15)

Engine breakdown
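The per-engine figures quoted in this report (pi 4/13 = 30.8%, claude 8/8 = 100%) are plain success ratios grouped by engine. A minimal sketch of that grouping follows, assuming each run is reduced to an `(engine, succeeded)` pair; the function name `engine_success_rates` is hypothetical, and only the pi and claude counts come from the report.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def engine_success_rates(
    runs: Iterable[Tuple[str, bool]],
) -> Dict[str, Tuple[int, int, float]]:
    """Map engine -> (successes, total runs, success percentage to 1 decimal)."""
    totals = defaultdict(lambda: [0, 0])  # engine -> [successes, total]
    for engine, ok in runs:
        totals[engine][1] += 1
        if ok:
            totals[engine][0] += 1
    return {e: (s, n, round(100 * s / n, 1)) for e, (s, n) in totals.items()}


# Counts taken from the report; other engines are omitted because
# their totals are not given here.
runs = [("pi", True)] * 4 + [("pi", False)] * 9 + [("claude", True)] * 8

rates = engine_success_rates(runs)
print(rates["pi"])      # (4, 13, 30.8)
print(rates["claude"])  # (8, 8, 100.0)
```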
`claude` was fully healthy. The fleet-wide degradation is concentrated in copilot-backed drivers (direct copilot and pi → `copilot/gpt-5.4`).

📈 Trend Charts
30-day health shows the fleet oscillating around a ~85–93% baseline on full windows, with sharp dips on partial evening-only windows (06-30, 07-02, 07-03) that over-sample the active failure cluster. Today's 68.1% is a partial-window artifact layered on a real pi/PR-Sous-Chef incident — prod-main ex-clusters (86.2%) sits near baseline.
AI-credit usage tracks window size, peaking on full-day windows (06-24/06-25 ≈ $24–27k) and low on today's partial slice ($3.5k). Raw token counts remain empty in `metrics.TokenUsage` fleet-wide (since ~06-19), so AIC from `token_usage_summary` is used as the consistent cost proxy. Note: this run confirmed real token data (1.08M) is still present in `token_usage_summary`; only the `metrics.TokenUsage` artifact is zeroed.

✅ Healthy signals
- `claude` engine 8/8 = 100%.

🎯 Recommended actions
- […] `copilot/gpt-5.4` backend (non-copilot engine/model). Root cause is the copilot-backed driver failing before the first turn.

Notes & caveats
- `audit-history.jsonl` (full entry) + `metrics-summary.json` pushed. Non-essential memory files (anomalies/recommendations/known-issues) were reverted this run to stay under the 50KB patch limit; full failure detail, known-issue cross-refs, and the pi-collapse escalation are all captured inside the audit-history entry.
- […] `run.headBranch` / `run.Duration`; engine from `aw_info.json.engine_id` (authoritative).
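Since the notes treat `aw_info.json.engine_id` as the authoritative engine source, a reader reproducing the breakdown might resolve the engine per run directory roughly as follows. This is a sketch under assumptions: it presumes `aw_info.json` sits at the top of each downloaded run directory and that `engine_id` is a top-level string key; the helper name `read_engine_id` and the `"unknown"` fallback are hypothetical.

```python
import json
from pathlib import Path


def read_engine_id(run_dir: Path, default: str = "unknown") -> str:
    """Return engine_id from run_dir/aw_info.json, or `default` if unavailable.

    Incomplete downloads may lack the file or contain truncated JSON,
    so both cases fall back to `default` instead of raising.
    """
    info_path = Path(run_dir) / "aw_info.json"
    try:
        with info_path.open(encoding="utf-8") as fh:
            info = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return default
    engine = info.get("engine_id") if isinstance(info, dict) else None
    return engine if isinstance(engine, str) and engine else default
```

Falling back to a sentinel keeps download-incomplete run dirs visible in the totals as "unknown" so they are not silently attributed to an engine.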
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
- awmgmcpg

See Network Configuration for more information.