[audit-workflows] Daily Agentic Workflow Audit — 2026-06-10: 89.1% success, all 5 failures known-recurring #38444
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #38734.
Daily Agentic Workflow Audit — 2026-06-10 (evening window)
Audited 47 runs in the 17:46Z → 21:45Z evening cluster (a separate cluster from the morning 48-run window already recorded; ~8h mid-day gap unobserved). 41 succeeded, 5 hard-failed, 1 in-progress (the audit agent itself) → 89.1% success on 46 terminal runs, above the ~84% 30-day average.
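The 89.1% figure follows from scoring only terminal runs, so the in-progress audit run is left out of the denominator. A minimal sketch of that arithmetic is below; the function name is illustrative and not part of the audit tooling.

```python
def success_rate(succeeded: int, failed: int) -> float:
    """Percentage of terminal runs (succeeded + failed) that succeeded."""
    terminal = succeeded + failed
    if terminal == 0:
        raise ValueError("no terminal runs to score")
    return 100.0 * succeeded / terminal


# Counts reported for the evening window.
audited, succeeded, failed, in_progress = 47, 41, 5, 1
assert succeeded + failed + in_progress == audited

rate = success_rate(succeeded, failed)
print(f"{rate:.1f}% success on {succeeded + failed} terminal runs")
# prints "89.1% success on 46 terminal runs"
```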
The headline: every one of today's 5 failures was a `copilot` run, and every one maps to an already-tracked, recurring issue; nothing new surfaced. `claude` went 11/11 and `codex` 1/1 (100%). Notably, GitHub's run `status` reported `completed` for all five. The failures live in the conclusion job (each created an `[aw] ... failed` issue), so status-only dashboards mask them.
Failure breakdown (all 5 → known recurring issues)
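| # | Failure signature | Tracked as |
|---|---|---|
| 1 | `session.idle` 870s timeout | `copilot-sdk-driver-failures` |
| 2 | `session.idle` 870s timeout | `copilot-sdk-driver-failures` |
| 3 | | `copilot-sdk-driver-failures` |
| 4 | | `daily-ai-credits-cap-429` |
| 5 | | `cli-proxy-difc-liveness-probe-failed` |

Critical / high-severity findings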
1. `copilot-sdk-driver` `session.idle` 870s timeout: dominant class (2 of 5), recurrence → 4, HIGH.
Both PR Code Quality Reviewer runs ran the full review (`hasOutput=true`, last message: "Now I have a thorough picture... let me run the grumpy-coder sub-agent pass and compile the final review"), then the driver blocked 14m30s waiting for `session.idle`, exited 1, and retried, ultimately reddening. The work was done but never flushed. This same class also hit `[aw] Failure Investigator` (`claude`, 2 attempts), which recovered, confirming it is engine-agnostic, not copilot-only.
→ Rec `rec-sdk-session-idle-emit-then-timeout` (HIGH): treat the emitted final assistant message as a terminal/idle signal and flush the safe-output instead of blocking 14.5 min then failing.

2. `daily-ai-credits-cap-429` recurred (rec 1 → 2), HIGH.
Daily Ambient Context Optimizer again hit `CAPIError 429 Maximum AI credits exceeded (1003.010010 / 1000)` after 5 retries (~88s wait). Same workflow as the first sighting: the heavy prod-main aggregator reliably brushes the ~1000 daily AIC cap. The soft pre-cap guard is still not shipped.
→ Rec `rec-ai-credits-soft-cap-guard` (HIGH): check remaining credits before the heaviest aggregation turns and degrade to partial-output + `noop` instead of 429-aborting.

3. `cli-proxy-difc-liveness-probe-failed` recurred, day 4 (rec → 4), MEDIUM.
`awf-cli-proxy` `exited(1)`; DIFC liveness probe to `localhost:18443` refused → Daily Secrets Analysis Agent's agent was never invoked (`turns=0`). Tracked upstream as #38309. Also caused a transient `api.github.com` connection-refused inside the Failure Investigator (recovered).

Medium / informational findings

4. Tool-permission lockout: Daily Safe Output Integrator (recurring). 5 denials: `read(/home/runner/work/gh-aw/gh-aw)`, `shell(sed -n ..._test.go)`, comment-prefixed shell; `hasNumerousPermissionDenied=true` aborted the attempt. Either widen the workflow's allowed-tools to include the repo-source reads/`sed` it legitimately needs, or stop the agent probing denied paths. This was also the window's single `missing_tool` signal.

5. Firewall blocks are benign (1.0%, 32/3136). The "network friction hotspot" flagged on Smoke Claude (31 blocks) is entirely Chrome/Google background telemetry: `accounts.google.com`, `content-autofill.googleapis.com`, `safebrowsingohttpgateway.googleapis.com`, `www.google.com`, `android.clients.google.com`. No workflow lost access to anything it needed. By-design; no action.

6. Execution drift: Test Quality Sentinel varied 4→34 turns (avg 14). Recurring prompt/task-shape instability; not failing, worth a stabilization pass.
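The recommendation in finding 1 (treat the final assistant message as a terminal signal and flush instead of blocking) could look roughly like the sketch below. The `Event` type, its `final` flag, and the grace period are all assumptions made for illustration; the audit does not describe how `copilot-sdk-driver` actually represents events or detects a final message.

```python
import queue
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event model: kind is "assistant_message" or "session.idle".
    kind: str
    final: bool = False
    text: str = ""


def wait_for_completion(events, idle_timeout_s, grace_s):
    """Wait for session.idle, but flush captured output if it never arrives.

    Returns (status, output) where status is "idle", "flushed" or "timeout".
    The timeout applies to each wait between events, not to the whole run.
    """
    last_final = None
    timeout = idle_timeout_s
    while True:
        try:
            ev = events.get(timeout=timeout)
        except queue.Empty:
            if last_final is not None:
                # Work was emitted; flush it rather than failing the run.
                return ("flushed", last_final)
            return ("timeout", None)
        if ev.kind == "session.idle":
            return ("idle", last_final)
        if ev.kind == "assistant_message":
            if ev.final:
                # Final message seen: only wait a short grace period for idle.
                last_final, timeout = ev.text, grace_s
            else:
                # The agent resumed work, so go back to the full wait.
                last_final, timeout = None, idle_timeout_s


q = queue.Queue()
q.put(Event("assistant_message", final=True, text="final review"))
print(wait_for_completion(q, idle_timeout_s=870, grace_s=0.1))
# prints ('flushed', 'final review') after roughly 0.1s instead of blocking 870s
```

The design choice here is that a missing `session.idle` only counts as a failure when nothing final was emitted; whether a short grace period is safe for the real driver is something the owners would need to confirm.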
Trends (21-day)
Success rate holds in the 80s (avg 83.7%), with one sharp dip to 41.6% on 05-23. Today's 89.1% evening window sits comfortably above trend. Run volume is bursty (17–174/day), driven by smoke/PR clusters, but the failure floor is persistent rather than spiking, which is consistent with the same handful of recurring infra issues.
Daily token usage averages ~37.5M with a smooth 7-day moving average; no runaway growth. The `session.idle` and 429 failures waste tokens on completed-then-discarded work, so fixing findings #1 and #2 would trim the spend tail without reducing real coverage.
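The soft pre-cap guard from finding 2 can be sketched as a check before each aggregation turn. This is only an illustration: the per-turn cost estimates, the `reserve` margin, and the way used credits are obtained are assumptions, since the audit does not say how credit usage is exposed to a workflow.

```python
def run_aggregation(turn_costs, used_credits, cap=1000.0, reserve=25.0):
    """Run turns in order, stopping early with partial output near the cap.

    turn_costs: estimated credit cost of each turn (hypothetical estimates).
    reserve: safety margin kept below the cap (hypothetical value).
    """
    completed = []
    for index, cost in enumerate(turn_costs):
        if used_credits + cost > cap - reserve:
            # Degrade to partial output instead of running into a 429.
            return {"status": "partial", "completed": completed, "used": used_credits}
        used_credits += cost
        completed.append(index)
    return {"status": "complete", "completed": completed, "used": used_credits}


result = run_aggregation([40, 60, 80], used_credits=850)
print(result)
# prints {'status': 'partial', 'completed': [0, 1], 'used': 950}
```

In this example the third turn would take usage to 1030, past the 975 threshold, so the run stops with two turns of output rather than aborting with nothing.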
Recommendations (priority order)
1. … `copilot-sdk-driver` to kill the 870s `session.idle` redden-after-success class (2 fails today).
2. … `awf-cli-proxy` DIFC liveness (retry/backoff) so agents aren't killed at turns=0 (#38309, "Harden DIFC awf-cli-proxy startup", day 4).
3. … `sed` it performs.
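The retry/backoff idea for the `awf-cli-proxy` liveness check could be sketched as below. It uses a plain TCP connect as a stand-in for the real DIFC probe, whose protocol the audit does not describe; the attempt count and delays are illustrative assumptions.

```python
import socket
import time


def wait_until_live(host, port, attempts=5, base_delay_s=0.5,
                    connect=socket.create_connection, sleep=time.sleep):
    """Return True once a TCP connect succeeds, retrying with doubling delays.

    connect and sleep are injectable so the logic can be exercised
    without a real proxy or real waiting.
    """
    delay = base_delay_s
    for attempt in range(1, attempts + 1):
        try:
            # A refused or timed-out connection raises OSError.
            with connect((host, port), timeout=2.0):
                return True
        except OSError:
            if attempt == attempts:
                return False
            sleep(delay)
            delay *= 2  # exponential backoff: 0.5s, 1s, 2s, ...
    return False
```

A caller would gate agent startup on something like `wait_until_live("localhost", 18443)` and only report a liveness failure once all attempts are exhausted, so a single transient refusal no longer ends the run at turns=0.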