[audit-workflows] Daily Agentic Workflow Audit — 2026-06-10: 89.1% success, all 5 failures known-recurring #38444

2026-06-10T22:03:33Z

github-actions[bot]
Bot Jun 10, 2026

Daily Agentic Workflow Audit — 2026-06-10 (evening window)

Audited 47 runs in the 17:46Z → 21:45Z evening cluster (a separate cluster from the morning 48-run window already recorded; ~8h mid-day gap unobserved). 41 succeeded, 5 hard-failed, 1 in-progress (the audit agent itself) → 89.1% success on 46 terminal runs, above the ~84% 30-day average.
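The 89.1% figure leaves the still-running audit run out of the denominator. Here is a minimal sketch of that arithmetic. It assumes each run is reduced to one outcome label; the labels follow GitHub's `success`/`failure`/`in_progress` vocabulary, and `success_rate` is a hypothetical helper.

```python
from collections import Counter


def success_rate(conclusions: list[str]) -> float:
    """Percent of terminal runs that succeeded; runs still in progress are excluded."""
    counts = Counter(conclusions)
    terminal = counts["success"] + counts["failure"]
    return 100.0 * counts["success"] / terminal


# Evening window as reported: 41 successes, 5 failures, 1 run (the audit) in progress.
window = ["success"] * 41 + ["failure"] * 5 + ["in_progress"]
print(f"{success_rate(window):.1f}%")  # 89.1%
```

If the in-progress run were counted in the denominator, the rate would instead be 41/47 ≈ 87.2%.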

The headline: every one of today's 5 failures was a copilot run, and each maps to an already-tracked, recurring issue; nothing new surfaced. claude went 11/11 and codex 1/1 (both 100%). Notably, GitHub reported a run status of completed for all five. The failures surface only in the conclusion job (each created an [aw] ... failed issue), so dashboards that read status alone mask them.
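GitHub's workflow-run objects carry a `status` field, which reads `completed` for any finished run, and a separate `conclusion` field (`success`, `failure`, and so on). The minimal sketch below shows why a status-only check misses these failures. It uses plain dicts shaped like those API objects; the helper names and the job names `agent` and `conclusion` are hypothetical.

```python
# Field values follow the GitHub Actions REST API vocabulary; job names are hypothetical.
FAILED_CONCLUSIONS = {"failure", "timed_out"}


def status_only_view(run: dict) -> str:
    """What a status-only dashboard shows: every finished run reads "completed"."""
    return run["status"]


def run_failed(run: dict, jobs: list[dict]) -> bool:
    """Treat a run as failed if the run itself or any of its jobs concluded in failure."""
    if run.get("conclusion") in FAILED_CONCLUSIONS:
        return True
    return any(job.get("conclusion") in FAILED_CONCLUSIONS for job in jobs)


good = {"status": "completed", "conclusion": "success"}
bad = {"status": "completed", "conclusion": "failure"}
bad_jobs = [{"name": "agent", "conclusion": "failure"},
            {"name": "conclusion", "conclusion": "success"}]

print(status_only_view(good), status_only_view(bad))  # completed completed
print(run_failed(good, []), run_failed(bad, bad_jobs))  # False True
```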

Failure breakdown (all 5 → known recurring issues)

| Workflow | Run | Root cause | Tracking issue |
|---|---|---|---|
| PR Code Quality Reviewer | 27296662875 | `session.idle` 870s timeout | `copilot-sdk-driver-failures` |
| PR Code Quality Reviewer | 27297195086 | `session.idle` 870s timeout | `copilot-sdk-driver-failures` |
| Daily Safe Output Integrator | 27300679069 | tool-permission lockout (5 denials) | `copilot-sdk-driver-failures` |
| Daily Ambient Context Optimizer | 27303973614 | 429 AI-credits cap (1003/1000) | `daily-ai-credits-cap-429` |
| Daily Secrets Analysis Agent | 27297209601 | DIFC cli-proxy liveness (:18443) | `cli-proxy-difc-liveness-probe-failed` |

Critical / high-severity findings

1. copilot-sdk-driver session.idle 870s timeout — dominant class (2 of 5), recurrence → 4, HIGH.
Both PR Code Quality Reviewer runs completed the full review (hasOutput=true; last message: "Now I have a thorough picture... let me run the grumpy-coder sub-agent pass and compile the final review"). The driver then blocked for 14m30s waiting for session.idle, exited 1, and retried, and the run ultimately went red. The work was done but never flushed. The same class also hit [aw] Failure Investigator (claude, 2 attempts), which recovered. That confirms the timeout is engine-agnostic, not copilot-only.
→ Rec rec-sdk-session-idle-emit-then-timeout (HIGH): treat the emitted final assistant message as a terminal/idle signal and flush the safe-output instead of blocking 14.5 min then failing.
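The recommendation amounts to waiting on two signals instead of one. The minimal asyncio sketch below shows that pattern. It assumes the driver can expose "session idle" and "final assistant message emitted" as events; the event objects, `wait_for_terminal`, and the demo's short timeout are hypothetical, because the real copilot-sdk-driver internals are not shown in this audit.

```python
import asyncio


async def wait_for_terminal(idle: asyncio.Event, final_message: asyncio.Event,
                            timeout_s: float) -> str:
    """Wait for session idle OR the final assistant message, whichever comes first.

    Returns "idle", "final_message", or "timeout". The caller would flush
    safe-output on either of the first two instead of failing on a missing idle.
    """
    idle_task = asyncio.create_task(idle.wait())
    final_task = asyncio.create_task(final_message.wait())
    done, pending = await asyncio.wait(
        {idle_task, final_task}, timeout=timeout_s,
        return_when=asyncio.FIRST_COMPLETED,
    )
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)  # reap cancelled waiters
    if idle_task in done:
        return "idle"
    if final_task in done:
        return "final_message"
    return "timeout"


async def demo() -> str:
    idle, final = asyncio.Event(), asyncio.Event()
    # Reproduce the failure shape: the final message arrives, session.idle never does.
    asyncio.get_running_loop().call_later(0.01, final.set)
    return await wait_for_terminal(idle, final, timeout_s=1.0)  # the driver used 870 s


print(asyncio.run(demo()))  # final_message
```

With this shape, the PR Code Quality Reviewer runs would have returned as soon as the final review message was emitted, instead of spending the full 870s before exiting 1.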

2. daily-ai-credits-cap-429 recurred (recurrence 1 → 2), HIGH.
Daily Ambient Context Optimizer again hit CAPIError 429 "Maximum AI credits exceeded" (1003.010010 / 1000) after 5 retries (~88s of waiting). This is the same workflow as the first sighting: the heavy prod-main aggregator reliably pushes up against the ~1000 AIC daily cap. The soft pre-cap guard has still not shipped.
→ Rec rec-ai-credits-soft-cap-guard (HIGH): check remaining credits before the heaviest aggregation turns and degrade to partial-output + noop instead of 429-aborting.
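Here is a minimal sketch of such a soft pre-cap guard. It assumes the workflow can read how many credits it has already spent and can estimate each turn's cost; this audit confirms neither. `CreditBudget`, `run_aggregation`, the reserve value, and the turn names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CreditBudget:
    cap: float      # daily AIC cap (~1000 per this audit)
    spent: float    # AIC already consumed today
    reserve: float  # headroom held back so the partial-output/noop step can still run

    def allows(self, estimated_cost: float) -> bool:
        return self.spent + estimated_cost + self.reserve <= self.cap


def run_aggregation(budget: CreditBudget,
                    turns: list[tuple[str, float]]) -> tuple[list[str], str]:
    """Run turns in order; stop before any turn that would breach the soft cap."""
    completed: list[str] = []
    for name, estimated_cost in turns:
        if not budget.allows(estimated_cost):
            return completed, "partial+noop"  # degrade instead of hitting a 429
        budget.spent += estimated_cost
        completed.append(name)
    return completed, "full"


budget = CreditBudget(cap=1000, spent=900, reserve=20)
print(run_aggregation(budget, [("summarize", 50), ("aggregate-prod-main", 80)]))
# (['summarize'], 'partial+noop')
```

The reserve leaves room for the degraded path to emit its partial output. Without it, the guard could approve a turn that leaves too little budget to report anything at all.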

3. cli-proxy-difc-liveness-probe-failed recurred on day 4 (recurrence → 4), MEDIUM.
awf-cli-proxy exited with code 1, and the DIFC liveness probe to localhost:18443 was refused, so the agent step of Daily Secrets Analysis Agent was never invoked (turns=0). This is tracked upstream as #38309. The same proxy failure also caused a transient api.github.com connection refusal inside the Failure Investigator, which recovered.
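The MEDIUM recommendation for this issue is to retry the liveness check with backoff instead of failing on the first refused connection. Here is a minimal sketch of that pattern. Port 18443 comes from the audit; `tcp_connect`, `probe_with_backoff`, the attempt count, and the delays are hypothetical choices, not awf-cli-proxy's actual behavior.

```python
import socket
import time
from typing import Callable


def tcp_connect(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """One liveness attempt: True if a TCP connection is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # connection refused, timeout, unreachable, ...
        return False


def probe_with_backoff(check: Callable[[], bool], attempts: int = 5,
                       base_delay_s: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry check() with exponential backoff; give up only after all attempts fail."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(base_delay_s * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    return False


# Real use would be: probe_with_backoff(lambda: tcp_connect("localhost", 18443))
# Simulated proxy that refuses twice, then accepts; sleeps are recorded, not taken.
outcomes = iter([False, False, True])
delays: list[float] = []
ok = probe_with_backoff(lambda: next(outcomes), sleep=delays.append)
print(ok, delays)  # True [0.5, 1.0]
```

The injectable `sleep` keeps the retry schedule testable without real waits. A slow-starting proxy gets several seconds to come up before the agent is abandoned at turns=0.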

Medium / informational findings

4. Tool-permission lockout in Daily Safe Output Integrator (recurring). The agent hit 5 permission denials, including read(/home/runner/work/gh-aw/gh-aw), shell(sed -n ..._test.go), and a comment-prefixed shell command; hasNumerousPermissionDenied=true then aborted the attempt. Either widen the workflow's allowed-tools to cover the repo-source reads and sed calls it legitimately needs, or stop the agent from probing denied paths. This was also the window's only missing_tool signal.
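To show the two options side by side, here is a minimal sketch of a prefix-based allowlist. The matcher, `REPO_ROOT` handling, the `sed -n ` prefix entry, and the `example_test.go` filename are all hypothetical; this is not gh-aw's actual permission format or matching logic. It illustrates an assumption: a naive prefix matcher still rejects a comment-prefixed command after the allowlist is widened, which is why the agent-side fix may be needed too.

```python
from pathlib import PurePosixPath

# Hypothetical widened allowlist; not gh-aw's real permission format or matcher.
REPO_ROOT = PurePosixPath("/home/runner/work/gh-aw/gh-aw")
ALLOWED_SHELL_PREFIXES = ("sed -n ",)


def read_allowed(path: str) -> bool:
    """Allow reads of the repo root and anything beneath it."""
    p = PurePosixPath(path)
    return p == REPO_ROOT or REPO_ROOT in p.parents


def shell_allowed(command: str) -> bool:
    """Naive prefix match: the command text must start with an allowed prefix."""
    return command.startswith(ALLOWED_SHELL_PREFIXES)


print(read_allowed("/home/runner/work/gh-aw/gh-aw"))                      # True
print(read_allowed("/etc/passwd"))                                        # False
print(shell_allowed("sed -n '1,40p' example_test.go"))                   # True
print(shell_allowed("# peek at tests\nsed -n '1,40p' example_test.go"))  # False
```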

5. Firewall blocks are benign (1.0%, 32/3136). The "network friction hotspot" flagged on Smoke Claude (31 blocks) is entirely Chrome/Google background telemetry: accounts.google.com, content-autofill.googleapis.com, safebrowsingohttpgateway.googleapis.com, www.google.com, and android.clients.google.com. No workflow lost access to anything it needed. This is by design; no action is required.
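The triage above reduces to two checks: whether the block rate is small, and whether every blocked host belongs to known background-telemetry domains. Here is a minimal sketch using the hosts and counts reported above. `TELEMETRY_DOMAINS` and `is_background_telemetry` are hypothetical. A suffix rule this broad would also hide a genuine Google API dependency, so a real triage rule should probably be narrower.

```python
# Assumption: these registrable domains are treated as browser background telemetry.
TELEMETRY_DOMAINS = ("google.com", "googleapis.com")


def is_background_telemetry(host: str) -> bool:
    """True if host is one of the domains above or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in TELEMETRY_DOMAINS)


blocked = ["accounts.google.com", "content-autofill.googleapis.com",
           "safebrowsingohttpgateway.googleapis.com", "www.google.com",
           "android.clients.google.com"]
print(f"block rate: {100 * 32 / 3136:.1f}%")            # block rate: 1.0%
print(all(is_background_telemetry(h) for h in blocked))  # True
```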

6. Execution drift: Test Quality Sentinel's turn count ranged from 4 to 34 (avg 14). This reflects recurring prompt/task-shape instability; the workflow is not failing, but it is worth a stabilization pass.

Trends (21-day)

Success rate holds in the low-to-high 80% range (avg 83.7%), with one sharp dip to 41.6% on 05-23. Today's 89.1% evening window sits comfortably above trend. Run volume is bursty (17–174 runs/day), driven by smoke and PR clusters, but the failure floor is persistent rather than spiking, which is consistent with the same handful of recurring infra issues.

Daily token usage averages ~37.5M tokens, and the 7-day moving average is smooth, with no runaway growth. The session.idle and 429 failures waste tokens on work that completes and is then discarded, so fixing findings #1 and #2 would trim the spend tail without reducing real coverage.
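A trailing 7-day moving average is the smoothing referred to above. Here is a minimal sketch of it. The daily values are made up for illustration and are not this audit's data, and `trailing_moving_average` is a hypothetical helper.

```python
from typing import Optional


def trailing_moving_average(values: list[float], window: int = 7) -> list[Optional[float]]:
    """Simple trailing average; days before a full window is available get None."""
    averages: list[Optional[float]] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day that just left the window
        averages.append(running / window if i >= window - 1 else None)
    return averages


# Illustrative daily token counts in millions (made up, not this audit's data).
daily_tokens_m = [30, 40, 35, 45, 38, 36, 39, 41]
print([round(a, 2) for a in trailing_moving_average(daily_tokens_m) if a is not None])
# [37.57, 39.14]
```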

Recommendations (priority order)

HIGH: Flush safe-output on the final-message/idle signal in copilot-sdk-driver to eliminate the class of runs that go red after success on the 870s session.idle timeout (2 failures today).
HIGH: Ship the soft AI-credits pre-cap guard for Daily Ambient Context Optimizer (recurring 429).
MEDIUM: Harden awf-cli-proxy DIFC liveness with retry/backoff so agents aren't killed at turns=0 ([aw-failures] [aw] Harden DIFC awf-cli-proxy startup — one transient incident failed Auto-Triage + Sub-Issue Closer (runs 27261698585, 2726137…); #38309, day 4).
MEDIUM: Reconcile Daily Safe Output Integrator's allowed-tools with the repo reads and sed calls it performs.

No genuinely new failure modes appeared in this window. The reliability ceiling is gated entirely by four known, fixable infrastructure issues. At least two of them (the session.idle timeout and the 429 credit cap) redden runs after the agent has already done substantial work.

References:

Run 27297195086: PR Code Quality Reviewer, session.idle timeout
Run 27303973614: Daily Ambient Context Optimizer, 429 AI-credits cap
Run 27297209601: Daily Secrets Analysis Agent, DIFC liveness

Generated by 🔍 Agentic Workflow Audit Agent · 331.2 AIC · ⌖ 26.2 AIC · ⊞ 8K · ◷

expires on Jun 11, 2026, 2:03 PM UTC-08:00

2026-06-11T22:07:05Z

github-actions[bot]
Bot Jun 11, 2026
Author

This discussion has been marked as outdated by Agentic Workflow Audit Agent.

A newer discussion is available at Discussion #38734.
