[aw] Failure Investigator (6h)
Parent issue for grouping related issues from [aw] Failure Investigator (6h).
Sub-issues are automatically linked below (max 64 per parent).
Workflow: [aw] Failure Investigator (6h)
[aw-fi] 6h Analysis: 2026-04-18 01:10 UTC
Executive Summary
6 failures across 29 runs (79% success) in the window Apr 17 19:10 – Apr 18 01:10 UTC. Three distinct failure clusters: codex 401 auth (2 runs), Copilot CLI 15-min timeout on closed PR (2 runs), and Copilot shell permission denied for safeoutputs (2 runs). One tracking issue closed (Auto-Triage Issues, now fixed). One new sub-issue created for the previously undiagnosed shell permission pattern.
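The success figure follows from the counts in the summary. A minimal check of that arithmetic, where the function name `success_rate` is illustrative and not part of the workflow tooling:

```python
def success_rate(total_runs: int, failures: int) -> float:
    """Return the fraction of runs that did not fail."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    if not 0 <= failures <= total_runs:
        raise ValueError("failures must be between 0 and total_runs")
    return (total_runs - failures) / total_runs


# Counts taken from the summary: 29 runs, 6 failures, so 23 succeeded.
rate = success_rate(total_runs=29, failures=6)
print(f"{rate:.0%}")  # prints "79%"
```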
Failure Clusters
Evidence Highlights
Cluster 1: Codex 401 (AI Moderator + Daily Observability Report)
ERROR: Reconnecting... 5/5
ERROR: unexpected status 401 Unauthorized: Missing bearer or basic authentication in header,
url: (api.openai.com/redacted),
cf-ray: 9edf09437f8dced7-SJC
Root cause: OPENAI_API_KEY secret is invalid or expired. Both workflows hit api.openai.com/v1/responses, exhaust 5 reconnect retries, and abort. Firewall allows api.openai.com:443 — the key itself is rejected.
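One way to surface this earlier would be a pre-flight credential check before the engine starts. The sketch below is an assumption about how such a check could look, not how the workflows currently behave: it probes the OpenAI `GET /v1/models` endpoint instead of `/v1/responses`, and the helper names (`classify_auth_status`, `build_preflight_request`, `preflight`) are hypothetical. It also rejects an empty key up front, since an unset secret would fail the same way.

```python
import urllib.error
import urllib.request

# Assumption: a cheap authenticated GET is enough to tell a bad key from a good one.
MODELS_URL = "https://api.openai.com/v1/models"


def classify_auth_status(status: int) -> str:
    """Map an HTTP status code to a coarse pre-flight verdict."""
    if status == 200:
        return "ok"
    if status == 401:
        return "rejected"
    return "inconclusive"


def build_preflight_request(api_key: str) -> urllib.request.Request:
    """Build the probe request, failing fast if the key is empty or unset."""
    if not api_key.strip():
        raise ValueError("OPENAI_API_KEY is empty or unset")
    return urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def preflight(api_key: str, timeout: float = 10.0) -> str:
    """Send the probe and return 'ok', 'rejected', or 'inconclusive'."""
    request = build_preflight_request(api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_auth_status(response.status)
    except urllib.error.HTTPError as err:
        return classify_auth_status(err.code)
    except urllib.error.URLError:
        # Network or firewall problem; says nothing about the key itself.
        return "inconclusive"
```

A workflow step could call `preflight(os.environ.get("OPENAI_API_KEY", ""))` and stop on `"rejected"` instead of waiting for five reconnect attempts.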
Cluster 2: Copilot CLI Timeout (Test Quality Sentinel × 2)
Both failures were PR-triggered. PR #26945 (copilot/resolve-mcpserverconfig-naming-conflicts) was merged/closed before the workflow ran. The checkout step correctly detected the closed PR (ℹ️ PR #26945 is now closed — treating checkout failure as expected), but the Copilot CLI then ran for the full 15-minute limit and timed out:
##[error]The action 'Execute GitHub Copilot CLI' has timed out after 15 minutes.
Set output 'agentic_engine_timeout'
Note: TQS succeeded on 6 other runs in the same window — this failure is specific to closed-PR branch checkout scenarios.
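A guard between checkout and the agent step could avoid spending the full timeout on a closed PR. The following is a sketch only: `should_run_agent` is a hypothetical helper, and it assumes the caller passes a freshly fetched pull request object with a `state` field of `"open"` or `"closed"` (the trigger payload may be stale by the time the job runs).

```python
from typing import Optional


def should_run_agent(pull_request: Optional[dict]) -> bool:
    """Decide whether the agent step should start.

    pull_request is None for runs that were not PR-triggered; otherwise it is
    assumed to be a dict carrying the PR's current "state".
    """
    if pull_request is None:
        # Not a PR-triggered run, so there is no branch that could have vanished.
        return True
    # Only start the agent while the PR is still open.
    return pull_request.get("state") == "open"
```

With a guard like this, a closed PR would end the job right after checkout rather than at the 15-minute limit.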
Cluster 3: Copilot Shell Permission Denied (Daily Safe Output Integrator + Daily Project Performance)
The agent completed its analysis correctly (all 41 safe-output types confirmed covered) and tried to call safeoutputs noop via bash — but the Copilot CLI blocked every shell invocation:
✗ safeoutputs noop --message "..."
└ Permission denied and could not request permission from user
✗ /home/runner/work/_temp/gh-aw/mcp-cli/bin/safeoutputs noop --message "..."
└ Permission denied and could not request permission from user
After 8+ failed attempts (shell, node bridge, MCP HTTP, Python HTTP), the Copilot CLI timed out (20–32 min). This is distinct from #26931 (MCP server connections blocked by org policy) — MCP connections were healthy, only the shell tool invocation was blocked.
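The three clusters can be told apart by the log signatures quoted above. A minimal sketch of such a classifier follows; the cluster labels and the `classify_log` function are illustrative names, and the patterns are taken only from the excerpts in this report, so they may need widening for other log formats.

```python
import re

# Signatures copied from the log excerpts in this report.
CLUSTER_SIGNATURES = {
    "codex_401_auth": re.compile(r"unexpected status 401 Unauthorized"),
    "copilot_cli_timeout": re.compile(r"has timed out after \d+ minutes"),
    "copilot_shell_permission_denied": re.compile(
        r"Permission denied and could not request permission from user"
    ),
}


def classify_log(log_text: str) -> list[str]:
    """Return every cluster label whose signature appears in the log text."""
    return [
        name
        for name, pattern in CLUSTER_SIGNATURES.items()
        if pattern.search(log_text)
    ]
```

A run may match more than one label. Cluster 3 runs, for example, also ended in a timeout, so a caller would probably treat the permission-denied match as the more specific diagnosis.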
Existing Issue Correlation
Sub-Issues Created
Proposed Fix Roadmap
| Priority | Action | Owner |
| --- | --- | --- |
| P0 | Rotate OPENAI_API_KEY; codex engine fails 100% until fixed | Admin |
| P1 | Instruct Copilot agent to use MCP noop tool, not bash CLI | Workflow author |
| P1 | Handle Copilot CLI graceful exit when PR branch deleted | AWF team |
References:
Generated by [aw] Failure Investigator (6h) · ● 590.5K