Executive Summary
Analysis of 249 workflow runs in the 6-hour window 07:20–13:20 UTC on 2026-04-17.
15 hard failures detected: 7 agentic workflow failures + 7 CI test failures + 1 main-branch CI failure. All 7 individual agentic failures have auto-generated tracking issues. Two root causes are not yet covered: a Node.js binary path issue (P1) and a Copilot timeout pattern (P2).
No P0 failures. No issues are stale or fixed — all were created today.
Failure Clusters
| Cluster | Runs | Severity | Tracking |
| --- | --- | --- | --- |
| Copilot engine timeouts | 4 | P2 | Individually: #26866, #26865, #26848, #26833 |
| Node.js v24.15.0 binary missing | 2 | P1 | Individually: #26839, #26829; root cause: #26876 |
| MCP Gateway schema validation (mempalace) | 1 | P1 | Individually: #26852; root cause: #26822 |
| WASM golden test failures (CI) | 8 | P2 | Out of scope (CI, not agentic) |
| action_required on PR workflows | 96 | - | Expected behavior; not failures |
Evidence
Node.js binary missing (P1) — runs 24560753963, 24557049556
Both the Daily Issues Report Generator (§24560753963, 10:34Z) and Daily News (§24557049556, 09:03Z) fail inside the firewall agent container at the copilot driver launch step:
/bin/bash: line 1: /home/runner/work/_tool/node/24.15.0/x64/bin/node: No such file or directory
The entrypoint executes ${GH_AW_NODE_BIN:-node} ${RUNNER_TEMP}/gh-aw/actions/copilot_driver.cjs, which indicates that GH_AW_NODE_BIN is set to the hardcoded toolcache path and that this path does not exist inside the container. The container itself starts successfully; the failure is in Node.js resolution inside it.
See sub-issue: #26876
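The failure mode above follows from how the shell's `${VAR:-default}` expansion works: the default is used only when the variable is unset or empty, not when it points at a file that does not exist. The following Python sketch models that behavior and one possible hardening. The function names `shell_default` and `resolve_node_bin` are hypothetical and are not part of gh-aw; the actual fix is tracked in #26876.

```python
import os
import shutil


def shell_default(env, name, default):
    """Emulate ${name:-default}: fall back only if the variable is unset or empty."""
    value = env.get(name, "")
    return value if value else default


def resolve_node_bin(env, path_exists=os.path.isfile, which=shutil.which):
    """Return a usable Node.js path, or None if none can be found.

    Unlike the plain shell expansion, this sketch also falls back to a
    PATH lookup when GH_AW_NODE_BIN names a file that is missing.
    """
    configured = env.get("GH_AW_NODE_BIN", "")
    if configured and path_exists(configured):
        return configured
    # Configured path is unset, empty, or missing: look for node on PATH.
    return which("node")
```

With GH_AW_NODE_BIN set to the toolcache path from the log, `shell_default` returns that path unchanged even though the file is absent, which matches the observed "No such file or directory" error. `resolve_node_bin` would fall back to whatever `node` is on PATH, if any.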
MCP Gateway schema validation failure (P1) — run 24562726732
Daily Fact About gh-aw (§24562726732, 11:27Z) fails at "Start MCP Gateway" with:
config:validation_schema Schema validation failed: jsonschema: '/mcpServers/mempalace'
does not validate with .../mcp-gateway-config.schema.json
.../oneOf/0/$ref/required: missing properties: 'container'
The mempalace MCP server config doesn't satisfy any of the three server config variants (stdio requires container, http requires url, custom disallows the current type value). This is a breaking schema change in gh-aw-mcpg v0.2.22 — the workflow's lock file was compiled against v0.2.19 and is now out of sync. Root cause tracked in #26822.
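To illustrate why the config is rejected, here is a minimal sketch of a oneOf-style check covering only the two variants whose required property is stated above (stdio requires container, http requires url). The `type` literals "stdio" and "http", the `VARIANTS` table, and the sample `command` key are assumptions for illustration; the custom variant is left out because its allowed `type` values are not given here. This is not the gh-aw-mcpg schema itself.

```python
# Required property per variant, as described in the validation error above.
VARIANTS = {"stdio": ["container"], "http": ["url"]}


def check_server(server):
    """Return (ok, reasons): ok is True when exactly one variant matches."""
    reasons = []
    matched = 0
    kind = server.get("type")
    for name, required in VARIANTS.items():
        if kind != name:
            reasons.append(f"{name}: type is {kind!r}")
            continue
        missing = [key for key in required if key not in server]
        if missing:
            reasons.append(f"{name}: missing properties: {missing}")
        else:
            matched += 1
    return matched == 1, reasons


# Hypothetical pre-v0.2.22 style entry with no 'container' property.
ok, reasons = check_server({"type": "stdio", "command": "mempalace"})
print(ok, reasons)
```

Under these assumptions, a stdio entry without `container` matches no variant, which corresponds to the `missing properties: 'container'` message in the log.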
Copilot CLI timeouts (P2) — 4 runs
| Run ID | Workflow | Timeout | Turns | Tokens |
| --- | --- | --- | --- | --- |
| §24563452029 | Daily Firewall Logs Collector and Reporter | 45 min | - | - |
| §24561152092 | Daily Community Attribution Updater | 30 min | 194 | 20.5M effective |
| §24557132232 | Dev | 30 min | - | - |
| §24564007646 | Dead Code Removal Agent | 30 min | 21 | 1.5M effective |
The Community Attribution run (194 turns, 20.5M effective tokens) and Dead Code Removal (found unreachable functions but ran out of time) show task-complexity timeouts, not infrastructure failures. All individually tracked.
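As a rough cross-check of the task-complexity reading, the per-turn token load can be computed from the two runs that reported both turns and effective tokens. This is only arithmetic on the figures in the table; the helper name `tokens_per_turn` is illustrative.

```python
# (workflow, turns, effective tokens) taken from the timeout table above.
runs = [
    ("Daily Community Attribution Updater", 194, 20.5e6),
    ("Dead Code Removal Agent", 21, 1.5e6),
]


def tokens_per_turn(turns, effective_tokens):
    """Average effective tokens consumed per agent turn."""
    return effective_tokens / turns


for name, turns, tokens in runs:
    per_turn_k = tokens_per_turn(turns, tokens) / 1000
    print(f"{name}: {per_turn_k:.1f}K effective tokens per turn")
```

This works out to roughly 105.7K and 71.4K effective tokens per turn. Both runs were doing substantial work on every turn up to the limit, which is consistent with a workload that outgrew its time budget rather than a stalled engine.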
Existing Issue Correlation
| Issue | Type | Notes |
| --- | --- | --- |
| #26822 agentic workflows out of sync | Root cause | Covers mempalace schema + lock file recompile |
| #26866 Daily Firewall Logs failed | Symptom | Copilot timeout |
| #26865 Dead Code Removal Agent failed | Symptom | Copilot timeout |
| #26852 Daily Fact About gh-aw failed | Symptom | Fixed when #26822 resolved |
| #26848 Daily Community Attribution failed | Symptom | Copilot timeout |
| #26839 Daily Issues Report failed | Symptom | Node.js binary missing |
| #26833 Dev failed | Symptom | Copilot timeout |
| #26829 Daily News failed | Symptom | Node.js binary missing |
Proposed Fix Roadmap
| Priority | Fix | Tracking |
| --- | --- | --- |
| P1 | Recompile lock files (fixes mempalace schema + lock sync) | #26822 |
| P1 | Fix Node.js v24.15.0 binary resolution in agent container | #26876 |
| P2 | Review Copilot timeout limits for high-complexity daily workflows | Individual issues |
Follow-up Window: 2026-04-17 13:20–19:20 UTC
5 hard failures detected across 18 runs in this window. Two distinct root-cause clusters, both with new sub-issues.
Failure Clusters
| Cluster | Runs | Severity | Symptom Issues | Root-Cause Sub-Issue |
| --- | --- | --- | --- | --- |
| AI Moderator: codex 401 auth | 3 (+ 1 prior) | P1 | #26911 | #aw_c401 |
| Copilot MCP servers blocked by policy | 2 | P1 | #26909, #26928 | #aw_mcp2 |
Evidence
AI Moderator — codex 401 auth (3 consecutive failures)
All three AI Moderator runs (engine: codex, event: issues) fail at agent activation with:
◆ Reconnecting... 1/5 ... 5/5
◆ unexpected status 401 Unauthorized: Missing bearer or basic authentication in header
url: (api.openai.com/redacted)
SECRET_OPENAI_API_KEY is present in the runner environment but rejected by the OpenAI API. Secondary signal: chatgpt.com:443 blocked by firewall (1 request) but unrelated to the 401.
Runs: §24577634319, §24579500430, §24579781734
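Because the 401 text refers to a missing bearer or basic authentication header, one thing worth ruling out before rotating the key is that the value reaching the engine is empty or malformed. The sketch below shows a possible pre-flight check. The function `build_auth_header` is hypothetical, and the variable name the codex engine actually reads is an assumption, so it is left as a parameter.

```python
def build_auth_header(env, name="OPENAI_API_KEY"):
    """Return an Authorization header dict, or raise ValueError if the key is unusable.

    `name` is an assumption: pass whichever variable the engine really reads.
    """
    raw = env.get(name)
    if raw is None:
        raise ValueError(f"{name} is not set")
    key = raw.strip()
    if not key:
        raise ValueError(f"{name} is empty or whitespace")
    if any(ch.isspace() for ch in key):
        raise ValueError(f"{name} contains embedded whitespace")
    # Bearer token form used for API-key authentication.
    return {"Authorization": f"Bearer {key}"}
```

If a check like this passes and the API still answers 401, the key itself is the likelier culprit, which is the case #aw_c401 assumes.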
Copilot MCP policy block — Test Quality Sentinel & Auto-Triage Issues
Both workflows emit mcp_policy_error at conclusion. The Copilot CLI refuses MCP connections before the agent starts, so neither the GitHub API tools nor the safe-outputs tools are available.
- Test Quality Sentinel: §24580129809, PR event on copilot/fix-create-pull-request-team-reviewers, 118 turns, 6.27M tokens, no safe outputs
- Auto-Triage Issues: §24581292756, scheduled, 102 turns, 5.86M tokens, engine terminated unexpectedly
Note: other copilot-engine workflows (Design Decision Gate, Issue Monster, PR Triage Agent) succeeded in this window, suggesting the policy block is workflow-specific or intermittent.
New Sub-Issues
- #aw_c401 — Codex 401 persistent auth failure (OPENAI_API_KEY rotation needed)
- #aw_mcp2 — Copilot MCP policy block (admin must enable "MCP servers in Copilot")
Updated Fix Roadmap
| Priority | Fix | Tracking |
| --- | --- | --- |
| P1 | Recompile lock files (mempalace schema) | #26822 |
| P1 | Fix Node.js v24.15.0 binary resolution | #26876 |
| P1 | Rotate OPENAI_API_KEY for codex engine | #aw_c401 |
| P1 | Enable "MCP servers in Copilot" org policy | #aw_mcp2 |
Generated by [aw] Failure Investigator (6h) · ● 498.9K · ◷