Executive summary
In the 6-hour window ending 2026-05-24T07:44 UTC, 16 agentic workflow runs failed with this distribution by engine:
| Engine | Failures | Cluster | Severity |
|---|---|---|---|
| GitHub Copilot CLI 1.0.51 | 13 | A: `anthropic-beta: context-1m-2025-08-07` rejected (HTTP 400) | P0 |
| Codex 0.130.0 | 1 | B: Missing `OPENAI_API_KEY` after fallback retry | P2 |
| Claude Code | 2 | C: 30-min action timeout; D: `max_turns=30` exhausted | P2 |
The dominant cluster (81% of failures) is a single, fully reproducible upstream regression in Copilot CLI 1.0.51. The fix is already proposed in #34390 (bump to Copilot 1.0.52 / Codex 0.133.0 / GitHub MCP v1.0.5) — merging that PR should resolve Cluster A immediately. This report tracks the production impact and ties together the 13 per-workflow auto-failure issues already created.
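The 81% figure follows directly from the per-engine counts in the table. A minimal sketch of that arithmetic, assuming the counts are entered by hand (the `failures` mapping and `share` helper are illustrative names, not part of any tooling):

```python
from collections import Counter

# Failure counts per engine, copied from the table in this report.
failures = Counter({
    "GitHub Copilot CLI 1.0.51": 13,
    "Codex 0.130.0": 1,
    "Claude Code": 2,
})

total = sum(failures.values())


def share(engine: str) -> float:
    """Return the engine's share of all failures as a percentage."""
    return 100.0 * failures[engine] / total


if __name__ == "__main__":
    # Print engines from most to least affected.
    for engine, count in failures.most_common():
        print(f"{engine}: {count}/{total} = {share(engine):.2f}%")
```

With 13 of 16 failures, the Copilot share is 81.25%, which the summary rounds to 81%.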
Failure clusters
| Cluster | Engine | Runs | Symptom | Recommendation |
|---|---|---|---|---|
| A | Copilot CLI 1.0.51 | 13 | ``400 Unexpected value(s) `context-1m-2025-08-07` for the `anthropic-beta` header`` on every attempt → all 4 attempts (1 initial + 3 retries) fail | Merge #34390 (Copilot 1.0.51 → 1.0.52) |
| B | Codex 0.130.0 | 1 | First attempt: `invalid_request_error`. Attempts 2–4: `Missing environment variable: 'OPENAI_API_KEY'` (env not propagated to fresh-run retry) | Same upgrade in #34390 bumps Codex; also investigate why fresh-run retry loses `OPENAI_API_KEY` |
| C | Claude Code | 1 | Agent completed successfully (called `noop`), but the `Execute Claude Code CLI` action timed out at 30 minutes during browser-driven docs testing | Raise action timeout or shorten the per-device test plan in `multi-device-docs-tester.md` |
| D | Claude Code | 1 | `terminal_reason: max_turns`, `Reached maximum number of turns (30)`. Many turns burned on permission-denied bash attempts against `/tmp/gh-aw/cache-memory/` and `/tmp/gh-aw/agent/` | Either raise `max-turns` for `step-name-alignment.md` or allow-list those tmp paths |
Evidence — Cluster A (P0)
All 13 Copilot runs use identical engine config:
engine_id: copilot
model: claude-sonnet-4.5
version: 1.0.51
firewall_version: v0.25.53
The error appears on every Copilot CLI attempt (4 attempts per run, 1 initial + 3 retries, × 13 runs):
Sample agent stdio log (run 26354999018, Sub-Issue Closer)
● Request failed (transient_bad_request). Retrying...
● Request failed (transient_bad_request). Retrying...
400 Unexpected value(s) `context-1m-2025-08-07` for the `anthropic-beta` header.
Please consult our documentation at docs.anthropic.com or try again without the header.
[copilot-harness] attempt 1: process closed exitCode=1 duration=8s
[copilot-harness] attempt 1 failed: isCAPIError400=false isMCPPolicyError=false
[copilot-harness] retry 1/3 → attempt 2 → same 400 → retry 2/3 → attempt 3 → same 400
[copilot-harness] all 3 retries exhausted — giving up (exitCode=1)
Affected runs (all conclusion=failure, error_count=1, missing_tool_count=0):
All 13 Cluster-A runs and their auto-issues
Evidence — Cluster B (Codex)
Run §26353930732 (Duplicate Code Detector). Auto-issue: #34384.
The first attempt produced an `invalid_request_error` (the model `gpt-5.5` is configured in `codex.turn`). Attempts 2–4 then failed with `Missing environment variable: 'OPENAI_API_KEY'`. This suggests the harness's `--continue` / fresh-run retry path does not re-inject the API key secret, which converts a transient model error into a hard auth failure.
Codex turn errors across 4 attempts (run 26353930732)
attempt 1: codex_core::session::turn: Turn error: { "type": "invalid_request_error", ... }
attempt 2: codex_core::session::turn: Turn error: Missing environment variable: `OPENAI_API_KEY`.
attempt 3: codex_core::session::turn: Turn error: Missing environment variable: `OPENAI_API_KEY`.
attempt 4: codex_core::session::turn: Turn error: Missing environment variable: `OPENAI_API_KEY`.
[codex-harness] all 3 retries exhausted — giving up (exitCode=1)
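The pattern in this log (a model error on attempt 1, then a missing-variable error on every later attempt) can be detected mechanically. The sketch below is illustrative only: it assumes the `attempt N: ... Turn error: ...` line shape seen in the excerpt above, and the function names are hypothetical rather than part of the harness.

```python
import re

# Matches lines such as "attempt 2: codex_core::session::turn: Turn error: ...".
ATTEMPT_RE = re.compile(r"^attempt (\d+): .*?Turn error: (.*)$")


def classify_attempts(log_text: str) -> dict[int, str]:
    """Map each attempt number to a coarse error kind."""
    kinds: dict[int, str] = {}
    for line in log_text.splitlines():
        match = ATTEMPT_RE.match(line.strip())
        if not match:
            continue
        attempt, message = int(match.group(1)), match.group(2)
        if "Missing environment variable" in message:
            kinds[attempt] = "missing_env"
        elif "invalid_request_error" in message:
            kinds[attempt] = "invalid_request"
        else:
            kinds[attempt] = "other"
    return kinds


def env_lost_on_retry(kinds: dict[int, str]) -> bool:
    """True when the first attempt had the variable but every retry did not."""
    if len(kinds) < 2:
        return False
    first = min(kinds)
    later = [kind for attempt, kind in kinds.items() if attempt != first]
    return kinds[first] != "missing_env" and all(k == "missing_env" for k in later)


SAMPLE = """\
attempt 1: codex_core::session::turn: Turn error: { "type": "invalid_request_error", ... }
attempt 2: codex_core::session::turn: Turn error: Missing environment variable: `OPENAI_API_KEY`.
attempt 3: codex_core::session::turn: Turn error: Missing environment variable: `OPENAI_API_KEY`.
attempt 4: codex_core::session::turn: Turn error: Missing environment variable: `OPENAI_API_KEY`.
"""

if __name__ == "__main__":
    print(env_lost_on_retry(classify_attempts(SAMPLE)))
```

A check like this separates "the key was never configured" (attempt 1 also reports it missing) from "the key was dropped between attempts", which is the case the log shows.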
Evidence — Cluster C (Claude Code action timeout)
Run §26354235499 (Multi-Device Docs Tester). No auto-issue was created because the agent itself reported success.
The Claude agent ran for ~26 turns, completed multi-device docs testing successfully across iPhone 12 / iPad / FHD viewports, and called noop. However, the wrapping GitHub Action (Execute Claude Code CLI) hit a 30-minute timeout — the agent finished but the action runner kept blocking. Two permission_denials (nohup npm run dev, redirect to /tmp/gh-aw/agent/astro-dev.log) likely lengthened the run as the agent worked around restrictions.
##[error]The action 'Execute Claude Code CLI' has timed out after 30 minutes.
[claude-harness] attempt 1: process closed duration=2m 56s
# (agent ran 2m 56s but the action wrapper waited longer)
Evidence — Cluster D (Claude Code max_turns)
Run §26352648076 (Step Name Alignment). Auto-issue: #34371.
terminal_reason: max_turns
errors: ["Reached maximum number of turns (30)"]
num_turns: 31
permission_denials: 10 (all reading /tmp/gh-aw/cache-memory/ or /tmp/gh-aw/agent/*)
The agent burned many turns on bash commands probing tmp paths it could not access. Compaction-style behavior plus permission friction caused it to hit the 30-turn cap. Either raise max-turns for this workflow or extend the allowed-paths list to include /tmp/gh-aw/cache-memory/ and /tmp/gh-aw/agent/.
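To make the allow-list option concrete, here is a minimal sketch of a path check that would admit the two directories named above. How the real allowed-paths list is configured or enforced is not shown in this report, so `ALLOWED_ROOTS` and `is_allowed` are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical allow-list containing the two directories the agent kept probing.
ALLOWED_ROOTS = (
    PurePosixPath("/tmp/gh-aw/cache-memory"),
    PurePosixPath("/tmp/gh-aw/agent"),
)


def is_allowed(path: str, roots=ALLOWED_ROOTS) -> bool:
    """Return True if path is one of the roots or lies beneath one of them."""
    candidate = PurePosixPath(path)
    # Reject traversal segments instead of trying to resolve them.
    if ".." in candidate.parts:
        return False
    return any(candidate == root or root in candidate.parents for root in roots)


if __name__ == "__main__":
    for probe in ("/tmp/gh-aw/agent/astro-dev.log", "/tmp/gh-aw/agent-other/x", "/etc/passwd"):
        print(probe, is_allowed(probe))
```

Comparing whole path components rather than string prefixes keeps a sibling such as `/tmp/gh-aw/agent-other` from matching `/tmp/gh-aw/agent`.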
Audit-diff baseline comparison (Cluster A)
For run 26354999018 vs. the most recent successful Sub-Issue Closer baseline (§26274652795, 2026-05-22):
| Metric | Successful baseline | Failed run | Delta |
|---|---|---|---|
| turns | 5 | 0 | agent never produced a turn |
| blocked_requests | 12 | 8 | both runs have firewall noise from (unknown) domains |
| posture | read_only | read_only | unchanged |
The collapse from 5 turns to 0 turns is the smoking gun: the Copilot CLI's HTTP 400 fails before any agent reasoning step occurs.
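The same comparison can be expressed as a small diff over the two metric sets. This is a sketch using the values from the table above; `diff_metrics` and `never_started` are illustrative helpers, not the audit-diff tool itself.

```python
# Metric values copied from the baseline comparison table.
baseline = {"turns": 5, "blocked_requests": 12, "posture": "read_only"}
failed = {"turns": 0, "blocked_requests": 8, "posture": "read_only"}


def diff_metrics(base: dict, run: dict) -> dict:
    """Numeric metrics yield a signed delta; other values yield a change note."""
    out = {}
    for key, base_value in base.items():
        run_value = run.get(key)
        if isinstance(base_value, (int, float)) and isinstance(run_value, (int, float)):
            out[key] = run_value - base_value
        else:
            out[key] = "unchanged" if base_value == run_value else f"{base_value} -> {run_value}"
    return out


def never_started(base: dict, run: dict) -> bool:
    """Flag runs where the baseline produced turns but the run produced none."""
    return base.get("turns", 0) > 0 and run.get("turns", 0) == 0


if __name__ == "__main__":
    print(diff_metrics(baseline, failed))
    print(never_started(baseline, failed))
```

A zero-turn run against a non-zero baseline points at a failure before the agent loop begins, which matches the HTTP 400 seen on the first request.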
Existing tracking & correlation
| Issue | Type | Relevance |
|---|---|---|
| #34386 | CLI Version Checker auto-report | Documents the Copilot 1.0.51 → 1.0.52 upgrade with full release notes (created 06:42 UTC, after the first 9 Cluster-A failures) |
| #34390 | PR/issue with code change | Already implements the fix: bumps `DefaultCopilotVersion` to `1.0.52`, `DefaultCodexVersion` to `0.133.0`, regenerates 235 lockfiles |
| 13 per-workflow auto-issues | `[aw] <workflow> failed` | One per failed Copilot run; listed in the Cluster A table above |
| #34342 | Prior failure-investigator self-failure | Unrelated root cause (`npm error notarget` @anthropic-ai/claude-code@2.1.150 with a date before 5/21/2026); not part of this cluster |
Proposed fix roadmap
P0 — immediate
1. Merge #34390 to pin Copilot CLI 1.0.52. This unblocks all 13 Cluster-A workflows. The `anthropic-beta: context-1m-2025-08-07` header that Anthropic's API rejects is hardcoded inside Copilot CLI 1.0.51; only a CLI upgrade fixes it.
2. After merge, re-trigger one Copilot workflow (e.g. Sub-Issue Closer) and confirm exit 0.
P1 — short-term
3. Codex retry resilience (Cluster B): investigate why codex exec retries lose OPENAI_API_KEY. The harness retry path likely respawns the process without preserving secrets. Add an env-inheritance check before retry, or surface Missing OPENAI_API_KEY as a non-retriable auth error so the workflow fails fast with a clearer signal.
4. Action-timeout vs agent-completion mismatch (Cluster C): the Execute Claude Code CLI step waited past agent completion. Confirm the harness exits promptly after the agent's final tool call.
P2 — backlog
5. Workflow-specific tuning (Cluster D): step-name-alignment.md either needs max-turns raised or its bash allow-list widened to include /tmp/gh-aw/cache-memory/ and /tmp/gh-aw/agent/ so the agent stops burning turns on denied probes.
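For roadmap item 3, the two suggested mitigations (an env check before each retry, and treating a missing key as non-retriable) could look roughly like the sketch below. It is not the harness's actual retry code: `run_with_retries`, `REQUIRED_ENV`, and `NonRetriableError` are hypothetical names, and the only assumption about the child process is that it reads `OPENAI_API_KEY` from its environment.

```python
import os
import subprocess
import sys

# Variables the child process must see on every attempt (assumed list).
REQUIRED_ENV = ("OPENAI_API_KEY",)


class NonRetriableError(RuntimeError):
    """Raised when retrying cannot help, e.g. a required secret is absent."""


def run_with_retries(cmd: list[str], env: dict[str, str], max_attempts: int = 4) -> int:
    """Run cmd up to max_attempts times, passing the same env each time.

    Returns the 1-based number of the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        # Check before spawning so a lost secret fails fast with a clear message.
        missing = [name for name in REQUIRED_ENV if not env.get(name)]
        if missing:
            raise NonRetriableError(f"attempt {attempt}: missing {', '.join(missing)}")
        result = subprocess.run(cmd, env=env, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
    raise RuntimeError(f"all {max_attempts} attempts failed")


# Stand-in child command: exits 0 only if the key is visible to it.
CHILD = [sys.executable, "-c",
         "import os, sys; sys.exit(0 if os.environ.get('OPENAI_API_KEY') else 1)"]

if __name__ == "__main__":
    env = dict(os.environ, OPENAI_API_KEY="dummy-value-for-demo")
    print(run_with_retries(CHILD, env))
```

Passing one explicit `env` mapping to every spawn removes any dependence on what a respawned process happens to inherit, and the early check turns three redundant retries into a single clear auth failure.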
Sub-issues created
No new sub-issues are filed. Per-workflow auto-failure issues already exist for 14 of the 16 runs (linked above). Creating duplicates would add noise; this parent issue links the existing tracking issues via GitHub's sub-issue mechanism for visibility.
Confidence & unknowns
- High confidence: Cluster A root cause and fix path. Verified across 4 sampled runs (`grep -c context-1m-2025-08-07` returned 4 occurrences per agent-stdio.log, one per Copilot attempt).
- Medium confidence: Cluster B is fixable by the same upgrade. Codex 0.133.0 ships with auth/session changes; the missing-env-var symptom may not survive the bump even if the underlying `invalid_request_error` does.
- Unknowns: Why PR Description Updater (run 26352706155) and Multi-Device Docs Tester (run 26354235499) did not generate auto-failure issues. Possibly `report-failure-as-issue: false` in their frontmatter, or auto-issue creation was skipped because the agent step itself reported success while a later job failed.
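The `grep -c` verification counts matching lines, so the same check is easy to reproduce on any saved log. The sketch below uses a hypothetical four-attempt log built from the error line quoted earlier; the real agent-stdio.log files are not reproduced here.

```python
def count_matching_lines(log_text: str, needle: str = "context-1m-2025-08-07") -> int:
    """Count lines containing needle, mirroring what `grep -c` reports."""
    return sum(1 for line in log_text.splitlines() if needle in line)


# Hypothetical log: one rejected request per attempt, four attempts in total.
ERROR_LINE = "400 Unexpected value(s) `context-1m-2025-08-07` for the `anthropic-beta` header."
SAMPLE_LOG = "\n".join(
    f"[copilot-harness] attempt {n} starting\n{ERROR_LINE}" for n in range(1, 5)
)

if __name__ == "__main__":
    print(count_matching_lines(SAMPLE_LOG))
```

A count equal to the number of attempts indicates the rejection is deterministic rather than intermittent.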
Generated by 🔍 [aw] Failure Investigator (6h) · ● opu47 21.2M · ◷
6h follow-up — 2026-05-24T13:11 UTC
Cluster A (Copilot CLI 1.0.51 anthropic-beta 400) — RESOLVED on main. Fix PR #34390 ("Bump pinned Copilot/Codex/GitHub MCP versions and regenerate workflow artifacts") was merged at 2026-05-24T12:59:44 UTC.
Failures observed in the 6h window ending 2026-05-24T13:11 UTC
12 additional failures observed; 11 of 12 occurred before the fix was merged, and the single post-merge failure is on a PR branch that has not yet rebased onto the new pinned version. No new failure modes detected.
Failures in this window (12)
| Run ID | Workflow | Engine | Created | Pre/Post merge | Cluster |
|---|---|---|---|---|---|
| §26361438009 | PR Triage Agent | Copilot 1.0.51 | 12:37:18Z | Pre | A |
| §26361477910 | PR Description Updater | Copilot 1.0.51 | 12:39:08Z | Pre | A |
| §26361491942 | PR Description Updater | Copilot 1.0.51 | 12:39:49Z | Pre | A |
| §26361510407 | PR Description Updater | Copilot 1.0.51 | 12:40:37Z | Pre | A |
| §26361519301 | PR Description Updater | Copilot 1.0.51 | 12:41:02Z | Pre | A |
| §26361533861 | PR Description Updater | Copilot 1.0.51 | 12:41:43Z | Pre | A |
| §26361624539 | PR Code Quality Reviewer | Copilot 1.0.51 | 12:45:51Z | Pre | A |
| §26361627620 | PR Description Updater | Copilot 1.0.51 | 12:46:00Z | Pre | A |
| §26361673647 | Changeset Generator | Codex 0.133.0 | 12:48:04Z | Pre | B |
| §26361673669 | Smoke Codex | Codex 0.133.0 | 12:48:04Z | Pre | B |
| §26361673672 | Smoke Copilot | Copilot 1.0.51 | 12:48:04Z | Pre | A |
| §26361959405 | PR Code Quality Reviewer | Copilot 1.0.51 | 13:00:50Z | Post (PR branch unmerged) | A |
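The Pre/Post column is a timestamp comparison against the merge time of #34390 (2026-05-24T12:59:44 UTC). A minimal sketch follows; the table lists only times of day, so the date 2026-05-24 is an assumption taken from the window stated above, and only three of the twelve runs are included for brevity.

```python
from datetime import datetime, timezone

# Merge time of PR #34390 as stated in this follow-up.
MERGED_AT = datetime(2026, 5, 24, 12, 59, 44, tzinfo=timezone.utc)


def parse_created(time_utc: str, day: str = "2026-05-24") -> datetime:
    """Parse a 'HH:MM:SSZ' value from the table, assuming the given UTC date."""
    parsed = datetime.strptime(f"{day}T{time_utc}", "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def phase(created: datetime) -> str:
    """Label a run as created before or after the fix was merged."""
    return "Pre" if created < MERGED_AT else "Post"


# Run ID -> Created, copied from three rows of the table.
RUNS = {
    "26361438009": "12:37:18Z",
    "26361673672": "12:48:04Z",
    "26361959405": "13:00:50Z",
}

if __name__ == "__main__":
    for run_id, created in RUNS.items():
        print(run_id, phase(parse_created(created)))
```

A "Post" label only says the run started after the merge; as the table notes, the single post-merge failure ran on a PR branch that still carried the old pinned version.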
Cluster A — closing thoughts
The fix is live on main. Outstanding PR branches that pin Copilot 1.0.51 in their workflow artifacts will continue to fail until they rebase. This is a transient effect and does not require additional remediation beyond normal PR maintenance. Recommended: confirm post-merge Copilot runs on main succeed in the next 6h cycle, then close this issue.
Cluster B — still active and not addressed by the version bump
The two Codex failures in this window already run on Codex 0.133.0 (the bumped version), yet still exhibit the documented pattern: attempt 1 produces an `invalid_request_error` and attempts 2–4 fail with `Missing environment variable: 'OPENAI_API_KEY'`. The version bump did not fix the harness retry env-propagation bug. That remains a separate, actionable harness-side issue that should be investigated independently.
Codex stdio excerpt (run 26361673647)
attempt 1: Turn error: { "type": "invalid_request_error", ... } (model=gpt-5.4-mini)
attempt 2: Turn error: Missing environment variable: `OPENAI_API_KEY`.
attempt 3: Turn error: Missing environment variable: `OPENAI_API_KEY`.
attempt 4: Turn error: Missing environment variable: `OPENAI_API_KEY`.
[codex-harness] all 3 retries exhausted — giving up (exitCode=1)
No new clusters
No new failure modes appeared in this window. Clusters C (Claude Code action timeout) and D (Claude Code max_turns) had no recurrence.
Update generated by [aw] Failure Investigator (6h) run §26362131625.
Status: resolved — closing
The dominant P0 Cluster A tracked here (Copilot CLI 1.0.51 anthropic-beta: context-1m-2025-08-07 regression that broke 13 workflows) has been resolved by PR #34390, which bumped DefaultCopilotVersion from 1.0.51 to 1.0.52. As of 2026-05-25T01:30 UTC:
- `pkg/constants/version_constants.go` has `const DefaultCopilotVersion Version = "1.0.52"` (fix landed).
- Zero Copilot-engine failures observed in the last 6h sample (43 runs total, 3 failures — all on Codex/Claude, not Copilot).
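The first check above can be scripted by reading the pinned constant out of the Go source. This is a sketch only: it assumes the declaration has exactly the shape quoted in the bullet, and `pinned_version`, `version_tuple`, and `is_fixed` are illustrative helper names.

```python
import re

# Matches: const DefaultCopilotVersion Version = "1.0.52"
CONST_RE = re.compile(r'const\s+DefaultCopilotVersion\s+Version\s*=\s*"([^"]+)"')


def pinned_version(go_source: str):
    """Return the pinned Copilot version string, or None if not found."""
    match = CONST_RE.search(go_source)
    return match.group(1) if match else None


def version_tuple(version: str) -> tuple:
    """Convert '1.0.52' to (1, 0, 52) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_fixed(go_source: str, minimum: str = "1.0.52") -> bool:
    """True when the pinned version is at least the first fixed release."""
    version = pinned_version(go_source)
    return version is not None and version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    print(is_fixed('const DefaultCopilotVersion Version = "1.0.52"'))
```

Comparing integer tuples avoids the string-comparison trap where `"1.0.9"` would sort after `"1.0.52"`.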
Residual concerns from this report — already tracked or self-healed
| Cluster | Original symptom | Current state |
|---|---|---|
| A: Copilot 1.0.51 `anthropic-beta` | 13 failures | Fixed by #34390 → 1.0.52. No recurrences. |
| B: Codex 0.130.0 Missing `OPENAI_API_KEY` after fallback retry | 1 failure | Codex regression now tracked separately in #34522 (different root cause: PR #34390 also bumped Codex to 0.133.0, which introduced the `stream_options.include_usage` bug). |
| C: Claude 30-min action timeout (multi-device-docs-tester) | 1 failure | Not observed in last 6h. Standalone follow-up if it recurs. |
| D: Claude `max_turns=30` exhausted | 1 failure | Avenger ran into a similar `max_turns=25` situation 2026-05-24 18:46–20:39 (§26372192847 and 2 others), then self-recovered. Not a persistent issue; no follow-up filed. |
Why close now
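The remaining Codex failures (`stream_options.include_usage` rejected by `gpt-5.5`) are tracked in #34522; keeping this issue open would conflate two distinct, now-disjoint root causes.
Closing as resolved. Re-open if the Copilot regression resurfaces.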
Closed 2026-05-25 by failure-investigator after verifying DefaultCopilotVersion = 1.0.52 and zero Copilot failures in 6h lookback.