
[WIP] Fix Codex Engine 401 Unauthorized error in workflows #27366

Closed
Copilot wants to merge 1 commit into main from
copilot/fix-codex-auth-401-error

Contributor

Copilot AI commented Apr 20, 2026

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.


This section details the original issue you should resolve

<issue_title>[Workflow Health Dashboard] 2026-04-20 — Score 73/100 | P0: Codex auth | P1: node not found, rate limits, MCP gateway</issue_title>
<issue_description>### Overview

Workflow health assessment for 197 agentic workflows in this repository. Run: §24665804498

| Metric | Value |
| --- | --- |
| Total workflows | 197 |
| Lock files present | 197/197 ✅ |
| Stale lock files | 0 ✅ |
| Today's confirmed failures | 5 workflows |
| Estimated schedule success rate | ~85% |
| Overall health score | 73/100 |

Critical Issues 🚨

P0: Codex Engine 401 Auth (Ongoing since Apr 18)

Tracked in #27127 (OPEN, assigned to @pelikhan + Copilot).

All Codex-engine workflows continue to fail with 401 Unauthorized from OpenAI. Confirmed new failures today:

Both show identical error:

unexpected status 401 Unauthorized: Missing bearer or basic authentication in header
url: (api.openai.com/redacted)

Impact: All workflows using engine: codex (AI Moderator, Duplicate Code Detector, Schema Feature Coverage, Daily Observability Report, etc.) are completely blocked.
Action needed: Rotate/restore OPENAI_API_KEY repository secret.

High Priority Issues ⚠️

P1: Recurring node: command not found on GPU Runner

Tracked in #27337 (new issue, OPEN).

Copilot-engine workflows on aw-gpu-runner-T4 are failing with /bin/bash: line 1: node: command not found. Recurring across 2+ days:

Impact: 2+ GPU-runner workflows blocked. Likely affects other aw-gpu-runner-T4 workflows.
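
A failure like this could be surfaced before the agent step runs by checking that required executables are on `PATH`. The following is a minimal sketch, not part of gh-aw: the helper name `missing_tools` is hypothetical, and it assumes the check runs in the same environment (same `PATH`) as the failing step.

```python
import shutil


def missing_tools(tools, path=None):
    """Return the entries of `tools` that cannot be resolved on PATH.

    `path` is passed through to shutil.which; None means "use the
    current process's PATH environment variable".
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


def preflight_message(tools):
    """Build a one-line diagnostic, or return None when every tool is found."""
    missing = missing_tools(tools)
    if not missing:
        return None
    return "pre-flight: not found on PATH: " + ", ".join(missing)
```

A workflow step could call `preflight_message(["node"])` and exit non-zero when it returns a message, so the run fails with an explicit diagnostic rather than a shell error partway through.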

P1: MCP Gateway Startup Failure

Daily Fact About gh-aw failed at "Start MCP Gateway" step today:

Impact: Isolated to workflows using custom MCP CLI servers (mempalace) on this run. May be transient.

P1: GitHub App Rate Limit Exhaustion (Co-scheduled Workflows)

Tracked in #27251 (OPEN, assigned to @pelikhan + Copilot).

Co-scheduled workflows at 23:44 UTC exhaust the GitHub App installation rate limit. First observed Apr 19.

Impact: Multiple workflows failing at guard/firewall policy fetch step. Staggering cron schedules is the recommended fix.

Resolved Since Last Run ✅

Today's Auto-Generated Failure Issues

| Issue | Workflow | Error | Status |
| --- | --- | --- | --- |
| #27328 | Duplicate Code Detector | Codex 401 auth | P0 (tracked in #27127) |
| #27317 | Daily Fact About gh-aw | MCP Gateway startup failure | P1 (new) |
| #27301 | Daily Issues Report Generator | node: command not found | P1 (tracked in #27337) |
| #27295 | Daily News | node: command not found | P1 (tracked in #27337) |
| #27286 | Schema Feature Coverage Checker | Codex 401 auth | P0 (tracked in #27127) |

Compilation Status Details
  • Total MD workflows: 197 (excluding shared/ subdirectory)
  • Lock files: 197/197 present ✅
  • Stale lock files: 0 (all lock files up-to-date) ✅
  • Shared imports (excluded): Files in .github/workflows/shared/ are not compiled standalone

Systemic Issues

Rate Limit Clustering

Multiple workflows share identical or near-identical cron schedules. The guard/firewall policy check consumes installation API rate limit. When 3+ workflows start simultaneously, rate limits can be exhausted.

Recommendation: Audit cron schedules and stagger by 3-5 minutes minimum between co-scheduled workflows.
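
The staggering recommendation can be sketched as a small helper that spreads co-scheduled cron expressions apart. This is an illustration only: `stagger` is a hypothetical name, and it assumes five-field cron expressions whose minute and hour fields are plain integers (no ranges, lists, or step syntax).

```python
def stagger(cron_exprs, step_minutes=5):
    """Offset the i-th cron expression by i * step_minutes.

    Assumes five-field cron expressions with integer minute and hour
    fields. The remaining fields are copied unchanged, so an offset that
    wraps past midnight does not adjust day-of-month or day-of-week.
    """
    result = []
    for i, expr in enumerate(cron_exprs):
        minute, hour, *rest = expr.split()
        total = int(hour) * 60 + int(minute) + i * step_minutes
        new_hour, new_minute = divmod(total % (24 * 60), 60)
        result.append(" ".join([str(new_minute), str(new_hour), *rest]))
    return result
```

For example, three workflows all scheduled at `44 23 * * *` would come back as `44 23 * * *`, `49 23 * * *`, and `54 23 * * *` with the default 5-minute step. Schedules that run on specific days need a manual review when the offset crosses midnight.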

Codex Engine Credential Dependency

All Codex-engine workflows are single-point-of-failure dependent on OPENAI_API_KEY. When the secret expires or is misconfigured, all such workflows fail simultaneously with no graceful degradation.

Recommendation: Add credential validation as a pre-flight check in activation job with clear error message and early exit.
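
A pre-flight check of this kind might look like the sketch below. It is a presence-only check under stated assumptions: the secret is exposed to the job as an environment variable with the same name, and `check_secret` is a hypothetical helper, not an existing gh-aw function. It cannot tell whether a key that is present is valid or expired.

```python
import os


def check_secret(name, env=None):
    """Return (ok, message) for a required secret-backed environment variable.

    Only checks that the variable is set and non-blank; validity of the
    credential itself is not verified.
    """
    env = os.environ if env is None else env
    value = env.get(name, "")
    if not value.strip():
        return False, f"pre-flight: {name} is empty or not set; skipping engine run"
    return True, f"pre-flight: {name} is present"
```

An activation job could call `check_secret("OPENAI_API_KEY")`, print the message, and exit early when `ok` is false, so that every Codex-engine workflow reports the same clear cause rather than a 401 from the API.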

Recommendations

Immediate (P0)

  1. Restore Codex auth — Rotate OPENAI_API_KEY secret in repository settings ([aw-failures] Codex engine 401 auth failure — OPENAI_API_KEY credential missing or invalid #27127)

High Priority (P1)

  1. Fix Node.js PATH on GPU runner — Investigate aw-gpu-runner-T4 Node.js availability ([P1] Recurring node: command not found on aw-gpu-runner-T4 (Daily News, Daily Issues Report) #27337)
  2. Stagger cron schedules — Offset co-scheduled workflows by 3-5 minutes ([aw-failures] GitHub App installation rate limit exhaustion from co-scheduled workflows at 23:44 UTC #27251)
  3. Investigate MCP Gateway failure — Determine if Daily Fact MCP Gateway startup issue is transient or systemic ([aw] Daily Fact About gh-aw failed #27317)

Medium Priority (P2)

  1. Safe Outputs conformance — 4 handler files need sanitization ([Safe Outputs Conformance] SEC-004: Multiple handlers have body fields without content sanitization #27235)
  2. Performance regressions — CompileComplexWorkflow +29%, CompileSimpleWorkflow +39%, Validation +96% ([performance] Regression in CompileComplexWorkflow: 29.2% slower #27280, [performance] Regression in CompileSimpleWorkflow: 39.3% slower #27279, [performance] Regression in Validation: 95.6% slower #27278)

Trends

  • Overall health score: 73/100 (→ stable from 75 last run)
  • P0 issues: 1 (Codex 401 auth — unresolved since Apr 18, day 3)
  • P1 issues: 3 (rate limit, node not found, MCP gateway)
  • New failures today: 5 workflows
  • Fixed since last run: CLI updates, stale lock files
  • Workflows with stale locks: 0 (↓ from 17 last run)

Actions Taken This Run


Last updated: 2026-04-20T12:14Z
Next check: 2026-04-21T12:00Z (daily schedule)

[!NOTE]

🔒 Integrity filter blocked 5 items

The following items were blocked because they don't meet the GitHub integrity level.

  • #19099 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #21784 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #27282 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #27260 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • #27259 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Workflow Health Manager - Meta-Orchestrator · ● 3.6M ·

  • expires on Apr 21, 2026, 12:25 PM UTC

Comments on the Issue (you are @copilot in this section)


Development

Successfully merging this pull request may close these issues.

[Workflow Health Dashboard] 2026-04-20 — Score 73/100 | P0: Codex auth | P1: node not found, rate limits, MCP gateway

2 participants