[aw-failures] Copilot CLI dies in ~1s with isAuthError (0 turns / 0 tokens) — COPILOT_GITHUB_TOKEN/GH_TOKEN/GITHUB_TOKEN absent on scheduled r [Content truncated due to length] #36656

@github-actions

Problem statement

  1. Two scheduled copilot-engine agentic workflows failed in the last 6h window, both in the agent job, with an identical, newly-classified signature: the GitHub Copilot CLI process exits in ~0–1s with exitCode=1, recording 0 turns / 0 tokens and an empty agent_output.json ({"items":[]}).
  2. The harness classifies the failure as isAuthError=true and refuses to retry, emitting:

    no authentication information found — not retrying (COPILOT_GITHUB_TOKEN, GH_TOKEN, and GITHUB_TOKEN are all absent or invalid)

  3. This is a fresh regression on 2026-06-03: both affected workflows had a recent successful scheduled run (PR Triage Agent at 08:00 the same day, Meta-Orchestrator at 13:57 on 06-02; see table), so this is not a chronic per-workflow misconfiguration.
  4. This signature is not covered by any open agentic-workflows issue. It is filed as a sub-issue of the most-recent open failure report [aw-failures] Daily Issues Report Generator — 100% failure (5+ days): copilot CLI exits in ~30s with zero token usage, unclassif [Content truncated due to length] #36325 (which tracks a related but distinct copilot symptom: a ~30s exit on a single workflow, root cause unclassified). This issue classifies the early-copilot-exit symptom for the first time as auth-token absence and supplies fresh same-day run IDs.

Affected workflows and run IDs

| Workflow | Engine | Event | Failed run (UTC) | Prior successful run (UTC) | Failure |
|---|---|---|---|---|---|
| PR Triage Agent | copilot (claude-sonnet-4.6) | schedule | §26889734909, 13:59 | §26871722704, 08:00 ✅ | agent job, 4.1m, 0 turns |
| Agent Performance Analyzer - Meta-Orchestrator | copilot | schedule | §26890506259, 14:12 | §26824629412, 06-02 13:57 ✅ | agent job, 3.6m, 0 turns |

Note on coverage: the deterministic pre-fetch payload reported 0 failures for this window, and the logs MCP tool timed out (120s); gh run list pagination beyond page 1 was firewall-refused (dial tcp ...: connection refused). The two runs above were found by sampling page 1 only — the true count of affected scheduled copilot runs in the 6h window is likely higher and could not be enumerated.

Probable root cause

  1. At the moment the Copilot CLI is invoked inside the awf-agent container, none of COPILOT_GITHUB_TOKEN, GH_TOKEN, or GITHUB_TOKEN is present/valid in the process environment, so the CLI aborts immediately before any model turn.
  2. Because recent prior runs of the same workflows succeeded with the same lock files, the token was present then and absent now. This points to a token provisioning or propagation issue at run time (e.g. a secret/token not injected into the agent container, an expired or empty short-lived token, or a race where the token is fetched after the CLI starts) rather than a workflow-definition defect.
  3. The failure is intermittent across the fleet — other copilot runs in the same window succeeded (e.g. Copilot §26890979977, Agentic Maintenance §26890508514), so this is not a total auth outage.
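
The condition described above (none of the three variables present or valid in the process environment) can be checked without logging any secret value. The following is a minimal sketch, not the harness's actual code: the variable names come from the harness error message, while `token_presence` and `any_token_usable` are hypothetical helpers, and treating a whitespace-only value as "empty" is an assumption.

```python
import os
from typing import Mapping

# Names taken from the harness error message quoted in this issue.
TOKEN_VARS = ("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN")


def token_presence(env: Mapping[str, str] = os.environ) -> dict:
    """Return 'absent', 'empty', or 'set' per variable; never the value itself."""
    report = {}
    for name in TOKEN_VARS:
        value = env.get(name)
        if value is None:
            report[name] = "absent"
        elif not value.strip():
            # Assumption: a blank or whitespace-only token is unusable.
            report[name] = "empty"
        else:
            report[name] = "set"
    return report


def any_token_usable(env: Mapping[str, str] = os.environ) -> bool:
    """True if at least one of the three variables holds a non-blank value."""
    return any(state == "set" for state in token_presence(env).values())


if __name__ == "__main__":
    # Safe to print in CI logs: only presence states are emitted.
    for name, state in token_presence().items():
        print(f"{name}: {state}")
```

Running this immediately before the CLI launch inside the agent container would distinguish "never injected" (absent) from "injected but blank" (empty). It cannot detect an expired token, which would need an actual API call.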

Proposed remediation

  1. Diagnose token injection: in the agent job, verify that the resolved Copilot/GitHub token is non-empty immediately before the CLI launches (mask-safe presence check, not value logging). Confirm whether the token is being passed into the awf-agent container env for scheduled events.
  2. Fail fast & loud: when isAuthError=true with all three token vars absent, surface a distinct, classified error in the step summary (today it lands as a generic exitCode=1 and the pre-fetch counts it as zero failures). This would also let the deterministic pre-fetch detect it.
  3. Bounded retry on auth-null: the harness currently does not retry on "no authentication information found." If the absence is a provisioning race, a short bounded re-resolve-and-retry (re-fetch token, 1–2 attempts) would recover transient cases instead of burning the whole scheduled run.
  4. Cross-check with [aw-failures] Daily Issues Report Generator — 100% failure (5+ days): copilot CLI exits in ~30s with zero token usage, unclassif [Content truncated due to length] #36325: confirm whether the Daily Issues Report Generator ~30s exits share this exact isAuthError cause; if so, merge tracking.
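
Items 2 and 3 above could be combined roughly as follows. This is a sketch under stated assumptions, not the harness implementation: `resolve_token` and `launch_cli` are hypothetical callables standing in for whatever the harness uses to fetch the token and start the CLI, `AuthTokenUnavailable` is an illustrative name for the classified error, and the default of 3 attempts (one initial try plus two retries) and the 2s delay are arbitrary.

```python
import time
from typing import Callable, Optional


class AuthTokenUnavailable(RuntimeError):
    """Classified error: no usable token after bounded re-resolution."""


def run_with_token_retry(
    resolve_token: Callable[[], Optional[str]],
    launch_cli: Callable[[str], int],
    max_attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-resolve the token up to max_attempts times, then launch the CLI once.

    Only the token lookup is retried; a CLI failure after a token was found
    is returned to the caller unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        token = resolve_token()
        if token and token.strip():
            return launch_cli(token)
        if attempt < max_attempts:
            sleep(delay_s)
    # Distinct, classifiable failure instead of a generic exitCode=1.
    raise AuthTokenUnavailable(
        f"no usable token after {max_attempts} attempts "
        "(COPILOT_GITHUB_TOKEN, GH_TOKEN, GITHUB_TOKEN)"
    )
```

Keeping the retry bounded means a genuine misconfiguration still fails quickly, while a short provisioning race gets a chance to recover. The raised exception gives the step summary and the pre-fetch something specific to count.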

Success criteria / verification

  1. A scheduled copilot run whose token is momentarily unavailable either (a) recovers via bounded retry, or (b) fails with an explicit, classified auth error visible in the step summary and counted by the pre-fetch.
  2. No copilot agentic run exits with 0 turns / 0 tokens without a classified auth diagnostic.
  3. Re-run PR Triage Agent and Meta-Orchestrator on schedule for 48h with 0 isAuthError-class 0-turn failures.
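
Criterion 2 can be expressed as a simple check over per-run summaries. The sketch below assumes a hypothetical `RunSummary` record; the field names are illustrative and do not correspond to any known pre-fetch schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class RunSummary:
    run_id: str
    turns: int
    tokens: int
    diagnostic: Optional[str]  # classified label, or None if unclassified


def unclassified_zero_turn_runs(runs: Iterable[RunSummary]) -> List[str]:
    """Return IDs of runs that ended with 0 turns / 0 tokens and no diagnostic."""
    return [
        r.run_id
        for r in runs
        if r.turns == 0 and r.tokens == 0 and not r.diagnostic
    ]
```

An empty result over the 48h verification window would satisfy criterion 2; any returned run ID is a zero-turn failure that escaped classification.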

Evidence

agent-stdio.log tail: PR Triage Agent §26889734909 (identical for §26890506259)

    [copilot-harness] attempt 1: process exit event exitCode=1
    [copilot-harness] attempt 1: process closed exitCode=1 duration=0s stdout=0B stderr=746B hasOutput=true
    [copilot-harness] attempt 1 failed: exitCode=1 ... isAuthError=true isAuthenticationFailedError=false permissionDeniedCount=0 ... retriesRemaining=3
    [copilot-harness] attempt 1: no authentication information found — not retrying (COPILOT_GITHUB_TOKEN, GH_TOKEN, and GITHUB_TOKEN are all absent or invalid)
    [copilot-harness] done: exitCode=1 totalDuration=1s

audit-diff: failed vs successful PR Triage Agent baseline (same workflow)
  • Failed §26889734909: agent job failure, 0 turns, token_usage=None, errors=0, empty agent_output.json, missing_tools=None, mcp_failures=None.
  • Successful baseline §26871722704 (08:00 same day): conclusion success, 1 turn, 347,622 tokens / 384,088 effective, 0 errors, read_only posture. Stable cohort (turns 1→1, posture read_only).
  • Delta: the failure is a clean pre-turn abort — no token usage, no tool calls, no model interaction — consistent with the CLI never authenticating.
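
The signature in the log tail above is regular enough to classify mechanically. The following is a rough sketch that relies only on the `[copilot-harness]` line format shown in this issue; the labels `auth-token-absent`, `unclassified-failure`, and `ok` are hypothetical and are not names used by the harness or the pre-fetch.

```python
import re

# Matches key=value pairs such as exitCode=1 or isAuthError=true.
KV = re.compile(r"(\w+)=(\S+)")

AUTH_MARKER = "no authentication information found"


def classify(log_text: str) -> str:
    """Classify an agent-stdio.log tail using the harness fields it contains."""
    fields = {}
    for line in log_text.splitlines():
        if line.startswith("[copilot-harness]"):
            # Later lines overwrite earlier ones, so the final exitCode wins.
            fields.update(KV.findall(line))
    if fields.get("isAuthError") == "true" and AUTH_MARKER in log_text:
        return "auth-token-absent"
    if fields.get("exitCode", "0") != "0":
        return "unclassified-failure"
    return "ok"
```

Applied to the tail above, this returns `auth-token-absent`. A non-zero exit without the auth marker falls through to `unclassified-failure`, so it stays visible rather than being counted as zero failures.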

Parent / correlation

References: §26889734909 · §26890506259 · §26871722704
Related to #36325

Generated by 🔍 [aw] Failure Investigator (6h) · opus48 9M · expires on Jun 10, 2026, 2:31 PM UTC
