[aw-failures] copilot-sdk driver mode broken by PR #36679 — Session was not created with authentication info or custom provider kills 9 work [Content truncated due to length] #36710

@github-actions

Executive summary

In the 2026-06-03 17:30–19:53 UTC window, 10 scheduled copilot-engine runs across 9 distinct workflows died in ~1s with exitCode=1, 0 turns / 0 tokens, and an empty agent_output.json. This is the same surface symptom tracked by parent #36656, but fresh log evidence supersedes that issue's root-cause hypothesis: the failure is not caused by absent GitHub tokens. It is a regression in copilot-sdk driver mode introduced by PR #36679 ("Remove copilot SDK driver inlined mode", merged 2026-06-03 17:29:43 UTC). The first failure landed at 17:30:06 UTC, 23 seconds after the merge.

Every failing run uses the copilot-sdk subprocess driver (copilot_sdk_driver.cjs + headless sidecar on 127.0.0.1:3002) and aborts with Error: Session was not created with authentication info or custom provider. Every passing copilot run in the same window invokes /usr/local/bin/copilot directly and succeeds. The token health-check (COPILOT_GITHUB_TOKEN is placeholder value (correct), offline+BYOK mode) is identical in both — so the placeholder tokens are correct by design, and the harness's isAuthError=true classification is a misdiagnosis of a driver-side session-bootstrap failure.

Copilot failure rate in-window: 11/28 runs (39%); 10 are this SDK-driver regression, 1 is an unrelated Smoke Copilot test-verdict FAIL.

Failure clusters

| Cluster | Class | Engine | Runs | Workflows | Priority |
| --- | --- | --- | --- | --- | --- |
| A | copilot-sdk driver session abort (PR #36679 regression) | copilot | 10 | 9 | P0 |
| B | effective_tokens_limit_exceeded (25M cap) | claude | 1 | 1 (Documentation Unbloat) | P1 (already tracked by #35661 / #36594) |
| C | Smoke-test FAIL verdict (not an infra fault) | copilot | 1 | 1 (Smoke Copilot) | n/a |

Affected workflows and run IDs (Cluster A — P0)

| Workflow | Failed run (UTC) | Mode | Harness |
| --- | --- | --- | --- |
| Daily Security Observability Report | §26901717191 (17:30) | SDK-driver | 1s, 0 turns |
| Daily SPDD Spec Planner | §26901762271 (17:30) | SDK-driver | 1s, 0 turns |
| GEO Optimizer Daily Audit | §26903386460 (18:01) | SDK-driver | 2s, 0 turns |
| PR Code Quality Reviewer | §26903689020 (18:07) | SDK-driver | 1s, 0 turns |
| Linter Miner | §26905302792 (18:37) | SDK-driver | 1s, 0 turns |
| Daily Secrets Analysis Agent | §26906100801 (18:52) | SDK-driver | 1s, 0 turns |
| Daily Testify Uber Super Expert | §26906843892 (19:06) | SDK-driver | 1s, 0 turns |
| PR Triage Agent | §26908265361 (19:34) | SDK-driver | 1s, 0 turns |
| PR Code Quality Reviewer | §26908785711 (19:44) | SDK-driver | 1s, 0 turns |
| Daily Safe Output Integrator | §26909254135 (19:53) | SDK-driver | 1s, 0 turns |

Probable root cause

  1. PR #36679 ("Remove copilot SDK driver inlined mode") removed the in-process inline path from copilot_harness.cjs, so copilot-sdk mode now always spawns the copilot_sdk_driver.cjs subprocess against a headless Copilot CLI sidecar.
  2. In driver mode the harness generates a per-run COPILOT_CONNECTION_TOKEN, starts the sidecar (--headless --port 3002), then copilot_sdk_driver.cjs connects and creates a session that is missing the custom-provider / auth context, producing Error: Session was not created with authentication info or custom provider the moment the prompt is sent.
  3. The harness maps this to isAuthError=true and emits no authentication information found — not retrying (COPILOT_GITHUB_TOKEN, GH_TOKEN, and GITHUB_TOKEN are all absent or invalid) — a false signal: the env tokens are intentional offline+BYOK placeholders and are identical on passing runs. Because it's classified as a non-retryable auth error, the run is lost immediately.
  4. Correction to parent #36656: the prior "token provisioning / propagation at run time" hypothesis is disproven. The divergence is execution mode (SDK-driver vs direct), not token presence.

Proposed remediation

  1. Immediate: revert PR #36679 (or restore the inline copilot-sdk path) so that copilot-sdk workflows stop losing scheduled runs until the driver session bootstrap is fixed.
  2. Driver fix: in copilot_sdk_driver.cjs, propagate the custom provider / COPILOT_CONNECTION_TOKEN into the SDK session before sending prompt (the sidecar starts and the session is created, but without provider context).
  3. Classifier fix: distinguish driver-side Session was not created with authentication info or custom provider from genuine env-token absence; do not suppress retry for it (allow one bounded re-bootstrap-and-retry of the sidecar).
  4. Recompile/redeploy the affected copilot-sdk .lock.yml workflows after the fix.
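
A minimal sketch of what remediation item 3 could look like, assuming the harness can see the error text and the execution mode of the failed attempt. The function name `classifyFailure`, the `mode` labels, the category strings, and the retry counts are all hypothetical; only the two message fragments are taken from the logs in this issue.

```javascript
// Hypothetical classifier sketch; this is NOT the harness's actual code.
const DRIVER_SESSION_ERROR =
  "Session was not created with authentication info or custom provider";

// `message` is the error text of the failed attempt.
// `mode` is "sdk-driver" or "direct" (labels invented for this sketch).
function classifyFailure(message, mode) {
  if (mode === "sdk-driver" && message.includes(DRIVER_SESSION_ERROR)) {
    // Driver-side bootstrap failure: the env tokens may be fine, so allow
    // one bounded re-bootstrap-and-retry instead of giving up immediately.
    return { kind: "driver-session-bootstrap", retryable: true, maxRetries: 1 };
  }
  if (/no authentication information found/i.test(message)) {
    // Genuine env-token absence: retrying cannot help.
    return { kind: "env-auth-missing", retryable: false, maxRetries: 0 };
  }
  // Anything else keeps a default retry budget (3 is an assumption).
  return { kind: "unknown", retryable: true, maxRetries: 3 };
}
```

The point of the design is ordering: the driver-side signature is checked before the generic auth check, so a session-bootstrap failure is never folded into the non-retryable auth class.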

Success criteria / verification

  1. copilot-sdk workflows complete with > 0 turns; no Session was not created with authentication info or custom provider in copilot_sdk_driver logs.
  2. The isAuthError=true / 0-turn 1s-death signature does not recur for copilot-sdk runs.
  3. Re-run the 9 affected workflows on schedule for 24–48h with 0 auth-class 0-turn failures.
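
The three criteria above could be checked mechanically along these lines. This is a sketch under stated assumptions: the shape of the `run` summary object (`id`, `turns`, `isAuthError`, `logText`) is hypothetical and would have to be mapped onto whatever the audit tooling actually emits.

```javascript
const SESSION_ERROR =
  "Session was not created with authentication info or custom provider";

// `run` is a hypothetical summary: { id, turns, isAuthError, logText }.
function meetsSuccessCriteria(run) {
  const hasTurns = run.turns > 0; // criterion 1, first half
  const noSessionError = !run.logText.includes(SESSION_ERROR); // criterion 1, second half
  const noAuthDeath = !(run.isAuthError && run.turns === 0); // criterion 2
  return hasTurns && noSessionError && noAuthDeath;
}

// Summarize a batch of runs, e.g. everything scheduled in a 24-48h window.
function summarize(runs) {
  const failing = runs.filter((r) => !meetsSuccessCriteria(r)).map((r) => r.id);
  return { total: runs.length, failing, ok: failing.length === 0 };
}
```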

Evidence

Divergence: failed (SDK-driver) vs successful (direct) — same window, same token health-check
FAILED §26909254135 (Daily Safe Output Integrator):
  [health-check] ✓ COPILOT_GITHUB_TOKEN is placeholder value (correct)   # offline+BYOK, by design
  [copilot-harness] copilot-sdk mode active: generated per-run COPILOT_CONNECTION_TOKEN
  [copilot-harness] copilot-sdk driver mode: starting sidecar command=/usr/local/bin/copilot
  [copilot-sdk-driver] [sdk-driver] session created: sessionId=401a2cce-...
  [copilot-sdk-driver] [sdk-driver] sending prompt...
  [Error: Execution failed: Error: Session was not created with authentication info or custom provider]
  [copilot-harness] attempt 1 failed: ... isAuthError=true ... retriesRemaining=3
  [copilot-harness] attempt 1: no authentication information found — not retrying (...absent or invalid)
  [copilot-harness] done: exitCode=1 totalDuration=1s

SUCCESS §26908784431 (Auto-Triage Issues):
  [health-check] ✓ COPILOT_GITHUB_TOKEN is placeholder value (correct)   # identical
  [copilot-harness] attempt 1: spawning: /usr/local/bin/copilot --add-dir ...   # DIRECT, no sidecar
  [copilot-harness] success on attempt 1: totalDuration=40s

SDK-driver vs direct mode confirmed across the cluster
26909254135 FAIL  mode=SDK-DRIVER  sessionAuthErr=1
26908785711 FAIL  mode=SDK-DRIVER  sessionAuthErr=1
26906100801 FAIL  mode=SDK-DRIVER  sessionAuthErr=1
26901717191 FAIL  mode=SDK-DRIVER  sessionAuthErr=1
26903386460 FAIL  mode=SDK-DRIVER  sessionAuthErr=1
26908784431 OK    mode=DIRECT      sessionAuthErr=0
26908146923 OK    mode=DIRECT      sessionAuthErr=0
26908410399 OK    mode=DIRECT      sessionAuthErr=0
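
For reference, the eight summary lines above can be tallied by mode and outcome with a few lines of JavaScript. The line format is taken from the listing; the helper names `parseLine` and `tally` are illustrative only.

```javascript
const lines = `26909254135 FAIL  mode=SDK-DRIVER  sessionAuthErr=1
26908785711 FAIL  mode=SDK-DRIVER  sessionAuthErr=1
26906100801 FAIL  mode=SDK-DRIVER  sessionAuthErr=1
26901717191 FAIL  mode=SDK-DRIVER  sessionAuthErr=1
26903386460 FAIL  mode=SDK-DRIVER  sessionAuthErr=1
26908784431 OK    mode=DIRECT      sessionAuthErr=0
26908146923 OK    mode=DIRECT      sessionAuthErr=0
26908410399 OK    mode=DIRECT      sessionAuthErr=0`.split("\n");

// Parse one summary line into a record; returns null if it does not match.
function parseLine(line) {
  const m = /^(\d+)\s+(FAIL|OK)\s+mode=(\S+)\s+sessionAuthErr=(\d)$/.exec(line.trim());
  if (!m) return null;
  return { runId: m[1], outcome: m[2], mode: m[3], sessionAuthErr: Number(m[4]) };
}

// Count records per "mode/outcome" pair.
function tally(rows) {
  const counts = {};
  for (const r of rows) {
    const key = `${r.mode}/${r.outcome}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const rows = lines.map(parseLine).filter(Boolean);
console.log(tally(rows)); // { 'SDK-DRIVER/FAIL': 5, 'DIRECT/OK': 3 }
```

No SDK-driver run in the sample passed and no direct run failed, which is the correlation the issue relies on.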

PR #36679 merged 17:29:43 UTC → first cluster failure 17:30:06 UTC.

audit-diff — clean pre-turn abort

Failed SDK-driver runs show agent job failure with 0 turns, token_usage=None, errors=0, empty agent_output.json, missing_tools=None, mcp_failures=None — a pre-turn abort with no model interaction. Successful direct runs in the same cohort complete in 40s–1m with normal turn/token counts and read_only posture. The delta is execution mode, not workload.

Existing-issue correlation

References: §26909254135 · §26901717191 · §26908784431
Related to #36656

Generated by 🔍 [aw] Failure Investigator (6h) · opus48 17.1M ·

  • expires on Jun 10, 2026, 8:16 PM UTC

Investigation update — 2026-06-04 (6h failure sweep)

Fix the copilot-sdk driver session bootstrap — done: PR #36769 (merged 2026-06-04 01:29:18Z) resolves this P0 at the root. The driver now resolves the BYOK custom provider before sending the prompt, eliminating Session was not created with authentication info or custom provider.

Verification status: RESOLVED — pending one clean scheduled cycle

Every cluster-A failure in the last 6h, including the most recent, checked out pre-fix code — confirming the fix, not contradicting it:

| Issue | Workflow | Run | Started (UTC) | Checked-out SHA |
| --- | --- | --- | --- | --- |
| #36745 | Plan Command | §26917699243 | 22:46 | pre-fix |
| #36765 | Daily Model Inventory Checker | §26921759874 | 00:24 | pre-fix |
| #36768 | Daily Sentrux Report | §26921808538 | 00:26 | pre-fix |
| #36784 | PR Triage Agent | §26924221485 | 01:28:49 | 318e58a2 (direct parent of fix 4757ec0) |

The last failure (#36784) started 29 seconds before the fix merged — it is the regression's tail, not a fix gap.
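
The timing claims in this issue are easy to re-derive from the timestamps it quotes. The sketch below uses only those timestamps; the helper name `secondsBetween` is illustrative.

```javascript
// Seconds from ISO-8601 UTC timestamp `a` to timestamp `b`.
function secondsBetween(a, b) {
  return (Date.parse(b) - Date.parse(a)) / 1000;
}

// Timestamps quoted in this issue.
const regressionMerge = "2026-06-03T17:29:43Z"; // PR #36679 merged
const firstFailure = "2026-06-03T17:30:06Z"; // first cluster-A failure
const lastFailureStart = "2026-06-04T01:28:49Z"; // run §26924221485 started
const fixMerge = "2026-06-04T01:29:18Z"; // PR #36769 merged

console.log(secondsBetween(regressionMerge, firstFailure)); // 23
console.log(secondsBetween(lastFailureStart, fixMerge)); // 29
```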

Correction to remediation step 4 (no .lock.yml recompile needed)

Lock files invoke the driver by path (${RUNNER_TEMP}/gh-aw/actions/copilot_sdk_driver.cjs), which actions/checkout + ./actions/setup populate fresh from actions/setup/js/ at runtime. Verified: 0 of 62 copilot_sdk_driver lock files inline the driver. The merged source fix is therefore live for the next scheduled run of every affected workflow with no recompile/redeploy.
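
A check of this kind could be sketched as follows. The heuristic is an assumption, not the method actually used for the verification above: a lock file that mentions `copilot_sdk_driver` and contains the runtime path is counted as invoking the driver by path, and anything else that mentions the driver is flagged for manual review. The function name `auditLockFiles` is hypothetical.

```javascript
// Runtime path quoted in this issue (a plain string, not a template literal).
const DRIVER_PATH = "${RUNNER_TEMP}/gh-aw/actions/copilot_sdk_driver.cjs";

// `files` maps a lock file name to its text content.
function auditLockFiles(files) {
  const byPath = [];
  const needsReview = [];
  for (const [name, text] of Object.entries(files)) {
    if (!text.includes("copilot_sdk_driver")) continue; // not an SDK-driver workflow
    (text.includes(DRIVER_PATH) ? byPath : needsReview).push(name);
  }
  return { byPath, needsReview };
}
```

In a real repository the `files` map would be built by reading the `.lock.yml` files from disk; an empty `needsReview` list corresponds to the "0 of 62" result reported here.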

Note on remediation step 3 (classifier/retry)

PR #36769 deliberately fails fast (process.exit(1)) when the BYOK provider cannot be resolved, with a clear diagnostic, rather than adding a bounded retry. This is a reasonable design choice (retrying a misconfigured env is pointless), so no separate classifier sub-issue is warranted.
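
The fail-fast pattern described here can be illustrated with a short sketch. It is not the code from PR #36769: the environment variable names `EXAMPLE_BYOK_BASE_URL` and `EXAMPLE_BYOK_API_KEY`, the function names, and the diagnostic wording are all hypothetical, since the driver's real configuration keys are not shown in this issue.

```javascript
// Hypothetical env var names; the real driver's keys are not shown in this issue.
function resolveProvider(env) {
  const baseUrl = env.EXAMPLE_BYOK_BASE_URL;
  const apiKey = env.EXAMPLE_BYOK_API_KEY;
  const missing = [];
  if (!baseUrl) missing.push("EXAMPLE_BYOK_BASE_URL");
  if (!apiKey) missing.push("EXAMPLE_BYOK_API_KEY");
  if (missing.length > 0) {
    throw new Error(`BYOK provider cannot be resolved; missing: ${missing.join(", ")}`);
  }
  return { baseUrl, apiKey };
}

// Resolve the provider before any prompt is sent; exit with a clear
// diagnostic instead of retrying a misconfigured environment.
function resolveProviderOrExit(env, exit = process.exit) {
  try {
    return resolveProvider(env);
  } catch (err) {
    console.error(`[sdk-driver] ${err.message}`);
    exit(1);
    return undefined; // only reached when `exit` is a test stub
  }
}
```

Passing `exit` as a parameter keeps the function testable without terminating the process; in the driver itself the default `process.exit` would apply.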

Consolidation

The four auto-filed per-failure duplicates of this regression — #36745, #36765, #36768, #36784 — are being closed as duplicates of this issue / fixed by #36769. This issue remains the single tracking point.

Recommended next step: close this issue once the next scheduled cycle of the affected copilot-sdk workflows completes with > 0 turns and no recurrence of the session-bootstrap error.

Out of scope (distinct clusters, already tracked separately)

References: §26924221485, §26921808538, §26917699243

Generated by 🔍 [aw] Failure Investigator (6h) · opus48 9.3M ·
