[aw-failures] [aw-fix] PR Code Quality Reviewer intermittently red — copilot-sdk-driver hits 870s session.idle timeout, burns ~1 [Content truncated due to length] #43392

Description

@github-actions

Classify the 870s session.idle timeout as a real engine timeout, and stop hard-redding sessions that finished with output. The timeout currently reds PR Code Quality Reviewer silently, after ~175 AIC has already been spent.

This is a new, untracked P1 signature identified in the last 6h window. Grouped under the current open failure-tracking parent #43031; the root cause is unrelated to that report's BYOK-400 cluster.

Problem statement

PR Code Quality Reviewer fails intermittently (mixed pass/fail in the same hour) when the copilot-sdk-driver hits its idle timeout. In the audited run the driver logged:

[sdk-driver] error: Timeout after 870000ms waiting for session.idle
[sdk-driver] warning: SDK idle-timeout with collected output and no pending tool calls — treating as completed
[sdk-driver] session completed: hasOutput=true durationMs=870248

Despite hasOutput=true and "treating as completed", the Execute GitHub Copilot CLI step exits 1 and the run hard-reds. The conclusion job then classifies every known error class as false — including Agentic engine timeout: false — so the failure surfaces as an unclassified hard-red even though the driver plainly reported a timeout. ~175.4 AIC and ~77.5k output tokens are consumed before the red.

Affected workflow & run IDs

  • Workflow: PR Code Quality Reviewer (.github/workflows/pr-code-quality-reviewer.lock.yml)
  • Failing step: Execute GitHub Copilot CLI (agent job)
  • Representative (audited, clear 870s timeout): §28703868952
  • Same-signature, pattern-matched (not individually audited): §28703807560, §28703040649, §28701059943, §28700970833
  • Comparator (nearest success, same workflow): §28702852738 — passed
  • Intermittent: 5 failures interleaved with passes in the same window → persistent pattern, not a one-off.

Probable root cause

Large-diff reviews run the agent session past the driver's 870000ms (14.5 min) session.idle timeout. Two defects compound:

  1. Misclassification: the conclusion classifier's agentic_engine_timeout heuristic does not match the Timeout after <n>ms waiting for session.idle driver message, so timeouts are reported as unclassified reds — corrupting red/false-red accounting and skipping timeout-specific handling.
  2. Exit code vs. graceful completion: the driver says it collected output and "treats as completed", yet the step still exits 1 — so a session that produced usable output is thrown away as a hard failure.

An audit-diff of the failed run against the nearest success (§28702852738) shows no firewall, domain, or tooling regression (0 new domains, 0 status changes). This is purely a duration/idle-timeout issue, not a network or MCP problem.
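
The misclassification in defect 1 comes down to a missing signature match. Below is a minimal sketch of such a match, assuming the classifier can scan the agent step's log text. The helper name classify_engine_timeout and the shape of the returned dict are hypothetical; only the message format and the agentic_engine_timeout name come from this report.

```python
import re

# Signature copied from the driver log quoted in this issue; the captured
# group is the timeout value in milliseconds.
IDLE_TIMEOUT_RE = re.compile(r"Timeout after (\d+)ms waiting for session\.idle")


def classify_engine_timeout(log_text: str) -> dict:
    """Hypothetical helper: flag a run as an engine timeout from its log text."""
    match = IDLE_TIMEOUT_RE.search(log_text)
    return {
        "agentic_engine_timeout": match is not None,
        # Timeout reported by the driver, or None when the signature is absent.
        "timeout_ms": int(match.group(1)) if match else None,
    }
```

Matching on the message shape rather than the literal 870000 keeps the check valid if the idle timeout is later tuned.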

Proposed remediation

  1. Add the Timeout after <n>ms waiting for session.idle driver signature to the timeout classifier so these runs set agentic_engine_timeout=true (correct labeling, proper retry/suppression, accurate accounting).
  2. When the driver reports hasOutput=true and "treating as completed" on idle-timeout, propagate a graceful exit for the step rather than exit 1 — don't hard-red a session that produced output.
  3. Tune the idle timeout (or add streaming keepalives) for large-PR reviews so long-but-healthy reviews don't trip the idle threshold.
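
Remediation 2 could be sketched as a small decision over the driver's own log lines. This is an illustration under stated assumptions, not the driver's actual code: resolve_step_exit_code is a hypothetical name, and mapping a graceful completion to exit code 0 is an assumption about what the step should return.

```python
import re

# Both patterns are taken from the log lines quoted in this issue.
TREATED_COMPLETED_RE = re.compile(
    r"SDK idle-timeout with collected output.*treating as completed"
)
SESSION_COMPLETED_RE = re.compile(r"session completed: hasOutput=(true|false)")


def resolve_step_exit_code(log_text: str, driver_exit_code: int) -> int:
    """Hypothetical helper: downgrade a failure only for graceful idle-timeouts."""
    if driver_exit_code == 0:
        return 0
    completed = SESSION_COMPLETED_RE.search(log_text)
    has_output = completed is not None and completed.group(1) == "true"
    graceful = TREATED_COMPLETED_RE.search(log_text) is not None
    # Assumption: a session the driver treated as completed, with output,
    # should not hard-red the step. Every other failure keeps its exit code.
    return 0 if (graceful and has_output) else driver_exit_code
```

Under this sketch the exit code and the timeout classification stay independent, so a run that exits gracefully can still be labeled agentic_engine_timeout=true.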

Success criteria / verification

  • Runs hitting the SDK idle-timeout are classified agentic_engine_timeout=true in the conclusion outputs.
  • PR Code Quality Reviewer no longer emits unclassified hard-reds when the session gracefully completes with output.
  • Over the next 6h window, the session.idle failure signature for PR Code Quality Reviewer drops (runs pass or are correctly classified as timeouts).

References: §28703868952, §28702852738, §28703040649
Related to #43031

Generated by 🔍 [aw] Failure Investigator (6h) · 229.2 AIC · ⌖ 38.4 AIC · ⊞ 5.2K ·

  • expires on Jul 11, 2026, 5:23 AM UTC-08:00
