[aw-failures] [aw-fix] PR Code Quality Reviewer intermittently red — copilot-sdk-driver hits 870s session.idle timeout, burns ~1
[Content truncated due to length]

### Classify the 870s `session.idle` timeout as a real engine timeout (and stop hard-redding sessions that finished with output) — it silently reds PR Code Quality Reviewer after ~175 AIC is already spent.

This is a **new, untracked P1 signature** identified in the last 6h window. Grouped under the current open failure-tracking parent #43031; the root cause is unrelated to that report's BYOK-400 cluster.

### Problem statement

`PR Code Quality Reviewer` fails intermittently (mixed pass/fail in the same hour) when the `copilot-sdk-driver` hits its idle timeout. In the audited run the driver logged:

```
[sdk-driver] error: Timeout after 870000ms waiting for session.idle
[sdk-driver] warning: SDK idle-timeout with collected output and no pending tool calls — treating as completed
[sdk-driver] session completed: hasOutput=true durationMs=870248
```

Despite `hasOutput=true` and "treating as completed", the `Execute GitHub Copilot CLI` step exits 1 and the run hard-reds. The `conclusion` job then classifies every known error class as `false` — including `Agentic engine timeout: false` — so the failure surfaces as an **unclassified hard-red** even though the driver plainly reported a timeout. ~**175.4 AIC** and ~77.5k output tokens are consumed before the red.

### Affected workflow & run IDs

- **Workflow:** PR Code Quality Reviewer (`.github/workflows/pr-code-quality-reviewer.lock.yml`)
- **Failing step:** `Execute GitHub Copilot CLI` (agent job)
- **Representative (audited, clear 870s timeout):** [§28703868952](https://github.com/github/gh-aw/actions/runs/28703868952)
- **Same-signature, pattern-matched (not individually audited):** [§28703807560](https://github.com/github/gh-aw/actions/runs/28703807560), [§28703040649](https://github.com/github/gh-aw/actions/runs/28703040649), [§28701059943](https://github.com/github/gh-aw/actions/runs/28701059943), [§28700970833](https://github.com/github/gh-aw/actions/runs/28700970833)
- **Comparator (nearest success, same workflow):** [§28702852738](https://github.com/github/gh-aw/actions/runs/28702852738) — passed
- **Intermittent:** 5 failures interleaved with passes in the same window → persistent pattern, not a one-off.

### Probable root cause

Large-diff reviews run the agent session past the driver's **870000ms (14.5m)** `session.idle` timeout. Two defects compound:
1. **Misclassification:** the conclusion classifier's `agentic_engine_timeout` heuristic does not match the `Timeout after <n>ms waiting for session.idle` driver message, so timeouts are reported as unclassified reds — corrupting red/false-red accounting and skipping timeout-specific handling.
2. **Exit code vs. graceful completion:** the driver says it collected output and "treats as completed", yet the step still exits 1 — so a session that produced usable output is thrown away as a hard failure.

`audit-diff` of the failed run vs the nearest success (28702852738) shows **no firewall/domain/tooling regression** (0 new domains, 0 status changes) — this is purely a duration/idle-timeout issue, not a network or MCP problem.

### Proposed remediation

1. Add the `Timeout after <n>ms waiting for session.idle` driver signature to the timeout classifier so these runs set `agentic_engine_timeout=true` (correct labeling, proper retry/suppression, accurate accounting).
2. When the driver reports `hasOutput=true` and "treating as completed" on idle-timeout, propagate a graceful exit for the step rather than exit 1 — don't hard-red a session that produced output.
3. Tune the idle timeout (or add streaming keepalives) for large-PR reviews so long-but-healthy reviews don't trip the idle threshold.

### Success criteria / verification

- Runs hitting the SDK idle-timeout are classified `agentic_engine_timeout=true` in the `conclusion` outputs.
- PR Code Quality Reviewer no longer emits **unclassified** hard-reds when the session gracefully completes with output.
- Over the next 6h window, the `session.idle` failure signature for PR Code Quality Reviewer drops (runs pass or are correctly classified as timeouts).

**References:** [§28703868952](https://github.com/github/gh-aw/actions/runs/28703868952), [§28702852738](https://github.com/github/gh-aw/actions/runs/28702852738), [§28703040649](https://github.com/github/gh-aw/actions/runs/28703040649)
Related to #43031







> Generated by [🔍 [aw] Failure Investigator (6h)](https://github.com/github/gh-aw/actions/runs/28707213727) · 229.2 AIC · ⌖ 38.4 AIC · ⊞ 5.2K · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Faw-failure-investigator%22&type=issues)
> - [x] expires  on Jul 11, 2026, 5:23 AM UTC-08:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[aw-failures] [aw-fix] PR Code Quality Reviewer intermittently red — copilot-sdk-driver hits 870s session.idle timeout, burns ~1 [Content truncated due to length] #43392

Classify the 870s `session.idle` timeout as a real engine timeout (and stop hard-redding sessions that finished with output) — it silently reds PR Code Quality Reviewer after ~175 AIC is already spent.

Problem statement

Affected workflow & run IDs

Probable root cause

Proposed remediation

Success criteria / verification

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[aw-failures] [aw-fix] PR Code Quality Reviewer intermittently red — copilot-sdk-driver hits 870s session.idle timeout, burns ~1 [Content truncated due to length] #43392

Description

Classify the 870s session.idle timeout as a real engine timeout (and stop hard-redding sessions that finished with output) — it silently reds PR Code Quality Reviewer after ~175 AIC is already spent.

Problem statement

Affected workflow & run IDs

Probable root cause

Proposed remediation

Success criteria / verification

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Classify the 870s `session.idle` timeout as a real engine timeout (and stop hard-redding sessions that finished with output) — it silently reds PR Code Quality Reviewer after ~175 AIC is already spent.