Improve PR Sous Chef engine-failure context for AWF startup crashes by Copilot · Pull Request #34524 · github/gh-aw

Copilot · 2026-05-24T20:02:46Z

PR Sous Chef failures were being reported as a generic Copilot termination with low-signal output (stdout/stderr: undefined), which masked the actual AWF startup cause. This change makes failure-context extraction recognize AWF startup error patterns so that generated issues surface actionable root-cause lines.

Failure-context parsing updates
- Extend buildEngineFailureContext() to treat AWF bracketed error logs as first-class engine errors:
  - ^\[ERROR\]\s*(.+)$
- Capture AWF docker-compose dependency startup failures emitted without log-level prefix:
  - ^dependency failed to start:\s*(.+)$
- Result: generated issue body now includes concrete startup failure details (e.g., unhealthy awf-squid) instead of fallback noise.

Regression coverage for PR Sous Chef failure shape
- Add focused test case in handle_agent_failure.test.cjs with real failure-style lines:
  - dependency failed to start: container awf-squid is unhealthy
  - [ERROR] Failed to start containers: ... docker compose up -d --pull never
- Assert output includes those root-cause messages and excludes fallback-only fields (stdout/stderr: undefined).

// New extraction patterns in engine failure context parsing
const bracketErrorMatch = line.match(/^\[ERROR\]\s*(.+)$/);
const awfDependencyFailureMatch = line.match(/^dependency failed to start:\s*(.+)$/);
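
As a rough illustration of how these two patterns behave on the failure lines quoted above, here is a minimal standalone sketch. `extractAwfStartupErrors` is a hypothetical helper, not the real `buildEngineFailureContext()`; only the two regexes and the `errorMessages.add(...)` calls mirror the diff, and treating `errorMessages` as a `Set` is an assumption.

```javascript
// Hypothetical helper for illustration only; the real logic lives inside
// buildEngineFailureContext() and handles many more patterns.
function extractAwfStartupErrors(lines) {
  // Assumption: errorMessages is a Set, so repeated log lines collapse to one entry.
  const errorMessages = new Set();
  for (const line of lines) {
    // AWF runtime logs: "[ERROR] <message>" (prefix is stripped)
    const bracketErrorMatch = line.match(/^\[ERROR\]\s*(.+)$/);
    if (bracketErrorMatch) {
      errorMessages.add(bracketErrorMatch[1].trim());
      continue;
    }
    // docker-compose dependency failures carry no log-level prefix (prefix is kept)
    const awfDependencyFailureMatch = line.match(/^dependency failed to start:\s*(.+)$/);
    if (awfDependencyFailureMatch) {
      errorMessages.add(`dependency failed to start: ${awfDependencyFailureMatch[1].trim()}`);
    }
  }
  return [...errorMessages];
}

// Failure-style lines taken from the regression test in this PR.
const sampleLines = [
  " Container awf-squid  Error",
  "dependency failed to start: container awf-squid is unhealthy",
  "[ERROR] Failed to start containers: Error: Command failed with exit code 1: docker compose up -d --pull never",
];

console.log(extractAwfStartupErrors(sampleLines));
```

The first sample line matches neither pattern because it starts with a space and has no recognised prefix, so only the two root-cause lines are collected.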

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Improves PR Sous Chef’s engine-failure context extraction so AWF startup crashes surface actionable root-cause lines in generated issue bodies instead of low-signal fallback output.

Changes:

- Extend buildEngineFailureContext() to capture AWF-style bracketed error logs ([ERROR] ...) and AWF docker-compose dependency startup failures (dependency failed to start: ...).
- Add a regression test to ensure the extracted failure context includes the AWF root-cause messages and does not include fallback-only noise like stdout/stderr: undefined.

Show a summary per file

| File | Description |
| --- | --- |
| actions/setup/js/handle_agent_failure.cjs | Adds new AWF-specific parsing patterns to engine failure context extraction. |
| actions/setup/js/handle_agent_failure.test.cjs | Adds a focused test case covering AWF startup failure log shapes and expected extracted context. |

Copilot's findings

Comments suppressed due to low confidence (1)

actions/setup/js/handle_agent_failure.cjs:1594

The function header comment says it matches known patterns like ERROR:, Error:, Fatal:, panic:, etc., but it now also matches AWF-specific patterns ([ERROR] ... and dependency failed to start: ...). Please update the doc comment to include these patterns so future changes don’t accidentally regress them.

      // AWF docker-compose dependency failures surface this root-cause line without
      // an explicit log-level prefix.
      const awfDependencyFailureMatch = line.match(/^dependency failed to start:\s*(.+)$/);
      if (awfDependencyFailureMatch) {
        errorMessages.add(`dependency failed to start: ${awfDependencyFailureMatch[1].trim()}`);
        continue;
      }

Files reviewed: 2/2 changed files
Comments generated: 2

+      // AWF runtime logs: "[ERROR] <message>"
+      const bracketErrorMatch = line.match(/^\[ERROR\]\s*(.+)$/);
+      if (bracketErrorMatch) {
+        errorMessages.add(bracketErrorMatch[1].trim());


+    it("extracts AWF startup errors from bracketed logs", () => {
+      const lines = [
+        " Container awf-squid  Error",
+        "dependency failed to start: container awf-squid is unhealthy",
+        "[ERROR] Failed to start containers: Error: Command failed with exit code 1: docker compose up -d --pull never",


github-actions · 2026-05-24T20:19:32Z

@copilot review all comments

Please address the unresolved review feedback on actions/setup/js/handle_agent_failure.cjs, then re-request review and refresh the branch if needed.

Generated by 👨‍🍳 PR Sous Chef · gpt54 2M · ◷

github-actions · 2026-05-24T20:19:34Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-05-24T20:19:37Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #34524 does not have the 'implementation' label and has 0 new lines of code in default business logic directories (well under the 100-line threshold).

github-actions · 2026-05-24T20:19:42Z

🧪 Test Quality Sentinel completed test quality analysis.

github-actions · 2026-05-24T20:19:45Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 701.5K

github-actions · 2026-05-24T20:23:08Z

      }

+      // AWF docker-compose dependency failures surface this root-cause line without
+      // an explicit log-level prefix.


Redundant string reconstruction: the template literal re-assembles the full original line unnecessarily.

💡 Suggested simplification

Instead of:

errorMessages.add(`dependency failed to start: ${awfDependencyFailureMatch[1].trim()}`);

Use the capture group directly or just trim the original line:

errorMessages.add(line.trim());
// or
errorMessages.add(`dependency failed to start: ${awfDependencyFailureMatch[1].trim()}`);

Actually the current form is fine, but note the inconsistency: the [ERROR] handler strips its prefix (bracketErrorMatch[1].trim()), while this handler preserves it (dependency failed to start: ...). If both are intentional, a brief comment on why [ERROR] is stripped would help readers understand the difference.

github-actions · 2026-05-24T20:23:47Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 100/100 — Excellent

Analyzed 1 test: 1 design, 0 implementation, 0 guideline violations.

📊 Metrics & Test Classification (1 test analyzed)

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 1 |
| ✅ Design tests (behavioral contracts) | 1 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 1 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | No |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| `extracts AWF startup errors from bracketed logs` | `actions/setup/js/handle_agent_failure.test.cjs:1208` | ✅ Design | None |

Language Support

Tests analyzed:

- 🐹 Go (*_test.go): 0 tests
- 🟨 JavaScript (*.test.cjs, *.test.js): 1 test (vitest)

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%).

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

- Assert on observable outputs, return values, or state changes
- Cover error paths and boundary conditions
- Would catch a behavioral regression if deleted
- Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

- Assert on internal function calls (mocking internals)
- Only test the happy path with typical inputs
- Break during legitimate refactoring even when behavior is correct
- Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1M · ◷

github-actions

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%).

github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — clean, well-scoped bug fix. Approving with minor suggestions.

📋 Key Themes & Highlights

Key Themes

- Pattern breadth: The [ERROR] prefix is common across many tools, so a short comment confirming it's AWF-specific in this context is worthwhile.
- Test assertion durability: not.toContain("Last agent output") works today but is fragile if the heading changes.
- Minor reconstruction quirk: The dependency match rebuilds the prefix string rather than using match[0], which is safe but slightly roundabout.

Positive Highlights

✅ Purely additive — zero risk of breaking existing extraction paths
✅ Regression test covers both the positive (root-cause lines included) and negative (fallback noise excluded) cases
✅ Consistent with existing pattern-matching structure in buildEngineFailureContext()
✅ PR description is clear and includes concrete before/after examples

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.2M

github-actions · 2026-05-24T20:24:38Z

        continue;
      }

+      // AWF runtime logs: "[ERROR] <message>"


[/diagnose] The [ERROR] pattern is broad: any log line from any tool or library that uses an [ERROR] prefix (e.g., npm, Docker BuildKit, other CLI tools) will be captured as an engine failure context line.

💡 Consider making the pattern more specific

If AWF startup logs have a consistent prefix (e.g., [awf] or [compose]), tightening the pattern would prevent false positives from third-party tooling:

// More specific: only AWF logs
const bracketErrorMatch = line.match(/^\[ERROR\]\s*(.+)$/);
// Could be guarded by checking surrounding context or using a more specific prefix

That said, if the [ERROR] prefix is reliably AWF-specific in practice (e.g., only gh-aw's own logger emits it at startup), this is fine as-is. Worth a short comment confirming the assumption.

github-actions · 2026-05-24T20:24:38Z

+      // AWF docker-compose dependency failures surface this root-cause line without
+      // an explicit log-level prefix.
+      const awfDependencyFailureMatch = line.match(/^dependency failed to start:\s*(.+)$/);
+      if (awfDependencyFailureMatch) {


[/diagnose] Minor: the dependency failure message is reconstructed from the capture group rather than using line.trim() or match[0]. This normalises whitespace between the prefix and container name, which is intentional but subtle.

💡 Simpler alternative

Using match[0] (the full matched string) or line.trim() avoids the reconstruction and is more obviously correct:

// Option A: use the full match (the ^ anchor rules out leading whitespace; trailing whitespace is kept)
errorMessages.add(awfDependencyFailureMatch[0]);
// Option B: trim the original line
errorMessages.add(line.trim());

The current approach is safe, but a comment noting the whitespace normalisation intent would help the next reader.
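
To make the whitespace difference between these options concrete, the following sketch runs all three on one deliberately messy input. The input line is made up for illustration (extra spaces after the colon and at the end); the regex is the one from the diff.

```javascript
const dependencyPattern = /^dependency failed to start:\s*(.+)$/;

// Hypothetical input with extra spaces after the colon and trailing spaces.
const rawLine = "dependency failed to start:    container awf-squid is unhealthy  ";
const dependencyMatch = rawLine.match(dependencyPattern);

// Current approach: rebuild the prefix, so spacing after the colon collapses
// to a single space and trailing whitespace is dropped.
const reconstructed = `dependency failed to start: ${dependencyMatch[1].trim()}`;

// Option A: the full match. `.+` also consumes trailing spaces, so nothing is trimmed.
const fullMatch = dependencyMatch[0];

// Option B: trims both ends but keeps the extra spaces after the colon.
const trimmedLine = rawLine.trim();

// JSON.stringify makes the surrounding whitespace visible in the output.
console.log(JSON.stringify({ reconstructed, fullMatch, trimmedLine }, null, 2));
```

All three produce different strings for this input, which matters if `errorMessages` deduplicates entries: only the reconstructed form maps differently spaced variants of the same failure to one entry.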

github-actions · 2026-05-24T20:24:38Z

+      expect(result).not.toContain("stderr: undefined");
+    });
+
    it("detects Fatal: prefix pattern", () => {


[/tdd] not.toContain("Last agent output") ties the test to a specific heading string — if that section heading changes the assertion silently becomes weaker without failing.

💡 Consider a more structural assertion

If the intent is to assert the fallback path was not taken, a positive assertion on the structured output (e.g. that only the Engine Failure section is present and not the fallback block) is more durable:

// Assert the fallback section is absent by checking its structural marker
expect(result).not.toMatch(/Last agent output|Fallback/i);
// or: assert the result does NOT contain the fallback sentinel string you own

Alternatively, a short comment explaining what "Last agent output" refers to (the fallback section heading) would make the intent clear for future maintainers.
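
One possible way to keep the fallback markers in a single owned place is sketched below. `FALLBACK_MARKERS` and `findFallbackMarkers` are hypothetical names, and the marker strings are assumptions drawn from this thread ("Last agent output", "stderr: undefined", and by analogy "stdout: undefined"); they are not confirmed to match the strings the real handler emits.

```javascript
// Hypothetical, test-side list of strings that only appear on the fallback path.
// Assumption: these strings match what the handler actually renders.
const FALLBACK_MARKERS = ["Last agent output", "stdout: undefined", "stderr: undefined"];

// Returns every fallback marker found in the rendered failure context.
function findFallbackMarkers(result) {
  return FALLBACK_MARKERS.filter(marker => result.includes(marker));
}

// Made-up rendered outputs, used only to exercise the helper.
const goodResult = "Engine Failure\n- dependency failed to start: container awf-squid is unhealthy";
const noisyResult = "Last agent output\nstdout: undefined\nstderr: undefined";

console.log(findFallbackMarkers(goodResult));
console.log(findFallbackMarkers(noisyResult));
```

In a vitest case this could be used as `expect(findFallbackMarkers(result)).toEqual([])`, so that a failure reports which marker leaked instead of a bare substring mismatch.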

pelikhan · 2026-05-24T20:27:43Z

@copilot review all comments and review comments

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-24T20:37:24Z

> @copilot review all comments and review comments

Addressed. I reviewed the feedback and pushed e3bc7d9 with the requested updates:

- narrowed bracketed [ERROR] extraction to actionable AWF startup failures (Failed to start... / dependency failed to start...) so infra wrapper [ERROR] noise stays excluded
- renamed the AWF startup test to accurately reflect both covered patterns
- added infra-only regression input including [ERROR] Command completed with exit code: 1 to verify it is not surfaced as engine error context

Validated with npm run lint:cjs and npm run test:js -- handle_agent_failure.test.cjs, and final parallel validation is clean.
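
The narrowing described in this reply might look roughly like the sketch below. The regex actually pushed in e3bc7d9 is not shown in this thread, so `awfStartupBracketError` and `extractBracketStartupError` are assumptions that only illustrate restricting [ERROR] lines to the two startup phrases named above.

```javascript
// Assumed pattern: accept "[ERROR] ..." only when the message begins with one of
// the two AWF startup phrases mentioned in the reply. Not the committed regex.
const awfStartupBracketError = /^\[ERROR\]\s*((?:Failed to start|dependency failed to start)\b.*)$/;

// Hypothetical helper: returns the message without the [ERROR] prefix, or null.
function extractBracketStartupError(line) {
  const match = line.match(awfStartupBracketError);
  return match ? match[1].trim() : null;
}

// Startup failure from the regression test: should be surfaced.
console.log(
  extractBracketStartupError(
    "[ERROR] Failed to start containers: Error: Command failed with exit code 1: docker compose up -d --pull never"
  )
);

// Infra wrapper noise from the new regression input: should be ignored.
console.log(extractBracketStartupError("[ERROR] Command completed with exit code: 1"));
```

An allowlist of phrases trades recall for precision: new AWF startup messages would need to be added explicitly, but third-party [ERROR] lines no longer reach the issue body.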

github-actions · 2026-05-24T21:14:23Z

@copilot review all comments
Please address the unresolved review threads and re-request review when ready.

Generated by 👨‍🍳 PR Sous Chef · gpt54 6.4M · ◷

github-actions · 2026-05-24T21:14:24Z

Please summarize the remaining blockers and the next step to land this PR.

Generated by 👨‍🍳 PR Sous Chef · gpt54 6.4M · ◷

github-actions · 2026-05-24T21:42:13Z

@copilot Please summarize the remaining startup-crash blockers and next steps.

Generated by 👨‍🍳 PR Sous Chef · gpt54 4.2M · ◷

Initial plan

e3a3d8f

Copilot AI assigned Copilot and pelikhan May 24, 2026

Copilot started work on behalf of pelikhan May 24, 2026 20:02

Copilot AI linked an issue May 24, 2026 that may be closed by this pull request

[aw] PR Sous Chef failed #34523

Closed

Copilot AI and others added 2 commits May 24, 2026 20:10

Improve AWF startup error extraction in engine failure context

e3dc0c0

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Add missing stderr assertion for AWF startup failure test

441f86e

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title from "[WIP] Debug PR Sous Chef workflow failure" to "Improve PR Sous Chef engine-failure context for AWF startup crashes" May 24, 2026

Copilot AI requested a review from pelikhan May 24, 2026 20:11

Copilot finished work on behalf of pelikhan May 24, 2026 20:11

pelikhan approved these changes May 24, 2026

pelikhan marked this pull request as ready for review May 24, 2026 20:13

Copilot AI review requested due to automatic review settings May 24, 2026 20:13

Copilot started reviewing on behalf of pelikhan May 24, 2026 20:14

Copilot AI reviewed May 24, 2026

github-actions Bot mentioned this pull request May 24, 2026

[aw] No-Op Runs #34290

Open

github-actions Bot reviewed May 24, 2026

github-actions Bot approved these changes May 24, 2026

Copilot started work on behalf of pelikhan May 24, 2026 20:27

Refine AWF [ERROR] startup parsing and test coverage naming

e3bc7d9

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 24, 2026 20:37

Copilot AI requested a review from pelikhan May 24, 2026 20:37

pelikhan merged commit 6e79d8d into main May 24, 2026
12 checks passed

pelikhan deleted the copilot/debug-pr-sous-chef-failure branch May 24, 2026 22:59
