Improve PR Sous Chef engine-failure context for AWF startup crashes #34524

Merged
pelikhan merged 4 commits into main from copilot/debug-pr-sous-chef-failure on May 24, 2026

Copilot AI commented May 24, 2026

PR Sous Chef failures were being reported as generic Copilot termination with low-signal output (stdout/stderr: undefined), masking the actual AWF startup cause. This change makes failure context extraction recognize AWF startup error patterns so issues surface actionable root-cause lines.

  • Failure-context parsing updates

    • Extend buildEngineFailureContext() to treat AWF bracketed error logs as first-class engine errors:
      • ^\[ERROR\]\s*(.+)$
    • Capture AWF docker-compose dependency startup failures emitted without log-level prefix:
      • ^dependency failed to start:\s*(.+)$
    • Result: generated issue body now includes concrete startup failure details (e.g., unhealthy awf-squid) instead of fallback noise.
  • Regression coverage for PR Sous Chef failure shape

    • Add focused test case in handle_agent_failure.test.cjs with real failure-style lines:
      • dependency failed to start: container awf-squid is unhealthy
      • [ERROR] Failed to start containers: ... docker compose up -d --pull never
    • Assert output includes those root-cause messages and excludes fallback-only fields (stdout/stderr: undefined).
// New extraction patterns in engine failure context parsing
const bracketErrorMatch = line.match(/^\[ERROR\]\s*(.+)$/);
const awfDependencyFailureMatch = line.match(/^dependency failed to start:\s*(.+)$/);
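
As a rough illustration of how these two patterns behave on the failure shape described above, the sketch below runs them over an array of log lines and collects the matches. The helper name `extractAwfStartupErrors` and the standalone loop are assumptions made for this example; the full body of `buildEngineFailureContext()` is not shown in this PR conversation.

```javascript
// Hypothetical helper (not from the PR): applies the two patterns shown above
// to each log line and returns the collected root-cause messages.
function extractAwfStartupErrors(lines) {
  const errorMessages = new Set(); // a Set drops repeated messages
  for (const line of lines) {
    // AWF runtime logs: "[ERROR] <message>" (prefix is stripped)
    const bracketErrorMatch = line.match(/^\[ERROR\]\s*(.+)$/);
    if (bracketErrorMatch) {
      errorMessages.add(bracketErrorMatch[1].trim());
      continue;
    }
    // docker-compose dependency failures carry no log-level prefix
    const awfDependencyFailureMatch = line.match(/^dependency failed to start:\s*(.+)$/);
    if (awfDependencyFailureMatch) {
      errorMessages.add(`dependency failed to start: ${awfDependencyFailureMatch[1].trim()}`);
    }
  }
  return [...errorMessages];
}

// Failure-style lines taken from the regression test described in this PR.
const sampleLines = [
  " Container awf-squid Error",
  "dependency failed to start: container awf-squid is unhealthy",
  "[ERROR] Failed to start containers: Error: Command failed with exit code 1: docker compose up -d --pull never",
];
console.log(extractAwfStartupErrors(sampleLines));
```

The first sample line starts with a space and matches neither anchored pattern, so only the two root-cause lines are returned.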

Copilot AI linked an issue May 24, 2026 that may be closed by this pull request
Copilot AI and others added 2 commits May 24, 2026 20:10
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Debug PR Sous Chef workflow failure" to "Improve PR Sous Chef engine-failure context for AWF startup crashes" May 24, 2026
Copilot AI requested a review from pelikhan May 24, 2026 20:11
@pelikhan pelikhan marked this pull request as ready for review May 24, 2026 20:13
Copilot AI review requested due to automatic review settings May 24, 2026 20:13

Copilot AI left a comment

Pull request overview

Improves PR Sous Chef’s engine-failure context extraction so AWF startup crashes surface actionable root-cause lines in generated issue bodies instead of low-signal fallback output.

Changes:

  • Extend buildEngineFailureContext() to capture AWF-style bracketed error logs ([ERROR] ...) and AWF docker-compose dependency startup failures (dependency failed to start: ...).
  • Add a regression test to ensure the extracted failure context includes the AWF root-cause messages and does not include fallback-only noise like stdout/stderr: undefined.
| File | Description |
| --- | --- |
| actions/setup/js/handle_agent_failure.cjs | Adds new AWF-specific parsing patterns to engine failure context extraction. |
| actions/setup/js/handle_agent_failure.test.cjs | Adds a focused test case covering AWF startup failure log shapes and expected extracted context. |

Copilot's findings

Comments suppressed due to low confidence (1)

actions/setup/js/handle_agent_failure.cjs:1594

  • The function header comment says it matches known patterns like ERROR:, Error:, Fatal:, panic:, etc., but it now also matches AWF-specific patterns ([ERROR] ... and dependency failed to start: ...). Please update the doc comment to include these patterns so future changes don’t accidentally regress them.
      // AWF docker-compose dependency failures surface this root-cause line without
      // an explicit log-level prefix.
      const awfDependencyFailureMatch = line.match(/^dependency failed to start:\s*(.+)$/);
      if (awfDependencyFailureMatch) {
        errorMessages.add(`dependency failed to start: ${awfDependencyFailureMatch[1].trim()}`);
        continue;
      }
  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Comment on lines +1574 to +1577
// AWF runtime logs: "[ERROR] <message>"
const bracketErrorMatch = line.match(/^\[ERROR\]\s*(.+)$/);
if (bracketErrorMatch) {
  errorMessages.add(bracketErrorMatch[1].trim());
Comment on lines +1211 to +1215
it("extracts AWF startup errors from bracketed logs", () => {
  const lines = [
    " Container awf-squid Error",
    "dependency failed to start: container awf-squid is unhealthy",
    "[ERROR] Failed to start containers: Error: Command failed with exit code 1: docker compose up -d --pull never",
@github-actions

@copilot review all comments

Please address the unresolved review feedback on actions/setup/js/handle_agent_failure.cjs, then re-request review and refresh the branch if needed.

Generated by 👨‍🍳 PR Sous Chef · gpt54 2M ·

github-actions Bot commented May 24, 2026

PR Code Quality Reviewer completed the code quality review.

github-actions Bot commented May 24, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #34524 does not have the 'implementation' label and has 0 new lines of code in default business logic directories (well under the 100-line threshold).

github-actions Bot commented May 24, 2026

🧪 Test Quality Sentinel completed test quality analysis.

github-actions Bot commented May 24, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions github-actions Bot mentioned this pull request May 24, 2026

@github-actions github-actions Bot left a comment

🔎 Code quality review by PR Code Quality Reviewer · sonnet46 701.5K

}

// AWF docker-compose dependency failures surface this root-cause line without
// an explicit log-level prefix.

Redundant string reconstruction: the template literal re-assembles the full original line unnecessarily.

💡 Suggested simplification

Instead of:

errorMessages.add(`dependency failed to start: ${awfDependencyFailureMatch[1].trim()}`);

One alternative is to trim the original line:

errorMessages.add(line.trim());

That said, the current form is fine. Note the inconsistency, though: the [ERROR] handler strips its prefix (bracketErrorMatch[1].trim()), while this handler preserves it (dependency failed to start: ...). If both are intentional, a brief comment on why [ERROR] is stripped would help readers understand the difference.

@github-actions

🧪 Test Quality Sentinel Report

Test Quality Score: 100/100 — Excellent

Analyzed 1 test: 1 design, 0 implementation, 0 guideline violations.

📊 Metrics & Test Classification (1 test analyzed)
| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 1 |
| ✅ Design tests (behavioral contracts) | 1 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 1 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | No |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| extracts AWF startup errors from bracketed logs | actions/setup/js/handle_agent_failure.test.cjs:1208 | ✅ Design | None |

Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 0 tests
  • 🟨 JavaScript (*.test.cjs, *.test.js): 1 test (vitest)

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%).

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · sonnet46 1M ·

@github-actions github-actions Bot left a comment

✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%).

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /diagnose and /tdd — clean, well-scoped bug fix. Approving with minor suggestions.

📋 Key Themes & Highlights

Key Themes

  • Pattern breadth: The [ERROR] prefix is common across many tools — worth a short comment confirming it's AWF-specific in this context.
  • Test assertion durability: not.toContain("Last agent output") works today but is fragile if the heading changes.
  • Minor reconstruction quirk: The dependency match rebuilds the prefix string rather than using match[0] — safe but slightly roundabout.

Positive Highlights

  • ✅ Purely additive — zero risk of breaking existing extraction paths
  • ✅ Regression test covers both the positive (root-cause lines included) and negative (fallback noise excluded) cases
  • ✅ Consistent with existing pattern-matching structure in buildEngineFailureContext()
  • ✅ PR description is clear and includes concrete before/after examples

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.2M

continue;
}

// AWF runtime logs: "[ERROR] <message>"

[/diagnose] The [ERROR] pattern is broad — any log line from any tool/library that uses [ERROR] prefix (e.g., npm, Docker BuildKit, other CLI tools) will be captured as an engine failure context line.

💡 Consider making the pattern more specific

If AWF startup logs have a consistent prefix (e.g., [awf] or [compose]), tightening the pattern would prevent false positives from third-party tooling:

// Current pattern: matches any line that starts with [ERROR]
const bracketErrorMatch = line.match(/^\[ERROR\]\s*(.+)$/);
// Could be guarded by checking surrounding context or by requiring a more specific prefix

That said, if the [ERROR] prefix is reliably AWF-specific in practice (e.g., only gh-aw's own logger emits it at startup), this is fine as-is. Worth a short comment confirming the assumption.

// AWF docker-compose dependency failures surface this root-cause line without
// an explicit log-level prefix.
const awfDependencyFailureMatch = line.match(/^dependency failed to start:\s*(.+)$/);
if (awfDependencyFailureMatch) {

[/diagnose] Minor: the dependency failure message is reconstructed from the capture group rather than using line.trim() or match[0]. This normalises whitespace between the prefix and container name, which is intentional but subtle.

💡 Simpler alternative

Using match[0] (the full matched string) or line.trim() avoids the reconstruction and is more obviously correct:

// Option A: use the full match (equal to the whole line here; trailing whitespace is kept)
errorMessages.add(awfDependencyFailureMatch[0]);

// Option B: trim the original line
errorMessages.add(line.trim());

The current approach is safe, but a comment noting the whitespace normalisation intent would help the next reader.
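
The three forms only differ on lines with irregular whitespace. The sketch below makes that concrete using a made-up input with extra spaces after the colon and at the end of the line. The helper name `compareForms` and the sample input are assumptions for illustration; the regex is the one from this PR.

```javascript
// Hypothetical helper: returns what each candidate form would store
// for a single "dependency failed to start" line.
function compareForms(line) {
  const match = line.match(/^dependency failed to start:\s*(.+)$/);
  if (!match) return null;
  return {
    // Current form: rebuild the prefix and trim the capture group.
    rebuilt: `dependency failed to start: ${match[1].trim()}`,
    // Option A: the full match. The regex is anchored at both ends,
    // so this equals the whole line, trailing whitespace included.
    fullMatch: match[0],
    // Option B: trims the ends but keeps the inner spacing as written.
    trimmedLine: line.trim(),
  };
}

// Illustrative input, not taken from a real log.
const forms = compareForms("dependency failed to start:   container awf-squid is unhealthy  ");
console.log(JSON.stringify(forms, null, 2));
```

Only the rebuilt form collapses the spacing after the colon to a single space. That matters if messages are de-duplicated by exact string comparison.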

expect(result).not.toContain("stderr: undefined");
});

it("detects Fatal: prefix pattern", () => {

[/tdd] not.toContain("Last agent output") ties the test to a specific heading string — if that section heading changes the assertion silently becomes weaker without failing.

💡 Consider a more structural assertion

If the intent is to assert the fallback path was not taken, a positive assertion on the structured output (e.g. that only the Engine Failure section is present and not the fallback block) is more durable:

// Assert the fallback section is absent by checking its structural marker
expect(result).not.toMatch(/Last agent output|Fallback/i);
// or: assert the result does NOT contain the fallback sentinel string you own

Alternatively, a short comment explaining what "Last agent output" refers to (the fallback section heading) would make the intent clear for future maintainers.
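
One way to make the negative assertion more durable is to have the code and the test share the heading text, so a rename cannot silently weaken the check. The sketch below is illustrative only: `FALLBACK_HEADING`, `renderFailureContext`, and `usedFallback` are hypothetical names, and the rendered strings are stand-ins, not the real issue-body format.

```javascript
// Hypothetical: the module that renders the issue body owns the heading text.
const FALLBACK_HEADING = "Last agent output";

// Hypothetical stand-in for the real renderer. It emits extracted errors when
// there are any, and otherwise falls back to a low-signal block.
function renderFailureContext(errorMessages) {
  if (errorMessages.length > 0) {
    return ["Engine Failure", ...errorMessages.map(m => `- ${m}`)].join("\n");
  }
  return `${FALLBACK_HEADING}\nstdout/stderr: undefined`;
}

// The check reads the same constant the renderer uses, so changing the
// heading in one place keeps the assertion meaningful.
function usedFallback(result) {
  return result.includes(FALLBACK_HEADING);
}

const withErrors = renderFailureContext(["dependency failed to start: container awf-squid is unhealthy"]);
console.log(usedFallback(withErrors)); // false: the fallback block was not rendered
console.log(usedFallback(renderFailureContext([]))); // true: fallback path taken
```

In a vitest test this would read as `expect(result).not.toContain(FALLBACK_HEADING)`, assuming the constant is exported from the module under test.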

@pelikhan

@copilot review all comments and review comments

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented May 24, 2026

@copilot review all comments and review comments

Addressed. I reviewed the feedback and pushed e3bc7d9 with the requested updates:

  • narrowed bracketed [ERROR] extraction to actionable AWF startup failures (Failed to start... / dependency failed to start...) so infra wrapper [ERROR] noise stays excluded
  • renamed the AWF startup test to accurately reflect both covered patterns
  • added infra-only regression input including [ERROR] Command completed with exit code: 1 to verify it is not surfaced as engine error context

Validated with npm run lint:cjs and npm run test:js -- handle_agent_failure.test.cjs, and final parallel validation is clean.
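
The narrowing described above could look roughly like the following. This is a sketch under stated assumptions: the exact regex pushed in e3bc7d9 is not shown in this conversation, so `AWF_STARTUP_ERROR` and `extractNarrowed` are hypothetical names, and the alternation is a guess based on the two message shapes named in the comment.

```javascript
// Assumed shape of the narrowed pattern (the real one in e3bc7d9 may differ):
// only [ERROR] lines whose message starts with a known startup-failure phrase.
const AWF_STARTUP_ERROR = /^\[ERROR\]\s*((?:Failed to start|dependency failed to start)\b.*)$/;

// Hypothetical helper that applies the narrowed pattern to each log line.
function extractNarrowed(lines) {
  const errorMessages = new Set();
  for (const line of lines) {
    const m = line.match(AWF_STARTUP_ERROR);
    if (m) errorMessages.add(m[1].trim());
  }
  return [...errorMessages];
}

const mixedLines = [
  "[ERROR] Failed to start containers: Error: Command failed with exit code 1: docker compose up -d --pull never",
  "[ERROR] Command completed with exit code: 1", // infra wrapper noise
];
console.log(extractNarrowed(mixedLines));
```

With this shape, the infra wrapper line is skipped because its message does not begin with either startup-failure phrase.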

Copilot AI requested a review from pelikhan May 24, 2026 20:37
@github-actions

@copilot review all comments
Please address the unresolved review threads and re-request review when ready.

Generated by 👨‍🍳 PR Sous Chef · gpt54 6.4M ·

@github-actions

Please summarize the remaining blockers and the next step to land this PR.

Generated by 👨‍🍳 PR Sous Chef · gpt54 6.4M ·

@github-actions

@copilot Please summarize the remaining startup-crash blockers and next steps.

Generated by 👨‍🍳 PR Sous Chef · gpt54 4.2M ·

@pelikhan pelikhan merged commit 6e79d8d into main May 24, 2026
12 checks passed
@pelikhan pelikhan deleted the copilot/debug-pr-sous-chef-failure branch May 24, 2026 22:59
Development

Successfully merging this pull request may close these issues.

[aw] PR Sous Chef failed

3 participants