Skip to content

fix: detection job never fails in warn mode due to parse/infrastructure errors#29031

Merged
pelikhan merged 4 commits intomainfrom
copilot/fix-threat-detection-result-parse-error
Apr 29, 2026
Merged

fix: detection job never fails in warn mode due to parse/infrastructure errors#29031
pelikhan merged 4 commits intomainfrom
copilot/fix-threat-detection-result-parse-error

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 29, 2026

The detection job was failing (conclusion=failure) when AWF's squid container was unhealthy or the model produced no THREAT_DETECTION_RESULT token, blocking safe_outputs via the needs.detection.result == 'success' gate and marking the entire workflow run as failed.

Changes

  • pkg/workflow/threat_detection.gobuildDetectionConclusionStep now emits continue-on-error: true on the "Parse and conclude" step when the workflow is in warn mode (the default). In strict mode (continue-on-error: false) it is deliberately omitted so real threat detections still block safe_outputs.
# warn mode (default) — parse step can no longer fail the detection job
- name: Parse and conclude threat detection
  id: detection_conclusion
  if: always()
  continue-on-error: true   # ← new
  uses: actions/github-script@...
  • pkg/workflow/threat_detection.go (buildResultsParsingScriptRequire) — The generated github-script snippet now wraps the require()/main() call in a try/catch. If the parse module fails to load entirely (e.g. missing file, broken setup_globals.cjs), fallback outputs are set (reason=parse_error, conclusion=warning/failure, success=false) so detection_* outputs are never left empty.

  • actions/setup/js/parse_threat_detection_results.cjs — Added a top-level try/catch wrapping the inner runMain() logic. Any unanticipated runtime exception routes through setDetectionFailure() with reason=parse_error (aligned with the documented reason contract in messages_run_status.cjs) instead of crashing the step without setting outputs. This complements the YAML-level continue-on-error and the generated-script try/catch.

  • Lock files — All 204 compiled .lock.yml files recompiled to apply all fixes.

  • pkg/workflow/detection_success_test.go — New TestDetectionConclusionStepContinueOnError table-driven test asserts the warn-mode and strict-mode behavior.

…r handling

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/804c490f-da87-4b17-8cbb-108ea36917dd

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix systemic THREAT_DETECTION_RESULT parse error in workflows fix: detection job never fails in warn mode due to parse/infrastructure errors Apr 29, 2026
Copilot AI requested a review from gh-aw-bot April 29, 2026 02:59
@pelikhan pelikhan marked this pull request as ready for review April 29, 2026 04:03
Copilot AI review requested due to automatic review settings April 29, 2026 04:03
@github-actions
Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 80/100

Excellent test quality

Metric Value
New/modified tests analyzed 1
✅ Design tests (behavioral contracts) 1 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 1 (100%)
Duplicate test clusters 0
Test inflation detected Yes (113 lines added in test vs. 13 in threat_detection.go)
🚨 Coding-guideline violations 0

Test Classification Details

Test File Classification Issues Detected
TestDetectionConclusionStepContinueOnError pkg/workflow/detection_success_test.go:215 ✅ Design None

Analysis

TestDetectionConclusionStepContinueOnError (pkg/workflow/detection_success_test.go:215)

Classification: Design test (behavioral contract)

Design invariant enforced: The test verifies that compiled workflow YAML includes continue-on-error: true on the detection_conclusion step when in warn mode (default), and does not include it in strict mode. This is directly the behavioral contract that the PR's bug fix addresses.

Value if deleted: High — deleting this test would allow a regression where the detection job's continue-on-error behavior reverts without detection. The test covers three meaningful scenarios: warn mode (default), explicit warn mode, and strict mode.

Structure quality:

  • ✅ Table-driven with 3 subtests covering the key behavioral variants
  • ✅ All t.Fatal / t.Error calls include descriptive messages
  • ✅ No mock libraries used — compiles a real workflow and inspects the output
  • ✅ Strict mode row effectively tests a "negative" condition (absence of the flag)

Minor observation — test inflation: 113 lines were added to the test file vs. 13 lines to threat_detection.go (ratio ~8.7:1). However, this is expected for compiler tests in this codebase: embedded YAML frontmatter fixtures are inherently verbose, and the test file already existed. The 10-point inflation penalty was applied to the score.


Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs, *.test.js): 0 tests

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). The single new test function directly verifies the fix's behavioral contract by compiling real workflows and checking generated YAML output.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §25090276800

🧪 Test quality analysis by Test Quality Sentinel · ● 440.9K ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 80/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adjusts threat-detection workflow behavior so that, in warn mode (default), parse/infrastructure issues don’t fail the detection job and block downstream safe_outputs.

Changes:

  • Add continue-on-error: true to the “Parse and conclude threat detection” step when threat detection is configured for warn mode.
  • Wrap the threat-detection parsing script’s main logic in a top-level try/catch to route unexpected runtime errors through the normal output-setting path.
  • Add a table-driven Go test asserting warn-mode vs strict-mode continue-on-error behavior, and recompile workflow lockfiles.
Show a summary per file
File Description
pkg/workflow/threat_detection.go Conditionally emits continue-on-error: true for the detection conclusion step in warn mode.
actions/setup/js/parse_threat_detection_results.cjs Adds top-level try/catch around the parsing/conclusion flow to avoid uncaught exceptions bypassing outputs.
pkg/workflow/detection_success_test.go Adds coverage asserting presence/absence of continue-on-error: true based on warn vs strict mode.
.github/workflows/*.lock.yml Recompiled generated workflow YAML to include the new warn-mode continue-on-error behavior.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 193/193 changed files
  • Comments generated: 3

Comment on lines +288 to 299
// Top-level try/catch ensures outputs are always set and the step never throws
// unexpectedly. Any unanticipated runtime error (e.g. missing module, I/O error
// outside the guarded paths) is caught here and surfaced as a parse_error warning
// (in warn mode) or failure (in strict mode). This is a defence-in-depth measure
// complementing the continue-on-error: true that is set on the parse step in warn mode.
try {
await runMain();
} catch (/** @type {any} */ unexpectedError) {
const errorMsg = getErrorMessage(unexpectedError);
core.error(`❌ Unexpected error in threat detection parse: ${errorMsg}`);
setDetectionFailure("unexpected_error", `${ERR_SYSTEM}: ❌ Unexpected error in threat detection parse: ${errorMsg}`);
}
Comment on lines +288 to +298
// Top-level try/catch ensures outputs are always set and the step never throws
// unexpectedly. Any unanticipated runtime error (e.g. missing module, I/O error
// outside the guarded paths) is caught here and surfaced as a parse_error warning
// (in warn mode) or failure (in strict mode). This is a defence-in-depth measure
// complementing the continue-on-error: true that is set on the parse step in warn mode.
try {
await runMain();
} catch (/** @type {any} */ unexpectedError) {
const errorMsg = getErrorMessage(unexpectedError);
core.error(`❌ Unexpected error in threat detection parse: ${errorMsg}`);
setDetectionFailure("unexpected_error", `${ERR_SYSTEM}: ❌ Unexpected error in threat detection parse: ${errorMsg}`);
Comment on lines +395 to +405
if continueOnError {
steps = append(steps, " continue-on-error: true\n")
}
steps = append(steps, []string{
fmt.Sprintf(" uses: %s\n", getCachedActionPin("actions/github-script", data)),
" env:\n",
" RUN_DETECTION: ${{ steps.detection_guard.outputs.run_detection }}\n",
fmt.Sprintf(" GH_AW_DETECTION_CONTINUE_ON_ERROR: %q\n", strconv.FormatBool(continueOnError)),
" with:\n",
" script: |\n",
}
}...)
…ence in warn mode

Generated by Design Decision Gate — records the decision to add
continue-on-error: true and a top-level try/catch as defence-in-depth
for the threat detection parse step when operating in warn mode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Commit pushed: 332ef8f

🏗️ ADR gate enforced by Design Decision Gate 🏗️

@github-actions
Copy link
Copy Markdown
Contributor

🏗️ Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (126 new lines across pkg/workflow/ and actions/setup/js/) but does not have a linked Architecture Decision Record (ADR).

AI has analyzed the PR diff and generated a draft ADR to help you get started:

📄 Draft ADR: docs/adr/29031-threat-detection-resilience-in-warn-mode.md

The draft captures the key design decision: using a two-layer defence-in-depth (YAML continue-on-error: true + JavaScript top-level try/catch) to make the threat detection parse step resilient to infrastructure failures in warn mode, while preserving strict-mode blocking behaviour.

What to do next

  1. Review the draft ADR committed to your branch — it was generated from the PR diff
  2. Complete the missing sections — add context the AI couldn't infer, refine the decision rationale, and list real alternatives you considered
  3. Commit the finalized ADR to docs/adr/ on your branch
  4. Reference the ADR in this PR body by adding a line such as:

    ADR: ADR-29031: Threat Detection Parse-Step Resilience in Warn Mode

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

Why ADRs Matter

"AI made me procrastinate on key design decisions. Because refactoring was cheap, I could always say 'I'll deal with this later.' Deferring decisions corroded my ability to think clearly."

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.


📋 Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done?
  • Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 29031-threat-detection-resilience-in-warn-mode.md for PR #29031).

🔒 This PR cannot merge until an ADR is linked in the PR body.

References: §25090276793

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · ● 183.5K ·

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot review all comments

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 29, 2026

@copilot review all comments

Addressed all three reviewer comments in ad02b9e:

  1. Module load failuresbuildResultsParsingScriptRequire now wraps the entire require()/main() call in a try/catch that sets reason=parse_error, conclusion=warning/failure, and success=false as appropriate. This covers failures before main() even runs (e.g. missing module file, broken setup_globals.cjs).

  2. unexpected_errorparse_error — The inner try/catch in parse_threat_detection_results.cjs now uses reason: 'parse_error' to match the documented contract in messages_run_status.cjs. The comment was also updated to match.

  3. Empty outputs when continue-on-error: true fires — Both the outer YAML-level continue-on-error: true and the new JS-level try/catch now guarantee outputs are always set (detection_conclusion, detection_success, reason) even if the module fails to load entirely.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/graphql
    • Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw (http block)
    • Triggering command: /usr/bin/gh /usr/bin/gh api graphql -f query=query($owner: String!, $name: String!) { repository(owner: $owner, name: $name) { hasDiscussionsEnabled } } -f owner=github -f name=gh-aw lidation_test.go-C utils.go validation.go 1/x6�� validation_test.go nsistency_test.go test.go (http block)
  • https://api.github.com/repos/actions/checkout/git/ref/tags/v6
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv (http block)
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv .js&#39; --ignore-paremote.origin.url --get tions/setup/node_modules/.bin/node (http block)
    • Triggering command: /usr/bin/gh gh api /repos/actions/checkout/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv .js&#39; --ignore-paremote.origin.url --get ode (http block)
  • https://api.github.com/repos/actions/setup-node/git/ref/tags/v6
    • Triggering command: /usr/bin/gh gh api /repos/actions/setup-node/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv (http block)
    • Triggering command: /usr/bin/gh gh api /repos/actions/setup-node/git/ref/tags/v6 --jq [.object.sha, .object.type] | @tsv .js&#39; --ignore-pa-f --get de_modules/.bin/-f (http block)
  • https://api.github.com/repos/astral-sh/setup-uv/git/ref/tags/eac588ad8def6316056a12d4907a9d4d84ff7a3b
    • Triggering command: /usr/bin/gh gh api /repos/astral-sh/setup-uv/git/ref/tags/eac588ad8def6316056a12d4907a9d4d84ff7a3b --jq [.object.sha, .object.type] | @tsv re --log-level=e!../../../pkg/workflow/js/**/*.json (http block)
  • https://api.github.com/repos/github/gh-aw
    • Triggering command: /usr/bin/gh gh api /repos/github/gh-aw --jq .default_branch at-detection-resremote.origin.url --local r: $owner, name: $name) { hasDiscussionsEnabled } } credential.userngit (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI requested a review from pelikhan April 29, 2026 05:08
@pelikhan pelikhan merged commit 54bbc36 into main Apr 29, 2026
@pelikhan pelikhan deleted the copilot/fix-threat-detection-result-parse-error branch April 29, 2026 05:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Workflow Health Dashboard — 2026-04-28 ⚠️ Regression: 57% success rate

4 participants