Summary
When GH_AW_DETECTION_CONTINUE_ON_ERROR=true (the compiled default for all workflows at v0.68.3), the detection parse step calls core.warning() instead of core.setFailed() for every detection failure path — threat_detected, parse_error, and agent_failure. The detection GitHub job therefore always exits with result success, regardless of what the model returns. The safe_outputs job condition gates solely on needs.detection.result == 'success' (job-level exit code), not on the semantic step output needs.detection.outputs.detection_success. As a result, safe outputs are never blocked by threat detection in the default configuration, even when the model flags prompt_injection=true, secret_leak=true, or malicious_patch=true. This directly contradicts the published documentation, which states detection is a hard gate.
Affected Area
Output trust boundary — threat detection gate (parse_threat_detection_results.cjs, safe_output_handler_manager.cjs, compiled safe_outputs job condition in lock files).
Reproduction Outline
- Compile any gh-aw workflow with threat detection enabled (v0.68.3; default: GH_AW_DETECTION_CONTINUE_ON_ERROR: "true" in all 32 tested lock files).
- Observe the generated safe_outputs job condition: if: (!cancelled()) && needs.agent.result != 'skipped' && needs.detection.result == 'success'.
- Trigger the workflow and allow the detection model to return prompt_injection=true (or simulate a parse error).
- Observe in parse_threat_detection_results.cjs: setDetectionFailure calls core.warning(), not core.setFailed(). Detection job exits 0.
- Because needs.detection.result == 'success' is satisfied, the safe_outputs job runs and processes output regardless of detection verdict.
- Confirm that needs.detection.outputs.detection_success (which correctly holds "false") is produced but never referenced in the job condition or in safe_output_handler_manager.cjs.
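The steps above can be modelled in a few lines. This is only a sketch under stated assumptions: makeCore is a hypothetical stand-in for the @actions/core toolkit that records calls, and the body of setDetectionFailure is an assumed shape inferred from the behavior described here, not the actual contents of parse_threat_detection_results.cjs.

```javascript
// Hypothetical stand-in for @actions/core: records calls instead of
// talking to the Actions runner.
function makeCore() {
  const calls = { warning: [], setFailed: [], outputs: {} };
  return {
    calls,
    warning: (msg) => calls.warning.push(msg),
    setFailed: (msg) => calls.setFailed.push(msg),
    setOutput: (name, value) => {
      calls.outputs[name] = value;
    },
  };
}

// Assumed shape of setDetectionFailure: the semantic output is always
// "false", but only strict mode marks the step (and so the job) as failed.
function setDetectionFailure(core, reason, continueOnError) {
  core.setOutput("detection_success", "false");
  if (continueOnError) {
    core.warning(`Threat detection: ${reason}`);
  } else {
    core.setFailed(`Threat detection: ${reason}`);
  }
}

// Simplified model of the job result: failure only if setFailed was called.
function jobResult(core) {
  return core.calls.setFailed.length > 0 ? "failure" : "success";
}

// Warn mode (the compiled default): a detected threat still yields "success".
const warnCore = makeCore();
setDetectionFailure(warnCore, "threat_detected", true);
console.log(jobResult(warnCore), warnCore.calls.outputs.detection_success);
// prints: success false
```

In this model the job-level result and the semantic output disagree, which is the gap the job condition falls through.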
Observed Behavior
All compiled workflows with threat detection enabled operate in advisory-only mode by default. Detected threats (prompt injection, secret leak, malicious patch) and detection failures are logged as warnings, but safe outputs are processed unconditionally. GH_AW_DETECTION_CONCLUSION and GH_AW_DETECTION_REASON are passed to the safe_outputs job environment but are never consumed by safe_output_handler_manager.cjs.
Expected Behavior
Per published documentation: "If any threat is detected, the workflow fails and safe outputs are blocked." When the detection model flags a threat, the safe_outputs job should not run. At minimum, the safe_outputs job condition should include && needs.detection.outputs.detection_success == 'true', so the semantic verdict gates output processing independently of warn/strict mode.
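To make the difference concrete, the current and proposed job conditions can be expressed as plain predicates. This is an illustrative sketch: currentGate and proposedGate are hypothetical names, and the needs object only mimics the shape of the GitHub Actions needs context with example values.

```javascript
// Current compiled condition: gates only on the job-level result.
function currentGate(needs, cancelled = false) {
  return (
    !cancelled &&
    needs.agent.result !== "skipped" &&
    needs.detection.result === "success"
  );
}

// Proposed condition: additionally requires the semantic verdict.
function proposedGate(needs, cancelled = false) {
  return (
    currentGate(needs, cancelled) &&
    needs.detection.outputs.detection_success === "true"
  );
}

// Example: warn mode with a flagged threat. The job exits successfully
// while the step output reports failure.
const flagged = {
  agent: { result: "success" },
  detection: { result: "success", outputs: { detection_success: "false" } },
};

console.log(currentGate(flagged)); // prints: true  (safe_outputs runs)
console.log(proposedGate(flagged)); // prints: false (safe_outputs blocked)
```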
Security Relevance
The threat detection gate is documented as the primary control preventing AI-generated malicious output (prompt injection results, secret leaks, malicious patches) from reaching GitHub write surfaces via safe outputs. If this gate is advisory-only by default, any payload that triggers a positive detection result is still processed through safe outputs — the documented security control provides no runtime enforcement under the default compiled configuration.
Suggested Fixes
- Smallest direct fix: Add && needs.detection.outputs.detection_success == 'true' to the generated safe_outputs job condition in the compiler (e.g., safe_jobs_threat_detection.go). Works regardless of warn/strict mode because detection_success is always set.
- Change default: Set continue-on-error: false as the default in ThreatDetectionConfig so the job-level exit code also reflects the detection outcome.
- Defense-in-depth: Add a detection conclusion check at the top of safe_output_handler_manager.cjs that reads GH_AW_DETECTION_CONCLUSION (already in env) and fails early if it is "warning" or "failure".
- Documentation: Update docs/src/content/docs/reference/threat-detection.md to document the continue-on-error option and clarify that hard-gate behavior requires explicit configuration.
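The defense-in-depth option might look roughly like the following. This is a minimal sketch, not the actual handler code: assertDetectionPassed is a hypothetical helper name, and treating an unset GH_AW_DETECTION_CONCLUSION as "no verdict available, proceed" is an assumption that maintainers may want to tighten.

```javascript
// Hypothetical guard intended to run before any safe output is processed.
// Blocks when the detection conclusion is "warning" or "failure".
function assertDetectionPassed(env) {
  const conclusion = env.GH_AW_DETECTION_CONCLUSION;
  const reason = env.GH_AW_DETECTION_REASON || "unspecified";
  if (conclusion === "warning" || conclusion === "failure") {
    throw new Error(
      `Safe outputs blocked: detection conclusion "${conclusion}" (reason: ${reason})`
    );
  }
  // Assumption: any other value, including unset, is allowed through.
}

// Example with an explicit environment object instead of process.env.
try {
  assertDetectionPassed({
    GH_AW_DETECTION_CONCLUSION: "warning",
    GH_AW_DETECTION_REASON: "threat_detected",
  });
} catch (err) {
  console.log(err.message);
}
```

In the real handler the call would presumably receive process.env, and the thrown error would be reported through core.setFailed so that the safe_outputs job fails visibly.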
Additional Context
If warn mode is an intentional design decision (e.g., to preserve availability when the detection engine fails unexpectedly), this assumption should be documented explicitly. The published documentation currently describes only hard-gate semantics with no mention of warn mode or continue-on-error. A more targeted design would hard-fail on threat_detected while falling back to warn mode only for agent_failure / parse_error.
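The more targeted design could be captured as a small policy function. This is a sketch of the idea only: failureAction and its return values "fail" and "warn" are hypothetical, while the three reason strings are the failure paths named in this report.

```javascript
// Decide how a detection failure should be handled.
// A real threat always fails hard; engine or parsing problems may be
// downgraded to a warning when continue-on-error is enabled.
function failureAction(reason, continueOnError) {
  if (reason === "threat_detected") {
    return "fail";
  }
  return continueOnError ? "warn" : "fail";
}

for (const reason of ["threat_detected", "parse_error", "agent_failure"]) {
  console.log(reason, failureAction(reason, true));
}
// prints:
// threat_detected fail
// parse_error warn
// agent_failure warn
```

Under this policy, availability is preserved when the detection engine itself misbehaves, while a positive verdict can no longer be downgraded by configuration.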
gh-aw version: v0.68.3
Original finding: https://github.com/githubnext/gh-aw-security/issues/2188