
Threat detection gate is advisory-only by default: needs.detection.outputs.detection_success unused in safe_outputs condition #31708

@szabta89


Summary

When GH_AW_DETECTION_CONTINUE_ON_ERROR=true (the compiled default for all workflows in v0.68.3), the detection parse step calls core.warning() instead of core.setFailed() on every detection failure path: threat_detected, parse_error, and agent_failure. The detection job therefore always finishes with result success, regardless of what the model returns. The safe_outputs job condition gates only on needs.detection.result == 'success' (the job-level result), not on the semantic step output needs.detection.outputs.detection_success. As a result, threat detection never blocks safe outputs in the default configuration, even when the model flags prompt_injection=true, secret_leak=true, or malicious_patch=true. This directly contradicts the published documentation, which describes detection as a hard gate.
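To make the failure path concrete, here is a simplified model in JavaScript, the language of the .cjs scripts. It is a sketch under stated assumptions, not the code in parse_threat_detection_results.cjs: makeMockCore and jobResult are hypothetical helpers, the body of setDetectionFailure is reconstructed from the behavior described above (including the assumption that it writes detection_success="false"), and the job-result rule is simplified to "a job fails only if a step called setFailed". It shows that with continueOnError set to true, every failure reason leaves the job result at success even though the recorded verdict is negative.

```javascript
// Simplified model of the default (warn-mode) failure path. Not the actual
// parse_threat_detection_results.cjs source: helper names and bodies are assumed.

// Stand-in for @actions/core that records calls instead of talking to the runner.
function makeMockCore() {
  const calls = { warnings: [], failures: [], outputs: {} };
  return {
    calls,
    warning: (msg) => calls.warnings.push(msg),
    setFailed: (msg) => calls.failures.push(msg),
    setOutput: (name, value) => {
      calls.outputs[name] = String(value);
    },
  };
}

// Assumed behavior: record the semantic verdict, then either warn or fail.
function setDetectionFailure(core, reason, continueOnError) {
  core.setOutput("detection_success", "false");
  const msg = `Threat detection failed: ${reason}`;
  if (continueOnError) {
    core.warning(msg); // step succeeds, so the job result stays "success"
  } else {
    core.setFailed(msg); // step fails, so the job result becomes "failure"
  }
}

// Simplification: a job fails only if one of its steps called setFailed.
function jobResult(core) {
  return core.calls.failures.length > 0 ? "failure" : "success";
}

for (const reason of ["threat_detected", "parse_error", "agent_failure"]) {
  const core = makeMockCore();
  setDetectionFailure(core, reason, true); // GH_AW_DETECTION_CONTINUE_ON_ERROR=true
  console.log(reason, jobResult(core), core.calls.outputs.detection_success);
}
// threat_detected success false
// parse_error success false
// agent_failure success false
```

Passing false for continueOnError routes the same call through core.setFailed, and the model then reports failure, which is the only outcome the current safe_outputs condition can observe.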

Affected Area

Output trust boundary — threat detection gate (parse_threat_detection_results.cjs, safe_output_handler_manager.cjs, compiled safe_outputs job condition in lock files).

Reproduction Outline

  1. Compile any gh-aw workflow with threat detection enabled (v0.68.3; all 32 lock files tested contain the default GH_AW_DETECTION_CONTINUE_ON_ERROR: "true").
  2. Observe the generated safe_outputs job condition: if: (!cancelled()) && needs.agent.result != 'skipped' && needs.detection.result == 'success'.
  3. Trigger the workflow and let the detection model return prompt_injection=true (or simulate a parse error).
  4. In parse_threat_detection_results.cjs, observe that setDetectionFailure calls core.warning() rather than core.setFailed(), so the detection job exits 0.
  5. Because needs.detection.result == 'success' is satisfied, the safe_outputs job runs and processes output regardless of detection verdict.
  6. Confirm that needs.detection.outputs.detection_success (which correctly holds "false") is produced but never referenced in the job condition or in safe_output_handler_manager.cjs.

Observed Behavior

All compiled workflows with threat detection enabled operate in advisory-only mode by default. Detected threats (prompt injection, secret leak, malicious patch) and detection failures are logged as warnings, but safe outputs are processed unconditionally. GH_AW_DETECTION_CONCLUSION and GH_AW_DETECTION_REASON are passed to the safe_outputs job environment but are never consumed by safe_output_handler_manager.cjs.

Expected Behavior

Per the published documentation: "If any threat is detected, the workflow fails and safe outputs are blocked." When the detection model flags a threat, the safe_outputs job should not run. At minimum, the safe_outputs job condition should include && needs.detection.outputs.detection_success == 'true', so that the semantic verdict gates output processing independently of warn/strict mode.
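The difference between the current and proposed conditions can be expressed as two predicates over a needs object. This sketch only approximates GitHub Actions expression semantics: cancelled() is modeled as a boolean parameter, comparisons use strict JavaScript equality, and step outputs are treated as strings, which is why the proposed check compares against 'true' rather than a boolean.

```javascript
// Approximation of the safe_outputs `if:` conditions as JavaScript predicates.
// `cancelled` stands in for the cancelled() status function.
function currentGate(needs, cancelled = false) {
  return (
    !cancelled &&
    needs.agent.result !== "skipped" &&
    needs.detection.result === "success"
  );
}

// Proposed: additionally require the semantic verdict. Outputs are strings.
function proposedGate(needs, cancelled = false) {
  return (
    currentGate(needs, cancelled) &&
    needs.detection.outputs.detection_success === "true"
  );
}

// Warn mode after the model flags prompt_injection=true: the job succeeded,
// but the step output records a negative verdict.
const flagged = {
  agent: { result: "success" },
  detection: { result: "success", outputs: { detection_success: "false" } },
};

console.log(currentGate(flagged)); // true  -> safe_outputs runs
console.log(proposedGate(flagged)); // false -> safe_outputs is skipped
```

Because a missing output never equals "true", the proposed predicate also keeps safe_outputs from running if detection_success is never set, so the gate fails closed.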

Security Relevance

The threat detection gate is documented as the primary control that prevents AI-generated malicious output (prompt injection results, secret leaks, malicious patches) from reaching GitHub write surfaces via safe outputs. Because the gate is advisory-only by default, any payload that triggers a positive detection result is still processed through safe outputs; under the default compiled configuration, the documented security control provides no runtime enforcement.

Suggested Fixes

  1. Smallest direct fix: Add && needs.detection.outputs.detection_success == 'true' to the generated safe_outputs job condition in the compiler (e.g., safe_jobs_threat_detection.go). Works regardless of warn/strict mode because detection_success is always set.
  2. Change default: Set continue-on-error: false as the default in ThreatDetectionConfig so the job-level exit code also reflects the detection outcome.
  3. Defense-in-depth: Add a detection conclusion check at the top of safe_output_handler_manager.cjs that reads GH_AW_DETECTION_CONCLUSION (already in env) and fails early if it is "warning" or "failure".
  4. Documentation: Update docs/src/content/docs/reference/threat-detection.md to document the continue-on-error option and clarify that hard-gate behavior requires explicit configuration.
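Suggested fix 3 might look roughly like the guard below. The function name checkDetectionConclusion and its return shape are hypothetical. The blocking values ("warning" and "failure") follow the suggestion above, and treating an unset GH_AW_DETECTION_CONCLUSION as "detection not configured" is an assumption that a stricter implementation might reverse by failing closed.

```javascript
// Hypothetical early guard for safe_output_handler_manager.cjs.
// Assumes GH_AW_DETECTION_CONCLUSION holds values such as "success",
// "warning", or "failure", as described in the issue.
const BLOCKING_CONCLUSIONS = new Set(["warning", "failure"]);

function checkDetectionConclusion(env = process.env) {
  const conclusion = (env.GH_AW_DETECTION_CONCLUSION || "").trim().toLowerCase();
  const reason = env.GH_AW_DETECTION_REASON || "unspecified";

  if (BLOCKING_CONCLUSIONS.has(conclusion)) {
    return {
      ok: false,
      message: `Safe outputs blocked: detection conclusion is "${conclusion}" (reason: ${reason})`,
    };
  }
  // Unset or any other value is allowed through here (assumption). A stricter
  // variant could fail closed when the variable is missing.
  return { ok: true, message: "" };
}

// Intended use at the top of the handler (sketch; `core` is @actions/core):
//   const check = checkDetectionConclusion();
//   if (!check.ok) { core.setFailed(check.message); return; }

console.log(
  checkDetectionConclusion({
    GH_AW_DETECTION_CONCLUSION: "warning",
    GH_AW_DETECTION_REASON: "threat_detected",
  }).ok
); // false
```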

Additional Context

If warn mode is an intentional design decision (e.g., to preserve availability when the detection engine fails unexpectedly), this assumption should be documented explicitly. The published documentation currently describes only hard-gate semantics with no mention of warn mode or continue-on-error. A more targeted design would hard-fail on threat_detected while falling back to warn mode only for agent_failure / parse_error.
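The targeted design could be sketched as a small policy function. The reason strings are those named in this issue; detectionFailureAction and reportDetectionFailure are hypothetical names, and core is assumed to be the @actions/core object the parse step already uses.

```javascript
// Sketch of the targeted policy: a positive threat verdict always fails the
// step, while infrastructure failures honor the continue-on-error setting.
const ALWAYS_FATAL = new Set(["threat_detected"]);

function detectionFailureAction(reason, continueOnError) {
  if (ALWAYS_FATAL.has(reason)) return "fail";
  // agent_failure / parse_error: warn only when continue-on-error is enabled.
  return continueOnError ? "warn" : "fail";
}

function reportDetectionFailure(core, reason, continueOnError) {
  const msg = `Threat detection failed: ${reason}`;
  if (detectionFailureAction(reason, continueOnError) === "fail") {
    core.setFailed(msg);
  } else {
    core.warning(msg);
  }
}

for (const reason of ["threat_detected", "parse_error", "agent_failure"]) {
  console.log(reason, detectionFailureAction(reason, true));
}
// threat_detected fail
// parse_error warn
// agent_failure warn
```

Keeping the rule in one pure function makes the security-relevant decision easy to unit-test separately from the Actions runtime.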

gh-aw version: v0.68.3

Original finding: https://github.com/githubnext/gh-aw-security/issues/2188

Generated by File Issue
