
Specify warn-mode threat-failure policies for all safe outputs #32382

Merged: pelikhan merged 3 commits into main from copilot/update-safe-outputs-specification on May 15, 2026

Conversation

Contributor

Copilot AI commented May 15, 2026

✨ Enhancement

What does this improve?
This updates the Safe Outputs specification so every safe output type has explicit behavior when threat detection flags content in warn mode. Reviewable outputs now require standardized threat annotations, convertible outputs define fallback behavior, and non-reviewable outputs are explicitly aborted.

Why is this valuable?
The spec now removes ambiguity in warn-mode threat handling by making fallback and abort paths normative and type-specific, including a required conversion from push_to_pull_request_branch to create_pull_request.

Implementation approach:

  • Version + publication metadata
    • Bumped spec metadata to 1.20.0 and updated publication date.
  • Normative warn-mode policy section
    • Added a new execution guarantees subsection defining:
      • WTD1: reviewable outputs MUST include a prominent caution, the label text "agentic threat detected", and an XML comment marker.
      • WTD2: convertible outputs MUST be transformed to their mapped reviewable type.
      • WTD3: non-reviewable outputs MUST be aborted and routed through a threat-detected code path.
  • Per-type policy coverage
    • Added a complete type matrix mapping every safe output type to Reviewable, Convertible, or Abort.
    • Included explicit fallback mapping: push_to_pull_request_branch (aka update-pull-request-branch) → create_pull_request.
  • Compliance coverage
    • Added conformance test IDs T-WTD-001 through T-WTD-003 for annotation, conversion fallback, and abort behavior.
  • Document history
    • Added a 1.20.0 changelog entry summarizing the new warn-mode threat-failure requirements.
> [!CAUTION]
> agentic threat detected
> Threat detection flagged this output in warn mode. Manual review is REQUIRED before any follow-up automation.

<!-- agentic threat detected -->
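The three requirements above can be sketched as a small dispatch function. This is a minimal illustration, not the gateway's implementation: the names `Policy`, `POLICY`, `FALLBACK`, `apply_warn_policy`, and `ThreatDetectedAbort` are hypothetical, the matrix is a three-row excerpt in which only the push_to_pull_request_branch → create_pull_request mapping comes from this description, and `example_non_reviewable` is a placeholder type. The sketch also assumes that a converted output is annotated like any other reviewable output and that unknown types are aborted.

```python
from dataclasses import dataclass
from enum import Enum

LABEL = "agentic threat detected"


class Policy(Enum):
    REVIEWABLE = "reviewable"    # WTD1: annotate and keep
    CONVERTIBLE = "convertible"  # WTD2: convert to the mapped reviewable type
    ABORT = "abort"              # WTD3: abort via the threat-detected path


# Hypothetical excerpt of the type matrix. Only the
# push_to_pull_request_branch -> create_pull_request mapping is taken from
# the PR description; "example_non_reviewable" is a placeholder name.
POLICY = {
    "create_pull_request": Policy.REVIEWABLE,
    "push_to_pull_request_branch": Policy.CONVERTIBLE,
    "example_non_reviewable": Policy.ABORT,
}
FALLBACK = {"push_to_pull_request_branch": "create_pull_request"}


class ThreatDetectedAbort(Exception):
    """Stands in for the threat-detected code path (assumed shape)."""


@dataclass
class Output:
    type: str
    body: str


def annotate(body: str) -> str:
    """Prepend the caution block, label text, and XML comment marker."""
    caution = "\n".join([
        "> [!CAUTION]",
        f"> {LABEL}",
        "> Threat detection flagged this output in warn mode. "
        "Manual review is REQUIRED before any follow-up automation.",
    ])
    return f"{caution}\n\n<!-- {LABEL} -->\n\n{body}"


def apply_warn_policy(output: Output) -> Output:
    """Apply the warn-mode policy to an output that threat detection flagged."""
    # Assumption: types missing from the matrix are treated as Abort.
    policy = POLICY.get(output.type, Policy.ABORT)
    if policy is Policy.CONVERTIBLE:
        output = Output(FALLBACK[output.type], output.body)
        policy = POLICY[output.type]
    if policy is Policy.REVIEWABLE:
        return Output(output.type, annotate(output.body))
    raise ThreatDetectedAbort(output.type)
```

Under these assumptions, a flagged push_to_pull_request_branch output comes back as an annotated create_pull_request, while the placeholder non-reviewable type raises `ThreatDetectedAbort` instead of producing any output.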

Copilot AI and others added 2 commits May 15, 2026 13:29
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title Update safe outputs spec for warn-mode threat detection failures Specify warn-mode threat-failure policies for all safe outputs May 15, 2026
Copilot AI requested a review from pelikhan May 15, 2026 13:31
@pelikhan
Collaborator

@copilot merge main and recompile

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 15, 2026

> @copilot merge main and recompile

Done. Merged origin/main and recompiled workflow lock files in commit a2cd065.

@pelikhan pelikhan marked this pull request as ready for review May 15, 2026 13:49
Copilot AI review requested due to automatic review settings May 15, 2026 13:49
@pelikhan pelikhan merged commit fd167e9 into main May 15, 2026
@pelikhan pelikhan deleted the copilot/update-safe-outputs-specification branch May 15, 2026 13:49
Copilot AI restored the copilot/update-safe-outputs-specification branch May 15, 2026 13:49
Copilot stopped work on behalf of pelikhan due to an error May 15, 2026 13:49
Copilot AI requested a review from pelikhan May 15, 2026 13:49
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the Safe Outputs MCP Gateway Specification (to v1.20.0) to define normative warn-mode threat-detection failure behavior per safe output type, and regenerates a large number of *.lock.yml workflow files to normalize the trigger key from "on": to on:.

Changes:

  • Adds Section 10.5 with requirements WTD1/WTD2/WTD3, Table WTD-A classifying every safe output type as Reviewable/Convertible/Abort, and compliance tests T-WTD-001..003.
  • Renumbers prior "Edge Case Behavior" section to 10.6, bumps version to 1.20.0 / 2026-05-15, and adds an Appendix F changelog entry.
  • Regenerates ~190 workflow lock files, unquoting the on: YAML key (and updating auto-generated heredoc marker hashes in a few).

| File | Description |
| --- | --- |
| docs/src/content/docs/reference/safe-outputs-specification.md | Adds Section 10.5 warn-mode threat-detection policy, Table WTD-A, conformance tests, renumbers Edge Case section to 10.6, bumps version/date, updates changelog. |
| .github/workflows/*.lock.yml (~190 files) | Auto-regenerated lock files: "on": → on: normalization plus regenerated heredoc marker hashes in a few cases. |
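
The trigger-key normalization in the lock files can be pictured as a one-line text rewrite. The sketch below is only an illustration of the visible effect, not how the workflow compiler produces its output: `unquote_on_key` is a hypothetical helper, and it assumes the quoted key sits at column 0 of the file. As background, the key is often quoted because YAML 1.1 parsers may resolve a bare `on` to a boolean.

```python
import re

# Match a quoted top-level "on": key at the start of a line only, so that
# nested or indented occurrences are left untouched.
_QUOTED_ON = re.compile(r'^"on":', re.MULTILINE)


def unquote_on_key(lock_text: str) -> str:
    """Rewrite a top-level `"on":` trigger key to `on:` (illustrative only)."""
    return _QUOTED_ON.sub("on:", lock_text)
```

Anchoring the pattern to the start of a line keeps the rewrite from touching an indented `"on":` key that happens to appear deeper in the document.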

Copilot's findings

  • Files reviewed: 230/230 changed files
  • Comments generated: 0


3 participants