Specify warn-mode threat-failure policies for all safe outputs #32382
Merged
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "Update safe outputs spec for warn-mode threat detection failures" to "Specify warn-mode threat-failure policies for all safe outputs" on May 15, 2026
Copilot created this pull request from a session on behalf of pelikhan on May 15, 2026 13:31
Collaborator: @copilot merge main and recompile
Contributor (Author): Done. Merged
pelikhan approved these changes on May 15, 2026
Contributor
Pull request overview
This PR updates the Safe Outputs MCP Gateway Specification (to v1.20.0) to define normative warn-mode threat-detection failure behavior per safe output type, and regenerates a large number of *.lock.yml workflow files to normalize the trigger key from "on": to on:.
Changes:
- Adds Section 10.5 with requirements WTD1/WTD2/WTD3, Table WTD-A classifying every safe output type as Reviewable/Convertible/Abort, and compliance tests T-WTD-001..003.
- Renumbers prior "Edge Case Behavior" section to 10.6, bumps version to 1.20.0 / 2026-05-15, and adds an Appendix F changelog entry.
- Regenerates ~190 workflow lock files, unquoting the `on:` YAML key (and updating auto-generated heredoc marker hashes in a few).
| File | Description |
|---|---|
| docs/src/content/docs/reference/safe-outputs-specification.md | Adds Section 10.5 warn-mode threat-detection policy, Table WTD-A, conformance tests, renumbers Edge Case section to 10.6, bumps version/date, updates changelog. |
| .github/workflows/*.lock.yml (~190 files) | Auto-regenerated lock files: "on": → on: normalization plus regenerated heredoc marker hashes in a few cases. |
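
The `"on":` → `on:` normalization described in the table can be illustrated with a minimal sketch. This is not the compiler that regenerates the lock files; the function name `unquote_on_key` and the regular expression are assumptions made for illustration, and the sketch only handles a quoted key at column 0.

```python
import re

# Hypothetical pattern: a double-quoted "on" key at the start of a line (top level only).
_QUOTED_ON = re.compile(r'^"on":', flags=re.MULTILINE)


def unquote_on_key(yaml_text: str) -> str:
    """Rewrite a top-level `"on":` key as `on:`; all other text is left unchanged."""
    return _QUOTED_ON.sub("on:", yaml_text)


before = '''name: example
"on":
  push:
    branches: [main]
'''

print(unquote_on_key(before))
```

Because the pattern is anchored to the start of a line, an indented `"on":` key nested under another mapping is not rewritten.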
Copilot's findings
- Files reviewed: 230/230 changed files
- Comments generated: 0
✨ Enhancement
What does this improve?
This updates the Safe Outputs specification so every safe output type has explicit behavior when threat detection flags content in `warn` mode. Reviewable outputs now require standardized threat annotations, convertible outputs define fallback behavior, and non-reviewable outputs are explicitly aborted.

Why is this valuable?
The spec now removes ambiguity in warn-mode threat handling by making fallback and abort paths normative and type-specific, including a required conversion from `push_to_pull_request_branch` to `create_pull_request`.

Implementation approach:
- Spec version bumped to `1.20.0` and updated publication date.
- Threat annotations for reviewable outputs: … `agentic threat detected`, and XML comment marker.
- Each safe output type classified as `Reviewable`, `Convertible`, or `Abort`.
- Required conversion: `push_to_pull_request_branch` (aka `update-pull-request-branch`) → `create_pull_request`.
- Compliance tests `T-WTD-001` through `T-WTD-003` for annotation, conversion fallback, and abort behavior.
- `1.20.0` changelog entry summarizing the new warn-mode threat-failure requirements.
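
The three-way classification can be sketched as a lookup followed by a dispatch. This is a minimal illustration under stated assumptions, not the gateway's implementation: the names `WarnPolicy`, `POLICY_TABLE`, `Decision`, and `resolve_warn_mode` are hypothetical, and the only table row taken from this PR is the `push_to_pull_request_branch` → `create_pull_request` conversion. Treating `create_pull_request` as `Reviewable`, the placeholder `Abort` row, and aborting on unknown types are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WarnPolicy(Enum):
    REVIEWABLE = "Reviewable"
    CONVERTIBLE = "Convertible"
    ABORT = "Abort"


# Hypothetical excerpt of a classification table. Only the conversion row
# (push_to_pull_request_branch -> create_pull_request) comes from the PR text;
# the other two rows are placeholders for illustration.
POLICY_TABLE = {
    "create_pull_request": (WarnPolicy.REVIEWABLE, None),
    "push_to_pull_request_branch": (WarnPolicy.CONVERTIBLE, "create_pull_request"),
    "hypothetical_non_reviewable_output": (WarnPolicy.ABORT, None),
}


@dataclass(frozen=True)
class Decision:
    action: str                 # "annotate", "convert", or "abort"
    output_type: Optional[str]  # type to emit, or None when aborted


def resolve_warn_mode(output_type: str) -> Decision:
    """Decide what to do with an output that threat detection flagged in warn mode."""
    # Assumption: types missing from the table are aborted.
    policy, fallback = POLICY_TABLE.get(output_type, (WarnPolicy.ABORT, None))
    if policy is WarnPolicy.REVIEWABLE:
        return Decision("annotate", output_type)
    if policy is WarnPolicy.CONVERTIBLE:
        return Decision("convert", fallback)
    return Decision("abort", None)


print(resolve_warn_mode("push_to_pull_request_branch"))
```

Keeping the fallback type in the same table row as the policy means a `Convertible` entry cannot be declared without naming its conversion target.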