You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Create a comprehensive threat model and hardening playbook for the org's agentic CI workflows, addressing the "Comment and Control" attack class (March 2026) where PR titles and issue bodies are weaponized to hijack AI agents. The playbook codifies the "Rule of Two" principle: never let an agent simultaneously process untrusted input and hold state-changing tool access.
Market Signal
The agent security landscape has shifted dramatically in Q1-Q2 2026:
"Comment and Control" disclosure (March 2026): A single PR title simultaneously hijacked Claude Code, Gemini CLI, and GitHub Copilot, exfiltrating repository secrets. Anthropic patched with default-deny filesystem reads and system-prompt hardening
Microsoft disclosed a Claude Code GitHub Action vulnerability (CVSS 9.6) in May 2026 — the agent could be tricked into reading sensitive CI runner files
NSA published "MCP: Security Design Considerations for AI-Driven Automation" (17 pages, May 20, 2026), recommending filtering proxies, sandboxing, message integrity checks, and output validation
Industry consensus: Prompt injection cannot be fully solved within current LLM architectures — layered defense is the only viable approach
The "Rule of Two" governance pattern emerged: never let an agent simultaneously process untrusted input and hold state-changing tool access
The org runs five agent-driven workflows that process untrusted inputs:
Workflow
Untrusted Input Sources
Privileges Held
dev-lead.yml
Issue titles, labels, bodies
Code write, issue write
feature-ideation.yml
Discussion titles, bodies
Discussions write
compliance-audit.yml
API responses, file contents
Issues write, PRs
claude-code-reusable.yml
PR diffs, review comments
Code write, PR write
agent-shield.yml
PR content (deliberately)
Read-only (good)
Existing ideas #269 (Agent Shield v2) and #367 (Agent Input Sanitization) address individual pieces but no unified threat model ties them together. The feature-ideation auto-enhance work (PR #448) introduced a new attack surface where Discussion content is processed by an agent.
Technical Opportunity
The existing agent-shield.yml provides a runtime detection foundation. ci-standards.md already mandates action pinning and permission scoping. A unified threat model would connect these existing controls with new defenses:
Input sanitization at workflow boundaries (strip control characters, limit length, detect injection patterns)
Privilege separation between input processing steps and mutation steps
Output validation before agent-generated content is posted to GitHub
The Rule of Two architecture: split agent workflows into a read-only analysis phase and a separate, validated mutation phase
Assessment
Dimension
Score
Rationale
Feasibility
high
Primarily documentation + architectural patterns, not new tooling
Impact
high
Addresses #1 security risk for agentic CI (OWASP, NSA, CSA all flag it)
Urgency
high
Active exploitation in the wild; org runs 5 agent workflows with untrusted inputs
Adversarial Review
Strongest objection: The org already has agent-shield.yml and existing ideas #269 (Agent Shield v2) and #367 (Agent Input Sanitization). Is this just duplication? Rebuttal: Agent-shield is a runtime detection tool — it detects attacks after they happen. Ideas #269 and #367 are point solutions for specific attack vectors. This playbook is the strategic layer above them: a threat model that maps the org's actual attack surfaces to controls, identifies gaps, and prioritizes hardening work. The "Comment and Control" disclosure, Microsoft CVE, and NSA MCP guidance provide new urgency and structure that didn't exist when those ideas were proposed in May 2026. The playbook references agent-shield as one control among many, giving it architectural context.
Suggested Next Step
Draft a standards/agent-ci-threat-model.md covering:
Attack surface inventory (which workflows process untrusted input, with what privileges)
The Rule of Two principle and privilege separation architecture
Input sanitization patterns for each untrusted input type
Output validation requirements for agent-generated content
A per-workflow hardening checklist with current status and gaps
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Summary
Create a comprehensive threat model and hardening playbook for the org's agentic CI workflows, addressing the "Comment and Control" attack class (March 2026) where PR titles and issue bodies are weaponized to hijack AI agents. The playbook codifies the "Rule of Two" principle: never let an agent simultaneously process untrusted input and hold state-changing tool access.
Market Signal
The agent security landscape has shifted dramatically in Q1-Q2 2026:
Sources: CSA Research Note, Comment and Control disclosure, NSA MCP Guidance (PDF)
User Signal
The org runs five agent-driven workflows that process untrusted inputs:
dev-lead.ymlfeature-ideation.ymlcompliance-audit.ymlclaude-code-reusable.ymlagent-shield.ymlExisting ideas #269 (Agent Shield v2) and #367 (Agent Input Sanitization) address individual pieces but no unified threat model ties them together. The feature-ideation auto-enhance work (PR #448) introduced a new attack surface where Discussion content is processed by an agent.
Technical Opportunity
The existing
agent-shield.ymlprovides a runtime detection foundation.ci-standards.mdalready mandates action pinning and permission scoping. A unified threat model would connect these existing controls with new defenses:Assessment
Adversarial Review
Strongest objection: The org already has
agent-shield.ymland existing ideas #269 (Agent Shield v2) and #367 (Agent Input Sanitization). Is this just duplication?Rebuttal: Agent-shield is a runtime detection tool — it detects attacks after they happen. Ideas #269 and #367 are point solutions for specific attack vectors. This playbook is the strategic layer above them: a threat model that maps the org's actual attack surfaces to controls, identifies gaps, and prioritizes hardening work. The "Comment and Control" disclosure, Microsoft CVE, and NSA MCP guidance provide new urgency and structure that didn't exist when those ideas were proposed in May 2026. The playbook references agent-shield as one control among many, giving it architectural context.
Suggested Next Step
Draft a
standards/agent-ci-threat-model.mdcovering:Beta Was this translation helpful? Give feedback.
All reactions