💡 Agent Prompt Injection Threat Model and CI Hardening Playbook #485

2026-06-19T11:25:44Z

github-actions[bot]
Bot Jun 19, 2026

Summary

Create a comprehensive threat model and hardening playbook for the org's agentic CI workflows. It addresses the "Comment and Control" attack class (disclosed March 2026), in which PR titles and issue bodies are weaponized to hijack AI agents. The playbook codifies the "Rule of Two" principle: never let an agent simultaneously process untrusted input and hold state-changing tool access.

Market Signal

The agent security landscape has shifted dramatically in Q1-Q2 2026:

Prompt injection surged 340% YoY per OWASP's 2026 LLM Security Report
"Comment and Control" disclosure (March 2026): A single PR title simultaneously hijacked Claude Code, Gemini CLI, and GitHub Copilot, exfiltrating repository secrets. Anthropic patched with default-deny filesystem reads and system-prompt hardening
Microsoft disclosed a Claude Code GitHub Action vulnerability (CVSS 9.6) in May 2026: the agent could be tricked into reading sensitive CI runner files
NSA published "MCP: Security Design Considerations for AI-Driven Automation" (17 pages, May 20, 2026), recommending filtering proxies, sandboxing, message integrity checks, and output validation
Industry consensus: Prompt injection cannot be fully solved within current LLM architectures — layered defense is the only viable approach
The "Rule of Two" governance pattern emerged: never let an agent simultaneously process untrusted input and hold state-changing tool access

Sources: CSA Research Note, Comment and Control disclosure, NSA MCP Guidance (PDF)

User Signal

The org runs five agent-driven workflows that process untrusted inputs:

| Workflow | Untrusted Input Sources | Privileges Held |
| --- | --- | --- |
| `dev-lead.yml` | Issue titles, labels, bodies | Code write, issue write |
| `feature-ideation.yml` | Discussion titles, bodies | Discussions write |
| `compliance-audit.yml` | API responses, file contents | Issues write, PRs |
| `claude-code-reusable.yml` | PR diffs, review comments | Code write, PR write |
| `agent-shield.yml` | PR content (deliberately) | Read-only (good) |
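The Rule of Two from the Summary can be applied mechanically to this inventory. Below is a minimal Python sketch, assuming each workflow is described by its untrusted input sources and its privilege strings. The `Workflow` type, `violates_rule_of_two`, and the rule that any privilege containing "write" counts as state-changing are illustrative assumptions, not existing org tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """One row of the attack surface inventory (hypothetical model)."""
    name: str
    untrusted_inputs: tuple[str, ...]
    privileges: tuple[str, ...]

    @property
    def processes_untrusted_input(self) -> bool:
        return len(self.untrusted_inputs) > 0

    @property
    def holds_state_changing_access(self) -> bool:
        # Assumption: any privilege labelled "... write" can mutate state.
        return any("write" in p.lower() for p in self.privileges)


def violates_rule_of_two(wf: Workflow) -> bool:
    """True when a single agent both reads untrusted input and can mutate state."""
    return wf.processes_untrusted_input and wf.holds_state_changing_access


# Rows transcribed from the inventory table.
INVENTORY = [
    Workflow("dev-lead.yml", ("issue titles", "labels", "bodies"), ("code write", "issue write")),
    Workflow("feature-ideation.yml", ("discussion titles", "bodies"), ("discussions write",)),
    Workflow("compliance-audit.yml", ("API responses", "file contents"), ("issues write", "PRs")),
    Workflow("claude-code-reusable.yml", ("PR diffs", "review comments"), ("code write", "PR write")),
    Workflow("agent-shield.yml", ("PR content",), ("read-only",)),
]

if __name__ == "__main__":
    for wf in INVENTORY:
        status = "VIOLATES Rule of Two" if violates_rule_of_two(wf) else "ok"
        print(f"{wf.name:28} {status}")
```

Under this classification only `agent-shield.yml` passes; the other four each combine untrusted input with write access in a single agent.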

Existing ideas #269 (Agent Shield v2) and #367 (Agent Input Sanitization) address individual pieces, but no unified threat model ties them together. The feature-ideation auto-enhance work (PR #448) also introduced a new attack surface: an agent now processes Discussion content.

Technical Opportunity

The existing `agent-shield.yml` provides a runtime detection foundation, and `ci-standards.md` already mandates action pinning and permission scoping. A unified threat model would connect these existing controls with new defenses:

Input sanitization at workflow boundaries (strip control characters, limit length, detect injection patterns)
Privilege separation between input processing steps and mutation steps
Output validation before agent-generated content is posted to GitHub
The Rule of Two architecture: split agent workflows into a read-only analysis phase and a separate, validated mutation phase
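The boundary sanitization item can be sketched in Python, covering its three listed steps: strip control characters, limit length, and detect injection patterns. The 4,000-character cap, the pattern list, and the `sanitize_untrusted` name are hypothetical choices, not a vetted filter. Given the consensus above that prompt injection cannot be fully solved, pattern matches are best treated as a signal for review, not as a complete defense.

```python
import re
import unicodedata
from dataclasses import dataclass, field

# Assumption: illustrative cap; tune per input type (titles vs. bodies).
MAX_LEN = 4000

# Assumption: a small, hypothetical sample of phrasings seen in injection
# attempts. Real attacks vary widely, so a match is a signal, not proof.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"\bsystem prompt\b", re.I),
    re.compile(r"\byou are now\b", re.I),
    re.compile(r"<\s*/?\s*(system|assistant|instructions?)\s*>", re.I),
]


@dataclass
class SanitizedInput:
    text: str
    truncated: bool
    flags: list[str] = field(default_factory=list)


def strip_control_chars(text: str) -> str:
    """Drop Unicode control (Cc) and format (Cf) characters, keeping newline and tab.

    Cf covers zero-width and bidi-override characters often used to hide
    text; note it also removes zero-width joiners inside some emoji.
    """
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )


def sanitize_untrusted(text: str, max_len: int = MAX_LEN) -> SanitizedInput:
    # NFKC first, so compatibility forms (e.g. fullwidth letters) fold to
    # plain characters before pattern matching.
    cleaned = strip_control_chars(unicodedata.normalize("NFKC", text))
    truncated = len(cleaned) > max_len
    cleaned = cleaned[:max_len]
    flags = [p.pattern for p in INJECTION_PATTERNS if p.search(cleaned)]
    return SanitizedInput(cleaned, truncated, flags)
```

A workflow step would pass an issue title or Discussion body through a function like this before it reaches the agent prompt, and route any input with non-empty `flags` to human review instead of silently dropping it.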

Assessment

| Dimension | Score | Rationale |
| --- | --- | --- |
| Feasibility | high | Primarily documentation and architectural patterns, not new tooling |
| Impact | high | Addresses the #1 security risk for agentic CI (OWASP, NSA, and CSA all flag it) |
| Urgency | high | Active exploitation in the wild; the org runs five agent workflows with untrusted inputs |

Adversarial Review

Strongest objection: The org already has `agent-shield.yml` and existing ideas #269 (Agent Shield v2) and #367 (Agent Input Sanitization). Is this just duplication?
Rebuttal: `agent-shield.yml` is a runtime detection tool that flags attacks after they happen. Ideas #269 and #367 are point solutions for specific attack vectors. This playbook is the strategic layer above them: a threat model that maps the org's actual attack surfaces to controls, identifies gaps, and prioritizes hardening work. The "Comment and Control" disclosure, the Microsoft-disclosed Claude Code GitHub Action vulnerability, and the NSA MCP guidance add urgency and structure, much of it published around or after the time those ideas were proposed in May 2026. The playbook treats `agent-shield.yml` as one control among many, giving it architectural context.

Suggested Next Step

Draft `standards/agent-ci-threat-model.md` covering:

Attack surface inventory (which workflows process untrusted input, with what privileges)
The Rule of Two principle and privilege separation architecture
Input sanitization patterns for each untrusted input type
Output validation requirements for agent-generated content
A per-workflow hardening checklist with current status and gaps
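The privilege separation and output validation requirements can be sketched together in Python. This assumes the read-only analysis phase emits a structured JSON proposal instead of calling GitHub itself, and that a separate mutation phase, which alone holds write access, validates the proposal before acting. The action and label allowlists, the field names, the size limit, and `validate_proposal` are all hypothetical, and the GitHub API call itself is deliberately omitted.

```python
import json
import re

# Hypothetical allowlists: the only mutations the analysis agent may request,
# and the only labels the mutation phase will apply.
ALLOWED_ACTIONS = {"add_label", "post_comment"}
ALLOWED_LABELS = {"bug", "enhancement", "needs-triage"}
MAX_COMMENT_LEN = 2000  # assumption: illustrative size cap

# Illustrative output checks (not exhaustive): GitHub/AWS token shapes,
# @-mentions, and links are refused rather than posted.
SECRET_LIKE = re.compile(r"\b(gh[pousr]_[A-Za-z0-9]{20,}|github_pat_\w{20,}|AKIA[0-9A-Z]{16})")
MENTION = re.compile(r"(?<![\w@])@[A-Za-z0-9-]+")  # skips e-mail addresses
URL = re.compile(r"https?://", re.IGNORECASE)


class ProposalRejected(ValueError):
    """Raised when the analysis phase's output fails validation."""


def validate_proposal(raw: str) -> dict:
    """Mutation-phase gate: parse the read-only phase's JSON and vet it."""
    try:
        proposal = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ProposalRejected(f"not valid JSON: {exc}") from exc
    if not isinstance(proposal, dict):
        raise ProposalRejected("proposal must be a JSON object")

    action = proposal.get("action")
    # Type check first: unhashable values (e.g. lists) would break set lookup.
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ProposalRejected(f"action {action!r} is not allowlisted")

    if action == "add_label":
        label = proposal.get("label")
        if not isinstance(label, str) or label not in ALLOWED_LABELS:
            raise ProposalRejected(f"label {label!r} is not allowlisted")
    else:  # post_comment
        body = proposal.get("body")
        if not isinstance(body, str) or not body.strip():
            raise ProposalRejected("comment body is missing or empty")
        if len(body) > MAX_COMMENT_LEN:
            raise ProposalRejected("comment exceeds length limit")
        if SECRET_LIKE.search(body):
            raise ProposalRejected("comment contains a secret-like token")
        if MENTION.search(body) or URL.search(body):
            raise ProposalRejected("comment contains @-mentions or links")
    return proposal
```

One way to wire this, using GitHub Actions job-level `permissions`: the analysis job runs read-only and hands its JSON to a second job that holds write permissions, runs the validator, and only then calls the GitHub API. Rejecting a failed proposal, rather than repairing it, keeps the privileged phase small and easy to audit.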
