A step-by-step guide to securing AI agents against prompt injection.
π§ Work in progress, and contributions are welcome. This is a living learning guide, not a finished reference. It is scoped deliberately: prompt injection, reasoned from first principles, not the whole OWASP AI surface. I am publishing it in pieces and walking through the reasoning essay by essay on luisalima.com. Found a gap, a better defense, or a broken assumption? Open an issue or a PR.
AI agents are vulnerable to prompt injection attacks. This is more concerning since they can take actions and "live" in spaces that can access (and edit) private information. This repository provides practical, runnable examples of defense patterns, from simple detection to secure multi-agent architectures.
Start here: Principles β The mental model for agentic security, before you touch any code.
Your AI agent is vulnerable if it has the Lethal Trifecta (coined by Simon Willison):
- Access to Private Data β Can read your emails, files, credentials, PII
- Exposure to Untrusted Content β Processes text or images controlled by a potential attacker (emails, documents, web, RAG)
- Ability to Exfiltrate β Can externally communicate in ways that could steal your data (send emails, API calls, outbound network)
Unlike traditional injection attacks (SQL injection, XSS), there's no equivalent to parameterized queries for LLMs. Instructions and data flow through the same channel.
Your threat model should be simple: the agent can go rogue. Ask yourself: if this agent is fully compromised right now, what's the worst that can happen?
| Blast Radius | Example | Acceptable? |
|---|---|---|
| Agent tries to send 1 email to wrong person | Scoped token, approval required | Usually yes |
| Agent exfiltrates all contacts | Full contact access, outbound network | No |
| Agent pushes malicious code to prod | Git credentials, CI/CD access | Never |
| Agent deletes database | DB write credentials in env | Never!!! |
If the blast radius is unacceptable, you need more isolation..
β Full threat modeling guide: docs/reference/threat_model.md
| Level | Approach | What Changes | Security Effect |
|---|---|---|---|
| 1. Detection | Filter malicious inputs | Add a library | Catches common attack patterns |
| 2. Prompt Engineering | Harden the prompt | Change prompts | Marginal on its own |
| 3. Isolation (Infra) | Containers, network, permissions | Wrap the agent | Reduces blast radius |
| 4. Secure Architecture (Software) | Dual LLM, dry-run, typed extraction | Redesign system | Removes dangerous data flows |
| 5. Defense in Depth | Layer everything | Full investment | Raises attacker cost and limits failures |
This repo mixes vulnerable baselines, teaching examples, and patterns you can actually build around. Use the labels below as a guide:
- Teaching example β Useful for understanding the attack or defense shape. Not enough on its own for production.
- Defense-in-depth layer β Worth adding as a supporting control, but not a primary trust boundary.
- Production-hardenable component β Reasonable building block for real systems when paired with deterministic checks, least privilege, and monitoring.
- High-risk reference architecture β A stronger starting point for high-stakes systems, but still requires environment-specific hardening.
In this repo, detection and most prompt-engineering patterns are teaching examples or defense-in-depth layers; dual LLM, typed extraction, output validation, tool/MCP validation, and memory isolation are the closest to production-hardenable components.
# Clone and setup
git clone https://github.com/luisalima/agentic-security.git
cd agentic-security
uv sync
# For local LLM testing (optional)
# Install Ollama: https://ollama.com
ollama pull llama3.1:8b# See the vulnerability (baseline)
uv run marimo edit notebooks/0_vulnerabilities/1_baseline.py
# Try a defense pattern
uv run marimo edit notebooks/4_secure_architecture_software/1_dual_llm.pyDon't want to run code? Read the guide at luisalima.com.
agentic-security/
βββ notebooks/ # Interactive Marimo notebooks
β βββ 0_vulnerabilities/ # The vulnerability
β βββ 1_detection/ # YARA, vectors, ML, LLM-as-judge, canaries
β βββ 2_prompt_engineering/ # Delimiters, hardening
β βββ 3_isolation_infra_level/ # Containers, network, permissions
β βββ 4_secure_architecture_software/ # Dual LLM, typed extraction, dry-run
β βββ 5_defense_in_depth/ # Layered defense
β βββ 6_integration/ # LangChain, framework patterns
βββ docs/ # MkDocs site
β βββ guide/ # Hand-written guide pages
β βββ reference/ # Tools, attack taxonomy, threat model, etc.
βββ diagrams/ # Excalidraw visuals
βββ src/agentic_security/ # Supporting code
Read the full guide at luisalima.com, or run the interactive notebooks locally with uv run marimo edit.
| Level | Guide | Notebooks |
|---|---|---|
| 0. Vulnerabilities | The Problem | notebooks/0_vulnerabilities/ |
| 1. Detection | Detection | notebooks/1_detection/ |
| 1b. Observability | Observability & Audit Trails | β |
| 2. Prompt Engineering | Prompt Engineering | notebooks/2_prompt_engineering/ |
| 3. Isolation (Infra) | Isolation | notebooks/3_isolation_infra_level/ |
| 4. Secure Architecture | Secure Architecture | notebooks/4_secure_architecture_software/ |
| 5. Defense in Depth | Defense in Depth | notebooks/5_defense_in_depth/ |
| 6. Integration | Framework Integration | notebooks/6_integration/ |
| 7. Pre-Packaged Agents | Securing Pre-Packaged Agents | β |
| 8. Enterprise Zero Trust | Enterprise Zero Trust | β |
| 9. MCP Security | MCP Security | notebooks/4_secure_architecture_software/6_mcp_security.py |
| 10. Memory & Context | Memory & Context Security | notebooks/4_secure_architecture_software/7_memory_security.py |
- Architectural separation β Keep raw untrusted content out of privileged prompts
- Typed extraction β Tight schemas sharply limit payload capacity
- Output validation β Check what the LLM tries to do, not just what it receives
- Dry-run evaluation β Generate plans, evaluate them, then execute
- "Just add another LLM to check" β Same vulnerability class
- Delimiters alone β Easily bypassed with "ignore the delimiters"
- Waiting for smarter models β This is architectural, not an intelligence problem
- Blocklist keywords β Trivially rephrased
See docs/reference/tools.md for detailed comparison. Quick picks:
| Need | Tool |
|---|---|
| Quick start, open source | LLM Guard |
| Red teaming (comprehensive) | DeepTeam |
| Red teaming (CI/CD native) | Promptfoo |
| Enterprise, managed | Lakera Guard (Check Point) |
| MCP server security | Snyk Agent-Scan (formerly MCP-Scan) |
| Output validation | Guardrails AI |
This is a living learning resource, scoped on purpose: prompt injection β the through-line of agent security β and the architectural defenses around it, not the entire OWASP agentic top-10. It's a work in progress, and contributions are welcome:
- New attack patterns and defenses
- Framework integration examples (LangChain, LlamaIndex, etc.)
- Improvements to existing notebooks
- Translations
- OWASP Top 10 for Agentic Applications (December 2025)
- OWASP GenAI Data Security Risks & Mitigations (2026)
- MITRE ATLAS β Adversarial Threat Landscape for AI Systems
- NIST AI 600-1 β GenAI Risk Management Profile
- Simon Willison's Prompt Injection Series
- Google DeepMind CaMeL Paper
- Microsoft Spotlighting Research
- NCSC β Prompt Injection Is Not SQL Injection (Dec 2025)
- Zhan et al. β Adaptive Attacks Break Defenses Against Indirect Prompt Injection (NAACL 2025)
MIT β Use freely, please link back if this helped you!