Agentic Security

A step-by-step guide to securing AI agents against prompt injection.

🚧 Work in progress, and contributions are welcome. This is a living learning guide, not a finished reference. It is scoped deliberately: prompt injection, reasoned from first principles, not the whole OWASP AI surface. I am publishing it in pieces and walking through the reasoning essay by essay on luisalima.com. Found a gap, a better defense, or a broken assumption? Open an issue or a PR.

AI agents are vulnerable to prompt injection attacks. This is more concerning since they can take actions and "live" in spaces that can access (and edit) private information. This repository provides practical, runnable examples of defense patterns, from simple detection to secure multi-agent architectures.

Start here: Principles — The mental model for agentic security, before you touch any code.

The Problem

Your AI agent is vulnerable if it has the Lethal Trifecta (coined by Simon Willison):

Access to Private Data — Can read your emails, files, credentials, PII
Exposure to Untrusted Content — Processes text or images controlled by a potential attacker (emails, documents, web, RAG)
Ability to Exfiltrate — Can externally communicate in ways that could steal your data (send emails, API calls, outbound network)

Unlike traditional injection attacks (SQL injection, XSS), there's no equivalent to parameterized queries for LLMs. Instructions and data flow through the same channel.

Threat Model

Your threat model should be simple: the agent can go rogue. Ask yourself: if this agent is fully compromised right now, what's the worst that can happen?

Blast Radius	Example	Acceptable?
Agent tries to send 1 email to wrong person	Scoped token, approval required	Usually yes
Agent exfiltrates all contacts	Full contact access, outbound network	No
Agent pushes malicious code to prod	Git credentials, CI/CD access	Never
Agent deletes database	DB write credentials in env	Never!!!

If the blast radius is unacceptable, you need more isolation..

→ Full threat modeling guide: docs/reference/threat_model.md

Defense Levels

Level	Approach	What Changes	Security Effect
1. Detection	Filter malicious inputs	Add a library	Catches common attack patterns
2. Prompt Engineering	Harden the prompt	Change prompts	Marginal on its own
3. Isolation (Infra)	Containers, network, permissions	Wrap the agent	Reduces blast radius
4. Secure Architecture (Software)	Dual LLM, dry-run, typed extraction	Redesign system	Removes dangerous data flows
5. Defense in Depth	Layer everything	Full investment	Raises attacker cost and limits failures

How to Read This Repo

This repo mixes vulnerable baselines, teaching examples, and patterns you can actually build around. Use the labels below as a guide:

Teaching example — Useful for understanding the attack or defense shape. Not enough on its own for production.
Defense-in-depth layer — Worth adding as a supporting control, but not a primary trust boundary.
Production-hardenable component — Reasonable building block for real systems when paired with deterministic checks, least privilege, and monitoring.
High-risk reference architecture — A stronger starting point for high-stakes systems, but still requires environment-specific hardening.

In this repo, detection and most prompt-engineering patterns are teaching examples or defense-in-depth layers; dual LLM, typed extraction, output validation, tool/MCP validation, and memory isolation are the closest to production-hardenable components.

Quick Start

Prerequisites

# Clone and setup
git clone https://github.com/luisalima/agentic-security.git
cd agentic-security
uv sync

# For local LLM testing (optional)
# Install Ollama: https://ollama.com
ollama pull llama3.1:8b

Run a Notebook

# See the vulnerability (baseline)
uv run marimo edit notebooks/0_vulnerabilities/1_baseline.py

# Try a defense pattern
uv run marimo edit notebooks/4_secure_architecture_software/1_dual_llm.py

Read the Guide

Don't want to run code? Read the guide at luisalima.com.

Repository Structure

agentic-security/
├── notebooks/                   # Interactive Marimo notebooks
│   ├── 0_vulnerabilities/        # The vulnerability
│   ├── 1_detection/             # YARA, vectors, ML, LLM-as-judge, canaries
│   ├── 2_prompt_engineering/    # Delimiters, hardening
│   ├── 3_isolation_infra_level/  # Containers, network, permissions
│   ├── 4_secure_architecture_software/  # Dual LLM, typed extraction, dry-run
│   ├── 5_defense_in_depth/      # Layered defense
│   └── 6_integration/           # LangChain, framework patterns
├── docs/                        # MkDocs site
│   ├── guide/                   # Hand-written guide pages
│   └── reference/               # Tools, attack taxonomy, threat model, etc.
├── diagrams/                    # Excalidraw visuals
└── src/agentic_security/        # Supporting code

Learning Path

Read the full guide at luisalima.com, or run the interactive notebooks locally with uv run marimo edit.

Level	Guide	Notebooks
0. Vulnerabilities	The Problem	`notebooks/0_vulnerabilities/`
1. Detection	Detection	`notebooks/1_detection/`
1b. Observability	Observability & Audit Trails	—
2. Prompt Engineering	Prompt Engineering	`notebooks/2_prompt_engineering/`
3. Isolation (Infra)	Isolation	`notebooks/3_isolation_infra_level/`
4. Secure Architecture	Secure Architecture	`notebooks/4_secure_architecture_software/`
5. Defense in Depth	Defense in Depth	`notebooks/5_defense_in_depth/`
6. Integration	Framework Integration	`notebooks/6_integration/`
7. Pre-Packaged Agents	Securing Pre-Packaged Agents	—
8. Enterprise Zero Trust	Enterprise Zero Trust	—
9. MCP Security	MCP Security	`notebooks/4_secure_architecture_software/6_mcp_security.py`
10. Memory & Context	Memory & Context Security	`notebooks/4_secure_architecture_software/7_memory_security.py`

Key Insights

What Works

Architectural separation — Keep raw untrusted content out of privileged prompts
Typed extraction — Tight schemas sharply limit payload capacity
Output validation — Check what the LLM tries to do, not just what it receives
Dry-run evaluation — Generate plans, evaluate them, then execute

What Doesn't Work

"Just add another LLM to check" — Same vulnerability class
Delimiters alone — Easily bypassed with "ignore the delimiters"
Waiting for smarter models — This is architectural, not an intelligence problem
Blocklist keywords — Trivially rephrased

Tools Landscape

See docs/reference/tools.md for detailed comparison. Quick picks:

Need	Tool
Quick start, open source	LLM Guard
Red teaming (comprehensive)	DeepTeam
Red teaming (CI/CD native)	Promptfoo
Enterprise, managed	Lakera Guard (Check Point)
MCP server security	Snyk Agent-Scan (formerly MCP-Scan)
Output validation	Guardrails AI

Contributing

This is a living learning resource, scoped on purpose: prompt injection — the through-line of agent security — and the architectural defenses around it, not the entire OWASP agentic top-10. It's a work in progress, and contributions are welcome:

New attack patterns and defenses
Framework integration examples (LangChain, LlamaIndex, etc.)
Improvements to existing notebooks
Translations

References

License

MIT — Use freely, please link back if this helped you!

Name		Name	Last commit message	Last commit date
Latest commit History 139 Commits
.github		.github
benchmark		benchmark
data		data
diagrams		diagrams
docs		docs
notebooks		notebooks
src/agentic_security		src/agentic_security
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agentic Security

The Problem

Threat Model

Defense Levels

How to Read This Repo

Quick Start

Prerequisites

Run a Notebook

Read the Guide

Repository Structure

Learning Path

Key Insights

What Works

What Doesn't Work

Tools Landscape

Contributing

References

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Agentic Security

The Problem

Threat Model

Defense Levels

How to Read This Repo

Quick Start

Prerequisites

Run a Notebook

Read the Guide

Repository Structure

Learning Path

Key Insights

What Works

What Doesn't Work

Tools Landscape

Contributing

References

License

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages