Skip to content

luisalima/agentic-security

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

139 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Agentic Security

A step-by-step guide to securing AI agents against prompt injection.

🚧 Work in progress, and contributions are welcome. This is a living learning guide, not a finished reference. It is scoped deliberately: prompt injection, reasoned from first principles, not the whole OWASP AI surface. I am publishing it in pieces and walking through the reasoning essay by essay on luisalima.com. Found a gap, a better defense, or a broken assumption? Open an issue or a PR.

AI agents are vulnerable to prompt injection attacks. This is more concerning since they can take actions and "live" in spaces that can access (and edit) private information. This repository provides practical, runnable examples of defense patterns, from simple detection to secure multi-agent architectures.

Start here: Principles β€” The mental model for agentic security, before you touch any code.

Mental model for agentic security


The Problem

Your AI agent is vulnerable if it has the Lethal Trifecta (coined by Simon Willison):

  1. Access to Private Data β€” Can read your emails, files, credentials, PII
  2. Exposure to Untrusted Content β€” Processes text or images controlled by a potential attacker (emails, documents, web, RAG)
  3. Ability to Exfiltrate β€” Can externally communicate in ways that could steal your data (send emails, API calls, outbound network)

Unlike traditional injection attacks (SQL injection, XSS), there's no equivalent to parameterized queries for LLMs. Instructions and data flow through the same channel.


Threat Model

Your threat model should be simple: the agent can go rogue. Ask yourself: if this agent is fully compromised right now, what's the worst that can happen?

Blast Radius Example Acceptable?
Agent tries to send 1 email to wrong person Scoped token, approval required Usually yes
Agent exfiltrates all contacts Full contact access, outbound network No
Agent pushes malicious code to prod Git credentials, CI/CD access Never
Agent deletes database DB write credentials in env Never!!!

If the blast radius is unacceptable, you need more isolation..

β†’ Full threat modeling guide: docs/reference/threat_model.md


Defense Levels

Level Approach What Changes Security Effect
1. Detection Filter malicious inputs Add a library Catches common attack patterns
2. Prompt Engineering Harden the prompt Change prompts Marginal on its own
3. Isolation (Infra) Containers, network, permissions Wrap the agent Reduces blast radius
4. Secure Architecture (Software) Dual LLM, dry-run, typed extraction Redesign system Removes dangerous data flows
5. Defense in Depth Layer everything Full investment Raises attacker cost and limits failures

How to Read This Repo

This repo mixes vulnerable baselines, teaching examples, and patterns you can actually build around. Use the labels below as a guide:

  • Teaching example β€” Useful for understanding the attack or defense shape. Not enough on its own for production.
  • Defense-in-depth layer β€” Worth adding as a supporting control, but not a primary trust boundary.
  • Production-hardenable component β€” Reasonable building block for real systems when paired with deterministic checks, least privilege, and monitoring.
  • High-risk reference architecture β€” A stronger starting point for high-stakes systems, but still requires environment-specific hardening.

In this repo, detection and most prompt-engineering patterns are teaching examples or defense-in-depth layers; dual LLM, typed extraction, output validation, tool/MCP validation, and memory isolation are the closest to production-hardenable components.


Quick Start

Prerequisites

# Clone and setup
git clone https://github.com/luisalima/agentic-security.git
cd agentic-security
uv sync

# For local LLM testing (optional)
# Install Ollama: https://ollama.com
ollama pull llama3.1:8b

Run a Notebook

# See the vulnerability (baseline)
uv run marimo edit notebooks/0_vulnerabilities/1_baseline.py

# Try a defense pattern
uv run marimo edit notebooks/4_secure_architecture_software/1_dual_llm.py

Read the Guide

Don't want to run code? Read the guide at luisalima.com.


Repository Structure

agentic-security/
β”œβ”€β”€ notebooks/                   # Interactive Marimo notebooks
β”‚   β”œβ”€β”€ 0_vulnerabilities/        # The vulnerability
β”‚   β”œβ”€β”€ 1_detection/             # YARA, vectors, ML, LLM-as-judge, canaries
β”‚   β”œβ”€β”€ 2_prompt_engineering/    # Delimiters, hardening
β”‚   β”œβ”€β”€ 3_isolation_infra_level/  # Containers, network, permissions
β”‚   β”œβ”€β”€ 4_secure_architecture_software/  # Dual LLM, typed extraction, dry-run
β”‚   β”œβ”€β”€ 5_defense_in_depth/      # Layered defense
β”‚   └── 6_integration/           # LangChain, framework patterns
β”œβ”€β”€ docs/                        # MkDocs site
β”‚   β”œβ”€β”€ guide/                   # Hand-written guide pages
β”‚   └── reference/               # Tools, attack taxonomy, threat model, etc.
β”œβ”€β”€ diagrams/                    # Excalidraw visuals
└── src/agentic_security/        # Supporting code

Learning Path

Read the full guide at luisalima.com, or run the interactive notebooks locally with uv run marimo edit.

Level Guide Notebooks
0. Vulnerabilities The Problem notebooks/0_vulnerabilities/
1. Detection Detection notebooks/1_detection/
1b. Observability Observability & Audit Trails β€”
2. Prompt Engineering Prompt Engineering notebooks/2_prompt_engineering/
3. Isolation (Infra) Isolation notebooks/3_isolation_infra_level/
4. Secure Architecture Secure Architecture notebooks/4_secure_architecture_software/
5. Defense in Depth Defense in Depth notebooks/5_defense_in_depth/
6. Integration Framework Integration notebooks/6_integration/
7. Pre-Packaged Agents Securing Pre-Packaged Agents β€”
8. Enterprise Zero Trust Enterprise Zero Trust β€”
9. MCP Security MCP Security notebooks/4_secure_architecture_software/6_mcp_security.py
10. Memory & Context Memory & Context Security notebooks/4_secure_architecture_software/7_memory_security.py

Key Insights

What Works

  • Architectural separation β€” Keep raw untrusted content out of privileged prompts
  • Typed extraction β€” Tight schemas sharply limit payload capacity
  • Output validation β€” Check what the LLM tries to do, not just what it receives
  • Dry-run evaluation β€” Generate plans, evaluate them, then execute

What Doesn't Work

  • "Just add another LLM to check" β€” Same vulnerability class
  • Delimiters alone β€” Easily bypassed with "ignore the delimiters"
  • Waiting for smarter models β€” This is architectural, not an intelligence problem
  • Blocklist keywords β€” Trivially rephrased

Tools Landscape

See docs/reference/tools.md for detailed comparison. Quick picks:

Need Tool
Quick start, open source LLM Guard
Red teaming (comprehensive) DeepTeam
Red teaming (CI/CD native) Promptfoo
Enterprise, managed Lakera Guard (Check Point)
MCP server security Snyk Agent-Scan (formerly MCP-Scan)
Output validation Guardrails AI

Contributing

This is a living learning resource, scoped on purpose: prompt injection β€” the through-line of agent security β€” and the architectural defenses around it, not the entire OWASP agentic top-10. It's a work in progress, and contributions are welcome:

  • New attack patterns and defenses
  • Framework integration examples (LangChain, LlamaIndex, etc.)
  • Improvements to existing notebooks
  • Translations

References


License

MIT β€” Use freely, please link back if this helped you!


About

No description, website, or topics provided.

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors