Companion code for the AAAI paper *Optimal Attacks and Defense for Reinforcement Learning*. It provides a framework for computing and simulating optimal adversarial attacks and optimal defense policies in reinforcement learning environments. In particular, the framework covers every online attack surface: State, Perception, Action, and Reward. We visualize the impact of these attacks, as well as the effectiveness of our robust defense policies, in a simple mini-grid environment.
- Multi-Surface Attack Framework: An efficiently computable optimal attack framework that includes each online attack surface (test-time attack formulation).
- Optimal Defense: A game-theoretic defense mechanism where the agent learns a policy that is robust to the worst-case attacks (Minimax formulation).
- Custom GridWorld Environment: A flexible maze environment supporting custom layouts, hazards (lava), and goals to visualize our attacks and defenses.
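To give a feel for the worst-case reasoning behind the minimax defense, here is a conceptual sketch (not the repo's actual solver or API) of value iteration where an attacker overrides the agent's action inside a set of "attacked" states. The transition tensor, reward matrix, and `attacked` mask are illustrative assumptions; in the action-attack case, the minimax backup reduces to taking a `min` over actions in attacker-controlled states and a `max` elsewhere.

```python
import numpy as np

def worst_case_value_iteration(P, R, attacked, gamma=0.9, tol=1e-10):
    # P: (A, S, S) transition tensor; R: (S, A) rewards;
    # attacked: boolean (S,) mask of states where the attacker overrides the action.
    V = np.zeros(R.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum('asn,n->sa', P, V)
        # Attacker picks the worst action in attacked states; the agent the best elsewhere.
        V_new = np.where(attacked, Q.min(axis=1), Q.max(axis=1))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Tiny deterministic 2-state example: action 0 = stay, action 1 = swap states.
P = np.zeros((2, 2, 2))
P[0] = np.eye(2)                       # stay
P[1] = np.array([[0., 1.], [1., 0.]])  # swap
R = np.array([[0., 1.],                # state 0: swapping into state 1 pays 1
              [1., 0.]])               # state 1: staying pays 1

V_nominal = worst_case_value_iteration(P, R, np.array([False, False]))
V_robust  = worst_case_value_iteration(P, R, np.array([False, True]))
# The attacked value function is never above the nominal one.
```

The actual framework solves a richer game over all four attack surfaces; this toy backup only illustrates why the defended value can only be lower than the unattacked baseline.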
```
RL-Attack-Defense/
├── data/                    # Generated mazes and constraint files
├── results/                 # Output plots and logs
├── scripts/
│   ├── generate_data.py     # Generates mazes and constraint masks
│   └── run_experiments.py   # Visualizes the attacks and defense
├── src/
│   ├── envs/
│   │   └── maze.py          # GridWorld environment
│   ├── models.py            # Data structures (MDP, Game)
│   ├── solvers.py           # MDP & Game solvers
│   ├── attack.py            # Optimal attack computation
│   ├── defense.py           # Optimal defense computation
│   └── simulation.py        # Attack interaction simulation
└── README.md
```
Clone the repository and install the dependencies:

```bash
git clone https://github.com/jermcmahan/RL-Attack-Defense.git
cd RL-Attack-Defense
pip install -r requirements.txt
```
First, generate the maze layout and the constraint masks that define the "Danger Zones" (the states where the attacker has power):

```bash
# Generate the standard paper experiment data (maze + constraints)
python scripts/generate_data.py

# OR generate a random maze
python scripts/generate_data.py --random --n 15 --p 0.2 --name my_random_maze
```
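For intuition about the random mode, here is a minimal sketch of one way such a maze could be generated, assuming `--n` is the grid side length and `--p` a wall density. This is purely illustrative; the repo's `generate_data.py` may use a different construction.

```python
import numpy as np

def random_maze(n=15, p=0.2, seed=0):
    # Illustrative sketch only: each cell becomes a wall independently
    # with probability p (assumed semantics of the --p flag).
    rng = np.random.default_rng(seed)
    grid = (rng.random((n, n)) < p).astype(int)  # 1 = wall, 0 = free
    grid[0, 0] = 0          # keep the start cell open
    grid[n - 1, n - 1] = 0  # keep the goal cell open
    return grid

maze = random_maze(n=15, p=0.2)
```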
Run the end-to-end experiment pipeline. This computes the optimal baseline policy, the optimal strategy for each attack surface, and the robust action-defense policy:

```bash
python scripts/run_experiments.py
```
The `results/` folder will contain visualizations of the agent's trajectories under attack:

- `baseline.png`: the optimal path with no interference.
- `state_attack.png`: the path taken when the agent is teleported.
- `perceived_state_attack.png`: the path taken when the agent is hallucinating.
- `action_attack.png`: the path taken when actions are overridden.
- `robust_defense.png`: the path of the robust agent surviving the action attack.
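The perceived-state attack shown above can be sketched in a few lines: the attacker reports a fake observation so that the victim's own (fixed, greedy) policy picks the worst action in the true state. The `Q` table, `policy` array, and candidate set below are hypothetical stand-ins, not the repo's data structures.

```python
import numpy as np

def perception_attack(Q, policy, true_state, candidates):
    # Report the fake state whose induced action is worst in the true state.
    # Q: (S, A) victim action-values; policy: (S,) greedy action per state.
    return min(candidates, key=lambda fake: Q[true_state, policy[fake]])

# Toy example: 3 states, 2 actions.
Q = np.array([[5., 0.],   # in state 0, action 0 is much better
              [1., 2.],
              [0., 3.]])
policy = Q.argmax(axis=1)  # greedy victim policy
fake = perception_attack(Q, policy, true_state=0, candidates=[0, 1, 2])
# Reporting state 1 makes the victim take action 1, worth 0 instead of 5.
```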
To reproduce the exact charts found in the report:

1. Run the full data generation pipeline (`python scripts/generate_data.py`).
2. Run the experiment suite (`python scripts/run_experiments.py`).
If you use this code for your research, please cite:

> Jeremy McMahan. (2025). Optimal Attack and Defense for Reinforcement Learning. GitHub repository: https://github.com/jermcmahan/RL-Attack-Defense