Welcome to the Reinforcement Learning (RL) course repository!
This repository serves as the central hub for our multi-agent box-pushing RL environment course. Here you will find the course environment, exercises, and assignment instructions.
Before you can run the simulations, you need to install the required Python dependencies. The environment relies on `pygame` for graphics, `minigrid` / `pettingzoo` for RL interfaces, and `unified-planning` with `fast-downward` for solving PDDL planning problems.
Since modern operating systems (like macOS) protect the system Python environment, you must create a Virtual Environment before installing the packages:
```bash
# 1. Create a virtual environment named 'venv'
python3 -m venv venv

# 2. Activate the virtual environment
source venv/bin/activate

# 3. Install the required dependencies safely
pip install -r requirements.txt
```

We have included a full end-to-end visualizer that automatically builds a PDDL domain from a 2D grid, sends it to the Fast Downward planner, and then plays the optimal solution back visually on your screen.
To run the large multi-agent simulation where two agents cooperate to push a Big Box:
```bash
python3 visualize_plan.py
```

This script will:

- Load a hardcoded 8x8 ASCII map.
- Generate `domain.pddl` and `problem.pddl` in the `pddl/` folder.
- Call the classical planner to find the shortest sequence of actions.
- Launch a `pygame` window and execute the actions step-by-step.
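For reference, parsing an ASCII map into grid coordinates can be sketched as follows. The character set here (`#` walls, `A` agents, `b` small box, `B` Big Box, `G` goal, `.` floor) is an assumption for illustration; the actual map format used by `visualize_plan.py` may differ.

```python
# Hypothetical sketch of parsing an ASCII map into grid coordinates.
# The character meanings are assumed, not taken from the course code.
ASCII_MAP = """\
########
#A.....#
#..b...#
#..BB..#
#......#
#....G.#
#.....A#
########"""

def parse_map(text):
    """Return a dict mapping each non-floor character to its (row, col) cells."""
    objects = {}
    for row, line in enumerate(text.splitlines()):
        for col, ch in enumerate(line):
            if ch != ".":
                objects.setdefault(ch, []).append((row, col))
    return objects

grid = parse_map(ASCII_MAP)
print(grid["A"])  # two agent positions: [(1, 1), (6, 6)]
print(grid["B"])  # the Big Box occupies two adjacent cells: [(3, 3), (3, 4)]
```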
Here is an overview of the core files in this repository and what they do:
- `environment/multi_agent_env.py`: The heart of the simulation! Defines the `MultiAgentBoxPushEnv` class, which inherits from PettingZoo's `ParallelEnv`. It handles the core physics: small box pushes, two-agent joint Big Box pushes, grid overlaps, and generating visual frames for the agents.
- `environment/box_push_env.py`: A simpler, single-agent Gym environment, used for basic training and earlier exercises before moving to multi-agent.
- `environment/objects.py`: Defines the visual rendering rules for our custom grid objects (`AgentObj`, `SmallBox`, `BigBox`) using standard PyGame polygon rendering.
- `environment/wrappers.py`: Contains RL wrappers that increase difficulty:
  - `StochasticActionWrapper`: adds a chance for agent actions to fail.
  - `NoisyObservationWrapper`: adds visual static/noise to the agent's observation matrix.
- `environment/pddl_extractor.py`: The bridge between the Python grid and classical planning. It parses the live environment state and writes valid `domain` and `problem` PDDL files.
- `planner/pddl_solver.py`: Connects to the `unified-planning` library and pipes the generated PDDL files into the `fast-downward` engine, returning a parsed list of steps if a valid solution exists.
- `visualize_plan.py`: The main testing script. It strings together the Environment, the Extractor, and the Solver, then renders the output visually.
- `exercises/README.md`: Contains the homework assignments for students taking this course (e.g. creating custom maps, integrating wrappers, adding constraints).
IMPORTANT: When working on your assignments, you must create a new branch for each exercise. Your branch name must follow this format:
`student-{firstname}-{lastname}-{exercise}`
For example:
- `student-yossi-cohen-ex1`
- `student-sarah-levi-ex2`
Please ensure you adhere to this naming convention, as it will be used for grading and tracking your progress.
Submission consists of two parts — both are required for a complete grade.
When you open a Pull Request, your branch must include all of the following files:
| File | Description |
|---|---|
| `llm_pipeline.py` (or similar name) | Your pipeline script that queries the LLM, generates the PDDL files, and runs the planner |
| `pddl/domain.pddl` | The generated PDDL domain file |
| `pddl/problem.pddl` | The generated PDDL problem file |
| `pddl_to_map.py` (or similar name) | A script that parses your `domain.pddl` / `problem.pddl` and translates them back into an ASCII map recognized by the visualizer |
| `planner_output.txt` | The full terminal log from running the planner (Fast Downward output) |
How to capture the terminal log:
```bash
python3 visualize_plan.py 2>&1 | tee planner_output.txt
```

This prints to the terminal and saves everything to `planner_output.txt` simultaneously.
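As a starting point for the `pddl_to_map.py` part of the submission, the translation back from PDDL might begin by extracting object positions from the generated `problem.pddl`. The predicate and cell-naming scheme below, `(at <obj> cell_<row>_<col>)`, is purely an assumption; adapt the pattern to whatever your pipeline actually emits.

```python
import re

# Hypothetical sketch: pull object positions out of a problem.pddl :init
# section. The "(at <obj> cell_<row>_<col>)" naming is an assumption.
SAMPLE_INIT = """
(:init
  (at agent1 cell_1_1)
  (at agent2 cell_6_6)
  (at bigbox cell_3_3)
)
"""

def positions(init_text):
    """Return a dict mapping object names to (row, col) grid positions."""
    pat = re.compile(r"\(at\s+(\w+)\s+cell_(\d+)_(\d+)\)")
    return {name: (int(r), int(c)) for name, r, c in pat.findall(init_text)}

print(positions(SAMPLE_INIT))
# {'agent1': (1, 1), 'agent2': (6, 6), 'bigbox': (3, 3)}
```

From a dict like this, painting the characters onto a blank grid of `.` cells reproduces an ASCII map the visualizer can load.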
In addition to the Pull Request, you will present your work live in front of the course instructor.
During the demo you are expected to:
- Run your full pipeline end-to-end from the terminal.
- Show the planner finding a valid plan.
- Run the visual simulator and demonstrate the agents reaching the goal state on your map.
- Explain your prompting strategy — how you described the world to the LLM and what design choices you made.
No submission is considered complete without the live demo.