How does this work again?

Multiagent RL is a system to simulate and evaluate reinforcement learning algorithms in environments with multiple agents acting concurrently. As in any reinforcement learning problem, the agents must learn from their own experiences to select the actions or behaviors that maximize future rewards.

Currently, Multiagent RL contains two modules out-of-the-box in the experiments/ directory:

Windy world: A toy scenario where a single agent must learn to reach a goal position in a world containing a lake, which yields huge punishments when the agent steps into it, and wind, which displaces the agent randomly by one position.
Pac-Man: A modified version of the Pac-Man game, developed at UC Berkeley, where both Pac-Man and the ghosts can benefit from reinforcement learning algorithms. Pac-Man should learn to eat as much food as possible while fleeing from ghosts, whereas the ghosts should learn to cooperate to capture Pac-Man quickly.
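
To make the Windy world dynamics concrete, here is a minimal sketch of a single environment step. It is not the code shipped in experiments/; the grid size, lake cells, goal position, reward values, wind probability and the names `step` and `clamp` are all assumptions chosen for illustration.

```python
import random

# Hypothetical layout and reward values, chosen only for illustration.
WIDTH, HEIGHT = 5, 4
GOAL = (4, 0)
LAKE = {(2, 1), (2, 2)}
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def clamp(value, low, high):
    """Keep a coordinate inside the grid."""
    return max(low, min(high, value))


def step(position, action, wind_prob=0.2, rng=random):
    """Apply an action, then possibly a random one-cell wind displacement.

    Returns a (new_position, reward, done) tuple.
    """
    dx, dy = MOVES[action]
    x = clamp(position[0] + dx, 0, WIDTH - 1)
    y = clamp(position[1] + dy, 0, HEIGHT - 1)

    # Wind (assumed model): with probability wind_prob, push the agent
    # one cell in a uniformly random direction.
    if rng.random() < wind_prob:
        wx, wy = rng.choice(list(MOVES.values()))
        x = clamp(x + wx, 0, WIDTH - 1)
        y = clamp(y + wy, 0, HEIGHT - 1)

    new_position = (x, y)
    if new_position in LAKE:
        return new_position, -100.0, False  # huge punishment for the lake
    if new_position == GOAL:
        return new_position, 10.0, True  # episode ends at the goal
    return new_position, -1.0, False  # small cost for every other step
```

An agent would call `step` repeatedly and use the returned rewards to learn a path that reaches the goal while staying clear of the lake, even when the wind pushes it off course.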

In order to run the simulations, you only need to launch the simulation.py script with a module name. For instance:

$ python simulation.py windy

Each module has its own arguments, which can be accessed with the -h flag:

$ python simulation.py windy -h

When the simulation starts, the script launches two threads. One executes the controller, which is responsible for launching agents and routing messages. The other runs the adapter, the code in the experiment's adapter.py module that connects Multiagent RL with the module's simulator.
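
The two-thread arrangement can be pictured with the sketch below. It only illustrates the pattern: the function names, the message dictionaries and the use of in-process `queue.Queue` objects as the transport are assumptions, and the actual controller and adapter may exchange messages differently.

```python
import queue
import threading


def controller(inbox, outbox):
    """Stand-in controller: answers each state message with an action."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: the adapter has finished
            break
        # A real controller would route the state to a learning agent;
        # this placeholder always replies with the same action.
        outbox.put({"agent": message["agent"], "action": "right"})


def adapter(inbox, outbox, steps, log):
    """Stand-in adapter: forwards simulator states and collects actions."""
    for t in range(steps):
        outbox.put({"agent": "agent-0", "state": t})
        log.append(inbox.get()["action"])
    outbox.put(None)  # tell the controller to stop


def run(steps=3):
    """Start both threads, wait for them, and return the chosen actions."""
    to_controller, to_adapter = queue.Queue(), queue.Queue()
    log = []
    threads = [
        threading.Thread(target=controller, args=(to_controller, to_adapter)),
        threading.Thread(target=adapter, args=(to_adapter, to_controller, steps, log)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Calling `run(3)` exchanges three state/action pairs between the two threads and then shuts both down through the `None` sentinel.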

After a simulation runs, the results are saved to files if you configured it to do so. Every module has its own plot.py script that parses simulation results and shows graphs for interpreting the agents' performance. All it requires is the module name and a result file:

$ python plot.py windy windy_results.res

You can also list the module's plot-specific arguments using the -h flag:

$ python plot.py windy -h
