Skip to content

floringogianu/atari-agents

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Trained Atari Agents

Download models | Getting started | What's included | Acknowledgements

A source of 2️⃣5️⃣,5️⃣0️⃣0️⃣ checkpoints (and growing) of curated DQN, M-DQN and C51 agents trained on modern and classic protocols, matching or besting the performance reported in the literature.

We release these models hoping it will help to advance the research in:

  • Reproducing DRL results
  • Imitation Learning
  • Batch/Offline RL
  • Multi-task learning

What's included

Checkpoints for DQN, M-DQN and C51 agents across two or three training seeds, on modern or classic protocols.

How many checkpoints?

An agent trained on 200M frames usually produces 200 checkpoints times the number of training seeds. In order not to make the download size overly large we only include 51 checkpoints per training run. These are sampled geometrically, with denser checkpoints towards the end of the training. This results in the last 20 checkpoints of the full 200 (last 10% of the training run) and then sparser checkpoints towards the beginning of the run, with only 10 out of 51 from the first half. It looks a bit like this:

checkpoint sampling

Note it's not mandatory the best performing checkpoint is included since on some combinations of algorithms and agents the peak performance occurs earlier in training. However this sampling should characterize fairly well the performance of an agent most of the time.

❗✋ If there is demand we can provide the full list of checkpoints for a given agent.

Agents have been trained using PyTorch and the models are stored as compressed state_dict pickle files. Since the networks used on ALE are fairly simple these could easily be converted for use in other deep learning frameworks.

A word on training and evaluation protocols

There are two common training and evaluation protocols encountered in the literature. We will call them classic and modern across this project:

  • classic: it originates from (Mnih, 2015)1 Nature paper and it mostly appears in DeepMind papers.
  • modern: it originates from (Machado, 2017)2 and a variation of it was adopted by Dopamine3. Since then it started to show more and more often.

The main two differences between the two are the way stochasticity is induced in the environment and how the loss of a life is treated.

We mention again that while we use Dopamine's protocol and sometimes hyperparameters, our agents are trained in PyTorch.

Available agents

Check the table below for a summary.

Algorithm Protocol Games Seeds Observations
DQN modern 60 3 DQN agent using the settings from dopamine. It's optimised with Adam and uses MSE instead of Huber loss. A surprisingly strong agent on this protocol.
M-DQN modern 60 3 DQN above but using the Munchausen trick4. Even stronger performance.
C51 classic 28/57 3 Closely follows the original paper5.
DQN Adam classic 28/57 2 A DQN agent trained according to the Rainbow paper6. The exact settings and plots can be found in our paper7.

Right off-the bat you can notice that on the classic protocol there are only 28 games out of the usual 57. We trained the two agents on this protocol over one year ago using the now deprecated atari-py project which officially provided the ALE Python bindings in OpenAI's Gym. Unfortunately the package came with a large number of ROMs that are not supported by the current, official, ale-py library. The agents trained on the modern protocol (as well as the code we provide for visualising agents) all use the new ale-py. Therefore we decided against providing support for the older library event if it meant dropping half of the trained models. A great resource for reading about this issue is Jesse's Farebrother ALE v0.7 release notes. Importantly, we found out about the issue while checking the performance of the trained models on the new ale-py back-end and we provide plots showing the remaining 28 agents perform as expected (C51_classic, DQN_classic).

How to use it

Installation

⏬ Download ⏬ the saved models.

Install the conda environment using conda env create -f environment.yml. If this fails for some reason the main requirements are:

  • pytorch 1.11.0
  • ale-py 0.7.4
  • opencv 4.5.2

An easy way to install ale-py, download and install the ROMs is to just install gym:

pip install 'gym [atari,accept-rom-license]'

If for some reason the SDL support is not just right, you might have better luck cloning ALE and installing from source using pip install .. Just make sure then to use register the ROM files again:

ale-import-roms path/to/roms

See this excellent post about what's new in ALE 0.7 and how to install ROMs.

Play using a saved model

Just do:

python play.py models/AGENT/GAME/SEED/model_STEP.gz

Passing the -r/--record flag will create a ./movies folder and save the screens and audio.

We also support game modes and difficulty levels introduced by Machado, 20172. You can use -v to activate an interactive mode for selecting game modes and difficulty levels:

python play.py models/AGENT/GAME/SEED/model_STEP.gz -v

Folder structure

There are some conventions encoded in the folder structure used by play.py to configure the model and the environment using the name of the directory containing the checkpoints. For example DQN_modern will configure a DQN network and evaluate it on the modern protocol while C51_classic will configure a C51-style network and evaluate it on the classic protocol.

You should end with something like this after downloading all the agents:

.
├── ale_env.py
├── human_play.py
├── play.py
├── README.md
├── models
│   ├── C51_classic
│       └── ...
│   ├── DQN_classic_adam
│       └── ...
│   └── DQN_modern
│       ├── AirRaid
│       │   ├── 0
│       │   ├── 1
│       │   └── 2
│      ...
│       └── Zaxxon
│           ├── 0
│           ├── 1
│           └── 2

Just how well trained are these agents?

Our PyTorch implementation of DQN trained using Adam on the modern protocol compares favourable to the exact same agent trained using Dopamine. The plots below have been generated using the tools provided by rliable.

dopamine_vs_pytorch

Some more comparisons can be found here.

A detailed discussion about the performance of DQN + Adam and C51 trained on the classic protocol can be found in our paper7, where we used these checkpoints as baselines.

Acknowledgements

  • Bitdefender, for providing all the material resources that made possible this project and my colleagues in Bitdefender's Machine Learning & Crypto Research Unit for all their support.
  • Kai Arulkumaran, for providing the atari-py/ale-py wrapper I used extensively in my research and who helped me many times figuring out some of the more arcane details of the various training and evaluation protocols in DRL.
  • Dopamine baselines and configs, which I used extensively for comparing the performance of our implementations and for figuring various hyperparameters.

Related projects

Giving credit

If you use these checkpoints in your research and published work, please consider citing this project:

@misc{gogianu2022agents,
  title  = {Atari Agents},
  author = {Florin Gogianu and Tudor Berariu and Lucian Bușoniu and Elena Burceanu},
  year   = {2022},
  url    = {https://github.com/floringogianu/atari-agents},
}

Footnotes

  1. Mnih, et al. 2015, _Human-level control through deep reinforcement learning

  2. Machado, et al. 2017, Revisiting the Arcade Learning Environment... 2

  3. Castro, et al. 2018, Dopamine: A Research Framework for Deep RL

  4. Vieillard, et al. 2020, Munchausen Reinforcement Learning

  5. Bellemare, et al. 2017, A distributional perspective...

  6. Hessel, et al. 2017, Combining Improvements in Deep RL

  7. Gogianu, et al. 2021, Spectral Normalisation... 2

About

Code and links for over 25,000 trained Atari agents

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages