This document gives examples and pointers on how to experiment with and extend Dopamine.
You can find the documentation for each module in our codebase in our API documentation.
Dopamine is organized as follows:
- agents contains agent implementations.
- atari contains Atari-specific code, including code to run experiments and preprocessing code.
- common contains additional helper functionality, including logging and checkpointing.
- replay_memory contains the replay memory schemes used in Dopamine.
- colab contains code used to inspect the results of experiments, as well as example colab notebooks.
- tests contains all our test files.
Configuring agents
The whole of Dopamine is easily configured using the gin configuration framework.
We provide a number of configuration files for each of the agents. The main configuration file for each agent corresponds to an "apples to apples" comparison, where hyperparameters have been selected to give a standardized performance comparison between agents. These are
- dopamine/agents/dqn/configs/dqn.gin
- dopamine/agents/rainbow/configs/c51.gin
- dopamine/agents/rainbow/configs/rainbow.gin
- dopamine/agents/implicit_quantile/configs/implicit_quantile.gin
More details on the exact choices behind these parameters are given in our baselines page.
We also provide configuration files corresponding to settings previously used in the literature. These are
- dopamine/agents/dqn/configs/dqn_nature.gin (Mnih et al., 2015)
- dopamine/agents/dqn/configs/dqn_icml.gin (Bellemare et al., 2017)
- dopamine/agents/rainbow/configs/c51_icml.gin (Bellemare et al., 2017)
- dopamine/agents/implicit_quantile/configs/implicit_quantile_icml.gin (Dabney et al., 2018)
All of these use the deterministic version of the Arcade Learning Environment (ALE), and slightly different hyperparameters.
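As an illustration, here is a minimal sketch of loading one of these files and overriding a hyperparameter from Python. It assumes you are running from the root of a Dopamine checkout; /tmp/dopamine_test is a placeholder directory and create_dqn_agent is a hypothetical helper, not part of the library:
```
import gin
from dopamine.agents.dqn import dqn_agent
from dopamine.atari import run_experiment

# Parse the config file; any gin-configurable hyperparameter can be
# overridden by layering extra bindings on top of the file.
gin.parse_config_files_and_bindings(
    ['dopamine/agents/dqn/configs/dqn.gin'],
    bindings=['Runner.num_iterations = 1'],  # Shortened for a smoke test.
    skip_unknown=False)

def create_dqn_agent(sess, environment, summary_writer=None):
  # Hypothetical helper: builds a standard DQN agent for this environment.
  return dqn_agent.DQNAgent(sess, num_actions=environment.action_space.n,
                            summary_writer=summary_writer)

runner = run_experiment.Runner('/tmp/dopamine_test',  # placeholder base_dir
                               create_dqn_agent)
runner.run_experiment()
```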
Checkpointing and logging
Dopamine provides basic functionality for performing experiments. This
functionality can be broken down into two main components: checkpointing and
logging. Both components depend on the command-line parameter base_dir,
which informs Dopamine of where it should store experimental data.
By default, Dopamine saves an experiment checkpoint every iteration, where an
iteration consists of one training phase and one evaluation phase, following
the standard set by Mnih et al.
Checkpoints are saved in the checkpoints subdirectory under base_dir. At a
high level, the following are checkpointed:
- Experiment statistics (number of iterations performed, learning curves,
etc.). This happens in dopamine/atari/run_experiment.py, in the method
_checkpoint_experiment.
- Agent variables, including the TensorFlow graph. This happens in
dopamine/agents/dqn/dqn_agent.py, in the methods bundle_and_checkpoint and
unbundle.
- Replay buffer data. Atari 2600 replay buffers have a large memory footprint.
As a result, Dopamine uses additional code to keep memory usage low. The
relevant methods are found in
dopamine/replay_memory/circular_replay_buffer.py, and are called save and
load.
If you're curious, the checkpointing code itself is in
dopamine/common/checkpointer.py.
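To inspect checkpoint contents yourself, a sketch along the following lines should work. It assumes a previous run with base_dir=/tmp/dopamine_test (a placeholder), and that the Checkpointer class and get_latest_checkpoint_number helper behave as in our reading of that file:
```
from dopamine.common import checkpointer

ckpt_dir = '/tmp/dopamine_test/checkpoints'  # <base_dir>/checkpoints
# Find the most recent iteration for which a checkpoint exists.
latest = checkpointer.get_latest_checkpoint_number(ckpt_dir)
if latest >= 0:
  ckpt = checkpointer.Checkpointer(ckpt_dir)
  # load_checkpoint returns the dictionary of experiment data bundled at
  # the end of that iteration.
  experiment_data = ckpt.load_checkpoint(latest)
  print(sorted(experiment_data.keys()))
```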
At the end of each iteration, Dopamine also records the agent's performance,
both during training and (if enabled) during an optional evaluation phase. The
log files are generated in dopamine/atari/run_experiment.py, and more
specifically in dopamine/common/logger.py. They are pickle files containing a
dictionary mapping iteration keys (e.g., "iteration_47") to dictionaries
containing data.
We provide a colab to illustrate how you can load the statistics from an experiment and plot them against our provided baseline runs.
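For a quick look without the colab, the pickle files can also be read directly. The sketch below assumes base_dir=/tmp/dopamine_test and that the run reached iteration 47 (both placeholders); the log_47 filename is an assumption about how the logger names its output files:
```
import pickle

# Each log file holds the dictionary described above, keyed by iteration.
with open('/tmp/dopamine_test/logs/log_47', 'rb') as f:  # assumed filename
  logged_data = pickle.load(f)

print(logged_data['iteration_47'])
```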
Modifying and extending agents
Dopamine is designed to make algorithmic research simple. With this in mind, we decided to keep a relatively flat class hierarchy, with no abstract base class; we've found this sufficient for our research purposes, with the added benefits of simplicity and ease of use. To begin, we recommend modifying the agent code directly to suit your research purposes.
We provide a colab where we illustrate how one can extend the DQN agent, or create a new agent from scratch, and then plot the experimental results against our provided baselines.
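To give a flavor of what such a modification looks like, here is a sketch in the spirit of that colab (MyRandomDQNAgent and create_random_dqn_agent are illustrative names, not part of the library): a DQN variant that overrides action selection to act uniformly at random.
```
import numpy as np
from dopamine.agents.dqn import dqn_agent

class MyRandomDQNAgent(dqn_agent.DQNAgent):
  """A DQN agent that ignores its Q-values and acts uniformly at random."""

  def _select_action(self):
    # The parent class performs epsilon-greedy selection here; we replace
    # it wholesale with a uniform random policy.
    return np.random.randint(self.num_actions)

def create_random_dqn_agent(sess, environment, summary_writer=None):
  """A create_agent_fn that can be handed to run_experiment.Runner."""
  return MyRandomDQNAgent(sess, num_actions=environment.action_space.n)
```
Everything else (replay buffer, training updates, episode bookkeeping) is inherited unchanged, which is the main benefit of the flat class hierarchy.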
DQN
The DQN agent is contained in two files:
- The agent class, in dopamine/agents/dqn/dqn_agent.py.
- The replay buffer, in dopamine/replay_memory/circular_replay_buffer.py.
The agent class defines the DQN network, the update rule, and also the basic
operations of an RL agent (epsilon-greedy action selection, storing
transitions, episode bookkeeping, etc.). For example, the Q-Learning update
rule used in DQN is defined in two methods, _build_target_q_op and
_build_train_op.
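Conceptually, the target those methods compute is the standard Q-Learning target r + gamma * max_a Q(s', a), with the bootstrap term dropped on terminal transitions. The following is a simplified numpy sketch of that computation, not the actual TensorFlow code in dqn_agent.py:
```
import numpy as np

def q_learning_target(rewards, next_q_values, terminals, gamma=0.99):
  """Computes r + gamma * max_a Q(s', a), zeroed out on terminal steps."""
  max_next_q = np.max(next_q_values, axis=1)  # Best next-state value.
  return rewards + gamma * max_next_q * (1.0 - terminals.astype(np.float32))
```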
Rainbow and C51
The Rainbow agent is contained in two files:
- The agent class in dopamine/agents/rainbow/rainbow_agent.py, inheriting from
the DQN agent.
- The replay buffer in dopamine/replay_memory/prioritized_replay_buffer.py,
inheriting from DQN's replay buffer.
The C51 agent is a specific parametrization of the Rainbow agent, where the
update horizon (the n in n-step updates) is set to 1 and a uniform replay
scheme is used.
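Concretely, both choices are plain gin bindings on the Rainbow agent. The sketch below applies them from Python; the binding names assume the standard RainbowAgent parameters (the same bindings appear in the provided C51 configuration file):
```
import gin

# A minimal sketch: recover C51 behavior from the Rainbow agent via gin.
gin.parse_config("""
RainbowAgent.update_horizon = 1         # n = 1, i.e. one-step updates.
RainbowAgent.replay_scheme = 'uniform'  # Disable prioritized replay.
""")
```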
Implicit quantile networks (IQN)
The IQN agent is defined by one additional file:
- dopamine/agents/implicit_quantile/implicit_quantile_agent.py, inheriting
from the Rainbow agent.
Downloads
We provide a series of files for all 4 agents on all 60 games. These are all *.tar.gz files which you will need to uncompress:
- The raw logs are available here.
- You can view this colab for instructions on how to load and visualize them.
- The compiled pickle files are available here.
- The Tensorboard event files are available here.
- We provide a colab where you can start Tensorboard directly from the colab
using ngrok.
- You can also view these with Tensorboard on your machine. For instance, after
uncompressing the files you can run:
```
tensorboard --logdir c51/Asterix/
```
to display the training runs for C51 on Asterix.
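If you'd rather process the event files programmatically than through the Tensorboard UI, something like the following sketch works. The directory and the Train/AverageReturns tag are assumptions about your uncompressed files and about how the summaries are named; check ea.Tags() if the tag doesn't match:
```
from tensorboard.backend.event_processing import event_accumulator

ea = event_accumulator.EventAccumulator('c51/Asterix/')  # placeholder path
ea.Reload()       # Scan the directory for event files.
print(ea.Tags())  # Inspect which scalar tags are actually present.
for event in ea.Scalars('Train/AverageReturns'):  # assumed tag name
  print(event.step, event.value)
```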