Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time


This is the codebase for our paper "The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning " by M.Igl, G. Farquhar, J. Luketina, W. Boehmer and S. Whiteson.

It also includes an implementation of IBAC-SNI on ProcGen.

It comprises several sub-folders:

  1. gym-minigrid contains the grid-world environment (for the Multiroom experiments) and is adapted from This environment is used together with torch_rl
  2. torch_rl contains the agent to run on the gym-minigrid environment and is adapted from
  3. multiroom_exps contains the training code for the Multiroom experiments.
  4. train-procgen contains the code for the results on the ProcGen domain. It is code adapted from
  5. cifar contains the code for the supervised experiments. It is code adapted from

Plotting is explain at the very end.


All experiments can be run in the accompanying docker container. To build it, call


in the root folder.

Then, an interactive docker session can be started with

./ <GPU-ID> <Containername>
./ 0 iter

where <GPU-ID> is the GPU you want to use and <Containername> can be anything or left empty.

Supervised Learning

After starting interactive session (./ in root folder), move to cifar folder:

cd cifar

Figure 2

Annealing the fraction of correct datapoints from 0 to 1

Run the baseline:

python -p

Run with non-stationarity:

python -p with annealing.every_n_epochs=1 annealing.type=<type>

where <type> can either be size (=Dataset size), random (=Noisy labels) or consistent (=Wrong labels).

Figure 3 left

Results for self-distillation

For the baseline (i.e. no non-stationarity)

python -p with epochs=2500 annealing.every_n_epochs=1 self_distillation=1500 

And for non-stationarities:

python -p with epochs=2500 annealing.every_n_epochs=1 self_distillation=1500 annealing.type=<type>

where again, please fill in type <type> as desired.

Figure 3 middle

Two phase training

python -p with epochs=1500 annealing.duration=700 frozen_test_epochs=800 annealing.type=<type> annealing.start_fraction=<fraction>

where <type> and <fraction> should be filled out as desired. In the experiments, we used the following values for <fraction>.

For Wrong labels and Noisy Labels: 0.05, 0.1 0.2, 0.3, 0.4, 0.5, 0.75, 1.0
Additionally For Dataset Size: 0.005, 0.01, 0.02 + same as for the others



  • Start the interactive docker session: in root folder
  • Install gym-minigrid: pip install -e gym-minigrid
  • Install torch_rl: pip install -e torch_rl
  • Move to multiroom_exps: cd multiroom_exps

Running commands

python -p with iter_type=<type>

where type can be either none (small caps!) or distill


For ProcGen we need 4 GPUs at once, so we need to start the interactive docker container as

./ 0,1,2,3 procgen

for GPUs 0,1,2,3. Then go to subfolder cd train-procgen/train_procgen/

Running the experiments

Baseline PPO:

mpiexec -np 4 python -p with env_name=<env_name>

where <env_name> can be any of the ProcGen environments. The ones used in this paper were starpilot, dodgeball, climber, ninja and bigfish.


mpiexec -np 4 python -p with env_name=<env_name> iter_loss.use=True

Baseline IBAC:

mpiexec -np 4 python -p with env_name=<env_name> arch.reg=ibac

where we use selective noise injection as well.


mpiexec -np 4 python -p with env_name=<env_name> arch.reg=ibac iter_loss.use=True

Ablation studies

Sequential ITER

mpiexec -np 4 python -p with env_name=<env_name> iter_loss.use=True \
iter_loss.v2=True \
iter_loss.timesteps_initial=71_000_000 \
iter_loss.timesteps_anneal=9_000_000 \

Careful: This will generate about 500GB of data!

ITER without RL terms in distillation

mpiexec -np 4 python -p with env_name=<env_name> iter_loss.use=True \
iter_loss.alpha_reg.schedule=const \


The experiments are using sacred for configuration and logging. For more thorough use of this codebase, I'd recommend setting up a MongoDB to store the results. At the moment, results are logged using the FileStorageObserver into a db folder in the root directory.

There is a very simple plotting script included:

python --id <id> --metric <metric>

where <id> is the unique experiment id assigned to each run by sacred. It is printed in stdout somewhere at the beginning when starting a new run.

<metric> is the name of what you want to plot. This is train_acc and test_acc for the supervised experiments, rreturn_mean for Multiroom and eprewmean for ProcGen. Many more things are also being logged, either check the code or metrics.json file to see what.

Special for Procgen: The ProcGen experiments run 4 threads, three for training, one for testing. Each of those 4 threads gets one unique id, but only two of those threads are acutally logging something, one the training, one the test performance. Just try plotting for each of those 4 ids, it will either crash (if that id wasn't logging) or the plotting script will actually print out whether that's the train or test performance.


Repository for Iterated Relearning: The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning







No releases published


No packages published