Tutorial & scripts to run a meta-RL model on DeepMind Lab's Harlow task environment.

Harlow Task

In environment footage, captured by human player


Note: This is the code for my article Meta-Reinforcement Learning on FloydHub. This repository is for DeepMind Lab and the Harlow task environment. For the git submodule containing all the TensorFlow code and the DeepMind Lab wrapper, see this repository. For the two-step task, see this repository instead.

Here, we try to reproduce the Harlow task simulations described in the two papers:

To reproduce the Harlow task, we used DeepMind Lab, a 3D learning environment that provides a suite of challenging 3D navigation and puzzle-solving tasks for learning agents. Its primary purpose is to act as a testbed for research in artificial intelligence, especially deep reinforcement learning. For more information about DeepMind Lab, check out their repository here.


I answer questions and give more information here:

Getting Started

  1. Clone the repository:

$ git clone

  2. Change the current directory to harlow/python:

$ cd harlow/python

  3. Fetch the git submodule meta-rl:

python$ git submodule init
python$ git submodule update --remote

  4. Change the current directory back to the root of the repo:

python$ cd ..

  5. Make sure that everything is correctly set up in WORKSPACE and python.BUILD (see the section Configure the repo).

  6. Install dependencies and build with Bazel:

harlow$ sh
harlow$ sh

  7. Train the meta-RL agent:

harlow$ sh

Useful commands

To play the Harlow task yourself as a human, run:

harlow$ bazel run :game -- --level_script=contributed/psychlab/harlow

For a live example of the Harlow agent, run:

harlow$ bazel run :python_harlow --define graphics=sdl --incompatible_remove_native_http_archive=false -- --level_script contributed/psychlab/harlow  --width=640 --height=480
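If you prefer to drive the level from Python rather than through the bazel targets above, a minimal sketch using DeepMind Lab's Python API could look like the following. It plays a random look-left/look-right policy, not our trained agent. The observation name RGB_INTERLEAVED, the pixel offset of 20 and the step cap are assumptions, and run_random_episode and main are hypothetical helpers. The deepmind_lab module is only importable inside a DeepMind Lab bazel build, so it is imported inside main().

```python
import random

import numpy as np

LEVEL = "contributed/psychlab/harlow"  # same level script as the bazel commands above
NUM_ACTIONS = 7  # DeepMind Lab's standard action vector has 7 integer entries


def run_random_episode(env, max_steps=100, seed=0):
    """Play one episode with random left/right looks and return the total reward.

    `env` is anything exposing the DeepMind Lab `Lab` interface:
    reset(seed=...), is_running() and step(action, num_steps=...).
    """
    rng = random.Random(seed)
    env.reset(seed=seed)
    total_reward = 0.0
    steps = 0
    while env.is_running() and steps < max_steps:
        action = np.zeros([NUM_ACTIONS], dtype=np.intc)
        # Index 0 is LOOK_LEFT_RIGHT_PIXELS_PER_FRAME: negative looks left, positive right.
        action[0] = rng.choice([-20, 20])
        total_reward += env.step(action, num_steps=1)
        steps += 1
    return total_reward


def main():
    import deepmind_lab  # only available inside a DeepMind Lab build

    env = deepmind_lab.Lab(LEVEL, ["RGB_INTERLEAVED"],
                           config={"width": "640", "height": "480"})
    print("total reward:", run_random_episode(env))
```

Calling main() from a script built alongside the DeepMind Lab Python module should print the reward collected by the random policy; passing any object with the same four methods to run_random_episode makes the loop easy to test without the engine.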

Directory structure

├── python.BUILD
└── python
    ├── meta-rl
    │   ├──
    │   └── meta_rl
    │       ├──
    │       └──
    └── dmlab_module.c
└── data
    └── brady_konkle_oliva2008
        ├── 0001.png
        ├── 0002.png

Most of our code is in the python directory, where you'll find our git submodule meta-rl. It contains the three most important files:, meta_rl/ and meta_rl/

For more details about these three files, check out the README of the meta-rl repository, which also describes the different architectures we tried and our ongoing projects.
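The core idea behind a meta-RL agent fits in a few lines: in the setup of Wang et al., the recurrent network receives, at each step, the current observation together with the previous action and the previous reward, which lets the LSTM learn its own learning procedure across trials. The sketch below only shows how such an input vector can be assembled. meta_rl_input is a hypothetical name, not a function from the meta-rl submodule, and some implementations also append the timestep.

```python
from typing import List, Sequence


def meta_rl_input(obs_features: Sequence[float], prev_action: int,
                  prev_reward: float, num_actions: int) -> List[float]:
    """Concatenate observation features, one-hot previous action and previous reward."""
    if not 0 <= prev_action < num_actions:
        raise ValueError("prev_action must be in [0, num_actions)")
    one_hot = [0.0] * num_actions
    one_hot[prev_action] = 1.0
    return list(obs_features) + one_hot + [float(prev_reward)]


# Example: 2 observation features, 2 actions (left/right),
# previous action = right (index 1), previous reward = 1.
x = meta_rl_input([0.5, 0.25], prev_action=1, prev_reward=1.0, num_actions=2)
# x == [0.5, 0.25, 0.0, 1.0, 1.0]
```

Feeding the reward back as an input is what distinguishes this from a plain recurrent policy: the network can only adapt within an episode if it sees the outcome of its previous choices.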

Apart from that, the other essential files are:



We tried to reproduce the results from "Prefrontal cortex as a meta-reinforcement learning system" (see Simulation 5 in Methods). We launched n=5 training runs with 5 different seeds, using the same hyperparameters as the paper, to compare against the results obtained by Wang et al.

Main differences

  • we removed the CNN.
  • we replaced the stacked LSTM with a 48-unit LSTM (same as for the Harlow task).
  • we drastically reduced the action space so that the agent can only take left and right actions (each directly targeting the center of the corresponding image).
  • we added artificial NO-OPs to force the agent to fixate for multiple frames (and to remove the noise at the beginning of an episode).
  • we used a dataset of 42 pictures instead of 1000 images sampled from ImageNet.
  • we used only 1 thread on one CPU, instead of 32 threads on 32 GPUs.
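The reduced action space and the fixation NO-OPs can be pictured as a wrapper that expands each agent decision (0 = left, 1 = right) into one DeepMind Lab look action followed by a few no-op frames. This is an illustration under assumptions, not the code from our submodule: the 7-integer action layout is DeepMind Lab's standard one, but LOOK_OFFSET, FIXATION_NOOPS, expand and macro_step are hypothetical names and values.

```python
from typing import Callable, List, Tuple

Action = Tuple[int, ...]
NOOP: Action = (0, 0, 0, 0, 0, 0, 0)  # all 7 DeepMind Lab action entries at zero

# Hypothetical values: the look offset (in pixels) that centers the left/right
# image and the number of fixation frames are assumptions, not the repo's settings.
LOOK_OFFSET = 60
FIXATION_NOOPS = 5


def look(dx: int) -> Action:
    """Index 0 is LOOK_LEFT_RIGHT_PIXELS_PER_FRAME (negative = left, positive = right)."""
    return (dx, 0, 0, 0, 0, 0, 0)


def expand(agent_action: int, offset: int = LOOK_OFFSET,
           noops: int = FIXATION_NOOPS) -> List[Action]:
    """Map an agent action (0 = left, 1 = right) to a saccade plus fixation no-ops."""
    if agent_action not in (0, 1):
        raise ValueError("agent_action must be 0 (left) or 1 (right)")
    dx = -offset if agent_action == 0 else offset
    return [look(dx)] + [NOOP] * noops


def macro_step(step_fn: Callable[[Action], float], agent_action: int) -> float:
    """Run the expanded sequence through step_fn and return the summed reward.

    With a real DeepMind Lab env, step_fn would convert each tuple with
    np.array(a, dtype=np.intc) before calling env.step.
    """
    return sum(step_fn(a) for a in expand(agent_action))
```

From the learner's point of view, the episode then consists of a handful of binary decisions, which is one reason a single small LSTM on one CPU can make progress on this task.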

For each seed, training consisted of ~10k episodes (instead of 10^5 episodes per thread, with 32 threads, in the paper). We chose this number of episodes because, in our case, learning seemed to plateau after ~7k episodes for all seeds.
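To put the two training budgets side by side (taking ~10k episodes per seed as the figure for our runs):

```latex
N_{\text{paper}} = 32 \times 10^{5} = 3.2 \times 10^{6}\ \text{episodes}, \qquad
\frac{N_{\text{paper}}}{N_{\text{ours}}} \approx \frac{3.2 \times 10^{6}}{10^{4}} = 320
```

so each of our runs saw roughly 320 times fewer episodes than a training run in the paper.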


For our dataset, we used profile pictures of our friends at 42 (a software engineering school), resized to 256x256 (to adapt the dataset to your own needs, see the section Configure the repo).

Example of a run of the agent on the dataset (after training):


What the agent sees for the run above (after pre-processing):

agent view

Reward Curve

Here are the reward curves (one color per seed) after ~10k episodes, which took approximately 3 days to train on FloydHub's CPU:

reward curve 5 seeds 42 images

Additional results

I added a repository with the checkpoints (viewable in TensorBoard) for the 5 seeds here, and the corresponding reward, policy loss and entropy loss curves in this repository.

Configure the repo


To tweak the dataset:

  1. Add your own images to data/brady_konkle_oliva2008/. Use the following names: 0001.png, 0002.png, etc. The images must be 256x256, like in the original brady_konkle_oliva2008 dataset.
  2. Change DATASET_SIZE in game_scripts/datasets/brady_konkle_oliva2008.lua to your number of images.
  3. In game_scripts/levels/contributed/psychlab/factories/harlow_factory.lua, change TRAIN_BATCH and TEST_BATCH to the number of images you will use for training and testing, respectively.
  4. Change DATASET_SIZE in python/meta-rl/ to your number of images.
  5. Change DATASET_SIZE in python/meta-rl/meta_rl/ to your number of images.
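Before editing the DATASET_SIZE values, it can help to check the image folder. The following is a hypothetical helper, not part of the repo: it verifies that the files follow the 0001.png, 0002.png, ... naming with no gaps and that each PNG is 256x256 (read from the width and height fields of the PNG IHDR header, so no imaging library is needed), and returns the image count to use as DATASET_SIZE. It does not check TRAIN_BATCH or TEST_BATCH.

```python
import os
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: str) -> Tuple[int, int]:
    """Return (width, height) from the IHDR chunk that follows the PNG signature."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def check_dataset(folder: str, size: int = 256) -> int:
    """Validate names and sizes; return the number of images (value for DATASET_SIZE)."""
    files = sorted(f for f in os.listdir(folder) if f.endswith(".png"))
    expected = [f"{i:04d}.png" for i in range(1, len(files) + 1)]
    if files != expected:
        raise ValueError(f"expected files named 0001.png, 0002.png, ...; got {files[:3]}...")
    for name in files:
        width, height = png_size(os.path.join(folder, name))
        if (width, height) != (size, size):
            raise ValueError(f"{name} is {width}x{height}, expected {size}x{size}")
    return len(files)
```

Run from the repository root, check_dataset("data/brady_konkle_oliva2008") either raises on the first naming or size problem or returns the count to copy into the DATASET_SIZE constants listed above.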

Linux/Python3 vs. MacOS/Python2.7

The DeepMind Lab release supports Python 2.7, but you can find some documentation for Python 3 here.

Currently, our python2 branch supports Python 2.7 on macOS, and our master branch supports Python 3.6 on Linux.

  • The python2 branch should work on the iMacs available at 42 (a software engineering school).
  • The master branch was tested on FloydHub instances (using TensorFlow 1.12 on CPU). To run on a GPU, replace tf.device("/cpu:0") with tf.device("/device:GPU:0") in
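If you switch between CPU and GPU often, one option is to compute the device string once instead of editing it by hand. A minimal sketch: pick_device is a hypothetical helper, and the optional auto-detection relies on tf.test.is_gpu_available(), which exists in TensorFlow 1.x.

```python
def pick_device(use_gpu=None):
    """Return the TensorFlow device string for building the network.

    If use_gpu is None, ask TensorFlow (1.x) whether a GPU is visible;
    TensorFlow is only imported in that case.
    """
    if use_gpu is None:
        import tensorflow as tf
        use_gpu = tf.test.is_gpu_available()
    return "/device:GPU:0" if use_gpu else "/cpu:0"


# Usage, in place of the hard-coded tf.device("/cpu:0"):
#     with tf.device(pick_device()):
#         ...  # build the network as before
```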


All the pip packages should either already be installed on FloydHub or be installable with

However, if you want to run this repository on your machine, here are the requirements:



This work uses awjuliani's Meta-RL implementation.

I couldn't have done this without my dear friend Kevin Costa and the additional details kindly provided by Jane Wang.
