In-environment footage, captured by a human player
⚠ Note: This is the code for my article Meta-Reinforcement Learning on FloydHub. This repository is for DeepMind Lab and the Harlow task environment. For the git submodule containing all the TensorFlow code and the DeepMind Lab wrapper, see this repository. For the two-step task, see this repository instead. ⚠
Here, we try to reproduce the simulations of the Harlow task as described in two papers:
- Learning to Reinforcement Learn, Wang et al., 2016
- Prefrontal cortex as a meta-reinforcement learning system, Wang et al., 2018
To reproduce the Harlow task, we used DeepMind Lab, a 3D learning environment that provides a suite of challenging 3D navigation and puzzle-solving tasks for learning agents. Its primary purpose is to act as a testbed for research in artificial intelligence, especially deep reinforcement learning. For more info about DeepMind Lab, you can check out their repo here.
I answer questions and give more information here:
- Clone the repository:
$ git clone https://github.com/mtrazzi/harlow.git
- Change the current directory to the `python` folder:
$ cd harlow/python
- Fetch the git submodule:
python$ git submodule init
python$ git submodule update --remote
- Change current directory to the root of the repo:
python$ cd ..
- Make sure that everything is correctly set up in python.BUILD (cf. section Configure the repo).
- Install dependencies and build with Bazel:
harlow$ sh install.sh
harlow$ sh build.sh
- Train the meta-RL Agent:
harlow$ sh train.sh
To run the Harlow environment as a human, run:
harlow$ bazel run :game -- --level_script=contributed/psychlab/harlow
For a live example of the Harlow agent, run:
harlow$ bazel run :python_harlow --define graphics=sdl --incompatible_remove_native_http_archive=false -- --level_script contributed/psychlab/harlow --width=640 --height=480
harlow
├── WORKSPACE
├── python.BUILD
└── python
    ├── meta-rl
    │   ├── harlow.py
    │   └── meta_rl
    │       ├── worker.py
    │       └── ac_network.py
    ├── dmlab_module.c
    └── data
        └── brady_konkle_oliva2008
            ├── 0001.png
            ├── 0002.png
            └── README.md
⚠ For more details about those three important files, check out the README of the meta-rl repo, which also contains more information about the different architectures we tried and the ongoing projects. ⚠
Apart from that, the other essential files are:
- The Lua script for the Harlow Task Environment:
- The file dmlab_module.c that creates a Python API to use DeepMind Lab.
- The folder data/brady_konkle_oliva2008 that you can tweak to use your own dataset using the instructions in Dataset (the instructions in the README.md are to download the data for this paper).
We tried to reproduce the results from Prefrontal cortex as a meta-reinforcement learning system (see Simulation 5 in Methods). We launched n=5 trainings with 5 different seeds, using the same hyperparameters as the paper, to compare with the results obtained by Wang et al. Compared to the paper's setup, we made the following simplifications:
- we removed the CNN.
- we replaced the stacked LSTM with a single 48-unit LSTM (same as for the Harlow task).
- we drastically reduced the action space so that the agent could only take left and right actions (each directly targeting the center of an image).
- we added artificial NO-OPS to force the agent to fixate for multiple frames (and to remove the noise at the beginning of an episode).
- we used a dataset of 42 pictures, instead of 1000 image samples from ImageNet.
- we used only 1 thread on one CPU, instead of 32 threads on 32 GPUs.
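The artificial NO-OP mechanism above can be sketched as an action-repeat wrapper: once the agent picks an action, the same action is held for several frames so the agent fixates instead of acting on every frame. This is a hypothetical sketch with a toy environment interface, not the repository's actual DeepMind Lab API:

```python
# Hypothetical NO-OP sketch: repeat the chosen action for k frames,
# accumulating the reward, so the agent "fixates" across frames.

def step_with_noops(env_step, action, k=4):
    """Repeat `action` for k frames and sum the rewards.

    `env_step(action) -> (observation, reward, done)` is an assumed
    environment interface for illustration only.
    """
    total_reward = 0.0
    obs, done = None, False
    for _ in range(k):
        obs, reward, done = env_step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done

# Toy environment: +1 reward per frame, episode ends after 10 frames.
class ToyEnv:
    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 10

env = ToyEnv()
obs, r, done = step_with_noops(env.step, action=0, k=4)
print(obs, r, done)  # 4 4.0 False
```

The early `break` on `done` simply keeps the wrapper from stepping past the end of an episode.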
For each seed, the training consisted of ~10k episodes (instead of 10^5 episodes per thread, across 32 threads, in the paper). We chose this number of episodes because the learning seemed to have reached a plateau after ~7k episodes for all seeds.
For our dataset, we used profile pictures of our friends at 42 (a software engineering school), resized to 256x256 (to tweak the dataset to your own needs, see here).
Example of a run of the agent on the dataset (after training):
What the agent sees for the run above (after pre-processing):
Here is the reward curve (one color per seed) after ~10k episodes (which took approximately 3 days to train) on FloydHub's CPU:
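Raw per-episode rewards are noisy, so a curve like this one is usually plotted as a moving average. A minimal numpy sketch of that smoothing; the 100-episode window is an illustrative assumption, not necessarily the value used for the plot above:

```python
import numpy as np

def smooth_rewards(rewards, window=100):
    """Simple moving average over per-episode rewards.

    Returns an array of len(rewards) - window + 1 smoothed values.
    """
    rewards = np.asarray(rewards, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")

# Toy example: rewards ramping from 0 to ~1 over 1000 episodes, plus noise.
rng = np.random.default_rng(0)
raw = np.linspace(0, 1, 1000) + rng.normal(0, 0.2, 1000)
smoothed = smooth_rewards(raw, window=100)
print(smoothed.shape)  # (901,)
```

The smoothed array can then be plotted per seed (one color each) with any plotting library.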
Configure the repo
To tweak the dataset:
- add your own images in data/brady_konkle_oliva2008/. Use the following names: 0001.png, 0002.png, etc. The sizes of the images must be 256x256, like in the original dataset.
- update the image count in game_scripts/datasets/brady_konkle_oliva2008.lua to your number of images.
- set TEST_BATCH (and its train counterpart) to the number of images you will use respectively in train and test, in python/meta-rl/harlow.py.
- update the corresponding constant in python/meta-rl/meta_rl/ac_network.py to your number of images.
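The first step above can be scripted. A hedged sketch using Pillow (listed in the requirements) that resizes a folder of images to 256x256 and renames them to the expected 0001.png, 0002.png, … scheme; `prepare_dataset` and its source/target paths are illustrative names, not part of the repository:

```python
import os
from PIL import Image

def prepare_dataset(src_dir, dst_dir, size=(256, 256)):
    """Resize every image in src_dir to 256x256 and save it into dst_dir
    as 0001.png, 0002.png, ... (the naming scheme the Lua script expects)."""
    os.makedirs(dst_dir, exist_ok=True)
    count = 0
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        try:
            img = Image.open(path)
        except (IOError, OSError):
            continue  # skip non-image files (e.g. README.md)
        count += 1
        img = img.convert("RGB").resize(size)
        img.save(os.path.join(dst_dir, "%04d.png" % count))
    return count
```

For example, `prepare_dataset("my_photos/", "python/data/brady_konkle_oliva2008/")` returns the number of images written, which is the value to propagate to the constants mentioned in the list above.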
Linux/Python3 vs. MacOS/Python2.7
The DeepMind Lab release supports Python 2.7, but you can find some documentation for Python 3 here.
- The branch python2 should work on the iMacs available at 42 (a software engineering school).
- The branch master was tested on FloydHub's instances (using CPU). To change for
All the pip packages should either be pre-installed on FloydHub or installed with
However, if you want to run this repository on your machine, here are the requirements:
numpy==1.16.2
tensorflow==1.12.0
six==1.12.0
scipy==1.2.1
skimage==0.0
setuptools==40.8.0
Pillow==5.4.1
This work uses awjuliani's Meta-RL implementation.