Hyoungseok Kim* 1 2, Jaekyeom Kim* 1 2, Yeonwoo Jeong 1 2, Sergey Levine 3, Hyun Oh Song 1 2
*: Equal contribution, 1: Seoul National University, Department of Computer Science and Engineering, 2: Neural Processing Research Center, 3: UC Berkeley, Department of Electrical Engineering and Computer Sciences
This codebase contains the source code for our paper, EMI: Exploration with Mutual Information.
Please cite our paper if you find this work helpful to your research:
@inproceedings{kimICML19,
Author = {Hyoungseok Kim and Jaekyeom Kim and Yeonwoo Jeong and Sergey Levine and Hyun Oh Song},
Title = {EMI: Exploration with Mutual Information},
Booktitle = {International Conference on Machine Learning (ICML)},
Year = {2019}}
A non-virtual machine with the following components:
- Ubuntu 16.04
- CUDA 8.0
- cuDNN 6.0
- Conda
- Run `conda env create -f environment.yml`.
- After activating the created environment by executing `conda activate rllab3`, run `pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip`.
- Create a subdirectory, `./vendor/mujoco/`.
- If you don't already have one, obtain a MuJoCo license for your machine by following the instructions on their website. They offer a number of licensing options, including 30-day free trials.
- Copy `mjkey.txt`, the license key file, into `./vendor/mujoco/`.
- Get version 1.31 of the MuJoCo binaries for Linux from their website and unzip the file.
- Copy all the files inside the directory `mjpro131/bin/` of the extracted content into `./vendor/mujoco/`.
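The MuJoCo steps above can be sanity-checked with a small script. This is a minimal sketch, not part of the codebase: the function name `check_mujoco_dir` is hypothetical, and it only assumes what the steps state, namely that `./vendor/mujoco/` should contain `mjkey.txt` plus the files copied from `mjpro131/bin/`. It does not validate the license or the binaries themselves.

```python
from pathlib import Path


def check_mujoco_dir(vendor_dir="./vendor/mujoco"):
    """Return a list of problems found; an empty list means the layout looks plausible.

    Hypothetical helper for illustration only. It checks file presence,
    not whether the license key or the binaries are valid.
    """
    root = Path(vendor_dir)
    problems = []
    if not root.is_dir():
        problems.append(f"missing directory: {root}")
        return problems
    if not (root / "mjkey.txt").is_file():
        problems.append(f"missing license key: {root / 'mjkey.txt'}")
    # Assumption: anything besides the key was copied from mjpro131/bin/.
    others = [p for p in root.iterdir() if p.name != "mjkey.txt"]
    if not others:
        problems.append(f"no MuJoCo binaries found in {root}")
    return problems
```

Calling `check_mujoco_dir()` from the repository root and printing the returned list is enough to spot a missing key file or an empty directory before launching an experiment.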
- Before running experiments, activate the conda environment by running `conda activate rllab3`.
- To train an EMI agent on SwimmerGather, run `python examples/trpo_emi_mujoco.py`.
- To train an EMI agent on SparseHalfCheetah, run `python examples/trpo_emi_mujoco.py --env=SparseHalfCheetah`.
- To train an EMI agent on Montezuma's Revenge, run `python examples/trpo_emi_atari.py`.
- The first run only creates a config and then exits without doing anything else. If you see the configuration message, run the command again.
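For scripted launches, the three training commands listed above can be collected in one lookup. This is only a sketch: `EMI_COMMANDS` and `emi_command` are hypothetical names, and the dictionary keys are labels chosen here, not options understood by the example scripts. The command lines themselves are the ones given above.

```python
# Command lines copied from the usage list; the keys are hypothetical
# labels for this sketch and are not passed to the scripts.
EMI_COMMANDS = {
    "SwimmerGather": ["python", "examples/trpo_emi_mujoco.py"],
    "SparseHalfCheetah": [
        "python",
        "examples/trpo_emi_mujoco.py",
        "--env=SparseHalfCheetah",
    ],
    "MontezumaRevenge": ["python", "examples/trpo_emi_atari.py"],
}


def emi_command(task):
    """Return a copy of the argv list for the given task label."""
    try:
        return list(EMI_COMMANDS[task])
    except KeyError:
        known = ", ".join(sorted(EMI_COMMANDS))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root with the `rllab3` environment active. Since the first run may only create the config, a second invocation of the same command may be needed.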
This work was partially supported by Samsung Advanced Institute of Technology and Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2019-0-01367, BabyMind).
MIT License