Major code refactoring #6

Merged · 7 commits · Jan 29, 2018
6 changes: 2 additions & 4 deletions .gitignore
@@ -1,5 +1,3 @@
*__pycache__*
.idea
rllab/config_personal.py
data
vendor/mujoco
*.pyc
.idea
81 changes: 81 additions & 0 deletions Dockerfile
@@ -0,0 +1,81 @@
FROM ubuntu:16.04


# ========== Anaconda ==========
# https://github.com/ContinuumIO/docker-images/blob/master/anaconda/Dockerfile
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8

RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 \
git mercurial subversion

RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && \
wget --quiet https://repo.continuum.io/archive/Anaconda2-5.0.1-Linux-x86_64.sh -O ~/anaconda.sh && \
/bin/bash ~/anaconda.sh -b -p /opt/conda && \
rm ~/anaconda.sh

RUN apt-get install -y curl grep sed dpkg && \
TINI_VERSION=`curl https://github.com/krallin/tini/releases/latest | grep -o "/v.*\"" | sed 's:^..\(.*\).$:\1:'` && \
curl -L "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini_${TINI_VERSION}.deb" > tini.deb && \
dpkg -i tini.deb && \
rm tini.deb && \
apt-get clean
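The `grep`/`sed` pipeline in the `RUN` step above is terse; here is a sketch of what it does, run outside the Dockerfile on a hard-coded sample fragment (the real build scrapes this fragment from the GitHub releases redirect, so the value `0.16.1` below is just an assumption for illustration). `grep -o "/v.*\""` extracts something like `/v0.16.1"`, and the `sed` expression strips the leading two characters (`/v`) and the trailing quote, leaving the bare version number.

```shell
# Sample of what grep -o "/v.*\"" would extract from the redirect page:
fragment='/v0.16.1"'
# Drop the first two chars ("/v") and the last char (the quote):
TINI_VERSION=$(echo "$fragment" | sed 's:^..\(.*\).$:\1:')
echo "$TINI_VERSION"   # prints 0.16.1
```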

ENV PATH /opt/conda/bin:$PATH


# ========== Special Deps ==========
RUN apt-get -y install git make cmake unzip
RUN pip install awscli
# ALE requires zlib
RUN apt-get -y install zlib1g-dev
# MuJoCo requires graphics libraries for GLFW
RUN apt-get -y build-dep glfw
RUN apt-get -y install libxrandr2 libxinerama-dev libxi6 libxcursor-dev
RUN apt-get install -y vim ack-grep
RUN pip install --upgrade pip
# a plain `pip install pygame` would fail without these build deps
RUN apt-get build-dep -y python-pygame
RUN pip install Pillow


# ========== Add codebase stub ==========
WORKDIR /root/sql

ADD environment.yml /root/sql/environment.yml
RUN conda env create -f /root/sql/environment.yml \
&& conda env update

ENV PYTHONPATH /root/sql:$PYTHONPATH
ENV PATH /opt/conda/envs/sql/bin:$PATH
RUN echo "source activate sql" >> /root/.bashrc
ENV BASH_ENV /root/.bashrc


# ========= rllab ===============
# We need to clone rllab repo in order to use the
# `rllab.sandbox.rocky.tf` functions.

ENV RLLAB_PATH=/root/rllab \
RLLAB_VERSION=b3a28992eca103cab3cb58363dd7a4bb07f250a0

RUN git clone https://github.com/rll/rllab.git ${RLLAB_PATH} \
&& cd ${RLLAB_PATH} \
&& git checkout ${RLLAB_VERSION} \
&& mkdir ${RLLAB_PATH}/vendor/mujoco \
&& python -m rllab.config

ENV PYTHONPATH ${RLLAB_PATH}:${PYTHONPATH}


# ========= MuJoCo ===============
ENV MUJOCO_VERSION=1.3.1 \
MUJOCO_PATH=/root/.mujoco

RUN MUJOCO_ZIP="mjpro$(echo ${MUJOCO_VERSION} | sed -e "s/\.//g")_linux.zip" \
&& mkdir -p ${MUJOCO_PATH} \
&& wget -P ${MUJOCO_PATH} https://www.roboti.us/download/${MUJOCO_ZIP} \
&& unzip ${MUJOCO_PATH}/${MUJOCO_ZIP} -d ${MUJOCO_PATH} \
&& cp ${MUJOCO_PATH}/mjpro131/bin/libmujoco131.so ${RLLAB_PATH}/vendor/mujoco/ \
&& cp ${MUJOCO_PATH}/mjpro131/bin/libglfw.so.3 ${RLLAB_PATH}/vendor/mujoco/ \
&& rm ${MUJOCO_PATH}/${MUJOCO_ZIP}
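The zip filename in the `RUN` step above is derived from the version string by deleting the dots: stripping them from `1.3.1` yields `131`, giving `mjpro131_linux.zip`. A standalone sketch of that derivation:

```shell
# Reproduce the MUJOCO_ZIP name derivation from the Dockerfile:
MUJOCO_VERSION=1.3.1
MUJOCO_ZIP="mjpro$(echo ${MUJOCO_VERSION} | sed -e 's/\.//g')_linux.zip"
echo "$MUJOCO_ZIP"   # prints mjpro131_linux.zip
```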
123 changes: 108 additions & 15 deletions README.md
@@ -1,27 +1,120 @@
# Soft Q-Learning
Soft Q-Learning is a deep reinforcement learning framework for training expressive, energy-based policies in continuous domains. This implementation is based on [rllab](https://github.com/openai/rllab). The full algorithm is detailed in our paper, [Reinforcement Learning with Deep Energy-Based Policies](https://arxiv.org/abs/1702.08165), and videos can be found [here](https://sites.google.com/view/softqlearning/home).
# Installation
The implementation is compatible with the rllab interface (see [documentation](https://rllab.readthedocs.io/en/latest/index.html)) and depends on some of its features, which are included in this package for convenience. Additionally, some of the examples use the [MuJoCo](http://www.mujoco.org/) physics engine. For installation, you might find the [rllab documentation](http://rllab.readthedocs.io/en/latest/user/installation.html) useful. You should place the MuJoCo library files and the license key in the `/vendor/mujoco` folder.
Soft Q-learning (SQL) is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper [Reinforcement Learning with Deep Energy-Based Policies](https://arxiv.org/abs/1702.08165), presented at the International Conference on Machine Learning (ICML), 2017.

You will need TensorFlow 1.0 or later. The full list of dependencies is in [`requirements.txt`](https://github.com/haarnoja/softqlearning/blob/master/requirements.txt).
# Getting Started

# Examples
There are three example environments:
- In the `MultiGoal` environment, the task is to move a point mass into one of four equally good goal locations (see details in [our paper](https://arxiv.org/abs/1702.08165)).
- In the [`Swimmer`](https://gym.openai.com/envs/Swimmer-v1) environment, a two-dimensional, three-link snake needs to learn to swim forwards and backwards.
Soft Q-learning can be run either locally or through Docker.

To train these models run
## Prerequisites

You will need to have [Docker](https://docs.docker.com/engine/installation/) and [Docker Compose](https://docs.docker.com/compose/install/) installed unless you want to run the environment locally.

Most of the models require a [MuJoCo](https://www.roboti.us/license.html) license.

## Docker Installation

Currently, rendering of simulations is not supported in Docker due to a missing display setup; as a workaround, you can use a [local installation](#local-installation). If you want to run the MuJoCo environments without rendering, the Docker environment needs to know where to find your MuJoCo license key (`mjkey.txt`). You can either copy your key into `<PATH_TO_THIS_REPOSITORY>/.mujoco/mjkey.txt`, or specify the path to the key in your environment variables:

```
export MUJOCO_LICENSE_PATH=<path_to_mujoco>/mjkey.txt
```

Once that's done, you can run the Docker container with

```
docker-compose up
```

Docker Compose creates a Docker container named `soft-q-learning` and automatically sets the required environment variables and volumes.

You can access the container with the standard Docker [exec](https://docs.docker.com/engine/reference/commandline/exec/) command, e.g.

```
docker exec -it soft-q-learning bash
```

See the Examples section for how to train and simulate the agents.

To clean up the setup:
```
docker-compose down
```

## Local Installation

To get the environment installed correctly, you will first need to clone [rllab](https://github.com/rll/rllab) and add its path to your `PYTHONPATH` environment variable.

1. Clone rllab
```
cd <installation_path_of_your_choice>
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=$(pwd):${PYTHONPATH}
```
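The final export in step 1 prepends the rllab checkout to `PYTHONPATH`, which is what lets Python resolve `import rllab` later. A minimal illustration using a throwaway directory (`/tmp/rllab-demo` is hypothetical, standing in for your rllab checkout):

```shell
# Prepend a directory to PYTHONPATH, as step 1 does with the rllab checkout.
mkdir -p /tmp/rllab-demo
cd /tmp/rllab-demo
export PYTHONPATH=$(pwd):${PYTHONPATH}
# The first PYTHONPATH entry is now the directory we just entered:
echo "${PYTHONPATH%%:*}"   # prints /tmp/rllab-demo
```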

2. [Download](https://www.roboti.us/index.html) and copy MuJoCo files to rllab path:
If you're running on OSX, download https://www.roboti.us/download/mjpro131_osx.zip instead, and copy the `.dylib` files instead of `.so` files.
```
mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 <installation_path_of_your_choice>/rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp
```

3. Copy your MuJoCo license key (mjkey.txt) to rllab path:
```
cp <mujoco_key_folder>/mjkey.txt <installation_path_of_your_choice>/rllab/vendor/mujoco
```

4. Clone `softqlearning`
```
cd <installation_path_of_your_choice>
git clone https://github.com/haarnoja/softqlearning.git
```

5. Create and activate conda environment
```
cd softqlearning
conda env create -f environment.yml
source activate sql
```

The environment should now be ready to run. See the Examples section below for how to train and simulate the agents.

Finally, to deactivate and remove the conda environment:
```
source deactivate
conda remove --name sql --all
```

## Examples
### Training and simulating an agent
1. To train the agent
```
python ./examples/mujoco_all_sql.py --env=swimmer --log_dir="/root/sql/data/swimmer-experiment"
```

2. To simulate the agent (*NOTE*: This step currently fails with the Docker installation due to the missing display.)
```
python ./scripts/sim_policy.py /root/sql/data/swimmer-experiment/itr_<iteration>.pkl
```

`mujoco_all_sql.py` supports several different environments, and there are more example scripts available in the `/examples` folder. For more information about the agents and configurations, run the scripts with the `--help` flag. For example:
```
python ./examples/mujoco_all_sql.py --help
usage: mujoco_all_sql.py [-h]
[--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
[--exp_name EXP_NAME] [--mode MODE]
[--log_dir LOG_DIR]
```
where `<env>` is the name of an environment and `<#>` is an iteration number.

# Credits
The Soft Q-Learning package was developed by Haoran Tang and Tuomas Haarnoja, under the supervision of Pieter Abbeel and Sergey Levine, in 2017 at UC Berkeley. We thank Vitchyr Pong and Shane Gu, who helped us implement some parts of the code. The work was supported by [Berkeley Deep Drive](https://deepdrive.berkeley.edu/).
The soft Q-learning algorithm was developed by [Haoran Tang](https://math.berkeley.edu/~hrtang/) and [Tuomas Haarnoja](https://people.eecs.berkeley.edu/~haarnoja/) under the supervision of Prof. [Sergey Levine](https://people.eecs.berkeley.edu/~svlevine/) and Prof. [Pieter Abbeel](https://people.eecs.berkeley.edu/~pabbeel/) at UC Berkeley. Special thanks to [Vitchyr Pong](https://github.com/vitchyr), who wrote some parts of the code, and [Kristian Hartikainen](https://github.com/hartikainen), who helped test, document, and polish the code and streamline the installation process. The work was supported by [Berkeley Deep Drive](https://deepdrive.berkeley.edu/).

# Reference
```
@@ -31,4 +124,4 @@ The Soft Q-Learning package was developed by Haoran Tang and Tuomas Haarnoja, un
booktitle={International Conference on Machine Learning},
year={2017}
}
```
```
13 changes: 13 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,13 @@
version: "3"
services:
sql:
build:
context: .
dockerfile: Dockerfile
container_name: soft-q-learning
entrypoint: tail -f /dev/null
volumes:
- .:/root/sql
- ${MUJOCO_LICENSE_PATH:-./.mujoco/mjkey.txt}:/root/rllab/vendor/mujoco/mjkey.txt
environment:
MUJOCO_PY_MJKEY_PATH: /root/rllab/vendor/mujoco/mjkey.txt
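The volume line above uses `${MUJOCO_LICENSE_PATH:-./.mujoco/mjkey.txt}`, the shell-style default expansion that docker-compose also supports: if `MUJOCO_LICENSE_PATH` is unset or empty, the bundled `./.mujoco/mjkey.txt` path is mounted instead. A plain-shell sketch of that expansion (the `/home/user/mjkey.txt` path is hypothetical):

```shell
# With the variable unset, the default after ":-" is used:
unset MUJOCO_LICENSE_PATH
echo "${MUJOCO_LICENSE_PATH:-./.mujoco/mjkey.txt}"   # prints ./.mujoco/mjkey.txt

# With the variable set, its value wins:
export MUJOCO_LICENSE_PATH=/home/user/mjkey.txt
echo "${MUJOCO_LICENSE_PATH:-./.mujoco/mjkey.txt}"   # prints /home/user/mjkey.txt
```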
22 changes: 22 additions & 0 deletions environment.yml
@@ -0,0 +1,22 @@
name: sql
dependencies:
- cached-property
- cloudpickle
- flask
- joblib
- lasagne
- matplotlib
- mako
- numpy==1.13.1
- path.py
- plotly
- PyOpenGL
- python==3.5.2
- six
- theano==0.8.2
- pip:
- gtimer
- gym==0.8
- mujoco_py==0.5.7
- pyprind
- tensorflow