Major code refactoring #6

Merged · 7 commits · Jan 29, 2018
6 changes: 2 additions & 4 deletions .gitignore
@@ -1,5 +1,3 @@
*__pycache__*
.idea
rllab/config_personal.py
data
vendor/mujoco
*.pyc
.idea
81 changes: 81 additions & 0 deletions Dockerfile
@@ -0,0 +1,81 @@
FROM ubuntu:16.04


# ========== Anaconda ==========
# https://github.com/ContinuumIO/docker-images/blob/master/anaconda/Dockerfile
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8

RUN apt-get update --fix-missing && apt-get install -y wget bzip2 ca-certificates \
libglib2.0-0 libxext6 libsm6 libxrender1 \
git mercurial subversion

RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh && \
wget --quiet https://repo.continuum.io/archive/Anaconda2-5.0.1-Linux-x86_64.sh -O ~/anaconda.sh && \
/bin/bash ~/anaconda.sh -b -p /opt/conda && \
rm ~/anaconda.sh

RUN apt-get install -y curl grep sed dpkg && \
TINI_VERSION=`curl https://github.com/krallin/tini/releases/latest | grep -o "/v.*\"" | sed 's:^..\(.*\).$:\1:'` && \
curl -L "https://github.com/krallin/tini/releases/download/v${TINI_VERSION}/tini_${TINI_VERSION}.deb" > tini.deb && \
dpkg -i tini.deb && \
rm tini.deb && \
apt-get clean
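The `grep`/`sed` pipeline in the `RUN` step above is terse; here is a sketch of what it does, run outside the Dockerfile on a hard-coded sample fragment (the real build scrapes this fragment from the GitHub releases redirect, so the value `0.16.1` below is just an assumption for illustration). `grep -o "/v.*\""` extracts something like `/v0.16.1"`, and the `sed` expression strips the leading two characters (`/v`) and the trailing quote, leaving the bare version number.

```shell
# Sample of what grep -o "/v.*\"" would extract from the redirect page:
fragment='/v0.16.1"'
# Drop the first two chars ("/v") and the last char (the quote):
TINI_VERSION=$(echo "$fragment" | sed 's:^..\(.*\).$:\1:')
echo "$TINI_VERSION"   # prints 0.16.1
```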

ENV PATH /opt/conda/bin:$PATH


# ========== Special Deps ==========
RUN apt-get -y install git make cmake unzip
RUN pip install awscli
# ALE requires zlib
RUN apt-get -y install zlib1g-dev
# MuJoCo requires graphics libraries for GLFW
RUN apt-get -y build-dep glfw
RUN apt-get -y install libxrandr2 libxinerama-dev libxi6 libxcursor-dev
RUN apt-get install -y vim ack-grep
RUN pip install --upgrade pip
# a plain `pip install pygame` would fail without these build deps
RUN apt-get build-dep -y python-pygame
RUN pip install Pillow


# ========== Add codebase stub ==========
WORKDIR /root/sql

ADD environment.yml /root/sql/environment.yml
RUN conda env create -f /root/sql/environment.yml \
&& conda env update

ENV PYTHONPATH /root/sql:$PYTHONPATH
ENV PATH /opt/conda/envs/sql/bin:$PATH
RUN echo "source activate sql" >> /root/.bashrc
ENV BASH_ENV /root/.bashrc


# ========= rllab ===============
# We need to clone rllab repo in order to use the
# `rllab.sandbox.rocky.tf` functions.

ENV RLLAB_PATH=/root/rllab \
RLLAB_VERSION=b3a28992eca103cab3cb58363dd7a4bb07f250a0

RUN git clone https://github.com/rll/rllab.git ${RLLAB_PATH} \
&& cd ${RLLAB_PATH} \
&& git checkout ${RLLAB_VERSION} \
&& mkdir ${RLLAB_PATH}/vendor/mujoco \
&& python -m rllab.config

ENV PYTHONPATH ${RLLAB_PATH}:${PYTHONPATH}


# ========= MuJoCo ===============
ENV MUJOCO_VERSION=1.3.1 \
MUJOCO_PATH=/root/.mujoco

RUN MUJOCO_ZIP="mjpro$(echo ${MUJOCO_VERSION} | sed -e "s/\.//g")_linux.zip" \
&& mkdir -p ${MUJOCO_PATH} \
&& wget -P ${MUJOCO_PATH} https://www.roboti.us/download/${MUJOCO_ZIP} \
&& unzip ${MUJOCO_PATH}/${MUJOCO_ZIP} -d ${MUJOCO_PATH} \
&& cp ${MUJOCO_PATH}/mjpro131/bin/libmujoco131.so ${RLLAB_PATH}/vendor/mujoco/ \
&& cp ${MUJOCO_PATH}/mjpro131/bin/libglfw.so.3 ${RLLAB_PATH}/vendor/mujoco/ \
&& rm ${MUJOCO_PATH}/${MUJOCO_ZIP}
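The zip filename in the `RUN` step above is derived from the version string by deleting the dots: stripping them from `1.3.1` yields `131`, giving `mjpro131_linux.zip`. A standalone sketch of that derivation:

```shell
# Reproduce the MUJOCO_ZIP name derivation from the Dockerfile:
MUJOCO_VERSION=1.3.1
MUJOCO_ZIP="mjpro$(echo ${MUJOCO_VERSION} | sed -e 's/\.//g')_linux.zip"
echo "$MUJOCO_ZIP"   # prints mjpro131_linux.zip
```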
123 changes: 108 additions & 15 deletions README.md
@@ -1,27 +1,120 @@
# Soft Q-Learning
Soft Q-Learning is a deep reinforcement learning framework for training expressive, energy-based policies in continuous domains. This implementation is based on [rllab](https://github.com/openai/rllab). The full algorithm is detailed in our paper, [Reinforcement Learning with Deep Energy-Based Policies](https://arxiv.org/abs/1702.08165), and videos can be found [here](https://sites.google.com/view/softqlearning/home).
# Installation
The implementation is compatible with the rllab interface (see [documentation](https://rllab.readthedocs.io/en/latest/index.html)) and depends on some of its features, which are included in this package for convenience. Additionally, some of the examples use the [MuJoCo](http://www.mujoco.org/) physics engine. For installation, you might find the [rllab documentation](http://rllab.readthedocs.io/en/latest/user/installation.html) useful. You should place the MuJoCo library files and the license key in the `/vendor/mujoco` folder.
Soft Q-learning (SQL) is a deep reinforcement learning framework for training maximum entropy policies in continuous domains. The algorithm is based on the paper [Reinforcement Learning with Deep Energy-Based Policies](https://arxiv.org/abs/1702.08165), presented at the International Conference on Machine Learning (ICML), 2017.

You will need TensorFlow 1.0 or later. The full list of dependencies is in [`requirements.txt`](https://github.com/haarnoja/softqlearning/blob/master/requirements.txt).
# Getting Started

# Examples
There are three example environments:
- In the `MultiGoal` environment, the task is to move a point mass into one of four equally good goal locations (see details in [our paper](https://arxiv.org/abs/1702.08165)).
- In the [`Swimmer`](https://gym.openai.com/envs/Swimmer-v1) environment, a two-dimensional, three-link snake needs to learn to swim forwards and backwards.
Soft Q-learning can be run either locally or through Docker.

To train these models run
## Prerequisites

You will need to have [Docker](https://docs.docker.com/engine/installation/) and [Docker Compose](https://docs.docker.com/compose/install/) installed unless you want to run the environment locally.

Most of the models require a [MuJoCo](https://www.roboti.us/license.html) license.

## Docker Installation

Currently, rendering of simulations is not supported in Docker due to a missing display setup; as a workaround, you can use a [local installation](#local-installation). If you want to run the MuJoCo environments without rendering, the Docker environment needs to know where to find your MuJoCo license key (`mjkey.txt`). You can either copy your key into `<PATH_TO_THIS_REPOSITORY>/.mujoco/mjkey.txt`, or specify the path to the key in your environment variables:

```
export MUJOCO_LICENSE_PATH=<path_to_mujoco>/mjkey.txt
```

Once that's done, you can run the Docker container with

```
docker-compose up
```

Docker Compose creates a Docker container named `soft-q-learning` and automatically sets the required environment variables and volumes.

You can access the container with the standard Docker [exec](https://docs.docker.com/engine/reference/commandline/exec/) command, e.g.

```
docker exec -it soft-q-learning bash
```

See the Examples section for how to train and simulate the agents.

To clean up the setup:
```
docker-compose down
```

## Local Installation

To get the environment installed correctly, you will first need to clone [rllab](https://github.com/rll/rllab) and add its path to your `PYTHONPATH` environment variable.

1. Clone rllab
```
cd <installation_path_of_your_choice>
git clone https://github.com/rll/rllab.git
cd rllab
git checkout b3a28992eca103cab3cb58363dd7a4bb07f250a0
export PYTHONPATH=$(pwd):${PYTHONPATH}
```
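The final export in step 1 prepends the rllab checkout to `PYTHONPATH`, which is what lets Python resolve `import rllab` later. A minimal illustration using a throwaway directory (`/tmp/rllab-demo` is hypothetical, standing in for your rllab checkout):

```shell
# Prepend a directory to PYTHONPATH, as step 1 does with the rllab checkout.
mkdir -p /tmp/rllab-demo
cd /tmp/rllab-demo
export PYTHONPATH=$(pwd):${PYTHONPATH}
# The first PYTHONPATH entry is now the directory we just entered:
echo "${PYTHONPATH%%:*}"   # prints /tmp/rllab-demo
```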

2. [Download](https://www.roboti.us/index.html) and copy MuJoCo files to rllab path:
If you're running on OSX, download https://www.roboti.us/download/mjpro131_osx.zip instead, and copy the `.dylib` files instead of `.so` files.
```
mkdir -p /tmp/mujoco_tmp && cd /tmp/mujoco_tmp
wget -P . https://www.roboti.us/download/mjpro131_linux.zip
unzip mjpro131_linux.zip
mkdir <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libmujoco131.so <installation_path_of_your_choice>/rllab/vendor/mujoco
cp ./mjpro131/bin/libglfw.so.3 <installation_path_of_your_choice>/rllab/vendor/mujoco
cd ..
rm -rf /tmp/mujoco_tmp
```

3. Copy your MuJoCo license key (mjkey.txt) to rllab path:
```
cp <mujoco_key_folder>/mjkey.txt <installation_path_of_your_choice>/rllab/vendor/mujoco
```

4. Clone `softqlearning`
```
cd <installation_path_of_your_choice>
git clone https://github.com/haarnoja/softqlearning.git
```

5. Create and activate conda environment
```
cd softqlearning
conda env create -f environment.yml
source activate sql
```

The environment should now be ready to run. See the Examples section below for how to train and simulate the agents.

Finally, to deactivate and remove the conda environment:
```
source deactivate
conda remove --name sql --all
```

## Examples
### Training and simulating an agent
1. To train the agent
```
python ./examples/mujoco_all_sql.py --env=swimmer --log_dir="/root/sql/data/swimmer-experiment"
```

2. To simulate the agent (*NOTE*: This step currently fails with the Docker installation due to the missing display.)
```
python ./scripts/sim_policy.py /root/sql/data/swimmer-experiment/itr_<iteration>.pkl
```

`mujoco_all_sql.py` supports several different environments, and there are more example scripts available in the `/examples` folder. For more information about the agents and configurations, run the scripts with the `--help` flag. For example:
```
python ./examples/mujoco_all_sql.py --help
usage: mujoco_all_sql.py [-h]
[--env {ant,walker,swimmer,half-cheetah,humanoid,hopper}]
[--exp_name EXP_NAME] [--mode MODE]
[--log_dir LOG_DIR]
```
where `<env>` is the name of an environment and `<#>` is an iteration number.

# Credits
The Soft Q-Learning package was developed by Haoran Tang and Tuomas Haarnoja, under the supervision of Pieter Abbeel and Sergey Levine, in 2017 at UC Berkeley. We thank Vitchyr Pong and Shane Gu, who helped us implement some parts of the code. The work was supported by [Berkeley Deep Drive](https://deepdrive.berkeley.edu/).
The soft Q-learning algorithm was developed by [Haoran Tang](https://math.berkeley.edu/~hrtang/) and [Tuomas Haarnoja](https://people.eecs.berkeley.edu/~haarnoja/) under the supervision of Prof. [Sergey Levine](https://people.eecs.berkeley.edu/~svlevine/) and Prof. [Pieter Abbeel](https://people.eecs.berkeley.edu/~pabbeel/) at UC Berkeley. Special thanks to [Vitchyr Pong](https://github.com/vitchyr), who wrote some parts of the code, and [Kristian Hartikainen](https://github.com/hartikainen), who helped test, document, and polish the code and streamline the installation process. The work was supported by [Berkeley Deep Drive](https://deepdrive.berkeley.edu/).

# Reference
```
@@ -31,4 +124,4 @@ The Soft Q-Learning package was developed by Haoran Tang and Tuomas Haarnoja, un
booktitle={International Conference on Machine Learning},
year={2017}
}
```
```
13 changes: 13 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,13 @@
version: "3"
services:
sql:
build:
context: .
dockerfile: Dockerfile
container_name: soft-q-learning
entrypoint: tail -f /dev/null
volumes:
- .:/root/sql
- ${MUJOCO_LICENSE_PATH:-./.mujoco/mjkey.txt}:/root/rllab/vendor/mujoco/mjkey.txt
environment:
MUJOCO_PY_MJKEY_PATH: /root/rllab/vendor/mujoco/mjkey.txt
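The volume line above uses `${MUJOCO_LICENSE_PATH:-./.mujoco/mjkey.txt}`, the shell-style default expansion that docker-compose also supports: if `MUJOCO_LICENSE_PATH` is unset or empty, the bundled `./.mujoco/mjkey.txt` path is mounted instead. A plain-shell sketch of that expansion (the `/home/user/mjkey.txt` path is hypothetical):

```shell
# With the variable unset, the default after ":-" is used:
unset MUJOCO_LICENSE_PATH
echo "${MUJOCO_LICENSE_PATH:-./.mujoco/mjkey.txt}"   # prints ./.mujoco/mjkey.txt

# With the variable set, its value wins:
export MUJOCO_LICENSE_PATH=/home/user/mjkey.txt
echo "${MUJOCO_LICENSE_PATH:-./.mujoco/mjkey.txt}"   # prints /home/user/mjkey.txt
```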
22 changes: 22 additions & 0 deletions environment.yml
@@ -0,0 +1,22 @@
name: sql
dependencies:
- cached-property
- cloudpickle
- flask
- joblib
- lasagne
- matplotlib
- mako
- numpy==1.13.1
- path.py
- plotly
- PyOpenGL
- python==3.5.2
- six
- theano==0.8.2
- pip:
- gtimer
- gym==0.8
- mujoco_py==0.5.7
- pyprind
- tensorflow