Learning Invariant Representations for Reinforcement Learning without Reconstruction


We assume you have access to a GPU that can run CUDA 9.2. The simplest way to install all required dependencies is then to create an Anaconda environment by running:

conda env create -f conda_env.yml

After the installation ends you can activate your environment with:

source activate dbc


To train a DBC agent on the cheetah run task from image-based observations, run:

python \
    --domain_name cheetah \
    --task_name run \
    --encoder_type pixel \
    --decoder_type identity \
    --action_repeat 4 \
    --save_video \
    --save_tb \
    --work_dir ./log \
    --seed 1

This will produce a 'log' folder, where all outputs are stored, including train/eval logs, tensorboard blobs, and evaluation episode videos. You can attach tensorboard to monitor training by running:

tensorboard --logdir log

and opening up tensorboard in your browser.

The console output is also available in the form:

| train | E: 1 | S: 1000 | D: 0.8 s | R: 0.0000 | BR: 0.0000 | ALOSS: 0.0000 | CLOSS: 0.0000 | RLOSS: 0.0000

A training entry decodes as:

train - training episode
E - total number of episodes 
S - total number of environment steps
D - duration in seconds to train 1 episode
R - episode reward
BR - average reward of sampled batch
ALOSS - average loss of actor
CLOSS - average loss of critic
RLOSS - average reconstruction loss (only if it is trained from pixels and decoder)

while an evaluation entry looks like:

| eval | S: 0 | ER: 21.1676

which reports the expected reward ER obtained by evaluating the current policy after S steps. Note that ER is the average evaluation performance over num_eval_episodes episodes (usually 10).
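If you want to post-process the console output, for example to plot R or ER against S, the entries above can be turned into dictionaries. The following is a minimal sketch, not part of this repository: the function name parse_log_line is hypothetical, and it assumes every entry follows the `| mode | KEY: value | ...` layout shown above, with the duration D carrying a trailing "s".

```python
def parse_log_line(line):
    """Parse one console entry such as '| eval | S: 0 | ER: 21.1676'.

    Returns a dict with the entry type under 'mode' and every numeric
    field as a float, or None if the line is not a log entry.
    (Hypothetical helper; assumes the layout shown in this README.)
    """
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    fields = [f for f in fields if f]
    if not fields or fields[0] not in ("train", "eval"):
        return None

    entry = {"mode": fields[0]}
    for field in fields[1:]:
        key, sep, value = field.partition(":")
        if not sep:
            continue  # skip anything that is not 'KEY: value'
        value = value.strip()
        if value.endswith(" s"):
            value = value[:-2]  # drop the seconds unit on D
        entry[key.strip()] = float(value)
    return entry


train = parse_log_line(
    "| train | E: 1 | S: 1000 | D: 0.8 s | R: 0.0000 | BR: 0.0000 "
    "| ALOSS: 0.0000 | CLOSS: 0.0000 | RLOSS: 0.0000"
)
evaluation = parse_log_line("| eval | S: 0 | ER: 21.1676")
print(train["S"], evaluation["ER"])
```

All numeric fields are returned as floats for simplicity; cast E and S to int if you need integer counters.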

Running the natural video setting

You can download the Kinetics 400 dataset and grab the driving_car label from the train dataset to replicate our setup. Some instructions for downloading the dataset can be found here:


Download CARLA from, e.g.:


Add to your Python path:

export PYTHONPATH=$PYTHONPATH:/home/rmcallister/code/bisim_metric/CARLA_0.9.6/PythonAPI
export PYTHONPATH=$PYTHONPATH:/home/rmcallister/code/bisim_metric/CARLA_0.9.6/PythonAPI/carla
export PYTHONPATH=$PYTHONPATH:/home/rmcallister/code/bisim_metric/CARLA_0.9.6/PythonAPI/carla/dist/carla-0.9.8-py3.5-linux-x86_64.egg

and merge the directories.
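If you would rather not rely on the shell environment, the same three entries can be added from inside a Python script before importing carla. This is a minimal sketch under assumptions: the helper names and the example root directory are hypothetical, and the egg file name must match the one actually present in your PythonAPI/carla/dist folder.

```python
import os
import sys


def carla_python_paths(carla_root, egg_name):
    """Build the three PythonAPI entries listed above for a given CARLA root."""
    api = os.path.join(carla_root, "PythonAPI")
    return [
        api,
        os.path.join(api, "carla"),
        os.path.join(api, "carla", "dist", egg_name),
    ]


def add_carla_to_path(carla_root, egg_name):
    """Append the CARLA entries to sys.path, skipping any already present."""
    for path in carla_python_paths(carla_root, egg_name):
        if path not in sys.path:
            sys.path.append(path)


# Hypothetical location; replace with your own CARLA directory and egg file.
example_root = os.path.join(os.sep, "path", "to", "CARLA_0.9.6")
example_egg = "carla-0.9.8-py3.5-linux-x86_64.egg"
add_carla_to_path(example_root, example_egg)
```

Calling add_carla_to_path more than once is harmless, since existing entries are not duplicated.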

Then pull the altered carla branch files:

git fetch
git checkout carla


pip install pygame
pip install networkx

Terminal 1:

cd CARLA_0.9.6
bash -fps 20

Terminal 2:

cd CARLA_0.9.6
# can run expert autopilot (uses privileged game-state information):
python PythonAPI/carla/agents/navigation/
# or can run bisim:
./ --agent bisim --transition_model_type probabilistic --domain_name carla


This project is CC-BY-NC 4.0 licensed, as found in the LICENSE file.

