Official code repo for the paper "Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning". The paper introduces Contrastive Successor Features (CSF), a new method that achieves performance comparable to current SOTA unsupervised skill discovery methods while relying, at its core, on mutual information maximization.
After cloning this repo, please run the following commands at the root of the project:
# Setting up the conda environment
conda create --name csf python=3.9
conda activate csf
# Installing dependencies
pip install -r requirements.txt --no-deps
pip install -e .
pip install -e garaged
pip install --upgrade joblib
pip install patchelf
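As a quick sanity check, you can confirm that the key packages import correctly. This is only a sketch: the package names below (torch, numpy, garage) are assumptions based on the install commands above, not guarantees of this repo.
# sanity_check_install.py -- a sketch; package names are assumptions, adjust as needed
import importlib

for name in ("torch", "numpy", "garage"):
    try:
        module = importlib.import_module(name)
        print(f"{name}: OK ({getattr(module, '__version__', 'unknown version')})")
    except ImportError as err:
        print(f"{name}: MISSING ({err})")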
Note
Pip might complain about incompatible versions -- this is expected and can be safely ignored.
Next, we need to do some MuJoCo setup.
conda activate csf
conda install -c conda-forge glew
conda install -c conda-forge mesalib
conda install -c anaconda mesa-libgl-cos6-x86_64
conda install -c menpo glfw3
We also need to tell MuJoCo which rendering backend to use. This is done by setting the appropriate environment variables.
conda env config vars set MUJOCO_GL=egl PYOPENGL_PLATFORM=egl
conda deactivate && conda activate csf
If you don't already have MuJoCo, you will need to install it (mujoco210) into a folder called .mujoco. More detailed instructions are linked here.
Finally, you may want to add the following environment variables to your .bashrc file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
export CPATH=$CONDA_PREFIX/include
Remember to source your .bashrc file after changing it: source ~/.bashrc.
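To check that the MuJoCo setup worked, the following sketch can help. It assumes the repo uses the mujoco_py bindings with MuJoCo 2.1.0 (suggested by the mujoco210 path above); failures on import usually point at LD_LIBRARY_PATH, CPATH, or a missing ~/.mujoco/mujoco210 install.
# sanity_check_mujoco.py -- a sketch, assuming mujoco_py is the binding in use
import os

# These should print "egl" if the conda env vars were set and the env reactivated.
print("MUJOCO_GL         =", os.environ.get("MUJOCO_GL"))
print("PYOPENGL_PLATFORM =", os.environ.get("PYOPENGL_PLATFORM"))

# Importing mujoco_py triggers its one-time Cython build against the MuJoCo libs.
import mujoco_py
print("mujoco_py loaded from", mujoco_py.__file__)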
For unsupervised pretraining (state coverage), you can use the following commands. Make sure to run these from the root of the project.
# Ant
sh scripts/pretrain/csf_ant.sh
# HalfCheetah
sh scripts/pretrain/csf_halfcheetah.sh
# Humanoid
sh scripts/pretrain/csf_humanoid.sh
# Quadruped
sh scripts/pretrain/csf_quadruped.sh
# Kitchen
sh scripts/pretrain/csf_kitchen.sh
# Robobin
sh scripts/pretrain/csf_robobin.sh
Note
All experiments were run on a single GPU, usually with 8-10 workers (see the --n_parallel flag).
In addition, all state-based experiments (Ant and HalfCheetah) needed 32GB of CPU memory (RAM),
while all image-based experiments (Humanoid, Quadruped, Kitchen, Robobin) needed 40GB.
Once experiments are running, they will be logged under the exp folder.
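If you want to inspect training progress programmatically, something like the sketch below works. It assumes runs write a progress.csv somewhere under exp/ (the directory layout and column names are assumptions; check your own run's output).
# inspect_logs.py -- a sketch; assumes CSV progress files under exp/
from pathlib import Path
import csv

for csv_path in sorted(Path("exp").rglob("progress.csv")):
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if rows:
        print(f"{csv_path}: {len(rows)} logged iterations, first columns: {list(rows[0])[:5]}")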
Our key theoretical result is that a prominent skill learning algorithm (METRA) can be reinterpreted as performing mutual information maximization. This opens the door to a new skill learning method (CSF) that is simpler, shares its objective with a long line of prior work, and achieves results on par with the current SOTA.
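For intuition, a standard variational (Donsker-Varadhan) lower bound on the mutual information between states s and skills z, with a learned critic f, looks like the following; this is a generic sketch, and the specific bound and critic parameterization that CSF optimizes are derived in the paper.

I(s; z) \;\ge\; \mathbb{E}_{p(s, z)}\big[ f(s, z) \big] \;-\; \log \mathbb{E}_{p(s)\,p(z)}\big[ e^{f(s, z)} \big]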
The videos below show that our method (CSF) indeed qualitatively learns skills similar to those learned by the prior method (METRA).
csf_robobin_3000.mp4
metra_robobin_3000.mp4
csf_quadruped_3000.mp4
metra_quadruped_3000.mp4
csf_humanoid_3000.mp4
metra_humanoid_3000.mp4
csf_ant_40k.mp4
metra_ant_40k.mp4
csf_cheetah_28k.mp4
metra_cheetah_28k.mp4
csf_kitchen_3000.mp4
metra_kitchen_3000.mp4
This code repo was built on the original METRA repo.