Here we introduce a new sampling strategy for replay buffers in reinforcement learning, based on critic estimates.

Environment Setup for Conda

The experiments rely on MuJoCo being installed. Follow the instructions from mujoco-py and download the archive with MuJoCo (version 2.1.0). Then extract the file into the standard path ~/.mujoco/mujoco210.

After this step, use conda to create a Python environment with all the dependencies:

conda env create -f ./scripts/environment.yml
conda activate mujoco

This will install the latest versions of the packages in ./scripts/environment.yml. If you want to use exactly the same versions we use (and you are on a linux-64 platform), you can use:

conda create --name mujoco --file ./scripts/explicit_env_specs_linux64.txt
conda activate mujoco

To run the code, you then need to activate the conda environment and set some environment variables (to allow MuJoCo to compile some libraries). We used gcc v8.3.0 and added the following variables:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<YOUR-HOME>/.mujoco/mujoco210/bin
export CPATH=<YOUR-CONDA-PREFIX>/envs/mujoco/include

where you should change <YOUR-HOME> to your home folder and <YOUR-CONDA-PREFIX> to the location of the conda installation.
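Before launching an experiment, it can help to confirm that the paths and variables described above are actually in place. The following is a minimal sketch, not part of the repository: the function name check_mujoco_env is hypothetical, and it only checks what this README states (the ~/.mujoco/mujoco210 directory, its bin folder on LD_LIBRARY_PATH, and a non-empty CPATH).

```python
import os
from pathlib import Path


def check_mujoco_env(environ, home):
    """Return a list of problems found; an empty list means the setup looks fine.

    Hypothetical helper: it mirrors the setup steps in this README only.
    """
    problems = []
    mujoco_dir = Path(home) / ".mujoco" / "mujoco210"
    bin_dir = mujoco_dir / "bin"

    # The MuJoCo 2.1.0 archive should be extracted here.
    if not mujoco_dir.is_dir():
        problems.append(f"MuJoCo directory not found: {mujoco_dir}")

    # LD_LIBRARY_PATH is a list of directories joined by os.pathsep (":" on Linux).
    ld_entries = [Path(p) for p in environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    if bin_dir not in ld_entries:
        problems.append(f"LD_LIBRARY_PATH does not contain {bin_dir}")

    # CPATH should point at the include folder of the conda environment.
    if not environ.get("CPATH"):
        problems.append("CPATH is not set")

    return problems


if __name__ == "__main__":
    issues = check_mujoco_env(os.environ, Path.home())
    print("\n".join(issues) if issues else "MuJoCo environment variables look fine")
```

Run it inside the activated conda environment; any reported line points at the export or extraction step that still needs attention.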

The code is tested with

  • Python (3.9.12)
  • Python (3.10.4)

Code execution from terminal


  • --buffer (Optional, default: uniform) : uncertainty (MEET), prioritized, or uniform

  • --buffer_size (Optional, default: 1e5) : buffer capacity, given in scientific notation (e.g. 1e5)

  • --alg (Optional, default: sac) : implemented algorithms are sac, ddpg, and td3

  • --n_epochs (Optional, default: 1000) : number of epochs. Each epoch has a fixed 1000 steps, so n_epochs=1000 results in 1 million steps.

  • --n_experiments (Optional, default: 1) : number of experiments for sequential execution

  • --env (Optional, default: Humanoid-v3) : the MuJoCo environment to use, one of

    • Humanoid-v3
    • Ant-v3
    • HalfCheetah-v3
    • Walker2d-v3
    • InvertedPendulum-v2
    • InvertedDoublePendulum-v2
    • HumanoidStandup-v2
    • Reacher-v2
    • Swimmer-v3
    • Hopper-v3
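The options above can be summarized as an argparse definition. This is only a sketch reconstructed from the list in this README; the repository's actual parser may be written differently, and the names build_parser, sci_int, ENVS, and STEPS_PER_EPOCH are assumptions made for illustration.

```python
import argparse

# Environments listed in this README.
ENVS = [
    "Humanoid-v3", "Ant-v3", "HalfCheetah-v3", "Walker2d-v3",
    "InvertedPendulum-v2", "InvertedDoublePendulum-v2",
    "HumanoidStandup-v2", "Reacher-v2", "Swimmer-v3", "Hopper-v3",
]

# The README states that the number of steps per epoch is fixed to 1000.
STEPS_PER_EPOCH = 1000


def sci_int(text):
    """Parse an integer given in scientific notation, e.g. '1e5' -> 100000."""
    return int(float(text))


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Run an experiment")
    parser.add_argument("--buffer", default="uniform",
                        choices=["uncertainty", "prioritized", "uniform"])
    parser.add_argument("--buffer_size", type=sci_int, default=sci_int("1e5"))
    parser.add_argument("--alg", default="sac", choices=["sac", "ddpg", "td3"])
    parser.add_argument("--n_epochs", type=int, default=1000)
    parser.add_argument("--n_experiments", type=int, default=1)
    parser.add_argument("--env", default="Humanoid-v3", choices=ENVS)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    total_steps = args.n_epochs * STEPS_PER_EPOCH
    print(f"{args.alg} on {args.env} with {args.buffer} buffer: {total_steps} steps")
```

Restricting --buffer, --alg, and --env with choices makes a mistyped value fail immediately instead of partway through a long run.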

Example command

python playground/ --buffer=uncertainty --buffer_size=1e5 --alg=sac --n_epochs=1000 --n_experiments=1 --env=Walker2d-v3

Google Colab Execution

These steps require no further installation and work out of the box:

  1. Upload notebooks/RL_multi_head.ipynb to Google Colab.
  2. Compress the directory with the code into a zip file.
  3. Follow the steps in the explanation video.








