Here we introduce a new sampling strategy for buffers in reinforcement learning that is based on critic estimates.
The experiments rely on MuJoCo being installed. Follow the instructions from mujoco-py
and download the archive with MuJoCo (version 2.1.0). Then extract the file into the standard path ~/.mujoco/mujoco210
.
After this step, we use conda
to create a python
environment with all the dependencies:
conda env create -f ./scripts/environment.yml
conda activate mujoco
This will install the latest versions of the packages in ./scripts/environment.yml
. If you want to use exactly the same versions we use (and you are on on a linux-64
platform) you can use
conda create --name mujoco --file ./scripts/explicit_env_specs_linux64.txt
conda activate mujoco
In order to run the codes you then need to activate the conda
environment and set some variables (to allow MuJoCo to compile some libraries).
We used gcc
v8.3.0 and added the following env
variables:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<YOUR-HOME>/.mujoco/mujoco210/bin
export CPATH=<YOUR-CONDA-PREFIX>/envs/mujoco/include
where you should change <YOUR-HOME>
to your home folder and <YOUR-CONDA-PREFIX>
to the loation of the conda
installation.
The code is tested with
- Python (3.9.12)
- Python (3.10.4)
-
-- buffer (Optional, default:uniform)
: uncertainty (MEET), prioritized or uniform -
-- buffer_size (Optional, default:1e5)
: buffer capacity in the format of 1e5 -
-- alg (Optional, default:sac)
: implemented algorithms are sac, ddpg, td3 -
-- n_epochs (Optional, default:1000)
: number of epochs. Steps per epoch is fixed to 1000. Thus, n_epochs=1000 result in 1 million steps. -
-- n_experiments (Optional, default:1)
: number of experiments for sequential execution -
-- env (Optional, default:Humanoid-v3)
: Choose a desired mujoco environment out of- Humanoid-v3
- Ant-v3,
- HalfCheetah-v3
- Walker2d-v3
- InvertedPendulum-v2
- InvertedDoublePendulum-v2
- HumanoidStandup-v2
- Reacher-v2
- Swimmer-v3
- Hopper-v3
python playground/reinforcement_learning_multi_head.py --buffer=uncertainty --buffer_size=1e5 --alg=sac --n_epochs=1000 --n_experiments=1 --env=Walker2d-v3
- upload notebooks/RL_multi_head.ipynb in google colab
- compress the the directory with the code to a zip file
- follow the steps in the explanation video