"Sim-to-Real transfer of Reinforcement Learning policies in robotics" exam project.
Note! This template has changed on Dec 21st, 2022. Make sure to clone the latest version.
You can play around with the code on your local machine, and use Google Colab for training on GPUs. For simple multi-layer perceptrons (MLPs), training locally may also be feasible.
Before starting to implement your own code, make sure to:
- read and study the material provided
- read the documentation of the main packages you will be using (mujoco-py, Gym, stable-baselines3)
- play around with the code in the template to familiarize yourself with all the tools, especially the `test_random_policy.py` script.
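To make the interaction loop concrete, here is a minimal sketch of what a random-policy rollout looks like. The `DummyEnv` class below is a purely illustrative stand-in (not part of the template): the real `test_random_policy.py` drives the MuJoCo Hopper environment through the same Gym-style `reset()`/`step()` interface.

```python
import random

# Illustrative stand-in for a Gym-style environment (NOT the real Hopper):
# it only mimics the reset()/step() interface the template's script uses.
class DummyEnv:
    def __init__(self, horizon=50):
        self.horizon = horizon  # max steps per episode
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0]  # dummy observation

    def step(self, action):
        self.t += 1
        obs = [random.random(), random.random()]
        reward = 1.0                  # constant reward, for illustration
        done = self.t >= self.horizon
        return obs, reward, done, {}


def run_random_policy(env, episodes=3):
    """Roll out a uniform random policy and collect episode returns."""
    returns = []
    for _ in range(episodes):
        env.reset()
        done, total = False, 0.0
        while not done:
            action = random.uniform(-1.0, 1.0)  # sample a random action
            _, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return returns


print(run_random_policy(DummyEnv()))  # [50.0, 50.0, 50.0]
```

With the real environment, the same loop lets you watch the Hopper move under random actions before any training happens.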
You can work on your local machine directly, at least for the first stages of the project. By doing so, you will also be able to render the MuJoCo environments and visualize what's happening. This code has been tested on Linux with Python 3.7 (Windows support is limited, but it may still work).
Dependencies
- Install MuJoCo and the Python Mujoco interface following the instructions here: https://github.com/openai/mujoco-py
- Run `pip install -r requirements.txt` to further install `Gym` and `Stable-baselines3`.
Check your installation by launching `python test_random_policy.py`.
As the latest version of `mujoco-py` is not compatible with Windows, you may:
- Try downloading a previous version (not recommended)
- Try installing WSL2 (requires fewer resources) or a full Virtual Machine to run Linux on Windows. Then you can follow the instructions above for Linux.
- Stick to the Google Colab template (see below), which runs in the browser regardless of the operating system. This option, however, will not allow you to render the environment in an interactive window for debugging purposes.
You can also run the code on Google Colab
- Download all files contained in the `colab_template` folder in this repo.
- Load the `test_random_policy.ipynb` file on https://colab.research.google.com/ and follow the instructions in it.
NOTE 1: rendering is currently not officially supported on Colab, making it hard to see the simulator in action. We recommend that each group manages to play around with the visual interface of the simulator at least once, to best understand what is going on with the Hopper environment.
NOTE 2: you need to stay connected to the Google Colab interface at all times for your python scripts to keep training.
The core of our project is in the `src` folder, which contains:
- `utils`: some utilities
- `networks`: two models used in step 4.2 as feature extractors
- `src`: the implementation of the project steps
Each step of the project can be run from the root directory with the command `python main.py --step STEP [--logdir] [-f] [-v] [--test]`.
The optional parameters are:
- `--logdir BASE_PREFIX`: the directory for log files
- `-f`, `--force`: force the execution and overwrite the previous logs
- `-v V`: which version of the current step to run
- `--test`: skip the training phase and load the corresponding saved model
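The command-line interface above could be reconstructed with `argparse` roughly as follows. This is a hypothetical sketch based only on the flags listed here; the actual parser in `main.py`, including defaults and help strings, may differ.

```python
import argparse

# Hypothetical reconstruction of the CLI described above.
# Flag names follow the README; defaults are assumptions.
def build_parser():
    parser = argparse.ArgumentParser(description="Run one step of the project")
    parser.add_argument("--step", required=True,
                        help="which project step to run")
    parser.add_argument("--logdir", default="logs", metavar="BASE_PREFIX",
                        help="directory for log files")
    parser.add_argument("-f", "--force", action="store_true",
                        help="force execution and overwrite previous logs")
    parser.add_argument("-v", type=int, default=1, dest="version",
                        help="which version of the current step to run")
    parser.add_argument("--test", action="store_true",
                        help="skip training and load the saved model")
    return parser


args = build_parser().parse_args(["--step", "2", "-f", "-v", "3"])
print(args.step, args.force, args.version, args.test)  # 2 True 3 False
```

For example, `python main.py --step 2 -f -v 3` would rerun version 3 of step 2, overwriting any previous logs.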