This repo contains my implementations of the exercises from the Hugging Face Deep Reinforcement Learning Course.
The original course material is implemented as notebooks for Google Colab. However, I have implemented and run the exercises locally to gain a better understanding of them.
You can find the exercise for each unit in its respective folder. Here is a brief summary of each one:
- Unit 1: Introduction. A general introduction to Reinforcement Learning, where you can learn the basic concepts. In the exercise you can train an agent to land a simple spaceship on the moon.
- Unit 2: Q-Learning model explanation. In the exercise you can train a Q-Learning agent (implemented from scratch) to play in two different environments: Frozen Lake v1 and Taxi v3.
- Unit 3: Deep Q-Learning model explanation. In the exercise you can train a Deep Q-Learning agent to play Atari games from the game frames using a Convolutional Neural Network (CNN).
- Unit 4: A review of policy-based methods. In the exercise you have to implement the Policy Gradient algorithm using PyTorch to play in two different environments.
- Unit 5: Introduction to the fundamentals of the ML-Agents toolkit. In the exercise you have to train agents for two Unity environments: SnowballTarget (created at Hugging Face) and Pyramids (created by the Unity team).
- Unit 6: This unit explains a new algorithm called Actor-Critic, which combines value-based and policy-based methods. In the exercise you can train agents for two robotics-based environments.
- Unit 7: An introduction to Multi-Agent Reinforcement Learning (MARL). In the exercise you have to train a MARL system to play soccer in a 2vs2 match. The environment is a Unity environment and the model is trained using ML-Agents.
- Unit 8 part 1: Explanation of the Proximal Policy Optimization (PPO) algorithm. In the exercise you implement a PPO agent from scratch using PyTorch to play the Lunar Lander environment.
- Unit 8 part 2: Application of the Proximal Policy Optimization (PPO) algorithm in a VizDoom environment. The exercise uses the Health Gathering Supreme environment from VizDoom, and uses the Sample Factory library (focused on efficiency) to train the models with a high-throughput pipeline.
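
To give a flavour of the from-scratch Q-Learning agent mentioned in Unit 2, here is a minimal sketch of tabular Q-Learning. It is not the code from this repo or from the course: `ChainEnv` is a hypothetical toy environment standing in for Frozen Lake or Taxi, and the function names and hyperparameters are assumptions chosen to keep the example short and dependency-free.

```python
import random


def q_learning_update(q_table, state, action, reward, next_state, done,
                      lr=0.1, gamma=0.9):
    """Apply one tabular Q-Learning update in place."""
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += lr * (td_target - q_table[state][action])


def greedy_action(q_row, rng):
    """Pick an action with the highest Q-value, breaking ties at random."""
    best = max(q_row)
    candidates = [a for a, q in enumerate(q_row) if q == best]
    return rng.choice(candidates)


class ChainEnv:
    """Hypothetical toy environment: states 0..n-1 in a row, goal at the right end."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(env, episodes=500, epsilon=0.1, max_steps=100, seed=0):
    """Train with an epsilon-greedy policy and return the learned Q-table."""
    rng = random.Random(seed)
    q_table = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = greedy_action(q_table[state], rng)  # exploit
            next_state, reward, done = env.step(action)
            q_learning_update(q_table, state, action, reward, next_state, done)
            state = next_state
            if done:
                break
    return q_table
```

Calling `train(ChainEnv())` returns the Q-table, and taking the best action in each row gives the greedy policy. Using a real Gymnasium environment instead of the toy one would mean adapting the `reset` and `step` return values, since those return extra fields.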
