<a href="https://colab.research.google.com/github/r-scoville/deep-reinforcement-learning-huggy/blob/main/huggy_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Reinforcement Learning Agent (Huggy)  
## Proximal Policy Optimization (PPO)

---

November 2025  
R. Scoville

### About  

This notebook creates, trains, and evaluates a deep reinforcement learning agent.  

- Environment: Huggy the Dog (created by [Thomas Simonini](https://huggingface.co/spaces/ThomasSimonini/) based on [Puppo The Corgi](https://blog.unity.com/technology/puppo-the-corgi-cuteness-overload-with-the-unity-ml-agents-toolkit))
- RL library: [ML-Agents](https://github.com/Unity-Technologies/ml-agents)

### References
- [Hugging Face: Deep Reinforcement Learning Course, Bonus Unit 1](https://huggingface.co/learn/deep-rl-course/unitbonus1/introduction)
- [ML-Agents Training Configuration](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md)

---
## 1. Set the GPU
This will accelerate the agent's training.  
`Runtime > Change runtime type > Hardware accelerator > T4 GPU`

---

## 2. Clone the ML-Agents repository

In [None]:
%%capture
# Clone the repository (can take about 3 minutes)
!git clone --depth 1 https://github.com/Unity-Technologies/ml-agents

---

## 3. Set up the virtual environment

For ML-Agents to run in Colab, the runtime's Python version must satisfy the library's Python requirement.

The supported Python versions are listed under the `python_requires` parameter in the `setup.py` files. These files install the ML-Agents packages and can be found in the following locations:

- [`/content/ml-agents/ml-agents/setup.py`](https://github.com/Unity-Technologies/ml-agents/blob/develop/ml-agents/setup.py)
- [`/content/ml-agents/ml-agents-envs/setup.py`](https://github.com/Unity-Technologies/ml-agents/blob/develop/ml-agents-envs/setup.py)

To resolve incompatibility errors, create a virtual environment with a Python version compatible with the ML-Agents library.

ML-Agents library's Python requirement at the time of this script's creation:
`>= 3.10.1, <= 3.10.12`
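
The range above can also be checked programmatically. The following is a minimal sketch, assuming the bounds quoted above are still current; the helper names `parse_version` and `is_supported` are hypothetical and not part of ML-Agents.

```python
import sys


def parse_version(text):
    """Turn a dotted version string such as '3.10.12' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_supported(version, lower="3.10.1", upper="3.10.12"):
    """Return True if `version` (major, minor, micro) lies within the inclusive range.

    The default bounds are the ones quoted in this notebook; re-check
    `python_requires` in setup.py before relying on them.
    """
    return parse_version(lower) <= tuple(version[:3]) <= parse_version(upper)


current = tuple(sys.version_info[:3])
print(f"Python {'.'.join(map(str, current))} supported: {is_supported(current)}")
```

Tuples compare element by element, so `(3, 10, 12)` is accepted while `(3, 13, 9)` is rejected without any string parsing of the interpreter banner.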

In [None]:
# Check Colab's current Python version against the requirement above
!python --version

Python 3.10.12


In [None]:
# Install virtualenv and create a virtual environment
!pip install virtualenv
!virtualenv myenv

# Download and install Miniconda
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local

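# Activate Miniconda and install Python ver 3.10.12
# (each `!` line runs in its own subshell, so `source` does not carry over to later lines)
!source /usr/local/bin/activate
!conda install -q -y --prefix /usr/local python=3.10.12 ujson  # Specify the version here

# Set environment variables for Python and conda paths
# (`!export` only affects its own subshell; use `%env` if later cells need these values)
!export PYTHONPATH=/usr/local/lib/python3.10/site-packages/
!export CONDA_PREFIX=/usr/local/envs/myenv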

Collecting virtualenv
  Using cached virtualenv-20.35.4-py3-none-any.whl.metadata (4.6 kB)
Collecting distlib<1,>=0.3.7 (from virtualenv)
  Using cached distlib-0.4.0-py2.py3-none-any.whl.metadata (5.2 kB)
Collecting filelock<4,>=3.12.2 (from virtualenv)
  Downloading filelock-3.20.0-py3-none-any.whl.metadata (2.1 kB)
Using cached virtualenv-20.35.4-py3-none-any.whl (6.0 MB)
Using cached distlib-0.4.0-py2.py3-none-any.whl (469 kB)
Downloading filelock-3.20.0-py3-none-any.whl (16 kB)
Installing collected packages: distlib, filelock, virtualenv
Successfully installed distlib-0.4.0 filelock-3.20.0 virtualenv-20.35.4
created virtual environment CPython3.13.9.final.0-64 in 574ms
  creator CPython3Posix(dest=/content/myenv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, via=

In [None]:
# Python version in new virtual environment (compatible with ML-Agents)
!python --version

Python 3.10.12


---
## 4. Install the dependencies

In [None]:
%%capture
# Go inside the repository and install the package (this can take a few min.)
%cd ml-agents
!pip3 install -e ./ml-agents-envs
!pip3 install -e ./ml-agents

---
## 5. Download and move the environment zip file

The environment executable is distributed as a zip file. Download it into `./trained-envs-executables/linux/` and extract it there.

In [None]:
# Create the file's destination (-p creates parent directories and tolerates reruns)
!mkdir -p ./trained-envs-executables/linux

Download the file `Huggy.zip` from `https://github.com/huggingface/Huggy` using `wget`.

In [None]:
!wget "https://github.com/huggingface/Huggy/raw/main/Huggy.zip" -O ./trained-envs-executables/linux/Huggy.zip

--2025-11-11 17:35:27--  https://github.com/huggingface/Huggy/raw/main/Huggy.zip
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://media.githubusercontent.com/media/huggingface/Huggy/main/Huggy.zip [following]
--2025-11-11 17:35:27--  https://media.githubusercontent.com/media/huggingface/Huggy/main/Huggy.zip
Resolving media.githubusercontent.com (media.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to media.githubusercontent.com (media.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39214997 (37M) [application/zip]
Saving to: ‘./trained-envs-executables/linux/Huggy.zip’


2025-11-11 17:35:28 (70.5 MB/s) - ‘./trained-envs-executables/linux/Huggy.zip’ saved [39214997/39214997]



In [None]:
%%capture
# Extract the file (the %%capture cell magic must be the first line of the cell)
!unzip -d ./trained-envs-executables/linux/ ./trained-envs-executables/linux/Huggy.zip

In [None]:
# Ensure the file is accessible
!chmod -R 755 ./trained-envs-executables/linux/Huggy
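
As a quick check that the permission change took effect, the sketch below lists the files under a directory whose owner-execute bit is set. The helper name `find_executables` is hypothetical, and the directory passed in is assumed to be the extraction folder used above.

```python
import os
import stat


def find_executables(root):
    """Return sorted paths under `root` whose owner-execute permission bit is set."""
    found = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if os.stat(path).st_mode & stat.S_IXUSR:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Assumed location of the extracted environment; prints nothing if it is missing.
    for path in find_executables("./trained-envs-executables/linux/Huggy"):
        print(path)
```

Reading the mode bits with `os.stat` keeps the result independent of which user runs the notebook.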

---

## 🌐 Environment Overview

### State Space
Huggy's state (information about its environment) determines which actions the agent takes. The state space includes:
- The position of the target (stick)
- Huggy's position relative to the target
- The orientation of its legs

<br>

### Action Space
Joint motors drive Huggy's legs. To reach the target, the agent must learn to rotate the joint motors of each leg correctly to move.

<br>

### Reward Function
The reward function follows the *reward hypothesis*: a goal can be described as the maximization of the expected cumulative reward.  
It encodes the agent's goal of reaching the stick without spinning too much:  
- **Bonus for reaching the target**: positive reward for achieving the goal
- **Orientation bonus**: positive reward for getting closer to the target
- **Time penalty**: fixed negative reward at every action until the agent reaches the target
- **Rotation penalty**: negative reward for spinning too much or turning too quickly
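
The four terms above can be sketched as a per-step reward plus a discounted sum over an episode. This is an illustration only: the function names and every numeric weight are hypothetical placeholders, not the values used inside the Huggy environment. The discount factor of 0.995 is the `gamma` set in the training config later in this notebook.

```python
def step_reward(reached_target, progress, turn_rate,
                target_bonus=1.0, time_penalty=0.001, rotation_weight=0.01):
    """Combine the four reward terms for one action (all weights are hypothetical)."""
    reward = progress                            # orientation bonus: moved closer to the target
    reward -= rotation_weight * abs(turn_rate)   # rotation penalty: discourage spinning
    if reached_target:
        reward += target_bonus                   # bonus for reaching the target
    else:
        reward -= time_penalty                   # time penalty while the target is not reached
    return reward


def discounted_return(rewards, gamma=0.995):
    """Cumulative reward with discounting, accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


episode = [step_reward(False, 0.02, 0.5), step_reward(False, 0.03, 0.1),
           step_reward(True, 0.01, 0.0)]
print(discounted_return(episode))
```

Maximizing the expected value of `discounted_return` is what the reward hypothesis refers to: the agent is never told how to walk, only scored on the outcome.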

---
## 6. Create the Huggy config file
Define the training hyperparameters within the `Huggy.yaml` config file located at `/content/ml-agents/config/ppo`.

More information on hyperparameter tuning: [`/content/ml-agents/docs/Training-Configuration-File.md`](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md)

In [None]:
# Copy and paste the following into the new Huggy.yaml file
behaviors:
  Huggy:
    trainer_type: ppo
    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.995
        strength: 1.0
    checkpoint_interval: 200000
    keep_checkpoints: 15
    max_steps: 2e6
    time_horizon: 1000
    summary_freq: 50000
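
To see what these numbers imply for the training run, the sketch below derives a few counts from the config above. It is rough arithmetic under the assumption that the buffer is split into `buffer_size / batch_size` minibatches per epoch. The helper name `training_schedule` is hypothetical, and the exact counts reported by ML-Agents may differ slightly.

```python
# Values copied from the Huggy.yaml config above.
config = {
    "batch_size": 2048,
    "buffer_size": 20480,
    "num_epoch": 3,
    "max_steps": 2_000_000,
    "checkpoint_interval": 200_000,
    "keep_checkpoints": 15,
    "summary_freq": 50_000,
}


def training_schedule(cfg):
    """Derive approximate counts implied by the config (integer division throughout)."""
    minibatches = cfg["buffer_size"] // cfg["batch_size"]
    return {
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_update": minibatches * cfg["num_epoch"],
        "policy_updates": cfg["max_steps"] // cfg["buffer_size"],
        "checkpoints": cfg["max_steps"] // cfg["checkpoint_interval"],
        "summaries": cfg["max_steps"] // cfg["summary_freq"],
    }


schedule = training_schedule(config)
for name, value in schedule.items():
    print(f"{name}: {value}")

# keep_checkpoints exceeds the interval-based checkpoint count, so none should be pruned.
print("all checkpoints kept:", config["keep_checkpoints"] >= schedule["checkpoints"])
```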

---
## 7. Train the agent
Launch `mlagents-learn` and select the executable containing the environment.  

With ML-Agents, training is launched from a single command that takes four parameters:

1. `mlagents-learn <config>`: the path to the hyperparameter config file
2. `--env`: the path to the environment executable
3. `--run-id`: the name to give this training run
4. `--no-graphics`: disables the visualization during training

If training is interrupted, add the `--resume` flag to continue the run. `--resume` fails the first time it is used; running the block again bypasses the error.

Training should take 30 to 45 minutes, depending on the machine and whether the runtime uses a GPU.
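
The parameters above can also be assembled in Python, which makes it easy to toggle `--resume` between runs. This is a minimal sketch: `build_train_command` is a hypothetical helper, and it only builds and prints the command rather than launching training.

```python
import shlex


def build_train_command(config_path, env_path, run_id, resume=False, no_graphics=True):
    """Assemble the mlagents-learn argument list from the parameters described above."""
    command = ["mlagents-learn", config_path, f"--env={env_path}", f"--run-id={run_id}"]
    if no_graphics:
        command.append("--no-graphics")
    if resume:
        command.append("--resume")
    return command


command = build_train_command(
    "./config/ppo/Huggy.yaml",
    "./trained-envs-executables/linux/Huggy/Huggy",
    "Huggy2",
)
print(shlex.join(command))  # pass `command` to subprocess.run(...) to actually launch it
```

Keeping the arguments in a list avoids shell-quoting mistakes if a path or run ID ever contains spaces.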

In [None]:
!mlagents-learn ./config/ppo/Huggy.yaml --env=./trained-envs-executables/linux/Huggy/Huggy --run-id="Huggy2" --no-graphics

  import pkg_resources

[Unity logo banner printed by `mlagents-learn` at startup]

---

## 8. Push the agent to the Hugging Face Hub

### Connect to the Hub:
1. [Create](https://huggingface.co/join) and/or sign in to an active Hugging Face account.
2. [Create a new token](https://huggingface.co/settings/tokens) with the write role.
3. Copy the token.
4. Run the cell below and paste the token.

Note: If not using Colab or Jupyter Notebooks, run the `huggingface-cli login` command instead.

In [None]:
from huggingface_hub import notebook_login
notebook_login()


### Run `mlagents-push-to-hf` and define parameters:
1. `--run-id`: the name of the training run ID
2. `--local-dir`: the directory where the agent was saved (`results/<run-id>`), e.g., `./results/Huggy2`
3. `--repo-id`: the name of the Hugging Face repo you want to create or update (Note: If the repo does not exist it will be created automatically.)
4. `--commit-message`: the commit message (HF repos are git repositories)

In [None]:
!mlagents-push-to-hf --run-id="HuggyTraining" --local-dir="./results/Huggy2" --repo-id="r-scoville/huggy-ppo" --commit-message="First Huggy PPO agent training complete."

[INFO] This function will create a model card and upload your HuggyTraining into HuggingFace Hub. This is a work in progress: If you encounter a bug, please send open an issue
[INFO] Pushing repo HuggyTraining to the Hugging Face Hub
Processing Files (0 / 0)      : |          |  0.00B /  0.00B
New Data Upload               : |          |  0.00B /  0.00B
  .../Huggy/Huggy-1199952.onnx:   6% 126k/2.27M [00:00<?, ?B/s]
  ...y2/Huggy/Huggy-1199952.pt:   6% 748k/13.5M [00:00<?, ?B/s]
  .../Huggy/Huggy-1399699.onnx:   6% 126k/2.27M [00:00<?, ?B/s]
  ...y2/Huggy/Huggy-1399699.pt:   6% 748k/13.5M [00:00<?, ?B/s]
  .../Huggy/Huggy-1599937.onnx:   6% 126k/2.27M [00:00<?, ?B/s]
  .../Huggy/Huggy-1799975.onnx:   6% 126k/2.27M [00:00<?, ?B/s]
  ...2/Huggy/Huggy-199713.onnx:   6% 126k/2.27M [00:00<?, ?B/s]
  ...gy2/Huggy/Huggy-199713.

---

## 9. Try out the trained model in the browser

1. Open the [Huggy game](https://huggingface.co/spaces/ThomasSimonini/Huggy) in the browser.
2. Click `Play with my Huggy model`.
3. Load the Huggy model:
   - Step 1: Enter the username (`r-scoville`) and click `Search`.
   - Step 2: Select the Huggy model repository (`r-scoville/huggy-ppo`).
   - Step 3: Choose which model to play:
     - For the most recent model, select `Huggy.onnx`.
     - To see how the model progressed over the training steps, select an earlier checkpoint.
4. Click `Watch the agent play`.
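
When comparing checkpoints, it helps to order them by training step rather than alphabetically. The sketch below assumes checkpoint files follow the `Huggy-<step>.onnx` naming seen in the upload log; `checkpoint_step` and `sort_checkpoints` are hypothetical helpers.

```python
import re


def checkpoint_step(filename):
    """Extract the training step from a name like 'Huggy-1199952.onnx', or None if absent."""
    match = re.search(r"-(\d+)\.onnx$", filename)
    return int(match.group(1)) if match else None


def sort_checkpoints(filenames):
    """Return only the step-numbered checkpoints, ordered from earliest to latest step."""
    numbered = [name for name in filenames if checkpoint_step(name) is not None]
    return sorted(numbered, key=checkpoint_step)


names = ["Huggy-1199952.onnx", "Huggy-199713.onnx", "Huggy-1399699.onnx", "Huggy.onnx"]
for name in sort_checkpoints(names):
    print(checkpoint_step(name), name)
```

A plain alphabetical sort would place `Huggy-1199952.onnx` before `Huggy-199713.onnx`, which is why the step is parsed as an integer first.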


---

*End of Huggy script*