<a href="https://colab.research.google.com/github/migolan/HF-DRLC/blob/main/05_MLAgents.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro

This notebook is based on the [Unit 5 hands-on](https://huggingface.co/learn/deep-rl-course/unit5/hands-on) of the Hugging Face Deep RL Course. It covers:

* [Pyramids environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#pyramids)
* SnowballTarget environment
* [ML-Agents library](https://github.com/Unity-Technologies/ml-agents)
  * [Training Configuration File](https://github.com/Unity-Technologies/ml-agents/blob/release_20_docs/docs/Training-Configuration-File.md)
* PPO agent
* [Random Network Distillation](https://medium.com/data-from-the-trenches/curiosity-driven-learning-through-random-network-distillation-488ffd8e5938)

# Installations

In [None]:
%%capture
!pip install virtualenv
!virtualenv myenv
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local
!source /usr/local/bin/activate
!conda install -q -y --prefix /usr/local python=3.10.12 ujson
!export PYTHONPATH=/usr/local/lib/python3.10/site-packages/
!export CONDA_PREFIX=/usr/local/envs/myenv

In [None]:
%%capture
!git clone --depth 1 https://github.com/Unity-Technologies/ml-agents
%cd ml-agents
!pip3 install -e ./ml-agents-envs
!pip3 install -e ./ml-agents

# SnowballTarget environment

## Install environment

In [None]:
!mkdir -p ./training-envs-executables/linux
!wget "https://github.com/huggingface/Snowball-Target/raw/main/SnowballTarget.zip" -O ./training-envs-executables/linux/SnowballTarget.zip
!unzip -d ./training-envs-executables/linux/ ./training-envs-executables/linux/SnowballTarget.zip
!chmod -R 755 ./training-envs-executables/linux/SnowballTarget
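
The same unpack-and-make-executable step can be sketched in pure Python, which is handy when shell commands are unavailable. This is a minimal sketch, not part of the original course: `unpack_env` is a hypothetical helper name, and it assumes the archive has already been downloaded. The explicit `chmod` matters here because Python's `zipfile` module does not restore Unix permission bits when extracting.

```python
import os
import zipfile
from pathlib import Path


def unpack_env(zip_path, dest_dir):
    """Extract a Unity build archive and mark everything under dest_dir as 755.

    Hypothetical helper (not part of ML-Agents); it roughly mirrors
    `unzip -d` followed by `chmod -R 755`, applied to the whole dest_dir.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # zipfile drops Unix permission bits, so set rwxr-xr-x explicitly.
    for root, dirs, files in os.walk(dest):
        for name in dirs + files:
            os.chmod(os.path.join(root, name), 0o755)
    return dest
```

For example, `unpack_env("./training-envs-executables/linux/SnowballTarget.zip", "./training-envs-executables/linux")` would reproduce the layout used in this notebook.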

## Define the SnowballTarget config file
Create `./config/ppo/SnowballTarget.yaml` (that is, `/content/ml-agents/config/ppo/SnowballTarget.yaml` on Colab) with the following content:

```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    summary_freq: 10000
    keep_checkpoints: 10
    checkpoint_interval: 50000
    max_steps: 200000
    time_horizon: 64
    threaded: false
    hyperparameters:
      learning_rate: 0.0003
      learning_rate_schedule: linear
      batch_size: 128
      buffer_size: 2048
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
```
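
The notebook has no cell that actually creates this file, so one option is to write it from Python. The sketch below is an assumption about workflow, not part of the original course material: `write_config` is a hypothetical helper, and the YAML text is copied verbatim from the config shown above.

```python
from pathlib import Path

# YAML copied verbatim from the config shown in this section.
SNOWBALL_CONFIG = """\
behaviors:
  SnowballTarget:
    trainer_type: ppo
    summary_freq: 10000
    keep_checkpoints: 10
    checkpoint_interval: 50000
    max_steps: 200000
    time_horizon: 64
    threaded: false
    hyperparameters:
      learning_rate: 0.0003
      learning_rate_schedule: linear
      batch_size: 128
      buffer_size: 2048
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
"""


def write_config(path, text=SNOWBALL_CONFIG):
    """Write the trainer config to `path`, creating parent directories if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text)
    return target
```

Calling `write_config("./config/ppo/SnowballTarget.yaml")` from inside the `ml-agents` directory places the file where the training command expects it.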

## Train the agent

In [None]:
!mlagents-learn ./config/ppo/SnowballTarget.yaml --env=./training-envs-executables/linux/SnowballTarget/SnowballTarget --run-id="SnowballTarget1" --no-graphics
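
As a rough sanity check on what this run will do, the schedule implied by the config can be worked out by hand. The sketch below assumes the documented ML-Agents meanings of `summary_freq`, `checkpoint_interval`, and `keep_checkpoints`, and assumes PPO makes `num_epoch` passes over each full buffer in minibatches of `batch_size`; the variable names are illustrative only.

```python
# Values copied from the SnowballTarget config shown earlier.
max_steps = 200_000
summary_freq = 10_000
checkpoint_interval = 50_000
keep_checkpoints = 10
buffer_size = 2048
batch_size = 128
num_epoch = 3

# How many times training statistics are reported over the run.
summaries = max_steps // summary_freq

# How many interval checkpoints are written; all are kept if within the limit.
checkpoints = max_steps // checkpoint_interval
all_checkpoints_kept = checkpoints <= keep_checkpoints

# Gradient steps per policy update (assumption: full passes over the buffer).
minibatches_per_epoch = buffer_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * num_epoch

print(f"summaries: {summaries}")
print(f"interval checkpoints: {checkpoints} (all kept: {all_checkpoints_kept})")
print(f"gradient steps per policy update: {gradient_steps_per_update}")
```

Under these assumptions the run reports statistics 20 times, writes 4 interval checkpoints (fewer than `keep_checkpoints`, so none are discarded), and takes 48 gradient steps each time the buffer fills.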

## Push the agent to the HF Hub

In [None]:
from huggingface_hub import notebook_login
notebook_login()

In [None]:
!mlagents-push-to-hf --run-id="SnowballTarget1" --local-dir="./results/SnowballTarget1" --repo-id="migolan/ppo-SnowballTarget" --commit-message="First Push"

## Watch the agent playing
https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget

# Pyramids environment

In [None]:
!wget "https://huggingface.co/spaces/unity/ML-Agents-Pyramids/resolve/main/Pyramids.zip" -O ./training-envs-executables/linux/Pyramids.zip
!unzip -d ./training-envs-executables/linux/ ./training-envs-executables/linux/Pyramids.zip
!chmod -R 755 ./training-envs-executables/linux/Pyramids/Pyramids

## Modify the PyramidsRND config file
- Pyramids is one of the ML-Agents example environments, so its config file already exists at `./config/ppo/PyramidsRND.yaml` (`/content/ml-agents/config/ppo/PyramidsRND.yaml` on Colab).
- RND stands for *random network distillation*, a way to generate curiosity rewards: https://medium.com/data-from-the-trenches/curiosity-driven-learning-through-random-network-distillation-488ffd8e5938.
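
To make the idea concrete, here is a toy sketch of RND in plain Python. It is not the ML-Agents implementation: it uses small linear maps instead of neural networks, and `TinyRND` and its methods are hypothetical names. A fixed random target map is never trained, a predictor map is trained to imitate it on visited observations, and the squared prediction error serves as the curiosity reward, so familiar observations earn less than novel ones.

```python
import random


class TinyRND:
    """Linear RND toy: a fixed random target map and a trainable predictor map."""

    def __init__(self, obs_dim, embed_dim, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Target map: random weights that are never updated.
        self.target = [[rng.uniform(-1, 1) for _ in range(obs_dim)]
                       for _ in range(embed_dim)]
        # Predictor map: separate random weights, trained to imitate the target.
        self.predictor = [[rng.uniform(-1, 1) for _ in range(obs_dim)]
                          for _ in range(embed_dim)]
        self.lr = lr

    @staticmethod
    def _apply(weights, obs):
        # Matrix-vector product: one output per weight row.
        return [sum(w * x for w, x in zip(row, obs)) for row in weights]

    def intrinsic_reward(self, obs):
        """Squared error between predictor and target outputs for `obs`."""
        t = self._apply(self.target, obs)
        p = self._apply(self.predictor, obs)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t))

    def update(self, obs):
        """One gradient-descent step on the squared error for `obs`."""
        t = self._apply(self.target, obs)
        p = self._apply(self.predictor, obs)
        for row, pi, ti in zip(self.predictor, p, t):
            err = pi - ti
            for j, x in enumerate(obs):
                row[j] -= self.lr * 2 * err * x  # d(err**2)/d(row[j])


rnd = TinyRND(obs_dim=4, embed_dim=3)
familiar = [1.0, 0.0, 0.0, 0.0]  # observation the agent keeps visiting
novel = [0.0, 1.0, 0.0, 0.0]     # observation the agent has never seen

reward_before = rnd.intrinsic_reward(familiar)
for _ in range(50):
    rnd.update(familiar)
reward_after = rnd.intrinsic_reward(familiar)
novel_reward = rnd.intrinsic_reward(novel)

print(f"familiar before training: {reward_before:.4f}")
print(f"familiar after training:  {reward_after:.6f}")
print(f"novel (never trained on): {novel_reward:.4f}")
```

In the demo, training only on `familiar` drives its reward toward zero, while `novel` keeps a larger reward because the predictor was never fitted to it.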


In [None]:
!mlagents-learn ./config/ppo/PyramidsRND.yaml --env=./training-envs-executables/linux/Pyramids/Pyramids --run-id="Pyramids Training" --no-graphics
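
Because this run ID contains a space, quoting matters whenever the command is assembled programmatically. The sketch below is a hypothetical helper (`build_learn_command` is not part of ML-Agents) that uses only the flags already shown in this notebook and relies on `shlex.join` for shell-safe quoting.

```python
import shlex


def build_learn_command(config, env, run_id, no_graphics=True):
    """Return a shell-safe `mlagents-learn` command line as a string."""
    args = ["mlagents-learn", config, f"--env={env}", f"--run-id={run_id}"]
    if no_graphics:
        args.append("--no-graphics")
    return shlex.join(args)  # quotes any argument containing spaces


command = build_learn_command(
    "./config/ppo/PyramidsRND.yaml",
    "./training-envs-executables/linux/Pyramids/Pyramids",
    "Pyramids Training",
)
print(command)
```

The same run ID has to be reused when pushing to the Hub, since the results are written to `./results/Pyramids Training`.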

## Push the agent to the HF Hub

In [None]:
!mlagents-push-to-hf --run-id="Pyramids Training" --local-dir="./results/Pyramids Training" --repo-id="migolan/ppo-Pyramids" --commit-message="First Push"

## Watch the agent playing

https://huggingface.co/spaces/unity/ML-Agents-Pyramids

# Additional challenges
* [Other ML-Agents environments](https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Examples.md)
* [Worm](https://huggingface.co/spaces/unity/ML-Agents-Worm) demo where you teach a **worm to crawl**.
* [Walker](https://huggingface.co/spaces/unity/ML-Agents-Walker) demo where you teach an agent **to walk towards a goal**.