## ViZDoom Health Gathering Supreme

This notebook sets up and trains an agent in the Health Gathering Supreme scenario of ViZDoom, a 3D first-person shooter environment commonly used in reinforcement learning research. The scenario is especially interesting because of its sparse reward structure: the agent must survive as long as possible by collecting health kits while constantly taking damage from the acid floor. The agent is rewarded based on its ability to pathfind towards the health kits that are crucial for its survival.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit9/Health-Gathering-Supreme.png" alt="Health-Gathering-Supreme"/>


The map is a rectangle containing walls and an acid floor that **hurts the player periodically**. Initially, some medkits are spread uniformly over the map, and a new medkit falls from the sky every now and then. **Medkits restore a portion of the player's health**, so the agent needs to pick them up to survive. The episode ends when the player dies or on timeout.

Further configuration:
- living_reward = 1
- 3 available buttons: turn left, turn right, move forward
- 1 available game variable: HEALTH
- death_penalty = 100
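
The reward settings listed above can be turned into a quick back-of-the-envelope calculation. The sketch below is only an illustration: it assumes the living reward is granted once per tick survived and the death penalty is subtracted once if the episode ends in death. It ignores frame skip and any reward shaping a training wrapper may add, and `episode_return` is a hypothetical helper, not part of ViZDoom.

```python
LIVING_REWARD = 1    # taken from the configuration list above
DEATH_PENALTY = 100  # taken from the configuration list above


def episode_return(ticks_survived: int, died: bool) -> int:
    """Raw return under the stated assumptions: +1 per tick, minus the penalty on death."""
    total = LIVING_REWARD * ticks_survived
    if died:
        total -= DEATH_PENALTY
    return total


# An agent that dies after 500 ticks versus one that survives 500 ticks until a timeout.
print(episode_return(500, died=True))   # 400
print(episode_return(500, died=False))  # 500
```

Under these assumptions, the only way to increase the return is to stay alive longer, which in practice means reaching medkits before health runs out.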

### Installation and Setup

In this section, we will install all the necessary system dependencies and Python packages required to run the ViZDoom Health Gathering Supreme environment and train reinforcement learning agents using Sample Factory.

#### Why Sample Factory?
Sample Factory runs many environments in parallel and provides APPO (Asynchronous Proximal Policy Optimization), a highly optimized asynchronous variant of PPO that is well suited to complex 3D environments like Doom.

In [None]:
%%capture
%%bash
apt-get install -y build-essential zlib1g-dev libsdl2-dev libjpeg-dev \
nasm tar libbz2-dev libgtk2.0-dev cmake git libfluidsynth-dev libgme-dev \
libopenal-dev timidity libwildmidi-dev unzip ffmpeg
apt-get install -y libboost-all-dev
apt-get install -y liblua5.1-dev

In [None]:
!pip install faster-fifo==1.4.2
!pip install vizdoom

In [None]:
!pip install sample-factory==2.1.1

Starting with PyTorch 2.6, `torch.load` sets `weights_only=True` by default, which prevents it from loading checkpoints that contain custom or NumPy objects. To avoid this, we downgrade torch. More info: https://docs.pytorch.org/docs/stable/generated/torch.load.html

In [None]:
!pip install torch==2.5.1
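
The version cutoff behind this downgrade can also be checked in code. The following is a minimal sketch using only the standard library; the helper names `parse_major_minor` and `defaults_to_weights_only` are hypothetical and not part of PyTorch. It only compares version strings and does not import torch.

```python
def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '2.5.1+cu121'."""
    release = version.split("+")[0]          # drop a local build tag like '+cu121'
    major, minor = release.split(".")[:2]
    return int(major), int(minor)


def defaults_to_weights_only(version: str) -> bool:
    """True if this torch version uses weights_only=True as the torch.load default."""
    return parse_major_minor(version) >= (2, 6)


print(defaults_to_weights_only("2.5.1+cu121"))  # False
print(defaults_to_weights_only("2.6.0"))        # True
```

In the notebook, the same check could be run on the installed version with `defaults_to_weights_only(torch.__version__)` after importing torch.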

### Setting up the Doom Environment in sample-factory

In [None]:
import functools

from sample_factory.algo.utils.context import global_model_factory
from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.envs.env_utils import register_env
from sample_factory.train import run_rl

from sf_examples.vizdoom.doom.doom_model import make_vizdoom_encoder
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec


# Registers ViZDoom environments
def register_vizdoom_envs():
    for env_spec in DOOM_ENVS:
        make_env_func = functools.partial(make_doom_env_from_spec, env_spec)
        register_env(env_spec.name, make_env_func)

def register_vizdoom_models():
    global_model_factory().register_encoder_factory(make_vizdoom_encoder)


def register_vizdoom_components():
    register_vizdoom_envs()
    register_vizdoom_models()

# parse the command line args and create a config
def parse_vizdoom_cfg(argv=None, evaluation=False):
    parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
    # lets you specify the parameters for the environment
    add_doom_env_args(parser)
    # override defaults
    doom_override_defaults(parser)
    final_cfg = parse_full_cfg(parser, argv)
    return final_cfg
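
The registration loop above binds each `env_spec` with `functools.partial`. One common reason to prefer this over a lambda inside a loop is Python's late binding of closures. The sketch below illustrates that behavior with a hypothetical stand-in, `make_env`, in place of `make_doom_env_from_spec`; it is not a statement about why the Sample Factory authors chose this pattern.

```python
import functools


def make_env(spec_name):
    """Hypothetical stand-in for an environment factory."""
    return f"env built from {spec_name}"


spec_names = ["spec_a", "spec_b"]

lambda_makers = {}
partial_makers = {}
for spec in spec_names:
    # The lambda looks up `spec` only when it is called, after the loop has finished.
    lambda_makers[spec] = lambda: make_env(spec)
    # partial stores the current value of `spec` immediately.
    partial_makers[spec] = functools.partial(make_env, spec)

print(lambda_makers["spec_a"]())   # env built from spec_b
print(partial_makers["spec_a"]())  # env built from spec_a
```

With the lambda version, every registered factory would build the last environment in the list, so freezing the argument at registration time matters.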

### Training the agent
Set **num_workers** and **num_envs_per_worker** as high as your device can handle. Higher values collect experience faster but require more processing power.
**gamma** is set to 0.98 so that the agent focuses on near-term rewards, since the medkits appear at random locations anyway.
**train_for_env_steps** should be set high enough for the agent to learn; the run below uses 33,000,000 environment steps.

Further suggestions: change learning_rate, increase rollout to stabilize learning early on, or increase num_epochs (default = 1) for better sample efficiency.
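
To make these settings more concrete, here is a small arithmetic sketch. It uses the common rule of thumb that a discount factor gamma corresponds to an effective horizon of roughly 1 / (1 - gamma) steps, and it counts how many environments and samples the values used in this notebook (16 workers, 8 environments per worker, rollout of 32) produce. The function names are hypothetical, and the calculation ignores frame skip and how Sample Factory groups samples into training batches.

```python
def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb number of steps over which discounted rewards still matter."""
    return 1.0 / (1.0 - gamma)


def total_envs(num_workers: int, num_envs_per_worker: int) -> int:
    """Number of environments running in parallel."""
    return num_workers * num_envs_per_worker


def samples_per_collection(num_workers: int, num_envs_per_worker: int, rollout: int) -> int:
    """Samples gathered when every environment completes one rollout."""
    return total_envs(num_workers, num_envs_per_worker) * rollout


print(round(effective_horizon(0.98)))      # 50
print(round(effective_horizon(0.99)))      # 100
print(total_envs(16, 8))                   # 128
print(samples_per_collection(16, 8, 32))   # 4096
```

Lowering gamma from 0.99 to 0.98 roughly halves the horizon the agent plans over, which matches the intent of focusing on nearby medkits.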

In [None]:
register_vizdoom_components()
env = "doom_health_gathering_supreme"
cfg = parse_vizdoom_cfg(
    argv=[
        f"--env={env}",
        "--num_workers=16",
        "--num_envs_per_worker=8",
        "--gamma=0.98",
        "--rollout=32",
        "--learning_rate=0.0001",
        "--train_for_env_steps=33000000",
    ]
)

status = run_rl(cfg)
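
When experimenting with several hyperparameter combinations, it can be easier to keep the settings in a dictionary and generate the argument list from it. The helper below, `to_argv`, is a hypothetical convenience function and not part of Sample Factory. It assumes that each option is written as `--name=value` and that a boolean switch is written as a bare `--name`, which matches how the flags appear in this notebook.

```python
def to_argv(params: dict) -> list:
    """Convert a dict of settings into a list of command-line style arguments."""
    argv = []
    for key, value in params.items():
        if value is True:
            argv.append(f"--{key}")          # bare switch, e.g. --save_video
        elif value is False or value is None:
            continue                          # disabled switches are omitted
        else:
            argv.append(f"--{key}={value}")
    return argv


training_params = {
    "env": "doom_health_gathering_supreme",
    "num_workers": 16,
    "num_envs_per_worker": 8,
    "gamma": 0.98,
    "rollout": 32,
    "learning_rate": 0.0001,
    "train_for_env_steps": 33000000,
}

for arg in to_argv(training_params):
    print(arg)
```

The resulting list could then be passed as `parse_vizdoom_cfg(argv=to_argv(training_params))`.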

### Evaluate the Performance and save the video
During evaluation, you only need 1 worker. Make sure save_video is enabled so that you can watch the agent play ViZDoom.

In [None]:
from sample_factory.enjoy import enjoy
cfg = parse_vizdoom_cfg(argv=[f"--env={env}", "--num_workers=1", "--save_video", "--no_render", "--max_num_episodes=10"], evaluation=True)
status = enjoy(cfg)

### View "replay.mp4"

In [None]:
from base64 import b64encode
from IPython.display import HTML

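# The path assumes a Colab working directory (/content) and the default experiment name
with open('/content/train_dir/default_experiment/replay.mp4', 'rb') as f:
    mp4 = f.read()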
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=640 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

The model card is available at https://huggingface.co/loke-07/vizdoom_health_gathering_supreme