# Random Network Distillation
All code can be run from the terminal, as described in `README.md`. This notebook offers an alternative way to train the networks and visualize key metrics; it calls exactly the same functions as the terminal commands.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from train import train
from play import play
from rnd_rl.utils.plot_util import plot

In [None]:
# set this to your wandb username if you want to generate plot images
your_wandb_username = "FILL_YOUR_USERNAME_HERE"
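
Because the plotting cells depend on this variable, it can help to fail early when the placeholder has not been edited. The helper below is a minimal sketch: `wandb_project_path` is a hypothetical name that is not part of this repository, and the `rnd_rl` project name is taken from the path built later in this notebook.

```python
PLACEHOLDER = "FILL_YOUR_USERNAME_HERE"


def wandb_project_path(username, project="rnd_rl"):
    """Return 'username/project', rejecting an empty or unedited username."""
    username = username.strip()
    if not username or username == PLACEHOLDER:
        raise ValueError("Set your_wandb_username before generating plots.")
    return f"{username}/{project}"
```

Calling `wandb_project_path(your_wandb_username)` before the plotting cells would surface a forgotten username as a clear error instead of a failed wandb lookup.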


## PPO baseline vs PPO with RND
These experiments use the Point Maze environment, which gives the clearest visualization.

### Vanilla PPO

In [None]:
maze_PPO_args = {    
    "env_name": "PointMaze_Medium-v3",
    "experiment_name": "maze_PPO",
    "use_rnd": False,
    "enable_safety_layer": False
    }

In [None]:
# NOTE: skip this step if you have already run:
# uv run train.py --env_name PointMaze_Medium-v3 --experiment_name maze_PPO
train(**maze_PPO_args)

In [None]:
# video visualization. equivalent to:
# uv run play.py --env_name PointMaze_Medium-v3 --experiment_name maze_PPO
play(**maze_PPO_args)

### PPO with RND
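For context, Random Network Distillation trains a predictor network to match a fixed, randomly initialized target network and uses the prediction error on an observation as an exploration bonus. The sketch below illustrates that idea with tiny linear models in plain Python. `TinyRND` and its methods are illustrative names, not this repository's implementation, which may differ; the rescaling implied by `reward_normalization` is omitted.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix with out_dim rows and in_dim columns."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def forward(weights, obs):
    """Apply a linear map to one observation."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


class TinyRND:
    """Illustrative RND module (assumption: linear networks, single-sample SGD)."""

    def __init__(self, obs_dim, feat_dim=4, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.target = make_linear(obs_dim, feat_dim, rng)     # fixed, never trained
        self.predictor = make_linear(obs_dim, feat_dim, rng)  # trained to match target
        self.lr = lr

    def intrinsic_reward(self, obs):
        """Mean squared error between predictor and target features."""
        t = forward(self.target, obs)
        p = forward(self.predictor, obs)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(t)

    def update(self, obs):
        """One gradient step on the predictor for a single observation."""
        t = forward(self.target, obs)
        p = forward(self.predictor, obs)
        for row, pi, ti in zip(self.predictor, p, t):
            grad_scale = 2.0 * (pi - ti) / len(t)
            for j, x in enumerate(obs):
                row[j] -= self.lr * grad_scale * x


rnd = TinyRND(obs_dim=2)
visited, novel = [0.5, -0.2], [-0.9, 0.8]
for _ in range(500):
    rnd.update(visited)  # the agent keeps seeing the same state
print("visited:", rnd.intrinsic_reward(visited))
print("novel:  ", rnd.intrinsic_reward(novel))
```

After repeated updates on the visited observation its bonus shrinks toward zero, while an observation the predictor has not been trained on keeps a larger bonus. That gap is what steers the agent toward unexplored parts of the maze.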

In [None]:
maze_PPO_RND_args = {    
    "env_name": "PointMaze_Medium-v3",
    "experiment_name": "maze_PPO_RND",
    "use_rnd": True,
    "reward_normalization": True,
    "obs_normalization": False,
    "enable_safety_layer": False
    }

In [None]:
# NOTE: skip this step if you have already run:
# uv run train.py --env_name PointMaze_Medium-v3 --experiment_name maze_PPO_RND --use_rnd --normalize_rnd
train(**maze_PPO_RND_args)

In [None]:
# video visualization. equivalent to:
# uv run play.py --env_name PointMaze_Medium-v3 --experiment_name maze_PPO_RND --use_rnd --normalize_rnd
play(**maze_PPO_RND_args)

### Plot Comparison
Produces formatted plots for the report. The curves should match what you see in your wandb workspace.

In [None]:
indices = [0, 1] # change this if you have previous runs in your wandb workspace

_ = plot(wandb_username = your_wandb_username, 
     indices = indices, title = "RND on PointMaze_Medium-v3", top_ylims = [None, None])
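
If the workspace already contains older runs, the right indices can be looked up by experiment name instead of counted by hand. The sketch below is a hypothetical helper: `indices_for` is not part of this repository, and it assumes you have a list of run names in the same order that `plot` indexes them, which should be checked against your workspace.

```python
def indices_for(run_names, wanted):
    """Map experiment names to their positions in an ordered list of run names."""
    missing = [name for name in wanted if name not in run_names]
    if missing:
        raise KeyError(f"runs not found: {missing}")
    # list.index returns the first match, so duplicate names resolve to the earliest run
    return [run_names.index(name) for name in wanted]


# Hypothetical workspace listing with one older run ahead of the two maze runs.
run_names = ["old_debug_run", "maze_PPO", "maze_PPO_RND"]
indices = indices_for(run_names, ["maze_PPO", "maze_PPO_RND"])  # [1, 2]
```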

In [None]:
your_wandb_username + "/" + "rnd_rl"

## Safety Layer effect
These experiments use the Inverted Pendulum environment.

### PPO with RND

In [None]:
PPO_RND_args = {    
    "experiment_name": "PPO_RND",
    "use_rnd": True,
    "reward_normalization": True,
    "obs_normalization": True,
    "enable_safety_layer": False,
    # "max_epochs": 20
    }

In [None]:
# NOTE: skip this step if you have already run:
# uv run train.py --experiment_name PPO_RND --use_rnd --normalize_rnd
train(**PPO_RND_args)

In [None]:
# video visualization. equivalent to:
# uv run play.py --experiment_name PPO_RND --use_rnd --normalize_rnd
play(**PPO_RND_args)

### PPO with RND and safety layer
This is more computationally intensive than the previous experiments and can take significantly longer.

In [None]:
PPO_CBF_args = {    
    "experiment_name": "PPO_CBF",
    "use_rnd": True,
    "reward_normalization": True,
    "obs_normalization": True,
    "enable_safety_layer": True,
    # "max_epochs": 20
    }

In [None]:
# NOTE: skip this step if you have already run:
# uv run train.py --experiment_name PPO_CBF --use_rnd --normalize_rnd --enable_safety_layer
train(**PPO_CBF_args)

In [None]:
# video visualization. equivalent to:
# uv run play.py --experiment_name PPO_CBF --use_rnd --normalize_rnd --enable_safety_layer
play(**PPO_CBF_args)

### Plot Comparison
Produces formatted plots for the report. The curves should match what you see in your wandb workspace.

In [None]:
indices = [2, 3] # change this if you have previous runs in your wandb workspace

plot(wandb_username = your_wandb_username, 
     indices = indices, title = "Safety Layer effect", safety_experiment = True)