# Homework 1: Intro to Deep RL with Single Agent Training Environments

The goal of this assignment is to gain hands-on experience with the key components of Reinforcement Learning (RL) environments. 

For more details, please check out [HW1.md](../HW1.md).

## Setup

You will need to make a copy of this notebook in your Google Drive before you can edit the homework files. You can do so with **File &rarr; Save a copy in Drive**.

In [None]:
#@title Mount Your Google Drive
#@markdown Your work will be stored in a folder called `rl_class` by default to prevent Colab instance timeouts from deleting your edits.

import os
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
#@title Setup Mount Symlink

# Raw string: the backslash escapes the space for the shell commands below
DRIVE_PATH = r'/content/gdrive/My\ Drive/rl_class'
DRIVE_PYTHON_PATH = DRIVE_PATH.replace('\\', '')
if not os.path.exists(DRIVE_PYTHON_PATH):
  %mkdir $DRIVE_PATH

## the space in `My Drive` causes some issues,
## make a symlink to avoid this
SYM_PATH = '/content/rl_class'
if not os.path.exists(SYM_PATH):
  !ln -s $DRIVE_PATH $SYM_PATH
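
The cell above keeps two forms of the same path: a shell-escaped one for the `%mkdir` and `!ln` commands and an unescaped one for `os.path.exists`. As a minimal sketch of an alternative, the same directory and symlink could be created from Python alone, which needs no escaping. The helper name `ensure_symlink` is hypothetical, and the demo runs in a temporary directory rather than on a mounted Drive.

```python
import os
import tempfile


def ensure_symlink(target_dir: str, link_path: str) -> str:
    """Create target_dir if needed, point link_path at it, and return the resolved path."""
    os.makedirs(target_dir, exist_ok=True)
    # lexists is True even for a dangling link, so an existing link is never overwritten
    if not os.path.lexists(link_path):
        os.symlink(target_dir, link_path)
    return os.path.realpath(link_path)


# Demo in a temporary directory. On Colab the arguments would presumably be
# '/content/gdrive/My Drive/rl_class' and '/content/rl_class'.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "My Drive", "rl_class")
    link = os.path.join(tmp, "rl_class")
    resolved = ensure_symlink(target, link)
    same_dir = os.path.samefile(resolved, target)
    print(same_dir)
```

Because the path is passed as a normal Python string, the space in `My Drive` needs no special handling here.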

In [None]:
#@title Apt Install Requirements

#@markdown Run each section with Shift+Enter

#@markdown Double-click on section headers to show code.

!apt update
!apt install -y --no-install-recommends \
        build-essential \
        curl \
        git \
        gnupg2 \
        make \
        cmake \
        ffmpeg \
        swig \
        libz-dev \
        unzip \
        zlib1g-dev \
        libglfw3 \
        libglfw3-dev \
        libxrandr2 \
        libxinerama-dev \
        libxi6 \
        libxcursor-dev \
        libgl1-mesa-dev \
        libgl1-mesa-glx \
        libglew-dev \
        libosmesa6-dev \
        lsb-release \
        ack-grep \
        patchelf \
        wget \
        xpra \
        xserver-xorg-dev \
        xvfb \
        python-opengl

In [None]:
#@title Clone Homework Repo

%cd $SYM_PATH
# Uncomment the next line on the first run to clone the repo into your Drive folder
# !git clone https://github.com/heng2j/multigrid.git
%cd multigrid/
%pip install -r requirements_colab.txt
%pip install -e .

In [None]:
!pip install gym tensorboard moviepy torch opencv-python swig box2d-py "ray[rllib]" scikit-image pygame numba gymnasium black PyYAML

## Editing Code

To edit code, click the folder icon in the left menu and navigate to the corresponding file (`multigrid/...`). Double-click a file to open it in the editor. An active Colab instance times out after about 12 hours (sooner if you close your browser window). Your edits are synced to Google Drive, so you won't lose work when an instance times out, but you will need to re-mount your Google Drive and re-install packages on every new instance.

In [2]:
#@title Imports
from __future__ import annotations

import json
import pathlib
import os
import numpy as np
from dataclasses import dataclass, asdict, field
from types import SimpleNamespace
import git

import ray
from ray import tune
from ray.rllib.algorithms import AlgorithmConfig, Algorithm
from ray.tune import CLIReporter
from ray.air.integrations.mlflow import MLflowLoggerCallback


from multigrid.envs import *
from multigrid.utils.training_utilis import algorithm_config, can_use_gpu, get_checkpoint_dir, policy_mapping_fn





In [None]:
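#@title Fix Variables
# `__file__` is not defined inside a notebook, so use the current working directory.
# This assumes the notebook is run from inside the cloned repo.
SCRIPT_PATH = str(pathlib.Path.cwd())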

## For Agent Training



In [None]:
#@title Configurable Training Function

# Set up Ray CLIReporter 
# NOTE Limit the number of rows.
reporter = CLIReporter(max_progress_rows=10)

# Configurable Training Function
def train(
    algo: str,
    config: AlgorithmConfig,
    stop_conditions: dict,
    save_dir: str,
    user_name: str,
    checkpoint_freq: int = 20,
    load_dir: str | None = None,
    local_mode: bool = False,
    experiment_name: str = "testing_experiment",
    mlflow_tracking_uri: str = "submission/mlflow",
):
    """
    Train an RLlib algorithm.

    Parameters
    ----------
    algo : str
        Name of the RLlib-registered algorithm to use.
    config : AlgorithmConfig
        Algorithm-specific configuration parameters.
    stop_conditions : dict
        Conditions to stop the training loop.
    save_dir : str
        Directory to save training checkpoints and results.
    user_name : str
        Experimenter's name.
    checkpoint_freq : int, optional
        Frequency of saving checkpoints, by default 20.
    load_dir : str, optional
        Directory to load pre-trained models from, by default None.
    local_mode : bool, optional
        Set to True to run Ray in local mode for debugging, by default False.
    experiment_name : str, optional
        Name of the experiment, by default "testing_experiment".
    mlflow_tracking_uri : str, optional
        Directory to save MLFlow metrics and artifacts, by default "submission/mlflow".
    """

    # Initialize Ray.
    ray.init(num_cpus=(config.num_rollout_workers + 1), local_mode=local_mode)

    # Execute training
    tune.run(
        algo,
        stop=stop_conditions,
        config=config,
        local_dir=save_dir,
        verbose=1,
        restore=get_checkpoint_dir(load_dir),
        checkpoint_freq=checkpoint_freq,
        checkpoint_at_end=True,
        progress_reporter=reporter,
        callbacks=[
            MLflowLoggerCallback(
                tracking_uri=mlflow_tracking_uri,
                experiment_name=experiment_name,
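                tags={
                    "user_name": user_name,
                    # Log the commit hash as a string; search parent directories for the repo root
                    "git_commit_hash": git.Repo(SCRIPT_PATH, search_parent_directories=True).head.commit.hexsha,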
                },
                save_artifact=True,
            )
        ],
    )

    # Shutdown Ray after training is complete
    ray.shutdown()
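
`train` passes both `checkpoint_freq` and `checkpoint_at_end=True` to `tune.run`, so a checkpoint is written every `checkpoint_freq` training iterations and once more when the trial stops. The sketch below only illustrates that schedule in plain Python; the helper `checkpoint_iterations` is hypothetical and does not call Ray.

```python
from __future__ import annotations


def checkpoint_iterations(total_iters: int, checkpoint_freq: int, at_end: bool = True) -> list[int]:
    """Return the training iterations at which a checkpoint would be written."""
    if checkpoint_freq > 0:
        iterations = list(range(checkpoint_freq, total_iters + 1, checkpoint_freq))
    else:
        iterations = []  # a frequency of 0 disables periodic checkpoints
    # The final checkpoint is only added when it is not already covered
    if at_end and total_iters > 0 and (not iterations or iterations[-1] != total_iters):
        iterations.append(total_iters)
    return iterations


print(checkpoint_iterations(50, 20))  # [20, 40, 50]
```

A smaller `checkpoint_freq` gives finer-grained recovery points at the cost of more disk space in `save_dir`.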




In [4]:
#@title Training Arguments
@dataclass
class Args:

  #@markdown agent config
  num_agents: int = 1 #@param {type: "integer"}
  algo: str = "PPO"  #@param {type: "string"}
  framework: str = "torch" #@param ['torch', 'tf2']

  #@markdown environment config
  env: str = "MultiGrid-CompetativeRedBlueDoor-v0"  #@param {type: "string"}

  #@markdown training config
  num_workers: int = 10  #@param {type: "integer"}
  num_gpus: int = 0 #@param {type: "integer"}
  # Please only keep the checkpoints that you want to submit
  save_dir: str = "../../submission/ray_results/" #@param {type: "string"}
  user_name: str = "<Your Name>" #@param {type: "string"}
  experiment_name: str = "testing_experiment" #@param {type: "string"}
  mlflow_tracking_uri: str = "../../submission/mlflow/" #@param {type: "string"}
  checkpoint_freq: int = 20 #@param {type: "integer"}
  num_timesteps: float = 5e5 #@param {type: "number"}
  seed: int = 1 #@param {type: "integer"}
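
One pitfall worth checking in a dataclass like `Args`: a trailing comma after a default value turns that default into a one-element tuple, and the `str` annotation does not prevent it. The `Demo` class below is a hypothetical illustration, not part of the homework code.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    with_comma: str = "testing_experiment",    # trailing comma: the default is a 1-tuple
    without_comma: str = "testing_experiment"  # no comma: the default is a plain string


demo = Demo()
print(type(demo.with_comma).__name__)     # tuple
print(type(demo.without_comma).__name__)  # str
```

Dataclass annotations are not enforced at runtime, so a tuple default like this only surfaces later, when the value is passed to code that expects a string.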






In [5]:
#@title Set up Training Arguments
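def to_namespace(dataclass_instance):
    """Convert a dataclass instance into a SimpleNamespace."""
    return SimpleNamespace(**asdict(dataclass_instance))

args = Args()
print(args)  # Prints the values of all attributes

# `Args` has no `to_namespace` method, so call the helper function directly
args = to_namespace(args)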
print(args)

config = algorithm_config(**vars(args))
config.seed = args.seed
stop_conditions = {'timesteps_total': args.num_timesteps}

print(config.env)
print(config.seed)

Args(num_agents=2, algo='PPO', framework='torch', env='MultiGrid-CompetativeRedBlueDoor-v0', num_workers=6, num_gpus=0, save_dir='submission/ray_results/', num_timesteps=500000.0, ep_len=1000, checkpoint_freq=20, seed=1)
namespace(num_agents=2, algo='PPO', framework='torch', env='MultiGrid-CompetativeRedBlueDoor-v0', num_workers=6, num_gpus=0, save_dir='submission/ray_results/', num_timesteps=500000.0, ep_len=1000, checkpoint_freq=20, seed=1)
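
The cell above converts the dataclass to a `SimpleNamespace` and then unpacks it with `**vars(...)`. A minimal sketch of that round trip follows; `DemoArgs` and `build_config` are hypothetical stand-ins, and it is only an assumption that `algorithm_config` accepts keyword arguments in a similar way.

```python
from dataclasses import asdict, dataclass
from types import SimpleNamespace


@dataclass
class DemoArgs:
    algo: str = "PPO"
    seed: int = 1


def build_config(algo: str, seed: int, **kwargs) -> dict:
    """Hypothetical config factory that accepts extra keyword arguments."""
    return {"algo": algo, "seed": seed, **kwargs}


# dataclass -> dict -> namespace (attribute access) -> dict again via vars()
namespace = SimpleNamespace(**asdict(DemoArgs()))
config_demo = build_config(**vars(namespace), num_gpus=0)
print(config_demo)  # {'algo': 'PPO', 'seed': 1, 'num_gpus': 0}
```

Extra keyword arguments can be passed alongside `**vars(namespace)` only when the namespace does not already define them. Otherwise Python raises a `TypeError` for the duplicate keyword.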


## Initialize and Show TensorBoard Before Training

Filter tags for key performance metrics with the following regular expression:

`episode_len_mean|ray/tune/episode_reward_mean|episode_reward_min|entropy|vf|loss|kl|cpu|ram`
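
The filter string is a regular expression with `|` separating alternatives, so a tag is shown when any alternative appears anywhere in its name. The sketch below mimics that with Python's `re.search`; the tag names are illustrative examples only, and it is an assumption that TensorBoard's matching behaves like an unanchored search.

```python
import re

TAG_FILTER = (
    r"episode_len_mean|ray/tune/episode_reward_mean|episode_reward_min"
    r"|entropy|vf|loss|kl|cpu|ram"
)

# Illustrative tag names; the tags in your own run may differ
tags = [
    "ray/tune/episode_reward_mean",
    "ray/tune/episode_len_mean",
    "ray/tune/info/learner/default_policy/learner_stats/entropy",
    "ray/tune/perf/ram_util_percent",
    "ray/tune/timers/sample_time_ms",
]

# Keep every tag in which at least one alternative matches as a substring
matched = [tag for tag in tags if re.search(TAG_FILTER, tag)]
for tag in matched:
    print(tag)
```

Short alternatives such as `kl` or `ram` match any tag containing those letters, so the filtered view can include a few unrelated plots.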

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

In [None]:
# Start TensorBoard and point `--logdir` at your `save_dir`, i.e. `/content/rl_class/multigrid/submission/ray_results/PPO`
%tensorboard --logdir /content/rl_class/multigrid/submission/ray_results/PPO

## Execute training

In [None]:
# Execute training
train(args.algo, config, stop_conditions, args.save_dir, args.user_name)

# Please remember to clear your training outputs before you submit your notebook to reduce file size and increase readability

In [None]:
# NOTE Manually shutdown Ray if needed
ray.shutdown()

## Submission for Task 3 - Monitor and Track Agent Training with TensorBoard and Save Out Visualization from Evaluation

1. Take screenshots of your TensorBoard plots that highlight your performance metrics.
2. Embed your images here in Colab.
3. Save only the best checkpoint and video in the `/submission` folder and push them to your repo.


In [None]:
from google.colab import files
from IPython.display import Image

uploaded = files.upload()

# Assuming a single image file is uploaded
for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(name=fn, length=len(uploaded[fn])))
  display(Image(fn))

In [2]:
#@title Your Tensorboard Screenshots Go Here
...

## Agent Evaluation

In [20]:

#@title Evaluation Arguments

@dataclass
class EvalArgs:

  #@markdown agent config
  num_agents: int = 1 #@param {type: "integer"}
  algo: str = "PPO"  #@param {type: "string"}
  framework: str = "torch" #@param ['torch', 'tf2']
  lstm: bool = False #@param {type: "boolean"}

  #@markdown environment config
  env: str = "MultiGrid-CompetativeRedBlueDoor-v0"  #@param {type: "string"}

  #@markdown Evaluation config
  num_episodes: int = 10 #@param {type: "integer"}
  load_dir: str = "../../submission/ray_results/PPO/PPO_MultiGrid-CompetativeRedBlueDoor-v2_de334_00000_0_2023-07-14_17-32-00" #@param {type: "string"}
  gif: str = "../../submission/notebook/result.gif" #@param {type: "string"}
  # num_timesteps: float = 5e5 #@param {type: "string"}
  # ep_len: int = 1000 #@param {type: "integer"}
  # seed: int = 1 #@param {type: "integer"}
  env_config: dict = field(default_factory=dict)

  def to_namespace(self):
    return SimpleNamespace(**asdict(self))







In [22]:
#@title Configurable Evaluation Function

def visualize(algorithm: Algorithm, num_episodes: int = 100) -> list[np.ndarray]:
    """
    Visualize trajectories from trained agents.
    """
    frames = []
    env = algorithm.env_creator(algorithm.config.env_config)

    for episode in range(num_episodes):
        print('\n', '-' * 32, '\n', 'Episode', episode, '\n', '-' * 32)

        episode_rewards = {agent_id: 0.0 for agent_id in env.get_agent_ids()}
        terminations, truncations = {'__all__': False}, {'__all__': False}
        observations, infos = env.reset()
        states = {
            agent_id: algorithm.get_policy(policy_mapping_fn(agent_id)).get_initial_state()
            for agent_id in env.get_agent_ids()
        }
        while not terminations['__all__'] and not truncations['__all__']:
            frames.append(env.get_frame())

            actions = {}
            for agent_id in env.get_agent_ids():

                # Single-agent
                actions[agent_id] = algorithm.compute_single_action(
                    observations[agent_id],
                    states[agent_id],
                    policy_id=policy_mapping_fn(agent_id)
                )


            observations, rewards, terminations, truncations, infos = env.step(actions)
            for agent_id in rewards:
                episode_rewards[agent_id] += rewards[agent_id]

        frames.append(env.get_frame())
        print('Rewards:', episode_rewards)

    env.close()
    return frames
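
`visualize` relies on the multi-agent convention that observations, actions, and rewards are dictionaries keyed by agent ID, and that the special `'__all__'` key signals the end of the episode. The sketch below reproduces that loop with a toy environment so it can run without RLlib. `ToyMultiAgentEnv`, `rollout`, and the lambda policy are hypothetical stand-ins for the real environment and trained policies.

```python
from __future__ import annotations


class ToyMultiAgentEnv:
    """Hypothetical two-agent environment that ends after a fixed number of steps."""

    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self.agent_ids = [0, 1]
        self.t = 0

    def reset(self):
        self.t = 0
        observations = {agent_id: 0 for agent_id in self.agent_ids}
        return observations, {}

    def step(self, actions: dict):
        self.t += 1
        observations = {agent_id: self.t for agent_id in self.agent_ids}
        # Toy reward: each agent is rewarded with the value of its own action
        rewards = {agent_id: float(actions[agent_id]) for agent_id in self.agent_ids}
        done = self.t >= self.max_steps
        terminations = {agent_id: done for agent_id in self.agent_ids}
        terminations["__all__"] = done
        truncations = {agent_id: False for agent_id in self.agent_ids}
        truncations["__all__"] = False
        return observations, rewards, terminations, truncations, {}


def rollout(env, policy) -> dict:
    """Run one episode and return the summed reward per agent."""
    episode_rewards = {agent_id: 0.0 for agent_id in env.agent_ids}
    terminations, truncations = {"__all__": False}, {"__all__": False}
    observations, _ = env.reset()
    while not terminations["__all__"] and not truncations["__all__"]:
        actions = {agent_id: policy(agent_id, obs) for agent_id, obs in observations.items()}
        observations, rewards, terminations, truncations, _ = env.step(actions)
        for agent_id, reward in rewards.items():
            episode_rewards[agent_id] += reward
    return episode_rewards


totals = rollout(ToyMultiAgentEnv(max_steps=3), policy=lambda agent_id, obs: agent_id + 1)
print(totals)  # {0: 3.0, 1: 6.0}
```

In the real function, the lambda policy is replaced by `algorithm.compute_single_action`, and a frame is recorded on every step.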


In [23]:
eval_args = EvalArgs()
print(eval_args)  # Prints the values of all attributes

eval_args = eval_args.to_namespace()
print(eval_args)

EvalArgs(num_agents=2, algo='PPO', framework='torch', lstm=False, env='MultiGrid-CompetativeRedBlueDoor-v2', env_config={}, num_episodes=10, load_dir='../../submission/ray_results/PPO/PPO_MultiGrid-CompetativeRedBlueDoor-v2_de334_00000_0_2023-07-14_17-32-00', gif='../../submission/notebook/result.gif')
namespace(num_agents=2, algo='PPO', framework='torch', lstm=False, env='MultiGrid-CompetativeRedBlueDoor-v2', env_config={}, num_episodes=10, load_dir='../../submission/ray_results/PPO/PPO_MultiGrid-CompetativeRedBlueDoor-v2_de334_00000_0_2023-07-14_17-32-00', gif='../../submission/notebook/result.gif')


In [24]:
eval_args.lstm


False

In [25]:
eval_args

namespace(num_agents=2,
          algo='PPO',
          framework='torch',
          lstm=False,
          env='MultiGrid-CompetativeRedBlueDoor-v2',
          env_config={},
          num_episodes=10,
          load_dir='../../submission/ray_results/PPO/PPO_MultiGrid-CompetativeRedBlueDoor-v2_de334_00000_0_2023-07-14_17-32-00',
          gif='../../submission/notebook/result.gif')

In [26]:
eval_args.env_config.update(render_mode='human')
config = algorithm_config(
    **vars(eval_args),
    num_workers=0,
    num_gpus=0,
)

In [27]:
config.env


'MultiGrid-CompetativeRedBlueDoor-v2'

In [28]:
algorithm = config.build()

checkpoint = get_checkpoint_dir(eval_args.load_dir)
if checkpoint:
    print(f"Loading checkpoint from {checkpoint}")
    algorithm.restore(checkpoint)


2023-07-15 13:42:10,135	INFO trainable.py:918 -- Restored on 127.0.0.1 from checkpoint: ../../submission/ray_results/PPO/PPO_MultiGrid-CompetativeRedBlueDoor-v2_de334_00000_0_2023-07-14_17-32-00/checkpoint_000320
2023-07-15 13:42:10,136	INFO trainable.py:927 -- Current state after restoring: {'_iteration': 320, '_timesteps_total': None, '_time_total': 72280.3040509224, '_episodes_total': 38691}


Loading checkpoint from ../../submission/ray_results/PPO/PPO_MultiGrid-CompetativeRedBlueDoor-v2_de334_00000_0_2023-07-14_17-32-00/checkpoint_000320


In [29]:
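frames = visualize(algorithm, num_episodes=eval_args.num_episodes)
if eval_args.gif:
    import os
    import imageio
    filename = eval_args.gif if eval_args.gif.endswith('.gif') else f'{eval_args.gif}.gif'
    # Create the output directory first; saving fails with FileNotFoundError if it is missing
    os.makedirs(os.path.dirname(filename) or '.', exist_ok=True)
    print(f"Saving GIF to {filename}")
    # write to file
    imageio.mimsave(filename, frames)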


 -------------------------------- 
 Episode 0 
 --------------------------------
Rewards: {0: 1.483125, 1: 0.979125}

 -------------------------------- 
 Episode 1 
 --------------------------------
Rewards: {0: 1.48875, 1: 0.98875}

 -------------------------------- 
 Episode 2 
 --------------------------------
Rewards: {0: 1.48334375, 1: 0.98634375}

 -------------------------------- 
 Episode 3 
 --------------------------------
Rewards: {0: 1.48875, 1: 0.98575}

 -------------------------------- 
 Episode 4 
 --------------------------------
Rewards: {0: 1.49015625, 1: 0.98615625}

 -------------------------------- 
 Episode 5 
 --------------------------------
Rewards: {0: 1.229828125, 1: 1.714828125}

 -------------------------------- 
 Episode 6 
 --------------------------------
Rewards: {0: 1.4845312499999999, 1: 0.98353125}

 -------------------------------- 
 Episode 7 
 --------------------------------
Rewards: {0: 0.5, 1: 0.8259999999999998}

 ---------------------------

FileNotFoundError: The directory '/Users/zla0368/Documents/RL/RL_Class/code/submission/notebook' does not exist

In [None]:
filename

In [None]:
from IPython.display import Image, display

# Load and display the GIF
display(Image(filename=filename))