# Project Situated AI Assignment 2: Robotics and Reinforcement Learning

This notebook provides a quickstart for training RL agents on Gymnasium Robotics environments using RL Baselines3 Zoo.

## Part 1: Setup

The setup cells below are commented out. Uncomment the ones you need and run them once to install dependencies.

### Colab Setup

In [2]:
# # Install system dependencies (Colab only - skip if running locally)
# !apt-get update && apt-get install -q -y swig cmake ffmpeg freeglut3-dev xvfb

In [4]:
# # Setup virtual display for video recording (Colab only)
# import os
# os.system("Xvfb :1 -screen 0 1024x768x24 &")
# os.environ['DISPLAY'] = ':1'

In [5]:
# # Mount your drive to the session
# from google.colab import drive
# drive.mount('/content/drive')

### General Setup

In [6]:
# # Install Python packages
# !pip install -q rl-zoo3
# !pip install -q -e git+https://github.com/Farama-Foundation/Gymnasium-Robotics.git#egg=gymnasium-robotics

In [9]:
# # Create wrapper for record_video (rl_zoo3.record_video doesn't support --gym-packages)
# with open('record_video.py', 'w') as f:
#     f.write(
#         '#!/usr/bin/env python\n'
#         'import gymnasium_robotics\n'
#         'import runpy\n'
#         'runpy.run_module("rl_zoo3.record_video", run_name="__main__")\n'
#     )

## Part 2: Configure Hyperparameters

RL Zoo reads hyperparameters from a YAML file keyed by environment ID. Modify the values below to experiment with different settings.

In [None]:
import gymnasium as gym
import gymnasium_robotics

gym.register_envs(gymnasium_robotics)

env = gym.make("FetchReachDense-v4")  # starting coords: (x, y, z) = (1.3419, 0.7491, 0.555) m

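The `Dense` suffix in the environment ID refers to the reward signal: the agent is rewarded with the negative Euclidean distance between the gripper and the target, whereas the sparse variant only signals whether the target has been reached. The sketch below illustrates that difference using only the standard library. The function names, the 0.05 m threshold, and the target position are assumptions made for illustration, so check the environment documentation before relying on them.

```python
import math

# Assumed success threshold in metres; treat this value as an assumption.
DISTANCE_THRESHOLD = 0.05


def dense_reward(achieved_goal, desired_goal):
    """Negative Euclidean distance to the target (closer is better)."""
    return -math.dist(achieved_goal, desired_goal)


def sparse_reward(achieved_goal, desired_goal, threshold=DISTANCE_THRESHOLD):
    """0.0 once the gripper is within the threshold, otherwise -1.0."""
    return 0.0 if math.dist(achieved_goal, desired_goal) <= threshold else -1.0


start = (1.3419, 0.7491, 0.555)   # starting coordinates quoted above
target = (1.3419, 0.7491, 0.655)  # hypothetical target 10 cm above the start

print(dense_reward(start, target))
print(sparse_reward(start, target))
```

A dense signal gives the agent a gradient to follow from the first step, which is why it is usually the easier variant to start with.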

In [11]:
import yaml

hyperparams = {
    'FetchReachDense-v4': {
        'n_timesteps': 1000,
        'policy': 'MultiInputPolicy',
        'noise_type': 'ornstein-uhlenbeck',
        'noise_std': 0.5,
        'gradient_steps': 1,
        'train_freq': 1,
        'learning_rate': 1e-3,
        'batch_size': 256,
        'policy_kwargs': "dict(net_arch=[32, 32])",
    }
}

with open('hyperparams.yaml', 'w') as f:
    yaml.dump(hyperparams, f, sort_keys=False)

print("Hyperparameters saved to hyperparams.yaml")

Hyperparameters saved to hyperparams.yaml
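
The `noise_type` and `noise_std` entries control the exploration noise added to the deterministic DDPG action. Ornstein-Uhlenbeck noise is temporally correlated: each sample drifts back toward a mean instead of being drawn independently. The following is a standalone sketch of that process, not the library's implementation. The function name `ou_noise` and the `theta` and `dt` values are assumptions chosen for illustration.

```python
import math
import random


def ou_noise(n_steps, sigma, theta=0.15, dt=1e-2, mean=0.0, seed=None):
    """Simulate a 1-D Ornstein-Uhlenbeck process with Euler-Maruyama steps."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        # Pull back toward the mean, then add scaled Gaussian noise.
        x += theta * (mean - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


noise = ou_noise(n_steps=50, sigma=0.5, seed=0)  # sigma mirrors noise_std above
print(f"first: {noise[0]:+.3f}, last: {noise[-1]:+.3f}")
```

Raising `sigma` widens exploration, while setting it to zero removes the noise entirely, which is a quick way to check how much of the agent's behaviour comes from exploration.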


## Part 3: Train the Agent

Train a DDPG agent on the FetchReachDense-v4 environment. Training logs are saved to `logs/`.

Run the following command in a terminal from `~/project_path/src/robotics_rl` to train the agent (prefix it with `!` only when running it from a notebook cell):

```bash
python -m rl_zoo3.train --algo ddpg --env FetchReachDense-v4 \
    --gym-packages gymnasium_robotics -c hyperparams.yaml
```

Run the following command to evaluate the trained agent:

```bash
python -m rl_zoo3.enjoy --algo ddpg --env FetchReachDense-v4 \
    --gym-packages gymnasium_robotics --no-render -n 500 -f logs/
```

## Part 4: Evaluate the Agent

Run the trained agent for 500 timesteps without rendering and print its performance metrics.

In [None]:
!python -m rl_zoo3.enjoy --algo ddpg --env FetchReachDense-v4 \
    --gym-packages gymnasium_robotics --no-render -n 500 -f logs/

## Part 5: Record and View Video

Record a video of the trained policy to visually evaluate performance.

In [None]:
# record_video.py is a wrapper that pre-loads gymnasium_robotics
# (rl_zoo3.record_video doesn't support --gym-packages)
!python record_video.py --algo ddpg --env FetchReachDense-v4 -f logs/ -n 1000

In [None]:
import base64
from pathlib import Path
from IPython import display as ipythondisplay

def show_videos(video_path, prefix=""):
    """Display MP4 videos from a folder in the notebook."""
    html = []
    for mp4 in Path(video_path).glob(f"{prefix}*.mp4"):
        video_b64 = base64.b64encode(mp4.read_bytes()).decode('ascii')
        html.append(f'''<video alt="{mp4}" autoplay loop controls style="height: 400px;">
            <source src="data:video/mp4;base64,{video_b64}" type="video/mp4" />
        </video>''')
    ipythondisplay.display(ipythondisplay.HTML(data="<br>".join(html)))

In [None]:
# Display the recorded video
# Update the path if your experiment ID differs (check logs/ddpg/ folder)
show_videos('logs/ddpg/FetchReachDense-v4_1/videos/')
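
Each training run writes to a new numbered folder, so the hard-coded `_1` suffix goes stale after the first run. The sketch below picks the folder with the highest numeric suffix instead. It assumes the `logs/<algo>/<env_id>_<N>/` layout implied by the path above, and `latest_run_dir` is a hypothetical helper, not part of RL Zoo.

```python
import re
from pathlib import Path


def latest_run_dir(log_root, algo, env_id):
    """Return the run folder with the highest numeric suffix, or None."""
    algo_dir = Path(log_root) / algo
    if not algo_dir.is_dir():
        return None
    pattern = re.compile(rf"^{re.escape(env_id)}_(\d+)$")
    runs = []
    for child in algo_dir.iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            # Compare suffixes as integers so that _10 sorts after _2.
            runs.append((int(match.group(1)), child))
    return max(runs, key=lambda run: run[0])[1] if runs else None


run_dir = latest_run_dir("logs", "ddpg", "FetchReachDense-v4")
print(run_dir / "videos" if run_dir else "No runs found under logs/ddpg/")
```

The resulting `videos` path can be passed to `show_videos` in place of the literal string.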

### Useful Commands

```bash
# Train with a different algorithm
python -m rl_zoo3.train --algo sac --env FetchReachDense-v4 --gym-packages gymnasium_robotics -c hyperparams.yaml

# Train with a specific seed (for reproducibility)
python -m rl_zoo3.train --algo ddpg --env FetchReachDense-v4 --gym-packages gymnasium_robotics -c hyperparams.yaml --seed 42

# Load best model instead of final model
python -m rl_zoo3.enjoy --algo ddpg --env FetchReachDense-v4 --gym-packages gymnasium_robotics -f logs/ --load-best
```
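
A single seed gives only one sample of a noisy training process, so comparing settings usually means repeating the run over several seeds. The sketch below builds and prints the seeded training command from above for a few seeds without launching anything. `train_command` is a hypothetical helper and the seed values are arbitrary.

```python
import shlex
import sys


def train_command(seed, algo="ddpg", env_id="FetchReachDense-v4",
                  config="hyperparams.yaml"):
    """Build the training command for one seed as an argument list."""
    return [
        sys.executable, "-m", "rl_zoo3.train",
        "--algo", algo,
        "--env", env_id,
        "--gym-packages", "gymnasium_robotics",
        "-c", config,
        "--seed", str(seed),
    ]


for seed in (0, 1, 2):  # hypothetical seed choices
    cmd = train_command(seed)
    print(shlex.join(cmd))
    # To launch the run instead of printing it:
    # subprocess.run(cmd, check=True)  (requires `import subprocess`)
```

Passing an argument list rather than a single string avoids shell quoting problems if a path or environment ID ever contains spaces.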