# HW3-1: LunarLander-v3 Training with PPO

**CSCI6353 Homework 3 - Part 1**

This notebook trains a PPO agent to land a lunar module in the LunarLander-v3 environment.

**⚠️ Important:** Make sure the GitHub repository is set to **Public** before running this notebook.

---

## Instructions

1. **Enable GPU**: Go to Runtime → Change runtime type → Select "T4 GPU"
2. **Run all cells** in order (Runtime → Run all)
3. **Training takes** ~15-30 minutes with GPU
4. **Download** model.pth and train_plot.png at the end

---

In [None]:
# Install system dependencies for Box2D
!apt-get update -qq
!apt-get install -y swig build-essential python3-dev

# Clone the repository (use HTTPS without credentials for public repos)
!git clone https://github.com/jaredlcs/HW3-Projects.git 2>&1 | grep -v "warning: You appear to have cloned an empty repository" || true
%cd HW3-Projects/HW3_1

# Install Python dependencies
!pip install -q "gymnasium[box2d]" torch matplotlib

# Verify setup
import torch
import os
print("\n" + "="*50)
print("✓ Setup Complete")
print(f"✓ main.py exists: {os.path.exists('main.py')}")
print(f"✓ GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✓ GPU device: {torch.cuda.get_device_name(0)}")
print("="*50)

## Cell 2: Start Training

This will train the PPO agent. Training typically takes:
- **With GPU (T4):** 15-30 minutes
- **Without GPU:** 30-60 minutes
- **Convergence:** Usually 300-600 episodes

Progress will be printed every 10 episodes.

**Note:** If the session times out, run Cell 3 (Resume Training) instead of re-running this cell; it continues from the last checkpoint.

In [None]:
# Train from scratch
!python main.py

## Cell 3: Resume Training (if session timed out)

**Only run this if your training was interrupted.**

This will resume from the last checkpoint.

In [None]:
# Resume from checkpoint (only if interrupted)
!python main.py --resume

## Cell 4: View Training Results

Display the training curve showing:
- Episode rewards (light blue)
- 100-episode moving average (dark blue)
- Target reward threshold (red dashed line)
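
The 100-episode moving average listed above can be sketched as follows. This is a minimal illustration, not the repository's code: `moving_average` is a hypothetical helper name, and averaging over fewer than 100 episodes at the start of training is an assumption about how the early part of the curve is handled.

```python
from collections import deque


def moving_average(rewards, window=100):
    """Return the trailing mean of `rewards` over the last `window` episodes.

    Early entries average over however many episodes exist so far
    (an assumption; main.py may handle the warm-up differently).
    """
    recent = deque(maxlen=window)  # drops the oldest reward automatically
    running_sum = 0.0
    averages = []
    for reward in rewards:
        if len(recent) == window:
            running_sum -= recent[0]  # value about to be evicted
        recent.append(reward)
        running_sum += reward
        averages.append(running_sum / len(recent))
    return averages


# Toy example with a window of 3 instead of 100
print(moving_average([10, 20, 30, 40], window=3))
```

Because each point averages many episodes, the smoothed curve lags the raw rewards, so it is likely to cross the target line somewhat later than individual episodes do.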

In [None]:
# Display the saved training curve
import matplotlib.pyplot as plt

plt.figure(figsize=(14, 6))
img = plt.imread('train_plot.png')
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()

print("\n✓ Training complete! Check the plot above.")

## Cell 5: Test the Trained Model

Evaluate the trained model over 20 test episodes.

**Note:** Rendering is disabled on Colab (no display). You can test with rendering locally.
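
When reading the test output, it can help to summarize the per-episode returns yourself. The sketch below is an illustration under stated assumptions, not what `main.py --test` prints: `summarize_returns` is a hypothetical helper, the sample returns are placeholders rather than real results, and the 200-point threshold is the score that Gymnasium's LunarLander documentation describes as a solved episode.

```python
from statistics import mean, pstdev


def summarize_returns(returns, threshold=200.0):
    """Summarize evaluation returns: mean, spread, and fraction at or above threshold."""
    if not returns:
        raise ValueError("returns must contain at least one episode")
    solved = sum(1 for r in returns if r >= threshold)
    return {
        "episodes": len(returns),
        "mean": mean(returns),
        "std": pstdev(returns),  # population std over the evaluated episodes
        "success_rate": solved / len(returns),
    }


# Placeholder values for illustration only, not real results
example_returns = [250.0, 180.0, 220.0, 210.0]
summary = summarize_returns(example_returns)
print(f"Mean return: {summary['mean']:.1f} +/- {summary['std']:.1f}")
print(f"Episodes at or above 200: {summary['success_rate']:.0%}")
```

Reporting the success rate alongside the mean guards against a few very high-scoring episodes hiding occasional crashes.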

In [None]:
# Test the trained model
!python main.py --test --no-render --test-episodes 20

## Cell 6: Download Results

Download the trained model and training plot to your computer.

You can also download checkpoints for resuming later.
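
The download calls will error out if a file is missing, for example when training was interrupted before `model.pth` was written. A small pre-check along these lines can make that easier to diagnose. The helper name `missing_artifacts` is hypothetical, and the file list simply mirrors the names used in this notebook.

```python
import os


def missing_artifacts(paths, base_dir="."):
    """Return the entries of `paths` that do not exist under `base_dir`."""
    return [p for p in paths if not os.path.exists(os.path.join(base_dir, p))]


# Names taken from this notebook; adjust if main.py writes elsewhere
expected = ["model.pth", "train_plot.png", "checkpoints"]
missing = missing_artifacts(expected)
if missing:
    print("Missing, re-run or resume training first:", ", ".join(missing))
else:
    print("All expected files are present.")
```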

In [None]:
from google.colab import files

# Download model and plot
print("Downloading model.pth...")
files.download('model.pth')

print("Downloading train_plot.png...")
files.download('train_plot.png')

# Optional: Download checkpoints for resuming later
print("\nCreating checkpoint archive...")
!zip -r checkpoints.zip checkpoints/
print("Downloading checkpoints.zip...")
files.download('checkpoints.zip')

print("\n" + "="*50)
print("✓ All downloads complete!")
print("="*50)

## Next Steps

1. ✅ Download `model.pth` and `train_plot.png` (done above)
2. 📹 Test locally with rendering: `python main.py --test`
3. 🎥 Record a 1-minute demo video
4. 📤 Upload video to YouTube (public/unlisted)
5. 📝 Add YouTube link to README.md

---

### Additional Resources

- **Repository:** https://github.com/jaredlcs/HW3-Projects
- **Colab Guide:** [COLAB_GUIDE.md](https://github.com/jaredlcs/HW3-Projects/blob/main/HW3_1/COLAB_GUIDE.md)
- **Full Documentation:** [README.md](https://github.com/jaredlcs/HW3-Projects/blob/main/HW3_1/README.md)

Good luck! 🚀