# Pacman RL Training on Kaggle
## Enhanced Reward Shaping - All Algorithms

This notebook trains 4 Deep RL algorithms (DQN, PPO, A2C, Rainbow) on Pacman with custom reward shaping.

**Training Configuration:**
- **Environment:** ALE/Pacman-v5 with OCAtari
- **Algorithms:** DQN, PPO, A2C, Rainbow (run here as QR-DQN)
- **Total Timesteps:** 2,000,000 per algorithm
- **Reward Shaping:** 10-component enhanced wrapper
- **Device:** GPU (CUDA) for faster training

**Estimated Time:** 10-14 hours for all 4 algorithms

**Repository:** https://github.com/lequangdung2005/IntroAI-2025.1
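
The four algorithm labels map naturally onto classes from Stable-Baselines3 and SB3-Contrib. The sketch below is one way such a mapping could look, not a copy of the repository's training scripts; `ALGORITHMS` and `load_algorithm` are hypothetical names. The classes themselves (`DQN`, `PPO`, `A2C` in `stable_baselines3`, `QRDQN` in `sb3_contrib`) are real.

```python
import importlib

# Hypothetical registry: notebook label -> (module, class name).
# "rainbow" points at QRDQN because this notebook describes its Rainbow run as QR-DQN.
ALGORITHMS = {
    "dqn": ("stable_baselines3", "DQN"),
    "ppo": ("stable_baselines3", "PPO"),
    "a2c": ("stable_baselines3", "A2C"),
    "rainbow": ("sb3_contrib", "QRDQN"),
}

TOTAL_TIMESTEPS = 2_000_000  # per algorithm, as listed in the configuration


def load_algorithm(name):
    """Import and return the algorithm class for a notebook label."""
    try:
        module_name, class_name = ALGORITHMS[name.lower()]
    except KeyError:
        raise ValueError(f"Unknown algorithm: {name!r}") from None
    module = importlib.import_module(module_name)  # imported only when requested
    return getattr(module, class_name)
```

Once the Python packages are installed, `load_algorithm("ppo")` returns the `PPO` class, which can then be instantiated with a policy and an environment. Importing lazily keeps the registry usable before the packages are present.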

## 📦 Step 1: Install System Dependencies

First, install git, which is needed to clone the repository in the next step.

In [None]:
!apt-get update
!apt-get install -y git

## 🔽 Step 2: Clone GitHub Repository

Clone the repository containing the Pacman RL training code.

In [None]:
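import os

# Clone the repository (skipped if it already exists from an earlier run)
!test -d IntroAI-2025.1 || git clone https://github.com/lequangdung2005/IntroAI-2025.1.git

# Report success only if the checkout is actually present
ok = os.path.isdir("IntroAI-2025.1")
print("✅ Repository cloned successfully!" if ok else "❌ Clone failed; check the output above.")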

## 📂 Step 3: Navigate to Repository Directory

Change to the Pacman project directory.

In [None]:
import os

# Change to the Pacman directory
os.chdir('/kaggle/working/IntroAI-2025.1/Pacman')

# Verify current directory
print(f"Current directory: {os.getcwd()}")
print("Contents:")
!ls -la

## ✅ Step 4: Verify Repository Contents

Check that all necessary files are present in the `enhanced_algorithm` directory.

In [None]:
# Check enhanced_algorithm directory
print("Enhanced algorithm directory contents:")
!ls -la enhanced_algorithm/

print("\n" + "="*60)
print("Training scripts available:")
!ls -1 enhanced_algorithm/train_*.py

print("\n" + "="*60)
print("Batch training script:")
!ls -la enhanced_algorithm/train_all_enhanced.sh
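
The `ls` calls above only print listings; they do not stop the notebook if something is missing. A minimal sketch of an explicit check follows. The helper name `missing_paths` is hypothetical, and only the batch script path is taken from this notebook, so extend `REQUIRED` to match the repository.

```python
from pathlib import Path


def missing_paths(root, required):
    """Return the entries of `required` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


# Only this path appears in the notebook; add others as needed.
REQUIRED = ["enhanced_algorithm/train_all_enhanced.sh"]

missing = missing_paths(".", REQUIRED)
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files are present.")
```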

## 📥 Step 5: Install Python Dependencies

Install all required Python packages for RL training.

In [None]:
# Install required packages
!pip install -q gymnasium ale-py stable-baselines3 sb3-contrib ocatari opencv-python

# Verify installations
print("\n✅ Checking installations...")
import gymnasium
import ale_py
import stable_baselines3
import sb3_contrib
import cv2

print(f"✓ Gymnasium: {gymnasium.__version__}")
print(f"✓ ALE-py: {ale_py.__version__}")
print(f"✓ Stable-Baselines3: {stable_baselines3.__version__}")
print(f"✓ SB3-Contrib: {sb3_contrib.__version__}")
print(f"✓ OpenCV: {cv2.__version__}")
print("\n✅ All packages installed successfully!")
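
The verification cell above imports five of the six installed packages; OCAtari is installed but never checked. The sketch below reads installed versions from package metadata instead of importing each module, so it covers all six. The distribution names are the ones passed to `pip` above; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names exactly as passed to pip in the install cell.
PACKAGES = [
    "gymnasium",
    "ale-py",
    "stable-baselines3",
    "sb3-contrib",
    "ocatari",
    "opencv-python",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    mark = "✓" if version else "✗"
    print(f"{mark} {name}: {version or 'not installed'}")
```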

## 🎮 Step 6: Check GPU Availability

Verify that a GPU is available for faster training.

In [None]:
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)
    print("GPU device:", torch.cuda.get_device_name(0))
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print("\n✅ GPU is available! Training will be fast.")
else:
    print("\n⚠️  GPU not available. Training will use CPU (slower).")
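
The result of this check can also be turned into a device string for later use. The sketch below is a small helper under that assumption; `pick_device` is a hypothetical name, and whether the repository's training scripts accept a device option is not shown in this notebook.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: fall back so the sketch still runs
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training device: {device}")
```

Stable-Baselines3 algorithm constructors accept a `device` argument (default `"auto"`), so a value like this is only needed when you want to force a specific device.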

## 🚀 Step 7: Run Training Script

Execute the batch training script to train all 4 algorithms sequentially.

**This will train:**
1. **DQN** - Deep Q-Network (~2-3 hours)
2. **PPO** - Proximal Policy Optimization (~3-4 hours)
3. **A2C** - Advantage Actor-Critic (~2-3 hours)
4. **Rainbow** - run here as QR-DQN, Quantile Regression DQN (~3-4 hours)

**Total estimated time: 10-14 hours**

⚠️ **Note:** Make sure the GPU accelerator is enabled in the Kaggle notebook settings before starting; otherwise training falls back to the CPU and takes much longer.
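
The total follows directly from the per-algorithm estimates. The sketch below adds them up and compares the pessimistic total against a session limit. The `limit_hours` value is a placeholder, not a statement about Kaggle; substitute the limit that applies to your session.

```python
# Per-algorithm estimates (low, high) in hours, copied from the list above.
ESTIMATES = {"DQN": (2, 3), "PPO": (3, 4), "A2C": (2, 3), "Rainbow": (3, 4)}


def total_range(estimates):
    """Sum the low and high bounds across all algorithms."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


def fits_in_session(estimates, limit_hours):
    """True only if even the pessimistic total stays within the limit."""
    return total_range(estimates)[1] <= limit_hours


low, high = total_range(ESTIMATES)
print(f"Estimated total: {low}-{high} hours")

# limit_hours is a placeholder value chosen for illustration.
print("Fits in one session:", fits_in_session(ESTIMATES, limit_hours=12))
```

If the pessimistic total does not fit, splitting the algorithms across several sessions is the obvious fallback.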

In [None]:
# Make the script executable
!chmod +x enhanced_algorithm/train_all_enhanced.sh

# Run the batch training script
print("🚀 Starting training for all 4 algorithms...")
print("="*60)

!bash enhanced_algorithm/train_all_enhanced.sh
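
A `!` shell command does not raise an error when the script exits with a failure, so later cells run regardless. As a hedged alternative, the sketch below streams a command's output and returns its exit code so the notebook can react to it. `run_and_stream` is a hypothetical helper, and the command shown is a stand-in so the example runs anywhere.

```python
import subprocess
import sys


def run_and_stream(cmd):
    """Run `cmd`, echo its output line by line, and return the exit code."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
    )
    for line in proc.stdout:
        print(line, end="")
    return proc.wait()


# Stand-in command; for the real run, pass
# ["bash", "enhanced_algorithm/train_all_enhanced.sh"] instead.
exit_code = run_and_stream([sys.executable, "-c", "print('training would start here')"])
print("Exit code:", exit_code)
```

A non-zero exit code is a reasonable signal to stop before the result and export steps.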

## 📊 Step 8: Display Training Results

Check the trained models and their performance.

In [None]:
# List all trained models
print("Checking trained models...\n")
print("="*60)

# Check DQN models
print("\n📁 DQN Models:")
!ls -lh models/enhanced_dqn/*.zip 2>/dev/null || echo "No DQN models found"

# Check PPO models
print("\n📁 PPO Models:")
!ls -lh models/enhanced_ppo/*.zip 2>/dev/null || echo "No PPO models found"

# Check A2C models
print("\n📁 A2C Models:")
!ls -lh models/enhanced_a2c/*.zip 2>/dev/null || echo "No A2C models found"

# Check Rainbow models
print("\n📁 Rainbow Models:")
!ls -lh models/enhanced_rainbow/*.zip 2>/dev/null || echo "No Rainbow models found"

print("\n" + "="*60)
print("Trained models, if any, are stored in the 'models/' directory")
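
The same check can be done in Python, which makes the result available as data rather than printed text. This is a sketch assuming the `models/enhanced_*` layout used in the cell above; `list_models` is a hypothetical helper name.

```python
from pathlib import Path

ALGO_DIRS = ["enhanced_dqn", "enhanced_ppo", "enhanced_a2c", "enhanced_rainbow"]


def list_models(models_root="models"):
    """Return {algorithm dir: [(file name, size in MB), ...]} for saved .zip files."""
    found = {}
    for algo in ALGO_DIRS:
        directory = Path(models_root) / algo
        zips = sorted(directory.glob("*.zip")) if directory.is_dir() else []
        found[algo] = [(p.name, p.stat().st_size / 1e6) for p in zips]
    return found


for algo, files in list_models().items():
    if not files:
        print(f"{algo}: no models found")
    for name, size_mb in files:
        print(f"{algo}: {name} ({size_mb:.1f} MB)")
```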

## 📈 Step 9: View Training Logs

Check the training logs to see performance metrics.

In [None]:
# Check log directories
print("Training logs directory structure:")
!ls -la logs/

print("\n" + "="*60)
print("Available log directories:")
!ls -1 logs/
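
To confirm that the log directories actually contain TensorBoard data, you can look for event files, which TensorBoard names with the prefix `events.out.tfevents.`. The sketch below groups them by run directory; `find_event_files` is a hypothetical helper, and the layout under `logs/` is whatever the training scripts produced.

```python
from pathlib import Path


def find_event_files(log_root="logs"):
    """Group TensorBoard event files by the directory that contains them."""
    root = Path(log_root)
    if not root.is_dir():
        return {}
    grouped = {}
    for path in sorted(root.rglob("events.out.tfevents.*")):
        grouped.setdefault(str(path.parent), []).append(path.name)
    return grouped


for run_dir, files in find_event_files().items():
    print(f"{run_dir}: {len(files)} event file(s)")
```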

## 💾 Step 10: Save Models as Output

Copy trained models to Kaggle output for download.

In [None]:
import os
import shutil

# Create output directory
output_dir = "/kaggle/working/trained_models"
os.makedirs(output_dir, exist_ok=True)

# Copy all model directories
algorithms = ['enhanced_dqn', 'enhanced_ppo', 'enhanced_a2c', 'enhanced_rainbow']

for algo in algorithms:
    src = f"models/{algo}"
    dst = f"{output_dir}/{algo}"
    if os.path.exists(src):
        shutil.copytree(src, dst, dirs_exist_ok=True)
        print(f"✅ Copied {algo} models")
    else:
        print(f"⚠️  {algo} models not found")

print("\n" + "="*60)
print("All models saved to:", output_dir)
print("You can download them from the Kaggle output section.")
print("="*60)
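
Downloading many files one by one is tedious, so it can help to bundle the copied models into a single archive. The sketch below uses `shutil.make_archive` from the standard library; `archive_models` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def archive_models(output_dir, archive_base):
    """Zip `output_dir` into `<archive_base>.zip` and return the archive path."""
    if not Path(output_dir).is_dir():
        raise FileNotFoundError(f"Nothing to archive: {output_dir}")
    return shutil.make_archive(archive_base, "zip", root_dir=output_dir)
```

For example, `archive_models("/kaggle/working/trained_models", "/kaggle/working/trained_models")` would write `/kaggle/working/trained_models.zip` next to the directory. Keep the archive outside the directory being zipped so it does not try to include itself.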

## 🎉 Training Summary

### What Was Trained:

| Algorithm | Type | Timesteps | Expected Score |
|-----------|------|-----------|----------------|
| **DQN** | Value-based | 2,000,000 | 1200-1800 |
| **PPO** | Policy-based | 2,000,000 | 1400-2200 |
| **A2C** | Policy-based | 2,000,000 | 1100-1600 |
| **Rainbow** | Value-based | 2,000,000 | 1500-2500 |

### Enhanced Reward Components:
1. Score Bonus (+0.1×)
2. Power Pellet Seeking (+5)
3. Ghost Hunting (+50)
4. Ghost Avoidance (+2)
5. Dot Collection (+1)
6. Movement Bonus (+0.5)
7. Corner Penalty (-1)
8. Survival Bonus (+0.1)
9. Death Penalty (-100)
10. Level Completion (+200)
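
To make the arithmetic concrete, the sketch below combines these ten weights into a single shaped reward for one step. It is an illustration under stated assumptions, not the repository's wrapper: the event names are hypothetical, the score bonus is assumed to be 0.1 times the raw game reward, and each of the other components is assumed to fire at most once per step.

```python
# Weights copied from the list above; when each one fires is an assumption.
SCORE_MULTIPLIER = 0.1  # applied to the raw game reward

EVENT_WEIGHTS = {
    "power_pellet": 5.0,
    "ghost_hunting": 50.0,
    "ghost_avoidance": 2.0,
    "dot": 1.0,
    "movement": 0.5,
    "corner": -1.0,
    "survival": 0.1,
    "death": -100.0,
    "level_complete": 200.0,
}


def shaped_reward(game_reward, events):
    """Combine the raw game reward with the bonuses for this step's events.

    `events` is an iterable of hypothetical event names (keys of EVENT_WEIGHTS).
    """
    total = SCORE_MULTIPLIER * game_reward
    for event in events:
        if event not in EVENT_WEIGHTS:
            raise ValueError(f"Unknown event: {event!r}")
        total += EVENT_WEIGHTS[event]
    return total


# A step where Pacman moves, eats a dot worth 10 points, and survives.
print(shaped_reward(10, ["dot", "movement", "survival"]))

# A step where Pacman dies.
print(shaped_reward(0, ["death"]))
```

The death penalty and level completion bonus are one to two orders of magnitude larger than the per-step terms, so they dominate the return for any episode in which they occur.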

### Files Generated:
- **Final models:** `pacman_{algorithm}_enhanced_final.zip`
- **Checkpoints:** Every 50,000 steps
- **Best models:** `best_model.zip` (best evaluation score)
- **Logs:** TensorBoard logs in `logs/enhanced_{algorithm}/`
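
For disk planning, it helps to know how many checkpoints to expect. The sketch below assumes a checkpoint is written at every exact multiple of 50,000 timesteps up to and including the final step; the real count can differ, for example when several environments run in parallel and step counts advance in larger increments.

```python
TOTAL_TIMESTEPS = 2_000_000
CHECKPOINT_EVERY = 50_000
NUM_ALGORITHMS = 4


def checkpoint_steps(total=TOTAL_TIMESTEPS, every=CHECKPOINT_EVERY):
    """Timesteps at which a periodic checkpoint would be written."""
    return list(range(every, total + 1, every))


steps = checkpoint_steps()
print(f"{len(steps)} checkpoints per algorithm")
print(f"{NUM_ALGORITHMS * len(steps)} checkpoints across all algorithms")
print("First and last step:", steps[0], steps[-1])
```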

### Next Steps:
1. Download trained models from Kaggle output
2. Use `play.py` to evaluate models locally
3. Compare with baseline models
4. Visualize training progress with TensorBoard

---

**✅ If every step above finished without errors, all 4 algorithms are now trained with enhanced reward shaping.**