# 🧠 Q-Learning on FrozenLake — CPU vs GPU Benchmark (PyTorch + Gym)

This notebook benchmarks the performance of Q-learning on the FrozenLake environment using:

- ✅ **Pure Python (NumPy) on CPU**
- ✅ **PyTorch on GPU**

---

## 📌 Task Objective

> Optimize the [FrozenLake Q-Learning code](https://github.com/ronanmmurphy/Q-Learning-Algorithm) for GPU using PyTorch and benchmark it against the CPU (NumPy) version.

---

## 🚀 How to Run in Google Colab

1. Open the notebook in [Google Colab](https://colab.research.google.com/)
2. Set runtime type to **GPU**:  
   `Runtime > Change runtime type > Hardware accelerator > GPU`
3. Run all cells (`Runtime > Run all`)
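The GPU cell allocates its Q-table with `device='cuda'`, which fails on a CPU-only runtime. A quick check that step 2 took effect can save a confusing error. The sketch below uses only standard PyTorch calls; `describe_runtime` is a hypothetical helper name.

```python
import torch


def describe_runtime() -> str:
    """Report whether PyTorch can see a CUDA device (the GPU training cell assumes it can)."""
    if torch.cuda.is_available():
        return f"GPU runtime: {torch.cuda.get_device_name(0)}"
    return "No CUDA device found: switch the Colab runtime type to GPU before training."


print(describe_runtime())
```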

---

## ⚙️ Fixing Compatibility Issues

To avoid a NumPy version-compatibility error, downgrade NumPy at the top of the notebook. Then force a runtime restart so the older version is loaded:

```python
!pip install numpy==1.23.5 --quiet
import os
os.kill(os.getpid(), 9)  # Force Colab restart so the downgraded NumPy is loaded
```

Clone the reference repository:

```python
!git clone https://github.com/ronanmmurphy/Q-Learning-Algorithm.git
%cd Q-Learning-Algorithm
```

```text
Cloning into 'Q-Learning-Algorithm'...
remote: Enumerating objects: 16, done.
remote: Counting objects: 100% (16/16), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 16 (delta 3), reused 6 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (16/16), 32.26 KiB | 3.23 MiB/s, done.
Resolving deltas: 100% (3/3), done.
/content/Q-Learning-Algorithm
```

Downgrade NumPy and restart the runtime:

```python
!pip install numpy==1.23.5 --quiet
import os
os.kill(os.getpid(), 9)  # Force Colab restart
```

GPU version (PyTorch):

```python
import torch
import gym
import time

# Create environment
env = gym.make('FrozenLake-v1', is_slippery=False)

# Get action and state sizes
action_size = env.action_space.n
state_size = env.observation_space.n

# Hyperparameters
alpha = 0.8      # learning rate
gamma = 0.95     # discount factor
epsilon = 0.1    # exploration rate for epsilon-greedy
episodes = 10000

# Initialize Q-table using PyTorch on GPU (requires a CUDA runtime)
Q = torch.zeros((state_size, action_size), dtype=torch.float32, device='cuda')

def train_gpu():
    for _ in range(episodes):
        reset_result = env.reset()
        # Gym >= 0.26 returns (obs, info); older versions return obs only
        state = reset_result[0] if isinstance(reset_result, tuple) else reset_result
        done = False

        while not done:
            # Host-to-device copy of the current state index on every step
            state_tensor = torch.tensor(state, device='cuda')
            # Epsilon-greedy action; each .item() forces a device-to-host sync
            if torch.rand(1, device='cuda').item() < epsilon:
                action = torch.randint(0, action_size, (1,), device='cuda').item()
            else:
                action = torch.argmax(Q[state_tensor]).item()

            # The environment itself always steps on the CPU
            step_result = env.step(action)
            if len(step_result) == 5:  # New API
                new_state, reward, terminated, truncated, _ = step_result
                done = terminated or truncated
            else:  # Old API
                new_state, reward, done, _ = step_result

            # Q-learning update, executed as several tiny CUDA kernels
            new_state_tensor = torch.tensor(new_state, device='cuda')
            Q[state_tensor, action] += alpha * (
                reward + gamma * torch.max(Q[new_state_tensor]) - Q[state_tensor, action]
            )
            state = new_state

start_gpu = time.time()
train_gpu()
end_gpu = time.time()

print(f"✅ GPU Training Time: {end_gpu - start_gpu:.4f} seconds")
```

```text
✅ GPU Training Time: 286.9483 seconds
```

CPU version (NumPy):

```python
import numpy as np

# Recreate the environment so the CPU run starts fresh
env = gym.make('FrozenLake-v1', is_slippery=False)

# Reinitialize Q-table for the CPU version (NumPy array in host memory)
Q_cpu = np.zeros((state_size, action_size))

def train_cpu():
    for _ in range(episodes):
        reset_result = env.reset()
        state = reset_result[0] if isinstance(reset_result, tuple) else reset_result
        done = False

        while not done:
            # Epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = np.random.randint(action_size)
            else:
                action = np.argmax(Q_cpu[state])

            step_result = env.step(action)
            if len(step_result) == 5:  # New API
                new_state, reward, terminated, truncated, _ = step_result
                done = terminated or truncated
            else:  # Old API
                new_state, reward, done, _ = step_result

            # Same Q-learning update as the GPU version, on plain NumPy scalars
            Q_cpu[state, action] += alpha * (
                reward + gamma * np.max(Q_cpu[new_state]) - Q_cpu[state, action]
            )
            state = new_state

start_cpu = time.time()
train_cpu()
end_cpu = time.time()

print(f"✅ CPU Training Time: {end_cpu - start_cpu:.4f} seconds")
```

```text
✅ CPU Training Time: 24.7515 seconds
```

Speed-up:

```python
# Ratio of CPU time to GPU time: values above 1 mean the GPU was faster
speedup = (end_cpu - start_cpu) / (end_gpu - start_gpu)
print(f"🚀 Speed-up (CPU/GPU): {speedup:.2f}x")
```

```text
🚀 Speed-up (CPU/GPU): 0.09x
```


## 📊 Benchmark Results

We trained a Q-Learning agent on the `FrozenLake-v1` environment for **10,000 episodes**, comparing performance between a classic **CPU (NumPy)** implementation and a **GPU-accelerated (PyTorch)** version.
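Both versions implement the same tabular Q-learning rule. They pick an action ε-greedily (ε = 0.1), step the environment, and then move Q(s, a) toward the one-step bootstrapped target. The learning rate is α = 0.8 and the discount is γ = 0.95:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

Here is a worked example on the default 4×4 map, where state 14 sits directly left of the goal (state 15). The first time the agent moves RIGHT from state 14 it collects r = 1, and the table is still all zeros. The update therefore gives Q(14, RIGHT) = 0 + 0.8 × (1 + 0.95 × 0 − 0) = 0.8. Episodes end on entering a hole or the goal, so those rows are never updated. Their max term therefore stays 0.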

### 🔁 Environment
- Gym: `FrozenLake-v1` (`is_slippery=False`)
- Training Episodes: `10,000`
- Reward: Default Gym rewards (0 or 1)
- Runtime: Google Colab (Tesla T4 GPU)
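The training cells depend on Gym and, for the GPU run, on CUDA. To check the update rule without either, here is a minimal NumPy sketch that hard-codes the non-slippery map. It rests on these assumptions:

- The map layout and action encoding (0=LEFT, 1=DOWN, 2=RIGHT, 3=UP) follow Gym's `FrozenLake-v1` 4×4 defaults, and the 100-step cap mirrors its time limit.
- `step`, `train`, and `greedy_path` are hypothetical helpers, not part of Gym or the original repository.
- Ties in the greedy choice are broken at random. This differs from the notebook's `argmax`, which always returns the first maximal action (LEFT while a row is still all zeros).

```python
import numpy as np

# Default 4x4 FrozenLake layout (S=start, F=frozen, H=hole, G=goal), assumed from Gym.
MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_ROWS, N_COLS = 4, 4
# Assumed Gym action encoding: 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def step(state: int, action: int) -> tuple:
    """Deterministic (is_slippery=False) transition; moving into a wall leaves the agent in place."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    tile = MAP[row][col]
    # (new_state, reward, done): reward 1 only on the goal; holes and goal end the episode
    return row * N_COLS + col, float(tile == "G"), tile in "GH"


def train(episodes=3000, alpha=0.8, gamma=0.95, epsilon=0.1, max_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_ROWS * N_COLS, len(MOVES)))
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):  # per-episode step cap, like the env's time limit
            if rng.random() < epsilon:
                action = int(rng.integers(len(MOVES)))
            else:
                # Random tie-breaking among maximal actions (assumption, see above)
                best = np.flatnonzero(q[state] == q[state].max())
                action = int(rng.choice(best))
            new_state, reward, done = step(state, action)
            # Tabular Q-learning update, same form as train_cpu()
            q[state, action] += alpha * (reward + gamma * q[new_state].max() - q[state, action])
            state = new_state
            if done:
                break
    return q


def greedy_path(q, max_steps=16):
    """Follow argmax actions from the start state and record the visited states."""
    state, path = 0, [0]
    for _ in range(max_steps):
        state, _, done = step(state, int(np.argmax(q[state])))
        path.append(state)
        if done:
            break
    return path


q = train()
print(greedy_path(q))
```

After training, the greedy path should start at state 0 and end at state 15 (the goal) without passing through a hole. The exact route can vary with the seed.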

---

### ⏱ Training Time Comparison

| Method                              | Result                         |
|-------------------------------------|--------------------------------|
| **CPU (NumPy)**                     | `24.7515` s                    |
| **GPU (PyTorch)**                   | `286.9483` s                   |
| **Speed-up (CPU time / GPU time)**  | `0.09×` (GPU ≈ 11.6× slower)   |
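The speed-up figure is the ratio of the two measured training times:

$$
\text{speed-up} = \frac{t_{\text{CPU}}}{t_{\text{GPU}}} = \frac{24.7515\ \text{s}}{286.9483\ \text{s}} \approx 0.086,
\qquad
\frac{t_{\text{GPU}}}{t_{\text{CPU}}} \approx 11.6
$$

Rounded to two decimals, the ratio prints as `0.09x`. Any value below 1 means the GPU run took longer; here it took about 11.6 times as long as the CPU run.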

---

### ⚠️ Observation

While GPU acceleration **typically helps** with deep learning models or large-scale matrix operations, in this case:

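- The **Q-table is small** (16×4 = 64 entries), so each update is only a few scalar operations.
- The **environment is lightweight** and always runs on the CPU, so every step crosses the CPU-GPU boundary.
- **Per-step overhead** outweighs any compute benefit. Each step copies state indices host-to-device with `torch.tensor(..., device='cuda')` and launches several tiny CUDA kernels. It also forces device-to-host synchronizations through `.item()`.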

As a result, the GPU version ran about 11.6× slower than the CPU version on this task.
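The per-step overhead can also be measured in isolation. The sketch below times only the array-library work of one training step (greedy action plus Q update) on a 16×4 table. Gym is removed, and the state sequence is synthetic. `per_step_cost` is a hypothetical helper. It always times NumPy and PyTorch-on-CPU, and adds a CUDA timing only when a GPU is visible. Absolute numbers depend on the machine, so no expected output is given.

```python
import time

import numpy as np
import torch


def per_step_cost(n_steps: int = 2_000) -> dict:
    """Average wall-clock seconds per 'greedy action + Q-update' step (hypothetical helper).

    Replays the per-step array work of train_cpu()/train_gpu() on a 16x4 table
    with a synthetic state sequence and zero reward, so only library overhead is timed.
    """
    alpha, gamma = 0.8, 0.95
    results = {}

    # NumPy baseline: everything stays in host memory.
    q_np = np.zeros((16, 4))
    start = time.perf_counter()
    for i in range(n_steps):
        s, s_next = i % 16, (i + 1) % 16
        a = int(np.argmax(q_np[s]))
        q_np[s, a] += alpha * (0.0 + gamma * np.max(q_np[s_next]) - q_np[s, a])
    results["numpy"] = (time.perf_counter() - start) / n_steps

    # Same work in PyTorch; on CUDA every step pays for copies, kernel launches and syncs.
    devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
    for device in devices:
        q = torch.zeros((16, 4), dtype=torch.float32, device=device)
        start = time.perf_counter()
        for i in range(n_steps):
            s = torch.tensor(i % 16, device=device)            # host-to-device copy on CUDA
            s_next = torch.tensor((i + 1) % 16, device=device)
            a = torch.argmax(q[s]).item()                       # device-to-host sync on CUDA
            q[s, a] += alpha * (0.0 + gamma * torch.max(q[s_next]) - q[s, a])
        if device == "cuda":
            torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
        results[f"torch-{device}"] = (time.perf_counter() - start) / n_steps
    return results


for name, seconds in per_step_cost().items():
    print(f"{name:>10}: {seconds * 1e6:8.1f} µs/step")
```

On a GPU runtime, comparing the `torch-cuda` row with the `numpy` row shows the fixed per-step cost directly. That cost does not shrink with a smaller table.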

---

### ✅ Takeaway

> For small tabular problems like FrozenLake, stick with the CPU (NumPy).  
> For larger state/action spaces or deep Q-learning (DQN), each step involves large batched tensor operations, so GPU acceleration is much more likely to pay off.




> GPU acceleration did not speed up FrozenLake, because its state/action space is too small. Even so, this exercise shows how to port tabular Q-learning to PyTorch on the GPU and benchmark it against NumPy. That is a useful first step toward scaling to more complex RL environments.



