## 🔍 Project Overview
This project explores how a robot can learn to **navigate a 2D space** with obstacles of varying severity using **reinforcement learning (RL)**. Inspired by real-world robotic applications such as **pick-and-place tasks**, warehouse logistics, and mobile navigation (e.g., rovers), we simulate how an agent makes intelligent movement decisions in a grid-based environment.

We use a custom **Gymnasium** environment where:
- The agent must reach a goal zone from a random starting point.
- The grid contains obstacles of different risk levels:
  - 🔵 Low-risk (can pass if necessary)
  - 🟨 Medium-risk (to be avoided; the current `_is_valid_move` check treats these cells as impassable)
  - 🔴 High-risk (must avoid completely)
- The environment randomizes each episode to promote generalization.
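
The environment code further down stores each cell as a small integer. The sketch below restates that encoding on a tiny hand-built grid; the `CELL_NAMES` table and the 4x4 layout are illustrative additions and do not appear in the notebook's cells.

```python
import numpy as np

# Cell codes as used by GridWorldEnv later in this notebook.
CELL_NAMES = {
    0: "free",
    1: "low-risk (blue)",
    2: "medium-risk (yellow)",
    3: "high-risk (red)",
    8: "agent",
    9: "goal",
}

# A hand-built 4x4 example grid (illustrative only).
demo_grid = np.zeros((4, 4), dtype=np.uint8)
demo_grid[0, 1] = 8   # agent starts in the top row
demo_grid[1, 2] = 1   # low-risk obstacle
demo_grid[2, 0] = 3   # high-risk obstacle
demo_grid[3, 1] = 9   # goal in the last row

# Summarize which codes are present and how often.
for code in np.unique(demo_grid):
    count = int((demo_grid == code).sum())
    print(f"{int(code)}: {CELL_NAMES[int(code)]} x{count}")
```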

## 🧠 RL Strategy
The agent is trained using **Q-learning**, a classic value-based reinforcement learning algorithm. Reward shaping encourages it to:
- Move toward the goal
- Avoid dangerous areas
- Learn efficient, safe paths

Over many episodes, the agent develops a **policy** for reaching the goal, which is then tested on freshly randomized layouts.
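
For reference, the tabular update applied in the training cell later in this notebook can be written as follows, where $s$ is the agent's grid position, $a$ the chosen action, $r$ the shaped reward, $s'$ the resulting position, $\alpha$ the learning rate, and $\gamma$ the discount factor:

$$
Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right)
$$

As a worked instance with the notebook's settings ($\alpha = 0.1$, $\gamma = 0.95$): a first step that moves closer to the goal onto a free cell earns $r = -0.1 + 1 = 0.9$, and starting from an all-zero table the update gives $Q(s, a) = 0.9 \cdot 0 + 0.1 \cdot (0.9 + 0.95 \cdot 0) = 0.09$.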

## 🦾 Relevance to Robotics
This simulation reflects the early stages of an **autonomous robotic system**, where a computer vision pipeline would first map the space and classify obstacles. The RL model would then:
- Interpret this mapped environment as a grid
- Learn to navigate based on obstacle types and positions
- Inform **real-time motion planning** for robotic arms or mobile platforms

This is particularly useful for:
- Pick-and-place in dynamic fabrication settings
- Indoor warehouse navigation
- Autonomous rover path planning in unstructured environments

# 🛠️ Install Dependencies (Colab)
This installs required tools to run Pygame, OpenGL, and video capture in a Colab notebook.


In [3]:
!pip install gymnasium pygame numpy matplotlib tqdm
!apt-get install -y xvfb python3-opengl
!pip install pyvirtualdisplay opencv-python


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
python3-opengl is already the newest version (3.1.5+dfsg-1).
xvfb is already the newest version (2:21.1.4-2ubuntu1.7~22.04.13).
0 upgraded, 0 newly installed, 0 to remove and 29 not upgraded.


# 🖥️ Start Virtual Display
This allows Pygame to render graphics in a hidden display in Colab.


In [4]:
from pyvirtualdisplay import Display

# Start a virtual display
display = Display(visible=0, size=(600, 600))
display.start()


<pyvirtualdisplay.display.Display at 0x7c3ed1822bd0>

# 🌍 Custom GridWorld Environment
This environment is built using Gymnasium and Pygame to simulate a 15x15 grid world with:
- 8-directional movement
- Random single-cell obstacles (3 severity levels)
- Two goal cells in the last row (columns 4 and 5)
- Custom reward shaping
- Visual rendering using Pygame
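
The 8-directional movement can be read as a lookup from action index to a (row, column) offset. The sketch below uses the same offset order as the `moves` list in the environment's `step` method; the `LABELS` names and the `apply_action` helper are illustrative and are not part of the environment.

```python
# Action index -> (row delta, column delta), in the same order as the
# `moves` list inside GridWorldEnv.step.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]

# Direction labels are an illustrative addition. Row 0 is drawn at the top
# of the window, so a row delta of -1 means "up".
LABELS = ["up", "down", "left", "right",
          "up-left", "up-right", "down-left", "down-right"]

def apply_action(pos, action):
    """Return the cell reached from `pos` by `action`, ignoring walls and obstacles."""
    row, col = pos
    d_row, d_col = MOVES[action]
    return (row + d_row, col + d_col)

for action, label in enumerate(LABELS):
    print(action, label, apply_action((5, 5), action))
```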


In [5]:
import gymnasium as gym
from gymnasium import spaces
import numpy as np
import pygame
import random

class GridWorldEnv(gym.Env):
    metadata = {"render_modes": ["human", "rgb_array"], "render_fps": 10}

    def __init__(self, grid_size=15, max_steps=100, render_mode=None):
        super().__init__()
        self.grid_size = grid_size
        self.max_steps = max_steps
        self.render_mode = render_mode
        self.step_count = 0

        self.observation_space = spaces.Box(low=0, high=9, shape=(grid_size, grid_size), dtype=np.uint8)
        self.action_space = spaces.Discrete(8)

        self.agent_pos = [0, 0]
        self.goal_cells = [(grid_size - 1, 4), (grid_size - 1, 5)]
        self.grid = np.zeros((grid_size, grid_size), dtype=np.uint8)

        pygame.init()
        self.cell_size = 50
        self.window_size = self.grid_size * self.cell_size
        self.screen = pygame.display.set_mode((self.window_size, self.window_size))
        self.clock = pygame.time.Clock()

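    def reset(self, seed=None, options=None):
        # Let Gymnasium record the seed, then seed the module-level RNGs
        # that this environment actually draws from.
        super().reset(seed=seed)
        if seed is not None:
            random.seed(seed)
            np.random.seed(seed)
        self.grid.fill(0)
        self.step_count = 0
        self.agent_pos = [0, random.randint(0, self.grid_size - 1)]
        self._place_obstacles()
        self._update_grid()
        return self.grid.copy(), {}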

    def _place_obstacles(self, count=15):
        for _ in range(count):
            x, y = random.randint(1, self.grid_size - 2), random.randint(0, self.grid_size - 1)
            if (x, y) not in self.goal_cells:
                severity = np.random.choice([1, 2, 3], p=[0.5, 0.3, 0.2])
                self.grid[x, y] = severity

    def _update_grid(self):
        self.grid[self.grid == 8] = 0
        for gx, gy in self.goal_cells:
            self.grid[gx, gy] = 9
        self.grid[tuple(self.agent_pos)] = 8

    def _distance_to_closest_goal(self, x, y):
        return min(np.sqrt((gx - x)**2 + (gy - y)**2) for gx, gy in self.goal_cells)

    def _is_valid_move(self, nx, ny):
        if 0 <= nx < self.grid_size and 0 <= ny < self.grid_size:
            target_cell = int(self.grid[nx, ny])
            return target_cell not in [2, 3]  # avoid yellow and red
        return False

    def step(self, action):
        self.step_count += 1
        x, y = self.agent_pos
        reward = -0.1
        done = False

        moves = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]
        attempted = set()

        while True:
            dx, dy = moves[action]
            nx, ny = x + dx, y + dy

            if self._is_valid_move(nx, ny):
                old_dist = self._distance_to_closest_goal(x, y)
                new_dist = self._distance_to_closest_goal(nx, ny)

                if new_dist < old_dist:
                    reward += 1
                elif new_dist > old_dist:
                    reward -= 0.5

                if self.grid[nx, ny] == 1:
                    reward += -1

                if (nx, ny) in self.goal_cells:
                    reward = 100
                    done = True

                self.agent_pos = [nx, ny]
                moved = True
                break
            else:
                reward -= 2
                attempted.add(action)
                if len(attempted) == self.action_space.n:
                    moved = False
                    break
                remaining = list(set(range(self.action_space.n)) - attempted)
                action = random.choice(remaining)

        if self.step_count >= self.max_steps:
            done = True

        self._update_grid()
        return self.grid.copy(), reward, done, False, {"moved": moved}

    def render(self):
        self.screen.fill((0, 0, 0))
        for i in range(self.grid_size):
            for j in range(self.grid_size):
                rect = pygame.Rect(j*self.cell_size, i*self.cell_size, self.cell_size, self.cell_size)
                color = (200, 200, 200)
                if self.grid[i, j] == 1:
                    color = (0, 0, 255)
                elif self.grid[i, j] == 2:
                    color = (255, 255, 0)
                elif self.grid[i, j] == 3:
                    color = (255, 0, 0)
                elif self.grid[i, j] == 8:
                    color = (0, 255, 0)
                elif (i, j) in self.goal_cells:
                    color = (0, 0, 0)
                pygame.draw.rect(self.screen, color, rect)

                if (i, j) in self.goal_cells:
                    cx = j * self.cell_size + self.cell_size // 2
                    cy = i * self.cell_size + self.cell_size // 2
                    points = [(cx - 12, cy), (cx + 12, cy - 12), (cx + 12, cy + 12)]
                    pygame.draw.polygon(self.screen, (255, 255, 255), points)

        pygame.display.flip()
        self.clock.tick(10)

        if self.render_mode == "rgb_array":
            data = pygame.surfarray.array3d(pygame.display.get_surface())
            return np.transpose(data, (1, 0, 2))
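
The reward logic inside `step` can also be read in isolation. The sketch below restates it for a single successful move, using the same constants as the environment; `shaped_reward` is a hypothetical helper written for this explanation, and it leaves out the -2 penalty that `step` applies for each rejected move attempt.

```python
import math

def shaped_reward(old_pos, new_pos, goal_cells, target_cell):
    """Reward for one successful move, mirroring the constants in GridWorldEnv.step.

    `target_cell` is the grid code of the destination before the agent enters it.
    The -2 penalty per rejected move attempt is not modelled here.
    """
    if tuple(new_pos) in goal_cells:
        return 100.0  # reaching a goal overrides all other terms

    def dist(pos):
        # Euclidean distance to the nearest goal cell
        return min(math.hypot(gx - pos[0], gy - pos[1]) for gx, gy in goal_cells)

    reward = -0.1  # per-step cost
    old_d, new_d = dist(old_pos), dist(new_pos)
    if new_d < old_d:
        reward += 1.0   # moved closer to the nearest goal
    elif new_d > old_d:
        reward -= 0.5   # moved away from it
    if target_cell == 1:
        reward -= 1.0   # stepped onto a low-risk (blue) cell
    return reward

demo_goals = [(14, 4), (14, 5)]
print(shaped_reward((0, 4), (1, 4), demo_goals, target_cell=0))
print(shaped_reward((13, 4), (14, 4), demo_goals, target_cell=9))
```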






# 🤖 Train Agent using Q-Learning
This section uses a simple Q-learning algorithm to train the agent to reach the goal efficiently.
- Uses ε-greedy exploration (ε = 0.2)
- Updates the Q-table only when the agent actually moved
- Trains for 20,000 episodes
- Stores the total reward per episode
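
The two core steps listed above, ε-greedy action selection and the tabular update, can be sketched as standalone functions. The names `epsilon_greedy` and `q_update` and the seeded generator are illustrative assumptions; the training cell itself inlines this logic and draws from NumPy's global random state.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update in place."""
    x, y = state
    nx, ny = next_state
    target = reward + gamma * np.max(q_table[nx, ny])
    q_table[x, y, action] = (1 - alpha) * q_table[x, y, action] + alpha * target

demo_rng = np.random.default_rng(0)   # seeded generator: an addition for reproducibility
demo_q = np.zeros((15, 15, 8))        # same shape as the notebook's q_table
demo_action = epsilon_greedy(demo_q[0, 4], epsilon=0.2, rng=demo_rng)
q_update(demo_q, (0, 4), demo_action, reward=0.9, next_state=(1, 4))
print(demo_action, demo_q[0, 4, demo_action])
```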



In [6]:
env = GridWorldEnv(grid_size=15, max_steps=100)

q_table = np.zeros((env.grid_size, env.grid_size, env.action_space.n))
alpha = 0.1
gamma = 0.95
epsilon = 0.2
episodes = 20000
rewards_per_episode = []

for ep in range(episodes):
    obs, _ = env.reset()
    done = False
    pos = env.agent_pos
    total_reward = 0

    while not done:
        x, y = pos
        # ε-greedy: explore with probability epsilon, otherwise act greedily
        action = np.random.choice(env.action_space.n) if np.random.rand() < epsilon else np.argmax(q_table[x, y])
        _, reward, done, _, info = env.step(action)
        new_x, new_y = env.agent_pos
        # Skip the update when every move was blocked and the agent stayed put
        if info.get("moved", True):
            q_table[x, y, action] = (1 - alpha) * q_table[x, y, action] + \
                alpha * (reward + gamma * np.max(q_table[new_x, new_y]))
            pos = [new_x, new_y]
        total_reward += reward

    rewards_per_episode.append(total_reward)
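
The training cell stores `rewards_per_episode` but does not inspect it. One possible way to look at the learning trend is a moving average, sketched below; `moving_average`, its default window of 200, and the stand-in data are assumptions made for this example.

```python
import numpy as np

def moving_average(values, window=200):
    """Return the mean of each run of `window` consecutive entries in `values`."""
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        raise ValueError("need at least `window` values")
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Stand-in data so the example runs on its own; in the notebook, pass
# `rewards_per_episode` instead.
demo_rewards = [0.0, 10.0, 20.0, 30.0, 40.0]
print(moving_average(demo_rewards, window=2))
```

In the notebook, the smoothed curve could then be drawn with `plt.plot(moving_average(rewards_per_episode))` after importing `matplotlib.pyplot as plt`.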


# 🎥 Define Video Saving Function
This function saves a list of frames (from the agent's run) as an MP4 video.


In [7]:
import cv2

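def save_video(frames, filename="output.mp4", fps=10):
    # Fail early with a clear message instead of an IndexError on frames[0]
    if not frames:
        raise ValueError("save_video needs at least one frame")
    height, width, _ = frames[0].shape
    writer = cv2.VideoWriter(filename, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))
    for frame in frames:
        # Pygame frames are RGB; OpenCV expects BGR
        writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
    writer.release()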


# 🧪 Evaluate Trained Agent and Record Video
Runs the trained agent on 10 new random environments.
Captures its movement and creates a video using the previously defined function.


In [8]:
env = GridWorldEnv(grid_size=15, max_steps=100, render_mode="rgb_array")

test_frames = []
for ep in range(10):
    obs, _ = env.reset()
    done = False
    pos = env.agent_pos

    while not done:
        x, y = pos
        action = np.argmax(q_table[x, y])
        _, _, done, _, _ = env.step(action)
        frame = env.render()
        if frame is not None:
            test_frames.append(frame)
        pos = env.agent_pos

save_video(test_frames, "trained_agent_video.mp4", fps=7)
print("✅ Video saved as trained_agent_video.mp4")



✅ Video saved as trained_agent_video.mp4
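
The evaluation cell above records frames but does not report how often the agent reaches the goal. A possible complement is sketched below: it counts greedy episodes that end on a goal cell. `run_greedy_episode` and `success_rate` are hypothetical helpers; they rely only on `reset()`, `step()`, `agent_pos`, and `goal_cells`, which `GridWorldEnv` already exposes.

```python
import numpy as np

def run_greedy_episode(env, q_table):
    """Roll out one greedy episode; return (reached_goal, steps_taken)."""
    env.reset()
    done = False
    steps = 0
    while not done:
        x, y = env.agent_pos
        action = int(np.argmax(q_table[x, y]))
        _, _, done, _, _ = env.step(action)
        steps += 1
    # The episode ends either on a goal cell or when max_steps is reached.
    return tuple(env.agent_pos) in env.goal_cells, steps

def success_rate(env, q_table, episodes=100):
    """Fraction of greedy episodes that end on a goal cell."""
    wins = sum(run_greedy_episode(env, q_table)[0] for _ in range(episodes))
    return wins / episodes
```

In the notebook this would be called as `success_rate(env, q_table)` once training has finished.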


# 💾 Render and Save 3D Video (10 Episodes)

Using the captured frames and positions, this block renders a fully 3D animation of the agent navigating through 10 different randomized environments.

The video is saved as:

📁 `agent_3d_path_10episodes.mp4`

Each frame is properly scaled using:
- Grid cells = `3x3` units
- Accurate obstacle heights
- Transparent obstacles for clear agent visibility

This produces a 3D replay of the trained agent's runs.
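
The scaling rules listed above amount to a lookup from cell code to bar height, color, and opacity. The sketch below restates the values used by `render_3d_grid_sequence` as a table; `CELL_STYLE` and `bar_for_cell` are illustrative names, and the renderer itself uses an if/elif chain instead.

```python
# Cell code -> (bar height, colour name, opacity), matching the values used
# in render_3d_grid_sequence. The dict itself is an illustrative restatement.
CELL_STYLE = {
    1: (1.0, "blue", 0.5),
    2: (1.5, "yellow", 0.5),
    3: (3.0, "red", 0.5),
    9: (0.01, "black", 1.0),   # goal: nearly flat tile
}
DEFAULT_STYLE = (0.01, "lightgray", 1.0)  # free cells
AGENT_STYLE = (3.0, "green", 1.0)         # agent cube is as tall as a red obstacle
CELL_UNIT = 3.0                           # each tile spans 3x3 world units

def bar_for_cell(row, col, code, is_agent=False):
    """Return (x, y, dx, dy, dz, colour, alpha) for one grid cell."""
    if is_agent:
        height, colour, alpha = AGENT_STYLE
    else:
        height, colour, alpha = CELL_STYLE.get(code, DEFAULT_STYLE)
    # Columns map to the x axis and rows to the y axis, as in the renderer.
    return (col * CELL_UNIT, row * CELL_UNIT, CELL_UNIT, CELL_UNIT, height, colour, alpha)

print(bar_for_cell(2, 5, code=3))
print(bar_for_cell(0, 4, code=8, is_agent=True))
```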



In [27]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.animation as animation
import matplotlib.colors as mcolors
import numpy as np

def render_3d_grid_sequence(grid_sequence, agent_positions, save_path="agent_3d_path.mp4", fps=5):
    """
    Render a sequence of GridWorld frames in true 3D scale, with proportions:
    - Each grid tile: 3x3 units
    - Agent cube: 3x3x3
    - Red obstacle height = 3
    - Yellow = 1.5
    - Blue = 1.0
    - Transparent obstacles
    """
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection='3d')

    def update(frame):
        ax.clear()
        grid = grid_sequence[frame]
        agent_x, agent_y = agent_positions[frame]

        n = grid.shape[0]
        cell_unit = 3.0  # Each grid cell is 3x3 in world space

        height_map = np.zeros((n, n), dtype=float)
        color_map = np.empty((n, n), dtype=object)
        alpha_map = np.ones((n, n), dtype=float)

        for i in range(n):
            for j in range(n):
                val = grid[i, j]
                if val == 1:  # Blue
                    height_map[i, j] = 1.0
                    color_map[i, j] = 'blue'
                    alpha_map[i, j] = 0.5
                elif val == 2:  # Yellow
                    height_map[i, j] = 1.5
                    color_map[i, j] = 'yellow'
                    alpha_map[i, j] = 0.5
                elif val == 3:  # Red
                    height_map[i, j] = 3.0
                    color_map[i, j] = 'red'
                    alpha_map[i, j] = 0.5
                elif val == 9:  # Goal
                    height_map[i, j] = 0.01
                    color_map[i, j] = 'black'
                    alpha_map[i, j] = 1.0
                else:
                    height_map[i, j] = 0.01
                    color_map[i, j] = 'lightgray'
                    alpha_map[i, j] = 1.0

        # Agent cube
        height_map[agent_x, agent_y] = 3.0
        color_map[agent_x, agent_y] = 'green'
        alpha_map[agent_x, agent_y] = 1.0

        _x = np.arange(n)
        _y = np.arange(n)
        _xx, _yy = np.meshgrid(_x, _y)
        x, y = _xx.ravel(), _yy.ravel()
        z = np.zeros_like(x)

        dx = dy = cell_unit
        dz = height_map.ravel()
        colors = [color_map[i, j] for i in range(n) for j in range(n)]
        alphas = alpha_map.ravel()

        rgba_colors = []
        for c, a in zip(colors, alphas):
            rgb = mcolors.to_rgb(c)
            rgba_colors.append((*rgb, a))

        ax.bar3d(x * dx, y * dy, z, dx, dy, dz, color=rgba_colors, shade=True)

        ax.set_xlim(0, n * dx)
        ax.set_ylim(0, n * dy)
        ax.set_zlim(0, 3.2)
        ax.view_init(elev=30, azim=45)
        ax.set_box_aspect([1, 1, 0.3])
        ax.axis('off')
        ax.set_title(f"Frame {frame + 1}")

    ani = animation.FuncAnimation(fig, update, frames=len(grid_sequence), interval=1000 // fps)
    ani.save(save_path)
    plt.close()
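
The renderer pairs `np.meshgrid` coordinates with `height_map.ravel()`, which only lines up because of meshgrid's default indexing. The small check below, on a 3x3 array with illustrative names, shows that each bar's `(x, y)` pair corresponds to `(column, row)` of the height it receives.

```python
import numpy as np

demo_n = 3
heights = np.arange(demo_n * demo_n, dtype=float).reshape(demo_n, demo_n)

# Default meshgrid indexing is "xy": the first output varies along columns.
xx, yy = np.meshgrid(np.arange(demo_n), np.arange(demo_n))
bar_x, bar_y, bar_dz = xx.ravel(), yy.ravel(), heights.ravel()

# Every bar's (x, y) pair recovers the (column, row) of its height entry.
for xi, yi, h in zip(bar_x, bar_y, bar_dz):
    assert heights[yi, xi] == h

print(list(zip(bar_x.tolist(), bar_y.tolist()))[:4])
```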



In [28]:
# Collect frames from 10 episodes
grid_frames_all = []
agent_positions_all = []

num_episodes = 10

env = GridWorldEnv(grid_size=15, max_steps=100)

for ep in range(num_episodes):
    obs, _ = env.reset()
    done = False
    grid_frames = []
    agent_path = []

    while not done:
        x, y = env.agent_pos
        action = np.argmax(q_table[x, y])
        grid_frames.append(env.grid.copy())
        agent_path.append((x, y))
        _, _, done, _, _ = env.step(action)

    # Append final position
    grid_frames.append(env.grid.copy())
    agent_path.append(env.agent_pos)

    grid_frames_all.extend(grid_frames)
    agent_positions_all.extend(agent_path)

# Save the video with full 3D proportions
render_3d_grid_sequence(
    grid_sequence=grid_frames_all,
    agent_positions=agent_positions_all,
    save_path="agent_3d_path_10episodes.mp4",
    fps=5
)

print("✅ Saved 3D video of 10 episodes as 'agent_3d_path_10episodes.mp4'")




✅ Saved 3D video of 10 episodes as 'agent_3d_path_10episodes.mp4'


# ✅ Summary: What We Built

This notebook creates a fully custom 15x15 GridWorld environment using Gymnasium and Pygame in Google Colab. It includes:

- 🧠 A Q-learning agent that learns to navigate from top to bottom
- 🔥 Reward shaping and obstacle-based penalties
- 🧱 Randomized obstacles with varying severity (blue/yellow/red)
- 🧭 8-directional movement with automatic rerouting
- 🎥 Visual rendering of test episodes as downloadable `.mp4` video

This environment can be extended with:
- Moving goals or agents
- Dynamic rewards
- Curriculum learning with progressive difficulty
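
As one possible starting point for the curriculum idea, the sketch below ramps the number of obstacles linearly over training. The function name, the `start` and `end` defaults, and the linear shape are all assumptions; the notebook itself always requests 15 obstacles per episode.

```python
def obstacle_count_for_episode(episode, total_episodes, start=5, end=15):
    """Linearly ramp the obstacle count from `start` to `end` over training.

    Hypothetical schedule: the notebook itself always requests 15 obstacles.
    """
    if total_episodes <= 1:
        return end
    # Clamp progress to [0, 1] so out-of-range episode numbers stay valid.
    fraction = min(max(episode / (total_episodes - 1), 0.0), 1.0)
    return round(start + fraction * (end - start))

for demo_ep in (0, 5000, 10000, 19999):
    print(demo_ep, obstacle_count_for_episode(demo_ep, 20000))
```

`_place_obstacles` already accepts a `count` argument, but `reset` calls it without one, so wiring in such a schedule would need a small change to `reset` (for example, reading the count from `options`).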
