rlp: Reinforcement Learning Laboratory (CSE4037)

A comprehensive study of Curriculum Learning versus Baseline Training within the MiniGrid ecosystem. This project implements and visualises various RL architectures, ranging from standard DQNs to advanced parallelised Recurrent PPO agents.

Project Taxonomy

The laboratory is divided into five core research categories:

DQN Architectures: Pixel-based Deep Q-Networks exploring frame stacking and custom CNN kernels for sparse observations.
PPO Visual: Pixel-based Proximal Policy Optimization focused on on-policy stability in 8x8 environments.
PPO Symbolic: High-speed logical extraction using FlatObsWrapper for rapid convergence.
Recurrent PPO (RPPO): LSTM-driven agents designed to solve POMDP (Partially Observable) challenges by maintaining temporal memory.
REPP2 Advanced: Our peak architecture—parallelised Recurrent PPO using SubprocVecEnv for multi-stage sequence learning across complex room boundaries.

Key Research Findings

Memory as a Necessity: Standard MLP policies suffer from "goldfish memory." LSTMs are critical for 8x8 maps to remember the location of keys/doors when they exit the agent's 7x7 field of view.
Symbolic vs. Pixel: Symbolic observations (object IDs) converge orders of magnitude faster than raw RGB pixels by bypassing the visual perception bottleneck.
Temporal Context: Frame stacking (4-frame buffer) significantly stabilises DQN performance on vision-based tasks.
Curriculum Efficiency: Staged training (e.g., Empty -> DoorKey 5x5 -> DoorKey 8x8) accelerates the discovery of sparse rewards in high-dimensional state spaces.

Installation & Setup

This project uses uv for Python and npm for the React dashboard.

1. Clone & Python Environment

git clone https://github.com/nxck2005/rlp
cd rlp/src
uv sync

2. Frontend Environment

cd rlp/frontend
npm install

Running the Laboratory (Webapp)

To launch the integrated visualizer, you must run both the backend and frontend in parallel sessions:

Backend (from rlp/src):

uv run uvicorn api.server:app --host 0.0.0.0 --port 8000

Frontend (from rlp/frontend):

npm run dev

Dashboard available at: http://localhost:5173

Features

Unified Visualizer Dashboard

A modern "Laboratory" interface built with React 19 + TypeScript to monitor agent logic in real-time.

Dual Viewports: Simultaneously view the Global God-View and the Agent's 7x7 Partial Observation.
Spatial Analysis (Phase 3):
- Dynamic Heatmaps: Visualise agent occupancy and exploration density layers.
- Breadcrumb Tracking: Real-time path tracking to identify navigation loops and efficiency.
Episode Analysis: Interactive "Time-Travel" scrubber to step through historical episodes frame-by-frame.
Policy Telemetry: Live action distribution charts and rolling success rate metrics.

Detailed theoretical notes and parameter explanations can be found in CITED.md and PARAMS.md.

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
frontend		frontend
src		src
.gitignore		.gitignore
CITED.md		CITED.md
Episode Length (DQN vs PPO vs RPPO).png		Episode Length (DQN vs PPO vs RPPO).png
Episode Length (DQN vs PPO).png		Episode Length (DQN vs PPO).png
Episode Reward (DQN vs PPO vs RPPO).png		Episode Reward (DQN vs PPO vs RPPO).png
Episode Reward (DQN vs PPO).png		Episode Reward (DQN vs PPO).png
LICENSE		LICENSE
PARAMS.md		PARAMS.md
README.md		README.md
VISUALISATION_PLAN.md		VISUALISATION_PLAN.md
rppo_4stage_performance.png		rppo_4stage_performance.png
rppo_comparison.png		rppo_comparison.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

rlp: Reinforcement Learning Laboratory (CSE4037)

Project Taxonomy

Key Research Findings

Installation & Setup

1. Clone & Python Environment

2. Frontend Environment

Running the Laboratory (Webapp)

Features

Unified Visualizer Dashboard

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

rlp: Reinforcement Learning Laboratory (CSE4037)

Project Taxonomy

Key Research Findings

Installation & Setup

1. Clone & Python Environment

2. Frontend Environment

Running the Laboratory (Webapp)

Features

Unified Visualizer Dashboard

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages