# Welcome to the Reinforcement Learning Project

This notebook serves as a quick **guide and orientation** for anyone opening this repository for the first time.
It explains what the project is about, how the files are organized, and how to get started running experiments in **JupyterLab**.



By: Felipe Campoverde

---

## **Important**
### Run the code below before anything else.

```python
"""
Robust bootstrap: make `src/` importable from any notebook location.
Finds the project root by searching for src/rl_capstone/__init__.py.
"""

import sys
from pathlib import Path
from IPython import get_ipython

GREEN = "\033[92m"; YELLOW = "\033[93m"; RED = "\033[91m"; RESET = "\033[0m"

def add_project_src(pkg_name: str = "rl_capstone") -> Path:
    here = Path.cwd()
    for base in [here, *here.parents]:
        marker = base / "src" / pkg_name / "__init__.py"
        if marker.exists():
            src_path = (base / "src").resolve()
            if str(src_path) not in sys.path:
                sys.path.insert(0, str(src_path))  # highest priority
            print(f"{GREEN}[ok]{RESET} using src at: {src_path}")
            return src_path
    raise FileNotFoundError(
        f"Could not locate src/{pkg_name}/__init__.py starting from {here} upwards."
    )

try:
    src_path = add_project_src("rl_capstone")

    # Auto-reload edited modules in src/ without manual re-imports
    ip = get_ipython()
    if ip:
        ip.run_line_magic("load_ext", "autoreload")
        ip.run_line_magic("autoreload", "2")

    # Quick smoke test
    from rl_capstone import GridWorld, WorldSettings
    _ = GridWorld(WorldSettings())
    print(f"{GREEN}Environment setup complete!{RESET}")

except Exception as e:
    print(f"{RED}Environment setup failed:{RESET}\n{e}")
```


[92m[ok][0m using src at: /home/houndsito/Documents/Development/github/fcampoverdeg/reinforcement_learning/src
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
[92mEnvironment setup complete![0m


# Project Overview

Reinforcement Learning (RL) is a branch of Machine Learning that focuses on **how agents learn to make decisions** through interacting with an environment.
The agent receives **rewards or penalties** based on its actions and gradually learns a policy that **maximizes cumulative reward** over time.

In this project, the student implemented and compared several RL algorithms using a **custom GridWorld environment**:

- **Q-Learning** - model-free, off-policy learning.
- **SARSA** - model-free, on-policy learning.
- **Dyna-Q** - model-based RL that blends learning and planning.

Each algorithm is trained to navigate a **stochastic** grid world with obstacles, pits, and a goal state.
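
The difference between off-policy Q-Learning and on-policy SARSA comes down to which value is used to bootstrap the update target. The sketch below shows the two textbook tabular update rules side by side. It is an illustration only and not the code in `src/rl_capstone/rl_algorithms.py`; the function names, the `N_ACTIONS` constant, and the dictionary-based Q-table are assumptions made for this example.

```python
from collections import defaultdict

N_ACTIONS = 4  # assumption for illustration: up, down, left, right


def make_q_table():
    """Q-table mapping each state to a list of action values, all starting at 0."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """Off-policy: bootstrap from the best (max) action value in s_next."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """On-policy: bootstrap from the action the agent actually takes in s_next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Same transition, two different targets.
Q_ql, Q_sarsa = make_q_table(), make_q_table()
for Q in (Q_ql, Q_sarsa):
    Q[(0, 1)] = [0.0, 2.0, 0.0, 0.0]  # pretend action 1 looks good in the next state

q_learning_update(Q_ql, (0, 0), 1, 1.0, (0, 1), alpha=0.5, gamma=0.9)
sarsa_update(Q_sarsa, (0, 0), 1, 1.0, (0, 1), a_next=0, alpha=0.5, gamma=0.9)
print(Q_ql[(0, 0)][1], Q_sarsa[(0, 0)][1])
```

Q-Learning ends up with the larger estimate here because it assumes the greedy action will be taken next, while SARSA uses the (exploratory) action `a_next=0` that was actually selected.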

---

## Environment Setup

Before running notebooks, make sure your environment is active and functional:

```bash
source .venv/bin/activate
jupyter lab
```

If you have not yet installed dependencies:
```bash
# Preferred
pip install -r requirements.txt

# In case 'requirements.txt' is not available
pip install numpy scipy matplotlib jupyterlab ipykernel pandas tqdm \
            black ruff pytest pytest-cov mypy gymnasium pygame

# Install the project package in editable mode
pip install -e .
```
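
To confirm the installation worked, a quick check like the following can be run in a notebook cell or a Python shell. It is a minimal sketch: the `REQUIRED` list and the `missing_packages` helper are made up for this example, and it assumes each package's import name matches the pip name used above.

```python
from importlib.util import find_spec

# Assumption: import names match the pip package names listed above.
REQUIRED = ["numpy", "scipy", "matplotlib", "pandas", "tqdm", "gymnasium", "pygame"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("missing:", missing if missing else "none")
```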
---

## Repository Structure

```text
reinforcement_learning/
├── Start_Here.ipynb
├── notebooks/               ← main experiments
│   ├── 00_RL.ipynb          ← you are here!
│   ├── 01_q_learning.ipynb
│   ├── 02_sarsa.ipynb
│   └── 03_dyna_q.ipynb
├── src/rl_capstone/         ← core implementation
│   ├── gridworld.py
│   ├── rl_algorithms.py
│   └── utils.py
├── data/                    ← training logs and results
├── figs/                    ← generated plots
├── reports/                 ← milestone & final reports
├── tests/                   ← unit tests
└── README.md                ← setup and project overview
```

---

## How to Get Started

1. Open the **Q-Learning notebook**: [01_q_learning.ipynb](01_q_learning.ipynb)

2. Run all cells top-to-bottom (**Shift + Enter** runs the current cell and moves to the next) to train a Q-learning agent.

3. Explore the other algorithms:
   - [SARSA notebook](02_sarsa.ipynb)
   - [Dyna-Q notebook](03_dyna_q.ipynb)

4. Compare results using the plots saved under the **figs/** directory.

---

## Notes

- All algorithm implementations live in the **src/rl_capstone/** folder.
- The GridWorld environment defines states, transitions, and rewards.
- You can modify hyperparameters (**alpha**, **gamma**, **epsilon**, etc.) directly in each notebook to experiment.
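
To make the role of **epsilon** concrete, here is a minimal sketch of epsilon-greedy action selection, the standard exploration rule for tabular agents. Whether the repository uses exactly this rule is an assumption; the `epsilon_greedy` function and its signature are hypothetical and do not come from `src/rl_capstone/`.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one.

    q_values: list of action values for the current state.
    Ties between equally good greedy actions are broken at random.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    best = max(q_values)
    return rng.choice([i for i, q in enumerate(q_values) if q == best])  # exploit


# epsilon = 0 always exploits; larger values explore more often.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

In the same spirit, **alpha** is the learning rate (how far each update moves the estimate) and **gamma** is the discount factor (how much future reward counts relative to immediate reward).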

---

## Next Steps
- Start with **Q-Learning** to understand the training loop.
- Proceed to **SARSA** to compare on-policy learning.
- Explore **Dyna-Q** to see how planning accelerates learning.
- Document your results and insights in the **reports/** folder.
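
For intuition on how planning accelerates learning in Dyna-Q, the sketch below shows a planning step in its simplest form: after each real transition is recorded in a model, the agent replays several remembered transitions and applies the Q-Learning update to each. This is not the repository's implementation. The `dyna_q_planning` name and the model layout are hypothetical, and the model keeps only the last observed outcome per state-action pair, which is a simplification in a stochastic grid world.

```python
import random


def dyna_q_planning(Q, model, n_planning, alpha=0.1, gamma=0.99, rng=random):
    """Apply n_planning Q-Learning updates using transitions sampled from `model`.

    Q:     dict mapping state -> list of action values
    model: dict mapping (state, action) -> (reward, next_state, done),
           holding the most recently observed outcome (an assumption of this sketch)
    """
    if not model:
        return
    seen = list(model.keys())
    for _ in range(n_planning):
        s, a = rng.choice(seen)            # revisit a previously seen pair
        r, s_next, done = model[(s, a)]    # simulated experience from the model
        target = r if done else r + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])


# One real transition, replayed three times during planning.
Q = {0: [0.0, 0.0], 1: [0.0, 0.0]}
model = {(0, 1): (1.0, 1, True)}
dyna_q_planning(Q, model, n_planning=3, alpha=0.5)
print(Q[0][1])
```

Each replay moves the estimate closer to the observed reward without any extra interaction with the environment, which is why more planning steps usually mean fewer real episodes are needed.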

---

<style>
    .button {
        background-color: #3b3b3b;
        color: white;
        padding: 25px 60px;
        border: none;
        border-radius: 12px;
        cursor: pointer;
        font-size: 30px;
        transition: background-color 0.3s ease;
    }

    .button:hover {
        background-color: #45a049;
        transform: scale(1.05);
    }
    
</style>

<div style=" text-align: center; margin-top:20px;">
    
  <a href="../Start_Here.ipynb">
    <button class="button">
      ⬅️ Prev: Start Here
    </button>
  </a>
  <span style="display:inline-block; width:200px;"></span>
  <a href="01_q_learning.ipynb">
    <button class="button">
      Next: Q-Learning ➡️
    </button>
  </a>
  
</div>
