# Regression Starter — Colab/Kaggle Quickstart 

> Author: Badr TAJINI

**Academic year:** 2025–2026  
**School:** ECE  
**Course:** Machine Learning & Deep Learning 2 

---

Use this notebook to verify the tabular/time-series regression starter on a GPU runtime. It walks you through environment checks, dependency install, a smoke test, and optional full training/evaluation.

## 0. Project files

- Option A: `git clone <your_repo_url> ts-project`
- Option B: Upload the `ts-project` folder via the sidebar (`/content/ts-project` in Colab).

Run the remaining cells afterwards; they will alert you if the folder is missing.

### Quick checklist before running code
- Switch the runtime to **GPU (T4)** first.
  - Colab: `Runtime → Change runtime type → GPU → Save`.
  - Kaggle: gear icon → enable **Accelerator**, choose `T4 x1`.
- Wait for the GPU session to reconnect.
- Execute cells from top to bottom. If a cell errors, fix the problem and rerun that cell before moving on.

### How to run a cell
- Click the ▶️ button on the left of a cell, or press **Shift+Enter** (Colab) / **Ctrl+Enter** (Kaggle).
- A cell is done when a number like `[1]` appears on the left.
- Do not skip steps; later cells rely on earlier setup.

### Step 0 — Confirm the GPU is ready
Run the next cell. You should see the GPU name and its memory usage. If you see `nvidia-smi unavailable (CPU runtime)` instead, switch the runtime to GPU and rerun this cell.

In [None]:
!nvidia-smi || echo "nvidia-smi unavailable (CPU runtime)"
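
The same check can also be made from Python, which helps if a later cell should branch on the result. The following is a minimal sketch using only the standard library; the helper name `gpu_summary` is hypothetical (it is not part of the starter), and it assumes the installed `nvidia-smi` accepts the usual `--query-gpu` options.

```python
import shutil
import subprocess


def gpu_summary() -> list[str]:
    """Return one 'name, total memory' line per visible GPU, or [] if none is found."""
    if shutil.which("nvidia-smi") is None:
        return []  # nvidia-smi is not on PATH: most likely a CPU runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []  # the tool exists but could not talk to a driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = gpu_summary()
print(gpus if gpus else "No GPU detected: switch the runtime to GPU and rerun.")
```

An empty list means the runtime is probably still on CPU, so the fix is the same as above: change the runtime type and rerun.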


### Step 1 — Point the notebook at the project folder
This cell switches into the `ts-project` directory.
If you get a `FileNotFoundError`, confirm where you uploaded/cloned the folder, adjust the path, and rerun.

In [None]:
import os
import sys
from pathlib import Path

PROJECT_ROOT = Path.cwd().resolve()
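if PROJECT_ROOT.name == "notebooks":
    # Started inside ts-project/notebooks: the project root is one level up.
    PROJECT_ROOT = PROJECT_ROOT.parent.resolve()
else:
    # Started in the parent folder (e.g. /content in Colab, /kaggle/working in Kaggle).
    candidate = PROJECT_ROOT / "ts-project"
    if candidate.exists():
        PROJECT_ROOT = candidate.resolve()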

if not (PROJECT_ROOT / "src").exists():
    raise FileNotFoundError(
        f"Could not locate project root at {PROJECT_ROOT}. Upload or clone ts-project before proceeding."
    )

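os.chdir(PROJECT_ROOT)
# Make both `from src import ...` (needs the project root) and direct imports
# of modules inside src/ resolvable.
for import_path in (PROJECT_ROOT, PROJECT_ROOT / "src"):
    if str(import_path) not in sys.path:
        sys.path.append(str(import_path))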
print(f"Project root: {PROJECT_ROOT}")
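
The cell above only checks that `src` exists. Later steps also rely on `requirements.txt`, `src/train.py`, `src/evaluate.py`, and `configs/reg_tabular_mlp.yaml`, so an incomplete upload can be caught early. A minimal sketch, assuming those four paths (taken from the commands in this notebook) are all that needs checking; the helper name `missing_project_files` is hypothetical.

```python
from pathlib import Path

# Paths referenced by later cells in this notebook, relative to the project root.
REQUIRED_FILES = (
    "requirements.txt",
    "src/train.py",
    "src/evaluate.py",
    "configs/reg_tabular_mlp.yaml",
)


def missing_project_files(project_root) -> list[str]:
    """Return the required relative paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


missing = missing_project_files(Path.cwd())
print(f"Missing files: {missing}" if missing else "Project layout looks complete.")
```

If anything is listed, re-upload or re-clone the folder before installing dependencies.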


### Step 2 — Install the project requirements
Installs PyTorch + supporting libraries from `requirements.txt`. Expect quite a bit of output. If the install fails, rerun the cell before continuing.

In [None]:
# Install project dependencies listed in requirements.txt
!pip install -r requirements.txt
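
Because pip prints a lot of output, a failed package is easy to miss. The sketch below lists requirement names that have no installed distribution. It is a rough check under stated assumptions: it only reads plain `name` or `name==version` style lines, skips pip options and URL requirements, and ignores version constraints. The helper name `missing_requirements` is hypothetical.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(requirements_file: str = "requirements.txt") -> list[str]:
    """Return requirement names that have no installed distribution."""
    missing = []
    for raw in Path(requirements_file).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # blank line, comment, or a pip option such as -r / --index-url
        if re.match(r"[A-Za-z+]+://", line):
            continue  # direct URL requirement: not handled in this sketch
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)  # raises if no distribution with this name is installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if Path("requirements.txt").exists():
    print(missing_requirements() or "All listed requirements are installed.")
```

An empty result does not prove the versions match the pins in `requirements.txt`; it only shows that every named package is present.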


### Step 3 — Run the smoke test
This loads the dataset defined in your config, runs one optimizer step, and writes `outputs/smoke_metrics.json`.
If the CSV specified in the config is missing, the cell will raise a clear error so you can add a sample dataset.

In [None]:
from src import smoke_check

smoke_path = smoke_check.run_smoke("configs/reg_tabular_mlp.yaml")
print(smoke_path.read_text())


## 1. Review smoke-test output
- Confirm the previous cell printed a JSON block with the loss, batch size, and input shape.
- `outputs/smoke_metrics.json` should now exist. If the smoke test failed because data is missing, create a small CSV and rerun it before proceeding.
- Ready for full training? Continue to Section 2.
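
A quick sanity check on the smoke-test file is to confirm that no reported value is NaN or infinite, which would usually point to a data or scaling problem. This is a minimal sketch that assumes `outputs/smoke_metrics.json` holds a single JSON object; it makes no assumption about the exact key names, and the helper name `non_finite_metrics` is hypothetical.

```python
import json
import math
from pathlib import Path


def non_finite_metrics(metrics_file: str = "outputs/smoke_metrics.json") -> list[str]:
    """Return the top-level keys whose value is a NaN or infinite float."""
    metrics = json.loads(Path(metrics_file).read_text())
    bad = []
    for key, value in metrics.items():
        # Python's json module parses NaN / Infinity literals into floats.
        if isinstance(value, float) and not math.isfinite(value):
            bad.append(key)
    return bad


if Path("outputs/smoke_metrics.json").exists():
    print(non_finite_metrics() or "All numeric smoke metrics are finite.")
```

If the loss shows up in the result, inspect the CSV for missing or extreme values before starting a full run.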

## 2. Full training run (optional)
Train and evaluate on the configuration you choose. The default config, `configs/reg_tabular_mlp.yaml`, expects a CSV at `data/train.csv` with a numeric `target` column.

**Before running:**
1. Ensure the CSV paths in the config point to files that exist (or switch configs to the time-series variant).
2. Open the config if you want to tweak epochs, batch size, learning rate, etc.
3. Confirm the runtime still shows a GPU connection.
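
If no dataset is available yet, a small synthetic table is enough to exercise the pipeline. The sketch below writes numeric feature columns plus a numeric `target` column using only the standard library. The feature names `x0`, `x1`, ... and the helper name `write_synthetic_csv` are hypothetical, and it is an assumption that the default config accepts arbitrary numeric feature columns; adjust to whatever your config actually expects.

```python
import csv
import random
from pathlib import Path


def write_synthetic_csv(path, n_rows: int = 200, n_features: int = 4, seed: int = 0) -> Path:
    """Write a toy regression table: numeric features plus a numeric `target` column."""
    rng = random.Random(seed)
    # The target is a noisy linear combination of the features.
    weights = [rng.uniform(-2.0, 2.0) for _ in range(n_features)]
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([f"x{i}" for i in range(n_features)] + ["target"])
        for _ in range(n_rows):
            features = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            target = sum(w * x for w, x in zip(weights, features)) + rng.gauss(0.0, 0.1)
            writer.writerow([f"{v:.6f}" for v in features] + [f"{target:.6f}"])
    return out


# Uncomment to create the file the default config looks for (overwrites an existing file):
# write_synthetic_csv("data/train.csv")
```

Metrics obtained on this toy data say nothing about a real task; the file only confirms that training and evaluation run end to end.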

In [None]:
# Train the model defined in configs/reg_tabular_mlp.yaml (tabular regression).
!python src/train.py --config configs/reg_tabular_mlp.yaml


In [None]:
# Evaluate the saved checkpoint on the validation/test splits.
!python src/evaluate.py --config configs/reg_tabular_mlp.yaml --ckpt outputs/best.pt


### Step 4 — What should I see now?
- `outputs/best.pt`: trained weights + scaler metadata.
- `outputs/log.csv`: epoch-by-epoch loss and regression metrics.
- `outputs/metrics.json`: best validation MSE.
- `outputs/eval.json`: validation (and, if configured, test) MSE/MAE/R².

If any files are missing, scroll up for errors in the training/evaluation cells.
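
The file check can also be done in a cell rather than by browsing the sidebar. A minimal sketch, assuming the four file names listed above and the `outputs` folder used by the commands in this notebook; the helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_OUTPUTS = ("best.pt", "log.csv", "metrics.json", "eval.json")


def missing_outputs(output_dir: str = "outputs") -> list[str]:
    """Return the expected output files that are not present in output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


print(missing_outputs() or "All expected output files are present.")
```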

## 3. Switch configurations
- To run the time-series variant, swap the config paths in the training/evaluation cells for `configs/reg_timeseries_lstm.yaml`.
- Update the CSV paths (`data.csv_path`) to point at your series file (must include a timestamp column).
- Re-run the smoke test and full run to validate the new setup.
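
Before pointing the time-series config at a new file, it can help to confirm that the timestamp column parses and is in chronological order. The sketch below is only an illustration: the column name `timestamp` and the ISO 8601 format are assumptions (use the name and format your file and config actually have), the helper name `timestamps_are_sorted` is hypothetical, and the starter may or may not sort rows itself.

```python
import csv
from datetime import datetime


def timestamps_are_sorted(csv_path, column: str = "timestamp") -> bool:
    """Return True if `column` parses as ISO 8601 and never goes backwards."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"{column!r} column not found in {csv_path}")
        previous = None
        for row in reader:
            # Raises ValueError if a value is not in ISO 8601 format.
            current = datetime.fromisoformat(row[column])
            if previous is not None and current < previous:
                return False
            previous = current
    return True
```

For example, `timestamps_are_sorted("data/series.csv")` with a hypothetical file path returns `False` as soon as one row is earlier than the row before it.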

Keep the same order (GPU check → install → smoke test → train → evaluate) for consistent results.