# CV Starter — Colab/Kaggle Quickstart

> Author : Badr TAJINI

**Academic year:** 2025–2026  
**School:** ECE  
**Course:** Machine Learning & Deep Learning 2 

---


Welcome! This notebook is intentionally beginner-friendly. Follow the steps exactly and you will confirm that the starter project works on a free GPU runtime.

### Before you run anything
1. **Open the notebook in Google Colab or Kaggle.**
2. **Change the hardware accelerator to GPU (T4 preferred).**
   * Colab: `Runtime` → `Change runtime type` → Hardware accelerator `GPU` → Save.
   * Kaggle: `Settings` (gear icon) → Turn on `Accelerator` → Choose `T4 x1`.
3. Once the GPU is enabled, run the cells **from top to bottom**. Every code cell has comments explaining what it does.

The notebook will: (a) check that a GPU is available, (b) install dependencies, (c) run a quick smoke test that loads CIFAR-10 and performs one training step, and (d) show you how to launch the full training/evaluation commands when you are ready for longer experiments.

## 0. Project files

- Option A: `git clone <your_repo_url> cv-project`
- Option B: Upload the `cv-project` folder via the left sidebar (on Colab, ensure the root is `/content/cv-project`).

Afterwards, run the Step 1 cell further down; it raises a `FileNotFoundError` with a clear message if the folder is missing.

### Quick checklist before running code

- **Colab:** Go to `Runtime → Change runtime type`, pick `GPU`, and click **Save**.
- **Kaggle:** Open the gear icon in the top-right, enable **Accelerator**, and choose `T4 x1`.
- Wait for the runtime to restart (Colab shows `Connected` again).
- Then run every cell in order. If you see an error, stop, read the message, and re-run the cell after fixing the issue.

Once the smoke test succeeds you can run the full training and evaluation commands shown at the end of the notebook.

### How to run a cell
- Click the little ▶️ button on the left of a cell, or press **Shift + Enter** (runs the cell and moves to the next one) or **Ctrl + Enter** (runs the cell in place). Both shortcuts work in Colab and Kaggle.
- Wait for the cell to finish (a number like `[1]` appears once it is done).
- If a cell shows an error, read the message, fix the issue, and re-run that same cell before moving forward.


### Step 0 — Confirm the GPU is ready
Run the next cell. You should see a table with GPU details (name + memory).
If you get the message `nvidia-smi unavailable (CPU runtime)`, the runtime is still on CPU. Go back to the checklist above, switch it to GPU, then rerun the cell.


In [None]:
!nvidia-smi || echo "nvidia-smi unavailable (CPU runtime)"
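
If you would rather run the same check from Python, the sketch below uses only the standard library. The helper name `gpu_visible` is illustrative and not part of the starter project. It only reports whether `nvidia-smi` is present and exits successfully, which is a reasonable proxy for "a GPU runtime is attached".

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is on PATH and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # typical for a CPU-only runtime
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print("GPU runtime detected" if gpu_visible() else "No GPU detected (CPU runtime)")
```

A `False` result means the same thing as the `nvidia-smi unavailable` message: switch the accelerator to GPU and rerun.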

### Step 1 — Point the notebook at the project folder
This cell locates the `cv-project` directory, switches the working directory to it, and adds its `src` folder to Python's import path.
If it raises a `FileNotFoundError`, no `src` folder was found at the detected location, so you likely uploaded the project to a different place. Use the file browser on the left to confirm the path, fix it, and rerun the cell.


In [None]:
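import os
import sys
from pathlib import Path

# Start from the current working directory (/content on Colab, /kaggle/working on Kaggle).
PROJECT_ROOT = Path.cwd().resolve()
if PROJECT_ROOT.name == "notebooks":
    # Notebook launched from inside the repo: the project root is one level up.
    PROJECT_ROOT = PROJECT_ROOT.parent
elif not (PROJECT_ROOT / "src").exists() and (PROJECT_ROOT / "cv-project").is_dir():
    # Project cloned or uploaded into the working directory.
    PROJECT_ROOT = PROJECT_ROOT / "cv-project"

# The starter keeps its code in src/; without it this is not the project root.
if not (PROJECT_ROOT / "src").is_dir():
    raise FileNotFoundError(
        f"Could not locate project root at {PROJECT_ROOT}. Clone or upload cv-project before proceeding."
    )

os.chdir(PROJECT_ROOT)
# Make both `from src import ...` and direct imports of modules inside src/ work.
for path in (PROJECT_ROOT, PROJECT_ROOT / "src"):
    if str(path) not in sys.path:
        sys.path.append(str(path))
print(f"Project root: {PROJECT_ROOT}")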

### Step 2 — Install the project requirements
This command reads `requirements.txt` and installs the packages listed there, the same set you would install locally.
Expect a lot of text output; that is normal. If installation fails, read the last lines of the output for the error, then run the cell again before moving forward.


In [None]:
# Install project dependencies listed in requirements.txt
!pip install -r requirements.txt
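
To confirm the install worked without scrolling through pip's output, a rough check can compare `requirements.txt` against the installed packages. This is a minimal sketch, not part of the starter: `missing_requirements` is a hypothetical helper, and it only understands simple lines such as `name==1.2` or `name>=1.0` (pip options and URL-based requirements are skipped or handled naively).

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(req_file="requirements.txt", lookup=metadata.version):
    """Return package names listed in req_file that are not installed."""
    missing = []
    for raw in Path(req_file).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r / -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            lookup(name)  # raises PackageNotFoundError if not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if Path("requirements.txt").exists():
    print("Missing packages:", missing_requirements() or "none")
```

If the list is not empty, rerun the install cell and look for the first error mentioning one of those names.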

### Step 3 — Run the smoke test
This quick check downloads CIFAR-10 (if needed), runs one mini-batch through the model, and saves `outputs/smoke_metrics.json`.
You should see a short JSON output such as `{"loss": ..., "batch_size": 64, ...}`.
If you hit a download or network error, wait a few seconds and re-run the cell; Colab sometimes needs a second try.


In [None]:
from src import smoke_check

smoke_path = smoke_check.run_smoke("configs/cv_cifar10_fast.yaml")
print(smoke_path.read_text())
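
Beyond reading the JSON by eye, you can sanity-check it in code. The sketch below is an illustration only: `smoke_problems` is a hypothetical helper, the `"device"` key name is an assumption (the notebook only says the output includes loss, batch size, and device), and the example values are made up.

```python
import json
import math


def smoke_problems(metrics: dict) -> list:
    """Return human-readable problems found in a smoke-test metrics dict."""
    problems = []
    loss = metrics.get("loss")
    if not isinstance(loss, (int, float)) or not math.isfinite(loss):
        problems.append(f"loss is not a finite number: {loss!r}")
    # Assumption: the device is stored under a "device" key as a string.
    device = str(metrics.get("device", ""))
    if not device.startswith("cuda"):
        problems.append(f"device is {device!r}, expected a CUDA device")
    return problems


# Made-up example values, shaped like the output described above.
example = json.loads('{"loss": 2.31, "batch_size": 64, "device": "cuda"}')
print(smoke_problems(example) or "smoke metrics look fine")
```

In the notebook you would call it on the real file, for example `smoke_problems(json.loads(smoke_path.read_text()))`.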

> **Why run the smoke test?**
>
> The smoke test is a safety check before long training. It loads CIFAR-10, runs a single forward/backward pass, and writes `outputs/smoke_metrics.json`. If this succeeds, you know:
>
> - the dataset can be downloaded/read,
> - the model builds and runs on your GPU,
> - dependencies are installed correctly.
>
> In short, if it fails, fix the error (missing folder, bad install, no GPU) before investing time in a full training run.
>
> *The full run ends with `!python src/evaluate.py ...`; Step 4 below lists the files it should produce.*

## 1. Review smoke-test output
- Confirm the previous cell printed a JSON block (loss, batch size, device).
- You should now see `outputs/smoke_metrics.json` in the file browser on the left.
- Need only a quick check? You can stop here. Ready for real training? Continue to Section 2 below.
- If anything failed, read the error message carefully, fix the issue, and re-run the smoke cell before moving on.


## 2. Full training run (optional)
Only run these cells when you want the complete experiment. If CIFAR-10 has not been downloaded yet, the first run also downloads it, so the first epoch may start slowly.

**Before running:**
1. Open `configs/cv_cifar10.yaml` (Menu → File → Open …) if you want to change hyperparameters.
2. Ensure the runtime still shows a GPU connection.
3. Close other browser tabs or notebooks to avoid memory pressure.


In [None]:
# Train the model using the main configuration file. Expect visible progress bars.
# The first time you run this it will also download CIFAR-10, so the progress bar
# might pause around 0% while data is fetched.
!python src/train.py --config configs/cv_cifar10.yaml

In [None]:
# Evaluate the best checkpoint produced during training.
# This will print accuracy and write the metrics/plots listed below.
!python src/evaluate.py --config configs/cv_cifar10.yaml --ckpt outputs/best.pt

### Step 4 — What should I see now?
- `outputs/best.pt`: saved checkpoint.
- `outputs/log.csv`: training history (loss/accuracy per epoch).
- `outputs/eval.json`, `per_class_metrics.csv`, `confusion_matrix.png`, `leaderboard.png`: evaluation artefacts.

If any of these files are missing, scroll up for errors in the training/evaluation cells.
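
The file check above can also be scripted. This is a minimal sketch under one assumption: all six artefacts are written to the `outputs/` folder (the list above only gives the full path for some of them). `missing_artifacts` is a hypothetical helper, not part of the starter.

```python
from pathlib import Path

# File names taken from the list above; the shared outputs/ location is assumed.
EXPECTED = [
    "best.pt",
    "log.csv",
    "eval.json",
    "per_class_metrics.csv",
    "confusion_matrix.png",
    "leaderboard.png",
]


def missing_artifacts(output_dir="outputs"):
    """Return the expected file names that are not present in output_dir."""
    out = Path(output_dir)
    return [name for name in EXPECTED if not (out / name).is_file()]


print("Missing:", missing_artifacts() or "nothing")
```

If a file is reported missing but you can see it elsewhere in the file browser, adjust `output_dir` rather than rerunning training.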


## 3. Mirror this workflow for other tracks

1. Duplicate this notebook and rename it (e.g., `00_nlp_quickstart.ipynb`).
2. Copy the corresponding starter folder into your Colab/Kaggle workspace (`nlp-project`, `od-project`, `ts-project`).
3. Update the install cell so it matches that starter's `requirements.txt`.
4. Replace `from src import smoke_check` with the helper module provided in the new starter (each repo ships with one).
5. Point the train/eval commands at the new `configs/*.yaml` file.
6. Optionally tweak the markdown text so instructions mention the right dataset and metrics.

By following the same structure—GPU check → install → smoke test → full run—students can master all four tracks with a consistent workflow.