# Colab Quickstart (5-10 min)

This notebook is an executable entry point to the repository.

It runs the full end-to-end pipeline on a tiny **fixture** dataset (no NinaPro download):
`prepare -> splits -> traineval -> report`, then runs `size` for quick sizing estimates.

Outputs are written to `runs/colab_quickstart/` so your working tree stays clean.

> This is a tutorial run (`--profile colab_quickstart`), not a benchmark for reporting results.


In [None]:
import os
import subprocess
import sys
from pathlib import Path


def sh(cmd, cwd=None):
    """Echo a command, then run it; raises CalledProcessError on a non-zero exit."""
    print("+", " ".join(cmd))
    subprocess.check_call(cmd, cwd=cwd)


IN_COLAB = "google.colab" in sys.modules
REPO_URL = "https://github.com/geronimobergk/semg-protocol-sensitivity.git"


def find_repo_root(start: Path) -> Path:
    for candidate in [start] + list(start.parents):
        if (candidate / "pyproject.toml").exists() and (candidate / "configs").exists():
            return candidate
    raise FileNotFoundError(f"Repo root not found from: {start}")


if IN_COLAB:
    REPO_ROOT = Path("/content/semg-protocol-sensitivity")
    if not REPO_ROOT.exists():
        sh(["git", "clone", REPO_URL, str(REPO_ROOT)])
else:
    REPO_ROOT = find_repo_root(Path.cwd())

sh([sys.executable, "-m", "pip", "install", "-e", "."], cwd=REPO_ROOT)

os.chdir(REPO_ROOT)
print("Repo root:", REPO_ROOT)


In [None]:
# Sanity check: report the Python and PyTorch versions and whether CUDA is available.
import torch

print(
    "python:",
    sys.version.split()[0],
    "| torch:",
    torch.__version__,
    "| cuda:",
    torch.cuda.is_available(),
)


## Configure outputs (via `--overrides`)

The base experiment config writes to `artifacts/`, `runs/`, and `reports/`. For a tutorial run, we redirect **all** outputs under `runs/colab_quickstart/`.

This keeps committed report files untouched while still showcasing the full pipeline.

In [None]:
from pathlib import Path


OUT_ROOT = (REPO_ROOT / "runs" / "colab_quickstart").resolve()
OUT_ROOT.mkdir(parents=True, exist_ok=True)
overrides_path = OUT_ROOT / "overrides_colab_quickstart.yaml"
overrides_path.write_text(
    """experiment:
  artifacts_root: "{artifacts_root}"
  runs_root: "{runs_root}"
  reports_root: "{reports_root}"
""".format(
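        # Forward slashes keep backslashes out of the double-quoted YAML strings
        # (relevant when running locally on Windows).
        artifacts_root=(OUT_ROOT / "artifacts").as_posix(),
        runs_root=(OUT_ROOT / "runs").as_posix(),
        reports_root=(OUT_ROOT / "reports").as_posix(),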
    ),
    encoding="utf-8",
)

print("Overrides written to:", overrides_path)
print("Outputs root:", OUT_ROOT)


## Run the tiny end-to-end pipeline

This executes:

- `prepare` (fixture preprocessing)
- `splits` (pooled rep-disjoint + LOSO)
- `traineval` (tiny CNN, capped steps)
- `report` (aggregated tables)

All four stages run with `--profile colab_quickstart`.


In [None]:
import subprocess
import sys

BASE_CONFIG = str(REPO_ROOT / "configs/experiments/protocol_sensitivity_semg_cnn.yml")
PROFILE = "colab_quickstart"
CLI = [sys.executable, "-m", "tinyml_semg_classifier.cli"]

cmd = CLI + [
    "run",
    "-c",
    BASE_CONFIG,
    "--profile",
    PROFILE,
    "--overrides",
    str(overrides_path),
]
print("Running:", " ".join(cmd))
subprocess.run(cmd, check=True)


## Inspect outputs

We print the generated protocol tables and one example `metrics.json` to confirm the pipeline produced results end-to-end.

In [None]:
import json
from pathlib import Path

reports_root = OUT_ROOT / "reports"
tables_path = reports_root / "protocol_tables.md"

print("protocol_tables.md ->", tables_path)
print(tables_path.read_text(encoding="utf-8"))

metrics_paths = sorted((OUT_ROOT / "runs").rglob("metrics.json"))
print("metrics.json files:", len(metrics_paths))
if metrics_paths:
    sample = metrics_paths[0]
    print("Example run ->", sample)
    payload = json.loads(sample.read_text(encoding="utf-8"))
    print(json.dumps(payload, indent=2)[:2000])
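
For an overview across all runs rather than a single example, the sketch below maps each `metrics.json` to its top-level keys. It assumes only that each file contains valid JSON; the helper name `summarize_metrics` is hypothetical and not part of the repository, and no particular metric names are assumed.

```python
import json
from pathlib import Path


def summarize_metrics(runs_root):
    """Map each metrics.json (path relative to runs_root) to its sorted top-level keys.

    Hypothetical helper for illustration; not part of the repository.
    """
    runs_root = Path(runs_root)
    summary = {}
    for path in sorted(runs_root.rglob("metrics.json")):
        payload = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: the payload is usually a JSON object; anything else yields no keys.
        keys = sorted(payload) if isinstance(payload, dict) else []
        summary[path.relative_to(runs_root).as_posix()] = keys
    return summary
```

In this notebook it could be called as `summarize_metrics(OUT_ROOT / "runs")` and the resulting dictionary printed one entry per line.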

## Sizing: `size`

`size` benchmarks a few steps, probes concurrency, and estimates wall-time plus resources.

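The run below is kept tiny and CPU-only (`--device cpu`, one warmup step, five benchmark steps each for training and validation) so it finishes quickly on Colab.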


In [None]:
import json
import subprocess
import sys

cmd_size = CLI + [
    "size",
    "-c",
    BASE_CONFIG,
    "--profile",
    PROFILE,
    "--overrides",
    str(overrides_path),
    "--warmup-steps",
    "1",
    "--bench-train-steps",
    "5",
    "--bench-val-steps",
    "5",
    "--device",
    "cpu",
    "--max-k",
    "1",
    "--max-gpus",
    "1",
    "--alpha",
    "1.0",
]
print("Running:", " ".join(cmd_size))
subprocess.run(cmd_size, check=True)

sizing_path = OUT_ROOT / "artifacts" / "sizing" / "sizing.json"
sizing = json.loads(sizing_path.read_text(encoding="utf-8"))

print("sizing.json ->", sizing_path)
print(json.dumps(sizing, indent=2)[:2000])

recommendation = sizing.get("recommendation") or {}
if recommendation:
    print("Recommendation:", recommendation)

walltime_by_gpus = sizing.get("walltime_by_gpus") or []
if walltime_by_gpus:
    print("Walltime by GPUs:", walltime_by_gpus)


## Next steps

- Switch `PROFILE` to `smoke` or `dev_mini` for a larger fixture run.
- Run without a profile (or with `dry_run`) for real NinaPro data.
- Use `configs/experiments/protocol_sensitivity_semg_cnn.yml` to change protocols, models, or seeds.
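
Switching profiles only changes the `--profile` argument of the `run` command. As a rough sketch, the hypothetical helper `build_run_cmd` below assembles that command for any profile name. It uses only the flags already shown in this notebook (`-c`, `--profile`, `--overrides`); treating an omitted `--profile` as the no-profile case is an assumption about the CLI, not something confirmed here.

```python
import sys


def build_run_cmd(config, profile=None, overrides=None):
    """Assemble the CLI `run` command as an argument list (hypothetical helper)."""
    cmd = [sys.executable, "-m", "tinyml_semg_classifier.cli", "run", "-c", str(config)]
    if profile is not None:
        # Assumption: leaving out --profile runs the base config without a profile.
        cmd += ["--profile", profile]
    if overrides is not None:
        cmd += ["--overrides", str(overrides)]
    return cmd
```

For example, `build_run_cmd(BASE_CONFIG, profile="smoke", overrides=overrides_path)` yields a list that can be passed to `subprocess.run(..., check=True)` in the same way as the earlier cell.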
