## Quick Resume (Updated Dataset Structure)

- Enable GPU (Runtime → Change runtime type → GPU) and run the GPU check.
- Mount Drive and make sure `/content/drive/MyDrive/datasets` contains `drowsy/` and `notdrowsy/`. Link it into the project with a symlink instead of copying it.

```python
# 1) Clone or pull project
%cd /content
!git clone https://github.com/hmolhem/nthu-driver-drowsiness-ROI.git || true
%cd nthu-driver-drowsiness-ROI
!git pull

# 2) Install deps (CUDA build)
!pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision -U
!pip install -r requirements.txt

# 3) Link dataset (fast, avoids Drive copy)
from pathlib import Path
root = Path('datasets')
drive = Path('/content/drive/MyDrive/datasets')  # Must contain drowsy/, notdrowsy/
if root.is_symlink() and root.resolve()==drive:
    print('Symlink already exists: datasets ->', drive)
elif root.exists() and not root.is_symlink():
    print('Local datasets directory exists; using it.')
elif drive.exists():
    root.symlink_to(drive, target_is_directory=True)
    print('✅ Symlink created: datasets -> /content/drive/MyDrive/datasets')
else:
    raise FileNotFoundError('Missing: /content/drive/MyDrive/datasets')

# 4) Patch baseline YAML to Colab-friendly workers
import yaml
p = Path('configs/baseline_resnet50.yaml')
cfg = yaml.safe_load(p.read_text()); cfg.setdefault('data', {})
cfg['data']['num_workers'] = 2; cfg['data']['data_root'] = 'datasets'; cfg['data']['pin_memory'] = True
p.write_text(yaml.safe_dump(cfg, sort_keys=False))
print('✅ Updated', p)
print('data_root now:', cfg['data']['data_root'])

# 5) Train (regularized recommended) and evaluate
!python src/training/train_baseline.py --config configs/baseline_resnet50_regularized.yaml --device cuda
!python src/eval/evaluate_checkpoint.py --config configs/baseline_resnet50_regularized.yaml --device cuda --save-fig --save-preds
```

## Step 1: Verify GPU is Available

```python
# Check GPU availability
!nvidia-smi
```

**Expected output:** A table showing the GPU (Tesla T4, L4, or similar).

**If you see an error:** Go back and enable the GPU runtime (Runtime → Change runtime type → GPU).

## Step 2: Mount Google Drive

```python
from google.colab import drive
drive.mount('/content/drive')
```

**Action required:** Click the link and authorize access to your Google Drive.

## Step 3: Clone Project from GitHub

```python
%cd /content
!git clone https://github.com/hmolhem/nthu-driver-drowsiness-ROI.git
%cd nthu-driver-drowsiness-ROI
!git status
```

**Expected:** Shows "On branch main" and a clean working tree.

## Step 4: Set Up Dataset Symlink

This links your Drive dataset into the project folder.

```python
# Link dataset from Drive (run once if the symlink is missing)
%cd /content/nthu-driver-drowsiness-ROI
import pathlib
root = pathlib.Path('datasets')
drive = pathlib.Path('/content/drive/MyDrive/datasets')  # must contain drowsy/ and notdrowsy/
if root.is_symlink() and root.resolve() == drive:
    print('Symlink already exists: datasets ->', drive)
elif root.exists() and not root.is_symlink():
    print('Local datasets directory present (not a symlink); using it.')
elif drive.exists():
    root.symlink_to(drive, target_is_directory=True)
    print('✅ Symlink created: datasets -> /content/drive/MyDrive/datasets')
else:
    raise FileNotFoundError('Drive dataset not found at /content/drive/MyDrive/datasets')
# Trailing slash lists the link target's contents instead of the symlink itself
!ls -lh datasets/ | head -5
```

**Expected:** Should list the `drowsy/` and `notdrowsy/` folders.

**If error:** Adjust the `drive` path in the cell above to match your Drive structure.
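The linking logic in the Step 4 cell can also be wrapped in a helper that checks the class folders are visible through the link, which catches a wrong Drive path before training starts. This is a minimal sketch: `link_dataset` and `REQUIRED_CLASSES` are hypothetical names, and it assumes the dataset root holds `drowsy/` and `notdrowsy/` directly, as the Quick Resume states.

```python
from pathlib import Path

# Hypothetical constant: class folders expected directly under the dataset root
REQUIRED_CLASSES = ("drowsy", "notdrowsy")


def link_dataset(root: Path, target: Path) -> Path:
    """Create a root -> target symlink if needed, then verify the class folders.

    Hypothetical helper mirroring the Step 4 cell, plus a layout check.
    """
    if root.is_symlink():
        # An existing link must point at the expected target
        if root.resolve() != target.resolve():
            raise RuntimeError(f"{root} already links to {root.resolve()}, not {target}")
        print(f"Symlink already exists: {root} -> {target}")
    elif root.exists():
        print(f"Using existing local directory: {root}")
    elif target.exists():
        root.symlink_to(target, target_is_directory=True)
        print(f"Symlink created: {root} -> {target}")
    else:
        raise FileNotFoundError(f"Dataset not found at {target}")

    # Check through the link (or local folder) that both class folders exist
    missing = [c for c in REQUIRED_CLASSES if not (root / c).is_dir()]
    if missing:
        raise FileNotFoundError(f"Missing class folders under {root}: {missing}")
    return root
```

In Colab you would call `link_dataset(Path('datasets'), Path('/content/drive/MyDrive/datasets'))` from the repository root. If `datasets` is already a symlink to some other location, it stops with an explicit message naming both paths.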

## Step 5: Patch Config for Colab

```python
# Patch baseline YAML for Colab (workers=2, data_root=datasets)
import yaml, pathlib
p = pathlib.Path('configs/baseline_resnet50.yaml')
cfg = yaml.safe_load(p.read_text())
cfg.setdefault('data', {})
cfg['data']['num_workers'] = 2
cfg['data']['data_root'] = 'datasets'
cfg['data']['pin_memory'] = True
p.write_text(yaml.safe_dump(cfg, sort_keys=False))
print('✅ Updated', p)
print('num_workers:', cfg['data']['num_workers'])
print('data_root:', cfg['data']['data_root'])
```

**Expected:** Prints the updated config path, `num_workers: 2`, and `data_root: datasets`.
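The split CSVs referenced by the configs (`data/splits/train.csv`, `val.csv`, `test.csv`) can be sanity-checked before training. Below is a stdlib-only sketch that prints each file's row count, header, and first rows. It assumes the first line of each CSV is a header, and `summarize_splits` is a hypothetical helper, not part of the repository.

```python
import csv
from pathlib import Path

SPLITS = ("train", "val", "test")


def summarize_splits(split_dir: Path = Path("data/splits"), preview_rows: int = 3) -> dict:
    """Print row count, header and first rows of each split CSV; return {split: rows}."""
    counts = {}
    for name in SPLITS:
        path = split_dir / f"{name}.csv"
        if not path.exists():
            print(f"MISSING: {path}")
            continue
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader, [])  # assumed header row
            rows = list(reader)
        counts[name] = len(rows)
        print(f"{path}: {len(rows)} rows, columns={header}")
        for row in rows[:preview_rows]:
            print("   ", row)
    return counts
```

Run `summarize_splits()` from the repository root; a split reported as `MISSING` or with zero rows is worth fixing before Step 8.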

## Step 6: Install Dependencies

```python
# Install PyTorch with CUDA support
!pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision -U

# Install other requirements
!pip install -r requirements.txt
```

**Note:** You may see some warnings about version conflicts; these are usually harmless.

## Step 7: Verify CUDA PyTorch

```python
import torch
print('PyTorch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU device:', torch.cuda.get_device_name(0))
    print(f'GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')
else:
    print('WARNING: CUDA not available! Go enable GPU runtime.')
```

**Expected:** 
- `CUDA available: True`
- GPU device: Tesla T4, L4, or similar

**If False:** Stop here and enable GPU runtime!

## Step 8: Start Training (ResNet50 Baseline)

This takes roughly 30-60 minutes, depending on the GPU.

```python
# Train with full baseline config (224px, 50 epochs with early stopping)
!python src/training/train_baseline.py --config configs/baseline_resnet50.yaml --device cuda
```

**What to expect:**
- Progress bars for each epoch
- Training loss should decrease over time
- Validation metrics printed after each epoch
- Early stopping may trigger before 50 epochs if val metric plateaus
- Checkpoints saved to `checkpoints/baseline_resnet50_best.pth` and `_last.pth`
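Early stopping as configured in `configs/baseline_resnet50_regularized.yaml` (`patience: 5`, `monitor: val_macro_f1`, `mode: max`) amounts to counting epochs without improvement. The class below is an illustrative sketch of that rule, not the repository's trainer code; `min_delta` is an extra, hypothetical knob that defaults to 0.

```python
class EarlyStopping:
    """Signal a stop once the monitored metric has not improved for `patience` epochs.

    Illustrative sketch; mode="max" suits a metric such as val_macro_f1.
    """

    def __init__(self, patience: int = 5, mode: str = "max", min_delta: float = 0.0):
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.patience = patience
        self.mode = mode
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def _improved(self, value: float) -> bool:
        if self.best is None:
            return True
        if self.mode == "max":
            return value > self.best + self.min_delta
        return value < self.best - self.min_delta

    def step(self, value: float) -> bool:
        """Record one epoch's metric; return True when training should stop."""
        if self._improved(value):
            self.best = value
            self.bad_epochs = 0  # reset the counter on a new best
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Example with made-up validation macro-F1 values
stopper = EarlyStopping(patience=5, mode="max")
for epoch, f1 in enumerate([0.70, 0.74, 0.75, 0.75, 0.74, 0.73, 0.75, 0.74], start=1):
    if stopper.step(f1):
        print(f"Early stop at epoch {epoch}; best val_macro_f1={stopper.best}")  # epoch 8, 0.75
        break
```

Note that matching the previous best (0.75 at epoch 7) does not count as an improvement here, so five non-improving epochs after epoch 3 trigger the stop.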

## Step 9: Verify Checkpoints

```python
!ls -lh checkpoints/
```

**Expected:** Should see `baseline_resnet50_best.pth` and `baseline_resnet50_last.pth` (~90 MB each)
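To catch a missing checkpoint from Python before running the evaluation cell, a small check can report file sizes. This is a sketch with a hypothetical `checkpoint_sizes` helper; it reports MiB, the same binary units `ls -lh` uses.

```python
from pathlib import Path


def checkpoint_sizes(ckpt_dir: Path, names) -> dict:
    """Return {filename: size in MiB rounded to 0.1, or None if the file is missing}."""
    sizes = {}
    for name in names:
        path = ckpt_dir / name
        sizes[name] = round(path.stat().st_size / 2**20, 1) if path.is_file() else None
    return sizes


report = checkpoint_sizes(
    Path("checkpoints"),
    ["baseline_resnet50_best.pth", "baseline_resnet50_last.pth"],
)
for name, size in report.items():
    print(f"{name}: {'MISSING' if size is None else f'{size} MiB'}")
```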

## Step 10: Evaluate on Test Set

```python
import torch
from src.models.classifier import create_model
from src.data.transforms import get_val_transforms
from src.data.dataset import create_dataloaders
from src.utils.config import get_config
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np

# Load config
cfg = get_config("configs/baseline_resnet50.yaml")
cfg.data.num_workers = 2  # Colab-friendly

# Create test dataloader
val_tf = get_val_transforms(cfg.data.image_size)
loaders = create_dataloaders(
    cfg.data.train_csv, cfg.data.val_csv, cfg.data.test_csv,
    data_root=cfg.data.data_root,
    train_transform=val_tf,
    val_transform=val_tf,
    batch_size=cfg.data.batch_size,
    num_workers=cfg.data.num_workers
)

# Load model with best checkpoint
model = create_model(cfg)
# weights_only=False lets PyTorch >= 2.6 (where the default became True) also unpickle
# non-tensor metadata stored in the checkpoint; only do this for checkpoints you created
ckpt = torch.load("checkpoints/baseline_resnet50_best.pth", map_location="cuda", weights_only=False)
model.load_state_dict(ckpt['model_state'])
model.cuda().eval()

print("Best checkpoint from epoch:", ckpt.get('epoch', 'unknown'))
print("Best val macro-F1:", ckpt.get('val_macro_f1', 'unknown'))
print("\nEvaluating on test set...")

# Run inference on test set
all_preds, all_labels = [], []
with torch.no_grad():
    for images, labels, _ in loaders['test']:
        images = images.cuda()
        outputs = model(images)
        preds = outputs.argmax(1).cpu()
        all_preds.append(preds)
        all_labels.append(labels)

all_preds = torch.cat(all_preds).numpy()
all_labels = torch.cat(all_labels).numpy()

# Print metrics
print("\n" + "="*60)
print("TEST SET RESULTS")
print("="*60)
print(classification_report(all_labels, all_preds, target_names=['notdrowsy', 'drowsy']))
print("\nConfusion Matrix:")
print(confusion_matrix(all_labels, all_preds))
print("\n[Row: True label | Column: Predicted label]")
print("[0=notdrowsy, 1=drowsy]")
```
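The `macro avg` row of `classification_report` and the `val_macro_f1` metric both follow from the confusion matrix printed above: per-class F1 comes from that class's row (true label) and column (predicted label), and the two classes are then averaged with equal weight regardless of how many samples each has. A dependency-free sketch of that arithmetic (the helper names `confusion_counts` and `macro_f1_from_cm` are hypothetical; scikit-learn remains the reference implementation):

```python
def confusion_counts(y_true, y_pred, n_classes=2):
    """Rows = true label, columns = predicted label (same layout as sklearn)."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def macro_f1_from_cm(cm):
    """Unweighted mean of per-class F1 scores computed from a confusion matrix."""
    n = len(cm)
    f1s = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(cm[k]) - tp                        # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / n


# Toy labels: 0 = notdrowsy, 1 = drowsy
cm = confusion_counts([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 0])
print(cm)                             # [[2, 1], [1, 2]]
print(round(macro_f1_from_cm(cm), 4))  # 0.6667
```

With the Step 10 variables, `confusion_counts(all_labels.tolist(), all_preds.tolist())` should match the `confusion_matrix` output.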

## Step 11: Copy Checkpoints to Drive (for backup)

```python
# Create backup directory on Drive
!mkdir -p /content/drive/MyDrive/drowsiness-results/checkpoints

# Copy checkpoints
!cp checkpoints/baseline_resnet50_best.pth /content/drive/MyDrive/drowsiness-results/checkpoints/
!cp checkpoints/baseline_resnet50_last.pth /content/drive/MyDrive/drowsiness-results/checkpoints/

print("✅ Checkpoints backed up to Drive!")
```
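Copying into a fixed folder overwrites the previous backup each time you retrain. One alternative is a timestamped folder per run; this sketch uses only `shutil` and `pathlib`, and `backup_checkpoints` is a hypothetical helper name.

```python
import shutil
import time
from pathlib import Path


def backup_checkpoints(src_dir: Path, dest_root: Path, pattern: str = "*.pth") -> list:
    """Copy checkpoints matching `pattern` into a new timestamped folder under dest_root."""
    files = sorted(src_dir.glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files matching {pattern} in {src_dir}")
    dest = dest_root / time.strftime("%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves modification times, which helps tell runs apart later
    copied = [Path(shutil.copy2(f, dest / f.name)) for f in files]
    print(f"Copied {len(copied)} file(s) to {dest}")
    return copied
```

For this run: `backup_checkpoints(Path('checkpoints'), Path('/content/drive/MyDrive/drowsiness-results/checkpoints'), 'baseline_resnet50_*.pth')`.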

## Step 12: Write the Regularized Config

This writes `configs/baseline_resnet50_regularized.yaml` (frozen backbone, dropout 0.7, learning rate 5e-5, class-weighted loss, early stopping on `val_macro_f1`), which the regularized training and evaluation cells below use.

```python
# Write regularized config (updated data_root)
from pathlib import Path
regularized_yaml = Path('configs/baseline_resnet50_regularized.yaml')
regularized_yaml.write_text('''\
model:
  name: "resnet50"
  architecture: "resnet50"
  pretrained: true
  num_classes: 2
  freeze_backbone: true
  dropout: 0.7

data:
  data_root: "datasets"
  train_csv: "data/splits/train.csv"
  val_csv: "data/splits/val.csv"
  test_csv: "data/splits/test.csv"
  image_size: 224
  batch_size: 32
  num_workers: 2
  pin_memory: true

augmentation:
  enabled: true

training:
  epochs: 30
  optimizer: "adam"
  learning_rate: 0.00005
  weight_decay: 0.0005
  lr_scheduler:
    type: "reduce_on_plateau"
    mode: "max"
    factor: 0.5
    patience: 4
    min_lr: 0.000001
  loss:
    type: "weighted_cross_entropy"
    use_class_weights: true
  early_stopping:
    enabled: true
    patience: 5
    monitor: "val_macro_f1"
    mode: "max"
  gradient_clipping:
    enabled: true
    max_norm: 1.0

logging:
  experiment_name: "baseline_resnet50_regularized"
  log_dir: "runs"
  save_dir: "checkpoints"
  save_best_only: true
  save_last: true

seed: 42
device: "cuda"
''')
print(f'Wrote {regularized_yaml}')
```
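The config sets `loss.type: weighted_cross_entropy` with `use_class_weights: true`, but how the training script derives the weights is not shown here. A common choice, and the formula behind scikit-learn's `class_weight="balanced"`, is inverse frequency: `w_c = N / (C * n_c)` for `N` samples, `C` classes and `n_c` samples in class `c`. A sketch assuming that formula (the repository may compute its weights differently; `balanced_class_weights` is a hypothetical name):

```python
from collections import Counter


def balanced_class_weights(labels, num_classes=2):
    """w_c = N / (C * n_c): rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for c in range(num_classes):
        if counts[c] == 0:
            raise ValueError(f"class {c} has no samples")
        weights.append(total / (num_classes * counts[c]))
    return weights


# 300 notdrowsy (0) vs 100 drowsy (1) training labels, made up for illustration
print(balanced_class_weights([0] * 300 + [1] * 100))
```

The resulting list could be passed to PyTorch as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))`; with a 3:1 imbalance the minority class gets three times the weight of the majority class.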

## Next Steps

After successful training:

1. **Download checkpoints** to your local machine
2. **Train EfficientNet-B0 baseline** for comparison:
   ```python
   !python src/training/train_baseline.py --config configs/baseline_efficientnet.yaml --device cuda
   ```
3. **Compare results** and create performance tables
4. **Begin ROI implementation** (next phase)

---

**Troubleshooting:**
- **CUDA not available:** Enable GPU runtime
- **Dataset not found:** Check the `drive` path in the Step 4 symlink cell
- **Out of memory:** Reduce `data.batch_size` in the config
- **Import errors:** Re-run pip install cell

```python
# Evaluate regularized model (checkpoint evaluator); needs the checkpoint produced by cell 14 below
!python src/eval/evaluate_checkpoint.py --config configs/baseline_resnet50_regularized.yaml --device cuda --save-fig --save-preds
```

```python
# 14. Train Regularized Baseline (GPU)
!python src/training/train_baseline.py --config configs/baseline_resnet50_regularized.yaml --device cuda
```

```python
# 15. Evaluate Regularized Model on Test Set (metrics + plots)
!python src/eval/evaluate_model.py --config configs/baseline_resnet50_regularized.yaml --device cuda --save-fig --save-preds
```

```python
# 16. Save & Download Regularized Run Artifacts
!zip -r baseline_results_regularized.zip checkpoints/baseline_resnet50_regularized_* runs/baseline_resnet50_regularized/ -q && echo "zip complete" || echo "zip failed: check that the paths exist"
from google.colab import files
files.download('baseline_results_regularized.zip')
```
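The shell `zip` call gives little feedback when a glob matches nothing. A pure-Python alternative uses `zipfile`, walks matched directories recursively, and raises if no files matched; `zip_artifacts` is a hypothetical helper name, and the patterns are interpreted by `pathlib.Path.glob` relative to `base`.

```python
import zipfile
from pathlib import Path


def zip_artifacts(zip_path: Path, patterns, base: Path = Path(".")) -> int:
    """Zip files matched by glob patterns under base; matched directories are added recursively.

    Returns the number of files written; raises (and removes the empty archive) if none matched.
    """
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for pattern in patterns:
            for match in sorted(base.glob(pattern)):
                if match.is_file():
                    files = [match]
                else:
                    files = sorted(p for p in match.rglob("*") if p.is_file())
                for f in files:
                    # Store paths relative to base so the archive mirrors the project layout
                    zf.write(f, arcname=str(f.relative_to(base)))
                    count += 1
    if count == 0:
        Path(zip_path).unlink()  # do not leave an empty archive behind
        raise FileNotFoundError(f"No files matched any of {list(patterns)}")
    return count
```

For the regularized run: `zip_artifacts(Path('baseline_results_regularized.zip'), ['checkpoints/baseline_resnet50_regularized_*', 'runs/baseline_resnet50_regularized'])`, then download the archive with `files.download(...)` as above.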

```python
# 17. Train EfficientNet-B0 Baseline (GPU)
!python src/training/train_baseline.py --config configs/baseline_efficientnet.yaml --device cuda
```

```python
# 18. Evaluate EfficientNet-B0 on Test Set (metrics + plots)
!python src/eval/evaluate_model.py --config configs/baseline_efficientnet.yaml --device cuda --save-fig --save-preds
```

```python
# 19. Save & Download EfficientNet-B0 Artifacts
!zip -r efficientnet_b0_results.zip checkpoints/baseline_efficientnet_b0_* runs/baseline_efficientnet_b0/ -q && echo "zip complete" || echo "zip failed: check that the paths exist"
from google.colab import files
files.download('efficientnet_b0_results.zip')
```

```python
# 20. Compare Baselines: ResNet50 vs Regularized vs EfficientNet-B0
from pathlib import Path
import json, pandas as pd, numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='whitegrid')

experiments = [
    ("ResNet50 (baseline)", Path('runs/baseline_resnet50/metrics_test.json')),
    ("ResNet50 (regularized)", Path('runs/baseline_resnet50_regularized/metrics_test.json')),
    ("EfficientNet-B0", Path('runs/baseline_efficientnet_b0/metrics_test.json')),
]

rows = []
missing = []
for name, mpath in experiments:
    if mpath.exists():
        with open(mpath, 'r', encoding='utf-8') as f:
            m = json.load(f)
        roc_auc = m.get('roc_auc', {})
        if isinstance(roc_auc, dict) and roc_auc:
            roc_macro = float(np.mean([float(v) for v in roc_auc.values()]))
        else:
            roc_macro = np.nan
        rows.append({
            'experiment': name,
            'accuracy': float(m.get('accuracy', np.nan)),
            'macro_f1': float(m.get('f1_macro', np.nan)),
            'precision_macro': float(m.get('precision_macro', np.nan)),
            'recall_macro': float(m.get('recall_macro', np.nan)),
            'roc_auc_macro': roc_macro,
            'source': str(mpath),
        })
    else:
        missing.append((name, str(mpath)))

if missing:
    print("Missing metrics (run those evaluations first):")
    for n, p in missing:
        print(f" - {n}: {p}")

df = pd.DataFrame(rows)
if not df.empty:
    display(df.sort_values('macro_f1', ascending=False).reset_index(drop=True))
    # Plot Macro-F1 and Accuracy
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    order = df.sort_values('macro_f1', ascending=False)['experiment']
    sns.barplot(data=df, x='experiment', y='macro_f1', order=order, ax=axes[0], palette='Blues_d')
    axes[0].set_title('Macro-F1 (Test)')
    axes[0].set_ylim(0, 1)
    axes[0].set_xlabel('')
    axes[0].set_ylabel('Macro-F1')
    axes[0].tick_params(axis='x', rotation=20)

    sns.barplot(data=df, x='experiment', y='accuracy', order=order, ax=axes[1], palette='Greens_d')
    axes[1].set_title('Accuracy (Test)')
    axes[1].set_ylim(0, 1)
    axes[1].set_xlabel('')
    axes[1].set_ylabel('Accuracy')
    axes[1].tick_params(axis='x', rotation=20)

    fig.tight_layout()
    out_dir = Path('reports/figures')
    out_dir.mkdir(parents=True, exist_ok=True)
    out_png = out_dir / 'comparison_baselines.png'
    fig.savefig(out_png, dpi=150)
    print(f'Saved comparison figure to {out_png}')
else:
    print('No metrics found to compare yet.')
```

## Speed Up: Copy Dataset to Local SSD
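On Colab, reading from Drive symlinks is slow. Copy the dataset once to the VM's local disk for faster DataLoader performance. This section and the workers patch below expect the dataset under `/content/drive/MyDrive/datasets/archive` and set `data_root: datasets/archive`; if your Drive folder holds `drowsy/` and `notdrowsy/` directly (the Quick Resume layout), keep `data_root: datasets`.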

```python
# Copy dataset from Drive to local (run once per session)
%cd /content/nthu-driver-drowsiness-ROI
import shutil, pathlib
local_root = pathlib.Path('datasets')
archive_link = local_root / 'archive'
drive_archive = pathlib.Path('/content/drive/MyDrive/datasets/archive')
if local_root.is_symlink():
    # A Step 4 symlink would send the "local" copy back into Drive; replace it with a real folder
    local_root.unlink()
local_root.mkdir(parents=True, exist_ok=True)
if archive_link.is_symlink():
    archive_link.unlink()
if not archive_link.exists():
    if drive_archive.exists():
        print('Copying dataset from Drive to local... (may take several minutes)')
        shutil.copytree(drive_archive, archive_link)
        print('✅ Copied to datasets/archive')
    else:
        raise FileNotFoundError('Drive dataset not found at /content/drive/MyDrive/datasets/archive')
else:
    print('Local datasets/archive already exists; skipping copy.')
!ls -lh datasets | head -5
```
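Because the copy cell skips copying whenever `datasets/archive` already exists, an interrupted `copytree` leaves a partial dataset that later runs will silently use. Comparing file counts and total bytes between the Drive source and the local copy catches this. A sketch with hypothetical helpers `tree_stats` and `verify_copy`; walking the Drive side is itself slow, so run it once after copying rather than every session.

```python
from pathlib import Path


def tree_stats(root: Path):
    """Return (file_count, total_bytes) over every regular file under root."""
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def verify_copy(src: Path, dst: Path) -> bool:
    """True if src and dst contain the same number of files with the same total size."""
    src_stats, dst_stats = tree_stats(src), tree_stats(dst)
    print(f"source: {src_stats[0]} files, {src_stats[1] / 2**20:.1f} MiB")
    print(f"local:  {dst_stats[0]} files, {dst_stats[1] / 2**20:.1f} MiB")
    return src_stats == dst_stats
```

If `verify_copy(Path('/content/drive/MyDrive/datasets/archive'), Path('datasets/archive'))` returns `False`, delete the local folder with `shutil.rmtree('datasets/archive')` and rerun the copy cell. Matching counts and sizes is a cheap heuristic, not a checksum comparison.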

## Set DataLoader Workers to 2 (Colab-friendly)
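PyTorch's DataLoader warns when `num_workers` exceeds the worker count it suggests for the machine, and Colab VMs typically have only 2 CPUs. This cell patches the baseline YAML to use 2 workers and the local dataset path.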

```python
import yaml, pathlib
p = pathlib.Path('configs/baseline_resnet50.yaml')
cfg = yaml.safe_load(p.read_text())
cfg.setdefault('data', {})
cfg['data']['num_workers'] = 2
cfg['data']['data_root'] = 'datasets/archive'
cfg['data']['pin_memory'] = True
p.write_text(yaml.safe_dump(cfg, sort_keys=False))
print('✅ Updated', p)
print('num_workers:', cfg['data']['num_workers'])
print('data_root:', cfg['data']['data_root'])
```

## Resume Tomorrow: Quick Checklist
- Step 1: Enable GPU and run the GPU check.
- Step 2: Mount Drive.
- Step 3: Clone repo (or `git pull` if it exists).
- Step 4: Run the copy-to-local cell to ensure `datasets/archive` exists locally.
- Step 5: Ensure `num_workers: 2` in YAML (patch cell above).
- Step 6: Start training (regularized config recommended):
  - `!python src/training/train_baseline.py --config configs/baseline_resnet50_regularized.yaml --device cuda`
- Optional: Evaluate best checkpoint when done:
  - `!python src/eval/evaluate_model.py --config configs/baseline_resnet50_regularized.yaml --device cuda --save-fig --save-preds`
- Tip: If time is short, try `configs/baseline_efficientnet.yaml` for a lighter model.