# Board Detector Training — YOLOv8s on Google Colab T4

**What this notebook does:**
1. Mounts your Google Drive (upload `chess_boards/` there first; see *Before you run* below).
2. Installs `ultralytics`.
3. Rewrites `data.yaml` so its paths point to the Colab filesystem.
4. Verifies the GPU and counts images and labels per split.
5. Trains a YOLOv8s board-detection model for up to 100 epochs at 640×640, with early stopping.
6. Prints validation metrics (mAP50, mAP50-95, precision, recall).
7. Evaluates the model on the held-out test split.
8. Shows predictions on a few test images.
9. Downloads `best.pt`; save it as **`board_detector.pt`** and drop it into `python-ml-service/models/`.

---
### Before you run
* Make sure **Runtime → Change runtime type → GPU (T4)** is selected.
* Upload the entire `chess_boards/` folder to **My Drive** root (i.e. `/content/drive/MyDrive/chess_boards/`).


---
## Step 1 — Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

import os
BOARD_DATASET_ROOT = '/content/drive/MyDrive/chess_boards'
assert os.path.isdir(BOARD_DATASET_ROOT), (
    f"Dataset folder not found at {BOARD_DATASET_ROOT}. "
    "Upload chess_boards/ to the root of My Drive."
)
print('Drive mounted. Dataset root confirmed.')
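
The assert above only checks that the folder exists. A slightly stricter sketch, assuming the Roboflow YOLOv8 export layout that the later cells rely on (`data.yaml` plus `train/`, `valid/` and `test/`, each with `images/` and `labels/`), lists anything missing before training starts. `missing_layout_entries` is a hypothetical helper, not part of any library.

```python
import os

# Expected layout, assumed from the paths used later in this notebook
EXPECTED_FILES = ('data.yaml',)
EXPECTED_DIRS = tuple(
    os.path.join(split, sub)
    for split in ('train', 'valid', 'test')
    for sub in ('images', 'labels')
)

def missing_layout_entries(root):
    """Return the relative paths under `root` that are missing (an empty list means OK)."""
    missing = [p for p in EXPECTED_FILES if not os.path.isfile(os.path.join(root, p))]
    missing += [p for p in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, p))]
    return missing

# Usage in Colab:
# problems = missing_layout_entries(BOARD_DATASET_ROOT)
# assert not problems, f'Missing from dataset: {problems}'
```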

---
## Step 2 — Install ultralytics

In [None]:
!pip install ultralytics -q
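
`pip install ultralytics` pulls whatever release is current, and attribute names such as `box.map50` or `box.mp` used in Steps 6 and 7 could change between releases. A minimal sketch for recording the installed version with the standard library's `importlib.metadata`, so the notebook can later be pinned (`pip install ultralytics==<that version>`); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# In Colab, after the pip cell:
# print('ultralytics', installed_version('ultralytics'))
```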

---
## Step 3 — Rewrite data.yaml with absolute Colab paths

In [None]:
import yaml

DATA_YAML_PATH = os.path.join(BOARD_DATASET_ROOT, 'data.yaml')

# Read the existing YAML first so any Roboflow metadata in it is preserved
with open(DATA_YAML_PATH) as f:
    data = yaml.safe_load(f)

# Overwrite the path keys with absolute Colab paths
data['train'] = os.path.join(BOARD_DATASET_ROOT, 'train', 'images')
data['val']   = os.path.join(BOARD_DATASET_ROOT, 'valid', 'images')
data['test']  = os.path.join(BOARD_DATASET_ROOT, 'test',  'images')

# Ensure nc and names are correct for board detection
data['nc']    = 1
data['names'] = ['chessboard']

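# Note: this overwrites data.yaml on Drive in place; sort_keys=False keeps the original key order
with open(DATA_YAML_PATH, 'w') as f:
    yaml.dump(data, f, default_flow_style=False, sort_keys=False)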

# Print the final yaml so you can visually verify
with open(DATA_YAML_PATH) as f:
    print(f.read())
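
Forcing `nc: 1` and `names: ['chessboard']` is only consistent if every label line uses class id `0`; an export that still contains other classes would leave ids of 1 or more in the `.txt` files. A small sketch that scans YOLO-format label lines (`class cx cy w h`, coordinates normalized to 0..1) and reports ids outside `range(nc)`. `bad_class_ids` and `scan_label_dir` are hypothetical helpers.

```python
import os

def bad_class_ids(label_lines, nc=1):
    """Return the set of class ids in YOLO label lines that fall outside range(nc)."""
    bad = set()
    for line in label_lines:
        parts = line.split()
        if not parts:                 # skip blank lines
            continue
        cls = int(float(parts[0]))    # first field is the class id
        if not 0 <= cls < nc:
            bad.add(cls)
    return bad

def scan_label_dir(label_dir, nc=1):
    """Apply bad_class_ids to every .txt file in label_dir; map file name -> bad ids."""
    report = {}
    for name in sorted(os.listdir(label_dir)):
        if name.endswith('.txt'):
            with open(os.path.join(label_dir, name)) as f:
                bad = bad_class_ids(f, nc)
            if bad:
                report[name] = bad
    return report

# Usage in Colab:
# for split in ('train', 'valid', 'test'):
#     print(split, scan_label_dir(os.path.join(BOARD_DATASET_ROOT, split, 'labels')))
```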

---
## Step 4 — Verify GPU and dataset file counts

In [None]:
import torch
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))

for split in ('train', 'valid', 'test'):
    img_dir = os.path.join(BOARD_DATASET_ROOT, split, 'images')
    lbl_dir = os.path.join(BOARD_DATASET_ROOT, split, 'labels')
    imgs = len([f for f in os.listdir(img_dir) if f.lower().endswith(('.jpg','.jpeg','.png'))])
    lbls = len([f for f in os.listdir(lbl_dir) if f.endswith('.txt')])
    print(f'{split:>6s} — images: {imgs}, labels: {lbls}')
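
Equal counts do not guarantee that each image has its own label file. Ultralytics treats an image without a label file as a background image, so a mismatch is not necessarily an error, but for a board detector where every photo should contain a board it is worth checking. A sketch that pairs files by stem; `unmatched_files` is a hypothetical helper that works on plain lists of file names.

```python
import os

IMAGE_EXTS = ('.jpg', '.jpeg', '.png')

def unmatched_files(image_names, label_names):
    """Pair images and labels by file stem; return (images_without_label, labels_without_image)."""
    image_stems = {os.path.splitext(n)[0] for n in image_names if n.lower().endswith(IMAGE_EXTS)}
    label_stems = {os.path.splitext(n)[0] for n in label_names if n.endswith('.txt')}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)

# Usage in Colab, inside the loop above:
# no_label, no_image = unmatched_files(os.listdir(img_dir), os.listdir(lbl_dir))
# print(split, 'images without labels:', len(no_label), 'orphan labels:', len(no_image))
```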

---
## Step 5 — Train YOLOv8s

| Hyper-parameter | Value | Why |
|---|---|---|
| model | yolov8s.pt | Small variant — trains fast on T4, still accurate |
| epochs | 100 | Matches the original training script |
| imgsz | 640 | Dataset was pre-resized to 640×640 by Roboflow |
| batch | 16 | Fits comfortably in the T4's 16 GB of GPU memory |
| device | 0 | First (and only) GPU |
| patience | 20 | Stop early if the validation fitness score (mAP-based) doesn't improve for 20 epochs |
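
The `patience` row can be read as the simplified rule below: remember the epoch with the best validation fitness and stop once `patience` epochs pass without beating it. This is a sketch of the idea, not Ultralytics' own code, and the fitness values in the example are made up.

```python
def early_stop_epoch(fitness_per_epoch, patience):
    """Return the 0-based epoch at which training would stop, or None if it runs to the end.

    Simplified patience rule: track the best fitness seen so far and stop once
    `patience` epochs have passed without a strictly better value.
    """
    best_fitness = float('-inf')
    best_epoch = 0
    for epoch, fitness in enumerate(fitness_per_epoch):
        if fitness > best_fitness:
            best_fitness, best_epoch = fitness, epoch
        if epoch - best_epoch >= patience:
            return epoch
    return None

# Made-up fitness curve: improves for 3 epochs, then plateaus below the best value
curve = [0.10, 0.20, 0.30] + [0.25] * 30
print(early_stop_epoch(curve, patience=20))   # 22
```

Under this rule, with `epochs = 100` and `patience = 20`, a run whose best epoch is 40 would stop at epoch 60 rather than 100.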

In [None]:
from ultralytics import YOLO

model = YOLO('yolov8s.pt')          # downloads pretrained COCO weights automatically

results = model.train(
    data   = DATA_YAML_PATH,
    epochs = 100,
    imgsz  = 640,
    batch  = 16,
    device = 0,
    patience = 20,                   # early stopping
    project  = '/content/runs',
    name     = 'board_detection',
    verbose  = True,
)
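
Re-running the training cell does not overwrite `/content/runs/board_detection`: by default Ultralytics picks a new numbered folder (`board_detection2`, `board_detection3`, ...) unless `exist_ok=True` is passed to `train`. That is why a hard-coded `/content/runs/board_detection/weights/best.pt` may point at an older run. A simplified re-implementation of that naming rule, for illustration only; `next_run_dir` is hypothetical and not the library's own function.

```python
import os

def next_run_dir(project, name):
    """Return the first non-existing path among project/name, project/name2, project/name3, ..."""
    candidate = os.path.join(project, name)
    n = 2
    while os.path.exists(candidate):
        candidate = os.path.join(project, f'{name}{n}')
        n += 1
    return candidate

# Usage in Colab: predicts the folder the *next* training run would use
# print(next_run_dir('/content/runs', 'board_detection'))
```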

---
## Step 6 — Print validation metrics

In [None]:
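# After training, Ultralytics re-validates best.pt on the val split and keeps those
# metrics in model.metrics (the same object model.train() returned as `results`)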
metrics = model.metrics
print('\n===== Board Detector — Validation Metrics =====')
print(f"  mAP50      : {metrics.box.map50:.4f}")
print(f"  mAP50-95   : {metrics.box.map:.4f}")
print(f"  Precision  : {metrics.box.mp:.4f}")
print(f"  Recall     : {metrics.box.mr:.4f}")
print('================================================')
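
mAP50 counts a predicted board as correct when its IoU (intersection over union) with the ground-truth box is at least 0.50; mAP50-95 averages AP over ten IoU thresholds (0.50, 0.55, ..., 0.95), so it also rewards tight boxes. A minimal sketch of the IoU computation and of how many of those thresholds a single prediction clears, with boxes as `(x1, y1, x2, y2)` in pixels; `iou` and `thresholds_passed` are hypothetical helper names.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# The ten IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

def thresholds_passed(pred, truth):
    """Number of the ten mAP50-95 thresholds at which `pred` would match `truth`."""
    score = iou(pred, truth)
    return sum(score >= t for t in IOU_THRESHOLDS)

# A box shifted by 1 px on a 10 px board: IoU = 90 / 110
print(round(iou((0, 0, 10, 10), (1, 0, 11, 10)), 3))      # 0.818
print(thresholds_passed((0, 0, 10, 10), (1, 0, 11, 10)))  # 7
```

The shifted box counts as a hit for mAP50 but misses the three strictest thresholds, which is why mAP50-95 is usually noticeably lower than mAP50.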

---
## Step 7 — Validate on the held-out test set

In [None]:
# Evaluate on the held-out TEST split (not the val split used for model selection during training).
# After model.train(), `model` holds the best.pt weights.
test_results = model.val(
    data   = DATA_YAML_PATH,
    split  = 'test',
    imgsz  = 640,
    device = 0,
)
print('\n===== Test-Set Metrics =====')
print(f"  mAP50      : {test_results.box.map50:.4f}")
print(f"  mAP50-95   : {test_results.box.map:.4f}")
print(f"  Precision  : {test_results.box.mp:.4f}")
print(f"  Recall     : {test_results.box.mr:.4f}")
print('============================')
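
Steps 6 and 7 print the same four numbers with duplicated code. A small formatting helper could cover both; `format_box_metrics` is a hypothetical name, and the values in the demo call are stand-ins, not results from this dataset.

```python
from types import SimpleNamespace

def format_box_metrics(title, box):
    """Build the metrics block printed in Steps 6 and 7 from any object with
    map50, map, mp and mr attributes (e.g. an Ultralytics `metrics.box`)."""
    rows = [
        ('mAP50', box.map50),
        ('mAP50-95', box.map),
        ('Precision', box.mp),
        ('Recall', box.mr),
    ]
    lines = [f'===== {title} =====']
    lines += [f'  {label:<10} : {value:.4f}' for label, value in rows]
    lines.append('=' * len(lines[0]))
    return '\n'.join(lines)

# Stand-in values only; in the notebook pass test_results.box or model.metrics.box
print(format_box_metrics('Test-Set Metrics', SimpleNamespace(map50=0.9, map=0.7, mp=0.8, mr=0.85)))
```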

---
## Step 8 — Visual sample predictions on test images

In [None]:
from IPython.display import display
from PIL import Image
import glob

test_imgs = sorted(glob.glob(os.path.join(BOARD_DATASET_ROOT, 'test', 'images', '*.*')))
sample    = test_imgs[:6]          # show 6 sample predictions
assert sample, 'No test images found in test/images.'

pred_results = model.predict(
    source  = sample,
    save    = True,                # also writes annotated copies under /content/runs/board_predictions*
    project = '/content/runs',
    name    = 'board_predictions',
    show    = False,
    device  = 0,
)

# Render each result in memory instead of guessing the save folder;
# r.plot() returns a BGR numpy array, so flip the channels to RGB for PIL
for r in pred_results:
    display(Image.fromarray(r.plot()[..., ::-1]))
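
The rendered images are for eyeballing. Downstream code typically needs one board per photo; a minimal sketch, assuming the service keeps only the highest-confidence box above a threshold. The selection rule is an assumption, not taken from `python-ml-service`; the 0.25 default matches the `conf` value Ultralytics uses for `predict`. `best_board_box` is a hypothetical helper.

```python
def best_board_box(boxes, confs, min_conf=0.25):
    """Return (box, conf) for the highest-confidence detection at or above min_conf, else None.

    `boxes` is a list of (x1, y1, x2, y2) pixel boxes and `confs` the matching scores.
    """
    candidates = [(c, b) for b, c in zip(boxes, confs) if c >= min_conf]
    if not candidates:
        return None
    conf, box = max(candidates, key=lambda cb: cb[0])
    return tuple(box), conf

# With Ultralytics results, the inputs could come from:
#   r = pred_results[0]
#   best_board_box(r.boxes.xyxy.tolist(), r.boxes.conf.tolist())
print(best_board_box([(10, 10, 600, 600), (0, 0, 50, 50)], [0.91, 0.30]))
```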

---
## Step 9 — Download best.pt

This downloads **`best.pt`** to your browser.
Save it as **`board_detector.pt`** and place it in:
`python-ml-service/models/board_detector.pt`

In [None]:
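from google.colab import files

# Prefer the path recorded by the trainer; it follows auto-numbered run folders
# (board_detection2, ...) if the training cell was run more than once.
trainer = getattr(model, 'trainer', None)
BEST_PT = str(trainer.best) if trainer is not None else '/content/runs/board_detection/weights/best.pt'
assert os.path.isfile(BEST_PT), (
    f'best.pt not found at {BEST_PT}. '
    'Check /content/runs/ manually.'
)

files.download(BEST_PT)
print('Download started. Save the file as board_detector.pt')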
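
If the runtime was restarted after training (so `model` no longer exists) or the run landed in a numbered folder such as `board_detection2`, a glob over the runs folder can locate the checkpoint instead. A sketch assuming the `project='/content/runs'` setting from Step 5; `newest_best_pt` is a hypothetical helper. Note that `/content` is local to the Colab VM, so these files disappear when the runtime is recycled.

```python
import glob
import os

def newest_best_pt(project='/content/runs'):
    """Return the most recently modified <project>/<run>/weights/best.pt, or None if none exist."""
    candidates = glob.glob(os.path.join(project, '*', 'weights', 'best.pt'))
    return max(candidates, key=os.path.getmtime) if candidates else None

# Usage in Colab:
# BEST_PT = newest_best_pt()
# print(BEST_PT)
```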

---
### Done
Place the downloaded file at:
```
ThesisBookProject/
  └── python-ml-service/
        └── models/
              └── board_detector.pt   ← here
```
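
A browser download can occasionally save an error page or a truncated file under the expected name. Assuming the checkpoint was written with PyTorch's default zip-based serialization (the `torch.save` default since PyTorch 1.6), a quick standard-library check catches that before the ML service tries to load it. It only tests the container format, not that the weights are usable; `looks_like_torch_checkpoint` is a hypothetical helper.

```python
import zipfile

def looks_like_torch_checkpoint(path):
    """Cheap sanity check: torch.save's default format (PyTorch >= 1.6) is a zip archive."""
    return zipfile.is_zipfile(path)

# Example, run from ThesisBookProject/:
# assert looks_like_torch_checkpoint('python-ml-service/models/board_detector.pt')
```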