# 🛣️ Road Damage Detection: YOLOv5 Training Notebook (Local - VS Code)

**Thesis:** Road Damage Detection Mobile App for LGU Road Surveys Using YOLOv5

This notebook trains a YOLOv5s model on the RDD2022 dataset **locally on your Mac (Apple M1)** using MPS (Metal Performance Shaders) GPU acceleration.

**Detects 4 types of road damage:**
- **D00**: Longitudinal Crack
- **D10**: Transverse Crack
- **D20**: Alligator Crack
- **D40**: Pothole

## Your Hardware
- **CPU:** Apple M1
- **RAM:** 8 GB
- **GPU:** MPS (Metal), supported by PyTorch

## Instructions
1. Run all cells **in order** from top to bottom
2. Training will take ~1-3 hours depending on dataset size
3. The trained `best.pt` will be saved to `backend/models/`

> ⚠️ **8 GB of RAM is tight.** We'll use batch size 8 instead of 16 to avoid memory issues.

## Step 1: Check Hardware & GPU Support

In [1]:
import torch
import platform
import os

print(f"Python: {platform.python_version()}")
print(f"PyTorch: {torch.__version__}")
print(f"System: {platform.system()} {platform.machine()}")
print()

# Check for Apple MPS (Metal) GPU
if torch.backends.mps.is_available():
    print("✅ MPS (Metal GPU) is available - training will use GPU acceleration!")
    device = "mps"
elif torch.cuda.is_available():
    print("✅ CUDA GPU is available!")
    device = "cuda"
else:
    print("⚠️  No GPU found - training will use CPU (slower but still works)")
    device = "cpu"

print(f"Selected device: {device}")

# Set the project root. Notebooks do not define __file__, so use the current
# working directory (open the notebook from the project folder).
PROJECT_ROOT = os.getcwd()
BACKEND_DIR = os.path.join(PROJECT_ROOT, "backend")
MODELS_DIR = os.path.join(BACKEND_DIR, "models")
DATA_DIR = os.path.join(BACKEND_DIR, "data")

os.makedirs(MODELS_DIR, exist_ok=True)
os.makedirs(DATA_DIR, exist_ok=True)

print(f"\nProject root: {PROJECT_ROOT}")
print(f"Models will be saved to: {MODELS_DIR}")

Python: 3.11.4
PyTorch: 2.10.0
System: Darwin arm64

✅ MPS (Metal GPU) is available - training will use GPU acceleration!
Selected device: mps

Project root: /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection
Models will be saved to: /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/models


## Step 2: Install Dependencies & Clone YOLOv5

This installs PyTorch, YOLOv5, and all required packages into your local Python environment.

In [2]:
import subprocess
import sys

# Install required packages
packages = ["torch", "torchvision", "ultralytics", "opencv-python-headless", "Pillow", "matplotlib", "pandas", "pyyaml", "tqdm", "scipy", "seaborn"]
for pkg in packages:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])

print("✅ All packages installed!")

# Clone YOLOv5 repo if not already present
yolov5_dir = os.path.join(BACKEND_DIR, "yolov5")
if not os.path.exists(yolov5_dir):
    print("Cloning YOLOv5 repository...")
    subprocess.check_call(["git", "clone", "https://github.com/ultralytics/yolov5.git", yolov5_dir])
    print("✅ YOLOv5 cloned!")
else:
    print("✅ YOLOv5 already exists, skipping clone.")

# Install YOLOv5-specific requirements
req_file = os.path.join(yolov5_dir, "requirements.txt")
if os.path.exists(req_file):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "-r", req_file])
    print("✅ YOLOv5 requirements installed!")

print(f"\nYOLOv5 location: {yolov5_dir}")

✅ All packages installed!
Cloning YOLOv5 repository...


Cloning into '/Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/yolov5'...


✅ YOLOv5 cloned!
✅ YOLOv5 requirements installed!

YOLOv5 location: /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/yolov5
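
Before reinstalling everything, it can help to see which packages are already importable in the current environment. The check below is an optional sketch and not part of the original notebook: `missing_packages` is a hypothetical helper, and the mapping from pip names to import names (for example `opencv-python-headless` to `cv2`) is filled in by hand.

```python
import importlib.util

# pip package name -> module name used in `import` statements
IMPORT_NAMES = {
    "torch": "torch",
    "torchvision": "torchvision",
    "ultralytics": "ultralytics",
    "opencv-python-headless": "cv2",
    "Pillow": "PIL",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "pyyaml": "yaml",
    "tqdm": "tqdm",
    "scipy": "scipy",
    "seaborn": "seaborn",
}


def missing_packages(import_names):
    """Return the pip names whose module cannot be found in this environment."""
    return [
        pip_name
        for pip_name, module_name in import_names.items()
        if importlib.util.find_spec(module_name) is None
    ]


print("Missing packages:", missing_packages(IMPORT_NAMES) or "none")
```

Anything listed as missing can then be installed individually instead of re-running the whole install cell.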


## Step 3: Download & Prepare RDD2022 Dataset

### How to get the dataset:
1. Go to [https://github.com/sekilab/RoadDamageDetector](https://github.com/sekilab/RoadDamageDetector)
2. Download the **RDD2022** dataset (look for the download link in their README)
3. Extract/organize the images and labels into the folder structure shown below

### Required folder structure:
```
backend/data/
├── images/
│   ├── train/    ← training images (.jpg)
│   └── val/      ← validation images (.jpg)
├── labels/
│   ├── train/    ← YOLO format labels (.txt)
│   └── val/      ← YOLO format labels (.txt)
└── data.yaml     ← created automatically in Step 4
```

> **Run the cell below** to create the folder structure, then **manually copy** your images and labels into the folders.

In [3]:
# Create the dataset folder structure
folders = [
    os.path.join(DATA_DIR, "images", "train"),
    os.path.join(DATA_DIR, "images", "val"),
    os.path.join(DATA_DIR, "labels", "train"),
    os.path.join(DATA_DIR, "labels", "val"),
]

for folder in folders:
    os.makedirs(folder, exist_ok=True)
    print(f"✅ {folder}")

print(f"\n📁 Dataset directory: {DATA_DIR}")
print()
print("=" * 60)
print("NEXT STEP: Copy your RDD2022 images & labels into these folders!")
print("=" * 60)
print()
print("  Images (.jpg) → backend/data/images/train/ and .../val/")
print("  Labels (.txt) → backend/data/labels/train/ and .../val/")
print()
print("Each label .txt file should have lines like:")
print("  <class_id> <x_center> <y_center> <width> <height>")
print("  Example: 0 0.5 0.5 0.3 0.2")
print()
print("Class IDs: 0=D00, 1=D10, 2=D20, 3=D40")

✅ /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/data/images/train
✅ /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/data/images/val
✅ /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/data/labels/train
✅ /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/data/labels/val

📁 Dataset directory: /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/data

NEXT STEP: Copy your RDD2022 images & labels into these folders!

  Images (.jpg) → backend/data/images/train/ and .../val/
  Labels (.txt) → backend/data/labels/train/ and .../val/

Each label .txt file should have lines like:
  <class_id> <x_center> <y_center> <width> <height>
  Example: 0 0.5 0.5 0.3 0.2

Class IDs: 0=D00, 1=D10, 2=D20, 3=D40
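
The label lines described above can be produced from pixel bounding boxes with a little arithmetic: each value is divided by the image width or height so that it falls between 0 and 1. The sketch below assumes the annotations in your RDD2022 download are Pascal VOC-style XML files (an `<annotation>` with a `<size>` element and one `<object>` per damage); check this against your copy of the dataset. `voc_to_yolo_lines` is a hypothetical helper and is not part of YOLOv5, and the sample annotation is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Class IDs as listed in this notebook
CLASS_IDS = {"D00": 0, "D10": 1, "D20": 2, "D40": 3}


def voc_to_yolo_lines(xml_text, class_ids=CLASS_IDS):
    """Convert one VOC-style annotation (as a string) into YOLO label lines."""
    root = ET.fromstring(xml_text)
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    lines = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name not in class_ids:
            continue  # label names outside the four classes are skipped
        box = obj.find("bndbox")
        xmin = float(box.findtext("xmin"))
        ymin = float(box.findtext("ymin"))
        xmax = float(box.findtext("xmax"))
        ymax = float(box.findtext("ymax"))
        # Normalise the box centre and size by the image dimensions
        x_center = (xmin + xmax) / 2 / img_w
        y_center = (ymin + ymax) / 2 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        lines.append(
            f"{class_ids[name]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines


# Made-up annotation: one pothole in a 600x600 image
sample = """<annotation>
  <size><width>600</width><height>600</height></size>
  <object><name>D40</name>
    <bndbox><xmin>150</xmin><ymin>240</ymin><xmax>450</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""
print(voc_to_yolo_lines(sample))  # ['3 0.500000 0.500000 0.500000 0.200000']
```

Writing the returned lines to a `.txt` file with the same base name as the image gives the layout that the folder structure above expects.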


## Step 4: Create Dataset Configuration File (data.yaml)

In [4]:
# Create data.yaml configuration file
yaml_content = f"""path: {DATA_DIR}
train: images/train
val: images/val

nc: 4
names: ['D00', 'D10', 'D20', 'D40']
"""

yaml_path = os.path.join(DATA_DIR, "data.yaml")
with open(yaml_path, "w") as f:
    f.write(yaml_content)

print(f"✅ Created {yaml_path}")
print()
print("Contents:")
print(yaml_content)

✅ Created /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/data/data.yaml

Contents:
path: /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/data
train: images/train
val: images/val

nc: 4
names: ['D00', 'D10', 'D20', 'D40']



## Step 5: Verify Dataset

Make sure images and labels are in the right place.

In [5]:
# Verify dataset is in place
print("📊 Dataset verification:\n")

total_images = 0
for split in ['train', 'val']:
    img_dir = os.path.join(DATA_DIR, 'images', split)
    lbl_dir = os.path.join(DATA_DIR, 'labels', split)

    # Count image and label files (0 if the folder does not exist yet)
    img_count = len([f for f in os.listdir(img_dir) if f.endswith(('.jpg', '.jpeg', '.png'))]) if os.path.exists(img_dir) else 0
    lbl_count = len([f for f in os.listdir(lbl_dir) if f.endswith('.txt')]) if os.path.exists(lbl_dir) else 0
    total_images += img_count

    status = "✅" if img_count > 0 else "❌"
    print(f'{status} {split}: {img_count} images, {lbl_count} labels')

print(f'\nTotal images: {total_images}')

if total_images == 0:
    print('\n⚠️  No images found!')
    print('Please copy your RDD2022 images and labels into:')
    print(f'  {os.path.join(DATA_DIR, "images")}')
    print(f'  {os.path.join(DATA_DIR, "labels")}')
    print('\nThen re-run this cell.')
else:
    print('\n✅ Dataset is ready for training!')

📊 Dataset verification:

✅ train: 4805 images, 0 labels
✅ val: 1200 images, 0 labels

Total images: 6005

✅ Dataset is ready for training!
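
Note that the recorded output reports 0 labels for both splits, yet the cell still prints that the dataset is ready, because it only counts images. A stricter check could pair each image with its label file by name. This is a sketch: `unlabeled_images` is a hypothetical helper, and it assumes every label shares its image's base name with a `.txt` extension, as in the Step 3 folder structure.

```python
import os

IMAGE_EXTS = (".jpg", ".jpeg", ".png")


def unlabeled_images(img_dir, lbl_dir):
    """Return image file names that have no matching .txt label file."""
    label_stems = {
        os.path.splitext(f)[0] for f in os.listdir(lbl_dir) if f.endswith(".txt")
    }
    return sorted(
        f
        for f in os.listdir(img_dir)
        if f.lower().endswith(IMAGE_EXTS)
        and os.path.splitext(f)[0] not in label_stems
    )


# Example usage with the notebook's DATA_DIR:
# for split in ("train", "val"):
#     missing = unlabeled_images(
#         os.path.join(DATA_DIR, "images", split),
#         os.path.join(DATA_DIR, "labels", split),
#     )
#     print(f"{split}: {len(missing)} images without labels")
```

If most images come back unlabeled, the annotations have probably not been converted or copied yet, and training is unlikely to produce a useful model.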


## Step 6: Train YOLOv5s 🚀

Training configuration (optimized for Apple M1 with 8GB RAM):
- **Model:** YOLOv5s (small; fast inference, good for a mobile API)
- **Image size:** 640×640
- **Batch size:** 8 (reduced from 16 to fit in 8 GB of RAM)
- **Epochs:** 50 (increase to 100 for better results if you have time)
- **Patience:** 10 (early stopping: training stops if there is no improvement for 10 epochs)
- **Device:** MPS (Apple Metal GPU)
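
For a rough sense of scale, the configuration above translates into a fixed number of training batches. The arithmetic below uses the 4805 training images reported in Step 5. The cache figure is only an upper bound and rests on an assumption about how `--cache` stores images (uncompressed 640×640 RGB arrays in RAM); it is not a measurement.

```python
import math

train_images = 4805   # from the Step 5 output
batch_size = 8
epochs = 50
img_size = 640

# Number of batches needed to see every training image once
batches_per_epoch = math.ceil(train_images / batch_size)
total_batches = batches_per_epoch * epochs

# Upper bound: every cached image held as img_size x img_size x 3 bytes (uint8 RGB)
cache_bytes = train_images * img_size * img_size * 3
cache_gib = cache_bytes / 1024**3

print(f"Batches per epoch: {batches_per_epoch}")                 # 601
print(f"Total batches over {epochs} epochs: {total_batches}")    # 30050
print(f"Upper-bound RAM cache for training images: {cache_gib:.1f} GiB")
```

Early stopping can end the run before all 50 epochs, so the total is a ceiling. On an 8 GB machine a cache of that size would leave little headroom, which makes the `--cache` flag a candidate to drop if memory runs out.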

> ⏱️ **Expected time:** ~1-3 hours depending on dataset size. You can keep using your Mac while it trains.

In [6]:
import subprocess
import sys

yolov5_dir = os.path.join(BACKEND_DIR, "yolov5")
train_script = os.path.join(yolov5_dir, "train.py")
data_yaml = os.path.join(DATA_DIR, "data.yaml")
runs_dir = os.path.join(BACKEND_DIR, "runs")

# Training command
cmd = [
    sys.executable, train_script,
    "--img", "640",
    "--batch", "8",           # Reduced for 8GB RAM
    "--epochs", "50",
    "--data", data_yaml,
    "--weights", "yolov5s.pt",
    "--project", runs_dir,
    "--name", "road_damage",
    "--patience", "10",
    "--device", device,       # "mps" on Apple Silicon, "cuda" on NVIDIA, "cpu" otherwise
    "--cache",
    "--exist-ok",
]

print(f"üöÄ Starting training on device: {device}")
print(f"   Command: {' '.join(cmd)}")
print()
print("=" * 60)
print("Training will begin... this takes 1-3 hours.")
print("You'll see progress updates below.")
print("=" * 60)
print()

# Run training (output streams live)
process = subprocess.Popen(
    cmd,
    cwd=yolov5_dir,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
    bufsize=1,
)

for line in process.stdout:
    print(line, end="")

process.wait()

if process.returncode == 0:
    print("\n✅ Training completed successfully!")
else:
    print(f"\n❌ Training failed with return code {process.returncode}")

🚀 Starting training on device: mps
   Command: /Users/gpybut/Downloads/Thesis Proposal/.venv/bin/python /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/yolov5/train.py --img 640 --batch 8 --epochs 50 --data /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/data/data.yaml --weights yolov5s.pt --project /Users/gpybut/Downloads/Thesis Proposal/SYSTEM/road-damage-detection/backend/runs --name road_damage --patience 10 --device mps --cache --exist-ok

Training will begin... this takes 1-3 hours.
You'll see progress updates below.

Creating new Ultralytics Settings v0.0.6 file ✅
View Ultralytics Settings with 'yolo settings' or at '/Users/gpybut/Library/Application Support/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.

## Step 7: View Training Results

In [None]:
from IPython.display import Image, display

runs_dir = os.path.join(BACKEND_DIR, "runs")

# Training results plot
results_img = os.path.join(runs_dir, 'road_damage', 'results.png')
if os.path.exists(results_img):
    display(Image(filename=results_img, width=800))
else:
    print('Results image not found. Training may not have completed.')
    print(f'Expected at: {results_img}')

In [None]:
# Confusion matrix
cm_img = os.path.join(runs_dir, 'road_damage', 'confusion_matrix.png')
if os.path.exists(cm_img):
    display(Image(filename=cm_img, width=600))
else:
    print('Confusion matrix not found.')
    print(f'Expected at: {cm_img}')

## Step 8: Test the Model on Sample Images

In [None]:
import subprocess
import sys

yolov5_dir = os.path.join(BACKEND_DIR, "yolov5")
detect_script = os.path.join(yolov5_dir, "detect.py")
best_pt = os.path.join(runs_dir, "road_damage", "weights", "best.pt")
val_images = os.path.join(DATA_DIR, "images", "val")

if not os.path.exists(best_pt):
    print(f"❌ best.pt not found at: {best_pt}")
    print("Make sure training completed successfully.")
else:
    # Run inference on validation images
    cmd = [
        sys.executable, detect_script,
        "--weights", best_pt,
        "--img", "640",
        "--conf", "0.4",
        "--source", val_images,
        "--project", runs_dir,
        "--name", "test_results",
        "--save-txt",
        "--max-det", "20",
        "--device", device,
        "--exist-ok",
    ]

    print("🔍 Running inference on validation images...")
    # Merge stderr into stdout so log messages are captured whichever stream they use
    result = subprocess.run(
        cmd, cwd=yolov5_dir, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    print(result.stdout[-500:])  # show only the tail of the output
    if result.returncode == 0:
        print("✅ Test inference completed!")
    else:
        print(f"❌ Inference failed with return code {result.returncode}")

In [None]:
# Show some test results
import glob

test_dir = os.path.join(runs_dir, 'test_results')
# Collect annotated images in either format before deciding nothing was found
test_images = sorted(
    glob.glob(os.path.join(test_dir, '*.jpg')) + glob.glob(os.path.join(test_dir, '*.png'))
)[:6]

if test_images:
    for img_path in test_images:
        print(os.path.basename(img_path))
        display(Image(filename=img_path, width=400))
        print()
else:
    print(f"No test result images found in {test_dir}")
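
Because Step 8 passes `--save-txt`, `detect.py` should also write one `.txt` file per image that has detections, normally in a `labels/` subfolder of the run directory (treat that location as an assumption and adjust it if yours differs). A per-class tally of those files gives a quick sanity check on what the model finds. `count_detections` is a hypothetical helper.

```python
import glob
import os
from collections import Counter

CLASS_NAMES = ["D00", "D10", "D20", "D40"]


def count_detections(labels_dir, class_names=CLASS_NAMES):
    """Tally detections per class from YOLO-format .txt files."""
    counts = Counter()
    for path in glob.glob(os.path.join(labels_dir, "*.txt")):
        with open(path) as f:
            for line in f:
                parts = line.split()
                if parts:  # the first field on each line is the class ID
                    counts[class_names[int(parts[0])]] += 1
    return counts


# Example usage with the notebook's runs_dir:
# print(count_detections(os.path.join(runs_dir, "test_results", "labels")))
```

A class that never appears in the tally may simply be rare in the validation set, or it may point to a problem with that class's labels.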

## Step 9: Export Model Metrics

Print precision, recall, and mAP for your thesis documentation.

In [None]:
import pandas as pd

results_csv = os.path.join(runs_dir, 'road_damage', 'results.csv')
if os.path.exists(results_csv):
    df = pd.read_csv(results_csv)
    df.columns = df.columns.str.strip()

    # Get last epoch metrics (best.pt may come from an earlier epoch)
    last = df.iloc[-1]

    def metric(*names):
        """Return the first column that exists.

        The YOLOv5 repo writes e.g. 'metrics/mAP_0.5'; the '(B)' names are kept as a fallback.
        """
        for name in names:
            if name in last.index:
                return last[name]
        return "N/A"

    metrics = {
        "Precision": metric("metrics/precision", "metrics/precision(B)"),
        "Recall": metric("metrics/recall", "metrics/recall(B)"),
        "mAP@0.5": metric("metrics/mAP_0.5", "metrics/mAP50(B)"),
        "mAP@0.5:0.95": metric("metrics/mAP_0.5:0.95", "metrics/mAP50-95(B)"),
    }

    print('=' * 50)
    print('   FINAL TRAINING METRICS (for your thesis)')
    print('=' * 50)
    for label, value in metrics.items():
        print(f"  {label}:".ljust(18) + str(value))
    print('=' * 50)
    print()
    print(f'Total epochs trained: {len(df)}')

    # Save metrics to a text file for easy reference
    metrics_file = os.path.join(BACKEND_DIR, "training_metrics.txt")
    with open(metrics_file, "w") as f:
        f.write("Road Damage Detection - YOLOv5s Training Metrics\n")
        f.write("=" * 50 + "\n")
        for label, value in metrics.items():
            f.write(f"{label}:".ljust(16) + f"{value}\n")
        f.write("Epochs:".ljust(16) + f"{len(df)}\n")
    print(f"📄 Metrics saved to: {metrics_file}")
else:
    print('Results CSV not found.')
    print(f'Expected at: {results_csv}')
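
The last row of `results.csv` is not necessarily the epoch that produced `best.pt`. YOLOv5 appears to rank checkpoints by a fitness score of 0.1 × mAP@0.5 + 0.9 × mAP@0.5:0.95, so the sketch below applies that weighting using only the standard-library `csv` module. The column names follow what the YOLOv5 repository's `train.py` writes and are an assumption for other versions; `best_epoch` is a hypothetical helper, and the sample rows are made-up numbers.

```python
import csv
import io


def best_epoch(csv_file):
    """Return (epoch, fitness) for the row with the highest fitness score."""
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    best = None
    for row in reader:
        # Header and values may be padded with spaces; strip both
        row = {key.strip(): value.strip() for key, value in row.items()}
        fitness = (
            0.1 * float(row["metrics/mAP_0.5"])
            + 0.9 * float(row["metrics/mAP_0.5:0.95"])
        )
        if best is None or fitness > best[1]:
            best = (int(float(row["epoch"])), fitness)
    return best


# Made-up numbers, only to show the expected file shape
sample = io.StringIO(
    "epoch, metrics/mAP_0.5, metrics/mAP_0.5:0.95\n"
    "0, 0.20, 0.10\n"
    "1, 0.50, 0.30\n"
    "2, 0.45, 0.25\n"
)
epoch, fitness = best_epoch(sample)
print(f"Best epoch: {epoch} (fitness {fitness:.3f})")

# Example usage with the notebook's results_csv:
# with open(results_csv, newline="") as f:
#     print(best_epoch(f))
```

Reporting the metrics of that epoch alongside the last-epoch values makes it clear which numbers describe the model that is actually deployed.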

## Step 10: Copy Trained Model to Backend

Copies `best.pt` from the training runs folder to `backend/models/` where the Flask API expects it.

In [None]:
import shutil

best_pt_src = os.path.join(runs_dir, 'road_damage', 'weights', 'best.pt')
best_pt_dst = os.path.join(MODELS_DIR, 'best.pt')

if os.path.exists(best_pt_src):
    shutil.copy2(best_pt_src, best_pt_dst)
    
    # Get file size
    size_mb = os.path.getsize(best_pt_dst) / (1024 * 1024)
    
    print("✅ Model copied successfully!")
    print(f"   From: {best_pt_src}")
    print(f"   To:   {best_pt_dst}")
    print(f"   Size: {size_mb:.1f} MB")
    print()
    print("🎉 Your model is ready! Next steps:")
    print("   1. Start the backend: cd backend && python app.py")
    print("   2. Update mobile/src/config.js with your IP address")
    print("   3. Start the app: cd mobile && npx expo start")
else:
    print(f"❌ best.pt not found at: {best_pt_src}")
    print("Make sure training completed successfully (Step 6).")

    # Check if a last.pt exists as fallback
    last_pt = os.path.join(runs_dir, 'road_damage', 'weights', 'last.pt')
    if os.path.exists(last_pt):
        print("\n⚠️  Found last.pt (last checkpoint) - you can use this as a fallback:")
        print(f"   {last_pt}")
        use_last = input("Copy last.pt instead? (y/n): ").strip().lower()
        if use_last == 'y':
            shutil.copy2(last_pt, best_pt_dst)
            print(f"✅ Copied last.pt to {best_pt_dst}")

## ✅ Done! Training Complete!

Your trained model is now at `backend/models/best.pt`.

### Next Steps:
1. **Start the Flask backend:**
   ```bash
   cd backend
   pip install -r requirements.txt
   python app.py
   ```
2. **Find your computer's local IP** (System Preferences → Network → Wi-Fi → IP Address; the app is called System Settings on newer macOS versions)
3. **Update** `mobile/src/config.js` with your IP (e.g., `http://192.168.1.100:5000`)
4. **Start the Expo app:**
   ```bash
   cd mobile
   npm install
   npx expo start
   ```
5. **Open Expo Go** on your phone (connected to the same Wi-Fi network) and scan the QR code
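
If you prefer to look up the address from Python rather than the system settings, one common trick is to open a UDP socket toward any outside address and read back the local end of the connection. This is a sketch: `8.8.8.8` is just a routable placeholder, port 5000 is taken from the example URL above, and `local_ip` is a hypothetical helper.

```python
import socket


def local_ip(fallback="127.0.0.1"):
    """Best-effort guess at this machine's LAN IP address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # A UDP connect only selects a route; no packet is transmitted
        sock.connect(("8.8.8.8", 80))
        return sock.getsockname()[0]
    except OSError:
        return fallback  # e.g. no network connection
    finally:
        sock.close()


print(f"Backend URL for mobile/src/config.js: http://{local_ip()}:5000")
```

If the printed address is `127.0.0.1`, the lookup fell back to the loopback address and the phone will not be able to reach it; check the network connection and try again.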

### Troubleshooting:
- **Out of memory?** Reduce batch size to 4 in Step 6, or remove the `--cache` flag so images are not held in RAM
- **Training too slow?** Reduce epochs to 25, or use fewer training images
- **MPS errors?** Change `device` to `"cpu"` in Step 1 (slower but always works)
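
The tweaks above amount to changing a few values in the Step 6 command list. The sketch below shows one way to do that without retyping the command. `with_flag` is a hypothetical helper, and the shortened `cmd` list is only a stand-in for the real one. `PYTORCH_ENABLE_MPS_FALLBACK=1` is a PyTorch environment variable that lets operations unsupported on MPS run on the CPU, which may be worth trying before moving the whole run to `cpu`.

```python
import os


def with_flag(cmd, flag, value):
    """Return a copy of cmd with the value that follows `flag` replaced."""
    new_cmd = list(cmd)
    idx = new_cmd.index(flag)  # raises ValueError if the flag is absent
    new_cmd[idx + 1] = str(value)
    return new_cmd


# Shortened stand-in for the Step 6 command
cmd = ["python", "train.py", "--batch", "8", "--epochs", "50", "--device", "mps"]

low_memory_cmd = with_flag(cmd, "--batch", 4)      # out of memory
quick_cmd = with_flag(cmd, "--epochs", 25)         # training too slow
cpu_cmd = with_flag(cmd, "--device", "cpu")        # persistent MPS errors

# Environment to pass as subprocess.Popen(..., env=mps_env):
# allows CPU fallback for operations that MPS does not support
mps_env = dict(os.environ, PYTORCH_ENABLE_MPS_FALLBACK="1")

print(low_memory_cmd)
```

Because `with_flag` returns a copy, the original `cmd` stays unchanged and the variants can be compared side by side.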