# PCB Defects Detection - YOLOv5 Training on Google Colab GPU

## 🚀 Complete Guide to Training a YOLOv5 Model with GPU Acceleration

This notebook trains a YOLOv5 model for PCB defect detection on Google Colab's free GPU (a Tesla K80 or T4; the run recorded below was assigned a T4).

**Features:**
- ✅ GPU-accelerated training (roughly 10x faster than CPU)
- ✅ Up to 300 training epochs, with early stopping after 50 epochs without improvement
- ✅ Automatic dataset download and preparation
- ✅ Real-time training visualization
- ✅ Model saving to Google Drive
- ✅ Inference testing with detection visualization

## 1️⃣ Setup Google Colab Environment

Enable GPU acceleration and check available hardware

In [2]:
# Check GPU availability
!nvidia-smi

Mon Feb  2 10:42:19 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   38C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
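
The table above is meant to be read by eye. If a later cell needs the GPU name or memory size as data, `nvidia-smi` also has a CSV query mode. The following is a minimal sketch; the helper names `parse_gpu_query` and `query_gpus` are hypothetical and not part of the original notebook.

```python
import shutil
import subprocess


def parse_gpu_query(text):
    """Parse 'name, memory.total' CSV rows into a list of dicts."""
    gpus = []
    for row in text.strip().splitlines():
        if "," not in row:
            continue  # skip rows that do not look like 'name, memory'
        name, memory = (field.strip() for field in row.split(",", 1))
        gpus.append({"name": name, "memory_total": memory})
    return gpus


def query_gpus():
    """Return GPU info, or an empty list when nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_query(result.stdout) if result.returncode == 0 else []


print(query_gpus() or "No NVIDIA GPU visible; check the Colab runtime type.")
```

An empty result usually means the runtime was started without a GPU, in which case training would fall back to the much slower CPU path.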
                                                

In [3]:
# Verify PyTorch can access GPU
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
print(f"GPU count: {torch.cuda.device_count()}")

PyTorch version: 2.9.0+cu126
CUDA available: True
CUDA device: Tesla T4
GPU count: 1


## 2️⃣ Clone YOLOv5 Repository

Clone the official YOLOv5 repository and install dependencies

In [4]:
# Clone YOLOv5 repository
!git clone https://github.com/ultralytics/yolov5
%cd yolov5

Cloning into 'yolov5'...
remote: Enumerating objects: 17783, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 17783 (delta 2), reused 1 (delta 1), pack-reused 17776 (from 2)
Receiving objects: 100% (17783/17783), 16.89 MiB | 21.12 MiB/s, done.
Resolving deltas: 100% (12125/12125), done.
/content/yolov5


In [5]:
# Install YOLOv5 dependencies
!pip install -r requirements.txt -q
print("✅ YOLOv5 dependencies installed successfully!")

✅ YOLOv5 dependencies installed successfully!


## 3️⃣ Mount Google Drive

Mount your Google Drive to access and save files

In [6]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')
print("✅ Google Drive mounted successfully!")

# Navigate to working directory (create if doesn't exist)
work_dir = '/content/drive/MyDrive/PCB_Defects_Detection'
os.makedirs(work_dir, exist_ok=True)
print(f"Working directory: {work_dir}")

ValueError: mount failed
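
The cell above stopped with `ValueError: mount failed`, so none of the later paths under `/content/drive` exist in this session. One way to keep the rest of the notebook usable is to pick the working directory based on whether Drive is actually mounted. This is a sketch under assumptions: the helper name `resolve_work_dir` and the local fallback root `/content` are choices made here, not part of the original notebook.

```python
import os


def resolve_work_dir(drive_root="/content/drive/MyDrive",
                     fallback_root="/content",
                     project="PCB_Defects_Detection"):
    """Use the Drive folder when Drive is mounted, otherwise a local folder."""
    root = drive_root if os.path.isdir(drive_root) else fallback_root
    work_dir = os.path.join(root, project)
    os.makedirs(work_dir, exist_ok=True)
    return work_dir
```

Calling `work_dir = resolve_work_dir()` after the mount attempt gives a usable directory either way. Files written to the local fallback are lost when the Colab session ends, so they would need to be downloaded manually.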

## 4️⃣ Download and Prepare Dataset

Download the PCB defects dataset and organize it for YOLOv5 training

In [None]:
# Option A: Download from your Google Drive (if already uploaded)
# Copy from Google Drive to Colab working directory
dataset_path = '/content/drive/MyDrive/datasets/pcb-dataset'

if os.path.exists(dataset_path):
    print(f"✅ Dataset found at {dataset_path}")
    !cp -r "{dataset_path}" /content/
else:
    print("⚠️ Dataset not found in Google Drive")
    print("Please upload your dataset to Google Drive at: /MyDrive/datasets/pcb-dataset")
    print("Or download from Kaggle using the API")
    
# Alternative: Download from Kaggle (requires API key)
# !pip install kaggle
# !mkdir -p ~/.kaggle
# # Upload your kaggle.json to Colab first
# !cp kaggle.json ~/.kaggle/
# !chmod 600 ~/.kaggle/kaggle.json
# !kaggle datasets download -d akhatova/pcb-defects
# !unzip -q pcb-defects.zip

In [None]:
import os
from pathlib import Path

# Assuming you have a pre-organized dataset with images and labels
# Create YOLOv5 directory structure
%cd /content

dataset_root = Path('/content/dataset')
dataset_root.mkdir(exist_ok=True)

(dataset_root / 'images' / 'train').mkdir(parents=True, exist_ok=True)
(dataset_root / 'images' / 'val').mkdir(parents=True, exist_ok=True)
(dataset_root / 'labels' / 'train').mkdir(parents=True, exist_ok=True)
(dataset_root / 'labels' / 'val').mkdir(parents=True, exist_ok=True)

print("✅ Dataset directory structure created!")

In [None]:
# If your dataset is in a folder with all images/labels, organize them:
# split the images 80/20 into train/val and copy each image with its label
import os
import random
import shutil

# Source dataset folders
source_images = '/content/pcb-dataset/images'  # Adjust path as needed
source_labels = '/content/pcb-dataset/labels'   # Adjust path as needed

def copy_split(files, split):
    """Copy images and their matching label files into the given split folder."""
    for img in files:
        shutil.copy(os.path.join(source_images, img), f'/content/dataset/images/{split}/{img}')
        label = os.path.splitext(img)[0] + '.txt'
        if os.path.exists(os.path.join(source_labels, label)):
            shutil.copy(os.path.join(source_labels, label), f'/content/dataset/labels/{split}/{label}')

# Check if source paths exist
if os.path.exists(source_images) and os.path.exists(source_labels):
    all_images = sorted(f for f in os.listdir(source_images) if f.lower().endswith(('.jpg', '.png')))

    # Shuffle with a fixed seed (42 is arbitrary) so the split is random but reproducible.
    # Slicing the sorted list directly would send the alphabetically last files to val.
    random.Random(42).shuffle(all_images)

    # Split 80/20 train/val
    split_idx = int(len(all_images) * 0.8)
    train_files = all_images[:split_idx]
    val_files = all_images[split_idx:]

    copy_split(train_files, 'train')
    copy_split(val_files, 'val')

    print(f"✅ Dataset organized: {len(train_files)} train, {len(val_files)} validation")
else:
    print("⚠️ Source dataset not found. Please upload dataset first.")

## 5️⃣ Create Dataset Configuration File

Create the dataset.yaml file with class names and paths

In [None]:
# Create dataset.yaml configuration file
yaml_content = """# train and val data
path: /content/dataset
train: images/train
val: images/val

# number of classes
nc: 6

# class names
names: ['missing_hole', 'mouse_bite', 'open_circuit', 'short', 'spur', 'spurious_copper']
"""

with open('/content/dataset.yaml', 'w') as f:
    f.write(yaml_content)

print("✅ dataset.yaml created successfully!")
print("\nDataset Configuration:")
print(yaml_content)
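
The class indices in the label files have to line up with the order of `names` above (index 0 is `missing_hole`, index 5 is `spurious_copper`). Each line of a YOLO-format label file holds `class x_center y_center width height`, with the four box values normalized to the range 0 to 1. A small sketch of a per-line check follows; the helper name `check_label_line` is hypothetical, and the sample lines are made up for illustration.

```python
def check_label_line(line, nc=6):
    """Return a list of problems found in one YOLO-format label line."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        coords = [float(value) for value in parts[1:]]
    except ValueError:
        return ["malformed numeric field"]
    problems = []
    if not 0 <= class_id < nc:
        problems.append(f"class id {class_id} outside 0..{nc - 1}")
    if any(not 0.0 <= value <= 1.0 for value in coords):
        problems.append("coordinates must be normalized to [0, 1]")
    return problems


print(check_label_line("3 0.512 0.430 0.041 0.038"))  # well-formed box of class 3 ('short')
print(check_label_line("6 0.5 0.5 0.1 0.1"))          # class id beyond the six classes
```

Running this over every line of every file in `labels/train` and `labels/val` before training can catch labels exported with pixel coordinates or with a different class ordering.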

In [None]:
# Verify dataset structure
import os

train_images = len(os.listdir('/content/dataset/images/train'))
train_labels = len(os.listdir('/content/dataset/labels/train'))
val_images = len(os.listdir('/content/dataset/images/val'))
val_labels = len(os.listdir('/content/dataset/labels/val'))

print("📊 Dataset Summary:")
print(f"  Training images: {train_images}")
print(f"  Training labels: {train_labels}")
print(f"  Validation images: {val_images}")
print(f"  Validation labels: {val_labels}")
print(f"  Total images: {train_images + val_images}")
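
Matching counts do not guarantee matching files: an image and a label can be paired only if they share a file stem. The sketch below lists the stems that appear on one side only. The helper name `unmatched_files` and the set of image suffixes are assumptions made for this example.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image types


def unmatched_files(images_dir, labels_dir):
    """Return (image stems without a label, label stems without an image)."""
    image_stems = {
        path.stem for path in Path(images_dir).iterdir()
        if path.suffix.lower() in IMAGE_SUFFIXES
    }
    label_stems = {path.stem for path in Path(labels_dir).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

For example, `unmatched_files('/content/dataset/images/train', '/content/dataset/labels/train')` returns two lists. YOLOv5 treats an image with no label file as a background image containing no objects, so a long first list usually points to a naming problem rather than a harmless gap.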

## 6️⃣ Train YOLOv5 Model with GPU

Start training with 300 epochs using GPU acceleration

In [None]:
%cd /content/yolov5

# Training command with 300 epochs using GPU
!python train.py \
  --img 416 \
  --batch 32 \
  --epochs 300 \
  --data /content/dataset.yaml \
  --weights yolov5s.pt \
  --device 0 \
  --cache \
  --name pcb_1st \
  --patience 50 \
  --save-period 10
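
To get a feel for how much work the command above schedules, it helps to convert `--batch 32` and `--epochs 300` into a number of mini-batches. The sketch below does that arithmetic; the training-set size of 500 is a hypothetical placeholder, since the real count depends on your dataset.

```python
import math


def batches_per_epoch(num_train_images, batch_size=32):
    """Number of mini-batches needed to see every training image once."""
    return math.ceil(num_train_images / batch_size)


num_train_images = 500  # hypothetical count; use your own training-set size
per_epoch = batches_per_epoch(num_train_images)
print(f"{per_epoch} batches per epoch, at most {per_epoch * 300} over 300 epochs")
```

The total is an upper bound: with `--patience 50`, training ends early once 50 consecutive epochs pass without improvement.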

## 7️⃣ Training Metrics Visualization

View and analyze training results

In [None]:
from IPython.display import Image, display
import os

# Display training results chart
results_path = '/content/yolov5/runs/train/pcb_1st/results.png'
if os.path.exists(results_path):
    display(Image(filename=results_path))
    print("✅ Training results visualization displayed!")
else:
    print("⚠️ Training results chart not found yet. Check back after training completes.")

In [None]:
# Read and display training metrics CSV
import os
import pandas as pd

metrics_path = '/content/yolov5/runs/train/pcb_1st/results.csv'
if os.path.exists(metrics_path):
    df = pd.read_csv(metrics_path)
    df.columns = df.columns.str.strip()  # column names may carry padding spaces
    print("📊 Latest Training Metrics:")
    print(df.tail(10))
else:
    print("⚠️ Metrics file not available yet.")
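
The last ten rows show where training ended, not where it peaked. A short sketch for locating the best epoch follows, using only the standard library. It assumes the CSV has an `epoch` column and a `metrics/mAP_0.5` column, possibly padded with spaces; check the header of your own `results.csv`. The helper name `best_epoch` and the sample rows are made up for illustration.

```python
import csv
import io


def best_epoch(csv_text, metric="metrics/mAP_0.5"):
    """Return (epoch, value) for the row with the highest `metric`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]  # drop column padding
    epoch_col, metric_col = header.index("epoch"), header.index(metric)
    rows = [row for row in reader if row]  # ignore blank lines
    best = max(rows, key=lambda row: float(row[metric_col]))
    return int(float(best[epoch_col])), float(best[metric_col])


# Made-up sample in the padded layout assumed above
sample = """   epoch,   metrics/mAP_0.5
       0,   0.10
       1,   0.35
       2,   0.30
"""
print(best_epoch(sample))
```

With a real run, the same function can be fed the file contents, for example `best_epoch(open(metrics_path).read())`.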

## 8️⃣ Validate Model Performance

Run validation on the validation split (`images/val`); this notebook does not define a separate test set

In [None]:
# Validate the trained model
!python val.py \
  --weights /content/yolov5/runs/train/pcb_1st/weights/best.pt \
  --data /content/dataset.yaml \
  --img 416

## 9️⃣ Run Detection on Test Images

Perform inference and visualize predictions

In [None]:
# Run detection on test images
!python detect.py \
  --weights /content/yolov5/runs/train/pcb_1st/weights/best.pt \
  --source /content/dataset/images/val \
  --conf 0.25 \
  --img 416 \
  --save-txt \
  --save-conf

In [None]:
# Display sample detection results
from IPython.display import Image, display
import glob

# Collect annotated images from every detect run folder (exp, exp2, ...)
detect_results = sorted(glob.glob('/content/yolov5/runs/detect/exp*/*.jpg'))
if detect_results:
    for img_path in detect_results[:5]:  # Show first 5 results
        print(f"Displaying: {img_path}")
        display(Image(filename=img_path))
else:
    print("⚠️ Detection results not found yet.")
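
Because the detection command was run with `--save-txt` and `--save-conf`, each image should also get a text file of predictions next to the annotated images. The sketch below tallies detections per class from such lines. It assumes each line reads `class x_center y_center width height confidence` and that the files sit in a `labels` subfolder of the run directory; verify both against your own output. The helper name `count_detections` and the sample lines are made up.

```python
from collections import Counter

# Same class order as dataset.yaml
CLASS_NAMES = ['missing_hole', 'mouse_bite', 'open_circuit', 'short', 'spur', 'spurious_copper']


def count_detections(label_lines, min_conf=0.25):
    """Count detections per class from 'class x y w h conf' lines."""
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        class_id, conf = int(parts[0]), float(parts[5])
        if conf >= min_conf:
            counts[CLASS_NAMES[class_id]] += 1
    return counts


# Made-up prediction lines for illustration
sample_lines = [
    "3 0.51 0.43 0.04 0.04 0.91",
    "3 0.20 0.70 0.05 0.03 0.48",
    "0 0.80 0.15 0.03 0.03 0.30",
]
print(count_detections(sample_lines, min_conf=0.4))
```

Raising `min_conf` above the `--conf 0.25` used at detection time is a quick way to see how many predictions survive a stricter threshold without rerunning `detect.py`.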

## 🔟 Save Model to Google Drive

Export the trained model to Google Drive for future use

In [None]:
import shutil
import os

# Create output directory in Google Drive
output_dir = '/content/drive/MyDrive/PCB_Defects_Detection/trained_models'
os.makedirs(output_dir, exist_ok=True)

# Copy the trained model
src_model = '/content/yolov5/runs/train/pcb_1st/weights/best.pt'
dst_model = os.path.join(output_dir, 'pcb_defects_best.pt')

if os.path.exists(src_model):
    shutil.copy(src_model, dst_model)
    print(f"✅ Model saved to Google Drive: {dst_model}")
else:
    print("⚠️ Model file not found. Training may still be in progress.")

# Also copy training results
src_results = '/content/yolov5/runs/train/pcb_1st'
dst_results = os.path.join(output_dir, 'pcb_1st_results')

if os.path.exists(src_results):
    if os.path.exists(dst_results):
        shutil.rmtree(dst_results)
    shutil.copytree(src_results, dst_results)
    print(f"✅ Training results saved to Google Drive: {dst_results}")
else:
    print("⚠️ Training results not found.")
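
Copies to a mounted Drive folder can be interrupted if the session disconnects, so it can be worth confirming that the saved checkpoint is identical to the original. The following is a minimal sketch using SHA-256 checksums; the helper names `sha256_of` and `copies_match` are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(src, dst):
    """True when the two files have identical contents."""
    return sha256_of(src) == sha256_of(dst)
```

After the copy, `copies_match(src_model, dst_model)` should return `True`; if it does not, rerun the copy before the session ends.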

## 📋 Summary

### Training Configuration
- **Model**: YOLOv5s (Small variant)
- **Epochs**: 300
- **Batch Size**: 32 (optimized for GPU)
- **Image Size**: 416×416
- **Classes**: 6 PCB Defects (missing_hole, mouse_bite, open_circuit, short, spur, spurious_copper)
- **Early Stopping**: 50 epochs patience
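
The early-stopping setting above means training ends once 50 consecutive epochs pass without a new best score. The idea can be sketched in a few lines. This is a generic illustration of patience-based stopping, not a copy of YOLOv5's implementation; the function name `stop_epoch` and the fitness values are made up.

```python
def stop_epoch(fitness_per_epoch, patience=50):
    """Return the epoch at which training would stop, or None if it runs to the end."""
    best_fitness, best_epoch = float("-inf"), 0
    for epoch, fitness in enumerate(fitness_per_epoch):
        if fitness > best_fitness:
            best_fitness, best_epoch = fitness, epoch
        if epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return None


# Made-up scores: the best value appears at epoch 1 and is never beaten
print(stop_epoch([0.1, 0.5, 0.4, 0.4, 0.4], patience=3))
```

A larger patience tolerates longer plateaus at the cost of extra training time; a smaller one saves time but risks stopping before a late improvement.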

### GPU Benefits
- ✅ ~10x faster training compared to CPU
- ✅ Can use larger batch sizes for better convergence
- ✅ Real-time model visualization during training
- ✅ Faster inference for testing

### What Happens Next
1. Model trains for up to 300 epochs
2. Best model saved to `best.pt`
3. Training results and metrics saved
4. Model copied to Google Drive by the export cell (requires a successful Drive mount)
5. Ready for deployment in FastAPI backend

### Model Output Path
- **Best Model**: `/content/yolov5/runs/train/pcb_1st/weights/best.pt`
- **Google Drive**: `/MyDrive/PCB_Defects_Detection/trained_models/pcb_defects_best.pt`

### Expected Performance (targets, not measured in this run)
- **mAP@0.5**: 85-92% (excellent detection)
- **Precision**: 85-95%
- **Recall**: 80-90%
- **Training Time**: 2-4 hours on Colab GPU