# PyPotteryLens YOLO Training Notebook

This notebook demonstrates how to train a YOLO (You Only Look Once) segmentation model for pottery detection and analysis. The model uses the YOLOv8 architecture to perform object detection and instance segmentation on archaeological pottery images.

## Overview
- **Purpose**: Train a deep learning model to detect and segment pottery artifacts in images
- **Architecture**: YOLOv8 with segmentation capabilities
- **Framework**: Ultralytics YOLO implementation
- **Environment**: Google Colab (with GPU acceleration recommended)

## Prerequisites
- Google Colab account with access to GPU runtime
- Prepared dataset in YOLO format (images + annotations)
- YAML configuration file defining dataset paths and classes

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## 1. Setup Environment

### Mount Google Drive
First, we need to mount Google Drive to access our dataset and save training results. This will prompt you to authenticate and grant access to your Google Drive.

In [None]:
!pip install ultralytics

### Install Required Packages
Install the Ultralytics package, which provides the YOLOv8 implementation and pulls in the dependencies needed for training and inference.

In [None]:
from ultralytics import YOLO
from matplotlib import pyplot as plt
from PIL import Image

### Import Libraries
Import the necessary libraries for model training and visualization:
- `YOLO`: Main class for loading and training YOLO models
- `matplotlib.pyplot`: For plotting training results and visualizations
- `PIL.Image`: For image processing and display

In [None]:
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8x-seg.yaml")  # build a new, untrained segmentation model from YAML
model = YOLO("yolov8x-seg.pt")  # load a pretrained segmentation model (recommended for training); overrides the line above
#model = YOLO("yolo11m-seg.yaml").load("yolo11m.pt")  # build from YAML and transfer weights


## 2. Model Initialization

### Load Pre-trained YOLO Model
We initialize a YOLOv8 model for segmentation. There are several options:

1. **Build from YAML**: `YOLO("yolov8x-seg.yaml")` - Creates a new model architecture with untrained weights
2. **Load pre-trained weights**: `YOLO("yolov8x-seg.pt")` - **Recommended** for training as it provides a good starting point
3. **Combine both**: Build from YAML and load pre-trained weights for transfer learning

The `yolov8x-seg.pt` model is the extra-large segmentation variant. It is the most accurate of the YOLOv8 segmentation models, but also the slowest and the most demanding of GPU memory, which suits detailed pottery analysis when a GPU is available.
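The YOLOv8 segmentation models come in five sizes (`n`, `s`, `m`, `l`, `x`), and their file names follow the pattern `yolov8{size}-seg`. As a minimal sketch, the helper below builds such a name from a size letter. The function `seg_model_name` is hypothetical and not part of Ultralytics; it only formats a string.

```python
# Hypothetical helper (not part of Ultralytics): build a YOLOv8 segmentation
# model file name from a size letter.
VALID_SIZES = ("n", "s", "m", "l", "x")  # nano, small, medium, large, extra-large


def seg_model_name(size: str = "x", pretrained: bool = True) -> str:
    """Return e.g. 'yolov8x-seg.pt' (weights) or 'yolov8x-seg.yaml' (architecture only)."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    suffix = "pt" if pretrained else "yaml"
    return f"yolov8{size}-seg.{suffix}"


print(seg_model_name("x"))         # yolov8x-seg.pt
print(seg_model_name("n", False))  # yolov8n-seg.yaml
```

The returned string can be passed to `YOLO(...)`. Choosing a smaller size is a reasonable fallback if the extra-large model does not fit in the available GPU memory.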

In [None]:
# Read the class names from the dataset YAML.
# Note: despite its name, num_classes holds the class names as a string, not a count.
import yaml
with open("PATH_TO_YAML", 'r') as stream:
    num_classes = str(yaml.safe_load(stream)['names'])

## 3. Dataset Configuration

### Load Dataset Information
Read the YAML configuration file to understand the dataset structure and classes. The YAML file should contain:
- `path`: Root directory of the dataset
- `train`: Path to training images
- `val`: Path to validation images  
- `names`: Dictionary or list of class names

**Note**: Replace `"PATH_TO_YAML"` with the actual path to your dataset configuration file (e.g., `"/content/drive/MyDrive/pottery_dataset/data.yaml"`).
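To make the expected structure concrete, here is a minimal sketch that checks an already-parsed configuration for the four keys listed above and counts the classes. The helper `summarize_dataset_config` and the example values are hypothetical; in the notebook the dictionary would come from `yaml.safe_load`.

```python
# Hypothetical helper: sanity-check a dataset config that has already been
# parsed into a dict (for example with yaml.safe_load).
REQUIRED_KEYS = ("path", "train", "val", "names")


def summarize_dataset_config(cfg: dict) -> dict:
    """Check for the required keys and return the class count and class names."""
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise KeyError(f"dataset config is missing keys: {missing}")
    names = cfg["names"]
    # `names` may be a list of names or a dict mapping class index -> name.
    class_names = list(names.values()) if isinstance(names, dict) else list(names)
    return {"num_classes": len(class_names), "class_names": class_names}


# Illustrative config; the paths and the class name are made up.
example_cfg = {
    "path": "/content/drive/MyDrive/pottery_dataset",
    "train": "images/train",
    "val": "images/val",
    "names": {0: "pottery"},
}
print(summarize_dataset_config(example_cfg))
# {'num_classes': 1, 'class_names': ['pottery']}
```

Running a check like this before training surfaces a missing key immediately, rather than partway through dataset loading.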

In [None]:
num_classes

### Verify Dataset Classes
Display the classes defined in your dataset to ensure they match your expectations.

In [None]:
# Define a project: the destination directory for all results
project = "PATH_TO_PROJECT"  # e.g., "/content/drive/MyDrive/YOLO_12_200m_epochs-X"
# Define a subdirectory for this specific training run
name = "YOLO_8_200m_epochs"  # rerunning with the same name creates a numbered directory, e.g. YOLO_8_200m_epochs2

## 4. Training Configuration

### Set Output Directories
Configure where training results will be saved:
- **project**: Main directory for all training experiments
- **name**: Subdirectory for this specific training run

**Important**: Replace `"PATH_TO_PROJECT"` with your actual project path (e.g., `"/content/drive/MyDrive/pottery_experiments"`).

If you run training multiple times with the same name, YOLO does not overwrite the earlier run. It appends a number to the directory name instead (e.g., `YOLO_8_200m_epochs2`).
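To see where the next run is likely to land before starting it, the sketch below approximates the numbering behaviour with `pathlib`. It assumes the number is appended directly to `name`, starting at 2; `preview_run_dir` is a hypothetical helper, not an Ultralytics function, so verify the result against the directory your installed version actually creates.

```python
from pathlib import Path


def preview_run_dir(project: str, name: str) -> Path:
    """Approximate the directory the next run would use (assumed naming scheme)."""
    base = Path(project) / name
    if not base.exists():
        return base  # first run with this name: no suffix
    n = 2
    # Find the first numbered directory that does not exist yet.
    while Path(f"{base}{n}").exists():
        n += 1
    return Path(f"{base}{n}")


# Example call with placeholder values from this notebook.
print(preview_run_dir("PATH_TO_PROJECT", "YOLO_8_200m_epochs"))
```

The function only inspects the file system and creates nothing, so it is safe to call repeatedly.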

In [None]:
# Train the model
results = model.train(data='PATH_TO_YAML',
                      project=project,
                      name=name,
                      epochs=200,
                      patience=10,  # stop early after 10 epochs without improvement
                      batch=-1,  # auto-select the batch size from available GPU memory
                      imgsz=800,
                      fliplr=0.6,
                      single_cls=True,
                      mask_ratio=1,
                      )

## 5. Model Training

### Start Training Process
Configure and start the YOLO model training with parameters chosen for pottery detection:

#### Key Training Parameters:
- **data**: Path to your YAML configuration file ⚠️ *Replace with actual path*
- **epochs**: Maximum number of full passes over the training set (200 for thorough training)
- **patience**: Early stopping patience (10 = stop if validation metrics show no improvement for 10 consecutive epochs)
- **batch**: Batch size (-1 = automatically choose a size based on available GPU memory)
- **imgsz**: Input image size (800x800 pixels for detailed pottery features)
- **fliplr**: Horizontal flip augmentation probability (0.6 = 60% chance)
- **single_cls**: Treat all classes as a single class (True for general pottery detection)
- **mask_ratio**: Downsampling ratio applied to segmentation masks during training (1 = masks kept at full resolution)

#### Training Process:
The training will automatically:
1. Validate dataset structure
2. Download pre-trained weights if needed
3. Start training with progress tracking
4. Save best weights and training metrics
5. Generate training plots and validation results
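As a sketch of how the parameters above could be checked before a long run, the snippet below collects them in a dictionary and applies a few basic sanity rules. `check_train_args` is a hypothetical helper, and the rules are assumptions about sensible values (for example, an image size that is a multiple of the model's maximum stride of 32), not checks performed by this notebook.

```python
# Hypothetical pre-flight check for the training arguments used in this notebook.
train_args = {
    "epochs": 200,
    "patience": 10,
    "batch": -1,
    "imgsz": 800,
    "fliplr": 0.6,
    "single_cls": True,
    "mask_ratio": 1,
}


def check_train_args(args: dict, stride: int = 32) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if args["imgsz"] % stride != 0:
        problems.append(f"imgsz={args['imgsz']} is not a multiple of {stride}")
    if not 0.0 <= args["fliplr"] <= 1.0:
        problems.append("fliplr must be a probability between 0 and 1")
    if args["patience"] > args["epochs"]:
        problems.append("patience exceeds epochs, so early stopping can never trigger")
    if args["mask_ratio"] < 1:
        problems.append("mask_ratio must be a positive integer")
    return problems


print(check_train_args(train_args))  # []
```

If the list is empty, the same dictionary can be unpacked into the training call, e.g. `model.train(data='PATH_TO_YAML', project=project, name=name, **train_args)`.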

### Training Control
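To stop training manually, interrupt the running training cell (in Colab: Runtime > Interrupt execution). This is useful for debugging or when a run needs to be halted. The most recent checkpoint is saved as `last.pt` in the run's `weights` directory.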

In [None]:
from ultralytics import YOLO
model = YOLO('/content/drive/MyDrive/Schedinator/Pots/results/300m_epochs-/weights/best.pt')  # load a partially trained model
results = model.train(resume=True)

## 6. Resume Training (Optional)

### Continue from Checkpoint
If training was interrupted or you want to continue from a previously saved model, use this section to resume training.

**Steps to resume:**
1. Update the path to point to your checkpoint file: prefer `last.pt`, which holds the most recent training state, over `best.pt`
2. The `resume=True` parameter will continue from where training left off
3. All previous training settings will be restored automatically

**Example path format**: `/content/drive/MyDrive/YourProject/runs/segment/train/weights/best.pt`
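The choice of checkpoint can be sketched as a small lookup that prefers `last.pt` and falls back to `best.pt`. `find_resume_checkpoint` is a hypothetical helper, and it assumes the run directory contains a `weights` subfolder as in the example path above.

```python
from pathlib import Path
from typing import Optional


def find_resume_checkpoint(run_dir: str) -> Optional[Path]:
    """Return last.pt if present, else best.pt, else None."""
    weights_dir = Path(run_dir) / "weights"
    for filename in ("last.pt", "best.pt"):
        candidate = weights_dir / filename
        if candidate.is_file():
            return candidate
    return None


# Placeholder run directory; replace with your own.
checkpoint = find_resume_checkpoint("/content/drive/MyDrive/YourProject/runs/segment/train")
print(checkpoint)  # None if neither file exists at that location
```

When a checkpoint is found, it can be loaded with `YOLO(str(checkpoint))` and training continued with `model.train(resume=True)`.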