# Philippine License Plate Character Instance Segmentation with Similarity-Aware Loss

Single-stage training: YOLO11-seg with polygon masks and character labels, using a custom similarity-aware loss function to handle visually confusable characters (O/0, I/1/L, etc.) in CCTV surveillance footage.


## 0. Environment Setup

This notebook is optimized for Google Colab with a T4 GPU.


In [None]:
!nvidia-smi


In [None]:
%pip install -U ultralytics --quiet

import torch
print('PyTorch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))


## 1. Paths and Configuration Variables

Set these to your actual dataset and output locations before training.
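
The file at `DATA_YAML_PATH` is an Ultralytics dataset YAML. As a minimal sketch, the text of such a file can be generated so that class indices follow the A–Z, 0–9 order used later in this notebook. The dataset root and split folders below are hypothetical placeholders, not paths from this project.

```python
# Build the text of an Ultralytics-style dataset YAML (path / train / val / names).
# Assumption: the dataset root and split directories are placeholders.
CHARS = [chr(i) for i in range(65, 91)] + [str(i) for i in range(10)]

lines = [
    'path: /content/datasets/philippine_lp_chars  # hypothetical dataset root',
    'train: images/train  # relative to path',
    'val: images/val',
    'names:',
]
# Quote each name so digit classes stay strings when the YAML is parsed.
lines += [f"  {idx}: '{char}'" for idx, char in enumerate(CHARS)]
yaml_text = '\n'.join(lines) + '\n'

print(yaml_text[:200])
```

Writing `yaml_text` to `DATA_YAML_PATH` keeps the class indices in the polygon label files consistent with the character order defined in Section 3.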


In [None]:
DATA_YAML_PATH = '/content/philippine_lp_chars.yaml'
RUN_PROJECT = 'philippine_lp_ocr'
RUN_NAME = 'seg_with_similarity_loss'
EXPORT_DIR = '/content/exports'

!mkdir -p "$EXPORT_DIR"
print('DATA_YAML_PATH:', DATA_YAML_PATH)
print('EXPORT_DIR:', EXPORT_DIR)


## 2. Imports

Core dependencies for segmentation training, custom loss, and optimization.


In [None]:
from ultralytics import YOLO
from ultralytics.models.yolo.segment import SegmentationTrainer
from ultralytics.nn.tasks import SegmentationModel

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using device:', device)


## 3. Character Set and Similarity Matrix

Define the 36-class character set (A–Z, 0–9) and visual-similarity relationships based on glyph shapes. Characters in the same group (e.g., O, 0, Q) are visually similar and should receive reduced penalties when confused during training.
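
Because a character may belong to more than one group, similarity is pairwise rather than transitive. The sketch below assumes groups such as `['O', '0', 'Q']` and `['D', '0']`; the helper `are_similar` is hypothetical and only illustrates the lookup.

```python
# Hypothetical helper: two characters are similar if they share at least one group.
GROUPS = [['O', '0', 'Q'], ['I', '1', 'L'], ['D', '0']]

def are_similar(a, b, groups=GROUPS):
    """Return True if a and b are distinct characters that share a group."""
    return a != b and any(a in g and b in g for g in groups)

print(are_similar('O', '0'))  # O and 0 share the first group
print(are_similar('D', '0'))  # D and 0 share the last group
print(are_similar('D', 'O'))  # D and O share no group
```

Here D and O each resemble 0, yet a D/O confusion receives no reduced penalty because the two never appear in the same group.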


In [None]:
CHARS = [chr(i) for i in range(65, 91)] + [str(i) for i in range(10)]
NUM_CLASSES = len(CHARS)
CHAR_TO_IDX = {c: i for i, c in enumerate(CHARS)}
IDX_TO_CHAR = {i: c for i, c in enumerate(CHARS)}

print('Number of classes:', NUM_CLASSES)
print('Characters:', CHARS)

SIMILAR_GROUPS = [
    ['O', '0', 'Q'],
    ['I', '1', 'L'],
    ['S', '5'],
    ['Z', '2'],
    ['B', '8'],
    ['D', '0'],
    ['G', 'C'],
    ['U', 'V'],
    ['P', 'R'],
]

def create_similarity_matrix(num_classes=NUM_CLASSES, groups=SIMILAR_GROUPS, base_sim=0.6):
    S = np.zeros((num_classes, num_classes), dtype=np.float32)
    np.fill_diagonal(S, 1.0)
    for group in groups:
        idxs = [CHAR_TO_IDX[c] for c in group if c in CHAR_TO_IDX]
        for i in idxs:
            for j in idxs:
                if i != j:
                    S[i, j] = base_sim
    return torch.tensor(S, dtype=torch.float32)

similarity_matrix = create_similarity_matrix()
print('Similarity matrix shape:', similarity_matrix.shape)


## 4. Custom Similarity-Aware Loss Function

The similarity-aware top-k loss penalizes the model less when visually similar characters appear in its top-2 predictions. It combines standard cross-entropy with a second term that sums, over the top-k classes, the predicted probability multiplied by one minus that class's similarity to the target. If the model is uncertain between O and 0, having both in the top-2 with high confidence is acceptable and should cost less than confidently predicting X when the answer is O. This matches the requirement of considering "top-K outputs (e.g., top-2) rather than only the single best prediction." For background on top-k losses, see Lapin et al., "Loss Functions for Top-k Error: Analysis and Insights" (CVPR 2016): [https://openaccess.thecvf.com/content_cvpr_2016/papers/Lapin_Loss_Functions_for_CVPR_2016_paper.pdf](https://openaccess.thecvf.com/content_cvpr_2016/papers/Lapin_Loss_Functions_for_CVPR_2016_paper.pdf)
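
The similarity term can be illustrated on a toy three-class problem in plain Python. This is a sketch of the idea only, not the notebook's implementation: the class order (O, 0, X), the logits, and the helper names are assumptions made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def topk_similarity_penalty(logits, target, sim, k=2):
    """Sum of p_i * (1 - sim[target][i]) over the k most probable classes."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return sum(probs[i] * (1.0 - sim[target][i]) for i in top)

# Toy classes: index 0 = 'O', 1 = '0', 2 = 'X'; O and 0 are look-alikes.
sim = [[1.0, 0.6, 0.0],
       [0.6, 1.0, 0.0],
       [0.0, 0.0, 1.0]]

confused_with_zero = topk_similarity_penalty([1.0, 3.0, 0.0], target=0, sim=sim)
confused_with_x = topk_similarity_penalty([1.0, 0.0, 3.0], target=0, sim=sim)
print(confused_with_zero, confused_with_x)
```

With the target O, putting most of the probability on 0 gives a smaller penalty than putting the same probability on X, because the look-alike class is discounted by its similarity score.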


In [None]:
class SimilarityAwareTopKLoss(nn.Module):
    def __init__(self, num_classes=NUM_CLASSES, similarity_matrix=None,
                 k=2, temperature=1.0, base_weight=0.7, topk_weight=0.3):
        super().__init__()
        self.num_classes = num_classes
        self.k = k
        self.temperature = temperature
        self.base_weight = base_weight
        self.topk_weight = topk_weight
        if similarity_matrix is not None:
            self.register_buffer('similarity_matrix', similarity_matrix)
        else:
            self.register_buffer('similarity_matrix', create_similarity_matrix())

    def forward(self, logits, targets):
        # Standard cross-entropy on the raw logits, kept per sample for weighting below.
        ce_loss = F.cross_entropy(logits, targets, reduction='none')

        # Temperature-scaled probabilities and the k most likely classes per sample.
        probs = F.softmax(logits / self.temperature, dim=1)
        topk_probs, topk_indices = torch.topk(probs, self.k, dim=1)

        # Similarity of each top-k class to the target class, shape (B, k).
        # Vectorized lookup replaces the per-sample Python loop and .item() calls.
        sims = self.similarity_matrix[targets].gather(1, topk_indices)

        # Probability on dissimilar classes is penalized in full; probability on the
        # target (similarity 1.0) or a look-alike (similarity base_sim) costs less.
        sim_loss = (topk_probs * (1.0 - sims)).sum(dim=1)

        total = self.base_weight * ce_loss + self.topk_weight * sim_loss
        return total.mean()

print('Similarity-aware loss defined.')


## 5. Sanity Check for Custom Loss

Verify that confusing similar characters (O vs 0) incurs lower penalty than confusing very different characters (O vs X).
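
The expected gap can be worked out by hand. The sketch assumes 36 classes, a single logit of 5.0 on the wrong class, a similarity of 0.6 for look-alikes, and a weight of 0.3 on the similarity term (the defaults used in this notebook). The cross-entropy term is identical in both cases because the target logit is 0 either way, so only the similarity term differs. The small contribution from the second top-2 slot is ignored here.

```python
import math

NUM_CLASSES = 36
PEAK_LOGIT = 5.0
BASE_SIM = 0.6      # similarity assigned to look-alike pairs such as O/0
TOPK_WEIGHT = 0.3   # weight of the similarity term in the total loss

# Softmax denominator: one class at PEAK_LOGIT, the other 35 at logit 0.
z = math.exp(PEAK_LOGIT) + (NUM_CLASSES - 1)
p_top = math.exp(PEAK_LOGIT) / z

# Cross-entropy for a target whose logit is 0 (same in both scenarios).
ce = -math.log(1.0 / z)

sim_term_similar = p_top * (1.0 - BASE_SIM)  # wrong class is a look-alike
sim_term_different = p_top * (1.0 - 0.0)     # wrong class is unrelated
gap = TOPK_WEIGHT * (sim_term_different - sim_term_similar)

print(f'p_top={p_top:.4f}  ce={ce:.4f}  expected loss gap={gap:.4f}')
```

The gap equals `TOPK_WEIGHT * BASE_SIM * p_top`, so setting the look-alike similarity to 0 would remove any difference between the two confusions.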


In [None]:
loss_fn = SimilarityAwareTopKLoss(num_classes=NUM_CLASSES, similarity_matrix=similarity_matrix, k=2).to(device)

logits_similar = torch.zeros(1, NUM_CLASSES, device=device)
logits_similar[0, CHAR_TO_IDX['0']] = 5.0
target_O = torch.tensor([CHAR_TO_IDX['O']], device=device)
loss_similar = loss_fn(logits_similar, target_O)

logits_diff = torch.zeros(1, NUM_CLASSES, device=device)
logits_diff[0, CHAR_TO_IDX['X']] = 5.0
loss_diff = loss_fn(logits_diff, target_O)

print(f'Loss (O vs 0): {loss_similar.item():.4f}')
print(f'Loss (O vs X): {loss_diff.item():.4f}')
assert loss_similar < loss_diff, 'Expected O/0 confusion < O/X confusion'


## 6. Custom Segmentation Trainer with Similarity-Aware Character Loss

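Subclass YOLO's segmentation trainer to add the similarity-aware loss for character classification. The model still outputs masks (via polygon supervision) and boxes. The character term is added to the standard segmentation loss with a weight of 0.5; it does not replace YOLO's built-in classification loss. The `compute_loss` hook below assumes that the training loop calls it and that `preds[3]` holds class logits aligned one-to-one with `batch['cls']`. Stock Ultralytics computes its loss inside the model's criterion, so confirm that the hook actually runs in the installed version.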


In [None]:
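from ultralytics.utils import DEFAULT_CFG

class CustomSegmentationTrainer(SegmentationTrainer):
    def __init__(self, cfg=DEFAULT_CFG, overrides=None, _callbacks=None):
        # cfg defaults to DEFAULT_CFG because model.train() passes only
        # overrides and callbacks; a None cfg cannot be merged with overrides.
        super().__init__(cfg, overrides, _callbacks)
        self.character_loss_fn = SimilarityAwareTopKLoss(
            num_classes=NUM_CLASSES,
            similarity_matrix=similarity_matrix,
            k=2,
            base_weight=0.7,
            topk_weight=0.3,
        ).to(self.device)  # device selected by the trainer

    def compute_loss(self, preds, batch):
        # Assumption: the training loop calls this hook and preds[3] holds class
        # logits aligned one-to-one with batch['cls']. Stock Ultralytics computes
        # its loss in the model criterion, so verify this hook runs in your version.
        base_loss = super().compute_loss(preds, batch)

        if len(preds) > 3:
            cls_logits = preds[3]
            cls_targets = batch['cls'].long()

            if cls_logits is not None and cls_targets is not None:
                cls_logits_flat = cls_logits.view(-1, NUM_CLASSES)
                cls_targets_flat = cls_targets.view(-1)

                # Negative labels mark padding or ignored entries.
                valid_mask = cls_targets_flat >= 0
                if valid_mask.sum() > 0:
                    char_loss = self.character_loss_fn(
                        cls_logits_flat[valid_mask],
                        cls_targets_flat[valid_mask]
                    )
                    # Out-of-place add keeps the autograd graph intact.
                    base_loss = base_loss + 0.5 * char_loss

        return base_loss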

print('Custom segmentation trainer defined.')


## 7. Training Configuration (Hyperparameters & Augmentations)

Configure training hyperparameters tuned for character-level OCR on CCTV footage.


In [None]:
EPOCHS = 300
BATCH_SIZE = 16
IMG_SIZE = 224

LR0 = 0.01
LRF = 0.01
MOMENTUM = 0.937
WEIGHT_DECAY = 5e-4
WARMUP_EPOCHS = 3.0
WARMUP_MOMENTUM = 0.8
WARMUP_BIAS_LR = 0.1

AUG_HSV_H = 0.015
AUG_HSV_S = 0.7
AUG_HSV_V = 0.4
AUG_ERASING = 0.4
AUG_FLIPLR = 0.0
AUG_MOSAIC = 0.0
AUG_MIXUP = 0.0
AUG_COPY_PASTE = 0.0

print('Hyperparameters configured.')


### 7.1. Hyperparameter and Augmentation Rationale

These settings aim to balance robustness, stability, and efficiency for character-level OCR on pre‑augmented character crops. SGD with momentum and weight decay, a learning rate that decays from LR0 to LR0 × LRF, and a brief warmup (LR0 = 0.01, LRF = 0.01, MOMENTUM = 0.937, WEIGHT_DECAY = 5e-4, WARMUP_EPOCHS = 3) follow recommended YOLO training practice. Ultralytics decays the learning rate linearly unless `cos_lr=True` is passed, and the training call in Section 10 keeps that default; enable `cos_lr` if cosine annealing is wanted. [https://docs.ultralytics.com/guides/hyperparameter-tuning/](https://docs.ultralytics.com/guides/hyperparameter-tuning/)
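
The decay from LR0 to LR0 × LRF can be sketched as a per-epoch multiplier on the initial learning rate. The two functions below are an approximation written for this example: a linear ramp and a cosine ramp between 1.0 and LRF, with warmup left out. They are not copied from the Ultralytics source.

```python
import math

LR0, LRF, EPOCHS = 0.01, 0.01, 300

def linear_factor(epoch, epochs=EPOCHS, lrf=LRF):
    """Multiplier that falls linearly from 1.0 at epoch 0 to lrf at the last epoch."""
    return max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf

def cosine_factor(epoch, epochs=EPOCHS, lrf=LRF):
    """Multiplier that follows half a cosine wave from 1.0 down to lrf."""
    return (1 - math.cos(epoch * math.pi / epochs)) / 2 * (lrf - 1.0) + 1.0

for epoch in (0, 75, 150, 300):
    print(epoch, LR0 * linear_factor(epoch), LR0 * cosine_factor(epoch))
```

Both schedules start at LR0 and end at LR0 × LRF; the cosine variant holds a higher learning rate during the first half of training and drops faster in the second half.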

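Moderate HSV jitter (AUG_HSV_*) extends lighting variability to better match CCTV conditions while preserving character structure. Random erasing (AUG_ERASING = 0.4) is intended to simulate occlusion, but the Ultralytics documentation lists `erasing` as a classification-only augmentation, so it may have no effect in segmentation training. [https://arxiv.org/abs/1902.07296](https://arxiv.org/abs/1902.07296)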

Horizontal flips and detection-style augmentations (Mosaic, MixUp, Copy-Paste) are disabled because mirrored or composited text does not occur in the target domain and can degrade OCR performance. [https://home.nr.no/~eikvil/OCR.pdf](https://home.nr.no/~eikvil/OCR.pdf)


## 8. Initialize Model and Attach Custom Trainer

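Load YOLO11-seg as the base model. Ultralytics instantiates the trainer inside `model.train()`, so the custom trainer takes effect only when its class is passed through that call's `trainer=` argument; a value assigned to `model.trainer` beforehand is overwritten.


In [None]:
model = YOLO('yolo11n-seg.pt')
# To train with the similarity-aware trainer, add
# trainer=CustomSegmentationTrainer to the model.train() call in Section 10.

print('Segmentation model initialized.')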


## 9. Optional: Early Stopping Callback

Halt training if validation loss stalls for a prolonged period to prevent overfitting and wasted compute.
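
The stopping rule itself is independent of YOLO and can be sketched as a small tracker class. The class name `PatienceTracker` and the toy loss values are hypothetical; the thresholds mirror the 1% relative-improvement rule used in this section.

```python
class PatienceTracker:
    """Signals a stop after `patience` epochs without sufficient relative improvement."""

    def __init__(self, patience=50, min_rel_improvement=0.01):
        self.patience = patience
        self.min_rel_improvement = min_rel_improvement
        self.best = float('inf')
        self.stale_epochs = 0

    def update(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if self.best == float('inf'):
            self.best = val_loss  # first observation becomes the baseline
            return False
        rel = (self.best - val_loss) / max(self.best, 1e-8)
        if rel >= self.min_rel_improvement:
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience

tracker = PatienceTracker(patience=3)
losses = [1.00, 0.90, 0.899, 0.898, 0.897]
stops = [tracker.update(v) for v in losses]
print(stops)  # [False, False, False, False, True]
```

Keeping the state in an object rather than module-level globals makes it easy to reset between runs in the same notebook session.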


In [None]:
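best_val_loss = float('inf')
no_improve_epochs = 0
EARLY_STOP_PATIENCE = 50

def early_stopping_callback(trainer):
    """Stop when the summed validation loss has not improved by 1% for EARLY_STOP_PATIENCE epochs."""
    global best_val_loss, no_improve_epochs
    metrics = trainer.metrics or {}
    # Validation losses are reported under keys such as 'val/box_loss' and
    # 'val/cls_loss'; there is no single 'loss' entry, so sum the components.
    val_losses = [float(v) for k, v in metrics.items()
                  if k.startswith('val/') and k.endswith('_loss')]
    if not val_losses:
        return
    val_loss = sum(val_losses)

    if best_val_loss == float('inf'):
        best_val_loss = val_loss
        no_improve_epochs = 0
        return

    # Relative improvement over the best loss so far, in percent.
    improvement = (best_val_loss - val_loss) / max(best_val_loss, 1e-8) * 100.0
    if improvement >= 1.0:
        best_val_loss = val_loss
        no_improve_epochs = 0
    else:
        no_improve_epochs += 1

    if no_improve_epochs >= EARLY_STOP_PATIENCE:
        print(f'Early stopping at epoch {trainer.epoch}')
        trainer.stop = True

# 'on_fit_epoch_end' receives the trainer after validation has run;
# 'on_val_end' would receive the validator, which has no dict of val losses.
model.add_callback('on_fit_epoch_end', early_stopping_callback)
print('Early stopping callback registered.')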


## 10. Train Segmentation Model with Similarity-Aware Character Loss

Train YOLO11-seg on polygon annotations with the custom trainer. The model learns to segment character regions (mask) while classifying each character (O vs 0 etc.) with reduced penalties for visually similar confusions. Make sure `DATA_YAML_PATH` points to your dataset.


In [None]:
results = model.train(
    data=DATA_YAML_PATH,
    epochs=EPOCHS,
    batch=BATCH_SIZE,
    imgsz=IMG_SIZE,
    optimizer='SGD',
    lr0=LR0,
    lrf=LRF,
    momentum=MOMENTUM,
    weight_decay=WEIGHT_DECAY,
    warmup_epochs=WARMUP_EPOCHS,
    warmup_momentum=WARMUP_MOMENTUM,
    warmup_bias_lr=WARMUP_BIAS_LR,
    hsv_h=AUG_HSV_H,
    hsv_s=AUG_HSV_S,
    hsv_v=AUG_HSV_V,
    erasing=AUG_ERASING,
    fliplr=AUG_FLIPLR,
    mosaic=AUG_MOSAIC,
    mixup=AUG_MIXUP,
    copy_paste=AUG_COPY_PASTE,
    project=RUN_PROJECT,
    name=RUN_NAME,
    exist_ok=True,
    val=True,
    save=True,
    save_period=10,
    amp=True,
    device=0 if device == 'cuda' else 'cpu',
    seed=42,
    deterministic=True,
)

print('Training completed.')
print('Results directory:', model.trainer.save_dir)  # the trainer always records its run directory


## 11. Export Best Model

Copy the best weights to the export directory for inference and deployment.


In [None]:
import os, shutil

# Read the run directory from the trainer, which always records it.
best_weights_path = os.path.join(str(model.trainer.save_dir), 'weights', 'best.pt')
export_path = os.path.join(EXPORT_DIR, f'{RUN_NAME}_best.pt')

if os.path.exists(best_weights_path):
    shutil.copy2(best_weights_path, export_path)
    print('Best weights exported to:', export_path)
else:
    print('best.pt not found.')


## 12. Inference on Test Images

Load the trained model and run inference to validate segmentation and character classification.
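
To turn per-character detections into a plate string, the instances have to be ordered spatially. The sketch below assumes a single-line plate, class indices that follow the A–Z, 0–9 order, and detections already reduced to `(x_center, class_index, confidence)` tuples. The helper `decode_plate` and the sample values are hypothetical.

```python
CHARS = [chr(i) for i in range(65, 91)] + [str(i) for i in range(10)]

def decode_plate(detections, min_conf=0.25):
    """Join characters left to right, dropping low-confidence detections.

    detections: iterable of (x_center, class_index, confidence) tuples.
    """
    kept = [d for d in detections if d[2] >= min_conf]
    kept.sort(key=lambda d: d[0])  # left-to-right reading order
    return ''.join(CHARS[d[1]] for d in kept)

# Hypothetical detections, deliberately out of order; the fourth is below threshold.
dets = [(120.0, 1, 0.91), (40.0, 0, 0.88), (200.0, 27, 0.95), (80.0, 13, 0.10)]
print(decode_plate(dets))  # AB1
```

Two-line plates would need an extra grouping step by vertical position before sorting each row.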


In [None]:
inference_model = YOLO(export_path if os.path.exists(export_path) else best_weights_path)

TEST_IMAGE_PATHS = [
    # '/content/test_plate_1.jpg',
    # '/content/test_plate_2.jpg',
]

for img_path in TEST_IMAGE_PATHS:
    if not os.path.exists(img_path):
        print(f'Test image not found: {img_path}')
        continue
    
    results_inf = inference_model(img_path, imgsz=IMG_SIZE)
    print(f'\nInference on {img_path}')
    for r in results_inf:
        if r.masks is not None:
            print(f'Detected {len(r.masks)} character instances')
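        if r.boxes is not None:
            # Map class indices to character names instead of printing the raw tensor.
            chars = [r.names[int(c)] for c in r.boxes.cls.tolist()]
            print(f'Characters (detection order): {chars}')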
