<a href="https://colab.research.google.com/github/giosanchez0208/CSC173-DeepCV-Sanchez/blob/main/experiments_and_results.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# License Plate OCR Model Testing & Validation

We are benchmarking four distinct approaches here: EasyOCR, PyTesseract, our Custom OCR, and the Refined Custom OCR.

The core of this evaluation is the Intersection over Union (IoU) calculation. This metric tells us how much a predicted bounding box overlaps with the ground truth. If the overlap meets our threshold (0.5 by default), the prediction is a hit (True Positive). A prediction that matches no ground-truth box is a ghost (False Positive), and a ground-truth box that no prediction matches is a miss (False Negative). We also calculate Mean Average Precision (mAP) across two IoU thresholds (0.5 and 0.75) to see which model holds up best when we tighten the requirements for accuracy.
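To make the threshold concrete, here is a small worked example in plain Python. The two boxes are made-up values chosen for easy arithmetic, and `iou` is a standalone helper written for this illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical boxes: the prediction is shifted right by half the box width.
ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)

score = iou(ground_truth, prediction)   # intersection 50, union 150
print(f"IoU = {score:.3f}")
print("hit" if score >= 0.5 else "miss")
```

A box shifted by half its width already drops to an IoU of one third, so it counts as a miss at the 0.5 threshold.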

Dataset: CatEye-ALPR-v3-3 (pre-cropped license plates)

# Benchmarking Dataset

### Download our models to this workspace

In [8]:
import os, gdown

os.makedirs('models', exist_ok=True)

# custom_ocr_refined.pt
gdown.download(id='1KTy1c-l1uhqT0fa_hn1JfJoiXsHKPRSI', output='models/custom_ocr_refined.pt')

# custom_ocr.pt
gdown.download(id='1BF78J866NNvU2r-c4sQTcpQoaoO4M4SG', output='models/custom_ocr.pt')

Downloading...
From: https://drive.google.com/uc?id=1KTy1c-l1uhqT0fa_hn1JfJoiXsHKPRSI
To: /content/models/custom_ocr_refined.pt
100%|██████████| 5.97M/5.97M [00:00<00:00, 73.9MB/s]
Downloading...
From: https://drive.google.com/uc?id=1BF78J866NNvU2r-c4sQTcpQoaoO4M4SG
To: /content/models/custom_ocr.pt
100%|██████████| 11.8M/11.8M [00:00<00:00, 77.2MB/s]


'models/custom_ocr.pt'

### Merge the train, test, and valid folders into one folder

In [4]:
!pip install roboflow

from roboflow import Roboflow
from dotenv import load_dotenv
import os

load_dotenv()
rf = Roboflow(api_key=os.getenv("ROBOFLOW_API_KEY"))

# Access the workspace and project
project = rf.workspace("trafficbralpr").project("cateye-alpr-v3")

# We use 'yolov8' format because it gives us the clean images/labels split we need
dataset = project.version(3).download("yolov8")

loading Roboflow workspace...
loading Roboflow project...


Downloading Dataset Version Zip in CatEye-ALPR-v3-3 to yolov8:: 100%|██████████| 152702/152702 [00:10<00:00, 14099.63it/s]





Extracting Dataset Version Zip to CatEye-ALPR-v3-3 in yolov8:: 100%|██████████| 51260/51260 [00:04<00:00, 10293.04it/s]


In [5]:
import os
import shutil

src_root = 'CatEye-ALPR-v3-3'
dst_root = 'CatEye-ALPR-v3-3-Merged'

subfolders = ['images', 'labels']
splits = ['train', 'test', 'valid']

# Create merged directories
for subfolder in subfolders:
    os.makedirs(os.path.join(dst_root, subfolder), exist_ok=True)

# Merge files from each split into the merged folder
for split in splits:
    for subfolder in subfolders:
        src_dir = os.path.join(src_root, split, subfolder)
        dst_dir = os.path.join(dst_root, subfolder)
        if os.path.exists(src_dir):
            for fname in os.listdir(src_dir):
                src_file = os.path.join(src_dir, fname)
                dst_file = os.path.join(dst_dir, fname)
                if not os.path.exists(dst_file):
                    shutil.copy2(src_file, dst_file)

The merged folder (`CatEye-ALPR-v3-3-Merged`) is the dataset we benchmark every model against.
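Before benchmarking, it can help to confirm that the merge produced matching images and labels. The helper below is a sketch written for this check; `summarize_merged_dataset` is a hypothetical name, and it assumes the `images/` and `labels/` layout created by the merge cell.

```python
import os

def summarize_merged_dataset(root):
    """Count images and labels under root and list images that have no label file."""
    image_dir = os.path.join(root, "images")
    label_dir = os.path.join(root, "labels")
    image_stems = {os.path.splitext(f)[0] for f in os.listdir(image_dir)
                   if f.lower().endswith((".png", ".jpg", ".jpeg"))}
    label_stems = {os.path.splitext(f)[0] for f in os.listdir(label_dir)
                   if f.lower().endswith(".txt")}
    return {
        "images": len(image_stems),
        "labels": len(label_stems),
        "images_without_label": sorted(image_stems - label_stems),
    }

# Only runs when the merged folder exists in the working directory.
if os.path.isdir("CatEye-ALPR-v3-3-Merged"):
    print(summarize_merged_dataset("CatEye-ALPR-v3-3-Merged"))
```

An image without a label file is treated by the dataset loader as having no ground-truth boxes, so a long `images_without_label` list would be worth investigating before trusting the metrics.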

## Model Initialization and Configuration

In this section, we set up our hardware acceleration and load our pre-trained weights. We are using a standard ImageNet normalization for our custom models to ensure the input distribution matches what they saw during training.
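As a quick illustration of what the normalization does, the sketch below applies the same arithmetic to a single pixel in plain Python: scale to [0, 1] (as `ToTensor` does for 8-bit images), then subtract the channel mean and divide by the channel standard deviation. `normalize_pixel` is a hypothetical helper for this example only.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Map one 8-bit RGB pixel to the per-channel values the model receives."""
    scaled = [c / 255.0 for c in rgb]  # ToTensor: 0..255 -> 0..1
    # Normalize: (value - mean) / std, channel by channel
    return [(s - m) / d for s, m, d in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]

white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

The inputs therefore land roughly in the range -2.1 to 2.6 instead of 0 to 1. If a checkpoint was actually trained on plain 0 to 1 inputs, this mismatch would degrade its detections, so the normalization is worth double-checking against the training pipeline.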

In [6]:
!pip install ultralytics easyocr torch pytesseract

Collecting ultralytics
  Downloading ultralytics-8.3.241-py3-none-any.whl.metadata (37 kB)
Collecting easyocr
  Downloading easyocr-1.7.2-py3-none-any.whl.metadata (10 kB)
Collecting pytesseract
  Downloading pytesseract-0.3.13-py3-none-any.whl.metadata (11 kB)
Collecting ultralytics-thop>=2.0.18 (from ultralytics)
  Downloading ultralytics_thop-2.0.18-py3-none-any.whl.metadata (14 kB)
Collecting python-bidi (from easyocr)
  Downloading python_bidi-0.6.7-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting pyclipper (from easyocr)
  Downloading pyclipper-1.4.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (8.6 kB)
Collecting ninja (from easyocr)
  Downloading ninja-1.13.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (5.1 kB)
Downloading ultralytics-8.3.241-py3-none-any.whl (1.1 MB)

In [17]:
import torch
import easyocr
import pytesseract
from torchvision import transforms

# Hardware selection and model checkpoint paths
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
CUSTOM_MODEL_PATH = 'models/custom_ocr.pt'
REFINED_MODEL_PATH = 'models/custom_ocr_refined.pt'

def load_and_unwrap(path, device):
    """
    Extracts a callable model from various checkpoint formats.

    Arguments:
        path (str): File system path to the .pt checkpoint.
        device (torch.device): Target hardware for the model.

    Returns:
        torch.nn.Module: The evaluated model object ready for inference.
    """
    ckpt = torch.load(path, map_location=device, weights_only=False)

    model = None
    if isinstance(ckpt, dict):
        # Hunt for a full model object in common checkpoint keys.
        # A bare 'state_dict' is not accepted: it holds weights only, not a callable module.
        for key in ['model', 'ema', 'network']:
            candidate = ckpt.get(key)
            if isinstance(candidate, torch.nn.Module):
                model = candidate
                break
    elif isinstance(ckpt, torch.nn.Module):
        # The checkpoint is the pickled model itself
        model = ckpt

    if model is None:
        raise ValueError(f"No valid model found in {path}")

    return model.to(device).eval()

# Load detection engines
custom_model = load_and_unwrap(CUSTOM_MODEL_PATH, DEVICE)
refined_model = load_and_unwrap(REFINED_MODEL_PATH, DEVICE)
reader = easyocr.Reader(['en'], gpu=torch.cuda.is_available())

# Preprocessing transform
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

## Inference Wrappers

These functions standardize the output from different libraries. No matter the engine, we want a dictionary containing bounding boxes, confidence scores, and the clock time it took to finish.
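The box formats differ by engine: EasyOCR returns four corner points per detection, while Tesseract reports left, top, width, and height. The sketch below shows both conversions to the common `[x1, y1, x2, y2]` layout on made-up coordinates; the helper names are hypothetical and exist only for this illustration.

```python
def quad_to_xyxy(quad):
    """Collapse four (x, y) corner points into an axis-aligned [x1, y1, x2, y2] box."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return [min(xs), min(ys), max(xs), max(ys)]

def xywh_to_xyxy(left, top, width, height):
    """Convert a left/top/width/height box into [x1, y1, x2, y2]."""
    return [left, top, left + width, top + height]

# Hypothetical detections of the same text region
easyocr_style = [[12, 8], [96, 10], [95, 40], [11, 38]]  # slightly skewed quad
tesseract_style = (12, 8, 84, 32)

print(quad_to_xyxy(easyocr_style))     # [11, 8, 96, 40]
print(xywh_to_xyxy(*tesseract_style))  # [12, 8, 96, 40]
```

Taking the min and max of the corners means a rotated quad becomes a slightly larger upright box, which can lower IoU a little for tilted text.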

In [30]:
import time
import cv2
import torch
import numpy as np
from PIL import Image

def get_easyocr_predictions(image_path):
    start = time.time()
    img = cv2.imread(image_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = reader.readtext(img)
    inference_time = time.time() - start

    boxes, scores = [], []
    for bbox, text, conf in results:
        x_coords, y_coords = [p[0] for p in bbox], [p[1] for p in bbox]
        boxes.append([min(x_coords), min(y_coords), max(x_coords), max(y_coords)])
        scores.append(conf)

    return {
        'boxes': np.array(boxes) if boxes else np.array([]).reshape(0, 4),
        'scores': np.array(scores) if scores else np.array([]),
        'inference_time': inference_time
    }

def get_pytesseract_predictions(image_path):
    start = time.time()
    img = Image.open(image_path).convert('RGB')
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    inference_time = time.time() - start

    boxes, scores = [], []
    for i in range(len(data['text'])):
        # Parse defensively: conf may arrive as an int, a float, or a numeric string
        conf = float(data['conf'][i])
        if conf > 0:
            x, y, w, h = data['left'][i], data['top'][i], data['width'][i], data['height'][i]
            boxes.append([x, y, x+w, y+h])
            scores.append(conf / 100.0)

    return {
        'boxes': np.array(boxes) if boxes else np.array([]).reshape(0, 4),
        'scores': np.array(scores) if scores else np.array([]),
        'inference_time': inference_time
    }

def get_custom_model_predictions(model, image_path, conf_threshold=0.5):
    start = time.time()
    img = Image.open(image_path).convert('RGB')

    model_dtype = next(model.parameters()).dtype
    img_tensor = transform(img).unsqueeze(0).to(DEVICE, dtype=model_dtype)

    with torch.no_grad():
        output = model(img_tensor)

    inference_time = time.time() - start

    # Check if output is raw YOLO tensor [Batch, 4+Classes, Anchors]
    if isinstance(output, (list, tuple)) and torch.is_tensor(output[0]):
        pred = output[0]
        if pred.dim() == 3:
            pred = pred.transpose(1, 2)  # [1, anchors, 4 + classes]
            boxes_raw = pred[0, :, :4]
            # Take max score across class channels (same as channel 4 when there is one class)
            scores_raw = pred[0, :, 4:].max(dim=1).values

            mask = scores_raw >= conf_threshold
            boxes_filt = boxes_raw[mask]
            scores_filt = scores_raw[mask]

            if len(boxes_filt) > 0:
                # Convert cxcywh to x1y1x2y2
                x, y, w, h = boxes_filt[:, 0], boxes_filt[:, 1], boxes_filt[:, 2], boxes_filt[:, 3]
                boxes = torch.stack([x - w/2, y - h/2, x + w/2, y + h/2], dim=1).cpu().numpy()
                scores = scores_filt.cpu().numpy()
            else:
                boxes, scores = np.array([]).reshape(0, 4), np.array([])
        else:
            boxes, scores = np.array([]).reshape(0, 4), np.array([])

    # Handle Torchvision style: {'boxes': ..., 'scores': ...}
    else:
        res = output if isinstance(output, dict) else output[0]
        boxes = res['boxes'].cpu().numpy()
        scores = res['scores'].cpu().numpy()
        mask = scores >= conf_threshold
        boxes, scores = boxes[mask], scores[mask]

    return {'boxes': boxes, 'scores': scores, 'inference_time': inference_time}
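One caveat on the raw-YOLO branch above: it filters anchors by confidence but does not suppress overlapping duplicates. If the checkpoints do return raw head output, several anchors may fire on the same region, and each extra box would be scored as a false positive. Below is a minimal sketch of greedy non-maximum suppression (NMS) in plain Python; `box_iou`, `greedy_nms`, and the 0.45 threshold are illustrative assumptions, not part of the benchmark code.

```python
def box_iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.45):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not heavily overlap a higher-scoring kept box
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Hypothetical detections: the first two boxes cover nearly the same region.
boxes = [[10, 10, 50, 30], [12, 11, 52, 31], [60, 10, 100, 30]]
scores = [0.90, 0.80, 0.70]
print(greedy_nms(boxes, scores))  # [0, 2]
```

On tensors, `torchvision.ops.nms(boxes, scores, iou_threshold)` performs the same suppression. It expects boxes in `x1y1x2y2` format, so it would run after the `cxcywh` conversion.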

In [31]:
benchmarking_squad = [
    ('EasyOCR', get_easyocr_predictions),
    ('Pytesseract', get_pytesseract_predictions),
    ('Custom OCR', lambda x: get_custom_model_predictions(custom_model, x)),
    ('Custom OCR Refined', lambda x: get_custom_model_predictions(refined_model, x))
]

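# NOTE: test_data is built in the "Benchmarking Execution" section; run that preparation step first.
test_sample_path = test_data[0][0]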
print(f"Validating engines on: {test_sample_path}")

for name, predict_fn in benchmarking_squad:
    try:
        res = predict_fn(test_sample_path)
        print(f"PASS: {name} | Detections: {len(res['boxes'])} | Time: {res['inference_time']:.4f}s")
    except Exception as e:
        print(f"FAIL: {name} error: {e}")
        raise e

print("\nValidation complete. Ready for full benchmark.")

Validating engines on: CatEye-ALPR-v3-3-Merged/images/001e1195-3367-484a-96fb-dc424f7b869f_jpg.rf.2b23086a69288685def03ba7b0bae983.jpg
PASS: EasyOCR | Detections: 2 | Time: 0.0435s
PASS: Pytesseract | Detections: 0 | Time: 0.1015s
PASS: Custom OCR | Detections: 0 | Time: 0.0175s
PASS: Custom OCR Refined | Detections: 0 | Time: 0.0159s

Validation complete. Ready for full benchmark.


## Metric Calculation Logic
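This is the scorecard. We calculate IoU for overlap, Precision and Recall at a fixed IoU threshold, and Mean Average Precision (mAP) averaged over IoU thresholds of 0.5 and 0.75, so that a model is rewarded for ranking its confident boxes correctly rather than for a few lucky matches.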
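For reference, the quantities computed in the next cell can be written as follows. The notation is ours: $A$ and $B$ are two boxes, $r_i$ and $p_i$ are the recall and precision after the $i$-th highest-scoring prediction, and $T$ is the set of IoU thresholds.

```latex
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}

\mathrm{AP}_t = \sum_{i} (r_{i+1} - r_i) \, \max_{j \ge i+1} p_j, \qquad
\mathrm{mAP} = \frac{1}{|T|} \sum_{t \in T} \mathrm{AP}_t, \qquad T = \{0.5,\ 0.75\}
```

The inner maximum is the monotone precision envelope that `calculate_map` builds with its backward loop, so AP is the area under the interpolated precision-recall curve.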

In [32]:
def calculate_iou(box1, box2):
    """
    Calculates Intersection over Union between two boxes.
    Inputs: box1 (list/np.array), box2 (list/np.array)
    Returns: float (IoU score)
    """
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])

    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection

    return intersection / union if union > 0 else 0

def calculate_metrics(pred_boxes, pred_scores, gt_boxes, iou_threshold=0.5):
    """
    Computes PR metrics for a single image.
    Inputs: pred_boxes (np.array), pred_scores (np.array), gt_boxes (np.array), iou_threshold (float)
    Returns: dict(precision: float, recall: float, tp: int, fp: int, fn: int)
    """
    if len(pred_boxes) == 0:
        return {'precision': 0.0, 'recall': 0.0 if len(gt_boxes) > 0 else 1.0, 'tp': 0, 'fp': 0, 'fn': len(gt_boxes)}

    if len(gt_boxes) == 0:
        return {'precision': 0.0, 'recall': 1.0, 'tp': 0, 'fp': len(pred_boxes), 'fn': 0}

    matched_gt = set()
    tp, fp = 0, 0
    sorted_indices = np.argsort(pred_scores)[::-1]

    for idx in sorted_indices:
        pred_box = pred_boxes[idx]
        max_iou, max_gt_idx = 0, -1

        for gt_idx, gt_box in enumerate(gt_boxes):
            if gt_idx in matched_gt:
                continue
            iou = calculate_iou(pred_box, gt_box)
            if iou > max_iou:
                max_iou, max_gt_idx = iou, gt_idx

        if max_iou >= iou_threshold:
            tp += 1
            matched_gt.add(max_gt_idx)
        else:
            fp += 1

    fn = len(gt_boxes) - len(matched_gt)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0

    return {'precision': precision, 'recall': recall, 'tp': tp, 'fp': fp, 'fn': fn}

def calculate_map(pred_boxes, pred_scores, gt_boxes, iou_thresholds=(0.5, 0.75)):
    """
    Computes mAP across specified IoU thresholds.
    Inputs: pred_boxes (np.array), pred_scores (np.array), gt_boxes (np.array), iou_thresholds (list)
    Returns: float (mAP score)
    """
    aps = []
    for iou_thresh in iou_thresholds:
        if len(pred_boxes) == 0 or len(gt_boxes) == 0:
            aps.append(0.0)
            continue

        sorted_indices = np.argsort(pred_scores)[::-1]
        sorted_boxes = pred_boxes[sorted_indices]
        matched_gt = [False] * len(gt_boxes)
        tp, fp = np.zeros(len(sorted_boxes)), np.zeros(len(sorted_boxes))

        for pred_idx, pred_box in enumerate(sorted_boxes):
            max_iou, max_gt_idx = 0, -1
            for gt_idx, gt_box in enumerate(gt_boxes):
                iou = calculate_iou(pred_box, gt_box)
                if iou > max_iou:
                    max_iou, max_gt_idx = iou, gt_idx

            if max_iou >= iou_thresh and not matched_gt[max_gt_idx]:
                tp[pred_idx], matched_gt[max_gt_idx] = 1, True
            else:
                fp[pred_idx] = 1

        tp_cumsum, fp_cumsum = np.cumsum(tp), np.cumsum(fp)
        recalls = tp_cumsum / len(gt_boxes)
        precisions = tp_cumsum / (tp_cumsum + fp_cumsum)
        recalls = np.concatenate(([0], recalls, [1]))
        precisions = np.concatenate(([0], precisions, [0]))

        for i in range(len(precisions) - 1, 0, -1):
            precisions[i - 1] = max(precisions[i - 1], precisions[i])

        indices = np.where(recalls[1:] != recalls[:-1])[0]
        ap = np.sum((recalls[indices + 1] - recalls[indices]) * precisions[indices + 1])
        aps.append(ap)

    return np.mean(aps)

## Benchmarking Execution
We'll iterate through our test dataset and gather the stats for every model.

In [33]:
import os
import torch
from PIL import Image
from torch.utils.data import Dataset, Subset
from tqdm.auto import tqdm

class LicensePlateDataset(Dataset):
    """
    Custom Dataset for License Plate Detection benchmarking.
    Inputs:
        root_dir (str): Path to merged dataset folder
        transform (callable): Preprocessing transforms
    Returns:
        image (Tensor), target (dict with 'boxes' key)
    """
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.image_dir = os.path.join(root_dir, 'images')
        self.label_dir = os.path.join(root_dir, 'labels')

        self.imgs = [os.path.join(self.image_dir, f) for f in sorted(os.listdir(self.image_dir))
                     if f.lower().endswith(('.png', '.jpg', '.jpeg'))]

        self.targets = []
        self._load_targets()

    def _load_targets(self):
        """
        Parses YOLO format labels. Optimized to use image headers for speed.
        """
        for img_path in tqdm(self.imgs, desc="Preparing Dataset Metadata"):
            # Fast header-only read for dimensions
            with Image.open(img_path) as img:
                w, h = img.size

            label_path = os.path.join(self.label_dir, os.path.basename(img_path).rsplit('.', 1)[0] + '.txt')
            boxes = []

            if os.path.exists(label_path):
                with open(label_path, 'r') as f:
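                    for line in f:
                        parts = line.split()
                        if len(parts) != 5:
                            # Skip blank lines and anything that is not a plain YOLO box
                            continue
                        # YOLO: class, x_center, y_center, width, height
                        _, x_c, y_c, bw, bh = map(float, parts)

                        # Scale to absolute pixels
                        x1 = (x_c - bw / 2) * w
                        y1 = (y_c - bh / 2) * h
                        x2 = (x_c + bw / 2) * w
                        y2 = (y_c + bh / 2) * h
                        boxes.append([x1, y1, x2, y2])

            # reshape keeps a consistent (N, 4) shape even when an image has no boxes
            self.targets.append({'boxes': torch.tensor(boxes, dtype=torch.float32).reshape(-1, 4)})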

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        img_path = self.imgs[idx]
        img = Image.open(img_path).convert("RGB")
        target = self.targets[idx]

        if self.transform:
            img = self.transform(img)

        return img, target

# Execute the preparation
full_dataset = LicensePlateDataset(root_dir='CatEye-ALPR-v3-3-Merged', transform=transform)
indices = list(range(len(full_dataset)))
test_dataset = Subset(full_dataset, indices)

Preparing Dataset Metadata:   0%|          | 0/25624 [00:00<?, ?it/s]
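The label conversion inside `_load_targets` can be checked by hand on a single line. The sketch below repeats the same arithmetic on a made-up label and image size; `yolo_to_xyxy` is a hypothetical standalone helper.

```python
def yolo_to_xyxy(line, img_w, img_h):
    """Convert one normalized 'class x_c y_c w h' label line to pixel [x1, y1, x2, y2]."""
    _, x_c, y_c, bw, bh = map(float, line.split())
    return [(x_c - bw / 2) * img_w, (y_c - bh / 2) * img_h,
            (x_c + bw / 2) * img_w, (y_c + bh / 2) * img_h]

# Hypothetical label: a box centered in a 200x100 image, a quarter as wide and half as tall.
box = yolo_to_xyxy("0 0.5 0.5 0.25 0.5", img_w=200, img_h=100)
print(box)  # [75.0, 25.0, 125.0, 75.0]
```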

In [None]:
from tqdm.auto import tqdm

def benchmark_model(model_name, predict_fn, test_data, iou_threshold=0.5):
    """
    Runs the full benchmarking loop for a model with a progress bar.
    Inputs:
        model_name (str): Name of the model for display.
        predict_fn (callable): Function that returns predictions.
        test_data (list): List of (path, boxes) tuples.
        iou_threshold (float): Threshold for True Positive overlap.
    Returns:
        dict: Aggregated precision, recall, mAP, and timing stats.
    """
    all_metrics, all_times, all_precisions, all_recalls, all_maps = [], [], [], [], []

    # Wrap the loop in tqdm for real-time tracking
    for image_path, gt_boxes in tqdm(test_data, desc=f"Benchmarking {model_name}"):
        predictions = predict_fn(image_path)
        metrics = calculate_metrics(predictions['boxes'], predictions['scores'], gt_boxes, iou_threshold)
        map_score = calculate_map(predictions['boxes'], predictions['scores'], gt_boxes)

        all_metrics.append(metrics)
        all_times.append(predictions['inference_time'])
        all_precisions.append(metrics['precision'])
        all_recalls.append(metrics['recall'])
        all_maps.append(map_score)

    return {
        'model': model_name,
        'avg_precision': np.mean(all_precisions),
        'avg_recall': np.mean(all_recalls),
        'avg_map': np.mean(all_maps),
        'avg_inference_time': np.mean(all_times),
        'total_tp': sum(m['tp'] for m in all_metrics),
        'total_fp': sum(m['fp'] for m in all_metrics),
        'total_fn': sum(m['fn'] for m in all_metrics)
    }

# Preparation of test data with progress bar
test_data = []
for idx in tqdm(range(len(test_dataset)), desc="Preparing Test Data"):
    img_path = test_dataset.dataset.imgs[test_dataset.indices[idx]]
    gt_boxes = test_dataset.dataset.targets[test_dataset.indices[idx]]['boxes'].numpy()
    test_data.append((img_path, gt_boxes))

# Execute benchmarks
# Progress bars appear sequentially, one per model
results = []
results.append(benchmark_model('EasyOCR', get_easyocr_predictions, test_data))
results.append(benchmark_model('Pytesseract', get_pytesseract_predictions, test_data))
results.append(benchmark_model('Custom OCR', lambda x: get_custom_model_predictions(custom_model, x), test_data))
results.append(benchmark_model('Custom OCR Refined', lambda x: get_custom_model_predictions(refined_model, x), test_data))

Preparing Test Data:   0%|          | 0/25624 [00:00<?, ?it/s]

Benchmarking EasyOCR:   0%|          | 0/25624 [00:00<?, ?it/s]

Benchmarking Pytesseract:   0%|          | 0/25624 [00:00<?, ?it/s]
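Note that `benchmark_model` averages precision and recall per image, so an image with one box weighs as much as an image with ten. The totals it returns also allow a dataset-wide (micro-averaged) view. The sketch below assumes a result dict with the same `total_tp`, `total_fp`, and `total_fn` keys; the numbers are made up for illustration and are not results from this benchmark.

```python
def micro_precision_recall(result):
    """Precision and recall computed from dataset-wide TP/FP/FN totals."""
    tp, fp, fn = result["total_tp"], result["total_fp"], result["total_fn"]
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Hypothetical totals, for illustration only.
example = {"model": "demo", "total_tp": 80, "total_fp": 20, "total_fn": 40}
p, r = micro_precision_recall(example)
print(f"{example['model']}: precision={p:.3f} recall={r:.3f}")
```

Reporting both views side by side makes it easier to spot a model that looks strong per image but produces many false positives overall.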