[![Labellerr](https://storage.googleapis.com/labellerr-cdn/%200%20Labellerr%20template/notebook.webp)](https://www.labellerr.com)

# **YOLO Model Comparison**

---

[![labellerr](https://img.shields.io/badge/Labellerr-BLOG-black.svg)](https://www.labellerr.com/blog)
[![Youtube](https://img.shields.io/badge/Labellerr-YouTube-b31b1b.svg)](https://www.youtube.com/@Labellerr)
[![Github](https://img.shields.io/badge/Labellerr-GitHub-green.svg)](https://github.com/Labellerr/Hands-On-Learning-in-Computer-Vision)

This notebook provides a tool to compare the performance of different YOLO models on the same video. It generates a side-by-side comparison video, performance statistics, and visualizations.

In [2]:
import cv2
import numpy as np
from ultralytics import YOLO
import matplotlib.pyplot as plt
from pathlib import Path
import time
import pandas as pd
from typing import List, Dict, Optional, Tuple
import warnings
warnings.filterwarnings('ignore')


## `YOLOComparator` Class

This cell defines the main class for the comparison tool.

In [None]:
class YOLOComparator:
    """Master class for comparing multiple YOLO models on the same task"""
    
    SUPPORTED_TASKS = ['detect', 'segment', 'pose', 'track', 'obb', 'classify']
    
    # Predefined model configurations
    PRESET_MODELS = {
        # Detection models
        'yolov8n': 'yolov8n.pt',
        'yolov8s': 'yolov8s.pt',
        'yolov8m': 'yolov8m.pt',
        'yolov8l': 'yolov8l.pt',
        'yolov8x': 'yolov8x.pt',
        'yolo11n': 'yolo11n.pt',
        'yolo11s': 'yolo11s.pt',
        'yolo11m': 'yolo11m.pt',
        'yolo11l': 'yolo11l.pt',
        'yolo11x': 'yolo11x.pt',
        
        # Segmentation models
        'yolov8n-seg': 'yolov8n-seg.pt',
        'yolov8s-seg': 'yolov8s-seg.pt',
        'yolov8m-seg': 'yolov8m-seg.pt',
        'yolov8l-seg': 'yolov8l-seg.pt',
        'yolov8x-seg': 'yolov8x-seg.pt',
        'yolo11n-seg': 'yolo11n-seg.pt',
        'yolo11s-seg': 'yolo11s-seg.pt',
        'yolo11m-seg': 'yolo11m-seg.pt',
        
        # Pose models
        'yolov8n-pose': 'yolov8n-pose.pt',
        'yolov8s-pose': 'yolov8s-pose.pt',
        'yolov8m-pose': 'yolov8m-pose.pt',
        'yolov8l-pose': 'yolov8l-pose.pt',
        'yolo11n-pose': 'yolo11n-pose.pt',
        'yolo11s-pose': 'yolo11s-pose.pt',
        'yolo11m-pose': 'yolo11m-pose.pt',
    }
    
    def __init__(self, output_dir: str = "yolo_comparison_results"):
        """Initialize the comparator"""
        self.output_dir = Path(output_dir)
        # parents=True lets nested output paths such as "runs/exp1" be created in one call
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.loaded_models = {}
        
    def _load_model(self, model_path: str, model_name: str) -> YOLO:
        """Load a YOLO model with caching"""
        if model_name not in self.loaded_models:
            print(f"  Loading {model_name}...")
            # Check if it's a preset model
            if model_path in self.PRESET_MODELS:
                model_path = self.PRESET_MODELS[model_path]
            self.loaded_models[model_name] = YOLO(model_path)
        return self.loaded_models[model_name]
    
    def _get_video_info(self, video_path: str) -> Dict:
        """Extract video metadata"""
        cap = cv2.VideoCapture(video_path)
        info = {
            'fps': int(cap.get(cv2.CAP_PROP_FPS)),
            'width': int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            'height': int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
            'total_frames': int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        }
        cap.release()
        info['duration'] = info['total_frames'] / info['fps'] if info['fps'] > 0 else 0
        return info
    
    def _create_grid_layout(self, frames: List[np.ndarray], titles: List[str], 
                           times: List[float], counts: List[int]) -> np.ndarray:
        """Create a grid layout for multiple frames"""
        n_models = len(frames)
        
        if n_models == 1:
            rows, cols = 1, 1
        elif n_models == 2:
            rows, cols = 1, 2
        elif n_models <= 4:
            rows, cols = 2, 2
        elif n_models <= 6:
            rows, cols = 2, 3
        elif n_models <= 9:
            rows, cols = 3, 3
        else:
            rows = int(np.ceil(np.sqrt(n_models)))
            cols = int(np.ceil(n_models / rows))
        
        h, w = frames[0].shape[:2]
        
        # Create empty grid
        grid = np.zeros((h * rows, w * cols, 3), dtype=np.uint8)
        
        # Place frames in grid
        for idx, (frame, title, inf_time, count) in enumerate(zip(frames, titles, times, counts)):
            row = idx // cols
            col = idx % cols
            
            y_start = row * h
            y_end = (row + 1) * h
            x_start = col * w
            x_end = (col + 1) * w
            
            grid[y_start:y_end, x_start:x_end] = frame
            
            # Add info overlay
            info_text = f"{title} | {inf_time*1000:.1f}ms | {count} obj"
            font = cv2.FONT_HERSHEY_SIMPLEX
            font_scale = 0.6
            thickness = 2
            
            (text_w, text_h), _ = cv2.getTextSize(info_text, font, font_scale, thickness)
            cv2.rectangle(grid, (x_start + 5, y_start + 5), 
                         (x_start + text_w + 15, y_start + text_h + 15), 
                         (0, 0, 0), -1)
            cv2.putText(grid, info_text, (x_start + 10, y_start + text_h + 10),
                       font, font_scale, (0, 255, 0), thickness)
        
        return grid
    
    def _process_frame(self, frame: np.ndarray, model: YOLO, task: str, 
                      conf_threshold: float) -> Tuple[np.ndarray, float, int]:
        """Process a single frame with a model"""
        start_time = time.time()
        
        if task == 'track':
            results = model.track(frame, conf=conf_threshold, persist=True, verbose=False)[0]
        else:
            results = model(frame, conf=conf_threshold, verbose=False)[0]
        
        inference_time = time.time() - start_time
        processed_frame = results.plot()
        
        # Count detections
        if task == 'track':
            count = len(results.boxes.id) if results.boxes.id is not None else 0
        else:
            # boxes is None for tasks that produce no boxes (e.g. classify)
            count = len(results.boxes) if results.boxes is not None else 0
        
        return processed_frame, inference_time, count
    
    def compare_models(self,
                      video_path: str,
                      models: List[str],
                      task: str = 'detect',
                      conf_threshold: float = 0.25,
                      max_frames: Optional[int] = None,
                      model_names: Optional[List[str]] = None,
                      generate_plots: bool = True) -> Dict:
        """
        Compare multiple YOLO models on the same task
        
        Args:
            video_path: Path to input video file
            models: List of model paths or preset names (e.g., ['yolov8n', 'yolov8s', 'yolo11n'])
            task: Task to perform; one of SUPPORTED_TASKS
                  ('detect', 'segment', 'pose', 'track', 'obb', 'classify')
            conf_threshold: Confidence threshold for detections (default: 0.25)
            max_frames: Maximum frames to process (None for entire video)
            model_names: Optional custom names for models (defaults to model paths)
            generate_plots: Whether to generate visualization plots
            
        Returns:
            Dictionary containing results and statistics
        """
        
        # Validate inputs
        if not Path(video_path).exists():
            raise FileNotFoundError(f"Video file not found: {video_path}")
        
        if task not in self.SUPPORTED_TASKS:
            raise ValueError(f"Invalid task: {task}. Supported: {self.SUPPORTED_TASKS}")
        
        if len(models) < 1:
            raise ValueError("At least one model must be provided")
        
        # Set default model names
        if model_names is None:
            model_names = [Path(m).stem if m not in self.PRESET_MODELS else m 
                          for m in models]
        
        if len(model_names) != len(models):
            raise ValueError("Number of model names must match number of models")
        
        # Get video info
        video_info = self._get_video_info(video_path)
        
        print("\n" + "="*70)
        print("YOLO MODEL COMPARISON - SAME TASK")
        print("="*70)
        print(f"Video: {video_path}")
        print(f"Resolution: {video_info['width']}x{video_info['height']}")
        print(f"FPS: {video_info['fps']}")
        print(f"Total Frames: {video_info['total_frames']}")
        print(f"Duration: {video_info['duration']:.2f}s")
        print(f"Task: {task.upper()}")
        print(f"Models: {', '.join(model_names)}")
        print(f"Max Frames: {max_frames if max_frames else 'All'}")
        print("="*70)
        
        # Load all models
        print("\nLoading models...")
        loaded_models = []
        for model_path, name in zip(models, model_names):
            model = self._load_model(model_path, name)
            loaded_models.append(model)
        print("✓ All models loaded\n")
        
        # Setup video processing
        cap = cv2.VideoCapture(video_path)
        output_filename = f'{task}_comparison_{"_".join(model_names)}.mp4'
        out_path = self.output_dir / output_filename
        writer = None
        
        # Metrics storage
        metrics = {name: {'times': [], 'counts': []} for name in model_names}
        
        frame_count = 0
        print(f"Processing video...")
        
        while cap.isOpened() and (max_frames is None or frame_count < max_frames):
            ret, frame = cap.read()
            if not ret:
                break
            
            # Process frame with all models
            processed_frames = []
            times = []
            counts = []
            
            for model, name in zip(loaded_models, model_names):
                proc_frame, inf_time, count = self._process_frame(
                    frame, model, task, conf_threshold
                )
                processed_frames.append(proc_frame)
                times.append(inf_time)
                counts.append(count)
                
                # Store metrics
                metrics[name]['times'].append(inf_time)
                metrics[name]['counts'].append(count)
            
            # Create grid comparison
            comparison_grid = self._create_grid_layout(
                processed_frames, model_names, times, counts
            )
            
            # Initialize writer
            if writer is None:
                h, w = comparison_grid.shape[:2]
                writer = cv2.VideoWriter(str(out_path),
                                        cv2.VideoWriter_fourcc(*'mp4v'),
                                        video_info['fps'], (w, h))
            
            writer.write(comparison_grid)
            frame_count += 1
            
            if frame_count % 30 == 0:
                print(f"  Processed {frame_count} frames...")
        
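        cap.release()
        if writer:
            writer.release()
        
        # Fail early with a clear message instead of crashing in the statistics below
        if frame_count == 0:
            raise RuntimeError(f"No frames could be read from: {video_path}")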
        
        print(f"\n✓ Comparison video saved: {out_path}")
        
        # Generate statistics
        print("\n" + "="*70)
        print("PERFORMANCE STATISTICS")
        print("="*70)
        
        summary_data = []
        for name in model_names:
            times = np.array(metrics[name]['times']) * 1000
            counts = metrics[name]['counts']
            
            summary_data.append({
                'Model': name,
                'Task': task.capitalize(),
                'Avg Time (ms)': f"{np.mean(times):.2f}",
                'Std Time (ms)': f"{np.std(times):.2f}",
                'FPS': f"{1000/np.mean(times):.2f}",
                'Min Time (ms)': f"{np.min(times):.2f}",
                'Max Time (ms)': f"{np.max(times):.2f}",
                'Avg Count': f"{np.mean(counts):.1f}"
            })
        
        df_summary = pd.DataFrame(summary_data)
        csv_path = self.output_dir / f'{task}_performance_summary.csv'
        df_summary.to_csv(csv_path, index=False)
        
        print("\n" + df_summary.to_string(index=False))
        print(f"\n✓ Summary saved: {csv_path}")
        
        # Generate visualizations
        if generate_plots:
            self._create_visualizations(metrics, model_names, task)
        
        # Speedup analysis
        self._print_speedup_analysis(metrics, model_names)
        
        # Final summary
        print("\n" + "="*70)
        print("COMPARISON COMPLETE!")
        print("="*70)
        print(f"\nOutput directory: {self.output_dir}/")
        print(f"\nGenerated files:")
        print(f"  • {output_filename}")
        print(f"  • {task}_performance_summary.csv")
        if generate_plots:
            print(f"  • {task}_performance_comparison.png")
        print("="*70 + "\n")
        
        return {
            'video_info': video_info,
            'task': task,
            'models': model_names,
            'metrics': metrics,
            'summary': df_summary,
            'output_path': str(out_path),
            'output_dir': str(self.output_dir),
            'frames_processed': frame_count
        }
    
    def _create_visualizations(self, metrics: Dict, model_names: List[str], task: str):
        """Create performance visualization plots"""
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
        fig.suptitle(f'{task.capitalize()} Performance Comparison', 
                    fontsize=16, fontweight='bold')
        
        colors = plt.cm.tab10(np.linspace(0, 1, len(model_names)))
        
        # Plot 1: Inference time over frames
        for idx, name in enumerate(model_names):
            frames = range(len(metrics[name]['times']))
            times = np.array(metrics[name]['times']) * 1000
            ax1.plot(frames, times, label=name, color=colors[idx], 
                    alpha=0.7, linewidth=1.5)
        
        ax1.set_xlabel('Frame Number', fontsize=12)
        ax1.set_ylabel('Inference Time (ms)', fontsize=12)
        ax1.set_title('Inference Time per Frame', fontsize=14)
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Plot 2: Average performance comparison
        avg_times = [np.mean(metrics[name]['times']) * 1000 for name in model_names]
        avg_fps = [1000 / t for t in avg_times]
        
        x = np.arange(len(model_names))
        width = 0.35
        
        ax2_twin = ax2.twinx()
        bars1 = ax2.bar(x - width/2, avg_times, width, label='Avg Time (ms)',
                       color=colors, alpha=0.7)
        bars2 = ax2_twin.bar(x + width/2, avg_fps, width, label='FPS',
                            color=colors, alpha=0.4, edgecolor='black', linewidth=2)
        
        ax2.set_xlabel('Model', fontsize=12)
        ax2.set_ylabel('Average Time (ms)', fontsize=12)
        ax2_twin.set_ylabel('FPS', fontsize=12)
        ax2.set_title('Average Performance', fontsize=14)
        ax2.set_xticks(x)
        ax2.set_xticklabels(model_names, rotation=45, ha='right')
        
        # Add value labels on bars
        for bar, val in zip(bars1, avg_times):
            height = bar.get_height()
            ax2.text(bar.get_x() + bar.get_width()/2., height,
                    f'{val:.1f}', ha='center', va='bottom', fontsize=9)
        
        for bar, val in zip(bars2, avg_fps):
            height = bar.get_height()
            ax2_twin.text(bar.get_x() + bar.get_width()/2., height,
                         f'{val:.1f}', ha='center', va='bottom', fontsize=9)
        
        lines1, labels1 = ax2.get_legend_handles_labels()
        lines2, labels2 = ax2_twin.get_legend_handles_labels()
        ax2.legend(lines1 + lines2, labels1 + labels2, loc='upper left')
        ax2.grid(True, alpha=0.3, axis='y')
        
        plt.tight_layout()
        plot_path = self.output_dir / f'{task}_performance_comparison.png'
        plt.savefig(plot_path, dpi=300, bbox_inches='tight')
        plt.close()
        
        print(f"\n✓ Performance plot saved: {plot_path}")
    
    def _print_speedup_analysis(self, metrics: Dict, model_names: List[str]):
        """Print speedup analysis relative to baseline (first model)"""
        print("\n" + "="*70)
        print("SPEEDUP ANALYSIS (relative to baseline)")
        print("="*70)
        
        baseline_name = model_names[0]
        baseline_time = np.mean(metrics[baseline_name]['times'])
        
        print(f"Baseline: {baseline_name} ({baseline_time*1000:.2f}ms)")
        print("-" * 70)
        
        for name in model_names[1:]:
            model_time = np.mean(metrics[name]['times'])
            speedup = ((baseline_time - model_time) / baseline_time) * 100
            
            if speedup > 0:
                print(f"{name}: {speedup:.1f}% faster than baseline")
            else:
                print(f"{name}: {abs(speedup):.1f}% slower than baseline")
        
        print("="*70)
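
The layout rule used by `_create_grid_layout` is easier to check in isolation. The following is a minimal sketch of that rule as a standalone helper; `grid_shape` is a hypothetical name used only for illustration and is not part of the class.

```python
import math
from typing import Tuple

def grid_shape(n_models: int) -> Tuple[int, int]:
    """Return (rows, cols) following the same branches as _create_grid_layout."""
    if n_models == 1:
        return 1, 1
    if n_models == 2:
        return 1, 2
    if n_models <= 4:
        return 2, 2
    if n_models <= 6:
        return 2, 3
    if n_models <= 9:
        return 3, 3
    # Fallback for larger comparisons: a roughly square grid
    rows = math.ceil(math.sqrt(n_models))
    cols = math.ceil(n_models / rows)
    return rows, cols

for n in (1, 2, 3, 5, 10):
    print(n, grid_shape(n))
```

With three or five models the grid has more cells than frames, so the unused cells stay black in the output video.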



## Example 1 - Compare YOLOv8 Sizes

This cell demonstrates how to use the `YOLOComparator` to compare different sizes of the YOLOv8 model (nano, small, medium, and large) on a detection task.

In [4]:
if __name__ == "__main__":
    
    # Example 1: Compare YOLOv8 sizes (n, s, m, l) on detection
    print("Compare YOLOv8 model sizes on detection\n")
    comparator = YOLOComparator(output_dir="results_yolov8_sizes")
    results = comparator.compare_models(
        video_path="sample_1.mp4",
        models=['yolov8n', 'yolov8s', 'yolov8m', 'yolov8l'],
        task='detect',
        conf_threshold=0.25,
        max_frames=200
    )
    


Compare YOLOv8 model sizes on detection


YOLO MODEL COMPARISON - SAME TASK
Video: sample_1.mp4
Resolution: 1280x720
FPS: 29
Total Frames: 471
Duration: 16.24s
Task: DETECT
Models: yolov8n, yolov8s, yolov8m, yolov8l
Max Frames: 200

Loading models...
  Loading yolov8n...
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n.pt to 'yolov8n.pt': 100% ━━━━━━━━━━━━ 6.2MB 1.6MB/s 4.0s
  Loading yolov8s...
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8s.pt to 'yolov8s.pt': 100% ━━━━━━━━━━━━ 21.5MB 7.4MB/s 2.9s
  Loading yolov8m...
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8m.pt to 'yolov8m.pt': 100% ━━━━━━━━━━━━ 49.7MB 11.8MB/s 4.2s
  Loading yolov8l...
Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8l.pt to 'yolov8l.pt': 100% ━━━━━━━━━━━━ 83.7MB 13.3MB/s 6.3s
✓ All models loaded

Processing video..
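
A full run ends with a speedup analysis that treats the first entry in `models` as the baseline. The sketch below mirrors the arithmetic of `_print_speedup_analysis` as a standalone function; `relative_speed` is a hypothetical helper, and the latencies passed to it are made-up round numbers, not measurements from this notebook.

```python
from typing import Dict, List

def relative_speed(avg_ms: Dict[str, float], baseline: str) -> List[str]:
    """Describe each model's average latency relative to the baseline model."""
    base = avg_ms[baseline]
    lines = []
    for name, t in avg_ms.items():
        if name == baseline:
            continue
        # Positive change means less time per frame than the baseline
        change = (base - t) / base * 100
        verdict = "faster" if change > 0 else "slower"
        lines.append(f"{name}: {abs(change):.1f}% {verdict} than baseline")
    return lines

# Hypothetical average latencies in milliseconds
for line in relative_speed({"model_a": 20.0, "model_b": 15.0, "model_c": 30.0}, "model_a"):
    print(line)
```

Because the smallest model is listed first in this example, the larger models would normally be reported as slower than the baseline.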

## Example 2 - Compare YOLO11 Sizes

This cell compares four sizes of the YOLO11 model (nano, small, medium, and large) on the same detection task.

In [8]:
if __name__ == "__main__":
    # Example 2: Compare YOLO11 sizes (n, s, m, l) on detection
    print("Compare YOLO11 model sizes on detection\n")
    comparator = YOLOComparator(output_dir="results_yolo11_sizes")
    results = comparator.compare_models(
        video_path="sample_1.mp4",
        models=['yolo11n', 'yolo11s', 'yolo11m', 'yolo11l'],
        task='detect',
        conf_threshold=0.25,
        max_frames=800
    )

Compare YOLO11 model sizes on detection


YOLO MODEL COMPARISON - SAME TASK
Video: sample_1.mp4
Resolution: 1280x720
FPS: 29
Total Frames: 471
Duration: 16.24s
Task: DETECT
Models: yolo11n, yolo11s, yolo11m, yolo11l
Max Frames: 800

Loading models...
  Loading yolo11n...
  Loading yolo11s...
  Loading yolo11m...
  Loading yolo11l...
✓ All models loaded

Processing video...
  Processed 30 frames...
  Processed 60 frames...
  Processed 90 frames...
  Processed 120 frames...
  Processed 150 frames...
  Processed 180 frames...
  Processed 210 frames...
  Processed 240 frames...
  Processed 270 frames...
  Processed 300 frames...
  Processed 330 frames...
  Processed 360 frames...
  Processed 390 frames...
  Processed 420 frames...
  Processed 450 frames...

✓ Comparison video saved: results_yolo11_sizes\detect_comparison_yolo11n_yolo11s_yolo11m_yolo11l.mp4

PERFORMANCE STATISTICS

  Model   Task Avg Time (ms) Std Time (ms)   FPS Min Time (ms) Max Time (ms) Avg Count
yolo11n Detect         
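
The progress lines above stop at 450 even though `max_frames=800`, because the video has only 471 frames and the loop ends when no more frames can be read. The sketch below reproduces which progress messages appear for a given video length; `progress_marks` is a hypothetical helper, and it assumes every frame reported by the video metadata can actually be decoded.

```python
from typing import List, Optional

def progress_marks(total_frames: int, max_frames: Optional[int],
                   every: int = 30) -> List[int]:
    """Frame counts at which 'Processed N frames...' would be printed."""
    # The loop stops at whichever limit is reached first
    limit = total_frames if max_frames is None else min(max_frames, total_frames)
    return list(range(every, limit + 1, every))

marks = progress_marks(total_frames=471, max_frames=800)
print(len(marks), marks[-1])
```

With `max_frames=200`, as in Example 1, the same helper gives a last progress message at 180 frames.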

## Example 3 - Compare YOLOv8n vs. YOLO11n

This cell compares the "nano" versions of YOLOv8 and YOLO11 on a detection task.

In [9]:
if __name__ == "__main__":
    # Example 3: Compare the nano models (YOLOv8n vs YOLO11n) on detection
    print("Compare YOLOv8 vs YOLO11 on detection\n")
    comparator2 = YOLOComparator(output_dir="results_v8_vs_v11_model-n_detect")
    results2 = comparator2.compare_models(
        video_path="sample_1.mp4",
        models=['yolov8n', 'yolo11n'],
        task='detect',
        conf_threshold=0.25,
        max_frames=800
    )

Compare YOLOv8 vs YOLO11 on detection


YOLO MODEL COMPARISON - SAME TASK
Video: sample_1.mp4
Resolution: 1280x720
FPS: 29
Total Frames: 471
Duration: 16.24s
Task: DETECT
Models: yolov8n, yolo11n
Max Frames: 800

Loading models...
  Loading yolov8n...
  Loading yolo11n...
✓ All models loaded

Processing video...
  Processed 30 frames...
  Processed 60 frames...
  Processed 90 frames...
  Processed 120 frames...
  Processed 150 frames...
  Processed 180 frames...
  Processed 210 frames...
  Processed 240 frames...
  Processed 270 frames...
  Processed 300 frames...
  Processed 330 frames...
  Processed 360 frames...
  Processed 390 frames...
  Processed 420 frames...
  Processed 450 frames...

✓ Comparison video saved: results_v8_vs_v11_model-n_detect\detect_comparison_yolov8n_yolo11n.mp4

PERFORMANCE STATISTICS

  Model   Task Avg Time (ms) Std Time (ms)   FPS Min Time (ms) Max Time (ms) Avg Count
yolov8n Detect         16.28          5.11 61.44          5.88         88.52      17.4
yo
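
The summary values are written as formatted text, so they need converting to numbers before models can be sorted or compared programmatically. The following is a minimal sketch using only the standard library. It takes the one complete row from the output above as sample input; `load_summary` is a hypothetical helper, and a real run would read `detect_performance_summary.csv` from the output directory instead of an in-memory string.

```python
import csv
import io

# Header as written by the tool, plus the single complete row shown above
csv_text = (
    "Model,Task,Avg Time (ms),Std Time (ms),FPS,Min Time (ms),Max Time (ms),Avg Count\n"
    "yolov8n,Detect,16.28,5.11,61.44,5.88,88.52,17.4\n"
)

TEXT_COLUMNS = ("Model", "Task")

def load_summary(text: str) -> list:
    """Parse summary CSV text, converting every metric column to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({key: (value if key in TEXT_COLUMNS else float(value))
                     for key, value in row.items()})
    return rows

rows = load_summary(csv_text)
# With several rows, this picks the model with the lowest average latency
fastest = min(rows, key=lambda r: r["Avg Time (ms)"])
print(fastest["Model"], fastest["Avg Time (ms)"])
```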

## Example 4 - Compare YOLOv8l vs. YOLO11l

This cell compares the "large" versions of YOLOv8 and YOLO11 on a detection task.

In [10]:
if __name__ == "__main__":
    # Example 4: Compare the large models (YOLOv8l vs YOLO11l) on detection
    print("Compare YOLOv8l vs YOLO11l on detection\n")
    comparator2 = YOLOComparator(output_dir="results_v8_vs_v11_model-l_detect")
    results2 = comparator2.compare_models(
        video_path="sample_1.mp4",
        models=['yolov8l', 'yolo11l'],
        task='detect',
        conf_threshold=0.25,
        max_frames=800
    )

Compare YOLOv8l vs YOLO11l on detection


YOLO MODEL COMPARISON - SAME TASK
Video: sample_1.mp4
Resolution: 1280x720
FPS: 29
Total Frames: 471
Duration: 16.24s
Task: DETECT
Models: yolov8l, yolo11l
Max Frames: 800

Loading models...
  Loading yolov8l...
  Loading yolo11l...
✓ All models loaded

Processing video...
  Processed 30 frames...
  Processed 60 frames...
  Processed 90 frames...
  Processed 120 frames...
  Processed 150 frames...
  Processed 180 frames...
  Processed 210 frames...
  Processed 240 frames...
  Processed 270 frames...
  Processed 300 frames...
  Processed 330 frames...
  Processed 360 frames...
  Processed 390 frames...
  Processed 420 frames...
  Processed 450 frames...

✓ Comparison video saved: results_v8_vs_v11_model-l_detect\detect_comparison_yolov8l_yolo11l.mp4

PERFORMANCE STATISTICS

  Model   Task Avg Time (ms) Std Time (ms)   FPS Min Time (ms) Max Time (ms) Avg Count
yolov8l Detect         26.31         25.20 38.01         17.34        568
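
In the yolov8l row the standard deviation is almost as large as the mean, which suggests that a few very slow frames dominate the average. One possible cause is model warm-up on the first frames, though the output does not confirm this. The following sketch summarizes per-frame timings with the first frames skipped and reports the median alongside the mean. `robust_timing` and its `warmup` parameter are hypothetical, and the sample timings are invented for illustration.

```python
import statistics
from typing import Dict, List

def robust_timing(times_ms: List[float], warmup: int = 1) -> Dict[str, float]:
    """Summarize latencies after dropping the first `warmup` samples."""
    steady = times_ms[warmup:]
    if not steady:
        raise ValueError("Not enough samples left after skipping warm-up frames")
    mean_ms = statistics.mean(steady)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(steady),  # less sensitive to outliers
        "fps": 1000.0 / mean_ms,                 # same formula as the FPS column
    }

# Invented timings: one slow first frame followed by steady-state frames
sample = [500.0, 20.0, 22.0, 18.0, 20.0]
print(robust_timing(sample, warmup=1))
```

Passing `results2['metrics']['yolov8l']['times']` multiplied by 1000 would apply the same summary to a real run, since `compare_models` stores per-frame times in seconds.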