[![Labellerr](https://storage.googleapis.com/labellerr-cdn/%200%20Labellerr%20template/notebook.webp)](https://www.labellerr.com)

# **YOLO vs RT-DETR Video Detection Comparison Notebook**

---

[![labellerr](https://img.shields.io/badge/Labellerr-BLOG-black.svg)](https://www.labellerr.com/blog)
[![Youtube](https://img.shields.io/badge/Labellerr-YouTube-b31b1b.svg)](https://www.youtube.com/@Labellerr)
[![Github](https://img.shields.io/badge/Labellerr-GitHub-green.svg)](https://github.com/Labellerr/Hands-On-Learning-in-Computer-Vision)

This notebook compares the performance of the YOLO (You Only Look Once) and RT-DETR (Real-Time Detection Transformer) models for object detection on video data.

**Main Objectives:**
- Demonstrate installation and setup of Ultralytics YOLO and RT-DETR with all required dependencies.
- Define configuration classes and settings for both models.
- Implement a class-based workflow (`VideoDetectionComparator`) to:
  - Run both models on video frames side-by-side.
  - Collect metrics such as inference time, detection counts, and confidence scores.
  - Save annotated output videos for each model.
  - Generate performance reports and visualizations (plots) comparing both models.

**Features:**
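- Supports the large (`l`) and extra-large (`x`) variants of YOLOv8 and RT-DETR.
- Processes videos frame by frame and aggregates speed metrics (inference time, FPS) and detection statistics (detections per frame, confidence). No ground-truth accuracy metric such as mAP is computed.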
- Produces comparison tables, summary statistics (FPS, detections/frame, confidence), and visualization charts.
- Includes usage examples and test cases for quick validation.

**Intended Use:**
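- Benchmark and analyze the speed and detection behaviour (detection counts, confidence) of YOLO vs RT-DETR on real video data. Judging accuracy would additionally require ground-truth annotations, which this notebook does not use.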
- Help computer vision practitioners select optimal models for deployment in applications requiring efficient and accurate object detection.


## Installation & Setup

This section provides installation commands to set up the required packages for running YOLO and RT-DETR model comparisons. Ensure these commands are run in your terminal before executing other notebook cells.
- Installs: ultralytics, opencv-python, pandas, matplotlib, seaborn, numpy, torch, torchvision (CPU/GPU).


In [None]:
# ============================================================================
# SECTION 1: INSTALLATION & SETUP
# ============================================================================
"""
# Installation Commands (run in terminal):

pip install ultralytics  # For both YOLO and RT-DETR
pip install opencv-python
pip install pandas
pip install matplotlib
pip install seaborn
pip install numpy
pip install torch torchvision  # PyTorch (if not already installed)

# For GPU support:
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cu129
"""

## Imports

Imports all necessary Python libraries for video processing, model inference, visualization, and file handling.
Included:
- OpenCV for video I/O
- Core libraries (`time`, `json`)
- pandas, numpy, seaborn, matplotlib for data analysis and visualization
- Ultralytics `YOLO` and `RTDETR` model classes
- `pathlib.Path` for path management
- `collections.defaultdict` for per-model results storage

In [1]:
# ============================================================================
# SECTION 2: IMPORTS
# ============================================================================

import cv2
import time
import pandas as pd
import numpy as np
import seaborn as sns
from pathlib import Path
from ultralytics import YOLO, RTDETR
from collections import defaultdict
import json
import matplotlib.pyplot as plt


## Model Configuration

Defines the configuration class for YOLO and RT-DETR models.
- Specifies available model variants (large, extra large)
- Sets default parameters for confidence and IoU thresholds used in comparison tasks.
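The two thresholds act at different stages of post-processing: boxes scoring below the confidence threshold are discarded first, and non-maximum suppression (NMS) then removes any box whose IoU with a higher-scoring kept box exceeds the IoU threshold. The plain-Python sketch below illustrates that order. The function names are hypothetical, and it is a simplified class-agnostic version, not Ultralytics' own batched implementation. RT-DETR predicts its set of boxes end-to-end without NMS, so for that model only the confidence threshold is expected to matter.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, conf=0.25, iou_thr=0.45):
    """Confidence filter followed by greedy, class-agnostic NMS.

    Returns the indices of the kept boxes, highest score first.
    """
    # 1. Drop low-confidence boxes (the role of DEFAULT_CONF).
    candidates = [i for i, s in enumerate(scores) if s >= conf]
    # 2. Visit the remaining boxes from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # 3. Keep a box only if it overlaps no already-kept box by more
        #    than iou_thr (the role of DEFAULT_IOU).
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(filter_detections(boxes, scores))             # → [0, 2]
print(filter_detections(boxes, scores, conf=0.75))  # → [0]
```

The second box overlaps the first with IoU 0.81, so it is suppressed at the default 0.45 threshold; raising the confidence threshold to 0.75 additionally drops the third box.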


In [2]:
# ============================================================================
# SECTION 3: MODEL CONFIGURATION
# ============================================================================

class ModelConfig:
    """Configuration for YOLO and RT-DETR models"""
    
    # Available YOLO models
    YOLO_MODELS = {
        'yolov8l': 'yolov8l.pt',  # Large
        'yolov8x': 'yolov8x.pt',  # Extra Large
    }
    
    # Available RT-DETR models
    RTDETR_MODELS = {
        'rtdetr-l': 'rtdetr-l.pt',  # Large
        'rtdetr-x': 'rtdetr-x.pt',  # Extra Large
    }
    
    # Default test parameters
    DEFAULT_CONF = 0.25  # Confidence threshold
    DEFAULT_IOU = 0.45   # IoU threshold for NMS
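Looking up a name that is not in `YOLO_MODELS` or `RTDETR_MODELS` (for example `'yolov8s'`) raises a bare `KeyError` inside `VideoDetectionComparator.__init__`. A small lookup helper with a readable error is sketched below. `resolve_weights` is a hypothetical name, and the registry is copied from `ModelConfig.YOLO_MODELS` so that the block runs on its own.

```python
# Copy of ModelConfig.YOLO_MODELS so this example is self-contained.
YOLO_MODELS = {"yolov8l": "yolov8l.pt", "yolov8x": "yolov8x.pt"}


def resolve_weights(variant, registry):
    """Map a variant name to its weights file, listing valid names on error."""
    try:
        return registry[variant]
    except KeyError:
        options = ", ".join(sorted(registry))
        raise ValueError(
            f"Unknown model variant {variant!r}; choose one of: {options}"
        ) from None


print(resolve_weights("yolov8x", YOLO_MODELS))
```

Inside `__init__`, `YOLO(resolve_weights(yolo_model, ModelConfig.YOLO_MODELS))` could then replace the direct dictionary lookup, and likewise for RT-DETR.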


## Video Processing & Comparison

Implements the `VideoDetectionComparator` class for frame-by-frame inference and metric tracking.
- Initializes YOLO and RT-DETR models
- Processes individual frames for detection
- Annotates and saves outputs
- Aggregates detection results and timing metrics
- Main function for comparing two models across all video frames.
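`process_video()` releases the capture and the two writers only after the frame loop finishes normally, so an exception raised midway (for example, a GPU out-of-memory error) leaves the handles open. One way to make the loop more robust is a generator that wraps the capture in `try`/`finally`, combined with `contextlib.closing`. This is a sketch under the assumption that only `read()` and `release()` are needed, which is the part of the `cv2.VideoCapture` interface the class uses; `iter_frames` is a hypothetical name.

```python
from contextlib import closing


def iter_frames(cap, max_frames=None):
    """Yield frames from an OpenCV-style capture, then release it.

    `cap` only needs `read()` returning (ok, frame) and `release()`,
    as cv2.VideoCapture provides.
    """
    count = 0
    try:
        while max_frames is None or count < max_frames:
            ok, frame = cap.read()
            if not ok:  # end of stream or decode failure
                break
            yield frame
            count += 1
    finally:
        # Runs when the stream ends, when max_frames is reached, or when the
        # generator is closed (closing() does this on exit, even after an error).
        cap.release()


# Hypothetical usage inside process_video():
# with closing(iter_frames(cv2.VideoCapture(str(video_path)), max_frames)) as frames:
#     for frame in frames:
#         ...run both models on `frame`...
```

The same `try`/`finally` pattern would apply to `yolo_out` and `rtdetr_out`, which are likewise released only at the end of a successful run.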


In [3]:
# ============================================================================
# SECTION 4: VIDEO PROCESSOR CLASS
# ============================================================================

class VideoDetectionComparator:
    """Main class for comparing YOLO and RT-DETR on videos"""
    
    def __init__(self, yolo_model='yolov8l', rtdetr_model='rtdetr-l',
                 conf_threshold=ModelConfig.DEFAULT_CONF,
                 iou_threshold=ModelConfig.DEFAULT_IOU):
        """
        Initialize models for comparison
        
        Args:
            yolo_model: YOLO model variant
            rtdetr_model: RT-DETR model variant
            conf_threshold: Confidence threshold for detections
            iou_threshold: IoU threshold for NMS (applies to YOLO; RT-DETR does not use NMS)
        """
        print(f"Loading {yolo_model}...")
        self.yolo = YOLO(ModelConfig.YOLO_MODELS[yolo_model])
        
        print(f"Loading {rtdetr_model}...")
        self.rtdetr = RTDETR(ModelConfig.RTDETR_MODELS[rtdetr_model])
        
        self.conf_threshold = conf_threshold
        self.iou_threshold = iou_threshold
        
        self.yolo_name = yolo_model
        self.rtdetr_name = rtdetr_model
        
        # Metrics storage
        self.results = {
            'yolo': defaultdict(list),
            'rtdetr': defaultdict(list)
        }
    
    def process_frame(self, frame, model, model_type):
        """Process a single frame with given model"""
        start_time = time.perf_counter()  # monotonic, high-resolution timer
        
        results = model(frame, conf=self.conf_threshold, 
                       iou=self.iou_threshold, verbose=False, show_labels=False)[0]
        
        inference_time = (time.perf_counter() - start_time) * 1000  # ms, incl. pre/post-processing
        
        # Extract metrics
        boxes = results.boxes
        num_detections = len(boxes)
        confidences = boxes.conf.cpu().numpy() if num_detections > 0 else []
        classes = boxes.cls.cpu().numpy() if num_detections > 0 else []
        
        return {
            'inference_time': inference_time,
            'num_detections': num_detections,
            'confidences': confidences,
            'classes': classes,
            'boxes': boxes,
            'results': results
        }
    
    def process_video(self, video_path, output_dir='outputs', 
                     save_videos=True, max_frames=None):
        """
        Process video with both models and compare
        
        Args:
            video_path: Path to input video
            output_dir: Directory to save outputs
            save_videos: Whether to save annotated videos
            max_frames: Maximum frames to process (None for all)
        """
        self.reset_results()  # each call writes a per-video report, so start from empty metrics
        video_path = Path(video_path)
        output_dir = Path(output_dir)
        output_dir.mkdir(exist_ok=True, parents=True)
        
        print(f"\n{'='*60}")
        print(f"Processing: {video_path.name}")
        print(f"{'='*60}")
        
        # Open video
        cap = cv2.VideoCapture(str(video_path))
        if not cap.isOpened():
            raise ValueError(f"Cannot open video: {video_path}")
        
        # Get video properties
        fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # keep fractional rates (e.g. 29.97); assume 30 if unreported
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        
        if max_frames:
            total_frames = min(total_frames, max_frames)
        
        print(f"Video specs: {width}x{height} @ {fps:.2f}fps, {total_frames} frames")
        
        # Setup video writers
        if save_videos:
            fourcc = cv2.VideoWriter_fourcc(*'mp4v')
            yolo_out = cv2.VideoWriter(
                str(output_dir / f'{video_path.stem}_yolo.mp4'),
                fourcc, fps, (width, height)
            )
            rtdetr_out = cv2.VideoWriter(
                str(output_dir / f'{video_path.stem}_rtdetr.mp4'),
                fourcc, fps, (width, height)
            )
        
        frame_count = 0
        
        while cap.isOpened() and (max_frames is None or frame_count < max_frames):
            ret, frame = cap.read()
            if not ret:
                break
            
            # Process with YOLO
            yolo_result = self.process_frame(frame, self.yolo, 'yolo')
            self.results['yolo']['inference_time'].append(yolo_result['inference_time'])
            self.results['yolo']['num_detections'].append(yolo_result['num_detections'])
            self.results['yolo']['confidences'].extend(yolo_result['confidences'])
            
            # Process with RT-DETR
            rtdetr_result = self.process_frame(frame, self.rtdetr, 'rtdetr')
            self.results['rtdetr']['inference_time'].append(rtdetr_result['inference_time'])
            self.results['rtdetr']['num_detections'].append(rtdetr_result['num_detections'])
            self.results['rtdetr']['confidences'].extend(rtdetr_result['confidences'])
            
            # Save annotated frames
            if save_videos:
                yolo_frame = yolo_result['results'].plot()
                rtdetr_frame = rtdetr_result['results'].plot()
                
                yolo_out.write(yolo_frame)
                rtdetr_out.write(rtdetr_frame)
            
            frame_count += 1
            if frame_count % 30 == 0:
                print(f"Processed {frame_count}/{total_frames} frames...", end='\r')
        
        cap.release()
        if save_videos:
            yolo_out.release()
            rtdetr_out.release()
        
        print(f"\n✓ Completed processing {frame_count} frames")
        
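        # Generate comparison report (statistics are undefined if no frame was read)
        if frame_count == 0:
            print("No frames were read; skipping report.")
            return
        self._generate_report(video_path.stem, output_dir)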
    
    def _generate_report(self, video_name, output_dir):
        """Generate comparison metrics and visualizations"""
        print("\nGenerating comparison report...")
        
        # Calculate statistics
        stats = {}
        for model_name in ['yolo', 'rtdetr']:
            inf_times = self.results[model_name]['inference_time']
            detections = self.results[model_name]['num_detections']
            confidences = self.results[model_name]['confidences']
            
            stats[model_name] = {
                'avg_inference_time': float(np.mean(inf_times)),
                'std_inference_time': float(np.std(inf_times)),
                'min_inference_time': float(np.min(inf_times)),
                'max_inference_time': float(np.max(inf_times)),
                'avg_fps': float(1000 / np.mean(inf_times)),
                'avg_detections': float(np.mean(detections)),
                'total_detections': int(np.sum(detections)),
                'avg_confidence': float(np.mean(confidences)) if len(confidences) > 0 else 0.0,
            }
        
        # Print comparison table
        print(f"\n{'='*60}")
        print(f"COMPARISON RESULTS: {video_name}")
        print(f"{'='*60}")
        print(f"\n{'Metric':<30} {'YOLO':<15} {'RT-DETR':<15}")
        print(f"{'-'*60}")
        
        metrics_display = [
            ('Avg Inference Time (ms)', 'avg_inference_time'),
            ('Std Inference Time (ms)', 'std_inference_time'),
            ('Average FPS', 'avg_fps'),
            ('Avg Detections/Frame', 'avg_detections'),
            ('Total Detections', 'total_detections'),
            ('Avg Confidence', 'avg_confidence'),
        ]
        
        for label, key in metrics_display:
            yolo_val = stats['yolo'][key]
            rtdetr_val = stats['rtdetr'][key]
            print(f"{label:<30} {yolo_val:<15.2f} {rtdetr_val:<15.2f}")
        
        # Save statistics to JSON
        stats_file = output_dir / f'{video_name}_stats.json'
        with open(stats_file, 'w') as f:
            json.dump(stats, f, indent=2)
        print(f"\n✓ Statistics saved to: {stats_file}")
        
        # Create visualizations
        self._create_visualizations(video_name, output_dir, stats)
    
    def _create_visualizations(self, video_name, output_dir, stats):
        """Create comparison visualizations"""
        fig, axes = plt.subplots(2, 2, figsize=(15, 12))
        fig.suptitle(f'YOLO vs RT-DETR Comparison: {video_name}', 
                     fontsize=16, fontweight='bold')
        
        # 1. Inference Time Distribution
        ax1 = axes[0, 0]
        yolo_times = self.results['yolo']['inference_time']
        rtdetr_times = self.results['rtdetr']['inference_time']
        
        ax1.hist(yolo_times, bins=50, alpha=0.6, label=f'YOLO ({self.yolo_name})', 
                color='blue', edgecolor='black')
        ax1.hist(rtdetr_times, bins=50, alpha=0.6, label=f'RT-DETR ({self.rtdetr_name})', 
                color='red', edgecolor='black')
        ax1.set_xlabel('Inference Time (ms)')
        ax1.set_ylabel('Frequency')
        ax1.set_title('Inference Time Distribution')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # 2. Inference Time Over Frames
        ax2 = axes[0, 1]
        frames = range(len(yolo_times))
        ax2.plot(frames, yolo_times, alpha=0.7, label=f'YOLO ({self.yolo_name})', 
                linewidth=1, color='blue')
        ax2.plot(frames, rtdetr_times, alpha=0.7, label=f'RT-DETR ({self.rtdetr_name})', 
                linewidth=1, color='red')
        ax2.set_xlabel('Frame Number')
        ax2.set_ylabel('Inference Time (ms)')
        ax2.set_title('Inference Time Over Frames')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        # 3. Detections Per Frame
        ax3 = axes[1, 0]
        yolo_dets = self.results['yolo']['num_detections']
        rtdetr_dets = self.results['rtdetr']['num_detections']
        
        ax3.plot(frames, yolo_dets, alpha=0.7, label=f'YOLO ({self.yolo_name})', 
                linewidth=1.5, color='blue')
        ax3.plot(frames, rtdetr_dets, alpha=0.7, label=f'RT-DETR ({self.rtdetr_name})', 
                linewidth=1.5, color='red')
        ax3.set_xlabel('Frame Number')
        ax3.set_ylabel('Number of Detections')
        ax3.set_title('Detections Per Frame')
        ax3.legend()
        ax3.grid(True, alpha=0.3)
        
        # 4. Performance Metrics Comparison
        ax4 = axes[1, 1]
        metrics = ['Avg FPS', 'Avg Detections', 'Avg Confidence (%)']
        yolo_values = [
            stats['yolo']['avg_fps'],
            stats['yolo']['avg_detections'],
            stats['yolo']['avg_confidence'] * 100  # Scale to 0-100
        ]
        rtdetr_values = [
            stats['rtdetr']['avg_fps'],
            stats['rtdetr']['avg_detections'],
            stats['rtdetr']['avg_confidence'] * 100
        ]
        
        x = np.arange(len(metrics))
        width = 0.35
        
        ax4.bar(x - width/2, yolo_values, width, label=f'YOLO ({self.yolo_name})', 
               color='blue', alpha=0.8)
        ax4.bar(x + width/2, rtdetr_values, width, label=f'RT-DETR ({self.rtdetr_name})', 
               color='red', alpha=0.8)
        
        ax4.set_ylabel('Value')
        ax4.set_title('Performance Metrics Comparison')
        ax4.set_xticks(x)
        ax4.set_xticklabels(metrics)
        ax4.legend()
        ax4.grid(True, alpha=0.3, axis='y')
        
        # Add value labels on bars
        for i, (y_val, r_val) in enumerate(zip(yolo_values, rtdetr_values)):
            ax4.text(i - width/2, y_val, f'{y_val:.1f}', 
                    ha='center', va='bottom', fontsize=9)
            ax4.text(i + width/2, r_val, f'{r_val:.1f}', 
                    ha='center', va='bottom', fontsize=9)
        
        plt.tight_layout()
        
        plot_file = output_dir / f'{video_name}_comparison.png'
        plt.savefig(plot_file, dpi=300, bbox_inches='tight')
        plt.close()
        
        print(f"✓ Visualizations saved to: {plot_file}")
    
    def reset_results(self):
        """Reset results for processing a new video"""
        self.results = {
            'yolo': defaultdict(list),
            'rtdetr': defaultdict(list)
        }
    

## Test Cases & Usage Examples

Demonstrates how to use the `VideoDetectionComparator` for model comparisons on sample input videos.
- `test_case_single_video()` compares inference time, detection counts and confidence for YOLOv8-L vs RT-DETR-L on one video file.
- Outputs annotated videos and saves metrics/statistics.
Run this as the main workflow to validate your setup.


In [4]:
# ============================================================================
# SECTION 5: TEST CASES & USAGE EXAMPLES
# ============================================================================

def test_case_single_video():
    """Test Case 1: Single video comparison with default models"""
    print("\n" + "="*60)
    print("TEST CASE 1: Single Video Comparison")
    print("="*60)
    
    comparator = VideoDetectionComparator(
        yolo_model='yolov8l',
        rtdetr_model='rtdetr-l',
        conf_threshold=0.25
    )
    
    # Process video
    comparator.process_video(
        video_path='sample_1.mp4',
        output_dir='outputs/test1_model-l',
        save_videos=True,
        max_frames=None  # Process all frames
    )

if __name__ == "__main__":
    test_case_single_video()


TEST CASE 1: Single Video Comparison
Loading yolov8l...
Loading rtdetr-l...

Processing: sample_1.mp4
Video specs: 1280x720 @ 29fps, 471 frames
Processed 450/471 frames...
✓ Completed processing 471 frames

Generating comparison report...

COMPARISON RESULTS: sample_1

Metric                         YOLO            RT-DETR        
------------------------------------------------------------
Avg Inference Time (ms)        32.50           33.59          
Std Inference Time (ms)        44.61           27.82          
Average FPS                    30.77           29.77          
Avg Detections/Frame           22.39           38.42          
Total Detections               10546.00        18095.00       
Avg Confidence                 0.63            0.54           

✓ Statistics saved to: outputs\test1_model-l\sample_1_stats.json
✓ Visualizations saved to: outputs\test1_model-l\sample_1_comparison.png
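In the YOLO column above, the standard deviation of inference time (44.61 ms) is larger than its mean (32.50 ms), so per-frame latency is far from uniform. A common cause is a slow first call while the model warms up, though the per-frame plot in `sample_1_comparison.png` would be needed to confirm that for this run. A minimal standard-library sketch of summarizing latency while excluding the first N frames follows; `summarize_latency` is a hypothetical helper, and the input numbers are illustrative, not measurements from this notebook.

```python
import statistics


def summarize_latency(times_ms, warmup=0):
    """Mean/std latency (ms) and throughput, skipping the first `warmup` frames."""
    steady = list(times_ms)[warmup:]
    if not steady:
        raise ValueError("no frames left after excluding warm-up")
    mean_ms = statistics.fmean(steady)
    return {
        "mean_ms": mean_ms,
        "std_ms": statistics.pstdev(steady),  # population std, matching np.std's default
        # Throughput from the mean latency, as _generate_report() computes avg_fps.
        # The mean of per-frame FPS values would be at least this high, and
        # strictly higher whenever latency varies between frames.
        "fps": 1000.0 / mean_ms,
        "frames": len(steady),
    }


# Illustrative numbers: one slow first frame, then steady 30 ms frames.
times = [600.0] + [30.0] * 99
print(summarize_latency(times))
print(summarize_latency(times, warmup=1))
```

With a single 600 ms outlier the standard deviation already exceeds the mean; dropping that one frame gives a steady-state figure. Applied to `self.results['yolo']['inference_time']`, this could be reported alongside the existing all-frames statistics.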


## Test Case: Extra Large Models

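Compares YOLOv8x with RT-DETR-x using the same workflow as above.
- Switches to the extra-large variants for a second performance baseline.
- Processes a different video (`sample_2.mp4`, 1920x1080) and saves to a separate output directory, so these numbers are not directly comparable with the large-model run on `sample_1.mp4`.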


In [6]:
def test_case_extra_large():
    """Test Case 2: Single video comparison with extra-large models"""
    print("\n" + "="*60)
    print("TEST CASE 2: Extra Large Models")
    print("="*60)
    
    comparator = VideoDetectionComparator(
        yolo_model='yolov8x',
        rtdetr_model='rtdetr-x',
        conf_threshold=0.25
    )
    
    # Process video
    comparator.process_video(
        video_path='sample_2.mp4',
        output_dir='outputs/test1-2_model-x',
        save_videos=True,
        max_frames=None  # Process all frames
    )

if __name__ == "__main__":
    test_case_extra_large()


TEST CASE 2: Extra Large Models
Loading yolov8x...
Loading rtdetr-x...

Processing: sample_2.mp4
Video specs: 1920x1080 @ 29fps, 518 frames
Processed 510/518 frames...
✓ Completed processing 518 frames

Generating comparison report...

COMPARISON RESULTS: sample_2

Metric                         YOLO            RT-DETR        
------------------------------------------------------------
Avg Inference Time (ms)        43.91           40.00          
Std Inference Time (ms)        44.17           41.67          
Average FPS                    22.77           25.00          
Avg Detections/Frame           30.44           53.47          
Total Detections               15769.00        27699.00       
Avg Confidence                 0.55            0.48           

✓ Statistics saved to: outputs\test1-2_model-x\sample_2_stats.json
✓ Visualizations saved to: outputs\test1-2_model-x\sample_2_comparison.png
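Each run writes its statistics to `<video>_stats.json`. Because the two test cases differ in both model size and input video (`sample_1.mp4` at 1280x720 vs `sample_2.mp4` at 1920x1080), the most meaningful comparison is within a run. The standard-library sketch below expresses RT-DETR relative to YOLO for each run; `load_stats` and `relative_to_yolo` are hypothetical helpers, and the paths assume the output directories used by the two test cases above.

```python
import json
from pathlib import Path


def load_stats(path):
    """Load a <video>_stats.json file written by _generate_report()."""
    with open(path) as f:
        return json.load(f)


def relative_to_yolo(stats, keys=("avg_fps", "avg_detections", "avg_confidence")):
    """RT-DETR / YOLO ratio per metric (1.0 means identical); skips zero YOLO values."""
    return {k: stats["rtdetr"][k] / stats["yolo"][k] for k in keys if stats["yolo"][k]}


if __name__ == "__main__":
    # Paths assumed from the output_dir arguments of the two test cases.
    runs = {
        "l / sample_1": Path("outputs/test1_model-l/sample_1_stats.json"),
        "x / sample_2": Path("outputs/test1-2_model-x/sample_2_stats.json"),
    }
    for label, path in runs.items():
        if path.exists():
            ratios = relative_to_yolo(load_stats(path))
            print(label, {k: round(v, 2) for k, v in ratios.items()})
```

A ratio above 1.0 for `avg_detections` means RT-DETR reported more boxes per frame at the same confidence threshold; since no ground truth is used, that alone does not say which model is more accurate.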
