# RT-DETR-X: Fine-tuning and Evaluation

This notebook demonstrates how to:
1. Install required dependencies
2. Prepare dataset for training
3. Fine-tune RT-DETR-X (extra-large variant) on a custom dataset
4. Run inference on test images
5. Calculate and visualize evaluation metrics

## 1. Install Required Dependencies

In [None]:
# Install ultralytics package for RT-DETR
!pip install ultralytics
!pip install opencv-python matplotlib seaborn

# Check CUDA availability 
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

## 2. Import Required Libraries

In [None]:
import os
import yaml
import random
import numpy as np
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
from ultralytics import RTDETR
from IPython.display import display, Image
import pandas as pd
from tqdm.notebook import tqdm
from pathlib import Path
import time
import requests
import torch  # Used below for seeding; also imported in the install cell

# Set random seed for reproducibility
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)

## 3. Dataset Preparation

The dataset uses the same format as our previous models: a `data.yaml` configuration file alongside `train`, `val`, and `test` splits.

In [None]:
# Define dataset paths - customize these for your specific project
DATASET_DIR = "../dataset_split"  # Change this!
TRAIN_DIR = os.path.join(DATASET_DIR, "train")
VAL_DIR = os.path.join(DATASET_DIR, "val")
TEST_DIR = os.path.join(DATASET_DIR, "test")

# Path to the dataset configuration YAML file (expected to exist already)
yaml_path = os.path.join(DATASET_DIR, "data.yaml")
print(f"Using dataset configuration from: {yaml_path}")

# Check if the dataset exists
if not os.path.exists(yaml_path):
    print("Warning: Dataset configuration not found. Please create or verify the path.")
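
Before training, it can help to confirm that each split actually contains images and matching label files. The sketch below assumes a YOLO-style layout, with `images/` and `labels/` subfolders in each split and one `.txt` label file per image. The `labels/` folder name and the helper `summarize_split` are assumptions for illustration, not part of the original notebook.

```python
from pathlib import Path

# Image extensions to count (compared case-insensitively)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def summarize_split(split_dir):
    """Count images in split_dir/images and how many have no label file in split_dir/labels."""
    split_dir = Path(split_dir)
    images_dir = split_dir / "images"
    labels_dir = split_dir / "labels"  # assumed YOLO-style layout
    if not images_dir.is_dir():
        return {"images": 0, "missing_labels": 0}
    images = [p for p in images_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
    # An image without a label file may be an intentional background image,
    # so treat this count as a hint rather than an error.
    missing = [p for p in images if not (labels_dir / f"{p.stem}.txt").is_file()]
    return {"images": len(images), "missing_labels": len(missing)}
```

Calling `summarize_split(TRAIN_DIR)`, `summarize_split(VAL_DIR)`, and `summarize_split(TEST_DIR)` gives a quick overview of each split.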

## 4. Load and Fine-tune RT-DETR-X Model

Now we'll load a pre-trained RT-DETR-X model and fine-tune it on our custom dataset.

In [None]:
# Load pre-trained RT-DETR-X model directly
model = RTDETR('rtdetr-x.pt')  # Loading just like a YOLO model

# Define training hyperparameters optimized for small dataset (~400 images)
hyperparameters = {
    'epochs': 100,          # More epochs for small dataset
    'batch': 8,             # Smaller batch size due to larger model
    'imgsz': 640,           # Image size
    'patience': 20,         # Increased patience for early stopping
    'device': 0 if torch.cuda.is_available() else 'cpu',  # First GPU if available, otherwise CPU
    'workers': 4,           # Reduced worker threads
    'optimizer': 'AdamW',   # Optimizer
    'lr0': 0.0005,          # Lower initial learning rate for larger model
    'lrf': 0.01,            # Final learning rate factor
    'momentum': 0.937,      # Momentum (used as beta1 when the optimizer is AdamW)
    'weight_decay': 0.001,  # Increased weight decay to prevent overfitting
    'warmup_epochs': 5.0,   # Longer warmup
    'warmup_momentum': 0.8, # Warmup momentum
    'warmup_bias_lr': 0.1,  # Warmup bias lr
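    'box': 7.5,             # Box loss gain (YOLO-style setting; RT-DETR's own loss may ignore it)
    'cls': 0.5,             # Class loss gain (YOLO-style setting; RT-DETR's own loss may ignore it)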
    'hsv_h': 0.015,         # Image HSV-Hue augmentation
    'hsv_s': 0.7,           # Image HSV-Saturation augmentation
    'hsv_v': 0.4,           # Image HSV-Value augmentation
    'translate': 0.2,       # Increased translation for better augmentation
    'scale': 0.6,           # Increased scale variation
    'fliplr': 0.5,          # Image flip left-right probability
    'flipud': 0.2,          # Add up-down flipping
    'mosaic': 1.0,          # Maximize mosaic augmentation
    'mixup': 0.15,          # Add mixup augmentation
    'copy_paste': 0.1,      # Add copy-paste augmentation
}

# Create model results directory
results_dir = os.path.join(os.getcwd(), "rtdetr_x_results")
os.makedirs(results_dir, exist_ok=True)

# Train the model
results = model.train(
    data=yaml_path,
    project=results_dir,
    name='fine_tuned_model',
    exist_ok=True,
    **hyperparameters
)

print(f"Training completed. Model saved to: {os.path.join(results_dir, 'fine_tuned_model')}")
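
The `lr0` and `lrf` settings interact: `lrf` is a fraction of `lr0`, not an absolute learning rate. The sketch below shows the resulting schedule, assuming a linear decay from `lr0` to `lr0 * lrf` and ignoring the warmup epochs. The function `linear_lr` is a hypothetical helper for illustration, not a library call.

```python
def linear_lr(epoch, epochs, lr0, lrf):
    """Learning rate at a given epoch under a linear decay from lr0 to lr0 * lrf."""
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


# Values taken from the hyperparameters above
epochs, lr0, lrf = 100, 0.0005, 0.01
for epoch in (0, 50, 100):
    print(f"epoch {epoch:3d}: lr = {linear_lr(epoch, epochs, lr0, lrf):.2e}")
```

With these settings the learning rate ends at `0.0005 * 0.01`, which is 100 times smaller than where it starts.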

## 5. Model Inference and Evaluation

Now, let's evaluate the fine-tuned model on the test set and calculate performance metrics.

In [None]:
# Load the fine-tuned model
fine_tuned_model_path = os.path.join(results_dir, 'fine_tuned_model', 'weights', 'best.pt')
model = RTDETR(fine_tuned_model_path)

# Run validation on the test set
test_results = model.val(
    data=yaml_path,
    split='test',  # Use the test split
    imgsz=640,
    batch=8,      # Smaller batch size for evaluation due to model size
    verbose=True,
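    # conf=0.25 drops low-confidence detections before scoring, which truncates the
    # precision-recall curve; omit it to compute mAP with the library's default threshold.
    conf=0.25,    # Confidence threshold
    # iou is the NMS IoU threshold; RT-DETR is designed to be NMS-free, so it likely has little effect.
    iou=0.5,      # IoU threshold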
    project=results_dir,
    name='evaluation',
    exist_ok=True
)

print("Test results summary:")
print(f"mAP50: {test_results.box.map50:.5f}")
print(f"mAP50-95: {test_results.box.map:.5f}")
print(f"Precision: {test_results.box.mp:.5f}")
print(f"Recall: {test_results.box.mr:.5f}")
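
The mAP50 figure depends on intersection over union (IoU): a detection only counts as correct if its box overlaps a ground-truth box of the same class with IoU of at least 0.5. A minimal sketch of the IoU computation for two boxes in `(x1, y1, x2, y2)` format follows. The helper `box_iou` and the example boxes are illustrative, not taken from the library.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


prediction = (5, 0, 15, 10)
ground_truth = (0, 0, 10, 10)
iou = box_iou(prediction, ground_truth)
print(f"IoU = {iou:.3f}, counts at the 0.5 threshold: {iou >= 0.5}")
```

Here the boxes share half of each box's area, yet the IoU is only one third, so this prediction would not count as a match for mAP50.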

## 6. Detailed Analysis per Class

Let's analyze the model performance for each class separately.

In [None]:
# Get class-wise metrics from the validation results
class_map = test_results.names  # Class index to name mapping

# Per-class metrics live on test_results.box and cover only the classes listed in
# ap_class_index (classes with ground-truth objects in the test set)
class_indices = test_results.box.ap_class_index
class_precisions = test_results.box.p  # Class precisions
class_recalls = test_results.box.r     # Class recalls
ap50_per_class = test_results.box.ap50  # AP50 per class
ap_per_class = test_results.box.ap      # AP50-95 per class

# Create a DataFrame for better visualization
metrics_df = pd.DataFrame({
    'Class': [class_map[int(i)] for i in class_indices],
    'AP50': ap50_per_class,
    'AP50-95': ap_per_class,
    'Precision': class_precisions,
    'Recall': class_recalls
})

display(metrics_df)

# Plot AP50 for each class
plt.figure(figsize=(12, 6))
sns.barplot(x='Class', y='AP50', data=metrics_df)
plt.title('AP50 for Each Class')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

## 7. Confusion Matrix

The confusion matrix helps us see how well the model differentiates between different classes.

In [None]:
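# Plot confusion matrix
# Ultralytics stores the matrix as [predicted, true] and appends a "background"
# row and column, so transpose it to put the true classes on the rows.
conf_matrix = test_results.confusion_matrix.matrix.T
labels = list(class_map.values()) + ['background']
row_sums = conf_matrix.sum(axis=1, keepdims=True)
plt.figure(figsize=(12, 10))
sns.heatmap(
    conf_matrix / np.maximum(row_sums, 1),  # Normalize by row (true classes); empty rows stay zero
    annot=True,
    fmt='.2f',
    cmap='Blues',
    xticklabels=labels,
    yticklabels=labels
)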
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Normalized Confusion Matrix')
plt.tight_layout()
plt.show()
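
Row normalization needs care when a row sums to zero, which happens when a class has no ground-truth objects in the test set: dividing by zero would fill that row with NaN. A minimal pure-Python sketch of a safe normalization follows. The counts are made up for illustration, and `normalize_rows` is a hypothetical helper.

```python
def normalize_rows(matrix):
    """Divide each row by its sum; rows that sum to zero stay all zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


# Rows are true classes, columns are predicted classes (made-up counts)
counts = [
    [8, 2, 0],  # true class A: 8 predicted as A, 2 predicted as B
    [1, 9, 0],  # true class B: 1 predicted as A, 9 predicted as B
    [0, 0, 0],  # class C has no ground-truth objects
]
for row in normalize_rows(counts):
    print([f"{value:.2f}" for value in row])
```

After normalization, each diagonal entry reads as the recall of that class.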

## 8. Visualizing Detection Results on Test Images

Let's visualize some predictions on test images:

In [None]:
# Get list of test images
test_images_dir = os.path.join(TEST_DIR, 'images')
test_images = list(Path(test_images_dir).glob('*.jpg')) + list(Path(test_images_dir).glob('*.png'))
test_images = [str(img) for img in test_images]

# Select random images for visualization
if len(test_images) > 0:
    sample_images = random.sample(test_images, min(5, len(test_images)))
    
    for img_path in sample_images:
        # Run inference
        results = model(img_path, conf=0.25)
        
        # Display results
        for result in results:
            fig, ax = plt.subplots(1, 1, figsize=(12, 9))
            img = result.orig_img.copy()  # Copy so drawing does not modify the stored original
            
            # Plot detections
            for box, conf, cls in zip(result.boxes.xyxy, result.boxes.conf, result.boxes.cls):
                x1, y1, x2, y2 = box.cpu().numpy().astype(int)
                class_id = int(cls.item())
                class_name = class_map[class_id]
                confidence = conf.item()
                
                # Draw bounding box
                cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
                
                # Add label
                label = f"{class_name}: {confidence:.2f}"
                # Keep the label inside the image when the box touches the top edge
                cv2.putText(img, label, (x1, max(y1 - 10, 15)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
            
            ax.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
            ax.set_title(f"Predictions on {os.path.basename(img_path)}")
            ax.axis("off")
            plt.tight_layout()
            plt.show()
else:
    print("No test images found")

## 9. Inference Speed Analysis

Let's evaluate the inference speed of RT-DETR-X.

In [None]:
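# Function to measure inference time
def measure_inference_time(model, image_path, num_runs=50):
    # Ultralytics expects NumPy images in BGR order, which is what cv2.imread returns,
    # so no color conversion is applied here
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    
    def sync():
        # Wait for queued GPU work so the timer covers the whole forward pass
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    
    # Warmup
    for _ in range(10):
        _ = model(img, verbose=False)
    
    # Measure inference time
    times = []
    for _ in range(num_runs):
        sync()
        start_time = time.perf_counter()  # Monotonic, high-resolution clock
        _ = model(img, verbose=False)
        sync()
        inference_time = (time.perf_counter() - start_time) * 1000  # Convert to ms
        times.append(inference_time)
    
    return {
        'mean_time': np.mean(times),
        'std_time': np.std(times),
        'min_time': np.min(times),
        'max_time': np.max(times),
        'times': times
    }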

# Measure inference time if test images are available
if len(test_images) > 0:
    sample_image = test_images[0]
    timing_results = measure_inference_time(model, sample_image)
    
    print("Inference Speed Analysis for RT-DETR-X:")
    print(f"Mean inference time: {timing_results['mean_time']:.2f} ms")
    print(f"Standard deviation: {timing_results['std_time']:.2f} ms")
    print(f"Min inference time: {timing_results['min_time']:.2f} ms")
    print(f"Max inference time: {timing_results['max_time']:.2f} ms")
    
    # Plot histogram of inference times
    plt.figure(figsize=(10, 6))
    plt.hist(timing_results['times'], bins=20, alpha=0.7, color='blue')
    plt.axvline(timing_results['mean_time'], color='red', linestyle='dashed', linewidth=2, label=f"Mean: {timing_results['mean_time']:.2f} ms")
    plt.title('RT-DETR-X Inference Time Distribution')
    plt.xlabel('Inference Time (ms)')
    plt.ylabel('Frequency')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()
else:
    print("No test images available for inference speed analysis")
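
The mean alone can hide occasional slow runs, so it is often useful to report the median, a tail percentile, and throughput as well. The sketch below summarizes a list of per-run latencies in milliseconds. The helpers `percentile` and `summarize_latency` are hypothetical, and the example timings are made up.

```python
import math
import statistics


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_latency(times_ms):
    """Median, 95th percentile, and images per second from per-run latencies in ms."""
    mean_ms = statistics.mean(times_ms)
    return {
        "median_ms": statistics.median(times_ms),
        "p95_ms": percentile(times_ms, 95),
        "fps": 1000.0 / mean_ms,  # throughput implied by the mean latency
    }


example_times = [20.0, 21.0, 19.0, 22.0, 60.0]  # made-up values with one slow run
print(summarize_latency(example_times))
```

With real measurements, `summarize_latency(timing_results['times'])` would produce the same summary for the fine-tuned model.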

## 10. Comparative Analysis with YOLO Models

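Let's compare RT-DETR-X with YOLO models and RT-DETR-R101. Only the RT-DETR-X row uses measured results; the values for the other models are placeholders that must be replaced with real evaluation results before drawing conclusions.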

In [None]:
# Create a DataFrame for model comparison
comparison_data = {
    'Model': ['YOLOv8x', 'YOLOv11x', 'YOLOv12x', 'RT-DETR-R101', 'RT-DETR-X'],
    'mAP50': [0.0, 0.0, 0.0, 0.0, 0.0],  # Placeholder values
    'mAP50-95': [0.0, 0.0, 0.0, 0.0, 0.0],
    'Precision': [0.0, 0.0, 0.0, 0.0, 0.0],
    'Recall': [0.0, 0.0, 0.0, 0.0, 0.0],
    'Inference Time (ms/img)': [0.0, 0.0, 0.0, 0.0, 0.0]
}

# Try to populate with actual values if available
try:
    # Current model results
    comparison_data['mAP50'][4] = test_results.box.map50
    comparison_data['mAP50-95'][4] = test_results.box.map
    comparison_data['Precision'][4] = test_results.box.mp
    comparison_data['Recall'][4] = test_results.box.mr
    comparison_data['Inference Time (ms/img)'][4] = timing_results['mean_time']
    
    # Look for results from other models (this is just a placeholder)
    other_result_dirs = [
        "../yolov8x_results/evaluation",
        "../yolov11x_results/evaluation", 
        "../yolov12x_results/evaluation",
        "../rtdetr_r101_results/evaluation"
    ]
    
    # In a real scenario, you would load the actual results from saved files
    # This is just a placeholder with example values
    example_values = {
        'mAP50': [0.85, 0.86, 0.87, 0.84],
        'mAP50-95': [0.65, 0.67, 0.68, 0.66],
        'Precision': [0.82, 0.83, 0.84, 0.81],
        'Recall': [0.80, 0.81, 0.82, 0.79],
        'Inference Time (ms/img)': [12.5, 13.2, 13.8, 15.6]
    }
    
    for i in range(4):
        comparison_data['mAP50'][i] = example_values['mAP50'][i]
        comparison_data['mAP50-95'][i] = example_values['mAP50-95'][i]
        comparison_data['Precision'][i] = example_values['Precision'][i]
        comparison_data['Recall'][i] = example_values['Recall'][i]
        comparison_data['Inference Time (ms/img)'][i] = example_values['Inference Time (ms/img)'][i]
    
    # Create and display the comparison DataFrame
    comparison_df = pd.DataFrame(comparison_data)
    display(comparison_df)
    
    # Plot mAP comparison
    plt.figure(figsize=(12, 6))
    sns.barplot(x='Model', y='mAP50', data=comparison_df)
    plt.title('mAP50 Comparison Across Models')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
    
    # Plot inference time comparison
    plt.figure(figsize=(12, 6))
    sns.barplot(x='Model', y='Inference Time (ms/img)', data=comparison_df)
    plt.title('Inference Time Comparison Across Models')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()
    
    # Create a scatter plot to visualize the trade-off between accuracy and speed
    plt.figure(figsize=(12, 8))
    sns.scatterplot(x='Inference Time (ms/img)', y='mAP50', data=comparison_df, s=100)
    
    # Add labels to each point
    for i, model_name in enumerate(comparison_df['Model']):  # Avoid shadowing the `model` object used later for export
        plt.annotate(model_name, 
                    (comparison_df['Inference Time (ms/img)'][i], comparison_df['mAP50'][i]),
                    xytext=(5, 5), textcoords='offset points')
    
    plt.title('Accuracy vs. Speed Trade-off')
    plt.xlabel('Inference Time (ms) - lower is better')
    plt.ylabel('mAP50 - higher is better')
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    
except Exception as e:
    print(f"Couldn't create comparison: {e}")
    print("Run evaluations for all models to enable proper comparison.")
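
To replace the placeholder values with real results, each evaluation notebook could write its metrics to a small file that this notebook then reads. The sketch below assumes a hypothetical `metrics.json` in each results directory. Ultralytics does not create this file, so `save_model_metrics` would have to be called at the end of each evaluation. Both helper names are illustrative.

```python
import json
from pathlib import Path


def save_model_metrics(result_dir, metrics, filename="metrics.json"):
    """Write a dict of metric name -> value into result_dir/filename."""
    result_dir = Path(result_dir)
    result_dir.mkdir(parents=True, exist_ok=True)
    with (result_dir / filename).open("w") as f:
        json.dump(metrics, f, indent=2)


def load_model_metrics(result_dirs, filename="metrics.json"):
    """Return one row per model whose results directory contains a metrics file."""
    rows = []
    for model_name, result_dir in result_dirs.items():
        metrics_file = Path(result_dir) / filename
        if not metrics_file.is_file():
            print(f"Skipping {model_name}: {metrics_file} not found")
            continue
        with metrics_file.open() as f:
            metrics = json.load(f)
        rows.append({"Model": model_name, **metrics})
    return rows
```

The returned rows can be passed directly to `pd.DataFrame(...)`, so models without saved results are left out of the comparison instead of appearing with invented numbers.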

## 11. Export Model for Deployment

Let's save our fine-tuned model in different formats for deployment.

In [None]:
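# Export model to different formats
import shutil

export_path = os.path.join(results_dir, "exported_models")
os.makedirs(export_path, exist_ok=True)

# Export to ONNX format. Ultralytics writes the file next to the source weights
# and returns its path, so copy it into the export directory.
onnx_path = model.export(format="onnx", imgsz=640)
shutil.copy(onnx_path, export_path)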

print(f"Models exported to {export_path}")
print("Available formats:")
for file in os.listdir(export_path):
    print(f"- {file}")

## 12. Summary and Conclusion

We have successfully:
1. Fine-tuned RT-DETR-X on a custom dataset
2. Evaluated its performance on the test set
3. Analyzed per-class metrics and visualized results
4. Measured inference speed
5. Set up a comparison with YOLO models (placeholder values until their evaluations are run)
6. Exported the model for deployment

Key metrics:
- mAP50: Mean average precision across classes at an IoU threshold of 0.5
- mAP50-95: Mean average precision averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05
- Precision: How many of the predicted detections are correct
- Recall: How many of the ground truth objects are detected
- Inference Speed: How fast the model processes images
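
To make the relationship between precision, recall, and average precision concrete, here is a minimal sketch that computes AP for a single class from ranked detections. It uses all-point interpolation of the precision-recall curve, which is an assumption chosen for simplicity; the library's exact interpolation may differ, so values will not match its output digit for digit. The function name and the example detections are illustrative.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class (all-point interpolation)."""
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Replace each precision with the best precision at any higher recall (the PR envelope)
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangle areas wherever recall increases
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Four detections for one class, with four ground-truth objects (made-up example)
scores = [0.9, 0.8, 0.7, 0.6]
matched = [True, False, True, True]
print(average_precision(scores, matched, num_ground_truth=4))
```

mAP is then the mean of this value over all classes, evaluated at one IoU threshold (mAP50) or averaged over several (mAP50-95).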

RT-DETR-X is the extra-large variant of the RT-DETR series, offering potentially higher accuracy at the cost of increased computational requirements. This model is suitable for applications where accuracy is more important than real-time performance, or when using more powerful hardware.

RT-DETR's transformer decoder uses attention to model relationships between objects in the image, which may lead to better performance in complex traffic scenes with multiple overlapping vehicles.

To improve results further, consider:
- Fine-tuning with a balanced class distribution
- Using a smaller variant if faster inference is needed
- Implementing TensorRT or other optimizations for deployment