# Fusion Model Inference for VIP Cup 2025

This notebook demonstrates how to use the updated inference code with fusion models that combine RGB and IR data for improved detection and tracking performance.
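
The notebook does not show how `fusion.pt` combines the two modalities internally. One common scheme is early fusion, where the IR frame is appended to the RGB frame as a fourth input channel. The sketch below illustrates only that idea. It assumes same-size, pixel-aligned 8-bit frames, and `early_fuse` is a hypothetical helper, not part of this project's code.

```python
import numpy as np

def early_fuse(rgb: np.ndarray, ir: np.ndarray) -> np.ndarray:
    """Hypothetical early fusion: stack an HxWx3 RGB frame and an IR frame into HxWx4 float32 in [0, 1]."""
    if ir.ndim == 3:
        # Collapse multi-channel IR (e.g. grayscale stored as 3 identical channels) to a single channel
        ir = ir.mean(axis=2)
    if rgb.shape[:2] != ir.shape:
        # Assumes frames are already registered; a real pipeline would resize/align here
        raise ValueError(f"spatial size mismatch: {rgb.shape[:2]} vs {ir.shape}")
    rgb_f = rgb.astype(np.float32) / 255.0
    ir_f = ir.astype(np.float32)[..., None] / 255.0
    # Channel order: R, G, B, IR
    return np.concatenate([rgb_f, ir_f], axis=2)
```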

## 1. Import Required Libraries

Import the standard libraries (os, cv2, numpy, matplotlib, pandas, pathlib) and the project's custom modules: `SubmissionGenerator`, the multiscale detection and tracking models, and the visualization utilities.

In [None]:
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import pandas as pd

# Import custom models and utilities
from inference_detection_tracking import SubmissionGenerator
from models import multiscale_model as multiscale
from models import track_model
from src.utils import visualize_tracking_video, visualize_detection_image

print("Libraries imported successfully!")

## 2. Initialize Fusion Detection Model

Create a `SubmissionGenerator` in `FUSION` modality, specifying the model checkpoint, team name, confidence and IoU thresholds, and whether to save visualization videos (`is_visualize`).

In [None]:
# Model configuration
model_path = "checkpoints/fusion.pt"
team_name = "V-Linsight"
conf_threshold = 0.2
iou_threshold = 0.1

# Initialize the fusion submission generator
fusion_generator = SubmissionGenerator(
    model_path=model_path,
    modality='FUSION',
    team_name=team_name,
    conf_threshold=conf_threshold,
    iou_threshold=iou_threshold,
    is_visualize=True
)

print(f"Fusion model initialized successfully!")
print(f"Model path: {model_path}")
print(f"Confidence threshold: {conf_threshold}")
print(f"IoU threshold: {iou_threshold}")
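
`conf_threshold` and `iou_threshold` are usually applied as a confidence filter followed by non-maximum suppression (NMS). The sketch below assumes standard greedy NMS on `(x1, y1, x2, y2)` pixel boxes; it is not taken from `SubmissionGenerator`, whose internals are not shown here, and `filter_and_nms` is a hypothetical helper.

```python
import numpy as np

def filter_and_nms(boxes, scores, conf_threshold=0.2, iou_threshold=0.1):
    """Drop boxes below conf_threshold, then run greedy NMS. Returns kept indices, highest score first."""
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    scores = np.asarray(scores, dtype=np.float32).reshape(-1)
    candidates = np.flatnonzero(scores >= conf_threshold)
    order = candidates[np.argsort(-scores[candidates], kind="stable")]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / np.maximum(area_i + area_r - inter, 1e-9)
        # Suppress every remaining box that overlaps box i by more than iou_threshold
        order = rest[iou <= iou_threshold]
    return keep
```

With `iou_threshold=0.1`, any two boxes overlapping by more than 10% IoU collapse to the higher-scoring one. This removes duplicates aggressively but can also merge targets that are close together.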

## 3. Load and Preprocess RGB and IR Video Data

Locate the RGB and IR videos in the validation dataset and pair each RGB video with its IR counterpart using `fusion_generator.find_matching_ir_file`.

In [None]:
# Data paths
data_folder = "data/Validation_Videos"
rgb_folder = Path(data_folder) / "RGB"
ir_folder = Path(data_folder) / "IR"

print(f"RGB folder exists: {rgb_folder.exists()}")
print(f"IR folder exists: {ir_folder.exists()}")

# Find RGB and IR video pairs
rgb_videos = list(rgb_folder.glob("*.mp4")) if rgb_folder.exists() else []
ir_videos = list(ir_folder.glob("*.mp4")) if ir_folder.exists() else []

print(f"\nFound {len(rgb_videos)} RGB videos")
print(f"Found {len(ir_videos)} IR videos")

# Display first few video pairs
if rgb_videos:
    print("\nFirst few RGB videos:")
    for i, video in enumerate(rgb_videos[:3]):
        print(f"  {i+1}. {video.name}")
        
        # Find matching IR video
        matching_ir = fusion_generator.find_matching_ir_file(str(video), data_folder)
        if matching_ir:
            print(f"     -> Matching IR: {Path(matching_ir).name}")
        else:
            print(f"     -> No matching IR found")
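
The naming convention that `find_matching_ir_file` relies on is not shown. As an illustration only, the sketch below assumes paired files either share a stem or differ just by an `RGB`/`IR` token in the stem; `pairing_key` and `pair_videos` are hypothetical helpers.

```python
import re
from pathlib import Path

# Hypothetical convention: an "RGB"/"IR" token delimited by non-alphanumerics is the only difference in a pair
_MODALITY_TOKEN = re.compile(r"(?i)(?<![a-z0-9])(rgb|ir)(?![a-z0-9])")

def pairing_key(path: str) -> str:
    """Strip the modality token and lowercase the stem so RGB and IR names compare equal."""
    stem = Path(path).stem
    return _MODALITY_TOKEN.sub("", stem).lower()

def pair_videos(rgb_paths, ir_paths):
    """Map each RGB path to the IR path with the same key, or None when there is no match."""
    ir_by_key = {pairing_key(p): p for p in ir_paths}
    return {p: ir_by_key.get(pairing_key(p)) for p in rgb_paths}
```

Building one dictionary keyed on the normalized stem makes pairing linear in the number of files, instead of searching the IR folder once per RGB video.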

## 4. Configure Model Parameters

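Collect the detection parameters (scales, crop ratio, per-region weights, confidence and IoU thresholds) and the tracking post-processing parameters. This cell only records and prints these values; it does not pass them to `fusion_generator`, so editing them here has no effect on the generator created in Section 2.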

In [None]:
# Detection parameters
detection_params = {
    'scales': [1.0],
    'crop_ratio': 0.65,
    'weights': [
        0.75,  # original image
        1.0,   # top-left
        1.0,   # top-right
        1.0,   # bottom-left
        1.0,   # bottom-right
        2.0    # center
    ],
    'conf_threshold': conf_threshold,
    'iou_threshold': iou_threshold
}

# Post-processing parameters for tracking
tracking_params = {
    'min_detection_frames': 4,    # Minimum frames to keep detection
    'max_missing_frames': 5       # Maximum consecutive missing frames
}

print("Detection Parameters:")
for key, value in detection_params.items():
    print(f"  {key}: {value}")

print("\nTracking Parameters:")
for key, value in tracking_params.items():
    print(f"  {key}: {value}")
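
The six `weights` suggest each frame is processed as the full image plus four corner crops and one center crop, each `crop_ratio` of the frame's width and height. The sketch below computes those regions and maps a crop-local detection back to frame coordinates. The crop layout and the use of the weight as a score multiplier are assumptions not confirmed by the notebook, and `crop_boxes` and `to_frame_coords` are hypothetical helpers.

```python
def crop_boxes(width: int, height: int, crop_ratio: float = 0.65):
    """Return (x1, y1, x2, y2) regions in weight order: original, top-left, top-right, bottom-left, bottom-right, center."""
    cw, ch = int(round(width * crop_ratio)), int(round(height * crop_ratio))
    cx, cy = (width - cw) // 2, (height - ch) // 2
    return [
        (0, 0, width, height),                     # original image
        (0, 0, cw, ch),                            # top-left
        (width - cw, 0, width, ch),                # top-right
        (0, height - ch, cw, height),              # bottom-left
        (width - cw, height - ch, width, height),  # bottom-right
        (cx, cy, cx + cw, cy + ch),                # center
    ]

def to_frame_coords(det, crop, weight):
    """Shift a crop-local (x1, y1, x2, y2, score) detection into frame pixels; scale its score by the region weight."""
    x1, y1, x2, y2, score = det
    ox, oy = crop[0], crop[1]
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy, score * weight)
```

With `crop_ratio=0.65` the corner crops overlap around the middle of the frame, so a target there may be detected in several regions at once and the duplicates would need NMS or box fusion. If the weights multiply scores as sketched, the center weight of 2.0 can push a score above 1, so a confidence threshold would have to be applied before weighting.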

## 5. Process Video Pairs for Fusion Detection

Run the fusion pipeline on the first RGB/IR video pair with `process_video`. If no matching IR video is found, the generator is called with `None` for the IR path and processes the RGB video alone.

In [None]:
# Select first video pair for demonstration
if rgb_videos:
    rgb_video_path = str(rgb_videos[0])
    ir_video_path = fusion_generator.find_matching_ir_file(rgb_video_path, data_folder)
    
    print(f"Processing video pair:")
    print(f"RGB: {Path(rgb_video_path).name}")
    print(f"IR:  {Path(ir_video_path).name if ir_video_path else 'Not found'}")
    
    # Process single video pair
    if ir_video_path:
        print(f"\nProcessing fusion video...")
        video_results = fusion_generator.process_video(rgb_video_path, ir_video_path)
        print(f"Generated {len(video_results)} detection results")
        
        # Display first few results
        if video_results:
            print(f"\nFirst 3 detection results:")
            for i, result in enumerate(video_results[:3]):
                print(f"Result {i+1}:")
                print(f"  Frame: {result['Frame_name']}")
                print(f"  Class: {result['class_label']}")
                print(f"  Track ID: {result['track_id']}")
                print(f"  Confidence: {result['confidence_detection']:.3f}")
                print(f"  BBox: ({result['x_min_norm']:.3f}, {result['y_min_norm']:.3f}, {result['x_max_norm']:.3f}, {result['y_max_norm']:.3f})")
    else:
        print("No IR video found, processing RGB only...")
        video_results = fusion_generator.process_video(rgb_video_path, None)
        print(f"Generated {len(video_results)} detection results")
else:
    print("No RGB videos found!")
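
Each result stores its box as `x_min_norm`, `y_min_norm`, `x_max_norm`, `y_max_norm`. Assuming these are pixel coordinates divided by the frame width and height and clamped to `[0, 1]`, the conversion looks like the sketch below; `normalize_bbox` is a hypothetical helper.

```python
def normalize_bbox(x1, y1, x2, y2, width, height):
    """Convert pixel corners to normalized x_min/y_min/x_max/y_max in [0, 1]."""
    def clamp(v):
        return min(max(v, 0.0), 1.0)
    # Order the corners so min <= max even if the detector returns them swapped
    x_min, x_max = sorted((x1, x2))
    y_min, y_max = sorted((y1, y2))
    return {
        "x_min_norm": clamp(x_min / width),
        "y_min_norm": clamp(y_min / height),
        "x_max_norm": clamp(x_max / width),
        "y_max_norm": clamp(y_max / height),
    }
```

Normalized coordinates keep results comparable when the RGB and IR streams have different resolutions.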

## 6. Implement Tracking with Fusion Model

Analyze the tracking results from the previous step: group detections by frame, list the unique track IDs, count detections per class, summarize the confidence scores, and plot the confidence distribution and the number of detections per frame.

In [None]:
# Analyze tracking results if available
if 'video_results' in locals() and video_results:
    # Group results by frame for analysis
    frames_dict = {}
    for result in video_results:
        frame_name = result['Frame_name']
        if frame_name not in frames_dict:
            frames_dict[frame_name] = []
        frames_dict[frame_name].append(result)
    
    print(f"Tracking Analysis:")
    print(f"Total frames processed: {len(frames_dict)}")
    
    # Analyze track IDs
    track_ids = set()
    class_counts = {}
    for result in video_results:
        track_ids.add(result['track_id'])
        class_label = result['class_label']
        class_counts[class_label] = class_counts.get(class_label, 0) + 1
    
    print(f"Unique track IDs: {sorted(track_ids)}")
    print(f"Class distribution: {class_counts}")
    
    # Analyze confidence scores
    confidences = [float(result['confidence_detection']) for result in video_results]
    if confidences:
        print(f"Confidence scores:")
        print(f"  Mean: {np.mean(confidences):.3f}")
        print(f"  Min:  {np.min(confidences):.3f}")
        print(f"  Max:  {np.max(confidences):.3f}")
    
    # Plot confidence distribution
    plt.figure(figsize=(10, 4))
    
    plt.subplot(1, 2, 1)
    plt.hist(confidences, bins=20, alpha=0.7)
    plt.xlabel('Confidence Score')
    plt.ylabel('Frequency')
    plt.title('Detection Confidence Distribution')
    plt.grid(True, alpha=0.3)
    
    plt.subplot(1, 2, 2)
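    # Frames are ordered by name; frames with no detections do not appear in frames_dict
    frame_counts = [len(frames_dict[frame]) for frame in sorted(frames_dict.keys())]
    plt.plot(frame_counts, marker='o', alpha=0.7)
    plt.xlabel('Frame (ordered by name, frames with detections only)')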
    plt.ylabel('Number of Detections')
    plt.title('Detections per Frame')
    plt.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
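
`tracking_params` is printed in Section 4, but how it is applied is not shown. One plausible reading, sketched below under that assumption: a track is split wherever more than `max_missing_frames` consecutive frames are missing, and segments with fewer than `min_detection_frames` detections are discarded. `filter_tracks` is hypothetical and expects integer frame indices, so `Frame_name` strings would first need to be mapped to indices.

```python
def filter_tracks(track_frames, min_detection_frames=4, max_missing_frames=5):
    """track_frames: {track_id: iterable of frame indices}. Returns {track_id: [segments]} with short segments removed."""
    kept = {}
    for track_id, frames in track_frames.items():
        frames = sorted(set(frames))
        segments, current = [], []
        for f in frames:
            # More than max_missing_frames consecutive missing frames ends the current segment
            if current and f - current[-1] - 1 > max_missing_frames:
                segments.append(current)
                current = []
            current.append(f)
        if current:
            segments.append(current)
        # Keep only segments observed in at least min_detection_frames frames
        long_enough = [s for s in segments if len(s) >= min_detection_frames]
        if long_enough:
            kept[track_id] = long_enough
    return kept
```

For example, a track seen in frames 0 to 3 and again in frames 10 and 11 has a 6-frame gap, so it splits in two and only the 4-frame segment survives.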

## 7. Generate Results and Visualizations

Check whether the generator wrote RGB and IR visualization videos, then plot the distributions of per-detection inference time and track length and the confidence of each detection, and print a performance summary.

In [None]:
# Generate visualizations (if videos were processed)
if 'rgb_video_path' in locals() and rgb_video_path:
    print("Checking for visualization videos...")
    
    # Visualizations are written by the generator when is_visualize=True;
    # this cell only checks whether the expected files exist under videos/
    rgb_viz_path = f"videos/visualized_rgb_{os.path.basename(rgb_video_path)}"
    ir_viz_path = f"videos/visualized_ir_{os.path.basename(ir_video_path)}" if ir_video_path else None
    
    if os.path.exists(rgb_viz_path):
        print(f"RGB visualization saved to: {rgb_viz_path}")
    else:
        print("RGB visualization not found (check if is_visualize=True)")
    
    if ir_viz_path and os.path.exists(ir_viz_path):
        print(f"IR visualization saved to: {ir_viz_path}")
    elif ir_video_path:
        print("IR visualization not found (check if is_visualize=True)")
    
    # Create performance metrics visualization
    if 'video_results' in locals() and video_results:
        inference_times = [float(result['inference_time_detection (ms)']) for result in video_results]
        
        plt.figure(figsize=(12, 4))
        
        plt.subplot(1, 3, 1)
        plt.hist(inference_times, bins=15, alpha=0.7, color='blue')
        plt.xlabel('Inference Time (ms)')
        plt.ylabel('Frequency')
        plt.title('Inference Time Distribution')
        plt.grid(True, alpha=0.3)
        
        plt.subplot(1, 3, 2)
        # Track length analysis
        track_lengths = {}
        for result in video_results:
            track_id = result['track_id']
            if track_id not in track_lengths:
                track_lengths[track_id] = 0
            track_lengths[track_id] += 1
        
        lengths = list(track_lengths.values())
        plt.hist(lengths, bins=10, alpha=0.7, color='green')
        plt.xlabel('Track Length (frames)')
        plt.ylabel('Number of Tracks')
        plt.title('Track Length Distribution')
        plt.grid(True, alpha=0.3)
        
        plt.subplot(1, 3, 3)
        # Confidence of each detection, in result order
        confidences = [float(result['confidence_detection']) for result in video_results]
        plt.plot(confidences, alpha=0.7, color='red')
        plt.xlabel('Detection Index')
        plt.ylabel('Confidence Score')
        plt.title('Confidence by Detection Index')
        plt.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        # Print performance summary
        print(f"\nPerformance Summary:")
        print(f"Average inference time (per detection row): {np.mean(inference_times):.2f} ms")
        print(f"Total detections: {len(video_results)}")
        print(f"Average track length: {np.mean(lengths):.1f} frames")
        print(f"Total unique tracks: {len(track_lengths)}")
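
The inference times above are averaged over detection rows. If each row repeats the inference time of its frame (an assumption; the notebook does not say), frames with many detections are over-weighted. The sketch below counts each frame once and converts the mean latency to FPS as `1000 / mean_ms`; `per_frame_fps` is a hypothetical helper.

```python
def per_frame_fps(results, time_key="inference_time_detection (ms)"):
    """Average FPS counting each frame once, assuming every detection row repeats its frame's inference time."""
    frame_ms = {}
    for r in results:
        # Keep the first time seen per frame so frames with many detections are not over-weighted
        frame_ms.setdefault(r["Frame_name"], float(r[time_key]))
    if not frame_ms:
        return float("nan")
    mean_ms = sum(frame_ms.values()) / len(frame_ms)
    return 1000.0 / mean_ms
```

For example, three rows for a 20 ms frame and one row for a 30 ms frame give 40 FPS here (25 ms per frame), whereas averaging over rows gives 22.5 ms, or about 44 FPS.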

## 8. Export Submission CSV

Run the generator on the full validation dataset to write the submission CSV (normalized box coordinates, confidence scores, timing, and FPS columns), then load it to inspect FPS statistics and the class distribution.

In [None]:
# Process full dataset and generate submission CSV
print("Processing full dataset for submission...")

# Output path for submission (create the folder in case process_dataset does not)
os.makedirs("submissions", exist_ok=True)
output_csv_path = f"submissions/{team_name}_FUSION_submission.csv"

# Process the entire validation dataset
fusion_generator.process_dataset(data_folder, output_csv_path)

# Read and analyze the generated CSV
if os.path.exists(output_csv_path):
    df = pd.read_csv(output_csv_path)
    
    print(f"\nSubmission CSV Analysis:")
    print(f"Total records: {len(df)}")
    print(f"Columns: {list(df.columns)}")
    
    # Analyze FPS columns
    if 'FPS (GPU)' in df.columns and 'FPS (CPU)' in df.columns:
        gpu_fps_avg = df['FPS (GPU)'].astype(float).mean()
        cpu_fps_avg = df['FPS (CPU)'].astype(float).mean()
        
        print(f"\nFPS Analysis:")
        print(f"Average GPU FPS: {gpu_fps_avg:.2f}")
        print(f"Average CPU FPS: {cpu_fps_avg:.2f}")
        print(f"GPU speedup (ratio of mean FPS): {gpu_fps_avg / cpu_fps_avg:.2f}x")
        
        # Visualize FPS comparison
        plt.figure(figsize=(12, 4))
        
        plt.subplot(1, 3, 1)
        plt.hist(df['FPS (GPU)'].astype(float), bins=20, alpha=0.7, label='GPU', color='green')
        plt.hist(df['FPS (CPU)'].astype(float), bins=20, alpha=0.7, label='CPU', color='red')
        plt.xlabel('FPS')
        plt.ylabel('Frequency')
        plt.title('FPS Distribution Comparison')
        plt.legend()
        plt.grid(True, alpha=0.3)
        
        plt.subplot(1, 3, 2)
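        plt.boxplot([df['FPS (GPU)'].astype(float), df['FPS (CPU)'].astype(float)])
        # Set tick labels separately; boxplot's `labels` argument is deprecated in newer Matplotlib
        plt.xticks([1, 2], ['GPU', 'CPU'])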
        plt.ylabel('FPS')
        plt.title('FPS Box Plot Comparison')
        plt.grid(True, alpha=0.3)
        
        plt.subplot(1, 3, 3)
        speedup = df['FPS (GPU)'].astype(float) / df['FPS (CPU)'].astype(float)
        plt.hist(speedup, bins=20, alpha=0.7, color='blue')
        plt.xlabel('Speedup Factor (GPU/CPU)')
        plt.ylabel('Frequency')
        plt.title('GPU Speedup Distribution')
        plt.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
    
    # Display sample of the CSV
    print(f"\nFirst 5 rows of submission CSV:")
    print(df.head())
    
    # Class distribution
    if 'class_label' in df.columns:
        class_dist = df['class_label'].value_counts()
        print(f"\nClass distribution:")
        print(class_dist)
    
    print(f"\nSubmission CSV saved to: {output_csv_path}")
else:
    print("Failed to generate submission CSV!")
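
Before submitting, it can help to sanity-check the CSV. The checks below are assumptions drawn from the column names this notebook reads, not the official VIP Cup 2025 requirements: the columns are present, normalized coordinates lie in `[0, 1]`, no min exceeds its max, and confidences lie in `[0, 1]`. `check_submission` is a hypothetical helper.

```python
import pandas as pd

# Column names taken from the fields this notebook reads; not an official schema
BBOX_COLS = ["x_min_norm", "y_min_norm", "x_max_norm", "y_max_norm"]
EXPECTED_COLS = ["Frame_name", "class_label", "track_id", "confidence_detection", *BBOX_COLS]

def check_submission(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems; an empty list means the basic checks passed."""
    problems = [f"missing column: {c}" for c in EXPECTED_COLS if c not in df.columns]
    if problems:
        return problems
    boxes = df[BBOX_COLS].astype(float)
    out_of_range = ((boxes < 0) | (boxes > 1)).any(axis=1)
    if out_of_range.any():
        problems.append(f"{int(out_of_range.sum())} rows with coordinates outside [0, 1]")
    inverted = (boxes["x_min_norm"] > boxes["x_max_norm"]) | (boxes["y_min_norm"] > boxes["y_max_norm"])
    if inverted.any():
        problems.append(f"{int(inverted.sum())} rows with min > max")
    conf = df["confidence_detection"].astype(float)
    if ((conf < 0) | (conf > 1)).any():
        problems.append("confidence_detection outside [0, 1]")
    return problems
```

Calling `check_submission(df)` on the DataFrame loaded above returns an empty list when every check passes.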

## Summary

This notebook demonstrated how to use the fusion model for VIP Cup 2025 submission generation with the following key features:

### ✅ **Completed Features:**
1. **Fusion Model Integration** - Successfully combines RGB and IR data
2. **Dual Device Processing** - Runs inference on both GPU and CPU
3. **FPS Benchmarking** - Measures and compares performance on both devices
4. **Enhanced CSV Output** - Includes new FPS (GPU) and FPS (CPU) columns
5. **Automatic File Matching** - Pairs each RGB video with its IR counterpart via `find_matching_ir_file`
6. **Comprehensive Visualization** - Shows tracking results and performance metrics

### 🚀 **Performance Benefits:**
- **GPU Acceleration** - Typically faster inference than CPU; the measured speedup is reported in the FPS analysis of Section 8
- **CPU Fallback** - Ensures compatibility across different hardware
- **Benchmarking** - Provides detailed performance comparison data

### 📊 **Output Format:**
The generated CSV now includes:
- All standard submission fields
- **FPS (GPU)** - Frames per second on GPU
- **FPS (CPU)** - Frames per second on CPU
- Per-detection confidence scores and inference timing

### 🔧 **Usage:**
```bash
# Run the fusion inference
python inference_detection_tracking.py \
    --data_folder "data/Validation_Videos" \
    --model "checkpoints/fusion.pt" \
    --team_name "V-Linsight" \
    --modality "FUSION" \
    --conf 0.2 \
    --iou 0.1 \
    --visualize
```
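
The command-line flags correspond to the `SubmissionGenerator` arguments used in Section 2. The parser below is a hypothetical sketch of that mapping, with defaults copied from the command above; the real `inference_detection_tracking.py` parser may define its flags differently.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the command above."""
    p = argparse.ArgumentParser(description="Fusion detection + tracking inference (sketch)")
    p.add_argument("--data_folder", default="data/Validation_Videos")
    p.add_argument("--model", default="checkpoints/fusion.pt")
    p.add_argument("--team_name", default="V-Linsight")
    p.add_argument("--modality", default="FUSION")
    p.add_argument("--conf", type=float, default=0.2)
    p.add_argument("--iou", type=float, default=0.1)
    p.add_argument("--visualize", action="store_true")
    return p

def generator_kwargs(args: argparse.Namespace) -> dict:
    """Map parsed flags onto the SubmissionGenerator keyword arguments used in Section 2."""
    return {
        "model_path": args.model,
        "modality": args.modality,
        "team_name": args.team_name,
        "conf_threshold": args.conf,
        "iou_threshold": args.iou,
        "is_visualize": args.visualize,
    }
```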

In short, the fusion pipeline turns paired RGB/IR videos into a VIP Cup 2025 submission CSV and records GPU and CPU FPS alongside the detection and tracking results.