# Notebook 3: Object Detection Demo with YOLOv8

**Session 1: AI-based Perception Systems in Autonomous Vehicles**

**Author:** Milin Patel  
**Duration:** ~20 minutes

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- ✅ Understand how deep learning object detection works
- ✅ Run YOLOv8 on driving scenes
- ✅ Interpret confidence scores and bounding boxes
- ✅ Compare different detection models
- ✅ Analyze detection performance and failure cases
- ✅ Perform real-time inference

---

## 📦 Setup and Imports

In [None]:
# Import required libraries
import torch
import cv2
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from ultralytics import YOLO
import time
import requests
from io import BytesIO
import warnings
warnings.filterwarnings('ignore')

# Check PyTorch and CUDA
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

# Set device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"\nUsing device: {device}")

print("\n✅ All libraries imported successfully!")

---

## 1️⃣ Understanding Object Detection

### What is Object Detection?

**Input:** An image (e.g., 1920×1080 pixels, 3 color channels)  
**Output:** List of detected objects with:
- **Bounding box:** [x, y, width, height] in this example; the detection code later in this notebook uses corner coordinates [x1, y1, x2, y2]
- **Class label:** "car", "person", "bicycle", etc.
- **Confidence score:** 0.0 to 1.0 (0% to 100%)

### Example Output:
```python
[
    {"class": "car", "bbox": [100, 200, 150, 80], "confidence": 0.95},
    {"class": "person", "bbox": [500, 300, 50, 120], "confidence": 0.87},
    {"class": "bicycle", "bbox": [800, 250, 80, 100], "confidence": 0.72}
]
```
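
The example above lists boxes as [x, y, width, height], while the detection code later in this notebook reads corner coordinates from `results.boxes.xyxy`. The following is a minimal sketch of converting between the two, assuming that (x, y) is the top-left corner (the notebook does not say so explicitly, and some formats use the box center instead). The function names are illustrative.

```python
def xywh_to_xyxy(box):
    """Convert [x, y, width, height] (top-left origin assumed) to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    """Convert [x1, y1, x2, y2] back to [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


car_xywh = [100, 200, 150, 80]   # the "car" box from the example above
car_xyxy = xywh_to_xyxy(car_xywh)
print(car_xyxy)                  # [100, 200, 250, 280]
print(xyxy_to_xywh(car_xyxy))    # [100, 200, 150, 80]
```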

### Popular Architectures:
- **YOLO (You Only Look Once):** Fast, real-time capable (roughly 50-150 FPS on a modern GPU, depending on model size)
- **Faster R-CNN:** High accuracy, slower (roughly 5-10 FPS)
- **SSD (Single Shot Detector):** Balance of speed and accuracy
- **Modern:** EfficientDet, DETR (Transformer-based)

**Today's focus:** YOLOv8, a recent YOLO release that combines real-time speed with strong accuracy.

---

## 2️⃣ Load YOLOv8 Model

In [None]:
# Load YOLOv8 model
print("Loading YOLOv8n (nano - fastest) model...")
model = YOLO('yolov8n.pt')  # Automatically downloads if not present

# Move model to device
model.to(device)

print(f"✅ Model loaded on {device}")
print(f"\nModel info:")
print(f"  - Input size: 640x640")
print(f"  - Classes: {len(model.names)} (COCO dataset)")
print(f"  - Parameters: ~3.2M")
print(f"\nSample classes: {list(model.names.values())[:15]}...")

### COCO Dataset Classes

The pretrained YOLOv8 weights used here were trained on the **COCO (Common Objects in Context)** dataset, which has **80 object classes**.

Relevant for autonomous driving:
- **Vehicles:** car, truck, bus, motorcycle, bicycle
- **People:** person
- **Traffic:** traffic light, stop sign, parking meter
- **Animals:** dog, cat, horse, etc. (road hazards!)

**Note:** COCO is not driving-specific. Production AV models are typically trained on specialized driving datasets (nuScenes, Waymo Open Dataset, etc.).
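
To focus on the driving-relevant classes listed above, one option is to look up their class ids and ignore everything else. The sketch below uses a small hand-copied subset of the id-to-name mapping so that it runs on its own; in the notebook you would use `model.names` instead. `DRIVING_CLASSES` and `class_ids_for` are illustrative names, not part of any library.

```python
# Illustrative subset of the COCO id -> name mapping (assumption: copied by hand;
# in the notebook, use model.names instead).
names = {
    0: "person", 1: "bicycle", 2: "car", 3: "motorcycle", 5: "bus",
    7: "truck", 9: "traffic light", 11: "stop sign", 16: "dog",
}

DRIVING_CLASSES = {
    "person", "bicycle", "car", "motorcycle", "bus",
    "truck", "traffic light", "stop sign",
}


def class_ids_for(names, wanted):
    """Return the sorted class ids whose names appear in `wanted`."""
    return sorted(cls_id for cls_id, name in names.items() if name in wanted)


driving_ids = class_ids_for(names, DRIVING_CLASSES)
print(driving_ids)
```

The resulting ids can then be passed to the Ultralytics `classes` prediction argument, e.g. `model.predict(image, classes=driving_ids)`, so that only those classes are returned.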

---

## 3️⃣ Run Detection on Sample Images

Let's test on various driving scenarios!

In [None]:
# Helper function to download and display image
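def load_image_from_url(url, timeout=10):
    """Download an image from a URL and return it as an RGB numpy array"""
    response = requests.get(url, timeout=timeout)  # Avoid hanging on a dead connection
    response.raise_for_status()  # Fail early on HTTP errors (404, 403, ...)
    img = Image.open(BytesIO(response.content)).convert("RGB")  # Drop alpha, expand grayscale
    return np.array(img)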

# Sample driving scene URLs (free, no copyright issues)
test_images = {
    "Urban Scene": "https://images.unsplash.com/photo-1449824913935-59a10b8d2000?w=800",
    "Highway Traffic": "https://images.unsplash.com/photo-1501594907352-04cda38ebc29?w=800",
    "Pedestrian Crossing": "https://images.unsplash.com/photo-1514565131-fce0801e5785?w=800",
}

print("📥 Test images ready!")
print("   Note: If images fail to load, check internet connection or use local images.")

In [None]:
# Function to run detection and visualize
def detect_and_visualize(image, conf_threshold=0.25, title="Detection Results"):
    """
    Run YOLOv8 detection and visualize results
    
    Args:
        image: numpy array (RGB)
        conf_threshold: minimum confidence score
        title: plot title
    """
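    # Run inference (the timing covers preprocessing and postprocessing too;
    # the very first call is slower because of model warm-up)
    start_time = time.perf_counter()
    results = model.predict(image, conf=conf_threshold, verbose=False)[0]
    inference_time = (time.perf_counter() - start_time) * 1000  # ms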
    
    # Get detections
    boxes = results.boxes.xyxy.cpu().numpy()  # [x1, y1, x2, y2]
    confidences = results.boxes.conf.cpu().numpy()
    class_ids = results.boxes.cls.cpu().numpy().astype(int)
    
    # Draw bounding boxes
    img_with_boxes = image.copy()
    for box, conf, cls_id in zip(boxes, confidences, class_ids):
        x1, y1, x2, y2 = box.astype(int)
        label = model.names[cls_id]
        
        # Color based on class
        if label in ['person']:
            color = (255, 0, 0)  # Red for pedestrians
        elif label in ['car', 'truck', 'bus']:
            color = (0, 255, 0)  # Green for vehicles
        elif label in ['bicycle', 'motorcycle']:
            color = (255, 165, 0)  # Orange for bikes
        else:
            color = (0, 0, 255)  # Blue for others
        
        # Draw box and label
        cv2.rectangle(img_with_boxes, (x1, y1), (x2, y2), color, 3)
        text = f"{label}: {conf:.2f}"
        cv2.putText(img_with_boxes, text, (x1, max(y1 - 10, 15)),  # keep label inside the image
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    
    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    axes[0].imshow(image)
    axes[0].set_title("Original Image", fontsize=14, fontweight='bold')
    axes[0].axis('off')
    
    axes[1].imshow(img_with_boxes)
    axes[1].set_title(f"{title}\nDetections: {len(boxes)}, Inference: {inference_time:.1f}ms", 
                      fontsize=14, fontweight='bold')
    axes[1].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    # Print detection details
    print(f"\n📊 Detection Summary:")
    print(f"   - Total objects detected: {len(boxes)}")
    print(f"   - Inference time: {inference_time:.1f} ms")
    print(f"   - FPS: {1000/inference_time:.1f}")
    print(f"\n   Detected objects:")
    for label, conf in zip([model.names[c] for c in class_ids], confidences):
        print(f"      - {label}: confidence {conf:.3f} ({conf*100:.1f}%)")
    
    return results

print("✅ Detection function ready!")

### Test 1: Urban Driving Scene

In [None]:
# Load and detect urban scene
print("Testing on: Urban Scene")
try:
    img = load_image_from_url(test_images["Urban Scene"])
    results = detect_and_visualize(img, conf_threshold=0.4, title="Urban Scene Detection")
except Exception as e:
    print(f"❌ Error loading image: {e}")
    print("   💡 Tip: Use a local image instead or check internet connection")

---

## 4️⃣ Confidence Threshold Analysis

The **confidence threshold** controls the trade-off between:
- **Precision:** What fraction of the detections are correct?
- **Recall:** What fraction of the actual objects are detected?
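
The trade-off can be made concrete with a few lines of arithmetic. The sketch below uses made-up detections, each marked by hand as correct or not, for a scene assumed to contain 5 real objects. The data and the `precision_recall` helper are illustrative only; real evaluation needs labelled ground truth.

```python
def precision_recall(detections, num_ground_truth, threshold):
    """Compute (precision, recall) for detections kept at or above `threshold`.

    detections: list of (confidence, is_correct) pairs.
    """
    kept = [is_correct for conf, is_correct in detections if conf >= threshold]
    tp = sum(kept)                 # true positives
    fp = len(kept) - tp            # false positives
    precision = tp / (tp + fp) if kept else 1.0   # convention when nothing is kept
    recall = tp / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Made-up detections for a scene with 5 real objects
toy = [(0.95, True), (0.87, True), (0.72, True),
       (0.55, False), (0.40, True), (0.30, False)]

for t in (0.25, 0.5, 0.75):
    p, r = precision_recall(toy, num_ground_truth=5, threshold=t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

With this toy data, raising the threshold removes the two incorrect detections (precision goes up) but also drops correct low-confidence ones (recall goes down).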

Let's experiment!

In [None]:
# Test different confidence thresholds
def compare_confidence_thresholds(image, thresholds=(0.25, 0.5, 0.75)):
    """
    Compare detection results with different confidence thresholds
    """
    fig, axes = plt.subplots(1, len(thresholds), figsize=(18, 5), squeeze=False)
    axes = axes[0]  # squeeze=False keeps indexing valid even for a single threshold
    
    for idx, threshold in enumerate(thresholds):
        # Run detection
        results = model.predict(image, conf=threshold, verbose=False)[0]
        
        # Visualize
        img_with_boxes = image.copy()
        boxes = results.boxes.xyxy.cpu().numpy()
        confidences = results.boxes.conf.cpu().numpy()
        class_ids = results.boxes.cls.cpu().numpy().astype(int)
        
        for box, conf, cls_id in zip(boxes, confidences, class_ids):
            x1, y1, x2, y2 = box.astype(int)
            label = model.names[cls_id]
            color = (0, 255, 0) if conf > 0.7 else (255, 165, 0) if conf > 0.5 else (255, 0, 0)
            cv2.rectangle(img_with_boxes, (x1, y1), (x2, y2), color, 2)
            cv2.putText(img_with_boxes, f"{label}:{conf:.2f}", (x1, y1-5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
        
        axes[idx].imshow(img_with_boxes)
        axes[idx].set_title(f"Threshold: {threshold}\nDetections: {len(boxes)}", 
                            fontsize=12, fontweight='bold')
        axes[idx].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print("\n💡 Observations:")
    print("   - Lower threshold → more detections (higher recall, lower precision)")
    print("   - Higher threshold → fewer detections (higher precision, lower recall)")
    print("   - For safety-critical systems: Need to balance both!")

# Run comparison
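try:
    compare_confidence_thresholds(img)
except NameError:
    # img only exists if the urban scene loaded successfully in Test 1
    print("No image loaded. Assign your own RGB image to `img` and re-run this cell.")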

---

## 5️⃣ Real-Time Inference Benchmarking

For autonomous driving, **real-time performance** is crucial!

Target: **30-60 FPS** (about 33 ms down to 17 ms per frame)
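
The per-frame figures follow directly from the frame rate: the time budget per frame is 1000 ms divided by the FPS. A minimal sketch (the helper names are illustrative):

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def meets_target(latency_ms, target_fps=30):
    """True if a measured per-frame latency fits within the target frame budget."""
    return latency_ms <= frame_budget_ms(target_fps)


for fps in (30, 60):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.1f} ms per frame")

print(meets_target(25.0))   # 25 ms per frame against the 30 FPS budget
```

Keep in mind that the detector usually shares this budget with other stages such as image capture and downstream processing, so its own latency has to stay well below the full frame time.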

In [None]:
# Benchmark inference speed
def benchmark_inference(model_name='yolov8n.pt', num_runs=50, img_size=640):
    """
    Benchmark YOLOv8 inference speed
    """
    print(f"\n⏱️ Benchmarking {model_name} on {device}...")
    
    # Load model
    test_model = YOLO(model_name)
    test_model.to(device)
    
    # Create dummy image (random noise; the upper bound is exclusive, so 256 covers 0-255)
    dummy_img = np.random.randint(0, 256, (img_size, img_size, 3), dtype=np.uint8)
    
    # Warm-up runs
    for _ in range(10):
        _ = test_model.predict(dummy_img, verbose=False)
    
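    # Benchmark (times the full predict() call, including pre- and postprocessing)
    times = []
    for _ in range(num_runs):
        start = time.perf_counter()  # Monotonic, higher resolution than time.time()
        _ = test_model.predict(dummy_img, verbose=False)
        if device == 'cuda':
            torch.cuda.synchronize()  # Wait for GPU
        times.append((time.perf_counter() - start) * 1000)  # ms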
    
    # Statistics
    mean_time = np.mean(times)
    std_time = np.std(times)
    fps = 1000 / mean_time
    
    print(f"\n📊 Results ({num_runs} runs):")
    print(f"   - Mean inference time: {mean_time:.2f} ± {std_time:.2f} ms")
    print(f"   - Throughput: {fps:.1f} FPS")
    print(f"   - Min time: {min(times):.2f} ms")
    print(f"   - Max time: {max(times):.2f} ms")
    
    if fps >= 30:
        print(f"   ✅ Real-time capable! (>=30 FPS)")
    else:
        print(f"   ⚠️ Below real-time threshold (<30 FPS)")
    
    return mean_time, fps

# Run benchmark
mean_time, fps = benchmark_inference()
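
A mean can hide occasional slow frames, which matter in a real-time system. The sketch below summarizes a list of per-run timings by median, 95th percentile, and the share of frames that exceed a frame budget. It assumes you have the individual timings available (like the `times` list inside `benchmark_inference`, which as written returns only the mean and FPS). The timings here are made up, and `percentile` and `summarize_latencies` are illustrative helpers.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile (q between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latencies(times_ms, budget_ms=1000 / 30):
    """Report mean, median, p95, worst case, and the share of frames over budget."""
    over = sum(t > budget_ms for t in times_ms)
    return {
        "mean_ms": sum(times_ms) / len(times_ms),
        "p50_ms": percentile(times_ms, 50),
        "p95_ms": percentile(times_ms, 95),
        "max_ms": max(times_ms),
        "frames_over_budget": over / len(times_ms),
    }


# Made-up timings: mostly fast, with two slow outliers
times_ms = [12.0] * 18 + [40.0, 55.0]
for key, value in summarize_latencies(times_ms).items():
    print(f"{key}: {value:.2f}")
```

In this toy case the mean stays well under the 30 FPS budget even though one frame in ten misses it, which is why tail latency is worth reporting alongside the mean.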

---

## 6️⃣ Exercise: Analyze Failure Cases

**Task:** Upload or use your own driving scene images and identify:
1. **False positives:** Detected objects that don't exist
2. **False negatives:** Objects that exist but weren't detected
3. **Misclassifications:** Wrong label assigned

**Think:** Why did these failures occur?
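
If you label the true objects in an image by hand, the three failure types can be sorted automatically by matching detections to ground-truth boxes using intersection over union (IoU). The sketch below is one simple way to do this, assuming boxes in [x1, y1, x2, y2] format, a greedy highest-confidence-first matching, and an IoU threshold of 0.5. These are illustrative choices, not a standard evaluation protocol, and the data is made up.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def categorize(detections, ground_truth, iou_threshold=0.5):
    """Greedily match detections (highest confidence first) to ground-truth boxes."""
    unmatched = list(range(len(ground_truth)))
    out = {"correct": [], "misclassified": [], "false_positive": [], "false_negative": []}
    for det in sorted(detections, key=lambda d: d["confidence"], reverse=True):
        scored = [(iou(det["bbox"], ground_truth[i]["bbox"]), i) for i in unmatched]
        best_iou, best = max(scored, default=(0.0, None))
        if best is None or best_iou < iou_threshold:
            out["false_positive"].append(det)      # no real object under this box
            continue
        unmatched.remove(best)
        same_class = det["class"] == ground_truth[best]["class"]
        out["correct" if same_class else "misclassified"].append(det)
    out["false_negative"] = [ground_truth[i] for i in unmatched]  # never matched
    return out


# Made-up hand labels and detections
ground_truth = [
    {"class": "car", "bbox": [100, 200, 250, 280]},
    {"class": "person", "bbox": [500, 300, 550, 420]},
    {"class": "bicycle", "bbox": [800, 250, 880, 350]},
]
detections = [
    {"class": "car", "bbox": [105, 205, 250, 280], "confidence": 0.95},
    {"class": "motorcycle", "bbox": [800, 250, 880, 350], "confidence": 0.60},
    {"class": "person", "bbox": [10, 10, 60, 60], "confidence": 0.45},
]

result = categorize(detections, ground_truth)
for kind, items in result.items():
    print(kind, [item["class"] for item in items])
```

Here the bicycle is found but labelled as a motorcycle, the real pedestrian is missed, and a "person" is reported where nothing was labelled, covering all three failure types from the task.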

In [None]:
# TODO: Test your own images
# Option 1: Load local image
# your_image = cv2.cvtColor(cv2.imread('path/to/your/image.jpg'), cv2.COLOR_BGR2RGB)

# Option 2: Load from URL
# your_image = load_image_from_url('your_image_url')

# Run detection
# detect_and_visualize(your_image, conf_threshold=0.3, title="Your Test Image")

print("💡 Upload challenging images:")
print("   - Night scenes")
print("   - Rain/fog")
print("   - Occlusions")
print("   - Unusual objects")
print("\n   Observe where YOLO fails!")

---

## 🎯 Key Takeaways

### Object Detection for AVs
- **Task:** Detect and classify objects in camera images
- **Output:** Bounding boxes + class labels + confidence scores
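- **State-of-the-art:** YOLOv8 can reach real-time performance on suitable hardware

### Performance Considerations
- **Speed:** YOLOv8n can run at roughly 50-150+ FPS on a GPU (real-time capable!); measure on your own hardware with the benchmark in Section 5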
- **Accuracy:** Trade-off with model size (nano vs small vs medium)
- **Confidence:** Threshold controls precision-recall trade-off
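
The speed side of the model-size trade-off can be measured with a small timing harness that accepts any callable. The sketch below uses stand-in workloads so that it runs without downloading weights; `time_callable` and `compare` are illustrative helpers, and the workloads are placeholders rather than real models.

```python
import time


def time_callable(fn, num_runs=20, warmup=3):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):          # warm-up runs are not timed
        fn()
    start = time.perf_counter()
    for _ in range(num_runs):
        fn()
    return (time.perf_counter() - start) * 1000 / num_runs


def compare(candidates, **kwargs):
    """Time each zero-argument callable; return (name, mean_ms) pairs, fastest first."""
    timings = {name: time_callable(fn, **kwargs) for name, fn in candidates.items()}
    return sorted(timings.items(), key=lambda item: item[1])


# Stand-in workloads (assumption: replace these with real model calls)
candidates = {
    "small_workload": lambda: sum(range(1_000)),
    "large_workload": lambda: sum(range(100_000)),
}

for name, mean_ms in compare(candidates):
    print(f"{name}: {mean_ms:.3f} ms")
```

In this notebook, the stand-ins could be replaced by closures over models that are loaded once, for example `lambda m=YOLO('yolov8s.pt'): m.predict(img, verbose=False)`. This measures speed only; comparing accuracy across model sizes requires labelled images.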

### Limitations Observed
- **Weather:** Performance degrades in rain, fog, snow
- **Lighting:** Night scenes are challenging
- **Occlusions:** Partially hidden objects missed
- **Unusual objects:** Not in training data → often missed, or mislabeled as a known class

### Safety Implications
- **False negatives:** Missing a pedestrian → collision!
- **False positives:** Emergency brake for phantom object → rear-end collision
- **Confidence != certainty:** High confidence can still be wrong

**Next session:** We'll analyze real accident cases and failure modes in depth!

---

## 🔜 Next Steps

1. **Notebook 4:** Explore autonomous driving datasets (KITTI, nuScenes)
2. **Notebook 5:** Learn sensor fusion (camera + LiDAR + radar)
3. **Notebook 6:** Pedestrian detection case study

**Then in Session 2:** Analyze why these systems fail!

---

*Notebook created by Milin Patel | Hochschule Kempten*  
*Last updated: 2025-01-17*