# Training Tiny YOLO on VisDrone Dataset

This notebook walks through the complete process of training a YOLOv8-nano (tiny) model on the VisDrone dataset for drone-based object detection.

## About VisDrone Dataset
VisDrone is a large-scale benchmark dataset for drone-based computer vision tasks, containing:
- 10 object categories (pedestrian, people, bicycle, car, van, truck, tricycle, awning-tricycle, bus, motor)
- Images captured from various drones at different heights and angles
- Challenging scenarios with small objects, occlusion, and crowded scenes

## Step 1: Environment Setup

First, we'll install the Ultralytics library, which provides the YOLOv8 implementation, and import the libraries needed for data handling and visualization.

In [None]:
# Install required packages
!pip install ultralytics
!pip install roboflow  # Optional: if downloading from Roboflow

# Import libraries
from ultralytics import YOLO
import torch
import cv2
import matplotlib.pyplot as plt
import os
from pathlib import Path

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

## Step 2: Download and Prepare VisDrone Dataset

The VisDrone dataset needs to be in YOLO format:
- Images in one folder
- Labels in another folder (one .txt file per image)
- Each label line: `class_id center_x center_y width height` (normalized 0-1)

You can download VisDrone from:
- Official site: http://aiskyeye.com/
- Roboflow Universe (already in YOLO format)
- Kaggle
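
If you start from the official download, the annotations have to be converted first. The sketch below converts a single raw annotation line into a YOLO label line. It assumes the raw VisDrone-DET layout is `bbox_left,bbox_top,bbox_width,bbox_height,score,category,truncation,occlusion`, with categories 1-10 mapping to the ten classes in order; the function name `visdrone_to_yolo` is hypothetical, and you should check the layout against the files you actually download.

```python
# Assumed raw VisDrone-DET annotation layout (one object per line):
#   bbox_left,bbox_top,bbox_width,bbox_height,score,category,truncation,occlusion
# Assumption: category 0 ("ignored regions") and 11 ("others") have no YOLO class.

def visdrone_to_yolo(line, img_w, img_h):
    """Convert one raw annotation line to a YOLO label line, or None if skipped."""
    fields = line.strip().rstrip(',').split(',')
    left, top, box_w, box_h = (float(v) for v in fields[:4])
    score, category = int(fields[4]), int(fields[5])

    # Skip boxes flagged as ignored (score 0) and categories outside 1..10
    if score == 0 or not 1 <= category <= 10:
        return None

    class_id = category - 1                 # shift to 0-based class ids
    cx = (left + box_w / 2) / img_w         # normalized box center x
    cy = (top + box_h / 2) / img_h          # normalized box center y
    return f"{class_id} {cx:.6f} {cy:.6f} {box_w / img_w:.6f} {box_h / img_h:.6f}"


# A 50x80 px box at (100, 200) in a 1000x800 image, raw category 4
print(visdrone_to_yolo("100,200,50,80,1,4,0,0", img_w=1000, img_h=800))
```

Running this over every line of every annotation file, and writing the non-`None` results to a `.txt` file with the same stem as the image, yields the label folder described above.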

In [None]:
# Option 1: Download from Roboflow (pre-formatted for YOLO)
# You'll need to create a free account at roboflow.com and get your API key

# from roboflow import Roboflow
# rf = Roboflow(api_key="YOUR_API_KEY")
# project = rf.workspace("visdrone").project("visdrone-2019")
# dataset = project.version(1).download("yolov8")

# Option 2: Manual download
# Download from http://aiskyeye.com/ and convert to YOLO format
# Place in the following structure:
# VisDrone/
#   ├── train/
#   │   ├── images/
#   │   └── labels/
#   ├── val/
#   │   ├── images/
#   │   └── labels/
#   └── data.yaml

# Define dataset path
dataset_path = Path('./VisDrone')
print(f"Dataset path: {dataset_path.absolute()}")


## Step 3: Create Dataset Configuration File

YOLO requires a `data.yaml` file that specifies:
- Path to training and validation images
- Number of classes
- Class names

In [None]:
# Create data.yaml for VisDrone dataset
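data_yaml_content = f"""# VisDrone Dataset Configuration
path: {dataset_path.resolve().as_posix()}  # absolute dataset root (a relative path is resolved against the Ultralytics datasets directory, not the notebook folder)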
train: train/images  # train images (relative to 'path')
val: val/images  # val images (relative to 'path')

# Classes
nc: 10  # number of classes
names:
  0: pedestrian
  1: people
  2: bicycle
  3: car
  4: van
  5: truck
  6: tricycle
  7: awning-tricycle
  8: bus
  9: motor
"""

# Write to file
with open('visdrone.yaml', 'w') as f:
    f.write(data_yaml_content)
    
print("Created visdrone.yaml configuration file")

## Step 4: Explore the Dataset

Before training, let's visualize some sample images with their annotations to verify the dataset is correctly formatted.
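
Alongside the visual check, a quick text-level check can catch malformed labels early. The following is a minimal sketch, not part of Ultralytics: the helper `check_label_text` is a hypothetical name, and it only tests the field count, the class-id range, and that coordinates lie in [0, 1].

```python
from pathlib import Path


def check_label_text(text, num_classes=10):
    """Return a list of problems found in the contents of one YOLO label file."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are harmless
        parts = line.split()
        if len(parts) != 5:
            problems.append(f"line {line_no}: expected 5 fields, got {len(parts)}")
            continue
        try:
            class_id = int(parts[0])
            coords = [float(v) for v in parts[1:]]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if not 0 <= class_id < num_classes:
            problems.append(f"line {line_no}: class id {class_id} out of range")
        if any(not 0.0 <= v <= 1.0 for v in coords):
            problems.append(f"line {line_no}: coordinate outside [0, 1]")
    return problems


# Assumes the folder layout from Step 2; does nothing if the folder is missing
labels_dir = Path('./VisDrone/train/labels')
if labels_dir.is_dir():
    for label_file in sorted(labels_dir.glob('*.txt')):
        for problem in check_label_text(label_file.read_text()):
            print(f"{label_file.name}: {problem}")
```

An empty output means every checked file passed these basic tests; it does not prove the boxes are placed correctly, which is what the visualization is for.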

In [None]:
# Function to visualize images with bounding boxes
def visualize_sample(image_path, label_path, class_names):
    """Visualize an image with its YOLO format annotations"""
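    img = cv2.imread(str(image_path))
    if img is None:  # cv2.imread returns None instead of raising on failure
        raise FileNotFoundError(f"Could not read image: {image_path}")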
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    h, w = img.shape[:2]
    
    # Read labels
    if os.path.exists(label_path):
        with open(label_path, 'r') as f:
            labels = f.readlines()
        
        # Draw bounding boxes
        for label in labels:
            if not label.strip():
                continue  # skip blank lines
            class_id, x_center, y_center, width, height = map(float, label.strip().split())
            
            # Convert from YOLO format to pixel coordinates
            x1 = int((x_center - width/2) * w)
            y1 = int((y_center - height/2) * h)
            x2 = int((x_center + width/2) * w)
            y2 = int((y_center + height/2) * h)
            
            # Draw rectangle and label
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(img, class_names[int(class_id)], (x1, y1-10), 
                       cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    
    return img

# Visualize a few samples
class_names = ['pedestrian', 'people', 'bicycle', 'car', 'van', 
               'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor']

# Get sample images
train_images = list(dataset_path.glob('train/images/*.jpg'))[:3]

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for idx, img_path in enumerate(train_images):
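    # Build the label path from path components so that a parent folder
    # whose name contains 'images' is not rewritten by accident
    label_path = img_path.parent.parent / 'labels' / f'{img_path.stem}.txt'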
    img = visualize_sample(img_path, label_path, class_names)
    axes[idx].imshow(img)
    axes[idx].axis('off')
    axes[idx].set_title(f'Sample {idx+1}')
plt.tight_layout()
plt.show()

## Step 5: Initialize the YOLO Model

We'll use YOLOv8n (nano), the smallest and fastest YOLOv8 variant, which is well suited to learning and quick iterations.

Model variants (from smallest to largest):
- **YOLOv8n** (nano): ~3M parameters, fastest
- YOLOv8s (small): ~11M parameters
- YOLOv8m (medium): ~26M parameters
- YOLOv8l (large): ~44M parameters
- YOLOv8x (xlarge): ~68M parameters, most accurate

In [None]:
# Load a pretrained YOLOv8n model
# This will download the pretrained weights from Ultralytics
model = YOLO('yolov8n.pt')

# Display model information
model.info()  # prints the layer, parameter, and GFLOPs summary itself

## Step 6: Configure Training Parameters

Key training hyperparameters:
- **epochs**: Number of complete passes through the dataset (50-100 is common)
- **imgsz**: Input image size (640 is standard, smaller = faster but less accurate)
- **batch**: Batch size (adjust based on GPU memory; -1 for auto)
- **device**: GPU device (0 for first GPU, 'cpu' for CPU)
- **workers**: Number of dataloader workers
- **optimizer**: SGD, Adam, AdamW, and others (the Ultralytics default is `auto`; this notebook sets SGD explicitly)
- **lr0**: Initial learning rate
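- **Augmentation**: Applied by default during training and tuned through individual arguments such as `mosaic`, `fliplr`, and `hsv_h` (the `augment` argument itself controls test-time augmentation for prediction)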
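
To make the learning-rate settings concrete, here is a small sketch of how `lr0` and the final learning rate factor `lrf` interact. It assumes a linear decay from `lr0` to `lr0 * lrf` over training and ignores the warmup epochs; the function name `linear_lr` is hypothetical, and the exact schedule your Ultralytics version uses should be checked against its documentation.

```python
def linear_lr(epoch, epochs=50, lr0=0.01, lrf=0.01):
    """Learning rate at a given epoch, assuming linear decay from lr0 to lr0 * lrf."""
    # factor goes from 1.0 at epoch 0 down to lrf at the final epoch
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


# Warmup is ignored in this sketch
for epoch in (0, 25, 50):
    print(f"epoch {epoch:2d}: lr = {linear_lr(epoch):.6f}")
```

With the values used in this notebook, the rate starts at 0.01 and ends at 0.01 × 0.01 = 0.0001, so `lrf` is a multiplier on `lr0` rather than an absolute learning rate.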

In [None]:
# Training configuration
training_config = {
    'data': 'visdrone.yaml',        # path to data config file
    'epochs': 50,                   # number of epochs to train
    'imgsz': 640,                   # image size (pixels)
    'batch': 16,                    # batch size (adjust based on GPU memory)
    'device': 0 if torch.cuda.is_available() else 'cpu',  # first GPU if available, otherwise CPU
    'workers': 8,                   # number of dataloader workers
    'patience': 10,                 # early stopping patience
    'save': True,                   # save train checkpoints
    'project': 'runs/visdrone',     # project name
    'name': 'yolov8n_exp',          # experiment name
    'exist_ok': True,               # overwrite existing experiment
    'pretrained': True,             # use pretrained weights
    'optimizer': 'SGD',             # optimizer (SGD, Adam, AdamW)
    'verbose': True,                # verbose output
    'seed': 42,                     # random seed for reproducibility
    'lr0': 0.01,                    # initial learning rate
    'lrf': 0.01,                    # final learning rate factor
    'momentum': 0.937,              # SGD momentum
    'weight_decay': 0.0005,         # optimizer weight decay
    'warmup_epochs': 3.0,           # warmup epochs
    'box': 7.5,                     # box loss gain
    'cls': 0.5,                     # class loss gain
    'dfl': 1.5,                     # DFL loss gain
}

print("Training Configuration:")
for key, value in training_config.items():
    print(f"  {key}: {value}")

## Step 7: Train the Model

Now we'll start the training process. This will:
1. Load the pretrained YOLOv8n weights
2. Replace the detection head with one matching our 10 VisDrone classes
3. Train on the VisDrone dataset
4. Save checkpoints and best model
5. Generate training metrics and plots

**Note**: Training can take several hours depending on your hardware and dataset size.
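
For a rough sense of scale, the number of optimizer iterations follows from the dataset size, batch size, and epoch count. The sketch below assumes the 6,471-image VisDrone2019-DET training split and that the last partial batch is kept; substitute your own image count if your copy of the dataset differs. The helper `training_iterations` is a hypothetical name.

```python
import math


def training_iterations(num_images, batch, epochs):
    """Return (iterations per epoch, total iterations), keeping the last partial batch."""
    per_epoch = math.ceil(num_images / batch)
    return per_epoch, per_epoch * epochs


# Assumption: 6,471 training images; batch and epochs match the configuration above
per_epoch, total = training_iterations(num_images=6471, batch=16, epochs=50)
print(f"{per_epoch} iterations per epoch, up to {total} in total")
```

Early stopping (`patience`) can end training before the full total is reached.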

In [None]:
# Start training
results = model.train(**training_config)

print("\nTraining completed!")
print(f"Results saved to: {results.save_dir}")

## Step 8: Evaluate the Model

After training, we'll evaluate the model on the validation set to get metrics like:
- **mAP@0.5**: Mean Average Precision at IoU threshold 0.5
- **mAP@0.5:0.95**: Mean Average Precision averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05
- **Precision**: True positives / (True positives + False positives)
- **Recall**: True positives / (True positives + False negatives)
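
The definitions above can be sketched in a few lines. This is an illustration of the formulas only, not the evaluation code Ultralytics runs; the function names and the example counts are made up for the demonstration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Two 10x10 boxes that overlap by half: intersection 50, union 150
print(f"IoU: {iou((0, 0, 10, 10), (5, 0, 15, 10)):.3f}")

# Illustrative counts: 8 correct detections, 2 spurious, 4 missed objects
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"Precision: {p:.3f}, Recall: {r:.3f}")
```

A detection counts as a true positive only if its IoU with a ground-truth box of the same class reaches the threshold, which is why mAP@0.5:0.95 is stricter than mAP@0.5.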

In [None]:
# Validate the model
metrics = model.val()

# Print key metrics
print("\nValidation Metrics:")
print(f"mAP@0.5: {metrics.box.map50:.4f}")
print(f"mAP@0.5:0.95: {metrics.box.map:.4f}")
print(f"Precision: {metrics.box.mp:.4f}")
print(f"Recall: {metrics.box.mr:.4f}")

# Per-class metrics (metrics.box.maps holds mAP@0.5:0.95 for each class)
print("\nPer-Class mAP@0.5:0.95:")
for idx, class_name in enumerate(class_names):
    print(f"  {class_name}: {metrics.box.maps[idx]:.4f}")

## Step 9: Visualize Training Results

Ultralytics automatically generates training plots including:
- Loss curves (box, class, DFL)
- Precision-Recall curves
- Confusion matrix
- F1-score curves

In [None]:
# Display training results
from IPython.display import Image, display

results_dir = Path('runs/visdrone/yolov8n_exp')

# Show training curves
print("Training Results:")
display(Image(filename=str(results_dir / 'results.png')))

# Show confusion matrix
print("\nConfusion Matrix:")
display(Image(filename=str(results_dir / 'confusion_matrix.png')))

# Show PR curve
print("\nPrecision-Recall Curve:")
display(Image(filename=str(results_dir / 'PR_curve.png')))

## Step 10: Run Inference on Test Images

Now let's use our trained model to detect objects in new images!

In [None]:
# Load the best trained model
best_model = YOLO('runs/visdrone/yolov8n_exp/weights/best.pt')

# Run inference on validation images
val_images = list(dataset_path.glob('val/images/*.jpg'))[:6]

# Predict
results = best_model.predict(
    source=val_images,
    conf=0.25,        # confidence threshold
    iou=0.45,         # NMS IoU threshold
    save=True,        # save results
    project='runs/visdrone',
    name='predictions',
    exist_ok=True
)

# Display predictions
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for idx, result in enumerate(results[:6]):
    # Get the image with plotted boxes
    img = result.plot()  # BGR format
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    
    axes[idx].imshow(img)
    axes[idx].axis('off')
    axes[idx].set_title(f'Detection {idx+1}')

plt.tight_layout()
plt.show()

# Print detection statistics
for idx, result in enumerate(results[:3]):
    boxes = result.boxes
    print(f"\nImage {idx+1}: {len(boxes)} objects detected")
    for box in boxes:
        cls = int(box.cls[0])
        conf = float(box.conf[0])
        print(f"  - {class_names[cls]}: {conf:.2f}")
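
To show what the `conf` and `iou` arguments control, here is a minimal greedy non-maximum suppression sketch in plain Python. It is an illustration under simplifying assumptions (a single class, boxes as `(x1, y1, x2, y2, confidence)` tuples), not the implementation Ultralytics uses; `box_iou` and `greedy_nms` are hypothetical names.

```python
def box_iou(a, b):
    """IoU of two boxes whose first four entries are (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, conf_thres=0.25, iou_thres=0.45):
    """Drop low-confidence boxes, then keep the best box of each overlapping group."""
    candidates = sorted(
        (d for d in detections if d[4] >= conf_thres),
        key=lambda d: d[4],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box only if it does not overlap too much with a better one
        if all(box_iou(det, k) <= iou_thres for k in kept):
            kept.append(det)
    return kept


detections = [
    (0, 0, 10, 10, 0.9),    # strongest box
    (1, 1, 11, 11, 0.8),    # heavy overlap with the first box
    (50, 50, 60, 60, 0.7),  # separate object
    (0, 0, 5, 5, 0.1),      # below the confidence threshold
]
for det in greedy_nms(detections):
    print(det)
```

Raising `conf` removes more weak detections, while lowering `iou` suppresses more overlapping boxes. In crowded drone scenes a very low `iou` can merge neighbouring objects that are genuinely distinct.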

## Step 11: Export the Model (Optional)

You can export the trained model to different formats for deployment:
- ONNX: For general deployment
- TorchScript: For production with PyTorch
- TensorRT: For NVIDIA GPUs (fastest)
- CoreML: For iOS devices
- TFLite: For mobile/edge devices

In [None]:
# Export to ONNX format
onnx_path = best_model.export(format='onnx')
print(f"Model exported to: {onnx_path}")

# Other export options:
# best_model.export(format='torchscript')
# best_model.export(format='engine')    # TensorRT; requires TensorRT and an NVIDIA GPU
# best_model.export(format='coreml')    # for iOS
# best_model.export(format='tflite')    # for mobile

## Summary

In this notebook, we covered:

1. **Environment Setup**: Installed Ultralytics YOLO and dependencies
2. **Dataset Preparation**: Downloaded and formatted VisDrone for YOLO
3. **Configuration**: Created data.yaml with class definitions
4. **Exploration**: Visualized sample images with annotations
5. **Model Initialization**: Loaded pretrained YOLOv8n weights
6. **Training**: Trained the model with the configured hyperparameters
7. **Evaluation**: Measured performance on validation set
8. **Visualization**: Reviewed training metrics and plots
9. **Inference**: Ran predictions on new images
10. **Export**: Converted model for deployment

## Next Steps

- Try different model sizes (yolov8s, yolov8m) for better accuracy
- Experiment with hyperparameters (learning rate, batch size, augmentation)
- Train for more epochs if validation loss is still decreasing
- Fine-tune on specific drone scenarios in your use case
- Deploy the model to a real-time application