# Data Preprocessing Pipeline

This notebook preprocesses the two aircraft datasets used for model training: the Military Aircraft Dataset (object detection) and the FGVC aircraft dataset (classification).

**Objectives:**
1. Preprocess images (resize, normalize, augment)
2. Convert annotations to YOLO format for object detection
3. Prepare data loaders for classification tasks
4. Create train/val/test splits
5. Save processed data for model training


In [None]:
# Import required libraries
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2
from pathlib import Path
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# Add src to path
sys.path.append('../src')

# Set plotting style
plt.style.use('default')
%matplotlib inline

print("✓ Libraries imported successfully!")


## 1. Setup Paths and Configuration


In [None]:
# Configuration
CONFIG = {
    'image_size': (640, 640),  # Target size for YOLO
    'normalize': True,
    'augmentation': True,
    'random_seed': 42
}

# Paths
DATA_DIR = Path('../data')
PROCESSED_DIR = DATA_DIR / 'processed'
RAW_DIR = DATA_DIR / 'raw'
DATASET_DIR = DATA_DIR / 'dataset'
FGVC_DIR = DATA_DIR / 'fgvc-aircraft-2013b' / 'fgvc-aircraft-2013b' / 'data'

# Create output directories
PROCESSED_DIR.mkdir(parents=True, exist_ok=True)
(PROCESSED_DIR / 'yolo_format').mkdir(exist_ok=True)
(PROCESSED_DIR / 'classification').mkdir(exist_ok=True)

print("✓ Paths configured")
print(f"  - Data directory: {DATA_DIR}")
print(f"  - Processed output: {PROCESSED_DIR}")
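
The `random_seed` entry in `CONFIG` suggests the train/val/test splits are meant to be reproducible. A minimal sketch of a seeded split follows; the helper name `split_items` and the 70/15/15 ratios are assumptions for illustration, not values taken from this notebook.

```python
import random

def split_items(items, ratios=(0.7, 0.15, 0.15), seed=42):
    """Shuffle items deterministically and cut them into train/val/test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = sorted(items)  # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }

# Hypothetical file names, used only to exercise the helper.
splits = split_items([f"img_{i:03d}.jpg" for i in range(20)])
print({name: len(files) for name, files in splits.items()})
```

Using a private `random.Random(seed)` instance keeps the split independent of any other code that touches the global random state.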


## 2. Image Preprocessing Functions

Define functions for image preprocessing operations.


In [None]:
# TODO: Add image preprocessing functions here
# Examples:
# - resize_image()
# - normalize_image()
# - apply_augmentation()
# - denoise_image()

def resize_image(image, target_size=(640, 640)):
    """Resize image to target size."""
    pass

def normalize_image(image):
    """Normalize pixel values to [0, 1]."""
    pass

print("✓ Preprocessing functions defined (ready to implement)")
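
One way the two stubs above might be filled in is sketched below. It assumes images are NumPy arrays in H x W x C layout with 8-bit pixels and that `target_size` is given as (width, height). The resize here is a NumPy-only nearest-neighbour version so the sketch has no OpenCV dependency; it is not necessarily the interpolation this project will settle on.

```python
import numpy as np

def resize_image(image, target_size=(640, 640)):
    """Nearest-neighbour resize of an H x W (x C) array to target_size = (width, height)."""
    target_w, target_h = target_size
    src_h, src_w = image.shape[:2]
    # Map every output row/column back to the source pixel it samples from.
    rows = np.arange(target_h) * src_h // target_h
    cols = np.arange(target_w) * src_w // target_w
    return image[rows[:, None], cols]

def normalize_image(image):
    """Scale 8-bit pixel values to float32 in [0, 1]."""
    return image.astype(np.float32) / 255.0

# Synthetic stand-in for a loaded image.
demo = np.random.default_rng(42).integers(0, 256, size=(480, 720, 3), dtype=np.uint8)
processed = normalize_image(resize_image(demo))
print(processed.shape, processed.dtype)
```

Since the notebook already imports `cv2`, `cv2.resize(image, target_size, interpolation=cv2.INTER_AREA)` would be a natural replacement for the indexing step. Note that a plain resize to a square changes the aspect ratio of non-square images.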


## 3. Convert Annotations to YOLO Format

Convert bounding box annotations from the Military Aircraft Dataset to YOLO format.


In [None]:
# TODO: Implement YOLO format conversion
# YOLO format: <class_id> <x_center> <y_center> <width> <height>
# All values normalized to [0, 1]

def convert_bbox_to_yolo(xmin, ymin, xmax, ymax, img_width, img_height):
    """
    Convert bounding box from (xmin, ymin, xmax, ymax) to YOLO format.
    
    Args:
        xmin, ymin, xmax, ymax: Bounding box coordinates
        img_width, img_height: Image dimensions
        
    Returns:
        x_center, y_center, width, height (all normalized to [0, 1])
    """
    pass

print("✓ YOLO conversion function defined (ready to implement)")
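
A possible body for `convert_bbox_to_yolo`, following the format described in the cell above, is sketched here. Clipping boxes to the image bounds is an added assumption; whether out-of-range annotations should be clipped or rejected depends on the dataset.

```python
def convert_bbox_to_yolo(xmin, ymin, xmax, ymax, img_width, img_height):
    """Return (x_center, y_center, width, height), each normalized to [0, 1]."""
    if img_width <= 0 or img_height <= 0:
        raise ValueError("image dimensions must be positive")
    # Order the corners and clip them to the image (assumed policy).
    xmin, xmax = max(0, min(xmin, xmax)), min(img_width, max(xmin, xmax))
    ymin, ymax = max(0, min(ymin, ymax)), min(img_height, max(ymin, ymax))
    x_center = (xmin + xmax) / 2 / img_width
    y_center = (ymin + ymax) / 2 / img_height
    width = (xmax - xmin) / img_width
    height = (ymax - ymin) / img_height
    return x_center, y_center, width, height

print(convert_bbox_to_yolo(100, 200, 300, 400, 640, 640))
# -> (0.3125, 0.46875, 0.3125, 0.3125)
```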


## 4. Process Military Aircraft Dataset

In [None]:
# TODO: Process military aircraft dataset
# Steps:
# 1. Load annotations from labels_with_split.csv
# 2. Create class mapping (aircraft type -> class_id)
# 3. Convert each annotation to YOLO format
# 4. Save images and labels in YOLO directory structure
#    - images/train/, images/val/, images/test/
#    - labels/train/, labels/val/, labels/test/

print("✓ Ready to process military aircraft dataset")
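
Steps 1 to 4 above could look roughly like the sketch below. The column names (`filename`, `width`, `height`, `class`, `xmin`, `ymin`, `xmax`, `ymax`, `split`) and the sample rows are hypothetical; the real layout of `labels_with_split.csv` should be checked before adapting this. Copying the images themselves is left out.

```python
import csv
import io
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical stand-in for labels_with_split.csv; real column names may differ.
SAMPLE_CSV = """filename,width,height,class,xmin,ymin,xmax,ymax,split
a.jpg,640,480,F16,100,120,300,240,train
a.jpg,640,480,B52,320,60,600,400,train
b.jpg,800,600,F16,0,0,400,300,val
"""

def build_class_mapping(rows):
    """Map each aircraft type to a stable integer id (alphabetical order)."""
    return {name: idx for idx, name in enumerate(sorted({r["class"] for r in rows}))}

def to_yolo_line(row, class_to_id):
    """Format one row as '<class_id> <x_center> <y_center> <width> <height>'."""
    w, h = float(row["width"]), float(row["height"])
    xmin, ymin, xmax, ymax = (float(row[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
    values = ((xmin + xmax) / 2 / w, (ymin + ymax) / 2 / h,
              (xmax - xmin) / w, (ymax - ymin) / h)
    return f"{class_to_id[row['class']]} " + " ".join(f"{v:.6f}" for v in values)

def write_yolo_labels(rows, out_dir):
    """Write one label file per image under labels/<split>/; return the class mapping."""
    class_to_id = build_class_mapping(rows)
    lines = defaultdict(list)
    for row in rows:
        key = (row["split"], Path(row["filename"]).stem)
        lines[key].append(to_yolo_line(row, class_to_id))
    for (split, stem), label_lines in lines.items():
        label_dir = Path(out_dir) / "labels" / split
        label_dir.mkdir(parents=True, exist_ok=True)
        (label_dir / f"{stem}.txt").write_text("\n".join(label_lines) + "\n")
    return class_to_id

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
with tempfile.TemporaryDirectory() as tmp:
    mapping = write_yolo_labels(rows, tmp)
    label_text = (Path(tmp) / "labels" / "train" / "a.txt").read_text()
print(mapping)
print(label_text)
```

Grouping by image before writing matters because an image with several aircraft needs all of its boxes in a single label file.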


## 5. Process FGVC Aircraft Dataset

In [None]:
# TODO: Prepare FGVC dataset
# Steps:
# 1. Load variant annotations
# 2. Preprocess images (resize, normalize)
# 3. Organize into train/val/test folders
# 4. Create class mapping file
# 5. Save preprocessed data

print("✓ Ready to process FGVC aircraft dataset")
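
Steps 1 and 4 above (loading variant annotations and creating a class mapping) might be sketched as follows. It assumes each annotation line holds an image id followed by a variant name that may itself contain spaces; the sample lines and ids are made up for illustration and the helper names are hypothetical.

```python
import json

# Made-up sample lines: "<image_id> <variant name>".
SAMPLE_LINES = [
    "0000001 707-320",
    "0000002 707-320",
    "0000003 A340-300",
    "0000004 Cessna 172",
]

def parse_variant_lines(lines):
    """Return (image_id, variant) pairs, splitting only on the first space."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        image_id, variant = line.split(" ", 1)
        pairs.append((image_id, variant))
    return pairs

def build_variant_mapping(pairs):
    """Map each variant name to an integer class id (alphabetical order)."""
    return {variant: idx for idx, variant in enumerate(sorted({v for _, v in pairs}))}

def safe_dirname(variant):
    """Make a variant name usable as a folder name (assumed sanitizing rule)."""
    return variant.replace("/", "-").replace(" ", "_")

pairs = parse_variant_lines(SAMPLE_LINES)
variant_to_id = build_variant_mapping(pairs)
print(json.dumps(variant_to_id, indent=2))
```

Splitting on the first space only keeps multi-word variant names intact, and `safe_dirname` guards against names with characters that are awkward in paths when images are organized into per-class folders.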


## 6. Data Augmentation

In [None]:
# TODO: Implement data augmentation
# Techniques to consider:
# - Horizontal flip
# - Random rotation
# - Brightness/contrast adjustment
# - Random crop
# - Gaussian noise
# - Color jittering

print("✓ Ready to implement data augmentation")
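
Three of the techniques listed above (horizontal flip, brightness/contrast adjustment, Gaussian noise) are sketched below with NumPy only. The sketch assumes float images in [0, 1] and boxes stored as `(class_id, x_center, y_center, width, height)` tuples in YOLO format; the parameter defaults are illustrative, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(42)  # same seed value as CONFIG['random_seed']

def horizontal_flip(image, boxes):
    """Flip an H x W x C image left-right and mirror its YOLO boxes."""
    flipped = image[:, ::-1].copy()
    # Only the x centre changes; widths and heights are unaffected.
    mirrored = [(cls, 1.0 - xc, yc, w, h) for cls, xc, yc, w, h in boxes]
    return flipped, mirrored

def adjust_brightness_contrast(image, alpha=1.0, beta=0.0):
    """Apply alpha * image + beta to a float image in [0, 1] and clip the result."""
    return np.clip(alpha * image + beta, 0.0, 1.0)

def add_gaussian_noise(image, sigma=0.02, rng=rng):
    """Add zero-mean Gaussian noise with standard deviation sigma, then clip."""
    noise = rng.normal(0.0, sigma, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0).astype(np.float32)

# Synthetic image and box, used only to exercise the helpers.
image = rng.random((4, 6, 3), dtype=np.float32)
boxes = [(0, 0.25, 0.5, 0.2, 0.4)]
flipped, mirrored = horizontal_flip(image, boxes)
brighter = adjust_brightness_contrast(image, alpha=1.2, beta=0.05)
noisy = add_gaussian_noise(image)
print(mirrored)
```

For the detection dataset, any geometric augmentation has to transform the boxes together with the pixels, which is why `horizontal_flip` returns both.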


## 7. Visualization


In [None]:
# TODO: Add visualization functions
# - Show original vs preprocessed images
# - Visualize augmentation results
# - Display YOLO format annotations

print("✓ Ready to add visualization")
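
Displaying YOLO annotations requires mapping the normalized values back to pixel coordinates. A minimal sketch of that inverse conversion is shown here; the helper names are hypothetical.

```python
def parse_yolo_line(line):
    """Split '<class_id> <x_center> <y_center> <width> <height>' into typed values."""
    class_id, x_center, y_center, width, height = line.split()
    return int(class_id), float(x_center), float(y_center), float(width), float(height)

def yolo_to_pixel_bbox(x_center, y_center, width, height, img_width, img_height):
    """Convert a normalized YOLO box to integer (xmin, ymin, xmax, ymax) pixels."""
    xmin = (x_center - width / 2) * img_width
    ymin = (y_center - height / 2) * img_height
    xmax = (x_center + width / 2) * img_width
    ymax = (y_center + height / 2) * img_height
    return tuple(int(round(v)) for v in (xmin, ymin, xmax, ymax))

class_id, *box = parse_yolo_line("1 0.3125 0.46875 0.3125 0.3125")
print(class_id, yolo_to_pixel_bbox(*box, img_width=640, img_height=640))
# -> 1 (100, 200, 300, 400)
```

The resulting corners can be passed to a drawing call such as `cv2.rectangle(image, (xmin, ymin), (xmax, ymax), color, thickness)` before showing the image with matplotlib.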


## 8. Summary and Next Steps

### Preprocessing Checklist:
- ☐ Implement image resizing and normalization
- ☐ Convert Military Aircraft annotations to YOLO format
- ☐ Organize FGVC dataset for classification
- ☐ Apply data augmentation
- ☐ Verify preprocessed data quality
- ☐ Save preprocessed datasets
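
For the "verify preprocessed data quality" item, one simple check is to validate every YOLO label file against the five-field, [0, 1]-normalized format. The sketch below is one possible validator; the function name and the error messages are assumptions.

```python
def validate_yolo_label(text):
    """Return a list of problems found in the contents of one YOLO label file."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 5:
            problems.append(f"line {line_no}: expected 5 fields, found {len(parts)}")
            continue
        class_id, *coords = parts
        if not class_id.isdigit():
            problems.append(f"line {line_no}: class id {class_id!r} is not a non-negative integer")
        try:
            values = [float(c) for c in coords]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric coordinate")
            continue
        if any(not 0.0 <= v <= 1.0 for v in values):
            problems.append(f"line {line_no}: coordinate outside [0, 1]")
    return problems

print(validate_yolo_label("0 0.5 0.5 0.2 0.2\n"))
print(validate_yolo_label("0 0.5 0.5 1.2 0.2\n1 0.5 0.5\n"))
```

Running this over every file under `labels/` and reporting only the non-empty results gives a quick sanity pass before training.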

### After Preprocessing:
1. Move to the YOLO training notebook (`03_yolo_training.ipynb`)
2. Train object detection model
3. Train classification model
4. Integrate both for threat detection
