# Car Detection - Data Preparation with Roboflow

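This notebook walks through preparing a car detection dataset with Roboflow: collecting images, applying preprocessing and augmentation, exporting in YOLOv8 format, and inspecting the result locally.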

## 1. Install Required Libraries

In [None]:
!pip install roboflow ultralytics

## 2. Roboflow Account Setup

1. Go to [Roboflow](https://roboflow.com/) and create a free account
2. Create a new project with the type "Object Detection"
3. Get your API key from your Roboflow account settings

In [None]:
from roboflow import Roboflow

# Initialize Roboflow client with your API key
rf = Roboflow(api_key="YOUR_API_KEY")
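
Hard-coding the key makes it easy to leak when the notebook is shared. A minimal sketch of reading it from an environment variable instead; the variable name `ROBOFLOW_API_KEY` and the helper `load_api_key` are assumptions for illustration, not something Roboflow requires.

```python
import os


def load_api_key(env_var: str = "ROBOFLOW_API_KEY") -> str:
    """Return the API key stored in an environment variable (hypothetical name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running this cell.")
    return key


# Usage (commented out so the cell does not fail when the variable is unset):
# rf = Roboflow(api_key=load_api_key())
```

The helper fails loudly when the variable is missing, which is easier to diagnose than an authentication error later in the notebook.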

## 3. Data Collection Options

### Option 1: Upload your own car images
Upload your annotated car images to Roboflow. You can use tools like LabelImg or CVAT for annotation.
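
Before uploading, it helps to confirm that every image has a matching annotation file. The sketch below pairs files by base name, assuming annotations sit in a separate folder and share the image's file stem; the folder layout and the helper name `pair_images_with_labels` are assumptions. The upload call in the trailing comment reflects the `project.upload` method of the Roboflow Python package as I understand it, so check the package documentation for the current signature.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def pair_images_with_labels(image_dir, label_dir, label_ext=".txt"):
    """Return (pairs, unlabeled): matched (image, label) paths and images lacking a label."""
    pairs, unlabeled = [], []
    for image_path in sorted(Path(image_dir).iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # ignore non-image files
        label_path = Path(label_dir) / f"{image_path.stem}{label_ext}"
        if label_path.is_file():
            pairs.append((image_path, label_path))
        else:
            unlabeled.append(image_path)
    return pairs, unlabeled


# Usage sketch (paths are placeholders; `project` is your own Roboflow project object):
# pairs, unlabeled = pair_images_with_labels("raw/images", "raw/labels")
# for image_path, label_path in pairs:
#     project.upload(image_path=str(image_path), annotation_path=str(label_path))
```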

### Option 2: Use existing car datasets
Load an existing car detection dataset from Roboflow Universe.

In [None]:
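# Option 2: Use an existing public car dataset from Roboflow Universe.
# rf.workspace() with no argument opens the default workspace for your API key,
# so a dataset owned by someone else needs its owner's workspace ID (the first
# path segment of the dataset's Universe URL; "roboflow-100" is assumed here).
project = rf.workspace("roboflow-100").project("vehicles-q0x2h")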
dataset = project.version(1).download("yolov8")

## 4. Data Preprocessing and Augmentation in Roboflow

In the Roboflow web interface:

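1. **Preprocessing**:
   - Auto-orient images (applies the stored EXIF rotation so images display as annotated)
   - Resize images (e.g., 640x640)
   - Apply image enhancement (e.g., Roboflow's Auto-Adjust Contrast step for low-contrast images)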

2. **Augmentation**:
   - Rotation (±15°)
   - Brightness adjustment (±25%)
   - Saturation adjustments
   - Blur/Noise addition
   - Cutout (simulates occlusion)
   - Mosaic (combines multiple images)
   
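These augmentations help the model generalize by simulating varied lighting, camera angles, image quality, and partial occlusion. Roboflow applies them to the training split only, so validation and test images stay unmodified.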
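
Roboflow applies these augmentations on its side, so no code is needed. To make two of them concrete, here is a rough NumPy sketch of a brightness shift and a cutout patch. It illustrates the idea under simple assumptions (uniform scaling, a single black square) and is not Roboflow's implementation.

```python
import numpy as np


def adjust_brightness(img: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities; factor=1.25 brightens by 25%, 0.75 darkens by 25%."""
    scaled = img.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def cutout(img: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Black out one randomly placed size x size square to mimic occlusion."""
    out = img.copy()
    height, width = out.shape[:2]
    # Pick a top-left corner so the square stays inside the image
    top = int(rng.integers(0, max(height - size, 0) + 1))
    left = int(rng.integers(0, max(width - size, 0) + 1))
    out[top:top + size, left:left + size] = 0
    return out


rng = np.random.default_rng(seed=0)
image = np.full((64, 64, 3), 200, dtype=np.uint8)  # synthetic gray image
brighter = adjust_brightness(image, 1.25)          # 200 * 1.25 = 250
occluded = cutout(image, size=16, rng=rng)
```

Note that pixel-level augmentations such as these leave bounding boxes unchanged, whereas geometric ones (rotation, mosaic) also require the annotations to be transformed.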

## 5. Generate Dataset Version in Roboflow

After uploading and preprocessing your data:

1. Split the dataset (e.g., 70% train, 20% validation, 10% test)
2. Generate a new dataset version
3. Export in YOLOv8 format
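
Roboflow performs the split in the web interface. For intuition, or for splitting files before upload, a reproducible 70/20/10 split can be sketched as follows; the helper name `split_dataset` and the fixed seed are assumptions.

```python
import random


def split_dataset(items, train=0.7, valid=0.2, seed=42):
    """Shuffle items reproducibly and cut them into train/valid/test lists."""
    if train < 0 or valid < 0 or train + valid > 1:
        raise ValueError("train and valid must be non-negative and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global random state untouched
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],  # remainder goes to the test split
    }


splits = split_dataset([f"img_{i:03d}.jpg" for i in range(100)])
print({name: len(files) for name, files in splits.items()})
```

Fixing the seed keeps the same images in the test split across runs, which matters when comparing models trained at different times.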

## 6. Download the Processed Dataset

In [None]:
# If using your own project
# Replace with your workspace, project name, and version number
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("YOUR_WORKSPACE").project("car-detection")
dataset = project.version(1).download("yolov8")

print(f"Dataset downloaded to: {dataset.location}")

## 7. Explore the Dataset Structure

In [None]:
import os

# Root folder of the downloaded dataset (your path may differ)
data_path = dataset.location

print("Dataset structure:")
for root, dirs, files in os.walk(data_path):
    level = root.replace(data_path, '').count(os.sep)
    indent = ' ' * 4 * level
    print(f"{indent}{os.path.basename(root)}/")
    for file in files[:5]:  # Show only first 5 files per directory
        print(f"{indent}    {file}")
    if len(files) > 5:
        print(f"{indent}    ...")
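
Beyond listing files, a quick count of images and label files per split catches missing annotations early. This sketch assumes each of the `train`, `valid`, and `test` folders contains `images` and `labels` subfolders, which is worth confirming against the listing above; `count_split_files` is a hypothetical helper.

```python
import os


def count_split_files(dataset_root, splits=("train", "valid", "test")):
    """Return {split: (n_images, n_labels)} for each split folder that exists."""
    counts = {}
    for split in splits:
        image_dir = os.path.join(dataset_root, split, "images")
        label_dir = os.path.join(dataset_root, split, "labels")
        if not os.path.isdir(image_dir):
            continue  # split not present in this export
        n_images = len(os.listdir(image_dir))
        n_labels = len(os.listdir(label_dir)) if os.path.isdir(label_dir) else 0
        counts[split] = (n_images, n_labels)
    return counts


# Usage in this notebook:
# for split, (n_images, n_labels) in count_split_files(data_path).items():
#     print(f"{split}: {n_images} images, {n_labels} label files")
```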

## 8. Visualize Sample Images with Annotations

In [None]:
import glob
import os

import cv2
import matplotlib.pyplot as plt
import numpy as np

def plot_sample_with_bbox(img_path, label_path):
    # Read image; cv2.imread returns None (no exception) when the file cannot be read
    img = cv2.imread(img_path)
    if img is None:
        print(f"Could not read image: {img_path}")
        return
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR, matplotlib expects RGB
    height, width, _ = img.shape
    
    # Read labels (YOLO format: class x_center y_center width height, normalized to 0-1)
    boxes = []
    if os.path.exists(label_path):
        with open(label_path, 'r') as f:
            for line in f:
                data = line.strip().split()
                if len(data) != 5:
                    continue  # Skip blank lines and rows that are not plain boxes
                class_id = int(data[0])
                # Scale normalized coordinates to pixels
                x_center = float(data[1]) * width
                y_center = float(data[2]) * height
                box_width = float(data[3]) * width
                box_height = float(data[4]) * height
                
                # Convert center/size to corner coordinates
                x1 = int(x_center - box_width / 2)
                y1 = int(y_center - box_height / 2)
                x2 = int(x_center + box_width / 2)
                y2 = int(y_center + box_height / 2)
                
                boxes.append((class_id, x1, y1, x2, y2))
    
    # Plot image with bounding boxes
    plt.figure(figsize=(10, 10))
    plt.imshow(img)
    
    colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0), (0, 255, 255)]
    
    for box in boxes:
        class_id, x1, y1, x2, y2 = box
        color = colors[class_id % len(colors)]
        
        rect = plt.Rectangle((x1, y1), x2-x1, y2-y1, fill=False, 
                           edgecolor=np.array(color)/255, linewidth=2)
        plt.gca().add_patch(rect)
        plt.text(x1, y1-5, f"Class {class_id}", 
                 color=np.array(color)/255, fontsize=12, 
                 bbox=dict(facecolor='white', alpha=0.7))
    
    plt.axis('off')
    plt.title(os.path.basename(img_path))
    plt.show()

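# Display a few sample images from the training set
train_images = sorted(glob.glob(os.path.join(data_path, "train", "images", "*.jpg")))[:3]

for img_path in train_images:
    # Labels mirror the image layout: train/images/<name>.jpg -> train/labels/<name>.txt.
    # Building the path from the file stem avoids rewriting "images" elsewhere in the path.
    stem = os.path.splitext(os.path.basename(img_path))[0]
    label_path = os.path.join(data_path, "train", "labels", f"{stem}.txt")
    plot_sample_with_bbox(img_path, label_path)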
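
The coordinate conversion inside `plot_sample_with_bbox` is the step most likely to go wrong, so it is worth isolating. Below is a small standalone version with a worked example; `yolo_to_pixel_box` is a hypothetical helper that mirrors the arithmetic in the cell above.

```python
def yolo_to_pixel_box(x_center, y_center, box_width, box_height, img_width, img_height):
    """Convert a normalized YOLO box to pixel corner coordinates (x1, y1, x2, y2)."""
    cx, cy = x_center * img_width, y_center * img_height
    w, h = box_width * img_width, box_height * img_height
    return (int(cx - w / 2), int(cy - h / 2), int(cx + w / 2), int(cy + h / 2))


# A box centered in a 640x480 image, covering half the width and half the height
print(yolo_to_pixel_box(0.5, 0.5, 0.5, 0.5, 640, 480))  # (160, 120, 480, 360)
```

The center lands at (320, 240) and the box measures 320x240 pixels, so the corners sit 160 pixels left and right and 120 pixels above and below the center.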

## 9. Apply Additional Image Processing Techniques

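Roboflow covers the common preprocessing steps, but you can also experiment with OpenCV operations locally. The cell below previews four of them on one training image without modifying the dataset: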

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load a sample image
sample_img_path = train_images[0]
img = cv2.imread(sample_img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# 1. Apply histogram equalization to the luma (Y) channel only, which enhances contrast without shifting colors
img_yuv = cv2.cvtColor(img, cv2.COLOR_RGB2YUV)
img_yuv[:,:,0] = cv2.equalizeHist(img_yuv[:,:,0])
img_hist_eq = cv2.cvtColor(img_yuv, cv2.COLOR_YUV2RGB)

# 2. Apply Gaussian blur (reduce noise)
img_blur = cv2.GaussianBlur(img, (5, 5), 0)

# 3. Apply edge detection (Canny)
img_gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
img_edges = cv2.Canny(img_gray, 100, 200)
img_edges_color = cv2.cvtColor(img_edges, cv2.COLOR_GRAY2RGB)

# 4. Apply adaptive thresholding
img_thresh = cv2.adaptiveThreshold(img_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, 
                                  cv2.THRESH_BINARY, 11, 2)
img_thresh_color = cv2.cvtColor(img_thresh, cv2.COLOR_GRAY2RGB)

# Display the results
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].imshow(img)
axes[0, 0].set_title('Original')

axes[0, 1].imshow(img_hist_eq)
axes[0, 1].set_title('Histogram Equalization')

axes[0, 2].imshow(img_blur)
axes[0, 2].set_title('Gaussian Blur')

axes[1, 0].imshow(img_edges_color)
axes[1, 0].set_title('Edge Detection')

axes[1, 1].imshow(img_thresh_color)
axes[1, 1].set_title('Adaptive Thresholding')

# Turn off axes on every panel, including the unused sixth one
for ax in axes.flat:
    ax.axis('off')

plt.tight_layout()
plt.show()
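
To show what `cv2.equalizeHist` does conceptually: histogram equalization remaps each intensity through the normalized cumulative histogram. The NumPy sketch below follows the textbook formulation for a single 8-bit channel. It is an illustration and may differ from OpenCV's output in rounding details.

```python
import numpy as np


def equalize_histogram(channel: np.ndarray) -> np.ndarray:
    """Equalize one uint8 channel using its cumulative distribution function."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]          # CDF value at the darkest intensity present
    total = channel.size
    if total == cdf_min:               # constant image: nothing to stretch
        return channel.copy()
    # Map the CDF onto the full 0..255 range and use it as a lookup table
    lookup = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lookup = np.clip(lookup, 0, 255).astype(np.uint8)
    return lookup[channel]


# Low-contrast ramp occupying only intensities 100..131
low_contrast = np.tile(np.arange(100, 132, dtype=np.uint8), (32, 1))
equalized = equalize_histogram(low_contrast)
print(low_contrast.min(), low_contrast.max(), "->", equalized.min(), equalized.max())
```

In the synthetic example the narrow input range is stretched so that the darkest pixel maps to 0 and the brightest to 255.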

## 10. Create YAML Configuration for YOLOv8

In [None]:
# The Roboflow download should include a data.yaml file that lists the split paths and class names.
# Load and print it to check those values before training.

import yaml

yaml_path = os.path.join(data_path, 'data.yaml')

# Read the existing YAML file
with open(yaml_path, 'r') as f:
    data_yaml = yaml.safe_load(f)

print("Current data.yaml contents:")
print(yaml.dump(data_yaml))
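
If the printed `train`, `val`, or `test` entries are relative paths, training tools may resolve them against a different working directory and fail to find the images. One hedged way to handle this is to rewrite them as absolute paths. The sketch below works on the loaded dictionary; the folder names in `SPLIT_DIRS` are assumptions about the export layout, so compare them with the directory listing from step 7 before use.

```python
import os

# Assumed export layout: YAML key -> image folder relative to the dataset root
SPLIT_DIRS = {"train": "train/images", "val": "valid/images", "test": "test/images"}


def absolutize_split_paths(config: dict, dataset_root: str) -> dict:
    """Return a copy of config whose split entries point to existing absolute folders."""
    updated = dict(config)
    for key, rel_dir in SPLIT_DIRS.items():
        candidate = os.path.abspath(os.path.join(dataset_root, *rel_dir.split("/")))
        if os.path.isdir(candidate):
            updated[key] = candidate  # only overwrite when the folder really exists
    return updated


# Usage in this notebook (writes the file back, so keep a copy if unsure):
# data_yaml = absolutize_split_paths(data_yaml, data_path)
# with open(yaml_path, "w") as f:
#     yaml.safe_dump(data_yaml, f, sort_keys=False)
```

Entries whose folder is missing are left untouched, so a dataset without a test split keeps whatever value the export provided.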

## 11. Copy the Dataset to the Project Data Directory

In [None]:
import shutil

project_data_dir = "../data"

# Copy the entire dataset to the project data directory
if os.path.exists(data_path):
    # Create a subdirectory with the dataset name
    dataset_name = os.path.basename(data_path)
    target_dir = os.path.join(project_data_dir, dataset_name)
    
    if os.path.exists(target_dir):
        print(f"Warning: {target_dir} already exists. Skipping copy.")
    else:
        shutil.copytree(data_path, target_dir)
        print(f"Dataset copied to {target_dir}")

## Next Steps

With the dataset prepared and exported in YOLOv8 format, you can proceed to model training. See the `model_training.ipynb` notebook for instructions on training your car detection model.