# Data Preparation

## Part A: Dataset Curation and Exploration
Dataset Sampling: From the full Mapillary dataset, create a focused subset:
- Identify the 20-30 most frequent traffic sign classes
- Extract 10,000-15,000 images containing these signs
- Ensure balanced class distribution (or document imbalance strategy)
- Split data: 70% training, 15% validation, 15% test
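
The sampling steps above can be sketched roughly as follows. This is a minimal sketch, not a reference solution: it assumes the labels for each image have already been collected into a dictionary mapping image key to a list of class labels (`labels_by_image`, a hypothetical name), and the helpers `top_k_classes` and `split_keys` are likewise illustrative.

```python
import random
from collections import Counter


def top_k_classes(labels_by_image, k=25):
    """Return the k most frequent class labels across all images."""
    counts = Counter(label for labels in labels_by_image.values() for label in labels)
    return [label for label, _ in counts.most_common(k)]


def split_keys(keys, train=0.70, val=0.15, seed=0):
    """Shuffle image keys reproducibly and split them into train/val/test lists."""
    keys = sorted(keys)  # fixed starting order, so the seed fully determines the split
    random.Random(seed).shuffle(keys)
    n_train = int(len(keys) * train)
    n_val = int(len(keys) * val)
    return keys[:n_train], keys[n_train:n_train + n_val], keys[n_train + n_val:]


# Toy example with made-up image keys and labels (hypothetical data).
labels_by_image = {
    "img_a": ["stop", "yield"],
    "img_b": ["stop"],
    "img_c": ["speed-limit-50", "stop"],
    "img_d": ["yield"],
}
frequent = set(top_k_classes(labels_by_image, k=2))
# Keep only images that contain at least one of the selected classes.
selected = [key for key, labels in labels_by_image.items() if frequent & set(labels)]
train_keys, val_keys, test_keys = split_keys(selected)
```

Splitting by image key (rather than by individual sign) keeps all signs from one photo in the same subset, which avoids leakage between training and test data.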

Exploratory Analysis:
- Analyze class distributions and sign size variations
- Visualize sample images showing different conditions (weather, lighting, occlusion)
- Document challenges: scale variation, multiple signs per image, background complexity
- Examine bounding box annotations and prepare ground truth data
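
One way to quantify the scale variation mentioned above is to summarize bounding box sizes. The sketch below assumes each box is available as a dictionary with `xmin`, `ymin`, `xmax`, `ymax` pixel coordinates, which is an assumption about the annotation layout; the function names and the size thresholds are illustrative.

```python
import statistics


def box_size(bbox):
    """Return (width, height) in pixels for a box given by its corner coordinates."""
    return bbox["xmax"] - bbox["xmin"], bbox["ymax"] - bbox["ymin"]


def size_bucket(bbox, small=32, large=96):
    """Bucket a box by its longer side; the thresholds are arbitrary starting points."""
    longer = max(box_size(bbox))
    if longer < small:
        return "small"
    if longer < large:
        return "medium"
    return "large"


# Made-up boxes standing in for real annotations.
boxes = [
    {"xmin": 10, "ymin": 20, "xmax": 30, "ymax": 45},     # 20 x 25
    {"xmin": 100, "ymin": 50, "xmax": 164, "ymax": 110},  # 64 x 60
    {"xmin": 0, "ymin": 0, "xmax": 200, "ymax": 180},     # 200 x 180
]
widths = [box_size(b)[0] for b in boxes]
print("median width:", statistics.median(widths))
print("buckets:", [size_bucket(b) for b in boxes])
```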

Deliverable: Dataset preparation script and exploratory analysis notebook with visualizations.

## Part B: Preprocessing Pipeline
Color Space Analysis: Convert images to RGB, HSV, and Lab color spaces. Analyze which color space best isolates traffic signs from complex urban backgrounds (sky, buildings, vegetation).

Image Enhancement:
- Implement adaptive histogram equalization (CLAHE) for lighting normalization
- Apply bilateral filtering for noise reduction while preserving edges
- Test preprocessing on challenging images (nighttime, shadows, rain)

Sign Region Extraction: Use bounding box annotations to extract sign regions. Implement a padding strategy to include context around signs.

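
A padding strategy for sign region extraction can be as simple as growing each box by a fraction of its size and clamping the result to the image bounds. The sketch below assumes boxes are given as `xmin`, `ymin`, `xmax`, `ymax` pixel coordinates; `pad_box` and the 15% default are illustrative choices, not part of the assignment.

```python
def pad_box(xmin, ymin, xmax, ymax, img_w, img_h, pad_frac=0.15):
    """Expand a box by pad_frac of its width/height per side, clamped to the image."""
    pad_x = (xmax - xmin) * pad_frac
    pad_y = (ymax - ymin) * pad_frac
    x0 = max(0, int(round(xmin - pad_x)))
    y0 = max(0, int(round(ymin - pad_y)))
    x1 = min(img_w, int(round(xmax + pad_x)))
    y1 = min(img_h, int(round(ymax + pad_y)))
    return x0, y0, x1, y1


# A 100x100 box well inside a 640x480 image grows by 15 pixels per side.
print(pad_box(200, 150, 300, 250, img_w=640, img_h=480))
# A box touching the top-left corner is clamped at 0.
print(pad_box(0, 0, 40, 40, img_w=640, img_h=480))
```

With a NumPy image array, the padded crop is then `img[y0:y1, x0:x1]`.
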
Standardization: Resize extracted signs to uniform dimensions (e.g., 64x64 or 128x128 pixels), deciding explicitly how to handle aspect ratio (for example, padding instead of stretching).
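
One way to handle aspect ratio during standardization is to scale the longer side to the target size and pad the shorter side, instead of stretching. The sketch below only computes the geometry; the function name `letterbox_geometry` is hypothetical, and padding is just one option among several.

```python
def letterbox_geometry(width, height, target=64):
    """Return the resized (w, h) and the (left, top, right, bottom) padding for a square target."""
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = target - new_w
    pad_y = target - new_h
    left, top = pad_x // 2, pad_y // 2
    return (new_w, new_h), (left, top, pad_x - left, pad_y - top)


# A 120x60 crop becomes 64x32 with 16 rows of padding above and below.
print(letterbox_geometry(120, 60, target=64))
```

The resulting values can be passed to `cv2.resize(crop, (new_w, new_h))` followed by `cv2.copyMakeBorder(resized, top, bottom, left, right, cv2.BORDER_CONSTANT, value=0)`.
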

Deliverable: Preprocessing pipeline that outputs enhanced, standardized sign images.

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import os
import json
import cv2
import matplotlib.pyplot as plt

In [None]:
DATAPATH = "/data/project/MSA8395/mapillary_traffic_sign_dataset/"

In [None]:
! ls {DATAPATH}/images | head 

In [None]:
img = cv2.imread(f"{DATAPATH}/images/_-0aygxELCt_AvFtXT-iOA.jpg")

In [None]:
plt.imshow(img)  # cv2.imread returns BGR, so colors look swapped until converted to RGB

In [None]:
img.shape

In [None]:
plt.imshow(img[:,:,0], cmap='gray');  # channel 0 is blue (OpenCV uses BGR order)

In [None]:
plt.imshow(img[:,:,1], cmap='gray');  # channel 1 is green

In [None]:
plt.imshow(img[:,:,2], cmap='gray');  # channel 2 is red

In [None]:
import cv2

# Reference snippets. "myimage.jpg" and "image.jpg" are placeholder paths;
# cv2.imread returns None (it does not raise) if the file cannot be read.

# Read the image into its own variable so the `img` loaded earlier is not overwritten
sample_img = cv2.imread("myimage.jpg")

# Define the new dimensions (width, height)
new_size = (300, 300)

# Resize the image
resized_img = cv2.resize(sample_img, new_size)

# Resize to double the width and half the height
resized_img_scaled = cv2.resize(sample_img, (0, 0), fx=2, fy=0.5)

# Load the image (cv2.imread always returns channels in BGR order)
bgr_image = cv2.imread('image.jpg')

# Convert the BGR image to RGB
rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

# Now 'rgb_image' holds the image data in RGB format
# You can then display it using libraries like Matplotlib or save it.

In [None]:
rgb_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
rgb_image = cv2.resize(rgb_image, (400, 300))
plt.imshow(rgb_image);


In [None]:
print(cv2.COLOR_BGR2RGB)

In [None]:
from helper import load_and_scale

In [None]:
img = load_and_scale(f"{DATAPATH}/images/_0kfEqHYb79-bAe5dqVntA.jpg")
plt.imshow(img);

In [None]:
# Use a context manager so the annotation file is closed after reading
with open(f"{DATAPATH}/mtsd_v2_fully_annotated/annotations/00CPBbi50rnROtcdEFVpwA.json") as f:
    data = json.load(f)
data
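
The loaded annotation is a plain dictionary. Assuming it follows the layout commonly described for MTSD annotations (an `objects` list whose entries carry a `label` and a `bbox` with `xmin`, `ymin`, `xmax`, `ymax`), which should be checked against the actual output of the cell above, the labels and boxes could be pulled out as sketched below. The sample dictionary is made up for illustration, and `extract_boxes` is a hypothetical helper.

```python
def extract_boxes(annotation):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) tuples from one annotation dict."""
    boxes = []
    for obj in annotation.get("objects", []):
        bbox = obj["bbox"]
        boxes.append((obj["label"], (bbox["xmin"], bbox["ymin"], bbox["xmax"], bbox["ymax"])))
    return boxes


# Made-up annotation mimicking the assumed structure (not real dataset content).
sample = {
    "width": 640,
    "height": 480,
    "objects": [
        {"label": "regulatory--stop--g1",
         "bbox": {"xmin": 100.0, "ymin": 50.0, "xmax": 140.0, "ymax": 90.0}},
    ],
}
print(extract_boxes(sample))
```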