# Data Augmentation Strategy & Visualization

In this notebook, we define and test the visual transformation pipeline for our product dataset. Since our images are captured against a **pure white background**, our augmentation strategy focuses on:

1. **Reducing Background Dependency:** Ensuring the model learns product features rather than just the object-to-white contrast.
    
2. **Environmental Simulation:** Using color and brightness jitters to simulate different lighting conditions.
    
3. **Spatial Invariance:** Using rotations and translations so the model can recognize products regardless of their position in the frame.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import requests
from io import BytesIO
from torchvision import transforms

df = pd.read_csv('../data/processed/products_cleaned.csv')

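def load_image_from_url(url):
    """Downloads an image from a URL and returns a PIL Image object (or None on failure)."""
    try:
        response = requests.get(url, timeout=5)
        # Fail early on 4xx/5xx responses so an HTML error page is never handed to PIL
        response.raise_for_status()
        return Image.open(BytesIO(response.content)).convert("RGB")
    except (requests.RequestException, OSError) as e:
        # OSError also covers PIL's UnidentifiedImageError for corrupt or non-image content
        print(f"Error loading image from {url}: {e}")
        return None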

print(f"Dataset loaded: {len(df)} products available for testing.")

## Defining the Augmentation Pipeline

We use `torchvision.transforms` to build the pipeline.

**Crucial Setting:** Since our background is white, we use `fill=255` (or `fill=1.0` when the transforms operate on float tensors in the [0, 1] range) during rotations and translations. This ensures that the empty corners and edges exposed by rotating or shifting the image are filled with white, matching the dataset's studio background and preventing the model from learning black borders as a feature.
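
To see what `fill` changes, here is a minimal Pillow-only sketch on a synthetic image (a hypothetical stand-in for a product shot, not a real dataset image). For PIL inputs, torchvision's rotation hands `fill` to Pillow's `Image.rotate(..., fillcolor=...)`, so calling Pillow directly shows the same effect: the corners exposed by a 20-degree rotation come out black by default and white with an explicit white fill.

```python
from PIL import Image

WHITE = (255, 255, 255)

# Hypothetical stand-in for a product shot: a blue square on a pure white canvas.
canvas = Image.new("RGB", (224, 224), WHITE)
canvas.paste((40, 40, 160), (72, 72, 152, 152))  # fill the box (left, top, right, bottom)

# Rotating by 20 degrees exposes the frame's corners; they must be filled with something.
default_fill = canvas.rotate(20)                  # Pillow's default fill is black
white_fill = canvas.rotate(20, fillcolor=WHITE)   # same effect as fill=255 on an RGB image

corner = (0, 0)
print("corner with default fill:", default_fill.getpixel(corner))
print("corner with white fill:  ", white_fill.getpixel(corner))
```

With the default fill, the model would see black triangles in the corners of many training images, a pattern that never appears in the real, white-background data.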

In [None]:
train_transforms = transforms.Compose([
    # Stretching the image to exactly 224x224
    transforms.Resize((224, 224)),
    
    # Random Rotation: Up to 20 degrees, filled with white (to match background)
    transforms.RandomRotation(degrees=20, fill=255),
    
    # Random Translation: Shifting the product slightly
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), fill=255),
    
    # Color/Light Jitter: Simulating different room lightings
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05),
    
    # Horizontal Flip
    transforms.RandomHorizontalFlip(p=0.5),
    
    # Convert to Tensor and Normalize
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

print("Augmentation pipeline with 224x224 Stretch Resize is ready.")
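
The numbers in the pipeline translate into concrete sampling ranges. Per the torchvision documentation, `ColorJitter` draws its brightness, contrast, and saturation factors uniformly from `[max(0, 1 - s), 1 + s]` and its hue shift from `[-h, h]`, and `RandomAffine` shifts an image horizontally by up to `translate[0] * width` pixels (vertically by `translate[1] * height`). The helper functions below are hypothetical and do not call torchvision; they only make that arithmetic explicit for our settings.

```python
def factor_range(strength):
    """Range ColorJitter samples brightness/contrast/saturation factors from."""
    return (max(0.0, 1.0 - strength), 1.0 + strength)


def hue_range(hue):
    """Range of the hue shift, as a fraction of the hue circle (requires 0 <= hue <= 0.5)."""
    if not 0.0 <= hue <= 0.5:
        raise ValueError("hue must be in [0, 0.5]")
    return (-hue, hue)


def max_shift_px(fraction, size):
    """Largest translation RandomAffine can apply along one axis, in whole pixels."""
    return round(fraction * size)


# Values taken from train_transforms above (images are 224x224 after Resize).
settings = {"brightness": 0.3, "contrast": 0.3, "saturation": 0.2}
for name, strength in settings.items():
    low, high = factor_range(strength)
    print(f"{name:>10}: factor in [{low:.2f}, {high:.2f}]")
print(f"{'hue':>10}: shift in {hue_range(0.05)}")
print(f"{'translate':>10}: up to ±{max_shift_px(0.1, 224)} px on each axis")
```

In other words, brightness can fall to 70% or rise to 130% of the original, and the product can move by up to 22 pixels (about 10% of the frame) in each direction.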

## Why Stretch Resize?

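By using `transforms.Resize((224, 224))`, we ensure every input image has the same 224x224 size. This lets images be stacked into batches and matches the input resolution commonly used with ImageNet-pretrained models such as **ResNet** and **EfficientNet-B0**.

- **Pros:** Every pixel of the 224x224 grid carries image content; none is spent on black or white padding bars.
    
- **Cons:** Stretching changes the aspect ratio, so very wide products (like a belt) or very tall ones (like a floor lamp) come out visibly squashed, and the more extreme the original ratio, the stronger the distortion. CNNs often tolerate moderate distortion in practice, especially when the same resize is also applied at inference time, but the effect on extreme-ratio products is worth checking on validation data.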
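
The size of the distortion is easy to quantify: a plain `(224, 224)` resize scales the two axes by different factors, and the ratio between them is the amount of stretching. The sketch below assumes a hypothetical 800x200 "belt-like" image, computes that factor, and for comparison builds the aspect-preserving alternative (letterboxing with white padding) using Pillow's `ImageOps.pad`. The letterbox is shown only as a point of comparison; this notebook uses the stretch resize.

```python
from PIL import Image, ImageOps

TARGET = (224, 224)
WHITE = (255, 255, 255)


def stretch_distortion(width, height, target=TARGET):
    """How much more one axis is scaled than the other by a plain (W, H) resize.
    1.0 means the aspect ratio is preserved."""
    sx = target[0] / width
    sy = target[1] / height
    return max(sx, sy) / min(sx, sy)


# Hypothetical wide product shot: a brown strip filling an 800x200 frame.
belt = Image.new("RGB", (800, 200), (90, 60, 30))

# Same output geometry as Resize((224, 224)); Pillow's default resampling filter may differ.
stretched = belt.resize(TARGET)
# Aspect-preserving alternative: scale to fit, then pad the rest with white.
letterboxed = ImageOps.pad(belt, TARGET, color=WHITE)

print("stretch distortion factor:", round(stretch_distortion(800, 200), 2))
print("sizes:", stretched.size, letterboxed.size)
```

For this example the vertical axis is stretched four times as much as the horizontal one, while the letterboxed version keeps the product's proportions but spends about three quarters of its rows on white padding.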

In [None]:
sample_row = df.sample(1).iloc[0]
print(f"Visualizing: {sample_row['title']}")
original_img = load_image_from_url(sample_row['imgUrl'])

if original_img is not None:
    # Same pipeline without the final Normalize, so pixel values stay in [0, 1] for imshow
    viz_transforms = transforms.Compose(train_transforms.transforms[:-1])

    plt.figure(figsize=(16, 8))
    
    plt.subplot(2, 4, 1)
    plt.imshow(original_img)
    plt.title("Original (Raw)")
    plt.axis('off')
    
    # Apply the augmentations 7 times; each call draws new random parameters
    for i in range(2, 9):
        augmented_tensor = viz_transforms(original_img)
        # Convert Tensor (C,H,W) to Numpy (H,W,C) for Matplotlib
        augmented_img = augmented_tensor.permute(1, 2, 0).numpy()
        
        plt.subplot(2, 4, i)
        plt.imshow(augmented_img)
        plt.title(f"Augmented Var {i-1}")
        plt.axis('off')
    
    plt.tight_layout()
    plt.show()
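
The visualization skips `Normalize` because normalized values fall outside Matplotlib's `[0, 1]` display range for float RGB images (pure white becomes roughly 2.25 to 2.64 per channel). To inspect the exact tensors the model receives, you can instead invert the normalization, `x = x_norm * std + mean`. A minimal NumPy sketch, assuming a `(C, H, W)` array (call `.numpy()` on a torch tensor first); the helper name is hypothetical:

```python
import numpy as np

# ImageNet statistics used by the Normalize step in train_transforms,
# reshaped to (C, 1, 1) so they broadcast over (C, H, W) arrays.
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def denormalize_for_display(chw):
    """Invert x_norm = (x - mean) / std and return an (H, W, C) array clipped to [0, 1]."""
    restored = chw * STD + MEAN
    return np.clip(restored, 0.0, 1.0).transpose(1, 2, 0)


# Round trip on a synthetic all-white 3x4x4 image.
white = np.ones((3, 4, 4))
normalized = (white - MEAN) / STD
print("normalized white per channel:", normalized[:, 0, 0].round(2))
display = denormalize_for_display(normalized)
print("display shape:", display.shape, "max error:", np.abs(display - 1.0).max())
```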

## Conclusion for Training Pipeline

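This strategy effectively "breaks" the perfect studio look of the dataset. Because `train_transforms` samples new random parameters every time an image passes through it, the model sees a different variant of each product in every epoch, so over a full training run it is exposed to many combinations of lighting, angle, and position instead of a single fixed view. This should make it less dependent on studio conditions and more robust to the variation found in real-world photos.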