# 🛠️ **Automated Data Augmentation Script**
## 1. Overview
This script is designed to artificially expand a dataset for deep learning (CNNs/ViTs). It iterates through every image in a dataset and generates multiple variations (augmentations) of it.

**Goal**: Transform a small dataset into a large, diverse dataset to prevent model overfitting.

## 2. Library Imports & Setup
We use a mix of standard Python libraries and PyTorch's computer vision tools.

**numpy**: Used to manipulate images as matrices (grids of numbers) for pixel-level noise.

**PIL**: Used to load images from the hard drive and apply CPU-based enhancements (Contrast).

**torchvision.transforms**: The core library for geometric and color transformations compatible with Deep Learning models.

**tqdm**: Provides a visual progress bar so we know the script hasn't frozen.

In [22]:
import os
import random
import numpy as np
from PIL import Image, ImageEnhance
from torchvision import transforms
import torch
from tqdm import tqdm

In [23]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using:", device)

Using: cpu


In [24]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [25]:
import os
os.listdir('/content')

['.config', 'drive', 'sample_data']

In [26]:
import os

# Main input folder (your existing dataset)
main_folder = "/content/drive/MyDrive/AI4DRR Workshop/Data"

# New output folder for augmented images
augmented_base = "/content/drive/MyDrive/AI4DRR Workshop/Augmented_Data"

# Create main augmented folder if not exists
os.makedirs(augmented_base, exist_ok=True)

## 3. Custom Augmentation: **Salt & Pepper Noise**
Standard libraries sometimes lack specific noise types.\
Here, we manipulate the pixel data directly to simulate sensor noise ("Salt" is white, "Pepper" is black).
![Example of salt-and-pepper noise](https://www.askpython.com/wp-content/uploads/2023/08/salt-pepper-noise-1024x563.png.webp)

In [27]:
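def add_salt_and_pepper_noise(image, amount=0.02):
    """
    Applies salt and pepper noise to an RGB PIL image.
    """
    np_img = np.array(image)
    height, width = np_img.shape[:2]

    # Note: np_img.size counts every channel value (H * W * 3 for RGB), so the
    # share of pixels touched can reach roughly 3 * amount.

    # Generate Salt noise (White)
    num_salt = int(amount * np_img.size * 0.5)
    # randint excludes the upper bound, so pass the full dimension to reach the last row/column
    salt_rows = np.random.randint(0, height, num_salt)
    salt_cols = np.random.randint(0, width, num_salt)
    np_img[salt_rows, salt_cols, :] = 255

    # Generate Pepper noise (Black)
    num_pepper = int(amount * np_img.size * 0.5)
    pepper_rows = np.random.randint(0, height, num_pepper)
    pepper_cols = np.random.randint(0, width, num_pepper)
    np_img[pepper_rows, pepper_cols, :] = 0

    return Image.fromarray(np_img)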
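
The same kind of noise can also be produced without explicit coordinate lists. The sketch below is an alternative formulation, not the notebook's method: it draws one uniform random number per pixel and thresholds it, so `amount` is the expected fraction of corrupted pixels. The names `salt_and_pepper_mask` and `rng` are hypothetical and introduced only for this illustration.

```python
import numpy as np

def salt_and_pepper_mask(np_img, amount=0.02, rng=None):
    """Return a noisy copy of an H x W x C (or H x W) uint8 array."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = np_img.copy()
    height, width = noisy.shape[:2]

    # One uniform draw per pixel decides whether it becomes salt, pepper, or stays.
    draw = rng.random((height, width))
    noisy[draw < amount / 2] = 255        # salt: lowest amount/2 of the draws
    noisy[draw > 1 - amount / 2] = 0      # pepper: highest amount/2 of the draws
    return noisy

# Demo on a flat mid-gray image with a fixed seed.
rng = np.random.default_rng(0)
gray = np.full((200, 200, 3), 128, dtype=np.uint8)
noisy = salt_and_pepper_mask(gray, amount=0.1, rng=rng)

# A pixel counts as changed if any of its channels differs from the input.
changed_fraction = float(np.mean(np.any(noisy != gray, axis=-1)))
print(f"Fraction of pixels changed: {changed_fraction:.3f}")
```

To use it on a PIL image, wrap the call as `Image.fromarray(salt_and_pepper_mask(np.array(image)))`.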

## 4. The Augmentation Configuration
We define a dictionary of transformations.
This makes the code modular. If we want to add a new transformation later, we just add a line here.

In [28]:
def get_augmentation_dict():
    """
    Returns a dictionary of PyTorch transforms.
    Each key is the suffix for the filename, value is the transform.
    """
    return {
        "flip": transforms.RandomHorizontalFlip(p=1.0),
        "rot": transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
        "color": transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
        "blur": transforms.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 2.0)),
        "sharp": transforms.Lambda(lambda img: transforms.functional.adjust_sharpness(img, sharpness_factor=1.5)),
    }
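
Because each dictionary key becomes part of the saved filename, reusing an existing key silently replaces that transform. A minimal sketch of a merge helper that guards against this follows. `with_extra_augmentations` is a hypothetical name, and the stand-in callables work on nested lists so the sketch runs without PyTorch.

```python
def with_extra_augmentations(base, extra):
    """Return a new suffix -> transform dict holding base plus extra entries."""
    clashes = sorted(set(base) & set(extra))
    if clashes:
        raise ValueError(f"Suffix already in use: {clashes}")
    merged = dict(base)   # copy so the caller's dictionary is not mutated
    merged.update(extra)
    return merged

# Stand-in transforms on a tiny 2x2 "image" (nested lists), for illustration only.
base = {"flip": lambda rows: [row[::-1] for row in rows]}   # mirror left-right
extra = {"vflip": lambda rows: rows[::-1]}                   # mirror top-bottom

aug_dict = with_extra_augmentations(base, extra)
image = [[1, 2], [3, 4]]
results = {suffix: transform(image) for suffix, transform in aug_dict.items()}
print(results)
```

In the notebook, the same helper could be called as `with_extra_augmentations(get_augmentation_dict(), {"vflip": transforms.RandomVerticalFlip(p=1.0)})`.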

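## 5. Processing a Folder
The function below applies every augmentation to each original image in one class folder and saves the results under a matching subfolder of `Augmented_Data`.

In [29]: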
def process_folder(folder_path):
    """
    Applies all augmentations to all images in a specific folder.
    Saves augmented images in a new Augmented_Data folder with class-wise subfolders.
    """

    class_name = os.path.basename(folder_path)

    # Output folder for this class
    class_output_folder = os.path.join(augmented_base, class_name)
    os.makedirs(class_output_folder, exist_ok=True)

    # List valid images
    image_files = [f for f in os.listdir(folder_path)
                   if f.lower().endswith(('.png', '.jpg', '.jpeg'))]

    # Filter out augmented images
    original_images = [f for f in image_files if "aug_" not in f]

    if not original_images:
        print(f"No original images found in {folder_path}")
        return

    # Load transforms
    aug_dict = get_augmentation_dict()
    tensor_converter = transforms.ToTensor()
    pil_converter = transforms.ToPILImage()

    print(f"Processing folder: {class_name} | Found {len(original_images)} originals")

    for img_file in tqdm(original_images, desc=f"Augmenting {class_name}"):
        img_path = os.path.join(folder_path, img_file)
        file_name, ext = os.path.splitext(img_file)

        try:
            with Image.open(img_path) as img:
                img = img.convert("RGB")

                img_tensor = tensor_converter(img).to(device)

                # -------- PyTorch Augmentations (run on `device`) ----------
                for suffix, transform in aug_dict.items():
                    augmented_tensor = transform(img_tensor)
                    aug_pil = pil_converter(augmented_tensor.cpu())

                    save_name = f"{file_name}_aug_{suffix}{ext}"
                    aug_pil.save(os.path.join(class_output_folder, save_name))

                # -------- Custom CPU Augmentations ----------
                sn_img = add_salt_and_pepper_noise(img, amount=0.02)
                sn_img.save(os.path.join(class_output_folder, f"{file_name}_aug_noise{ext}"))

                contrast_img = ImageEnhance.Contrast(img).enhance(1.5)
                contrast_img.save(os.path.join(class_output_folder, f"{file_name}_aug_contrast{ext}"))

        except Exception as e:
            print(f"Error processing {img_file}: {e}")
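
The naming scheme and the "skip already-augmented files" filter rely on the same marker in the filename. The sketch below isolates that logic so it can be checked without touching any images. `augmented_name`, `is_original`, and the two constants are hypothetical names, and the marker `_aug_` is slightly stricter than the `"aug_"` substring test used above.

```python
import os

AUG_MARKER = "_aug_"                        # embedded in every augmented filename
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

def augmented_name(img_file, suffix):
    """Build the output filename for one augmentation of img_file."""
    stem, ext = os.path.splitext(img_file)
    return f"{stem}{AUG_MARKER}{suffix}{ext}"

def is_original(img_file):
    """True for image files that do not carry the augmentation marker."""
    is_image = img_file.lower().endswith(IMAGE_EXTENSIONS)
    return is_image and AUG_MARKER not in img_file

flip_name = augmented_name("roof_01.jpg", "flip")
candidates = ["roof_01.jpg", flip_name, "notes.txt"]
originals = [f for f in candidates if is_original(f)]
print(flip_name, originals)
```

Keeping the marker in one constant means the writer and the filter cannot drift apart if the naming convention changes.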

## 6. Execution Block
This entry point allows the script to handle multiple class folders (e.g., Data/Building_A, Data/Building_B) automatically.

In [30]:
if __name__ == "__main__":
    if not os.path.exists(main_folder):
        print(f"Error: Main folder not found at {main_folder}")
    else:
        subfolders = [f for f in os.listdir(main_folder)
                      if os.path.isdir(os.path.join(main_folder, f))]

        for folder in subfolders:
            full_path = os.path.join(main_folder, folder)
            process_folder(full_path)

        print("\n--- All folders processed successfully ---")

Processing folder: Metal_Sheet | Found 30 originals

Augmenting Metal_Sheet: 100%|██████████| 30/30 [00:10<00:00,  2.93it/s]

Processing folder: RCC | Found 30 originals

Augmenting RCC: 100%|██████████| 30/30 [00:21<00:00,  1.42it/s]

Processing folder: Assam_Type | Found 30 originals

Augmenting Assam_Type: 100%|██████████| 30/30 [00:13<00:00,  2.29it/s]

Processing folder: Vacant | Found 30 originals

Augmenting Vacant: 100%|██████████| 30/30 [00:22<00:00,  1.34it/s]

No original images found in /content/drive/MyDrive/AI4DRR Workshop/Data/.ipynb_checkpoints

--- All folders processed successfully ---
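
The log above shows `.ipynb_checkpoints` being visited as if it were a class folder. The following is a minimal sketch, assuming hidden directories never hold class data, that lists only visible class folders and counts the images in each. `list_class_folders`, `count_images`, and `summarize` are hypothetical helper names, and the temporary directory stands in for the Drive folders.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

def list_class_folders(root):
    """Return visible sub-directories of root, skipping hidden ones."""
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name)) and not name.startswith(".")
    )

def count_images(folder):
    """Count files in folder whose extension marks them as images."""
    return sum(1 for f in os.listdir(folder) if f.lower().endswith(IMAGE_EXTENSIONS))

def summarize(root):
    """Map each class folder name to the number of images it contains."""
    return {name: count_images(os.path.join(root, name))
            for name in list_class_folders(root)}

# Demo on a throwaway directory tree that mimics the layout above.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "RCC"))
    os.makedirs(os.path.join(root, ".ipynb_checkpoints"))
    for name in ("a.jpg", "a_aug_flip.jpg", "b.PNG", "notes.txt"):
        open(os.path.join(root, "RCC", name), "w").close()
    summary = summarize(root)

print(summary)
```

Run against `augmented_base`, each class should report 210 images if every one of the 30 originals produced all seven variants (five transforms plus noise and contrast).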



