# Image Data Fundamentals

## Definition of Digital Images

A digital image is a numerical representation of visual information, discretized into a grid of picture elements (pixels). Mathematically, an image $I$ can be represented as a function:

$$I: \Omega \subset \mathbb{R}^2 \rightarrow \mathbb{R}^c$$

Where $\Omega$ represents the spatial domain and $c$ denotes the number of channels.

## Digital Image Properties

### Width and Height

The width ($W$) and height ($H$) of an image define its spatial dimensions in pixels. An image can be represented as a matrix:

$$I = \begin{bmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,W} \\
p_{2,1} & p_{2,2} & \cdots & p_{2,W} \\
\vdots & \vdots & \ddots & \vdots \\
p_{H,1} & p_{H,2} & \cdots & p_{H,W}
\end{bmatrix}$$

Where $p_{i,j}$ is the pixel value in row $i$ and column $j$.

### Channels

Channels represent different color or intensity components of an image. For an RGB image with 3 channels:

$$I_{RGB} = \{I_R, I_G, I_B\}$$

Where each channel $I_c$ is a matrix with dimensions $H \times W$.

The tensor representation of a color image:

$$I \in \mathbb{R}^{H \times W \times C}$$
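The tensor above uses the channels-last layout $H \times W \times C$, while the transformation code later in these notes works on channels-first $(C, H, W)$ tensors. A minimal NumPy sketch of converting between the two layouts (array names and sizes are illustrative):

```python
import numpy as np

# A 2x3 RGB image in channels-last (H, W, C) layout
hwc = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Move the channel axis to the front: (H, W, C) -> (C, H, W)
chw = np.transpose(hwc, (2, 0, 1))

# And back again: (C, H, W) -> (H, W, C)
hwc_again = np.transpose(chw, (1, 2, 0))

print(chw.shape)  # (3, 2, 3)
```

`np.transpose` returns a view with permuted strides rather than copying data; wrap it in `np.ascontiguousarray` if a contiguous buffer is required.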

## Computer Understanding of Images

Computers interpret images as numerical arrays. Each pixel value is quantized into one of $2^b$ discrete intensity levels:

$$p_{i,j,c} \in \{0, 1, \ldots, 2^b-1\}$$

Where $b$ is the bit depth (typically 8 bits per channel, giving integer values from 0 to 255).
![](./images/12.webp)
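As a sketch of the quantization step, assuming intensities start as floats in $[0, 1]$ (the helper name `quantize` is illustrative), values are scaled by $2^b - 1$ and rounded to the nearest integer level:

```python
import numpy as np

def quantize(values, bits=8):
    """Map floats in [0, 1] to integer levels 0 .. 2**bits - 1."""
    levels = 2**bits - 1
    scaled = np.clip(values, 0.0, 1.0) * levels
    return np.rint(scaled).astype(np.int64)

print(quantize(np.array([0.0, 0.5, 1.0]), bits=8))  # [  0 128 255]
```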
### Memory Representation

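For an image stored in row-major order with channels interleaved (the $H \times W \times C$ layout), the address of channel $c$ at pixel $(x, y)$ is:

$$\text{Address}(x,y,c) = \text{Base} + (y \cdot W + x) \cdot C + c$$

Where $\text{Base}$ is the starting memory address and offsets are counted in elements (multiply by the bytes per element, e.g. 1 for 8-bit data, to get byte addresses).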
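A quick NumPy check of the address formula for a row-major, channel-interleaved buffer (the helper `element_offset` and the array sizes are illustrative). NumPy's `strides` are byte steps per axis, so the element offset is multiplied by `itemsize` to compare:

```python
import numpy as np

def element_offset(x, y, c, width, channels):
    """Offset in elements of channel c at pixel (x, y) in a row-major (H, W, C) buffer."""
    return (y * width + x) * channels + c

# A small 4x5 RGB image, stored row-major with interleaved channels
H, W, C = 4, 5, 3
img = np.arange(H * W * C, dtype=np.uint8).reshape(H, W, C)
flat = img.ravel()  # the contiguous buffer viewed as 1D

x, y, c = 2, 3, 1
offset = element_offset(x, y, c, W, C)
print(offset, flat[offset] == img[y, x, c])  # 52 True

# Byte offset from NumPy's strides; for uint8, bytes == elements
byte_offset = y * img.strides[0] + x * img.strides[1] + c * img.strides[2]
print(byte_offset == offset * img.itemsize)  # True
```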

## Channel Significance

### RGB Color Model

Each RGB pixel stores the red, green, and blue components of the additive color model:

$$I_{RGB}(x,y) = [r(x,y), g(x,y), b(x,y)]$$

Where intensity values typically range from 0 to 255 for 8-bit encoding.

### Grayscale

A single-channel representation whose intensity is commonly computed from RGB using the ITU-R BT.601 luma weights:

$$I_{gray}(x,y) = 0.299 \cdot r(x,y) + 0.587 \cdot g(x,y) + 0.114 \cdot b(x,y)$$
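A vectorized sketch of this weighted sum for an $H \times W \times 3$ array of 8-bit values (the function name is illustrative); the result is rounded back to 8-bit integers:

```python
import numpy as np

def rgb_to_gray(image):
    """Convert an (H, W, 3) uint8 RGB array to an (H, W) uint8 luma array."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = image[..., :3].astype(np.float64) @ weights  # weighted sum over channels
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

pixels = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)
print(rgb_to_gray(pixels))  # [[ 76 150 255]]
```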

### Alpha Channel

Represents transparency where:

$$I_{RGBA}(x,y) = [r(x,y), g(x,y), b(x,y), \alpha(x,y)]$$

$\alpha = 0$ indicates full transparency; $\alpha = 255$ (the maximum for 8-bit alpha) indicates full opacity.

### Domain-Specific Channels

- **Medical Imaging**: DICOM images often use 16-bit single channel for radiographic data
- **Multispectral Imaging**: Multiple discrete channels representing specific electromagnetic wavelengths
- **Thermal Imaging**: Single channel representing temperature (infrared intensity) values

## Height and Width Significance

### Resolution Considerations

The total number of pixels in an image is defined by:

$$\text{Resolution} = W \times H$$

Higher resolution increases detail but requires more storage and processing resources.
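To make the storage cost concrete: an uncompressed image needs $W \times H \times C \times$ (bytes per channel value) bytes. A small sketch (the function name is illustrative):

```python
def uncompressed_bytes(width, height, channels=3, bits_per_channel=8):
    """Raw (uncompressed) size of an image buffer in bytes."""
    return width * height * channels * bits_per_channel // 8

# A 1920x1080 8-bit RGB frame
print(uncompressed_bytes(1920, 1080))  # 6220800 (about 6.2 MB)
```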

### Aspect Ratio

The relationship between width and height:

$$\text{Aspect Ratio} = \frac{W}{H}$$

Common aspect ratios: 16:9 (widescreen), 4:3 (standard), 1:1 (square).
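An aspect ratio such as 16:9 is $W/H$ reduced to lowest terms, which a greatest-common-divisor computation gives directly (a minimal sketch; the function name is illustrative):

```python
from math import gcd

def aspect_ratio(width, height):
    """Reduce W:H to lowest terms, e.g. 1920x1080 -> (16, 9)."""
    d = gcd(width, height)
    return width // d, height // d

print(aspect_ratio(1920, 1080))  # (16, 9)
print(aspect_ratio(640, 480))    # (4, 3)
```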

## Optimal Dimension Selection

### Network Architecture Constraints

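Many deep learning models are trained at, or require, a specific input size:

- **ResNet**: 224×224 pixels for standard ImageNet training (global average pooling lets it accept other sizes)
- **YOLO** (e.g., YOLOv3): 416×416 or 608×608 pixels; input sides must be multiples of 32
- **EfficientNet**: Input resolution grows with the compound scaling coefficient (from 224×224 for B0 up to 600×600 for B7)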

### Application-Specific Considerations

#### Medical Imaging

- **CT/MRI**: Volumetric data, typically 512×512 pixels per slice × $Z$ slices; voxel spacing is often anisotropic (slice thickness larger than in-plane spacing)
- **Histopathology**: Ultra-high resolution (10,000×10,000+ pixels)
- **Optimal dimensions**: Preserve clinical features at minimum necessary resolution

#### Real-time Vision Systems

Dimensions are chosen as a trade-off, since processing cost grows roughly with the number of values processed:

$$\text{Processing Time} \propto W \times H \times C \times \text{Computational Complexity}$$

Common resolutions:
- **High-speed tracking**: 320×240 (QVGA)
- **Autonomous vehicles**: 1280×720 (720p)
- **Industrial inspection**: Application-specific, balancing detail and speed

#### Camera Systems

Resolution (total pixel count) is determined by the sensor dimensions and the pixel density, measured in pixels per unit length:

$$\text{Resolution} = \text{Sensor Width} \times \text{Sensor Height} \times \text{Pixel Density}^2$$
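A worked example of this formula, assuming a hypothetical 36 mm × 24 mm sensor with a density of 200 pixels per millimetre (about a 5 µm pixel pitch); the function name is illustrative:

```python
def sensor_pixel_count(sensor_width_mm, sensor_height_mm, pixels_per_mm):
    """Pixel count = (width * density) * (height * density) = width * height * density**2."""
    return round(sensor_width_mm * pixels_per_mm) * round(sensor_height_mm * pixels_per_mm)

print(sensor_pixel_count(36, 24, 200))  # 34560000 (7200 x 4800, about 34.6 MP)
```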

## Image File Formats

### Lossless Formats

- **PNG**: Supports transparency, uses DEFLATE compression
- **TIFF**: Versatile, supports multiple compression algorithms
- **BMP**: Simple format with minimal or no compression

### Lossy Formats

- **JPEG**: Uses discrete cosine transform (DCT) compression
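- **WebP**: Modern format supporting both lossy and lossless modes (and transparency); lossy WebP typically produces smaller files than JPEG at comparable visual quality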

### Special-Purpose Formats

- **DICOM**: Medical imaging standard with metadata
- **FITS**: Astronomical imaging
- **RAW**: Camera-specific formats preserving sensor data

### Format Selection Criteria

Format selection depends on the optimization metric:
- **Size-constrained**: Choose lossy compression
- **Quality-critical**: Choose lossless formats
- **Application-specific**: Choose domain-specialized formats
- **Compatibility**: Choose widely-supported formats

# Image Transformations in Computer Vision

## 1. Rotation

### Definition
Rotation is a geometric transformation that turns an image around a pivot point by a specified angle $\theta$.

### Mathematical Formulation
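For a point $(x, y)$ in the original image, its rotated coordinates $(x', y')$ around the origin $(0, 0)$ by angle $\theta$ are given below. These rotate counterclockwise in a y-up frame; in image coordinates, where $y$ increases downward, a positive $\theta$ appears as a clockwise rotation: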

$$x' = x\cos(\theta) - y\sin(\theta)$$
$$y' = x\sin(\theta) + y\cos(\theta)$$

For rotation around a center point $(c_x, c_y)$ (typically the image center):

$$x' = (x-c_x)\cos(\theta) - (y-c_y)\sin(\theta) + c_x$$
$$y' = (x-c_x)\sin(\theta) + (y-c_y)\cos(\theta) + c_y$$

### Matrix Representation
Rotation around the origin can be expressed as:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
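The center-point formulas above are equivalent to translating the center to the origin, rotating, and translating back. A small NumPy sketch of that composition in homogeneous coordinates (the function name `rotation_about` is illustrative):

```python
import numpy as np

def rotation_about(theta_deg, cx, cy):
    """3x3 homogeneous matrix rotating points by theta (degrees) around (cx, cy)."""
    t = np.deg2rad(theta_deg)
    to_origin = np.array([[1, 0, -cx], [0, 1, -cy], [0, 0, 1]], dtype=float)
    rotate = np.array([[np.cos(t), -np.sin(t), 0],
                       [np.sin(t),  np.cos(t), 0],
                       [0, 0, 1]])
    back = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    return back @ rotate @ to_origin

# Rotate the point (2, 1) by 90 degrees around (1, 1)
p = rotation_about(90, 1, 1) @ np.array([2.0, 1.0, 1.0])
print(np.round(p[:2], 6))  # [1. 2.]
```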

### Implementation for Image Tensor (C,H,W)
When rotating a tensor with dimensions (channels, height, width):

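```python
import math
import numpy as np

def bilinear_sample(channel, x, y):
    # Bilinear interpolation of a 2D array at fractional (x, y),
    # assuming 0 <= x <= w - 1 and 0 <= y <= h - 1
    h, w = channel.shape
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * channel[y0, x0] + dx * (1 - dy) * channel[y0, x1]
            + (1 - dx) * dy * channel[y1, x0] + dx * dy * channel[y1, x1])

def rotate_image(image, angle, center=None):
    # image: array of shape (c, h, w); angle in degrees
    c, h, w = image.shape
    if center is None:
        center = ((w - 1) / 2, (h - 1) / 2)  # center of the pixel grid
    cx, cy = center

    # Convert angle to radians
    theta = math.radians(angle)
    cos_theta, sin_theta = math.cos(theta), math.sin(theta)

    # Bounding box of the rotated image (epsilon absorbs float error, e.g. at 90 degrees)
    new_h = math.ceil(abs(h * cos_theta) + abs(w * sin_theta) - 1e-9)
    new_w = math.ceil(abs(w * cos_theta) + abs(h * sin_theta) - 1e-9)
    ncx, ncy = (new_w - 1) / 2, (new_h - 1) / 2

    output = np.zeros((c, new_h, new_w), dtype=np.float64)
    eps = 1e-9

    # Inverse mapping: for each output pixel, find its source location
    for y_out in range(new_h):
        for x_out in range(new_w):
            # Coordinates relative to the output center
            x = x_out - ncx
            y = y_out - ncy

            # Apply inverse rotation, then shift to the rotation center in the input
            x_orig = x * cos_theta + y * sin_theta + cx
            y_orig = -x * sin_theta + y * cos_theta + cy

            # If within bounds (with a small tolerance), sample bilinearly
            if -eps <= x_orig <= w - 1 + eps and -eps <= y_orig <= h - 1 + eps:
                x_orig = min(max(x_orig, 0.0), w - 1)
                y_orig = min(max(y_orig, 0.0), h - 1)
                for ch in range(c):
                    output[ch, y_out, x_out] = bilinear_sample(image[ch], x_orig, y_orig)

    return output
```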

## 2. Flipping

### Definition
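Flipping mirrors an image: a horizontal flip reverses the column order (mirroring across the vertical center line), and a vertical flip reverses the row order (mirroring across the horizontal center line).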

### Mathematical Formulation
For horizontal flipping of an image with width $w$:
$$x' = w - 1 - x$$
$$y' = y$$

For vertical flipping of an image with height $h$:
$$x' = x$$
$$y' = h - 1 - y$$

### Matrix Representation
For horizontal flipping:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} w-1 \\ 0 \end{bmatrix}$$

For vertical flipping:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 0 \\ h-1 \end{bmatrix}$$

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def flip_image(image, direction):
    # image: array of shape (c, h, w)
    c, h, w = image.shape
    output = np.zeros_like(image)

    if direction == "horizontal":
        # x' = w - 1 - x: reverse the column order
        for ch in range(c):
            for y in range(h):
                for x in range(w):
                    output[ch, y, x] = image[ch, y, w - 1 - x]
    elif direction == "vertical":
        # y' = h - 1 - y: reverse the row order
        for ch in range(c):
            for y in range(h):
                for x in range(w):
                    output[ch, y, x] = image[ch, h - 1 - y, x]
    else:
        raise ValueError("direction must be 'horizontal' or 'vertical'")

    return output
```

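Vectorized implementation using NumPy slicing (it returns a view that shares memory with the input; call `.copy()` if an independent array is needed):
```python
def flip_image(image, direction):
    if direction == "horizontal":
        return image[:, :, ::-1]  # Reverse width dimension
    elif direction == "vertical":
        return image[:, ::-1, :]  # Reverse height dimension
    raise ValueError("direction must be 'horizontal' or 'vertical'")
```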

## 3. Scaling

### Definition
Scaling resizes an image by multiplying its dimensions by scale factors.

### Mathematical Formulation
For scaling by factors $s_x$ and $s_y$:
$$x' = x \cdot s_x$$
$$y' = y \cdot s_y$$

### Matrix Representation
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

### Implementation for Image Tensor (C,H,W)

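```python
import numpy as np

def bilinear_sample(channel, x, y):
    # Bilinear interpolation of a 2D array at fractional (x, y) inside the grid
    h, w = channel.shape
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * channel[y0, x0] + dx * (1 - dy) * channel[y0, x1]
            + (1 - dx) * dy * channel[y1, x0] + dx * dy * channel[y1, x1])

def scale_image(image, scale_x, scale_y):
    # image: array of shape (c, h, w)
    c, h, w = image.shape

    # Calculate new dimensions
    new_h, new_w = int(round(h * scale_y)), int(round(w * scale_x))

    output = np.zeros((c, new_h, new_w), dtype=np.float64)

    # Inverse mapping with pixel-center alignment
    for y_out in range(new_h):
        for x_out in range(new_w):
            # Source position, clamped to the valid sampling range
            x_orig = min(max((x_out + 0.5) / scale_x - 0.5, 0.0), w - 1)
            y_orig = min(max((y_out + 0.5) / scale_y - 0.5, 0.0), h - 1)

            # Apply interpolation
            for ch in range(c):
                output[ch, y_out, x_out] = bilinear_sample(image[ch], x_orig, y_orig)

    return output
```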

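### Interpolation Methods
- **Nearest Neighbor**: Uses the value of the closest pixel (fast, but blocky when upscaling)
- **Bilinear**: Distance-weighted average of the 4 nearest pixels (2×2 neighborhood)
- **Bicubic**: Cubic-polynomial weighting of the 16 nearest pixels (4×4 neighborhood); smoother than bilinear, but some weights are negative, so results can overshoot and need clipping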
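Nearest-neighbor interpolation can be vectorized by computing, once per output row and column, the index of the input pixel whose center is closest. A minimal sketch assuming pixel-center alignment (the function name `resize_nearest` is illustrative):

```python
import numpy as np

def resize_nearest(image, new_h, new_w):
    """Nearest-neighbor resize of a (C, H, W) array using pixel-center alignment."""
    c, h, w = image.shape
    # Map each output pixel center back to the nearest input pixel index
    ys = np.minimum(((np.arange(new_h) + 0.5) * h / new_h).astype(int), h - 1)
    xs = np.minimum(((np.arange(new_w) + 0.5) * w / new_w).astype(int), w - 1)
    # Advanced indexing gathers rows ys and columns xs for every channel
    return image[:, ys[:, None], xs[None, :]]

small = np.arange(4).reshape(1, 2, 2)
print(resize_nearest(small, 4, 4)[0])
```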

## 4. Translation

### Definition
Translation shifts an image by a constant offset in x and/or y directions.

### Mathematical Formulation
For translation by $(t_x, t_y)$:
$$x' = x + t_x$$
$$y' = y + t_y$$

### Matrix Representation (Homogeneous Coordinates)
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def translate_image(image, tx, ty, fill_value=0):
    # image: array of shape (c, h, w); tx, ty: integer pixel offsets
    # (non-integer offsets would need interpolation)
    c, h, w = image.shape

    # Output starts filled with the background value
    output = np.full((c, h, w), fill_value, dtype=image.dtype)

    # Inverse mapping: output pixel (x, y) comes from input (x - tx, y - ty)
    for y_out in range(h):
        y_in = y_out - ty
        if 0 <= y_in < h:
            for x_out in range(w):
                x_in = x_out - tx
                if 0 <= x_in < w:
                    for ch in range(c):
                        output[ch, y_out, x_out] = image[ch, y_in, x_in]

    return output
```
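For integer offsets, the pixel loop above can be replaced by a single slice assignment that copies the overlapping region in one step. A sketch under that assumption (the name `translate_image_fast` is illustrative):

```python
import numpy as np

def translate_image_fast(image, tx, ty, fill_value=0):
    """Shift a (C, H, W) array by integer (tx, ty); uncovered pixels get fill_value."""
    c, h, w = image.shape
    output = np.full_like(image, fill_value)
    # Source range that remains inside the frame after shifting
    src_x0, src_x1 = max(0, -tx), min(w, w - tx)
    src_y0, src_y1 = max(0, -ty), min(h, h - ty)
    if src_x0 < src_x1 and src_y0 < src_y1:  # otherwise the image shifted fully out
        output[:, src_y0 + ty:src_y1 + ty, src_x0 + tx:src_x1 + tx] = \
            image[:, src_y0:src_y1, src_x0:src_x1]
    return output

a = np.arange(1, 10).reshape(1, 3, 3)
print(translate_image_fast(a, 1, 0)[0])
```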

## 5. Cropping

### Definition
Cropping extracts a rectangular region from an image.

### Mathematical Formulation
For cropping a region starting at $(x_0, y_0)$ with width $w'$ and height $h'$:
The new image consists of pixels $(x, y)$ where:
$$x_0 \leq x < x_0 + w'$$
$$y_0 \leq y < y_0 + h'$$

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def crop_image(image, x0, y0, crop_width, crop_height):
    # image: array of shape (c, h, w)
    c, h, w = image.shape

    # Clamp the crop region to the image bounds
    x0 = max(0, min(x0, w - 1))
    y0 = max(0, min(y0, h - 1))
    crop_width = min(crop_width, w - x0)
    crop_height = min(crop_height, h - y0)

    # Copy the region pixel by pixel
    output = np.zeros((c, crop_height, crop_width), dtype=image.dtype)

    for ch in range(c):
        for y in range(crop_height):
            for x in range(crop_width):
                output[ch, y, x] = image[ch, y0 + y, x0 + x]

    return output
```

Vectorized implementation using slicing (this returns a view into the original array, not a copy):
```python
def crop_image(image, x0, y0, crop_width, crop_height):
    # Clamp the crop region to the image bounds
    c, h, w = image.shape
    x0 = max(0, min(x0, w - 1))
    y0 = max(0, min(y0, h - 1))
    x1 = min(w, x0 + crop_width)
    y1 = min(h, y0 + crop_height)

    # Extract region using slicing
    return image[:, y0:y1, x0:x1]
```

# Image Processing Transformations (Continued, Day 2)

## 1. Brightness Adjustment

### Definition
Brightness adjustment modifies the luminance or intensity values of an image by adding a constant value to all pixels, effectively making the image appear brighter or darker.

### Mathematical Formulation
For an image $I$ with pixel values in the range $[0, 255]$ and brightness offset $\beta$:

$$I'(x,y,c) = \text{clip}(I(x,y,c) + \beta, 0, 255)$$

Where:
- $I(x,y,c)$ is the original pixel value at position $(x,y)$ for channel $c$
- $I'(x,y,c)$ is the adjusted pixel value
- $\beta > 0$ increases brightness
- $\beta < 0$ decreases brightness
- $\text{clip}(v, \text{min}, \text{max})$ constrains $v$ to the valid range

### Implementation for Image Tensor (C,H,W)

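```python
import numpy as np

def adjust_brightness(image, beta):
    # image: array of shape (c, h, w), values in [0, 255]
    c, h, w = image.shape
    # Accumulate in float so uint8 inputs cannot wrap around before clipping
    output = np.zeros((c, h, w), dtype=np.float32)

    for ch in range(c):
        for y in range(h):
            for x in range(w):
                # Add brightness offset and clip to valid range
                output[ch, y, x] = min(max(float(image[ch, y, x]) + beta, 0), 255)

    return output.astype(image.dtype)
```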

### Vectorized Implementation

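```python
import numpy as np

def adjust_brightness(image, beta):
    # image: array of shape (c, h, w)
    # Promote to float first: uint8 arithmetic would wrap around (e.g. 250 + 10 -> 4)
    return np.clip(image.astype(np.float32) + beta, 0, 255).astype(image.dtype)
```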

## 2. Contrast Adjustment

### Definition
Contrast adjustment scales the difference between pixel values and a reference point (typically the mean intensity), enhancing or reducing the distinction between light and dark regions.

### Mathematical Formulation
For contrast factor $\alpha$:

$$I'(x,y,c) = \text{clip}(\alpha \times (I(x,y,c) - \mu) + \mu, 0, 255)$$

Where:
- $\mu$ is the reference (pivot) intensity: either the mean intensity of the image or a fixed mid-gray value (128 for 8-bit images)
- $\alpha > 1$ increases contrast
- $0 < \alpha < 1$ decreases contrast
- $\alpha = 1$ leaves contrast unchanged

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def adjust_contrast(image, alpha, mean_value=128):
    # image: array of shape (c, h, w)
    c, h, w = image.shape
    output = np.zeros((c, h, w), dtype=np.float32)

    for ch in range(c):
        # Option 1: Use a fixed pivot (mean_value, typically 128 for 8-bit images)
        # Option 2: Use the channel mean instead:
        # mean_value = float(image[ch].mean())

        for y in range(h):
            for x in range(w):
                # Apply contrast formula in float, then clip
                value = alpha * (float(image[ch, y, x]) - mean_value) + mean_value
                output[ch, y, x] = min(max(value, 0), 255)

    return output.astype(image.dtype)
```

### Vectorized Implementation

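```python
import numpy as np

def adjust_contrast(image, alpha, mean_value=128):
    # image: array of shape (c, h, w); compute in float to avoid uint8 wraparound
    img = image.astype(np.float32)
    return np.clip(alpha * (img - mean_value) + mean_value, 0, 255).astype(image.dtype)
```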
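The loop version above mentions using each channel's mean as the pivot (Option 2). A vectorized sketch of that variant, computing all channel means at once with `keepdims` so they broadcast over each channel (the function name is illustrative):

```python
import numpy as np

def adjust_contrast_channel_mean(image, alpha):
    """Contrast around each channel's own mean for a (C, H, W) array."""
    img = image.astype(np.float32)
    mean = img.mean(axis=(1, 2), keepdims=True)  # shape (c, 1, 1)
    return np.clip(alpha * (img - mean) + mean, 0, 255).astype(image.dtype)

sample = np.array([[[0, 100]]], dtype=np.uint8)  # one channel, mean 50
print(adjust_contrast_channel_mean(sample, 2.0))  # [[[  0 150]]]
```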

## 3. Saturation Adjustment

### Definition
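Saturation adjustment modifies the intensity (purity) of colors in an image, making them appear more vibrant or muted. In the HSV approach below, hue and value ($V$) are held fixed; because $V$ is not perceptual luminance, apparent brightness can still shift slightly.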

### Mathematical Formulation
For RGB images, saturation is commonly adjusted by converting to HSV color space, scaling the S channel, and converting back to RGB. Here $R, G, B$ are normalized to $[0, 1]$:

1. Convert RGB to HSV:
   $$V = \max(R, G, B)$$
   $$S = \begin{cases}
   \frac{V - \min(R,G,B)}{V}, & \text{if } V \neq 0 \\
   0, & \text{otherwise}
   \end{cases}$$
   $$H = \begin{cases}
   0, & \text{if } V = \min(R,G,B) \text{ (gray; hue undefined)} \\
   60 \times \frac{G-B}{V-\min(R,G,B)} \mod 360, & \text{if } V = R \\
   60 \times \frac{B-R}{V-\min(R,G,B)} + 120, & \text{if } V = G \\
   60 \times \frac{R-G}{V-\min(R,G,B)} + 240, & \text{if } V = B
   \end{cases}$$

2. Modify saturation:
   $$S' = \text{clip}(S \times \gamma, 0, 1)$$
   Where $\gamma$ is the saturation factor

3. Convert back to RGB (simplified):
   $$C = V \times S'$$
   $$X = C \times (1 - |((H / 60) \mod 2) - 1|)$$
   $$m = V - C$$

   $$(R', G', B') = \begin{cases}
   (C+m, X+m, m), & \text{if } 0 \leq H < 60 \\
   (X+m, C+m, m), & \text{if } 60 \leq H < 120 \\
   (m, C+m, X+m), & \text{if } 120 \leq H < 180 \\
   (m, X+m, C+m), & \text{if } 180 \leq H < 240 \\
   (X+m, m, C+m), & \text{if } 240 \leq H < 300 \\
   (C+m, m, X+m), & \text{if } 300 \leq H < 360
   \end{cases}$$

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def adjust_saturation(image, gamma):
    # image: array of shape (3, h, w) - RGB format, values in [0, 255]
    if image.shape[0] != 3:
        raise ValueError("Saturation adjustment requires RGB image")

    h, w = image.shape[1], image.shape[2]
    output = np.zeros_like(image)

    for y in range(h):
        for x in range(w):
            # Extract RGB values and normalize to [0, 1]
            r, g, b = (float(px) / 255.0 for px in image[:, y, x])

            # Convert to HSV
            v = max(r, g, b)
            min_val = min(r, g, b)
            diff = v - min_val

            # Calculate saturation
            s = diff / v if v != 0 else 0.0

            # Calculate hue (degrees)
            if diff == 0:
                h_val = 0.0
            elif v == r:
                h_val = 60 * (((g - b) / diff) % 6)
            elif v == g:
                h_val = 60 * ((b - r) / diff + 2)
            else:  # v == b
                h_val = 60 * ((r - g) / diff + 4)

            # Adjust saturation
            s = min(max(s * gamma, 0.0), 1.0)

            # Convert back to RGB; `chroma` and `x_comp` avoid shadowing the loop variable x
            chroma = v * s
            x_comp = chroma * (1 - abs(((h_val / 60) % 2) - 1))
            m = v - chroma

            if h_val < 60:
                r_new, g_new, b_new = chroma + m, x_comp + m, m
            elif h_val < 120:
                r_new, g_new, b_new = x_comp + m, chroma + m, m
            elif h_val < 180:
                r_new, g_new, b_new = m, chroma + m, x_comp + m
            elif h_val < 240:
                r_new, g_new, b_new = m, x_comp + m, chroma + m
            elif h_val < 300:
                r_new, g_new, b_new = x_comp + m, m, chroma + m
            else:  # 300 <= h_val < 360
                r_new, g_new, b_new = chroma + m, m, x_comp + m

            # Convert back to [0, 255] with rounding and store
            output[0, y, x] = round(r_new * 255)
            output[1, y, x] = round(g_new * 255)
            output[2, y, x] = round(b_new * 255)

    return output
```
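As a cross-check on the hand-written conversion above, the same per-pixel operation can be sketched with Python's standard-library `colorsys` module, whose `rgb_to_hsv` and `hsv_to_rgb` work on floats in $[0, 1]$ (hue is also in $[0, 1]$ rather than degrees). The function name is illustrative:

```python
import colorsys
import numpy as np

def adjust_saturation_colorsys(image, gamma):
    """Scale HSV saturation of a (3, H, W) uint8 RGB array by gamma."""
    if image.shape[0] != 3:
        raise ValueError("Saturation adjustment requires an RGB image")
    _, h, w = image.shape
    rgb = image.astype(np.float64) / 255.0
    output = np.empty_like(rgb)
    for y in range(h):
        for x in range(w):
            hue, sat, val = colorsys.rgb_to_hsv(*rgb[:, y, x])
            sat = min(max(sat * gamma, 0.0), 1.0)  # clip S to [0, 1]
            output[:, y, x] = colorsys.hsv_to_rgb(hue, sat, val)
    return np.rint(output * 255).astype(np.uint8)
```

With `gamma = 0` every pixel becomes gray at its own $V$, so pure red $(255, 0, 0)$ maps to $(255, 255, 255)$.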

## 4. Gaussian Noise

### Definition
Gaussian noise adds random values to image pixels, where the values are drawn from a Gaussian (normal) distribution, simulating thermal noise in electronic systems.

### Mathematical Formulation
For Gaussian noise with mean $\mu$ and standard deviation $\sigma$:

$$I'(x,y,c) = \text{clip}(I(x,y,c) + \mathcal{N}(\mu, \sigma^2), 0, 255)$$

Where:
- $\mathcal{N}(\mu, \sigma^2)$ represents a random value drawn independently for each pixel and channel from the normal distribution
- $\mu$ is typically 0 for zero-centered noise
- $\sigma$ controls the strength of the noise

The probability density function (PDF) of the Gaussian distribution is:

$$p(z) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(z-\mu)^2}{2\sigma^2}}$$

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def add_gaussian_noise(image, mean=0, sigma=15):
    # image: array of shape (c, h, w)
    rng = np.random.default_rng()
    c, h, w = image.shape
    output = np.zeros((c, h, w), dtype=np.float32)

    for ch in range(c):
        for y in range(h):
            for x in range(w):
                # Draw one sample from N(mean, sigma^2)
                noise = rng.normal(mean, sigma)
                # Add noise and clip to valid range
                output[ch, y, x] = min(max(float(image[ch, y, x]) + noise, 0), 255)

    return output.astype(image.dtype)
```

### Vectorized Implementation

```python
import numpy as np

def add_gaussian_noise(image, mean=0, sigma=15):
    # image: array of shape (c, h, w)
    rng = np.random.default_rng()

    # Generate a noise tensor with the same shape as the image
    noise = rng.normal(mean, sigma, size=image.shape)

    # Add noise in float, clip, round, and restore the input dtype
    return np.rint(np.clip(image + noise, 0, 255)).astype(image.dtype)
```

## 5. Salt and Pepper Noise

### Definition
Salt and pepper noise (also known as impulse noise) randomly replaces pixels with either minimum (pepper) or maximum (salt) values, simulating defects in image sensors or transmission errors.

### Mathematical Formulation
For salt and pepper noise with probability $p$:

$$I'(x,y,c) = \begin{cases}
0 \text{ (pepper)}, & \text{with probability } \frac{p}{2} \\
255 \text{ (salt)}, & \text{with probability } \frac{p}{2} \\
I(x,y,c), & \text{with probability } 1-p
\end{cases}$$

Where:
- $p$ is the total probability of a pixel being affected (typically 0.01 to 0.1)
- Half of affected pixels become 'salt' (255)
- Half of affected pixels become 'pepper' (0)

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def add_salt_pepper_noise(image, prob=0.05):
    # image: array of shape (c, h, w)
    rng = np.random.default_rng()
    c, h, w = image.shape
    output = image.copy()

    # Total number of values (each channel is corrupted independently)
    total_pixels = c * h * w

    # Number of salt and pepper values; positions are drawn with replacement,
    # so the realized fraction can be slightly below prob
    num_salt = int((prob / 2) * total_pixels)
    num_pepper = int((prob / 2) * total_pixels)

    # Add salt noise (white pixels)
    for _ in range(num_salt):
        # Random channel, y and x coordinates (upper bound exclusive)
        ch = rng.integers(0, c)
        y = rng.integers(0, h)
        x = rng.integers(0, w)
        output[ch, y, x] = 255

    # Add pepper noise (black pixels)
    for _ in range(num_pepper):
        ch = rng.integers(0, c)
        y = rng.integers(0, h)
        x = rng.integers(0, w)
        output[ch, y, x] = 0

    return output
```

### Alternative Implementation with Random Mask

```python
import numpy as np

def add_salt_pepper_noise(image, prob=0.05):
    # image: array of shape (c, h, w)
    rng = np.random.default_rng()
    output = image.copy()

    # One uniform sample in [0, 1) per value
    rnd = rng.random(image.shape)

    # Salt mask (where values become 255)
    salt_mask = rnd < prob / 2
    # Pepper mask (where values become 0)
    pepper_mask = (rnd >= prob / 2) & (rnd < prob)

    # Apply salt (white) and pepper (black) noise
    output[salt_mask] = 255
    output[pepper_mask] = 0

    return output
```

# Image Processing and Augmentation Techniques (Continued, Day 3)

## 1. Gaussian Blur

### Definition
Gaussian blur is a linear low-pass filter that convolves an image with a Gaussian function to reduce noise and detail by applying a weighted average of pixel values in a local neighborhood.

### Mathematical Formulation
The 2D Gaussian function is defined as:

$$G(x,y) = \frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}$$

Where:
- $(x,y)$ is the offset from the kernel center
- $\sigma$ is the standard deviation controlling blur strength

For discrete images, the Gaussian is sampled on a $(2k+1) \times (2k+1)$ grid; a radius of $k \approx 3\sigma$ is common because it captures almost all of the Gaussian's mass.

The blurred image is obtained through convolution:

$$I'(x,y) = I(x,y) * G(x,y) = \sum_{i=-k}^{k}\sum_{j=-k}^{k} G(i,j) \cdot I(x-i, y-j)$$
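A sketch of the kernel construction with radius $k = \lceil 3\sigma \rceil$, assuming the weights are normalized to sum to 1 (so the constant prefactor of $G$ cancels). Because $G(x, y) = g(x)\,g(y)$ for the 1D Gaussian $g$, the 2D kernel is the outer product of two 1D kernels. Function names are illustrative:

```python
import math
import numpy as np

def gaussian_kernel_1d(sigma, radius=None):
    """Normalized 1D Gaussian weights on offsets -k..k (default k = ceil(3 * sigma))."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-offsets**2 / (2 * sigma**2))
    return weights / weights.sum()  # the 1/(sqrt(2*pi)*sigma) prefactor cancels here

def gaussian_kernel_2d(sigma, radius=None):
    """Normalized 2D kernel built from the 1D one, using G(x, y) = g(x) * g(y)."""
    g = gaussian_kernel_1d(sigma, radius)
    return np.outer(g, g)

kernel = gaussian_kernel_2d(1.0)
print(kernel.shape)  # (7, 7)
```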

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def gaussian_blur(image, kernel_size, sigma):
    # image: array of shape (c, h, w); kernel_size should be odd (2k + 1)
    c, h, w = image.shape
    output = np.zeros((c, h, w), dtype=np.float64)  # float accumulator

    # Create Gaussian kernel
    k = kernel_size // 2
    kernel = np.zeros((kernel_size, kernel_size))

    # Fill kernel with Gaussian values
    for i in range(-k, k + 1):
        for j in range(-k, k + 1):
            kernel[i + k, j + k] = (1 / (2 * np.pi * sigma**2)) * np.exp(-(i**2 + j**2) / (2 * sigma**2))

    # Normalize so the weights sum to 1
    kernel = kernel / kernel.sum()

    # Reflect-pad the spatial axes so border pixels have full neighborhoods
    padded = np.pad(image.astype(np.float64), ((0, 0), (k, k), (k, k)), mode='reflect')

    for ch in range(c):
        for y in range(h):
            for x in range(w):
                # Apply kernel (it is symmetric, so correlation equals convolution)
                for i in range(kernel_size):
                    for j in range(kernel_size):
                        output[ch, y, x] += kernel[i, j] * padded[ch, y + i, x + j]

    return output
```

### Separable Implementation (Optimized)

```python
import numpy as np

def gaussian_blur_separable(image, kernel_size, sigma):
    # image: array of shape (c, h, w); kernel_size should be odd (2k + 1)
    c, h, w = image.shape

    # Create 1D Gaussian kernel
    k = kernel_size // 2
    kernel_1d = np.zeros(kernel_size)

    for i in range(-k, k + 1):
        kernel_1d[i + k] = (1 / (np.sqrt(2 * np.pi) * sigma)) * np.exp(-(i**2) / (2 * sigma**2))

    # Normalize kernel
    kernel_1d = kernel_1d / kernel_1d.sum()

    # Apply horizontal blur (along the width axis), accumulating in float
    temp = np.zeros((c, h, w), dtype=np.float64)
    padded_h = np.pad(image.astype(np.float64), ((0, 0), (0, 0), (k, k)), mode='reflect')

    for ch in range(c):
        for y in range(h):
            for x in range(w):
                for j in range(kernel_size):
                    temp[ch, y, x] += kernel_1d[j] * padded_h[ch, y, x + j]

    # Apply vertical blur (along the height axis) to the intermediate result
    output = np.zeros((c, h, w), dtype=np.float64)
    padded_v = np.pad(temp, ((0, 0), (k, k), (0, 0)), mode='reflect')

    for ch in range(c):
        for y in range(h):
            for x in range(w):
                for i in range(kernel_size):
                    output[ch, y, x] += kernel_1d[i] * padded_v[ch, y + i, x]

    return output
```

## 2. Sharpening

### Definition
Sharpening enhances edges and details in an image by amplifying high-frequency components, typically by adding a scaled edge response, computed with a Laplacian-type kernel, to the original image.

### Mathematical Formulation
Laplacian-based sharpening (closely related to unsharp masking, which instead uses $I - \text{blur}(I)$ as the detail term) is defined as:

$$I'(x,y) = I(x,y) + \lambda \cdot L(x,y)$$

Where:
- $I(x,y)$ is the original image
- $L(x,y)$ is the response of the image to the kernel below, i.e. the negated Laplacian $-\nabla^2 I$ (positive center weight), so adding it boosts edges
- $\lambda$ is a scaling factor controlling sharpening strength

The (negated) Laplacian kernel is typically:

$$L = \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$

Or for 8-connectivity:

$$L = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}$$
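Because convolution is linear, $I + \lambda L$ can be computed with one kernel: the identity kernel (a single 1 at the center) plus $\lambda$ times the Laplacian kernel. Both Laplacian kernels sum to 0, so the combined kernel sums to 1 and flat regions pass through unchanged. A small sketch that builds it (the function name is illustrative):

```python
import numpy as np

def sharpening_kernel(strength, eight_connected=False):
    """Identity kernel + strength * (negated) Laplacian kernel."""
    if eight_connected:
        lap = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float)
    else:
        lap = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
    identity = np.zeros((3, 3))
    identity[1, 1] = 1.0
    return identity + strength * lap

print(sharpening_kernel(1.0))  # center weight 5 = 4 * strength + 1
```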

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def sharpen_image(image, strength=1.0):
    # image: array of shape (c, h, w), values in [0, 255]
    c, h, w = image.shape
    output = np.zeros((c, h, w), dtype=np.float64)

    # Negated Laplacian kernel (positive center)
    laplacian_kernel = np.array([
        [0, -1, 0],
        [-1, 4, -1],
        [0, -1, 0]
    ], dtype=np.float64)

    # Reflect-pad by one pixel and work in float
    padded = np.pad(image.astype(np.float64), ((0, 0), (1, 1), (1, 1)), mode='reflect')

    for ch in range(c):
        for y in range(h):
            for x in range(w):
                # Compute Laplacian response
                laplacian = 0.0
                for i in range(3):
                    for j in range(3):
                        laplacian += laplacian_kernel[i, j] * padded[ch, y + i, x + j]

                # Apply sharpening
                output[ch, y, x] = min(max(float(image[ch, y, x]) + strength * laplacian, 0), 255)

    return output
```

### Single-Pass Implementation with Combined Kernel

```python
import numpy as np

def sharpen_image_single_pass(image, strength=1.0):
    # image: array of shape (c, h, w), values in [0, 255]
    c, h, w = image.shape
    output = np.zeros((c, h, w), dtype=np.float64)

    # Combined sharpening kernel: identity + strength * (negated Laplacian)
    center_value = 4 * strength + 1
    sharpening_kernel = np.array([
        [0, -strength, 0],
        [-strength, center_value, -strength],
        [0, -strength, 0]
    ], dtype=np.float64)

    # Apply convolution with reflect padding, in float
    padded = np.pad(image.astype(np.float64), ((0, 0), (1, 1), (1, 1)), mode='reflect')

    for ch in range(c):
        for y in range(h):
            for x in range(w):
                # Apply kernel
                pixel_value = 0.0
                for i in range(3):
                    for j in range(3):
                        pixel_value += sharpening_kernel[i, j] * padded[ch, y + i, x + j]

                output[ch, y, x] = min(max(pixel_value, 0), 255)

    return output
```

## 3. Affine Transformations

### Definition
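Affine transformations preserve collinearity and parallelism of lines (though not, in general, lengths or angles). They combine a linear map (rotation, scaling, shearing) with a translation, so any sequence of these operations can be expressed as a single affine transformation.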

### Mathematical Formulation
An affine transformation can be represented in matrix form:

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$$

Using homogeneous coordinates:

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a & b & t_x \\ c & d & t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Key affine transformation matrices:

1. Translation by $(t_x, t_y)$:
   $$T = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}$$

2. Rotation by angle $\theta$:
   $$R = \begin{bmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

3. Scaling by factors $(s_x, s_y)$:
   $$S = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

4. Shearing by factors $(s_h^x, s_h^y)$:
   $$H = \begin{bmatrix} 1 & s_h^x & 0 \\ s_h^y & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
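Because these matrices are composed by multiplication, their order matters: with column vectors, the rightmost matrix is applied first. A small NumPy sketch comparing "rotate then translate" with "translate then rotate" for the point $(1, 0)$ (helper names are illustrative):

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def rotation(theta_deg):
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

p = np.array([1.0, 0.0, 1.0])  # the point (1, 0) in homogeneous form

rotate_then_translate = translation(5, 0) @ rotation(90) @ p
translate_then_rotate = rotation(90) @ translation(5, 0) @ p

print(np.round(rotate_then_translate[:2], 6))  # [5. 1.]
print(np.round(translate_then_rotate[:2], 6))  # [0. 6.]
```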

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def affine_transform(image, matrix, output_shape=None):
    # image: array of shape (c, h, w); matrix: 3x3 forward affine transform
    c, h, w = image.shape

    if output_shape is None:
        output_shape = (h, w)

    output_h, output_w = output_shape
    output = np.zeros((c, output_h, output_w), dtype=np.float64)

    # Inverse mapping
    inv_matrix = np.linalg.inv(matrix)

    for y_out in range(output_h):
        for x_out in range(output_w):
            # Convert to homogeneous coordinates
            p_out = np.array([x_out, y_out, 1.0])

            # Apply inverse transform to find source pixel
            p_in = inv_matrix @ p_out

            # Convert back from homogeneous coordinates (p_in[2] is 1 for affine matrices)
            x_in, y_in = p_in[0] / p_in[2], p_in[1] / p_in[2]

            # Check if source pixel is within bounds (last row/column included)
            if 0 <= x_in <= w - 1 and 0 <= y_in <= h - 1:
                # Bilinear interpolation; neighbors are clamped at the border
                x0, y0 = int(x_in), int(y_in)
                x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)

                dx = x_in - x0
                dy = y_in - y0

                for ch in range(c):
                    # Interpolate
                    val = (1 - dx) * (1 - dy) * image[ch, y0, x0] + \
                          dx * (1 - dy) * image[ch, y0, x1] + \
                          (1 - dx) * dy * image[ch, y1, x0] + \
                          dx * dy * image[ch, y1, x1]

                    output[ch, y_out, x_out] = val

    return output
```

### Matrix Creation Function

```python
import numpy as np

def create_affine_matrix(rotation=0, scale=(1, 1), translation=(0, 0), shear=(0, 0)):
    # Convert rotation (degrees) to radians; rotation is about the origin (top-left pixel)
    theta = np.deg2rad(rotation)

    # Create rotation matrix
    rot_matrix = np.array([
        [np.cos(theta), -np.sin(theta), 0],
        [np.sin(theta), np.cos(theta), 0],
        [0, 0, 1]
    ])

    # Create scaling matrix
    scale_matrix = np.array([
        [scale[0], 0, 0],
        [0, scale[1], 0],
        [0, 0, 1]
    ], dtype=float)

    # Create shear matrix
    shear_matrix = np.array([
        [1, shear[0], 0],
        [shear[1], 1, 0],
        [0, 0, 1]
    ], dtype=float)

    # Create translation matrix
    trans_matrix = np.array([
        [1, 0, translation[0]],
        [0, 1, translation[1]],
        [0, 0, 1]
    ], dtype=float)

    # Combine all transformations (order matters)
    # Applied to a column vector: first scale, then shear, then rotate, finally translate
    return trans_matrix @ rot_matrix @ shear_matrix @ scale_matrix
```

## 4. Elastic Deformations

### Definition
Elastic deformation simulates random local distortions in an image by applying a dense displacement field, creating realistic variations while preserving the underlying structure.

### Mathematical Formulation
The process involves:

1. Generate random displacement fields $\Delta x(x,y)$ and $\Delta y(x,y)$ by drawing each value independently from $\mathcal{U}(-1, 1)$
2. Smooth these fields using a Gaussian filter with standard deviation $\sigma$
3. Scale the smoothed fields by a factor $\alpha$ and add them to the pixel coordinates:
   $$x' = x + \alpha \cdot \Delta x(x,y)$$
   $$y' = y + \alpha \cdot \Delta y(x,y)$$
4. Sample the input image at $(x', y')$ with interpolation (bilinear in the implementations below)

The 2D Gaussian smoothing kernel is:

$$G(x,y) = \frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}$$
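The kernel can be built directly from this formula by sampling it on an integer grid. In the sketch below, truncating at $4\sigma$ is an assumption chosen to mirror the default of `scipy.ndimage.gaussian_filter`; the kernel is renormalized because truncation and discretization make the raw samples sum to slightly less than 1.

```python
import numpy as np

def gaussian_kernel(sigma, truncate=4.0):
    # Sample G(x, y) on an integer grid covering +/- truncate * sigma
    radius = int(truncate * sigma + 0.5)
    coords = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(coords, coords)
    kernel = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    # Renormalize so the truncated, discretized kernel sums exactly to 1
    return kernel / kernel.sum()

k = gaussian_kernel(sigma=4)
print(k.shape)
```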

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def elastic_deformation(image, alpha=10, sigma=4):
    # image: array of shape (c, h, w)
    c, h, w = image.shape
    output = np.zeros_like(image)
    
    # Generate random displacement fields
    dx = np.random.uniform(-1, 1, size=(h, w))
    dy = np.random.uniform(-1, 1, size=(h, w))
    
    # Smooth displacement fields using Gaussian filter
    dx = gaussian_filter(dx, sigma=sigma)
    dy = gaussian_filter(dy, sigma=sigma)
    
    # Scale displacement fields
    dx = dx * alpha
    dy = dy * alpha
    
    # Create mesh grid of coordinates
    y_indices, x_indices = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    
    # Apply displacement fields
    x_mapped = x_indices + dx
    y_mapped = y_indices + dy
    
    # Clip to ensure indices are within image boundaries
    x_mapped = np.clip(x_mapped, 0, w - 1)
    y_mapped = np.clip(y_mapped, 0, h - 1)
    
    # Interpolate values for each channel
    for ch in range(c):
        for y in range(h):
            for x in range(w):
                # Get source coordinates
                x_src, y_src = x_mapped[y, x], y_mapped[y, x]
                
                # Bilinear interpolation between the four neighbouring pixels
                x0, y0 = int(x_src), int(y_src)
                x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
                
                dx_local = x_src - x0
                dy_local = y_src - y0
                
                val = (1 - dx_local) * (1 - dy_local) * image[ch, y0, x0] + \
                      dx_local * (1 - dy_local) * image[ch, y0, x1] + \
                      (1 - dx_local) * dy_local * image[ch, y1, x0] + \
                      dx_local * dy_local * image[ch, y1, x1]
                
                output[ch, y, x] = val
    
    return output
```

### Vectorized Implementation

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deformation_vectorized(image, alpha=10, sigma=4):
    # image: array of shape (c, h, w)
    c, h, w = image.shape
    
    # Generate random displacement fields
    dx = np.random.uniform(-1, 1, size=(h, w))
    dy = np.random.uniform(-1, 1, size=(h, w))
    
    # Smooth displacement fields using Gaussian filter
    dx = gaussian_filter(dx, sigma=sigma)
    dy = gaussian_filter(dy, sigma=sigma)
    
    # Scale displacement fields
    dx = dx * alpha
    dy = dy * alpha
    
    # Create mesh grid of coordinates
    y_indices, x_indices = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    
    # Apply displacement fields
    x_mapped = x_indices + dx
    y_mapped = y_indices + dy
    
    # Clip to ensure indices are within image boundaries
    x_mapped = np.clip(x_mapped, 0, w - 1)
    y_mapped = np.clip(y_mapped, 0, h - 1)
    
    # Interpolate all pixels of a channel at once (order=1 is bilinear)
    output = np.zeros_like(image)
    for ch in range(c):
        output[ch] = map_coordinates(image[ch], [y_mapped, x_mapped], order=1)
    
    return output
```
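Because Gaussian smoothing averages many independent uniform values, the smoothed field is far smaller than the raw noise, so $\alpha$ must be fairly large to produce visible warping. The quick check below, assuming `gaussian_filter` is SciPy's `scipy.ndimage.gaussian_filter`, estimates the typical displacement for the default $\alpha = 10$, $\sigma = 4$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
alpha, sigma = 10, 4

# Same construction as above: uniform noise, Gaussian smoothing, scaling by alpha
dx = alpha * gaussian_filter(rng.uniform(-1, 1, size=(256, 256)), sigma=sigma)

# Theory: roughly alpha * 0.577 / (2 * sigma * sqrt(pi)) pixels for a 2D Gaussian
print(f"displacement std: {dx.std():.2f} px")
```

For these defaults the typical displacement is well under one pixel, so a noticeably larger $\alpha$ is likely needed at $\sigma = 4$ for a visible effect.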

## 5. Random Erasing/Cutout

### Definition
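Random Erasing is a data augmentation technique that randomly selects a rectangular region in an image and replaces its pixels with a constant value or random noise, so the model cannot rely on any single region and becomes more robust to occlusion. Cutout is a closely related technique that masks a fixed-size square region.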

### Mathematical Formulation
For an image $I$ and a randomly selected rectangular region $R(x_1:x_2, y_1:y_2)$:

$$I'(x,y,c) = \begin{cases} 
v, & \text{if } (x,y) \in R \\
I(x,y,c), & \text{otherwise}
\end{cases}$$

Key parameters:
- $p$: probability of applying erasing
- $s_l, s_h$: min/max area ratio of erased rectangle
- $r_1, r_2$: min/max aspect ratio of erased rectangle
- $v$: replacement value (0, mean, or random)

### Implementation for Image Tensor (C,H,W)

```python
import numpy as np

def random_erasing(image, p=0.5, s_l=0.02, s_h=0.4, r_1=0.3, r_2=3.3, v=0):
    # image: array of shape (c, h, w); v='random' assumes 8-bit values in [0, 255]
    c, h, w = image.shape
    output = image.copy()
    
    # Apply random erasing with probability p
    if np.random.uniform() > p:
        return output
    
    # Sample the erased area and aspect ratio (height / width)
    area = h * w
    target_area = np.random.uniform(s_l, s_h) * area
    aspect_ratio = np.random.uniform(r_1, r_2)
    
    # Calculate dimensions so that h_cutout * w_cutout is close to target_area
    h_cutout = int(np.sqrt(target_area * aspect_ratio))
    w_cutout = int(np.sqrt(target_area / aspect_ratio))
    
    # Ensure cutout dimensions don't exceed image dimensions
    h_cutout = min(h_cutout, h)
    w_cutout = min(w_cutout, w)
    
    # Get random top-left corner (np.random.randint excludes its upper bound)
    x0 = np.random.randint(0, w - w_cutout + 1)
    y0 = np.random.randint(0, h - h_cutout + 1)
    
    # Fill the region with the specified value
    if v == 'random':
        for ch in range(c):
            output[ch, y0:y0+h_cutout, x0:x0+w_cutout] = np.random.uniform(0, 255, size=(h_cutout, w_cutout))
    elif v == 'mean':
        for ch in range(c):
            output[ch, y0:y0+h_cutout, x0:x0+w_cutout] = image[ch].mean()
    else:  # Default: fill with zero or the specified constant
        for ch in range(c):
            output[ch, y0:y0+h_cutout, x0:x0+w_cutout] = v
    
    return output
```

### Vectorized Implementation

```python
import numpy as np

def random_erasing_vectorized(image, p=0.5, s_l=0.02, s_h=0.4, r_1=0.3, r_2=3.3, v=0):
    # image: array of shape (c, h, w); v='random' assumes 8-bit values in [0, 255]
    c, h, w = image.shape
    output = image.copy()
    
    # Apply random erasing with probability p
    if np.random.uniform() > p:
        return output
    
    # Sample the erased area and aspect ratio (height / width)
    area = h * w
    target_area = np.random.uniform(s_l, s_h) * area
    aspect_ratio = np.random.uniform(r_1, r_2)
    
    # Calculate dimensions
    h_cutout = int(np.sqrt(target_area * aspect_ratio))
    w_cutout = int(np.sqrt(target_area / aspect_ratio))
    
    # Ensure dimensions don't exceed image dimensions
    h_cutout = min(h_cutout, h)
    w_cutout = min(w_cutout, w)
    
    # Get random top-left corner (np.random.randint excludes its upper bound)
    x0 = np.random.randint(0, w - w_cutout + 1)
    y0 = np.random.randint(0, h - h_cutout + 1)
    
    # Create slicing indices
    y_slice = slice(y0, y0 + h_cutout)
    x_slice = slice(x0, x0 + w_cutout)
    
    # Fill the region in all channels at once
    if v == 'random':
        output[:, y_slice, x_slice] = np.random.uniform(0, 255, size=(c, h_cutout, w_cutout))
    elif v == 'mean':
        # Per-channel means with shape (c, 1, 1) broadcast over the region
        output[:, y_slice, x_slice] = image.mean(axis=(1, 2), keepdims=True)
    else:  # Default: fill with zero or the specified constant
        output[:, y_slice, x_slice] = v
    
    return output
```

## Mixup

### Definition
Mixup is a data augmentation technique that creates virtual training examples by linearly interpolating both inputs and labels of randomly sampled pairs of examples.

### Mathematical Formulation
Given two input-label pairs $(x_i, y_i)$ and $(x_j, y_j)$, Mixup generates a new virtual sample $(\tilde{x}, \tilde{y})$:

$$\tilde{x} = \lambda x_i + (1 - \lambda) x_j$$
$$\tilde{y} = \lambda y_i + (1 - \lambda) y_j$$

where $\lambda \in [0, 1]$ is sampled from a Beta distribution $\text{Beta}(\alpha, \alpha)$ with $\alpha$ as a hyperparameter controlling the strength of interpolation.
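The choice of $\alpha$ controls how strongly pairs are blended: $\text{Beta}(1, 1)$ is uniform on $[0, 1]$, small $\alpha$ concentrates $\lambda$ near 0 or 1 (the sample is mostly one image), and large $\alpha$ pushes $\lambda$ toward 0.5. A minimal NumPy sketch measuring this, where the $[0.1, 0.9]$ band is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def fraction_strongly_mixed(alpha, n=100_000):
    # Share of lambda samples in [0.1, 0.9], i.e. genuinely blended pairs
    lam = rng.beta(alpha, alpha, size=n)
    return np.mean((lam >= 0.1) & (lam <= 0.9))

for alpha in (0.2, 1.0, 4.0):
    print(alpha, round(fraction_strongly_mixed(alpha), 2))
```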

### Algorithm
```python
import numpy as np

def mixup(batch_x, batch_y, alpha=1.0):
    '''
    batch_x: Input images with shape (batch_size, channels, height, width)
    batch_y: One-hot encoded labels with shape (batch_size, num_classes)
    alpha: Parameter for Beta distribution
    '''
    batch_size = len(batch_x)
    
    # Sample one mixing parameter per example (a single lambda per batch is also common)
    lam = np.random.beta(alpha, alpha, batch_size)
    lam = np.maximum(lam, 1 - lam)  # Keep lambda >= 0.5 so the original example dominates
    lam = lam.reshape(-1, 1, 1, 1)  # Shape for broadcasting over (C, H, W)
    
    # Generate random indices for pairs
    indices = np.random.permutation(batch_size)
    
    # Create mixed inputs and targets
    mixed_x = lam * batch_x + (1 - lam) * batch_x[indices]
    mixed_y = lam.reshape(-1, 1) * batch_y + (1 - lam).reshape(-1, 1) * batch_y[indices]
    
    return mixed_x, mixed_y
```

### Properties
- Regularizes the network by encouraging it to behave linearly in between training examples
- Reduces memorization of corrupt or noisy labels
- Improves robustness to adversarial examples

## CutMix

### Definition
CutMix is an augmentation strategy that replaces a rectangular region of an image with a patch from another image while mixing the labels proportionally to the area of the replaced region.

### Mathematical Formulation
For two input-label pairs $(x_A, y_A)$ and $(x_B, y_B)$:

$$\tilde{x} = \mathbf{M} \odot x_A + (1 - \mathbf{M}) \odot x_B$$
$$\tilde{y} = \lambda y_A + (1 - \lambda) y_B$$

where:
- $\mathbf{M} \in \{0, 1\}^{H \times W}$ is a binary mask with 1s for pixels taken from image A and 0s for pixels taken from image B, broadcast across the channels
- $\odot$ represents element-wise multiplication
- $\lambda$ is the fraction of the image area still taken from image A, $\lambda = \frac{|\mathbf{M}|}{H \times W}$
- $|\mathbf{M}|$ is the number of ones in the mask
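A small worked example of the mixing rule, assuming a 32×32 image and a hypothetical 16×8 box pasted from image B. A single $H \times W$ mask broadcast over the channels gives the same $\lambda$ as a $C \times H \times W$ mask that repeats it per channel.

```python
import numpy as np

H, W = 32, 32
# Binary mask M: 1 keeps image A, 0 takes image B
M = np.ones((H, W))
M[8:24, 4:12] = 0  # a 16 x 8 box taken from image B

lam = M.sum() / (H * W)  # |M| / (H * W) = (1024 - 128) / 1024
print(lam)

# Mix two constant images; M broadcasts over the channel axis
x_A = np.ones((3, H, W))
x_B = np.zeros((3, H, W))
x_mixed = M * x_A + (1 - M) * x_B
```

Since image A is all ones and image B all zeros here, the mean of the mixed image equals $\lambda$, which is exactly the label weight given to class A.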

### Algorithm
```python
import numpy as np

def cutmix(batch_x, batch_y, alpha=1.0):
    '''
    batch_x: Input images with shape (batch_size, channels, height, width)
    batch_y: One-hot encoded labels
    alpha: Parameter for Beta distribution controlling box size
    '''
    batch_size, c, h, w = batch_x.shape
    
    # Sample mixing parameter lambda from Beta distribution
    lam = np.random.beta(alpha, alpha)
    
    # Generate random indices for pairs
    indices = np.random.permutation(batch_size)
    
    # Get random box coordinates
    cut_ratio = np.sqrt(1.0 - lam)
    cut_w = int(w * cut_ratio)
    cut_h = int(h * cut_ratio)
    
    cx = np.random.randint(w)  # Center x of cut box
    cy = np.random.randint(h)  # Center y of cut box
    
    # Determine box boundaries
    bbx1 = np.clip(cx - cut_w // 2, 0, w)
    bby1 = np.clip(cy - cut_h // 2, 0, h)
    bbx2 = np.clip(cx + cut_w // 2, 0, w)
    bby2 = np.clip(cy + cut_h // 2, 0, h)
    
    # Create new mixed images
    mixed_x = batch_x.copy()
    mixed_x[:, :, bby1:bby2, bbx1:bbx2] = batch_x[indices, :, bby1:bby2, bbx1:bbx2]
    
    # Adjust lambda based on actual box size
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (w * h))
    
    # Create mixed labels
    mixed_y = lam * batch_y + (1 - lam) * batch_y[indices]
    
    return mixed_x, mixed_y
```

### Properties
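- Preserves more local information than Mixup, since each pixel comes from one real image rather than a blend of two
- Regularizes more strongly than Cutout without wasting pixels: the removed region is filled with content from another image instead of a constant
- Exposes the model to mixed foreground and background contexts
- Encourages the model to attend to less discriminative parts of objects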

## Color Jittering

### Definition
Color jittering is an augmentation technique that randomly alters color properties of images, modifying brightness, contrast, saturation, and hue.

### Mathematical Formulation
For an input image $x$ with shape $(C, H, W)$, color jittering applies the following transformations one after another, each with a randomly sampled strength (the implementation below also shuffles their order):

1. Brightness: $x' = x \cdot (1 + \delta_b)$ where $\delta_b \sim \mathcal{U}(-b, b)$
2. Contrast: $x'' = (x' - \mu) \cdot (1 + \delta_c) + \mu$ where $\delta_c \sim \mathcal{U}(-c, c)$ and $\mu$ is the mean pixel value
3. Saturation: after converting to HSV color space, $S_{new} = S \cdot (1 + \delta_s)$ where $\delta_s \sim \mathcal{U}(-s, s)$
4. Hue: after converting to HSV color space, $H_{new} = (H + \delta_h) \bmod 1$ where $\delta_h \sim \mathcal{U}(-h, h)$

The parameters $b, c, s, h$ control the strength of each transformation.
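The brightness and contrast formulas translate directly into NumPy. This is a minimal sketch assuming a float image in $[0, 1]$; the helper names `brightness_jitter` and `contrast_jitter` are hypothetical, clipping to $[0, 1]$ is an added assumption not present in the formulas, and $\mu$ is taken over all pixels and channels.

```python
import numpy as np

rng = np.random.default_rng(0)

def brightness_jitter(x, delta_b):
    # x' = x * (1 + delta_b); clipping to [0, 1] is an added assumption
    return np.clip(x * (1 + delta_b), 0.0, 1.0)

def contrast_jitter(x, delta_c):
    # x'' = (x' - mu) * (1 + delta_c) + mu, with mu the mean over all pixels and channels
    mu = x.mean()
    return np.clip((x - mu) * (1 + delta_c) + mu, 0.0, 1.0)

b, c = 0.4, 0.4
image = rng.uniform(0, 1, size=(3, 8, 8))  # float image in [0, 1]
out = contrast_jitter(brightness_jitter(image, rng.uniform(-b, b)), rng.uniform(-c, c))
```

Saturation and hue need an RGB to HSV conversion and are omitted from this sketch.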

### Algorithm
```python
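import random

# adjust_brightness, adjust_contrast, adjust_saturation and adjust_hue are assumed
# helpers implementing the formulas above; they are not defined here
# (torchvision.transforms.functional provides functions with these names for tensors/PIL images)
def color_jittering(image, brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1):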
    '''
    image: Input image with shape (channels, height, width)
    brightness, contrast, saturation, hue: Maximum perturbation strengths
    '''
    # Apply transformations in random order
    transforms = []
    
    # Brightness adjustment
    if brightness > 0:
        brightness_factor = random.uniform(max(0, 1-brightness), 1+brightness)
        transforms.append(lambda img: adjust_brightness(img, brightness_factor))
    
    # Contrast adjustment
    if contrast > 0:
        contrast_factor = random.uniform(max(0, 1-contrast), 1+contrast)
        transforms.append(lambda img: adjust_contrast(img, contrast_factor))
    
    # Saturation adjustment
    if saturation > 0:
        saturation_factor = random.uniform(max(0, 1-saturation), 1+saturation)
        transforms.append(lambda img: adjust_saturation(img, saturation_factor))
    
    # Hue adjustment
    if hue > 0:
        hue_factor = random.uniform(-hue, hue)
        transforms.append(lambda img: adjust_hue(img, hue_factor))
    
    # Shuffle transform order
    random.shuffle(transforms)
    
    # Apply transforms sequentially
    result = image.copy()
    for transform in transforms:
        result = transform(result)
    
    return result
```

### Properties
- Increases robustness to lighting and color variations
- Simulates different camera sensors, lighting conditions, and processing pipelines
- Reduces model dependency on specific color patterns
- Particularly effective for outdoor scenes and varying illumination conditions

## Channel Shuffling

### Definition
Channel shuffling randomly permutes the color channels of an image, creating variations in color representation while preserving spatial information.

### Mathematical Formulation
For an RGB image $x \in \mathbb{R}^{3 \times H \times W}$ with channels $[x_R, x_G, x_B]$, channel shuffling applies a random permutation $\sigma$ of the set $\{0, 1, 2\}$:

$$\tilde{x} = [x_{\sigma(0)}, x_{\sigma(1)}, x_{\sigma(2)}]$$

For images with more than 3 channels, this generalizes to permuting any subset of channels.

### Algorithm
```python
import numpy as np

def channel_shuffling(image):
    '''
    image: Input image with shape (channels, height, width)
    '''
    c, h, w = image.shape
    
    # Generate random permutation of channels
    permutation = np.random.permutation(c)
    
    # Apply the permutation
    shuffled_image = image[permutation, :, :]
    
    return shuffled_image
```

### Properties
- Forces the model to rely less on specific channel correlations
- Simulates different color spaces and representations
- Prevents overfitting to specific color channel patterns
- Particularly useful for RGB images but can be applied to any multi-channel data

## Grid Distortion

### Definition
Grid distortion is a spatial augmentation technique that applies local elastic deformations to images by perturbing a regular grid of control points and using interpolation to compute the new pixel locations.

### Mathematical Formulation
1. Create a regular grid of points $G = \{(i,j) | i=0,1,...,n; j=0,1,...,m\}$ over the image
2. Perturb each point with random offsets:
   $$G'_{i,j} = (i+\delta_{i,j}^x, j+\delta_{i,j}^y)$$
   where $\delta_{i,j}^x, \delta_{i,j}^y \sim \mathcal{U}(-\Delta, \Delta)$ with $\Delta$ controlling distortion magnitude
3. For each pixel $(x,y)$ in the output image, find its location in the input image:
   $$(x', y') = \mathcal{I}(x, y, G, G')$$
   where $\mathcal{I}$ is an interpolation function (bilinear, cubic, etc.)
4. Sample the input image at position $(x', y')$ to get the output pixel value

### Algorithm
```python
import numpy as np
from scipy.ndimage import map_coordinates, zoom

def grid_distortion(image, num_steps=5, distort_limit=0.3, interpolation='linear'):
    '''
    image: Input image with shape (channels, height, width)
    num_steps: Number of grid control points along each dimension (at least 2)
    distort_limit: Maximum displacement as a fraction of the grid step size
    interpolation: Interpolation method ('linear', 'cubic')
    '''
    c, h, w = image.shape
    order = {'linear': 1, 'cubic': 3}[interpolation]
    
    # Spacing between neighbouring control points of the regular grid
    step_x = (w - 1) / (num_steps - 1)
    step_y = (h - 1) / (num_steps - 1)
    
    # Random displacement (in pixels) of every control point
    dx = np.random.uniform(-distort_limit, distort_limit, size=(num_steps, num_steps)) * step_x
    dy = np.random.uniform(-distort_limit, distort_limit, size=(num_steps, num_steps)) * step_y
    
    # Interpolate the control-point displacements to every pixel; zoom aligns
    # the outermost control points with the image corners
    dx_full = zoom(dx, (h / num_steps, w / num_steps), order=order)
    dy_full = zoom(dy, (h / num_steps, w / num_steps), order=order)
    
    # Output pixel (y, x) samples the input at (y + dy, x + dx), clipped to the image
    y_idx, x_idx = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    coords = [np.clip(y_idx + dy_full, 0, h - 1), np.clip(x_idx + dx_full, 0, w - 1)]
    
    # Apply transformation channel by channel
    result = np.zeros_like(image)
    for i in range(c):
        result[i] = map_coordinates(image[i], coords, order=order)
    
    return result
```

### Properties
- Creates realistic local deformations that simulate object movements and camera perspective changes
- Preserves the topology of the image while introducing geometric variance, provided the displacements stay small relative to the grid spacing
- More realistic than global affine transformations for organic objects
- Effective for medical imaging, object recognition, and scenes with non-rigid objects
- Helps models learn invariance to local geometric distortions

# Grayscale Conversion

## Definition
Grayscale conversion transforms a color image into a single-channel representation where pixel intensity represents brightness, eliminating chrominance information while preserving luminance.

## Mathematical Formulation
For an RGB image with dimensions $(C, H, W)$ where $C=3$, grayscale conversion applies a weighted sum of the RGB channels:

$$Y = \alpha_R \cdot R + \alpha_G \cdot G + \alpha_B \cdot B$$

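The ITU-R BT.601 weights, which reflect the eye's higher sensitivity to green and lower sensitivity to blue, are the most common choice:
$$Y = 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B$$

The ITU-R BT.709 (HDTV) standard uses different weights: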
$$Y = 0.2126 \cdot R + 0.7152 \cdot G + 0.0722 \cdot B$$
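Both weight sets sum to 1, so a white pixel stays white after conversion. A sketch that selects either standard and contracts the channel axis in a single call (the `to_grayscale` name and the `standard` keys are illustrative choices):

```python
import numpy as np

# Luma weights for (R, G, B)
WEIGHTS = {
    "bt601": np.array([0.299, 0.587, 0.114]),
    "bt709": np.array([0.2126, 0.7152, 0.0722]),
}

def to_grayscale(image, standard="bt601"):
    # image: (3, H, W); contract the channel axis with the weight vector
    gray = np.tensordot(WEIGHTS[standard], image, axes=([0], [0]))
    return gray[np.newaxis]  # keep a channel axis: (1, H, W)

rgb = np.zeros((3, 2, 2))
rgb[1] = 1.0  # pure green
print(to_grayscale(rgb, "bt601")[0, 0, 0], to_grayscale(rgb, "bt709")[0, 0, 0])
```

Pure green maps to the G weight of each standard, which shows how much brighter BT.709 rates green.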

## Implementation
```python
def grayscale_conversion(image):
    """
    Convert RGB image to grayscale
    Args:
        image: numpy array of shape (C, H, W) with C=3
    Returns:
        grayscale_image: numpy array of shape (1, H, W)
    """
    # Extract RGB channels
    R, G, B = image[0], image[1], image[2]
    
    # Apply weighted sum
    gray = 0.299 * R + 0.587 * G + 0.114 * B
    
    # Reshape to (1, H, W)
    return gray.reshape(1, gray.shape[0], gray.shape[1])
```

# Normalization

## Definition
Normalization rescales pixel intensities to a fixed range or to a standard distribution. Keeping inputs on a consistent scale typically makes training faster and more stable.

## Mathematical Formulations

### Min-Max Normalization
Scales values to range $[a, b]$ (typically $[0, 1]$):

$$X_{norm} = a + \frac{(X - X_{min})(b - a)}{X_{max} - X_{min}}$$

### Z-Score Normalization
Transforms values to have zero mean and unit variance:

$$X_{norm} = \frac{X - \mu}{\sigma}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the pixel values.

### Channel-wise Normalization
For RGB images, normalization is typically applied per channel, using a mean $\mu_c$ and standard deviation $\sigma_c$ for each channel $c$, often computed over the whole training set:

$$X_{norm}(c,h,w) = \frac{X(c,h,w) - \mu_c}{\sigma_c}$$
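When $\mu_c$ and $\sigma_c$ come from a dataset instead of the current image, they are computed once and reused for every image. The sketch below shows both steps; the mean/std values are the widely used ImageNet statistics for RGB images scaled to $[0, 1]$, and the function names are hypothetical.

```python
import numpy as np

# Widely used ImageNet statistics for RGB images scaled to [0, 1]
MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def normalize_channels(image, mean=MEAN, std=STD):
    # image: (3, H, W) uint8 in [0, 255]; rescale to [0, 1], then standardize per channel
    x = image.astype(np.float64) / 255.0
    return (x - mean) / std

def dataset_channel_stats(images):
    # images: (N, C, H, W) floats; per-channel mean/std over all images and pixels
    return images.mean(axis=(0, 2, 3)), images.std(axis=(0, 2, 3))
```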

## Implementation
```python
import numpy as np

def normalize_minmax(image, a=0, b=1, eps=1e-8):
    """
    Apply per-channel min-max normalization using this image's own statistics
    Args:
        image: numpy array of shape (C, H, W)
        a, b: target range bounds
        eps: lower bound on the value range, avoiding division by zero for constant channels
    Returns:
        normalized image
    """
    C, H, W = image.shape
    normalized = np.zeros_like(image, dtype=float)
    
    for c in range(C):
        channel = image[c].astype(float)
        min_val = np.min(channel)
        max_val = np.max(channel)
        value_range = max(max_val - min_val, eps)
        normalized[c] = a + (channel - min_val) * (b - a) / value_range
    
    return normalized

def normalize_zscore(image, eps=1e-8):
    """
    Apply per-channel z-score normalization using this image's own statistics
    Args:
        image: numpy array of shape (C, H, W)
        eps: lower bound on the standard deviation, avoiding division by zero
    Returns:
        normalized image
    """
    C, H, W = image.shape
    normalized = np.zeros_like(image, dtype=float)
    
    for c in range(C):
        channel = image[c].astype(float)
        mean = np.mean(channel)
        std = max(np.std(channel), eps)
        normalized[c] = (channel - mean) / std
    
    return normalized
```

# Shearing

## Definition
Shearing is an affine transformation that shifts each point parallel to an axis by a distance proportional to its distance from that axis, turning rectangles into parallelograms.

## Mathematical Formulation
Horizontal ($S_x$) and vertical ($S_y$) shears are represented in homogeneous coordinates by the matrices:

$$S_x = \begin{bmatrix} 1 & s_x & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, S_y = \begin{bmatrix} 1 & 0 & 0 \\ s_y & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

For a point $(x, y)$, horizontal shearing transforms it to:

$$x' = x + s_x \cdot y$$
$$y' = y$$

Vertical shearing transforms it to:

$$x' = x$$
$$y' = s_y \cdot x + y$$
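Applying $S_x$ to the corners of a rectangle shows the parallelogram directly. In this sketch with $s_x = 0.5$ and a 4×2 rectangle, the edge at $y = 0$ stays fixed while the edge at $y = 2$ shifts right by $s_x \cdot 2 = 1$.

```python
import numpy as np

s_x = 0.5
S_x = np.array([[1, s_x, 0], [0, 1, 0], [0, 0, 1]])

# Corners of a 4 x 2 rectangle (width 4, height 2) as homogeneous column vectors
corners = np.array([[0, 4, 4, 0],
                    [0, 0, 2, 2],
                    [1, 1, 1, 1]])

sheared = S_x @ corners
print(sheared[:2].T)  # one (x', y') row per corner
```

Opposite edges remain parallel and keep their lengths along $x$, which is what distinguishes a shear from a perspective (trapezoidal) warp.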

## Implementation
```python
import numpy as np

def shear_image(image, shear_x=0, shear_y=0):
    """
    Apply shearing transformation to image
    Args:
        image: numpy array of shape (C, H, W)
        shear_x: horizontal shear factor
        shear_y: vertical shear factor
    Returns:
        sheared image
    """
    C, H, W = image.shape
    result = np.zeros_like(image)
    
    # Forward transformation: both shears applied simultaneously
    # (slightly different from the product S_x S_y; requires shear_x * shear_y != 1)
    transform_matrix = np.array([
        [1, shear_x, 0],
        [shear_y, 1, 0],
        [0, 0, 1]
    ], dtype=float)
    
    # Backward mapping: each output pixel is fetched from the input location
    # given by the inverse transformation, so the output has no holes
    inverse_matrix = np.linalg.inv(transform_matrix)
    
    # Create coordinate matrices
    y_coords, x_coords = np.mgrid[0:H, 0:W]
    coords = np.stack([x_coords.flatten(), y_coords.flatten(), np.ones_like(x_coords).flatten()])
    
    # Source coordinates for every output pixel
    src_coords = inverse_matrix @ coords
    x_src_map, y_src_map = src_coords[0].reshape(H, W), src_coords[1].reshape(H, W)
    
    # Interpolate values; pixels whose source falls outside the image stay 0
    for c in range(C):
        for y in range(H):
            for x in range(W):
                src_x, src_y = x_src_map[y, x], y_src_map[y, x]
                
                if 0 <= src_x < W-1 and 0 <= src_y < H-1:
                    # Bilinear interpolation
                    x0, y0 = int(src_x), int(src_y)
                    dx, dy = src_x - x0, src_y - y0
                    
                    result[c, y, x] = (1-dx)*(1-dy)*image[c, y0, x0] + \
                                     dx*(1-dy)*image[c, y0, x0+1] + \
                                     (1-dx)*dy*image[c, y0+1, x0] + \
                                     dx*dy*image[c, y0+1, x0+1]
```

# Solarization

## Definition
Solarization is a non-linear transformation that inverts pixel intensities at or above a threshold while leaving darker pixels unchanged, so highlights turn dark while shadows keep their original tone.

## Mathematical Formulation
For a pixel intensity value $p$ and threshold $t$:

$$p' = \begin{cases} 
p & \text{if } p < t \\
255 - p & \text{if } p \geq t
\end{cases}$$

Alternative formulation with continuous transition:

$$p' = \begin{cases}
p & \text{if } p < t_1 \\
(255 - p) \cdot \frac{p - t_1}{t_2 - t_1} + p \cdot \frac{t_2 - p}{t_2 - t_1} & \text{if } t_1 \leq p \leq t_2 \\
255 - p & \text{if } p > t_2
\end{cases}$$
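Because an 8-bit image has only 256 possible values, either formulation can be precomputed as a lookup table and applied by indexing. In this sketch, clipping the weight $\frac{p - t_1}{t_2 - t_1}$ to $[0, 1]$ reproduces all three cases of the continuous formula in one expression; rounding to the nearest integer is an assumption, and `solarize_lut` is a hypothetical name.

```python
import numpy as np

def solarize_lut(t1=100, t2=150):
    # Evaluate the piecewise formula once for every possible 8-bit value
    p = np.arange(256, dtype=np.float64)
    w = np.clip((p - t1) / (t2 - t1), 0.0, 1.0)  # 0 below t1, 1 above t2
    return np.round((255 - p) * w + p * (1 - w)).astype(np.uint8)

lut = solarize_lut()
image = np.random.default_rng(0).integers(0, 256, size=(3, 4, 4), dtype=np.uint8)
solarized = lut[image]  # fancy indexing applies the table to every pixel
```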

## Implementation
```python
def solarize(image, threshold=128):
    """
    Apply solarization effect to image
    Args:
        image: numpy array of shape (C, H, W)
        threshold: intensity threshold for inversion
    Returns:
        solarized image
    """
    result = image.copy()
    mask = image >= threshold
    result[mask] = 255 - result[mask]
    return result

def solarize_smooth(image, threshold_low=100, threshold_high=150):
    """
    Apply smooth solarization with transition zone
    Args:
        image: numpy array of shape (C, H, W)
        threshold_low: lower bound of transition
        threshold_high: upper bound of transition
    Returns:
        smoothly solarized image
    """
    result = image.copy()
    
    # No change below threshold_low
    mask_transition = (image >= threshold_low) & (image <= threshold_high)
    mask_invert = image > threshold_high
    
    # Calculate transition weights
    weight = (image[mask_transition] - threshold_low) / (threshold_high - threshold_low)
    
    # Apply smooth transition
    result[mask_transition] = (255 - image[mask_transition]) * weight + image[mask_transition] * (1 - weight)
    
    # Full inversion above threshold_high
    result[mask_invert] = 255 - image[mask_invert]
    
    return result
```

# Posterization

## Definition
Posterization reduces the number of distinct color/intensity levels in an image, creating regions of flat color separated by abrupt transitions, simplifying the visual information.

## Mathematical Formulation
For a pixel value $p$ with bit depth $b$ (typically 8 for standard images) reduced to $n$ bits:

$$p' = \left\lfloor \frac{p \times 2^n}{2^b} \right\rfloor \times \frac{2^b}{2^n}$$

Simplified for 8-bit images:

$$p' = \left\lfloor \frac{p}{2^{8-n}} \right\rfloor \times 2^{8-n}$$
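As a worked example, keeping $n = 2$ bits of an 8-bit pixel with $p = 200$:

$$p' = \left\lfloor \frac{200}{2^{6}} \right\rfloor \times 2^{6} = \lfloor 3.125 \rfloor \times 64 = 192$$

so every output value lands on one of the $2^2 = 4$ levels $\{0, 64, 128, 192\}$.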

## Implementation
```python
def posterize(image, bits=2):
    """
    Reduce image to fewer intensity levels
    Args:
        image: numpy array of shape (C, H, W) with values 0-255
        bits: number of bits to keep (1-8)
    Returns:
        posterized image
    """
    if bits < 1 or bits > 8:
        raise ValueError("Bits must be between 1 and 8")
    
# Number of low-order bits to discard
    shift = 8 - bits
    
    # p' = floor(p / 2^shift) * 2^shift: clear the low bits so only 2^bits levels remain
    result = (image // (2 ** shift)) * (2 ** shift)
    
    # Equivalent bitwise form for uint8 arrays:
    # result = image & np.uint8((0xFF << shift) & 0xFF)
    
    return result
```

# Mosaic Augmentation

## Definition
Mosaic augmentation combines multiple images (typically four) into a single training sample by stitching them together in a grid pattern, enriching object detection training with varied contexts and scales.

## Mathematical Formulation
For four images $I_1, I_2, I_3, I_4$ with sizes $(h_k, w_k)$ and corresponding labels, a new image $I_{mosaic}$ of size $H \times W$ is created by placing each image so that one of its corners touches a random center point $(x_c, y_c)$:

$$I_{mosaic}(x,y) = \begin{cases}
I_1(x + w_1 - x_c,\ y + h_1 - y_c) & \text{if } x < x_c \text{ and } y < y_c \\
I_2(x - x_c,\ y + h_2 - y_c) & \text{if } x \geq x_c \text{ and } y < y_c \\
I_3(x + w_3 - x_c,\ y - y_c) & \text{if } x < x_c \text{ and } y \geq y_c \\
I_4(x - x_c,\ y - y_c) & \text{if } x \geq x_c \text{ and } y \geq y_c
\end{cases}$$

Pixels whose source coordinates fall outside the corresponding image are left at 0.

## Implementation
```python
import random
import numpy as np

def mosaic_augmentation(images, labels, target_size=(640, 640)):
    """
    Combine 4 images into one mosaic
    Args:
        images: list of 4 numpy arrays of shape (C, H, W)
        labels: list of 4 arrays of boxes [class, x_center, y_center, width, height],
                with coordinates normalized to [0, 1] by the source image size
        target_size: tuple of (height, width) for output image
    Returns:
        mosaic_image: combined image
        mosaic_labels: adjusted boxes in the same normalized format
    """
    assert len(images) == 4, "Mosaic requires exactly 4 images"
    
    # Create output arrays
    H, W = target_size
    mosaic_img = np.zeros((images[0].shape[0], H, W), dtype=images[0].dtype)
    mosaic_labels = []
    
    # Choose random center point in the middle half of the canvas
    xc = int(random.uniform(W // 4, 3 * W // 4))
    yc = int(random.uniform(H // 4, 3 * H // 4))
    
    # Process each image and place in mosaic
    for i, (img, img_labels) in enumerate(zip(images, labels)):
        # Original dimensions
        h, w = img.shape[1], img.shape[2]
        
        # Mosaic region (a) and image crop (b); each image touches the center point.
        # min/max keep both regions in bounds when an image is smaller or larger than its quadrant.
        if i == 0:  # top-left
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
        elif i == 1:  # top-right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, W), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), x2a - x1a, h
        elif i == 2:  # bottom-left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(yc + h, H)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, y2a - y1a
        else:  # bottom-right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, W), min(yc + h, H)
            x1b, y1b, x2b, y2b = 0, 0, x2a - x1a, y2a - y1a
        
        # Copy image segment
        mosaic_img[:, y1a:y2a, x1a:x2a] = img[:, y1b:y2b, x1b:x2b]
        
        # Offset that maps image pixel coordinates to mosaic coordinates
        pad_x, pad_y = x1a - x1b, y1a - y1b
        
        # Adjust labels (bounding boxes)
        if len(img_labels) > 0:
            boxes = np.asarray(img_labels, dtype=float)
            
            # Denormalize and convert [x_center, y_center, width, height] to
            # corners [x1, y1, x2, y2] in mosaic pixel coordinates
            corners = np.zeros_like(boxes)
            corners[:, 0] = boxes[:, 0]  # class
            corners[:, 1] = (boxes[:, 1] - boxes[:, 3] / 2) * w + pad_x  # x1
            corners[:, 2] = (boxes[:, 2] - boxes[:, 4] / 2) * h + pad_y  # y1
            corners[:, 3] = (boxes[:, 1] + boxes[:, 3] / 2) * w + pad_x  # x2
            corners[:, 4] = (boxes[:, 2] + boxes[:, 4] / 2) * h + pad_y  # y2
            
            # Clip boxes to the region this image actually occupies in the mosaic
            corners[:, [1, 3]] = corners[:, [1, 3]].clip(x1a, x2a)
            corners[:, [2, 4]] = corners[:, [2, 4]].clip(y1a, y2a)
            
            # Keep boxes that still have positive area
            valid = (corners[:, 3] > corners[:, 1]) & (corners[:, 4] > corners[:, 2])
            
            if np.any(valid):
                # Convert back to [class, x_center, y_center, width, height],
                # normalized by the mosaic dimensions
                v = corners[valid]
                out = np.zeros_like(v)
                out[:, 0] = v[:, 0]  # class
                out[:, 1] = (v[:, 1] + v[:, 3]) / 2 / W  # x_center
                out[:, 2] = (v[:, 2] + v[:, 4]) / 2 / H  # y_center
                out[:, 3] = (v[:, 3] - v[:, 1]) / W  # width
                out[:, 4] = (v[:, 4] - v[:, 2]) / H  # height
                
                mosaic_labels.append(out)
    
    # Combine all valid labels
    if len(mosaic_labels) > 0:
        mosaic_labels = np.vstack(mosaic_labels)
    else:
        mosaic_labels = np.zeros((0, 5))
    
    return mosaic_img, mosaic_labels
```

# GridMask

## Definition
GridMask is a structured occlusion-based data augmentation technique that removes information from images with a grid-like binary mask. Unlike methods that erase a single random region, such as Cutout or Random Erasing, GridMask deletes a regular array of squares. This spreads the occlusion evenly over the image, making it less likely that an object is either removed entirely or left completely untouched, which improves robustness and generalization.

## Mathematical Formulation

For an input image $X \in \mathbb{R}^{C \times H \times W}$, the GridMask augmentation applies a binary mask $M \in \{0,1\}^{H \times W}$ to produce an augmented image:

$$X' = X \odot M$$

where $\odot$ represents element-wise multiplication applied channel-wise.

The mask $M$ is defined by grid parameters:

$$M(i,j) = 
\begin{cases} 
0, & \text{if } i \bmod (d+r) \in [0,d-1] \text{ and } j \bmod (d+r) \in [0,d-1] \\
1, & \text{otherwise}
\end{cases}$$

Where:
- $d$ is the size of each deleted square region
- $r$ is the size of each kept square region
- $(d+r)$ defines the periodicity of the grid

The effective keep ratio $\alpha$ is controlled by:

$$\alpha = 1 - \frac{d^2}{(d+r)^2}$$
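The mask definition and the keep ratio can be checked numerically. The sketch below builds $M$ directly from the piecewise formula for given $d$ and $r$ (the function name `formula_gridmask` is illustrative, not from a library) and compares its mean with $\alpha$; the match is exact only when $H$ and $W$ are multiples of $d+r$.

```python
import numpy as np

def formula_gridmask(H, W, d, r):
    """Build M(i, j) from the piecewise definition: 0 inside d x d holes, 1 elsewhere."""
    period = d + r
    i = np.arange(H)[:, None]  # row indices, shape (H, 1)
    j = np.arange(W)[None, :]  # column indices, shape (1, W)
    # A pixel is deleted only if BOTH its row and its column fall in the first d positions of a period
    deleted = (i % period < d) & (j % period < d)
    return (~deleted).astype(np.uint8)

d, r = 3, 1
M = formula_gridmask(8, 8, d, r)
alpha = 1 - d**2 / (d + r) ** 2
print(M.mean(), alpha)  # 0.4375 0.4375
```

Each 4x4 period contains one 3x3 hole, so 7 of every 16 pixels are kept.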

## RandomGridMask Implementation
```python
import numpy as np

def gridmask(image, d_ratio=0.5, ratio=0.6, rotate=1):
    """
    Apply GridMask augmentation to an image

    Args:
        image: numpy array of shape (C, H, W)
        d_ratio: approximate grid period (d + r) as a fraction of the shorter image side
        ratio: side length of each hole as a fraction of the grid period
        rotate: maximum rotation angle in radians; the angle is drawn
                uniformly from [0, rotate], and 0 disables rotation

    Returns:
        Augmented image with GridMask applied
    """
    C, H, W = image.shape

    # Hole side length d, at least 1 pixel so the grid is well defined
    d = max(1, int(min(H, W) * d_ratio * ratio))

    # Grid period (hole + kept strip), strictly larger than d so some pixels are kept
    grid_size = max(d + 1, int(d / ratio))

    # Pixel coordinates: x indexes columns (j), y indexes rows (i)
    x, y = np.meshgrid(np.arange(W), np.arange(H))

    # Optional random rotation of the grid about the image center
    if rotate:
        angle = np.random.uniform(0, rotate)
        cos_theta, sin_theta = np.cos(angle), np.sin(angle)
        center_x, center_y = W // 2, H // 2

        # Translate to the center, rotate, and translate back
        x_c, y_c = x - center_x, y - center_y
        x = x_c * cos_theta - y_c * sin_theta + center_x
        y = x_c * sin_theta + y_c * cos_theta + center_y

    # Keep a pixel unless it lies inside a d x d hole in BOTH directions,
    # matching M(i, j) = 0 iff i mod (d+r) < d and j mod (d+r) < d
    mask = (x % grid_size >= d) | (y % grid_size >= d)

    # Broadcast the (H, W) mask over all channels and apply it
    return image * mask[np.newaxis, :, :].astype(image.dtype)
```
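The implementation is parameterized differently from the formula: `d_ratio` sets the grid period relative to the shorter image side, and `ratio` sets the hole side as a fraction of that period. The hypothetical helper `gridmask_params` below converts these arguments into the formula's $d$ and $r$ and the resulting keep ratio, using the same integer rounding as `gridmask` for the hole size and the period (it assumes $d \geq 1$).

```python
def gridmask_params(H, W, d_ratio, ratio):
    """Hypothetical helper: map gridmask() arguments to (d, r, alpha) of the formula."""
    d = int(min(H, W) * d_ratio * ratio)  # hole side length in pixels (assumed >= 1)
    period = int(d / ratio)               # grid period d + r
    r = period - d                        # width of the kept strip
    alpha = 1 - d**2 / period**2          # keep ratio when H, W are multiples of the period
    return d, r, alpha

print(gridmask_params(100, 100, d_ratio=0.5, ratio=0.5))  # (25, 25, 0.75)
```

With `ratio` fixed, $\alpha \approx 1 - \text{ratio}^2$ regardless of `d_ratio`, so `d_ratio` mainly controls how coarse the grid is rather than how much of the image is hidden.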

## Advanced Parameters and Implementation

### Adaptive GridMask
The grid scale, and with it the hole size, can be scheduled over training with a cosine ramp:

$$d(t) = d_{min} + \frac{1}{2}(d_{max} - d_{min})(1 - \cos(\frac{\pi t}{T}))$$

where $t$ is the current epoch and $T$ is the total number of epochs, so $d$ rises smoothly from $d_{min}$ at $t=0$ to $d_{max}$ at $t=T$.

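```python
import numpy as np

def adaptive_gridmask(image, epoch, total_epochs, d_min=0.1, d_max=0.5, ratio=0.6, rotate=1):
    """
    Apply adaptive GridMask with parameters changing through training

    Args:
        image: numpy array of shape (C, H, W)
        epoch: current training epoch
        total_epochs: total training epochs
        d_min: d_ratio at the start of training (finest grid, smallest holes)
        d_max: d_ratio at the end of training (coarsest grid, largest holes)
        ratio: hole side as a fraction of the grid period (fixed, so the keep
               ratio stays approximately constant while the holes grow)
        rotate: maximum rotation angle in radians, passed through to gridmask

    Returns:
        Augmented image with adaptive GridMask
    """
    # Training progress t / T, clipped to [0, 1]
    progress = min(max(epoch / total_epochs, 0.0), 1.0)

    # Cosine ramp from d_min (progress = 0) to d_max (progress = 1)
    d_ratio = d_min + 0.5 * (d_max - d_min) * (1 - np.cos(np.pi * progress))

    # Reuse gridmask() defined above with the scheduled d_ratio
    return gridmask(image, d_ratio=d_ratio, ratio=ratio, rotate=rotate)
```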
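Because the cosine ramp is flat at both ends, the hole size changes slowly at the start and end of training and fastest around the midpoint. The sketch below evaluates only the schedule, with the same defaults as `adaptive_gridmask` (`d_min=0.1`, `d_max=0.5`) and an assumed 10-epoch run; the name `cosine_d_ratio` is illustrative.

```python
import math

def cosine_d_ratio(epoch, total_epochs, d_min=0.1, d_max=0.5):
    """d(t) = d_min + 0.5 * (d_max - d_min) * (1 - cos(pi * t / T))."""
    return d_min + 0.5 * (d_max - d_min) * (1 - math.cos(math.pi * epoch / total_epochs))

schedule = [round(cosine_d_ratio(e, 10), 3) for e in range(11)]
print(schedule)  # starts at 0.1, reaches 0.3 at epoch 5, ends at 0.5
```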

## Implementation Details

GridMask's efficacy comes from:

1. **Region-Based Information Deletion**: Removes contiguous regions preserving feature context
2. **Structured Patterns**: Regular patterns help models learn invariances to occlusions
3. **Parameter Scheduling**: Gradually increasing difficulty during training

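GridMask has been reported to help on object detection and segmentation as well as classification, where it has compared favorably with random erasing and Cutout. A plausible reason is the regular grid: it occludes part of every object larger than a hole without erasing any such object completely.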

# Coarse Dropout

## Definition
Coarse Dropout is an augmentation technique that randomly masks out contiguous regions of an image rather than individual pixels, creating coherent "holes" of missing information. This forces models to learn robust features by inferring missing regions from surrounding context.

## Mathematical Formulation

For an input image $X \in \mathbb{R}^{C \times H \times W}$, a binary mask $M \in \{0,1\}^{H \times W}$ is generated with coarse patterns:

$$X' = X \odot M$$

The mask $M$ is defined as:

$$M(i,j) = 
\begin{cases} 
0, & \text{if } (i,j) \in \mathcal{R} \\
1, & \text{otherwise}
\end{cases}$$

Where $\mathcal{R}$ is the set of pixel coordinates belonging to randomly placed dropout regions.

Each candidate dropout region is removed independently with probability:

$$P(\text{dropout region}) = p_{drop}$$
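The formulation can be written directly as code: given a list of rectangular regions forming $\mathcal{R}$, build $M$ and compute $X' = X \odot M$ with the mask broadcast over channels. The fixed region list and the function name `apply_region_mask` are illustrative assumptions; in practice the regions are sampled at random, as in the implementation below.

```python
import numpy as np

def apply_region_mask(X, regions):
    """X: array (C, H, W); regions: list of (y, x, h, w) rectangles forming R."""
    C, H, W = X.shape
    M = np.ones((H, W), dtype=X.dtype)  # 1 = keep
    for y, x, h, w in regions:
        M[y:y + h, x:x + w] = 0          # M(i, j) = 0 for (i, j) in R
    return X * M[np.newaxis, :, :], M    # X' = X * M, broadcast over channels

X = np.ones((3, 4, 4))
X_aug, M = apply_region_mask(X, [(0, 0, 2, 2)])
print(M.sum(), X_aug.sum())  # 12.0 36.0
```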

## Implementation

```python
import numpy as np

def coarse_dropout(image, p_drop=0.1, size_percent=(0.02, 0.2), per_channel=True):
    """
    Apply Coarse Dropout to an image

    Args:
        image: numpy array of shape (C, H, W)
        p_drop: probability of dropping each candidate region
        size_percent: tuple (min_size, max_size) as fractions of the shorter image side
        per_channel: if True, draw an independent mask per channel;
                     otherwise share one mask across all channels

    Returns:
        Augmented image with Coarse Dropout applied
    """
    C, H, W = image.shape
    result = image.copy()

    # Region side lengths in pixels: at least 1 and never larger than the image
    short_side = min(H, W)
    min_size = min(short_side, max(1, int(short_side * size_percent[0])))
    max_size = min(short_side, max(min_size, int(short_side * size_percent[1])))

    # Number of candidate regions; each is then dropped with probability p_drop
    n_holes = int(p_drop * H * W / (min_size * min_size))

    def make_mask():
        # True = keep pixel, False = drop pixel
        mask = np.ones((H, W), dtype=bool)
        for _ in range(n_holes):
            # Random region size in [min_size, max_size] (randint's upper bound is exclusive)
            size_h = np.random.randint(min_size, max_size + 1)
            size_w = np.random.randint(min_size, max_size + 1)

            # Random top-left corner so the region fits inside the image
            y = np.random.randint(0, H - size_h + 1)
            x = np.random.randint(0, W - size_w + 1)

            # Drop this candidate region with probability p_drop
            if np.random.random() < p_drop:
                mask[y:y + size_h, x:x + size_w] = False
        return mask

    if per_channel:
        # Independent mask for every channel
        for c in range(C):
            result[c] *= make_mask()
    else:
        # One mask shared by all channels (broadcast over the channel axis)
        result *= make_mask()[np.newaxis, :, :]

    return result
```

## Advanced Implementation

### Structured Coarse Dropout

A more structured approach uses superpixel segmentation to create semantically meaningful dropout regions:

```python
import numpy as np
from skimage.segmentation import slic

def structured_coarse_dropout(image, p_drop=0.1, n_segments=100):
    """
    Apply Coarse Dropout based on superpixel segmentation

    Args:
        image: numpy array of shape (C, H, W)
        p_drop: fraction of superpixel segments to drop
        n_segments: approximate number of superpixel segments requested from SLIC

    Returns:
        Augmented image with structured Coarse Dropout
    """
    C, H, W = image.shape
    result = image.copy()

    # SLIC expects channels last: (H, W, C)
    img_for_segmentation = np.transpose(image, (1, 2, 0))

    # Perform superpixel segmentation
    segments = slic(img_for_segmentation, n_segments=n_segments, compactness=10)

    # Get unique segment labels
    unique_segments = np.unique(segments)

    # Randomly select a fraction p_drop of the segments to drop
    n_drop = int(p_drop * len(unique_segments))
    drop_segments = np.random.choice(unique_segments, size=n_drop, replace=False)

    # Keep mask: False for every pixel that belongs to a dropped segment
    mask = ~np.isin(segments, drop_segments)

    # Apply the shared mask to all channels
    result *= mask[np.newaxis, :, :]

    return result
```
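The segment-selection step does not depend on SLIC itself: any integer label map works. The sketch below replaces the superpixels with a hand-made 4x4 label map (an assumption for illustration) to show how dropping a fraction of labels becomes a pixel mask via `np.isin`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy label map with four 2x2 "segments" standing in for SLIC superpixels
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 3, 3],
                     [2, 2, 3, 3]])

p_drop = 0.5
labels = np.unique(segments)
n_drop = int(p_drop * len(labels))  # 2 of 4 segments
drop = rng.choice(labels, size=n_drop, replace=False)

# Keep mask: False wherever a pixel belongs to a dropped segment
mask = ~np.isin(segments, drop)
print(sorted(drop.tolist()), mask.sum())  # chosen labels depend on the seed; 8 pixels are always kept
```

Because whole segments are dropped, the number of removed pixels depends on segment sizes, not only on `p_drop`.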

## Implementation Details

Coarse Dropout differs from standard dropout in several key aspects:

1. **Spatial Coherence**: Drops contiguous regions rather than random pixels
2. **Regional Information Loss**: Forces models to learn context-aware features
3. **Semantic Preservation**: When used with superpixels, dropped regions follow color and edge boundaries, so occlusions align with object parts instead of cutting across them

Practical considerations:
- Smaller regions (2-5% of the shorter image side) work well for texture-level features
- Larger regions (10-20%) help with higher-level semantic robustness
- Values of p_drop between 0.05 and 0.2 are a common starting range; the best value depends on the dataset and should be tuned
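To see what these settings mean in pixels, the hypothetical helper `coarse_dropout_budget` below repeats the size and candidate-count arithmetic from `coarse_dropout` for a given image shape: region sides are fractions of the shorter image side, and the number of candidate regions is `int(p_drop * H * W / min_size**2)`.

```python
def coarse_dropout_budget(H, W, p_drop=0.1, size_percent=(0.02, 0.2)):
    """Hypothetical helper: pixel size range and candidate count used by coarse_dropout()."""
    short_side = min(H, W)
    min_size = max(1, int(short_side * size_percent[0]))   # smallest region side in pixels
    max_size = int(short_side * size_percent[1])           # largest region side in pixels
    n_holes = int(p_drop * H * W / (min_size * min_size))  # candidate regions
    expected_dropped = n_holes * p_drop                    # each candidate becomes a hole with prob. p_drop
    return min_size, max_size, n_holes, expected_dropped

print(coarse_dropout_budget(224, 224))  # sizes 4..44 px, 313 candidates, about 31.3 expected holes
```

Note that `p_drop` enters twice, once in the candidate count and once in the per-region test, so the expected number of holes grows roughly with the square of `p_drop`.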

This augmentation helps reduce overfitting and can improve performance on datasets with partial occlusions or missing regions.