---
title: "Image Augmentation | Log #003"
description: "Resize, Crop, Rotate, Flip..."
date: 2025-01-03
author: "Hemanth KR"
categories: [Vision]
image: "thumbnails/003.jpg"
---

Image augmentation is a staple of computer vision work. It serves several goals, such as improving model performance and expanding limited datasets. Let's look at why it helps before going through the types of augmentation.

## Reasons to Augment

(@) #### Prevent overfitting - 
    By creating variations of the original images, we expose the model to a wider range of possible inputs, which helps it generalize.

(@) #### Expanding Datasets - 
    When working with small datasets, augmentation can increase the effective size of the training data. This is useful in deep learning, where models generally need a large amount of data to perform well.

(@) #### Real-world challenges -  
    Image recognition tasks often face challenges like occlusion, varying light intensity and different viewing angles. Augmentation can introduce these conditions during training so the model learns to perform well under them.

While augmentation is beneficial, the techniques should be chosen carefully for the specific task. A poorly chosen augmentation can reduce model performance. For example, vertically flipping handwritten digits can turn a 6 into a 9, so the augmented image no longer matches its label.

## Types of Augmentation

Common image augmentation techniques include:

 - Horizontal and vertical flipping
 - Rotating by various angles
 - Increasing or decreasing image size
 - Randomly cropping a portion of the image and resizing it
 - Randomly altering color channels
 - Converting to grayscale
 - Adding Gaussian noise
 - Blurring

There are many ways to implement these techniques. I'll use PyTorch's `torchvision` library to show a few of them.

```python
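from PIL import Image
from torchvision import transforms

# Placeholder image so the snippet runs on its own; swap in a real image as needed
img = Image.new("RGB", (64, 64))

# Resize an image to 32x32 pixels
resize = transforms.Resize((32, 32))
resized_img = resize(img)

# Rotate an image by a random angle between -90 and +90 degrees
rotate = transforms.RandomRotation(degrees=90)
rotated_img = rotate(img)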
```

We can also chain multiple transforms in PyTorch. The chained transform is typically passed to a dataset, which a `DataLoader` then wraps to serve batches.


```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor()
])
```

In the above example, we chained the transforms using `Compose`. The final step, `ToTensor`, converts the image to a PyTorch tensor (channels-first, with float values scaled to the range 0 to 1), which is the format PyTorch models expect.

In the next post, we'll look at neural networks and how they learn from this image data.