<a href="https://colab.research.google.com/github/ravi0dubey/Transfer-Learning-VGG16/blob/main/Data_Augmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Augmentation

Limited data is a major obstacle to applying deep learning models such as convolutional neural networks. Imbalanced classes can be an additional hindrance: while there may be sufficient data for some classes, equally important but undersampled classes will suffer from poor class-specific accuracy.

# What is Data Augmentation?

Wikipedia defines data augmentation as “techniques used to increase the amount of data by adding slightly modified copies of already existing data or newly created synthetic data from existing data.” In other words, data augmentation creates new, representative training examples from the data that is already available.

# Why is it important now?

Machine learning applications, especially in deep learning, continue to diversify and grow rapidly. Data augmentation techniques can help with one of the challenges these applications face: a lack of sufficient, well-balanced training data.

Data augmentation can improve the performance of machine learning models by adding new and varied examples to the training dataset. When the training dataset is rich and sufficient, a model generally performs better and is more accurate.

## Used in image classification and segmentation

For image data, making simple alterations to existing images is a popular form of data augmentation. Generative adversarial networks (GANs) are also used to create new synthetic data. Classic image processing operations for data augmentation include:

- padding
- random rotation
- rescaling
- vertical and horizontal flipping
- translation (moving the image along the X and/or Y axis)
- cropping
- zooming
- darkening, brightening and other color modifications
- grayscaling
- changing contrast
- adding noise
- random erasing

![Source Medium](https://research.aimultiple.com/wp-content/uploads/2021/04/dataaugmention_image_alletranitons.png)
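Several of the classic operations listed above reduce to simple array manipulations. The sketch below implements a handful of them with NumPy on an image stored as an `(H, W, C)` `uint8` array. The helper names are hypothetical, the grayscale weights are the ITU-R BT.601 luma coefficients, and operations that need interpolation (arbitrary-angle rotation, zooming, rescaling) are left to a dedicated library.

```python
import numpy as np

# All helpers take an image as an (H, W, C) uint8 array and return a new array.

def pad(img: np.ndarray, n: int) -> np.ndarray:
    """Add an n-pixel black border on every side."""
    return np.pad(img, ((n, n), (n, n), (0, 0)), mode="constant")

def flip_horizontal(img: np.ndarray) -> np.ndarray:
    """Mirror left to right by reversing the width axis."""
    return img[:, ::-1]

def flip_vertical(img: np.ndarray) -> np.ndarray:
    """Mirror top to bottom by reversing the height axis."""
    return img[::-1, :]

def rotate90(img: np.ndarray, k: int = 1) -> np.ndarray:
    """Rotate by k * 90 degrees counter-clockwise (exact, no interpolation)."""
    return np.rot90(img, k=k, axes=(0, 1))

def translate(img: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Shift dx pixels right and dy pixels down; uncovered pixels become black."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # the whole image has moved out of frame
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
    return out

def random_crop(img: np.ndarray, crop_h: int, crop_w: int,
                rng: np.random.Generator) -> np.ndarray:
    """Cut out a crop_h x crop_w window at a random position."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    return img[top:top + crop_h, left:left + crop_w]

def adjust_brightness(img: np.ndarray, factor: float) -> np.ndarray:
    """Scale intensities: factor < 1 darkens, factor > 1 brightens."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def adjust_contrast(img: np.ndarray, factor: float) -> np.ndarray:
    """Spread (factor > 1) or squeeze (factor < 1) intensities around the mean."""
    x = img.astype(np.float32)
    m = x.mean()
    return np.clip((x - m) * factor + m, 0, 255).astype(np.uint8)

def to_grayscale(img: np.ndarray) -> np.ndarray:
    """Luma-weighted gray (BT.601 weights), repeated into 3 channels."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = img[..., :3].astype(np.float32) @ weights
    gray = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    return np.stack([gray] * 3, axis=-1)

def add_gaussian_noise(img: np.ndarray, sigma: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Random stand-in for a real photo of the same size as the daisy images.
    image = rng.integers(0, 256, size=(147, 240, 3), dtype=np.uint8)
    variants = {
        "flip": flip_horizontal(image),
        "rot180": rotate90(image, k=2),
        "shift": translate(image, dx=20, dy=-10),
        "crop": random_crop(image, 128, 128, rng),
        "dark": adjust_brightness(image, 0.6),
        "gray": to_grayscale(image),
        "noisy": add_gaussian_noise(image, sigma=10.0, rng=rng),
    }
    for name, v in variants.items():
        print(name, v.shape)
```

Each call returns a new array, so one source image can yield many training variants; in practice the parameters (shift size, brightness factor, noise level) would be drawn at random for every sample.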

## Famous Research Papers


### Random Erasing Data Augmentation

[Random Erasing Paper Link](https://arxiv.org/pdf/1708.04896.pdf)
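As we read the paper, Random Erasing works as follows: with probability `p`, choose a rectangle whose area is a random fraction in `[s_l, s_h]` of the image and whose aspect ratio (height / width) lies in `[r_1, 1 / r_1]`, place it at a random position, and fill it with random pixel values. The sketch below follows that description on an `(H, W, C)` `uint8` array. The function name is hypothetical, the defaults are the values we understand the paper to use (treat them as tunable), and the paper's procedure retries until the rectangle fits, whereas this sketch caps the number of attempts.

```python
import math
import numpy as np

def random_erasing(img: np.ndarray, rng: np.random.Generator, p: float = 0.5,
                   s_l: float = 0.02, s_h: float = 0.4, r_1: float = 0.3,
                   max_attempts: int = 100) -> np.ndarray:
    """Erase a random rectangle with random values, with probability p."""
    if rng.random() >= p:
        return img  # leave the image unchanged with probability 1 - p
    h, w = img.shape[:2]
    area = h * w
    for _ in range(max_attempts):  # cap retries instead of looping forever
        target_area = rng.uniform(s_l, s_h) * area
        aspect = rng.uniform(r_1, 1.0 / r_1)  # height / width of the rectangle
        erase_h = int(round(math.sqrt(target_area * aspect)))
        erase_w = int(round(math.sqrt(target_area / aspect)))
        if erase_h < 1 or erase_w < 1:
            continue
        top = int(rng.integers(0, h))
        left = int(rng.integers(0, w))
        if top + erase_h <= h and left + erase_w <= w:  # rectangle must fit
            out = img.copy()
            noise = rng.integers(0, 256, size=(erase_h, erase_w) + img.shape[2:])
            out[top:top + erase_h, left:left + erase_w] = noise.astype(img.dtype)
            return out
    return img  # no valid rectangle found within max_attempts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.zeros((32, 32, 3), dtype=np.uint8)
    erased = random_erasing(image, rng, p=1.0)
    print("pixels changed:", int((erased != 0).any(axis=2).sum()))
```

Because the erased region varies in size, shape and position, the network cannot rely on any single part of the object being visible, which is the regularizing effect the paper aims for.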

### Improved Regularization of Convolutional Neural Networks with Cutout

[Cutout Paper](https://arxiv.org/pdf/1708.04552.pdf)
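Cutout is closely related but simpler: it zeroes out a square of fixed side length centred at a random pixel, and the centre may sit near the border so that part of the square falls outside the image. A minimal sketch, assuming an `(H, W, C)` array; `cutout` and its `length` parameter are illustrative names, and this version applies the mask on every call.

```python
import numpy as np

def cutout(img: np.ndarray, length: int, rng: np.random.Generator) -> np.ndarray:
    """Zero a length x length square centred at a random pixel (clipped at edges)."""
    h, w = img.shape[:2]
    cy = int(rng.integers(0, h))  # centre can be any pixel, including border pixels
    cx = int(rng.integers(0, w))
    y1, y2 = max(cy - length // 2, 0), min(cy + length // 2, h)
    x1, x2 = max(cx - length // 2, 0), min(cx + length // 2, w)
    out = img.copy()
    out[y1:y2, x1:x2] = 0  # the part of the square inside the image is masked
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.full((32, 32, 3), 255, dtype=np.uint8)
    masked = cutout(image, length=16, rng=rng)
    print("masked pixels:", int((masked == 0).all(axis=2).sum()))
```

Compared with Random Erasing, the masked region here has a fixed size and shape and a constant fill value, so the only randomness is the position.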

### Data Augmentation using Random Image Cropping and Patching for Deep CNNs

[RICAP Paper](https://arxiv.org/pdf/1811.09030.pdf)
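RICAP builds one training image from crops of four different images placed in the four corners. The boundary point between the crops is drawn from a Beta distribution, and the four labels are mixed in proportion to the area each crop occupies. The sketch below follows that idea; the function name and signature are hypothetical, `beta=0.3` is only a default to tune, and the images are assumed to share the shape `(H, W, C)`.

```python
import numpy as np

def ricap(images: np.ndarray, labels, num_classes: int,
          rng: np.random.Generator, beta: float = 0.3):
    """Patch crops of four images into one and return (image, soft_label).

    images: array of shape (4, H, W, C); labels: four integer class ids.
    """
    _, H, W = images.shape[:3]
    # Boundary point (w, h) splitting the output into four regions.
    w = int(round(W * rng.beta(beta, beta)))
    h = int(round(H * rng.beta(beta, beta)))
    # Region sizes and (top, left) corners: top-left, top-right, bottom-left, bottom-right.
    sizes = [(h, w), (h, W - w), (H - h, w), (H - h, W - w)]
    corners = [(0, 0), (0, w), (h, 0), (h, w)]
    out = np.empty_like(images[0])  # the four regions tile the whole output
    soft_label = np.zeros(num_classes)
    for k, ((hk, wk), (oy, ox)) in enumerate(zip(sizes, corners)):
        # Random crop of size (hk, wk) from the k-th image.
        top = int(rng.integers(0, H - hk + 1))
        left = int(rng.integers(0, W - wk + 1))
        out[oy:oy + hk, ox:ox + wk] = images[k, top:top + hk, left:left + wk]
        # Each label is weighted by the fraction of the output its crop covers.
        soft_label[labels[k]] += hk * wk / (H * W)
    return out, soft_label

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = np.stack([np.full((8, 10, 3), v, np.uint8) for v in (10, 20, 30, 40)])
    patched, target = ricap(batch, [0, 1, 2, 1], num_classes=3, rng=rng)
    print(patched.shape, target)
```

The soft label sums to 1 and would replace the one-hot target in a cross-entropy loss.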



## What are the benefits of data augmentation?

Benefits of data augmentation include:

1. Improving model prediction accuracy by
   - adding more training data to the models
   - mitigating data scarcity
   - reducing overfitting
   - improving the models' ability to generalize
   - helping resolve class imbalance in classification
2. Reducing the cost of collecting and labeling data

## What are the challenges of data augmentation?

1. Companies need systems for evaluating the quality of augmented datasets. As the use of data augmentation methods grows, assessing the quality of their output will become necessary.
2. The field needs further research into creating new, synthetic data for advanced applications. For example, generating high-resolution images with GANs is challenging.
3. If the real dataset contains biases, data augmented from it will contain the same biases. Identifying an appropriate data augmentation strategy is therefore important.

## Data augmentation frameworks

1. imgaug
2. Augmentor
3. Albumentations

imgaug on GitHub: https://github.com/aleju/imgaug


```python
# Mount Google Drive so the notebook can read the training images stored there.
from google.colab import drive
drive.mount('/content/drive')
```

```text
Mounted at /content/drive
```


```python
# Folder holding the training images of the "daisy" class.
data_path = '/content/drive/MyDrive/VGGNet/Flowers/train/daisy'
```


```python
# Install Augmentor (the leading "!" runs a shell command from the notebook).
!pip install Augmentor
```

```text
Collecting Augmentor
  Downloading Augmentor-0.2.12-py2.py3-none-any.whl (38 kB)
Installing collected packages: Augmentor
Successfully installed Augmentor-0.2.12
```


```python
import Augmentor
```

```python
# Create a pipeline over every image found in data_path.
p = Augmentor.Pipeline(data_path)
```

```text
Initialised with 120 image(s) found.
Output directory set to /content/drive/MyDrive/VGGNet/Flowers/train/daisy/output.
```

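```python
# Add operations; each is applied to a sampled image with the given probability.
p.flip_left_right(probability=0.3)  # mirror horizontally 30% of the time
p.rotate180(probability=0.8)        # rotate by 180 degrees 80% of the time
```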

```python
# Generate 1000 augmented images and save them to the output directory.
p.sample(1000)
```

```text
Processing <PIL.Image.Image image mode=RGB size=240x147 at 0x7D988C2772E0>: 100%|██████████| 1000/1000 [00:15<00:00, 65.71 Samples/s]
```
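The cells above attach two operations to the pipeline and then request 1000 samples from the 120 daisy images. The NumPy sketch below mimics that behaviour under two assumptions about Augmentor's design: that `sample` picks source images uniformly at random with replacement (so each original is reused about 1000 / 120 ≈ 8.3 times on average), and that each operation fires independently with exactly its nominal probability. The names `augment_once`, `sample` and `outcome_probabilities` are hypothetical and are not part of Augmentor. Because a left-right flip followed by a 180° rotation equals a top-bottom flip, the pipeline has four possible net results: unchanged with probability 0.7 × 0.2 = 0.14, left-right flip 0.3 × 0.2 = 0.06, rotated 180° 0.7 × 0.8 = 0.56, and top-bottom flip 0.3 × 0.8 = 0.24.

```python
import numpy as np

# Nominal probabilities copied from the notebook cells above.
P_FLIP_LR = 0.3
P_ROT180 = 0.8

def augment_once(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Run the two operations in pipeline order, each with its own probability."""
    if rng.random() < P_FLIP_LR:
        img = img[:, ::-1]      # like flip_left_right: reverse the width axis
    if rng.random() < P_ROT180:
        img = img[::-1, ::-1]   # like rotate180: reverse height and width axes
    return img

def sample(images: list, n: int, rng: np.random.Generator) -> list:
    """Pick n source images uniformly at random, with replacement, and augment each."""
    picks = rng.integers(0, len(images), size=n)
    return [augment_once(images[i], rng) for i in picks]

def outcome_probabilities(p_flip: float, p_rot: float) -> dict:
    """Exact probability of each net transformation of the two-step pipeline."""
    return {
        "unchanged": (1 - p_flip) * (1 - p_rot),
        "left-right flip": p_flip * (1 - p_rot),
        "rotate 180": (1 - p_flip) * p_rot,
        "top-bottom flip": p_flip * p_rot,  # flip LR then rotate 180 == flip TB
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three tiny distinct "images" stand in for the 120 daisy photos.
    originals = [np.arange(6).reshape(2, 3) + 10 * k for k in range(3)]
    outputs = sample(originals, 1000, rng)
    unchanged = np.mean([any(np.array_equal(o, x) for x in originals) for o in outputs])
    print("empirical unchanged fraction:", unchanged)
    print("exact probabilities:", outcome_probabilities(P_FLIP_LR, P_ROT180))
```

With these settings roughly one output in seven is an untouched copy of its source, so raising the probabilities or adding further operations is one way to make the 1000 samples more varied.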
