# Data Augmentation of the Training Dataset

Data augmentation encompasses a wide range of techniques used to generate “new” training samples from the original ones by applying random jitters and perturbations (but at the same time ensuring that the class labels of the data are not changed).

Given that our network is constantly seeing new, slightly modified versions of the input data, the network is able to learn more robust features.
At test time we do not apply data augmentation; we simply evaluate the trained network on the unmodified test data. In most cases, you’ll see an increase in test accuracy, perhaps at the expense of a slight dip in training accuracy.

The Keras `ImageDataGenerator` class works by:

1.   Accepting a batch of images used for training.
2.   Taking this batch and applying a series of random transformations to each image in the batch (including random rotation, resizing, shearing, etc.).
3.   Replacing the original batch with the new, randomly transformed batch.
4.   Training the CNN on this randomly transformed batch (i.e., the original data itself is not used for training).
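
The four steps above can be sketched without Keras. The following is a minimal NumPy illustration, not the `ImageDataGenerator` implementation: the names `random_transform` and `augmented_batches` are hypothetical, and a horizontal flip plus a wrap-around horizontal shift stand in for the richer set of transformations Keras offers.

```python
import numpy as np


def random_transform(image, rng):
    """Return a randomly perturbed copy of one (height, width, channels) image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # random horizontal flip
    shift = int(rng.integers(-2, 3))  # shift by -2..2 pixels
    # np.roll wraps pixels around the edge; Keras fills according to fill_mode.
    return np.roll(out, shift, axis=1)


def augmented_batches(images, labels, batch_size, rng):
    """Yield (batch, labels) pairs forever, transforming every image on the fly."""
    n = len(images)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Step 1: take a batch of training images.
            batch = images[idx]
            # Steps 2-3: transform each image; the new batch replaces the original.
            batch = np.stack([random_transform(img, rng) for img in batch])
            # Step 4: hand the transformed batch (labels unchanged) to training.
            yield batch, labels[idx]


rng = np.random.default_rng(0)
images = rng.random((8, 16, 16, 3))   # eight toy RGB images (assumed data)
labels = np.arange(8) % 2             # toy class labels (assumed data)
batch, batch_labels = next(augmented_batches(images, labels, 4, rng))
print(batch.shape, batch_labels.shape)
```

Note that the labels pass through untouched: only the pixels are perturbed, which is what keeps the augmented samples valid for training.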

## Below, we use ImageDataGenerator to generate additional training images and improve generalization

In [None]:
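import glob

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
import pandas as pd
from PIL import Image

# Random transformations applied to every generated image.
datagen = ImageDataGenerator(
        rotation_range=40,        # rotate by up to 40 degrees
        width_shift_range=0.2,    # shift horizontally by up to 20% of the width
        height_shift_range=0.2,   # shift vertically by up to 20% of the height
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')      # fill newly exposed pixels with the nearest pixel value

list_of_files = glob.glob("/content/drive/My Drive/Naxbit/CNN/Train/Stone Pendant/*.jpg")
for file in list_of_files:
  img = load_img(file)    #target_size=(64,64))  # this is a PIL image
  x = img_to_array(img)   # Numpy array, shape (height, width, 3) with the default channels-last format
  x = x.reshape((1,) + x.shape)  # add a batch axis: shape (1, height, width, 3)

  # flow() yields batches indefinitely, so the loop has to be stopped by hand.
  # Each generated batch is also written to save_to_dir as a JPEG.
  i = 0
  for batch in datagen.flow(x, batch_size=1,
                            save_to_dir='/content/drive/My Drive/Naxbit/CNN/Datagen/Stone Pendant', save_prefix='10', save_format='jpg'):
      i += 1
      if i > 20:
          break  # i counts 1..21, so 21 batches are generated per source image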
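
Because `datagen.flow` yields batches indefinitely, the loop needs an explicit stop. A possible alternative to the manual counter is `itertools.islice`, which takes an exact number of items from any iterator. The sketch below uses a hypothetical stand-in generator, `endless_batches`, instead of Keras so that it runs on its own.

```python
from itertools import islice


def endless_batches():
    """Hypothetical stand-in for an iterator that never stops on its own."""
    n = 0
    while True:
        n += 1
        yield f"batch-{n}"


# Take exactly 20 batches, then stop; no counter or break is needed.
taken = list(islice(endless_batches(), 20))
print(len(taken), taken[0], taken[-1])
```

With Keras, the same pattern would presumably be `for batch in islice(datagen.flow(...), 20): pass`, assuming the iterator returned by `flow` is consumed one batch at a time as in the loop above.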

## The augmented images generated above are stored in the CNN/Datagen directory in Google Drive.
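
One quick way to confirm that the augmented files were written is to count them per class folder. This is a small sketch using only the standard library; the helper name `count_images` is hypothetical, and the demonstration runs against a temporary directory in place of the Drive path.

```python
import tempfile
from pathlib import Path


def count_images(root, pattern="*.jpg"):
    """Return {class_folder_name: number of matching files} for each subfolder of root."""
    root = Path(root)
    return {
        folder.name: len(list(folder.glob(pattern)))
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }


# Demonstration on a throwaway directory that mimics Datagen/<class name>/.
with tempfile.TemporaryDirectory() as tmp:
    class_dir = Path(tmp) / "Stone Pendant"
    class_dir.mkdir()
    for k in range(3):
        (class_dir / f"10_0_{k}.jpg").touch()  # empty placeholder files
    counts = count_images(tmp)
    print(counts)
```

For this notebook, `root` would be the `Datagen` folder on the mounted Drive.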