# IMAGE AUGMENTATION:
- A technique that applies different *transformations to the original images*, producing multiple transformed copies of the same image.
- Each copy differs from the others in certain respects, depending on the augmentation techniques applied (shifting, rotating, flipping, etc.).
- Used to expand the size of the dataset and to add variation to it, which helps the model generalize better to unseen data.
- In Keras, use *ImageDataGenerator*.

### Keras ImageDataGenerator class :
- Lets you augment your images in real time while the model is training (*real-time data augmentation*).
- Random transformations can be applied to each training image as it is passed to the model.
- Saves memory and makes the model more robust.
- Produces a large corpus of similar images without having to collect new ones, which is often not feasible in a real-world scenario.

- It ensures that the model receives new variations of the images at each epoch. It only returns the transformed images and does not add them to the original corpus of images. [Seeing the same original images many times can lead to *overfitting*.]
- Requires less memory:
   * Without this class: we load all the images at once.
   * With it: we load the images in batches, which saves a lot of memory.
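
The batch-wise, on-the-fly behaviour described above can be sketched with plain NumPy. This is only an illustration of the idea, not how Keras implements it: the helper `augmented_batches`, the toy 4x4 "images", and the choice of a random horizontal flip are all assumptions made for this sketch.

```python
import numpy as np

def augmented_batches(images, batch_size, rng):
    """Yield batches of randomly flipped copies; `images` itself is never modified."""
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size].copy()
        # Flip each image horizontally with probability 0.5
        flip = rng.random(len(batch)) < 0.5
        batch[flip] = batch[flip, :, ::-1]
        yield batch

rng = np.random.default_rng(0)
images = np.arange(10 * 4 * 4, dtype=np.float32).reshape(10, 4, 4)  # 10 toy 4x4 "images"
original = images.copy()

for epoch in range(2):
    # Each epoch draws fresh random flips, so the model would see new variations
    shapes = [batch.shape for batch in augmented_batches(images, batch_size=4, rng=rng)]
    print(f"epoch {epoch}: batch shapes {shapes}")
```

Only one batch is held in memory at a time, and the source array stays untouched, which mirrors the two points in the list above.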

# Rotation:
- Rotate images randomly by providing a value in degrees (between 0 and 360) in the *rotation_range* argument. Each image is then rotated by a random angle between -rotation_range and +rotation_range.
- When an image is rotated, some pixels move outside the image frame and leave empty areas that need to be filled in.
- We can fill these in different ways, such as with a constant value or with the nearest pixel values.
- This is specified in the *fill_mode* argument. The default value is *nearest*, which *simply replaces the empty area with the nearest pixel values*.
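
To make the fill modes concrete, here is a small one-dimensional sketch using `np.pad` on a single row of pixels. The mapping from Keras `fill_mode` names to `np.pad` modes is this sketch's assumption, chosen to match the patterns given in the Keras documentation; it is not what Keras calls internally.

```python
import numpy as np

row = np.array([1, 2, 3, 4])  # one row of pixel values
pad = 2                       # number of empty pixels to fill on each side

# Keras fill_mode name -> np.pad mode that produces the same pattern (assumed mapping)
fills = {
    "constant": np.pad(row, pad, mode="constant", constant_values=0),
    "nearest": np.pad(row, pad, mode="edge"),
    "reflect": np.pad(row, pad, mode="symmetric"),
    "wrap": np.pad(row, pad, mode="wrap"),
}
for name, filled in fills.items():
    print(f"{name:>8}: {filled}")
# constant: [0 0 1 2 3 4 0 0]
#  nearest: [1 1 1 2 3 4 4 4]
#  reflect: [2 1 1 2 3 4 4 3]
#   wrap: [3 4 1 2 3 4 1 2]
```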

In [1]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Input path for the images
base_path = "/Users/pratiksha/Documents/Pratiksha/Documents/GitHub/GitHub/Face-expression-recognition-with-Deep-Learning/images/"
# Size of the image: 48x48 pixels
pic_size = 48

# number of images to feed into the NN for every batch
batch_size = 128

# ImageDataGenerator yields batches of images, optionally with random transformations.
# With no arguments (as here) no augmentation is applied; pass e.g. rotation_range to enable it.
datagen_train = ImageDataGenerator()
datagen_validation = ImageDataGenerator()


In [2]:
# flow_from_directory(): reads images directly from the directory and applies the generator's
# transformations on the fly while the model trains on the training data.

train_generator = datagen_train.flow_from_directory(directory = base_path + "train",
                                                    target_size = (pic_size,pic_size), # every image is resized to this size
                                                    color_mode = "grayscale",
                                                    batch_size= batch_size,
                                                    class_mode='categorical',
                                                    shuffle = True,   # shuffle the order of the images
)
# class_mode options:
#   1. "categorical": multi-class classification; labels are returned as a 2D one-hot encoded array.
#   2. "binary": binary classification; labels are 0 or 1.
#   3. "sparse": multi-class classification; labels are integer class indices (useful when the number of classes is large).
#   4. "input": for autoencoders; the target returned with each batch is the input image itself.
#   5. None: no labels are returned.


Found 28821 images belonging to 7 classes.
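
The difference between "categorical" and "sparse" labels can be shown with a short NumPy sketch. The three integer labels below are hypothetical; only the class count of 7 comes from the output above.

```python
import numpy as np

num_classes = 7                      # matches the 7 classes reported above
sparse_labels = np.array([0, 3, 6])  # hypothetical integer class indices for 3 images

# "categorical": one row per image, with a single 1 in the column of its class
categorical_labels = np.eye(num_classes, dtype=np.float32)[sparse_labels]

print(categorical_labels.shape)
print(categorical_labels.argmax(axis=1))  # recovers the integer labels
```

One-hot labels pair with a categorical cross-entropy loss, while integer labels pair with its sparse variant.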


- In the code above and below we use base_path + "train" and base_path + "validation" because the directory structure looks something like this:
/dataset
    /train
        /class1
        /class2
        ...
    /validation
        /class1
        /class2
        ...

* base_path: the root path, pointing to the parent directory of train and validation.
* "train": the subdirectory within base_path that contains the training images.
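
With this layout, each class subdirectory becomes one label, and the label indices follow the alphanumeric order of the subdirectory names. A minimal sketch of that mapping using only the standard library is shown here; the temporary directory and the class names `sad`, `angry`, and `happy` are hypothetical stand-ins, not the actual folders of this dataset.

```python
import os
import tempfile

# Build a throwaway directory tree; the class names here are hypothetical.
root = tempfile.mkdtemp()
for split in ("train", "validation"):
    for class_name in ("sad", "angry", "happy"):
        os.makedirs(os.path.join(root, split, class_name))

# os.path.join avoids depending on a trailing separator in the base path
train_dir = os.path.join(root, "train")
class_names = sorted(
    name for name in os.listdir(train_dir)
    if os.path.isdir(os.path.join(train_dir, name))
)
class_indices = {name: index for index, name in enumerate(class_names)}
print(class_indices)  # {'angry': 0, 'happy': 1, 'sad': 2}
```

On a real generator, the same mapping is available as `train_generator.class_indices`.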

In [4]:
validation_generrator = datagen_validation.flow_from_directory(directory = base_path + "validation",
                                                               target_size = (pic_size,pic_size), #size of i/p images -- every image will be resized to this size.
                                                               color_mode = "grayscale",
                                                               batch_size= batch_size,
                                                               class_mode='categorical', 
                                                               shuffle = False,)

# Training data:   shuffle = True randomizes the order of the images each epoch, which helps the model generalize better.
# Validation data: shuffle = False keeps the order fixed, so evaluation is repeatable and predictions line up with the generator's labels.

Found 7066 images belonging to 7 classes.
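
As a quick check on what these counts mean for training, the number of batches per epoch is the image count divided by the batch size, rounded up. The arithmetic below uses the two "Found ... images" figures and the batch size of 128 from the cells above.

```python
import math

batch_size = 128
train_images = 28821       # from the "Found ... images" output for train
validation_images = 7066   # from the "Found ... images" output for validation

train_batches = math.ceil(train_images / batch_size)
validation_batches = math.ceil(validation_images / batch_size)
# The final batch of an epoch is smaller when the count is not a multiple of batch_size
last_train_batch = train_images - (train_batches - 1) * batch_size

print(train_batches, validation_batches, last_train_batch)  # 226 56 21
```

These values should match `len(train_generator)` and the length of the validation generator, since a Keras directory iterator reports its length in batches.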
