# Part B - Data Augmentation

So far, we've selected a model architecture that vastly improves the model's performance, as it is designed to recognize important features in the images. The validation accuracy is still lagging behind the training accuracy, which is a sign of overfitting: the model is getting confused by things it has not seen before when it tests against the validation dataset.

In order to teach our model to be more robust when looking at new data, we're going to programmatically increase the size and variance in our dataset. This is known as [*data augmentation*](https://link.springer.com/article/10.1186/s40537-019-0197-0), a useful technique for many deep learning applications.

The increase in size gives the model more images to learn from while training. The increase in variance helps the model ignore unimportant features and select only the features that are truly important in classification, allowing it to generalize better.

### Learning Objectives
* Augment the ASL dataset
* Use the augmented data to train the model
* Save the trained model to disk for inference

### Preparing the Data
We are in a new notebook, so we need to load and process the data again.

In [None]:
from google.colab import drive
drive.mount("/content/drive")

In [None]:
import tensorflow.keras as keras
import pandas as pd

In [None]:
# Load in the data from CSV files
train_df = pd.read_csv("/content/drive/MyDrive/asl_data/sign_mnist_train.csv")
valid_df = pd.read_csv("/content/drive/MyDrive/asl_data/sign_mnist_valid.csv")

# Separate target values
y_train = train_df['label']
y_valid = valid_df['label']
del train_df['label']
del valid_df['label']

# Separate image vectors
x_train = train_df.values
x_valid = valid_df.values

# Turn scalar targets into binary categories
num_classes = 24
y_train = keras.utils.to_categorical(y_train, num_classes)
y_valid = keras.utils.to_categorical(y_valid, num_classes)

# Normalize training and validation data
x_train = x_train / 255
x_valid = x_valid / 255

# Reshape the image data for the convolutional network
x_train = x_train.reshape(-1,28,28,1)
x_valid = x_valid.reshape(-1,28,28,1)
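
To make the preprocessing steps above concrete, here is a minimal NumPy sketch that applies the same three ideas (one-hot encoding, scaling to the 0-1 range, and reshaping) to a few synthetic images. The random data and the names `pixels`, `labels`, `one_hot` and `images` are illustrative assumptions, not part of the ASL dataset, and `np.eye(...)[labels]` is used here as a stand-in for what `to_categorical` produces.

```python
import numpy as np

# Synthetic stand-in for the CSV contents (hypothetical data):
# 5 flattened 28x28 images with pixel values 0..255 and labels 0..23.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(5, 784))
labels = np.array([0, 3, 23, 7, 3])
num_classes = 24

# One-hot encode: row i has a single 1 in column labels[i].
one_hot = np.eye(num_classes)[labels]

# Scale pixel values to [0, 1], then reshape to
# (number_of_images, height, width, channels).
images = (pixels / 255).reshape(-1, 28, 28, 1)

print(one_hot.shape)  # (5, 24)
print(images.shape)   # (5, 28, 28, 1)
```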

### Model Creation
We create the same model architecture as in the last section.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Dense,
    Conv2D,
    MaxPool2D,
    Flatten,
    Dropout,
    BatchNormalization,
)

In [None]:
model = Sequential()
model.add(Conv2D(75, (3, 3), strides=1, padding="same", activation="relu", input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Conv2D(50, (3, 3), strides=1, padding="same", activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Conv2D(25, (3, 3), strides=1, padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Flatten())
model.add(Dense(units=512, activation="relu"))
model.add(Dropout(0.3))
model.add(Dense(units=num_classes, activation="softmax"))

### Data Augmentation
Before compiling the model, let's set up our data augmentation.

Keras comes with an image augmentation class called `ImageDataGenerator`. You can check out the [documentation here](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator). We will use a basic augmentation strategy below:

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [None]:
datagen = ImageDataGenerator(
    rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
    zoom_range=0.1,  # Randomly zoom image
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images horizontally
    vertical_flip=False, # Don't randomly flip images vertically
)  

##### Why do we flip images horizontally, but not vertically?
The dataset contains pictures of hands signing the alphabet. If we want to use this model to classify hand images later, those hands are unlikely to be upside-down, but they might be left-handed.
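
The difference between the two flips can be seen on a tiny array. This is only a NumPy sketch of the idea, assuming a 3x3 array as a stand-in for a real image; it does not use `ImageDataGenerator` itself.

```python
import numpy as np

# Tiny 3x3 "image" standing in for a 28x28 photo (hypothetical data).
image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Mirror left-to-right: the kind of image horizontal_flip=True can produce.
mirrored = np.fliplr(image)

# Mirror top-to-bottom: the kind of image vertical_flip=True would produce.
upside_down = np.flipud(image)

print(mirrored)
print(upside_down)
```

A mirrored right hand looks like a left hand making the same sign, while the upside-down version shows a pose the model is unlikely to meet.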

### Batch Size
Another benefit of the `ImageDataGenerator` is that it [batches](https://machinelearningmastery.com/how-to-control-the-speed-and-stability-of-training-neural-networks-with-gradient-descent-batch-size/) the data so that our model can train on a random sample.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [None]:
batch_size = 32
img_iter = datagen.flow(x_train, y_train, batch_size=batch_size)

x, y = next(img_iter)  # the built-in next() works across Keras versions
fig, ax = plt.subplots(nrows=4, ncols=8)
for i in range(batch_size):
    image = x[i]
    ax.flatten()[i].imshow(np.squeeze(image))
plt.show()
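
Batching on its own, without any augmentation, can be sketched in a few lines of NumPy. The helper `iterate_batches` and the demo arrays are hypothetical names introduced for illustration; this is a simplified picture of shuffled mini-batches, not the code behind `datagen.flow`.

```python
import numpy as np

def iterate_batches(x, y, batch_size, rng):
    """Yield shuffled (x, y) mini-batches that cover the data exactly once."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

# Ten toy samples whose "image" value equals their label.
x_demo = np.arange(10).reshape(10, 1)
y_demo = np.arange(10)

batches = list(iterate_batches(x_demo, y_demo, batch_size=4,
                               rng=np.random.default_rng(0)))
batch_sizes = [len(xb) for xb, _ in batches]
print(batch_sizes)  # [4, 4, 2]
```

The last batch is smaller because 10 is not a multiple of 4; the same happens with the real dataset whenever its size is not a multiple of `batch_size`.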

### Fitting the Data to the Generator
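The generator's `fit` method computes statistics from the training dataset. It is only required when the feature-wise options (`featurewise_center`, `featurewise_std_normalization` or `zca_whitening`) are enabled, so with the settings used above it is optional but harmless.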

In [None]:
datagen.fit(x_train)
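
As a rough illustration of the kind of dataset-wide statistics involved, the NumPy sketch below computes a per-channel mean and standard deviation and uses them to standardize the data. It runs on random data (`x_demo` is a hypothetical stand-in for `x_train`) and approximates the idea rather than reproducing Keras's implementation.

```python
import numpy as np

# Random stand-in for the training images (hypothetical data).
rng = np.random.default_rng(0)
x_demo = rng.random((100, 28, 28, 1))

# Per-channel statistics taken over all images and pixel positions.
mean = x_demo.mean(axis=(0, 1, 2), keepdims=True)
std = x_demo.std(axis=(0, 1, 2), keepdims=True)

# Feature-wise standardization: zero mean, unit variance per channel.
standardized = (x_demo - mean) / (std + 1e-6)

print(mean.shape)  # (1, 1, 1, 1)
```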

### Compiling the Model
With the data generator instance created and fit to the training data, the model can now be compiled in the same way as we did previously:

In [None]:
model.compile(loss='categorical_crossentropy', metrics=['accuracy'])

### Training with Augmentation
When using an image data generator with Keras, a model trains a bit differently: instead of just passing the `x_train` and `y_train` datasets into the model, we pass the generator in, calling the generator's [flow](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator#flow) method. This causes the images to get augmented live and in memory right before they are passed into the model for training.

Generators can supply an indefinite amount of data. When we use one to train our model, we need to set explicitly how long each epoch should run; otherwise the epoch would go on indefinitely, with the generator creating an endless stream of augmented images for the model.

We set the length of each epoch with the `steps_per_epoch` argument. Because `steps * batch_size = number_of_images_trained in an epoch`, a common practice, which we follow here, is to set the number of steps to the size of the non-augmented dataset divided by the batch size (`flow` uses a default `batch_size` of 32).

`Note:` The training will take longer than before because we are now training on more data than we previously did.
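
The arithmetic can be worked through directly. The sketch below assumes the training set holds 27455 images (the figure quoted for this dataset in Part C) and compares rounding the step count down and up; the variable names are illustrative.

```python
import math

num_images = 27455  # assumed training-set size
batch_size = 32

# Steps per epoch if only full batches are counted.
full_batches = num_images // batch_size

# Steps per epoch if the final, partial batch is counted as well.
all_batches = math.ceil(num_images / batch_size)

print(full_batches, all_batches)  # 857 858
print(full_batches * batch_size)  # 27424
```

Either choice is reasonable: rounding down leaves 31 images' worth of samples out of each epoch, while rounding up includes one smaller batch.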

In [None]:
history = model.fit(img_iter,
          epochs=20,
          steps_per_epoch=len(x_train) // batch_size, # Whole number of steps, about what we would run without a generator.
          validation_data=(x_valid, y_valid)
        )

### Evaluating model performance

In [None]:
# Plot training & validation accuracy values
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

plt.tight_layout()
plt.show()

### Discussion of Results
You will notice that the validation accuracy is higher, and more consistent. This means that the model is no longer overfitting in the way it was. It generalises better, making better predictions on new data.

### Saving the Model
Now that we have a well-trained model that performs better than the previous one, we will use it to perform inference on new images.
Once we have a trained model that we are happy with, it is common to save it to disk.

There are different formats in which we can save a Keras model, but we'll use the default. You can check [the documentation](https://www.tensorflow.org/guide/keras/save_and_serialize) for more. In the next notebook, we'll load the model and use it to read new sign language pictures:

In [None]:
model.save('asl_model')

### Next
Now that you have a well-trained model saved to disk, you will, in the next Notebook, make predictions on unseen images.


# Part C - Model Inference
Now that we have a well-trained model, it's time to use it. In this exercise, we'll expose new images to our model and detect the correct letters of the sign language alphabet.

### Learning Objectives
* Load an already-trained model from disk
* Reformat images for a model trained on images of a different format
* Perform inference with new images that the trained model has never seen, and evaluate its performance

### Loading the Model
Now that we're in a new notebook, let's load the trained model that we saved as `asl_model`.

In [None]:
from tensorflow import keras

In [None]:
model = keras.models.load_model('asl_model')

In [None]:
# You can also print the model summary to check that the architecture is intact
model.summary()

### Preparing an Image for the Model
It's now time to use the model to make predictions on new images it has never seen before. You can use the images in the `data/asl_images` folder.

If you open the images, you will notice that they have much higher resolution than the images in our dataset. They are also in color. Remember that our images in the dataset were 28x28 pixels and grayscale. It's important to keep in mind that whenever you make predictions with a model, the input must match the shape of the data that the model was trained on. For this model, the training dataset was of the shape: (27455, 28, 28, 1). This corresponded to 27455 images of 28 by 28 pixels each with one color channel (grayscale). 
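
One way to catch a mismatch early is a small shape check before calling the model. The helper `check_input` and the constant `EXPECTED_SHAPE` are hypothetical names for this sketch, and the 184x184 colour image is an arbitrary example size, not the size of the provided images.

```python
import numpy as np

EXPECTED_SHAPE = (28, 28, 1)  # per-image shape the model was trained on

def check_input(batch):
    """Return batch unchanged, or raise ValueError if it is not (n, 28, 28, 1)."""
    if batch.ndim != 4 or batch.shape[1:] != EXPECTED_SHAPE:
        raise ValueError(
            f"expected a batch shaped (n, 28, 28, 1), got {batch.shape}"
        )
    return batch

check_input(np.zeros((1, 28, 28, 1)))  # passes silently

try:
    check_input(np.zeros((1, 184, 184, 3)))  # arbitrary colour image size
except ValueError as err:
    print(err)
```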

#### Displaying Images
When we use our model to make predictions on new images, it will be useful to show the image as well. We can use the matplotlib library to do this.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

In [None]:
def show_image(image_path):
    image = mpimg.imread(image_path)
    plt.imshow(image, cmap='gray')

In [None]:
show_image('data/asl_images/b.png')

### Scaling the Images
The images in our dataset were 28x28 pixels and grayscale. We need to make sure to pass the same size and grayscale images into our method for prediction. There are a few ways to edit images in Python, but Keras has a built-in utility that works well. 

In [None]:
from tensorflow.keras.preprocessing import image as image_utils

In [None]:
def load_and_scale_image(image_path):
    image = image_utils.load_img(image_path, color_mode="grayscale", target_size=(28,28))
    return image

In [None]:
image = load_and_scale_image('data/asl_images/b.png')
plt.imshow(image, cmap='gray')
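
For intuition, the two operations involved (colour to grayscale, then downscaling) can be approximated in plain NumPy. This is a rough sketch under stated assumptions: it uses the standard ITU-R 601 luma weights and a simple nearest-neighbour style sampling, and the helpers `to_grayscale` and `resize_nearest` are hypothetical, so the output will not match `load_img` pixel for pixel.

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse the colour axis with ITU-R 601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights

def resize_nearest(gray, size):
    """Downscale by picking one source pixel per target pixel."""
    rows = np.arange(size[0]) * gray.shape[0] // size[0]
    cols = np.arange(size[1]) * gray.shape[1] // size[1]
    return gray[rows][:, cols]

# Random 112x112 colour image as a stand-in for a photo (hypothetical data).
rgb = np.random.default_rng(0).integers(0, 256, size=(112, 112, 3)).astype(float)

small = resize_nearest(to_grayscale(rgb), (28, 28))
print(small.shape)  # (28, 28)
```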

### Preparing the Image for Prediction
Now that we have a 28x28 pixel grayscale image, we need to pass it to the model. But before that, we need to reshape the image to match the shape of the dataset the model was trained on. Before we can reshape, we need to convert the image into a more rudimentary format, a NumPy array. We will do this with a Keras utility called `img_to_array`.

In [None]:
image = image_utils.img_to_array(image)

Now we can reshape the image to get it ready for prediction.

In [None]:
# This reshape corresponds to 1 image of 28x28 pixels with one color channel
image = image.reshape(1,28,28,1) 

Finally, we should remember to normalize our data (making all values between 0-1), as we did with our training dataset:

In [None]:
image = image / 255

### Making Predictions

We are ready to make a prediction. This is done by passing the pre-processed image into the model's `predict` method.

In [None]:
prediction = model.predict(image)
print(prediction)

### Understanding the Prediction
The prediction is an array of length 24, with one entry for each letter class. Each element of the array is a probability between 0 and 1, representing the model's confidence in that class. To make it a little more readable, we can find which element of the array holds the highest probability. This can be done easily with the numpy library and the [argmax](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html) function.

In [None]:
import numpy as np
np.argmax(prediction)

Each element of the prediction array represents a possible letter in the sign language alphabet. Remember that j and z are not options because they involve moving the hand, and we're only dealing with still photos. Let's create a mapping between the index of the predictions array, and the corresponding letter. 

In [None]:
# Alphabet does not contain j or z because they require movement
alphabet = "abcdefghiklmnopqrstuvwxy"

We can now pass in the prediction index to find the corresponding letter.

In [None]:
alphabet[np.argmax(prediction)]
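
The index-to-letter mapping can be tried without a model. In the sketch below the probability vector is made up for illustration (it is not output from the trained network), and `top_letters` is a hypothetical extra showing the three most likely letters.

```python
import numpy as np

alphabet = "abcdefghiklmnopqrstuvwxy"  # 24 letters, no j or z

# Made-up probabilities shaped like the output of predict: (1, 24).
prediction = np.full((1, 24), 0.01)
prediction[0, 1] = 0.60
prediction[0, 4] = 0.15
prediction[0, 0] = 0.04

best = int(np.argmax(prediction))
print(alphabet[best])  # b

# Indices of the three highest probabilities, most likely first.
top3 = np.argsort(prediction[0])[::-1][:3]
top_letters = [alphabet[i] for i in top3]
print(top_letters)  # ['b', 'e', 'a']

# Because j is skipped, index 9 maps to k rather than j.
print(alphabet[9])  # k
```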

## Putting it all Together
We can combine all of the steps above into a single function that makes a prediction from an image file:

In [None]:
def predict_letter(file_path):
    show_image(file_path)
    image = load_and_scale_image(file_path)
    image = image_utils.img_to_array(image)
    image = image.reshape(1,28,28,1) 
    image = image/255
    prediction = model.predict(image)
    # convert prediction to letter
    predicted_letter = alphabet[np.argmax(prediction)]
    return predicted_letter

In [None]:
predict_letter("data/asl_images/b.png")

In [None]:
predict_letter("data/asl_images/a.png")

### Summary

Congratulations on finishing these hands-on exercises! You have gone through the full process of training a CNN model from scratch and then using it to make predictions on unseen images. To make it more fun, you can take pictures of your own hand signing letters of the alphabet with your webcam and upload them to the `data/asl_images` folder to test your model on them.

### Additional resources
* [Stanford CS231n CNN course](https://cs231n.github.io/convolutional-networks/)
* [Arden Dertat's blog post](https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2#7d8a)
* [University Amsterdam Deep Learning](https://dlvu.github.io/cnns/)
* [NVIDIA Deep Learning Institute](https://www.nvidia.com/dli)