<a href="https://colab.research.google.com/github/conv_nets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Convolutional Neural Networks
In this practical, we will cover the basics of convolutional neural networks (CNNs, or ConvNets). CNNs are well suited to image tasks such as classification because they make use of the spatial arrangement of pixels.
We will use the [American Sign Language alphabet](http://www.asl.gs/) dataset, which is publicly available on the [Kaggle](http://www.kaggle.com) website. Kaggle is a good place to find datasets and other deep learning resources.

<img src="./images/asl.png" style="width: 600px;">

### Learning Objectives
* Be able to build a convolutional architecture using TensorFlow and Keras layers
* Understand the convolution operation in a CNN and how it differs from a fully connected layer
* Be able to use Keras to train and evaluate a model on a dataset

## Loading and Reading the Data
We will load a custom dataset so that we end up with the `x_train`, `y_train`, `x_valid`, and `y_valid` variables, which will be used to train and evaluate the model.

In [None]:
import pandas as pd

In [None]:
train_df = pd.read_csv("data/asl_data/sign_mnist_train.csv")
valid_df = pd.read_csv("data/asl_data/sign_mnist_valid.csv")

### Exploring the Data
Let's take a look at our data. We can use the [head](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) method to print the first few rows of the DataFrame. Each row represents one image: it has a `label` column plus 784 values, one for each pixel in the image. Note that the labels are currently numerical values, not letters of the alphabet:

In [None]:
train_df.head()

### Extracting the Labels
We would like to store our training and validation labels in `y_train` and `y_valid` variables. Here we create those variables and then delete the labels from our original dataframes, where they are no longer needed:

In [None]:
y_train = train_df['label']
y_valid = valid_df['label']
del train_df['label']
del valid_df['label']

### Extracting Images
We would like to store our training and validation images in `x_train` and `x_valid` variables. Here we create those variables:

In [None]:
x_train = train_df.values
x_valid = valid_df.values

### Summarizing the Training and Validation Data
We now have 27,455 images with 784 pixels each for training, along with their corresponding labels. We also have 7,172 images for validation, along with their corresponding labels.

In [None]:
print("x_train shape: ", x_train.shape)
print("y_train shape: ", y_train.shape)
print("x_valid shape: ", x_valid.shape)
print("y_valid shape: ", y_valid.shape)

### Visualizing the Data
To visualize the images, we will use the matplotlib library. We don't need to worry about the details of this visualization for now, but if interested, you can learn more about [matplotlib](https://matplotlib.org/) at a later time.

Note that we'll have to reshape the data from its current 1D shape of 784 pixels, to a 2D shape of 28x28 pixels to make sense of the image:

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(40,40))

num_images = 20
for i in range(num_images):
    row = x_train[i]
    label = y_train[i]
    
    image = row.reshape(28,28)
    plt.subplot(1, num_images, i+1)
    plt.title(label, fontdict={'fontsize': 30})
    plt.axis('off')
    plt.imshow(image, cmap='gray')

### Normalize the Image Data
We normalize the image data by converting the pixel values, which are integers between 0 and 255, to floating point values between 0 and 1.

In [None]:
x_train = x_train / 255
x_valid = x_valid / 255
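
As a quick check of what this division does, the following sketch applies it to a small made-up array (not rows from the real dataset) and confirms that the result is floating point and lies between 0 and 1:

```python
import numpy as np

# Made-up pixel values covering the full 0-255 range (not real dataset rows).
pixels = np.array([[0, 51, 255], [102, 204, 153]])

# Dividing an integer array by 255 uses true division, so the result is float.
normalized = pixels / 255

print(normalized.dtype)                    # float64
print(normalized.min(), normalized.max())  # 0.0 1.0
```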

### Categorize the Labels
We are going to categorically encode the labels. To do this, we can use the [keras.utils.to_categorical](https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical) method, passing it the values to encode and the number of categories to encode them into. We set the number of categories to 24. Note that the [American Sign Language alphabet](http://www.asl.gs/) contains 26 letters. Two of those letters (J and Z) require movement, so they are not included in the training dataset.

In [None]:
import tensorflow.keras as keras

In [None]:
# Turn our scalar targets into binary categories
num_classes = 24
if y_train.shape[-1] != num_classes:  # Avoid running multiple times
    y_train = keras.utils.to_categorical(y_train, num_classes)
    y_valid = keras.utils.to_categorical(y_valid, num_classes)
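
To make the encoding concrete, here is a minimal NumPy sketch of what categorical (one-hot) encoding produces. The `one_hot` helper and the example labels are hypothetical and only for illustration; the notebook itself relies on `keras.utils.to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # For each row, set the column that matches the label to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Made-up labels for illustration: three images from classes 0, 2 and 3.
example = one_hot([0, 2, 3], num_classes=4)
print(example)
```

Each row contains exactly one `1`, in the column whose index equals the original numerical label.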

## Loading and Preparing the Data

The cell below gathers the data preprocessing steps covered so far. Review it and execute it before moving on:

In [None]:
import tensorflow.keras as keras
import pandas as pd

# Load in our data from CSV files
train_df = pd.read_csv("data/asl_data/sign_mnist_train.csv")
valid_df = pd.read_csv("data/asl_data/sign_mnist_valid.csv")

# Separate out our target values
y_train = train_df['label']
y_valid = valid_df['label']
del train_df['label']
del valid_df['label']

# Separate out our image vectors
x_train = train_df.values
x_valid = valid_df.values

# Turn our scalar targets into binary categories
num_classes = 24
y_train = keras.utils.to_categorical(y_train, num_classes)
y_valid = keras.utils.to_categorical(y_valid, num_classes)

# Normalize our image data
x_train = x_train / 255
x_valid = x_valid / 255

## Reshaping Images for a CNN

Each image in our dataset is currently stored as a flat list of 784 pixel values:

In [None]:
print("x_shape:", x_train.shape)
print("y_shape:", y_train.shape)

In this format, we don't have all the information about which pixels are near each other. Because of this, we can't apply convolutions that will detect features. Let's reshape the dataset so that each image is in a 28x28 pixel format. This will allow our convolutions to associate groups of neighboring pixels and detect important features.

Note that for the first convolutional layer of our model, we need to have not only the height and width of the image, but also the number of [color channels](https://www.photoshopessentials.com/essentials/rgb/). Our images are grayscale, so we'll just have 1 channel.

That means that we need to convert the current shape `(27455, 784)` to `(27455, 28, 28, 1)`. As a convenience, we can pass the [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html#numpy.reshape) method a `-1` for one dimension, and NumPy will infer its size from the total number of values. Here that dimension is the number of images, which stays the same:

In [None]:
x_train = x_train.reshape(-1,28,28,1)
x_valid = x_valid.reshape(-1,28,28,1)
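
The effect of `-1` can be checked on a small stand-in array. This sketch uses a made-up batch of five blank images rather than the real data:

```python
import numpy as np

# Made-up batch: 5 flattened "images" of 784 values each.
batch = np.zeros((5, 784))

# -1 asks NumPy to infer that axis from the total size:
# 5 * 784 values / (28 * 28 * 1) values per image = 5 images.
images = batch.reshape(-1, 28, 28, 1)
print(images.shape)  # (5, 28, 28, 1)
```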

In [None]:
print("x_shape:", x_train.shape)
print("y_shape:", y_train.shape)

## Creating the CNN Model

We will use the following CNN model, which works well for this problem and covers several different kinds of layers.

<img src="images/cnn.png" width=180 />

Model source: [NVIDIA Deep Learning Institute](https://www.nvidia.com/dli)

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Dense,
    Conv2D,
    MaxPool2D,
    Flatten,
    Dropout,
    BatchNormalization,
)

model = Sequential()
model.add(Conv2D(75, (3, 3), strides=1, padding="same", activation="relu", 
                 input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Conv2D(50, (3, 3), strides=1, padding="same", activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Conv2D(25, (3, 3), strides=1, padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Flatten())
model.add(Dense(units=512, activation="relu"))
model.add(Dropout(0.3))
model.add(Dense(units=num_classes, activation="softmax"))

### [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D)

<img src="images/conv2d.gif" width=600 />

These are 2D convolutional layers. Small kernels will go over the input image and detect features that are important for classification. Earlier convolutions in the model will detect simple features such as lines. Later convolutions will detect more complex features. Let's look at our first Conv2D layer:
```Python
model.add(Conv2D(75, (3, 3), strides=1, padding="same", activation="relu",
                 input_shape=(28, 28, 1)))
```
`75` is the number of filters that will be learned, and `(3, 3)` is the size of each filter. `strides` is the step size the filter takes as it passes over the image. `padding` controls whether the output produced by the filter matches the size of the input: with `"same"` and a stride of 1, the input is zero-padded so that the output keeps the same height and width.
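
To illustrate the sliding-kernel idea, here is a minimal NumPy sketch of a single 3x3 filter applied to a single-channel image with stride 1 and "same" zero padding. The `conv2d_same` helper, the image, and the kernel values are made up for illustration. A real `Conv2D` layer learns its kernel values during training and additionally sums over input channels, adds a bias, and applies the activation.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide one odd-sized kernel over one single-channel image (stride 1, zero padding)."""
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    # Zero-pad the border so the output keeps the input's height and width.
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)))
    output = np.zeros(image.shape, dtype=float)
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            window = padded[row:row + kh, col:col + kw]
            output[row, col] = np.sum(window * kernel)
    return output

# Made-up 5x5 image with a vertical edge, and a hand-written edge-detecting kernel.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = conv2d_same(image, kernel)
print(feature_map.shape)  # (5, 5)
print(feature_map)
```

The output responds strongly in the columns next to the edge and is zero in the flat region to its right, which is the sense in which a filter "detects" a feature.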

### [BatchNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization)

Like normalizing our inputs, batch normalization scales the values in the hidden layers to improve training. [Read more about it in detail here](https://blog.paperspace.com/busting-the-myths-about-batch-normalization/). 
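
The core computation can be sketched in a few lines of NumPy. This is a simplified training-time version on made-up activations, assuming fixed `gamma` and `beta`. The Keras layer learns those two values, keeps moving averages of the mean and variance for use at inference time, and for convolutional outputs normalizes each channel.

```python
import numpy as np

def batch_norm_train(x, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize each feature (column) over the batch, then scale and shift."""
    mean = x.mean(axis=0)
    variance = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(variance + epsilon)
    return gamma * x_hat + beta

# Made-up activations: 4 samples, 2 features on very different scales.
activations = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
normalized = batch_norm_train(activations)

print(normalized.mean(axis=0))  # close to 0 for both features
print(normalized.std(axis=0))   # close to 1 for both features
```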

### [MaxPool2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D)

<img src="images/pooling.png" width=400 />

Max pooling takes an image and essentially shrinks it to a lower resolution. This helps the model be robust to small translations (objects shifting position in the image) and also makes the model faster.
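
As a small sketch of the operation, the following NumPy function performs 2x2 max pooling with stride 2 on a made-up 4x4 feature map. The `max_pool_2x2` helper is hypothetical and assumes even height and width, so it ignores the `"same"` padding that the model's `MaxPool2D` layers use for odd sizes.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a map with even height and width."""
    height, width = feature_map.shape
    # Split the map into non-overlapping 2x2 blocks, then take each block's maximum.
    blocks = feature_map.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])
pooled = max_pool_2x2(feature_map)
print(pooled)
```

The four 2x2 blocks reduce to 4, 2, 2, and 8, so a 4x4 map becomes a 2x2 map that keeps only the strongest response in each region.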

### [Dropout](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout)

<img src="images/dropout.png" width=360 />

Dropout is a technique for preventing overfitting. During training, dropout randomly selects a subset of neurons and turns them off, so that they do not participate in forward or backward propagation in that particular pass. This helps to make sure that the network is robust and redundant, and does not rely on any one area to come up with answers.
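
A minimal NumPy sketch of training-time dropout is shown below, using made-up activations. The `dropout_train` helper is hypothetical. Like the Keras layer, it rescales the surviving values by `1 / (1 - rate)` so that the expected sum is unchanged; at inference time the Keras layer passes values through untouched.

```python
import numpy as np

def dropout_train(activations, rate, rng):
    """Zero a random fraction `rate` of values and rescale the rest by 1 / (1 - rate)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable
activations = np.ones(10)
dropped = dropout_train(activations, rate=0.2, rng=rng)
print(dropped)  # each entry is either 0.0 (dropped) or 1.25 (kept and rescaled)
```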

### [Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten)

Flatten takes the multidimensional output of one layer and flattens it into a one-dimensional array. The output is called a feature vector and is passed to the dense layers that perform the classification.
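
The length of that feature vector follows from the layer settings in the model above. As a sketch of the arithmetic, assuming the usual rule that a `"same"`-padded pooling layer outputs `ceil(size / stride)` (the `pooled_size` helper is hypothetical):

```python
import math

def pooled_size(size, stride=2):
    """Output size of a "same"-padded pooling layer: ceil(size / stride)."""
    return math.ceil(size / stride)

size = 28
for _ in range(3):            # three Conv2D + MaxPool2D stages
    # "same" convolutions with stride 1 keep the size; only pooling shrinks it.
    size = pooled_size(size)  # 28 -> 14 -> 7 -> 4

channels = 25                 # filters in the last Conv2D layer
flattened_length = size * size * channels
print(size, flattened_length)  # 4 400
```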

### [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)

The first dense layer (512 units) takes the feature vector as input and learns which features will contribute to a particular classification. The second dense layer (24 units) is the final classification layer that outputs our prediction.
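
The final layer uses the softmax activation, which turns its raw scores into probabilities. Here is a minimal pure-Python sketch with made-up scores for three classes (the real layer produces 24 values):

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
print(probabilities)
```

The class with the highest score gets the highest probability, and that class is taken as the prediction.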

## Summarizing the Model

This may feel like a lot of information, but don't worry. It's not critical to understand everything right now in order to effectively train convolutional models. Most importantly, we know that they can help with extracting useful information from images and can be used in classification tasks.

Here, we summarize the model we just created:

In [None]:
model.summary()
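
The parameter counts reported in the summary can be cross-checked by hand. The sketch below uses the standard formulas for each layer type and assumes the flattened feature vector has 4 x 4 x 25 = 400 values. The helper names are hypothetical, and the batch normalization count includes its non-trainable moving mean and variance.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units

def batch_norm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels

counts = [
    conv_params(3, 3, 1, 75), batch_norm_params(75),
    conv_params(3, 3, 75, 50), batch_norm_params(50),
    conv_params(3, 3, 50, 25), batch_norm_params(25),
    dense_params(4 * 4 * 25, 512),  # Flatten output: 400 values
    dense_params(512, 24),
]
print(counts)
print(sum(counts))
```

Pooling, dropout, and flatten layers have no parameters, so they do not appear in the list.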

## Compiling the Model

We'll [compile](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#compile) our model using [categorical crossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy) as the loss, to reflect the fact that each image belongs to exactly one of many categories, and track the accuracy of the model as a metric. No optimizer is passed, so Keras falls back to its default (`rmsprop`):

In [None]:
model.compile(loss="categorical_crossentropy", metrics=["accuracy"])
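
To give a feel for what this loss measures, here is a minimal pure-Python sketch of categorical crossentropy for a single sample. The target and the two predictions are made up for illustration. Keras averages this quantity over the samples in a batch.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs):
    """Loss for one sample: minus the log of the probability given to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, predicted_probs))

target = [0, 1, 0]              # made-up one-hot label: true class is index 1
confident = [0.05, 0.90, 0.05]  # most probability on the true class
unsure = [0.40, 0.30, 0.30]     # probability spread across classes

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(loss_confident, loss_unsure)
```

The confident, correct prediction gets a much lower loss than the unsure one, which is what training pushes the model toward.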

## Training the Model

Use the model's `fit` method to train it for 20 epochs using the training and validation images and labels created above:

In [None]:
model.fit(x_train, y_train, epochs=20, verbose=1, validation_data=(x_valid, y_valid))

## Discussion of Results

The model's performance looks reasonable, with high training accuracy and improved validation accuracy. This is a great starting point. However, you may have noticed the validation accuracy jumping around from epoch to epoch. This is an indication that our model is still not generalizing perfectly. Fortunately, there's more that we can do, which we will cover in the next section on *data augmentation*.

## Next

In this section, you focused on creating and training a CNN model. To further improve performance, you will now turn your attention to [data augmentation](./ASL_b_augmentation.ipynb), a collection of techniques that will allow your models to train on more and better data than you might originally have at your disposal.

