# THREE

### Convolutional Neural Networks

#### Objectives

* Prep data specifically for a CNN
* Create a more sophisticated CNN model, understanding a greater variety of model layers
* Train a CNN model and observe its performance

In [1]:
import torch.nn as nn
import pandas as pd
import torch
from torch.optim import Adam
from torch.utils.data import Dataset, DataLoader

#### Load and Prepare Data

##### Prepare Images

In [2]:
train_df = pd.read_csv(r"D:\New folder (4)\Deep Learning\data\AmericanSignLang\asl_data\sign_mnist_train.csv")
valid_df = pd.read_csv(r"D:\New folder (4)\Deep Learning\data\AmericanSignLang\asl_data\sign_mnist_valid.csv")

This ASL data is already flattened.

In [3]:
sample_df = train_df.head().copy()  # Grab the top 5 rows
sample_df.pop('label')
sample_x = sample_df.values
sample_x

array([[107, 118, 127, ..., 204, 203, 202],
       [155, 157, 156, ..., 103, 135, 149],
       [187, 188, 188, ..., 195, 194, 195],
       [211, 211, 212, ..., 222, 229, 163],
       [164, 167, 170, ..., 163, 164, 179]], dtype=int64)

In [4]:
sample_x.shape

(5, 784)

In this format, there isn't all the information about which pixels are near each other. Because of this, convolutions that will detect features can't be applied.

[Reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) the dataset so that they are in a 28x28 pixel format. This will allow the convolutions to associate groups of pixels and detect important features.

Note that for the first convolutional layer of the model, not only the height and width of the image, but also the number of [color channels](https://www.photoshopessentials.com/essentials/rgb/) is needed. The images are grayscale, so there's just have 1 channel.

That means that it is required to convert the current shape `(5, 784)` to `(5, 1, 28, 28)`. 

With [NumPy](https://numpy.org/doc/stable/index.html) arrays, pass a `-1` for any dimension to remain the same.

In [5]:
IMG_HEIGHT = 28
IMG_WIDTH = 28
IMG_CHS = 1

sample_x = sample_x.reshape(-1, IMG_CHS, IMG_HEIGHT, IMG_WIDTH)
sample_x.shape

(5, 1, 28, 28)

##### Create a Dataset

In [6]:
class MyDataset(Dataset):
    def __init__(self, base_df):
        x_df = base_df.copy()  # Some operations below are in-place
        y_df = x_df.pop('label')
        x_df = x_df.values / 255  # Normalize values from 0 to 1
        x_df = x_df.reshape(-1, IMG_CHS, IMG_WIDTH, IMG_HEIGHT)
        self.xs = torch.tensor(x_df).float()
        self.ys = torch.tensor(y_df)

    def __getitem__(self, idx):
        x = self.xs[idx]
        y = self.ys[idx]
        return x, y

    def __len__(self):
        return len(self.xs)

##### Create a DataLoader

In [7]:
BATCH_SIZE = 32

train_data = MyDataset(train_df)
train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
train_N = len(train_loader.dataset)

valid_data = MyDataset(valid_df)
valid_loader = DataLoader(valid_data, batch_size=BATCH_SIZE)
valid_N = len(valid_loader.dataset)

Verify.

In [8]:
batch = next(iter(train_loader))
batch

[tensor([[[[0.8549, 0.8627, 0.8627,  ..., 0.8667, 0.8627, 0.8588],
           [0.8627, 0.8667, 0.8667,  ..., 0.8706, 0.8706, 0.8667],
           [0.8706, 0.8745, 0.8824,  ..., 0.8745, 0.8667, 0.8706],
           ...,
           [0.6118, 0.6235, 0.6392,  ..., 0.9294, 0.9020, 0.8941],
           [0.5412, 0.5451, 0.5608,  ..., 0.8627, 0.9255, 0.9059],
           [0.5569, 0.5608, 0.5725,  ..., 0.8588, 0.9137, 0.9137]]],
 
 
         [[[0.8745, 0.8824, 0.8902,  ..., 0.8314, 0.8235, 0.8196],
           [0.8824, 0.8941, 0.8980,  ..., 0.8392, 0.8275, 0.8196],
           [0.8863, 0.8941, 0.8980,  ..., 0.8392, 0.8353, 0.8235],
           ...,
           [0.9216, 0.9294, 0.9490,  ..., 0.9137, 0.9020, 0.8980],
           [0.9216, 0.9333, 0.9451,  ..., 0.9137, 0.8980, 0.8902],
           [0.9216, 0.9294, 0.9412,  ..., 0.9137, 0.9020, 0.8863]]],
 
 
         [[[0.5725, 0.5882, 0.5922,  ..., 0.5922, 0.5882, 0.5843],
           [0.5765, 0.5882, 0.6000,  ..., 0.5961, 0.5922, 0.5882],
           [0.5843

In [9]:
batch[0].shape

torch.Size([32, 1, 28, 28])

In [10]:
batch[1].shape

torch.Size([32])

#### Creating a Convolutional Model

These days, many data scientists start their projects by borrowing model properties from a similar project. 

Assuming the problem is not totally unique, there's a great chance that people have created models that will perform well which are posted in online repositories like [TensorFlow Hub](https://www.tensorflow.org/hub) and the [NGC Catalog](https://ngc.nvidia.com/catalog/models). 

In [11]:
n_classes = 24
kernel_size = 3
flattened_img_size = 75 * 3 * 3

model = nn.Sequential(
    # First convolution
    nn.Conv2d(IMG_CHS, 25, kernel_size, stride=1, padding=1),  # 25 x 28 x 28
    nn.BatchNorm2d(25),
    nn.ReLU(),
    nn.MaxPool2d(2, stride=2),  # 25 x 14 x 14
    # Second convolution
    nn.Conv2d(25, 50, kernel_size, stride=1, padding=1),  # 50 x 14 x 14
    nn.BatchNorm2d(50),
    nn.ReLU(),
    nn.Dropout(.2),
    nn.MaxPool2d(2, stride=2),  # 50 x 7 x 7
    # Third convolution
    nn.Conv2d(50, 75, kernel_size, stride=1, padding=1),  # 75 x 7 x 7
    nn.BatchNorm2d(75),
    nn.ReLU(),
    nn.MaxPool2d(2, stride=2),  # 75 x 3 x 3
    # Flatten to Dense
    nn.Flatten(),
    nn.Linear(flattened_img_size, 512),
    nn.Dropout(.3),
    nn.ReLU(),
    nn.Linear(512, n_classes)
)

##### Conv2D
Small kernels will go over the input image and detect features that are important for classification. 

Earlier convolutions in the model will detect simple features such as lines. 

Later convolutions will detect more complex features. 

The first Conv2D layer:
```Python
nn.Conv2d(IMG_CHS, 25, kernel_size, stride=1, padding=1)
```
* 25 refers to the number of filters that will be learned. 
* Even though `kernel_size = 3`, PyTorch will assume we want 3 x 3 filters. 
* Stride refer to the step size that the filter will take as it passes over the image. 
* Padding refers to whether the output image that's created from the filter will match the size of the input image.


##### BatchNormalization
Like normalizing  inputs, batch normalization scales the values in the hidden layers to improve training. [Read more about it in detail here](https://blog.paperspace.com/busting-the-myths-about-batch-normalization/).

##### MaxPool2D
`ImageMax` pooling takes an image and essentially shrinks it to a lower resolution. It does this to help the model be robust to translation (objects moving side to side), and also makes the model faster.

##### Dropout
`ImageDropout` is a technique for preventing overfitting. 
`Dropout` randomly selects a subset of neurons and turns them off, so that they do not participate in forward or backward propagation in that particular pass
This helps to make sure that the network is robust and redundant, and does not rely on any one area to come up with answers. 

##### Flatten
`Flatten` takes the output of one layer which is multidimensional, and flattens it into a one-dimensional array. The output is called a feature vector and will be connected to the final classification layer.

##### Linear
The `first dense layer` (512 units) takes the feature vector as input and learns which features will contribute to a particular classification. 

The `second dense layer` (24 units) is the final classification layer that outputs our prediction.

#### Summarize the Model
It's not critical that to understand everything right now in order to effectively train convolutional models. 
Most importantly keep in mind that they can help with extracting useful information from images, and can be used in classification tasks.

In [12]:
# compile the model
model = torch.jit.script(model)    # torch.compile is not supported on Windows

model

RecursiveScriptModule(
  original_name=Sequential
  (0): RecursiveScriptModule(original_name=Conv2d)
  (1): RecursiveScriptModule(original_name=BatchNorm2d)
  (2): RecursiveScriptModule(original_name=ReLU)
  (3): RecursiveScriptModule(original_name=MaxPool2d)
  (4): RecursiveScriptModule(original_name=Conv2d)
  (5): RecursiveScriptModule(original_name=BatchNorm2d)
  (6): RecursiveScriptModule(original_name=ReLU)
  (7): RecursiveScriptModule(original_name=Dropout)
  (8): RecursiveScriptModule(original_name=MaxPool2d)
  (9): RecursiveScriptModule(original_name=Conv2d)
  (10): RecursiveScriptModule(original_name=BatchNorm2d)
  (11): RecursiveScriptModule(original_name=ReLU)
  (12): RecursiveScriptModule(original_name=MaxPool2d)
  (13): RecursiveScriptModule(original_name=Flatten)
  (14): RecursiveScriptModule(original_name=Linear)
  (15): RecursiveScriptModule(original_name=Dropout)
  (16): RecursiveScriptModule(original_name=ReLU)
  (17): RecursiveScriptModule(original_name=Linear)
)

In [13]:
loss_function = nn.CrossEntropyLoss()
optimizer = Adam(model.parameters())

In [None]:
def get_batch_accuracy(output, y, N):
    pred = output.argmax(dim=1, keepdim=True)
    correct = pred.eq(y.view_as(pred)).sum().item()
    return correct / N

#### Train the Model

These are the same `train` and `validate` functions as before, but they have been mixed up. 

One of them should have `model.train` and the other should have `model.eval`. 

In [None]:
def train():
    loss = 0
    accuracy = 0

    model.train()
    for x, y in train_loader:
        output = model(x)
        optimizer.zero_grad()
        batch_loss = loss_function(output, y)
        batch_loss.backward()
        optimizer.step()

        loss += batch_loss.item()
        accuracy += get_batch_accuracy(output, y, train_N)
    print('Train - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))

In [None]:
def validate():
    loss = 0
    accuracy = 0

    model.eval()
    with torch.no_grad():
        for x, y in train_loader:
            output = model(x)

            loss += loss_function(output, y).item()
            accuracy += get_batch_accuracy(output, y, valid_N)
    print('Validation - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))

##### Training Loop

In [17]:
epochs = 20

for epoch in range(epochs):
    print('Epoch: {}'.format(epoch))
    train()
    validate()

Epoch: 0
FIXME - Loss: 280.8473 Accuracy: 0.9034
FIXME - Loss: 105.9261 Accuracy: 3.7065
Epoch: 1
FIXME - Loss: 17.8795 Accuracy: 0.9950
FIXME - Loss: 6.1149 Accuracy: 3.8222
Epoch: 2
FIXME - Loss: 9.8100 Accuracy: 0.9969
FIXME - Loss: 1.6797 Accuracy: 3.8278
Epoch: 3
FIXME - Loss: 10.2877 Accuracy: 0.9964
FIXME - Loss: 26.6138 Accuracy: 3.7875
Epoch: 4
FIXME - Loss: 10.5685 Accuracy: 0.9962
FIXME - Loss: 0.1757 Accuracy: 3.8281
Epoch: 5
FIXME - Loss: 8.3664 Accuracy: 0.9972
FIXME - Loss: 1.1414 Accuracy: 3.8270
Epoch: 6
FIXME - Loss: 6.0007 Accuracy: 0.9979
FIXME - Loss: 1.5657 Accuracy: 3.8270
Epoch: 7
FIXME - Loss: 2.9410 Accuracy: 0.9989
FIXME - Loss: 4.0265 Accuracy: 3.8242
Epoch: 8
FIXME - Loss: 9.6840 Accuracy: 0.9963
FIXME - Loss: 0.1489 Accuracy: 3.8281
Epoch: 9
FIXME - Loss: 0.2519 Accuracy: 1.0000
FIXME - Loss: 0.0099 Accuracy: 3.8281
Epoch: 10
FIXME - Loss: 10.8395 Accuracy: 0.9962
FIXME - Loss: 0.5731 Accuracy: 3.8278
Epoch: 11
FIXME - Loss: 0.5971 Accuracy: 0.9998
FIXME -

#### Conclusion

This model is significantly improved! The training accuracy is very high, and the validation accuracy has improved as well. This is a great result, as all that had done was swap in a new model.

The validation accuracy is jumping around. This is an indication that the model is still not generalizing perfectly.

#### Clear the Memory

In [None]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

: 