# CNN Dropout Lab

### Introduction

In this lab, we'll practice adding regularization to a neural network with dropout layers and batch normalization. Let's get started.

### Loading the Data

> Before we start, change the runtime type to GPU.

Then let's select the CUDA device and load the FashionMNIST dataset.

In [64]:
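import torch
# Use the GPU when one is available; fall back to the CPU otherwise
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device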

device(type='cuda')
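Creating a `torch.device` does not move anything by itself: the model and each batch have to be sent to it explicitly with `.to(device)`. A minimal sketch, assuming a small stand-in model and random batch (`model` and `batch` are hypothetical placeholders, not the lab's network):

```python
import torch
import torch.nn as nn

# Pick the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model and batch, used only to show the .to(device) calls
model = nn.Linear(784, 10).to(device)     # Module.to moves the parameters in place and returns the module
batch = torch.randn(100, 784).to(device)  # Tensor.to returns a tensor on `device`, so reassign it

output = model(batch)
print(output.device)
```

Whichever device you choose, the model and its inputs must live on the same one; mixing a CUDA model with CPU inputs raises a runtime error.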

In [11]:
from torchvision import datasets, transforms

> Make sure to apply the `transforms.ToTensor()` transform to the dataset.

In [12]:
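# Download FashionMNIST to ./data (the root path is our choice) and convert images with ToTensor
train = datasets.FashionMNIST(root="data", train=True, download=True, transform=transforms.ToTensor())
test = datasets.FashionMNIST(root="data", train=False, download=True, transform=transforms.ToTensor())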

> Then, reshape the training data into batches of 100 observations, each with a single channel.

In [13]:
# .data holds the raw uint8 pixels (0-255); ToTensor is only applied when indexing, e.g. train[0]
X_train_reshaped = train.data.reshape(-1, 100, 1, 28, 28)

In [14]:
y_reshaped = train.targets.reshape(-1, 100)

In [15]:
combined = list(zip(X_train_reshaped, y_reshaped))
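`train.data` holds the raw images as a `uint8` tensor of shape `(60000, 28, 28)`, and the `ToTensor` transform only runs when you index the dataset, not on `.data`. A sketch of the batching step on a small random stand-in tensor (assumed here in place of the real download), including an optional division by 255 that mimics `ToTensor`'s scaling to $[0, 1]$:

```python
import torch

# Hypothetical stand-in for train.data / train.targets: 300 random 28x28 uint8 images
images = torch.randint(0, 256, (300, 28, 28), dtype=torch.uint8)
labels = torch.randint(0, 10, (300,))

# Optional: scale pixels to [0, 1], as ToTensor would for each image
scaled = images.float() / 255

# Shape: (num_batches, batch_size, channels, height, width)
X_batches = scaled.reshape(-1, 100, 1, 28, 28)
y_batches = labels.reshape(-1, 100)

combined = list(zip(X_batches, y_batches))
print(len(combined), combined[0][0].shape, combined[0][1].shape)
# 3 torch.Size([100, 1, 28, 28]) torch.Size([100])
```

The reshape only works because the number of images is divisible by 100 (true for both the 60,000 training and 10,000 test images); for other sizes, `torch.utils.data.DataLoader` with `batch_size=100` also handles the final partial batch.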

### Refactoring the Network

Let's start with the neural network from the last lesson, which applies dropout through the functional API (`F.dropout2d`, `F.dropout`) and already includes batch norm layers.

In [69]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, stride=1, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(6, 12, stride=1, kernel_size=3, padding=1)
        self.batch_norm2d = nn.BatchNorm2d(12)
        self.l1 = nn.Linear(12*7*7, 64)
        self.batch_norm1d = nn.BatchNorm1d(64)
        self.l2 = nn.Linear(64, 10)

    def forward(self, X):
        # Functional dropout defaults to training=True; pass self.training so net.eval() disables it
        A1 = F.dropout2d(F.relu(self.conv1(X)), p=.1, training=self.training)
        pooled_1 = F.avg_pool2d(A1, kernel_size=2)  # 6x14x14
        A2 = F.dropout2d(self.batch_norm2d(F.relu(self.conv2(pooled_1))), p=.1, training=self.training)
        pooled_2 = F.avg_pool2d(A2, kernel_size=2)  # 12x7x7
        reshaped = pooled_2.reshape(-1, 12*7*7)
        L1 = F.dropout(self.batch_norm1d(F.relu(self.l1(reshaped))), p=.3, training=self.training)
        L2 = self.l2(L1)
        return F.log_softmax(L2, dim=1)
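The `12*7*7` input size of `l1` follows from the usual output-size formula for convolution and pooling layers, $\lfloor (n + 2p - k)/s \rfloor + 1$. A small worked check of that arithmetic (the helper name `out_size` is ours, not part of PyTorch):

```python
def out_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel_size) // stride + 1

n = 28
n = out_size(n, kernel_size=5, stride=1, padding=2)  # conv1: 28 -> 28
n = out_size(n, kernel_size=2, stride=2)             # avg_pool2d (stride defaults to kernel_size): 28 -> 14
n = out_size(n, kernel_size=3, stride=1, padding=1)  # conv2: 14 -> 14
n = out_size(n, kernel_size=2, stride=2)             # avg_pool2d: 14 -> 7

channels = 12  # conv2 produces 12 feature maps
print(channels * n * n)  # 588
```

The padding choices (`padding=2` for the 5x5 kernel, `padding=1` for the 3x3 kernel) keep the convolutions size-preserving, so only the two pooling layers shrink the image.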

> We can initialize an instance of the neural network.

In [76]:
net = Net()

In [77]:
first_batch = combined[0][0]
first_batch.shape

torch.Size([100, 1, 28, 28])

Then we pass a batch of data through it.

In [79]:
net_predictions = net(first_batch.float())

In [80]:
net_predictions.shape

torch.Size([100, 10])

Next, let's convert the neural network above to use `nn.Sequential`, leaving out the dropout and batch norm layers for now. We'll get you started with the convolutional layers; your task is to flatten the data and add the linear layers.

In [78]:
import torch.nn as nn
import torch.nn.functional as F
seq_net = nn.Sequential(
    nn.Conv2d(1, 6, stride=1, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AvgPool2d(kernel_size = 2),
    nn.Conv2d(6, 12, stride=1, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(kernel_size = 2),
    nn.Flatten(),
    nn.Linear(12*7*7, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
    nn.LogSoftmax(dim = 1)
)

Check that `seq_net` returns 10 predictions for each observation.

In [83]:
seq_net(first_batch.float())[:1].shape

torch.Size([1, 10])

### Adding Dropout 

Ok, so our neural network should look like the following:

In [87]:
import torch.nn.functional as F
seq_net = nn.Sequential(
    nn.Conv2d(1, 6, stride=1, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Dropout2d(p = .1),
    nn.AvgPool2d(kernel_size = 2),
    nn.Conv2d(6, 12, stride=1, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(12),
    nn.Dropout2d(),
    nn.AvgPool2d(kernel_size = 2),
    nn.Flatten(),
    nn.Linear(12*7*7, 64),
    nn.ReLU(),
    nn.BatchNorm1d(64), 
    nn.Dropout(),
    nn.Linear(64, 10),
    nn.LogSoftmax(dim = 1)
)

* For the first convolutional sequence, let's have `conv > relu > dropout > avgpool`.  This way dropout is applied after our activation function.

* For the second sequence, let's have `conv > relu > batchnorm > dropout > avgpool`.

* By default, PyTorch sets the dropout rate to $p = .5$. For convolutional layers the rate is typically smaller, so pass $p = .1$ to the first `Dropout2d` layer; the second `Dropout2d` and the `Dropout` after the linear layer keep the default $p = .5$.

* Finally, for the first linear layer, apply the sequence `Linear > ReLU > BatchNorm1d > Dropout`.
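`nn.Dropout` zeroes individual activations, while `nn.Dropout2d` zeroes entire channels (feature maps); in both cases the surviving values are scaled by $1/(1-p)$ during training so the expected activation stays the same. A quick sketch on a tensor of ones (the seed and sizes are arbitrary choices):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.ones(1, 8, 4, 4)  # (batch, channels, height, width)

drop2d = nn.Dropout2d(p=0.5)  # new modules start in training mode, so dropout is active
out2d = drop2d(x)

# Channel-wise dropout: each channel is either all 0.0 or all 1 / (1 - 0.5) = 2.0
for c in range(8):
    print(c, out2d[0, c].unique().tolist())

drop = nn.Dropout(p=0.5)
out = drop(x)
# Element-wise dropout: zeros and 2.0s are mixed within a single channel
print(out[0, 0])
```

Dropping whole channels is the usual choice after convolutions because neighboring pixels in a feature map are strongly correlated, so zeroing single pixels removes little information.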

Ok, now our neural network should be updated to look like the following.

In [88]:
seq_net

Sequential(
  (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
  (1): ReLU()
  (2): Dropout2d(p=0.1, inplace=False)
  (3): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (4): Conv2d(6, 12, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): ReLU()
  (6): BatchNorm2d(12, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout2d(p=0.5, inplace=False)
  (8): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (9): Flatten()
  (10): Linear(in_features=588, out_features=64, bias=True)
  (11): ReLU()
  (12): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (13): Dropout(p=0.5, inplace=False)
  (14): Linear(in_features=64, out_features=10, bias=True)
  (15): LogSoftmax()
)
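Dropout, ReLU, pooling and `Flatten` have no learnable parameters, so nearly all of the network's weights sit in the first linear layer. A sketch that counts trainable parameters per layer; it rebuilds the same `nn.Sequential` so it runs on its own:

```python
import torch.nn as nn

seq_net = nn.Sequential(
    nn.Conv2d(1, 6, stride=1, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.Dropout2d(p=.1),
    nn.AvgPool2d(kernel_size=2),
    nn.Conv2d(6, 12, stride=1, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.BatchNorm2d(12),
    nn.Dropout2d(),
    nn.AvgPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(12*7*7, 64),
    nn.ReLU(),
    nn.BatchNorm1d(64),
    nn.Dropout(),
    nn.Linear(64, 10),
    nn.LogSoftmax(dim=1),
)

# Count trainable parameters layer by layer
# (BatchNorm running mean/variance are buffers, not parameters)
for idx, layer in enumerate(seq_net):
    n_params = sum(p.numel() for p in layer.parameters())
    if n_params:
        print(idx, layer.__class__.__name__, n_params)

total = sum(p.numel() for p in seq_net.parameters())
print("total", total)  # total 39314
```

Of the 39,314 parameters, 37,696 belong to `Linear(588, 64)`, which is one reason dropout after that layer matters more than after the small convolutions.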

Now let's initialize our optimizer and train the neural network.

In [None]:
import torch.optim as optim
optimizer = optim.Adam(seq_net.parameters(), lr=.0005)

In [None]:
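for epoch in range(7):
    for X_batch, y_batch in combined:
        preds = seq_net(X_batch.float())
        # seq_net already ends in LogSoftmax, so nll_loss is the matching loss
        loss = F.nll_loss(preds, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(epoch, loss.item())  # loss on the last batch of the epoch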
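Printing the loss of the last batch gives a noisy picture of progress. A sketch of a training helper that averages the loss over each epoch and explicitly puts the model in training mode; the helper name `train_epochs`, the tiny model and the random data are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

def train_epochs(model, batches, optimizer, epochs):
    """Train for `epochs` passes over `batches`; return the mean loss of each epoch."""
    history = []
    for epoch in range(epochs):
        model.train()  # dropout active, batch norm uses per-batch statistics
        total = 0.0
        for X_batch, y_batch in batches:
            preds = model(X_batch.float())
            loss = F.nll_loss(preds, y_batch)  # the model outputs log-probabilities
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        history.append(total / len(batches))
        print(f"epoch {epoch}: mean loss {history[-1]:.4f}")
    return history

# Hypothetical tiny model and random data, just to exercise the helper
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
batches = [(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,))) for _ in range(5)]
optimizer = optim.Adam(model.parameters(), lr=.0005)
history = train_epochs(model, batches, optimizer, epochs=2)
```

Calling `model.train()` at the start of each epoch matters once you evaluate between epochs, because `.eval()` would otherwise leave dropout switched off for the rest of training.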

### Evaluating

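Once the network is trained, it's time to see how it performs on the test data. For this, we should turn off dropout, since we want to use all of the activations when making predictions. Batch norm should likewise switch from per-batch statistics to the running statistics it collected during training.

Calling `model.eval()` does both. It switches the network and all of its submodules into evaluation mode in place and returns the same module rather than a copy, so `eval_net` below is just another name for `seq_net`.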
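A small sketch of what evaluation mode changes, using a stand-in module made of a single dropout layer (the seed and sizes are arbitrary). It also shows that the functional `F.dropout` ignores `.eval()` unless you pass `training=` yourself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Dropout(p=0.5))
x = torch.ones(1000)

same = model.eval()
print(same is model)             # True: .eval() returns the same module, not a copy
print(torch.equal(model(x), x))  # True: dropout passes inputs through in eval mode

# The functional form defaults to training=True, so it still drops values...
dropped = F.dropout(x, p=0.5)
print((dropped == 0).any().item())  # True with overwhelming probability for 1000 elements

# ...unless training=False is passed explicitly
print(torch.equal(F.dropout(x, p=0.5, training=False), x))  # True
```

Because `.eval()` mutates the module, remember to call `.train()` again before any further training.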

In [92]:
eval_net = seq_net.eval()  # same object as seq_net, now in evaluation mode

> We can see that we are no longer in training mode.

In [93]:
eval_net.training

False

Now let's make predictions on the test set and check our model's accuracy.

In [None]:
test_X_channel = test.data.reshape(-1, 1, 28, 28)  # add the channel dimension: (10000, 1, 28, 28)
predictions = eval_net(test_X_channel.float())

In [None]:
max_predictions = torch.argmax(predictions, dim = 1)

In [None]:
from sklearn.metrics import accuracy_score

test_y = test.targets
accuracy_score(test_y, max_predictions)
# 0.8934
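Passing all 10,000 test images in one call works, but it also records an autograd graph that is never used. A sketch of a batched evaluation helper that disables gradient tracking with `torch.no_grad()` and computes accuracy directly in PyTorch; the helper name `evaluate_accuracy`, the toy model and the random data are assumptions:

```python
import torch
import torch.nn as nn

def evaluate_accuracy(model, X, y, batch_size=1000):
    """Return the fraction of correct predictions of `model` on (X, y)."""
    model.eval()  # dropout off, batch norm uses running statistics
    correct = 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for start in range(0, len(X), batch_size):
            X_batch = X[start:start + batch_size].float()
            y_batch = y[start:start + batch_size]
            preds = model(X_batch).argmax(dim=1)  # most likely class per image
            correct += (preds == y_batch).sum().item()
    return correct / len(X)

# Hypothetical toy model and data, just to exercise the helper
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
X = torch.rand(2500, 1, 28, 28)
y = torch.randint(0, 10, (2500,))
acc = evaluate_accuracy(model, X, y)
print(acc)
```

Batching keeps peak memory bounded, which matters more once the model and data are on a GPU.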

### Resources

[Sequential Example Deep Lizard](https://deeplizard.com/learn/video/bH9Nkg7G8S0)