# Using Convolutional Neural Networks
>  In this last chapter, we learn how to make neural networks work well in practice, using concepts like regularization, batch-normalization and transfer learning.

- toc: true 
- badges: true
- comments: true
- author: Lucas Nunes
- categories: [Datacamp]
- image: images/datacamp/___

> Note: This is a summary of the chapter 4 exercises of the DataCamp course "Introduction to Deep Learning with PyTorch". <br>[Github repo](https://github.com/lnunesAI/Datacamp/) / [Course link](https://www.datacamp.com/tracks/machine-learning-scientist-with-python)

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import numpy as np

## The sequential module


### Sequential module - init method

<div class=""><p>Now that you have learned about the sequential module, it is time to convert a neural network that doesn't use sequential modules into one that does. The code below builds the network in the usual way. Your task is to write the same network using sequential modules.</p>
<pre><code>class Net(nn.Module):
    def __init__(self, num_classes):
        super(Net, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(in_channels=20, out_channels=40, kernel_size=3, padding=1)

        self.relu = nn.ReLU()

        self.pool = nn.MaxPool2d(2, 2)

        self.fc1 = nn.Linear(7 * 7 * 40, 1024)
        self.fc2 = nn.Linear(1024, 2048)
        self.fc3 = nn.Linear(2048, 10) 
</code></pre>
<p>Apply the pooling layer after the second and fourth convolutional layers. Apply the ReLU nonlinearity after every layer except the last (fully-connected) one. For the number of filters (kernels), stride, padding, number of channels and number of units, use the same values as above.</p></div>

Instructions 1/2
<ul>
<li>Declare all the layers needed for feature extraction in <code>self.features</code>.</li>
</ul>

Instructions 2/2
<ul>
<li>Declare the three linear layers in <code>self.classifier</code>.</li>
</ul>

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        
        # Declare all the layers for feature extraction
        self.features = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1), 
                                      nn.ReLU(inplace=True),
                                      nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1), 
                                      nn.MaxPool2d(2, 2), nn.ReLU(inplace=True),
                                      nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3, padding=1),
                                      nn.ReLU(inplace=True),
                                      nn.Conv2d(in_channels=20, out_channels=40, kernel_size=3, padding=1),
                                      nn.MaxPool2d(2, 2), nn.ReLU(inplace=True))
        
        # Declare all the layers for classification
        self.classifier = nn.Sequential(nn.Linear(7 * 7 * 40, 1024), nn.ReLU(inplace=True),
                                        nn.Linear(1024, 2048), nn.ReLU(inplace=True),
                                        nn.Linear(2048, 10))
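The first linear layer takes `7 * 7 * 40` inputs because of the two max-pooling layers. With `kernel_size=3` and `padding=1`, each convolution keeps the 28x28 spatial size. Each `MaxPool2d(2, 2)` halves it, so the maps go 28 → 14 → 7, and the last convolution outputs 40 channels. The sketch below copies the feature extractor above and prints the shape after every layer for a random batch.

```python
import torch
import torch.nn as nn

# Copy of the feature extractor above, so this snippet runs on its own
features = nn.Sequential(
    nn.Conv2d(1, 5, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(5, 10, kernel_size=3, padding=1), nn.MaxPool2d(2, 2), nn.ReLU(inplace=True),
    nn.Conv2d(10, 20, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(20, 40, kernel_size=3, padding=1), nn.MaxPool2d(2, 2), nn.ReLU(inplace=True),
)

x = torch.randn(8, 1, 28, 28)  # hypothetical batch of 8 grayscale 28x28 images
with torch.no_grad():
    for layer in features:
        x = layer(x)
        print(f"{type(layer).__name__:>9}: {tuple(x.shape)}")

# Number of features per image that the first nn.Linear must accept
print(x[0].numel())  # 1960 = 40 * 7 * 7
```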

### Sequential module - forward() method

<div class=""><p>Now that you have defined all the modules the network needs, it is time to apply them in the <code>forward()</code> method. For context, here is the <code>forward()</code> method of the same net written in the usual way.</p>
<pre><code>class Net(nn.Module):
    def __init__(self, num_classes):
        super(Net, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(in_channels=20, out_channels=40, kernel_size=3, padding=1)

        self.relu = nn.ReLU()

        self.pool = nn.MaxPool2d(2, 2)

        self.fc1 = nn.Linear(7 * 7 * 40, 1024)
        self.fc2 = nn.Linear(1024, 2048)
        self.fc3 = nn.Linear(2048, 10) 

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.pool(self.conv2(x)))
        x = self.relu(self.conv3(x))
        x = self.relu(self.pool(self.conv4(x)))
        x = x.view(-1, 7 * 7 * 40)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x
</code></pre>
<p>Note: for evaluation purposes, the entire code of the class needs to be in the script. The <code>__init__</code> method is the one you coded in the previous exercise, and here you code the <code>forward()</code> method.</p></div>

Instructions
<ul>
<li>Extract the features from the images.</li>
<li>Flatten the three dimensions of the feature maps (channels, height and width) into one using the <code>view()</code> method.</li>
<li>Classify images based on the extracted features.</li>
</ul>

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        
        # Declare all the layers for feature extraction
        self.features = nn.Sequential(nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1), 
                                      nn.ReLU(inplace=True),
                                      nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1), 
                                      nn.MaxPool2d(2, 2), nn.ReLU(inplace=True),
                                      nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3, padding=1),
                                      nn.ReLU(inplace=True),
                                      nn.Conv2d(in_channels=20, out_channels=40, kernel_size=3, padding=1),
                                      nn.MaxPool2d(2, 2), nn.ReLU(inplace=True))
        
        # Declare all the layers for classification
        self.classifier = nn.Sequential(nn.Linear(7 * 7 * 40, 1024), nn.ReLU(inplace=True),
                                        nn.Linear(1024, 2048), nn.ReLU(inplace=True),
                                        nn.Linear(2048, 10))
        
    def forward(self, x):

        # Apply the feature extractor to the input
        x = self.features(x)
        
        # Flatten the channel, height and width dimensions into one
        x = x.view(-1, 7 * 7 * 40)
        
        # Classify the images
        x = self.classifier(x)
        return x
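`x.view(-1, 7 * 7 * 40)` works here, but it hard-codes the feature size and lets PyTorch infer the batch size. The sketch below compares it with two alternatives that fix the batch dimension instead: `x.view(x.size(0), -1)` and `torch.flatten(x, start_dim=1)`. Random tensors stand in for the output of `self.features`. The sketch also shows what goes wrong with the first form when the input resolution changes.

```python
import torch

feats = torch.randn(4, 40, 7, 7)         # stand-in for self.features(x) on 4 MNIST images
a = feats.view(-1, 7 * 7 * 40)           # the exercise's version
b = feats.view(feats.size(0), -1)        # batch size explicit, feature size inferred
c = torch.flatten(feats, start_dim=1)    # same result via torch.flatten
print(a.shape, torch.equal(a, b), torch.equal(b, c))  # torch.Size([4, 1960]) True True

# Pitfall: 56x56 inputs would give 14x14 maps, and 2 * 40 * 14 * 14 = 8 * 1960,
# so view(-1, 1960) silently turns 2 images into 8 "samples"
bigger = torch.randn(2, 40, 14, 14)
wrong = bigger.view(-1, 7 * 7 * 40)
kept = bigger.view(bigger.size(0), -1)
print(wrong.shape)  # torch.Size([8, 1960])
print(kept.shape)   # torch.Size([2, 7840]): the next nn.Linear(1960, ...) would raise an error
```

When the batch dimension is explicit, a wrong input size shows up as a shape error in the first linear layer. It does not silently change the batch size.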

## The problem of overfitting

### Validation set

<p>You saw the need for a validation set in the previous video. The problem is that datasets are typically not split into training, validation and testing sets, so it is your job as a data scientist to do the split. The easiest (and most common) way is a random split, which in PyTorch can be done with a <code>SubsetRandomSampler</code> object. You are going to split the training part of the <code>MNIST</code> dataset into training and validation sets. After randomly shuffling the indices, use the first <code>55000</code> points for training and the remaining <code>5000</code> points for validation.</p>
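You can try the mechanics of this split without downloading MNIST. In the sketch below, a hypothetical 100-sample `TensorDataset` is split 90/10 using a shuffled index array and two `SubsetRandomSampler`s. Each sample's single feature is its own index, so the loaders' output shows which samples they produced. The sketch uses `numpy.random.default_rng().permutation` instead of `np.random.shuffle`, which has the same effect. `shuffle` stays at its default `False`, because `DataLoader` does not accept `shuffle=True` together with a `sampler`. The sampler already randomizes the order on every pass.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Hypothetical stand-in for MNIST: 100 samples whose single feature is their own index
features = torch.arange(100, dtype=torch.float32).unsqueeze(1)
labels = torch.zeros(100, dtype=torch.long)
dataset = TensorDataset(features, labels)

# Shuffle all indices once, then cut the shuffled array into two disjoint parts
rng = np.random.default_rng(0)
indices = rng.permutation(len(dataset))
train_idx, val_idx = indices[:90].tolist(), indices[90:].tolist()

# No shuffle=True here: each sampler already yields its indices in random order
train_loader = DataLoader(dataset, batch_size=16, sampler=SubsetRandomSampler(train_idx))
val_loader = DataLoader(dataset, batch_size=16, sampler=SubsetRandomSampler(val_idx))

def seen(loader):
    """Collect the sample ids (the feature values) produced by one pass over a loader."""
    return {int(v) for batch, _ in loader for v in batch.squeeze(1)}

train_ids, val_ids = seen(train_loader), seen(val_loader)
print(len(train_ids), len(val_ids), train_ids.isdisjoint(val_ids))  # 90 10 True
```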

Instructions
<ul>
<li>Use <code>numpy.arange()</code> to create an array containing numbers [0, 59999] and then randomly shuffle the array.</li>
<li>In <code>train_loader</code>, use <code>SubsetRandomSampler()</code> to take the first <code>55k</code> points for training.</li>
<li>In <code>val_loader</code>, use the remaining <code>5k</code> points for validation.</li>
</ul>

In [None]:
import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Shuffle the indices
indices = np.arange(60000)
np.random.shuffle(indices)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307, ), (0.3081, ))
])

# Build the train loader
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('mnist', download=True, train=True, transform=transform),
    batch_size=64, shuffle=False, sampler=torch.utils.data.SubsetRandomSampler(indices[:55000])
)

# Build the validation loader
val_loader = torch.utils.data.DataLoader(
    datasets.MNIST('mnist', download=True, train=True, transform=transform),
    batch_size=64, shuffle=False, sampler=torch.utils.data.SubsetRandomSampler(indices[55000:])
)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to mnist/MNIST/raw/train-images-idx3-ubyte.gz
Extracting mnist/MNIST/raw/train-images-idx3-ubyte.gz to mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to mnist/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting mnist/MNIST/raw/train-labels-idx1-ubyte.gz to mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to mnist/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting mnist/MNIST/raw/t10k-images-idx3-ubyte.gz to mnist/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz to mnist/MNIST/raw
Processing...
Done!


### Detecting overfitting

<div class=""><p>Overfitting is arguably the biggest problem in machine learning and data science, and being able to detect it will make you a much better data scientist. While reaching a high (or even perfect) accuracy on training sets is quite easy when you use neural networks, reaching a high accuracy on validation and testing sets is a very different thing.</p>
<hr>
<p>Let's see if you can now detect overfitting. Among the accuracy scores below, which network has the biggest overfitting problem?</p></div>

<pre>
Possible Answers
The accuracy in the training set is 90%, the accuracy in the validation set is 88%.
The accuracy in the training set is 90%, the accuracy in the testing set is 70%.
<b>The accuracy in the training set is 90%, the accuracy in the validation set is 70%.</b>
The accuracy in the validation set is 85%, the accuracy in the testing set is 82%.

</pre>

**The accuracy in the training set is much higher than in the validation set; this is a typical example of overfitting.**
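To detect overfitting in practice, you measure the same metric on the training and validation sets and compare the two. Below is a minimal sketch of such a helper. It assumes a classifier that returns one logit per class and a loader that yields `(inputs, labels)` batches. `accuracy` is a hypothetical name, not a PyTorch function. The toy check at the end uses a linear layer rigged to always predict class 0.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, loader):
    """Return the fraction of correctly classified samples yielded by `loader`."""
    model.eval()                          # dropout off, batch-norm uses running statistics
    correct, total = 0, 0
    with torch.no_grad():                 # no gradients needed for evaluation
        for inputs, labels in loader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

# Hypothetical check: zero weights and bias [1, 0] make the layer always predict class 0
model = nn.Linear(2, 2)
with torch.no_grad():
    model.weight.zero_()
    model.bias.copy_(torch.tensor([1.0, 0.0]))

data = TensorDataset(torch.randn(4, 2), torch.tensor([0, 0, 1, 1]))
print(accuracy(model, DataLoader(data, batch_size=2)))  # 0.5
```

Calling `accuracy(model, train_loader)` and `accuracy(model, val_loader)` gives the two numbers to compare. The helper switches the model to evaluation mode, so call `model.train()` before you resume training.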

## Regularization techniques

### L2-regularization

<p>You are going to implement each of the regularization techniques explained in the previous video. In doing so, you will also revisit important concepts from throughout the course. You will start with L2-regularization, one of the most widely used regularization techniques in machine learning. As you saw in the video, L2-regularization simply penalizes large weights, which pushes the network toward using small weights.</p>

Instructions
<ul>
<li>Instantiate an object called <code>model</code> from the class <code>Net()</code>, which is available in your workspace (treat it as a black box).</li>
<li>Instantiate the cross-entropy loss.</li>
<li>Instantiate the <code>Adam</code> optimizer with a learning rate of <code>3e-4</code> and an L2 regularization parameter (<code>weight_decay</code>) of <code>0.001</code>.</li>
</ul>

In [None]:
# Instantiate the network
model = Net()

# Instantiate the cross-entropy loss
criterion = nn.CrossEntropyLoss()

# Instantiate the Adam optimizer
optimizer = optim.Adam(model.parameters(), lr=3e-4, weight_decay=0.001)

**Once you start using bigger networks, this can make the difference between your network overfitting or not.**
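The sketch below makes the link between `weight_decay` and L2-regularization concrete, using a toy linear model and random data (both hypothetical). It takes one Adam step with `weight_decay=wd`, and another Adam step where the penalty `(wd / 2) * ||w||^2` is added to the loss by hand, and checks that the resulting parameters match. Both approaches add `wd * w` to each gradient. In `torch.optim.Adam`, the penalty covers every parameter passed to the optimizer, biases included. `torch.optim.AdamW` implements decoupled weight decay, which is not equivalent to this.

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
wd = 0.001                                   # the L2 strength used in the exercise

model_a = nn.Linear(4, 3)                    # toy model (hypothetical)
model_b = copy.deepcopy(model_a)             # identical starting weights
x = torch.randn(8, 4)                        # random inputs and labels (hypothetical)
y = torch.randint(0, 3, (8,))
criterion = nn.CrossEntropyLoss()

# Option 1: let Adam add wd * w to every gradient via weight_decay
opt_a = optim.Adam(model_a.parameters(), lr=3e-4, weight_decay=wd)
opt_a.zero_grad()
criterion(model_a(x), y).backward()
opt_a.step()

# Option 2: add the penalty (wd / 2) * sum(w ** 2) to the loss; its gradient is wd * w
opt_b = optim.Adam(model_b.parameters(), lr=3e-4)
opt_b.zero_grad()
penalty = sum(p.pow(2).sum() for p in model_b.parameters())
(criterion(model_b(x), y) + 0.5 * wd * penalty).backward()
opt_b.step()

same = all(torch.allclose(pa, pb) for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print(same)  # True
```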

### Dropout

<div class=""><p>You saw that dropout is an effective technique to avoid overfitting. Typically, dropout is applied in fully-connected neural networks, or in the fully-connected layers of a convolutional neural network. You are now going to implement dropout and use it on a small fully-connected neural network. </p>
<p>Use <code>200</code> units for the first hidden layer, <code>500</code> units for the second hidden layer, and <code>10</code> units for the output layer (one for each class). Use ReLU as the activation function, and <code>nn.Dropout()</code> with probability <code>p=0.5</code> between the first and second hidden layers. Use the sequential module, in this order: <code>fully-connected</code>, <code>activation</code>, <code>dropout</code>, <code>fully-connected</code>, <code>activation</code>, <code>fully-connected</code>.</p></div>

Instructions 1/2
<ul>
<li>Implement the <code>__init__</code> method, based on the description of the network in the context.</li>
</ul>

Instructions 2/2
<ul>
<li>Apply the forward pass in the <code>forward()</code> method.</li>
</ul>

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        
        # Define all the parameters of the net
        self.classifier = nn.Sequential(
            nn.Linear(28*28, 200),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(200, 500),
            nn.ReLU(inplace=True),
            nn.Linear(500, 10))
        
    def forward(self, x):

        # Flatten images of shape (N, 1, 28, 28) to (N, 784) for the linear layers
        x = x.view(-1, 28 * 28)

        # Do the forward pass
        return self.classifier(x)
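A quick way to see what `nn.Dropout(p=0.5)` does is to pass a tensor of ones through it in both modes (the input size here is arbitrary). In training mode, roughly half the entries are zeroed and the survivors are scaled by `1 / (1 - p) = 2`. In evaluation mode, the layer is the identity. This is why calling `model.train()` and `model.eval()` matters for networks that use dropout.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(10000)

dropout.train()                              # training mode: random units are zeroed
y_train = dropout(x)
values = sorted(set(y_train.tolist()))       # survivors are scaled by 1 / (1 - 0.5) = 2
frac_zero = (y_train == 0).float().mean().item()

dropout.eval()                               # evaluation mode: dropout does nothing
y_eval = dropout(x)

print(values)                                # [0.0, 2.0]
print(round(frac_zero, 2))                   # close to 0.5
print(torch.equal(y_eval, x))                # True
```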

### Batch-normalization

<div class=""><p>Dropout is used to regularize fully-connected layers. Batch-normalization is used to make the training of convolutional neural networks more efficient, and it also has a regularizing effect. You are going to implement the <code>__init__</code> method of a small convolutional neural network with batch-normalization. The feature extraction part of the CNN will contain the following modules (in order): <code>convolution</code>, <code>max-pool</code>, <code>activation</code>, <code>batch-norm</code>, <code>convolution</code>, <code>max-pool</code>, <code>activation</code>, <code>batch-norm</code>.</p>
<p>The first convolutional layer will have 10 output channels, and the second will have 20. As always, we use the MNIST dataset, with grayscale images (1 channel) of shape (28, 28). In all cases, the size of the <code>filter</code> (kernel) should be 3, the <code>stride</code> should be 1 and the <code>padding</code> should be <code>1</code>.</p></div>

Instructions
<ul>
<li>Implement the feature extraction part of the network, using the description in the context.</li>
<li>Implement the fully-connected (classifier) part of the network.</li>
</ul>

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        
        # Implement the sequential module for feature extraction
        self.features = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(2, 2), nn.ReLU(inplace=True), nn.BatchNorm2d(10),
            nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(2, 2), nn.ReLU(inplace=True), nn.BatchNorm2d(20))
        
        # Implement the fully connected layer for classification
        self.fc = nn.Linear(in_features=20 * 7 * 7, out_features=10)
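The sketch below shows the normalization itself. It feeds hypothetical activations (mean about 3, standard deviation about 5) through a freshly created `nn.BatchNorm2d(10)` in training mode. Each channel comes out with a mean close to 0 and a standard deviation close to 1, computed over the batch and both spatial dimensions. The layer's running mean also moves 10% of the way toward the batch mean (the default `momentum=0.1`). In evaluation mode, the layer uses those running statistics instead of the batch statistics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(10)                         # learnable scale starts at 1, shift at 0
x = torch.randn(32, 10, 14, 14) * 5 + 3         # hypothetical activations: mean ~3, std ~5

bn.train()                                      # training mode: normalize with batch statistics
y = bn(x)

# One row per channel, pooling the batch and spatial dimensions together
per_channel = y.permute(1, 0, 2, 3).reshape(10, -1)
means = per_channel.mean(dim=1)
stds = ((per_channel - means[:, None]) ** 2).mean(dim=1).sqrt()
print(means.abs().max().item() < 1e-4, (stds - 1).abs().max().item() < 1e-3)  # True True

# running_mean started at 0, so after one batch it is roughly 0.1 * 3 = 0.3 per channel
print(bn.running_mean)
```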

## Transfer learning

### Finetuning a CNN

<div class=""><p>Previously, you trained a model to classify handwritten digits and saved the model parameters to <code>my_net.pth</code>. Now you're going to classify handwritten letters, but you have a smaller training set.</p>
<p>In the first step, you'll create a new model using this training set, but the accuracy will be poor. Next, you'll perform the same training, but you'll start with the parameters from your digit classifying model. Even though digits and letters are two different classification problems, you'll see that using information from your previous model will dramatically improve this one.</p></div>

In [None]:
class Net(nn.Module):
    def __init__(self):
      super(Net, self).__init__()

      # Instantiate the convolutional, pooling and linear layers
      self.conv1 = nn.Conv2d(1, 128, 3, padding=1)
      self.pool = nn.MaxPool2d(2, 2)
      self.conv2 = nn.Conv2d(128, 256, 3, padding=1)
      self.conv3 = nn.Conv2d(256, 512, 3, padding=1)
      self.fc = nn.Linear(7 * 7 * 512, 10)

      # Set dummy parameters
      self.train_mode = False
      self.is_trained = False
      self.previous_state_loaded = False
    def model_load(self, path):
      try:
        self.load_state_dict(torch.load(path))
        self.previous_state_loaded = True
      except FileNotFoundError as err:
        raise FileNotFoundError('Please input a path to an existing model.') from err
    def forward(self, x):
      x = self.pool(F.relu(self.conv1(x)))
      x = self.pool(F.relu(self.conv2(x)))
      x = F.relu(self.conv3(x))
      x = x.view(-1, 7 * 7 * 512)
      return self.fc(x)
    def train(self):
      self.train_mode = True
    def eval(self):
      # :D
      # Was fc entered correctly?
      if type(self.fc) == type(nn.Linear(7 * 7 * 512, 26)):
        # Was the number of output features changed?
        if self.fc.out_features == 26:
          # Was the model set to train mode?
          if self.train_mode:
            # Was the model actually trained?
            if self.is_trained:
              # Is this the previously trained model?
              if self.previous_state_loaded:
                return 0.84
              # This is the naive model
              else:
                return 0.57
            else:
              raise ValueError('Did you remember to train your model?')
          else:
            raise ValueError('Did you remember to set your model to train mode?')
        else:
          raise ValueError('There should be 26 output features for the 26 letters of the alphabet.')
      else:
        raise ValueError('Did you remember to define model.fc?')

In [None]:
def train_net(model, optimizer, criterion):
  # Check that model is a Net
  if type(model) == type(Net()):
    # Check that optimizer is an Adam
    if type(optimizer) == type(optim.Adam(model.parameters(), lr=3e-4)):
      # Check that criterion is CrossEntropyLoss
      if type(criterion) == type(nn.CrossEntropyLoss()):
        model.is_trained = True
      else:
        raise TypeError('criterion should be of type CrossEntropyLoss.')
    else:
      raise TypeError('optimizer should be of type Adam.')
  else:
    raise TypeError('model should be of type Net().')

Instructions 1/2
<ul>
<li>Create a new model using the <code>Net()</code> module.</li>
<li>Change the number of output units, to the number of classifications for letters.</li>
</ul>

In [None]:
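# Create a new model
model = Net()

# Change the number of output units to the 26 letters
model.fc = nn.Linear(7 * 7 * 512, 26)

# Create the optimizer after replacing fc, so that it holds the new layer's parameters
optimizer = optim.Adam(model.parameters(), lr=3e-4)

# Train and evaluate the model
model.train()
train_net(model, optimizer, criterion)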
print("Accuracy of the net is: " + str(model.eval()))

Accuracy of the net is: 0.57


Instructions 2/2
<ul>
<li>Repeat the training process, but first load the digit classifier parameters from <code>my_net.pth</code>.</li>
</ul>

In [None]:
torch.save(Net().state_dict(), 'my_net.pth')  # only for testing: stand-in for the 10-output digit classifier saved earlier

In [None]:
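# Create a new model
model = Net()

# Load the parameters from the old model
model.model_load('my_net.pth')

# Change the number of output units to the 26 letters
model.fc = nn.Linear(7 * 7 * 512, 26)

# Create the optimizer after replacing fc, so that it holds the new layer's parameters
optimizer = optim.Adam(model.parameters(), lr=3e-4)

# Train and evaluate the model
model.train()
train_net(model, optimizer, criterion)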
print("Accuracy of the net is: " + str(model.eval()))

Accuracy of the net is: 0.84


**By incorporating information from the previously trained digit model, we get a much better model for handwritten letters, even with a small training set!**
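The `Net` class above is a mock that only simulates training. For a real model, the same recipe might look like the sketch below. It uses a hypothetical, scaled-down `SmallNet` with the same layout, and an in-memory buffer instead of `my_net.pth`. The order of the steps matters:

1. Load the digit weights while the head still has 10 outputs. Otherwise `load_state_dict` fails with a size mismatch on `fc`.
2. Replace the head with a 26-way layer.
3. Create the optimizer, so that it tracks the new layer's parameters.

```python
import io
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class SmallNet(nn.Module):
    """Hypothetical, scaled-down version of Net: same layout, fewer channels, no mock methods."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, 3, padding=1)
        self.conv2 = nn.Conv2d(8, 16, 3, padding=1)
        self.conv3 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(7 * 7 * 32, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # 28x28 -> 14x14
        x = self.pool(F.relu(self.conv2(x)))   # 14x14 -> 7x7
        x = F.relu(self.conv3(x))
        return self.fc(torch.flatten(x, start_dim=1))

torch.manual_seed(0)

# Stand-in for the saved digit classifier (an in-memory buffer instead of a file)
digit_net = SmallNet(num_classes=10)
buffer = io.BytesIO()
torch.save(digit_net.state_dict(), buffer)
buffer.seek(0)

# 1) Load the digit weights while the head still has 10 outputs, so all shapes match
letter_net = SmallNet(num_classes=10)
letter_net.load_state_dict(torch.load(buffer))
copied = torch.equal(letter_net.conv1.weight, digit_net.conv1.weight)

# 2) Replace the head with a fresh 26-way layer for the letters
letter_net.fc = nn.Linear(letter_net.fc.in_features, 26)

# 3) Create the optimizer after the swap, so it includes the new head's parameters
optimizer = optim.Adam(letter_net.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

images = torch.randn(16, 1, 28, 28)            # hypothetical batch of letter images
labels = torch.randint(0, 26, (16,))
letter_net.train()
optimizer.zero_grad()
logits = letter_net(images)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
print(copied, tuple(logits.shape))             # True (16, 26)
```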

### Torchvision module

<div class=""><p>You have already finetuned a net that you pretrained yourself. In practice, though, it is very common to finetune CNNs that someone else (typically the library's developers) has pretrained on ImageNet. Large networks still take a long time to train on large datasets, and you may not be able to afford training one on a dataset of 1.2 million images on your laptop.</p>
<p>Instead, you can simply download the network and finetune it on your dataset, which is what you will do now. Assume that you have a personal dataset containing images from your last <code>7</code> holidays, and you want a neural network that classifies each image by the holiday it comes from. Since the dataset is so small, you need to use finetuning.</p></div>

Instructions
<ul>
<li>Import the module that lets you download state-of-the-art CNNs.</li>
<li>Download and load a pretrained ResNet18 network.</li>
<li>Freeze all the layers bar the final one.</li>
<li>Change the last layer to correspond to the number of classes (<code>7</code>) in your dataset.</li>
</ul>

In [None]:
# Import the module
import torchvision

# Download resnet18
model = torchvision.models.resnet18(pretrained=True)

# Freeze all the layers bar the last one
for param in model.parameters():
    param.requires_grad = False

# Change the number of output units
model.fc = nn.Linear(512, 7)

Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth




