### HDS-M05: Deep Learning

Module Leaders: Bartek Papiez and Sharib Ali

Practical Leader and prepared by: Sharib Ali, PhD


### Required packages
[1] [numpy](http://www.numpy.org) is a package for scientific computing with Python

[2] [h5py](http://www.h5py.org) is a package for interacting with compactly stored datasets

[3] [matplotlib](http://matplotlib.org) can be used for plotting graphs in Python

[4] [pytorch](https://pytorch.org/docs/stable/index.html) is a library widely used for building deep-learning models

### Ingredients of DNN and CNN

**What will you learn here?**

Here you will build an *l*-layer neural network using PyTorch. Each layer can comprise one or more nodes. PyTorch provides a module **nn** that contains the building blocks for networks, making them easy to code. You will build both a Multi-Layer Perceptron (using fully connected layers) and a Convolutional Neural Network (using convolutional layers).

<u>For DNN concentrate on</u>:
- Input Units
- Hidden Units
- Output Units
- Activation functions

<u>For CNN concentrate on</u>:
- Convolution blocks
- Batch normalisation
- Maxpooling layers
- Activation functions
- Flattening at the last layer
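
As a rough illustration of how the CNN ingredients listed above fit together, the sketch below pushes a random batch through one convolution block and prints the shape after each step. The batch size (4), the channel count (8) and the 28x28 input size are assumptions chosen for illustration only.

```python
import torch
from torch import nn

# Assumed toy input: a batch of 4 single-channel 28x28 images
x = torch.randn(4, 1, 28, 28)

conv = nn.Conv2d(1, 8, kernel_size=3, stride=1, padding=1)  # convolution block
bn = nn.BatchNorm2d(8)                                      # batch normalisation
act = nn.ReLU()                                             # activation function
pool = nn.MaxPool2d(kernel_size=2, stride=2)                # max-pooling layer

h = conv(x)
print("after conv:   ", tuple(h.shape))
h = act(bn(h))
print("after bn+relu:", tuple(h.shape))
h = pool(h)
print("after pool:   ", tuple(h.shape))
flat = h.view(h.size(0), -1)                                # flattening
print("after flatten:", tuple(flat.shape))
```

Padding of 1 with a 3x3 kernel keeps the spatial size unchanged, so only the pooling layer halves it.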

In [None]:
import torch
from torch import nn
import numpy as np

# always check your version
print(torch.__version__)


In [None]:
# If your samples and labels are separate numpy arrays (rather than image folders,
# which the standard classification loaders expect), this helper pairs them up
from torch.utils.data import TensorDataset

def compile_dataset_sampleLabelPair(x, y):
    # x: numpy array of samples, y: numpy array of integer class labels
    return TensorDataset(torch.from_numpy(x).float(), torch.from_numpy(y).long())
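
A minimal usage sketch for the helper, assuming hypothetical toy data (20 random samples with 784 features and labels 0 to 9) in place of a real dataset:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def compile_dataset_sampleLabelPair(x, y):
    # same helper as above: pair float samples with integer labels
    return TensorDataset(torch.from_numpy(x).float(), torch.from_numpy(y).long())

# Hypothetical toy data, for illustration only
rng = np.random.default_rng(0)
x = rng.random((20, 784))
y = rng.integers(0, 10, size=20)

dataset = compile_dataset_sampleLabelPair(x, y)
loader = DataLoader(dataset, batch_size=8, shuffle=True)

samples, labels = next(iter(loader))
print(samples.shape, samples.dtype)
print(labels.shape, labels.dtype)
```

The labels are cast to `long` because `nn.CrossEntropyLoss` expects integer class indices.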

### Design a network class for compiling your model 
##### Remember this should have all the above units

The nn.Linear module creates a linear transformation with automatically initialised weight and bias tensors, which are used in the forward pass. You can access them through the layer's attribute name, e.g. model.fc1.weight or model.fc1.bias for the network below.

Recall: $$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
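
To connect the linear transformation to code, the sketch below checks that `nn.Linear` computes the same thing as multiplying by the weight matrix and adding the bias by hand. The layer sizes (3 inputs, 2 outputs) and the batch of 5 samples are assumed toy values.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)   # assumed toy sizes: 3 inputs, 2 outputs
x = torch.randn(5, 3)     # batch of 5 samples, one per row

z_layer = layer(x)                            # what the module computes
z_manual = x @ layer.weight.T + layer.bias    # the same transformation written out

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(z_layer, z_manual, atol=1e-6))
```

Note that PyTorch stores the weight with shape `(out_features, in_features)`, which is why it is transposed in the manual version.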

In [None]:
class myNetwork(torch.nn.Module):
    def __init__(self, D_in, H, H2,  D_out):
        super(myNetwork, self).__init__()
        self.fc1 = torch.nn.Linear(D_in, H)
        self.fc2 = torch.nn.Linear(H, H2)
        self.fc3 = torch.nn.Linear(H2, D_out)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
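        # a non-linearity is needed after each hidden layer; without it,
        # fc1 and fc2 would collapse into a single linear transformation
        x = self.sigmoid(self.fc1(x))
        x = self.sigmoid(self.fc2(x))
        x = self.fc3(x)  # output layer: raw class scores (logits)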
        return x

In [None]:
import torch.nn.functional as F

In [None]:
# call your model here
#model = myNetwork(12288, 128, 128,  2)
model = myNetwork(784, 128, 128, 10)
print(model)

We have (784+1)*128 + (128+1)*128 + (128+1)*10 = ? parameters to train (each layer has one weight per input-output pair plus one bias per output unit)
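
One way to check your count is to sum the number of elements over all trainable tensors. The sketch below uses an assumed `nn.Sequential` stand-in with the same layer sizes as `myNetwork(784, 128, 128, 10)` and compares it against the per-layer formula (inputs + 1) * outputs.

```python
from torch import nn

# Assumed stand-in with the same layer sizes as myNetwork(784, 128, 128, 10)
layers = nn.Sequential(
    nn.Linear(784, 128), nn.Sigmoid(),
    nn.Linear(128, 128), nn.Sigmoid(),
    nn.Linear(128, 10),
)

# count every trainable weight and bias
total = sum(p.numel() for p in layers.parameters() if p.requires_grad)

# each linear layer has (inputs + 1) * outputs parameters
formula = (784 + 1) * 128 + (128 + 1) * 128 + (128 + 1) * 10

print(total == formula)
```

The same one-liner works on your own model: `sum(p.numel() for p in model.parameters())`.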


### Training your model

In [None]:
# 1] create your optimiser
import torch.optim as optim
learning_rate = 0.01 # set your learning rate (crucial)
optimiser = optim.SGD(model.parameters(), lr = learning_rate,weight_decay=1e-6, momentum = 0.9)

# 2] choose the loss function to optimise (it acts as the cost)
# Note: for classification we use cross-entropy (for binary problems, use binary cross-entropy!)
criterion = nn.CrossEntropyLoss()
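
To see what the cross-entropy criterion computes, the following sketch compares `nn.CrossEntropyLoss` with a manual version: the negative log-probability of the true class, averaged over the batch. The scores and targets are assumed toy values.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # assumed toy scores: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])    # assumed true class indices

ce = nn.CrossEntropyLoss()(logits, target)

# manual version: log-softmax, then pick the log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), target].mean()

print(ce.item(), manual.item())
```

Because the criterion applies the log-softmax itself, it expects raw scores (logits) rather than probabilities.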

In [None]:
# 3] build your data loading
# you can use either pytorch one or a custom dataloader
# training set
from torch.utils.data import DataLoader, TensorDataset

# load MNIST data
from torchvision.datasets import MNIST
from torchvision import transforms

# This is critical: for 3-channel images use one mean and std per channel, e.g. (0.5, 0.5, 0.5)
mean = (0.5,)
std = (0.5,)
_tasks = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
    ])

mnist = MNIST("data", download=True, train=True, transform=_tasks)

batch_size = 256
shuffle = True # only relevant when no sampler is passed to the DataLoader
num_workers = 2
from torch.utils.data.sampler import SubsetRandomSampler

## create training and validation split 
split = int(0.8 * len(mnist))
index_list = list(range(len(mnist)))
train_idx, valid_idx = index_list[:split], index_list[split:]

## create sampler objects using SubsetRandomSampler
tr_sampler = SubsetRandomSampler(train_idx)
val_sampler = SubsetRandomSampler(valid_idx)

## create iterator objects for train and valid datasets
train_loader = DataLoader(mnist, batch_size=batch_size, sampler=tr_sampler, num_workers=num_workers)
valid_loader = DataLoader(mnist, batch_size=batch_size, sampler=val_sampler, num_workers=num_workers)

In [None]:
# always check the shape of your training data
dataiter = iter(train_loader)
images, labels = next(dataiter)  # iterators no longer provide a .next() method in recent PyTorch

print(images.shape)
print(labels.shape)

# visualizing the training images
import matplotlib.pyplot as plt
plt.imshow(images[0].numpy().squeeze(), cmap='gray')

In [None]:
def topk_accuracy(output, target, topk=(1,)):
    """Computes the top-k accuracy (in %) for the specified values of k"""
    maxk = max(topk)
    batch_size = target.size(0)
    # indices of the maxk highest scores for each sample
    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        # reshape (not view): the slice may be non-contiguous in recent PyTorch
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res
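
A small worked example of top-k accuracy, assuming hand-picked toy scores for 4 samples and 3 classes. The function is repeated (with `reshape`) so that the block runs on its own.

```python
import torch

def topk_accuracy(output, target, topk=(1,)):
    maxk = max(topk)
    batch_size = target.size(0)
    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    res = []
    for k in topk:
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res

# Assumed toy scores: one row per sample, one column per class
scores = torch.tensor([[0.1, 0.7, 0.2],   # best class 1, target 1: top-1 hit
                       [0.8, 0.1, 0.1],   # best class 0, target 0: top-1 hit
                       [0.2, 0.3, 0.5],   # target 1 is second best: top-2 hit only
                       [0.5, 0.4, 0.1]])  # target 2 is ranked last: miss
target = torch.tensor([1, 0, 1, 2])

top1, top2 = topk_accuracy(scores, target, topk=(1, 2))
print(top1.item(), top2.item())  # 50.0 75.0
```

Two of the four samples are correct at top-1 and three at top-2, hence 50% and 75%.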

In [None]:
# 4] Run your training loop with the optimiser trying to minimise your cost/loss; don't forget to backpropagate your loss
device = 'cuda' if torch.cuda.is_available() else 'cpu' # use the GPU if one is available, otherwise the CPU
model.to(device)
model.train()

# define no. of epochs you want to loop 
epochs = 10
log_interval = 2
for epoch in range(epochs):
    train_loss, valid_loss, epoch_accuracy_top1,epoch_accuracy_top5  = [], [], [], []
    
    for batch_idx, (data, target) in enumerate(train_loader):
        #img = (data).view(-1, 64*64*3)
        img = (data).view(-1, 28*28)
        
        # initialise all your gradients to zero
        optimiser.zero_grad()
        out = model(img.to(device))
        loss = criterion(out, target.to(device))
        loss.backward()
        optimiser.step()
        
        # append
        train_loss.append(loss.item())
        acc_1 = topk_accuracy(out, target.to(device),topk=(1,))
        epoch_accuracy_top1.append(acc_1[0].item())
        
        # TODO: perform validation here (use valid_loader, i.e. the 20% held out from the training data)
        # Your code here
        
        
        
        
        
        ## TODO: include validation loss and top-1 and top-5 accuracy and print
        if (batch_idx % log_interval) == 0:
            print('Train Epoch is: {}, train loss is: {:.6f}, train top-1 accuracy (%) is {}'.format(epoch, np.mean(train_loss),
                                                                                               np.mean(epoch_accuracy_top1)))
        
            # your code here for printing validation, alternatively you can add above!
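
If you are unsure how to structure the validation step, one possible shape is sketched below. The `evaluate` helper, the toy model and the random data are hypothetical stand-ins, not part of the practical; swap in your own `model`, `valid_loader` and `criterion`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion, device="cpu"):
    """Hypothetical helper: mean loss and top-1 accuracy (%) over a loader."""
    model.eval()                      # disable training-only behaviour (e.g. dropout)
    losses, n_correct, n_total = [], 0, 0
    with torch.no_grad():             # no gradients are needed for validation
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            out = model(data.view(data.size(0), -1))   # flatten images for the MLP
            losses.append(criterion(out, target).item())
            n_correct += (out.argmax(dim=1) == target).sum().item()
            n_total += target.size(0)
    model.train()                     # switch back before training continues
    return sum(losses) / len(losses), 100.0 * n_correct / n_total

# Assumed toy stand-ins for the MNIST model and valid_loader
toy_model = nn.Sequential(nn.Linear(784, 16), nn.Sigmoid(), nn.Linear(16, 10))
toy_data = TensorDataset(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))
toy_loader = DataLoader(toy_data, batch_size=8)

val_loss, val_acc = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss())
print(val_loss, val_acc)
```

Running validation once per epoch, rather than after every batch, keeps training time manageable.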
            
            

In [None]:
# Test predictions (This usually is separate from validation data, here we will use valid data)
dataiter = iter(valid_loader)
data, labels = next(dataiter)
img = (data).view(-1, 28*28)
output = model(img.to(device))

_, preds_tensor = torch.max(output, 1)


In [None]:
preds = np.squeeze(preds_tensor.detach().cpu().numpy())
print ("Actual:", labels[:10])
print ("Predicted:", preds[:10])

##### Train a CNN model on the same data

    Here, you will build a CNN with convolution filters of fixed kernel sizes
    
    You will need to understand the shape of image and how to use kernels and max-pooling layers
    

In [None]:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        
        ## define the layers: you can write this all as nn.Sequential to call in one line
        # first 2D convolution layer
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.act = nn.ReLU(inplace=True)
        self.pooling = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # second 2D convolution layer
        self.conv2 = nn.Conv2d(32, 16, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(16)
        
        # this is redundant, you can use above ones
        self.act2 = nn.ReLU(inplace=True)
        self.pooling2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        
        self.fc = nn.Linear(16*7*7, 10)

    
    def forward(self, x):
        
        x = self.pooling(self.act(self.bn1(self.conv1(x))))
        x = self.pooling2(self.act2(self.bn2(self.conv2(x))))
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        
        return F.log_softmax(x, dim=1)

model = Model()
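
The `16*7*7` input size of the final linear layer follows from the standard output-size formula for convolution and pooling layers, floor((size + 2*padding - kernel) / stride) + 1. A short sketch of the arithmetic, where `conv_out` is a hypothetical helper name:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                              # MNIST images are 28x28
size = conv_out(size, kernel=3, stride=1, padding=1)   # conv1    -> 28
size = conv_out(size, kernel=2, stride=2)              # pooling  -> 14
size = conv_out(size, kernel=3, stride=1, padding=1)   # conv2    -> 14
size = conv_out(size, kernel=2, stride=2)              # pooling2 -> 7

print(16 * size * size)                                # 16 channels * 7 * 7 = 784
```

If you change the kernel size, padding or number of pooling layers, recompute this value before setting the input size of `self.fc`.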

In [None]:
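import torch.optim as optim
# the model already returns log-probabilities (log_softmax); CrossEntropyLoss applies log_softmax again,
# which leaves them unchanged, so this is equivalent to using nn.NLLLoss() on the model output
loss_function = nn.CrossEntropyLoss()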

learning_rate = 0.01 # set your learning rate (crucial)

# set your optimiser (alternatively you can select other optimisers like Adam, RMSProp etc.)
optimiser = optim.SGD(model.parameters(), lr = learning_rate,weight_decay=1e-6, momentum = 0.9)

In [None]:
data.shape

In [None]:
# training starts here (you can make this as a function)
model.to(device)
for epoch in range(1, 31):
    train_loss, valid_loss = [], []
    model.train()
    
    # TODO: 1) write your training pipeline
    # 2) write validation for your model
    # 3) print the train and validation loss and their top-1 accuracy
    
    # your code here!
 
        

Double-click __here__ for the partial solution.
<!-- Your answer is below:
    for batch_idx, (data, target) in enumerate(train_loader):
        optimiser.zero_grad()
        output = model(data.to(device))
        loss = loss_function(output, target.to(device))
        loss.backward()
        optimiser.step()
        train_loss.append(loss.item()) 
        
        if (batch_idx % log_interval) == 0:
            print('Train Epoch is: {}, train loss is: {:.6f}'.format(epoch, np.mean(train_loss)))
             
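    model.eval()
    with torch.no_grad():  # gradients are not needed for validation
        for data, target in valid_loader:
            output = model(data.to(device))  # data must be on the same device as the model
            loss = loss_function(output, target.to(device))
            valid_loss.append(loss.item())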
-->

In [None]:
# TODO: visualise your test input image and predicted image from your model
# your code here

### Data augmentation to avoid overfitting
##### Take help from practical instructors

You can follow the steps below:

- Convert your training section into a function
- Build a simple augmentation and pass your data again into your dataloader
- Call your training function



### Exercise: Use a different ipython notebook to classify CIFAR-10 images using AlexNet


<img src="images/AlexNet.png" style="width:800px;height:200px;">
<caption><center> <u>Figure</u>: AlexNet for image classification.</center></caption>

<u> Detailed Architecture of above figure </u>:

[1] CIFAR10 dataset is available in torchvision.datasets

[2] 10 classes are present with image size of 32x32x3 (color RGB) 

[3] 60,000 images are present (50,000 for training and 10,000 for testing)

[4] You will create an **AlexNet** architecture and train on this dataset

More info on the dataset here: <https://www.cs.toronto.edu/~kriz/cifar.html>

*Hint: AlexNet was originally designed for 224x224x3 image sizes so for it to work on this data remove the last maxpool layer from the feature block and you'll end up with a 256x1x1 matrix*
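
The hint can be checked with plain arithmetic before writing any model code. The sketch below tracks the spatial size of a 32x32 input through the feature block; the kernel, stride and padding values are assumed to follow torchvision's AlexNet implementation, and `out_size` is a hypothetical helper.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding), assumed to follow torchvision's AlexNet feature block
features = [
    ("conv1", 11, 4, 2), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

size = 32  # CIFAR-10 images are 32x32
sizes = {}
for name, k, s, p in features:
    size = out_size(size, k, s, p)
    sizes[name] = size
    print(name, size)
```

After `conv5` the feature map is already 1x1, so the last max-pooling layer would produce a size of 0, which is invalid. This is why the hint suggests removing it.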


``Alternatively, you can use Breast Cancer Histology dataset <https://iciar2018-challenge.grand-challenge.org/Dataset/>. It will be made available to you.``

Reference: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf


Due on Wednesday 4th November, 2020 (*You will be graded for this exercise*)

<h3>Thanks for completing this lesson!</h3>

Please send any comments or feedback to [Sharib Ali](mailto:sharib.ali@eng.ox.ac.uk)