# Part 2. Deep Learning Frameworks

Before we move on to deep learning modelling, you will need a quick familiarisation with a deep learning framework. We recommend __[Keras](https://keras.io)__, which is built on top of TensorFlow, but you can alternatively consider __[PyTorch](https://pytorch.org)__. Resources on how to use them are abundant online; here are some official guides to get you started:
- PyTorch has a [60 Minute Blitz Guide](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html)
- TensorFlow has an [Intro to Keras guide](https://www.tensorflow.org/guide/keras)

A few words on the difference between Keras and PyTorch: Keras is a high-level wrapper on top of Google's TensorFlow, the most popular deep learning framework out there. Because TensorFlow is lower level, it can be cumbersome to use directly; the abstractions of Keras smooth over much of this, making it a great way to start. Facebook's PyTorch, on the other hand, is a newcomer that has received massive interest in recent years and is playing catch-up to TensorFlow/Keras.

If you are interested in how deep learning software has evolved since the days of Caffe and Theano, and want a more in-depth look at what happens in the software behind the scenes, we also recommend a [full lecture from Stanford](https://www.youtube.com/watch?v=6SlgtELqOWc) on this topic. This is extra knowledge that isn't critical to this week.

Based on the tutorials you go through, you should be ready to build a Multi-Layer Perceptron (MLP) with 2 or more layers. Using the dataset you prepared for your machine learning model in the previous section, run your data through an MLP model with `Dense` (`Linear`) layers instead. Make some slight model adjustments, and discuss which kinds of adjustments lead to improvements in score.

In [1]:
import src.week4_func as wk4
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.preprocessing import normalize, StandardScaler
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset, DataLoader, TensorDataset
import torch.optim as optim
import pandas as pd

## 1. Data transformations/preprocessing

Most neural networks expect images of a fixed size, so you will need to write some preprocessing code. At a minimum, you will need to normalise the data. Use the appropriate data generator/loader methods to encapsulate your data for training. Do the same for the train and test sets (and the validation set, if one exists).
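
Dividing by 255, as the next cell does, only rescales pixels to the range [0, 1]. A common further step is to standardise each feature using statistics computed on the training set alone, then reuse those statistics for the test set. The following is a minimal sketch, assuming flattened arrays shaped like the ones built in the next cell; the helper name `standardise` and the random stand-in data are illustrative and not part of the original notebook.

```python
import numpy as np

def standardise(train, test, eps=1e-8):
    """Scale both arrays with the per-feature mean/std of the training set.

    Hypothetical helper: eps guards against division by zero for
    features that are constant across the training set.
    """
    mean = train.mean(axis=0, keepdims=True)
    std = train.std(axis=0, keepdims=True)
    return (train - mean) / (std + eps), (test - mean) / (std + eps)

# Random stand-in for flattened 32x32x3 images with values in [0, 1]
rng = np.random.default_rng(0)
X_train_demo = rng.random((100, 3072))
X_test_demo = rng.random((20, 3072))

X_train_std, X_test_std = standardise(X_train_demo, X_test_demo)
print(X_train_std.shape, X_test_std.shape)
```

Computing the statistics on the training set only keeps information from the test set out of the preprocessing step.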

In [2]:
# load and normalize data
batch_dir = './data/cifar-10-batches-py'
# concatenate the five training batches into one dataframe
df = pd.concat(
    [wk4.just_dataframes(f'{batch_dir}/data_batch_{i}') for i in range(1, 6)],
    axis=0)
df_test = wk4.just_dataframes(f'{batch_dir}/test_batch')
X_all = df.drop('target',axis=1).values/255

# X_train,X_val,y_train,y_val = train_test_split(X_all,df['target'],test_size=0.05,random_state=42,shuffle=True)
# X_train,X_test,y_train,y_test = train_test_split(X_train,y_train,test_size=0.21,random_state=42,shuffle=True)
X_train = X_all
y_train = df['target'].values
X_test = df_test.drop('target',axis=1).values/255
y_test = df_test['target'].values
print(X_train.shape)

(50000, 3072)


In [3]:
# Build data object
class cifar10(Dataset):
    '''Dataset subclass of torch.utils.data.Dataset.
    This approach is more useful if you are dynamically reading samples
    from disk. Otherwise, just use TensorDataset.'''
    def __init__(self, X_train, y_train, transform=None):
        self.X = torch.from_numpy(X_train).float()
        self.y = torch.from_numpy(y_train)
        self.y2 = y_train
        self.transform = transform
    def __len__(self):
        return len(self.y2)
    def __getitem__(self, idx):
        image = self.X[idx,:]
        target = self.y[idx]
        sample = {'image': image, 'target': target}
        if self.transform:
            sample = self.transform(sample)
        return sample

# cifar_data = cifar10(X_train,y_train)
# trainloader = DataLoader(cifar_data, batch_size=10, shuffle=True)
# testloader = DataLoader(cifar10(X_test,y_test), batch_size=10, shuffle=False)

# Faster way to load data: wrap the in-memory arrays in a TensorDataset
train_data = TensorDataset(torch.from_numpy(X_train).float(),torch.from_numpy(y_train))
trainloader = DataLoader(train_data,batch_size=10,shuffle=True)
test_data = TensorDataset(torch.from_numpy(X_test).float(),torch.from_numpy(y_test))
testloader = DataLoader(test_data,batch_size=10,shuffle=False)
# for i in range(len(cifar_data)): # Just print this to make sure it worked. '0 torch.Size([3072]) torch.Size([])'
#     sample = cifar_data[i]
#     print(i, sample['image'].size(), sample['target'].size())
#     if i == 3:
#         break

## 2. Build multi-layer perceptron neural network models with Keras

The Keras Python library for deep learning focuses on the creation of models as a sequence of layers.

Here, you will discover the simple components you can use to create neural networks and simple deep learning models using Keras.
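
In PyTorch, which the cells below use, the closest counterpart to stacking layers in sequence is `nn.Sequential`. The following is a minimal sketch, assuming flattened CIFAR-10 inputs of 3072 features and 10 classes; the helper `make_mlp` and its default hidden sizes are illustrative choices, not taken from the notebook.

```python
import torch
import torch.nn as nn

def make_mlp(in_features=3072, hidden_sizes=(256, 128), n_classes=10):
    """Stack Linear + ReLU blocks followed by a Linear output layer.

    Hypothetical helper: the hidden sizes are arbitrary defaults.
    """
    layers = []
    prev = in_features
    for size in hidden_sizes:
        layers += [nn.Linear(prev, size), nn.ReLU()]
        prev = size
    layers.append(nn.Linear(prev, n_classes))  # raw logits, no softmax
    return nn.Sequential(*layers)

model = make_mlp()
dummy_batch = torch.rand(4, 3072)  # 4 fake flattened images
logits = model(dummy_batch)
print(logits.shape)
```

Changing `hidden_sizes` is one quick way to try the width and depth adjustments the exercise asks for.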

In [4]:
# play cheat using torchvision dataset
# import torchvision
# import torchvision.transforms as transforms
# transform = transforms.Compose(
#     [transforms.ToTensor(),
#      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
#                                         download=True, transform=transform)
# trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
#                                           shuffle=True, num_workers=2)

# testset = torchvision.datasets.CIFAR10(root='./data', train=False,
#                                        download=True, transform=transform)
# testloader = torch.utils.data.DataLoader(testset, batch_size=4,
#                                          shuffle=False, num_workers=2)

# classes = ('plane', 'car', 'bird', 'cat',
#            'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

In [5]:
# class Net(nn.Module):
#     def __init__(self):
#         super(Net, self).__init__()
#         self.conv1 = nn.Conv2d(3, 6, 5)
#         self.pool = nn.MaxPool2d(2, 2)
#         self.conv2 = nn.Conv2d(6, 16, 5)
#         self.fc1 = nn.Linear(16 * 5 * 5, 120)
#         self.fc2 = nn.Linear(120, 84)
#         self.fc3 = nn.Linear(84, 10)

#     def forward(self, x):
#         x = self.pool(F.relu(self.conv1(x)))
#         x = self.pool(F.relu(self.conv2(x)))
#         x = x.view(-1, 16 * 5 * 5)
#         x = F.relu(self.fc1(x))
#         x = F.relu(self.fc2(x))
#         x = self.fc3(x)
#         return x

class FlatNet(nn.Module):
    '''Build 2-layer NN'''
    def __init__(self):
        super(FlatNet, self).__init__()
        self.fc1 = nn.Linear(32*32*3, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
    
net = FlatNet()

In [6]:
# define loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
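
As a sanity check on the loss values printed during training, it helps to know what cross-entropy gives for a model that has no preference between classes. The sketch below computes softmax cross-entropy by hand for a single sample in plain Python; the function name `cross_entropy` is illustrative. With equal logits over 10 classes the loss is ln(10), roughly 2.303, so a training loss well below that suggests the network has learned something.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

uniform_logits = [0.0] * 10  # a model with no preference between classes
print(round(cross_entropy(uniform_logits, target=3), 3))  # 2.303
```

`nn.CrossEntropyLoss` applies the log-softmax internally, which is why `FlatNet.forward` returns raw logits rather than probabilities.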

In [7]:
inputs, labels = next(iter(trainloader))
inputs

tensor([[0.4078, 0.3451, 0.4275,  ..., 0.1569, 0.2000, 0.0863],
        [0.0784, 0.0824, 0.0784,  ..., 0.0941, 0.0902, 0.0902],
        [0.1608, 0.2667, 0.2353,  ..., 0.3137, 0.3451, 0.3529],
        ...,
        [0.4980, 0.4627, 0.4549,  ..., 0.2588, 0.2745, 0.2471],
        [0.3843, 0.4588, 0.4902,  ..., 0.3176, 0.3137, 0.3059],
        [0.7647, 0.6353, 0.6196,  ..., 0.6314, 0.6196, 0.6824]])

In [8]:
def train_model(net,trainloader,criterion,optimizer):
    for epoch in range(5):  # loop over the dataset multiple times

        running_loss = 0.0
        for i,data in enumerate(trainloader):
            # get the inputs
            # inputs, labels = data['image'], data['target']  # when using the custom cifar10 Dataset
            inputs, labels = data  # when using TensorDataset

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            if i % 2000 == 1999:    # print every 2000 mini-batches
                print('[%d, %5d] loss: %.3f' %
                      (epoch + 1, i + 1, running_loss / 2000))
                running_loss = 0.0

    print('Finished Training')
    return net
net = train_model(net,trainloader,criterion,optimizer)

[1,  2000] loss: 1.969
[1,  4000] loss: 1.804
[2,  2000] loss: 1.693
[2,  4000] loss: 1.671
[3,  2000] loss: 1.602
[3,  4000] loss: 1.597
[4,  2000] loss: 1.559
[4,  4000] loss: 1.546
[5,  2000] loss: 1.521
[5,  4000] loss: 1.499
Finished Training


In [9]:
def scoring(net,testloader):
    correct = 0
    total = 0
    with torch.no_grad():
        for data in testloader:
            # images, labels = data['image'], data['target']  # when using the custom cifar10 Dataset
            images, labels = data
            outputs = net(images)
            # index of the largest logit is the predicted class
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: %d %%' % (
        100 * correct / total))
scoring(net,testloader)

Accuracy of the network on the 10000 test images: 44 %
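
The `scoring` function reports accuracy only. The following is a sketch of an evaluation helper that also returns the mean loss, assuming a loader that yields `(inputs, labels)` tuples as `testloader` does; the name `evaluate`, the single-layer demo model, and the tiny random dataset are illustrative stand-ins.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    """Return (mean loss per sample, accuracy in [0, 1]) over a loader."""
    model.eval()
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            outputs = model(inputs)
            # criterion averages over the batch, so weight by batch size
            total_loss += criterion(outputs, labels).item() * labels.size(0)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return total_loss / total, correct / total

# Random stand-in data: 50 flattened "images", 10 classes
demo_loader = DataLoader(
    TensorDataset(torch.rand(50, 3072), torch.randint(0, 10, (50,))),
    batch_size=10,
)
demo_model = nn.Linear(3072, 10)
eval_loss, eval_acc = evaluate(demo_model, demo_loader, nn.CrossEntropyLoss())
print(f"loss {eval_loss:.3f}, accuracy {eval_acc:.2%}")
```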


## 3. Train the MLP network in CIFAR-10

The main objective is to train the MLP network to balance two abilities: responding correctly to the input patterns used for training, and responding well to new inputs that are similar (that is, generalising). Use the stochastic gradient descent optimiser with an appropriate learning rate between 1e-3 and 1e-2. Report your evaluation loss and accuracy, and consider techniques such as early stopping to prevent overfitting and obtain the best model.
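
Early stopping can be sketched as a small helper that watches the validation loss and signals when it has not improved for a set number of epochs. The class name `EarlyStopping`, the `patience` and `min_delta` parameters, and the loss values below are all illustrative assumptions rather than results from this notebook.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses: improvement stalls after the third epoch
val_losses = [1.60, 1.52, 1.48, 1.49, 1.50, 1.51, 1.47]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)  # 6 1.48
```

In a real run, the loss passed to `step` would come from a held-out validation split rather than the test set, and the model weights could be saved each time `best_loss` improves.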

In [10]:
import numpy as np
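# Sweep 10 learning rates from 1e-3 to 1e-2.
# Note: net is not re-initialised, so each run continues from the previous weights.
for lr in np.linspace(1e-3,1e-2,10):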
    optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9)
    net = train_model(net,trainloader,criterion,optimizer)
    scoring(net,testloader)

[1,  2000] loss: 1.477
[1,  4000] loss: 1.479
[2,  2000] loss: 1.445
[2,  4000] loss: 1.450
[3,  2000] loss: 1.417
[3,  4000] loss: 1.431
[4,  2000] loss: 1.402
[4,  4000] loss: 1.408
[5,  2000] loss: 1.387
[5,  4000] loss: 1.388
Finished Training
Accuracy of the network on the 10000 test images: 48 %
[1,  2000] loss: 1.487
[1,  4000] loss: 1.484
[2,  2000] loss: 1.450
[2,  4000] loss: 1.449
[3,  2000] loss: 1.421
[3,  4000] loss: 1.433
[4,  2000] loss: 1.402
[4,  4000] loss: 1.412
[5,  2000] loss: 1.387
[5,  4000] loss: 1.395
Finished Training
Accuracy of the network on the 10000 test images: 47 %
[1,  2000] loss: 1.445
[1,  4000] loss: 1.468
[2,  2000] loss: 1.443
[2,  4000] loss: 1.453
[3,  2000] loss: 1.431
[3,  4000] loss: 1.424
[4,  2000] loss: 1.410
[4,  4000] loss: 1.412
[5,  2000] loss: 1.397
[5,  4000] loss: 1.420
Finished Training
Accuracy of the network on the 10000 test images: 44 %
[1,  2000] loss: 1.467
[1,  4000] loss: 1.493
[2,  2000] loss: 1.457
[2,  4000] loss: 1.472