<a href="https://colab.research.google.com/github/mohamedssafini/pyTorch/blob/master/Deep_Learning_Frameworks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Learning II: Deep Learning
# Deep Learning Frameworks

In this tutorial, we will learn to build neural networks using PyTorch and Skorch. All the code here can be run on Google Colab directly, and results will be displayed in our browser.

---

To run this tutorial:

1. At the top-right of the menu bar, choose *connect to hosted runtime*.
2. In the menu, choose *Runtime -> Run all*.

Install PyTorch and Skorch.

In [0]:
!pip install -q torch skorch torchvision

In [0]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import sklearn
import skorch
import matplotlib.pyplot as plt
import numpy as np

Check whether a GPU is available on the current machine. On a Colab GPU runtime, this should return `True`.

Note that this does not mean everything runs on the GPU by default: tensors and models are created on the CPU unless they are moved explicitly.

In [0]:
torch.cuda.is_available()
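
Since nothing moves to the GPU on its own, a tensor has to be placed there explicitly. The following is a minimal sketch, assuming we want a CPU fallback on machines without a GPU; the names `device` and `x` are illustrative and not used elsewhere in this notebook.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors are created on the CPU by default.
x = torch.ones(2, 3)
print(x.device)

# .to() returns the tensor on the target device
# (the same tensor if it already lives there).
x = x.to(device)
print(x.device)
```

The same `.to(device)` call works for `nn.Module` instances.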

## Tensor basics

In deep learning frameworks, data are represented by tensors. Let's review some tensor basics before we go deeper.

We first load a classical image and represent it as a tensor in PyTorch.

In [0]:
!wget https://upload.wikimedia.org/wikipedia/en/7/7d/Lenna_%28test_image%29.png -O lenna.png

In [0]:
from PIL import Image

np_image = np.array(Image.open("lenna.png"))
image = torch.as_tensor(np_image)
plt.imshow(image)

By convention:
- **dimension** refers to an axis of the tensor
- **size** refers to the length of an axis of the tensor
- **index** refers to a specific coordinate in the tensor

The image above is a 512x512x3 tensor of type `uint8`. The last dimension, of size `3`, corresponds to the RGB channels.

In [0]:
print(image.shape)
print(image.ndim)
print(image.dtype)
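
To make the three terms concrete, here is a small sketch on a made-up tensor that stands in for a tiny image (height 4, width 5, 3 channels). The name `t` is illustrative.

```python
import torch

# A small stand-in for an image: height 4, width 5, 3 channels.
t = torch.arange(4 * 5 * 3).reshape(4, 5, 3)

print(t.ndim)       # number of dimensions (axes)
print(t.size(1))    # size of dimension 1 (the width)
print(t[2, 3, 1])   # the element at index (2, 3, 1)
```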

Operations on tensors are similar to their matrix counterparts.

Compute the mean value along an axis.

In [0]:
channel_mean = image.float().mean(dim=2)
print(channel_mean.shape)
plt.imshow(channel_mean, cmap="gray")
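
The `dim` argument selects the axis that is averaged away. A small sketch on a made-up 2x3 matrix shows the effect, together with `keepdim=True`, which keeps the reduced axis with size 1 so the result can be broadcast against the input. To my knowledge, the `.float()` call in the cell above is needed because `mean` is not defined for integer tensors such as `uint8`.

```python
import torch

t = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

col_mean = t.mean(dim=0)                      # average over rows    -> shape (3,)
row_mean = t.mean(dim=1)                      # average over columns -> shape (2,)
row_mean_kept = t.mean(dim=1, keepdim=True)   # shape (2, 1)

# With keepdim=True the mean broadcasts against the original tensor.
centered = t - row_mean_kept

print(col_mean, row_mean)
print(centered)
```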

Slice the first half of an axis.

In [0]:
horizontal_crop = image[:, :256, :]
print(horizontal_crop.shape)
plt.imshow(horizontal_crop)

Transpose (exchange) two axes.

In [0]:
transposition = image.transpose(0, 1)
print(transposition.shape)
plt.imshow(transposition)

Extend a tensor: add a new leading axis and repeat the image along it to form a batch of images.

In [0]:
def plot(data, labels=None, num_sample=5):
  n = min(len(data), num_sample)
  for i in range(n):
    plt.subplot(1, n, i+1)
    plt.imshow(data[i], cmap="gray")
    plt.xticks([])
    plt.yticks([])
    if labels is not None:
      plt.title(labels[i])

batch = image.unsqueeze(0).repeat(3, 1, 1, 1)
print(batch.shape)
plot(batch)
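
The arguments of `repeat` give the number of copies along each axis. The sketch below illustrates this on a small made-up tensor and compares the result with stacking copies along a new axis. The names `img`, `single`, `batch_small` and `stacked` are illustrative.

```python
import torch

img = torch.arange(6).reshape(2, 3)

# unsqueeze(0) adds a leading axis of size 1: (2, 3) -> (1, 2, 3).
single = img.unsqueeze(0)

# repeat(4, 1, 1) tiles the tensor 4 times along axis 0 and once along the others.
batch_small = single.repeat(4, 1, 1)

# Stacking four copies along a new leading axis gives the same result.
stacked = torch.stack([img] * 4, dim=0)

print(batch_small.shape)
print(torch.equal(batch_small, stacked))
```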

Reshape a tensor.

The tensor is converted into a long, thin matrix by merging the two spatial axes. This is a common approach when we want to apply a transformation over multiple axes at once, e.g. a transformation over all pixels.

In [0]:
flat = image.flatten(0, 1)
print(flat.shape)
plt.imshow(flat)
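
As one possible example of a transformation over all pixels, the sketch below converts a made-up RGB image to grayscale with a single matrix-vector product. The random image and the names `img`, `weights`, `pixels` and `gray` are assumptions for illustration; the weights are the common BT.601 luma coefficients.

```python
import torch

# Hypothetical 4x5 RGB image with float values in [0, 1).
img = torch.rand(4, 5, 3)

# BT.601 luma weights for converting RGB to grayscale.
weights = torch.tensor([0.299, 0.587, 0.114])

pixels = img.flatten(0, 1)       # (4, 5, 3) -> (20, 3): one row per pixel
gray_flat = pixels @ weights     # (20, 3) @ (3,) -> (20,)
gray = gray_flat.reshape(4, 5)   # restore the spatial layout

print(pixels.shape, gray.shape)
```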

## Practice of basic tensor operations and auto-gradient

**1. Softmax on a vector**

Assume that $w$ is a vector of size $d$. Our goal is to compute the $\text{softmax}(w)$, where $\text{softmax}(w)_i = \frac{\exp(w_i)}{\sum_{k=1}^d \exp(w_k)}$.

In [0]:
w = torch.tensor([1., 2., 3., 4., 5.])

# TO DO:
# Compute the softmax function on w
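
One way to check a hand-written answer is to compare it against the built-in `torch.softmax`. The sketch below computes only that reference, plus two properties any correct answer should share. The names `v`, `reference` and `shifted` are illustrative.

```python
import torch

v = torch.tensor([1., 2., 3., 4., 5.])

# Built-in reference to compare a hand-written implementation against.
reference = torch.softmax(v, dim=0)

print(reference)
print(reference.sum())   # the probabilities sum to 1

# Softmax is unchanged when the same constant is added to every entry.
shifted = torch.softmax(v + 100.0, dim=0)
print(torch.allclose(reference, shifted))
```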

**2. KL divergence between two categorical distributions**

Assume that $p$ and $q$ are two $d$-dimensional categorical distributions. The goal is to compute the KL divergence $\text{KL}(q, p) = \mathbb{E}_q[\log \frac{q}{p}] = \sum_x q(x) \log \frac{q(x)}{p(x)}$.

In [0]:
p = torch.tensor([0.1, 0.2, 0.3, 0.4])
q = torch.tensor([0.4, 0.3, 0.2, 0.1])

# TO DO:
# Compute the KL divergence between q and p
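
A reference value for checking the result can be obtained from `torch.nn.functional.kl_div`. Note its argument convention: the first argument is given as log-probabilities, and the second argument plays the role of $q$ in the formula above. The names `p_ref`, `q_ref`, `kl_qp` and `kl_qq` are illustrative.

```python
import torch
import torch.nn.functional as F

p_ref = torch.tensor([0.1, 0.2, 0.3, 0.4])
q_ref = torch.tensor([0.4, 0.3, 0.2, 0.1])

# F.kl_div(input, target) expects `input` as log-probabilities and computes
# sum(target * (log(target) - input)), i.e. KL(target, exp(input)).
kl_qp = F.kl_div(p_ref.log(), q_ref, reduction="sum")

# Sanity check: the divergence of a distribution from itself is zero.
kl_qq = F.kl_div(q_ref.log(), q_ref, reduction="sum")

print(kl_qp, kl_qq)
```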

**3. Compute the derivative for a function**

Consider a function $f(x) = \exp(x^3 \sin (\log x))$. The goal is to compute $f'(2)$.

In [0]:
# TO DO:
# Compute f'(2)
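
The mechanism needed here is auto-gradient. As a warm-up, the sketch below differentiates a different, simpler function, $g(x) = x^2 \sin x$ at $x = 1$, and compares the result with the analytic derivative. The function `g` is an illustrative choice, not the exercise's $f$.

```python
import math
import torch

# A leaf tensor with requires_grad=True records the operations applied to it.
x = torch.tensor(1.0, requires_grad=True)

# Illustrative function (not the exercise's f): g(x) = x^2 * sin(x).
g = x.pow(2) * torch.sin(x)

# backward() fills x.grad with dg/dx evaluated at x = 1.
g.backward()
print(x.grad)

# Analytic check: g'(x) = 2x sin(x) + x^2 cos(x).
expected = 2 * math.sin(1.0) + math.cos(1.0)
print(expected)
```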

## A linear classifier

Next, we use a simple linear classifier to illustrate how we can do model training in PyTorch through auto-gradient.

In [0]:
!wget https://raw.githubusercontent.com/mnqu/mnqu.github.io/master/data/toy_data.train
!wget https://raw.githubusercontent.com/mnqu/mnqu.github.io/master/data/toy_data.test

import numpy as np
import torch

# Load the data.
data_train = np.loadtxt("/content/toy_data.train")
data_test = np.loadtxt("/content/toy_data.test")

x_train = torch.Tensor(data_train[:,0:2])
y_train = torch.Tensor(data_train[:,2])

x_test = torch.Tensor(data_test[:,0:2])
y_test = torch.Tensor(data_test[:,2])

w = torch.tensor([[0.], [0.]], requires_grad=True)
b = torch.tensor(0., requires_grad=True)

for epoch in range(100):
  # Make the prediction.
  pred_train = torch.mm(x_train, w).squeeze_() + b

  # Compare the prediction with the ground-truth outputs.
  # Compute a scalar loss.
  loss = (pred_train - y_train).pow(2).mean()

  # Compute the gradient for the model parameters.
  loss.backward()

  # Update model parameters. The update itself must not be tracked by autograd.
  with torch.no_grad():
    w -= 0.01 * w.grad
    b -= 0.01 * b.grad

  # Reset the gradients so they do not accumulate across epochs.
  w.grad.zero_()
  b.grad.zero_()

pred_test = torch.mm(x_test, w).squeeze_() + b
pred_test = pred_test.ge(0.5).int()
y_test = y_test.int()
accuracy = torch.eq(pred_test, y_test).sum().float() / y_test.size(0)

print(accuracy)
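
The manual loop above can also be written with PyTorch's built-in building blocks. The sketch below uses `nn.Linear` to hold the weight and bias and `torch.optim.SGD` to perform the update. It runs on synthetic data rather than the toy dataset, and the data, the learning rate of 0.1 and the names `x_toy`, `y_toy`, `linear` and `losses` are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for the toy data: 2-D points labelled 1 when x0 + x1 > 0.
x_toy = torch.randn(200, 2)
y_toy = (x_toy.sum(dim=1) > 0).float()

linear = nn.Linear(2, 1)   # holds the weight and the bias as parameters
optimizer = torch.optim.SGD(linear.parameters(), lr=0.1)

losses = []
for epoch in range(100):
  pred = linear(x_toy).squeeze(1)
  loss = (pred - y_toy).pow(2).mean()
  losses.append(loss.item())

  optimizer.zero_grad()   # clear gradients from the previous step
  loss.backward()         # compute gradients
  optimizer.step()        # gradient descent update of weight and bias

# Threshold the outputs at 0.5, as in the evaluation above.
with torch.no_grad():
  acc = ((linear(x_toy).squeeze(1) >= 0.5).float() == y_toy).float().mean()
print(losses[0], losses[-1], acc)
```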


## A digit recognition classifier

Here we will build a neural network for image classification. We demonstrate with a classical digit recognition dataset, MNIST.

First, let's download the dataset.

In [0]:
train = torchvision.datasets.MNIST("./data", train=True, download=True)
test = torchvision.datasets.MNIST("./data", train=False, download=True)

In [0]:
train

In [0]:
test

In [0]:
type(train)

In [0]:
train[0]

In [0]:
image,label=train[0]

In [0]:
image.size

In [0]:
label

In [0]:
plt.imshow(image,cmap='gist_yarg')

To get some intuition for the dataset, we visualize some samples.

In [0]:
train.labels = [train.classes[target] for target in train.targets]
plot(train.data, train.labels)

Now we define our models. We start from a simple multi-layer perceptron (MLP) model.

In [0]:
class MLP(nn.Module):
  def __init__(self, input_dim, hidden_dim, output_dim, dropout=0.5):
    super(MLP, self).__init__()
    self.fc1 = nn.Linear(input_dim, hidden_dim)
    self.fc2 = nn.Linear(hidden_dim, output_dim)
    self.dropout = nn.Dropout(dropout)
  
  def forward(self, images):
    x = images.flatten(1)
    x = F.relu(self.fc1(x))
    x = self.dropout(x)
    x = F.softmax(self.fc2(x), dim=-1)
    return x
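
Note that `forward` ends with a softmax, so the module returns probabilities rather than raw scores. As far as I understand, Skorch's `NeuralNetClassifier` defaults to a negative log-likelihood criterion and takes the logarithm of the module output itself, which is why probabilities are the right output here. Treat that as an assumption to verify against the Skorch documentation. The sketch below only shows the underlying PyTorch identity, on made-up scores and labels.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(4, 10)            # hypothetical raw scores for 4 images
targets = torch.tensor([3, 1, 7, 0])   # hypothetical class labels

probs = F.softmax(logits, dim=-1)      # the kind of output MLP.forward returns
print(probs.sum(dim=-1))               # each row sums to 1

# Negative log-likelihood on log-probabilities ...
nll = F.nll_loss(torch.log(probs), targets)
# ... matches cross-entropy computed directly on the raw scores.
ce = F.cross_entropy(logits, targets)
print(nll, ce)
```

This is also why a module that returns raw scores, such as the ResNet used later, is paired with `CrossEntropyLoss` instead.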

Then we create an instance of the module, and wrap it with Skorch.

We train our model on MNIST for 20 epochs, i.e. the whole dataset is passed through the model 20 times during training. By specifying `device="cuda"`, we can benefit from GPU acceleration.

In [0]:
mlp = MLP(
    input_dim=train.data.shape[1] * train.data.shape[2],
    hidden_dim=128,
    output_dim=len(train.classes))
model = skorch.NeuralNetClassifier(mlp, max_epochs=20, lr=0.1, device="cuda")
model.fit(train.data / 255.0, train.targets)

The training process outputs a table of training loss, validation accuracy and validation loss.

The training loss indicates how well the model fits the training data. The validation accuracy and loss indicate how well the model generalizes to unseen data. Smaller loss and higher accuracy are better.

Let's inspect some predictions from our model.

Looks good, huh?

In [0]:
test.mlp_predictions = model.predict(test.data / 255.0)
plot(test.data, test.mlp_predictions)

Quantitatively, we can evaluate the predictions by the average accuracy on the test set.

In [0]:
sklearn.metrics.accuracy_score(test.targets, test.mlp_predictions)
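
Accuracy is simply the fraction of samples whose prediction equals the true label. A minimal sketch on made-up labels and predictions (the arrays and names are hypothetical):

```python
import numpy as np

# Hypothetical labels and predictions for six samples.
targets_demo = np.array([7, 2, 1, 0, 4, 1])
predictions_demo = np.array([7, 2, 1, 0, 9, 1])

# Accuracy is the fraction of predictions that equal the true label.
accuracy_demo = (predictions_demo == targets_demo).mean()
print(accuracy_demo)
```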

We can use pickle to save and load a model at any time.


In [0]:
import pickle

with open("MLP.pkl", "wb") as fout:
  pickle.dump(model, fout)
with open("MLP.pkl", "rb") as fin:
  model = pickle.load(fin)
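
Pickling stores the whole Skorch wrapper. A common alternative at the PyTorch level is to save only the parameters of a module through its `state_dict`. The sketch below assumes a small stand-in module and an in-memory buffer instead of a file; the names `net`, `buffer` and `restored` are illustrative.

```python
import io
import torch
import torch.nn as nn

net = nn.Linear(4, 2)   # small stand-in for a trained module

# Save only the parameters (an in-memory buffer stands in for a file here).
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)

# Load them into a freshly constructed module of the same architecture.
buffer.seek(0)
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

print(torch.equal(net.weight, restored.weight))
```

With this approach the module class has to be available and constructed with the same arguments before the parameters are loaded.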

## Standard models

In many cases, we want to use an off-the-shelf model for our task. `torchvision` provides a collection of standard models for image-related tasks.

Here we leverage ResNet-18, the 18-layer version of ResNet. Since MNIST has 10 classes, we override the last fully connected layer to output 10 categories.

In [0]:
resnet18 = torchvision.models.resnet18()
resnet18.fc = torch.nn.Linear(resnet18.fc.in_features, len(train.classes))

Because ResNet is designed for color images, we need to convert the grayscale images to RGB ones.

In [0]:
train.color_data = train.data.unsqueeze(1).expand(-1, 3, -1, -1)
test.color_data = test.data.unsqueeze(1).expand(-1, 3, -1, -1)
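
The conversion inserts a channel axis at position 1 and then expands it to size 3, giving the channels-first layout (batch, channel, height, width). A small sketch on made-up data of the same shape as MNIST images shows what this produces; the names `gray` and `color` are illustrative.

```python
import torch

# Hypothetical batch of 5 grayscale 28x28 images, shaped like MNIST data.
gray = torch.randint(0, 256, (5, 28, 28), dtype=torch.uint8)

# (5, 28, 28) -> (5, 1, 28, 28) -> (5, 3, 28, 28); -1 keeps a size unchanged.
color = gray.unsqueeze(1).expand(-1, 3, -1, -1)
print(color.shape)

# All three channels hold the same values ...
print(torch.equal(color[:, 0], color[:, 2]))
# ... and expand returns a view: the channel axis has stride 0, so nothing is copied.
print(color.stride())
```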

Train our ResNet model. This may take several minutes.

In [0]:
model = skorch.NeuralNetClassifier(
    resnet18, criterion=torch.nn.CrossEntropyLoss, max_epochs=2, lr=0.1,
    device="cuda")
model.fit(train.color_data / 255.0, train.targets)

ResNet appears to be much better than the MLP. Let's take a look at the samples that ResNet classifies correctly but the MLP does not.

In [0]:
import numpy as np

test.resnet_predictions = model.predict(test.color_data / 255.0)
indexes = (test.resnet_predictions == test.targets.numpy()) & \
      (test.mlp_predictions != test.targets.numpy())
predictions = np.stack([test.resnet_predictions, test.mlp_predictions], axis=-1)
plot(test.data[indexes], predictions[indexes])

In the plots, the first prediction is from ResNet and the second is from MLP. Generally, such samples are regarded as **hard samples** of the dataset.

The good news is that pre-trained parameters of the standard models are also available in `torchvision`.

Because these parameters are pre-trained on the million-scale ImageNet dataset, they are powerful and may serve as a good initialization for training on our MNIST dataset.

In [0]:
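# Newer torchvision releases select pre-trained parameters with `weights=`;
# older releases used `pretrained=True` for the same ImageNet weights.
resnet18 = torchvision.models.resnet18(weights="IMAGENET1K_V1")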
resnet18.fc = torch.nn.Linear(resnet18.fc.in_features, len(train.classes))

In [0]:
model = skorch.NeuralNetClassifier(
    resnet18, criterion=torch.nn.CrossEntropyLoss, max_epochs=2, lr=0.1,
    device="cuda")
model.fit(train.color_data / 255.0, train.targets)

A full list of standard models can be found at https://pytorch.org/docs/stable/torchvision/models.html

## Customize models

### Loss functions

**In this section, you are required to modify some code to get the expected results.**

Here we will try to customize an MLP regression model.

The MLP model contains two linear (a.k.a. fully connected) layers. As in the previous classification model, there should be an activation function (e.g. ReLU) between the two layers. However, there shouldn't be any activation at the final output.

In [0]:
class MLPRegressor(nn.Module):
  def __init__(self, input_dim, hidden_dim):
    super(MLPRegressor, self).__init__()
    # here are parameter definitions
    self.fc1 = nn.Linear(input_dim, hidden_dim)
    self.fc2 = nn.Linear(hidden_dim, 1)
  
  def forward(self, images):
    # here is the forward function
    # torch.rand() is just a placeholder for suppressing the errors
    # comment out this line before writing your code
    x = torch.rand((images.shape[0], 1), requires_grad=True, device="cuda")
    return x

Train our MLP regressor.

Hint: both the training loss and the validation loss should be under 0.5 if the model is implemented correctly.

In [0]:
mlp_regressor = MLPRegressor(
    input_dim=train.data.shape[1] * train.data.shape[2],
    hidden_dim=128)
model = skorch.NeuralNetRegressor(
    mlp_regressor, criterion=nn.SmoothL1Loss, max_epochs=50, lr=0.05,
    device="cuda")
model.fit(train.data / 255.0, train.targets.float().unsqueeze(1))
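
The criterion used here, `nn.SmoothL1Loss`, is quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than a squared error. The sketch below compares the built-in loss with its piecewise definition on made-up predictions and targets, assuming the default `beta = 1.0`.

```python
import torch
import torch.nn as nn

pred = torch.tensor([0.2, 1.5, 4.0])     # hypothetical predictions
target = torch.tensor([0.0, 1.0, 1.0])   # hypothetical targets

loss_fn = nn.SmoothL1Loss()              # default beta = 1.0
builtin = loss_fn(pred, target)

# Piecewise definition with beta = 1:
# 0.5 * d^2 when |d| < 1, and |d| - 0.5 otherwise, averaged over all elements.
diff = (pred - target).abs()
manual = torch.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).mean()

print(builtin, manual)
```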

We round the regression predictions to get the label for each image.

The accuracy score is expected to be around 65%.

In [0]:
test.regressor_predictions = model.predict(test.data / 255.0).round()
sklearn.metrics.accuracy_score(test.targets, test.regressor_predictions)

### Optimizer

**In this section, you are required to modify some code to get the expected results.**

The optimizer is crucial to training speed. We may try different combinations of optimizer and learning rate to achieve the best training efficiency. Common optimizers are `SGD`, `RMSprop`, `Adagrad` and `Adam`.

What is the minimum number of epochs needed to achieve ~96% accuracy? Which optimizer and learning rate do you use?

In [0]:
import torch.optim as optim

mlp = MLP(
    input_dim=train.data.shape[1] * train.data.shape[2],
    hidden_dim=128,
    output_dim=len(train.classes))
model = skorch.NeuralNetClassifier(mlp, optimizer=optim.RMSprop, max_epochs=10,
                                   lr=1e-4, device="cuda")
model.fit(train.data / 255.0, train.targets)
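
To get a feeling for how the optimizers differ before running the full experiment, the sketch below applies the four optimizers named above to a toy quadratic problem. The problem, the shared learning rate of 0.05, and the names `target_vec`, `final_loss` and `results` are assumptions for illustration.

```python
import torch
import torch.optim as optim

target_vec = torch.tensor([1.0, -2.0, 3.0])

def final_loss(optimizer_cls, lr, steps=100):
  # Minimize ||x - target||^2 starting from zero with the given optimizer.
  x = torch.zeros(3, requires_grad=True)
  optimizer = optimizer_cls([x], lr=lr)
  for _ in range(steps):
    loss = (x - target_vec).pow(2).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
  return (x - target_vec).pow(2).sum().item()

# Loss at the starting point x = 0, for comparison.
initial_loss = target_vec.pow(2).sum().item()

results = {
    cls.__name__: final_loss(cls, lr=0.05)
    for cls in (optim.SGD, optim.RMSprop, optim.Adagrad, optim.Adam)
}
print(initial_loss, results)
```

The ranking on this toy problem need not carry over to MNIST, so the comparison still has to be run on the real model.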