
In [1]:
import warnings
warnings.filterwarnings('ignore')

# MNIST digit classification: Deep Learning

Let's load the data again:

In [2]:
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

mnist = fetch_openml('mnist_784', as_frame=False, cache=False)

X = mnist.data.astype('float32')
y = mnist.target.astype('int64')

X = MinMaxScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
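
As a rough illustration of what the scaling step does, the pure-Python sketch below rescales each column (pixel position) to the range [0, 1] using `(x - min) / (max - min)`. The helper name `min_max_scale` and the toy values are hypothetical, and mapping a constant column to 0 is an assumption made here to avoid dividing by zero; it is not a copy of the scikit-learn implementation.

```python
def min_max_scale(rows):
    """Scale each column of a list-of-rows matrix to the range [0, 1]."""
    n_features = len(rows[0])
    mins = [min(row[j] for row in rows) for j in range(n_features)]
    maxs = [max(row[j] for row in rows) for j in range(n_features)]
    scaled = []
    for row in rows:
        scaled_row = []
        for j, value in enumerate(row):
            span = maxs[j] - mins[j]
            # A constant column has span 0; map it to 0.0 (assumption, see text).
            scaled_row.append((value - mins[j]) / span if span else 0.0)
        scaled.append(scaled_row)
    return scaled

# Hypothetical toy "images" with three pixels each (values in 0..255).
toy = [[0, 128, 0], [255, 64, 0], [51, 0, 0]]
scaled_toy = min_max_scale(toy)
print(scaled_toy)
```

Each column is scaled independently, so a pixel position that never exceeds 128 is still stretched to the full [0, 1] range.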

For Deep Learning we will use the [PyTorch](https://pytorch.org/) library. 

PyTorch can fit models on a GPU, if available:

In [3]:
import torch
from torch import nn
import torch.nn.functional as F

device = 'cuda' if torch.cuda.is_available() else 'cpu'

print(device)

cuda


## 1. Feed-forward neural network

We can now define a neural network with one hidden layer as a class that inherits from `nn.Module`. 

The layers are created in `__init__()`; the computation itself is defined in the `forward()` function:

In [7]:
class myNeuralNetwork(nn.Module):
    def __init__(
            self,
            input_dim,
            hidden_dim,
            output_dim
    ):
        super(myNeuralNetwork, self).__init__()
        self.hidden = nn.Linear(input_dim, hidden_dim)
        #self.hidden_2 = nn.Linear(hidden_dim, hidden_dim)
        self.output = nn.Linear(hidden_dim, output_dim)

    def forward(self, X, **kwargs):
        X = F.relu(self.hidden(X))
        #X = F.relu(self.hidden_2(X))
        #X = F.relu(self.hidden_2(X))
        #X = self.dropout(X)
        X = F.softmax(self.output(X), dim=-1)
        return X
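
The last step of `forward()` applies a softmax over the 10 output values. A minimal pure-Python sketch of that operation is given below; the function name `softmax` and the three example scores are illustrative only and are not taken from the model above.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for three classes.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

The class with the largest raw score also gets the largest probability, so the predicted label is simply the index of the maximum.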

The Python skorch library wraps PyTorch models so that they can be trained and used through a Scikit-learn-style interface.

We can initialize the `myNeuralNetwork` architecture as follows:

In [8]:
#first we need to install the skorch library
!pip install skorch

from skorch import NeuralNetClassifier

input_dim = 784
hidden_dim = 100
output_dim = 10

net = NeuralNetClassifier(
    myNeuralNetwork(input_dim,hidden_dim,output_dim),
    max_epochs=20,
    lr=0.1, #learning rate
    device=device,
)



Now we can call the `fit()` function to train the neural network:

In [9]:
net.fit(X_train, y_train)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6817       0.8641        0.4297  5.5628
      2        0.3208       0.8961        0.3396  1.7509
      3        0.2712       0.9141        0.2882  1.1184
      4        0.2367       0.9259        0.2524  0.8539
      5        0.2099       0.9343        0.2259  0.8254
      6        0.1887       0.9402        0.2046  0.8441
      7        0.1715       0.9450        0.1869  0.8297
      8        0.1571       0.9503        0.1728  0.8229
      9        0.1448       0.9528        0.1617  0.8724
     10        0.1342       0.9552        0.1532  0.8881
     11        0.1249       0.95

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=myNeuralNetwork(
    (hidden): Linear(in_features=784, out_features=100, bias=True)
    (output): Linear(in_features=100, out_features=10, bias=True)
  ),
)
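
The layer sizes printed above determine how many trainable parameters the network has. The short sketch below counts them by hand, assuming each `Linear` layer holds one weight per input/output pair plus one bias per output (the printout shows `bias=True`); the helper name `linear_params` is made up for this example.

```python
def linear_params(in_features, out_features):
    """Number of weights plus biases in one fully connected layer."""
    return in_features * out_features + out_features

hidden_params = linear_params(784, 100)  # hidden layer: 784 inputs -> 100 units
output_params = linear_params(100, 10)   # output layer: 100 units -> 10 classes
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)
```

Almost all parameters sit in the hidden layer, because every one of the 784 pixels is connected to every hidden unit.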

We can make predictions using the fitted model as follows:

In [10]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

y_predicted = net.predict(X_test)

print("Accuracy = {}%\n".format(accuracy_score(y_test, y_predicted)*100))

print("Classification Report\n {}".format(classification_report(y_test, y_predicted, labels=range(0,10))))

Accuracy = 96.54285714285714%

Classification Report
               precision    recall  f1-score   support

           0       0.98      0.98      0.98      1714
           1       0.97      0.99      0.98      1977
           2       0.97      0.96      0.96      1761
           3       0.96      0.95      0.96      1806
           4       0.96      0.97      0.96      1587
           5       0.96      0.97      0.96      1607
           6       0.96      0.98      0.97      1761
           7       0.97      0.96      0.97      1878
           8       0.96      0.95      0.95      1657
           9       0.95      0.95      0.95      1752

    accuracy                           0.97     17500
   macro avg       0.97      0.97      0.97     17500
weighted avg       0.97      0.97      0.97     17500
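
To make the columns of the report concrete, the sketch below computes precision, recall, F1 and support for a single class from scratch. The helper `class_metrics` and the six toy labels are hypothetical; they only illustrate the standard definitions and are not the scikit-learn code.

```python
def class_metrics(y_true, y_pred, label):
    """Precision, recall, F1 and support for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    support = tp + fn  # number of true examples of this class
    return precision, recall, f1, support

# Hypothetical true and predicted digits.
toy_true = [3, 3, 3, 8, 8, 5]
toy_pred = [3, 3, 8, 8, 3, 5]
precision, recall, f1, support = class_metrics(toy_true, toy_pred, label=3)
print(precision, recall, f1, support)
```

Precision asks how many predicted 3s really are 3s, recall asks how many real 3s were found, and support is the number of real 3s in the test data.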



## 2. Convolutional neural network

A CNN takes each image as a 2D array of pixels, not as the flattened 1D vector used by the previous neural network.

Images can also have channels. For color images there are typically 3 channels: one for red, one for green, and one for blue.
For gray-scale images there is just one channel. 

For the MNIST data we reshape the datasets as follows:

In [11]:
print(X.shape)

XCnn = X.reshape(-1, 1, 28, 28)
#XCnn = X.reshape(-1, 3, 32, 32)

print(XCnn.shape)

(70000, 784)
(70000, 1, 28, 28)
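
The reshape does not change any pixel value; it only changes how the 784 values are indexed. The pure-Python sketch below shows the mapping, assuming row-major order (the NumPy default): pixel `i` of the flat vector lands at row `i // 28`, column `i % 28`. The helper names are hypothetical.

```python
WIDTH = 28  # MNIST images are 28 x 28 pixels

def flat_to_2d(index, width=WIDTH):
    """Map a position in the flattened vector to (row, column)."""
    return divmod(index, width)

def to_2d(flat_pixels, width=WIDTH):
    """Split a flat list of pixels into rows of `width` pixels."""
    return [flat_pixels[i:i + width] for i in range(0, len(flat_pixels), width)]

flat = list(range(784))  # stand-in pixels: each one holds its own flat index
image = to_2d(flat)
row, col = flat_to_2d(30)
print(len(image), len(image[0]), (row, col), image[row][col])
```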


Next we create the training and test sets:

In [12]:
XCnn_train, XCnn_test, y_train, y_test = train_test_split(XCnn, y, test_size=0.25, random_state=42)

print(XCnn_train.shape)
print(y_train.shape)

(52500, 1, 28, 28)
(52500,)


We define the CNN:

In [17]:
class myCNN2(nn.Module):
    def __init__(self, dropout=0.5):
        super(myCNN2, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.conv2_drop = nn.Dropout2d(p=dropout)
        self.fc1 = nn.Linear(1600, 100) # 1600 = 64 channels * 5 (height) * 5 (width) after two conv + pool stages
        self.fc2 = nn.Linear(100, 10)
        self.fc1_drop = nn.Dropout(p=dropout)

    def forward(self, x):
        x = torch.relu(F.max_pool2d(self.conv1(x), 2))
        x = torch.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        
        # flatten over channel, height and width = 1600
        x = x.view(-1, x.size(1) * x.size(2) * x.size(3))
        
        x = torch.relu(self.fc1_drop(self.fc1(x)))
        x = torch.softmax(self.fc2(x), dim=-1)
        return x
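
The value 1600 used for `fc1` follows from the layer sizes. The sketch below traces the image side length through the network by hand, assuming unpadded convolutions with stride 1 and 2x2 max pooling with stride 2 that rounds down, which matches the settings used in `myCNN2`. The helper names `conv_out` and `pool_out` are made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Side length after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Side length after max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1

sizes = [28]                               # input image: 28 x 28
sizes.append(conv_out(sizes[-1], 3))       # conv1, 3x3 kernel
sizes.append(pool_out(sizes[-1], 2))       # first max pooling
sizes.append(conv_out(sizes[-1], 3))       # conv2, 3x3 kernel
sizes.append(pool_out(sizes[-1], 2))       # second max pooling
flat_features = 64 * sizes[-1] * sizes[-1] # 64 output channels of conv2
print(sizes, flat_features)
```

Changing a kernel size or the input resolution changes the final side length, so the first argument of `fc1` has to be recomputed whenever the convolutional part is modified.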

In [14]:
class myCNN(nn.Module):
    def __init__(self, dropout=0.5):
        super(myCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=-1)  # softmax over the class dimension
        return x
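
Note that `myCNN` expects 3 input channels and hard-codes `16 * 5 * 5` features before `fc1`. The sketch below checks by hand which image size produces that number, assuming unpadded 5x5 convolutions and 2x2 pooling as in the class above; the function name is hypothetical. Under these assumptions the architecture fits 3-channel 32 x 32 images rather than the 1-channel 28 x 28 MNIST arrays.

```python
def mycnn_flat_features(side):
    """Features reaching fc1 in myCNN for a square input of the given side."""
    for kernel in (5, 5):
        side = side - kernel + 1  # unpadded 5x5 convolution, stride 1
        side = side // 2          # 2x2 max pooling, stride 2
    return 16 * side * side       # conv2 has 16 output channels

features_32 = mycnn_flat_features(32)  # 32 x 32 input
features_28 = mycnn_flat_features(28)  # 28 x 28 input (MNIST)
print(features_32, features_28)
```

For a 28 x 28 input the count no longer matches `16 * 5 * 5`, so `myCNN` would need a different first convolution and a different `fc1` size before it could be used on `XCnn_train`.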

Now we can use skorch to wrap `myCNN2` so that we can use the `fit()` and `predict()` functions:

In [18]:
cnn = NeuralNetClassifier(
    myCNN2,
    max_epochs=10,
    lr=0.002,
    optimizer=torch.optim.Adam,
    device=device,
)

In [19]:
cnn.fit(XCnn_train, y_train)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.5239       0.9668        0.1130  5.7581
      2        0.2011       0.9774        0.0756  1.4908
      3        0.1596       0.9807        0.0633  1.4805
      4        0.1372       0.9841        0.0546  1.4876
      5        0.1213       0.9850        0.0504  1.4752
      6        0.1105       0.9851        0.0492  1.4841
      7        0.1004       0.9875        0.0423  1.4808
      8        0.0980       0.9871        0.0414  1.5139
      9        0.0898       0.9891        0.0373  1.4646
     10        0.0850       0.9896        0.0363  1.4665


<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=myCNN2(
    (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
    (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
    (conv2_drop): Dropout2d(p=0.5, inplace=False)
    (fc1): Linear(in_features=1600, out_features=100, bias=True)
    (fc2): Linear(in_features=100, out_features=10, bias=True)
    (fc1_drop): Dropout(p=0.5, inplace=False)
  ),
)

In [20]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

y_predicted_cnn = cnn.predict(XCnn_test)

print("Accuracy = {}%\n".format(accuracy_score(y_test, y_predicted_cnn)*100))

print("Classification Report\n {}".format(classification_report(y_test, y_predicted_cnn, labels=range(0,10))))

Accuracy = 98.68571428571428%

Classification Report
               precision    recall  f1-score   support

           0       1.00      0.99      0.99      1714
           1       0.99      0.99      0.99      1977
           2       0.98      0.99      0.98      1761
           3       0.99      0.99      0.99      1806
           4       0.98      0.99      0.99      1587
           5       0.99      0.98      0.98      1607
           6       0.99      0.99      0.99      1761
           7       0.99      0.98      0.99      1878
           8       0.98      0.98      0.98      1657
           9       0.99      0.98      0.98      1752

    accuracy                           0.99     17500
   macro avg       0.99      0.99      0.99     17500
weighted avg       0.99      0.99      0.99     17500

