In this notebook we examine how the same problem is solved with the [PyTorch](https://pytorch.org/) library.
Much of the work is abstracted away for us, but core components such as the optimizer remain configurable.  

We achieve an accuracy of around 97.9% on average, which outperforms our simple network.

This tutorial is based on [sentdex's PyTorch video series](https://www.youtube.com/watch?v=BzcBsTou0C0&list=PLQVvvaa0QuDdeMyHEYc0gxFpYwHY2Qfdh) 

In [3]:
"""
First we download MNIST directly from torchvision
"""
import torch
import torchvision
from torchvision import transforms, datasets

print("Downloading dataset, this may take a while...")
train = datasets.MNIST('', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))

test = datasets.MNIST('', train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=False)

Downloading dataset, this may take a while...
Using downloaded and verified file: MNIST/raw/train-images-idx3-ubyte.gz
Extracting MNIST/raw/train-images-idx3-ubyte.gz to MNIST/raw
Using downloaded and verified file: MNIST/raw/train-labels-idx1-ubyte.gz
Extracting MNIST/raw/train-labels-idx1-ubyte.gz to MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to MNIST/raw/t10k-images-idx3-ubyte.gz


100.4%

Extracting MNIST/raw/t10k-images-idx3-ubyte.gz to MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to MNIST/raw/t10k-labels-idx1-ubyte.gz


180.4%

Extracting MNIST/raw/t10k-labels-idx1-ubyte.gz to MNIST/raw
Processing...
Done!
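
As a quick sanity check on shapes: `transforms.ToTensor()` yields each MNIST image as a `1x28x28` tensor, so a batch of 10 from the loader has shape `[10, 1, 28, 28]`. The sketch below uses a zero tensor as a stand-in for a real batch (an assumption, to avoid the download) and shows the flattening that a fully connected input layer expects.

```python
import torch

# Stand-in for one batch from the DataLoader
# (assumed shape: 10 images, 1 channel, 28x28 pixels)
fake_batch = torch.zeros(10, 1, 28, 28)

# Flatten every image into a vector of 784 values;
# -1 lets PyTorch infer the batch size
flat = fake_batch.view(-1, 28 * 28)

print(fake_batch.shape)  # torch.Size([10, 1, 28, 28])
print(flat.shape)        # torch.Size([10, 784])
```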


In [25]:
"""
Let's see how the digits are distributed in our dataset.
Note that this is very important: if we had a very unequal distribution, say
around 60% 3s, our network would develop a bias towards predicting 3,
and that bias would be hard to train away (strongly activated or inactive neurons are hard to change).
"""
counts = {str(digit): 0 for digit in range(10)}
for data in trainset:
    numbers_array = data[1]  # data is (images, labels); index 1 holds the labels
    for number in numbers_array:
        counts[str(number.item())] += 1
print("counts", counts)

counts {'0': 5923, '1': 6742, '2': 5958, '3': 6131, '4': 5842, '5': 5421, '6': 5918, '7': 6265, '8': 5851, '9': 5949}
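
To judge the balance more easily, the counts printed above can be converted to percentages. This is a small sketch in plain Python; the `counts` dictionary is copied from the output above, and the other variable names are illustrative.

```python
# Counts copied from the cell output above
counts = {'0': 5923, '1': 6742, '2': 5958, '3': 6131, '4': 5842,
          '5': 5421, '6': 5918, '7': 6265, '8': 5851, '9': 5949}

total = sum(counts.values())  # number of training images
shares = {digit: 100 * n / total for digit, n in counts.items()}

for digit, share in shares.items():
    print(f"{digit}: {share:.1f}%")
```

Every digit accounts for roughly 9% to 11% of the training set, so the classes are close to balanced.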


In [26]:
"""
Now we construct the network.
torch.nn and torch.nn.functional are mostly interchangeable (object-oriented vs. functional).
"""
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        # Initialize the parent class nn.Module
        super().__init__()
        # fc1 is the first fully connected layer; its input is the flattened 28x28 image
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)
        
    def forward(self,x):
        # running activation function over whole layer
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        # log_softmax is the output activation (not a cost function); dim=1 applies it
        # across the 10 class scores of each sample rather than across the batch.
        # The outputs are log-probabilities: exponentiated, they are confidence scores adding up to 1.
        return F.log_softmax(x, dim=1)
        
        
net = Net()
# pytorch allows pretty printing
print(net)

Net(
  (fc1): Linear(in_features=784, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=64, bias=True)
  (fc4): Linear(in_features=64, out_features=10, bias=True)
)
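
To make the role of `log_softmax` concrete, here is a minimal sketch on made-up logits (the values and variable names are illustrative assumptions, not taken from the trained network). It checks that the exponentiated outputs sum to 1 per sample, and that `F.nll_loss` on log-probabilities matches `F.cross_entropy` on raw logits.

```python
import torch
import torch.nn.functional as F

# Made-up raw scores (logits) for a batch of 2 samples and 10 classes
logits = torch.tensor([[2.0, 0.5, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])
targets = torch.tensor([0, 3])  # made-up true labels

log_probs = F.log_softmax(logits, dim=1)  # normalize across the class dimension
probs = log_probs.exp()                   # back to ordinary probabilities

print(probs.sum(dim=1))  # each row sums to 1 (up to floating point error)

# nll_loss expects log-probabilities; cross_entropy expects raw logits
loss_nll = F.nll_loss(log_probs, targets)
loss_ce = F.cross_entropy(logits, targets)
print(torch.allclose(loss_nll, loss_ce))
```

This is why a network that ends in `log_softmax` is paired with a negative log-likelihood loss rather than with a cross-entropy loss on top.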


In [27]:
"""
Now we train the network
"""
import torch.optim as optim

# The network already outputs log-probabilities (log_softmax), so the matching loss is
# negative log-likelihood. nn.CrossEntropyLoss would apply log_softmax a second time.
loss_function = nn.NLLLoss()
# Could freeze the first layers and only train later layers, e.g. in CNNs
optimizer = optim.Adam(net.parameters(), lr=0.001)
epochs = 3
print(f"Training {epochs} epochs, this may take a while...")

for epoch in range(epochs):  # 3 full passes over the data
    for data in trainset:  # `data` is a batch of data
        x, y = data  # x is the batch of features, y is the batch of targets
        net.zero_grad()  # reset gradients before each step; otherwise they accumulate
        output = net(x.view(-1, 28*28))  # pass in the flattened batch (each image is 1x28x28)
        loss = loss_function(output, y)  # compute the loss for this batch
        loss.backward()  # backpropagate: compute gradients of the loss w.r.t. the parameters
        optimizer.step()  # update the weights using the gradients
    print(loss)  # loss of the last batch in this epoch. We hope loss (a measure of wrong-ness) declines!

Training 3 epochs, this may take a while...
tensor(0.0951, grad_fn=<NllLossBackward>)
tensor(0.1433, grad_fn=<NllLossBackward>)
tensor(0.0119, grad_fn=<NllLossBackward>)
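
The reason gradients are reset on every step is that PyTorch accumulates them: each `backward()` call adds to the `.grad` already stored. A minimal sketch with a single made-up parameter `w` (hypothetical, not part of the network):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # a single trainable parameter

(3 * w).backward()       # d(3w)/dw = 3
first = w.grad.item()

(3 * w).backward()       # without a reset, the new gradient is added to the old one
accumulated = w.grad.item()

w.grad.zero_()           # manual reset, similar in effect to zero_grad()
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 3.0 6.0 3.0
```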


In [None]:
"""
Now we test the accuracy of our network 
"""
correct = 0
total = 0

print("Testing network now")
# Disable gradient tracking while testing; we only evaluate here and do not train
with torch.no_grad():
    for data in testset:
        X, y = data
        output = net(X.view(-1, 28*28))
        for idx, i in enumerate(output):
            #print(torch.argmax(i), y[idx])
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1
print('Accuracy:', round(correct/total, 3))

Testing network now
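
The per-sample loop can also be written in vectorized form, since `argmax(dim=1)` picks the predicted class for every row at once. A sketch on made-up outputs (the tensors below are illustrative assumptions, not real network outputs):

```python
import torch

# Made-up scores for 4 samples and 3 classes; higher means more confident
output = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.2, 0.6],
                       [0.3, 0.4, 0.3]])
y = torch.tensor([1, 0, 2, 0])  # made-up true labels

predictions = output.argmax(dim=1)         # predicted class per sample
correct = (predictions == y).sum().item()  # number of matching predictions
total = y.size(0)

print("Accuracy:", round(correct / total, 3))  # Accuracy: 0.75
```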
