# Overview

In this lab, you will build a handwritten digit recognizer using a basic feed-forward neural network.

In [None]:
import torch
import torchvision
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import math
import numpy as np
from scipy.stats import norm
from torch import nn
from tqdm import tqdm
from torch import optim
from torch.nn import functional
import torchmetrics
import statistics
%matplotlib inline

## Introduction to tensors

Tensors are multi-dimensional arrays with a uniform type. Here is an example of a 3-axis tensor:

In [None]:
t= torch.tensor(
[
  [[0, 1, 2, 3, 4],
   [5, 6, 7, 8, 9]],
  [[10, 11, 12, 13, 14],
   [15, 16, 17, 18, 19]],
  [[20, 21, 22, 23, 24],
   [25, 26, 27, 28, 29]],]
)

Here is what it looks like:

![image](Resources/tensor.png)

Try to understand it through its shape:

In [None]:
t.shape

You can swap two dimensions by transposing them:

In [None]:
t_p = torch.transpose(t, 0, 1)
t_p.shape

Or fuse several dimensions by reshaping the tensor:

In [None]:
t_r = torch.reshape(t, [6, 5])
t_r.shape
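
Transposing and reshaping can produce the same shape while arranging the elements differently. The sketch below uses a small illustrative 2x3 tensor `a` (an assumption for this example, not the lab's `t`) to show the contrast.

```python
import torch

# Small illustrative tensor (not the lab's `t`): 2 rows, 3 columns.
a = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# transpose swaps the axes, so element (i, j) moves to (j, i).
a_t = torch.transpose(a, 0, 1)      # [[0, 3], [1, 4], [2, 5]]

# reshape keeps the row-major reading order and only regroups the elements.
a_r = torch.reshape(a, [3, 2])      # [[0, 1], [2, 3], [4, 5]]

print(a_t.shape, a_r.shape)         # both are (3, 2)
print(torch.equal(a_t, a_r))        # but the contents differ
```

Both results have shape (3, 2), yet only the transpose moves values across rows and columns.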

**Q:** Find 2 ways to transform the tensor `t` into the shape [10, 3], and state the difference between the two methods.

In [None]:
"""your code here"""
import torch

t1 = torch.tensor(
    [
        [[0, 1, 2, 3, 4],
         [5, 6, 7, 8, 9]],
        [[10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],
        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29]], ]
)

print(t1.shape)

t2 = t1.view(10, 3)
print(t2.shape)
print(t2)

t3 = torch.reshape(t1, [10, 3])
print(t3.shape)
print(t3)

# Difference: view() never copies data. It returns a tensor that shares memory with the original, so modifying an element of `t2` also modifies `t1`. It only works when the requested shape is compatible with the original memory layout (for example, on a contiguous tensor); otherwise it raises an error.
# reshape() also returns a view whenever that is possible and falls back to copying the data only when it is not (for example, after a transpose). Here `t1` is contiguous, so `t2` and `t3` hold the same values and both share memory with `t1`.
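
The memory behaviour of `view` and `reshape` can be checked directly. The following is a minimal sketch on an illustrative tensor `base` (a name assumed for this example); it compares storage pointers with `data_ptr()` and tries both calls on a non-contiguous transpose.

```python
import torch

base = torch.arange(6).reshape(2, 3)

# On a contiguous tensor both calls return a view of the same storage.
v = base.view(6)
r = base.reshape(6)
print(v.data_ptr() == base.data_ptr(), r.data_ptr() == base.data_ptr())

# Writing through the view changes the original tensor.
v[0] = 100
print(base[0, 0].item())

# A transposed tensor is not contiguous: view() refuses, reshape() copies.
base_t = base.t()
try:
    base_t.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True
copied = base_t.reshape(6)
print(view_failed, copied.data_ptr() != base_t.data_ptr())
```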

## Download and load dataset

> MNIST stands for Modified National Institute of Standards and Technology, which has produced a handwritten digits dataset. This is one of the most researched datasets in machine learning, and is used to classify handwritten digits. This dataset is helpful for predictive analytics because of its sheer size, allowing deep learning to work its magic efficiently. This dataset contains 60,000 training images and 10,000 testing images, formatted as 28 x 28 pixel monochrome images.

Here we use **DataLoader** to build the input data. A DataLoader combines a dataset and a sampler, and provides an iterable over the given dataset.
It lets us load the data one small batch at a time, since loading everything at once may make the training process too computationally heavy for your device.

[reference for dataloader](https://pytorch.org/docs/stable/data.html)

In [None]:
batch_size_train = 128
batch_size_test = 128

In [None]:
# load data
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST('./data/', train=True, download=True, transform=torchvision.transforms.ToTensor()),
    batch_size=batch_size_train, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    torchvision.datasets.MNIST('./data/', train=False, download=True, transform=torchvision.transforms.ToTensor()),
    batch_size=batch_size_test, shuffle=True)
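
To see how a DataLoader splits a dataset into batches without downloading MNIST, here is a small sketch. The random tensors, the `TensorDataset` wrapper, and the batch size of 4 are assumptions chosen for illustration only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for MNIST: 10 random "images" of shape 1x28x28 with labels 0..9.
images = torch.rand(10, 1, 28, 28)
labels = torch.arange(10)
toy_loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=False)

# Iterating yields (data, label) pairs; the last batch holds the remainder.
batch_shapes = [tuple(x.shape) for x, _ in toy_loader]
print(len(toy_loader))
print(batch_shapes)
```

With 10 samples and a batch size of 4, the loader yields two full batches and one final batch of 2.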

## Illustrate example data

Here we pick the first batch of data to see what it looks like and what shape it has as a tensor.

In [None]:
examples = enumerate(test_loader)
index, (example_data, example_labels) = next(examples)
print(example_data.shape)

In [None]:
# plot the first 6 images of the test batch
fig, axes = plt.subplots(nrows=2, ncols=3)

for index_row, row_axes in enumerate(axes):
    for index_col, ax in enumerate(row_axes):
        i = index_row*3 + index_col
        ax.imshow(example_data[i].numpy().squeeze(), cmap="Greys")

## Data preprocessing

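Here we use an adjusted standardization for each image, which guards against the standard deviation becoming zero. With $N$ the number of pixels in the image, the formula is:

$$
\begin{aligned}
\text{output} &= \frac{x-\mu}{\hat{\sigma}} \\
\hat{\sigma} &= \max\left(\sigma, \frac{1}{\sqrt{N}}\right)
\end{aligned}
$$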

In [None]:
#preprocessing
def standardize(data, dim):
    means = data.mean(dim = dim, keepdims=True)
    stds = data.std(dim = dim, keepdims=True)
    stds = torch.maximum(stds, torch.tensor(1./math.sqrt(28*28)))
    return (data - means) / stds

example_data_standardized = standardize(example_data, dim=(-2, -1))
fig, ax = plt.subplots()
ax.hist(example_data_standardized[0].numpy().flatten(), density=True, label="data distribution")

x = np.arange(-3, 3, 0.01)
ax.plot(x, norm.pdf(x, 0, 1.), label="standard normal distribution")
plt.legend()
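
The effect of the floor on the standard deviation is easiest to see on a constant image, where the plain standard deviation is 0. The sketch below repeats the same steps as `standardize` so that it runs on its own; the `blank` and `noisy` inputs are illustrative assumptions.

```python
import math
import torch

def standardize(data, dim):
    # Same steps as the lab's function, repeated here so the sketch runs alone.
    means = data.mean(dim=dim, keepdim=True)
    stds = data.std(dim=dim, keepdim=True)
    stds = torch.maximum(stds, torch.tensor(1.0 / math.sqrt(28 * 28)))
    return (data - means) / stds

blank = torch.zeros(1, 1, 28, 28)   # constant image, sigma = 0
noisy = torch.rand(1, 1, 28, 28)    # ordinary image, sigma > 0

blank_out = standardize(blank, dim=(-2, -1))
noisy_out = standardize(noisy, dim=(-2, -1))

# Without the floor, the blank image would produce 0 / 0 = NaN.
print(torch.isnan(blank_out).any().item())
# An ordinary image ends up with mean close to 0 and std close to 1.
print(noisy_out.mean().item(), noisy_out.std().item())
```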


## Building the feed-forward neural network

The network has 5 linear layers: 4 hidden layers followed by the output layer. The hidden layers use ReLU as activation, while the output layer uses softmax.

[How to build a neural network in PyTorch](https://pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)

[Catalog of all PyTorch layers (modules)](https://pytorch.org/docs/stable/nn.html#loss-functions)

**Q**: Why do we need softmax as the last activation function?

**A**: In a multi-class classification task, the softmax function converts the raw outputs of the network into a probability vector whose entries sum to 1, so each entry can be interpreted as the probability that the input belongs to the corresponding class. A sigmoid also maps each output into the range between 0 and 1, but it treats the outputs independently, so they do not sum to 1.
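
The contrast between softmax and sigmoid can be checked numerically. This is a small sketch on made-up raw scores for 3 classes (the `logits` values are an assumption for illustration).

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])   # illustrative raw scores for 3 classes

probs = torch.softmax(logits, dim=-1)    # normalized jointly across classes
sig = torch.sigmoid(logits)              # each score squashed independently

print(probs, probs.sum().item())         # sums to 1
print(sig, sig.sum().item())             # each value is in (0, 1), but the sum is not 1
```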

In [None]:
model = nn.Sequential(
    nn.Linear(28*28, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
    nn.Softmax(dim=-1)
)
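
Before training, it can help to check how shapes flow through such a network. The sketch below uses a smaller stand-in, `toy_model`, with the same input and output sizes as the lab's model; the hidden width of 16 and the random batch are assumptions for illustration.

```python
import torch
from torch import nn

# Smaller illustrative network with the same input and output sizes as the lab's model.
toy_model = nn.Sequential(
    nn.Linear(28 * 28, 16),
    nn.ReLU(),
    nn.Linear(16, 10),
    nn.Softmax(dim=-1),
)

batch = torch.rand(4, 1, 28, 28)                       # 4 fake grayscale images
flat = torch.flatten(batch, start_dim=1, end_dim=-1)   # -> (4, 784)
out = toy_model(flat)                                  # -> (4, 10), one row per image

# Each Linear layer has in*out weights plus out biases.
n_params = sum(p.numel() for p in toy_model.parameters())
print(flat.shape, out.shape, n_params)
```

Each output row sums to 1 because of the final softmax, and the images must be flattened to 784 values before they reach the first linear layer.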

## Training the model

We use **cross entropy** as the loss function and **stochastic gradient descent (SGD)** as the optimizer.

[how to train a pytorch model](https://pytorch.org/tutorials/beginner/introyt/trainingyt.html)
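
One detail worth knowing about PyTorch's cross entropy: `functional.cross_entropy` expects raw scores (logits) and applies log-softmax internally. The sketch below, on random illustrative logits, shows the equivalence and how the same loss can be computed when probabilities are already available.

```python
import torch
from torch.nn import functional

torch.manual_seed(0)
logits = torch.randn(4, 10)              # illustrative batch: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])

# cross_entropy on logits equals log-softmax followed by negative log-likelihood.
ce = functional.cross_entropy(logits, target)
manual = functional.nll_loss(functional.log_softmax(logits, dim=-1), target)

# If probabilities are already available, take their log and use nll_loss.
probs = torch.softmax(logits, dim=-1)
from_probs = functional.nll_loss(torch.log(probs), target)

print(ce.item(), manual.item(), from_probs.item())
```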

In [None]:
# training a model
def train(model, n_epochs, learning_rate):
    optimizer = optim.SGD(model.parameters(), lr=learning_rate)
    testing_accuracy = []
    average_train_loss = []
    # track the best test accuracy across all epochs (it must not be reset every epoch)
    highest_accuracy = 0
    for epoch in range(n_epochs):
        accuracy_train = torchmetrics.Accuracy(num_classes=10, task="multiclass")
        pbar = tqdm(train_loader, desc="epoch: " + str(epoch))
        batch_loss = []
        for index, (data, label) in enumerate(pbar):
            optimizer.zero_grad()

            x = standardize(data, (-2, -1))
            x = torch.flatten(x, start_dim=1, end_dim=-1)
            y = model(x)

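            # The model ends in Softmax, so y already holds probabilities.
            # functional.cross_entropy expects raw logits and would apply softmax a second time,
            # so take the log of the probabilities and use the negative log-likelihood loss instead.
            loss = functional.nll_loss(torch.log(y.clamp_min(1e-12)), label)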
            loss.backward()
            optimizer.step()

            acu = accuracy_train(y, label).item()
            pbar.set_postfix({
                    'batch_accuracy': acu,
                    'loss': loss.item()
                })
            batch_loss.append(loss.item())

        # add the average training loss of this epoch to the list
        average_train_loss.append(statistics.mean(batch_loss))
        # evaluate on the test set; no gradients are needed here
        accuracy_test = torchmetrics.Accuracy(num_classes=10, task="multiclass")

        with torch.no_grad():
            for index, (data, label) in enumerate(test_loader):
                x = standardize(data, (-2, -1))
                x = torch.flatten(x, start_dim=1, end_dim=-1)
                y = model(x)
                accuracy_test.update(y, label)

        accu_test = accuracy_test.compute().item()
        print('test_accuracy='+str(accu_test), end=', ')

        # add the test accuracy of this epoch to the list
        testing_accuracy.append(accu_test)

        #save the model of highest accuracy
        if accu_test > highest_accuracy:
            highest_accuracy = accu_test
            torch.save(model.state_dict(), r"checkpoint")

    return testing_accuracy, average_train_loss

accuracy, losses = train(model, n_epochs=20, learning_rate=0.5)
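
What `optimizer.step()` does for plain SGD can be verified by hand: each parameter moves against its gradient, scaled by the learning rate. The tiny `nn.Linear` layer, the random data, and the MSE loss below are assumptions chosen only to keep the sketch short.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
layer = nn.Linear(3, 1)                          # tiny illustrative model
optimizer = optim.SGD(layer.parameters(), lr=0.5)

x = torch.randn(8, 3)
target = torch.randn(8, 1)

optimizer.zero_grad()                            # clear gradients from earlier steps
loss = nn.functional.mse_loss(layer(x), target)
loss.backward()                                  # fill .grad for every parameter

# Expected result of the update, computed by hand: w <- w - lr * grad
expected_weight = layer.weight.detach() - 0.5 * layer.weight.grad
optimizer.step()

print(torch.allclose(layer.weight.detach(), expected_weight))
```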

## Evaluation
You can check the training history using these two variables:

In [None]:
# print the test accuracy and the average training loss of each epoch
print(accuracy)
print(losses)

**Q** Draw a plot showing how accuracy and loss change with the number of epochs.

In [None]:
"""your code here"""
epochs = range(len(losses))
fig, (ax_loss, ax_acc) = plt.subplots(nrows=1, ncols=2, figsize=(14, 5))

# average training loss per epoch
ax_loss.plot(epochs, losses, '.-')
ax_loss.set_title('Train loss vs. epochs', fontsize=20)
ax_loss.set_xlabel('Epochs', fontsize=20)
ax_loss.set_ylabel('Train loss', fontsize=20)
ax_loss.grid()

# test accuracy per epoch
ax_acc.plot(epochs, accuracy, '.-')
ax_acc.set_title('Test accuracy vs. epochs', fontsize=20)
ax_acc.set_xlabel('Epochs', fontsize=20)
ax_acc.set_ylabel('Test accuracy', fontsize=20)
ax_acc.grid()

fig.tight_layout()

**Q** Draw a confusion matrix using the test data loader.

In [None]:
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# load the saved checkpoint and switch the model to evaluation mode
model.load_state_dict(torch.load(r"checkpoint"))
model.eval()

# Initialize lists to store predicted results and true labels
predicted_labels = []
true_labels = []

# Calculate predicted results and true labels using the test dataset
for data, label in test_loader:
    data = standardize(data, dim=(-2, -1))
    data = torch.flatten(data, start_dim=1, end_dim=-1)
    output = model(data)
    _, predicted = torch.max(output, 1)
    predicted_labels.extend(predicted.tolist())
    true_labels.extend(label.tolist())

# Calculate the confusion matrix (rows: true labels, columns: predicted labels)
confusion = confusion_matrix(true_labels, predicted_labels)

plt.matshow(confusion, cmap=plt.cm.Blues)   # Greens, Blues, Oranges, Reds
plt.colorbar()
for i in range(len(confusion)):
    for j in range(len(confusion)):
        plt.annotate(confusion[j,i], xy=(i, j), horizontalalignment='center', verticalalignment='center')
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.title("Confusion Matrix")
plt.show()
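
To make the layout of a confusion matrix concrete, here is a hand-rolled sketch that builds one with `torch.bincount` instead of scikit-learn. The 3-class label lists are illustrative values only, not results from the lab's model.

```python
import torch

# Illustrative labels only (3 classes), not results from the lab's model.
true_labels = torch.tensor([0, 0, 1, 1, 2, 2, 2])
predicted_labels = torch.tensor([0, 1, 1, 1, 2, 0, 2])
num_classes = 3

# Encode each (true, predicted) pair as one index, count, then fold into a grid.
flat_index = true_labels * num_classes + predicted_labels
confusion = torch.bincount(flat_index, minlength=num_classes ** 2)
confusion = confusion.reshape(num_classes, num_classes)

# Correct predictions sit on the diagonal.
toy_accuracy = confusion.diag().sum().item() / confusion.sum().item()
print(confusion)
print(toy_accuracy)
```

Row `i`, column `j` counts the samples of true class `i` that were predicted as class `j`, which matches the axis labels used in the plot.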

# Validation

**Q** Train the model with different learning rates, then compare the training runs by drawing a plot similar to the one above.

In [None]:
"""your code here"""
# Re-initialize the weights before each run so that every learning rate starts from scratch
# (otherwise each call would keep training the already-trained model).
histories = {}
for lr in (0.02, 0.1, 0.5):
    for layer in model:
        if hasattr(layer, "reset_parameters"):
            layer.reset_parameters()
    # each entry holds (test accuracy per epoch, average train loss per epoch)
    histories[lr] = train(model, n_epochs=20, learning_rate=lr)
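
The influence of the learning rate can be illustrated on a much simpler problem than MNIST. The sketch below minimizes the toy function f(w) = w**2 with plain gradient descent; the function, the starting point, and the extra rate 1.1 are assumptions for illustration and say nothing about the lab's actual results.

```python
def gradient_descent(learning_rate, n_steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent and return |w| after each step."""
    history = []
    for _ in range(n_steps):
        grad = 2.0 * w                  # derivative of w**2
        w = w - learning_rate * grad    # gradient descent update
        history.append(abs(w))
    return history

# Distance from the minimum (w = 0) after 20 steps for several learning rates.
final_distance = {lr: gradient_descent(lr)[-1] for lr in (0.02, 0.1, 0.5, 1.1)}
for lr, dist in final_distance.items():
    print(f"lr={lr}: |w| after 20 steps = {dist:.4g}")
```

On this toy problem a small rate converges slowly, a larger one converges faster, and a rate that is too large overshoots and diverges.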
