## Problem

Load the MNIST data and split it into training and test datasets. Then randomly split the training dataset into training and validation datasets. <br>
Fit a logistic model, using cross entropy as the objective function. Train it with SGD, printing both the loss and the accuracy at each epoch. Accuracy is computed on the validation dataset. <br>
Then save the model in the data folder, reset the session, reload the model, and apply it to the test dataset.

## Necessary Code

```torchvision``` is used to load the MNIST data.

* ```from torchvision.datasets import MNIST```: gives access to the MNIST data.
* ```import torchvision.transforms as transforms```: converts the image files into 3-dim torch tensors. (A batch is a 4-dim tensor, with the first dimension indexing the samples.)
* ```train_dataset=MNIST(root='data/', download=True, train=True, transform=transforms.ToTensor())``` : train_dataset contains 60000 elements, each a tuple of two sub-elements (image tensor & label).

```train_ds, val_ds=torch.utils.data.random_split(train_dataset, [Len1, Len2])```: Split train_dataset into (sub-)training and validation datasets of lengths Len1 and Len2.

* train_ds and val_ds are ```torch.utils.data.Subset``` objects, but they can still be passed to DataLoader.
* When forming train_dl, we set shuffle=True. When forming val_dl, we set shuffle=False.
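
The split and the two loaders can be sketched as follows. To keep the example self-contained, it uses a small synthetic dataset (an assumption, standing in for MNIST) with the same per-sample shape; the names `full_ds`, `train_dl`, and `val_dl` are illustrative only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Synthetic stand-in for MNIST (assumption): 60 "images" of shape 1x28x28, labels in 0..9.
images = torch.rand(60, 1, 28, 28)
labels = torch.randint(0, 10, (60,))
full_ds = TensorDataset(images, labels)

# Split into 50 training and 10 validation samples; both results are Subset objects.
train_ds, val_ds = random_split(full_ds, [50, 10])

# Shuffle only the training loader; validation metrics do not depend on sample order.
train_dl = DataLoader(train_ds, batch_size=16, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=16, shuffle=False)

xb, yb = next(iter(train_dl))
print(type(train_ds).__name__)  # Subset
print(xb.shape, yb.shape)       # torch.Size([16, 1, 28, 28]) torch.Size([16])
```

Each batch from the loader is already a 4-dim tensor plus a 1-dim label tensor, which is the form the model and the loss function expect.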

A general model (```nn.Linear(p,K)```) takes an __input matrix of $N\times p$ and outputs an $N\times K$ matrix ($N\times 1$ when $K=1$)__. In our case, however, the input is an $N\times d_1 \times d_2 \times d_3$ tensor, with $p=d_1 d_2 d_3$. To __adapt the model so that it can take a non-matrix input__, we do the following.

    class MnistModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(p, K)

        def forward(self, xb):
            # Flatten each d_1 x d_2 x d_3 image into a row of length p
            xb = xb.reshape(-1, p)
            out = self.linear(xb)
            return out
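
As a quick shape check of why the reshape is needed, here is a minimal sketch using a bare `nn.Linear` layer on a random batch. The values of `p` and `K` are the MNIST ones (1x28x28 images, 10 classes), and the variable `direct_call_failed` is introduced only for illustration.

```python
import torch
import torch.nn as nn

p, K = 1 * 28 * 28, 10          # flattened image size and number of classes
linear = nn.Linear(p, K)

xb = torch.rand(5, 1, 28, 28)   # a batch of N=5 images, N x d_1 x d_2 x d_3

# Passing the 4-dim batch directly fails: nn.Linear acts on the last dimension (28, not 784).
try:
    linear(xb)
    direct_call_failed = False
except RuntimeError:
    direct_call_failed = True

# Flattening each image to a row of length p gives the N x p matrix the layer expects.
out = linear(xb.reshape(-1, p))
print(direct_call_failed, out.shape)
```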
        

The optimization steps (computing pred=model(xb) and then optimizing with respect to the loss value) are basically the same as in standard linear regression, with one difference in the MNIST case. Since val_ds is a sequence of 10000 elements, each a tuple of (image tensor, label), we separate it into two objects. <br>
The first is an $N\times d_1\times d_2 \times d_3$ tensor, representing $N$ images of size $d_1 \times d_2 \times d_3$. The second is a length-$N$ vector of the $N$ labels.
It can be unclear whether the label input should be an $N\times 1$ matrix (a column vector) or a length-$N$ vector. In such cases, consult the reference, which specifies a length-$N$ vector for class-index targets: https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html#torch.nn.functional.cross_entropy
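
To make the expected shapes concrete, the following sketch calls `F.cross_entropy` on random logits (an assumption, in place of real model outputs) with a length-$N$ label vector, and compares the result with the same quantity computed by hand.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, K = 4, 10
logits = torch.randn(N, K)            # raw model outputs, one row per sample
targets = torch.tensor([3, 0, 9, 1])  # class indices, shape (N,), dtype int64

# cross_entropy applies log-softmax internally, so the model should output raw scores.
loss = F.cross_entropy(logits, targets)

# Same value by hand: mean negative log-softmax evaluated at the true class.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(N), targets].mean()

print(targets.shape, loss.shape)
print(torch.allclose(loss, manual))
```

Because the softmax is applied inside the loss, the model's `forward` returns the linear output unchanged.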


* ```images, labels = zip(*train_ds)```: images and labels are both length-$N$ tuples, whose elements are tensors and integers, respectively.
* ```images_tensor = torch.cat(images)``` : concatenates the $N$ tensors along the 1st dimension, so it returns a $(Nd_1)\times d_2 \times d_3$ tensor. We then reshape it into an $N\times d_1\times d_2 \times d_3$ tensor.
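
The two steps above can be sketched on a toy dataset. Here a plain list of `(tensor, int)` tuples stands in for `train_ds` (an assumption made to avoid downloading MNIST). The sketch also shows `torch.stack`, which is not used in this notebook but yields the same $N\times d_1\times d_2 \times d_3$ tensor in one call.

```python
import torch

torch.manual_seed(0)
N, size = 6, (1, 28, 28)

# Toy stand-in for train_ds: a list of (image tensor, integer label) tuples.
ds = [(torch.rand(*size), i % 10) for i in range(N)]

# Unzip into a tuple of N tensors and a tuple of N integers.
images, labels = zip(*ds)

# cat joins along the existing first dimension: (N * d_1) x d_2 x d_3.
cat_images = torch.cat(images)
print(cat_images.shape)  # torch.Size([6, 28, 28])

# Reshape to recover the batch dimension: N x d_1 x d_2 x d_3.
images_tensor = cat_images.reshape(N, *size)

# Alternative: stack adds a new leading dimension directly.
stacked = torch.stack(images)

labels_tensor = torch.tensor(labels)  # shape (N,)
print(images_tensor.shape, labels_tensor.shape)
```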


We want to save the trained model so that we don't need to train it again and can load it in the future. However, when loading it, we still need all the packages and the class definitions that the model depends on.
* ```torch.save(model.state_dict(), 'data/mnist-logistic.pth')``` : saves the model parameters to the stated path.
* ```model.load_state_dict(torch.load('data/mnist-logistic.pth'))``` : loads the saved parameters into the model.
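
A minimal round-trip sketch of saving and loading: a bare `nn.Linear` stands in for the trained model, and the file is written to a temporary directory rather than `data/` (both are assumptions made to keep the example self-contained).

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(784, 10)  # stand-in for the trained model

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mnist-logistic.pth")  # hypothetical temporary location
    torch.save(model.state_dict(), path)            # only the parameters are stored

    # A fresh model of the same architecture must exist before loading.
    restored = nn.Linear(784, 10)
    restored.load_state_dict(torch.load(path))

# The restored model reproduces the original outputs exactly.
x = torch.rand(3, 784)
same = torch.equal(model(x), restored(x))
print(same)
```

Since the file holds parameters only, the architecture has to be rebuilt in code before loading, which is why the class is defined again after the reset in Stage 5.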

## Stage 1: Load the MNIST data.

In [None]:
num_epoch=20
alpha=1/6
learning_rate=0.001
batch_size=128

In [None]:
import torch 
import numpy as np
from torchvision.datasets import MNIST
import torchvision.transforms as transforms
torch.manual_seed(0)

In [None]:
train_dataset = MNIST(root='data/', download=True, train=True, transform=transforms.ToTensor())
train_len = int(len(train_dataset) * (1-alpha))

In [None]:
print(len(train_dataset))
for xb, yb in train_dataset:
    print(xb.shape)
    print(yb)
    break

## Stage 2: Form the DataLoader.

In [None]:
from torch.utils.data import DataLoader
from torch.utils.data import random_split

In [None]:
train_ds, val_ds = random_split(train_dataset, [train_len, len(train_dataset) - train_len])
train_loader = torch.utils.data.DataLoader(dataset=train_ds, batch_size=batch_size, shuffle=True)

In [None]:
size = train_ds[0][0].shape
p = np.prod(size)
_,labels=zip(*train_ds)
K=len(np.unique(labels))

## Stage 3: Form a model.

* MnistModel inherits the functionalities of torch.nn.Module. https://pytorch.org/docs/stable/generated/torch.nn.Module.html
* Memorize the form below!

In [None]:
import torch.nn as nn
class MnistModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(p,K)
    def forward(self, xb):
        xb = xb.reshape(-1, p)
        out = self.linear(xb)
#         out = torch.sigmoid(self.linear(xb))
        return out

In [None]:
import torch.nn.functional as F
model=MnistModel()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
history=[]

## Stage 4: Iterate and plot the results.

In [None]:
for epoch in range(num_epoch):
    for xb, yb in train_loader:
        preds = model(xb)
        loss = F.cross_entropy(preds, yb)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    
    images, labels = zip(*val_ds)
    images_tensor = torch.cat(images).reshape(len(val_ds), size[0], size[1], size[2])
    preds = model(images_tensor)
    loss_val = F.cross_entropy(preds, torch.tensor(labels)).item()
    _,predoutcomes = torch.max(preds, axis=1)
    accuracy_val = torch.sum(predoutcomes==torch.tensor(labels)) / len(predoutcomes)
    
    history.append( (loss_val, accuracy_val.item()) )
    print('Epoch [{}/{}], Loss: {:.4f}, Accuracy: {:.4f}'.format(epoch+1, num_epoch, loss_val, accuracy_val))
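
The validation step in the loop above can also be written as a helper that disables gradient tracking, which avoids building a computation graph for the whole validation set. This is a sketch, not part of the original notebook: the function name `evaluate` is hypothetical, and the model and data are random stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def evaluate(model, images, labels):
    """Return (mean cross-entropy loss, accuracy) without tracking gradients."""
    with torch.no_grad():
        preds = model(images)
        loss = F.cross_entropy(preds, labels).item()
        # Predicted class = index of the largest score in each row.
        accuracy = (preds.argmax(dim=1) == labels).float().mean().item()
    return loss, accuracy

torch.manual_seed(0)
# Stand-in model and data (assumptions): flatten, then one linear layer.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
images = torch.rand(32, 1, 28, 28)
labels = torch.randint(0, 10, (32,))

loss_val, accuracy_val = evaluate(model, images, labels)
print('Loss: {:.4f}, Accuracy: {:.4f}'.format(loss_val, accuracy_val))
```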


In [None]:
import matplotlib.pyplot as plt
_,accuracies=zip(*history)
plt.plot(np.arange(num_epoch)+1, accuracies, "-o")
plt.xlabel("epochs")
plt.ylabel("accuracy")
plt.title("Accuracy vs No. of Epochs")

## Stage 5: Save and reload the model, then evaluate it on the test data.

In [None]:
torch.save(model.state_dict(), 'data/mnist-logistic.pth')
%reset

In [None]:
import torch 
import numpy as np
from torchvision.datasets import MNIST
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F

In [None]:
test_ds = MNIST(root='data/', download=True, train=False, transform=transforms.ToTensor())
images_test,labels_test = zip(*test_ds)
size = images_test[0].shape
p = np.prod(size)
K = len(np.unique(labels_test))

In [None]:
class MnistModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(p,K)
    def forward(self, xb):
        xb = xb.reshape(-1, p)
        out = self.linear(xb)
        return out

In [None]:
model = MnistModel()
model.load_state_dict(torch.load('data/mnist-logistic.pth'))

In [None]:
images_test = torch.cat(images_test).reshape(len(test_ds), size[0], size[1], size[2])
preds = model(images_test)
loss_val = F.cross_entropy(preds, torch.tensor(labels_test)).item()
_,predoutcomes = torch.max(preds, axis=1)
accuracy_val = torch.sum(predoutcomes==torch.tensor(labels_test)) / len(predoutcomes)
print(round(loss_val, 4), round(accuracy_val.item(), 4))