# MNIST data set: recognizing handwritten digits

In [0]:
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F

random_seed = 1
torch.manual_seed(random_seed);


# Preparing the data set

In [0]:
batch_size_train = 128
batch_size_test = 128

# training set
train_dataset = torchvision.datasets.MNIST('./files/', train=True, download=True,
                                           transform=torchvision.transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size_train, shuffle=True)
# test set
test_dataset = torchvision.datasets.MNIST('./files/', train=False, download=True,
                                          transform=torchvision.transforms.ToTensor())
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size_test, shuffle=True)

Let us look at some examples.

In [0]:
examples = enumerate(train_loader)
batch_idx, (example_data, example_targets) = next(examples)

print('Shape of one training mini batch',example_data.shape)
print('Shape of one target mini batch',example_targets.shape)
#print('Example training sample', example_data[1])
print('Target values', example_targets[:])
print(train_loader.dataset)

To get a feeling for the data, we visualize some examples.

In [0]:
import matplotlib.pyplot as plt
for i in range(6):
    plt.subplot(2,3,i+1)
    plt.tight_layout()
    plt.imshow(example_data[i][0], cmap='gray', interpolation='none')
    plt.title("Ground Truth: {}".format(example_targets[i]))
    plt.xticks([])
    plt.yticks([])


# Defining the Network

Define the network. A few things to keep in mind:
<ul>
    <li> Make sure that the network's dimensions are compatible with our data. The network automatically handles batches (the first dimension of our data); the remaining dimensions must match what the layers expect. Thus, we first need to flatten each image into a vector before it can be passed through the first linear layer. This can be done in the forward computation of the network.</li>
    <li> Each image belongs to one of ten classes. Thus, our output should be 10-dimensional with each output representing one class. To transform the outputs into probabilities, we can use the *softmax* function
    \begin{align}
        \text{softmax}\left(\underline{y}\right)_i = \frac{e^{y_i}}{\sum_{j}e^{y_j}}
    \end{align}
        that transforms a vector of real numbers into a vector of probabilities. The network can then be trained with the cross-entropy loss function. For this to work, we need to transform the labels $y_i\in\{0,1,\ldots,9\}$ into *one-hot* labels, for example
    \begin{align}
    y_i = 3 \quad \rightarrow \quad {\hat{y}}_i = [0,0,0,1,0,0,0,0,0,0]
    \end{align}
    </li>
<li> **However:** PyTorch has the built-in cost function `nn.CrossEntropyLoss` (see https://pytorch.org/docs/stable/nn.html#crossentropyloss) that takes care of all this. That is, we can just define our network with a linear output layer with ten neurons and then pass the raw outputs as well as the target values (from 0 to 9) to the loss function. For prediction, we can then simply take the index of the largest output as the predicted class, or apply the `F.softmax` function if we wish to have probabilities. </li>
 </ul>
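
The points above can be checked numerically. The following is a minimal sketch, assuming random stand-in data in place of an MNIST mini-batch and a single `nn.Linear(784, 10)` layer as a placeholder for the network (both are illustrative choices, not the solution of the exercise). It flattens a batch of images, computes the cross-entropy by hand from softmax probabilities and one-hot labels, and compares the result with `nn.CrossEntropyLoss` applied directly to the raw outputs and the integer labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for one MNIST mini-batch (assumption: random data, not real images).
images = torch.rand(4, 1, 28, 28)          # (batch, channel, height, width)
labels = torch.tensor([3, 0, 9, 3])        # integer class labels in 0..9

# Flatten every image into a vector; the batch dimension stays first.
flat = images.view(images.size(0), -1)     # shape (4, 784)

# Placeholder linear output layer with ten neurons (hypothetical, for illustration).
layer = nn.Linear(28 * 28, 10)
outputs = layer(flat)                      # raw scores, shape (4, 10)

# Manual route: softmax probabilities and one-hot labels.
probs = F.softmax(outputs, dim=1)
one_hot = F.one_hot(labels, num_classes=10).float()
manual_loss = -(one_hot * torch.log(probs)).sum(dim=1).mean()

# Built-in route: raw outputs and integer labels go straight into the loss.
builtin_loss = nn.CrossEntropyLoss()(outputs, labels)

# Prediction: the index of the largest output is the predicted class.
predictions = outputs.argmax(dim=1)

print(manual_loss.item(), builtin_loss.item())
```

Both printed values should agree up to floating-point rounding, which is why the one-hot conversion and the explicit softmax can be left out of the network when `nn.CrossEntropyLoss` is used.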


In [0]:
class Net(nn.Module):
    ##### your code here #####
    # initialize the network and define all learnable parameters

    
    # define the forward pass

    
    ##########################


Initialize the network.

In [0]:
# net = ...

Define the training procedure as before. To judge the training process, it makes sense to print both the loss values and the classification error rates.

In [0]:
def train(NeuralNetwork, train_loader, loss_function, num_epochs, learning_rate=0.001, wd=0):
    """
    Trains a neural network.
    
    NeuralNetwork = neural network to be trained
    train_loader = DataLoader that yields batches for mini-batch learning
    loss_function = cost function to be optimized
    num_epochs = number of training epochs
    learning_rate = learning rate (default value 0.001)
    wd = weight decay regularization (default value 0)
    """
    ##### your code here #####
    
    
    
    
    
    
    ##########################
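
For orientation, here is one possible shape of such a training loop. It is a sketch under stated assumptions, not the reference solution: the name `train_sketch`, the choice of `torch.optim.Adam`, the synthetic data, and the one-layer placeholder model are all hypothetical. It shows how the loss and the classification error rate can be accumulated per epoch.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_sketch(model, loader, loss_function, num_epochs, learning_rate=0.001, wd=0):
    """Hypothetical training loop; returns a list of (mean loss, error rate) per epoch."""
    # Assumption: Adam as optimizer; the exercise text does not prescribe one.
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=wd)
    history = []
    model.train()
    for epoch in range(num_epochs):
        total_loss, wrong, seen = 0.0, 0, 0
        for inputs, targets in loader:
            optimizer.zero_grad()                   # clear gradients of the previous step
            outputs = model(inputs)                 # forward pass: raw class scores
            loss = loss_function(outputs, targets)
            loss.backward()                         # backpropagation
            optimizer.step()                        # parameter update
            total_loss += loss.item() * targets.size(0)
            wrong += (outputs.argmax(dim=1) != targets).sum().item()
            seen += targets.size(0)
        history.append((total_loss / seen, wrong / seen))
        print(f"epoch {epoch + 1}: loss {history[-1][0]:.4f}, error rate {history[-1][1]:.2%}")
    return history


# Synthetic stand-in data (assumption): 256 random "images" whose label is the
# index of the largest of the first ten pixels.
torch.manual_seed(0)
x = torch.rand(256, 1, 28, 28)
y = x.view(256, -1)[:, :10].argmax(dim=1)
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)

# Placeholder model: flatten followed by one linear layer with ten outputs.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
history = train_sketch(model, loader, nn.CrossEntropyLoss(), num_epochs=5)
```

The weight decay is passed to the optimizer here; other optimizers in `torch.optim`, such as `SGD`, accept the same `weight_decay` argument.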


Train the network. Note that since we have a large data set, we will train for fewer epochs than with the small data set used in the previous exercises.

In [0]:
# train

To evaluate our model properly and detect overfitting, we need to run the network on the test set, that is, on data it has not seen during training.
Write a routine that computes the classification error rate on the test data.

In [0]:
#### your code here #####




#########################
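
An error-rate routine might look like the following. This is a minimal sketch: the function name `error_rate` and the toy model `AlwaysZero` are hypothetical and only serve to make the example checkable by hand. The routine accepts any `DataLoader`, so the same function can be applied to different data sets.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def error_rate(model, loader):
    """Fraction of misclassified samples over all batches of `loader`."""
    model.eval()                      # switch off training-only behavior such as dropout
    wrong, seen = 0, 0
    with torch.no_grad():             # no gradients are needed for evaluation
        for inputs, targets in loader:
            predictions = model(inputs).argmax(dim=1)
            wrong += (predictions != targets).sum().item()
            seen += targets.size(0)
    return wrong / seen


# Hand-built check (assumption: a fixed "model" that always favors class 0).
class AlwaysZero(nn.Module):
    def forward(self, x):
        scores = torch.zeros(x.size(0), 10)
        scores[:, 0] = 1.0
        return scores


inputs = torch.rand(8, 1, 28, 28)
targets = torch.tensor([0, 0, 1, 2, 0, 3, 0, 4])    # four of eight labels are 0
loader = DataLoader(TensorDataset(inputs, targets), batch_size=4)
print(error_rate(AlwaysZero(), loader))              # 0.5
```

Counting errors and samples separately, instead of averaging per-batch rates, keeps the result correct when the last batch is smaller than the others.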


Suggestions for further work:
<ul>
    <li> If your error rates on the training and test set differ significantly, your model is overfitting. What can you do to counter this? </li>
<li> If you achieve a low error rate on the test set: find the images that are classified incorrectly by the network. Would you classify those correctly? </li>
<li> To compare your network's performance with published results, you can take a look at the Wikipedia page: https://en.wikipedia.org/wiki/MNIST_database </li>
</ul>

