# Exercise 8

In [None]:
from datasets import load_dataset
import torch
import matplotlib.pyplot as plt

## Data Preparation

The data preparation only uses concepts you already know from previous lectures, so it is provided ready-made and we start directly with clean training and test sets.

In [None]:
data = load_dataset("mnist")
data.set_format("torch")

In [None]:
example = data["test"][0]
print(f"True label: {int(example['label'])}")
fig = plt.imshow(example["image"])

In [None]:
img_size = example["image"].numel()
img_size

In [None]:
# Dividing by 255 maps the pixel values from 0..255 to the interval [0, 1]
X_train = data["train"]["image"][:].reshape(-1, img_size).to(torch.float) / 255
X_test = data["test"]["image"][:].reshape(-1, img_size).to(torch.float) / 255
y_train = data["train"]["label"]
y_test = data["test"]["label"]
X_train.shape

## Dimensions of our Neural Network

In [None]:
# the input dimension
n_in = img_size
# the dimension of each of our 2 hidden layers
n_hidden = 16
# the dimension of our output layer
n_out = 10

## Task 1: How many Parameters?

The number of trainable parameters is entirely determined by the number of layers and their dimensions.

Write a function called `count_params(n_in, n_hidden, n_out)` that counts how many parameters our model will have. Assume that there are 2 hidden layers of dimension `n_hidden` each, and that every layer has a weight matrix and a bias vector.
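A minimal sketch, assuming each of the 2 hidden layers has `n_hidden` units and every layer consists of one weight matrix plus one bias vector. A layer mapping `a` inputs to `b` outputs then contributes `a * b` weights and `b` biases.

```python
def count_params(n_in, n_hidden, n_out):
    # layer sizes: input -> hidden 1 -> hidden 2 -> output
    dims = [n_in, n_hidden, n_hidden, n_out]
    total = 0
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        # weight matrix (d_in * d_out entries) plus one bias per output unit
        total += d_in * d_out + d_out
    return total


print(count_params(784, 16, 10))  # 13002
```

For MNIST (`img_size = 28 * 28 = 784`) this gives 12560 + 272 + 170 = 13002 parameters.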

## Task 2: Set up random start parameters

We want to draw random start parameters that are distributed uniformly between -0.5 and 0.5. 

Since we are going to modify the parameters in-place while training the model, we need a way to freshly generate the start parameters multiple times. We therefore create a function that draws start parameters. 

The function takes the following arguments:
- n_in
- n_hidden
- n_out
- seed (give it a default value of 1995 so we all get the same results)

The function returns:
- a list of weight matrices with the correct shapes
- a list of biases with the correct shapes
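One possible sketch. It assumes weight matrices of shape `(n_outputs, n_inputs)`, so that a layer computes `W @ x + b` (the same convention `torch.nn.Linear` uses for its weights), and it draws from a dedicated `torch.Generator` so the seed does not change the global random state. The name `init_params` is only a suggestion.

```python
import torch


def init_params(n_in, n_hidden, n_out, seed=1995):
    # a local generator makes the draw reproducible without touching the global seed
    gen = torch.Generator().manual_seed(seed)
    dims = [n_in, n_hidden, n_hidden, n_out]
    weights, biases = [], []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        # torch.rand samples from [0, 1); subtracting 0.5 shifts this to [-0.5, 0.5)
        weights.append(torch.rand((d_out, d_in), generator=gen) - 0.5)
        biases.append(torch.rand(d_out, generator=gen) - 0.5)
    return weights, biases


weights, biases = init_params(784, 16, 10)
print([tuple(w.shape) for w in weights])  # [(16, 784), (16, 16), (10, 16)]
```

Calling the function again with the same seed returns identical tensors, which is what allows every training run to start from the same position.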

## Task 3: Implement relu and softmax

1. Implement a `relu` function that takes a 1d tensor and applies the ReLU nonlinearity elementwise
2. Implement a `softmax` function that takes a 1d tensor of logits and returns a 1d tensor of probabilities
3. Test your functions on a small tensor
4. If you have time, implement other nonlinearities such as sigmoid, tanh, ...
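A minimal sketch of the functions. The softmax subtracts the largest logit before exponentiating; this leaves the result unchanged but prevents `exp` from overflowing for large logits. `sigmoid` is one of the optional extras from item 4. Your own versions can be checked against PyTorch's built-in `torch.relu`, `torch.softmax` and `torch.sigmoid`.

```python
import torch


def relu(x):
    # max(0, x) elementwise
    return torch.clamp(x, min=0.0)


def softmax(logits):
    # subtracting the max does not change the result but avoids overflow in exp
    exp = torch.exp(logits - logits.max())
    return exp / exp.sum()


def sigmoid(x):
    return 1.0 / (1.0 + torch.exp(-x))


x = torch.tensor([-1.0, 0.0, 2.0])
print(relu(x))           # tensor([0., 0., 2.])
print(softmax(x).sum())  # sums to 1 (up to rounding)
```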

## Task 4: Implement the model

The model should take the following arguments:
- x: A 1d tensor with a flattened image
- weights: The list of weights from task 2
- biases: The list of biases from task 2

It should return a 1d tensor of length `n_out` that contains probabilities for each category. 

1. Implement a `model` function
2. Try it out on the first element of the training data
3. Try out the batch_model function on the first few rows of the training data
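A sketch of `model`, assuming weight matrices of shape `(n_outputs, n_inputs)` so each layer computes `W @ h + b` (with the transposed convention you would write `h @ W` instead). To keep the block standalone, ReLU and softmax are written inline; in the notebook you would call your own `relu` and `softmax` from Task 3.

```python
import torch


def model(x, weights, biases):
    h = x
    # hidden layers: affine map followed by ReLU
    for W, b in zip(weights[:-1], biases[:-1]):
        h = torch.clamp(W @ h + b, min=0.0)
    # output layer: affine map followed by softmax (max subtracted for stability)
    logits = weights[-1] @ h + biases[-1]
    exp = torch.exp(logits - logits.max())
    return exp / exp.sum()


# quick check with random parameters and a fake flattened 28x28 image
weights = [torch.rand(16, 784) - 0.5, torch.rand(16, 16) - 0.5, torch.rand(10, 16) - 0.5]
biases = [torch.rand(16) - 0.5, torch.rand(16) - 0.5, torch.rand(10) - 0.5]
probs = model(torch.rand(784), weights, biases)
print(probs.shape)  # torch.Size([10])
```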

In the training process we need a `batch_model` function that evaluates the model on a batch of data. Writing it is not very instructive, so I give you the function right away.

In [None]:
def batch_model(batch, weights, biases):
    n_out = len(biases[-1])
    out = torch.zeros((len(batch), n_out))
    for i, x in enumerate(batch):
        out[i] = model(x, weights, biases)
    return out

## Task 5: Implement loss functions

1. Write a function called `nll_loss` that takes the output of `batch_model` and the true labels and returns the average negative log likelihood.
2. Try it out on the first 100 rows of the training data
3. Implement an `accuracy` function that takes the same arguments as the loss function
4. Try it out on the first 100 rows of the training data
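A minimal sketch, assuming `probs` is the `(batch_size, n_out)` output of `batch_model` and `y` holds the integer labels. Advanced indexing picks out the probability of the true class in each row. The small clamp guarding against `log(0)` is an extra safety measure that the task does not require.

```python
import torch


def nll_loss(probs, y):
    # probability the model assigns to the true class of each example
    p_true = probs[torch.arange(len(y)), y]
    # clamp avoids log(0) = -inf if a probability underflows to zero
    return -torch.log(p_true.clamp_min(1e-12)).mean()


def accuracy(probs, y):
    # fraction of rows where the most likely class equals the label
    return (probs.argmax(dim=1) == y).float().mean()


probs = torch.tensor([[0.6, 0.4], [0.3, 0.7]])
y = torch.tensor([0, 0])
print(accuracy(probs, y))  # tensor(0.5000)
```

On the real data this would be called as, e.g., `nll_loss(batch_model(X_train[:100], weights, biases), y_train[:100])`.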

## Task 6: The training loop

0. Create fresh weights and biases
1. Set `requires_grad` to True for all tensors in the weights and biases lists.
2. Write a training loop to train your model with SGD and the following hyperparameters:
    - n_epochs: 2
    - batch_size: 100
    - learning_rate: 0.001
3. If you have time, try the model out on a few images

**Important**: Do the entire training in just one cell and re-create the start parameters at the beginning of that cell, so each training run starts from the same position. 

In [None]:
# create fresh random weights and biases

# set requires_grad to True for training

# define the hyperparameters

# loop over epochs

    # loop over batches
        # evaluate model
        # evaluate loss
        # backward pass

        # loop over the parameter lists (inside torch.no_grad())
            # SGD updates for each parameter tensor

        # zero the gradients for the next iteration
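One way to fill in the skeleton above, written as a function so it can be run and checked on its own. It assumes you pass in freshly created parameters (from your Task 2 function, called in the same cell) together with your `batch_model` and loss function; the name `train` and its signature are hypothetical. The parameter update runs inside `torch.no_grad()` so that the SGD step itself is not recorded by autograd.

```python
import torch


def train(X, y, weights, biases, batch_model_fn, loss_fn,
          n_epochs=2, batch_size=100, learning_rate=0.001):
    params = weights + biases
    # set requires_grad to True for training
    for p in params:
        p.requires_grad_(True)
    for epoch in range(n_epochs):
        for start in range(0, len(X), batch_size):
            xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]
            probs = batch_model_fn(xb, weights, biases)  # evaluate model
            loss = loss_fn(probs, yb)                    # evaluate loss
            loss.backward()                              # backward pass
            with torch.no_grad():
                for p in params:
                    p -= learning_rate * p.grad          # SGD update
                    p.grad.zero_()                       # reset for next batch
        print(f"epoch {epoch + 1}: loss on last batch {loss.item():.4f}")
    return weights, biases
```

In the notebook this could be called as `weights, biases = train(X_train, y_train, *init_params(n_in, n_hidden, n_out), batch_model, nll_loss)`, where `init_params` stands for whatever you named your Task 2 function. The sketch walks through the data in a fixed order; shuffling the rows each epoch (e.g. with `torch.randperm`) is a common variation.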

## Task 7: Diagnostics

1. Copy-paste the training loop from the previous task or work in the same cell as before.
2. After each epoch, evaluate `batch_model` on the test data with the current parameters; use `torch.no_grad` to disable gradient tracking.
3. Calculate the accuracy score on the result and print it.
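A minimal sketch of the evaluation step, assuming your `batch_model` and `accuracy` functions from the earlier tasks are passed in; `evaluate` is a hypothetical helper name. Calling it at the end of each epoch, e.g. `print(f"epoch {epoch + 1}: test accuracy {evaluate(X_test, y_test, weights, biases, batch_model, accuracy):.3f}")`, produces the diagnostic printout the task asks for.

```python
import torch


def evaluate(X, y, weights, biases, batch_model_fn, accuracy_fn):
    # no gradients are needed for evaluation, which saves memory and time
    with torch.no_grad():
        probs = batch_model_fn(X, weights, biases)
        return float(accuracy_fn(probs, y))
```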

## Task 8: Training the model

Tweak the number of epochs, batch size and learning rate until you get an accuracy of at least 90%.

Copy-paste the code from the previous task or work in the same cell.