# My first Neural Network

Today we will see how to build, train and test a simple neural network with PyTorch. In particular, we will train a **multi-layer perceptron (MLP)** for digit recognition on the MNIST dataset.

As a first step, let's import the modules we need. The `torch` module contains all the tools we need to build and train the network, whereas `torchvision` contains several computer-vision-oriented utilities, such as shortcuts to standard benchmark datasets.



In [None]:
import torch
import torchvision

## Dataset and Dataloaders
PyTorch provides useful utilities to efficiently load training, validation and test data, namely the `Dataset` and `DataLoader` classes. The former implements the functionality needed to load the dataset in the desired format, while the latter provides the corresponding iteration utilities, such as batching and shuffling. PyTorch provides an [implemented Dataset](https://pytorch.org/vision/stable/datasets.html#mnist) for MNIST; for the DataLoader, we can use the [default implementation](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader).

In [None]:
def get_data(batch_size, test_batch_size=256):
    # convert the PIL images to Tensors
    transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

    # load data
    full_training_data = torchvision.datasets.MNIST(
        "./data", train=True, transform=transform, download=True
    )
    test_data = torchvision.datasets.MNIST(
        "./data", train=False, transform=transform, download=True
    )

    # create train and validation splits
    num_samples = len(full_training_data)
    training_samples = int(num_samples * 0.5 + 1)
    validation_samples = num_samples - training_samples

    training_data, validation_data = torch.utils.data.random_split(
        full_training_data, [training_samples, validation_samples]
    )

    # initialize dataloaders
    train_loader = torch.utils.data.DataLoader(training_data, batch_size, shuffle=True)
    val_loader = torch.utils.data.DataLoader(
        validation_data, test_batch_size, shuffle=False
    )
    test_loader = torch.utils.data.DataLoader(test_data, test_batch_size, shuffle=False)

    return train_loader, val_loader, test_loader
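
To see what such loaders yield without downloading MNIST, here is a minimal sketch that applies the same split-and-load pattern to random tensors. The `TensorDataset` of fake images and all the sizes used here are assumptions made purely for illustration; they are not part of the pipeline above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# hypothetical stand-in for MNIST: 100 random 1x28x28 "images" with labels in [0, 9]
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))
fake_dataset = TensorDataset(fake_images, fake_labels)

# split into two disjoint subsets, in the same way get_data does
train_part, val_part = random_split(fake_dataset, [60, 40])

# wrap the training subset in a DataLoader and fetch a single batch
loader = DataLoader(train_part, batch_size=16, shuffle=True)
inputs, targets = next(iter(loader))

print(inputs.shape)   # torch.Size([16, 1, 28, 28])
print(targets.shape)  # torch.Size([16])
print(len(loader))    # 4 batches: 16 + 16 + 16 + 12 samples
```

Each iteration returns an `(inputs, targets)` pair, which is exactly the form the training and test loops unpack later on.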

## Network architecture
In this block we define the architecture of our MLP: a module consisting of two fully connected linear layers. This type of layer is provided by PyTorch through `torch.nn.Linear`. We also need to include an activation function between the two layers, e.g. Sigmoid (`torch.nn.Sigmoid`). There are plenty of [alternatives](https://pytorch.org/docs/stable/nn.html) that can be taken into account.

In [None]:
class MyFirstNetwork(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(MyFirstNetwork, self).__init__()

        # first linear layer (input)
        self.input_to_hidden = torch.nn.Linear(input_dim, hidden_dim)

        # activation function
        self.activation = torch.nn.Sigmoid()

        # second linear layer (output)
        self.hidden_to_output = torch.nn.Linear(hidden_dim, output_dim)

        # initialize bias
        self.input_to_hidden.bias.data.fill_(0.0)
        self.hidden_to_output.bias.data.fill_(0.0)

    def forward(self, x):
        # flatten the input to (batch_size, input_dim) format
        x = x.view(x.shape[0], -1)

        # forward the input through the layers
        x = self.input_to_hidden(x)
        x = self.activation(x)
        x = self.hidden_to_output(x)

        return x
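
A quick way to sanity-check an architecture like this is to push a fake batch through it and count the parameters. The sketch below rebuilds the same layer sizes with `torch.nn.Sequential` so that it runs on its own; `sketch_net` is a hypothetical stand-in, not the class defined above (for instance, it does not zero the biases).

```python
import torch

# hypothetical stand-in with the same layer sizes as the MLP above
input_dim, hidden_dim, output_dim = 28 * 28, 100, 10
sketch_net = torch.nn.Sequential(
    torch.nn.Flatten(),                       # (batch, 1, 28, 28) -> (batch, 784)
    torch.nn.Linear(input_dim, hidden_dim),
    torch.nn.Sigmoid(),
    torch.nn.Linear(hidden_dim, output_dim),
)

# a fake batch of 4 grayscale 28x28 images
batch = torch.rand(4, 1, 28, 28)
logits = sketch_net(batch)
print(logits.shape)  # torch.Size([4, 10])

# 784*100 + 100 (first layer) + 100*10 + 10 (second layer)
num_params = sum(p.numel() for p in sketch_net.parameters())
print(num_params)  # 79510
```

The output has one row per image and one column per digit class; these raw scores (logits) are what the loss function will consume.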

## Optimizer
The optimizer is the tool that actually carries out the optimization of the parameters with respect to the chosen loss function. A variety of optimizers is implemented in the [`torch.optim`](https://pytorch.org/docs/stable/optim.html) module. Let's define our optimizer, giving as input the network parameters, the learning rate, the weight decay coefficient and the momentum.

In [None]:
def get_optimizer(net, lr, wd, momentum):
    optimizer = torch.optim.SGD(
        net.parameters(), lr=lr, weight_decay=wd, momentum=momentum
    )
    return optimizer
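
To make the role of the three hyperparameters concrete, the sketch below reproduces two SGD steps by hand on a single scalar parameter and compares them with `torch.optim.SGD`. It assumes the optimizer's default settings (`dampening=0`, `nesterov=False`); the toy loss `p ** 2` and the values of `lr`, `wd` and `momentum` are arbitrary choices for illustration.

```python
import torch

lr, wd, momentum = 0.1, 0.01, 0.9

# a single scalar parameter optimized by torch.optim.SGD
p = torch.nn.Parameter(torch.tensor([1.0]))
optimizer = torch.optim.SGD([p], lr=lr, weight_decay=wd, momentum=momentum)

# the same update carried out by hand on plain Python floats
p_manual, buf = 1.0, 0.0

for step in range(2):
    # toy loss p^2, whose gradient is 2p
    loss = (p ** 2).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # weight decay adds wd * p to the gradient
    g = 2 * p_manual + wd * p_manual
    # the momentum buffer starts as the first gradient, then accumulates
    buf = g if step == 0 else momentum * buf + g
    # the parameter moves against the buffer, scaled by the learning rate
    p_manual = p_manual - lr * buf

print(p.item(), p_manual)  # both approximately 0.4575
```

Weight decay pulls the parameter towards zero, while momentum lets previous gradients keep contributing to the current update.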

## Loss function
The loss/cost function expresses the value that you wish to minimize by optimizing the parameters of your network. In other words, it should express the prediction error. Given that we are addressing a multi-class classification task, a suitable choice is cross-entropy with softmax. This is available, along with many alternatives, in the [`torch.nn`](https://pytorch.org/docs/stable/nn.html#loss-functions) module. Note that `torch.nn.CrossEntropyLoss` already applies the softmax function internally, so the network should output raw scores (logits) and we don't need to apply softmax ourselves.

In [None]:
def get_cost_function():
    cost_function = torch.nn.CrossEntropyLoss()
    return cost_function
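
As a small check of the claim that softmax is already included, the sketch below compares `torch.nn.CrossEntropyLoss` on raw logits with a manual log-softmax followed by the negative log-likelihood of the correct class. The random logits and targets are placeholders, not outputs of the network above.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)            # fake network outputs: 5 samples, 10 classes
targets = torch.randint(0, 10, (5,))   # fake ground truth labels

# built-in loss, applied directly to the raw logits
builtin = torch.nn.CrossEntropyLoss()(logits, targets)

# manual version: log-softmax, then pick the log-probability of the correct class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(5), targets].mean()

print(torch.allclose(builtin, manual))  # True
```

Note that the default reduction averages over the batch, so the returned value is a per-sample mean rather than a sum.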

## Training and test steps
We are ready to define our training and test steps. These are two separate functions, both of which:


1.   **iterate** over a given set of data
2.   **forward** the data through the neural network
3.   **compare** the network output with the ground truth labels, to compute loss and/or evaluation metrics

Additionally, during training, we need these steps to actually carry out the optimization:

1.   perform the backward pass (`loss.backward()`) to **compute gradients**
2.   call the optimizer to **update the weights** accordingly (`optimizer.step()`)
3.   **reset** the gradients so that they do not accumulate across iterations (`optimizer.zero_grad()`)
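
The reason for the reset step is that `backward()` adds new gradients to whatever is already stored in `.grad`. A minimal sketch on a single tensor (the tensor `w` and the toy function `3 * w` are illustrative assumptions):

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# first backward pass: d(3w)/dw = 3
(3 * w).sum().backward()
print(w.grad)  # tensor([3.])

# second backward pass without a reset: the new gradient is added to the old one
(3 * w).sum().backward()
print(w.grad)  # tensor([6.])

# resetting clears the accumulated value; optimizer.zero_grad() performs
# the reset for every parameter the optimizer manages
w.grad.zero_()
print(w.grad)  # tensor([0.])
```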


In [None]:
def training_step(net, data_loader, optimizer, cost_function, device="cuda"):
    samples = 0.0
    cumulative_loss = 0.0
    cumulative_accuracy = 0.0

    # set the network to training mode
    net.train()

    # iterate over the training set
    for batch_idx, (inputs, targets) in enumerate(data_loader):
        # load data into GPU
        inputs = inputs.to(device)
        targets = targets.to(device)

        # forward pass
        outputs = net(inputs)

        # loss computation
        loss = cost_function(outputs, targets)

        # backward pass
        loss.backward()

        # parameters update
        optimizer.step()

        # gradients reset
        optimizer.zero_grad()

        # fetch prediction and loss value
        samples += inputs.shape[0]
        # the loss is a batch mean, so weight it by the batch size before accumulating
        cumulative_loss += loss.item() * inputs.shape[0]
        _, predicted = outputs.max(
            dim=1
        )  # max() returns (maximum_value, index_of_maximum_value)

        # compute training accuracy
        cumulative_accuracy += predicted.eq(targets).sum().item()

    return cumulative_loss / samples, cumulative_accuracy / samples * 100


def test_step(net, data_loader, cost_function, device="cuda"):
    samples = 0.0
    cumulative_loss = 0.0
    cumulative_accuracy = 0.0

    # set the network to evaluation mode
    net.eval()

    # disable gradient computation (we are only testing, we do not want our model to be modified in this step!)
    with torch.no_grad():
        # iterate over the test set
        for batch_idx, (inputs, targets) in enumerate(data_loader):
            # load data into GPU
            inputs = inputs.to(device)
            targets = targets.to(device)

            # forward pass
            outputs = net(inputs)

            # loss computation
            loss = cost_function(outputs, targets)

            # fetch prediction and loss value
            samples += inputs.shape[0]
            # Note: .item() is needed to extract scalars from tensors; the loss is a
            # batch mean, so weight it by the batch size before accumulating
            cumulative_loss += loss.item() * inputs.shape[0]
            _, predicted = outputs.max(1)

            # compute accuracy
            cumulative_accuracy += predicted.eq(targets).sum().item()

    return cumulative_loss / samples, cumulative_accuracy / samples * 100
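
The accuracy bookkeeping in both functions boils down to an argmax over the class dimension followed by a comparison with the labels. A worked example on three hand-written samples (the logits and targets are made up for illustration):

```python
import torch

# fake outputs for 3 samples and 3 classes
logits = torch.tensor([[0.1, 2.0, -1.0],
                       [1.5, 0.2, 0.3],
                       [0.0, 0.1, 0.9]])
targets = torch.tensor([1, 0, 1])

# max() over dim=1 returns (maximum_value, index_of_maximum_value) for each row
_, predicted = logits.max(dim=1)

# count the matches and convert to a percentage
correct = predicted.eq(targets).sum().item()
accuracy = correct / targets.shape[0] * 100

print(predicted)          # tensor([1, 0, 2])
print(correct)            # 2
print(f"{accuracy:.2f}")  # 66.67
```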

## Put it all together!
We now need a compact procedure that combines all the components and functions defined so far into the actual optimization loop. In particular, we want to alternate training and evaluation steps over multiple epochs, tracking the intermediate results.

In [None]:
from torch.utils.tensorboard import SummaryWriter


# tensorboard logging utilities
def log_values(writer, step, loss, accuracy, prefix):
    writer.add_scalar(f"{prefix}/loss", loss, step)
    writer.add_scalar(f"{prefix}/accuracy", accuracy, step)


# main function
def main(
    batch_size=128,
    input_dim=28 * 28,
    hidden_dim=100,
    output_dim=10,
    device="cuda:0",
    learning_rate=0.01,
    weight_decay=0.000001,
    momentum=0.9,
    epochs=10,
):
    # create a logger for the experiment
    writer = SummaryWriter(log_dir="runs/exp1")

    # get dataloaders
    train_loader, val_loader, test_loader = get_data(batch_size)

    # instantiate the network and move it to the chosen device (GPU)
    net = MyFirstNetwork(input_dim, hidden_dim, output_dim).to(device)

    # instantiate the optimizer
    optimizer = get_optimizer(net, learning_rate, weight_decay, momentum)

    # define the cost function
    cost_function = get_cost_function()

    # computes evaluation results before training
    print("Before training:")
    train_loss, train_accuracy = test_step(net, train_loader, cost_function, device)
    val_loss, val_accuracy = test_step(net, val_loader, cost_function, device)
    test_loss, test_accuracy = test_step(net, test_loader, cost_function, device)

    # log to TensorBoard
    # (same tag prefixes as the later calls, so that each curve stays in one plot)
    log_values(writer, -1, train_loss, train_accuracy, "Train")
    log_values(writer, -1, val_loss, val_accuracy, "Validation")
    log_values(writer, -1, test_loss, test_accuracy, "Test")

    print(
        "\tTraining loss {:.5f}, Training accuracy {:.2f}".format(
            train_loss, train_accuracy
        )
    )
    print(
        "\tValidation loss {:.5f}, Validation accuracy {:.2f}".format(
            val_loss, val_accuracy
        )
    )
    print("\tTest loss {:.5f}, Test accuracy {:.2f}".format(test_loss, test_accuracy))
    print("-----------------------------------------------------")

    # for each epoch, train the network and then compute evaluation results
    for e in range(epochs):
        train_loss, train_accuracy = training_step(
            net, train_loader, optimizer, cost_function, device
        )
        val_loss, val_accuracy = test_step(net, val_loader, cost_function, device)

        # logs to TensorBoard
        log_values(writer, e, val_loss, val_accuracy, "Validation")

        print("Epoch: {:d}".format(e + 1))
        print(
            "\tTraining loss {:.5f}, Training accuracy {:.2f}".format(
                train_loss, train_accuracy
            )
        )
        print(
            "\tValidation loss {:.5f}, Validation accuracy {:.2f}".format(
                val_loss, val_accuracy
            )
        )
        print("-----------------------------------------------------")

    # compute final evaluation results
    print("After training:")
    train_loss, train_accuracy = test_step(net, train_loader, cost_function, device)
    val_loss, val_accuracy = test_step(net, val_loader, cost_function, device)
    test_loss, test_accuracy = test_step(net, test_loader, cost_function, device)

    # log to TensorBoard
    log_values(writer, epochs, train_loss, train_accuracy, "Train")
    log_values(writer, epochs, val_loss, val_accuracy, "Validation")
    log_values(writer, epochs, test_loss, test_accuracy, "Test")

    print(
        "\tTraining loss {:.5f}, Training accuracy {:.2f}".format(
            train_loss, train_accuracy
        )
    )
    print(
        "\tValidation loss {:.5f}, Validation accuracy {:.2f}".format(
            val_loss, val_accuracy
        )
    )
    print("\tTest loss {:.5f}, Test accuracy {:.2f}".format(test_loss, test_accuracy))
    print("-----------------------------------------------------")

    # closes the logger
    writer.close()

## Run it!
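Let's run our model. We first remove the logs of any previous run, so that TensorBoard only shows the current experiment.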

In [None]:
!rm -rf runs

In [None]:
main()
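
The defaults used throughout assume that a CUDA GPU is available. On a machine without one, a device string can be chosen at runtime, as in the minimal sketch below; `pick_device` is a hypothetical helper introduced here for illustration, not part of the code above.

```python
import torch


def pick_device():
    # hypothetical helper: prefer the first GPU, otherwise fall back to the CPU
    return "cuda:0" if torch.cuda.is_available() else "cpu"


device = pick_device()

# tensors (and modules) can then be moved with .to(device), as done in the steps above
x = torch.zeros(2, 3).to(device)
print(device, x.device)
```

The resulting string can be passed wherever the functions above accept a `device` argument, for example `test_step(net, test_loader, cost_function, device=device)`.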