# PyTorch Tutorial

### 3. Training and Evaluating a Model

- General training setup
- Managing and loading datasets
- Conventions for training and evaluating a model

Set up torch and variables

In [None]:
import sys

import torch, torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
import torchvision.transforms as transforms
from tqdm import tqdm

## General training setup

We define the ```device``` variable, which specifies where we will run our training process.

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

CPU is the default device. If you run the model on the CPU, the rest of this section is not relevant.

However, if we want to run our model on CUDA, two conditions must be satisfied:

1. The input data (tensors) should be on cuda.  
2. The model parameters should be on cuda.

This can be achieved in two different ways:

1. Creating tensors with the ```device``` argument:
    ```python
    x = torch.randn((4,5), device=device)
    ```
2. Moving an object to ```device```:
    ```python
    model = SimpleNet().to(device)
    ```
    ```python
    input_data = input_data.to(device)
    ```
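
As a minimal sketch of the two approaches together, the snippet below creates a tensor directly on ```device``` and moves a module to it, then checks where each one lives. The ```nn.Linear``` layer is only a hypothetical stand-in for ```SimpleNet```, which is not defined in this tutorial.

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Way 1: create the tensor directly on the target device
x = torch.randn((4, 5), device=device)

# Way 2: move an existing module to the device
# (nn.Linear is a hypothetical stand-in for SimpleNet)
model = nn.Linear(5, 2).to(device)

# Input and parameters should report the same device type
print(x.device.type, next(model.parameters()).device.type)

# The forward pass works because both live on the same device
y = model(x)
print(tuple(y.shape))
```

If the input and the parameters end up on different devices, the forward pass raises a runtime error, so checking ```.device``` is a quick way to debug such mismatches.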

## Managing Datasets

```torchvision.datasets``` provides many well-known datasets, which are listed in the [documentation](https://pytorch.org/docs/stable/torchvision/datasets.html).

Let us toy with the simplest dataset, MNIST. You can apply transforms to the data here with the ```transform``` argument.

In [None]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (1.0,))])

In [None]:
MNIST_train = torchvision.datasets.MNIST(root='MNIST', train=True, transform=transform, download=True)
MNIST_test = torchvision.datasets.MNIST(root='MNIST', train=False, transform=transform, download=True)

In [None]:
MNIST_train

In [None]:
MNIST_test

DataLoaders load batches from the dataset. Here we simply use ```shuffle=True```, but it is also possible to leave ```shuffle=False``` and pass an instance of ```torch.utils.data.RandomSampler``` to the ```sampler``` parameter.

In [None]:
train_data_loader = torch.utils.data.DataLoader(MNIST_train, batch_size=64, shuffle=True)
test_data_loader = torch.utils.data.DataLoader(MNIST_test, batch_size=64, shuffle=False)
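
The sampler-based alternative could look like the sketch below. It uses a toy ```TensorDataset``` of ten integers instead of MNIST so that it runs without a download; the dataset and batch size are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# Toy dataset of 10 items standing in for MNIST_train
dataset = TensorDataset(torch.arange(10))

# shuffle keeps its default (False); the sampler supplies the random order
sampler = RandomSampler(dataset)
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

seen = []
for (batch,) in loader:
    seen.extend(batch.tolist())

# Every item is visited exactly once per epoch, in random order
print(seen)
```

Note that ```shuffle=True``` and a custom ```sampler``` are mutually exclusive, so only one of them should be given.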

Creating custom datasets is also possible by defining a class that inherits from ```torch.utils.data.Dataset```. You need to implement the methods ```__init__```, ```__getitem__```, and ```__len__```.

In [None]:
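import numpy as np

class CustomDataset(torch.utils.data.Dataset):
    
    def __init__(self, filename, division):
        assert division in ['train', 'test']
        # The .npz file is expected to hold arrays named x_train, y_train, x_test, y_test
        with np.load(filename) as f:
            self.x, self.y = f[f'x_{division}'], f[f'y_{division}']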
        
    def __getitem__(self, ind):
        return self.x[ind], self.y[ind]

    def __len__(self):
        assert len(self.x) == len(self.y)
        return len(self.x)
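
As a usage sketch, a dataset of this kind can be handed to a ```DataLoader``` like any built-in one. The example below is self-contained: it defines a hypothetical ```NpzDataset``` with the same structure, writes a small synthetic ```.npz``` file to a temporary directory, and iterates over it. The file name, array sizes, and key names are assumptions.

```python
import os
import tempfile

import numpy as np
import torch
import torch.utils.data


class NpzDataset(torch.utils.data.Dataset):
    """Hypothetical dataset reading x_<division> / y_<division> from an .npz file."""

    def __init__(self, filename, division):
        assert division in ['train', 'test']
        with np.load(filename) as f:
            self.x, self.y = f[f'x_{division}'], f[f'y_{division}']

    def __getitem__(self, ind):
        return self.x[ind], self.y[ind]

    def __len__(self):
        return len(self.x)


# Write a small synthetic archive so the example needs no external data
path = os.path.join(tempfile.mkdtemp(), 'toy.npz')
np.savez(path,
         x_train=np.random.rand(6, 3).astype(np.float32), y_train=np.arange(6),
         x_test=np.random.rand(2, 3).astype(np.float32), y_test=np.arange(2))

train_set = NpzDataset(path, 'train')
loader = torch.utils.data.DataLoader(train_set, batch_size=4)

# The default collate function stacks the NumPy items into tensors
shapes = []
for x_batch, y_batch in loader:
    shapes.append((tuple(x_batch.shape), tuple(y_batch.shape)))
print(shapes)
```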

## Train and Evaluate

Here we toy with a simple network, LeNet, and use the MNIST dataset loaded above.

In [None]:
class LeNet(nn.Module):
    
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)
        
        for m in self.modules():
            if type(m) in [nn.Linear, nn.Conv2d]:
                nn.init.kaiming_normal_(m.weight)
                m.bias.data.fill_(0.)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
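
The ```4*4*50``` input size of ```fc1``` follows from the spatial sizes after each layer. A short sketch that traces the shapes for one 28x28 MNIST-sized input, using the same layer settings as above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 20, 5, 1)
conv2 = nn.Conv2d(20, 50, 5, 1)

x = torch.zeros(1, 1, 28, 28)   # one dummy MNIST-sized image
shapes = []

x = conv1(x)                    # 5x5 kernel, no padding: 28 - 5 + 1 = 24
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2, 2)       # halves the spatial size: 24 / 2 = 12
shapes.append(tuple(x.shape))
x = conv2(x)                    # 12 - 5 + 1 = 8
shapes.append(tuple(x.shape))
x = F.max_pool2d(x, 2, 2)       # 8 / 2 = 4
shapes.append(tuple(x.shape))

for s in shapes:
    print(s)
```

The final feature map has 50 channels of size 4x4, which is flattened to 800 values per sample before the first fully connected layer.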

The following is the convention for the training logic.

In [None]:
epochs = 1
print_every = 1

In [None]:
model = LeNet().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)

Set the model to training mode with ```.train()```. The opposite is ```.eval()```. This step is necessary if the model includes layers that behave differently during training and evaluation, such as batch normalization or dropout.

In [None]:
model = model.train()
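
LeNet as defined here has no such layers, so the mode makes no difference to its output. To illustrate what the switch does in general, here is a minimal sketch with a standalone ```nn.Dropout``` layer (an assumption for illustration, not part of the model above):

```python
import torch
import torch.nn as nn

layer = nn.Dropout(p=0.5)
x = torch.ones(1000)

layer.train()
train_out = layer(x)   # some entries zeroed, survivors scaled by 1 / (1 - p) = 2

layer.eval()
eval_out = layer(x)    # dropout is disabled: output equals input

print(train_out.unique().tolist(), eval_out.unique().tolist())
```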

In [None]:
for epoch in range(epochs):
    for batch_ind, (input_data, target_data) in enumerate(train_data_loader):
        # Move data to device
        input_data, target_data = input_data.to(device), target_data.to(device)

        # Forward propagation
        output = model(input_data)

        # Calculate loss function (the model already returns log-probabilities,
        # so use the negative log-likelihood loss)
        loss = F.nll_loss(output, target_data)

        # Backward propagation
        optimizer.zero_grad()    # Equivalent to model.zero_grad() here, since the optimizer holds all model parameters
        loss.backward()

        # Update parameters
        optimizer.step()

        # Print progress
        if batch_ind % print_every == 0:
            train_log = f'Epoch {epoch+1:2d}/{epochs:2d}\tLoss: {loss.item():.6f}\tTrain: [{batch_ind+1}/{len(train_data_loader)} ({100.*(batch_ind+1)/len(train_data_loader):.0f}%)]            '
            print(train_log, end='\r')
            sys.stdout.flush()
    print()
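
Since the model's ```forward``` ends with ```log_softmax```, it helps to see how the two common classification losses relate. The sketch below uses random logits (an assumption for illustration) to check that ```F.cross_entropy``` on raw scores matches ```F.nll_loss``` on log-probabilities, and that feeding log-probabilities to ```F.cross_entropy``` gives the same value, because applying ```log_softmax``` a second time changes nothing.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)             # hypothetical raw scores: 8 samples, 10 classes
targets = torch.randint(0, 10, (8,))

log_probs = F.log_softmax(logits, dim=1)

loss_ce = F.cross_entropy(logits, targets)         # log_softmax + nll_loss in one call
loss_nll = F.nll_loss(log_probs, targets)          # expects log-probabilities
loss_double = F.cross_entropy(log_probs, targets)  # log_softmax applied twice

print(torch.allclose(loss_ce, loss_nll), torch.allclose(loss_ce, loss_double))
```

The usual convention is therefore either raw logits with ```F.cross_entropy```, or ```log_softmax``` outputs with ```F.nll_loss```.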

The following is the convention for the testing logic.

In [None]:
model = model.eval()

In [None]:
correct = 0
with torch.no_grad():
    with tqdm(total=len(test_data_loader)) as pbar:
        for batch_ind, (input_data, target_data) in enumerate(test_data_loader):
            # Move data to device
            input_data, target_data = input_data.to(device), target_data.to(device)
            
            # Inference
            output = model(input_data)
            pred = output.argmax(dim=1)
            
            # Count number of correct predictions
            correct += pred.eq(target_data.view_as(pred)).sum().item()
            
            # Progress bar update
            pbar.update(1)

print(f'Test accuracy: {100. * int(correct) / len(MNIST_test)}%')
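
The accuracy computation can be checked by hand on a tiny example. The values below are made up for illustration: four samples, three classes, and one wrong prediction.

```python
import torch

# Hypothetical log-probabilities for 4 samples and 3 classes
output = torch.tensor([[-0.1, -2.5, -3.0],
                       [-2.0, -0.2, -2.5],
                       [-1.5, -1.0, -0.6],
                       [-0.3, -1.8, -2.2]])
target = torch.tensor([0, 1, 1, 0])

pred = output.argmax(dim=1)               # index of the highest score per row
correct = pred.eq(target).sum().item()    # number of matches as a Python int
accuracy = 100. * correct / len(target)

print(pred.tolist(), correct, accuracy)   # [0, 1, 2, 0] 3 75.0
```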