# Training a Model in PyTorch

This tutorial is adapted from the official PyTorch tutorial [Training a Classifier](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html).

We will walk through the common procedure for training deep learning (DL) models, using image classification, a typical supervised learning task, as the example.


In [None]:
# configuration

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" # so the IDs match nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "0"       # e.g. "0,1,2" for multiple GPUs

DATA_ROOT = '/data1/cifar/'
DEVICE = 'cuda:0'
BATCH_SIZE = 4

In [None]:
# spells...

import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchvision
import torchvision.transforms as transforms

import tensorboardX

writer = tensorboardX.SummaryWriter()
device = torch.device(DEVICE if torch.cuda.is_available() else "cpu")

## Torchvision

`torchvision` is a companion library of PyTorch. It provides pretrained models, loaders for several common datasets, and utilities for computer vision (CV).

### Loading data

Target dataset: [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)

In [None]:
classes = ('plane', 'car', 'bird', 'cat', 'deer', 
           'dog', 'frog', 'horse', 'ship', 'truck')

trainset = torchvision.datasets.CIFAR10(root=DATA_ROOT, train=True)
print(trainset)
print()
print(trainset.data.shape, type(trainset.data))   # newer torchvision releases expose the images as `data` (formerly `train_data`)

### Normalization



We estimate the mean and standard deviation of each color channel from the training data, so that every channel has roughly zero mean and unit variance after normalization.
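As a minimal sketch of what "per channel" means here, the block below uses a small synthetic NumPy array in place of the real training images (the array name, shape, and values are assumptions for illustration). Reducing over the sample, height, and width axes of an `(N, H, W, C)` array leaves one value per channel. With `ddof=1` the variance is divided by `n - 1` (sample standard deviation) instead of `n`.

```python
import numpy as np

# Synthetic stand-in for the training images: (N, H, W, C), uint8 in [0, 255].
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)

# Reduce over samples, rows and columns; one value per channel remains.
mean = fake_images.mean(axis=(0, 1, 2)) / 255
std_sample = fake_images.std(axis=(0, 1, 2), ddof=1) / 255      # divides by n - 1
std_population = fake_images.std(axis=(0, 1, 2), ddof=0) / 255  # divides by n

print(mean.shape, std_sample.shape)  # (3,) (3,)
print(std_sample - std_population)   # tiny positive differences
```

With tens of thousands of images the two estimates are almost identical, so the choice of `ddof` matters little in practice.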

In [None]:
mean = trainset.data.mean(axis=(0, 1, 2)) / 255
std = trainset.data.std(axis=(0, 1, 2), ddof=1) / 255                # What's the ddof? 

print(mean)
print(std)

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),                                     # uint8 HWC in [0, 255] -> float CHW in [0.0, 1.0]
#    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    transforms.Normalize(mean, std)
])

### Reloading the training data and loading the test data with the transform into batch loaders

In [None]:
trainset = torchvision.datasets.CIFAR10(root=DATA_ROOT, train=True, 
                                        transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE, 
                                          shuffle=True)

testset = torchvision.datasets.CIFAR10(root=DATA_ROOT, train=False,
                                       transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE, 
                                         shuffle=False)

### Showing images in TensorBoard
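Images coming out of the loader have passed through the `Normalize` transform, so they must be mapped back to the `[0, 1]` range before they are displayed. The sketch below shows the round trip on a synthetic channel-first array. The statistics are hypothetical placeholders, not the values estimated from CIFAR-10.

```python
import numpy as np

# Hypothetical per-channel statistics (the notebook estimates the real ones).
mean = np.array([0.49, 0.48, 0.45])
std = np.array([0.25, 0.24, 0.26])

# Synthetic image in channel-first layout (C, H, W) with values in [0, 1].
rng = np.random.default_rng(0)
original = rng.random((3, 4, 4))

# Normalize per channel: reshape the statistics to (C, 1, 1) so that
# they broadcast over the height and width axes.
normalized = (original - mean[:, None, None]) / std[:, None, None]

# Invert the normalization to recover the displayable range.
restored = normalized * std[:, None, None] + mean[:, None, None]

print(np.allclose(restored, original))  # True
```

The reshape to `(C, 1, 1)` is the important detail: without it, a length-3 vector would broadcast against the last axis (width) instead of the channel axis.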

In [None]:
def imshow(tag, img):
    npimg = img.numpy()
    
    # unnormalize
    # npimg = npimg / 2 + 0.5
    # Your code:
    npimg = 
    
    writer.add_image(tag, npimg)

dataiter = iter(trainloader)
images, labels = next(dataiter)   # the iterator's .next() method no longer exists in recent PyTorch

imshow('train/'+' '.join(classes[i] for i in labels), torchvision.utils.make_grid(images, nrow=4))

## Convolutional Neural Network (CNN) Model

In PyTorch, a model is defined as a Python class that inherits from `torch.nn.Module`.

Instantiating the class gives us a model object that we can train.
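The pattern can be sketched with a deliberately tiny model. `TinyClassifier` and its layer sizes are hypothetical and unrelated to the CNN built in this tutorial. They only illustrate how submodules are declared in `__init__` and used in `forward`.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Hypothetical two-layer model, used only to illustrate the pattern."""

    def __init__(self, in_features=8, hidden=4, num_classes=3):
        super().__init__()
        # Submodules assigned as attributes are registered automatically,
        # so their weights are returned by .parameters().
        self.hidden = nn.Linear(in_features, hidden)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # forward() defines the computation from input to output scores.
        return self.out(torch.relu(self.hidden(x)))

model = TinyClassifier()             # instantiate the class to get a model
scores = model(torch.zeros(2, 8))    # call the model on a batch of 2 samples
print(scores.shape)                  # torch.Size([2, 3])

num_params = sum(p.numel() for p in model.parameters())
print(num_params)                    # 51 = (8*4 + 4) + (4*3 + 3)
```

Calling `model(x)` rather than `model.forward(x)` is the usual convention, because the call also runs any hooks registered on the module.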

For more details about CNNs, see the [CS231n notes on convolutional networks](http://cs231n.github.io/convolutional-networks/) and the [Theano convolution arithmetic tutorial](http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html).
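As a worked example of convolution arithmetic, the sketch below traces the spatial size of a 32x32 input through two convolution and pooling stages with the same kernel sizes and padding used by the network in this tutorial. The function name `conv_output_size` is illustrative. It applies the output-size formula given in the `torch.nn.Conv2d` documentation, which is the same arithmetic as the notebook's `get_output_size` helper.

```python
import math

def conv_output_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    # Output size along one spatial dimension of a convolution or pooling layer.
    return math.floor(
        (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

size = 32                                      # CIFAR-10 images are 32x32
size = conv_output_size(size, 5, padding=2)    # 5x5 convolution, padding 2  -> 32
size = conv_output_size(size, 2, stride=2)     # 2x2 max pooling, stride 2   -> 16
size = conv_output_size(size, 5)               # 5x5 convolution, no padding -> 12
size = conv_output_size(size, 2, stride=2)     # 2x2 max pooling, stride 2   -> 6
print(size)  # 6
```

The number of features entering the first fully connected layer is then the number of output channels of the last convolution multiplied by this spatial size squared.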

In [None]:
import math

def get_output_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    return math.floor((in_size + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.cnn1 = nn.Sequential(
            nn.Conv2d(3, 6, 5, padding=2),              # Check the meanings of each argument.
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )
        
        self.cnn2 = nn.Sequential(
            nn.Conv2d(6, 16, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )
        
        self.mlp = nn.Sequential(
            nn.Linear(, ),                  # Fill it!
            
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, 10)
        )
        
    def forward(self, x):
        x = self.cnn1(x)
        x = self.cnn2(x)
        
        # Your code:
        x = 
        
        x = self.mlp(x)
        return x


net = Net().to(device)       # Instantiate the model and move all of its parameters to the device

writer.add_graph(net, torch.zeros(1, 3, 32, 32, device=device))

## Loss function

In this tutorial, we use the cross-entropy loss, a loss function commonly used for classification.

Loss functions will be covered in more detail in a later tutorial.
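For intuition, the cross-entropy of a single sample is the negative log of the softmax probability assigned to the correct class. The plain-Python sketch below computes it by hand for hypothetical scores. `nn.CrossEntropyLoss` expects the same kind of input (raw, unnormalized scores plus an integer class index) and by default averages the per-sample values over the batch.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, -1.0]            # hypothetical raw scores for 3 classes

loss_if_class_0 = cross_entropy(logits, 0)   # correct class has the top score
loss_if_class_2 = cross_entropy(logits, 2)   # correct class has the lowest score
print(loss_if_class_0 < loss_if_class_2)     # True

# With identical scores the model is maximally unsure: loss = log(num_classes).
print(math.isclose(cross_entropy([0.0, 0.0, 0.0], 1), math.log(3)))  # True
```

Because the softmax is applied inside the loss, the network's last layer should output raw scores without a softmax of its own.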

In [None]:
criterion = nn.CrossEntropyLoss()

## Optimizer

Likewise, we use a basic optimizer, stochastic gradient descent (SGD) with momentum, and defer its explanation.

The first argument passed to the optimizer is an iterable of all the parameters you want to train.
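Since the optimizer only updates the parameters it is given, the choice of iterable decides which part of a model is trained. The sketch below uses a hypothetical two-part model (`features` and `head` are illustrative names) to contrast training everything with training only the last layer.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical two-part model: a feature layer and a classification head.
features = nn.Linear(8, 4)
head = nn.Linear(4, 3)
model = nn.Sequential(features, nn.ReLU(), head)

# Train everything: pass all parameters of the model.
opt_all = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Train only the head: this optimizer never updates `features`.
opt_head = optim.SGD(head.parameters(), lr=0.001, momentum=0.9)

# Count the parameter tensors each optimizer manages (weights and biases).
n_all = sum(len(group["params"]) for group in opt_all.param_groups)
n_head = sum(len(group["params"]) for group in opt_head.param_groups)
print(n_all, n_head)  # 4 2
```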

In [None]:
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

## Training

In [None]:
for epoch in range(5):                           # loop over the dataset multiple times

    running_loss = 0.0
    total_loss = 0.0

    # set the model to training mode (once per epoch is enough)
    net.train()

    for i, data in enumerate(trainloader, 0):

        # get the inputs
        inputs, labels = data
        # Your code:
        inputs, labels = 

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch+1, i+1, running_loss/2000))
            running_loss = 0.0
        
        total_loss += loss.item()
    
    writer.add_scalar('loss', total_loss/(i+1), epoch+1)
        

print('Finished Training')


## Testing

In [None]:
dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow('test/'+'_'.join(classes[i] for i in labels), torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[i] for i in labels))

In [None]:
net.eval()                               # set the model to evaluation mode (same as net.train(False))
outputs = net(images.to(device))
print(outputs)

In [None]:
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[i] for i in predicted))
print(predicted)

### Performance on the whole dataset
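Accuracy is the fraction of samples whose highest-scoring class matches the label. A minimal sketch of that computation on hypothetical scores and labels follows. `torch.max` along dimension 1 returns both the maximum values and their indices, and the indices are the predicted classes.

```python
import torch

# Hypothetical raw scores for 4 samples and 3 classes.
outputs = torch.tensor([[2.0, 0.1, -1.0],
                        [0.3, 0.2, 1.5],
                        [0.0, 3.0, 0.5],
                        [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 2, 1, 2])

# Along dim 1, torch.max returns (maximum values, indices of the maxima).
values, predicted = torch.max(outputs, 1)
print(predicted)                     # tensor([0, 2, 1, 0])

# Comparing gives a boolean mask; summing it counts the correct predictions.
correct = int((predicted == labels).sum())
total = labels.size(0)
print(correct, total)                # 3 4
```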

In [None]:
correct = 0
total = 0

# set the model to evaluation mode
net.eval()

with torch.no_grad():  
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)           # the second return value holds the indices of the maxima
        
        total += labels.size(0)
        correct += int((predicted == labels).sum())   # sum boolean mask into total correct results; 0d-tensor to int

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

**Exercise**: Compute the accuracy for each class.
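One possible way to organize the bookkeeping is sketched below in plain Python on hypothetical predictions and labels for three classes. It is a hint about the counting logic, not the intended solution, and the tensors produced by the test loop still have to be fed into it.

```python
# Hypothetical predictions and ground-truth labels for 3 classes.
predicted = [0, 2, 1, 0, 1, 2, 2, 0]
labels    = [0, 2, 1, 2, 1, 1, 2, 0]
num_classes = 3

class_correct = [0] * num_classes
class_total = [0] * num_classes

for pred, label in zip(predicted, labels):
    class_total[label] += 1           # one more sample whose true class is `label`
    if pred == label:
        class_correct[label] += 1     # ... and it was classified correctly

print(class_correct, class_total)     # [2, 2, 2] [2, 3, 3]

accuracy = [c / t for c, t in zip(class_correct, class_total)]
```

The counters are indexed by the true label, so each class's accuracy is measured against the number of test samples that actually belong to it.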

In [None]:
class_correct = [0.] * 10
class_total = [0.] * 10

# Your code:



for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))