# Develop CNN on GPU

In this tutorial, we will study how to move DNN/CNN traning to GPU

**To focus on the network architecture, we ignore the train/validation/test split, hyper-parameter tuning, and others.**

## Outcome

In this tutorial, you will discover how to develop a convolutional neural network using GPU.

In [1]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import os
import random

seed = 1234
np.random.seed(seed)
random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

### Load Dataset

In PyTorch, you can use the torchvision.transforms module to apply the normalization transformation to the train and test datasets. Here's an example:

In [2]:
# Load the MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data',
                                           train=True,
                                           download=True,
                                           transform=transforms.ToTensor())
test_dataset = torchvision.datasets.MNIST(root='./data',
                                          train=False,
                                          download=True,
                                          transform=transforms.ToTensor())

# Normalize the pixel values
train_dataset.transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.1307, ), (0.3081, ))])
test_dataset.transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.1307, ), (0.3081, ))])

In [3]:
batch_size = 32
train_dataloader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size,
                                               shuffle=True,
                                               num_workers=4)
test_dataloader = torch.utils.data.DataLoader(train_dataset,
                                              batch_size=batch_size,
                                              shuffle=True,
                                              num_workers=4)

## Train a Deep FCN in GPU

Given that the problem is a multi-class classification task, we know that we will require an output layer with 10 nodes in order to predict the probability distribution of an image belonging to each of the 10 classes. 
This will also require the use of a softmax activation function. 
Between the feature extractor and the output layer, we can add a dense layer to interpret the features, in this case with 100 nodes.

Training on CPU and on GPU are very similar.
The only difference is to move your model and data to GPU.
We first define `device` which is `gpu` if it is available.
`cpu` is a backup in case `gpu` is not available.

In [4]:
device = torch.device("cuda" if torch.cuda.is_available else "cpu")

Then, we move data and model to GPU using `.to(device)`.

In [5]:
dnn = nn.Sequential(
    nn.Flatten(start_dim=1, end_dim=-1),
    nn.Linear(784, 100),
    nn.ReLU(),
    nn.Linear(100, 10),
    nn.Softmax(dim=1)
)

# move model to GPU
dnn = dnn.to(device)

lr = 0.1
num_inputs, num_outputs = 784, 10
optimizer = torch.optim.SGD(dnn.parameters(), lr=lr)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(5):
    epoch_loss = []
    for i_batch, sample_batched in enumerate(train_dataloader):      
        inputs, labels = sample_batched
        # move data to GPU
        inputs, labels = inputs.to(device), labels.to(device)
        prob_distr = dnn(inputs)
        loss = criterion(prob_distr, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    accu_number = 0.
    for X, y in test_dataloader:
        # move test fata to GPU
        X, y = X.to(device), y.to(device)
        inputs = X.reshape(-1, num_inputs)
        predicted_class = torch.argmax(dnn(inputs), dim=1)
        accu_number += torch.sum(predicted_class == y)
    print(
        f'{epoch+1}: testing accuracy: {accu_number / len(test_dataloader.dataset):0.4f}'
    )

1: testing accuracy: 0.9317
2: testing accuracy: 0.9492
3: testing accuracy: 0.9600
4: testing accuracy: 0.9675
5: testing accuracy: 0.9714


## Train LeNet-5 on GPU

We will move **LeNet-5** to GPU.

In [6]:
lenet = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=10, kernel_size=5, stride=1),
    nn.ReLU(), nn.MaxPool2d(kernel_size=2), nn.ReLU(),
    nn.Conv2d(in_channels=10, out_channels=20, kernel_size=5, stride=1),
    nn.ReLU(), nn.MaxPool2d(kernel_size=2), nn.Flatten(start_dim=1,
                                                       end_dim=-1),
    nn.Linear(in_features=20 * 4 * 4, out_features=100), nn.ReLU(),
    nn.Linear(in_features=100, out_features=10), nn.Softmax(dim=1))

# move the model to gpu
lenet = lenet.to(device)

lr = 0.1
optimizer = torch.optim.SGD(lenet.parameters(), lr=lr)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(5):
    epoch_loss = []
    for i_batch, sample_batched in enumerate(train_dataloader):
        inputs, labels = sample_batched
        # move data to GPU
        inputs, labels = inputs.to(device), labels.to(device)

        prob_distr = lenet(inputs)
        loss = criterion(prob_distr, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    accu_number = 0.
    for X, y in test_dataloader:
        # move data to gpu
        X, y = X.to(device), y.to(device)
        predicted_class = torch.argmax(lenet(X), dim=1)
        accu_number += torch.sum(predicted_class == y)
    print(
        f'{epoch+1}: testing accuracy: {accu_number / len(test_dataloader.dataset):0.4f}'
    )

1: testing accuracy: 0.9658
2: testing accuracy: 0.9728
3: testing accuracy: 0.9801
4: testing accuracy: 0.9848
5: testing accuracy: 0.9860


## Summary

In this tutorial, you discovered how to develop a convolutional neural network using GPU.
The major benifit is speed. You can train a deep model in a very short time.
In addtion, GPU has a large capacibility for parallel training.