# 9. Deep Learning with PyTorch

In this session, we will practice building, training, and predicting with deep neural networks using the PyTorch library.

Before we get started, import the libraries and packages listed below.

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

## 9.1. Getting Started with PyTorch

First, let us look at the key features that PyTorch offers.

(9.1. code obtained from: https://github.com/yunjey/pytorch-tutorial) 


### 9.1.1. Autograd

Autograd (automatic differentiation, AD) automatically computes derivatives by tracing back through the operations recorded in a network's computation graph.

In [None]:
# Declare "tensor"-type variable, and create a "computation graph"
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b    # y = 2 * x + 3

# The backward() function obtains the derivatives from the computation graph
y.backward()


Let us check the results.

In [None]:
print(x.grad)    # x.grad = 2  (dy/dx = w)
print(w.grad)    # w.grad = 1  (dy/dw = x)
print(b.grad)    # b.grad = 1  (dy/db = 1)
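
The same mechanism can be checked against a derivative worked out by hand. The following is a minimal sketch; the function `f` and the variable `t` are illustrative choices, not part of the original tutorial. For f(t) = t² + 3t, the analytic derivative is 2t + 3, which equals 7 at t = 2.

```python
import torch

# Hypothetical example function: f(t) = t^2 + 3t, evaluated at t = 2
t = torch.tensor(2., requires_grad=True)
f = t ** 2 + 3 * t

# Autograd walks the computation graph backwards from f to t
f.backward()

# Derivative worked out by hand: 2t + 3
analytic = 2 * t.item() + 3
print(t.grad.item(), analytic)
```

Both printed values should agree, which is a quick way to convince oneself that `backward()` computes the derivative one would obtain on paper.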


The following code snippet shows how the autograd feature is used in practice. (You do not need to execute the snippet at this moment.)

In [None]:
# Create two matrices with random values. x is a 10x3 matrix; y is a 10x2 matrix.
x = torch.randn(10, 3)
y = torch.randn(10, 2)

# Instantiating a linear regression model that takes x as input and y as output.
# (Using the neural network jargon, below is creating a fully connected layer.)
linear = nn.Linear(3, 2)
print ('w: ', linear.weight)
print ('b: ', linear.bias)

# Declare the loss function and optimizer for training.
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(linear.parameters(), lr=0.01)

# Forward pass
pred = linear(x)

# Compute loss
loss = criterion(pred, y)
print('** loss before 1 step optimization: ', loss.item())

# Backward pass
loss.backward()

# Print out the derivatives
print ('dL/dw: ', linear.weight.grad) 
print ('dL/db: ', linear.bias.grad)

# Run the first iteration of the gradient descent process.
optimizer.step()
# You can also perform gradient descent at the low level
# with torch.no_grad():
#     linear.weight -= 0.01 * linear.weight.grad
#     linear.bias -= 0.01 * linear.bias.grad

# Printing out the updated loss (output from the loss function) after the first iteration of gradient descent
pred = linear(x)
loss = criterion(pred, y)
print('** loss after 1 step optimization: ', loss.item())
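
To make explicit what `optimizer.step()` does for plain SGD, the sketch below applies the update rule (parameter ← parameter − lr × gradient) by hand and compares the result with the optimizer's. The names `layer_a`, `layer_b`, `mse`, and `sgd`, as well as the fixed seed, are assumptions made for illustration.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
inputs = torch.randn(10, 3)
targets = torch.randn(10, 2)

layer_a = nn.Linear(3, 2)
layer_b = copy.deepcopy(layer_a)   # identical initial weights
mse = nn.MSELoss()
lr = 0.01

# (a) One step with the built-in SGD optimizer
sgd = torch.optim.SGD(layer_a.parameters(), lr=lr)
mse(layer_a(inputs), targets).backward()
sgd.step()

# (b) The same step written by hand
mse(layer_b(inputs), targets).backward()
with torch.no_grad():              # do not record the update in the graph
    for p in layer_b.parameters():
        p -= lr * p.grad

same = (torch.allclose(layer_a.weight, layer_b.weight)
        and torch.allclose(layer_a.bias, layer_b.bias))
print('manual update matches optimizer.step():', same)
```

More elaborate optimizers (momentum, Adam) keep extra state per parameter, which is one reason to prefer the `optimizer` object over hand-written updates.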


### 9.1.2. Loading data from `numpy`

Using `torch.from_numpy()`, one can easily convert an `ndarray` (a multi-dimensional array defined in `numpy`) into a torch tensor; the tensor's `.numpy()` method converts in the opposite direction.

In [None]:
import numpy as np

# Create a numpy array
x = np.array([[1, 2], [3, 4]])

# Convert: numpy array -> torch tensor
y = torch.from_numpy(x)

# Convert: torch tensor -> numpy array
z = y.numpy()
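
One detail worth knowing is that `torch.from_numpy()` does not copy the data: the tensor and the array share the same memory. The sketch below illustrates this and contrasts it with `torch.tensor()`, which makes an independent copy. The variable names (`arr`, `shared`, `copied`) are illustrative.

```python
import numpy as np
import torch

arr = np.array([[1, 2], [3, 4]])
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # makes an independent copy

arr[0, 0] = 100                  # modify the numpy array in place

print(shared[0, 0].item())       # reflects the change made to arr
print(copied[0, 0].item())       # keeps the original value
```

If the array is going to be modified later and the tensor should not change with it, copying is the safer choice.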

### 9.1.3. DataLoader

`DataLoader` wraps a dataset and lets the user easily iterate over it in (optionally shuffled) mini-batches for training and prediction.

The next example shows how one can load the CIFAR-10 dataset using `DataLoader`. (Like `sklearn.datasets`, `torchvision.datasets` includes a few widely used datasets, such as MNIST and CIFAR-10.)

In [None]:
# Download the CIFAR-10 dataset and load it into memory
train_dataset = torchvision.datasets.CIFAR10(root='../../data/',
                                             train=True, 
                                             transform=transforms.ToTensor(),
                                             download=True)

# Accessing the first instance of the dataset
image, label = train_dataset[0]
print (image.size())
print (label)

# Instantiating a DataLoader using the loaded dataset
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=64, 
                                           shuffle=True)

# Code skeleton for training, with the above created DataLoader object
for images, labels in train_loader:
    # ----------------------------------------------
    # -- Your training code should be placed here --
    # ----------------------------------------------
    pass

# One can also draw mini-batches manually by using an iterator
# data_iter = iter(train_loader)
# images, labels = next(data_iter)
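
To see what a `DataLoader` yields without downloading anything, the following sketch wraps synthetic tensors in a `TensorDataset`. The sizes (150 samples, 3×32×32 images, 10 classes) and all variable names are assumptions chosen to resemble CIFAR-10; they are not taken from the real dataset files.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in for an image dataset: 150 RGB "images" of size 32x32
fake_images = torch.randn(150, 3, 32, 32)
fake_labels = torch.randint(0, 10, (150,))
toy_dataset = TensorDataset(fake_images, fake_labels)

toy_loader = DataLoader(dataset=toy_dataset, batch_size=64, shuffle=True)

# Each iteration yields one mini-batch of (images, labels)
batch_shapes = [tuple(images.shape) for images, labels in toy_loader]
print(len(toy_loader))   # number of mini-batches per epoch
print(batch_shapes)      # the last batch is smaller (150 = 64 + 64 + 22)
```

The smaller final batch is the default behaviour; passing `drop_last=True` to `DataLoader` would discard it instead.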

The next code skeleton shows how to create a dataset class for custom (user-supplied) data. With a complete `CustomDataset` class, one can instantiate a dataset object that works with the `DataLoader` class explained above.

In [None]:
# With a complete `CustomDataset` class, one can instantiate 
# a dataset object that works with the `DataLoader` class
# (Until you complete the sections marked with 'TODO', this code snippet does not run.)

# You should build your custom dataset as below
class CustomDataset(torch.utils.data.Dataset):
  def __init__(self):
    # TODO
    # 1. Initialize file paths or a list of file names
    pass
  def __getitem__(self, index):
    # TODO
    # 1. Read one data item from a file (e.g. using numpy.fromfile, PIL.Image.open)
    # 2. Preprocess the data (e.g. with torchvision.transforms)
    # 3. Return a data pair (e.g. image and label)
    
    pass
  def __len__(self):
    # You should change 0 to the total size of your dataset
    return 0 

# You can then use the prebuilt data loader
custom_dataset = CustomDataset()
train_loader = torch.utils.data.DataLoader(dataset=custom_dataset,
                                           batch_size=64, 
                                           shuffle=True)
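
For comparison, here is a minimal sketch of a completed dataset class. It keeps its data in memory instead of reading files, and the class `SquaresDataset` (item i is the pair (i, i²)) is a hypothetical toy example, not part of the original tutorial.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (i, i^2)."""
    def __init__(self, n):
        # 1. Prepare the data (here: an in-memory tensor instead of file paths)
        self.inputs = torch.arange(n, dtype=torch.float32)

    def __getitem__(self, index):
        # 2. Return one (input, target) pair
        x = self.inputs[index]
        return x, x ** 2

    def __len__(self):
        # 3. Report the total number of items
        return len(self.inputs)

squares = SquaresDataset(10)
squares_loader = DataLoader(dataset=squares, batch_size=4, shuffle=False)

first_inputs, first_targets = next(iter(squares_loader))
print(len(squares))
print(first_inputs, first_targets)
```

The `DataLoader` only relies on `__getitem__` and `__len__`, so any class that implements these two methods can be batched in the same way.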

### 9.1.4. Loading a Pretrained Model

One can load a pretrained model that has been written and trained using PyTorch. The next code snippet shows how to bring in a pretrained ResNet-18 model (an image classifier).

In [None]:
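# Download and load a pretrained ResNet-18 model
# (torchvision >= 0.13 deprecates `pretrained=True` in favor of
#  `weights=torchvision.models.ResNet18_Weights.DEFAULT`)
resnet = torchvision.models.resnet18(pretrained=True)


# One may further train (fine-tune) a pretrained model.
# The next code freezes all pretrained parameters and replaces the last layer,
# so that only the new last layer would be trained.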
for param in resnet.parameters():
    param.requires_grad = False
resnet.fc = nn.Linear(resnet.fc.in_features, 100)  # 100 is an example


# By taking the forward pass of the model, the pretrained model can be applied to a prediction task
images = torch.randn(64, 3, 224, 224)
outputs = resnet(images)
print (outputs.size())     # (64, 100)

## 9.2. Logistic Regression (in the PyTorch way!)

This section re-implements logistic regression using PyTorch rather than scikit-learn or NumPy.

We will load the MNIST dataset for this tutorial. The MNIST dataset consists of 28x28 grayscale images of hand-written digits 0-9, so each flattened image has 784 values (`input_size` below).

![Image is not found](https://miro.medium.com/max/530/1*VAjYygFUinnygIx9eVCrQQ.png)

In [None]:
input_size = 784
num_classes = 10
batch_size = 100

# MNIST dataset 
train_dataset = torchvision.datasets.MNIST(root='../../data', 
                                           train=True, 
                                           transform=transforms.ToTensor(),  
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='../../data', 
                                          train=False, 
                                          transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

The next snippet defines a logistic regression model using PyTorch.

In [None]:
# Hyper-parameter setting
learning_rate = 0.001

# Model definition: building a network that consists of one linear layer
model = nn.Linear(input_size, num_classes)

# Training-parameter setting: specifying the loss function (i.e., objective function) and optimizer for training
criterion = nn.CrossEntropyLoss()  # nn.CrossEntropyLoss() computes softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
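
The remark that `nn.CrossEntropyLoss()` computes softmax internally can be verified on a small example: applying log-softmax to the raw scores and averaging the negative log-probability of the true class gives the same number. The tensors below (`logits`, `targets`) are made-up values for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores for 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])   # hypothetical true classes

# Built-in loss, applied directly to the raw scores
ce = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed step by step
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), targets].mean()

print(ce.item(), manual.item())
```

This is why the model above is a single linear layer with no explicit softmax: adding one before the loss would apply softmax twice.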


The next code defines how the above model is trained and used for prediction.

In [None]:
# Train the model (relies on the global `model`, `criterion`, and `optimizer` defined above)
def train_logreg(train_loader, num_epochs):
  total_step = len(train_loader)
  for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
      # Reshape images to (batch_size, input_size)
      images = images.reshape(-1, 28*28)
      
      # Forward pass
      outputs = model(images)
      loss = criterion(outputs, labels)
      
      # Backward and optimize
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
      
      # Display the progress
      if (i+1) % 300 == 0:
        print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
              .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
        
# Test the model
def test_logreg(model, test_loader):
  # In test phase, we don't need to compute gradients (for memory efficiency)
  with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
      images = images.reshape(-1, 28*28)
      outputs = model(images)
      _, predicted = torch.max(outputs.data, 1)
      total += labels.size(0)
      correct += (predicted == labels).sum().item()  # .item() converts the count to a Python int

    # Display the result
    print('Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))
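
The prediction step in the test function can be traced on a tiny example. In the sketch below, the scores and labels are invented values; `torch.max(..., 1)` returns both the maximum of each row and its index, and the index is the predicted class.

```python
import torch

# Hypothetical raw scores for 4 samples and 3 classes
scores = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0],
                       [1.5, 1.0, 0.0],
                       [0.1, 2.2, 0.4]])
true_labels = torch.tensor([0, 2, 1, 1])

# torch.max along dim 1 returns (max values, indices of the max values)
_, predicted = torch.max(scores, 1)

num_correct = (predicted == true_labels).sum().item()   # plain Python int
accuracy = 100 * num_correct / true_labels.size(0)
print(predicted.tolist(), accuracy)
```

Here the third sample is misclassified (class 0 is predicted, class 1 is true), so three of the four predictions are correct.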

Now let us use the above script to train and evaluate the logistic regression model.

**Q: What accuracy did you obtain?**

In [None]:
train_logreg(train_loader, num_epochs=10)
test_logreg(model, test_loader)

## 9.3. Feed-forward Neural Networks

This section shows how to define and apply a feed-forward neural network (or multi-layer perceptron, MLP) to the MNIST dataset.


First, let us specify the hyperparameters for model and training.

In [None]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters 
hidden_size = 500
learning_rate = 0.001

The next code defines the class for a feed-forward neural network.

In [None]:
# Fully connected neural network with one hidden layer
class FFNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(FFNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)  
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out
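
A quick way to sanity-check such a network is to push a dummy batch through it and count its parameters. The sketch below writes the same three layers with `nn.Sequential` (an alternative form, not the tutorial's class), using the layer sizes from this section; the names `ffnet_seq` and `dummy_batch` are illustrative.

```python
import torch
import torch.nn as nn

# Same layers as FFNet, written with nn.Sequential (784 -> 500 -> 10)
ffnet_seq = nn.Sequential(
    nn.Linear(784, 500),
    nn.ReLU(),
    nn.Linear(500, 10))

dummy_batch = torch.randn(100, 784)   # 100 flattened 28x28 images
dummy_out = ffnet_seq(dummy_batch)

# Weights and biases: (784*500 + 500) + (500*10 + 10)
num_params = sum(p.numel() for p in ffnet_seq.parameters())

print(dummy_out.shape)   # one score per class for each image
print(num_params)
```

Writing a class with an explicit `forward()` becomes more useful once the data flow is no longer a simple chain of layers.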

The code below defines training and prediction with the model defined above.

In [None]:
# Train the model
def train_ffnet(model, train_loader, num_epochs):
  total_step = len(train_loader)
  for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):  
      # Flatten the images and move tensors to the configured device
      images = images.reshape(-1, 28*28).to(device)
      labels = labels.to(device)
      
      # Forward pass
      outputs = model(images)
      loss = criterion(outputs, labels)
      
      # Backward and optimize
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
      
      # Display the progress
      if (i+1) % 300 == 0:
        print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
               .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
        
# Test the model
def test_ffnet(model, test_loader):
  # In test phase, we don't need to compute gradients (for memory efficiency)
  with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
      images = images.reshape(-1, 28*28).to(device)
      labels = labels.to(device)
      outputs = model(images)
      _, predicted = torch.max(outputs.data, 1)
      total += labels.size(0)
      correct += (predicted == labels).sum().item()

    # Display the result
    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))


Now let us run the script on the MNIST dataset.

**Q: What accuracy can a feed-forward NN achieve? Is it better than what you got from logistic regression?**
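
This section does not include a cell that instantiates the model, so the following is only a sketch of one possible setup, mirroring the one used in Section 9.4. The choice of Adam, the `ff_` names, and the synthetic batch used as a smoke test are all assumptions, not the original author's code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Same architecture as the FFNet class defined above
class FFNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

ff_model = FFNet(784, 500, 10).to(device)
ff_criterion = nn.CrossEntropyLoss()
ff_optimizer = torch.optim.Adam(ff_model.parameters(), lr=0.001)  # assumed optimizer

# Smoke test on a synthetic stand-in for one MNIST mini-batch
fake_images = torch.rand(100, 784, device=device)
fake_labels = torch.randint(0, 10, (100,), device=device)

losses = []
for step in range(20):
    loss = ff_criterion(ff_model(fake_images), fake_labels)
    ff_optimizer.zero_grad()
    loss.backward()
    ff_optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])   # the loss on the fixed batch should go down
```

In the notebook, `train_ffnet` reads `criterion` and `optimizer` from the global scope, so those names would need to refer to objects built for the feed-forward model (rather than the ones left over from Section 9.2) before calling `train_ffnet(model, train_loader, num_epochs=...)` and `test_ffnet(model, test_loader)`.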

## 9.4. Convolutional Neural Networks

This section deals with the convolutional neural network (CNN), which is commonly used for image classification.

First off, let us declare the hyperparameters for training.

In [None]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Hyper-parameters 
learning_rate = 0.001

The next code defines the convolutional neural network used in this tutorial.

In [None]:
# Convolutional neural network (two convolutional layers)
class ConvNet(nn.Module):
  def __init__(self, num_classes=10):
    super(ConvNet, self).__init__()
    self.layer1 = nn.Sequential(
      nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
      nn.BatchNorm2d(16),
      nn.ReLU(),
      nn.MaxPool2d(kernel_size=2, stride=2))
    self.layer2 = nn.Sequential(
      nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
      nn.BatchNorm2d(32),
      nn.ReLU(),
      nn.MaxPool2d(kernel_size=2, stride=2))
    self.fc = nn.Linear(7*7*32, num_classes)
      
  def forward(self, x):
    out = self.layer1(x)
    out = self.layer2(out)
    out = out.reshape(out.size(0), -1)
    out = self.fc(out)
    return out
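
The input size `7*7*32` of the final linear layer follows from how each stage changes the image size. The sketch below traces one stage with the same settings as `layer1`; the variable names are illustrative. A 5x5 convolution with `padding=2` and `stride=1` keeps the 28x28 size, and 2x2 max pooling halves it to 14x14. The second stage repeats this (14x14, then 7x7) with 32 channels, which gives 7 × 7 × 32 = 1568 features.

```python
import torch
import torch.nn as nn

# One convolution + pooling stage with the same settings as layer1 above
conv = nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

dummy = torch.randn(1, 1, 28, 28)   # one grayscale 28x28 image
after_conv = conv(dummy)            # padding=2 keeps the size at 28x28
after_pool = pool(after_conv)       # pooling halves it to 14x14

# General formula: out = floor((in + 2*padding - kernel) / stride) + 1
conv_out = (28 + 2 * 2 - 5) // 1 + 1
print(after_conv.shape, after_pool.shape, conv_out)
```

If the kernel size, padding, or input resolution changes, the number of input features of `self.fc` has to be recomputed in the same way.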

The code below defines the training and prediction process.

In [None]:
# Train the model
def train_convnet(model, train_loader, num_epochs):
  model.train()  # training mode (batchnorm uses mini-batch mean/variance)
  total_step = len(train_loader)
  for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
      images = images.to(device)
      labels = labels.to(device)
      
      # Forward pass
      outputs = model(images)
      loss = criterion(outputs, labels)
      
      # Backward and optimize
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
      
      # Display the progress
      if (i+1) % 300 == 0:
        print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
              .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Test the model      
def test_convnet(model, test_loader):
  model.eval()  # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)
  with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
      images = images.to(device)
      labels = labels.to(device)
      outputs = model(images)
      _, predicted = torch.max(outputs.data, 1)
      total += labels.size(0)
      correct += (predicted == labels).sum().item()

    # Display the result
    print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))
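
The effect of `model.eval()` on batch normalization can be observed in isolation. In the sketch below, a single `BatchNorm2d` layer is fed the same batch in training mode and in evaluation mode; the input (values with mean around 10 and standard deviation around 5) is made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(1)
shifted = torch.randn(8, 1, 4, 4) * 5 + 10   # batch with mean ~10, std ~5

bn.train()
train_out = bn(shifted)      # normalized with this batch's own statistics

bn.eval()
with torch.no_grad():
    eval_out = bn(shifted)   # normalized with the running statistics

print(train_out.mean().item())   # close to 0
print(eval_out.mean().item())    # clearly not 0: running stats have seen only one batch
```

After many training batches the running statistics approach the data statistics, so the two modes give similar outputs; forgetting to switch modes is nonetheless a common source of unexpected test results.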

Let us run and apply the above code to the MNIST dataset.

**Q: How does the CNN model perform? Is the accuracy better than that of logistic regression and the feed-forward neural network?**

In [None]:
model = ConvNet(num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

train_convnet(model, train_loader, num_epochs=10)
test_convnet(model, test_loader)