# PyTorch Basics

[PyTorch](https://pytorch.org/) is an optimized tensor library for deep learning using GPUs and CPUs.

In this first tutorial, we'll start with the basics of:
- `torch.tensor` and `torch.autograd`.
- Setting up the input system.
- Setting up the training and test pipeline.
- Downloading and using pretrained models.
- Saving and loading models.




## Libraries

In [1]:
# !pip install torch
# !pip install torchvision

In [2]:
import torch
import torchvision
import torch.nn as nn
import numpy as np
import torchvision.transforms as transforms
from tqdm import tqdm

## Basic `autograd` Example

To start, a `torch.Tensor` is a multi-dimensional array containing elements of a single data type. Tensors are created with functions such as `torch.tensor`.

`torch.autograd` is PyTorch's automatic differentiation engine, which powers neural network training. During backpropagation, autograd traverses the computational graph backwards from the output and collects the derivatives of the error with respect to each parameter. An optimizer then uses these gradients to update the parameters, for example with gradient descent.

In [3]:
# Create tensors.
a = torch.tensor([1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)

# Build a computational graph.
L = 3 * a**3 - b**2

# Compute gradients.
L.backward()

# Print out the gradients.
print(9 * a**2 == a.grad)
print(-2 * b == b.grad)

tensor([True])
tensor([True])


Indeed, if $L = 3a^3 - b^2$, then $\frac{\partial L}{\partial a} = 9a^2$ and
$\frac{\partial L}{\partial b} = -2b$.
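
The gradients computed above are what a gradient descent update consumes. The following is a minimal sketch of one manual update step on the same function; the step size `lr` is a hypothetical value chosen only for illustration, and in practice an optimizer from `torch.optim` would do this work.

```python
import torch

# Same function as above: L = 3a^3 - b^2, starting from a = 1, b = 2.
a = torch.tensor([1.], requires_grad=True)
b = torch.tensor([2.], requires_grad=True)
lr = 0.1  # hypothetical step size, for illustration only

L = 3 * a**3 - b**2
L.backward()  # a.grad = 9 * a**2 = 9, b.grad = -2 * b = -4

# Update the leaf tensors in place without recording the update in the graph.
with torch.no_grad():
    a -= lr * a.grad  # 1 - 0.1 * 9
    b -= lr * b.grad  # 2 - 0.1 * (-4)

# Gradients accumulate across backward() calls, so reset them before the next step.
a.grad.zero_()
b.grad.zero_()

print(a, b)
```

Because gradients accumulate by default, forgetting to reset them would add the next step's gradients to the old ones.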

## Input Pipeline

PyTorch modules expect their inputs to be `torch.Tensor` objects. NumPy arrays
and Python lists can be converted to tensors.

In [4]:
# Create a numpy array.
x = np.array([[1, 2], [3, 4]])

# Convert the numpy array to a torch tensor.
y = torch.from_numpy(x)

# Convert the torch tensor to a numpy array.
z = y.numpy()

# Print the types.
print(type(x))
print(type(y))
print(type(z))

<class 'numpy.ndarray'>
<class 'torch.Tensor'>
<class 'numpy.ndarray'>
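
The text above also mentions lists, and the two conversion routes differ in whether memory is shared. A minimal sketch of that difference, using illustrative variable names (`t_list`, `y_copy`) that are not part of the original notebook:

```python
import numpy as np
import torch

# torch.tensor copies the data from a Python list (or from an array).
t_list = torch.tensor([[1, 2], [3, 4]])

# torch.from_numpy shares memory with the source array:
# a change made through one object is visible through the other.
x = np.array([[1, 2], [3, 4]])
y = torch.from_numpy(x)
x[0, 0] = 100
print(y[0, 0].item())       # 100

# torch.tensor(x) makes an independent copy instead.
y_copy = torch.tensor(x)
x[0, 0] = -1
print(y[0, 0].item())       # -1, still shared
print(y_copy[0, 0].item())  # 100, unaffected
```

When the array will be modified later and the tensor should not change, the copying route is likely the safer choice.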


PyTorch provides two data primitives: `torch.utils.data.DataLoader` and `torch.utils.data.Dataset` that allow you to use pre-loaded datasets as well as your own data. `Dataset` stores the samples and their corresponding labels, and `DataLoader` wraps an iterable around the `Dataset` to enable easy access to the samples.

In [5]:
# Download and construct CIFAR-10 dataset.
train_dataset = torchvision.datasets.CIFAR10(root='.',
                                             train=True,
                                             transform=transforms.ToTensor(),
                                             download=True)

# Data loader (this provides queues and threads in a very simple way).
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=64,
                                           shuffle=True)

# Actual usage of the data loader is as below.
for i, (images, labels) in enumerate(train_loader):
    print(f'batch {i+1:03d}: images {images.shape}, labels {labels.shape}')
    # TODO
    # Training code goes here.

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:03<00:00, 48207713.41it/s]


Extracting ./cifar-10-python.tar.gz to .
batch 001: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 002: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 003: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 004: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 005: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 006: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 007: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 008: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 009: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 010: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 011: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 012: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 013: images torch.Size([64, 3, 32, 32]), labels torch.Size([64])
batch 014: images torch.Size([64, 3,

In [6]:
# You should build your custom dataset as below.
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self):
        # TODO
        # 1. Initialize file paths or a list of file names.
        pass
    def __getitem__(self, index):
        # TODO
        # 1. Read one sample from file (e.g. using numpy.fromfile, PIL.Image.open).
        # 2. Preprocess the sample (e.g. with torchvision.transforms).
        # 3. Return a data pair (e.g. image and label).
        pass
    def __len__(self):
        # You should change 0 to the total size of your dataset.
        return 0
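
To make the skeleton above concrete, here is a minimal sketch of a dataset that keeps random data in memory. The class name `ToyDataset`, its sizes, and the alternating labels are all hypothetical; a real dataset would read samples from disk in `__getitem__`.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):  # hypothetical name, for illustration only
    def __init__(self, num_samples=10, num_features=4):
        # Keep all samples in memory; a real dataset would store file paths here.
        generator = torch.Generator().manual_seed(0)
        self.features = torch.randn(num_samples, num_features, generator=generator)
        self.labels = torch.arange(num_samples) % 2

    def __getitem__(self, index):
        # Return one (sample, label) pair.
        return self.features[index], self.labels[index]

    def __len__(self):
        return len(self.features)

toy_loader = DataLoader(ToyDataset(), batch_size=4, shuffle=False)
for features, labels in toy_loader:
    print(features.shape, labels.shape)
```

With 10 samples and a batch size of 4, the last batch holds only 2 samples; `DataLoader` accepts `drop_last=True` if incomplete batches should be skipped.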

## Training and Test

The training and test pipeline needs to be manually defined. The following cells
define a simple multilayer perceptron that will be trained on the MNIST dataset.

First, the MNIST dataset is downloaded and the corresponding `Dataset` and `DataLoader` objects are created.

In [7]:
# Device configuration.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Download the MNIST dataset.
train_dataset = torchvision.datasets.MNIST(root='.',
                                           train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)
test_dataset = torchvision.datasets.MNIST(root='.',
                                          train=False,
                                          transform=transforms.ToTensor())

# Create the loaders.
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=64,
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=64,
                                          shuffle=False)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 129707243.34it/s]


Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 63091507.20it/s]


Extracting ./MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 44664759.73it/s]

Extracting ./MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST/raw






Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 15796458.35it/s]


Extracting ./MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST/raw



Then, the MLP model is defined. To define a model in PyTorch it is necessary to
create a custom class that inherits from `nn.Module`. The constructor of this class creates the layers, whose parameters are registered automatically, while the `forward` method defines the computation performed during each forward pass.

In [8]:
# Hyper-parameters.
input_size = 784
hidden_size = 512
num_classes = 10
num_epochs = 5
learning_rate = 0.001

# Multilayer Perceptron.
class MLP(nn.Module):

    # Constructor.
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    # Forward pass.
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

# Create the model.
model = MLP(input_size, hidden_size, num_classes).to(device)
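
A quick sanity check on a freshly defined model is to pass a dummy batch through it and count its parameters. The sketch below repeats the class definition so that it runs on its own; the names `mlp` and `dummy` are illustrative only.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

mlp = MLP(784, 512, 10)

# A dummy batch of 64 flattened 28x28 images.
dummy = torch.randn(64, 784)
logits = mlp(dummy)
print(logits.shape)  # torch.Size([64, 10])

# Count the trainable parameters: (784*512 + 512) + (512*10 + 10).
num_params = sum(p.numel() for p in mlp.parameters() if p.requires_grad)
print(num_params)  # 407050
```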

To train the model, the cross entropy loss function (`nn.CrossEntropyLoss`) is used. This function takes as input the output logits produced by the model, and the ground truth labels. Moreover, the optimizer of choice will be Adam (`torch.optim.Adam`).
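
Because `nn.CrossEntropyLoss` works on raw logits, the model does not need a softmax layer of its own. A minimal sketch with made-up scores, showing that the loss matches a log-softmax followed by the negative log-likelihood:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up scores for 2 samples and 3 classes.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # class indices, not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, targets)

# Equivalent computation: log-softmax followed by negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss, manual))  # True
```

By default the loss is averaged over the samples in the batch, so the result is a scalar tensor.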

The `train` and `test` functions are defined to train and test the model, respectively. Each function iterates once over its entire loader and returns the loss summed over all batches. Both functions are called once per epoch.

In [9]:
# Loss and optimizer.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Training function.
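def train(epoch, device='cpu'):
    model.train()
    total_loss = 0.0
    for data in tqdm(train_loader, desc=f'Epoch {epoch+1:03d}'):
        x = data[0].reshape(-1, 28 * 28).to(device)
        y = data[1].to(device)
        out = model(x)
        loss = criterion(out, y)
        # Accumulate a Python float so that each batch's graph can be freed.
        total_loss += loss.item()
        # Clear old gradients, backpropagate, and update the parameters.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return total_loss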

# Test function.
def test(device='cpu'):
    model.eval()
    total_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            x = data[0].reshape(-1, 28 * 28).to(device)
            y = data[1].to(device)
            out = model(x)
            total_loss += criterion(out, y).item()
            # The predicted class is the index of the largest logit.
            pred = out.argmax(dim=1)
            total += y.size(0)
            correct += (pred == y).sum().item()
    return total_loss, correct / total

# Training and test for five epochs.
for epoch in range(num_epochs):
    train_loss = train(epoch, device)
    test_loss, test_acc = test(device)
    print(f'Epoch {epoch+1:03d}: training loss {train_loss:.4f}, test loss {test_loss:.4f}, test acc {test_acc}')

Epoch 001: 100%|██████████| 938/938 [00:12<00:00, 76.46it/s]


Epoch 001: training loss 238.2715, test loss 19.6506, test acc 0.961


Epoch 002: 100%|██████████| 938/938 [00:09<00:00, 95.26it/s]


Epoch 002: training loss 93.1838, test loss 13.8053, test acc 0.9716


Epoch 003: 100%|██████████| 938/938 [00:09<00:00, 95.34it/s] 


Epoch 003: training loss 60.9267, test loss 12.6637, test acc 0.9744


Epoch 004: 100%|██████████| 938/938 [00:10<00:00, 91.82it/s]


Epoch 004: training loss 43.4754, test loss 11.6361, test acc 0.9762


Epoch 005: 100%|██████████| 938/938 [00:10<00:00, 93.41it/s]


Epoch 005: training loss 30.8465, test loss 10.6565, test acc 0.9793


## Pretrained Models

The `torchvision.models` subpackage contains definitions of models for addressing different computer vision tasks. Here, a pretrained ResNet-18 is downloaded.

Once downloaded, all parameters of the model are frozen (set to be non-trainable), and the last layer (i.e., the classification head) is replaced with a new linear layer (`nn.Linear`) that can be trained on a different dataset to solve a classification problem with an arbitrary number of classes. Training only the new head on top of the frozen pretrained features is a form of *fine-tuning*, often called feature extraction.

In [10]:
# Download and load the pretrained ResNet-18.
resnet = torchvision.models.resnet18(weights='ResNet18_Weights.DEFAULT')

# If you want to finetune only the top layer of the model, set as below.
for param in resnet.parameters():
    param.requires_grad = False

# Replace the top layer for finetuning.
resnet.fc = nn.Linear(resnet.fc.in_features, 100)  # 100 classes.

# Forward pass.
image = torch.randn(1, 3, 224, 224)
output = resnet(image)
print(output.size())

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 169MB/s]


torch.Size([1, 100])


## Save and Load Models

In [11]:
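# Save and load the entire model.
# Since PyTorch 2.6, torch.load defaults to weights_only=True, so loading a
# fully pickled model needs weights_only=False (only for files you trust).
torch.save(resnet, 'model.ckpt')
model = torch.load('model.ckpt', weights_only=False)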

# Save and load only the model parameters (recommended).
torch.save(resnet.state_dict(), 'params.ckpt')
resnet.load_state_dict(torch.load('params.ckpt'))

<All keys matched successfully>
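
When training is meant to resume later, the optimizer state is usually saved together with the model parameters. The following is a minimal sketch on a small stand-in model; the dictionary keys (`epoch`, `model_state`, `optimizer_state`) are a hypothetical convention rather than part of any PyTorch API, and the file is written to a temporary directory.

```python
import os
import tempfile

import torch
import torch.nn as nn

net = nn.Linear(4, 2)  # small stand-in model
opt = torch.optim.Adam(net.parameters(), lr=0.001)

# Hypothetical checkpoint layout: the keys are a convention, not an API.
checkpoint = {
    'epoch': 5,
    'model_state': net.state_dict(),
    'optimizer_state': opt.state_dict(),
}
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.ckpt')
torch.save(checkpoint, path)

# Restore into freshly constructed objects.
restored_net = nn.Linear(4, 2)
restored_opt = torch.optim.Adam(restored_net.parameters(), lr=0.001)
state = torch.load(path)
restored_net.load_state_dict(state['model_state'])
restored_opt.load_state_dict(state['optimizer_state'])
print(state['epoch'])  # 5
```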