# PyTorch Tutorial

This tutorial walks through parts of the PyTorch API that you may find useful on your assignments.

We demonstrate the API through one worked example:

- Implementing a simple neural network from PyTorch API building blocks

This notebook contains two sections: 

1. Setup
2. The PyTorch Sequential API

## PyTorch API Explanation

As a brief overview, we can think of PyTorch as offering three levels of abstraction:

| Level | API             | Flexibility | Convenience |
|-------|-----------------|-------------|-------------|
| 1     | Barebones       | High        | Low         |
| 2     | `nn.Module`     | High        | Medium      |
| 3     | `nn.Sequential` | Low         | High        |


Previously, we worked with the Level-1 barebones PyTorch API by manually defining the forward and backward passes and implementing all of the logic ourselves.

At Level 2, PyTorch's `nn.Module` lets us encapsulate arbitrary logic for a generic neural network architecture.
Using `nn.Module`, one can define "components" of a neural network, such as a ResNet block or something as simple as a "flatten" operation, by implementing a `forward()` method.
These `nn.Module` objects can then be composed to build the overall network, while PyTorch's autograd and `torch.optim` libraries handle gradient computation and parameter updates.
[You can find more details about the nn.Module API here.](https://pytorch.org/docs/stable/generated/torch.nn.Module.html)

At Level 3, we can compose several PyTorch Modules using `nn.Sequential` to make our logic even simpler.
`nn.Sequential` encapsulates the composition of several `nn.Module` objects and applies their forward passes one after another, end to end.
[You can find more details about the nn.Sequential API here.](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html)

Importantly, `nn.Sequential` is itself an `nn.Module`, so its `.parameters()` can be passed directly to PyTorch's optimizers, which simplifies the training code as well.

We will use all three levels of abstraction in our assignments. However, this tutorial focuses on the `nn.Module` and `nn.Sequential` abstraction layers as tools for building neural networks.

In A4, we call the `nn.Sequential` constructor in `FCOSPredictionNetwork.__init__()`; in this notebook you will learn what objects to pass to that constructor. NOTE: all of our models inherit from `nn.Module` to take advantage of the PyTorch API.
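
To make the levels concrete, here is a minimal sketch (not taken from the assignments) of a Level-2 component composed at Level 3. `TinyResidualBlock` and all layer sizes are hypothetical names and values chosen for illustration. The skip connection is the kind of logic that needs a custom `forward()` and cannot be written as a plain feed-forward stack.

```python
import torch
import torch.nn as nn


class TinyResidualBlock(nn.Module):
    """Hypothetical Level-2 component: y = x + Linear(ReLU(Linear(x)))."""

    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x):
        # The skip connection requires custom forward() logic.
        return x + self.fc2(torch.relu(self.fc1(x)))


# Level 3: chain modules (including our custom one) with nn.Sequential.
net = nn.Sequential(
    nn.Linear(8, 16),
    TinyResidualBlock(16),
    nn.ReLU(),
    nn.Linear(16, 4),
)

x = torch.randn(5, 8)   # a batch of 5 inputs with 8 features each
out = net(x)            # runs every forward() in order
print(out.shape)        # torch.Size([5, 4])
```

Because the block is an ordinary `nn.Module`, `net.parameters()` also includes the weights and biases defined inside it.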


# I. Setup

### Load Packages
Here we load the relevant torch packages:

In [None]:
import os
import time
os.environ["TZ"] = "US/Eastern"
time.tzset()

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler
import torchvision.datasets as dset
import torchvision.transforms as T

from collections import OrderedDict

# for plotting
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%matplotlib inline

We will use the GPU to accelerate our computation. Run this cell to make sure you are using a GPU.

We will be using `torch.float` (an alias for `torch.float32`) for data and `torch.long` (an alias for `torch.int64`) for labels.

Please refer to https://pytorch.org/docs/stable/tensor_attributes.html#torch-dtype for more details about data types.
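
As a small illustration of these dtypes, the sketch below checks the aliases and converts two toy tensors with `.to()`. The tensor names and values are made up for the example, and it falls back to the CPU when no GPU is available.

```python
import torch

# torch.float and torch.long are aliases for the explicit dtype names.
print(torch.float is torch.float32)  # True
print(torch.long is torch.int64)     # True

device = "cuda" if torch.cuda.is_available() else "cpu"

pixels = torch.tensor([[0, 128, 255]])   # integer data, dtype=torch.int64
labels = torch.tensor([3.0])             # float data, dtype=torch.float32

# .to() can change device and dtype in a single call.
pixels = pixels.to(device=device, dtype=torch.float)
labels = labels.to(device=device, dtype=torch.long)
print(pixels.dtype, labels.dtype)  # torch.float32 torch.int64
```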

In [None]:
to_float = torch.float
to_long = torch.long

if torch.cuda.is_available():
    print('Good to go!')
    notebook_device = 'cuda'
else:
    print('Please set GPU via Edit -> Notebook Settings.')
    notebook_device = 'cpu'

### Load CIFAR
We are using [torchvision.datasets.CIFAR10](https://pytorch.org/docs/stable/torchvision/datasets.html?highlight=cifar#torchvision.datasets.CIFAR10) to download the CIFAR-10 dataset.

We instantiate `DataLoader` objects using the `load_CIFAR` function below in order to interact with our dataset. This step is fundamentally similar to how we interact with the `VOC2007` dataset in A4.

In [None]:
def load_CIFAR(path='./datasets/'):
    NUM_TRAIN = 49000
    # The torchvision.transforms package provides tools for preprocessing data
    # and for performing data augmentation; here we set up a transform to
    # preprocess the data by subtracting the mean RGB value and dividing by the
    # standard deviation of each RGB value; we've hardcoded the mean and std.
    transform = T.Compose([
                  T.ToTensor(),
                  T.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
              ])

    # We set up a Dataset object for each split (train / val / test); Datasets load
    # training examples one at a time, so we wrap each Dataset in a DataLoader which
    # iterates through the Dataset and forms minibatches. We divide the CIFAR-10
    # training set into train and val sets by passing a Sampler object to the
    # DataLoader telling how it should sample from the underlying Dataset.
    cifar10_train = dset.CIFAR10(path, train=True, download=True,
                               transform=transform)
    
    # The DataLoader is what we actually use to interact with our data.
    # While the Dataset object created above gives us an API for the dataset
    # as a whole, the DataLoader lets us grab minibatches of batch_size samples.
    # Which samples it draws is controlled by the sampler we pass to its constructor.
    loader_train = DataLoader(cifar10_train, batch_size=64, 
                            sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN)))

    cifar10_val = dset.CIFAR10(path, train=True, download=True,
                             transform=transform)
    loader_val = DataLoader(cifar10_val, batch_size=64, 
                          sampler=sampler.SubsetRandomSampler(range(NUM_TRAIN, 50000)))

    cifar10_test = dset.CIFAR10(path, train=False, download=True, 
                              transform=transform)
    loader_test = DataLoader(cifar10_test, batch_size=64)
    return loader_train, loader_val, loader_test


In [None]:
loader_train, loader_val, loader_test = load_CIFAR(path='./datasets/')
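
The sampler-based train/val split used in `load_CIFAR` can be tried without downloading anything. The sketch below is an illustration under assumptions: it replaces CIFAR-10 with a hypothetical `TensorDataset` of 100 random "images", and the names `toy_dataset`, `toy_train`, `toy_val`, and `NUM_TOY_TRAIN` are invented for the example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data import sampler

# Hypothetical stand-in for CIFAR-10: 100 random "images" with labels 0..9.
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
toy_dataset = TensorDataset(images, labels)

NUM_TOY_TRAIN = 80
# Both loaders wrap the same Dataset; the samplers restrict them to
# disjoint index ranges, which gives us a train/val split.
toy_train = DataLoader(toy_dataset, batch_size=16,
                       sampler=sampler.SubsetRandomSampler(range(NUM_TOY_TRAIN)))
toy_val = DataLoader(toy_dataset, batch_size=16,
                     sampler=sampler.SubsetRandomSampler(range(NUM_TOY_TRAIN, 100)))

print(len(toy_train), len(toy_val))  # number of minibatches: 5 2

x, y = next(iter(toy_train))
print(x.shape, y.shape)  # torch.Size([16, 3, 32, 32]) torch.Size([16])
```

Note that `len()` of a `DataLoader` counts minibatches, not samples, and the last minibatch may be smaller than `batch_size`.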

# II. PyTorch Sequential API

For simple models like a stack of feed-forward layers, you still need to go through 3 steps:
1. subclass `nn.Module`
2. assign layers to class attributes in `__init__`
3. call each layer one by one in `forward()`. 

**Is there a more convenient way?**

Fortunately, PyTorch provides a container Module called `nn.Sequential`, which merges the above steps into one. It is not as flexible as `nn.Module`, because you cannot specify more complex topology than a feed-forward stack, but it's good enough for many use cases.
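
To see that the two styles compute the same thing, here is a small sketch with made-up sizes. `TwoLayerNet` is a hypothetical class written for this comparison. The `nn.Sequential` version reuses the same layer objects, so both models share weights and their outputs can be compared directly.

```python
import torch
import torch.nn as nn


class TwoLayerNet(nn.Module):          # step 1: subclass nn.Module
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)   # step 2: layers as attributes
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):              # step 3: call each layer in order
        return self.fc2(torch.relu(self.fc1(x)))


module_net = TwoLayerNet(12, 6, 3)

# The same computation as a feed-forward stack, built from the same layers.
sequential_net = nn.Sequential(module_net.fc1, nn.ReLU(), module_net.fc2)

x = torch.randn(4, 12)
print(torch.allclose(module_net(x), sequential_net(x)))  # True
```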

## Writing a Two-Layer Network
Let's see how to write a two-layer fully connected network with `nn.Sequential` and train it using a simple training loop.

We will skip advanced weight initialization for simplicity.

In [None]:
# We need the ability to flatten a non-1D tensor. Typically we would call torch.flatten(x) or x.flatten() on a given tensor.
# However, we can encapsulate this logic in an nn.Module in order to make composing it with other Modules easier.
# We do this by overriding the forward pass of an nn.Module. This is all we need to do to create a new Module!
# (PyTorch also ships a built-in nn.Flatten module; we write our own here for illustration.)
class Flatten(nn.Module):
    def forward(self, x):
        return torch.flatten(x, start_dim=1) # 0 is the batch dimension so we flatten images at dimension 1
    
# Here we implement a custom Linear module to show how one might write their own!
# Note that PyTorch already provides nn.Linear, but we will write this one ourselves.

class CustomLinear(nn.Module):
    # All we need to do is handle the constructor by writing an __init__ function
    # And then we need to write the forward pass for our module
    def __init__(self, input_layer_size, output_layer_size):
        # We have to call init on our super() class.
        super().__init__() 
        # Create our tensor of weights, initialized from a standard normal distribution.
        # Wrapping the tensor in nn.Parameter registers self.weights with the Module,
        # so it is returned by .parameters() (and therefore seen by the optimizer),
        # and any device movement (e.g. model.to(...)) or weight initialization
        # we apply to the whole model also applies to this tensor.
        self.weights = nn.Parameter(
            torch.randn(
                input_layer_size, 
                output_layer_size
                )
        )
        
    def forward(self, x):
        # simply define the forward pass
        # this is the matrix multiplication of our weight and our input.
        return torch.matmul(x, self.weights)
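
The effect of `nn.Parameter` is easiest to see by comparing it with a plain tensor attribute. The `Demo` class below is a hypothetical example written only for this comparison.

```python
import torch
import torch.nn as nn


class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered: appears in .parameters(), is moved by .to(),
        # and is updated by an optimizer built from .parameters().
        self.weights = nn.Parameter(torch.randn(3, 2))
        # NOT registered: a plain tensor attribute is invisible to the Module.
        self.plain = torch.randn(3, 2)


demo = Demo()
print([name for name, _ in demo.named_parameters()])         # ['weights']
print(demo.weights.requires_grad, demo.plain.requires_grad)  # True False
```
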
    

In [None]:
# we can then define some constants for ease.
C, H, W = 3, 32, 32
num_classes = 10

# these hyperparameters could be generated using some other function, 
# but we will set them as constants as they are not our focus.
hidden_layer_size = 4000
learning_rate = 1e-2
weight_decay = 1e-4
momentum = 0.5

# We can create a model by unpacking a list of Modules into the nn.Sequential constructor.
model = [
    Flatten(), 
    nn.Linear(C*H*W, hidden_layer_size),
    nn.ReLU(),
    # NOTE: nn.Linear(hidden_layer_size, num_classes) would play the same role
    # as our CustomLinear below, except that nn.Linear also adds a bias term
    # and uses a different default weight initialization.
    CustomLinear(hidden_layer_size, num_classes)
]

model = nn.Sequential(*model)

print('Architecture:')
print(model) # printing `nn.Module` shows the architecture of the module.

# We can also create an equivalent architecture without using a list by
# directly passing nn.Module objects to the constructor
# (here with nn.Linear in place of CustomLinear).
model_alt = nn.Sequential(
    Flatten(), 
    nn.Linear(C*H*W, hidden_layer_size),
    nn.ReLU(),
    nn.Linear(hidden_layer_size, num_classes)
)

print("Model Alt Architecture:")
print(model_alt)

# Because nn.Sequential is itself an nn.Module (its parent class),
# we can call the .parameters() method on the ENTIRE model and pass the result to an optimizer.
# This way we don't need to worry about missing parameters.
# The optimizer then handles the updates for our model, while PyTorch autograd
# keeps track of the computational graph so that gradients are computed correctly.
optimizer = optim.SGD(model.parameters(), lr=learning_rate, 
                      weight_decay=weight_decay,
                      momentum=momentum, nesterov=True)

print("SGD Optimizer")
print(optimizer)
# The specific parameters of this optimizer are not particularly important for us at the moment.
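
To illustrate what an optimizer built from `.parameters()` does on each update, here is a minimal sketch. It assumes plain SGD (no momentum, weight decay, or Nesterov, unlike the optimizer above) so that the arithmetic is easy to check by hand. The names `toy_model` and `toy_optimizer` and the loss are invented for the example.

```python
import torch
import torch.nn as nn
import torch.optim as optim

toy_model = nn.Sequential(nn.Linear(4, 2))
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.1)

x = torch.randn(8, 4)
loss = toy_model(x).pow(2).mean()   # an arbitrary scalar loss for illustration

toy_optimizer.zero_grad()           # clear any stale gradients
loss.backward()                     # autograd fills p.grad for every parameter

# What plain SGD should produce: p_new = p - lr * p.grad
expected = [p.detach() - 0.1 * p.grad for p in toy_model.parameters()]

toy_optimizer.step()                # in-place update of every parameter

for p, target in zip(toy_model.parameters(), expected):
    print(torch.allclose(p.detach(), target))  # True for the weight and the bias
```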

### A simple training loop example:

Below we implement a training loop that takes a model and an optimizer as arguments and reads the `DataLoader` objects created above (`loader_train`, `loader_val`, `loader_test`).

This will hopefully show how simple training loops can become if we use the PyTorch API well. This should be useful for the later stages of A4.
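
The training loop uses `optim.lr_scheduler.MultiStepLR` to lower the learning rate at fixed milestones. The toy sketch below, with made-up milestones and a single dummy parameter, shows how the learning rate changes as the scheduler is stepped. Milestones are counted in calls to the scheduler's `step()`.

```python
import torch
import torch.nn as nn
import torch.optim as optim

toy_param = nn.Parameter(torch.zeros(1))
toy_optimizer = optim.SGD([toy_param], lr=0.1)
# gamma is the multiplicative decay applied at each milestone (0.1 is the default).
toy_scheduler = optim.lr_scheduler.MultiStepLR(
    toy_optimizer, milestones=[2, 4], gamma=0.1
)

lrs = []
for step in range(6):
    lrs.append(toy_optimizer.param_groups[0]["lr"])
    toy_optimizer.step()     # no gradients here, so nothing is updated
    toy_scheduler.step()     # advance the schedule after the optimizer step

print([f"{lr:.3f}" for lr in lrs])
# ['0.100', '0.100', '0.010', '0.010', '0.001', '0.001']
```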


In [None]:
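def train_sequential(model, optimizer, epochs=1, learning_rate_decay=.1, schedule=None, verbose=True):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.

    Uses the global loader_train, loader_val and loader_test DataLoaders
    and the global notebook_device defined earlier in this notebook.

    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    - learning_rate_decay, schedule, verbose: (Optional) have no effect in this version

    Returns: A list with the training loss recorded at every iteration.
    Also prints validation accuracy after each epoch and test accuracy at the end.
    """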
    model = model.to(device=notebook_device)  # move the model parameters to CPU/GPU
    num_iters = epochs * len(loader_train) # our DataLoader object has __len__ implemented.
    
    # we use the same learning rate scheduler in A4 to adjust the learning rate of our optimizer
    lr_scheduler = optim.lr_scheduler.MultiStepLR(
        optimizer, 
        milestones=[int(0.6 * num_iters), int(0.9 * num_iters)]
    )
    
    # these are printing parameters
    print_every = 100
    if verbose:
        num_prints = num_iters // print_every + 1
    else:
        num_prints = epochs
    train_loss_history = []
    
    # Enter the for loop for each epoch we would like to train for.
    for e in range(epochs):
    
        # put model to training mode
        model.train()
        # we can iterate over the DataLoader directly to get minibatches!
        for t, (x, y) in enumerate(loader_train):
            # move to device, e.g. GPU, use our to_float/to_long variables
            x = x.to(device=notebook_device, dtype=to_float)
            y = y.to(device=notebook_device, dtype=to_long)

            # the __call__() method of an nn.Module calls the forward pass! It's that easy.
            scores = model(x)
            # here is an example of using a functional call rather than the nn.CrossEntropyLoss Module
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            # This is the backward pass: compute the gradient of the loss with
            # respect to each parameter of the model.
            # PyTorch autograd does this AUTOMATICALLY: during the forward pass it
            # records every operation that contributes to the loss, so it can
            # compute the gradient for each parameter that impacts the loss value.
            # The loss is expected to be a scalar tensor (a tensor with a single element).
            loss.backward()

            # To keep track of our training loss we append it to train_loss_history.
            # HOWEVER, we need to make sure PyTorch doesn't keep the computational
            # graph alive for this value, so we remove it from the graph with .detach().
            # It is also convention to move it to the CPU so that logging does not
            # consume GPU memory; .item() then converts it to a plain Python float.
            train_loss_history.append(loss.detach().cpu().item())

            # Actually update the parameters of the model using the gradients
            # computed by the backward pass.
            # Once again, this update step is handled automatically by PyTorch AFTER
            # we have called loss.backward() to compute the gradients for each learnable parameter.
            optimizer.step()

            # Adjust the learning rate of our optimizer. The milestones above are
            # measured in iterations, so we step the scheduler once per iteration,
            # and we do so after optimizer.step().
            lr_scheduler.step()

        # Evaluate on the validation set after each epoch.
        num_correct = 0
        num_samples = 0
        # Set the model into evaluation mode
        model.eval()

        print("========== VALIDATION SET EVALUATION ==========")
        # Do not build the computational graph for these operations
        with torch.no_grad():
            for x, y in loader_val:
                x = x.to(device=notebook_device, dtype=to_float)
                y = y.to(device=notebook_device, dtype=to_long)
                # compute the logits of our model
                scores = model(x)
                _, preds = scores.max(1)
                # .item() converts the 0-dim count tensor to a Python int
                num_correct += (preds == y).sum().item()
                num_samples += preds.size(0)

        acc = num_correct / num_samples
        print(f"Got {num_correct} / {num_samples} correct ({100 * acc:.2f}%)")

    # Test set evaluation:

    print("========== TEST SET EVALUATION ==========")
    num_correct = 0
    num_samples = 0
    # Set the model into evaluation mode
    model.eval()

    # Do not build the computational graph for these operations
    with torch.no_grad():
        for x, y in loader_test:
            x = x.to(device=notebook_device, dtype=to_float)
            y = y.to(device=notebook_device, dtype=to_long)
            # compute the logits of our model
            scores = model(x)
            _, preds = scores.max(1)
            # .item() converts the 0-dim count tensor to a Python int
            num_correct += (preds == y).sum().item()
            num_samples += preds.size(0)

    acc = num_correct / num_samples
    print(f"Got {num_correct} / {num_samples} correct ({100 * acc:.2f}%)")

    return train_loss_history
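
The validation and test loops in `train_sequential` follow the same pattern, so one could factor them into a helper. The sketch below is one possible version, not part of the assignment code. `check_accuracy` is a hypothetical name, and the toy check uses `nn.Identity()` on one-hot inputs so the expected accuracy is known in advance.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def check_accuracy(model, loader, device="cpu"):
    """Hypothetical helper: fraction of samples in `loader` classified correctly."""
    num_correct, num_samples = 0, 0
    model.eval()                      # evaluation mode (affects dropout, batchnorm)
    with torch.no_grad():             # no computational graph needed
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            preds = model(x).argmax(dim=1)
            num_correct += (preds == y).sum().item()
            num_samples += y.size(0)
    return num_correct / num_samples


# Toy check: with one-hot inputs, nn.Identity() "predicts" the hot index.
x = torch.eye(4)
right = DataLoader(TensorDataset(x, torch.arange(4)), batch_size=2)
wrong = DataLoader(TensorDataset(x, torch.tensor([1, 2, 3, 0])), batch_size=2)
print(check_accuracy(nn.Identity(), right))  # 1.0
print(check_accuracy(nn.Identity(), wrong))  # 0.0
```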

In [None]:
# We can now see the training for our simple two-layer network using the training loop above:
train_loss_history = train_sequential(model, optimizer) # feel free to change the hyperparameters for fun!
print("TRAIN LOSS HISTORY")
plt.plot(train_loss_history)