![PyTorch](notebook_diagrams/pytorch.jpg)

# Tutorial for PyTorch
PyTorch, like TensorFlow, is a deep learning library for building scalable, efficient deep learning models.  It is widely used across deep learning applications, and balances the flexibility needed to build custom models with code that is readable and easy to understand.

As in TensorFlow, the key element behind PyTorch is the computation graph. PyTorch builds this graph dynamically and runs operations with [eager execution](https://medium.com/coding-blocks/eager-execution-in-tensorflow-a-more-pythonic-way-of-building-models-e461810618c8) by default (so you don't have to worry about issues with integrating numpy, printing variables, etc.).

This tutorial is derived from a combination of tutorials from PyTorch.  See the reference [here](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html).

### What is PyTorch?

It’s a Python-based scientific computing package targeted at two sets of
audiences:

-  those who want a framework that's similar to numpy, but enables them to better utilize the power of GPUs
-  those who want a deep learning research platform that provides maximum flexibility
   and speed
   
Let's explore what PyTorch can do!

## 0. Install PyTorch
We'll use Anaconda to install PyTorch on our AWS machines for this tutorial.  If you don't want to install this package through Anaconda, you can also do so through `pip`.

In [None]:
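# Activate conda environment
# (each "!" command runs in its own subshell, so this activation may not persist;
#  if it does not, activate the environment before launching the notebook)
! conda activate local_env

# Install PyTorch in Conda environment (-y skips the confirmation prompt)
! conda install -y -c pytorch pytorch
! pip install torchvision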

# Check PyTorch version
! pip show torch

# Use matplotlib inline version
%matplotlib inline

## 1. Import PyTorch
Quick note here: notice how we will use `import torch`, not `import pytorch`.

In [None]:
# For an explanation on the future module: https://stackoverflow.com/questions/7075082/what-is-future-in-python-used-for-and-how-when-to-use-it-and-how-it-works/7075121
from __future__ import print_function

# Import PyTorch package and modules
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

# Import numpy for later
import numpy as np

# Import matplotlib for graphing
import matplotlib.pyplot as plt

## 2. PyTorch Tensors
Also like TensorFlow, the key data structure in PyTorch is the tensor.  Remember that these tensors are very similar to the numpy `ndarray` data structures we saw when we were learning `numpy`.

The diagram below also shows us another way we can think about [PyTorch tensors](https://www.datacamp.com/community/tutorials/investigating-tensors-pytorch).

![Tensor Intuition](notebook_diagrams/tensor_diagram.jpeg)

### 2.1 Tensor Examples
For each of these exercises, think about a potential equivalent operation we could use for numpy `ndarray` objects.

**2.1.1 Construct a 5x3 matrix, uninitialized**.  `torch.empty` allocates memory for the tensor without setting its values, so the entries are whatever happened to be in that memory (they are not guaranteed to be zeros).  Recall, especially with tensors, that it's computationally less expensive to allocate space in memory for the `tensor` all at once, rather than gradually increasing the size it needs over time.


In [None]:
# Create tensor of empty values
x = torch.empty(5, 3)
print(x)

**2.1.2 Construct a randomly initialized matrix**.  This kind of operation can be helpful if we are trying to generate probabilistic data from a distribution, without using a numpy wrapper.



In [None]:
# Create tensor of random numbers
x = torch.rand(5, 3)
print(x)

**2.1.3 Construct a matrix filled with zeros and of dtype long**.  Unlike `torch.empty`, every entry is explicitly set to zero here, and `dtype=torch.long` stores the values as 64-bit integers.



In [None]:
# Create tensor of zeros
x = torch.zeros(5, 3, dtype=torch.long)
print(x)

**2.1.4 Construct a tensor directly from data**.  You can think of this operation as placing a pytorch `tensor` wrapper on the native Python list.  From a performance perspective, operations like these can be important for numpy, pytorch, and tensorflow, because they can make numerical computations on data types such as lists much more quickly and efficiently than native Python can, especially if the user has access to GPUs.



In [None]:
# Create tensor directly from list
x = torch.tensor([5.5, 3])
print(x)

**2.1.5 Construct a tensor directly from another tensor**.  With PyTorch, you can also create a tensor based on an existing tensor. These methods will reuse properties of the input tensor, e.g. dtype, unless new values are provided by the user.

In [None]:
# Create tensor directly from another tensor
x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

# Create tensor directly from another tensor
x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

We can also access the size of any tensor in PyTorch through the `size()` method:

In [None]:
print("The size of the tensor x is: %s" % (str(x.size())))
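As a reference for the "equivalent numpy operation" question above, here is a minimal sketch of numpy counterparts for the constructors in this section.  The variable names are only for illustration.  One difference to keep in mind: numpy defaults to 64-bit floats, while PyTorch defaults to 32-bit floats.

```python
import numpy as np
import torch

# NumPy counterparts of the tensor constructors used above
a_empty = np.empty((5, 3))                   # torch.empty(5, 3)
a_rand = np.random.rand(5, 3)                # torch.rand(5, 3)
a_zeros = np.zeros((5, 3), dtype=np.int64)   # torch.zeros(5, 3, dtype=torch.long)
a_data = np.array([5.5, 3])                  # torch.tensor([5.5, 3])
a_ones = np.ones((5, 3), dtype=np.float64)   # x.new_ones(5, 3, dtype=torch.double)

# Shapes are reported by .shape in NumPy and by .size() (or .shape) in PyTorch
t_zeros = torch.zeros(5, 3, dtype=torch.long)
t_data = torch.tensor([5.5, 3])
print(a_zeros.shape, tuple(t_zeros.size()))

# Default floating point types differ between the two libraries
print(a_data.dtype, t_data.dtype)
```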

### 2.2 More Tensor Operations
Like the numpy library, we also have access to many pytorch operations that we can use for numerical computations on `tensor` objects.

#### 2.2.1 Tensor Addition

In [None]:
# Method 1: tensor + tensor
y = torch.rand(5, 3)
print(x + y)

In [None]:
# Method 2: torch.add(tensor, tensor)
print(torch.add(x, y))

In [None]:
# Method 3: torch.add(tensor, tensor, out=result) writes the sum into an existing tensor
result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

In [None]:
# Method 4: In place (Any operation that mutates a tensor in-place is post-fixed with an ``_``.)
y.add_(x)
print(y)

#### 2.2.2 Indexing and Resizing
Indexing in pytorch is quite similar to indexing in numpy.  A lot of the same functionality we used before is also helpful for manipulating pytorch `tensor` objects, such as:

- Multidimensional slicing
- Conditional indexing
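A small sketch of both kinds of indexing is shown here, using a 3x4 tensor of consecutive integers so the results are easy to check by eye.  The tensor and variable names are chosen just for this illustration.

```python
import torch

x = torch.arange(12).view(3, 4)   # rows: [0..3], [4..7], [8..11]

# Multidimensional slicing
first_row = x[0]          # the whole first row
second_col = x[:, 1]      # the second column of every row
block = x[1:, 2:]         # rows 1-2, columns 2-3 -> shape (2, 2)

# Conditional (boolean mask) indexing
mask = x > 6              # boolean tensor with the same shape as x
selected = x[mask]        # 1-D tensor of the elements greater than 6

print(first_row, second_col, block, selected, sep="\n")
```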



##### Resizing
If you want to resize/reshape a tensor, you can use its ``view`` method (for example, ``x.view(16)``):



In [None]:
# Make tensor of random numbers with shape --> (4, 4)
x = torch.randn(4, 4)

# Flatten the random tensor to shape --> (16,)
y = x.view(16)

# Can also "infer dimensions from other dimensions" --> (2, 8)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions

# Compare results
print("Shape of x: \n %s, \n \n, Shape of y: \n %s \n \n, Shape of z: \n %s \n \n" % 
      (x.size(), y.size(), z.size()))

##### Extracting Items from Tensors
If you have a one element tensor, use ``.item()`` to get the value as a
Python number.



In [None]:
# Create one-element tensor
x = torch.randn(1)

# Print tensor and item
print(x)
print(x.item())

### 2.3 More Functions
Like numpy, pytorch offers 100+ Tensor operations in addition to those discussed in this tutorial, including transposing, indexing, slicing, mathematical operations, linear algebra, random numbers, etc.  They are described [here](https://pytorch.org/docs/torch).


## 3. Relating PyTorch to NumPy: NumPy Bridge

Converting a Torch Tensor to a NumPy array and vice versa is a breeze.

The Torch Tensor and NumPy array will share their underlying memory
locations (if the Torch Tensor is on CPU), and changing one will change
the other.

### 3.1 Interchange between tensor and ndarray
Below, we'll convert a Torch Tensor to a NumPy Array.

In [None]:
# Create a PyTorch tensor of ones
a = torch.ones(5)
print("PyTorch tensor: %s" % (a))

In [None]:
# Convert the PyTorch tensor to a numpy ndarray
b = a.numpy()
print("Numpy array: %s" % (b))

Because the two objects share memory, adding 1 to the tensor in place also changes the values in the numpy array:



In [None]:
a.add_(1)
print(a)
print(b)

In [None]:
# Create a numpy array
a = np.ones(5)

# Convert to a pytorch tensor
b = torch.from_numpy(a)

# Add 1 to the numpy array in place; the tensor shares its memory, so it changes too
np.add(a, 1, out=a)

# numpy array
print(a)

# pytorch tensor
print(b)

All the Tensors on the CPU except a CharTensor support converting between NumPy and PyTorch tensors.
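If you do not want the tensor and the array to share memory, you can copy the data instead.  The sketch below contrasts `torch.from_numpy` (shared memory) with `torch.tensor` (which copies its input); the variable names are illustrative.

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with the numpy array
copied = torch.tensor(a)       # copies the data into a new tensor

a += 1                         # in-place update of the numpy array

print(shared)   # follows the change made to `a`
print(copied)   # keeps the original values
```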

## 4. Neural Networks/Deep Learning in PyTorch
The best features of PyTorch are its functionalities for efficiently and compactly creating, training, and evaluating complicated neural networks.

Neural networks can be constructed using the ``torch.nn`` package.  These networks use functionality known as `autograd` ([automatic differentiation](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)) that automatically computes gradients used for training.  We won't need to use it explicitly, but it's something to be mindful of when understanding what's going on when we train neural networks.

An `nn.Module` object contains layers, and a method for predicting outputs from inputs: `forward(input)` that returns the `output` predicted value.

A typical training procedure for a neural network is as follows:

1. Define the neural network that has some learnable parameters (or weights)
2. Iterate over a dataset of inputs
3. Process input through the network
4. Compute the loss (how far is the output from being correct)
5. Propagate gradients back into the network’s parameters
6. Update the weights of the network, typically using a simple update rule:
  ``weight = weight - learning_rate * gradient``
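The six steps above can be sketched end to end on a toy problem.  This is only an illustration under simple assumptions: the data is synthetic (`y = 2x + 1`), the "network" is a single `nn.Linear` layer, and the whole dataset is used at every iteration rather than mini-batches.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (hypothetical): y = 2x + 1
inputs = torch.linspace(-1, 1, 64).unsqueeze(1)   # shape (64, 1)
targets = 2 * inputs + 1

model = nn.Linear(1, 1)            # 1. network with learnable parameters
criterion = nn.MSELoss()
learning_rate = 0.1

losses = []
for epoch in range(200):           # 2. iterate over the dataset
    output = model(inputs)         # 3. process input through the network
    loss = criterion(output, targets)   # 4. compute the loss
    model.zero_grad()              #    clear gradients from the previous step
    loss.backward()                # 5. propagate gradients back
    with torch.no_grad():          # 6. weight = weight - learning_rate * gradient
        for param in model.parameters():
            param -= learning_rate * param.grad
    losses.append(loss.item())

print("first loss: %.4f, last loss: %.4f" % (losses[0], losses[-1]))
```

The later sections replace the manual update in step 6 with an optimizer from `torch.optim`.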


### 4.1 Let's Make a Neural Network in PyTorch!
We can now use the ``nn.Module`` object to create our own neural network in PyTorch!  This method is similar to "sub-classing" (modifying the defaults of an object to introduce custom features) models in TensorFlow and Keras to create our own custom models.

In [None]:
class Net(nn.Module):  # We'll call this object "net"

    # This is the constructor method!  This "initializes" the object when we make it.
    def __init__(self):
        
        # Use this call to inherit from the super class, which runs the constructor for nn.Module
        super(Net, self).__init__()
        
        # Now we can add our own customizable features
        # Network specification:
        #     1 input image channel, 6 output channels, 3x3 square convolution kernel
        
        # Convolution layers
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        
        # Make fully connected layers
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension 
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    # This is the method we use to make predictions from our input "x" to our output.
    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    # We can also define other custom methods here
    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

# Instantiate our neural network!  This creates an object according to the definitions above.
net = Net()
print(net)

When creating neural network objects, all you need to do is define the `__init__` (constructor) and `forward` functions/class methods (note: you can define other methods as well, but they are not necessary for training).  Once you define the two class methods above, the `backward`
function (where gradients are computed during training) is **automatically** defined for you
using `autograd`.  You can use any of the Tensor operations in the `forward` function.

After defining the architecture and forward methods of our network, we can view the trainable parameters of this network by calling the default class method `net.parameters()`.

In [None]:
# Get list of parameters
params = list(net.parameters())

# Get number of parameters
print(len(params))
print(params[0].size())  # conv1's .weight

### 4.2 Making Neural Network Predictions in PyTorch
Let's try making a prediction using our network on a random 32x32 input.
Note: expected input size of this net (LeNet) is 32x32. To use this net on
the MNIST dataset, please resize the images from the dataset to 32x32.

In [None]:
# Create the random input
x = torch.randn(1, 1, 32, 32)

# Make the network prediction and print it
out = net(x)
print(out)

### 4.3 Zeroing Gradients in PyTorch
One small nuance of PyTorch is that by default, gradients will accumulate.  You will need to "zero" them whenever you make another update step.

Below, we'll zero the gradient buffers of all parameters and backpropagate with random gradients:

In [None]:
# Zero the gradients
net.zero_grad()

# Calling the backward method propagates gradients backward
out.backward(torch.randn(1, 10))
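To make the accumulation behaviour concrete, here is a minimal sketch on a two-element tensor (a toy example, not part of the network above).  Calling `backward()` twice without zeroing adds the second gradient to the first.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of 3*w1 + 3*w2 is 3 for each entry
(3 * w).sum().backward()
first = w.grad.clone()
print(first)

# Second backward pass without zeroing: the new gradient is added on top
(3 * w).sum().backward()
second = w.grad.clone()
print(second)

# Zero the gradient buffer in place before the next step
w.grad.zero_()
print(w.grad)
```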

**NOTE**: `torch.nn` only supports mini-batches. The entire `torch.nn`
package only supports inputs that are a mini-batch of samples, and not
a single sample.

For example, `nn.Conv2d` will take in a 4D Tensor of
``nSamples x nChannels x Height x Width``.

If you have a single sample, just use `input.unsqueeze(0)` to add
a fake batch dimension.
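As a quick sketch of the batch-dimension trick, the snippet below passes a single 1-channel 32x32 image through a convolution layer after adding a batch dimension with `unsqueeze(0)`.  The layer and tensor names are illustrative.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)

single_image = torch.randn(1, 32, 32)    # nChannels x Height x Width
batch = single_image.unsqueeze(0)        # nSamples x nChannels x Height x Width

out = conv(batch)
print(batch.size(), out.size())
```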

Before proceeding further, let's recap all the classes you’ve seen so far.

**Recap:**
  -  `torch.Tensor` - A *multi-dimensional array* with support for autograd
     operations like ``backward()``. Also *holds the gradient* w.r.t. the
     tensor.
     
  -  `nn.Module` - Neural network module. *Convenient way of
     encapsulating parameters*, with helpers for moving them to GPU,
     exporting, loading, etc.
     
  -  `nn.Parameter` - A kind of Tensor, that is *automatically
     registered as a parameter when assigned as an attribute to a*
     `Module`.
     
  -  `autograd.Function` - Implements *forward and backward definitions
     of an autograd operation*. Every ``Tensor`` operation creates at
     least a single `Function` node that connects to functions that
     created a `Tensor` and *encodes its history*.

**At this point, we covered:**
  -  Defining a neural network
  -  Processing inputs and calling backward

**Still Left:**
  -  Computing the loss
  -  Updating the weights of the network

### 4.4 Loss Functions in PyTorch
A loss function takes the (output, target) pair of inputs, and computes a
value that estimates how far away the output is from the target.

There are several different
[loss functions](https://pytorch.org/docs/nn.html#loss-functions) under the
nn package.
A simple loss is: `nn.MSELoss` which computes the mean-squared error
between the input and the target.  Let's look at the example below to see this.

In [None]:
# Make a network prediction
output = net(x)

# Set a "target" value (this is a "label" in supervised learning)
target = torch.randn(10)  # a dummy target, for example

# Reshape the target
target = target.view(1, -1)  # make it the same shape as output

# Define the loss function
criterion = nn.MSELoss()

# Compute loss and return it
loss = criterion(output, target)
printable_loss = str(loss.detach().numpy())
print("Loss is: %s" % (printable_loss))
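To check what `nn.MSELoss` computes, the sketch below compares it with the mean of squared differences written out by hand.  The tensors are random stand-ins for a network output and a target.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(1, 10)   # stand-in for a network output
target = torch.randn(1, 10)   # stand-in for a label

# Built-in loss (default reduction is the mean over all elements)
loss = nn.MSELoss()(output, target)

# The same quantity written out by hand
manual = ((output - target) ** 2).mean()

print(loss.item(), manual.item())
```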

Now, if you follow `loss` in the backward direction, using its
`.grad_fn` attribute, you will see a set of computations that look
like this:

    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
          -> view -> linear -> relu -> linear -> relu -> linear
          -> MSELoss
          -> loss

So, when we call `loss.backward()`, all tensors that have `requires_grad=True`
will have their `.grad` Tensor accumulated with the gradient.  **This is how PyTorch stores gradients for the back-propagation algorithm!**
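You can step backward through this graph yourself with `grad_fn` and `next_functions`.  The sketch below uses a single linear layer (a stand-in for the full network) so that it runs on its own; the exact node names printed depend on the PyTorch version.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
output = layer(torch.randn(1, 4))
loss = nn.MSELoss()(output, torch.randn(1, 2))

# The node that produced `loss` (the MSE loss operation)
loss_node = loss.grad_fn
print(loss_node)

# One step further back: the node that produced `output` (the linear layer)
previous_node = loss_node.next_functions[0][0]
print(previous_node)
```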

### 4.5 Backpropagation in PyTorch

To backpropagate the error all we have to do is call `loss.backward()`.
**You need to clear the existing gradients though, else gradients will be
accumulated to existing gradients.** You can do this by calling `net.zero_grad()`.


Now we shall call `loss.backward()`, and have a look at conv1's bias
gradients before and after the backward.



In [None]:
# Zero gradients before running back-propagation
net.zero_grad()     

# Show gradients before the backward pass
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

# Run the backpropagation step
loss.backward()

# Show gradients after the backward pass
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

Now, we have seen how to use loss functions.

**Read Later:**

  The neural network package contains various modules and loss functions
  that form the building blocks of deep neural networks. A full list with
  documentation is [here](https://pytorch.org/docs/nn).

**The only thing left to learn is:**

  - Updating the weights of the network

### 4.6 Weight Updates in PyTorch

The simplest update rule used in practice is the Stochastic Gradient
Descent (SGD).  Under this update rule, we have that:

`weight = weight - learning_rate * gradient`
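Written by hand, this rule might look like the sketch below.  It uses a toy loss on a two-element tensor (not the network above), and wraps the update in `torch.no_grad()` so that the update itself is not recorded by autograd.

```python
import torch

learning_rate = 0.01
weight = torch.tensor([1.0, -2.0], requires_grad=True)

# Toy loss; its gradient with respect to weight is 2 * weight
loss = (weight ** 2).sum()
loss.backward()

# weight = weight - learning_rate * gradient
with torch.no_grad():
    weight -= learning_rate * weight.grad

print(weight)
```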
    
However, as you use neural networks, you will want to use various different
update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.
To enable this, PyTorch provides a small package, ``torch.optim``, that
implements all these methods.  [Adam](https://pytorch.org/docs/stable/_modules/torch/optim/adam.html) is a common default choice of optimizer.  Using these optimizers is simple - let's investigate this below:

In [None]:
# Import torch optimizers module
import torch.optim as optim

# Create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# In your training loop:
optimizer.zero_grad()   # zero the gradient buffers

# Make network predictions 
output = net(x)

# Compute loss
loss = criterion(output, target)
printable_loss = str(loss.detach().numpy())
print("Loss is: %s" % (printable_loss))

# Backpropagate to compute the gradients
loss.backward()

# Apply the weight update
optimizer.step()

**NOTE**: A reminder that zeroing the gradients is extremely important for ensuring that your gradients don't accumulate!  This should be done every time you make a new weight update step (one of the guides below will show you how this fits into neural network training).

### 4.7 Specifying Operation Modes for Neural Network Models
One of the last nuanced features of PyTorch is that we need to tell our network whether it should be in "training mode" or "evaluation mode".  This matters for layers that behave differently in the two modes, such as dropout and batch normalization.  These commands are simple; supposing that our neural network object is called `net` (an instance of the `Net` class above), we only need to call:

- **Training**: `net.train()`
- **Evaluation**: `net.eval()`

The recommended way to include these statements in your code is to have `net.train()` right before the beginning of your **training** loop, and `net.eval()` right before the beginning of your **testing/evaluation** loop.
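The effect of the two modes is easiest to see with a dropout layer.  The sketch below uses a small stand-in model (hypothetical, not the `Net` class above): in training mode dropout randomly zeroes activations, while in evaluation mode it is disabled, so repeated predictions on the same input agree.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

# Training mode: dropout is active
model.train()
print(model.training)
train_out = model(x)

# Evaluation mode: dropout is disabled, so outputs are deterministic
model.eval()
print(model.training)
with torch.no_grad():
    eval_a = model(x)
    eval_b = model(x)

print(torch.equal(eval_a, eval_b))
```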

## 5. Devices in PyTorch
PyTorch does not move computations to the GPU automatically.  This means you will need to tell PyTorch which device you plan to run computations on (by default, PyTorch uses the CPU).  **This is easier than you think!**

To explain how we tell our machine which device to use, we'll introduce [CUDA (Compute Unified Device Architecture)](https://en.wikipedia.org/wiki/CUDA), a parallel processing platform and Application Programming Interface (API) developed by NVIDIA.  We'll use this package frequently with PyTorch when we want to make computations with our GPU.

**NOTE**: Though PyTorch's runtime performance can be significantly improved by using parallel processing with multiple CPUs/GPUs, it is not necessary for you to do so (though you should definitely consider it if you're planning to deploy any applications with PyTorch).  PyTorch will produce essentially the same results (up to small floating-point differences) regardless of the device we use.

![CUDA](notebook_diagrams/cuda.jpg)

### 5.1 Check if CUDA is available on your machine
CUDA is usually available (already installed or installation is possible) if your device contains GPU capabilities.  We can check this below.

If CUDA is available, we can create tensors and send them to the GPU, and then do computations on the GPU.  Two ways in which we can do this:

- Create a tensor directly on the GPU by specifying the `device=device` argument when we create `tensor` objects.
- Create a tensor on the CPU, and then move it to the GPU with the `.to()` tensor method.

**NOTE**: If your machine does not have CUDA, you can install it by following the installation link [here](https://developer.nvidia.com/cuda-downloads), and selecting your correct operating system.  Note that for our AWS machines, we are using **Linux**.

In [None]:
# We will use ``torch.device`` objects to move tensors in and out of GPU
if torch.cuda.is_available():
    
    # Check where cuda is, and store as a CUDA device object
    device = torch.device("cuda")          
    
    # Directly create a pytorch tensor on the GPU
    y = torch.ones_like(x, device=device)  
    
    # Or send a tensor to the GPU with "tensor.to(device)"
    x = x.to(device)
    
    # Do tensor computation on GPU
    z = x + y
    print(z)
    
    # Send tensor from GPU --> CPU, and change dtype
    print(z.to("cpu", torch.double))       
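A common pattern, sketched below, is to pick the device once and write the rest of the code so that it runs unchanged whether or not a GPU is present.  The variable names are illustrative.

```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 3, device=device)   # created directly on the chosen device
b = torch.rand(2, 3).to(device)       # created on the CPU, then moved

c = a + b                             # computed on the chosen device
print(c.device)

result = c.cpu()                      # bring the result back to CPU memory
print(result.device)
```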

### 5.2 Why Should We Care About Devices?
Computing with PyTorch objects **can** (but won't always) be accelerated significantly with GPU-computing, which is why we should consider moving our `tensor` objects to and from the GPU.  

**NOTE**: It's important to keep in mind that most GPUs have significantly less memory than the host machine's main memory (RAM), so oftentimes, especially if we're working with large datasets, it is not possible to store the entire dataset on the GPU, and we'll need to add it incrementally.  This is another reason why PyTorch has compact methods and functions for moving data between devices.

Below is a diagram for a typical PyTorch workflow with **GPU-based computing**.  This uses the following three steps:

- Create a tensor or batch of tensors in CPU memory.
- Move tensor or batch of tensors in CPU memory to GPU memory.  Make numerical computations on GPU tensor(s).
- Move tensor or batch of tensors back from GPU memory to CPU memory.
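The three steps above can be sketched as a loop that moves one batch at a time, which is one way to handle a dataset that does not fit in GPU memory.  The dataset, batch size, and computation are placeholders for illustration, and the code falls back to the CPU when CUDA is not available.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

dataset = torch.rand(1000, 8)    # hypothetical dataset held in CPU memory
batch_size = 256

batch_sums = []
for start in range(0, dataset.size(0), batch_size):
    batch = dataset[start:start + batch_size]   # 1. batch in CPU memory
    batch = batch.to(device)                    # 2. move the batch to the device...
    result = (batch * 2).sum(dim=1)             #    ...and compute there
    batch_sums.append(result.cpu())             # 3. move the result back to the CPU

all_sums = torch.cat(batch_sums)
print(all_sums.size())
```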

![CUDA Workflow](notebook_diagrams/cuda_workflow.jpg)


### 5.3 Parallel Processing and GPUs are Great, But Do They Always Work?
Sometimes, more devices do not mean better performance.  This can happen for a variety of reasons, but it usually happens because the computation cannot take advantage of the GPU's parallelism (for example, because the tensors involved are very small).

When using more advanced devices such as single or multiple GPUs, it is important to compare the performance of your numerical computations with different device configurations (such as CPU-only or GPU-only).  The code below can help us do just that (the reference for this code snippet can be found [here](https://discuss.pytorch.org/t/cpu-x10-faster-than-gpu-recommendations-for-gpu-implementation-speed-up/54980)).

In [None]:
# Only run code block if CUDA available
if torch.cuda.is_available():
    # Make start and end timer objects
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    # Start timer
    start.record()

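    # Put numerical computations here (the tensors are created on the GPU,
    # so the CUDA events time GPU work)
    y = torch.rand(5, 3, device="cuda")
    x = torch.rand(5, 3, device="cuda")
    z = x + y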

    # End timer
    end.record()

    # Wait for the GPU to finish, then compute total execution time (in milliseconds)
    torch.cuda.synchronize()
    execution_time = start.elapsed_time(end)
    print("Execution time is: %s ms" % (execution_time))
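For the CPU side of the comparison, a plain wall-clock timer is enough, since CPU operations finish before the next Python line runs.  The sketch below times a matrix multiplication with `time.perf_counter`; the tensor sizes are arbitrary, and a single run like this is noisy, so in practice you would repeat the measurement several times.

```python
import time
import torch

x = torch.rand(500, 500)
y = torch.rand(500, 500)

# Time a matrix multiplication on the CPU
start = time.perf_counter()
z = x @ y
elapsed_ms = (time.perf_counter() - start) * 1000.0

print("CPU execution time is: %.3f ms" % elapsed_ms)
```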

### 5.4 Examples When PyTorch GPU is Slower/Faster than CPU
Below, we will show examples for when PyTorch is slower/faster using a GPU as compared to using a CPU.  In the case where it is faster (increasing the batch size), it is because the additional complexity we are introducing is highly parallelizable, whereas in the case where we simply increase the variability of the data we analyze, we see that using a GPU can actually result in slower performance.  It's important to be mindful of what likely will, and will not, result in improved runtime performance with GPUs.

An example when using a GPU results in **slower** runtime (**more complex data**):
![GPU Slow](notebook_diagrams/gpu_slow.png)

An example when using a GPU results in **faster** runtime (**larger batch size**):
![GPU Fast](notebook_diagrams/gpu_fast.png)


## 6. Conclusion
In this tutorial, we've introduced the deep learning library **PyTorch**, a flexible and fast Python library for creating compact, effective deep learning models and training them.  Some of the most important concepts we discussed were:

- **tensors**: PyTorch's data structure for storing data and making computations on it.  These `tensor` objects are almost identical to numpy `ndarray` objects, and many operations we can use for `tensor` objects are quite similar to other operations we've seen.


- **devices**: PyTorch can be used on different devices (such as CPUs and GPUs).  The Compute Unified Device Architecture (CUDA) interface/API can be used with PyTorch to work with different devices that are available to users.  Users can also transfer data to and from the CPU and GPU.


- **models**: Neural networks are the central functionality for PyTorch.  These are defined in a similar way as we saw for Keras and TensorFlow.  The core methods we need to define for these models are the `__init__` (constructor) method, and the `forward` method (for prediction).


- **predictions, loss, and updates**: We showed you how to make predictions, calculate losses, and make weight updates according to gradient steps.  The next tutorials will show you how these all fit into training.

## 7. Link to PyTorch Exercises
The best way to master PyTorch is to practice!  We've posted some additional exercises that you should work through if you would like to gain more experience using PyTorch.  The exercises are as follows:

- `pytorch-tutorial-image-classifier.ipynb` is a tutorial that walks through how to train an image classifier using PyTorch.  All of the work is done for you, but we highly encourage you to work through it to understand how all the elements of tensors, models, devices, prediction, and training are tied together to train models that do tasks such as image classification.  **This tutorial can be found [here](pytorch-tutorial-image-classifier.ipynb).**


- `pytorch-tutorial-LSTM.ipynb` is a tutorial that walks through how to train a [Long Short-Term Memory (LSTM)](https://en.wikipedia.org/wiki/Long_short-term_memory) neural network, a kind of neural network that is most commonly used for Natural Language Processing applications.  In this tutorial, you will use the LSTM to make an email spam detector.  **This tutorial can be found [here](pytorch-tutorial-LSTM.ipynb).**

## 8. Where do I go next?
If you finish the exercises above, there are also many more exercises, tutorials, and concepts to learn through the [PyTorch website](https://pytorch.org/), and we highly encourage you to look into these as well!  Once you feel comfortable with this library, try to use PyTorch to start solving problems that you think deep learning can help improve!