In [1]:
import numpy as np
import torch

# PyTorch Basics

We will be using PyTorch in the notebooks for the next several modules. In this notebook, we will go over some of the basic notions and operations in PyTorch. You may find it helpful to check out the [official guide](https://pytorch.org/tutorials/beginner/basics/intro.html) for PyTorch, from which many examples in this notebook are derived.

## Tensors

PyTorch's `tensor`s are similar to the NumPy `ndarray`s that you've been using throughout the first half of this course. If you come from a background in mathematics or physics, you might expect something with a stricter definition. In PyTorch and many other deep learning frameworks, however, "tensor" is just another name for a multi-dimensional array. Tensors can be used to represent data or model parameters.

In [2]:
t = torch.tensor([[1., 2.], [3., 4.]])
print(t)
print(t.dtype)

tensor([[1., 2.],
        [3., 4.]])
torch.float32


You can also convert between NumPy arrays and PyTorch tensors. This is helpful when you have loaded your data as NumPy arrays or when you want to visualize PyTorch tensors with libraries that work with NumPy arrays.

In [3]:
d = t.numpy()
print(d)
print(d.dtype)

[[1. 2.]
 [3. 4.]]
float32


In [4]:
t = torch.from_numpy(d)
print(t)
print(t.dtype)

tensor([[1., 2.],
        [3., 4.]])
torch.float32
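
One detail worth knowing about these conversions: for CPU tensors, `torch.from_numpy` and `Tensor.numpy()` share the underlying memory rather than copying it, whereas `torch.tensor` makes a copy. The sketch below illustrates the difference; the variable names `a`, `shared`, and `copied` are made up for this example.

```python
import numpy as np
import torch

a = np.zeros(3, dtype=np.float32)
shared = torch.from_numpy(a)   # shares memory with `a`
copied = torch.tensor(a)       # copies the data into a new tensor

a[0] = 1.0                     # modify the NumPy array in place

print(shared)                  # sees the change made through `a`
print(copied)                  # still holds the original values
```

If you need an independent tensor, use `torch.tensor(a)` or call `.clone()` on the result of `torch.from_numpy`.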


Tensors can be copied between CPUs and GPUs (if available).

In [5]:
if torch.cuda.is_available():
    t_gpu = t.to('cuda:0')
    print(t_gpu)
    t = t_gpu.to('cpu')
    print(t)
else:
    print('CUDA is not available.')

tensor([[1., 2.],
        [3., 4.]], device='cuda:0')
tensor([[1., 2.],
        [3., 4.]])
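
A common way to write code that runs with or without a GPU is to pick a device once and move tensors to it. The following is a minimal sketch of that pattern, not the only way to do it; the name `device` is just a convention.

```python
import torch

# Pick the first GPU when one is present, otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

t = torch.tensor([[1., 2.], [3., 4.]]).to(device)
print(t.device)

# .numpy() only works on CPU tensors, so move the tensor back first.
d = t.cpu().numpy()
print(d.shape)
```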


## Automatic Differentiation

PyTorch tensors are useful not only because they can be used on GPUs, but also because PyTorch has automatic differentiation functionality built for them.

The most typical way of training a deep learning model is the following:
- The model is defined and the model parameters are initialized.
- A loss function is defined to measure how good the model's predictions are.
- During training, for each batch of training samples, do the following:
    - Forward pass: Training inputs are fed into the network to generate predictions.
    - Calculate the loss given the predictions.
    - Backward pass: Calculate the gradients of the loss with respect to the parameters.
    - Update model parameters based on the gradients.

(We will cover the details in class in the next module. Here, you just need to know that it's essential to calculate these gradients.)
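
The steps above can be sketched in a few lines of PyTorch. This is only an illustration under simplifying assumptions: the data is synthetic, the "model" is a single linear layer written by hand, the whole dataset is used as one batch, and the learning rate `lr` is an arbitrary choice.

```python
import torch

torch.manual_seed(0)

# Toy data: 8 samples with 4 features each, targets from a known linear rule.
x = torch.randn(8, 4)
y = x @ torch.tensor([[1.], [-2.], [0.5], [3.]]) + 0.1

# Define the model parameters and initialize them.
w = torch.zeros(4, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.05  # hypothetical learning rate

losses = []
for step in range(200):
    z = x @ w + b                   # forward pass
    loss = ((y - z) ** 2).mean()    # loss: mean squared error
    loss.backward()                 # backward pass: fills w.grad and b.grad
    with torch.no_grad():           # update parameters without tracking
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                  # clear gradients before the next step
    b.grad.zero_()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The loss at the end should be much smaller than at the start. In practice, the manual update is usually replaced by an optimizer from `torch.optim`.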

When you define your model parameters and all the computations, PyTorch creates a computational graph and uses its built-in differentiation engine, `torch.autograd`, to calculate gradients for the graph.

In [6]:
x = torch.tensor([[1.2, 1.1, 0.5, 1.6]])  # input
y = torch.tensor([[3.]])  # label

w = torch.randn(4, 1, requires_grad=True)  # layer weight
b = torch.randn(1, requires_grad=True)  # layer bias
z = x @ w + b

loss = (y - z)**2
print(loss)

tensor([[0.0725]], grad_fn=<PowBackward0>)


Gradients can be calculated by calling `loss.backward()`:

In [7]:
loss.backward()
print(w.grad)

tensor([[0.6462],
        [0.5924],
        [0.2693],
        [0.8616]])


We can [derive](http://cs231n.stanford.edu/handouts/linear-backprop.pdf) the gradients by hand and verify that this is indeed what `torch.autograd` returns.

In [8]:
with torch.no_grad():
    w_grad = x.T @ (2 * (z - y))
print(w_grad)
print(torch.allclose(w.grad, w_grad))

tensor([[0.6462],
        [0.5924],
        [0.2693],
        [0.8616]])
True
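
For reference, the expression used in the check above follows from the chain rule. With $z = xw + b$ and $\text{loss} = (y - z)^2$, where $x$ is a $1 \times 4$ row vector and $w$ is a $4 \times 1$ column vector:

$$
\frac{\partial\,\text{loss}}{\partial z} = 2\,(z - y), \qquad
\frac{\partial\,\text{loss}}{\partial w} = x^{\top}\,\frac{\partial\,\text{loss}}{\partial z} = 2\,x^{\top}(z - y), \qquad
\frac{\partial\,\text{loss}}{\partial b} = 2\,(z - y).
$$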


## Layers and Modules

PyTorch has the [`torch.nn`](https://pytorch.org/docs/stable/nn.html) namespace that provides many commonly used neural net layers so you don't have to write out all the computations in the model and manually manage all the weights.

In [9]:
from torch import nn

layer = nn.Linear(4, 1)
print(layer)

Linear(in_features=4, out_features=1, bias=True)


Here is the same example we saw in the previous section, but using the layer instead.

In [10]:
x = torch.tensor([[1.2, 1.1, 0.5, 1.6]])  # input
y = torch.tensor([[3.]])  # label

z = layer(x)

loss = (y - z)**2
print(loss)

layer.zero_grad()
loss.backward()
print(layer.weight.grad.T)

tensor([[7.5471]], grad_fn=<PowBackward0>)
tensor([[-6.5933],
        [-6.0438],
        [-2.7472],
        [-8.7910]])
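
The call to `layer.zero_grad()` matters because `backward()` adds new gradients to whatever is already stored in `.grad` rather than overwriting it. The sketch below demonstrates this; `run_backward` is a helper defined only for this example.

```python
import torch
from torch import nn

layer = nn.Linear(4, 1)
x = torch.tensor([[1.2, 1.1, 0.5, 1.6]])
y = torch.tensor([[3.]])

def run_backward():
    # Recompute the loss and backpropagate through the layer.
    loss = ((y - layer(x)) ** 2).sum()
    loss.backward()

run_backward()
first = layer.weight.grad.clone()

run_backward()                      # no zero_grad(): gradients are summed
accumulated = layer.weight.grad.clone()

layer.zero_grad()                   # reset, then compute a fresh gradient
run_backward()
fresh = layer.weight.grad.clone()

print(torch.allclose(accumulated, 2 * first))
print(torch.allclose(fresh, first))
```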


Layers in `torch.nn` are subclasses of `torch.nn.Module`. When building your own networks, you often want multiple layers, and you can define them in this way:

In [11]:
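import torch.nn.functional as F  # provides relu used in forward()

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Two convolutional layers: (in_channels, out_channels, kernel_size)
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        # Apply each convolution followed by a ReLU activation
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))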

model = Model()
print(model)

Model(
  (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(20, 20, kernel_size=(5, 5), stride=(1, 1))
)
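
To see what such a model does to its input, you can pass a tensor through it and inspect the output shape. The sketch below repeats the class definition so it runs on its own and assumes a hypothetical input of one 28x28 single-channel image; each 5x5 convolution without padding shrinks the height and width by 4.

```python
import torch
import torch.nn.functional as F
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

model = Model()

# Hypothetical input: a batch of 1 image, 1 channel, 28x28 pixels.
x = torch.randn(1, 1, 28, 28)
out = model(x)            # calling the module runs forward()
print(out.shape)          # torch.Size([1, 20, 20, 20])
```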


Notice that the network itself is also a subclass of `nn.Module`. PyTorch builds many features around the module abstraction, such as integration with optimizers and utilities for saving and restoring models. See the [notes on modules](https://pytorch.org/docs/stable/notes/modules.html) for more details if you are interested.
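
As a rough illustration of those two features, the sketch below hands a module's parameters to an optimizer from `torch.optim` and then round-trips its `state_dict` through an in-memory buffer. The learning rate and the use of `io.BytesIO` instead of a file on disk are choices made only for this example.

```python
import io
import torch
from torch import nn

model = nn.Linear(4, 1)
# The optimizer receives the module's parameters and updates them in step().
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.tensor([[1.2, 1.1, 0.5, 1.6]])
y = torch.tensor([[3.]])

loss_before = ((y - model(x)) ** 2).sum()
optimizer.zero_grad()
loss_before.backward()
optimizer.step()                       # one gradient descent update
loss_after = ((y - model(x)) ** 2).sum()
print(loss_before.item(), loss_after.item())

# Save the parameters to a buffer and restore them into a new module.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

restored = nn.Linear(4, 1)
restored.load_state_dict(torch.load(buffer))
print(torch.equal(restored.weight, model.weight))
```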