# Learning to use PyTorch

In [1]:
import torch
import numpy as np

## Tensor initialization
Tensors can be initialized in various ways. Take a look at the following examples:

### Directly from data

Tensors can be created directly from data. The data type is automatically inferred.

In [4]:
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
x_data

tensor([[1, 2],
        [3, 4]])
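
To make the type inference concrete, here is a small sketch. The variable names are illustrative, and the float case assumes PyTorch's default float dtype has not been changed.

```python
import torch

ints = torch.tensor([[1, 2], [3, 4]])                 # Python ints   -> torch.int64
floats = torch.tensor([[1.0, 2.0]])                   # Python floats -> torch.float32 by default
forced = torch.tensor([[1, 2]], dtype=torch.float64)  # explicit dtype overrides inference

print(ints.dtype, floats.dtype, forced.dtype)
```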

### From a NumPy array

Tensors can be created from NumPy arrays, and vice versa.

In [7]:
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
x_np

tensor([[1, 2],
        [3, 4]])
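
The "vice versa" direction might look like the sketch below. It assumes a CPU tensor, where `.numpy()` returns an array backed by the same memory as the tensor, so an in-place change to one is visible through the other.

```python
import numpy as np
import torch

t = torch.ones(3)
n = t.numpy()   # NumPy array backed by the same CPU memory as t
t.add_(1)       # in-place update of the tensor ...
print(n)        # ... is visible through the array: [2. 2. 2.]
```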

### From another tensor

The new tensor retains the properties (shape, datatype) of the argument tensor, unless explicitly overridden. 

In [9]:
x_ones = torch.ones_like(x_data) # retains the properties of x_data
print(f'ones tensor: \n {x_ones} \n')

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data
print(f'random tensor: \n {x_rand} \n')

ones tensor: 
 tensor([[1, 1],
        [1, 1]]) 

random tensor: 
 tensor([[0.8915, 0.5247],
        [0.7910, 0.8178]]) 



### With random or constant values

`shape` is a tuple of tensor dimensions. In the functions below, it determines the dimensionality of the output tensor. 

In [11]:
shape = (2, 3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f'random tensor: \n {rand_tensor} \n')
print(f'ones tensor: \n {ones_tensor} \n')
print(f'zeros tensor: \n {zeros_tensor} \n')


random tensor: 
 tensor([[0.6257, 0.1930, 0.2026],
        [0.3547, 0.0330, 0.8057]]) 

ones tensor: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]]) 

zeros tensor: 
 tensor([[0., 0., 0.],
        [0., 0., 0.]]) 



## Tensor attributes

Tensor attributes describe their shape, datatype, and the device on which they're stored.

In [12]:
tensor = torch.rand(3, 4)

print(f'shape of tensor: {tensor.shape}')
print(f'datatype of tensor: {tensor.dtype}')
print(f'device tensor is stored on: {tensor.device}')

shape of tensor: torch.Size([3, 4])
datatype of tensor: torch.float32
device tensor is stored on: cpu
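
These attributes can also be set when a tensor is created, or changed afterwards. A minimal sketch, with illustrative names `t` and `t32`:

```python
import torch

t = torch.rand(3, 4, dtype=torch.float64, device="cpu")  # choose dtype and device at creation
t32 = t.to(torch.float32)                                 # converted copy; t itself is unchanged

print(t.shape, t.dtype, t.device)
print(t32.dtype)
```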


## Tensor Operations

Over 100 tensor operations, including transposing, indexing, slicing, mathematical operations, linear algebra, random sampling, and more are comprehensively described [here](https://pytorch.org/docs/stable/torch.html).

Each of them can run on the GPU, typically faster than on the CPU. Tensors are created on the CPU by default and have to be moved to the GPU explicitly with `.to`.

In [13]:
if torch.cuda.is_available():
    tensor = tensor.to('cuda')
    print(f"device tensor is stored on: {tensor.device}")
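
A common way to keep code runnable on machines with or without a GPU is to pick the device once and reuse it. This is a sketch of that pattern; the `device` variable and the tensor names are illustrative.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.rand(3, 4, device=device)  # allocate directly on the chosen device
y = torch.ones(3, 4).to(device)      # or move an existing tensor there

z = x + y                            # operands of an operation must share a device
print(z.device)
```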

### Joining tensors

You can use `torch.cat` to concatenate a sequence of tensors along an existing dimension. See also [torch.stack](https://pytorch.org/docs/stable/generated/torch.stack.html), a related joining op that differs from `torch.cat` in that it joins the tensors along a new dimension.

In [14]:
t1 = torch.cat([tensor, tensor, tensor], dim=1)
print(t1)

tensor([[0.1001, 0.1994, 0.4170, 0.4572, 0.1001, 0.1994, 0.4170, 0.4572, 0.1001,
         0.1994, 0.4170, 0.4572],
        [0.5947, 0.0360, 0.5177, 0.8934, 0.5947, 0.0360, 0.5177, 0.8934, 0.5947,
         0.0360, 0.5177, 0.8934],
        [0.6096, 0.3497, 0.1454, 0.8861, 0.6096, 0.3497, 0.1454, 0.8861, 0.6096,
         0.3497, 0.1454, 0.8861]])


In [15]:
t1.shape

torch.Size([3, 12])
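
The difference between the two joining ops shows up most clearly in the output shapes. A small sketch with two made-up tensors `a` and `b`:

```python
import torch

a = torch.zeros(3, 4)
b = torch.ones(3, 4)

catted = torch.cat([a, b], dim=0)     # existing dim 0 grows: 3 + 3 rows
stacked = torch.stack([a, b], dim=0)  # a new leading dim of size 2 is added

print(catted.shape)   # torch.Size([6, 4])
print(stacked.shape)  # torch.Size([2, 3, 4])
```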

### Multiplying tensors

In [16]:
# This computes the element-wise product
print(f'tensor.mul(tensor) \n {tensor.mul(tensor)} \n')
# alternative syntax
print(f'tensor * tensor \n {tensor * tensor}')

tensor.mul(tensor) 
 tensor([[0.0100, 0.0398, 0.1739, 0.2090],
        [0.3537, 0.0013, 0.2681, 0.7981],
        [0.3716, 0.1223, 0.0211, 0.7851]]) 

tensor * tensor 
 tensor([[0.0100, 0.0398, 0.1739, 0.2090],
        [0.3537, 0.0013, 0.2681, 0.7981],
        [0.3716, 0.1223, 0.0211, 0.7851]])


The following computes the matrix product of two tensors.

In [17]:
print(f'tensor.matmul(tensor.T) \n {tensor.matmul(tensor.T)} \n')
# Alternative syntax:
print(f'tensor @ tensor.T \n {tensor @ tensor.T}')

tensor.matmul(tensor.T) 
 tensor([[0.4327, 0.6911, 0.5965],
        [0.6911, 1.4212, 1.2420],
        [0.5965, 1.2420, 1.3001]]) 

tensor @ tensor.T 
 tensor([[0.4327, 0.6911, 0.5965],
        [0.6911, 1.4212, 1.2420],
        [0.5965, 1.2420, 1.3001]])


### In-place operations
Operations with a `_` suffix are in-place. For example, `x.copy_(y)` and `x.t_()` both change `x`.

In [18]:
print(tensor, '\n')
tensor.add_(5)
print(tensor)

tensor([[0.1001, 0.1994, 0.4170, 0.4572],
        [0.5947, 0.0360, 0.5177, 0.8934],
        [0.6096, 0.3497, 0.1454, 0.8861]]) 

tensor([[5.1001, 5.1994, 5.4170, 5.4572],
        [5.5947, 5.0360, 5.5177, 5.8934],
        [5.6096, 5.3497, 5.1454, 5.8861]])
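
For contrast, the version without the suffix leaves its input alone and returns a new tensor. A short sketch, using a fresh tensor `x` rather than the one above:

```python
import torch

x = torch.ones(2, 2)
y = x.add(5)   # out-of-place: returns a new tensor, x is untouched
print(x)       # still all ones

x.add_(5)      # in-place: overwrites the values stored in x
print(torch.equal(x, y))  # True
```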


## NumPy array to Tensor

In [19]:
n = np.ones(5)
t = torch.from_numpy(n)

Changes in the NumPy array are reflected in the tensor.

In [20]:
np.add(n, 1, out=n)
print(f't: {t}')
print(f'n: {n}') 

t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]
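
The sharing comes from `torch.from_numpy` specifically. If an independent copy is wanted, `torch.tensor` can be used instead, as in this sketch (names `shared` and `copied` are illustrative):

```python
import numpy as np
import torch

n = np.ones(3)
shared = torch.from_numpy(n)  # shares memory with n
copied = torch.tensor(n)      # copies the data

n += 1                        # in-place change to the array
print(shared)  # tensor([2., 2., 2.], dtype=torch.float64)
print(copied)  # tensor([1., 1., 1.], dtype=torch.float64)
```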


## A gentle introduction to `torch.autograd`

`torch.autograd` is PyTorch's automatic differentiation engine that powers neural network training. In this section, you will get a conceptual understanding of how autograd helps a neural network train. 

### Background

Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by __parameters__ (consisting of weights and biases), which in PyTorch are stored in tensors.

Training a NN happens in two steps:

**Forward Propagation:** In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess. 

**Backward Propagation:** In backprop, the NN adjusts its parameters proportionate to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients) and optimizing the parameters using gradient descent. 
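
The two steps can be sketched on a toy problem. This is only an illustration: the data, the single parameter `w`, the learning rate `lr`, and the number of iterations are all made up for the sketch.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = 2.0 * x                             # targets generated by a known rule, y = 2x
w = torch.zeros(1, requires_grad=True)  # single parameter, initial guess 0

lr = 0.05
for _ in range(100):
    pred = w * x                        # forward propagation: make a guess
    loss = ((pred - y) ** 2).mean()     # error of the guess
    loss.backward()                     # backward propagation: fills w.grad
    with torch.no_grad():
        w -= lr * w.grad                # gradient descent step
    w.grad.zero_()                      # clear the gradient before the next pass

print(w)  # close to 2
```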

### Usage in PyTorch

Let's take a look at a single training step. For this example, we load a pretrained ResNet18 model from torchvision. We create a random data tensor to represent a single image with 3 channels and a height and width of 64, and a corresponding `labels` tensor initialized to random values. Labels for this pretrained model have shape (1, 1000), one value per output class.

In [21]:
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /Users/jarl/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100.0%


In [22]:
prediction = model(data) # forward pass

We use the model's prediction and the corresponding label to calculate the error (loss). The next step is to backpropagate this error through the network. Backpropagation is kicked off when we call `.backward()` on the error tensor. Autograd then calculates and stores the gradients for each model parameter in the parameter's `.grad` attribute.

In [23]:
loss = (prediction - labels).sum()
loss.backward()  # backward pass

In [26]:
print(prediction.shape)
print(labels.shape)
print(loss)

torch.Size([1, 1000])
torch.Size([1, 1000])
tensor(-487.3716, grad_fn=<SumBackward0>)


Next, we load an optimizer, in this case SGD with a learning rate of 0.01 and momentum of 0.9. We register all the parameters of the model in the optimizer.

In [28]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

Finally, we call `.step()` to initiate gradient descent. The optimizer updates each parameter using the gradient stored in its `.grad` attribute, scaled by the learning rate and combined with the momentum term.

In [29]:
optim.step()  # gradient descent
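
In a loop, a training step usually also clears old gradients first, because `.backward()` accumulates into `.grad` rather than overwriting it. The sketch below puts the pieces together; it assumes a tiny `torch.nn.Linear` model as a hypothetical stand-in for ResNet18 so that it runs without a download, and the shapes and seed are illustrative.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)  # hypothetical small stand-in for ResNet18
data = torch.rand(1, 4)
labels = torch.rand(1, 2)
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

before = model.weight.detach().clone()  # snapshot to compare against

optim.zero_grad()                       # clear gradients from any earlier step
prediction = model(data)                # forward pass
loss = (prediction - labels).sum()
loss.backward()                         # backward pass: fills .grad
optim.step()                            # gradient descent: updates the parameters

changed = not torch.equal(before, model.weight)
print(changed)
```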

### Differentiation in Autograd

Let's take a look at how `autograd` collects gradients. We create two tensors `a` and `b` with `requires_grad=True`. This signals to `autograd` that every operation on them should be tracked. 

In [35]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

We create another tensor `Q`  from `a` and `b`.

$ Q = 3a^3 - b^2 $

In [36]:
Q = 3*a**3 - b**2

Let's assume `a` and `b` to be parameters of an NN, and `Q` to be the error. In NN training we want gradients of the error w.r.t. parameters, i.e.:

$ \frac{\partial Q}{\partial a} = 9a^2 $, $ \frac{\partial Q}{\partial b} = -2b $

When we call `.backward()` on `Q`, autograd calculates these gradients and stores them in the respective tensors' `.grad` attribute.

We need to explicitly pass a `gradient` argument to `Q.backward()` because `Q` is a vector. `gradient` is a tensor of the same shape as `Q`, and it represents the gradient of `Q` w.r.t. itself, i.e.

$ \frac{dQ}{dQ} = 1 $

Equivalently, we can aggregate `Q` into a scalar and call backward implicitly, like `Q.sum().backward()`.

In [37]:
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

Gradients are now deposited in `a.grad` and `b.grad`.

In [38]:
# check if collected gradients are correct
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor([True, True])
tensor([True, True])
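
The scalar route via `Q.sum().backward()` can be sketched the same way. The tensors are recreated here so that the gradients do not accumulate on top of the ones computed above.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2

Q.sum().backward()  # scalar output, so no gradient argument is needed

print(a.grad)  # 9*a**2 -> tensor([36., 81.])
print(b.grad)  # -2*b   -> tensor([-12.,  -8.])
```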


In [50]:
x = torch.ones(1, requires_grad=True)
y = x + 2
z = y * y * 2

z.backward()
print(x.grad)

tensor([12.])
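
The result of this last cell can be checked by hand with the chain rule. With $ y = x + 2 $ and $ z = 2y^2 $:

$ \frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = 4y \cdot 1 = 4(x + 2) $

At $ x = 1 $ this gives $ 4 \cdot 3 = 12 $, which matches the printed `x.grad`. No `gradient` argument is needed here because `z` has a single element.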
