# PyTorch Basics
- automatic derivatives with tensors
- tensors as neural network abstractions: `torch.nn`
- optimizers: `torch.optim`

## Package imports

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# import torchvision

In [2]:
# from pprint import pprint

import matplotlib.pyplot as plt
import numpy as np

# from IPython.core.debugger import set_trace

In [3]:
from numpy.linalg import inv
from numpy.linalg import multi_dot as mdot

## Automatic differentiation with `autograd`
Since `v0.4`, a `Tensor` can record gradients directly if you tell it to do so, e.g. `torch.ones(3, requires_grad=True)`.

Ref:
- https://pytorch.org/docs/stable/autograd.html
- https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

We rarely use `torch.autograd` directly.
Pretty much everything is part of `torch.Tensor`.
Simply add `requires_grad=True` to the tensors you want to calculate the gradients for.
The parameters of an `nn.Module` track gradients automatically.

In [4]:
from torch import autograd

In [5]:
x = torch.tensor(2.)
x

tensor(2.)

In [6]:
# requires_grad = True -> tracks all operations on the tensor
x = torch.tensor(2., requires_grad=True) # first show with False, then True
x

tensor(2., requires_grad=True)

In [7]:
print(x.requires_grad)

True


In [8]:
print(x.grad)

None


In [9]:
y = x ** 2

print("Grad of x:", x.grad)

Grad of x: None


In [10]:
# y was created as a result of an operation, so it has a grad_fn attribute.
# grad_fn: references a Function that has created the Tensor
y

tensor(4., grad_fn=<PowBackward0>)

In [11]:
# Let's compute the gradients with backpropagation
# When we finish our computation we can call .backward() and have all the gradients computed automatically.
# The gradient for this tensor will be accumulated into the .grad attribute.
# It is the partial derivative of the function w.r.t. the tensor

y = x ** 2
y.backward()

print("Grad of x:", x.grad)

Grad of x: tensor(4.)
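
The comment above says gradients are *accumulated* into `.grad`. A minimal sketch of what that means in practice, assuming the same function `x ** 2` as in the cell above (the variable names `first`, `accumulated`, and `after_reset` are only illustrative):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(x^2)/dx = 2x
(x ** 2).backward()
first = x.grad.clone()

# Second backward pass without a reset: the new gradient is added to .grad
(x ** 2).backward()
accumulated = x.grad.clone()

# Zero the stored gradient in place, then run backward again
x.grad.zero_()
(x ** 2).backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)
```

This is why training loops typically reset gradients before each backward pass.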


In [12]:
# What is going to happen here?
# x = torch.tensor(2.)
# x.backward()
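
One way to explore the question in the cell above is to run it inside a `try` block. The sketch below assumes the default `requires_grad=False`; the flag `raised` is an illustrative name, not part of the original notebook.

```python
import torch

x = torch.tensor(2.0)  # requires_grad defaults to False, so no graph is recorded

try:
    x.backward()
    raised = False
except RuntimeError as err:
    # autograd has nothing to differentiate: x neither requires grad nor has a grad_fn
    raised = True
    print("RuntimeError:", err)

print("raised:", raised)
```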

In [13]:
# Don't record the gradient - useful for inference

# Stop a tensor from tracking history:
# For example during our training loop when we want to update our weights
# then this update operation should not be part of the gradient computation
# - x.requires_grad_(False)
# - x.detach()
# - wrap in 'with torch.no_grad():'

# .requires_grad_(...) changes an existing flag in-place.

x = torch.tensor(2., requires_grad=True)

with torch.no_grad():
    y = x * x
    print(y.grad_fn)  # None: the multiplication was not recorded

None
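
The three options listed in the comments above can be compared side by side. This is a small sketch rather than a recommended pattern; the tensor names `w`, `a`, `b`, `c` and the learning rate `0.1` are assumptions made for illustration.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# 1) torch.no_grad(): results computed inside the block are not tracked
with torch.no_grad():
    a = w * w

# 2) .detach(): a new tensor that shares data but is cut off from the graph
b = (w * w).detach()

# 3) .requires_grad_(False): flips the flag on the tensor itself, in place
c = torch.tensor(2.0, requires_grad=True)
c.requires_grad_(False)

print(a.requires_grad, b.requires_grad, c.requires_grad)

# Typical use during training: the weight update itself must not be recorded
loss = (w - 1.0) ** 2
loss.backward()              # w.grad = 2 * (w - 1)
with torch.no_grad():
    w -= 0.1 * w.grad        # in-place update outside the graph

print(w, w.requires_grad)
```

After the update, `w` still has `requires_grad=True`, so the next forward pass is tracked again.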


In [14]:
# Model with non-scalar output:
# If a tensor is non-scalar (has more than one element), backward() needs
# a gradient argument: a tensor whose shape matches the output.
# backward() then computes the vector-Jacobian product with that tensor.

x = torch.randn(3, requires_grad=True)

In [15]:
x.shape

torch.Size([3])

In [16]:
y = x*2

In [17]:
y

tensor([ 4.0781, -1.0776,  2.0358], grad_fn=<MulBackward0>)

In [18]:
for _ in range(10):
    y = y * 2

In [19]:
y

tensor([ 4175.9912, -1103.4823,  2084.7021], grad_fn=<MulBackward0>)

In [20]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float32)

y.backward(v)

print(x.grad)

tensor([2.0480e+02, 2.0480e+03, 2.0480e-01])
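
To see where these numbers come from: the cells above compute `y = 2**11 * x`, so the Jacobian is `2048` times the identity and `backward(v)` yields `2048 * v`. The sketch below checks this against an explicitly built Jacobian. The helper `f` is a hypothetical wrapper around the same computation, and `torch.autograd.functional.jacobian` is used only for the comparison.

```python
import torch

def f(x):
    # Same computation as above: one doubling followed by ten more
    y = x * 2
    for _ in range(10):
        y = y * 2
    return y

x = torch.randn(3, requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# backward(v) accumulates the vector-Jacobian product v^T J into x.grad
f(x).backward(v)

# Build the full Jacobian explicitly and form the same product by hand
J = torch.autograd.functional.jacobian(f, x.detach())
vjp_by_hand = v @ J

print(x.grad)
print(vjp_by_hand)
```

Building the full Jacobian is only practical for tiny examples like this one; `backward(v)` never materializes it.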


`nn.Module` and `nn.Parameter` keep track of gradients for you.
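
As a quick illustration of this statement, the following sketch checks that the parameters of a layer are created with `requires_grad=True` and receive gradients after `backward()`. The layer size and the all-ones input are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 1)

# Parameters are registered with requires_grad=True by default
flags = [p.requires_grad for p in layer.parameters()]
print(flags)

# Forward pass on a batch of 4 all-ones inputs, then backpropagate the sum
out = layer(torch.ones(4, 2))
out.sum().backward()

# Each parameter now holds its gradient in .grad
print(layer.weight.grad)
print(layer.bias.grad)
```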

## `torch.nn`
The `torch.nn` module contains many different layers.

In [21]:
lin = nn.Linear(2, 1, bias=True)
lin.weight

Parameter containing:
tensor([[-0.6702,  0.0883]], requires_grad=True)

In [22]:
type(lin.weight)

torch.nn.parameter.Parameter

In [23]:
isinstance(lin.weight, torch.FloatTensor)

True

In [24]:
lin_reg = nn.Linear(1, 1, bias=True)
lin_reg

Linear(in_features=1, out_features=1, bias=True)
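
A short sketch of what a `Linear` layer computes when called, assuming a random batch of 5 samples (the batch size and the name `manual` are illustrative):

```python
import torch
import torch.nn as nn

lin = nn.Linear(2, 1, bias=True)

# A batch of 5 samples with 2 features each
x = torch.randn(5, 2)

# Calling the module runs its forward pass
out = lin(x)

# The same affine map written out by hand: x @ W^T + b
manual = x @ lin.weight.T + lin.bias

print(out.shape)
print(torch.allclose(out, manual, atol=1e-6))
```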

In [25]:
nn.Conv2d

torch.nn.modules.conv.Conv2d

In [26]:
nn.Conv3d

torch.nn.modules.conv.Conv3d

In [27]:
nn.BatchNorm2d

torch.nn.modules.batchnorm.BatchNorm2d

### Activations

In [28]:
nn.ReLU

torch.nn.modules.activation.ReLU

In [29]:
nn.Sigmoid

torch.nn.modules.activation.Sigmoid

### Losses

In [30]:
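# nn.Softmax is an activation rather than a loss.
# nn.CrossEntropyLoss applies log-softmax internally, so it expects raw logits.
nn.Softmax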

torch.nn.modules.activation.Softmax

In [31]:
nn.CrossEntropyLoss

torch.nn.modules.loss.CrossEntropyLoss

In [32]:
nn.BCELoss

torch.nn.modules.loss.BCELoss

In [33]:
nn.MSELoss

torch.nn.modules.loss.MSELoss
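
The loss classes above are modules that are instantiated first and then called on predictions and targets. The sketch below uses made-up tensors to show the input each one expects: raw logits for `CrossEntropyLoss`, probabilities for `BCELoss`, and plain values for `MSELoss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# CrossEntropyLoss takes raw logits and integer class labels
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])
ce = nn.CrossEntropyLoss()(logits, targets)
# Equivalent: negative log-likelihood of the log-softmax
ce_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# BCELoss takes probabilities, so apply a sigmoid to the scores first
scores = torch.tensor([0.3, -1.2, 2.0])
labels = torch.tensor([1.0, 0.0, 1.0])
bce = nn.BCELoss()(torch.sigmoid(scores), labels)
bce_from_logits = F.binary_cross_entropy_with_logits(scores, labels)

# MSELoss averages the squared differences
pred = torch.tensor([1.0, 2.0, 3.0])
true = torch.tensor([1.0, 2.0, 5.0])
mse = nn.MSELoss()(pred, true)

print(ce, ce_manual)
print(bce, bce_from_logits)
print(mse)
```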

### Functional (stateless) alternatives

In [34]:
F.mse_loss

<function torch.nn.functional.mse_loss>

In [35]:
F.relu

<function torch.nn.functional.relu>

In [36]:
F.relu6

<function torch.nn.functional.relu6>
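
The functional versions are plain functions, so they need no instance. A minimal comparison with the module form, using an arbitrary input tensor:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 2.0, 8.0])

# Module form: create the layer once, then call it
module_out = nn.ReLU()(x)

# Functional form: call the function directly
functional_out = F.relu(x)

# relu6 additionally clips the output at 6
clipped = F.relu6(x)

print(module_out, functional_out, clipped)
```

The module form is convenient inside containers such as `nn.Sequential`; the functional form is handy inside a custom `forward` method.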

## `torch.optim`

In [37]:
optim.SGD

torch.optim.sgd.SGD

In [38]:
optim.Adam

torch.optim.adam.Adam

In [39]:
optim.AdamW

torch.optim.adamw.AdamW
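
To show how the pieces fit together, here is a minimal training-loop sketch that fits a one-dimensional linear regression with `optim.SGD`. The synthetic data (`y = 3x + 1`), the learning rate, and the number of steps are assumptions chosen for illustration only.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic, noise-free data from y = 3x + 1 (illustrative values)
x = torch.linspace(-1, 1, 50).unsqueeze(1)
y = 3 * x + 1

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for step in range(500):
    optimizer.zero_grad()        # clear gradients accumulated in the previous step
    loss = loss_fn(model(x), y)  # forward pass
    loss.backward()              # compute gradients for all parameters
    optimizer.step()             # update the parameters using their .grad

print(model.weight.item(), model.bias.item(), loss.item())
```

Swapping in `optim.Adam` or `optim.AdamW` only changes the line that constructs the optimizer; the loop itself stays the same.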