<p style="align: center;">
    <img align=center src="../img/dls_logo.jpg" width=500 height=500>
</p>

<h1 style="text-align: center;">
    Phystech School of Applied Mathematics and Informatics (FPMI), MIPT
</h1>

---

<h1 style="text-align: center;">
    PyTorch basics: syntax, torch.cuda and torch.autograd
</h1>

<img src="../img/pytorch_logo.png" width=500>

Hi! In this notebook we will cover the basics of the **PyTorch** deep learning framework. 

## Intro

**Frameworks** are code libraries with their own internal structure and workflows.

There are many deep learning frameworks nowadays, and they differ mainly in how computations are organized internally. For example, in [**Caffe**](http://caffe.berkeleyvision.org/) and [**Caffe2**](https://caffe2.ai/) you assemble a network from ready-made blocks (much like LEGO :)). In [**TensorFlow**](https://www.tensorflow.org/) 1.x and [**Theano**](http://deeplearning.net/software/theano/) you first declare the computation graph, then compile it and run it for inference/training (`tf.Session`). TensorFlow also offers [Eager Execution](https://www.tensorflow.org/guide/eager), which is handy for fast prototyping and debugging and is the default mode since TensorFlow 2.0. [**Keras**](https://keras.io/) is a very popular DL framework that lets you build networks quickly and has many in-demand features.

<img src="../img/pytorch_basics_1.png" width=700>

We will use **PyTorch** because it is actively developed and supported by the community and [Facebook AI Research](https://ai.facebook.com/).

## Installation

You can find detailed instructions on how to install **PyTorch** on the [official PyTorch website](https://pytorch.org/).

## Syntax

In [None]:
import torch

Some facts about **PyTorch**:

* dynamic computation graph

* handy `torch.nn` and `torchvision` modules for fast neural network prototyping

* can be faster than **TensorFlow** on some tasks

* makes it easy to use the GPU

At its core, **PyTorch** provides two main features:

* An $n$-dimensional Tensor, similar to a **NumPy** array, that can also run on GPUs

* Automatic differentiation for building and training neural networks

If **PyTorch** were a formula, it would be:

$$
PyTorch = NumPy + CUDA + Autograd
$$

More about CUDA - [wiki](https://en.wikipedia.org/wiki/CUDA).

Let's see how we can use **PyTorch** to operate with vectors and tensors.

Recall that **a tensor** is a multidimensional array, e.g.:

`x = np.array([1, 2, 3])` - a vector, i.e. a tensor with $1$ dimension (its shape is `(3,)`)

`y = np.array([[1, 2, 3], [4, 5, 6]])` - a matrix, i.e. a tensor with $2$ dimensions (`(2, 3)` in this case)

`z = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
               [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
               [[1, 2, 3], [4, 5, 6], [7, 8, 9]]])` - "a cube" $3 \times 3 \times 3$, i.e. a tensor with $3$ dimensions (`(3, 3, 3)` in this case)

A real example of a $3$-dimensional tensor is **an image**: it has $3$ dimensions, `height`, `width` and `channel depth` ($3$ for color images, $1$ for greyscale). You can think of it as a parallelepiped filled with real numbers.
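As a quick check of the dimension counts above, here is a minimal sketch that builds the same kinds of objects as PyTorch tensors and prints their number of dimensions and shapes. The image is a made-up all-zeros tensor; its `(height, width, channels)` layout follows the description in the text and is an assumption (many PyTorch vision utilities expect the channel dimension first).

```python
import torch

vector = torch.tensor([1, 2, 3])                 # 1 dimension
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])    # 2 dimensions
cube = torch.zeros(3, 3, 3)                      # 3 dimensions
image = torch.zeros(32, 48, 3)                   # hypothetical 32x48 RGB image

for name, t in [("vector", vector), ("matrix", matrix), ("cube", cube), ("image", image)]:
    # ndim is the number of dimensions, shape lists the size of each of them
    print(name, t.ndim, tuple(t.shape))
```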

In **PyTorch** we will use `torch.Tensor` (`FloatTensor`, `IntTensor`, `ByteTensor`) for all the computations.

The main tensor types:

In [None]:
# 16 bit, floating point
torch.HalfTensor

In [None]:
# 32 bit, floating point
torch.FloatTensor

In [None]:
# 64 bit, floating point
torch.DoubleTensor

In [None]:
# 16 bit, integer, signed
torch.ShortTensor

In [None]:
# 32 bit, integer, signed
torch.IntTensor

In [None]:
# 64 bit, integer, signed
torch.LongTensor

In [None]:
# 8 bit, integer, signed
torch.CharTensor

In [None]:
# 8 bit, integer, unsigned
torch.ByteTensor

We will use only `torch.FloatTensor` and `torch.IntTensor`. 
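The typed constructors above are the older interface. The sketch below shows what appears to be the more common style in recent PyTorch code, `torch.tensor(..., dtype=...)`, and checks that it produces the same element types; the variable names are only illustrative.

```python
import torch

# Legacy typed constructors
legacy_float = torch.FloatTensor([1, 2])
legacy_int = torch.IntTensor([1, 2])

# The same tensors created through the dtype argument
modern_float = torch.tensor([1, 2], dtype=torch.float32)
modern_int = torch.tensor([1, 2], dtype=torch.int32)

print(legacy_float.dtype, modern_float.dtype)
print(legacy_int.dtype, modern_int.dtype)

# element_size() reports the number of bytes used by one element
half = torch.tensor([1, 2], dtype=torch.float16)
print(modern_float.element_size(), half.element_size())
```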

Let's begin to do something!

Creating the tensor:

In [None]:
a = torch.FloatTensor([1, 2])

In [None]:
a.shape

In [None]:
b = torch.FloatTensor([[1, 2, 3], [4, 5, 6]])
b

In [None]:
b.shape

In [None]:
x = torch.FloatTensor(2, 3, 4)
x

In [None]:
x = torch.FloatTensor(100)
x

In [None]:
x = torch.IntTensor(45, 57, 14, 2)
x.shape

**Note.** A tensor created with a size-only constructor like the one below is not initialized: it contains whatever values happened to be in memory ("random trash numbers"):

In [None]:
x = torch.IntTensor(3, 2, 4)
x
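The same behaviour can be made explicit with `torch.empty`, which allocates memory without initializing it. A minimal sketch, assuming you want to fill the tensor yourself before reading from it (the fill value `7` is arbitrary):

```python
import torch

# torch.empty allocates memory without initializing it,
# so the values are arbitrary and may differ between runs
x = torch.empty(3, 2, 4, dtype=torch.int32)
print(x.shape)

# fill the tensor explicitly before using its values
x.fill_(7)
print(x.unique())
```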

Here is a way to fill a new tensor with zeroes:

In [None]:
x1 = torch.FloatTensor(3, 2, 4).zero_()
x2 = torch.zeros(3, 2, 4)
x3 = torch.zeros_like(x1)

assert torch.allclose(x1, x2) and torch.allclose(x1, x3)

x1

Random distribution initialization:

In [None]:
# normal(0, 1) with the given shape
x = torch.randn((2, 3))
x

In [None]:
# discrete uniform on {0, ..., 9} (the upper bound is exclusive)
x.random_(0, 10)
x

In [None]:
# continuous uniform[0, 1]
x.uniform_(0, 1)
x

In [None]:
# normal with given mean and std
x.normal_(mean=0, std=1)

In [None]:
# bernoulli with parameter p
x.bernoulli_(p=0.5)
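Random initialization gives different numbers on every run. If you need reproducible results, one option is to fix the seed with `torch.manual_seed`. The sketch below also checks the range of `random_`; the seed value and the sample size are arbitrary choices.

```python
import torch

torch.manual_seed(0)          # fix the state of the global random generator
first = torch.randn(2, 3)

torch.manual_seed(0)          # same seed, same sequence of random numbers
second = torch.randn(2, 3)

print(torch.equal(first, second))

# random_(low, high) draws integers from low to high - 1
samples = torch.empty(1000).random_(0, 10)
print(samples.min().item(), samples.max().item())
```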

## NumPy -> PyTorch

Many **NumPy** functions [have their counterparts](https://github.com/torch/torch7/wiki/Torch-for-Numpy-users) in **PyTorch**.

`np.reshape` is similar to `torch.Tensor.view`:

In [None]:
b, b.stride()

In [None]:
b.view(3, 2), b.view(3, 2).stride()

**Note.** `view` returns a new tensor object, and the shape of the old one remains unchanged. Both share the same underlying data, so no copy is made.
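To see what "new tensor" means for `view`, the following sketch checks whether the viewed tensor and the original use the same memory. The tensor values mirror `b` from above, and `v` is just an illustrative name.

```python
import torch

b = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
v = b.view(3, 2)

print(b.shape, v.shape)                 # the original keeps its shape
print(b.data_ptr() == v.data_ptr())     # both point to the same memory

v[0, 0] = 100.                          # writing through the view...
print(b[0, 0])                          # ...is visible in the original
```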

In [None]:
b.view(-1)

In [None]:
b

In [None]:
b.T.stride(), b.is_contiguous(), b.T.is_contiguous()

In [None]:
# returns a view when the memory layout allows it, otherwise a copy
b.reshape(-1)

In [None]:
b
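The difference between `view` and `reshape` shows up on non-contiguous tensors such as `b.T`. A small sketch, using the same values as `b` above: `view` is expected to fail there, while `reshape` falls back to copying the data.

```python
import torch

b = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
bt = b.T                       # transposed, non-contiguous view of b

try:
    bt.view(-1)                # view needs a compatible memory layout
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat = bt.reshape(-1)          # reshape copies the data in this case
print(flat)

flat[0] = -1.                  # the copy is independent of b
print(b[0, 0])
```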

Change tensor type:

In [None]:
a = torch.FloatTensor([1.5, 3.2, -7])

In [None]:
a.type_as(torch.IntTensor())

In [None]:
a.to(torch.int32)

In [None]:
a.type_as(torch.ByteTensor())

In [None]:
a.to(torch.uint8)

**Note.** `type_as` creates a new tensor, the old one remains unchanged.

In [None]:
a
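When a floating point tensor is converted to an integer type, the fractional part is dropped rather than rounded. The sketch below uses the same values as above and shows one way to round first if that is the intended behaviour.

```python
import torch

a = torch.tensor([1.5, 3.2, -7.0])

as_int = a.to(torch.int32)                   # fractional part is dropped
print(as_int)

as_int_rounded = a.round().to(torch.int32)   # round first, then convert
print(as_int_rounded)

print(a.dtype)                               # the source tensor is unchanged
```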

Indexing is just like in `NumPy`:

In [None]:
a = torch.FloatTensor([[100, 20, 35], [15, 163, 534], [52, 90, 66]])
a

In [None]:
a[0, 0]

In [None]:
a[0:2, 1]

**Arithmetic and boolean operations** and their counterparts:

| Operator | Analogue |
|:-:|:-:|
|`+`| `torch.add()` |
|`-`| `torch.sub()` |
|`*`| `torch.mul()` |
|`/`| `torch.div()` |

Addition:

In [None]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
b = torch.FloatTensor([[-1, -2, -3], [-10, -20, -30], [100, 200, 300]])

In [None]:
a + b

In [None]:
a.add(b)

In [None]:
b = -a
b

In [None]:
a + b

Subtraction:

In [None]:
a - b

In [None]:
# copy
a.sub(b)

In [None]:
# inplace
a.sub_(b)

Multiplication (elementwise):

In [None]:
a * b

In [None]:
a.mul(b)

Division (elementwise):

In [None]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
b = torch.FloatTensor([[-1, -2, -3], [-10, -20, -30], [100, 200, 300]])

In [None]:
a / b

In [None]:
a.div(b)

**Note.** All these operations, except the in-place variants with a trailing underscore (such as `sub_`), create new tensors; the old tensors remain unchanged.

In [None]:
a

In [None]:
b
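The trailing underscore is the general PyTorch convention for in-place methods. A minimal sketch of the difference, with small example tensors chosen only for illustration:

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([10., 20., 30.])

c = a.add(b)      # out-of-place: returns a new tensor, a is untouched
print(a, c)

a.add_(b)         # in-place: overwrites the contents of a
print(a)

a += b            # augmented assignment on a tensor is also in-place
print(a)
```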

Comparison operators:

In [None]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
b = torch.FloatTensor([[-1, -2, -3], [-10, -20, -30], [100, 200, 300]])

In [None]:
a == b

In [None]:
a != b

In [None]:
a < b

In [None]:
a > b

Using boolean mask indexing:

In [None]:
a[a > b]

In [None]:
b[a == b]

Elementwise application of the **universal functions**:

In [None]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])

In [None]:
# torch.sin(a)
a.sin()

In [None]:
# torch.tan(a)
a.tan()

In [None]:
# torch.exp(a)
a.exp()

In [None]:
# torch.log(a)
a.log()

In [None]:
b = -a
b

In [None]:
# torch.abs(b)
b.abs()

Aggregate functions: `sum`, `mean`, `max`, `min`:

In [None]:
# dim parameter is equivalent to axis parameter in NumPy
a.sum(dim=1)

In [None]:
a.mean()

Along the given axis:

In [None]:
a

In [None]:
a.sum(dim=0)

In [None]:
a.sum(1)

In [None]:
a.max()

In [None]:
a.max(0)

In [None]:
a.min()

In [None]:
a.min(0)

**Note.** When a dimension is given, `max` and `min` return two tensors: the values and the indices of the max/min elements along that axis. E.g. `a.min(0)` above returned `[1., 2., 3.]`, the minimum elements along axis $0$ (one per column), and their indices along axis $0$ are `[0, 0, 0]`.
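The two returned tensors can be accessed by name or unpacked like a tuple. A short sketch with the same matrix `a` as above:

```python
import torch

a = torch.tensor([[1., 2., 3.], [10., 20., 30.], [100., 200., 300.]])

result = a.min(dim=0)
print(result.values)     # smallest value in every column
print(result.indices)    # row index at which it occurs

# tuple unpacking works as well
values, indices = a.max(dim=1)
print(values, indices)

# without `dim`, a single 0-dimensional tensor is returned
print(a.min())
```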

### Matrix operations

Transpose a tensor:

In [None]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
a

In [None]:
# a.T
a.t()

This is not an in-place operation either:

In [None]:
a

Dot product of vectors:

In [None]:
a = torch.FloatTensor([1, 2, 3, 4, 5, 6])
b = torch.FloatTensor([-1, -2, -4, -6, -8, -10])

In [None]:
a.dot(b)

In [None]:
a.shape, b.shape

In [None]:
a @ b

In [None]:
type(a)

In [None]:
type(b)

In [None]:
type(a @ b)

Matrix product:

In [None]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
b = torch.FloatTensor([[-1, -2, -3], [-10, -20, -30], [100, 200, 300]])

In [None]:
a.mm(b)

In [None]:
a @ b

Original tensors remain unchanged:

In [None]:
a

In [None]:
b

In [None]:
a = torch.FloatTensor([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
b = torch.FloatTensor([[-1], [-10], [100]])

In [None]:
print(a.shape, b.shape)

In [None]:
a @ b

If we flatten `b` into a 1-D tensor (`b.view(-1)`), it is still treated as a column in the product, but the result is a 1-D tensor of shape `(3,)` rather than a `(3, 1)` matrix:

In [None]:
b

In [None]:
b.view(-1)

In [None]:
a @ b.view(-1)

In [None]:
a.mv(b.view(-1))

For tensors with more than two dimensions, the matrix product is applied to the last two dimensions, and the leading dimensions are treated as batch dimensions:

In [None]:
y = torch.Tensor(2, 3, 4, 5)
z = torch.Tensor(2, 3, 5, 6)
(y @ z).shape
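To make the batching rule concrete, the sketch below compares one slice of the batched product with an ordinary matrix product of the corresponding slices. Random inputs are used instead of uninitialized ones, and the indices `i`, `j` are arbitrary.

```python
import torch

y = torch.randn(2, 3, 4, 5)
z = torch.randn(2, 3, 5, 6)
out = y @ z
print(out.shape)

# each pair of leading indices (i, j) selects an ordinary matrix product
i, j = 1, 2
same = torch.allclose(out[i, j], y[i, j] @ z[i, j], atol=1e-4)
print(same)
```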

### NumPy to PyTorch conversion

In [None]:
import numpy as np

a = np.random.rand(3, 3)
a

In [None]:
b = torch.from_numpy(a)
b

**Note.** `a` and `b` share the same memory, so a change to one of them shows up in the other:

In [None]:
b -= b
b

In [None]:
a
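If the shared memory is not what you want, one option is to copy the data, for example with `torch.tensor`. A minimal sketch contrasting the two behaviours (the names `shared` and `copied` are only illustrative):

```python
import numpy as np
import torch

a = np.ones((2, 2))

shared = torch.from_numpy(a)     # shares memory with the NumPy array
copied = torch.tensor(a)         # makes an independent copy

a[0, 0] = 5.0                    # modify the NumPy array in place

print(shared[0, 0].item())       # the shared tensor sees the change
print(copied[0, 0].item())       # the copy does not
print(shared.dtype)              # the NumPy dtype (float64) is preserved
```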

### PyTorch to NumPy conversion

In [None]:
a = torch.FloatTensor(2, 3, 4)
a

In [None]:
type(a)

In [None]:
x = a.numpy()
x

In [None]:
x.shape

In [None]:
type(x)

In [None]:
x -= x

In [None]:
a

## Simple neuron

Let's write `forward_pass(X, w)` for a single neuron with a sigmoid activation using **PyTorch** (the bias $w_0$ is treated as a part of $w$):

In [None]:
def forward_pass(X, w):
    return torch.sigmoid(X @ w)

In [None]:
X = torch.FloatTensor([[-5, 5], [2, 3], [1, -1]])
w = torch.FloatTensor([[-0.5], [2.5]])
result = forward_pass(X, w)
print(f'{result = }')
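One common way to make $w_0$ "a part of $w$" is to add a column of ones to `X`, so that the first weight acts as the bias. The sketch below assumes this convention; `add_bias_column` is a hypothetical helper and the value `0.1` for $w_0$ is arbitrary. It also checks the result against the explicit formula $\sigma(z) = 1 / (1 + e^{-z})$.

```python
import torch

def forward_pass(X, w):
    return torch.sigmoid(X @ w)

def add_bias_column(X):
    # hypothetical helper: prepend a column of ones so that w[0] acts as the bias w_0
    ones = torch.ones(X.shape[0], 1, dtype=X.dtype)
    return torch.cat([ones, X], dim=1)

X = torch.tensor([[-5., 5.], [2., 3.], [1., -1.]])
w = torch.tensor([[0.1], [-0.5], [2.5]])     # w_0 = 0.1 is an arbitrary example value

with_bias = forward_pass(add_bias_column(X), w)

# the same result computed with an explicit bias term and the sigmoid formula
z = X @ w[1:] + w[0]
manual = 1 / (1 + torch.exp(-z))

print(torch.allclose(with_bias, manual))
```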

## CUDA

[Wikipedia](https://ru.wikipedia.org/wiki/CUDA)

[CUDA documentation](https://docs.nvidia.com/cuda/)

We can run computations in **PyTorch** on both the CPU (Central Processing Unit) and the GPU (Graphics Processing Unit). Switching between them is easy, which is one of the most important features of the framework.

In [None]:
x = torch.FloatTensor(1024, 10024).uniform_()
x

In [None]:
x.is_cuda

Place a tensor on GPU (GPU memory is used):

In [None]:
!nvidia-smi

In [None]:
x = x.cuda()

In [None]:
!nvidia-smi

In [None]:
x

In [None]:
x = x.cpu()
!nvidia-smi

torch.cuda.empty_cache()
!nvidia-smi

In [None]:
device = torch.device('cuda:0')
x = x.to(device)
x

Let's multiply two tensors on the GPU and then move the result to the CPU:

In [None]:
a = torch.FloatTensor(10000, 10000).uniform_()
b = torch.FloatTensor(10000, 10000).uniform_()
c = a.cuda().mul(b.cuda()).cpu()

In [None]:
c

In [None]:
a

Tensors placed on the CPU and tensors placed on the GPU cannot be combined in a single operation; the addition below raises an error:

In [None]:
a = torch.FloatTensor(10000, 10000).uniform_().cpu()
b = torch.FloatTensor(10000, 10000).uniform_().cuda()

In [None]:
a + b
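The usual remedy is to move both operands to the same device before combining them. A minimal sketch, written so that it also runs on a machine without a GPU (in that case both tensors simply stay on the CPU):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.rand(3, 3)                     # created on the CPU
b = torch.rand(3, 3, device=device)      # created on the GPU if there is one

# move `a` to the device of `b` before combining them
c = a.to(b.device) + b
print(c.device)
```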

Example of working with GPU:

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [None]:
x = torch.FloatTensor(5, 5, 5).uniform_()

# check for CUDA availability (NVIDIA GPU)
if torch.cuda.is_available():
    # create the CUDA device object
    device = torch.device('cuda')          # CUDA-device object
    y = torch.ones_like(x, device=device)  # create a tensor on GPU
    x = x.to(device)                       # or just `.to('cuda')`
    z = x + y
    print(z)
    # you can set the type inside the `.to()` method
    print(z.to('cpu', torch.double))

## AutoGrad

Autograd relies on the **chain rule**, the same rule that underlies **backpropagation** in neural networks.

Assume we have $f(w(\theta))$, then the derivative of $f$ with respect to $\theta$ is:

$$
{\frac  {\partial{f}}{\partial{\theta}}}
={\frac  {\partial{f}}{\partial{w}}} {\frac  {\partial{w}}{\partial{\theta}}}
$$


**Note.** In the multidimensional case the chain rule is a composition of derivatives (Jacobians):

$$
D_\theta(f\circ w) = D_{w(\theta)}(f)\circ D_\theta(w)
$$

Simple example of gradient propagation:

$$
y = \sin \left(x_2^2(x_1 + x_2)\right)
$$

<img src="../img/pytorch_basics_2.jpg" width=700>

The autograd package provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
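As a small sanity check, the function $y = \sin\left(x_2^2(x_1 + x_2)\right)$ from above can be differentiated both by autograd and by hand. With $u = x_2^2(x_1 + x_2)$, the chain rule gives $\partial y / \partial x_1 = \cos(u)\, x_2^2$ and $\partial y / \partial x_2 = \cos(u)\,(2 x_1 x_2 + 3 x_2^2)$. The sketch below compares the two; the input values $x_1 = 1$, $x_2 = 2$ are arbitrary.

```python
import torch

x1 = torch.tensor(1.0, requires_grad=True)
x2 = torch.tensor(2.0, requires_grad=True)

u = x2 ** 2 * (x1 + x2)      # inner expression
y = torch.sin(u)
y.backward()                 # fills x1.grad and x2.grad

# derivatives obtained by hand with the chain rule
with torch.no_grad():
    dy_dx1 = torch.cos(u) * x2 ** 2
    dy_dx2 = torch.cos(u) * (2 * x1 * x2 + 3 * x2 ** 2)

print(torch.allclose(x1.grad, dy_dx1), torch.allclose(x2.grad, dy_dx2))
```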

Examples:

In [None]:
dtype = torch.float
# uncomment this to run on GPU
# device = torch.device('cuda:0')
device = 'cpu'

# N is batch size
# D_in is input dimension
# H is hidden dimension
# D_out is output dimension
N, D_in, H, D_out = 64, 3, 3, 10

# create random tensors to hold inputs and outputs
# requires_grad is False by default, which means that we do not need to compute
# gradients with respect to these tensors during the backward pass
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random Tensors for weights.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

In [None]:
y_pred = (x @ w1).clamp(min=0).matmul(w2)
loss = (y_pred - y).pow(2).sum()
# calculate the gradients
loss.backward()

In [None]:
print((y_pred - y).pow(2).sum())

In [None]:
w1.grad, w2.grad

In [None]:
# .grad of a non-leaf tensor is not kept by default, so this is None (with a warning)
loss.grad

In [None]:
# ask autograd to keep the gradients of the non-leaf tensors y_pred and loss
y_pred = (x @ w1).clamp(min=0).matmul(w2)
y_pred.retain_grad()

loss = (y_pred - y).pow(2).sum()
loss.retain_grad()

loss.backward()

In [None]:
loss.grad

In [None]:
# doesn't require grad
x.grad

In [None]:
# doesn't require grad
y.grad

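**Note.** Gradients are stored in the `grad` field of the tensors with respect to which they were computed. By default this happens only for leaf tensors with `requires_grad=True` (here `w1` and `w2`); a non-leaf tensor such as `loss` keeps its gradient only if `retain_grad()` was called on it before `backward()`.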
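To build some trust in the values that end up in `grad`, the gradient with respect to `w2` can be derived by hand: for the loss $\sum (h w_2 - y)^2$ with hidden activations $h$, it equals $h^\top \cdot 2\,(h w_2 - y)$. The sketch below repeats the two-layer setup with a fixed seed and compares this expression with autograd; the seed and the tolerances are arbitrary choices.

```python
import torch

torch.manual_seed(0)
x = torch.randn(64, 3)
y = torch.randn(64, 10)
w1 = torch.randn(3, 3, requires_grad=True)
w2 = torch.randn(3, 10, requires_grad=True)

h = (x @ w1).clamp(min=0)        # hidden activations (ReLU)
y_pred = h @ w2
loss = (y_pred - y).pow(2).sum()
loss.backward()

# gradient of the sum of squared errors with respect to w2, derived by hand
with torch.no_grad():
    manual_grad_w2 = h.T @ (2 * (y_pred - y))

grads_match = torch.allclose(w2.grad, manual_grad_w2, rtol=1e-3, atol=1e-2)
print(grads_match)
print(w1.grad.shape, w2.grad.shape)
```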

In [None]:
w1

In [None]:
with torch.no_grad():
    pass
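A typical use of `torch.no_grad()` is the parameter update in gradient descent: the update itself should not become part of the computation graph. The sketch below fits a single weight on a toy problem; the data, the learning rate `lr` and the number of steps are all made up for illustration.

```python
import torch

x = torch.linspace(-1, 1, 20)
y = 3 * x                                  # toy target: the true weight is 3
w = torch.zeros(1, requires_grad=True)

lr = 0.1                                   # learning rate, chosen arbitrarily
losses = []
for step in range(50):
    loss = ((w * x - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():                  # the update itself must not be tracked
        w -= lr * w.grad
    w.grad.zero_()                         # gradients accumulate, so reset them
    losses.append(loss.item())

print(losses[0], losses[-1], w.item())
```

Note the call to `w.grad.zero_()`: `backward()` adds to the existing `grad` instead of overwriting it, so without the reset every step would use the sum of all previous gradients.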

## Further reading

1. Official **PyTorch** tutorials - https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#sphx-glr-beginner-blitz-tensor-tutorial-py

2. arXiv article about the deep learning frameworks comparison - https://arxiv.org/pdf/1511.06435.pdf

3. Useful repo with different tutorials - https://github.com/yunjey/pytorch-tutorial

4. Facebook AI Research (main contributor of **PyTorch**) website - https://ai.facebook.com/tools/