# An introduction to PyTorch

PyTorch is a platform for deep learning in Python.

It provides tools for efficiently creating, training, testing and analyzing neural networks:

* Different types of layers (embedding, linear, convolutional, recurrent)
* Activation functions (tanh, relu, sigmoid, etc.)
* Gradient computation
* Optimizers (Adam, Adagrad, RMSprop, SGD, etc.)
* GPU implementations for faster computation
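
As a rough preview of how the pieces listed above fit together, here is a minimal sketch of a single training step. The toy data, layer sizes and learning rate are made up for illustration; they are not part of this tutorial's later examples.

```python
import torch

torch.manual_seed(0)  # make the random numbers repeatable

# A tiny model: one linear layer followed by a tanh activation
model = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Tanh())
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.rand(8, 3)     # toy inputs (assumption, for illustration only)
targets = torch.zeros(8, 1)   # toy targets

loss_before = torch.nn.functional.mse_loss(model(inputs), targets)

optimizer.zero_grad()         # clear old gradients
loss_before.backward()        # gradient computation
optimizer.step()              # parameter update

loss_after = torch.nn.functional.mse_loss(model(inputs), targets)
print(loss_before.item(), loss_after.item())
```

Each of these ingredients is introduced step by step in the rest of the notebook.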

## Tensors

Let's start with some basics: tensors are similar to numpy arrays.

In [None]:
import numpy as np
import torch

In [None]:
v1 = np.arange(10)
v2 = np.arange(10, 20)

print("v1: %s\n" % v1)
print("v2: %s\n" % v2)
print("Dot product: %d" % v1.dot(v2))

In [None]:
v1 = torch.arange(10)
v2 = torch.arange(10, 20)

print("v1: %s\n" % v1)
print("v2: %s\n" % v2)
print("Dot product: %d" % v1.dot(v2))

#### Setting values manually or randomly:

In [None]:
v3 = np.array([2, 4, 6, 8])
v4 = np.random.random(10)

print("v3: %s\n" % v3)
print("v4: %s\n" % v4)

In [None]:
v3 = torch.tensor([2, 4, 6, 8])
v4 = torch.rand(10)

print("v3: %s\n" % v3)
print("v4: %s\n" % v4)

#### You can also change a value inside the tensor manually

In [None]:
v4[1] = 0.1
print(v4)

#### Accessing values (indexing)

Individual tensor elements are scalars, i.e. 0-dimensional tensors:

In [None]:
v1 = torch.arange(10)

In [None]:
print(v1[0])
print(v1[0].shape)

`.item()` returns a Python number:

In [None]:
number = v1[0].item()
print(number)
print(isinstance(number, int))

## Converting

In [None]:
A = torch.eye(3)
A

In [None]:
# torch --> numpy
B = A.numpy()
B

In [None]:
# numpy --> torch
torch.from_numpy(np.eye(3))
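
One detail worth knowing about these conversions: for CPU tensors, `.numpy()` and `torch.from_numpy()` share the underlying memory instead of copying it. The short sketch below illustrates this; the variable names `C`, `D` and `E` are only for illustration.

```python
import numpy as np
import torch

A = torch.eye(3)
B = A.numpy()            # B is a view of A's memory, not a copy
A[0, 0] = 5.
print(B[0, 0])           # 5.0

C = np.zeros(3)
D = torch.from_numpy(C)  # same sharing in the other direction
C[1] = 7.
print(D)                 # tensor([0., 7., 0.], dtype=torch.float64)

E = torch.tensor(C)      # torch.tensor() makes an independent copy
C[2] = 9.
print(E[2].item())       # 0.0
```

If you need an independent copy, use `torch.tensor(...)` or call `.clone()` / `.copy()` on the result.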

## Elementwise operations

In [None]:
v1

In [None]:
v2

In [None]:
v1 + v2

In [None]:
v1 * v2

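Be careful when dividing integer tensors! Older PyTorch releases performed integer division here, while recent releases perform true division and return a float tensor: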

In [None]:
v1 / v2 

In [None]:
x = v1.to(torch.float)
y = v2.to(torch.float)
x / y
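
To make the intended kind of division explicit regardless of the installed version, one option is sketched below. It assumes a PyTorch release recent enough to support the `rounding_mode` argument of `torch.div`; the tensors `a` and `b` are made up for illustration.

```python
import torch

a = torch.tensor([7, 8, 9])
b = torch.tensor([2, 2, 2])

# True division: convert to float first
true_div = a.to(torch.float) / b.to(torch.float)

# Floor division: the result keeps the integer dtype
floor_div = torch.div(a, b, rounding_mode="floor")

print(true_div)   # tensor([3.5000, 4.0000, 4.5000])
print(floor_div)  # tensor([3, 4, 4])
```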

#### Operations with constants

In [None]:
x

In [None]:
x + 1

In [None]:
x ** 2

#### Matrices

In [None]:
m1 = torch.rand(5, 4)
m2 = torch.rand(4, 5)

print("m1: %s\n" % m1)
print("m2: %s\n" % m2)
print(m1.dot(m2))

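Oops... `.dot()` only accepts 1-D tensors in PyTorch, so the call above raises an error. That can be misleading if you are used to numpy. For matrix multiplication, call `.mm()` instead: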

In [None]:
print(m1.mm(m2))

In [None]:
print(m1 @ m2)

What if I have batched data? Use `.bmm()`, which multiplies each pair of matrices in the batch. Mixing up `.mm()` and `.bmm()` is a common source of errors.

In [None]:
m1 = torch.rand(2, 5, 4)
m2 = torch.rand(2, 4, 5)

print(m1.bmm(m2))

`@` also works on batched tensors and gives the same result as `.bmm()` here:

In [None]:
print(m1 @ m2)
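
A quick way to convince yourself that the two agree is to compare the results directly. This is a small sketch that recreates random inputs with the same shapes as above.

```python
import torch

m1 = torch.rand(2, 5, 4)   # batch of 2 matrices, each 5 x 4
m2 = torch.rand(2, 4, 5)   # batch of 2 matrices, each 4 x 5

batched = m1.bmm(m2)
print(batched.shape)                      # torch.Size([2, 5, 5])
print(torch.allclose(batched, m1 @ m2))   # True

# Each item in the batch is an ordinary matrix product
first_matches = torch.allclose(batched[0], m1[0].mm(m2[0]))
print(first_matches)
```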

## Broadcasting

Broadcasting means doing some arithmetic operation with tensors of different ranks, as if the smaller one were expanded, or broadcast, to match the larger.

Let's experiment with a matrix (rank 2 tensor) and a vector (rank 1).

In [None]:
m = torch.rand(5, 4)
v = torch.rand(4)

In [None]:
print("m:", m)
print()
print("v:", v)
print()

In [None]:
m_plus_v = m + v
print("m + v:", m_plus_v)

Let's check this row by row:

In [None]:
print("m[0] = %s\n" % m[0])
print("v = %s\n" % v)

row_sum = m[0] + v
print("m[0] + v = %s\n" % row_sum)
print("(m + v)[0] = %s" % m_plus_v[0])
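
The "as if the smaller one were expanded" part can be made explicit with `.expand()`. The sketch below recreates `m` and `v` with the same shapes and checks that adding the expanded vector gives the same result as relying on broadcasting.

```python
import torch

m = torch.rand(5, 4)
v = torch.rand(4)

# Explicitly expand v to the shape of m (expand does not copy the data)
v_expanded = v.expand(5, 4)
print(v_expanded.shape)                     # torch.Size([5, 4])

# Broadcasting and explicit expansion agree
same = torch.equal(m + v, m + v_expanded)
print(same)                                 # True
```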

We can also reshape tensors

In [None]:
v.shape

In [None]:
v

In [None]:
v = v.view(2, 2)
v

In [None]:
v = v.view(4, 1)
v

Note that shape `[4, 1]` is not broadcastable to match `[5, 4]`!

In [None]:
m + v

... but `[1, 4]` is!

In [None]:
v = v.view(1, 4)
m + v

Broadcasting can be tricky sometimes:

In [None]:
u = torch.rand(4, 1)
u + v

In [None]:
u

In [None]:
v

Always take care with tensor shapes! It is a good practice to verify in the interpreter how some expression is evaluated before inserting into your model code.
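
One way to do such a check without creating any data is `torch.broadcast_shapes`, which returns the shape that broadcasting would produce. This sketch assumes a PyTorch version that provides this function; the flag `compatible` is only for illustration.

```python
import torch

print(torch.broadcast_shapes((5, 4), (4,)))     # torch.Size([5, 4])
print(torch.broadcast_shapes((4, 1), (1, 4)))   # torch.Size([4, 4])

# Incompatible shapes raise an error instead of returning a shape
compatible = True
try:
    torch.broadcast_shapes((5, 4), (4, 1))
except RuntimeError as err:
    compatible = False
    print("not broadcastable:", err)
```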

## Useful Functions

PyTorch (like other libraries) has many functions that operate on tensors. Let's try some of them and plot the results.

In [None]:
%matplotlib inline

import matplotlib
import numpy as np
import matplotlib.pyplot as pl

Create a vector x with values from -10 up to (but not including) 10, in steps of 0.1.

In [None]:
x = torch.arange(-10, 10, 0.1, dtype=torch.float)

In [None]:
x.shape

The `.numpy()` method converts a PyTorch tensor to a numpy array, which we then pass to matplotlib for plotting.

In [None]:
y = x.sin()
pl.plot(x.numpy(), y.numpy())

Hyperbolic tangent

In [None]:
y = x.tanh()
pl.plot(x.numpy(), y.numpy())

$e^x$ 

In [None]:
y = x.exp()
pl.plot(x.numpy(), y.numpy())

In [None]:
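# log is only defined for x > 0; the other entries are nan (or -inf at 0),
# so nothing is plotted for them
y = torch.log(x)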
pl.plot(x.numpy(), y.numpy())

# But what about the GPU?
How do I use the GPU?

If you have a GPU, make sure that you installed a PyTorch build with CUDA support
(check https://pytorch.org/ for details).

In [None]:
my_device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
my_device

If you have a GPU you should get something like: 
`device(type='cuda', index=0)`

You can create a tensor directly on the GPU with the `device` argument, or move an existing tensor with `.to(device)`.

In [None]:
torch.ones(5, device=my_device)

In [None]:
data = torch.eye(3)
# .to() returns a new tensor; it does not move `data` in place
data = data.to(my_device)
data

Now the computation happens on the GPU (if one is available).

In [None]:
res = data + data
res

In [None]:
res.device
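
As a compact recap, the sketch below runs the whole round trip in a device-agnostic way. Two points it tries to highlight: `.to()` returns a tensor rather than modifying one in place, and `.numpy()` only works on CPU tensors, so results have to be copied back with `.cpu()` first. The name `res_np` is only for illustration.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

data = torch.eye(3).to(device)   # keep the tensor returned by .to()
res = data + data                # computed on `device`

# Copy the result back to the CPU before converting to numpy
res_np = res.cpu().numpy()
print(res.device.type, res_np.shape)
```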

# Automatic differentiation with `autograd`

Ref:
- https://pytorch.org/docs/stable/autograd.html
- https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html

In [None]:
x = torch.tensor(2.)
x

In [None]:
x = torch.tensor(2., requires_grad=True)
x

In [None]:
print(x.requires_grad)

In [None]:
print(x.grad)

In [None]:
y = x ** 2

print("Grad of x:", x.grad)

In [None]:
y.backward()

print("Grad of y with respect to x:", x.grad)
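
Since $y = x^2$, the expected derivative is $2x = 4$ at $x = 2$. The sketch below repeats the computation and also shows a detail that often surprises newcomers: gradients accumulate across calls to `.backward()` until they are reset. The names `first_grad` and `second_grad` are only for illustration.

```python
import torch

x = torch.tensor(2., requires_grad=True)

y = x ** 2
y.backward()
first_grad = x.grad.clone()
print(first_grad)     # tensor(4.), since dy/dx = 2x

# A second backward pass adds to the stored gradient instead of replacing it
y = x ** 2
y.backward()
second_grad = x.grad.clone()
print(second_grad)    # tensor(8.)

x.grad.zero_()        # reset before the next computation
print(x.grad)         # tensor(0.)
```

This is why training loops typically zero the gradients before each backward pass.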

In [None]:
# What is going to happen here?
x = torch.tensor(2., requires_grad=True)
x.backward()

In [None]:
# Don't record operations for gradient computation
# Useful for inference

x = torch.tensor(2., requires_grad=True)

with torch.no_grad():
    y = x * x
    print(y.requires_grad)  # False: y is not tracked by autograd

`nn.Module` and `nn.Parameter` keep track of gradients for you.

In [None]:
# w.x + b
lin = torch.nn.Linear(2, 1, bias=True)
lin.weight

In [None]:
type(lin.weight)
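
To see this tracking in action, here is a small sketch that runs one input through a linear layer and calls `.backward()`. For $w \cdot x + b$, the gradient with respect to $w$ is $x$ and the gradient with respect to $b$ is 1. The input `inp` is made up for illustration.

```python
import torch

lin = torch.nn.Linear(2, 1, bias=True)
inp = torch.tensor([[1., 2.]])

out = lin(inp).sum()     # w.x + b, reduced to a scalar
out.backward()

print(lin.weight.grad)   # tensor([[1., 2.]]): d(w.x + b)/dw = x
print(lin.bias.grad)     # tensor([1.]):       d(w.x + b)/db = 1
```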

If you still don't believe autograd works, here's something that I think will change your mind: we're going to compute the derivative of a very complicated function.

$$ y(x) = \sum_i \left( e^{0.001 x_i^2} + \sin(x_i^3) \times \log(x_i) \right) $$

In [None]:
x = torch.arange(1, 10, 0.1, dtype=torch.float, requires_grad=True)

In [None]:
def crazy_func(X):
    return torch.sum(torch.exp(0.001 * X ** 2) + torch.sin(X ** 3) * torch.log(X))

In [None]:
y = crazy_func(x)
y.backward()

In [None]:
pl.plot(x.detach().numpy(), x.grad.numpy())

In [None]:
x.grad
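
One way to sanity-check these values without doing any calculus is a central finite difference. The sketch below is only a numerical check under a few assumptions: it uses double precision, a step size `eps` picked for illustration, and a helper `term` (not part of the notebook) for the per-element expression.

```python
import torch

def crazy_func(X):
    return torch.sum(torch.exp(0.001 * X ** 2) + torch.sin(X ** 3) * torch.log(X))

def term(X):
    # Per-element expression inside the sum (hypothetical helper)
    return torch.exp(0.001 * X ** 2) + torch.sin(X ** 3) * torch.log(X)

# Double precision keeps the finite-difference rounding error small
x = torch.arange(1, 10, 0.1, dtype=torch.double, requires_grad=True)
crazy_func(x).backward()

eps = 1e-6  # step size, chosen for illustration
with torch.no_grad():
    # y is a sum of per-element terms, so dy/dx_i depends only on x_i
    numeric_grad = (term(x + eps) - term(x - eps)) / (2 * eps)

agrees = torch.allclose(x.grad, numeric_grad, rtol=1e-4, atol=1e-3)
print(agrees)
```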

**Exercise:**

Derive the gradient $\frac{\partial y}{\partial x}$ analytically and write a function that computes it. Check that it gives the same result as `x.grad`.