# PyTorch Tutorial

### 1. Tensors and Dynamic Graphs

- Basic operations on ```torch.Tensor```.
- An introduction to the dynamic computational graphs of ```torch.autograd```.

Set up torch and some variables.

In [None]:
import torch

In [None]:
a = torch.ones(4,5) * 2
b = torch.ones(4,5) * 3

In [None]:
print('Tensor a:')
print(a)
print()
print('Tensor b:')
print(b)
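The cells above only create and print tensors. Here is a brief sketch of a few common element-wise and matrix operations on tensors shaped like ```a``` and ```b```, illustrating the basic ```torch.Tensor``` operations listed at the top. The particular operations are our choice, not part of the original notebook.

```python
import torch

a = torch.ones(4, 5) * 2
b = torch.ones(4, 5) * 3

elementwise_sum = a + b        # every entry is 2 + 3 = 5
elementwise_product = a * b    # every entry is 2 * 3 = 6
matrix_product = a @ b.T       # (4x5) @ (5x4) -> 4x4, every entry is 5 * (2 * 3) = 30
row_sums = a.sum(dim=1)        # sum over the 5 columns, shape (4,), every entry is 10

print(a.shape, a.dtype)        # torch.Size([4, 5]) torch.float32
print(matrix_product)
```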

A key feature of PyTorch is its dynamic computational graph, which ```torch.autograd``` uses to compute gradients with respect to tensors automatically.
However, gradients are not computed for every tensor; only tensors with ```requires_grad=True``` are tracked. Operations on these tensors are recorded in the graph as they run, and any intermediate values needed for the backward pass are saved in the background.

```requires_grad``` is ```False``` by default when you create a tensor. It can be set at creation time with an optional argument:
```python
a = torch.ones((4,5), requires_grad=True)
b = torch.ones((4,5), requires_grad=True)
```

In [None]:
print(a.requires_grad)
print(b.requires_grad)

In [None]:
a.requires_grad = True
b.requires_grad = True
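Assigning the attribute directly works for leaf tensors such as ```a``` and ```b```; the in-place method ```requires_grad_()``` has the same effect. Only floating-point (and complex) tensors can require gradients. A short sketch using a hypothetical tensor ```x```:

```python
import torch

x = torch.ones(4, 5)
print(x.requires_grad)   # False by default
x.requires_grad_()       # in-place; same effect as x.requires_grad = True
print(x.requires_grad)   # True

try:
    torch.ones(4, 5, dtype=torch.long, requires_grad=True)
    integer_rejected = False
except RuntimeError as err:
    integer_rejected = True
    print(err)           # integer tensors cannot require gradients
```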

In [None]:
gradient_tensor = a + b
print(gradient_tensor.requires_grad)
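Tensors produced by tracked operations also record the operation that created them in their ```grad_fn``` attribute, which autograd follows when walking the graph backward. A small sketch, recreating ```a``` and ```b``` with ```requires_grad=True```:

```python
import torch

a = torch.ones(4, 5, requires_grad=True)
b = torch.ones(4, 5, requires_grad=True)
c = a + b

print(a.grad_fn)   # None: a was created directly by the user, so it is a leaf
print(c.grad_fn)   # an AddBackward0 node recording the addition
print(c.is_leaf)   # False
```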

However, there are situations where you do not want PyTorch to track the operations you perform, for example during evaluation or inference. The ```with torch.no_grad():``` context manager disables tracking for the code inside it.

In [None]:
with torch.no_grad():
    no_gradient_tensor = a + b

print(no_gradient_tensor.requires_grad)
print(no_gradient_tensor.data == gradient_tensor.data)
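Two related tools are worth knowing: ```torch.no_grad()``` can also be used as a function decorator, and ```Tensor.detach()``` returns a tensor that shares data with the original but is cut off from the graph. A sketch; the function name ```add_without_tracking``` is purely illustrative:

```python
import torch

a = torch.ones(4, 5, requires_grad=True)
b = torch.ones(4, 5, requires_grad=True)

@torch.no_grad()
def add_without_tracking(x, y):
    # nothing inside this function is recorded by autograd
    return x + y

untracked = add_without_tracking(a, b)
detached = (a + b).detach()   # computed with tracking, then detached from the graph

print(untracked.requires_grad, detached.requires_grad)  # False False
print(torch.equal(untracked, detached))                 # True
```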

Let us perform simple operations on ```a``` and ```b```. The resulting ```m``` is a scalar, represented in PyTorch as a zero-dimensional tensor holding a single element.

In [None]:
y = a + b
m = torch.mean(y)
print(m)
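Since ```a``` and ```b``` are 4×5 tensors with entries 2 and 3, the value of ```m``` and the gradient that ```backward()``` will compute can be worked out by hand:

$$
m = \frac{1}{20}\sum_{i=1}^{4}\sum_{j=1}^{5}\left(a_{ij} + b_{ij}\right) = \frac{20 \cdot (2 + 3)}{20} = 5,
\qquad
\frac{\partial m}{\partial a_{ij}} = \frac{\partial m}{\partial b_{ij}} = \frac{1}{20} = 0.05
$$

So every entry of ```a.grad``` and ```b.grad``` should be ```0.05``` after the backward pass.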

Because ```m``` is a scalar, the following call implicitly uses a gradient of one; it is equivalent to ```m.backward(torch.tensor(1.0))```.

In [None]:
m.backward()
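The implicit gradient only exists for scalar outputs. For a non-scalar output, ```backward()``` needs an explicit gradient argument with the same shape as the output, and autograd then computes the corresponding vector-Jacobian product. A small sketch with a hypothetical tensor ```x```:

```python
import torch

x = torch.ones(4, 5, requires_grad=True)
y = x * 3                      # y has 20 elements, so it is not a scalar

try:
    y.backward()               # fails: no implicit gradient for non-scalar outputs
    implicit_failed = False
except RuntimeError as err:
    implicit_failed = True
    print(err)

y.backward(torch.ones_like(y)) # explicit gradient of the same shape as y
print(x.grad)                  # every entry is 3.
```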

In [None]:
print(f'a.is_leaf={a.is_leaf}')
print('Gradient w.r.t tensor a:')
print(a.grad)
print()
print(f'b.is_leaf={b.is_leaf}')
print('Gradient w.r.t tensor b:')
print(b.grad)
print()
print(f'y.is_leaf={y.is_leaf}')
print('Gradient w.r.t tensor y:')
print(y.grad)

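Gradients from each backward pass are accumulated (summed) into ```.grad``` rather than overwriting it. Calling ```m.backward()``` a second time therefore doubles every entry of ```a.grad``` and ```b.grad```.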

In [None]:
m.backward()

In [None]:
print(f'a.is_leaf={a.is_leaf}')
print('Gradient w.r.t tensor a:')
print(a.grad)
print()
print(f'b.is_leaf={b.is_leaf}')
print('Gradient w.r.t tensor b:')
print(b.grad)
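By default, ```backward()``` frees the intermediate values the graph saved. Our understanding is that the second ```m.backward()``` above succeeds only because addition and mean do not need to save any intermediate tensors. For operations that do save tensors, such as multiplication, a repeated backward pass needs ```retain_graph=True``` on the earlier call. A sketch with a hypothetical tensor ```x```:

```python
import torch

x = torch.ones(3, requires_grad=True)
z = (x * x).sum()              # multiplication saves its inputs for the backward pass

z.backward(retain_graph=True)  # keep the saved tensors for another pass
z.backward()                   # works; the saved tensors are freed afterwards
print(x.grad)                  # 2*x accumulated twice: tensor([4., 4., 4.])

try:
    z.backward()               # the saved tensors are gone now
    third_pass_failed = False
except RuntimeError as err:
    third_pass_failed = True
    print('Third backward failed:', err)
```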

Reset the accumulated gradients.

In [None]:
a.grad = None
b.grad = None
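In a training loop, the same reset happens once per iteration; otherwise each step's gradient would be added to the previous ones. ```torch.optim``` optimizers provide ```zero_grad()``` for this. A minimal sketch of the manual version, with a hypothetical parameter ```w```:

```python
import torch

w = torch.ones(3, requires_grad=True)
grads_per_step = []

for step in range(3):
    loss = (w * 2).sum()          # d(loss)/dw = 2 for every entry
    loss.backward()
    grads_per_step.append(w.grad.clone())
    w.grad = None                 # reset; w.grad.zero_() would reset in place instead

print(grads_per_step)  # three copies of tensor([2., 2., 2.])
```

Without the reset, the recorded gradients would grow to 4 and then 6.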

The main advantage of dynamic computational graphs is that the graph is rebuilt from scratch at every iteration. You therefore do not have to encode every possible path before launching training: what you run is what you differentiate.

[Autograd mechanics, PyTorch documentation](https://pytorch.org/docs/stable/notes/autograd.html)

In [None]:
# torch.randn samples numbers from a standard normal distribution.
if torch.randn(1).item() > 0:
    print('y = torch.sum(a)')
    y = torch.sum(a)
else:
    print('y = torch.sum(b)')
    y = torch.sum(b)

In [None]:
y.backward()

In [None]:
print(f'a.is_leaf={a.is_leaf}')
print('Gradient w.r.t tensor a:')
print(a.grad)
print()
print(f'b.is_leaf={b.is_leaf}')
print('Gradient w.r.t tensor b:')
print(b.grad)
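In the branching cells above, only the tensor that was actually used receives a gradient; the other keeps ```.grad``` equal to ```None``` because it was never part of the graph. Loops work the same way. In this sketch, with a hypothetical scalar ```x```, the number of iterations is decided at runtime by tensor values, and autograd differentiates whatever path was executed:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x
steps = 0
while y.item() < 10:   # ordinary Python control flow driven by tensor values
    y = y * x
    steps += 1

# With x = 2 the loop runs 3 times, so y = x**4 and dy/dx = 4 * x**3 = 32
y.backward()
print(steps, y.item(), x.grad.item())  # 3 16.0 32.0
```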