# Introduction to PyTorch
---

[PyTorch](https://pytorch.org/docs/stable/index.html) is a framework for building trainable (automatically differentiable) directed acyclic graphs dynamically, in contrast with, e.g., TensorFlow 1.x, which builds static DAGs.

PyTorch's main building blocks are tensors (and the high-level abstractions built on them, e.g. `torch.nn` layers) and operations on those tensors. With PyTorch we can define minimization problems and solve them with the optimizers in `torch.optim`.
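
As a minimal sketch of such a minimization problem, the snippet below minimizes the toy function $f(x) = (x - 3)^2$ with `torch.optim.SGD`. The function, the starting point, the learning rate and the number of steps are assumptions chosen only for illustration.

```python
import torch

# Hypothetical toy problem: minimize f(x) = (x - 3)^2, whose minimum is at x = 3.
x = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)

for _ in range(100):
    optimizer.zero_grad()   # clear the gradient left over from the previous step
    loss = (x - 3.0) ** 2   # the graph is rebuilt dynamically on every iteration
    loss.backward()         # compute d(loss)/dx and store it in x.grad
    optimizer.step()        # plain SGD update: x <- x - lr * x.grad

print(x.item())  # should be very close to 3.0
```

The same zero-grad, backward, step pattern reappears in the training loops later in this notebook.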

**Overview of the PyTorch package**
 - `torch.nn`  High-level abstractions for designing neural network architectures, including various layer types, loss functions and containers for more complex models.
 - `torch.nn.functional`  Functional counterparts of the `torch.nn` components, exposed as plain functions instead of classes.
 - `torch.nn.init` Functions for initializing tensors.
 - `torch.optim` Optimizers and learning rate schedulers for training neural networks.
 - `torch.utils.data` Classes for loading and manipulating data.
 - `torch.autograd`  Reverse-mode automatic differentiation system that computes gradients automatically using the chain rule.

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

---

## PyTorch Tensors

### Analogy with NumPy
We can initialize and manipulate tensors with methods similar to those in NumPy.

In [None]:
import torch
import numpy as np

In [None]:
np.zeros([3, 3])

In [None]:
torch.zeros([3, 3], dtype=torch.long, device=torch.device('cpu'))

In [None]:
np.random.rand(3, 3)

In [None]:
torch.rand(3, 3)

In [None]:
# `np.float` was removed from recent NumPy releases; use an explicit dtype instead.
numpy_tensor = np.array([[1, 2], [3, 4]], dtype=np.float64)
numpy_tensor

In [None]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

In [None]:
numpy_tensor.shape

In [None]:
torch_tensor.shape

In [None]:
torch_tensor.numpy()

In [None]:
torch.tensor(numpy_tensor)
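
One detail worth knowing when converting between the two libraries: `torch.tensor(...)` copies the data, while `torch.from_numpy(...)` shares memory with the NumPy array. The sketch below illustrates the difference; the variable names are hypothetical and not used elsewhere in this notebook.

```python
import numpy as np
import torch

source = np.array([[1.0, 2.0], [3.0, 4.0]])

copied = torch.tensor(source)      # makes an independent copy of the data
shared = torch.from_numpy(source)  # shares memory with the NumPy array

source[0, 0] = 100.0               # modify the NumPy array in place

print(copied[0, 0].item())  # the copy still holds the original value
print(shared[0, 0].item())  # the shared tensor reflects the change
```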

### Basic operations with tensors

In [None]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

In [None]:
torch_tensor + torch_tensor

In [None]:
torch_tensor + 2

In [None]:
torch_tensor * torch_tensor

In [None]:
torch_tensor.mm(torch_tensor)

In [None]:
torch.nn.init.normal_(torch_tensor)
torch_tensor

### Work with shape

In [None]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

In [None]:
torch_tensor.view(-1)

In [None]:
torch_tensor[1, :]

In [None]:
torch.cat([torch_tensor, torch_tensor], dim=1)

In [None]:
torch.unsqueeze(torch_tensor, 0)

In [None]:
torch.transpose(torch_tensor, 1, 0)

### Special tensor properties
All of these attributes relate to gradient-based optimization over tensors.

 - `.requires_grad`  Indicates that we want to compute the gradient for this tensor. PyTorch then tracks all operations on it.
 - `.grad` After calling `y.backward()`, `x.grad` holds the gradient $\frac{dy}{dx}$ (provided `x` has `requires_grad=True`).
 - `.grad_fn` Reference to the function that created the tensor.

In [None]:
tt = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float, requires_grad=True)
tt

In [None]:
tt_m = tt * tt
tt_m

In [None]:
tt_m = tt_m.mean()
tt_m

In [None]:
tt_m.grad_fn

In [None]:
tt_m.requires_grad

In [None]:
tt.grad is None

Let's compute the gradient of `tt_m` with respect to all `torch.Tensor`s that have `.requires_grad=True`.
To calculate the gradients, we need to call `tt_m.backward()`.
This computes the gradient of `tt_m` with respect to `tt`:

$$
\frac{\partial\, tt\_m}{\partial\, tt_x} = \frac{\partial}{\partial\, tt_x}\left[\frac{1}{n}\sum_{i=1}^{n} tt_i^2\right] = \frac{2}{n}\, tt_x
$$

In [None]:
tt_m.backward()
tt.grad
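
The formula above can be checked numerically. The following sketch recomputes the gradient on a fresh tensor (the names `t`, `t_m` and `expected` are hypothetical, chosen so they do not overwrite the notebook's variables) and compares it with $\frac{2}{n}\,t$.

```python
import torch

t = torch.tensor([[1, 2], [3, 4]], dtype=torch.float, requires_grad=True)
t_m = (t * t).mean()
t_m.backward()

n = t.numel()                    # n = 4 elements
expected = 2.0 / n * t.detach()  # analytic gradient: (2 / n) * t

print(t.grad)
print(torch.allclose(t.grad, expected))
```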

This is how to stop collecting gradient information:

In [None]:
with torch.no_grad():
    print((tt * tt).requires_grad)
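
Besides the `torch.no_grad()` context manager, a single tensor can be cut off from the graph with `.detach()`. The sketch below contrasts the two; the tensor `a` and the result names are assumptions used only for this illustration.

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)

with torch.no_grad():
    inside = a * a            # computed without recording the operation

outside = a * a               # recorded as usual
detached = outside.detach()   # same values, but cut off from the graph

print(inside.requires_grad, outside.requires_grad, detached.requires_grad)
```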

---

## Neural Network Definition
PyTorch allows us to define neural networks at several levels of abstraction. Let's explore them.

### Data

In [None]:
input_batch = torch.tensor([[0.20, 0.15],
                            [0.30, 0.20],
                            [0.86, 0.99],
                            [0.91, 0.88]])

label_batch = torch.tensor([[1.],
                            [1.],
                            [-1.],
                            [-1.]])

### Low level approach
Using just `torch.Tensor` and `torch.autograd`.

In [None]:
learning_rate = 1e-3
training_iterations = 55000

In [None]:
# Define trainable parameters.
w1 = torch.randn(2, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu"))
w2 = torch.randn(1, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu"))
w1, w2

In [None]:
# After each iteration, we adjust w1 and w2 parameters.
for training_iteration in range(training_iterations):
    # Forward pass through a simple two-layer network defined by w1 and w2.
    prediction = input_batch.mm(w1)
    prediction = torch.tanh(prediction)
    prediction = prediction.mm(w2)
    prediction = torch.tanh(prediction)
    
    # Compute the error as mean squared error; backward() needs a single scalar loss.
    loss = (prediction - label_batch).pow(2).mean()
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    # Compute the gradients of the loss with respect to w1 and w2.
    loss.backward()
    
    # We don't want to collect gradient information for optimization steps.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Clear gradients for the next iteration; otherwise they would accumulate.
        w1.grad.zero_()
        w2.grad.zero_()
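
The reason for zeroing the gradients is that `backward()` adds to `.grad` instead of overwriting it. A minimal sketch with a hypothetical scalar parameter `w`:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.clone()    # gradient of 3 * w is 3

(3 * w).backward()
second = w.grad.clone()   # accumulated: 3 + 3 = 6

w.grad.zero_()            # reset the accumulated gradient in place
(3 * w).backward()
third = w.grad.clone()    # back to 3 after zeroing

print(first.item(), second.item(), third.item())
```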

In [None]:
# Check predictions.
prediction = input_batch.mm(w1)
prediction = torch.tanh(prediction)
prediction = prediction.mm(w2)
prediction = torch.tanh(prediction)
prediction

In [None]:
import os

os.makedirs('./models', exist_ok=True)  # torch.save does not create missing directories
torch.save({'w1': w1, 'w2': w2}, './models/ckpt.pth')

In [None]:
state_dict = torch.load('./models/ckpt.pth')
w1.data = state_dict['w1']
w2.data = state_dict['w2']

### Container approach
Wrapping the parameters and the forward pass in a `torch.nn.Module` container.

In [None]:
learning_rate = 1e-3
training_iterations = 55000

In [None]:
class SimpleNN(torch.nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        # In case we use basic tensors, we need to label them as trainable parameters of this Module.
        self.w1 = torch.nn.Parameter(torch.randn(2, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu")))
        self.w2 = torch.nn.Parameter(torch.randn(1, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu")))
        
    def forward(self, input_batch):
        prediction = input_batch.mm(self.w1)
        prediction = torch.tanh(prediction)
        prediction = prediction.mm(self.w2)
        prediction = torch.tanh(prediction)
        return prediction

simple_nn = SimpleNN()

In [None]:
list(simple_nn.parameters())

In [None]:
for training_iteration in range(training_iterations):
    prediction = simple_nn(input_batch)
    
    loss = (prediction - label_batch).pow(2).mean()
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    simple_nn.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in simple_nn.parameters():
            p -= p.grad * learning_rate


In [None]:
simple_nn(input_batch)

### Container approach with torch.nn and torch.optim

In [None]:
from torch.optim import SGD
from torch.nn import Linear, MSELoss, Tanh

In [None]:
learning_rate = 1e-3
training_iterations = 55000

In [None]:
class SimpleNN(torch.nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.layer_1 = Linear(2, 1)
        self.layer_2 = Linear(1, 1)
        
    def forward(self, input_batch):
        prediction = self.layer_1(input_batch)
        prediction = torch.tanh(prediction)
        prediction = self.layer_2(prediction)
        prediction = torch.tanh(prediction)
        return prediction

simple_nn = SimpleNN()

In [None]:
list(simple_nn.parameters())

In [None]:
loss_fce = MSELoss(reduction='sum')
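
Note that `reduction='sum'` adds up the squared errors, whereas the earlier hand-written loss used the mean. The sketch below compares the two reductions on made-up values (the tensors `example_prediction` and `example_target` are hypothetical); with 4 elements, the summed loss is 4 times the mean loss, so gradients scale by the same factor.

```python
import torch
from torch.nn import MSELoss

example_prediction = torch.tensor([[0.5], [0.0], [-0.5], [-1.0]])
example_target = torch.tensor([[1.0], [1.0], [-1.0], [-1.0]])

loss_sum = MSELoss(reduction='sum')(example_prediction, example_target)
loss_mean = MSELoss(reduction='mean')(example_prediction, example_target)

# Squared errors are 0.25, 1.0, 0.25 and 0.0.
print(loss_sum.item(), loss_mean.item())
```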

In [None]:
optimizer = SGD(simple_nn.parameters(), lr=learning_rate, momentum=0.9)
optimizer

In [None]:
for training_iteration in range(training_iterations):
    prediction = simple_nn(input_batch)
    
    loss = loss_fce(prediction, label_batch)
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In [None]:
simple_nn(input_batch)

In [None]:
simple_nn.load_state_dict(simple_nn.state_dict())
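
In practice the state dict is usually written to disk and loaded into a freshly constructed model. The following is a sketch under assumptions: it uses a temporary directory and a hypothetical file name `simple_nn.pth`, and a `Sequential` model with the same layer sizes as above so that the block is self-contained.

```python
import os
import tempfile

import torch
from torch.nn import Linear, Sequential, Tanh

model = Sequential(Linear(2, 1), Tanh(), Linear(1, 1), Tanh())

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "simple_nn.pth")  # hypothetical file name
    torch.save(model.state_dict(), path)           # save only the parameters

    # A new model with the same architecture but different random weights.
    restored = Sequential(Linear(2, 1), Tanh(), Linear(1, 1), Tanh())
    restored.load_state_dict(torch.load(path))

sample = torch.rand(4, 2)
same_output = torch.allclose(model(sample), restored(sample))
print(same_output)
```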

### Container approach with torch.nn.Sequential

In [None]:
learning_rate = 1e-3
training_iterations = 55000

In [None]:
simple_nn_seq = torch.nn.Sequential(
    Linear(2, 1),
    Tanh(),
    Linear(1, 1),
    Tanh()
)

In [None]:
loss_fce = MSELoss(reduction='sum')
optimizer = SGD(simple_nn_seq.parameters(), lr=learning_rate, momentum=0.9)

In [None]:
for training_iteration in range(training_iterations):
    prediction = simple_nn_seq(input_batch)
    
    loss = loss_fce(prediction, label_batch)
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In [None]:
simple_nn_seq(input_batch)

---

## Custom layers

In [None]:
class CustomReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# `apply` is a class-level method of autograd.Function; no instance is needed.
custom_relu = CustomReLU.apply

In [None]:
custom_relu(torch.tensor([-1., 0., 1.]))
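
To see that the custom `backward` behaves as intended, its gradient can be compared with the built-in `torch.relu`. This is a sketch only: the class is repeated under the hypothetical name `SketchReLU` so the block runs on its own, and the input deliberately avoids exactly 0, where the two implementations may follow different conventions.

```python
import torch


class SketchReLU(torch.autograd.Function):
    """Same logic as the custom ReLU above, repeated to keep this block self-contained."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)   # keep the input for the backward pass
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0      # no gradient flows where the input was negative
        return grad_input


x = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)

SketchReLU.apply(x).sum().backward()
custom_grad = x.grad.clone()

x.grad.zero_()                         # reset before the second backward pass
torch.relu(x).sum().backward()
builtin_grad = x.grad.clone()

print(custom_grad, builtin_grad)
```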