# Introduction to PyTorch
---

PyTorch is a framework for building trainable (automatically differentiable) directed acyclic graphs dynamically, in contrast with, e.g., graph-mode TensorFlow, which builds static DAGs.   

PyTorch's main building blocks are tensors (and their high-level abstractions, e.g. `nn` layers) and operations on those tensors. Using PyTorch, we can define minimization problems and solve them with the optimizers in `torch.optim`.
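As a minimal sketch of such a minimization problem, the cell below minimizes a toy objective $f(x) = (x - 3)^2$ with `torch.optim.SGD`. The objective, the starting point, the learning rate and the number of steps are illustrative assumptions, not part of the original material.

```python
import torch

# Hypothetical toy objective: f(x) = (x - 3)^2, which is minimal at x = 3.
x = torch.tensor(0.0, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)

for step in range(100):
    optimizer.zero_grad()   # clear the gradient from the previous step
    loss = (x - 3.0) ** 2   # the graph is rebuilt on every iteration
    loss.backward()         # compute d(loss)/dx
    optimizer.step()        # x <- x - lr * x.grad

print(x.item())  # close to 3.0
```

Because the graph is constructed while the Python code runs, ordinary control flow (loops, conditionals) can change the computation from one iteration to the next.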

**Overview of the torch package**
 - `torch.nn`  High-level abstractions for designing neural network architectures, including various layer types, loss functions and containers for more complex models.
 - `torch.nn.functional`  Similar to `torch.nn`, but exposed as stateless functions rather than classes.
 - `torch.nn.init` Set of functions for initializing `torch.Tensor` values.
 - `torch.optim` Module with various optimizers for training neural networks.
 - `torch.utils.data` Collection of classes for loading and batching data (e.g. `Dataset`, `DataLoader`).
 - `torch.autograd`  Reverse-mode automatic differentiation system that computes gradients automatically using the chain rule.
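To make the difference between `torch.nn` and `torch.nn.functional` concrete, here is a small sketch that computes the same results both ways. The input values are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.5], [2.0, -3.0]])

# Class style: the layer is an object that can hold state (parameters).
relu_module = nn.ReLU()
out_module = relu_module(x)

# Functional style: a plain function, all inputs passed explicitly.
out_functional = F.relu(x)

# A layer with parameters: nn.Linear owns its weight and bias,
# while F.linear expects them as arguments.
linear = nn.Linear(2, 1)
out_linear_module = linear(x)
out_linear_functional = F.linear(x, linear.weight, linear.bias)

print(torch.equal(out_module, out_functional))                       # True
print(torch.allclose(out_linear_module, out_linear_functional))      # True
```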


In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import torch
import numpy as np

## Pytorch Tensors

### Analogy with Numpy
We can initialize and manipulate tensors with methods similar to those in NumPy.

In [3]:
np.zeros([3, 3])

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [4]:
torch.zeros([3, 3], dtype=torch.long, device=torch.device('cpu'))

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [5]:
np.random.rand(3, 3)

array([[0.88364104, 0.63975389, 0.06888968],
       [0.04862613, 0.28330183, 0.31709038],
       [0.5280827 , 0.08183833, 0.73918128]])

In [6]:
torch.rand(3, 3)

tensor([[0.3424, 0.4011, 0.6443],
        [0.9133, 0.8513, 0.5996],
        [0.0336, 0.4923, 0.9217]])

In [7]:
numpy_tensor = np.array([[1, 2] ,[3, 4]], dtype=np.float64)  # np.float was removed in NumPy 1.24
numpy_tensor

array([[1., 2.],
       [3., 4.]])

In [8]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

tensor([[1., 2.],
        [3., 4.]])

In [10]:
numpy_tensor.shape

(2, 2)

In [11]:
torch_tensor.shape

torch.Size([2, 2])

In [12]:
torch_tensor.numpy()

array([[1., 2.],
       [3., 4.]], dtype=float32)

In [13]:
torch.tensor(numpy_tensor)

tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)
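One detail worth knowing about these conversions: `torch.tensor(...)` copies the NumPy data, while `torch.from_numpy(...)` shares memory with the source array. The sketch below illustrates the difference; the variable names are chosen here for illustration only.

```python
import numpy as np
import torch

source = np.array([[1.0, 2.0], [3.0, 4.0]])

copied = torch.tensor(source)      # independent copy of the data
shared = torch.from_numpy(source)  # shares memory with `source`

# Modify the NumPy array in place after both tensors were created.
source[0, 0] = 100.0

print(copied[0, 0].item())  # 1.0   (the copy is unaffected)
print(shared[0, 0].item())  # 100.0 (the shared tensor sees the change)
```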

### Basic operations with tensors

In [14]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

tensor([[1., 2.],
        [3., 4.]])

In [15]:
torch_tensor + torch_tensor

tensor([[2., 4.],
        [6., 8.]])

In [16]:
torch_tensor + 2

tensor([[3., 4.],
        [5., 6.]])

In [17]:
torch_tensor * torch_tensor

tensor([[ 1.,  4.],
        [ 9., 16.]])

In [18]:
torch_tensor.mm(torch_tensor)

tensor([[ 7., 10.],
        [15., 22.]])

In [19]:
torch.nn.init.normal_(torch_tensor)

tensor([[ 0.5426, -1.6506],
        [-0.1434, -1.2375]])
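Two conventions appear in the cells above and are sketched here under illustrative values: the `@` operator is an alternative spelling of matrix multiplication, and methods ending in an underscore (such as `normal_`) modify the tensor in place instead of returning a new one.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# `@` performs matrix multiplication, like `.mm` for 2-D tensors.
matmul_result = a @ a
print(torch.equal(matmul_result, a.mm(a)))  # True

b = a.clone()
out_of_place = b.add(1.0)  # returns a new tensor, `b` is unchanged
print(torch.equal(b, a))   # True

b.add_(1.0)                # trailing underscore: `b` itself is modified
print(torch.equal(b, out_of_place))  # True
```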

### Working with shapes

In [20]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

tensor([[1., 2.],
        [3., 4.]])

In [21]:
torch_tensor.view(-1)

tensor([1., 2., 3., 4.])

In [22]:
torch_tensor[1, :]

tensor([3., 4.])

In [23]:
torch.cat([torch_tensor, torch_tensor], dim=1)

tensor([[1., 2., 1., 2.],
        [3., 4., 3., 4.]])

In [24]:
torch.unsqueeze(torch_tensor, 0)

tensor([[[1., 2.],
         [3., 4.]]])

In [25]:
torch.transpose(torch_tensor, 1, 0)

tensor([[1., 3.],
        [2., 4.]])
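A caveat when combining these operations: `view` needs a compatible memory layout, and a transposed tensor is generally not contiguous. The following sketch shows one way to handle that, using `reshape` or `contiguous()`; it is an illustration, not the only option.

```python
import torch

t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
transposed = torch.transpose(t, 1, 0)

print(transposed.is_contiguous())  # False

# `view` cannot flatten the non-contiguous tensor without copying.
try:
    transposed.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)  # True

# `reshape` copies when necessary; `contiguous().view(...)` is equivalent here.
flat = transposed.reshape(-1)
flat_alt = transposed.contiguous().view(-1)
print(flat)  # tensor([1., 3., 2., 4.])
```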

### Special tensor properties
 - `.requires_grad`  Indicates that we want to compute the gradient for this tensor. PyTorch will start tracking all operations on it.
 - `.grad` After calling `y.backward()`, `x.grad` holds the gradient $\frac{dy}{dx}$ (provided `x` has `requires_grad=True`).
 - `.grad_fn` Reference to the function that created the tensor.

In [44]:
tt = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float, requires_grad=True)
tt_m = tt * tt
print(tt_m)
tt_m = tt_m.mean()
print(tt_m)

tensor([[ 1.,  4.],
        [ 9., 16.]], grad_fn=<MulBackward0>)
tensor(7.5000, grad_fn=<MeanBackward1>)


In [33]:
tt_m.grad_fn

<MeanBackward1 at 0x7f82783cc5f8>

In [34]:
tt_m.requires_grad

True

In [35]:
tt.grad

Let's compute the gradient of `tt_m` with respect to every `torch.Tensor` that has `requires_grad=True`.
To calculate the gradients, we call `.backward()` on `tt_m`.  
This computes the gradient of `tt_m` with respect to `tt`:

$$
\frac{\partial tt\_m}{\partial tt_x} = \frac{\partial}{\partial tt_x}\left[\frac{1}{n}\sum_i^n tt_i^2\right] = \frac{2}{n}tt_x
$$

In [36]:
tt_m.backward()
tt.grad

tensor([[0.5000, 1.0000],
        [1.5000, 2.0000]])
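The result can be checked against the closed form $\frac{2}{n}tt_x$ with $n = 4$. The sketch below repeats the computation on a fresh tensor and also shows that a second `backward()` call accumulates into `.grad` rather than overwriting it, which is why gradients are zeroed in training loops.

```python
import torch

tt = torch.tensor([[1, 2], [3, 4]], dtype=torch.float, requires_grad=True)
n = tt.numel()  # 4 elements

(tt * tt).mean().backward()
expected = 2.0 / n * tt.detach()  # analytic gradient: (2 / n) * tt
print(torch.allclose(tt.grad, expected))  # True

# A second backward pass adds to the existing gradient.
(tt * tt).mean().backward()
print(torch.allclose(tt.grad, 2 * expected))  # True

# Reset the accumulated gradient in place.
tt.grad.zero_()
print(tt.grad)
```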

This is how to stop collecting gradient information:

In [37]:
with torch.no_grad():
    print((tt * tt).requires_grad)

False
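A related tool, shown here as a brief sketch, is `.detach()`: instead of disabling tracking for a whole block, it returns a tensor that is cut off from the graph while the original stays tracked.

```python
import torch

tt = torch.tensor([[1, 2], [3, 4]], dtype=torch.float, requires_grad=True)

tracked = tt * tt             # part of the autograd graph
detached = tracked.detach()   # same values, no gradient history

print(tracked.requires_grad)   # True
print(detached.requires_grad)  # False
print(detached.grad_fn)        # None
```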


## Feed forward Neural Network

### Data

In [38]:
input_batch = torch.tensor([[0.20, 0.15],
                            [0.30, 0.20],
                            [0.86, 0.99],
                            [0.91, 0.88]])

label_batch = torch.tensor([[1.],
                            [1.],
                            [-1.],
                            [-1.]])

### Low level approach
Using just `torch.Tensor` and `torch.autograd`.

In [41]:
learning_rate = 1e-3
training_iterations = 25000

In [42]:
w1 = torch.randn(2, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu"))
w2 = torch.randn(1, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu"))
w1, w2

(tensor([[-0.9787],
         [-0.3549]], requires_grad=True),
 tensor([[1.1667]], requires_grad=True))

In [43]:
for training_iteration in range(training_iterations):
    prediction = input_batch.mm(w1)
    prediction = torch.tanh(prediction)
    prediction = prediction.mm(w2)
    prediction = torch.tanh(prediction)
    
    loss = (prediction - label_batch).pow(2).sum()
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    loss.backward()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

0 3.678546905517578
5000 2.5342657566070557
10000 1.0456336736679077
15000 0.43044692277908325
20000 0.23462997376918793


In [None]:
torch.save({'w1': w1, 'w2': w2}, './ckpt.pth')

In [None]:
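state_dict = torch.load('./ckpt.pth')
# Copy the saved values into the existing tensors without tracking gradients.
with torch.no_grad():
    w1.copy_(state_dict['w1'])
    w2.copy_(state_dict['w2'])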

### Container approach
Wrapping the parameters in a `torch.nn.Module` container.

In [None]:
learning_rate = 1e-3
training_iterations = 25000

In [None]:
class SimpleNN(torch.nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.w1 = torch.nn.Parameter(torch.randn(2, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu")))
        self.w2 = torch.nn.Parameter(torch.randn(1, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu")))
        
    def forward(self, input_batch):
        prediction = input_batch.mm(self.w1)
        prediction = torch.tanh(prediction)
        prediction = prediction.mm(self.w2)
        prediction = torch.tanh(prediction)
        return prediction

simple_nn = SimpleNN()

In [None]:
list(simple_nn.parameters())

In [None]:
for training_iteration in range(training_iterations):
    prediction = simple_nn(input_batch)
    
    loss = (prediction - label_batch).pow(2).sum()
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    simple_nn.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in simple_nn.parameters():
            p -= p.grad * learning_rate


### Container approach with torch.nn and torch.optim

In [None]:
from torch.optim import SGD
from torch.nn import Linear, MSELoss, Tanh

In [None]:
learning_rate = 1e-3
training_iterations = 25000

In [None]:
class SimpleNN(torch.nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.layer_1 = Linear(2, 1)
        self.layer_2 = Linear(1, 1)
        
    def forward(self, input_batch):
        prediction = self.layer_1(input_batch)
        prediction = torch.tanh(prediction)
        prediction = self.layer_2(prediction)
        prediction = torch.tanh(prediction)
        return prediction

simple_nn = SimpleNN()

In [None]:
print("Trainable parameters\n")
print(list(simple_nn.parameters()))

In [None]:
loss_fce = MSELoss(reduction='sum')

In [None]:
optimizer = SGD(simple_nn.parameters(), lr=learning_rate, momentum=0.9)

In [None]:
for training_iteration in range(training_iterations):
    prediction = simple_nn(input_batch)
    
    loss = loss_fce(prediction, label_batch)
    if training_iteration % 1000 == 0:
        print(training_iteration, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In [None]:
simple_nn(input_batch)

In [None]:
simple_nn.load_state_dict(simple_nn.state_dict())

### Container approach with torch.nn.Sequential
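A minimal sketch of the same two-layer network expressed with `torch.nn.Sequential`, assuming the same data, loss and optimizer settings as in the previous section. The fixed seed, the name `sequential_nn` and the reduced iteration count are illustrative choices made here.

```python
import torch
from torch.nn import Linear, MSELoss, Sequential, Tanh
from torch.optim import SGD

torch.manual_seed(0)  # illustrative: makes the run reproducible

input_batch = torch.tensor([[0.20, 0.15],
                            [0.30, 0.20],
                            [0.86, 0.99],
                            [0.91, 0.88]])
label_batch = torch.tensor([[1.], [1.], [-1.], [-1.]])

# Layers are applied in the order they are listed.
sequential_nn = Sequential(
    Linear(2, 1),
    Tanh(),
    Linear(1, 1),
    Tanh(),
)

loss_fce = MSELoss(reduction='sum')
optimizer = SGD(sequential_nn.parameters(), lr=1e-3, momentum=0.9)

initial_loss = loss_fce(sequential_nn(input_batch), label_batch).item()

for training_iteration in range(2000):
    prediction = sequential_nn(input_batch)
    loss = loss_fce(prediction, label_batch)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

final_loss = loss_fce(sequential_nn(input_batch), label_batch).item()
print(initial_loss, final_loss)
```

Compared with subclassing `torch.nn.Module`, no `forward` method is needed here, since the container chains the layers itself.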

## Custom layers

In [45]:
class CustomReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# `apply` is called on the class itself; no instance is needed.
custom_relu = CustomReLU.apply

In [None]:
custom_relu(torch.tensor([-1,0,1]))
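One way to sanity-check a hand-written `backward` is `torch.autograd.gradcheck`, which compares it against numerical gradients. The sketch below repeats the `CustomReLU` definition so that it is self-contained; the test inputs are chosen here (double precision, away from the kink at zero) and are an assumption of this example.

```python
import torch
from torch.autograd import gradcheck


class CustomReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


# gradcheck expects double-precision inputs that require gradients.
x_check = torch.tensor([-1.5, 0.5, 2.0], dtype=torch.double, requires_grad=True)
gradcheck_passed = gradcheck(CustomReLU.apply, (x_check,), eps=1e-6, atol=1e-4)
print(gradcheck_passed)  # True

# The gradient is 0 where the input is negative and 1 where it is positive.
x = torch.tensor([-1.5, 0.5, 2.0], requires_grad=True)
CustomReLU.apply(x).sum().backward()
print(x.grad)  # tensor([0., 1., 1.])
```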