# Introduction to PyTorch
---

PyTorch is a framework for building trainable (automatically differentiable) directed acyclic graphs dynamically, in contrast with e.g. TensorFlow 1.x, which builds static DAGs.

PyTorch's main building blocks are tensors (and the higher-level abstractions built on them, e.g. nn layers) and operations on those tensors. With PyTorch we can define minimization problems and solve them with the optimizers in torch.optim.

**Overview of the torch package**
 - ***torch.nn***  High-level abstractions for designing neural network architectures, including various layer types, loss functions and containers for more complex models.
 - ***torch.nn.functional***  Similar to torch.nn, but exposed as plain functions rather than classes.
 - ***torch.nn.init*** Functions for initializing torch tensors.
 - ***torch.optim*** Module with various optimizers for training neural networks.
 - ***torch.utils.data*** Collection of classes for data loading and manipulation.
 - ***torch.autograd***  Reverse-mode automatic differentiation system that computes gradients automatically using the chain rule.
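
The difference between torch.nn and torch.nn.functional can be illustrated with a small sketch. ReLU is chosen here only as an example operation; the variable names are illustrative and not part of the original notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[-1.0, 0.5, 2.0]])

# Class style: the activation is an object that can be stored inside a model.
relu_layer = nn.ReLU()
out_module = relu_layer(x)

# Functional style: the same operation as a plain function call.
out_functional = F.relu(x)

print(out_module)
print(torch.equal(out_module, out_functional))
```

Both calls compute the same values; the class form is convenient when the operation should appear as a named part of a model.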


In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import torch
import numpy as np

## PyTorch Tensors

### Analogy with NumPy
Tensors can be defined and manipulated with methods very similar to those in NumPy.

In [302]:
np.zeros([3, 3])

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [303]:
torch.zeros([3, 3], dtype=torch.long, device=torch.device('cpu'))

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [304]:
np.random.rand(3, 3)

array([[0.10951122, 0.10335077, 0.23059896],
       [0.37116742, 0.39505034, 0.09514046],
       [0.56634504, 0.05584924, 0.26862313]])

In [305]:
torch.rand(3, 3)

tensor([[0.0503, 0.4258, 0.2193],
        [0.5295, 0.8487, 0.2291],
        [0.3933, 0.9737, 0.6898]])

In [306]:
numpy_tensor = np.array([[1, 2], [3, 4]], dtype=np.float64)  # np.float was removed in NumPy 1.24
numpy_tensor

array([[1., 2.],
       [3., 4.]])

In [307]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

tensor([[1., 2.],
        [3., 4.]])

In [308]:
numpy_tensor.shape, torch_tensor.shape

((2, 2), torch.Size([2, 2]))

In [309]:
torch_tensor.numpy()

array([[1., 2.],
       [3., 4.]], dtype=float32)

In [310]:
torch.tensor(numpy_tensor)

tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)
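
One detail worth knowing when converting between the two libraries is whether the data is copied or shared. The sketch below, with illustrative variable names, assumes CPU tensors: torch.tensor copies the array, while torch.from_numpy and Tensor.numpy() reuse the same memory.

```python
import numpy as np
import torch

source = np.array([[1.0, 2.0], [3.0, 4.0]])

copied = torch.tensor(source)      # makes an independent copy of the data
shared = torch.from_numpy(source)  # shares memory with `source`

# Modify the NumPy array in place after both tensors were created.
source[0, 0] = 100.0

print(copied[0, 0].item())  # still the old value
print(shared[0, 0].item())  # reflects the change

# Tensor.numpy() on a CPU tensor also returns an array backed by the same memory.
back = shared.numpy()
back[1, 1] = -1.0
print(source[1, 1])
```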

### Basic operations with tensors

In [311]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

tensor([[1., 2.],
        [3., 4.]])

In [312]:
torch_tensor + torch_tensor

tensor([[2., 4.],
        [6., 8.]])

In [313]:
torch_tensor + 2

tensor([[3., 4.],
        [5., 6.]])

In [314]:
torch_tensor * torch_tensor

tensor([[ 1.,  4.],
        [ 9., 16.]])

In [315]:
torch_tensor.mm(torch_tensor)

tensor([[ 7., 10.],
        [15., 22.]])

In [316]:
torch.nn.init.normal_(torch_tensor)

tensor([[ 1.1162,  0.0655],
        [-0.1135,  0.3766]])

### Work with shape

In [317]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

tensor([[1., 2.],
        [3., 4.]])

In [318]:
torch_tensor.view(-1)

tensor([1., 2., 3., 4.])

In [319]:
torch_tensor[1, :]

tensor([3., 4.])

In [320]:
torch.cat([torch_tensor, torch_tensor], dim=1)

tensor([[1., 2., 1., 2.],
        [3., 4., 3., 4.]])

In [321]:
torch.unsqueeze(torch_tensor, 0)

tensor([[[1., 2.],
         [3., 4.]]])

In [322]:
torch.transpose(torch_tensor, 1, 0)

tensor([[1., 3.],
        [2., 4.]])

### Special tensor properties
 - ***.requires_grad***  Flag indicating that we want gradients computed for this tensor. PyTorch then tracks all operations on it.
 - ***.grad*** After calling y.backward(), x.grad holds the gradient dy/dx.
 - ***.grad_fn*** Reference to the function that created the tensor.

In [323]:
tt = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float, requires_grad=True)
tt_m = tt * tt
print(tt_m)
tt_m = tt_m.mean()
print(tt_m)

tensor([[ 1.,  4.],
        [ 9., 16.]], grad_fn=<MulBackward0>)
tensor(7.5000, grad_fn=<MeanBackward1>)


In [324]:
tt_m.grad_fn

<MeanBackward1 at 0x7f620a5004a8>

In [325]:
tt_m.requires_grad

True

In [326]:
tt.grad

Before backward() is called, tt.grad is None, so the cell above shows no output. Let's compute the gradient of tt_m with respect to every torch.Tensor that has .requires_grad=True.

In [327]:
tt_m.backward()

In [328]:
tt.grad

tensor([[0.5000, 1.0000],
        [1.5000, 2.0000]])
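
The values follow from the chain rule: the mean over four elements gives y = (1/4) * sum(x_ij^2), so dy/dx_ij = x_ij / 2. A minimal sketch that checks this, and also shows that repeated backward() calls accumulate into .grad (variable names are illustrative):

```python
import torch

x = torch.tensor([[1, 2], [3, 4]], dtype=torch.float, requires_grad=True)

# y = mean(x * x) = (1/4) * sum(x_ij ** 2), hence dy/dx_ij = x_ij / 2.
y = (x * x).mean()
y.backward()

expected = x.detach() / 2
first_grad = x.grad.clone()
print(torch.allclose(first_grad, expected))

# A second backward pass adds to the existing gradient instead of replacing it.
y2 = (x * x).mean()
y2.backward()
accumulated = x.grad.clone()
print(accumulated)

# Reset the gradient in place before the next optimization step.
x.grad.zero_()
```

This accumulation is the reason the training loops further down zero the gradients on every iteration.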

This is how to stop collecting gradient information:

In [329]:
with torch.no_grad():
    print((tt * tt).requires_grad)

False
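
A related tool, not used in the original notebook, is Tensor.detach(), which returns a tensor with the same values but cut off from the graph. The following sketch, with illustrative names, compares the two:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Inside no_grad, operations are not recorded in the graph.
with torch.no_grad():
    inside = x * 2

# Outside, the same operation is tracked as usual.
outside = x * 2

# detach() gives a tensor with the same values that no longer requires grad.
detached = outside.detach()

print(inside.requires_grad, outside.requires_grad, detached.requires_grad)
```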


## Feed forward Neural Network

### Data

In [3]:
input_batch = torch.tensor([[0.20, 0.15],
                            [0.30, 0.20],
                            [0.86, 0.99],
                            [0.91, 0.88]])

label_batch = torch.tensor([[1.],
                            [1.],
                            [-1.],
                            [-1.]])

### Low level approach
Using just torch.Tensor and torch.autograd.

In [4]:
learning_rate = 1e-3
training_iterations = 25000

In [5]:
w1 = torch.randn(2, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu"))
w2 = torch.randn(1, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu"))
w1, w2

(tensor([[-0.9725],
         [ 0.2950]], requires_grad=True),
 tensor([[-0.0477]], requires_grad=True))

In [7]:
for training_iteration in range(training_iterations):
    prediction = input_batch.mm(w1)
    prediction = torch.tanh(prediction)
    prediction = prediction.mm(w2)
    prediction = torch.tanh(prediction)
    
    loss = (prediction - label_batch).pow(2).sum()
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    loss.backward()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

0 4.06564474105835
5000 3.046189308166504
10000 1.847407341003418
15000 0.681189239025116
20000 0.32193928956985474


In [8]:
torch.save({'w1': w1, 'w2': w2}, './ckpt.pth')

In [9]:
state_dict = torch.load('./ckpt.pth')
w1.data = state_dict['w1']
w2.data = state_dict['w2']
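
The round trip can also be sketched without touching the file system or the .data attribute. This is one possible variant, not the notebook's method: it assumes an in-memory io.BytesIO buffer in place of './ckpt.pth' and copies the loaded values in place under torch.no_grad(). The restored_* names are hypothetical.

```python
import io
import torch

w1 = torch.randn(2, 1, requires_grad=True)
w2 = torch.randn(1, 1, requires_grad=True)

# Save detached copies of the weights into an in-memory buffer.
buffer = io.BytesIO()
torch.save({'w1': w1.detach(), 'w2': w2.detach()}, buffer)

# Rewind the buffer and load the checkpoint back.
buffer.seek(0)
state_dict = torch.load(buffer)

# Fresh parameters that receive the stored values in place.
restored_w1 = torch.zeros(2, 1, requires_grad=True)
restored_w2 = torch.zeros(1, 1, requires_grad=True)
with torch.no_grad():
    restored_w1.copy_(state_dict['w1'])
    restored_w2.copy_(state_dict['w2'])

print(torch.equal(restored_w1, w1), torch.equal(restored_w2, w2))
```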

### Container approach

In [10]:
learning_rate = 1e-3
training_iterations = 25000

In [11]:
class SimpleNN(torch.nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.w1 = torch.nn.Parameter(torch.randn(2, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu")))
        self.w2 = torch.nn.Parameter(torch.randn(1, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu")))
        
    def forward(self, input_batch):
        prediction = input_batch.mm(self.w1)
        prediction = torch.tanh(prediction)
        prediction = prediction.mm(self.w2)
        prediction = torch.tanh(prediction)
        return prediction

simple_nn = SimpleNN()

In [12]:
list(simple_nn.parameters())

[Parameter containing:
 tensor([[-0.2384],
         [-0.6661]], requires_grad=True), Parameter containing:
 tensor([[-0.5653]], requires_grad=True)]

In [14]:
for training_iteration in range(training_iterations):
    prediction = simple_nn(input_batch)
    
    loss = (prediction - label_batch).pow(2).sum()
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    simple_nn.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in simple_nn.parameters():
            p -= p.grad * learning_rate


0 0.1391650289297104
5000 0.09896896034479141
10000 0.07473386824131012
15000 0.05894413962960243
20000 0.04804077371954918


### Container approach with torch.nn and torch.optim

In [15]:
from torch.optim import SGD
from torch.nn import Linear, MSELoss, Tanh

In [16]:
learning_rate = 1e-3
training_iterations = 25000

In [17]:
class SimpleNN(torch.nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.layer_1 = Linear(2, 1)
        self.layer_2 = Linear(1, 1)
        
    def forward(self, input_batch):
        prediction = self.layer_1(input_batch)
        prediction = torch.tanh(prediction)
        prediction = self.layer_2(prediction)
        prediction = torch.tanh(prediction)
        return prediction

simple_nn = SimpleNN()

In [18]:
print("Trainable parameters\n")
print(list(simple_nn.parameters()))

Trainable parameters

[Parameter containing:
tensor([[-0.6391, -0.3267]], requires_grad=True), Parameter containing:
tensor([0.3719], requires_grad=True), Parameter containing:
tensor([[-0.9807]], requires_grad=True), Parameter containing:
tensor([-0.9801], requires_grad=True)]


In [19]:
loss_fce = MSELoss(reduction='sum')

In [20]:
optimizer = SGD(simple_nn.parameters(), lr=learning_rate, momentum=0.9)

In [21]:
for training_iteration in range(training_iterations):
    prediction = simple_nn(input_batch)
    
    loss = loss_fce(prediction, label_batch)
    if training_iteration % 1000 == 0:
        print(training_iteration, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

0 7.091989994049072
1000 0.011961407959461212
2000 0.0050717005506157875
3000 0.003165489761158824
4000 0.002285524969920516
5000 0.0017821136862039566
6000 0.001457310514524579
7000 0.001230896101333201
8000 0.0010642915731295943
9000 0.0009367091115564108
10000 0.0008359516505151987
11000 0.0007544196560047567
12000 0.0006871313089504838
13000 0.0006306694122031331
14000 0.0005826319102197886
15000 0.0005412806640379131
16000 0.0005053048953413963
17000 0.0004737473791465163
18000 0.0004458475741557777
19000 0.00042099138954654336
20000 0.00039872201159596443
21000 0.0003786351880989969
22000 0.00036047238972969353
23000 0.0003439330612309277
24000 0.00032881434890441597


In [200]:
simple_nn(input_batch)

tensor([[ 0.9908],
        [ 0.9817],
        [-0.9867],
        [-0.9830]], grad_fn=<TanhBackward>)

In [296]:
simple_nn.load_state_dict(simple_nn.state_dict())

### Container approach with torch.nn.Sequential
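
The notebook leaves this section without code. As a minimal sketch, assuming the same two-layer tanh architecture, data, loss and optimizer settings as in the previous section, the model could be expressed with torch.nn.Sequential roughly like this (the seed and the shorter run of 2000 iterations are choices made for this example only):

```python
import torch
from torch.nn import Linear, MSELoss, Sequential, Tanh
from torch.optim import SGD

torch.manual_seed(0)  # assumption: fixed seed, only to make the sketch repeatable

input_batch = torch.tensor([[0.20, 0.15],
                            [0.30, 0.20],
                            [0.86, 0.99],
                            [0.91, 0.88]])
label_batch = torch.tensor([[1.], [1.], [-1.], [-1.]])

# Layers are applied in the order in which they are listed.
simple_nn = Sequential(
    Linear(2, 1),
    Tanh(),
    Linear(1, 1),
    Tanh(),
)

loss_fce = MSELoss(reduction='sum')
optimizer = SGD(simple_nn.parameters(), lr=1e-3, momentum=0.9)

initial_loss = loss_fce(simple_nn(input_batch), label_batch).item()
for training_iteration in range(2000):
    loss = loss_fce(simple_nn(input_batch), label_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
final_loss = loss_fce(simple_nn(input_batch), label_batch).item()

print(initial_loss, final_loss)
```

With Sequential there is no need to write a forward method by hand, which suits models that are a plain chain of layers.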

## Custom layers

In [22]:
class CustomReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

# apply is called on the class itself; autograd Functions should not be instantiated
custom_relu = CustomReLU.apply

In [23]:
custom_relu(torch.tensor([-1,0,1]))

tensor([0, 0, 1])
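
The integer example above only exercises the forward pass. To see the custom backward in action, the input has to be a floating-point tensor with requires_grad=True. A small sketch with illustrative input values:

```python
import torch

class CustomReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # Keep the input so backward can tell where it was negative.
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        # No gradient flows through positions where the input was negative.
        grad_input[input < 0] = 0
        return grad_input

x = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)
y = CustomReLU.apply(x)
y.sum().backward()

print(y)       # forward result: negative entries clamped to zero
print(x.grad)  # gradient: 0 for the negative entry, 1 elsewhere
```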