# Setup of Colab Environment


Every Colab notebook runs in its own instance in the cloud. We need to set up the workshop environment in these steps:  
* Set up a GPU instance: Runtime -> Change runtime type
* Install the workshop package with all requirements from git
* Import all packages
* Mount Google Drive  

In [0]:
!pip install git+https://github.com/adamoz/colab_image_processing_workshop.git

In [0]:
from google.colab import drive
from google.colab import files
from shutil import rmtree
import os

import numpy as np
import torch
from torch.optim import SGD
from torch.nn import Linear, MSELoss, Tanh

In [0]:
drive.mount('./drive', force_remount=True)

Mounted at ./drive


In [0]:
os.listdir('./drive/My Drive/ml_college_data')

['models']

# Introduction to PyTorch
---



[PyTorch](https://pytorch.org/docs/stable/index.html) is a framework for building trainable (automatically differentiable) directed acyclic graphs dynamically, as the code runs (in contrast with e.g. TensorFlow 1.x, which builds static DAGs).   

PyTorch's main building blocks are tensors (and their high-level abstractions, e.g. `torch.nn` layers) and operations on those tensors. With PyTorch we can define minimization problems and solve them with the optimizers in `torch.optim`.

**Overview of the PyTorch package**
 - `torch.nn`  High-level abstractions for designing neural network architectures, including various layer types, loss functions, and containers for more complex models.
 - `torch.nn.functional`  Similar to `torch.nn`, but defined as functions rather than classes.
 - `torch.nn.init` Set of functions for initializing a `torch.Tensor`.
 - `torch.optim` Module with various optimizers and learning rate schedulers for training neural networks.
 - `torch.utils.data` Collection of classes for data manipulation.
 - `torch.autograd`  Reverse-mode automatic differentiation system that computes gradients automatically using the chain rule.
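
As a rough illustration of how the modules listed above fit together, here is a minimal sketch of a single pass over random data. All names prefixed with `demo_` and the random tensors are made up for this example and are not part of the workshop package.

```python
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# torch.nn: a layer object that owns its trainable parameters.
demo_layer = nn.Linear(2, 1)

# torch.nn.init: in-place initialization of an existing tensor.
nn.init.zeros_(demo_layer.bias)

# torch.optim: an optimizer that updates the layer's parameters.
demo_optimizer = torch.optim.SGD(demo_layer.parameters(), lr=0.1)

# torch.utils.data: wrap tensors in a dataset and iterate over mini-batches.
demo_inputs = torch.rand(8, 2)
demo_targets = torch.rand(8, 1)
demo_loader = DataLoader(TensorDataset(demo_inputs, demo_targets), batch_size=4)

for demo_x, demo_y in demo_loader:
    # torch.nn.functional: stateless function counterpart of nn.MSELoss.
    demo_loss = F.mse_loss(torch.tanh(demo_layer(demo_x)), demo_y)
    demo_optimizer.zero_grad()
    # torch.autograd: backward() fills .grad of every trainable parameter.
    demo_loss.backward()
    demo_optimizer.step()

print(demo_layer.weight.grad.shape)
```

The rest of the notebook goes through these pieces one at a time.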

## PyTorch Tensors

### Analogy with NumPy
We can initialize and manipulate tensors with methods similar to those in NumPy.

In [0]:
np.zeros([3, 3])

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [0]:
torch.zeros([3, 3], dtype=torch.long, device=torch.device('cpu'))

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [0]:
np.random.rand(3, 3)

array([[0.25914973, 0.1988332 , 0.90028694],
       [0.92610521, 0.42813633, 0.09843541],
       [0.07368324, 0.85255441, 0.29087014]])

In [0]:
torch.rand(3, 3)

tensor([[0.0204, 0.0301, 0.0132],
        [0.2504, 0.4262, 0.3025],
        [0.5280, 0.0237, 0.2495]])

In [0]:
# np.float was removed in NumPy 1.24; np.float64 is the equivalent explicit dtype.
numpy_tensor = np.array([[1, 2], [3, 4]], dtype=np.float64)
numpy_tensor

array([[1., 2.],
       [3., 4.]])

In [0]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

tensor([[1., 2.],
        [3., 4.]])

In [0]:
numpy_tensor.shape

(2, 2)

In [0]:
torch_tensor.shape

torch.Size([2, 2])

In [0]:
torch_tensor.numpy()

array([[1., 2.],
       [3., 4.]], dtype=float32)

In [0]:
torch.tensor(numpy_tensor)

tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)
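
One detail worth keeping in mind when converting between the two libraries: `torch.tensor(...)` copies the data, while `torch.from_numpy(...)` returns a tensor that shares memory with the NumPy array. The sketch below uses made-up variable names (`source`, `copied`, `shared`) to show the difference.

```python
import numpy as np
import torch

source = np.array([[1.0, 2.0], [3.0, 4.0]])

copied = torch.tensor(source)      # independent copy of the data
shared = torch.from_numpy(source)  # view on the same memory as `source`

# Modify the NumPy array in place after both tensors were created.
source[0, 0] = 100.0

print(copied[0, 0].item())  # 1.0, the copy is unaffected
print(shared[0, 0].item())  # 100.0, the shared tensor sees the change
```

The same sharing applies in the other direction: for a CPU tensor, `.numpy()` returns an array backed by the tensor's memory.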

### Basic operations with tensors

In [0]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

tensor([[1., 2.],
        [3., 4.]])

In [0]:
torch_tensor + torch_tensor

tensor([[2., 4.],
        [6., 8.]])

In [0]:
torch_tensor + 2

tensor([[3., 4.],
        [5., 6.]])

In [0]:
torch_tensor * torch_tensor

tensor([[ 1.,  4.],
        [ 9., 16.]])

In [0]:
torch_tensor.mm(torch_tensor)

tensor([[ 7., 10.],
        [15., 22.]])

In [0]:
torch.nn.init.normal_(torch_tensor)
torch_tensor

tensor([[-1.2703, -0.8698],
        [-1.6750, -0.3926]])

### Working with shapes

In [0]:
torch_tensor = torch.tensor([[1, 2] ,[3, 4]], dtype=torch.float)
torch_tensor

tensor([[1., 2.],
        [3., 4.]])

In [0]:
torch_tensor.view(-1)

tensor([1., 2., 3., 4.])

In [0]:
torch_tensor[1, :]

tensor([3., 4.])

In [0]:
torch.cat([torch_tensor, torch_tensor], dim=1)

tensor([[1., 2., 1., 2.],
        [3., 4., 3., 4.]])

In [0]:
torch.unsqueeze(torch_tensor, 0)

tensor([[[1., 2.],
         [3., 4.]]])

In [0]:
torch.transpose(torch_tensor, 1, 0)

tensor([[1., 3.],
        [2., 4.]])

### Special tensor properties
All of these attributes relate to gradient-based optimization over tensors.

 - `.requires_grad`  Indicates that we want to compute the gradient for this tensor. PyTorch will start tracking all operations on it.
 - `.grad` After calling `y.backward()`, `x.grad` holds the gradient $\frac{dy}{dx}$ (provided `x.requires_grad` is `True`).
 - `.grad_fn` Reference to the function that created the tensor.

In [0]:
x = torch.tensor([[5]], dtype=torch.float, requires_grad=True)
x

tensor([[5.]], requires_grad=True)

In [0]:
x_pow3 =  torch.pow(x, 3)
x_pow3

tensor([[125.]], grad_fn=<PowBackward0>)

In [0]:
x_pow3.grad_fn

<PowBackward0 at 0x7f1d7daa95c0>

In [0]:
x_pow3.requires_grad

True

In [0]:
x_pow3.grad is None

True

Let's compute the gradient of `x_pow3` with respect to all `torch.Tensor`s that have `.requires_grad=True`.
To calculate the gradients, we need to call `x_pow3.backward()`.  
This will calculate the gradient of `x_pow3` with respect to `x`:

$$
\frac{\partial x^3}{\partial x} = 3x^2
$$

In [0]:
x_pow3.backward()
x.grad

tensor([[75.]])
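
Note that `backward()` adds to `.grad` rather than overwriting it. The following sketch uses a separate, made-up tensor `a` (so the notebook's `x` stays untouched) to show the accumulation and how to reset it.

```python
import torch

a = torch.tensor([[5.0]], requires_grad=True)

# First backward pass: a.grad = 3 * 5**2 = 75.
torch.pow(a, 3).backward()
first = a.grad.clone()

# Second backward pass on a fresh graph: the new gradient is added, 75 + 75 = 150.
torch.pow(a, 3).backward()
second = a.grad.clone()

# Reset the accumulated gradient in place.
a.grad.zero_()

print(first.item(), second.item(), a.grad.item())  # 75.0 150.0 0.0
```

This is why training loops clear the gradients once per iteration.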

This is how to stop collecting gradient information:

In [0]:
with torch.no_grad():
    print((x * x).requires_grad)

False
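
Besides the `torch.no_grad()` context, a single tensor can be cut out of the graph with `.detach()`. This is a small sketch with made-up names (`p`, `tracked`, `detached`), not something the workshop code relies on.

```python
import torch

p = torch.tensor([[5.0]], requires_grad=True)

tracked = p * p               # part of the autograd graph
detached = tracked.detach()   # same values, but no gradient history

print(tracked.requires_grad, detached.requires_grad)  # True False

# Only the tracked term contributes to the gradient: d(p*p)/dp = 2 * 5 = 10.
combined = (tracked + 2 * detached).sum()
combined.backward()
print(p.grad)  # tensor([[10.]])
```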


---

## Neural Network Definition
PyTorch lets us define neural networks at several levels of abstraction. Let's explore them.

### Data

In [0]:
input_batch = torch.tensor([[0.20, 0.15],
                            [0.30, 0.20],
                            [0.86, 0.99],
                            [0.91, 0.88]])

label_batch = torch.tensor([[1.],
                            [1.],
                            [-1.],
                            [-1.]])

### Low level approach
Using just `torch.Tensor` and `torch.autograd`.

In [0]:
learning_rate = 1e-3
training_iterations = 55000

In [0]:
# Define trainable parameters.
w1 = torch.randn(2, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu"))
w2 = torch.randn(1, 1, dtype=torch.float, requires_grad=True, device=torch.device("cpu"))
w1, w2

(tensor([[ 1.1183],
         [-0.9058]], requires_grad=True),
 tensor([[-0.9243]], requires_grad=True))

In [0]:
##############
# Playground #
##############

In [0]:
# After each iteration, we adjust the w1 and w2 parameters.
for training_iteration in range(training_iterations):
    # Forward pass through a simple network with 2 layers defined by w1 and w2.
    prediction = input_batch.mm(w1)
    prediction = torch.tanh(prediction)
    prediction = prediction.mm(w2)
    prediction = torch.tanh(prediction)
    
    # Use mean squared error as the loss; we need a single scalar value to optimize.
    loss = (prediction - label_batch).pow(2).mean()
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    # Compute the gradients of the loss with respect to w1 and w2.
    loss.backward()
    
    # We don't want to collect gradient information for optimization steps.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        # Clear gradients for the next iteration; we don't want them to accumulate.
        w1.grad.zero_()
        w2.grad.zero_()

0 0.9989905953407288
5000 0.8485435843467712
10000 0.8325045108795166
15000 0.8153863549232483
20000 0.7936765551567078
25000 0.7641147375106812
30000 0.7215176820755005
35000 0.6587221622467041
40000 0.5715191960334778
45000 0.46837449073791504
50000 0.3680548071861267


In [0]:
# Check predictions.
prediction = input_batch.mm(w1)
prediction = torch.tanh(prediction)
prediction = prediction.mm(w2)
prediction = torch.tanh(prediction)
prediction

tensor([[ 0.1914],
        [ 0.4499],
        [-0.9368],
        [-0.5814]], grad_fn=<TanhBackward>)

In [0]:
torch.save({'w1': w1, 'w2': w2}, './drive/My Drive/ml_college_data/models/ckpt.pth')

In [0]:
state_dict = torch.load('./drive/My Drive/ml_college_data/models/ckpt.pth')
w1.data = state_dict['w1']
w2.data = state_dict['w2']

### Container approach with torch.nn and torch.optim

In [0]:
learning_rate = 1e-3
training_iterations = 55000

In [0]:
class SimpleNN(torch.nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.layer_1 = Linear(2, 1)
        self.layer_2 = Linear(1, 1)
        
    def forward(self, input_batch):
        prediction = self.layer_1(input_batch)
        prediction = torch.tanh(prediction)
        prediction = self.layer_2(prediction)
        prediction = torch.tanh(prediction)
        return prediction

simple_nn = SimpleNN()

In [0]:
list(simple_nn.named_parameters())

[('layer_1.weight', Parameter containing:
  tensor([[-0.6511, -0.0455]], requires_grad=True)),
 ('layer_1.bias', Parameter containing:
  tensor([-0.3567], requires_grad=True)),
 ('layer_2.weight', Parameter containing:
  tensor([[0.7398]], requires_grad=True)),
 ('layer_2.bias', Parameter containing:
  tensor([0.6367], requires_grad=True))]

In [0]:
loss_fce = MSELoss(reduction='sum')
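
The low-level loop averaged the squared errors with `.mean()`, whereas `MSELoss(reduction='sum')` adds them up, so the loss values printed by the two approaches are on different scales. The sketch below compares both reductions on a made-up prediction of all zeros (the `demo_` names are illustrative only).

```python
import torch
from torch.nn import MSELoss

demo_prediction = torch.zeros(4, 1)
demo_labels = torch.tensor([[1.], [1.], [-1.], [-1.]])

# Each squared error is 1, so the sum is 4 and the mean is 1.
loss_sum = MSELoss(reduction='sum')(demo_prediction, demo_labels)
loss_mean = MSELoss(reduction='mean')(demo_prediction, demo_labels)
manual_mean = (demo_prediction - demo_labels).pow(2).mean()

print(loss_sum.item(), loss_mean.item(), manual_mean.item())  # 4.0 1.0 1.0
```

With four samples, the summed loss and its gradients are four times larger than the averaged ones, which acts like a four times larger learning rate.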

In [0]:
optimizer = SGD(simple_nn.parameters(), lr=learning_rate, momentum=0.9)
optimizer

SGD (
Parameter Group 0
    dampening: 0
    lr: 0.001
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)

In [0]:
for training_iteration in range(training_iterations):
    prediction = simple_nn(input_batch)
    
    loss = loss_fce(prediction, label_batch)
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

0 3.396700859069824
5000 0.0016939886845648289
10000 0.0008032257319428027
15000 0.0005221785977482796
20000 0.0003854953101836145
25000 0.00030496128601953387
30000 0.0002519561385270208
35000 0.0002144993923138827
40000 0.00018662482034415007
45000 0.00016508072440046817
50000 0.0001479603088228032


In [0]:
simple_nn(input_batch)

tensor([[ 0.9968],
        [ 0.9928],
        [-0.9946],
        [-0.9935]], grad_fn=<TanhBackward>)

In [0]:
simple_nn.load_state_dict(simple_nn.state_dict())

<All keys matched successfully>
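
The call above loads a model's state back into the same model. A more typical round trip saves the state and restores it into a freshly built model. The sketch below assumes a single `Linear` layer and an in-memory buffer in place of a file on Drive; `trained`, `restored` and `probe` are made-up names.

```python
import io
import torch
from torch.nn import Linear

trained = Linear(2, 1)   # stands in for a trained model
restored = Linear(2, 1)  # fresh model with different random weights

# Serialize the parameters; a file path would work the same way as the buffer.
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Load the saved parameters into the fresh model.
restored.load_state_dict(torch.load(buffer))

probe = torch.tensor([[0.2, 0.15]])
print(torch.equal(trained(probe), restored(probe)))  # True
```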

### Container approach with torch.nn.Sequential

In [0]:
learning_rate = 1e-3
training_iterations = 55000

In [0]:
simple_nn_seq = torch.nn.Sequential(
    Linear(2, 1),
    Tanh(),
    Linear(1, 1),
    Tanh()
)

In [0]:
loss_fce = MSELoss(reduction='sum')
optimizer = SGD(simple_nn_seq.parameters(), lr=learning_rate, momentum=0.9)

In [0]:
for training_iteration in range(training_iterations):
    prediction = simple_nn_seq(input_batch)
    
    loss = loss_fce(prediction, label_batch)
    if training_iteration % 5000 == 0:
        print(training_iteration, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

0 4.9542341232299805
5000 0.0017550825141370296
10000 0.0008246406214311719
15000 0.0005343377124518156
20000 0.00039378588553518057
25000 0.00031115204910747707
30000 0.0002568674390204251
35000 0.00021854341321159154
40000 0.00019005290232598782
45000 0.00016804509505163878
50000 0.0001505768159404397


In [0]:
simple_nn_seq(input_batch)

tensor([[ 0.9962],
        [ 0.9929],
        [-0.9952],
        [-0.9930]], grad_fn=<TanhBackward>)

---