# Deep Learning with PyTorch

## Tensors
*Tensors* represent
* *multilinear maps* between vector spaces (mathematics)
* generic *n-dimensional arrays* (computer science)

In [1]:
import numpy as np
import torch

# Create an uninitialized 3x2 tensor of 32-bit floats
a = torch.FloatTensor(3, 2)
a

tensor([[1.3556e-19, 3.0097e+29],
        [7.1853e+22, 4.5145e+27],
        [1.8040e+28, 1.5769e-19]])

In [2]:
# Initialize the tensor (in-place) with zeros
a.zero_()

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

Methods in the PyTorch API come in two variants:
* Functional methods return a transformed copy and have plain names like `some_function()`
* In-place methods mutate the tensor itself and have a trailing underscore in their name, e.g. `some_function_()`
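
The naming convention above can be illustrated with a minimal sketch that uses `add()` and its in-place counterpart `add_()` (the variable names are only illustrative):

```python
import torch

a = torch.zeros(3, 2)

# Functional variant: returns a new tensor, `a` is left untouched
b = a.add(1)

# In-place variant: mutates `a` and returns the very same tensor object
c = a.add_(2)

print(a)       # every entry of `a` is now 2
print(b)       # every entry of `b` is 1
print(c is a)  # True
```

In-place operations save memory, but they overwrite values that autograd may still need, so the functional variants are usually the safer default.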

In [3]:
# Create a tensor from a standard collection
torch.FloatTensor([[1, 2, 3], [3, 2, 1]])

tensor([[1., 2., 3.],
        [3., 2., 1.]])

In [4]:
n = np.zeros(shape=(3, 2))
n

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [7]:
# Create a tensor from a numpy ndarray
b = torch.tensor(n)
b

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]], dtype=torch.float64)

In [8]:
# Change the numpy array to 32-bit float
#  - The tensor inherits the dtype of the array
n = np.zeros(shape=(3, 2), dtype=np.float32)
torch.tensor(n)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

In [9]:
# Alternatively specify a PyTorch dtype
n = np.zeros(shape=(3, 2))
torch.tensor(n, dtype=torch.float32)

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
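
One detail worth knowing about the conversions above: `torch.tensor()` copies the data, whereas `torch.from_numpy()` returns a tensor that shares memory with the array. A small sketch of the difference (variable names are illustrative):

```python
import numpy as np
import torch

n = np.zeros(shape=(3, 2), dtype=np.float32)

copied = torch.tensor(n)      # copies the data
shared = torch.from_numpy(n)  # shares memory with the ndarray

# Modify the numpy array after both tensors were created
n[0, 0] = 1.0

print(copied[0, 0].item())  # 0.0 - the copy is unaffected
print(shared[0, 0].item())  # 1.0 - the shared tensor sees the change
```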

In [11]:
a = torch.tensor([1, 2, 3])
a

tensor([1, 2, 3])

In [12]:
# Scalar tensors can be results of some aggregations
s = a.sum()
s

tensor(6)

In [13]:
# There's a convenient method to access the value of a scalar tensor
s.item()

6

In [14]:
torch.tensor(42)

tensor(42)

### Tensor Operations
Each tensor has an associated `device` on which its computation takes place. The options are
* `cpu` - computation takes place on the CPU
* `cuda` or `cuda:<index>` - computation takes place on the GPU (with a device id `<index>`)

In [18]:
# Determine computation device based on CUDA availability
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.FloatTensor([2, 3])

# Move the tensor to GPU (if there's CUDA available)
a = a.to(device)
a

tensor([2., 3.])

In [19]:
a.device

device(type='cpu')
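
Instead of creating a tensor on the CPU and moving it afterwards, a tensor can also be allocated directly on the target device. The following sketch assumes nothing about the machine and falls back to the CPU when CUDA is unavailable:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate directly on the target device
x = torch.ones(2, 3, device=device)
y = torch.arange(3, dtype=torch.float32, device=device)

# Operands of an operation must live on the same device;
# the result is placed on that device as well
z = x + y

# Bring the result back to the CPU, e.g. for conversion to numpy
z_np = z.cpu().numpy()
print(z.device, z_np)
```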

### Tensors and Gradients
Each tensor carries the following attributes related to automatic gradient computation:
* `grad` is a property holding computed gradients (tensor of the same shape)
* `is_leaf` is true if the tensor was constructed by a user and false if it's a result of a computation
* `requires_grad` is true if the tensor requires gradients to be computed

In [23]:
# Define some tensors
#  - The first one requires gradients to be computed
v1 = torch.tensor([1.0, 1.0], requires_grad=True)
v2 = torch.tensor([2.0, 2.0])

# Define a computational graph on these tensors
#  - Notice: The result contains a function computing the gradient.
v_sum = v1 + v2
v_res = (v_sum * 2).sum()
v_res

tensor(12., grad_fn=<SumBackward0>)

In [24]:
v1.is_leaf, v2.is_leaf

(True, True)

In [25]:
v_sum.is_leaf, v_res.is_leaf

(False, False)

In [27]:
v1.requires_grad, v2.requires_grad

(True, False)

In [28]:
v_sum.requires_grad, v_res.requires_grad

(True, True)

In [29]:
# Calculate the gradients of our graph
v_res.backward()

# Show backpropagated gradients in v1
v1.grad

tensor([2., 2.])

In [30]:
# v2 does not require gradients, so its `grad` is None (hence no output)
v2.grad
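
Two properties of `grad` are easy to miss: it stays `None` for tensors that do not require gradients, and repeated backward passes accumulate into it instead of overwriting it. The sketch below rebuilds the same small graph twice to show both, and uses `torch.no_grad()` to skip graph construction altogether:

```python
import torch

v1 = torch.tensor([1.0, 1.0], requires_grad=True)
v2 = torch.tensor([2.0, 2.0])

# First backward pass: d/dv1 of sum(2 * (v1 + v2)) is 2 for each element
((v1 + v2) * 2).sum().backward()
print(v1.grad)          # tensor([2., 2.])
print(v2.grad is None)  # True

# A second pass over a freshly built graph adds to the stored gradient
((v1 + v2) * 2).sum().backward()
print(v1.grad)          # tensor([4., 4.])

# Reset the accumulated gradient in-place
v1.grad.zero_()

# Inside no_grad() no graph is recorded, so the result needs no gradients
with torch.no_grad():
    out = (v1 + v2) * 2
print(out.requires_grad)  # False
```

This accumulation is the reason why training loops clear gradients once per step.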

## Neural Network Building Blocks

In [32]:
import torch.nn as nn  # noqa

# Construct a 2-to-5 dense layer with an implicit bias
#  - Note: Weights of this layer are randomly initialized.
dense = nn.Linear(2, 5)

# Each PyTorch NN module acts as a callable
inputs = torch.FloatTensor([1, 2])
dense(inputs)

tensor([-1.1789, -0.4192, -0.8343, -1.2979, -0.6996], grad_fn=<AddBackward0>)

Some important methods of `nn.Module`:
* `parameters()` returns an iterator over all parameters of the module (trainable ones require gradients)
* `zero_grad()` resets the gradients of all parameters
* `to(device)` moves the module's parameters, and thus its computation, to a device
* `state_dict()` exports all weights to a dictionary for model serialization
* `load_state_dict()` is the opposite of the previous one and imports weights from such a dictionary
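
As a rough illustration of the methods listed above, the following sketch inspects the parameters of a small dense layer and copies its weights into a second, independently initialized layer (the name `clone` is just illustrative):

```python
import torch
import torch.nn as nn

dense = nn.Linear(2, 5)

# parameters() yields the weight matrix (5x2) and the bias vector (5)
shapes = [tuple(p.shape) for p in dense.parameters()]
n_params = sum(p.numel() for p in dense.parameters())
print(shapes, n_params)  # [(5, 2), (5,)] 15

# state_dict() maps parameter names to tensors
state = dense.state_dict()
print(list(state.keys()))  # ['weight', 'bias']

# load_state_dict() copies those weights into another module
clone = nn.Linear(2, 5)
clone.load_state_dict(state)

x = torch.tensor([1.0, 2.0])
print(torch.equal(dense(x), clone(x)))  # True
```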

In [33]:
# Build a sequential model with
#  - Three dense layers with ReLU activations
#  - A dropout layer
#  - And a softmax output over the feature dimension
model = nn.Sequential(
    nn.Linear(2, 5),
    nn.ReLU(),
    nn.Linear(5, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
    nn.ReLU(),
    nn.Dropout(p=0.3),
    nn.Softmax(dim=1),
)

model

Sequential(
  (0): Linear(in_features=2, out_features=5, bias=True)
  (1): ReLU()
  (2): Linear(in_features=5, out_features=20, bias=True)
  (3): ReLU()
  (4): Linear(in_features=20, out_features=10, bias=True)
  (5): ReLU()
  (6): Dropout(p=0.3, inplace=False)
  (7): Softmax(dim=1)
)

In [34]:
# Feed an input tensor through our sequential model
#  - There's a single instance in the input batch
model(torch.FloatTensor([[1, 2]]))

tensor([[0.0937, 0.1025, 0.0937, 0.1092, 0.0937, 0.0937, 0.1323, 0.0937, 0.0937,
         0.0937]], grad_fn=<SoftmaxBackward>)
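
Because the model contains a dropout layer, its output is random while the model is in training mode (the default). A minimal sketch, using a smaller illustrative model than the one above, shows how `eval()` switches dropout off for inference:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Smaller illustrative model: dense layer, dropout and softmax
small = nn.Sequential(nn.Linear(2, 10), nn.Dropout(p=0.3), nn.Softmax(dim=1))
x = torch.tensor([[1.0, 2.0]])

# In evaluation mode dropout is a no-op, so repeated calls agree
small.eval()
with torch.no_grad():
    p1 = small(x)
    p2 = small(x)

print(torch.equal(p1, p2))  # True
print(p1.sum(dim=1))        # softmax output sums to 1 (up to rounding)

# Switch back before training continues
small.train()
```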

### Custom Layers
Creating a custom module (layer) is as easy as inheriting from the `nn.Module` class and implementing the `forward()` method. Any sub-module assigned to an attribute is automatically registered under the parent module.

Note that the convention is to call the module itself rather than `forward()` directly, because the `Module` class does some extra work in its `__call__` method.
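
Part of the extra work done in `__call__` is running registered hooks. The sketch below registers a forward hook (the function `record_shape` is a hypothetical helper) and shows that it fires when the module is called, but not when `forward()` is invoked directly:

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 3)
calls = []


def record_shape(module, inputs, output):
    """Hypothetical hook: remember the shape of every output."""
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(record_shape)
x = torch.ones(1, 2)

layer(x)          # goes through __call__, so the hook runs
layer.forward(x)  # bypasses __call__, so the hook is skipped

handle.remove()
print(calls)  # [(1, 3)]
```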

In [35]:
from typing import TypeVar  # noqa

T = TypeVar("T", bound=torch.Tensor)


class MyModule(nn.Module):
    """Custom PyTorch module"""

    def __init__(
        self,
        n_inputs: int,
        n_outputs: int,
        dropout_prob: float = 0.3,
    ) -> None:
        super().__init__()
        # Build a sequential model
        #  - Every field that is a Module is auto-discovered
        self.pipe = nn.Sequential(
            nn.Linear(n_inputs, 5),
            nn.ReLU(),
            nn.Linear(5, 20),
            nn.ReLU(),
            nn.Linear(20, n_outputs),
            nn.Dropout(p=dropout_prob),
            nn.Softmax(dim=1),
        )

    def forward(self, x: T) -> T:
        # We must treat the sub-module as a callable!
        return self.pipe(x)


# Build an instance of this model and show its structure
net = MyModule(n_inputs=2, n_outputs=3)
net

MyModule(
  (pipe): Sequential(
    (0): Linear(in_features=2, out_features=5, bias=True)
    (1): ReLU()
    (2): Linear(in_features=5, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
    (5): Dropout(p=0.3, inplace=False)
    (6): Softmax(dim=1)
  )
)

In [36]:
# Feed an input batch to the model
net(torch.FloatTensor([[2, 3]]))

tensor([[0.3437, 0.2639, 0.3924]], grad_fn=<SoftmaxBackward>)
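
Automatic registration only applies to attributes that are themselves modules. As a sketch of the pitfall, layers kept in a plain Python list stay invisible to `parameters()`, while `nn.ModuleList` registers them (both class names here are illustrative):

```python
import torch.nn as nn


class Registered(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # ModuleList is a Module, so its layers are discovered
        self.layers = nn.ModuleList([nn.Linear(2, 2), nn.Linear(2, 2)])


class NotRegistered(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # A plain list is not a Module, so these layers are not registered
        self.layers = [nn.Linear(2, 2), nn.Linear(2, 2)]


names = [name for name, _ in Registered().named_parameters()]
print(names)
# ['layers.0.weight', 'layers.0.bias', 'layers.1.weight', 'layers.1.bias']

print(len(list(NotRegistered().parameters())))  # 0
```

Unregistered parameters are neither passed to an optimizer via `parameters()` nor included in `state_dict()`.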

### Loss Functions and Optimizers

PyTorch includes a standard set of loss functions and also makes it simple to implement custom ones. Here's a short list of some loss function classes:
* `nn.MSELoss` is the *mean squared error* typically used for regression problems
* `nn.BCELoss` and `nn.BCEWithLogitsLoss` are *binary cross-entropy* losses for binary classification problems - the former expects probabilities while the latter expects raw scores (usually preferable)
* `nn.CrossEntropyLoss` and `nn.NLLLoss` are for multi-class classification problems - the former expects raw scores while the latter expects log-probabilities
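
The relationship between these losses can be checked numerically. In the sketch below (with made-up scores and labels), the "with logits" variants match their counterparts once the corresponding activation is applied by hand:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Binary case: raw scores vs. probabilities
logits = torch.tensor([0.5, -1.0, 2.0])
targets = torch.tensor([1.0, 0.0, 1.0])

bce_logits = nn.BCEWithLogitsLoss()(logits, targets)
bce_probs = nn.BCELoss()(torch.sigmoid(logits), targets)
print(torch.allclose(bce_logits, bce_probs))  # True

# Multi-class case: raw scores vs. log-probabilities
scores = torch.tensor([[1.0, 2.0, 0.5], [0.1, 0.2, 3.0]])
labels = torch.tensor([1, 2])

ce = nn.CrossEntropyLoss()(scores, labels)
nll = nn.NLLLoss()(F.log_softmax(scores, dim=1), labels)
print(torch.allclose(ce, nll))  # True
```

One consequence is that a model trained with `nn.CrossEntropyLoss` should output raw scores and not end with a softmax layer.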

Similarly, there is a bunch of traditional optimizers such as vanilla `SGD`, `RMSprop`, `Adagrad` or the popular `Adam`. Finally, here's a typical training loop in PyTorch.
```python
import torch
import torch.optim as optim

# Define model and loss function
model = ...
loss_fn = ...

# Register all trainable parameters in an optimizer
optimizer = optim.Adam(model.parameters(), ...)

# Iterate over mini-batches of training data
for X_train, y_train in iterate_batches(data, batch_size=32):

    # Wrap examples and labels into tensors
    X_train = torch.tensor(X_train)
    y_train = torch.tensor(y_train)
    
    # Make predictions using the model
    y_pred = model(X_train)
    
    # Compute model's prediction loss
    loss = loss_fn(y_pred, y_train)
    
    # Compute gradients of the loss function w.r.t. all the weights
    #  - The loss is just a computation graph over tensors in the model
    #  - And because model's weights "require gradients" this calculates dL/dw
    loss.backward()
    
    # Perform one gradient descent step using computed gradients
    #  - Note: The optimizer has access to `grad` for registered `params`
    optimizer.step()
    
    # Clear gradients for this step
    #  - Alternatively this can be done at the beginning of a step
    #  - Note: This is a convenience method for calling `zero_grad()` on the model
    optimizer.zero_grad()
```
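
To make the loop above concrete, here is a minimal runnable sketch. Everything specific in it is an assumption made for illustration: the synthetic regression data, the single linear layer, the learning rate, and the `iterate_batches` helper, which is a hypothetical stand-in with a slightly different signature than in the outline above.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


def iterate_batches(X, y, batch_size=32):
    """Hypothetical helper: yield consecutive mini-batches of (X, y)."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


# Synthetic regression data (an assumption): y = 3*x0 - 2*x1 + 1
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2)).astype(np.float32)
y = (3 * X[:, 0] - 2 * X[:, 1] + 1).astype(np.float32).reshape(-1, 1)

torch.manual_seed(0)
model = nn.Linear(2, 1)
loss_fn = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.05)

epoch_losses = []
for epoch in range(50):
    total = 0.0
    for X_batch, y_batch in iterate_batches(X, y):
        X_batch = torch.from_numpy(X_batch)
        y_batch = torch.from_numpy(y_batch)

        # Clear gradients left over from the previous step
        optimizer.zero_grad()

        # Forward pass, loss, backward pass and parameter update
        loss = loss_fn(model(X_batch), y_batch)
        loss.backward()
        optimizer.step()

        total += loss.item() * len(X_batch)
    epoch_losses.append(total / len(X))

print(f"first epoch: {epoch_losses[0]:.4f}, last epoch: {epoch_losses[-1]:.4f}")
```

Here the gradients are cleared at the beginning of each step, which is equivalent to clearing them at the end as long as it happens exactly once per step.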

## Monitoring with TensorBoard(X)
The following example shows how to log arbitrary metrics and inspect them with *TensorBoard*. Because TensorBoard expects data in *TensorFlow* format, we use `tensorboardX` for easy integration (also because it's a dependency of PyTorch Ignite, which we'll use later).

The example below computes the values of a few trigonometric functions for varying angles and stores the output in the `runs/` directory. To view the results later, run TensorBoard with
```bash
tensorboard --logdir runs
```

In [40]:
import math  # noqa

from tensorboardX import SummaryWriter  # noqa

# Define some functions representing our metrics
funcs = {"sin": math.sin, "cos": math.cos, "tan": math.tan}

# Create a tensorboardX writer
#  - Note: The default output directory is './runs'
#  - Note 2: By default each call creates a new "run"
with SummaryWriter() as writer:

    # Register one metric per function
    for name, f in funcs.items():

        # Evaluate and record f on interval [-360, 360)
        for angle in range(-360, 360):
            val = f(angle * math.pi / 180)
            writer.add_scalar(name, val, angle)

In [41]:
!tree runs

runs
├── Mar10_15-13-12_mpc-xps
│   └── events.out.tfevents.1615385592.mpc-xps
└── Mar10_15-14-45_mpc-xps
    └── events.out.tfevents.1615385685.mpc-xps

2 directories, 2 files
