# PyTorch for Deep Learning Practitioners

## Overview
The guide is divided into 2 main parts:

1. **Theory**: A brief introduction to PyTorch (concepts + code).
2. **Practices**: We will implement, train and deploy a basic feed-forward neural network.


## Theory

### What is PyTorch?
PyTorch is a Python-based scientific computing package that serves two purposes:
1. A replacement for NumPy that can use the power of GPUs.
2. An open source deep learning platform that provides a seamless path from research prototyping to production deployment.




In [2]:
import torch
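
To make the NumPy comparison concrete, here is a minimal sketch. The array contents and variable names are illustrative only and do not come from the guide. It converts a NumPy array to a tensor, moves it to a GPU only if one is available, and converts the result back.

```python
import numpy as np
import torch

# Illustrative data; any NumPy array works here.
array = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# torch.from_numpy shares memory with the source array (no copy is made).
tensor = torch.from_numpy(array)

# Use the GPU when one is present, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
doubled = (tensor.to(device) * 2).cpu()

# Convert back to NumPy for use with other scientific Python libraries.
result = doubled.numpy()
print(result)
```

The same code runs unchanged on machines with and without a GPU, because the device is chosen at run time.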

### Components of PyTorch

PyTorch's components can be grouped into three: the low-level API, the high-level API and the utilities API.

#### Low-level API
1. Tensors
2. Tensor operations
3. Autograd

#### High-level API
1. Layers
2. Activations
3. Loss functions
4. Optimizers

#### Utilities API
1. Data
2. Checkpoint


### Tensors
A `torch.Tensor` is a generalization of a matrix that can be indexed in more than 2 dimensions.

#### Creating Tensors
Tensors can be created from Python lists with the `torch.Tensor()` constructor.

In [3]:
# Vector
vector = [0.0, 1.0, 0.0]
torch.Tensor(vector)

tensor([0., 1., 0.])

In [5]:
# Matrix
matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
torch.Tensor(matrix)

tensor([[1., 0., 0.],
        [0., 1., 0.]])

In [7]:
# 3D Tensor
X = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]
torch.Tensor(X)

tensor([[[1., 0., 0.],
         [0., 1., 0.]],

        [[0., 1., 0.],
         [0., 0., 0.]]])
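
As a quick check on the three examples above, the sketch below prints the number of dimensions and the shape of each tensor. It also shows one point worth knowing: the lowercase factory `torch.tensor()` infers the data type from its input, whereas `torch.Tensor()` produces the default floating-point type. The variable names are illustrative.

```python
import torch

vector = torch.Tensor([0.0, 1.0, 0.0])
matrix = torch.Tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
cube = torch.Tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                     [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]])

for name, t in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    # .dim() is the number of axes; .shape is the size along each axis.
    print(name, t.dim(), tuple(t.shape))

# torch.tensor infers the dtype from the data; torch.Tensor does not.
print(torch.tensor([1, 2, 3]).dtype)   # torch.int64
print(torch.Tensor([1, 2, 3]).dtype)   # torch.float32
```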

You can create a tensor of a given shape filled with random values drawn from the standard normal distribution with `torch.randn()`.

In [6]:
X = torch.randn((3, 5, 5))
X

tensor([[[ 0.1371, -0.7129,  0.1852, -1.8040, -1.6251],
         [-0.3541,  0.3962,  2.2637,  0.6944,  0.4783],
         [-0.9223, -0.5015,  1.5656, -1.1779, -1.2732],
         [-0.7556,  0.7183,  1.3599, -0.9272,  2.1511],
         [-1.7306,  0.8332, -2.1668,  0.5663,  0.7291]],

        [[ 0.4086,  0.0951, -1.8476,  1.2116, -0.5195],
         [-0.9379,  0.2132, -2.0691,  0.9345,  0.9150],
         [ 0.2643,  1.6124,  1.1625, -1.4626, -0.5683],
         [-1.4458, -0.7793,  1.9571,  0.7178, -1.4030],
         [ 1.1021, -0.1194, -0.6543, -0.4118,  0.8679]],

        [[ 0.9585,  2.5555,  0.8472, -0.1089,  0.6870],
         [ 1.4873, -2.0203, -0.1194, -1.6814,  1.2057],
         [ 0.8190,  0.1549,  0.2254, -0.0157, -1.6386],
         [-1.3638,  0.4304, -0.4318, -0.9131,  1.2420],
         [-0.5192,  0.2369, -0.1555,  2.2627, -0.5231]]])
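
Because `torch.randn()` draws new samples on every call, your values will differ from the ones printed above. If you need repeatable results, one option is to seed the global generator first, as in this small sketch (the seed value 0 is arbitrary):

```python
import torch

torch.manual_seed(0)          # fix the global random generator
a = torch.randn((3, 5, 5))

torch.manual_seed(0)          # reset to the same seed
b = torch.randn((3, 5, 5))

print(a.shape)                # torch.Size([3, 5, 5])
print(torch.equal(a, b))      # True: same seed, same samples
```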

You can also create a tensor of a given shape filled with the scalar value 0 using `torch.zeros()`. This is useful for bias initialization.

In [19]:
biases = torch.zeros(size=(3, 1, 3))
biases

tensor([[[0., 0., 0.]],

        [[0., 0., 0.]],

        [[0., 0., 0.]]])

We can also specify the tensor's data type with the `dtype` argument:

In [20]:
torch.zeros(size=(3, 1, 3), dtype=torch.int64)

tensor([[[0, 0, 0]],

        [[0, 0, 0]],

        [[0, 0, 0]]])
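
The data type matters once tensors are combined. The following sketch, with illustrative variable names, inspects the `dtype` attribute, converts a tensor with `.to()`, and shows that mixing integer and floating-point tensors yields a floating-point result.

```python
import torch

ints = torch.zeros(size=(3, 1, 3), dtype=torch.int64)
print(ints.dtype)                    # torch.int64

# .to() returns a converted copy; the original tensor is unchanged.
floats = ints.to(torch.float32)
print(floats.dtype)                  # torch.float32

# Mixing integer and floating-point tensors promotes to the float type.
mixed = ints + torch.ones(3, 1, 3)
print(mixed.dtype)                   # torch.float32
```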

### Tensor Operations

#### Mathematical operations
You can perform mathematical operations on tensors. The arithmetic operators below are applied element-wise.

In [8]:
x = torch.randn(size=(3, 2, 2))
x

tensor([[[ 0.4710, -0.7448],
         [ 0.5936, -0.1468]],

        [[-0.7063, -1.7917],
         [-0.9366, -2.1349]],

        [[-0.2467,  0.6639],
         [-0.0783,  0.4409]]])

In [9]:
y = torch.randn(size=(3, 2, 2))
y

tensor([[[-0.3371,  1.2213],
         [-0.0828, -1.4272]],

        [[ 1.1819,  2.8166],
         [ 0.7510, -0.0315]],

        [[ 1.3905,  0.4487],
         [ 1.3234,  0.9833]]])

In [10]:
z1 = x + y
z1

tensor([[[ 0.1340,  0.4765],
         [ 0.5108, -1.5740]],

        [[ 0.4756,  1.0249],
         [-0.1856, -2.1665]],

        [[ 1.1438,  1.1127],
         [ 1.2451,  1.4242]]])

In [11]:
z2 = x - y
z2

tensor([[[ 0.8081, -1.9661],
         [ 0.6764,  1.2804]],

        [[-1.8881, -4.6083],
         [-1.6876, -2.1034]],

        [[-1.6372,  0.2152],
         [-1.4017, -0.5424]]])

In [12]:
z3 = x * y
z3

tensor([[[-0.1588, -0.9097],
         [-0.0492,  0.2095]],

        [[-0.8347, -5.0466],
         [-0.7034,  0.0673]],

        [[-0.3431,  0.2979],
         [-0.1036,  0.4335]]])

In [13]:
z4 = x / y
z4

tensor([[[-1.3974, -0.6099],
         [-7.1671,  0.1028]],

        [[-0.5976, -0.6361],
         [-1.2471, 67.7289]],

        [[-0.1774,  1.4795],
         [-0.0592,  0.4484]]])
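
Since `*` multiplies matching entries, it is easy to confuse with matrix multiplication. The sketch below uses small hand-picked values (not from the guide) so the difference can be checked by hand, and also shows broadcasting between tensors of different shapes.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

elementwise = x * y          # multiplies matching entries
matrix_product = x @ y       # same as torch.matmul(x, y)

print(elementwise)           # [[ 5., 12.], [21., 32.]]
print(matrix_product)        # [[19., 22.], [43., 50.]]

# Broadcasting: the 1-D row is stretched across both rows of x.
row = torch.tensor([10.0, 20.0])
broadcast_sum = x + row
print(broadcast_sum)         # [[11., 22.], [13., 24.]]
```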

#### Beyond Mathematical operations
You can also perform indexing, slicing, joining and mutating operations on tensors. For example:


In [14]:
# Indexing
x = torch.randn(size=(3,))
x

tensor([ 0.9490,  0.2055, -0.8137])

In [15]:
x[0]

tensor(0.9490)

In [16]:
# Joining (Concatenation)
xx = torch.cat((x, x))
xx

tensor([ 0.9490,  0.2055, -0.8137,  0.9490,  0.2055, -0.8137])
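
The cells above cover indexing and joining. A brief sketch of the remaining operations (slicing, reshaping and in-place mutation) might look like this, using a small made-up tensor:

```python
import torch

x = torch.arange(6, dtype=torch.float32)   # tensor([0., 1., 2., 3., 4., 5.])

# Slicing uses the same syntax as Python lists and NumPy arrays.
first_three = x[:3]

# Reshaping keeps the same elements but arranges them in a new shape.
grid = x.reshape(2, 3)

# Mutating: methods ending in "_" modify the tensor in place.
y = torch.zeros(3)
y.add_(1.0)

# torch.stack joins along a new axis, in contrast to torch.cat.
stacked = torch.stack((y, y))

print(first_three, grid.shape, y, stacked.shape)
```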

See the [documentation](http://pytorch.org/docs/torch.html) for a complete list of the massive number of operations available to you.

### Autograd

Autograd computes the gradients of tensor operations automatically. This is what makes it so useful for the backpropagation algorithm.

For example, suppose:

$$s = \sum_{i} x_{i}w_{i}$$


How do we compute the gradient of s with respect to each element of w?

$$\frac{\partial s}{\partial w_{i}}$$


Easy. You don’t need to do it manually.

In [17]:
x = torch.randn(size=(10,), requires_grad=True)
x

tensor([-0.3055, -0.4000, -0.7373,  1.1278,  0.3979, -1.5794,  0.7775,  0.7185,
        -0.4478,  2.0922], requires_grad=True)

In [18]:
w = torch.randn(size=(10,), requires_grad=True)
w

tensor([ 0.4985, -0.0072,  2.1056,  1.4728, -0.1571, -1.2620, -0.6132, -0.1135,
        -0.8372, -0.0665], requires_grad=True)

In [19]:
z = x * w
z

tensor([-0.1523,  0.0029, -1.5523,  1.6610, -0.0625,  1.9931, -0.4768, -0.0816,
         0.3749, -0.1392], grad_fn=<ThMulBackward>)

In [20]:
z.grad_fn

<ThMulBackward at 0x10b08b6a0>

In [21]:
s = z.sum()
s

tensor(1.5673, grad_fn=<SumBackward0>)

$$\frac{\partial s}{\partial w_{i}}=\frac{\partial}{\partial w_{i}}\sum_{j} x_{j}w_{j}
=\sum_{j} x_{j}\frac{\partial w_{j}}{\partial w_{i}} = x_{i}$$

In [23]:
s.backward()

In [24]:
w.grad

tensor([-0.3055, -0.4000, -0.7373,  1.1278,  0.3979, -1.5794,  0.7775,  0.7185,
        -0.4478,  2.0922])

In [25]:
w.grad.equal(x)

True
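
Two further points are worth checking in code. By symmetry, the gradient of s with respect to x is w. Also, gradients accumulate across `backward()` calls, so they need to be cleared between steps. The sketch below illustrates both under the same setup as above:

```python
import torch

x = torch.randn(size=(10,), requires_grad=True)
w = torch.randn(size=(10,), requires_grad=True)

s = (x * w).sum()
s.backward()

# By symmetry, ds/dx_i = w_i and ds/dw_i = x_i.
grad_x_matches = torch.equal(x.grad, w.detach())
grad_w_matches = torch.equal(w.grad, x.detach())
print(grad_x_matches, grad_w_matches)

# Gradients accumulate, so a second backward() without clearing doubles x.grad.
w.grad.zero_()                      # clear only w.grad
(x * w).sum().backward()
print(torch.equal(w.grad, x.detach()))        # cleared, so still equal to x
print(torch.equal(x.grad, 2 * w.detach()))    # not cleared, so now 2 * w
```

This accumulation is the reason training loops call `zero_grad()` before each backward pass.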

### Layers
A PyTorch layer is a Python class that represents a neural network layer. Layers are built on top of tensors, tensor operations and autograd.

Available layers:
1. Convolution Layers
2. Pooling Layers
3. Padding Layers
4. Normalization Layers
5. Recurrent layers
6. Linear layers
7. Dropout Layers
8. Embedding Layers


#### Linear
Applies a linear transformation to the incoming tensor.

In [28]:
from torch import nn

lin = nn.Linear(40, 50)

x = torch.randn(size=(10, 40))

output = lin(x)
output.size()

torch.Size([10, 50])
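
To see what the linear transformation does, we can inspect the layer's parameters and recompute its output by hand as `x @ W.T + b`. This is a small sketch for illustration. The tolerance passed to `torch.allclose` is an arbitrary choice to absorb floating-point rounding.

```python
import torch
from torch import nn

lin = nn.Linear(40, 50)
x = torch.randn(size=(10, 40))

# The weight is stored as (out_features, in_features).
print(lin.weight.shape)   # torch.Size([50, 40])
print(lin.bias.shape)     # torch.Size([50])

# Recompute the layer output manually and compare.
manual = x @ lin.weight.T + lin.bias
matches = torch.allclose(lin(x), manual, atol=1e-5)
print(matches)
```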

See the [documentation](https://pytorch.org/docs/stable/nn.html) for the other layers.

### Activation Functions

PyTorch ships with a number of pre-defined activation functions that we can use.

For example:
1. ReLU (`torch.nn.ReLU`)
2. Sigmoid (`torch.nn.Sigmoid`)
3. Tanh (`torch.nn.Tanh`)
4. Softmax (`torch.nn.Softmax`)

#### Softmax
Applies the softmax function to an n-dimensional input tensor, rescaling its elements so that the elements of the output tensor lie in the range (0, 1) and sum to 1 along the chosen dimension.

$$\text{Softmax}(x_{i}) = \frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}}$$


In [31]:
s = nn.Softmax(dim=0)
x = torch.randn(size=(5,))
x

tensor([-0.3067,  0.0207,  1.6923, -1.2514, -1.9483])

In [32]:
y = s(x)
y

tensor([0.0966, 0.1340, 0.7131, 0.0376, 0.0187])

In [33]:
y.sum()

tensor(1.)
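
The formula can be checked directly against `nn.Softmax`. The sketch below also shows why the `dim` argument matters: for a batch of shape (batch, classes), `dim=1` normalizes each row separately. The batch used here is made-up data.

```python
import torch
from torch import nn

x = torch.randn(size=(5,))

# Softmax by hand: exponentiate, then normalize by the sum.
manual = torch.exp(x) / torch.exp(x).sum()
matches = torch.allclose(manual, nn.Softmax(dim=0)(x))
print(matches)

# For a (batch, classes) input, normalize over dim=1 so that
# each row is its own probability distribution.
batch = torch.randn(size=(4, 3))
probs = nn.Softmax(dim=1)(batch)
print(probs.sum(dim=1))   # each entry is approximately 1
```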

### Loss Functions

PyTorch also ships with pre-defined loss functions that we can use.

For example:
1. MSELoss (`torch.nn.MSELoss`)
2. CrossEntropyLoss (`torch.nn.CrossEntropyLoss`)


#### Cross Entropy Loss
It is useful when training a classification model with N classes. It expects raw, unnormalized scores for each class as input and class indices as targets.


In [78]:
loss = nn.CrossEntropyLoss()
y_hat = torch.randn(3, 5, requires_grad=True)
y = torch.empty(3, dtype=torch.long).random_(5)
output = loss(y_hat, y)
output

tensor(2.3084, grad_fn=<NllLossBackward>)
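
One way to understand `nn.CrossEntropyLoss` is that it combines a log-softmax with the negative log-likelihood loss. The sketch below checks that equivalence on made-up scores and targets, then calls `backward()` to show that the loss produces gradients for its input.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
y_hat = torch.randn(3, 5, requires_grad=True)   # raw scores, not probabilities
y = torch.tensor([1, 0, 4])                     # class indices, not one-hot vectors

ce = nn.CrossEntropyLoss()(y_hat, y)

# Equivalent two-step computation: log-softmax, then negative log-likelihood.
nll = F.nll_loss(F.log_softmax(y_hat, dim=1), y)
matches = torch.allclose(ce, nll)
print(matches)

# The loss is a scalar, so backward() fills y_hat.grad.
ce.backward()
print(y_hat.grad.shape)   # torch.Size([3, 5])
```

Because the softmax is built in, a model trained with this loss should output raw scores rather than probabilities.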

## Practices
We will build a classifier based on a small neural network to predict the species of an iris flower.

Step by Step:
1. Data Preparation
2. Defining a model
3. Training a model
4. Deploying a model

### Data Preparation
In this step, we convert the data to tensors and then split it into a training set and a test set.

In [35]:
from sklearn.datasets import load_iris

from torch.utils.data import Dataset

class IrisDataset(Dataset):
    def __init__(self):
        iris = load_iris()
        self.features, raw_labels = iris.data, iris.target
        self.labels = []
        for i in range(len(raw_labels)):
            if raw_labels[i] == 0:
                self.labels.append([1, 0, 0])
            if raw_labels[i] == 1:
                self.labels.append([0, 1, 0])
            if raw_labels[i] == 2:
                self.labels.append([0, 0, 1])
                
    def __getitem__(self, index):
        feature = torch.Tensor(self.features[index])
        label = torch.Tensor(self.labels[index])
        sample = {"feature": feature, "label": label}
        return sample

    def __len__(self):
        return len(self.features)
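
The chain of `if` statements above builds one-hot labels by hand. As an alternative sketch, not the approach used in this guide, `torch.nn.functional.one_hot` does the same conversion in one call. The `raw_labels` tensor below is a hypothetical stand-in for `iris.target`.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for iris.target: integer class labels in {0, 1, 2}.
raw_labels = torch.tensor([0, 1, 2, 1])

# one_hot returns an int64 tensor; cast to float for losses such as BCELoss.
labels = F.one_hot(raw_labels, num_classes=3).float()
print(labels)
```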

In [36]:
dataset = IrisDataset()

In [38]:
from torch.utils import data

In [39]:
len(dataset)

150

In [40]:
data.random_split(dataset, [112, 38])

[<torch.utils.data.dataset.Subset at 0x10b0ba160>,
 <torch.utils.data.dataset.Subset at 0x10b0ba128>]

In [41]:
training_dataset, testing_dataset = data.random_split(dataset, [112, 38])

In [42]:
training_dataset

<torch.utils.data.dataset.Subset at 0x10b0ba390>
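
Note that `random_split` produces a different split on every call unless it is given a seeded generator. The sketch below makes the split repeatable and shows how a `DataLoader` batches the resulting subset. To keep it self-contained it uses a synthetic `TensorDataset` with the same size as the Iris data, and the seed value 42 is arbitrary.

```python
import torch
from torch.utils import data

# Synthetic stand-in with the same size as the Iris data (150 samples, 4 features).
features = torch.randn(150, 4)
labels = torch.randint(0, 3, (150,))
dataset = data.TensorDataset(features, labels)

# A seeded generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(42)
train_set, test_set = data.random_split(dataset, [112, 38], generator=generator)
print(len(train_set), len(test_set))   # 112 38

# A DataLoader batches a Subset like any other dataset.
loader = data.DataLoader(train_set, batch_size=64, shuffle=True)
batch_sizes = [len(batch_x) for batch_x, _ in loader]
print(batch_sizes)                     # [64, 48]
```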

### Defining a model
In this step, we will define our model.

In [144]:
class IrisClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Parameters
        self.learning_rate = 0.01

        # Define the layers
        self.h1_layer = nn.Linear(4, 10)
        self.tanh = nn.Tanh()
        self.h2_layer = nn.Linear(10, 3)
        # Normalize over the class dimension (dim=1) of a (batch, classes) input
        self.softmax = nn.Softmax(dim=1)

        # Define loss functions
        # BCELoss compares the predicted probabilities with the one-hot labels
        self.loss = nn.BCELoss()

        # Define optimizer
        self.optimizer = torch.optim.SGD(params=self.parameters(),
                                         lr=self.learning_rate)

    def forward(self, x):
        # Hidden layer followed by the tanh non-linearity
        h = self.tanh(self.h1_layer(x))
        # Output layer followed by softmax to get class probabilities
        y_hat = self.softmax(self.h2_layer(h))
        return y_hat

    def backward(self, y_hat, y):
        # Clear gradients left over from the previous step
        self.optimizer.zero_grad()

        # Calculate the loss
        loss = self.loss(y_hat, y)

        # Backward
        loss.backward()

        # Update parameter
        self.optimizer.step()
        return loss.item()

    def predict(self, x):
        y_hat = self.forward(x)
        # Index of the highest probability in each row
        _, predicted = torch.max(y_hat, dim=1)
        return predicted
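
Before training, it helps to confirm that the shapes line up. The following sketch rebuilds the layers defined in `__init__` above (4 inputs, 10 hidden units, 3 classes) as an `nn.Sequential` stack and passes a made-up batch through it. It is a standalone illustration, not a replacement for `IrisClassifier`.

```python
import torch
from torch import nn

# Same layer sizes as above: 4 features -> 10 hidden units -> 3 classes.
sketch = nn.Sequential(
    nn.Linear(4, 10),
    nn.Tanh(),
    nn.Linear(10, 3),
    nn.Softmax(dim=1),   # normalize across classes, one distribution per sample
)

batch = torch.randn(5, 4)          # 5 made-up samples
probs = sketch(batch)
print(probs.shape)                 # torch.Size([5, 3])
print(probs.sum(dim=1))            # each row sums to approximately 1

predicted = probs.argmax(dim=1)    # most likely class per sample
print(predicted.shape)             # torch.Size([5])
```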

### Training a model

In [145]:
max_epoch = 50000
batch_size = 64
# Train on the training split only, so the test set stays unseen
loader = data.DataLoader(dataset=training_dataset,
                         batch_size=batch_size,
                         shuffle=True)

# Initialize a model
model = IrisClassifier()
# Set model to training mode
model.train()

losses = []
for epoch in range(1, max_epoch + 1):
    batch_losses = []
    for batch in loader:
        # Tensors track gradients directly; autograd.Variable is no longer needed
        x = batch["feature"]
        y = batch["label"]

        # Forward & Backward
        y_hat = model(x)
        loss = model.backward(y_hat, y)
        batch_losses.append(loss)
    # Record the mean loss over all batches of this epoch
    losses.append(sum(batch_losses) / len(batch_losses))

# Save the model
output_path = "IrisClassifier.pickle"
torch.save(model.state_dict(), output_path)
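
Since only the `state_dict` is saved, loading it back requires constructing the same architecture first. The sketch below shows that round trip with a hypothetical small model standing in for `IrisClassifier`, a temporary file path, and made-up inputs.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical small model standing in for IrisClassifier.
model = nn.Sequential(nn.Linear(4, 10), nn.Tanh(), nn.Linear(10, 3))

path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

# Rebuild the same architecture, then load the saved parameters into it.
restored = nn.Sequential(nn.Linear(4, 10), nn.Tanh(), nn.Linear(10, 3))
restored.load_state_dict(torch.load(path))
restored.eval()                       # switch to inference mode

x = torch.randn(8, 4)                 # made-up inputs
with torch.no_grad():                 # no gradient tracking at inference time
    same = torch.equal(model(x), restored(x))
print(same)
```

Calling `eval()` and wrapping inference in `torch.no_grad()` is a common pattern when a trained model is used for prediction rather than training.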