## Deep learning with PyTorch

In [1]:
import torch
import numpy as np

## 1. Introduction

### 1.1 What is PyTorch
PyTorch is a Python package that provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autograd system <sup>[1](#myfootnote1)</sup>. These features let PyTorch serve both as a replacement for NumPy that uses the power of GPUs and as a deep learning research platform that provides maximum flexibility and speed.

Unlike TensorFlow, which in its original graph mode uses a static computation graph (SCG), PyTorch supports the creation of dynamic computation graphs (DCG): the graph is built on the fly as the operations run. For a clear and systematic comparison between PyTorch and TensorFlow you may refer to this [blog post](https://awni.github.io/pytorch-tensorflow/).
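
To make the idea of a dynamic graph concrete, here is a minimal sketch (it is not taken from the blog post). The function name `repeated_double` and the threshold `limit` are made up for illustration, and the code assumes PyTorch 0.4 or later, where a tensor created with `requires_grad=True` is tracked by autograd directly.

```python
import torch

def repeated_double(x, limit=100.0):
    # Hypothetical helper: the number of loop iterations depends on the
    # runtime value of x, so a new graph is recorded on every call.
    y = x
    steps = 0
    while y.norm() < limit:
        y = y * 2
        steps += 1
    return y, steps

x = torch.tensor([1.0, 2.0], requires_grad=True)
y, steps = repeated_double(x)

# Back-propagate through however many doublings actually happened.
y.sum().backward()
print(steps)   # number of doublings that were recorded
print(x.grad)  # each entry equals 2 ** steps
```

Ordinary Python control flow decides the shape of the graph here, which is what "dynamic" means in practice.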


### 1.2 Install PyTorch
To install PyTorch, follow the installation procedure available on the [pytorch website](http://pytorch.org/). However, if your machine has a GPU you first need to install the NVIDIA drivers by following [these instructions](). PyTorch comes with its own CUDA runtime libraries, but the driver has to be installed on each machine separately.
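
Once the installation is done, a quick sanity check along the following lines can confirm that the package imports and whether a usable GPU is visible. This is only a sketch: the variable name `cuda_ok` is ours, and the printed values depend on your machine.

```python
import torch

# Version string of the installed package.
print(torch.__version__)

# True only if a CUDA-capable GPU and a working driver are found.
cuda_ok = torch.cuda.is_available()
print(cuda_ok)
```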

## 2. Basic Pytorch Operations

### 2.1 Pytorch Tensors

The main building block of PyTorch is the tensor. So what is a tensor?

**Tensor** is a multi-dimensional matrix containing elements of a single data type. Tensors are very similar to NumPy arrays. However, unlike a NumPy array, a PyTorch tensor can also be placed on a GPU.
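
The two properties above (a single data type, and optional GPU placement) can be inspected directly. The sketch below assumes PyTorch 0.4 or later for `torch.tensor`, `torch.device` and `.to()`; the names `t`, `f` and `f_dev` are only illustrative.

```python
import torch

t = torch.tensor([[1, 2], [3, 4]])          # integer data gives an integer tensor
f = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # float data gives a float tensor
print(t.dtype, f.dtype)

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
f_dev = f.to(device)
print(f_dev.device)
```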

#### 2.1.1 A tensor can be constructed from a Python list or sequence with the **torch.Tensor()** function.

In [2]:
#Create a torch.Tensor object with the given data.  It is a 1D vector
data = [1., 2., 3.]
V = torch.Tensor(data)
print(V)


 1
 2
 3
[torch.FloatTensor of size 3]



In [3]:
# Creates a matrix
data = [[1., 2., 3.], [4., 5., 6]]
M = torch.Tensor(data)
print (M)


 1  2  3
 4  5  6
[torch.FloatTensor of size 2x3]



In [4]:
# Create a 3D tensor of size 2x2x2.
data = [[[1.,2.], [3.,4.]],
          [[5.,6.], [7.,8.]]]
T = torch.Tensor(data)
print (T)


(0 ,.,.) = 
  1  2
  3  4

(1 ,.,.) = 
  5  6
  7  8
[torch.FloatTensor of size 2x2x2]



Vectors and matrices are special cases of tensors, with 1 and 2 dimensions respectively.
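
As a quick check of this, the following sketch rebuilds the three tensors from the cells above and prints their number of dimensions and sizes using `.dim()` and `.size()`.

```python
import torch

V = torch.Tensor([1., 2., 3.])
M = torch.Tensor([[1., 2., 3.], [4., 5., 6.]])
T = torch.Tensor([[[1., 2.], [3., 4.]],
                  [[5., 6.], [7., 8.]]])

# Print the name, number of dimensions and size of each tensor.
for name, t in [("V", V), ("M", M), ("T", T)]:
    print(name, t.dim(), tuple(t.size()))
```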

#### 2.1.2 You can create a tensor with *random data* (drawn from a standard normal distribution) and the supplied dimensionality with **torch.randn()**

In [5]:
x = torch.randn(2, 4)
print(x)


-1.9459 -1.2877 -1.5302 -1.6099
 0.5595  2.0083 -0.2192 -0.0422
[torch.FloatTensor of size 2x4]



In [6]:
y = torch.randn(2, 2, 4)
print(y)


(0 ,.,.) = 
 -0.8481 -0.2042  1.2779 -0.6543
  0.5905  0.3888 -1.4197 -0.6554

(1 ,.,.) = 
 -0.3308 -0.2200  1.5656  0.0879
  0.1800  0.3002  1.9382 -0.5665
[torch.FloatTensor of size 2x2x4]
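

Because the values are random, every run of the cells above prints different numbers. If you need reproducible results, one option is to fix the seed with `torch.manual_seed` first. The sketch below also checks that a large sample has mean close to 0 and standard deviation close to 1; the seed value and sample size are arbitrary choices.

```python
import torch

# Same seed, same random numbers.
torch.manual_seed(0)
a = torch.randn(2, 4)
torch.manual_seed(0)
b = torch.randn(2, 4)
print(torch.equal(a, b))

# A large sample from randn should look like a standard normal.
big = torch.randn(10000)
print(big.mean().item(), big.std().item())
```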



You can also create special tensors, such as tensors filled with ones or zeros.

In [7]:
torch.ones(2, 3)


 1  1  1
 1  1  1
[torch.FloatTensor of size 2x3]

In [8]:
torch.zeros(3, 5)


 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
[torch.FloatTensor of size 3x5]

### 2.2 Numpy Bridge
   
You can easily convert a PyTorch tensor into a NumPy array and vice versa.

In [9]:
import numpy as np

numpy_tensor = np.random.randn(3, 4)
print(numpy_tensor)

[[ 0.38679829 -0.20805581  1.09448074 -3.23681379]
 [ 2.14710092  0.50938828  1.65233662  0.75678135]
 [ 0.58488052 -1.54270815 -1.00415993  0.04386759]]


In [10]:
# convert numpy array to pytorch tensor (torch.Tensor() casts the float64 data to float32)
pytorch_tensor = torch.Tensor(numpy_tensor)
print(pytorch_tensor)


 0.3868 -0.2081  1.0945 -3.2368
 2.1471  0.5094  1.6523  0.7568
 0.5849 -1.5427 -1.0042  0.0439
[torch.FloatTensor of size 3x4]



In [11]:
# convert torch tensor to numpy representation
pytorch_tensor.numpy()

array([[ 0.3867983 , -0.20805581,  1.0944808 , -3.2368138 ],
       [ 2.147101  ,  0.50938827,  1.6523366 ,  0.75678134],
       [ 0.58488053, -1.5427082 , -1.0041599 ,  0.04386758]],
      dtype=float32)
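

An alternative way to cross the bridge is `torch.from_numpy`, which keeps the array's data type and shares memory with the array instead of making a converted copy. The sketch below illustrates the sharing in both directions; the array names are only for illustration.

```python
import numpy as np
import torch

a = np.zeros((2, 3))        # NumPy default dtype is float64
t = torch.from_numpy(a)     # keeps float64 and shares memory with a
print(t.dtype)

a[0, 0] = 7.0               # a change to the array...
print(t[0, 0].item())       # ...is visible through the tensor

back = t.numpy()            # .numpy() also shares memory (CPU tensors)
back[1, 2] = -1.0
print(a[1, 2])
```

Sharing memory avoids a copy, but it also means that in-place changes on one side silently affect the other.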

### 2.3 Operations with Tensors

You can operate on tensors in the ways you would expect.

In [12]:
x = torch.Tensor([ 1., 2., 3. ])
y = torch.Tensor([ 4., 5., 6. ])
z = x + y
print (z)


 5
 7
 9
[torch.FloatTensor of size 3]



In [13]:
# You can also use
z = torch.add(x, y)
print(z)


 5
 7
 9
[torch.FloatTensor of size 3]
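

A few more common operations, as a small sketch: element-wise product, dot product, matrix-vector product and an in-place addition (in PyTorch, methods ending in an underscore modify the tensor in place). The matrix `M` is the one from section 2.1.1.

```python
import torch

x = torch.Tensor([1., 2., 3.])
y = torch.Tensor([4., 5., 6.])
M = torch.Tensor([[1., 2., 3.], [4., 5., 6.]])

prod = x * y              # element-wise product
dot = torch.dot(x, y)     # dot product of two 1D tensors
mv = torch.matmul(M, x)   # (2x3 matrix) times (3-vector) gives a 2-vector
print(prod, dot, mv)

x.add_(y)                 # in-place: x itself now holds x + y
print(x)
```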



#### Reshaping Tensors

The **.view()** method reshapes a tensor. This method receives heavy use, because many neural network components expect their inputs to have a certain shape. Often you will need to reshape your data before passing it to the component.


In [14]:
x = torch.randn(1, 3, 4)
x


(0 ,.,.) = 
 -0.4414  0.4427  0.9195  0.7832
 -0.4930  1.3495  0.6200  0.7397
 -0.3259 -0.4232 -0.0343  0.4567
[torch.FloatTensor of size 1x3x4]

In [15]:
# Reshape to 1 row, 12 columns
x.view(1, 12) 



Columns 0 to 9 
-0.4414  0.4427  0.9195  0.7832 -0.4930  1.3495  0.6200  0.7397 -0.3259 -0.4232

Columns 10 to 11 
-0.0343  0.4567
[torch.FloatTensor of size 1x12]

In [16]:
# Reshape to 1x6x2 
x.view(1, 6, 2) 


(0 ,.,.) = 
 -0.4414  0.4427
  0.9195  0.7832
 -0.4930  1.3495
  0.6200  0.7397
 -0.3259 -0.4232
 -0.0343  0.4567
[torch.FloatTensor of size 1x6x2]
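

Two details of `.view()` are worth a short sketch. First, one dimension can be given as `-1` and PyTorch infers it from the number of elements. Second, a view shares the underlying data with the original tensor rather than copying it. The example uses `torch.arange` instead of random data so the values are easy to follow.

```python
import torch

x = torch.arange(12.0).view(1, 3, 4)   # values 0..11 arranged as 1x3x4

flat = x.view(-1)          # -1 is inferred: 12 elements
two_rows = x.view(2, -1)   # -1 is inferred: 6 columns
print(flat.size(), two_rows.size())

# A view shares data with the original tensor.
flat[0] = 100.0
print(x[0, 0, 0].item())
```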

## Autograd and Variables


**Autograd** provides a mechanism to compute error gradients and back-propagate them through the computational graph. The **Variable class** is the main component of this autograd system in PyTorch.

**Variables** are wrappers around tensors that record the chain of operations applied to them. They play a role similar to placeholders in TensorFlow, but unlike a placeholder, a PyTorch Variable always holds data. Variables are useful for building a computational graph and computing gradients automatically.

Every Variable instance has two attributes: **.data**, which contains the underlying tensor, and **.grad**, which will contain the gradients for that tensor once they have been computed. Note that since PyTorch 0.4 Variable and Tensor have been merged: a tensor created with requires_grad=True is tracked by autograd directly, and wrapping it in Variable is no longer necessary.

### NOTE:
**Computation graph** is simply a specification of how your data is combined to give you the output. Since the graph totally specifies what parameters were involved with which operations, it contains enough information to compute derivatives. 



For example, if we have $y = wx + b$, it is clear that $\frac{\partial y}{\partial x} = w$, $\frac{\partial y}{\partial b} = 1$ and $\frac{\partial y}{\partial w} = x$.


To compute the derivatives, you can call **.backward()** on a Variable. If the Variable is a scalar (i.e. it holds a one-element tensor), you don't need to specify any arguments to backward(). However, if it has more elements, you need to specify a gradient argument that is a tensor of matching shape.
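
The non-scalar case can be sketched as follows. This assumes PyTorch 0.4 or later, where a plain tensor with `requires_grad=True` takes the place of a Variable. Passing a tensor of ones as the `gradient` argument gives the same result as calling `y.sum().backward()`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x                      # y has three elements, so it is not a scalar

# y.backward() alone would raise an error here; supply a tensor
# with the same shape as y instead.
y.backward(gradient=torch.ones_like(y))

print(x.grad)                  # dy_i/dx_i = 2 * x_i
```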

### Example 1:

In [17]:
from torch.autograd import Variable
# Create tensors.
x = Variable(torch.Tensor([1]), requires_grad=True)
w = Variable(torch.Tensor([2]), requires_grad=True)
b = Variable(torch.Tensor([3]), requires_grad=True)

# Build a computational graph.
y = w * x + b    # y = 2 * x + 3

# Compute gradients.
y.backward()

# Print out the gradients.
print(x.grad)    # x.grad = 2 
print(w.grad)    # w.grad = 1 
print(b.grad)    # b.grad = 1 

Variable containing:
 2
[torch.FloatTensor of size 1]

Variable containing:
 1
[torch.FloatTensor of size 1]

Variable containing:
 1
[torch.FloatTensor of size 1]



### Example 2:

In [18]:
# Create a tensor Variable.
x = Variable(torch.ones(1, 1), requires_grad=True) 

# perform operations
y = x + 2
z = y * y * 3

# find gradient
z.backward()

#print gradient
print(x.grad)

Variable containing:
 18
[torch.FloatTensor of size 1x1]



The gradient of x is equal to 18. This is equivalent to:
$$
z = 3y^2 \text{ where } y = x + 2 \Rightarrow z = 3(x + 2)^2
$$

Thus: $$ \frac{dz}{dx} = 6(x +2) = 6(1+2) = 18$$
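
One way to gain confidence in a gradient computed by autograd is to compare it with a numerical estimate. The sketch below does this for the same function using a central finite difference. The helper `f` and the step size `eps` are our own choices, and the code assumes PyTorch 0.4 or later.

```python
import torch

def f(x):
    # Same function as in Example 2: z = 3 * (x + 2)^2
    return 3 * (x + 2) ** 2

# Gradient from autograd (float64 for a tighter comparison).
x = torch.tensor(1.0, dtype=torch.float64, requires_grad=True)
z = f(x)
z.backward()
autograd_grad = x.grad.item()

# Central finite-difference estimate with plain Python floats.
eps = 1e-6
numeric_grad = (f(1.0 + eps) - f(1.0 - eps)) / (2 * eps)

print(autograd_grad, numeric_grad)
```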

## 3. Deep Learning Building Blocks

Deep learning consists of composing linear modules with non-linear modules. The introduction of non-linearities allows for powerful models. This section shows, given linear and non-linear modules, how to define an objective function and train a deep learning model in PyTorch.

Neural networks can be constructed using the **torch.nn** package.

In [19]:
import torch.nn as nn
import torch.nn.functional as F

### 3.1 Linear function (Affine Maps)

This is the core building block of deep learning, defined as the function:
$$ f(\mathbf{x}) = \mathbf{w}\mathbf{x} + \mathbf{b}$$ for a matrix $\mathbf{w}$ and vectors $\mathbf{x}, \mathbf{b}$. The linear function is implemented in torch.nn as:

**torch.nn.Linear(in_features, out_features, bias=True)**


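Note: PyTorch maps the rows of the input instead of the columns. Each row of the input is treated as one sample, so an input of size (N, in_features) gives an output of size (N, out_features).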
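
To see what this row-wise mapping means, the sketch below compares the output of `nn.Linear` with the same computation written out by hand using the layer's `weight` and `bias` attributes. It assumes PyTorch 0.4 or later (plain tensors, `@` for matrix multiplication); the seed is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(2, 1, bias=True)
data = torch.rand(10, 2)            # 10 samples, 2 features each

out = lin(data)

# weight has size (out_features, in_features), so it is transposed
# before multiplying the rows of the input.
manual = data @ lin.weight.t() + lin.bias

print(out.size(), torch.allclose(out, manual))
```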

In [20]:
lin = nn.Linear(2, 1, bias=True)
data = Variable(torch.rand(10, 2))
print(lin(data))

Variable containing:
-0.9201
-0.8693
-0.8491
-0.6123
-1.1473
-0.9041
-0.7474
-0.9685
-0.6398
-0.7985
[torch.FloatTensor of size 10x1]



### 3.2 Non-Linear Functions (Activation Functions)

The most commonly used non-linear functions are sigmoid, tanh and ReLU.

In [21]:
print(F.relu(data))

Variable containing:
 0.3890  0.5418
 0.1263  0.4530
 0.7522  0.4513
 0.1511  0.0656
 0.5568  0.8930
 0.5517  0.5252
 0.0433  0.2650
 0.8734  0.6374
 0.5678  0.1264
 0.2168  0.3501
[torch.FloatTensor of size 10x2]



In [22]:
print(F.sigmoid(data))

Variable containing:
 0.5960  0.6322
 0.5315  0.6114
 0.6797  0.6109
 0.5377  0.5164
 0.6357  0.7095
 0.6345  0.6284
 0.5108  0.5659
 0.7054  0.6542
 0.6383  0.5316
 0.5540  0.5866
[torch.FloatTensor of size 10x2]
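

Since `torch.rand` draws values from [0, 1), the ReLU output above is identical to its input. The effect of each activation is easier to see on an input that also contains negative values. The sketch below uses `torch.relu`, `torch.sigmoid` and `torch.tanh` (available in recent PyTorch versions) on a hand-picked tensor.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

relu = torch.relu(x)        # max(0, x): negatives are clipped to zero
sig = torch.sigmoid(x)      # 1 / (1 + exp(-x)): squashes into (0, 1)
tanh = torch.tanh(x)        # squashes into (-1, 1)

print(relu)
print(sig)
print(tanh)
```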



## Creating a neural network

To create a neural network in PyTorch, we define a class that inherits from the **nn.Module** base class, which gives us all of the functionality of **nn.Module**.

In [24]:
class Model(torch.nn.Module):

    def __init__(self, nb_feature, nb_output):
        """
        In the constructor we instantiate one nn.Linear module
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(nb_feature, nb_output)  
        
        
    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        y_pred = self.linear(x)
        return y_pred

- In the class definition, you can see the inheritance from the base class **torch.nn.Module**.
- Then, in the first line of the class initialization (the **\_\_init\_\_()** method) we have the required Python **super()** call, which runs the constructor of the base **torch.nn.Module** class.
- The next line defines a linear layer with **torch.nn.Linear**, with the first argument being the number of input features and the second argument being the number of outputs.
- After that we need to define how data flows through our network. This is done in the **forward()** method, to which we supply the input data x as the primary argument.
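
The same pattern extends to networks with more than one layer. The sketch below is a hypothetical variation of the model above, not part of the original notebook: the class name `TwoLayerNet`, the argument `nb_hidden` and the hidden size of 8 are all made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerNet(nn.Module):
    """Hypothetical example: linear -> ReLU -> linear."""

    def __init__(self, nb_feature, nb_hidden, nb_output):
        super(TwoLayerNet, self).__init__()
        self.hidden = nn.Linear(nb_feature, nb_hidden)
        self.out = nn.Linear(nb_hidden, nb_output)

    def forward(self, x):
        # Non-linearity between the two linear maps.
        h = F.relu(self.hidden(x))
        return self.out(h)

net = TwoLayerNet(2, 8, 1)
y_pred = net(torch.randn(10, 2))
print(y_pred.size())
```

Modules assigned as attributes in the constructor are registered automatically, so their weights show up in `net.parameters()`.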

The next step is to create an instance of this network architecture and move it to the GPU with the cuda() method if one is available. Suppose we have the following data.

In [25]:
# Create tensors.
x = Variable(torch.randn(10, 2))
y = Variable(torch.randn(10, 1))

In [26]:
model = Model(2, 1)
if torch.cuda.is_available():
    model = model.cuda()
    x = x.cuda()
    y = y.cuda()

We can check the instance of our model:

In [27]:
print(model)

Model(
  (linear): Linear(in_features=2, out_features=1)
)
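

Besides printing the module, you can list its learnable parameters with `named_parameters()`. The sketch below redefines the same one-layer model so that it runs on its own; the variable `n_params` is just for illustration.

```python
import torch

class Model(torch.nn.Module):
    def __init__(self, nb_feature, nb_output):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(nb_feature, nb_output)

    def forward(self, x):
        return self.linear(x)

model = Model(2, 1)

# Each parameter is named after the attribute that holds it.
for name, param in model.named_parameters():
    print(name, tuple(param.size()))

# Total number of learnable values: 2 weights + 1 bias.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```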


### Training the network
To train this model we need to set up an optimizer and a loss criterion:

In [28]:
# reduction='sum' replaces the deprecated size_average=False (PyTorch >= 0.4.1)
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

- In the first line, we set our loss criterion to be the **MSE** loss, summed over all elements rather than averaged. For details on the different loss functions you may refer to the [pytorch documentation](http://pytorch.org/docs/master/nn.html#loss-functions).
- In the second line, we create a stochastic gradient descent optimizer: we specify the learning rate and supply the model parameters using the **model.parameters()** method of the base **torch.nn.Module** class that we inherit from.

### In the training process:

- First we run **optimizer.zero_grad()**: this zeroes (resets) all the gradients of the model parameters, so that it is ready for the next back-propagation pass. PyTorch accumulates gradients across backward() calls, so you have to remember to reset them explicitly; in other libraries this is performed implicitly.
- Then we pass the input data into the model with **pred = model(x)**: this will call the **forward()** method in our model class.
- After that we get the MSE loss between the output of our network and the target data as **loss = criterion(pred, y)**.

In [29]:
optimizer.zero_grad()
pred = model(x)
loss = criterion(pred, y)
# loss.item() returns the loss as a Python float (PyTorch >= 0.4; older versions used loss.data[0])
print('loss: ', loss.item())

loss:  12.298934936523438


- Then we run a back-propagation pass from the loss Variable backwards through the network using **loss.backward()**.
- Finally we tell PyTorch to execute a gradient descent step based on the gradients calculated during the **.backward()** operation using **optimizer.step()**.


In [31]:
loss.backward()
optimizer.step()
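
The cells above perform a single update. In practice the same four steps are repeated many times. The following is a minimal sketch of such a loop, under a few assumptions: it uses PyTorch 0.4.1 or later (plain tensors, `reduction='sum'`, `loss.item()`), it uses `torch.nn.Linear` directly in place of the `Model` class to stay short, and the seed and the number of epochs (100) are arbitrary choices.

```python
import torch

torch.manual_seed(0)
x = torch.randn(10, 2)
y = torch.randn(10, 1)

model = torch.nn.Linear(2, 1)
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for epoch in range(100):
    optimizer.zero_grad()        # reset gradients from the previous step
    pred = model(x)              # forward pass
    loss = criterion(pred, y)    # compute the loss
    loss.backward()              # back-propagate
    optimizer.step()             # update the parameters
    losses.append(loss.item())

print('first loss:', losses[0])
print('last loss: ', losses[-1])
```

Since the targets here are random noise, the loss cannot go to zero. The point is only that it decreases as the linear model fits the data as well as it can.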

## References:

- [Adventures in machine learning](http://adventuresinmachinelearning.com/pytorch-tutorial-deep-learning/)
- [DeepLearningZeroToAll](https://github.com/hunkim/DeepLearningZeroToAll)

<a name="myfootnote1">1</a>: http://pytorch.org/about/