## Note:
To run this notebook as slides, download and install RISE from this link: https://github.com/damianavila/RISE

# PyTorch Tutorial

In this presentation, I will introduce PyTorch, a relatively new deep learning framework that is gaining adoption thanks to its fast computation and ease of use.

Then, I will give a PyTorch implementation of text classification using a CNN.

Let's get started!

## Tensors

A Tensor is an n-dimensional array, e.g. a 1D Tensor is a vector, while a 2D Tensor is a matrix.


Tensors are the fundamental building blocks of PyTorch. PyTorch provides tensors of different types and also supports accelerated tensor computation on the GPU.
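To make the dimensionality concrete, here is a minimal sketch that builds a 1D and a 2D tensor and inspects their number of dimensions, shape, and element type. The variable names are illustrative, and the block assumes a recent PyTorch release (0.4 or later) where `torch.tensor` is available.

```python
import torch

vector = torch.tensor([1.0, 2.0, 3.0])            # 1D tensor: a vector
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])   # 2D tensor: a matrix

print(vector.dim(), tuple(vector.shape))  # number of dimensions and shape
print(matrix.dim(), tuple(matrix.shape))
print(matrix.dtype)                       # element type of the tensor
```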

## Initializing Tensors
As a first example, we will see how to create a PyTorch Tensor from a simple Python list.

In [9]:
import torch

x = torch.LongTensor([[1,2,3],[2,3,4]])
print(x)


 1  2  3
 2  3  4
[torch.LongTensor of size 2x3]



PyTorch also integrates well with `numpy`, so we can pass in a numpy `ndarray` to create a Tensor.

In [4]:
import numpy as np

y = np.zeros([2,3]) # np.ndarray([2,3]) would leave the values uninitialized
x = torch.Tensor(y)
print(x)
z = x.numpy()
print(z)


 0  0  0
 0  0  0
[torch.FloatTensor of size 2x3]

[[ 0.  0.  0.]
 [ 0.  0.  0.]]
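As a side note, `torch.from_numpy` is another way to do this conversion. The sketch below (with illustrative names, assuming a recent PyTorch release) suggests that the resulting tensor shares memory with the source array, so an in-place change on one side is visible on the other.

```python
import numpy as np
import torch

arr = np.zeros((2, 3), dtype=np.float32)
t = torch.from_numpy(arr)   # shares memory with arr, no copy is made
t[0, 0] = 5.0               # in-place write through the tensor
print(arr[0, 0])            # the NumPy array sees the change

back = t.numpy()            # for a CPU tensor, also a view of the same memory
arr[1, 2] = 7.0             # in-place write through the array
print(back[1, 2], t[1, 2].item())
```

When an independent copy is needed, calling `.clone()` on the tensor or `.copy()` on the array breaks this link.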


If a GPU is available, we can move our Tensor to it with the `cuda()` method, so that subsequent computations on the Tensor run on the GPU:

In [36]:
if torch.cuda.is_available():
    x = x.cuda()
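In more recent PyTorch releases (0.4 and later), the same idea is often written in a device-agnostic way. The following is a minimal sketch under that assumption; the variable names are illustrative.

```python
import torch

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.zeros(2, 3)
x = x.to(device)                      # returns x itself if it is already there
y = torch.ones(2, 3, device=device)   # create a tensor directly on the device

z = x + y                             # operands must live on the same device
print(z.device.type)
```

The benefit of this style is that the same script runs unchanged on machines with and without a GPU.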

The library also offers many methods of its own to create new tensors.

In [33]:
a = torch.randn(4,5) # Numbers are taken from a normal distribution with mean=0, var=1
print(a)


 0.5994  1.3048  0.9864 -0.7510 -0.2672
-0.4711  1.3678 -0.2598  1.2567 -0.4186
 0.7078 -0.6801 -0.0614 -1.1212 -0.3434
 0.7537  0.0479 -0.1500  0.1030  0.1707
[torch.FloatTensor of size 4x5]



In [34]:
b = torch.zeros(3,4)
print(b)


 0  0  0  0
 0  0  0  0
 0  0  0  0
[torch.FloatTensor of size 3x4]



## Operations on Tensors
As a framework that puts tensors first, PyTorch offers a wide range of tensor operations.

Most of Python's arithmetic operators also work with `torch.Tensor`; PyTorch's library also provides functions that do the same thing.

In [35]:
x = torch.Tensor([[1,2,3]])
y = torch.Tensor([[2,3,4]])
z = x + y
z = torch.add(x,y) # This is equivalent!
print(z)


 3  5  7
[torch.FloatTensor of size 1x3]
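The same operator/function pairing holds for multiplication, with one caveat worth illustrating: `*` is element-wise, while a matrix product needs a separate call. A small sketch, assuming a recent PyTorch release and illustrative variable names:

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0]])   # size 1x3
y = torch.tensor([[2.0, 3.0, 4.0]])   # size 1x3

elementwise = x * y                    # same as torch.mul(x, y), size 1x3
matrix_product = torch.mm(x, y.t())    # (1x3) times (3x1) gives size 1x1

print(elementwise)
print(matrix_product)
```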



In addition to common math expressions like addition and multiplication, PyTorch also offers a wide variety of other operations on Tensors that make them easy to work with.


As an example, here are a few operations that I use often.

The concatenation operator joins multiple tensors along a specified dimension.

If the dimension is not specified, it defaults to 0.

In [5]:
x = torch.Tensor([[1,2,3]])
y = torch.Tensor([[2,3,4]])
z = torch.cat((x,y)) # Concatenate along dimension 0, i.e. stack the rows
print(z)
t = torch.cat((x,y), dim=1)
print(t)


 1  2  3
 2  3  4
[torch.FloatTensor of size 2x3]


 1  2  3  2  3  4
[torch.FloatTensor of size 1x6]



The `view` operation lets us reshape our Tensor at will, as long as the total number of elements stays the same.

In [23]:
x = torch.randn(2,2,4)
y = x.view(4,4)
print(x)
print(y)


(0 ,.,.) = 
  0.1394 -1.0874 -0.2238  0.1953
 -0.0727  0.5987 -0.0808  0.2204

(1 ,.,.) = 
 -0.0464  1.4881 -0.8613 -0.6313
  0.9307 -0.7568  0.0781 -0.2241
[torch.FloatTensor of size 2x2x4]


 0.1394 -1.0874 -0.2238  0.1953
-0.0727  0.5987 -0.0808  0.2204
-0.0464  1.4881 -0.8613 -0.6313
 0.9307 -0.7568  0.0781 -0.2241
[torch.FloatTensor of size 4x4]



It can also infer the missing dimension in the resulting Tensor if we pass in -1.

In [24]:
# x has size 2x2x4
z = x.view(-1,8)
print(z)


 0.1394 -1.0874 -0.2238  0.1953 -0.0727  0.5987 -0.0808  0.2204
-0.0464  1.4881 -0.8613 -0.6313  0.9307 -0.7568  0.0781 -0.2241
[torch.FloatTensor of size 2x8]
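The inference is plain arithmetic: the tensor holds 2 x 2 x 4 = 16 elements, so with 8 columns the missing dimension must be 16 / 8 = 2. The sketch below (illustrative names, assuming a recent PyTorch release) checks this and also shows that `view` returns a tensor sharing storage with the original rather than a copy.

```python
import torch

x = torch.arange(16.0).view(2, 2, 4)   # 16 elements, size 2x2x4
z = x.view(-1, 8)                      # -1 is inferred as 16 / 8 = 2
print(tuple(z.shape))

# view does not copy: z and x share the same underlying storage,
# so an in-place write through z is visible in x.
z[0, 0] = 100.0
print(x[0, 0, 0].item())
```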



Finally, the `squeeze` operator removes all "trivial" dimensions, i.e. dimensions that have size 1.


In [39]:
x = torch.randn(1,2,3) # x size 1x2x3
print(x)
y = x.squeeze() # y will have size 2x3
print(y)


(0 ,.,.) = 
 -1.1949  0.3846 -1.0274
  1.1751  1.3324 -0.2114
[torch.FloatTensor of size 1x2x3]


-1.1949  0.3846 -1.0274
 1.1751  1.3324 -0.2114
[torch.FloatTensor of size 2x3]




By contrast, the `unsqueeze` operator adds a trivial dimension to our Tensor.

In [40]:
# Remember that x has size 1x2x3
z = x.unsqueeze(0) # z will have size 1x1x2x3
print(z)


(0 ,0 ,.,.) = 
 -1.1949  0.3846 -1.0274
  1.1751  1.3324 -0.2114
[torch.FloatTensor of size 1x1x2x3]




These operators can be used when we need to reshape our Tensor to have the right size for certain neural networks.
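One common case is adding a batch dimension to a single example, since layers are usually fed batched input of size (batch, features). The following is a sketch with hypothetical sizes (20 input features, 5 outputs), assuming a recent PyTorch release.

```python
import torch
import torch.nn as nn

layer = nn.Linear(20, 5)       # maps 20 input features to 5 outputs
sample = torch.randn(20)       # a single example, size 20

batch = sample.unsqueeze(0)    # add a batch dimension: size 1x20
out = layer(batch)             # size 1x5
print(tuple(out.shape))

flat = out.squeeze(0)          # drop the batch dimension again: size 5
print(tuple(flat.shape))
```

Passing the dimension explicitly to `squeeze(0)` removes only that dimension, which avoids accidentally dropping other size-1 dimensions.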

## Autograd mechanism and automatic differentiation
One of the neat things about PyTorch is that the framework supports automatic differentiation: we only define how a quantity is computed, and PyTorch can compute its gradients for us.

This is done via the `Variable` class in `torch.autograd`.

In [44]:
from torch.autograd import Variable
x = Variable(torch.Tensor([[1,2,3],[2,3,1]]))
y = Variable(torch.Tensor([[2,2,2],[1,1,1]]))
z = x + y
print(z)

Variable containing:
 3  4  5
 3  4  2
[torch.FloatTensor of size 2x3]



Notice that, compared to `z.data`, which is a Tensor, printing `z` shows an extra line saying `Variable containing:`.

In [45]:
print(z)
print(z.data)

Variable containing:
 3  4  5
 3  4  2
[torch.FloatTensor of size 2x3]


 3  4  5
 3  4  2
[torch.FloatTensor of size 2x3]



`z` also knows how it was created. This is useful for automatic differentiation.

In [6]:
print(z.creator)

<torch.autograd._functions.basic_ops.Add object at 0x7fbe08e07e48>


As with Tensors, we can also move our Variables to the GPU:

In [32]:
if torch.cuda.is_available():
    z = z.cuda()

So far, we haven't computed any gradients. We can ask PyTorch to calculate gradients for each `Variable` automatically via the `backward()` method.

In [46]:
# backward() only works on a scalar, so we need to create one
s = z.sum()
print(s)
# Since we haven't called backward() yet, no gradient is found for x
print(x.grad)

Variable containing:
 21
[torch.FloatTensor of size 1]

None


In [47]:
# Now we call backward()
s.backward()
print(x.grad)

Variable containing:
 1  1  1
 1  1  1
[torch.FloatTensor of size 2x3]
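The all-ones result follows from the definition: `s` is the sum of every entry of `x + y`, so its derivative with respect to each entry of `x` is 1. For a less trivial check, the sketch below differentiates the sum of `v**2 + 3*v`, whose analytic gradient is `2*v + 3`. It is written against the newer tensor API (PyTorch 0.4 and later), where a tensor created with `requires_grad=True` tracks gradients directly; the names `v`, `total`, and `expected` are illustrative.

```python
import torch

v = torch.tensor([[1.0, 2.0, 3.0], [2.0, 3.0, 1.0]], requires_grad=True)

total = (v ** 2 + 3 * v).sum()   # scalar built from v
total.backward()                 # fills v.grad with d(total)/dv

expected = 2 * v.detach() + 3    # analytic gradient: 2v + 3
print(v.grad)
print(torch.allclose(v.grad, expected))
```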



PyTorch accumulates the gradient after each call of `backward()`. Thus, if we call `s.backward()` again, the new gradient will be added to the existing `x.grad`.

In [48]:
# x.grad will now double in value
s.backward()
print(x.grad)
# x.grad will now triple
s.backward()
print(x.grad)

Variable containing:
 2  2  2
 2  2  2
[torch.FloatTensor of size 2x3]

Variable containing:
 3  3  3
 3  3  3
[torch.FloatTensor of size 2x3]



To clear gradients, we need to set them to 0. Here's how to do it.

In [21]:
x.grad.data.zero_()
print(x.grad)

Variable containing:
 0  0  0
 0  0  0
[torch.FloatTensor of size 2x3]



Later, we will see another way to zero the gradients in our `Variable`.

## Building a Neural Network
PyTorch offers many common building blocks for deep learning architectures, such as fully-connected, convolution, pooling, and embedding layers.



We can create our own deep learning model using such building blocks in a very flexible and convenient way.

All we need to do is create a model class that extends `torch.nn.Module` and implements the `__init__()` and `forward()` methods. The former reads in any parameters and defines the layers we need; the latter specifies how the forward pass is computed.

In [26]:
import torch.nn as nn
# nn.functional contains many common functions in deep learning, e.g. relu, sigmoid, ...
import torch.nn.functional as F

# A simple feed forward network with one hidden layer
class NeuralNet(nn.Module):
    def __init__(self, init_dim, hid_dim, out_dim):
        super(NeuralNet, self).__init__()
        self.layer1 = nn.Linear(init_dim, hid_dim)
        self.layer2 = nn.Linear(hid_dim, out_dim)
        
    def forward(self, inputs):
        layer1_outputs = F.relu(self.layer1(inputs))
        layer2_outputs = F.log_softmax(self.layer2(layer1_outputs))
        
        return layer2_outputs
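To see what such a module does once instantiated, here is a usage sketch. It restates the model compactly so that the block runs on its own, assumes a recent PyTorch release, and passes `dim=1` to `log_softmax` explicitly (an assumption about the intended class dimension). The batch size of 4 is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    """Same structure as the model above, restated so this block runs alone."""
    def __init__(self, init_dim, hid_dim, out_dim):
        super().__init__()
        self.layer1 = nn.Linear(init_dim, hid_dim)
        self.layer2 = nn.Linear(hid_dim, out_dim)

    def forward(self, inputs):
        hidden = F.relu(self.layer1(inputs))
        # dim=1 normalizes over the class dimension (an explicit choice here)
        return F.log_softmax(self.layer2(hidden), dim=1)

model = NeuralNet(20, 10, 5)
outputs = model(torch.randn(4, 20))   # calling the model invokes forward()
print(tuple(outputs.shape))           # one row of 5 log-probabilities per input
print(outputs.exp().sum(dim=1))       # each row of probabilities sums to ~1

# Parameters registered automatically by the two Linear layers
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
```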

## Training a Network

Now that we have defined our model, it is time to do some training! In addition to the model, we need to specify how to train it: which loss function to use and which optimization method, e.g. stochastic gradient descent or a more advanced optimizer such as `Adam` or `RMSprop`.

As we shall see below, PyTorch's automatic differentiation makes our training procedure much easier: the backward pass is as simple as calling `backward()` on our loss and `step()` on our optimizer to update the parameters.

In [29]:
import torch.optim as optim

model = NeuralNet(20,10,5)
inputs, labels = Variable(torch.randn(10,20)), Variable(torch.LongTensor(10).zero_())

if torch.cuda.is_available(): # GPU support
    model.cuda() # Will move all model parameters to GPU
    inputs, labels = inputs.cuda(), labels.cuda() # Will move training data to GPU
    
loss_function = nn.NLLLoss() # Negative log likelihood loss
optimizer = optim.SGD(model.parameters(), lr=1e-3) # Stochastic gradient descent
num_epochs = 10

for epoch in range(num_epochs):
    optimizer.zero_grad() # Zero the gradients of all model parameters
    # model.zero_grad() is equivalent
    outputs = model(inputs)
    loss = loss_function(outputs, labels)
    print(loss)
    
    loss.backward()
    optimizer.step()

Variable containing:
 1.2490
[torch.FloatTensor of size 1]

Variable containing:
 1.2477
[torch.FloatTensor of size 1]

Variable containing:
 1.2464
[torch.FloatTensor of size 1]

Variable containing:
 1.2451
[torch.FloatTensor of size 1]

Variable containing:
 1.2438
[torch.FloatTensor of size 1]

Variable containing:
 1.2425
[torch.FloatTensor of size 1]

Variable containing:
 1.2411
[torch.FloatTensor of size 1]

Variable containing:
 1.2398
[torch.FloatTensor of size 1]

Variable containing:
 1.2385
[torch.FloatTensor of size 1]

Variable containing:
 1.2372
[torch.FloatTensor of size 1]
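After training, a natural next step is to check predictions. The sketch below is one way this might look in a recent PyTorch release: it uses an `nn.Sequential` stand-in for the model, a hypothetical learning rate of 0.1 and 50 epochs, and the same kind of random inputs with all-zero labels as above. None of these values come from the original experiment.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so repeated runs behave the same

# Stand-in for the NeuralNet above, built from the same kinds of layers
model = nn.Sequential(nn.Linear(20, 10), nn.ReLU(),
                      nn.Linear(10, 5), nn.LogSoftmax(dim=1))
inputs = torch.randn(10, 20)
labels = torch.zeros(10, dtype=torch.long)   # every example has class 0

loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)  # hypothetical learning rate

losses = []
for epoch in range(50):
    optimizer.zero_grad()
    loss = loss_function(model(inputs), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())   # .item() extracts a plain Python float

# Evaluation: no gradients are needed, so disable tracking
model.eval()
with torch.no_grad():
    predictions = model(inputs).argmax(dim=1)   # most likely class per row
accuracy = (predictions == labels).float().mean().item()
print(losses[0], losses[-1], accuracy)
```

Note that this measures accuracy on the training data itself, so it only confirms that the optimization loop works; a real evaluation would use held-out data.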



Finally, we are done with the tutorial!

If you don't understand everything yet, don't worry! You can always go to [PyTorch's documentation](http://pytorch.org/docs/master/) or check out one of PyTorch's [many tutorials](http://pytorch.org/tutorials/) on its official site.

In the following section, I will present an application of CNN to text classification, implemented in PyTorch.

# Text Classification with CNN

## Motivation

Convolutional Neural Networks (CNNs) have long been known as good feature extractors for images, e.g. the success of CNNs in classifying ImageNet.


Because convolutional layers extract local information from the data and are translation invariant, we can also use them to achieve good results on sentence classification.



In his 2014 paper, _Convolutional Neural Networks for Sentence Classification_, Yoon Kim successfully applies a CNN to achieve state-of-the-art results on many sentence classification tasks.

### Model Architecture
<img src="./CNN_architecture.png" alt="CNN_model" style="width: 550px; display:block; margin:auto; "/>

<sup><sub>__Source:__ Zhang, Y., & Wallace, B. (2015). _A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification._</sub></sup>
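As a rough illustration of this kind of architecture, the sketch below embeds the tokens, applies convolutions of several widths over the word positions, max-pools each feature map over time, concatenates the results, and classifies them. It is not the implementation used for the results in this presentation: the class name `TextCNN`, the filter widths, and every size are hypothetical, and it assumes a recent PyTorch release.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Sketch of a Kim-style CNN; all sizes used below are hypothetical."""
    def __init__(self, vocab_size, embed_dim, num_filters,
                 filter_sizes, num_classes, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One 1D convolution per filter width, sliding over word positions
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=k)
             for k in filter_sizes]
        )
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> embedded: (batch, embed_dim, seq_len)
        embedded = self.embedding(token_ids).transpose(1, 2)
        # Convolve, apply ReLU, then max-pool over time for each filter width
        pooled = [F.relu(conv(embedded)).max(dim=2)[0] for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return F.log_softmax(self.fc(features), dim=1)

model = TextCNN(vocab_size=1000, embed_dim=50, num_filters=16,
                filter_sizes=(3, 4, 5), num_classes=2)
batch = torch.randint(0, 1000, (8, 30))   # 8 random "sentences" of 30 token ids
log_probs = model(batch)
print(tuple(log_probs.shape))             # one row of class log-probabilities per sentence
```

Because the output is a log-softmax, it can be trained with `nn.NLLLoss` in the same way as the feed-forward network earlier.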

## Hyper-parameter Tuning

## Results
<img src="./CNN_acc1.png" alt="Accuracy with CNN model"/>

## Future Directions
While the result presented above is promising, it is still far from state-of-the-art and would require more analysis to reach that level.


Here are some of the steps I could take to achieve further results:
* Implement more sophisticated architectures (LSTM)

* Find larger datasets to train on

* Integrate into the company's codebase