# Exercise 2: Deep learning

CSCS-ICS-DADSi Summer School: Accelerating Data Science with HPC
September 4 - 6, 2017
Swiss National Supercomputing Centre

Mainly inspired by:

http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html

We will use the PyTorch deep learning framework (http://pytorch.org/) in this exercise.

PyTorch is a Python-based reincarnation of the Lua-based Torch framework (http://torch.ch/), with features such as dynamic computation graphs and general-purpose automatic differentiation.

Please visit the PyTorch website for more information and tutorials.

## Part 1: Tensors

We start by importing `torch`.

In [1]:
import torch
from torch import Tensor

Tensors, or multi-dimensional arrays, are the basic unit in PyTorch. Let's create a 5x3 matrix as a 2-dimensional Tensor. This will default to a `FloatTensor`, a tensor of single-precision (32-bit) floating point numbers. The tensor is allocated but not initialized, so it contains whatever values happened to be in that memory, as the arbitrary numbers in the output show.

In [2]:
x = Tensor(5,3)
print(x)


-9.3784e-11  4.5623e-41 -9.3784e-11
 4.5623e-41 -4.2671e-10  4.5623e-41
-4.2672e-10  4.5623e-41 -1.0407e-13
 4.5623e-41  8.8364e+16  4.5623e-41
 1.7937e-43  0.0000e+00  1.1351e-43
[torch.FloatTensor of size 5x3]
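
The values printed above are whatever was in the allocated memory. When defined initial values are needed, the factory functions are the usual choice. The following is a minimal sketch, assuming a recent PyTorch release; the variable names are illustrative and not part of the original exercise.

```python
import torch

zeros = torch.zeros(5, 3)      # all entries are 0.0
uniform = torch.rand(5, 3)     # uniform samples from [0, 1)
normal = torch.randn(5, 3)     # samples from a standard normal distribution

print(zeros.size(), zeros.dtype)
print(uniform.min().item() >= 0.0, uniform.max().item() < 1.0)
```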



### Slicing

It is common to have higher-dimensional tensors, to hold all aspects of your data together, such as a 3x200x200 tensor holding the R,G,B channels of an image of 200x200 pixels, or a 100x3x200x200 tensor to hold a batch of 100 such images.

We use slicing operations to view specific subtensors of such a tensor. Note that PyTorch uses zero-based indexing and row-major memory ordering.

In [3]:
# 100 images of 3 channels of size 200x200
x = Tensor(100,3,200,200)
print(x.size())

# Second channel of the first image
y = x[0,1]
print(y.size())

torch.Size([100, 3, 200, 200])
torch.Size([200, 200])
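
A few more slicing patterns may help here. The sketch below assumes a recent PyTorch release and uses a deliberately small batch (4 images of 8x8 pixels) so that the strides are easy to read; the sizes and names are illustrative only. It also shows that a slice is a view on the original data rather than a copy.

```python
import torch

# A small batch: 4 images, 3 channels, 8x8 pixels
x = torch.zeros(4, 3, 8, 8)

red = x[:, 0]          # first channel of every image -> 4x8x8
first_two = x[:2]      # first two images -> 2x3x8x8
row = x[0, 1, 5]       # row 5 of the second channel of the first image -> 8

# Row-major layout: the last dimension is contiguous in memory
print(x.stride())

# Slices are views: writing through one changes the original tensor
red[0, 0, 0] = 1.0
print(x[0, 0, 0, 0].item())
```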


### Numpy conversion

You might have noticed that Tensors are very similar to NumPy's ndarrays. In fact, you can easily convert between ndarrays and Tensors.

In [4]:
x_np = x.numpy()
print(x_np.shape)

x = torch.from_numpy(x_np)
print(x.size())

(100, 3, 200, 200)
torch.Size([100, 3, 200, 200])
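
One detail worth knowing is that, for CPU tensors, both conversions share the underlying memory instead of copying it. A minimal sketch, assuming NumPy and a recent PyTorch release are installed (the array contents are illustrative):

```python
import numpy as np
import torch

a = np.zeros((2, 3), dtype=np.float32)
t = torch.from_numpy(a)   # shares memory with `a`, no copy is made

a[0, 0] = 7.0             # modify the array in place
print(t[0, 0].item())     # the tensor sees the change

b = t.numpy()             # also shares memory with `t`
t[1, 2] = -1.0
print(b[1, 2])
```

If an independent copy is needed, `t.clone()` or `a.copy()` can be used before converting.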


### A note on GPU usage

PyTorch is frequently used with a CUDA backend on a GPU. One would convert between tensors and CUDA tensors by simply calling

`x.cuda()`

`x.cpu()`

which transfer data to and from GPU memory, respectively.
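
Since not every machine has a GPU, a script can guard these calls. The following sketch assumes a recent PyTorch release and falls back to the CPU when CUDA is unavailable; note that `x.cuda()` returns a copy and leaves `x` itself on the host.

```python
import torch

x = torch.rand(5, 3)

if torch.cuda.is_available():
    x_gpu = x.cuda()        # copy to GPU memory
    y_gpu = x_gpu * 2       # this multiplication runs on the GPU
    y = y_gpu.cpu()         # copy the result back to host memory
else:
    y = x * 2               # CPU fallback

print(y.is_cuda)
```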

## Part 2: Automatic differentiation

PyTorch implements reverse-mode automatic differentiation (AD). To use the AD features, we need the `Variable` class from `torch.autograd`.

In [5]:
from torch.autograd import Variable

For automatically evaluating derivatives with AD, we need to wrap tensors within `Variable` instances.

In [6]:
x = Tensor(3) # A vector of length three
x = Variable(x) 
print(x)

Variable containing:
-9.3784e-11
 4.5623e-41
 6.0137e-37
[torch.FloatTensor of size 3]



With `Variable`s, obtaining a gradient of a scalar-valued function is straightforward. Let's use the regular Python language features to define a function.

In [7]:
def f(x):
    y = torch.log(x[0]) + torch.sin(x[1])
    return x[2] * torch.exp(y)

Let's evaluate the gradient of `f(x)` at `x = [2,3,4]`.

In [8]:
x = Variable(Tensor([2,3,4]), requires_grad=True)
y = f(x)
print(y)

y.backward()
print(x.grad)

Variable containing:
 9.2125
[torch.FloatTensor of size 1]

Variable containing:
 4.6063
-9.1203
 2.3031
[torch.FloatTensor of size 3]



Note that we get the function value `f(2,3,4)` as `9.2125` and the gradient of `f` at `(2,3,4)` as `[4.6063, -9.1203, 2.3031]`.
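
These numbers can be checked by hand. A short derivation, using the zero-based indices of the code (so $x_0 = 2$, $x_1 = 3$, $x_2 = 4$):

$$
f(x) = x_2 \exp\left(\log x_0 + \sin x_1\right) = x_0 \, x_2 \, e^{\sin x_1}
$$

$$
\nabla f(x) = \left( x_2 \, e^{\sin x_1},\;\; x_0 \, x_2 \cos(x_1) \, e^{\sin x_1},\;\; x_0 \, e^{\sin x_1} \right)
$$

With $e^{\sin 3} \approx 1.15156$ and $\cos 3 \approx -0.98999$, this gives $f \approx 9.2125$ and $\nabla f \approx (4.6063, -9.1203, 2.3031)$, in agreement with the output above.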

Also note the use of `requires_grad=True`, which marks the `Variable` `x` as one we want to differentiate with respect to. This is for efficiency: PyTorch can skip the bookkeeping for `Variable`s whose derivatives we don't need.
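
A common sanity check for AD results is to compare them against finite differences. The sketch below uses only the Python standard library; `numerical_gradient` and the step size `h` are illustrative choices, not part of PyTorch. It approximates the gradient of the same function with central differences and should agree with the values printed above to about four decimal places.

```python
import math

def f(x):
    # Same function as above, written with the standard math module
    y = math.log(x[0]) + math.sin(x[1])
    return x[2] * math.exp(y)

def numerical_gradient(func, x, h=1e-6):
    # Central differences: (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

value = f([2.0, 3.0, 4.0])
grad = numerical_gradient(f, [2.0, 3.0, 4.0])
print(round(value, 4))
print([round(g, 4) for g in grad])
```

Finite differences need two function evaluations per input dimension, whereas reverse-mode AD obtains the whole gradient in a single backward pass.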

## Part 3: Neural networks

The previous two parts, namely defining and using `Tensor`s and `Variable`s and being able to obtain derivatives, give us everything needed for constructing and training a neural network.

### Defining a neural network

A very important point to note is that **neural networks are just functions, and there is nothing special about them.**

A neural network is a series of linear algebra operations interleaved with non-linear transformations. Let's create a single feed-forward layer with two neurons connected to four inputs (note that this omits the bias term for simplicity).

In [9]:
def run_layer(weights, inputs):
    outputs = torch.mm(weights , inputs)
    outputs = torch.tanh(outputs)
    return outputs

W = Variable(torch.randn(2,4), requires_grad=True) # A 2x4 weight matrix
x = Variable(Tensor([[1],[2],[3],[4]])) # A column vector of length four

y = run_layer(W, x)
print(y)

Variable containing:
 1.0000
 1.0000
[torch.FloatTensor of size 2x1]
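
For completeness, here is how the omitted bias term could be added. This is a minimal sketch assuming a recent PyTorch release, in which tensors carry `requires_grad` directly; the function name `run_layer_with_bias` is hypothetical and not part of the original exercise.

```python
import torch

def run_layer_with_bias(weights, bias, inputs):
    # Affine map followed by the tanh non-linearity
    return torch.tanh(torch.mm(weights, inputs) + bias)

W = torch.randn(2, 4, requires_grad=True)       # 2x4 weight matrix
b = torch.zeros(2, 1, requires_grad=True)       # one bias per neuron
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])  # column vector of length four

y = run_layer_with_bias(W, b, x)
print(y.size())
```

Because `tanh` saturates for large arguments, outputs very close to `1.0000` or `-1.0000`, as in the cell above, are expected with unscaled random weights and inputs of this magnitude.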



### The loss function

Training this neural network layer requires the derivative of a loss function, evaluated at the layer's output, with respect to its trainable weights (the matrix `W` above).

In [10]:
def loss(outputs, targets):
    return torch.norm(outputs - targets)

W = Variable(torch.randn(2,4), requires_grad=True) # A 2x4 weight matrix
x = Variable(Tensor([[1],[2],[3],[4]])) # A column vector of length four

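# Note: this target has three elements while the layer output is 2x1, so the
# subtraction broadcasts to a 2x3 matrix; a matching target is Tensor([[1],[1]]).
# Use a new name for the result so the `loss` function is not overwritten.
loss_value = loss(run_layer(W, x), Variable(Tensor([1,1,1])))
print(loss_value)

loss_value.backward()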
W_grad = W.grad
print(W_grad)

Variable containing:
 3.4641
[torch.FloatTensor of size 1]

Variable containing:
1.00000e-08 *
 -0.3592 -0.7184 -1.0775 -1.4367
 -0.0000 -0.0000 -0.0000 -0.0000
[torch.FloatTensor of size 2x4]
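
The cell above passes a three-element target to a layer with a 2x1 output, so the difference is broadcast to a 2x3 matrix before the norm is taken. The sketch below shows a variant with one target value per neuron and an explicit shape check. It assumes a recent PyTorch release; the names `loss_fn` and `loss_value` are illustrative.

```python
import torch

def run_layer(weights, inputs):
    return torch.tanh(torch.mm(weights, inputs))

def loss_fn(outputs, targets):
    # Shapes must agree, otherwise the subtraction would broadcast
    assert outputs.size() == targets.size()
    return torch.norm(outputs - targets)

torch.manual_seed(0)
W = torch.randn(2, 4, requires_grad=True)
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
targets = torch.ones(2, 1)   # one target value per neuron

loss_value = loss_fn(run_layer(W, x), targets)
loss_value.backward()
print(loss_value.item())
print(W.grad.size())
```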



### Optimization

We would then iteratively update the weights using a gradient-based update rule.

In [11]:
learning_rate = 0.001
W = W - learning_rate * W_grad
print(W)

Variable containing:
 0.7038 -0.8773 -0.1036  1.6767
-0.6437 -0.4002 -1.8479 -1.4553
[torch.FloatTensor of size 2x4]
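
The single update above can be turned into a training loop. The following is a sketch under stated assumptions: a recent PyTorch release, a squared-error loss, small initial weights, and illustrative target values and learning rate that are not taken from the original exercise. Note that `W = W - learning_rate * W_grad` produces a new tensor that is no longer a leaf of the graph, so the loop updates `W` in place instead.

```python
import torch

torch.manual_seed(0)

W = (0.1 * torch.randn(2, 4)).requires_grad_()  # small weights avoid tanh saturation
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
targets = torch.tensor([[0.5], [-0.5]])          # illustrative target values
learning_rate = 0.005

losses = []
for step in range(100):
    outputs = torch.tanh(torch.mm(W, x))
    loss_value = torch.sum((outputs - targets) ** 2)  # squared-error loss
    losses.append(loss_value.item())

    loss_value.backward()                 # fills W.grad
    with torch.no_grad():                 # update without recording the graph
        W -= learning_rate * W.grad
    W.grad.zero_()                        # gradients accumulate, so reset them

print(losses[0], losses[-1])
```

Resetting the gradient after each step matters because `backward()` adds to `W.grad` rather than overwriting it.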



In [12]:
torch.sigmoid(Tensor([1]))


 0.7311
[torch.FloatTensor of size 1]

By taking the gradient of a neural network with respect to its trainable parameters (weights), we can optimize it with gradient descent.

## Using torch.nn modules

In practice, unless you are implementing new neural network architectures, you would use the existing neural network building blocks provided by the `torch.nn` module, which internally work similarly to the basic example we covered above.
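
To make the connection concrete, the hand-written layer from Part 3 can be expressed with `torch.nn` building blocks. This is a minimal sketch assuming a recent PyTorch release; it compares the module's output with the manual computation using the module's own weight matrix.

```python
import torch
import torch.nn as nn

# Two neurons connected to four inputs, followed by tanh
layer = nn.Sequential(
    nn.Linear(4, 2, bias=False),   # holds a trainable 2x4 weight matrix
    nn.Tanh(),
)

# torch.nn modules expect inputs of shape (batch, features),
# so the input is a row vector rather than a column vector
x = torch.tensor([[1.0, 2.0, 3.0, 4.0]])
y = layer(x)
print(y.size())

# The same result computed by hand from the layer's weight matrix
W = layer[0].weight                      # shape 2x4
y_manual = torch.tanh(torch.mm(W, x.t())).t()
print(torch.allclose(y, y_manual))
```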

We will now define a convolutional neural network for image recognition on the CIFAR10 dataset and train and test it, using the regular PyTorch workflow.

The CIFAR10 dataset contains 60,000 32x32 color images labeled in 10 classes, split into 50,000 training images and 10,000 test images. First we load CIFAR10. This will automatically download 