<h1 align="center">Getting started with Torch7</h1>

![](images/torch.jpg)

This lab was created by Alison B Lowndes

The following timer counts down to a five minute warning before the lab instance shuts down.  You should get a pop up at the five minute warning reminding you to save your work!  If you are about to run out of time, please see the [Post-Lab](#Post-Lab-Summary) section for saving this lab to view offline later.

<iframe id="timer" src="timer/timer.html" width="100%" height="120px"></iframe>

---
Before we begin, let's verify [WebSockets](http://en.wikipedia.org/wiki/WebSocket) are working on your system.  To do this, execute the cell block below by giving it focus (clicking on it with your mouse), and hitting Ctrl-Enter, or pressing the play button in the toolbar above.  If all goes well, you should see some output returned below the grey cell.  If not, please consult the [Self-paced Lab Troubleshooting FAQ](https://developer.nvidia.com/self-paced-labs-faq#Troubleshooting) to debug the issue.

In [1]:
print ("The answer should be three: " .. (1+2))

The answer should be three: 3	


Let's execute the cell below to display information about the GPUs running on the server.

In [2]:
os.execute("nvidia-smi")

Thu Feb  4 14:44:10 2016       
+------------------------------------------------------+                       
| NVIDIA-SMI 346.46     Driver Version: 346.46         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |


|   0  GRID K520           On   | 0000:00:03.0     Off |                  N/A |
| N/A   23C    P8    17W / 125W |     10MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+


true	exit	0	


The cell below uses Torch to give you similar information about the GPU in the system.

In [None]:
require 'cutorch'
print(  cutorch.getDeviceProperties(cutorch.getDevice()) )

## Please ensure you only execute one cell at a time.

## Introduction

The goal of this hands-on lab is to allow you to quickly understand Torch and its neural networks package in order to train a neural network on a GPU. If you haven’t done the other classes in the Introduction to Deep Learning course, it may be more efficient to go through them first and then come back to this one. All classes and material are available at https://developer.nvidia.com/deep-learning-courses

As you execute cells below, you will know the lab is processing when you see a solid (filled) circle in the top-right of the page.
Otherwise, when it is idle, you will see the following: ![](images/iTorch.jpg)
If a cell is stalled, you can stop it with the stop button in the toolbar.
For troubleshooting, please see [Self-paced Lab Troubleshooting FAQ](https://developer.nvidia.com/self-paced-labs-faq#Troubleshooting) to debug the issue.

## What is Torch?

![](images/torch.jpg)

Torch core features include:

* a powerful N-dimensional array or Tensor
* lots of routines for indexing, slicing, transposing, ...
* amazing interface to C, via LuaJIT
* linear algebra routines
* neural network, and energy-based models
* numeric optimization routines
* fast and efficient GPU support
* fully embeddable, with ports to iOS, Android and FPGA backends

**Why Torch?**
The goal of Torch is to give you maximum flexibility and speed in building your scientific algorithms while making the process extremely simple. Torch comes with a large ecosystem of community-driven packages in machine learning, computer vision, signal processing, parallel processing, image, video, audio and networking among others, and builds on top of the Lua community.

At the heart of Torch are the popular neural network and optimization libraries which are simple to use, while having maximum flexibility in implementing complex neural network topologies. You can build arbitrary graphs of neural networks, and parallelize them over CPUs and GPUs in an efficient manner.

See http://torch.ch and the Cheatsheet here https://github.com/torch/torch7/wiki/Cheatsheet

Torch started around 2000, with Facebook’s Ronan Collobert as the main developer. Torch 7 is the current version and the 4th release (versions use odd numbers only: 1, 3, 5, 7); it is aimed at web-scale learning in speech, image and video applications. Torch is used exclusively for research and prototyping for unsupervised and supervised learning, reinforcement learning, etc. Facebook, especially, spends a great deal of time improving parallelism for multi-GPU (model, data, DAG) and overlapping to improve host-device communication, as well as developing faster kernels for convolutions etc. Torch uses **automatic differentiation**. This is not numerical differentiation but a technique for taking exact derivatives without needing symbolic differentiation.

**Maintainers**: 
* Ronan Collobert, Research Scientist @ Facebook
* Clement Farabet, Senior Software Engineer @ Twitter
* Koray Kavukcuoglu, Research Scientist @ Google DeepMind
* Soumith Chintala, Research Engineer @ Facebook

Torch is already used in many companies and research labs including:

* Facebook AI Research
* Google + Deepmind
* Twitter
* CILVR @ NYU
* Idiap Research Institute
* e-Lab @ Purdue
* Element Inc
* WhetLab


The Torch core consists of the following packages:
* torch : tensors, class factory, serialization, BLAS 
* nn : neural network Modules and Criterions
* optim : SGD, LBFGS and other optimization functions 
* gnuplot : plotting and data visualization 
* paths : make directories, concatenate file paths, and other filesystem utilities
* image : save, load, crop, scale, warp, translate images and such 
* trepl : the torch LuaJIT interpreter 
* cwrap : used for wrapping C/CUDA functions in Lua 

[LBFGS = Limited-memory Broyden–Fletcher–Goldfarb–Shanno, an iterative quasi-Newton method for solving unconstrained nonlinear optimization problems; it approximates Newton’s method.]


## Lua (JIT) 

Lua is a powerful, fast, lightweight, embeddable scripting language combining simple procedural syntax with powerful data description constructs based on associative arrays and extensible semantics. Lua is dynamically typed, runs by interpreting bytecode for a register-based virtual machine, and has automatic memory management with incremental garbage collection, making it ideal for configuration, scripting, and rapid prototyping. The language is maintained by a team at PUC-Rio, the Pontifical Catholic University of Rio de Janeiro in Brazil. Lua was born and raised in Tecgraf, formerly the Computer Graphics Technology Group of PUC-Rio. Lua is now housed at LabLua, a laboratory of the Department of Computer Science of PUC-Rio.

### Just In Time compilation
LuaJIT allows complex applications to be compiled and optimized, and also embedded into many environments (iPhone, video games, web backends). The complete Torch framework runs on iPhone with no modifications to scripts.

Lua’s universal data structure, the table, can be used as an array, dictionary, hash table, class, struct, object or list. Torch 7 adds to this a Tensor object, an n-dimensional array type.
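As a small illustration of how flexible the table is, the plain-Lua sketch below uses one table as an array and another as a dictionary and object. The names `primes` and `point` are made up for this example and have nothing to do with Torch itself.

```lua
-- A table used as an array (Lua arrays are 1-indexed)
primes = {2, 3, 5, 7}
print(#primes)        -- number of elements
print(primes[1])      -- first element

-- A table used as a dictionary / struct
point = {x = 1.5, y = -2}
point.label = 'example'          -- fields can be added at any time
print(point.x, point['y'], point.label)

-- A table used as an object: a function stored in a field and
-- called with ':' so that the table itself is passed as 'self'
function point:norm2()
    return self.x * self.x + self.y * self.y
end
print(point:norm2())
```

The `:` call syntax is the same one used throughout this lab, for example in `net:forward(input)`.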

For training neural nets, autoencoders, linear regression, CNNs, RNNs, etc., it's all about gradients and loss functions. Torch’s nn package provides it all. 

Recasting a pre-defined model as a CUDA model for use on GPUs is as simple as: `model:cuda()`

## LuaRocks
The Lua ecosystem has a very handy package manager: luarocks. 
Different demos and tutorials rely on different third-party packages. If a demo crashes because it can't find a package, simply try installing it with luarocks, e.g.:

```
$ luarocks install image    # an image library for Torch7
$ luarocks install nnx      # lots of extra neural-net modules
```

There are many, many packages (or rocks) managed by LuaRocks.


## Before We Start

In order to deep learn we need data.  For the first part of this lab, we'll use the CIFAR10 dataset, which is large enough to be useful but small enough to train in a reasonable amount of time.


### Tensors

The Tensor class is the most important class in Torch. Almost every package depends on it for handling numeric data. A Tensor is a serializable, potentially multi-dimensional matrix. The number of dimensions is unlimited: a Tensor with any number of dimensions can be created by passing its sizes as a LongStorage.

#### Internal data representation
The actual data of a Tensor is contained in a Storage. 
'Storages' are how Lua accesses memory of a C pointer or array. 
Storages can also map the contents of a file to memory. 
A Storage is an array of basic C types. 

Several types of Tensor exist:

* ByteTensor -- contains unsigned chars
* CharTensor -- contains signed chars
* ShortTensor -- contains shorts
* IntTensor -- contains ints
* LongTensor -- contains longs
* FloatTensor -- contains floats
* DoubleTensor -- contains doubles

Several Storage classes for all the basic C types exist and have the following self-explanatory names: ByteStorage, CharStorage, ShortStorage, IntStorage, LongStorage, FloatStorage, DoubleStorage. ByteStorage and CharStorage both represent arrays of bytes: 
ByteStorage represents an array of unsigned chars, while CharStorage represents an array of signed chars.

**One could say that a Tensor is a particular way of viewing a Storage: a Storage only represents a chunk of memory, while the Tensor interprets this chunk of memory as having dimensions.**
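To make the idea of "viewing" a chunk of memory concrete, here is a plain-Lua sketch in which a flat table stands in for a Storage, and a small record of sizes and strides stands in for a Tensor. The helpers `makeView` and `at` are hypothetical and only illustrate the concept; they are not how Torch implements Tensors.

```lua
-- A flat block of memory, standing in for a Storage of 6 numbers
storage = {10, 20, 30, 40, 50, 60}

-- Build a 2D "view" over a flat table: sizes plus row-major strides
function makeView(data, rows, cols)
    return {data = data, size = {rows, cols}, stride = {cols, 1}}
end

-- Read element (i, j) of a view; indices are 1-based as in Lua
function at(view, i, j)
    local offset = (i - 1) * view.stride[1] + (j - 1) * view.stride[2]
    return view.data[offset + 1]
end

asTwoByThree = makeView(storage, 2, 3)
asThreeByTwo = makeView(storage, 3, 2)

print(at(asTwoByThree, 2, 1))   -- second row, first column of the 2x3 view
print(at(asThreeByTwo, 2, 1))   -- the same memory, seen as 3x2

-- Both views share the storage: a write to it is visible through both
storage[4] = -1
print(at(asTwoByThree, 2, 1), at(asThreeByTwo, 2, 2))
```

Neither view copies the data; they only differ in how an (i, j) pair is turned into an offset into the same flat block.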

Let's work through some basic Torch syntax.  Execute the cells below in order and make sure you understand the output.


In [None]:
a = torch.Tensor(5,3) -- construct a 5x3 matrix, uninitialized

In [None]:
a = torch.rand(5,3) -- construct a 5x3 matrix with randomized data
print(a)

In [None]:
b=torch.rand(3,4)

In [None]:
-- matrix-matrix multiplication: syntax 1
a*b 

In [None]:
-- matrix-matrix multiplication: syntax 2
torch.mm(a,b) 

In [None]:
-- matrix-matrix multiplication: syntax 3
c=torch.Tensor(5,4)
c:mm(a,b) -- store the result of a*b in c
print(c)
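All three syntaxes compute the same product: a 5x3 matrix times a 3x4 matrix gives a 5x4 matrix. As a rough sketch of the arithmetic involved, the plain-Lua function below multiplies two small matrices stored as nested tables. The name `matmul` and the sample values are made up for illustration; Torch itself hands this work to optimized BLAS routines.

```lua
-- Multiply an n x k matrix by a k x m matrix, both stored as nested tables
function matmul(A, B)
    local n, k, m = #A, #B, #B[1]
    assert(#A[1] == k, 'inner dimensions must match')
    local C = {}
    for i = 1, n do
        C[i] = {}
        for j = 1, m do
            local sum = 0
            for p = 1, k do
                sum = sum + A[i][p] * B[p][j]
            end
            C[i][j] = sum
        end
    end
    return C
end

A = {{1, 2, 3}, {4, 5, 6}}          -- 2x3
B = {{1, 0}, {0, 1}, {1, 1}}        -- 3x2
C = matmul(A, B)                    -- 2x2
print(C[1][1], C[1][2])
print(C[2][1], C[2][2])
```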

### CUDA Tensors

Tensors can be moved onto the GPU using the `:cuda` function.

In [None]:
require 'cutorch';
a = a:cuda()
b = b:cuda()
c = c:cuda()
c:mm(a,b) -- done on GPU
print(c)

To run neural networks on GPUs we use cunn (**not to be confused with cudnn**).
The nn package provides modules that each contain their own state; the CUDA versions of these modules expect CudaTensors as inputs. 
To use CUDA-based nn modules, you will need to import cunn:

In [None]:
require 'cunn';

## CUDA

To use GPUs with Torch, call `require 'cutorch'` on a CUDA-capable machine. 
Here's an explanation of the packages needed for using Torch with GPUs:

* cutorch - Torch CUDA Implementation
* cunn - Torch CUDA Neural Network Implementation
* cunnx - Experimental CUDA NN implementations
* cudnn - NVIDIA CuDNN Bindings


## Neural Networks
Neural networks in Torch can be constructed using the nn package.

Modules are the bricks used to build neural networks. Each is itself a neural network, but can be combined with other networks using containers to create more complex neural networks.

For example, LeNet, a network that classifies digit images, is a simple feed-forward network.

![](images/lenet.jpg)

It takes the input, feeds it through several layers one after the other, and then finally gives the output.
The container for such a network is `nn.Sequential`, which feeds the input through its layers in the order they were added.

Let's use Torch to create a LeNet network in the cell below.

In [None]:
net = nn.Sequential()
net:add(nn.SpatialConvolution(1, 6, 5, 5)) -- 1 input image channel, 6 output channels, 5x5 convolution kernel
net:add(nn.SpatialMaxPooling(2,2,2,2))     -- A max-pooling operation that looks at 2x2 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5))                    -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
net:add(nn.Linear(16*5*5, 120))             -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.Linear(120, 84))
net:add(nn.Linear(84, 10))                   -- 10 is the number of outputs of the network (in this case, 10 digits)
net:add(nn.LogSoftMax())                     -- converts the output to a log-probability. Useful for classification problems

print('Lenet5\n' .. net:__tostring());
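The `16*5*5` passed to `nn.View` and the first `nn.Linear` follows from the layer sizes. The plain-Lua sketch below redoes that arithmetic, assuming a 32x32 input, convolutions with stride 1 and no padding, and pooling with a 2x2 window and stride 2, as in the network above. The helper names `convOut` and `poolOut` are hypothetical.

```lua
-- Output width of a convolution with no padding and stride 1
function convOut(size, kernel)
    return size - kernel + 1
end

-- Output width of a pooling layer with the given window and stride
function poolOut(size, window, stride)
    return math.floor((size - window) / stride) + 1
end

size = 32                      -- input images are 32x32
size = convOut(size, 5)        -- first 5x5 convolution
size = poolOut(size, 2, 2)     -- first 2x2 max-pooling
size = convOut(size, 5)        -- second 5x5 convolution
size = poolOut(size, 2, 2)     -- second 2x2 max-pooling
print(size)                    -- spatial size entering nn.View

flattened = 16 * size * size   -- 16 feature maps from the second convolution
print(flattened)
```

The spatial size goes 32, 28, 14, 10, 5, so the second pooling layer emits 16 feature maps of 5x5, which flatten to 400 values.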

Every neural network module in Torch has [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation). It has a `:forward(input)` function that computes the output for a given input, flowing the input through the network, and a `:backward(input, gradient)` function that, given the gradient of the loss with respect to the module's output, computes the gradients with respect to the module's input and parameters. This is done via the [chain rule](https://en.wikipedia.org/wiki/Chain_rule).

Let's next use the `:forward` and `:backward` functions with the LeNet we just created on some random input data.  Execute the cells below and see if you can understand what is happening.

In [None]:
input = torch.rand(1,32,32) -- pass a random tensor as input to the network

In [None]:
output = net:forward(input)

In [None]:
print(output)

In [None]:
net:zeroGradParameters() -- zero the internal gradient buffers of the network (will come to this later)

In [None]:
gradInput = net:backward(input, torch.rand(10))

In [None]:
print(#gradInput)

So we've taken a random tensor and given it to the network as input. Simply passing it to the net's `forward()` function feeds it forward through every layer and returns the result as `output`. 
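The forward/backward contract and the chain rule can be sketched with toy scalar "modules" in plain Lua. Everything here (`newScale`, the field names, the numbers) is a hypothetical illustration that loosely mirrors the shape of the nn interface; it is not the nn implementation.

```lua
-- A toy "module" computing y = w * x for scalars
function newScale(w)
    local m = {weight = w, gradWeight = 0}
    function m:forward(x)
        self.output = self.weight * x
        return self.output
    end
    -- gradOutput is dLoss/dy; returns dLoss/dx and accumulates dLoss/dw
    function m:backward(x, gradOutput)
        self.gradWeight = self.gradWeight + gradOutput * x
        return gradOutput * self.weight
    end
    return m
end

-- Chain two modules: y = 3 * (2 * x)
first, second = newScale(2), newScale(3)
x = 5
hidden = first:forward(x)
y = second:forward(hidden)

-- Backward pass: start from dLoss/dy = 1 and apply the chain rule,
-- walking through the modules in reverse order
gradHidden = second:backward(hidden, 1)
gradX = first:backward(x, gradHidden)
print(y, gradX, first.gradWeight, second.gradWeight)
```

Each module only needs its own input and the gradient arriving from the module after it, which is why a container can run the backward pass by visiting its layers in reverse.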

## Criterion: Defining a loss function
When you want a model to learn to do something, you give it feedback on how well it is doing. The function that computes an objective measure of the model's performance is called a loss function and it typically takes in the model's output and the groundtruth and computes a value that quantifies the model's performance.
The model then corrects itself to have a smaller loss.
In Torch, loss functions are implemented just like neural network modules, and have automatic differentiation.
They have two functions - `forward(input, target)` and `backward(input, target)`

For example:

In [None]:
criterion = nn.ClassNLLCriterion() -- a negative log-likelihood criterion for multi-class classification
criterion:forward(output, 3) -- let's say the groundtruth was class number: 3
gradients = criterion:backward(output, 3)

In [None]:
gradInput = net:backward(input, gradients)
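For a single sample, a negative log-likelihood loss is just minus the log-probability that the model assigned to the correct class. The plain-Lua sketch below shows that arithmetic and the matching gradient; `nllForward` and `nllBackward` are hypothetical names for illustration, not functions from the nn package.

```lua
-- logProbs: table of log-probabilities, target: 1-based class index
function nllForward(logProbs, target)
    return -logProbs[target]
end

-- Gradient of the loss with respect to each log-probability:
-- -1 at the target class, 0 everywhere else
function nllBackward(logProbs, target)
    local grad = {}
    for i = 1, #logProbs do
        grad[i] = (i == target) and -1 or 0
    end
    return grad
end

sampleLogProbs = {math.log(0.1), math.log(0.2), math.log(0.7)}
loss = nllForward(sampleLogProbs, 3)
grad = nllBackward(sampleLogProbs, 3)
print(loss)                       -- equals -log(0.7)
print(grad[1], grad[2], grad[3])
```

The loss shrinks towards 0 as the probability of the correct class approaches 1, which is why it pairs naturally with a `LogSoftMax` output layer.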



### Recap
A network takes an input and produces an output in its `:forward` pass. 
A criterion computes the loss of the network, and its gradients with respect to the output of the network.
A network takes an (input, gradients) pair in its `:backward` pass and calculates the gradients with respect to each layer (and neuron) in the network.

### Next
A neural network layer can have learnable parameters or not.
A convolution layer learns its convolution kernels to adapt to the input data and the problem being solved.
A max-pooling layer has no learnable parameters. It only finds the max of local windows.
A layer in Torch that has learnable weights will typically have the fields `.weight` (and optionally `.bias`).

In [None]:
m = nn.SpatialConvolution(1,3,2,2) -- learn 3 2x2 kernels
print(m.weight) -- initially, the weights are randomly initialized

In [None]:
print(m.bias) -- The operation in a convolution layer is: output = convolution(input,weight) + bias

There are also two other important fields in a learnable layer: `gradWeight` and `gradBias`. 
`gradWeight` accumulates the gradients with respect to each weight in the layer, and `gradBias`, with respect to each bias in the layer.

## Training the network
For the network to adjust itself, it typically performs this operation (if you use Stochastic Gradient Descent):

`weight = weight - learningRate * gradWeight [equation 1]`

Over time, this update adjusts the network weights so that the output loss decreases.
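A minimal sketch of gradient descent on a single weight, in plain Lua, shows why the step goes against the gradient. The quadratic loss, the starting weight and the learning rate are all made up for illustration.

```lua
-- Minimise loss(w) = (w - 4)^2 with plain gradient descent
weight = 0
learningRate = 0.1
for step = 1, 100 do
    local gradWeight = 2 * (weight - 4)            -- dLoss/dw at the current weight
    weight = weight - learningRate * gradWeight    -- step against the gradient
end
print(weight)   -- close to the minimiser, 4
```

With this learning rate, each step shrinks the distance to the minimum by a constant factor, so the loss decreases steadily. Adding the gradient instead of subtracting it would move the weight away from the minimum.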

### How does each layer in the neural network update the weight according to equation 1?
A simple SGD trainer in the neural network module: `nn.StochasticGradient` has a function `:train(dataset)` that takes a given dataset and simply trains your network by showing different samples from your dataset to the network.

#### What about data?
Torch has simple dataloaders: `image.load` or `audio.load` to load your data into a `torch.Tensor` or a Lua table.
We'll use the CIFAR-10 dataset, which has the classes: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'.
The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel colour images of 32x32 pixels in size.
The dataset has 50,000 training images and 10,000 test images in total.

![](images/CIFAR10.jpg)

We now have 5 steps left to train a Torch neural network:
1. Load and normalize data
2. Define Neural Network
3. Define Loss function
4. Train network on training data
5. Test network on test data.

**1. Load and normalize data**
In the interest of time, we prepared the data beforehand as 4D torch ByteTensors of size 10000x3x32x32 (training) and 10000x3x32x32 (testing). Let us load the data and inspect it.

In [None]:
trainset = torch.load('/home/ubuntu/data/cifar.torch/cifar10-train.t7')
testset = torch.load('/home/ubuntu/data/cifar.torch/cifar10-test.t7')
classes = {'airplane', 'automobile', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck'}

In [None]:
print(trainset)

In [None]:
print(#trainset.data)

Let's display an image:

In [None]:
itorch.image(trainset.data[100]) -- display the 100-th image in dataset
print(classes[trainset.label[100]])

To prepare the dataset to be used with **nn.StochasticGradient**, the dataset has to have a `:size()` function and an `[i]` index operator, so that `dataset[i]` returns the i-th sample in the dataset.

In [None]:
-- ignore setmetatable for now, it is a feature beyond the scope of this tutorial. It sets the index operator.
setmetatable(trainset, 
    {__index = function(t, i) 
                    return {t.data[i], t.label[i]} 
                end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.

function trainset:size() 
    return self.data:size(1) 
end

In [None]:
print(trainset:size()) -- just to test

In [None]:
print(trainset[33]) -- load sample number 33.
itorch.image(trainset[33][1])
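For readers curious about the `setmetatable` call, here is a plain-Lua sketch of the same pattern using ordinary tables in place of tensors. The `dataset` table and its contents are hypothetical stand-ins.

```lua
-- A dataset stored as two parallel arrays
dataset = {
    data  = {'img-a', 'img-b', 'img-c'},
    label = {3, 1, 2},
}

-- __index is consulted only when a key is missing from the table itself,
-- so dataset.data and dataset.label keep working, while dataset[2]
-- (not a stored key) is routed to this function.
setmetatable(dataset, {
    __index = function(t, i)
        return {t.data[i], t.label[i]}
    end
})

-- Defining size() stores a real field, so it bypasses __index
function dataset:size()
    return #self.data
end

sample = dataset[2]
print(sample[1], sample[2])
print(dataset:size())
```

This gives the dataset the `[i]` operator and `:size()` function that a trainer expects, without copying the underlying data.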

**One of the most important things you can do in conditioning your data (in general in data-science or machine learning) is to make your data have a mean of 0.0 and standard-deviation of 1.0.**
This is the final step of our data processing. It relies on the tensor indexing operator, which is best shown by example:

In [None]:
redChannel = trainset.data[{ {}, {1}, {}, {}  }] -- this picks {all images, 1st channel, all vertical pixels, all horizontal pixels}

In [None]:
print(#redChannel)

In this indexing operator, you initially start with `[{ }]`. You can pick all elements in a dimension using `{}`, or pick a particular element using `{i}`, where i is the element index. You can also pick a range of elements using `{i1, i2}`; for example, `{3,5}` gives us the 3rd, 4th and 5th elements.

Moving back to mean-subtraction and standard-deviation based scaling, doing this operation is simple, using the indexing operator that we learnt above:

In [None]:
mean = {} -- store the mean, to normalize the test set in the future
stdv  = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
    mean[i] = trainset.data[{ {}, {i}, {}, {}  }]:mean() -- mean estimation
    print('Channel ' .. i .. ', Mean: ' .. mean[i])
    trainset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction
    
    stdv[i] = trainset.data[{ {}, {i}, {}, {}  }]:std() -- std estimation
    print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
    trainset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end

Our training data is now normalized and ready to be used.
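The effect of the loop above can be checked on a handful of numbers. The plain-Lua sketch below normalizes a small made-up list the same way and confirms that the result has mean 0 and standard deviation 1. The helpers `listMean` and `listStd` are hypothetical, and `listStd` divides by n - 1, which is an assumption about the convention used.

```lua
-- Mean of a list of numbers
function listMean(values)
    local sum = 0
    for _, v in ipairs(values) do sum = sum + v end
    return sum / #values
end

-- Sample standard deviation (divides by n - 1)
function listStd(values)
    local mu, sum = listMean(values), 0
    for _, v in ipairs(values) do sum = sum + (v - mu) ^ 2 end
    return math.sqrt(sum / (#values - 1))
end

pixels = {2, 4, 4, 4, 5, 5, 7, 9}
pixelMean, pixelStd = listMean(pixels), listStd(pixels)

normalized = {}
for i, v in ipairs(pixels) do
    normalized[i] = (v - pixelMean) / pixelStd   -- mean subtraction, then std scaling
end
print(listMean(normalized), listStd(normalized))
```

Keeping `pixelMean` and `pixelStd` around matters: the same two numbers must be reused to normalize any data the model sees later, such as a test set.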

**2. Define our neural network**

###Task 1: 

Modify the neural network from the Neural Networks section above to take 3-channel images (instead of 1-channel images as it was defined).

You can find the [solution](#Task-#1-Answer) in the Answers section below.

In [None]:
net = nn.Sequential()
net:add(nn.SpatialConvolution(1, 6, 5, 5)) -- 1 input image channel, 6 output channels, 5x5 convolution kernel
net:add(nn.SpatialMaxPooling(2,2,2,2))     -- A max-pooling operation that looks at 2x2 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5))                    -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
net:add(nn.Linear(16*5*5, 120))             -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.Linear(120, 84))
net:add(nn.Linear(84, 10))                   -- 10 is the number of outputs of the network (in this case, 10 digits)
net:add(nn.LogSoftMax())                     -- converts the output to a log-probability. Useful for classification problems

**3. Define the Loss function**
A negative log-likelihood classification loss is well suited to most classification problems.

In [None]:
criterion = nn.ClassNLLCriterion()

**4. Train the neural network**
First define an **nn.StochasticGradient** object, then give our dataset to this object's `:train` function.

In [None]:
trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 5 -- just do 5 epochs of training.

In [None]:
trainer:train(trainset)

**5. Test the network, print accuracy**
We have trained the network for 5 passes over the training dataset.
To check whether the network has learnt anything, we predict the class label for each test image and compare it to the ground-truth. If the prediction is correct, we add the sample to the count of correct predictions.

Let's display an image from the test set to get familiar.


In [None]:
print(classes[testset.label[100]])
itorch.image(testset.data[100])

Now normalize the test data with the mean and standard-deviation from the training data.

In [None]:
testset.data = testset.data:double()   -- convert from Byte tensor to Double tensor
for i=1,3 do -- over each image channel
    testset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction    
    testset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end

In [None]:
-- print the mean and standard-deviation of example-100
horse = testset.data[100]
print(horse:mean(), horse:std())

Let's see what the neural network thinks the example above is:

In [None]:
print(classes[testset.label[100]])
itorch.image(testset.data[100])
predicted = net:forward(testset.data[100])

In [None]:
-- the output of the network is Log-Probabilities. To convert them to probabilities, you have to take e^x 
print(predicted:exp())

You can see the network predictions. The network assigned a probability to each class, given the image.
To make it clearer, we can tag each probability with its class-name:

In [None]:
for i=1,predicted:size(1) do
    print(classes[i], predicted[i])
end
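Taking e^x of each output works because the last layer is a log-softmax. The plain-Lua sketch below computes a log-softmax over three made-up scores and checks that the exponentiated values sum to 1. The function `logSoftMax` here is a hypothetical stand-alone version, not the nn module.

```lua
-- Log-softmax of a list of raw scores, computed stably by subtracting the max
function logSoftMax(scores)
    local maxScore = scores[1]
    for i = 2, #scores do maxScore = math.max(maxScore, scores[i]) end
    local sumExp = 0
    for i = 1, #scores do sumExp = sumExp + math.exp(scores[i] - maxScore) end
    local logSum = maxScore + math.log(sumExp)
    local out = {}
    for i = 1, #scores do out[i] = scores[i] - logSum end
    return out
end

scoreLogProbs = logSoftMax({1.0, 2.0, 3.0})
total = 0
for i = 1, #scoreLogProbs do
    total = total + math.exp(scoreLogProbs[i])   -- e^x turns a log-probability into a probability
end
print(total)   -- the probabilities sum to 1 (up to rounding)
```

Log-probabilities are always zero or negative, and the largest one marks the class the network considers most likely.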

For the real deal; how many in total are correct over the test set?

In [None]:
correct = 0
for i=1,10000 do
    local groundtruth = testset.label[i]
    local prediction = net:forward(testset.data[i])
    local confidences, indices = torch.sort(prediction, true)  -- true means sort in descending order
    if groundtruth == indices[1] then
        correct = correct + 1
    end
end

In [None]:
print(correct, 100*correct/10000 .. ' % ')
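The accuracy count only needs the index of the largest output for each sample. As a sketch of that logic, the plain-Lua example below uses a hypothetical `argmax` helper on three made-up prediction vectors in place of sorting real network outputs.

```lua
-- Index of the largest value in a list (1-based)
function argmax(values)
    local best = 1
    for i = 2, #values do
        if values[i] > values[best] then best = i end
    end
    return best
end

-- Made-up log-probabilities for three samples and their true classes
predictions  = {{-0.2, -2.0, -3.1}, {-1.5, -0.4, -2.2}, {-2.0, -1.9, -0.3}}
groundtruths = {1, 3, 3}

numCorrect = 0
for i = 1, #predictions do
    if argmax(predictions[i]) == groundtruths[i] then
        numCorrect = numCorrect + 1
    end
end
print(numCorrect, string.format('%.1f %%', 100 * numCorrect / #predictions))
```

Here the first and third samples are classified correctly and the second is not, so two of the three predictions count as correct.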

Which classes performed well, and which didn't?

In [None]:
class_performance = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
for i=1,10000 do
    local groundtruth = testset.label[i]
    local prediction = net:forward(testset.data[i])
    local confidences, indices = torch.sort(prediction, true)  -- true means sort in descending order
    if groundtruth == indices[1] then
        class_performance[groundtruth] = class_performance[groundtruth] + 1
    end
end

In [None]:
for i=1,#classes do
    print(classes[i], 100*class_performance[i]/1000 .. ' %')
end

To swap from CPU to GPU we simply take a neural network, and transfer it over to GPU:

In [None]:
require 'cunn'; 
--brings in CUDA

In [None]:
net = net:cuda()

In [None]:
criterion = criterion:cuda()
-- transfer the criterion

In [None]:
trainset.data = trainset.data:cuda()
-- transfer the data across

In [None]:
-- and train on GPU 
trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.001
trainer.maxIteration = 5 -- just do 5 epochs of training.

In [None]:
trainer:train(trainset)
-- this is such a small dataset that you won't notice much of a speedup

### To try another dataset with a simple script ...

Clement Farabet of Twitter provides a script that lets you train a full network on the Google Street View House Numbers (SVHN) dataset. In the interest of time, we've set the -size flag to small (which uses only 10,000 of the 73,000+ training images).

 ![](images/SVHN.jpg)
 

In [None]:
----------------------------------------------------------------------
-- This script loads the (SVHN) House Numbers dataset
-- http://ufldl.stanford.edu/housenumbers/
----------------------------------------------------------------------

-- Note: files were converted from their original Matlab format
-- to Torch's internal format using the mattorch package. The
-- mattorch package allows 1-to-1 conversion between Torch and Matlab
-- files.

-- The SVHN dataset contains 3 files:
--    + train: training data
--    + test:  test data
--    + extra: extra training data

train_file = '/home/ubuntu/data/svhn/housenumbers/train_32x32.t7'
test_file = '/home/ubuntu/data/svhn/housenumbers/test_32x32.t7'
extra_file = '/home/ubuntu/data/svhn/housenumbers/extra_32x32.t7'

----------------------------------------------------------------------
print '==> loading dataset'

-- We load the dataset from disk, and re-arrange it to be compatible
-- with Torch's representation. Matlab uses a column-major representation,
-- Torch is row-major, so we just have to transpose the data.

-- Note: the data, in X, is 4-d: the 1st dim indexes the samples, the 2nd
-- dim indexes the color channels (RGB), and the last two dims index the
-- height and width of the samples.

loaded = torch.load(train_file,'ascii')
trainData = {
   data = loaded.X:transpose(3,4),
   labels = loaded.y[1],
   size = function() return trsize end
}

loaded = torch.load(extra_file,'ascii')
extraTrainData = {
   data = loaded.X:transpose(3,4),
   labels = loaded.y[1],
   size = function() return trsize end
}

loaded = torch.load(test_file,'ascii')
testData = {
   data = loaded.X:transpose(3,4),
   labels = loaded.y[1],
   size = function() return tesize end
}

----------------------------------------------------------------------
print '==> visualizing data'

-- Visualization is quite easy, using itorch.image().
if itorch then
   print('training data:')
   itorch.image(trainData.data[{ {1,128} }])
   print('extra training data:')
   itorch.image(extraTrainData.data[{ {1,128} }])
   print('test data:')
   itorch.image(testData.data[{ {1,128} }])
end

In [None]:
----------------------------------------------------------------------
-- This tutorial shows how to train different models on the street
-- view house number dataset (SVHN),
-- using multiple optimization techniques (SGD, ASGD, CG), and
-- multiple types of models.
--
-- This script demonstrates a classical example of training 
-- well-known models (convnet, MLP, logistic regression)
-- on a 10-class classification problem. 
--
-- It illustrates several points:
-- 1/ description of the model
-- 2/ choice of a loss function (criterion) to minimize
-- 3/ creation of a dataset as a simple Lua table
-- 4/ description of training and test procedures
--
-- Clement Farabet
----------------------------------------------------------------------
require 'torch'

----------------------------------------------------------------------
print '==> processing options'

cmd = torch.CmdLine()
cmd:text()
cmd:text('SVHN Loss Function')
cmd:text()
cmd:text('Options:')
-- global:
cmd:option('-seed', 1, 'fixed input seed for repeatable experiments')
cmd:option('-threads', 2, 'number of threads')
-- data:
cmd:option('-size', 'small', 'how many samples do we load: small | full | extra')
-- model:
cmd:option('-model', 'convnet', 'type of model to construct: linear | mlp | convnet')
-- loss:
cmd:option('-loss', 'nll', 'type of loss function to minimize: nll | mse | margin')
-- training:
cmd:option('-save', 'results', 'subdirectory to save/log experiments in')
cmd:option('-plot', false, 'live plot')
cmd:option('-optimization', 'SGD', 'optimization method: SGD | ASGD | CG | LBFGS')
cmd:option('-learningRate', 1e-3, 'learning rate at t=0')
cmd:option('-batchSize', 1, 'mini-batch size (1 = pure stochastic)')
cmd:option('-weightDecay', 0, 'weight decay (SGD only)')
cmd:option('-momentum', 0, 'momentum (SGD only)')
cmd:option('-t0', 1, 'start averaging at t0 (ASGD only), in nb of epochs')
cmd:option('-maxIter', 2, 'maximum nb of iterations for CG and LBFGS')
cmd:option('-type', 'double', 'type: double | float | cuda')
cmd:text()
opt = cmd:parse(arg or {})

-- nb of threads and fixed seed (for repeatable experiments)
if opt.type == 'float' then
   print('==> switching to floats')
   torch.setdefaulttensortype('torch.FloatTensor')
elseif opt.type == 'cuda' then
   print('==> switching to CUDA')
   require 'cunn'
   torch.setdefaulttensortype('torch.FloatTensor')
end
torch.setnumthreads(opt.threads)
torch.manualSeed(opt.seed)

----------------------------------------------------------------------
print '==> executing all'

dofile '/home/ubuntu/data/svhn/1_data.lua'
dofile '/home/ubuntu/data/svhn/2_model.lua'
dofile '/home/ubuntu/data/svhn/3_loss.lua'
dofile '/home/ubuntu/data/svhn/4_train.lua'
dofile '/home/ubuntu/data/svhn/5_test.lua'

----------------------------------------------------------------------
print '==> training!'

while true do
   train()
   test()
end

Our developers are working hard to offer full Torch integration in DIGITS, our deep learning GPU training system, and will be releasing a beta shortly. Here is the output of training LeNet on MNIST using Torch. The screenshots look very much like Caffe, but this is raw Torch output.

![](images/digits.jpg)

## Post-Lab Summary

If you would like to download this lab for later viewing, please go to **your browser's File menu** (not the Jupyter notebook file menu) and save the complete web page.  This will ensure the images are copied down as well.

Torch is maintained by the deep learning field's top coders and this lab is thanks to them. They are all very happy to offer this assistance because they want you using their highly optimized framework, **Torch7**. 

The Cheatsheet https://github.com/torch/torch7/wiki/Cheatsheet and all code are maintained on GitHub at https://github.com/torch/torch7. If you have **advanced questions ONLY**, you can head over to Gitter, where most devs hang out. For install or newbie questions, PLEASE go to the Google Group first for very quick responses.
* Ask for help: http://groups.google.com/forum/#!forum/torch7
* Chat with developers of Torch: http://gitter.im/torch/torch7


### More information

iTorch is simply an IPython kernel for Torch that can display images, video, etc. You can use it outside of this lab instance via https://github.com/facebook/iTorch#requirements

For more advanced coding you need to download and try Torch yourself. Torch is open-source, so you can also start with the code on the GitHub repo https://github.com/torch/torch7 and use the Getting Started guide here http://torch.ch/docs/getting-started.html. You can learn more about LuaJIT here http://luajit.org/

* Build crazy graphs of networks: https://github.com/torch/nngraph
* Train on imagenet with multiple GPUs: https://github.com/soumith/imagenet-multiGPU.torch
* Train recurrent networks with LSTM on text: https://github.com/wojzaremba/lstm
* More demos and tutorials: https://github.com/torch/torch7/wiki/Cheatsheet

Due to Torch's embeddable nature it can even be run on our NVIDIA Jetson TK1 boards.
Installation and usage instructions for Torch + cuDNN on Jetson TK1 are here:
https://github.com/e-lab/torch-toolbox/blob/master/Tutorials/Setup-Torch-cuDNN-on-Jetson-TK1.md 

For Matlab users, UCLA’s Ata Mahjoubfar, PhD has kindly written a separate very thorough cheatsheet for you here: http://atamahjoubfar.github.io/Torch_for_Matlab_users.pdf

For Numpy users there’s https://github.com/torch/torch7/wiki/Torch-for-Numpy-users

For advanced users, please refer to the “gotchas” here https://luapower.com/luajit-notes 


To learn more about these other topics, please visit:
* GPU accelerated machine learning: [http://www.nvidia.com/object/machine-learning.html](http://www.nvidia.com/object/machine-learning.html)
* Theano: [http://deeplearning.net/software/theano/](http://deeplearning.net/software/theano/)
* Torch: [http://torch.ch/](http://torch.ch/)
* DIGITS: [https://developer.nvidia.com/digits](https://developer.nvidia.com/digits)
* cuDNN: [https://developer.nvidia.com/cudnn](https://developer.nvidia.com/cudnn)

### Deep Learning Lab Series

Make sure to check out the rest of the classes in this Deep Learning lab series.  You can find them [here](https://developer.nvidia.com/deep-learning-courses).

### Acknowledgements

Many thanks to 
* Soumith Chintala of Facebook for his 2015 **60 minute blitz** on Github https://github.com/soumith/cvpr2015/blob/master/Deep%20Learning%20with%20Torch.ipynb
* Clement Farabet for his Madbits tutorials here http://code.madbits.com/wiki/doku.php
* Mark Ebersole and Larry Brown @NVIDIA for their help putting together this lab.

## Answers

### Task #1 Answer

In [None]:
net = nn.Sequential()
net:add(nn.SpatialConvolution(3, 6, 5, 5)) -- 3 input image channels, 6 output channels, 5x5 convolution kernel
net:add(nn.SpatialMaxPooling(2,2,2,2))     -- A max-pooling operation that looks at 2x2 windows and finds the max.
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.SpatialMaxPooling(2,2,2,2))
net:add(nn.View(16*5*5))                    -- reshapes from a 3D tensor of 16x5x5 into 1D tensor of 16*5*5
net:add(nn.Linear(16*5*5, 120))             -- fully connected layer (matrix multiplication between input and weights)
net:add(nn.Linear(120, 84))
net:add(nn.Linear(84, 10))                   -- 10 is the number of outputs of the network (in this case, 10 classes)
net:add(nn.LogSoftMax())                     -- converts the output to a log-probability. Useful for classification problems

[Return to Task #1](#Task-1:)