# Neural Networks with torch7

Torch has a package for neural networks called 'nn'. It can be imported into torch using:

In [1]:
nn = require 'nn';

Many types of neural networks have been designed over the years. Most of them can be represented as directed acyclic graphs. However (just as a note), we'll see a class of networks called recurrent neural networks that can't be represented in that manner. The simplest type is the sequential feed-forward network; multilayer perceptrons and convolutional neural networks fall into that category.

To create a sequential network, we use a super-module called 'nn.Sequential'. Other modules can be added to this super-module to create a sequential neural network. 

A variant of the sequential network is the parallel module, which has multiple neural network pipelines running in parallel. If you want even more flexibility, for example to design recurrent neural networks, you would use another package called 'nngraph'. But we are getting ahead of ourselves.

## The sequential super-module
Here's how you create a sequential model:

In [2]:
model = nn.Sequential()

That's it! However, our model is empty and can't do anything yet.

## The Linear module
Let's add a linear module to our model.

In [3]:
model:add(nn.Linear(3,4))

nn.Sequential {
  [input -> (1) -> output]
  (1): nn.Linear(3 -> 4)
}
{
  gradInput : DoubleTensor - empty
  modules : 
    {
      1 : 
        nn.Linear(3 -> 4)
        {
          gradBias : DoubleTensor - size: 4
          weight : DoubleTensor - size: 4x3
          gradWeight : DoubleTensor - size: 4x3
          gradInput : DoubleTensor - empty
          bias : DoubleTensor - size: 4
          output : DoubleTensor - empty
        }
    }
  output : DoubleTensor - empty
}


Ouch, that printed a lot of things. Let's ignore most of it for now and concentrate on the 3 and the 4: the module takes a vector of size 3 and maps it to a vector of size 4.

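The linear module is basically just:
$$ Y = WX + b $$ where X, Y and b are vectors and W is a matrix. Here X has size 3, Y and b have size 4, and W has size 4x3, so each output component is $Y_i = \sum_j W_{ij} X_j + b_i$.

As you can see in the printed output of the module, W is stored in an internal variable called weight and b in one called bias.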

Let's verify if they actually work like that. Consider an initialization for X:

In [4]:
X = torch.randn(3)

In [5]:
X

-0.1163
-0.2036
-0.2703
[torch.DoubleTensor of size 3]



## The forward pass
You can get the output of any torch module by calling its forward method on an input, as in module:forward(input). Let's do that here to get Y.

In [6]:
Y = model:forward(X)

In [7]:
Y

-0.4918
-0.1836
 0.5253
-0.3506
[torch.DoubleTensor of size 4]



Now, let's replicate that as a math calculation.

First let's access the weight of that model.

In [8]:
W = model:get(1).weight

In [9]:
W

 0.0346  0.4622  0.0502
 0.4883 -0.1885 -0.4888
-0.4798  0.4865 -0.5142
-0.3095  0.4759 -0.5465
[torch.DoubleTensor of size 4x3]



The model:get(1) command accesses the first module of model which is the Linear module we just defined. Next we access the weight variable of that module by chaining a '.weight'.
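
The printed structure of the model lists a `modules` table, so the same layer can also be reached by indexing that table directly. The following is a minimal sketch that rebuilds a small model so it runs on its own; the local names `model` and `layer` are only for illustration and are scoped to this snippet.

```lua
require 'nn'

-- Rebuild the same architecture so this snippet is self-contained.
local model = nn.Sequential()
model:add(nn.Linear(3, 4))

-- model:get(1) returns the first entry of the container's modules table.
local layer = model:get(1)
print(layer == model.modules[1])

-- The weight is stored as (outputs x inputs), the bias has one entry per output.
print(layer.weight:size())
print(layer.bias:size())
```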

Similarly, we can get the bias of that linear module like this:

In [10]:
b = model:get(1).bias

In [11]:
b

-0.3801
-0.2973
 0.4296
-0.4374
[torch.DoubleTensor of size 4]



Now we can actually do some math. The Linear module can be expressed in torch maths as follows:

In [12]:
-- The * operator is overloaded with matrix multiplication
W*X+b

-0.4918
-0.1836
 0.5253
-0.3506
[torch.DoubleTensor of size 4]
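
To see where a single entry comes from, the first component can be worked out by hand from the first row of W, the vector X and the first entry of b printed above (the values are rounded to four decimals, so the result is approximate):

$$ Y_1 = 0.0346 \cdot (-0.1163) + 0.4622 \cdot (-0.2036) + 0.0502 \cdot (-0.2703) + (-0.3801) \approx -0.4918 $$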



Compare the result of W*X+b with the output stored in the Linear module after the forward pass:

In [13]:
model:get(1).output

-0.4918
-0.1836
 0.5253
-0.3506
[torch.DoubleTensor of size 4]
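
Rather than comparing the printed numbers by eye, the two results can be compared numerically. This is a small sketch under the assumption that a fresh, randomly initialized layer is acceptable for the check; `layer`, `x`, `y` and `manual` are illustrative names, not part of the notebook above.

```lua
require 'nn'

local layer = nn.Linear(3, 4)
local x = torch.randn(3)

-- Output computed by the module.
local y = layer:forward(x)

-- The same computation written out with tensor operators.
local manual = layer.weight * x + layer.bias

-- Largest absolute difference between the two; should be zero up to rounding.
print((y - manual):abs():max())
```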



## The backward pass

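Just as the forward pass gives us the output, a backward pass through a module gives us its backpropagated gradients. Calling backward(input, gradOutput) takes the input used in the forward pass and the gradient with respect to the module's output. It returns the gradient with respect to the input (gradInput) and accumulates the gradients with respect to the parameters in gradWeight and gradBias. Because these are accumulated, we first reset them to zero: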

In [25]:
model:zeroGradParameters()




In [26]:
model:get(1).gradBias

 0
 0
 0
 0
[torch.DoubleTensor of size 4]



In [27]:
model:get(1).gradWeight

 0  0  0
 0  0  0
 0  0  0
 0  0  0
[torch.DoubleTensor of size 4x3]



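In [28]:
-- zeroGradParameters() only resets gradWeight and gradBias, not gradInput;
-- the values below are presumably left over from an earlier backward call
model:get(1).gradInput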

-0.2665
 1.2361
-1.4992
[torch.DoubleTensor of size 3]



In [29]:
model:backward(X, torch.ones(4))

-0.2665
 1.2361
-1.4992
[torch.DoubleTensor of size 3]



In [30]:
model

nn.Sequential {
  [input -> (1) -> output]
  (1): nn.Linear(3 -> 4)
}
{
  gradInput : DoubleTensor - size: 3
  modules : 
    {
      1 : 
        nn.Linear(3 -> 4)
        {
          gradBias : DoubleTensor - size: 4
          weight : DoubleTensor - size: 4x3
          gradWeight : DoubleTensor - size: 4x3
          gradInput : DoubleTensor - size: 3
          bias : DoubleTensor - size: 4
          output : DoubleTensor - size: 4
        }
    }
  output : DoubleTensor - size: 4
}


In [31]:
model:get(1).gradBias

 1
 1
 1
 1
[torch.DoubleTensor of size 4]
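
The call to zeroGradParameters() at the start of this section matters because backward adds to gradWeight and gradBias instead of overwriting them. A minimal sketch of that behaviour, using a fresh layer and illustrative local names:

```lua
require 'nn'

local layer = nn.Linear(3, 4)
local x = torch.randn(3)
local gradOutput = torch.ones(4)

layer:forward(x)
layer:zeroGradParameters()

-- First backward call: gradBias equals gradOutput.
layer:backward(x, gradOutput)
local afterOne = layer.gradBias:clone()

-- Second backward call without zeroing: the gradients are accumulated.
layer:backward(x, gradOutput)
local afterTwo = layer.gradBias:clone()

print(afterOne)
print(afterTwo)

-- Resetting brings the parameter gradients back to zero.
layer:zeroGradParameters()
print(layer.gradBias)
```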



In [32]:
model:get(1).gradWeight

-0.1163 -0.2036 -0.2703
-0.1163 -0.2036 -0.2703
-0.1163 -0.2036 -0.2703
-0.1163 -0.2036 -0.2703
[torch.DoubleTensor of size 4x3]
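
For Y = WX + b, the gradient with respect to W is the outer product of gradOutput and X, and the gradient with respect to b is gradOutput itself. With gradOutput set to all ones, every row of gradWeight is therefore a copy of X, which matches the output above. The sketch below checks this on a fresh layer; the local names are illustrative only.

```lua
require 'nn'

local layer = nn.Linear(3, 4)
local x = torch.randn(3)
local gradOutput = torch.ones(4)

layer:forward(x)
layer:zeroGradParameters()
layer:backward(x, gradOutput)

-- Outer product of gradOutput (size 4) and x (size 3), giving a 4x3 matrix.
local manualGradWeight = torch.ger(gradOutput, x)

-- Both differences should be zero up to rounding.
print((layer.gradWeight - manualGradWeight):abs():max())
print((layer.gradBias - gradOutput):abs():max())
```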



In [33]:
model:get(1).gradInput

-0.2665
 1.2361
-1.4992
[torch.DoubleTensor of size 3]



In [34]:
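-- Sum of the first column of W; with gradOutput all ones it should match the first entry of gradInput
0.0346+0.4883+-0.4798+-0.3095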

-0.2664	
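
The hand calculation above agrees with the first entry of gradInput up to rounding of the printed values. More generally, the gradient with respect to the input of a linear layer is the transposed weight matrix applied to gradOutput, which reduces to the column sums of W when gradOutput is all ones. A minimal sketch that checks all three entries at once on a fresh layer (local names are illustrative):

```lua
require 'nn'

local layer = nn.Linear(3, 4)
local x = torch.randn(3)
local gradOutput = torch.ones(4)

layer:forward(x)
local gradInput = layer:backward(x, gradOutput)

-- Transposed weight (3x4) times gradOutput (size 4) gives a vector of size 3.
local manual = layer.weight:t() * gradOutput

-- With gradOutput all ones, this is the sum over the rows of W for each column.
local columnSums = layer.weight:sum(1):view(3)

-- Both differences should be zero up to rounding.
print((gradInput - manual):abs():max())
print((gradInput - columnSums):abs():max())
```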
