# Neural Networks

Neural networks can be constructed using the ```torch.nn``` package.

Now that you have had a glimpse of autograd, note that ```nn``` depends on autograd to define models and differentiate them. An ```nn.Module``` contains layers, and a method ```forward(input)``` that returns the output.

For example, look at this network that classifies digit images:

![convnet](resources/Neural-Networks/mnist.png)

It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.

A typical training procedure for a neural network is as follows:

* Define the neural network that has some learnable parameters (or weights)
* Iterate over a dataset of inputs
* Process input through the network
* Compute the loss (how far the output is from being correct)
* Propagate gradients back into the network’s parameters
* Update the weights of the network, typically using a simple update rule:
```python
weight = weight - learning_rate * gradient
```
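To see the update rule at work outside of any network, here is a minimal sketch in plain Python. The quadratic loss, its hand-written gradient, and the chosen learning rate are assumptions made purely for illustration; they are not part of the tutorial's network.

```python
# Hypothetical one-parameter problem: minimise loss(w) = (w - 3)^2.
def loss(w):
    return (w - 3.0) ** 2


def gradient(w):
    # Analytical derivative of the loss above: d/dw (w - 3)^2 = 2 * (w - 3)
    return 2.0 * (w - 3.0)


weight = 0.0          # arbitrary starting point
learning_rate = 0.1   # assumed step size

for step in range(50):
    # The update rule from the list above
    weight = weight - learning_rate * gradient(weight)

print(weight, loss(weight))
```

Each step shrinks the distance to the minimum at `w = 3` by a constant factor, so the weight ends up very close to 3. In a real network the gradient comes from autograd rather than a hand-written formula.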

## Define the network

Let’s define this network:

```python
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
```

```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernels
        self.conv1 = nn.Conv2d(1, 6, 5)
        # 6 input channels, 16 output channels, 5x5 square convolution kernels
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # Flatten everything except the batch dimension
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
```

```python
net = Net()
net
```

```
Net (
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)
```
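The 400 input features of ```fc1``` (that is, ```16 * 5 * 5```) follow from the layer shapes. The sketch below traces the spatial size by hand, assuming a 32x32 input image, unpadded stride-1 convolutions, and non-overlapping 2x2 max pooling. The helper names ```conv_out``` and ```pool_out``` are hypothetical and not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution along one dimension
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window):
    # Non-overlapping pooling (stride equal to the window size)
    return size // window


size = 32                 # assumed input height and width
size = conv_out(size, 5)  # conv1: 32 -> 28
size = pool_out(size, 2)  # pool:  28 -> 14
size = conv_out(size, 5)  # conv2: 14 -> 10
size = pool_out(size, 2)  # pool:  10 -> 5

flat_features = 16 * size * size  # 16 channels of 5x5 feature maps
print(size, flat_features)
```

A different input size would change the final feature-map size, so ```fc1``` would need a different number of input features.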

You just have to define the ```forward``` function, and the ```backward``` function (where gradients are computed) is **automatically** defined for you using autograd. You can use any of the Tensor operations in the ```forward``` function.
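As a small illustration of this point, the sketch below defines a module with only a ```forward``` method and still obtains gradients for its parameters. It assumes a recent PyTorch release (0.4 or later), where plain tensors can be passed to a module without wrapping them in ```Variable```. The class name ```TinyNet``` and the layer sizes are hypothetical.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        # Any tensor operations can be mixed in here
        return torch.tanh(self.fc(x)) * 2.0


tiny = TinyNet()
out = tiny(torch.randn(3, 4))  # mini-batch of 3 samples with 4 features each
out.sum().backward()           # no backward method was written by hand

# Every parameter now holds a gradient of the same shape as itself
grad_shapes = {name: tuple(p.grad.shape) for name, p in tiny.named_parameters()}
print(grad_shapes)
```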

The learnable parameters of a model are returned by ```net.parameters()```:

```python
params = list(net.parameters())
len(params), params[0].size()  # params[0] is conv1's weight
```

```
(10, torch.Size([6, 1, 5, 5]))
```

The input to ```forward``` is an ```autograd.Variable```, and so is the output:

```python
_input = Variable(torch.randn(1, 1, 32, 32))
out = net(_input)
out
```

```
Variable containing:
-0.0846  0.0483 -0.0866  0.0726  0.0004  0.0736 -0.1055 -0.0678  0.1393 -0.0650
[torch.FloatTensor of size 1x10]
```

Zero the gradient buffers of all parameters and backprop with random gradients:

```python
net.zero_grad()
out.backward(torch.randn(1, 10))
```

_Note:_

```torch.nn``` only supports _mini-batches_: the entire package expects inputs that are a mini-batch of samples, and **not a single sample**.

For example, ```nn.Conv2d``` will take in a 4D Tensor of shape $(n_{Samples}, n_{Channels}, Height, Width)$.

If you have a single sample, just use ```input.unsqueeze(0)``` to add a _fake_ batch dimension.
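A minimal sketch of adding the batch dimension, assuming a single-channel 32x32 image stored as a 3D tensor (the variable names are illustrative):

```python
import torch

sample = torch.randn(1, 32, 32)  # one image: (channels, height, width)
batch = sample.unsqueeze(0)      # add a batch dimension of size 1 at position 0

print(sample.shape, batch.shape)
```

The data is unchanged; only the shape gains a leading dimension, so the tensor matches the 4D layout that ```nn.Conv2d``` expects.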

Before proceeding further, let’s recap all the classes you’ve seen so far.


**Recap**:
* ```torch.Tensor``` - A multi-dimensional array.
* ```autograd.Variable``` - Wraps a ```Tensor``` and records the _history of operations_ applied to it. Has the same API as a ```Tensor```, with some additions like ```backward()```. Also holds the _gradient_ w.r.t. the ```Tensor```.
* ```nn.Module``` - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
* ```nn.Parameter``` - A kind of ```Variable``` that is _automatically_ registered as a parameter when assigned as an attribute to a ```Module```.
* ```autograd.Function``` - Implements the ```forward``` and ```backward``` definitions of an autograd operation. Every ```Variable``` operation creates at least one ```Function``` node, which connects to the functions that created the ```Variable``` and encodes its history.



**At this point, we covered**:

* Defining a neural network
* Processing inputs and calling ```backward```


**Still Left**:

* Computing the loss
* Updating the weights of the network

## Loss Function

A loss function takes the $(output, target)$ pair of inputs, and computes a value that estimates how far away the output is from the target.

There are several different loss functions in the ```nn``` package. A simple one is ```nn.MSELoss```, which computes the _mean-squared error_ between the input and the target.

For example:

```python
out = net(_input)
# A dummy target, for example: the values 1..10 as floats, shaped (1, 10) to match `out`
target = Variable(torch.arange(1, 11).float().view(1, -1))
criterion = nn.MSELoss()

loss = criterion(out, target)
loss
```

```
Variable containing:
 38.5450
[torch.FloatTensor of size 1]
```
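To make the definition concrete, the mean-squared error can be computed by hand as the average of the squared element-wise differences. The sketch below uses plain Python and made-up numbers; it illustrates the formula and is not how ```nn.MSELoss``` is implemented internally.

```python
# Made-up values for illustration only
output = [0.0, 0.5, 1.0]
target = [1.0, 0.5, 3.0]

# Squared difference for each (output, target) pair
squared_errors = [(o - t) ** 2 for o, t in zip(output, target)]

# Mean over all elements: (1 + 0 + 4) / 3
mse = sum(squared_errors) / len(squared_errors)
print(squared_errors, mse)
```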

Now, if you follow ```loss``` in the backward direction, using its ```.grad_fn``` attribute, you will see a graph of computations that looks like this:

```python
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss
```

So, when we call ```loss.backward()```, the loss is differentiated through the whole graph, and all leaf Variables in the graph that require gradients (such as the network's parameters) will have their ```.grad``` accumulated with the gradient.

For illustration, let us follow a few steps backward:

```python
(loss.grad_fn,
 loss.grad_fn.next_functions[0][0],
 loss.grad_fn.next_functions[0][0].next_functions[0][0])
```

```
(<torch.autograd.function.MSELossBackward at 0x7fde2e827408>,
 <torch.autograd.function.AddmmBackward at 0x7fde2e827318>,
 <AccumulateGrad at 0x7fde2ebfcb00>)
```
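The same kind of walk can be written as a loop. The sketch below builds a tiny graph and follows the first entry of ```next_functions``` until it reaches a leaf. It assumes a recent PyTorch release (0.4 or later, using ```requires_grad=True``` on a plain tensor instead of ```Variable```); the exact node class names differ between versions, so none are asserted here.

```python
import torch

x = torch.ones(2, requires_grad=True)  # leaf tensor
y = (x * 3).sum()                      # two operations: multiply, then sum

names = []
node = y.grad_fn
while node is not None:
    names.append(type(node).__name__)
    # Follow the first input of each node; leaf nodes have no next_functions
    node = node.next_functions[0][0] if node.next_functions else None

print(names)
```

The last entry corresponds to the leaf ```x```, whose node accumulates the gradient into ```x.grad```.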

## Backprop

To backpropagate the error, all we have to do is call ```loss.backward()```. You need to **clear the existing gradients** first, though, otherwise _gradients will be accumulated into the existing gradients_.

Now we shall call ```loss.backward()```, and have a look at _conv1_’s bias gradients before and after the backward.

```python
net.zero_grad()  # zeroes the gradient buffers of all parameters
'conv1.bias.grad before backward', net.conv1.bias.grad
```

```
('conv1.bias.grad before backward', Variable containing:
  0
  0
  0
  0
  0
  0
 [torch.FloatTensor of size 6])
```

```python
loss.backward()
'conv1.bias.grad after backward', net.conv1.bias.grad
```

```
('conv1.bias.grad after backward', Variable containing:
 1.00000e-02 *
   3.2736
   3.0928
  -0.1395
   1.7141
  -5.9198
  -2.7355
 [torch.FloatTensor of size 6])
```
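The accumulation behaviour described in this section can be seen in isolation with a single tensor. This is a minimal sketch assuming a recent PyTorch release (0.4 or later), where ```requires_grad=True``` on a plain tensor replaces ```Variable```; the tensor ```w``` and the toy expression are made up for illustration.

```python
import torch

w = torch.ones(3, requires_grad=True)

(w * 2).sum().backward()
first = w.grad.clone()         # gradient of sum(2 * w) is 2 for each element

(w * 2).sum().backward()       # no zeroing in between
accumulated = w.grad.clone()   # the new gradient was added to the old one

w.grad.zero_()                 # clear the buffer, as net.zero_grad() does per parameter
(w * 2).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```

Without the reset, the second call doubles the stored gradient, which is why the gradient buffers are zeroed before each backward pass.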

Now, we have seen how to use loss functions.

The only thing left to learn is:

* updating the weights of the network

_Read Later_:

The neural network package contains various modules and loss functions that form the building blocks of deep neural networks. A full list with documentation is [here](http://pytorch.org/docs/nn).

## Update the weights

The simplest update rule used in practice is _Stochastic Gradient Descent_ (SGD):

```python
weight = weight - learning_rate * gradient
```

We can implement this using simple Python code:

```python
learning_rate = 0.01
for f in net.parameters():
    # in-place update: f = f - learning_rate * f.grad
    f.data.sub_(f.grad.data * learning_rate)
```

However, as you use neural networks, you will want to use various update rules such as _SGD_, _Nesterov-SGD_, _Adam_, _RMSProp_, etc. To enable this, we built a small package, ```torch.optim```, that implements all these methods.

Using it is very simple:

```python
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=learning_rate)

# in your training loop:
optimizer.zero_grad()  # zero the gradient buffers
output = net(_input)
loss = criterion(output, target)
loss.backward()
optimizer.step()  # does the update
```
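As a sanity check, the sketch below suggests that one step of plain ```optim.SGD``` (no momentum or weight decay) matches the manual update rule. It assumes a recent PyTorch release (0.4 or later); the two small linear models and the random data are hypothetical stand-ins for the tutorial's network.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Two hypothetical models that start from identical weights
model_a = nn.Linear(3, 1)
model_b = nn.Linear(3, 1)
model_b.load_state_dict(model_a.state_dict())

x = torch.randn(4, 3)  # mini-batch of 4 samples
y = torch.randn(4, 1)  # dummy targets
criterion = nn.MSELoss()
learning_rate = 0.01

# Manual update: weight = weight - learning_rate * gradient
model_a.zero_grad()
criterion(model_a(x), y).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p.sub_(learning_rate * p.grad)

# The same step using torch.optim
optimizer = optim.SGD(model_b.parameters(), lr=learning_rate)
optimizer.zero_grad()
criterion(model_b(x), y).backward()
optimizer.step()

same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

Switching to another rule such as Adam only changes the line that constructs the optimizer; the rest of the loop stays the same.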