# Neural Networks #

Neural networks can be constructed using the ``torch.nn`` package.

An ``nn.Module`` contains layers, and a method ``forward(input)`` that returns the ``output``.

A typical training procedure for a neural network is as follows:

- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far the output is from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:


``weight = weight - learning_rate * gradient``
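
To make the update rule concrete, here is a minimal sketch in plain Python. The toy loss ``(w - 3)^2`` and the helper names ``gradient`` and ``gradient_descent`` are assumptions chosen for illustration; they are not part of ``torch``.

```python
# Toy problem (an assumption for illustration): minimise loss(w) = (w - 3)^2.
# Its gradient is 2 * (w - 3) and its minimum sits at w = 3.

def gradient(weight):
    """Derivative of the toy loss with respect to the weight."""
    return 2.0 * (weight - 3.0)


def gradient_descent(weight, learning_rate=0.1, steps=100):
    """Apply the update rule repeatedly and return the final weight."""
    for _ in range(steps):
        # weight = weight - learning_rate * gradient
        weight = weight - learning_rate * gradient(weight)
    return weight


final_weight = gradient_descent(weight=0.0)
print(final_weight)  # very close to 3.0
```

Each step shrinks the distance to the minimum by a constant factor here, which is why a fixed learning rate is enough for this toy loss.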

## Define the Network ##

In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassicNet(nn.Module):
    
    def __init__(self):
        super(ClassicNet,self).__init__()
        # 1 input image channel, 6 output channels, 5x5 convolution
        self.conv1 = nn.Conv2d(1,6,5)
        self.conv2 = nn.Conv2d(6,16,5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16*5*5,120)
        self.fc2 = nn.Linear(120,84)
        self.fc3 = nn.Linear(84,10)
    
    def forward(self,x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # for a square window, 2 and (2, 2) are equivalent
        x = F.max_pool2d(F.relu(self.conv2(x)),2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
        
    def num_flat_features(self,x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
    
net=ClassicNet()
print(net)
        
        

ClassicNet(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
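
The ``in_features=400`` of ``fc1`` (written as ``16*5*5`` in the code) can be checked by hand. The sketch below assumes a 32x32 single-channel input, the size used later in this section, and uses the standard output-size formulas for an unpadded convolution and for pooling with stride equal to the window. The helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window):
    """Spatial size after max pooling with stride == window and no padding."""
    return (size - window) // window + 1


size = 32                                  # assumed input height and width
size = conv_output_size(size, kernel=5)    # conv1: 32 -> 28
size = pool_output_size(size, window=2)    # pool:  28 -> 14
size = conv_output_size(size, kernel=5)    # conv2: 14 -> 10
size = pool_output_size(size, window=2)    # pool:  10 -> 5

channels = 16                              # output channels of conv2
flat_features = channels * size * size
print(flat_features)  # 400
```

A different input size would change this number, so ``fc1`` as defined here only fits 32x32 inputs.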


With the network defined, you only have to write the ``forward`` function; the ``backward`` function (where the gradients are computed) is defined for you automatically by autograd.

The learnable parameters of a model are returned by ``net.parameters()``.

In [4]:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 5, 5])
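
The count of 10 comes from five layers with one weight tensor and one bias tensor each. As a rough cross-check in plain Python, the shapes below are written out by hand from the layer definitions (weights are stored as ``(out, in, ...)``); the list itself is an illustration, not output from ``torch``.

```python
from math import prod

# Shapes derived by hand from the ClassicNet layer definitions.
parameter_shapes = [
    ("conv1.weight", (6, 1, 5, 5)),
    ("conv1.bias", (6,)),
    ("conv2.weight", (16, 6, 5, 5)),
    ("conv2.bias", (16,)),
    ("fc1.weight", (120, 400)),
    ("fc1.bias", (120,)),
    ("fc2.weight", (84, 120)),
    ("fc2.bias", (84,)),
    ("fc3.weight", (10, 84)),
    ("fc3.bias", (10,)),
]

num_tensors = len(parameter_shapes)
num_scalars = sum(prod(shape) for _, shape in parameter_shapes)
print(num_tensors, num_scalars)  # 10 61706
```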


Let's try a random 32x32 input.

In [5]:
input = torch.randn(1,1,32,32)
out = net(input)
print(out)

tensor(1.00000e-02 *
       [[ 4.5407,  2.7436,  5.5232,  4.2208,  7.3721,  6.9831, -0.0202,
          1.3512,  3.6153,  4.8502]])


Zero the gradient buffers of all parameters and backpropagate with random gradients:

In [6]:
net.zero_grad()
out.backward(torch.randn(1,10))
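
Because ``out`` is not a scalar, ``backward()`` needs a tensor of the same shape as ``out``; it then computes a vector-Jacobian product. A minimal sketch on a toy function (the function ``y = 3 * x`` and the vector ``v`` are assumptions for illustration):

```python
import torch

x = torch.ones(4, requires_grad=True)
y = 3 * x  # non-scalar output; each y[i] depends only on x[i], with slope 3

# v plays the role of the random tensor passed to out.backward(...)
v = torch.tensor([1.0, 2.0, 3.0, 4.0])
y.backward(v)  # stores the vector-Jacobian product in x.grad

print(x.grad)  # equal to 3 * v
```

For a scalar such as a loss value, the argument can be omitted.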

## Recap ##
- ``torch.Tensor`` - A multi-dimensional array with support for autograd operations like ``backward()``.
- ``nn.Module`` - Neural network module. A convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
- ``nn.Parameter`` - A kind of Tensor that is automatically registered as a parameter when assigned as an attribute to a ``Module``.
- ``autograd.Function`` - Implements forward and backward definitions of an autograd operation.


## Loss Function ##
A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

For example:



In [7]:
output = net(input)
target = torch.arange(1, 11, dtype=torch.float)  # a dummy target; float dtype to match the network output
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(38.0743)
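
With its default settings, ``nn.MSELoss`` averages the squared differences over all elements. The plain-Python sketch below redoes that calculation by hand, using the ten output values printed earlier in this section (copied and rescaled by ``1e-2``) and the dummy target 1 to 10. The helper ``mse`` is hypothetical.

```python
# Output values copied from the printed tensor earlier in this section.
output = [0.045407, 0.027436, 0.055232, 0.042208, 0.073721,
          0.069831, -0.000202, 0.013512, 0.036153, 0.048502]
target = [float(k) for k in range(1, 11)]  # the dummy target 1..10


def mse(predictions, targets):
    """Mean of the element-wise squared differences."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


loss_value = mse(output, target)
print(loss_value)  # close to the 38.0743 printed above
```

Since the untrained outputs are all near zero, the loss is dominated by the mean of the squared targets, which is 38.5.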


In [8]:
print(loss.grad_fn)

<MseLossBackward object at 0x0000000004DABBE0>


## Backprop ##

To backpropagate the error, all we have to do is call ``loss.backward()``. The existing gradients must be cleared first, though; otherwise the new gradients are accumulated into them.

In [11]:
net.zero_grad()
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([ 0.,  0.,  0.,  0.,  0.,  0.])
conv1.bias.grad after backward
tensor(1.00000e-02 *
       [-4.6056,  0.9413, -8.4903,  0.2279,  4.7284,  2.6397])
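
The accumulation behaviour is easier to see in isolation. The sketch below uses a single toy tensor ``w`` (an assumption for illustration, not a parameter of ``ClassicNet``) and calls ``backward()`` twice without clearing the gradient in between.

```python
import torch

w = torch.ones(3, requires_grad=True)

(2 * w).sum().backward()
first = w.grad.clone()          # gradient of sum(2 * w) is 2 per element

(2 * w).sum().backward()        # no clearing in between
accumulated = w.grad.clone()    # 2 + 2 = 4 per element

w.grad.zero_()                  # clear the buffer, as zero_grad() is meant to
(2 * w).sum().backward()
after_reset = w.grad.clone()    # back to 2 per element

print(first, accumulated, after_reset)
```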


## Updating the Weights ##

The simplest update rule used in practice is Stochastic Gradient Descent (SGD):

``weight = weight - learning_rate * gradient``

In [12]:
learning_rate = 0.1
with torch.no_grad():  # the update itself should not be tracked by autograd
    for p in net.parameters():
        p.sub_(p.grad * learning_rate)

SGD is fine, but if you really want to dive into deep neural networks, you will need other update rules such as Nesterov-SGD, RMSProp, Adam, etc. For this there is a small package, ``torch.optim``, that implements all of these methods.

In [15]:
import torch.optim as optim

optimizer = optim.Adam(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
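
In a real training loop the same five calls are repeated many times. Here is a self-contained sketch, assuming a small stand-in model (``nn.Linear``) and synthetic data rather than ``ClassicNet`` and a real dataset; the step count and the data are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # make the sketch reproducible

# Stand-in model and synthetic data (assumptions, not the ClassicNet above).
model = nn.Linear(4, 1)
inputs = torch.randn(64, 4)
targets = 3.0 * inputs.sum(dim=1, keepdim=True)  # a linear function to learn

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

losses = []
for step in range(100):
    optimizer.zero_grad()               # clear gradients from the previous step
    output = model(inputs)              # forward pass
    loss = criterion(output, targets)   # measure the error
    loss.backward()                     # compute gradients
    optimizer.step()                    # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])  # the loss should fall over the 100 steps
```

Calling ``optimizer.zero_grad()`` at the top of every iteration keeps gradients from one step from leaking into the next.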