# PyTorch101 - Part 3 - Neural Networks
In this part of the tutorial, we deal with neural networks, for which we already have the two basic ingredients: tensors and autograd. Here we only describe the building blocks of a network; in the following part we will actually train it on the MNIST dataset.
Our architecture will be the following:
- image input 28x28
- 32 3x3 conv filters
- 64 3x3 conv filters
- max pooling size 2
- dropout 0.25
- flatten
- dense layer with 128 neurons
- dropout 0.5
- softmax for the 10 classes
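The spatial sizes implied by this list can be checked with a little arithmetic. The sketch below assumes stride 1 and no padding for the convolutions (the list does not say so explicitly) and a pooling stride equal to the pooling size; `conv_out` is a helper name introduced only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # input image is 28x28
size = conv_out(size, kernel=3)             # first 3x3 conv  -> 26
size = conv_out(size, kernel=3)             # second 3x3 conv -> 24
size = conv_out(size, kernel=2, stride=2)   # 2x2 max pooling -> 12
flat_features = 64 * size * size            # 64 channels after the second conv
print(size, flat_features)
```

Under these assumptions the flattened layer has 64 * 12 * 12 = 9216 features, which is the input size the dense layer has to accept.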

In [2]:
import torch
print(torch.__version__)

0.4.0


## Network definition and forward pass

The structure of a neural net is usually defined inside a class that inherits from *nn.Module*. We will see step by step how this is implemented.

The modules we will use in the init phase, with their main arguments:
- 2D convolution: *Conv2d(in_channels, out_channels, kernel_size)*
- Dense: *Linear(in_features, out_features)*
- Softmax: *Softmax(dim)*

We will also use other functions inside the forward pass, namely:
- ReLU: *F.relu(x)*
- Maxpool2D: *F.max_pool2d(x, kernel_size)*
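To get a feeling for these building blocks, here is a minimal sketch that applies each of them to random data and prints the resulting shapes. The tensor sizes and the small `Linear(8, 4)` layer are arbitrary choices for illustration, not part of the network we are building.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 1, 28, 28)    # (batch, channels, height, width)
conv = nn.Conv2d(1, 32, 3)       # in_channels=1, out_channels=32, kernel_size=3
fc = nn.Linear(8, 4)             # in_features=8, out_features=4

y = F.relu(conv(x))              # a 3x3 conv without padding shrinks 28 to 26
pooled = F.max_pool2d(y, 2)      # 2x2 pooling halves the spatial size
z = fc(torch.randn(5, 8))        # Linear acts on the last dimension
print(y.shape, pooled.shape, z.shape)
```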

In [45]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        # Define the network modules
        self.conv1 = nn.Conv2d(1, 32, 3) # input_filters=1 because MNIST is gray-scale
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.fc1 = nn.Linear(9216, 128) # 9216 = 64 channels * 12 * 12, the size of the flattened layer
        self.fc2 = nn.Linear(128, 10)
        self.softmax = nn.Softmax(dim=1)
        
    def forward(self, x):
        # Defines what happens in the forward pass
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(F.relu(self.conv2(x)),(2,2))
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
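        x = F.relu(self.fc2(x))  # note: this ReLU clamps negative class scores to 0
        # Softmax is left out here because CrossEntropyLoss applies log-softmax itself
        #x = self.softmax(x)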
        return x
        
    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
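The class above leaves out the two dropout layers from the architecture list. A possible variant that includes them is sketched below. The name `NetWithDropout`, the use of *nn.Dropout* for both layers, and the choice to return the raw scores of the last layer (without ReLU) are assumptions of this sketch, not the code used in the rest of this part.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NetWithDropout(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.drop1 = nn.Dropout(0.25)
        self.fc1 = nn.Linear(9216, 128)
        self.drop2 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = self.drop1(x)
        x = x.view(x.size(0), -1)   # flatten everything except the batch dimension
        x = F.relu(self.fc1(x))
        x = self.drop2(x)
        return self.fc2(x)          # raw class scores, no ReLU or softmax

dropout_net = NetWithDropout()
dropout_net.eval()                  # dropout is only active in training mode
with torch.no_grad():
    scores = dropout_net(torch.randn(2, 1, 28, 28))
print(scores.shape)
```

Dropout modules behave differently in training and evaluation, so switching between *train()* and *eval()* matters once they are part of the model.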

We can now instantiate the network. Having defined the forward pass, PyTorch's autograd derives the backward pass automatically from the operations we perform. We can also access the parameters of this network directly.

[SIDENOTE] This is very similar to what we do in TensorFlow.

In [52]:
net = Net()
params = list(net.parameters())
print("Number of params:", len(params))
print("First conv size", params[0].size())
print("Value of the first filter:", params[0][0])

Number of params: 8
First conv size torch.Size([32, 1, 3, 3])
Value of the first filter: tensor([[[ 0.3201,  0.2941,  0.0749],
         [-0.0144,  0.1617,  0.0281],
         [ 0.0413, -0.2278,  0.1221]]])
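The count of 8 comes from the four layers, each of which has a weight and a bias tensor. A minimal sketch of how to list them by name is given below. It rebuilds the same four layers in an *nn.ModuleDict* (a container chosen here only to keep the example self-contained) instead of reusing the `Net` class.

```python
import torch.nn as nn

layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 32, 3),
    "conv2": nn.Conv2d(32, 64, 3),
    "fc1": nn.Linear(9216, 128),
    "fc2": nn.Linear(128, 10),
})

names = []
total = 0
for name, p in layers.named_parameters():
    names.append(name)
    total += p.numel()              # number of scalar values in this tensor
    print(name, tuple(p.size()))
print("Parameter tensors:", len(names))
print("Total scalar parameters:", total)
```

Almost all of the scalar parameters sit in the first dense layer, which maps 9216 features to 128 neurons.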


The parameters of the network are exposed as ordinary tensors, which makes them very accessible. We can also feed the network a random input of shape (batch, channels, height, width), just to verify that it works.

In [53]:
random_input = torch.randn(1, 1, 28, 28)
out = net(random_input)
print(out)

tensor([[ 0.0000,  0.0110,  0.0000,  0.0355,  0.0000,  0.1435,  0.1409,
          0.2006,  0.0000,  0.0000]])


## Loss function and optimization
We are still missing the loss function and the optimizer that will minimize it.
For this task we choose a simple cross-entropy computed on the softmax, and we will use vanilla gradient descent as the optimizer.

In [54]:
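# Since PyTorch 0.4 plain tensors can be used directly, the Variable wrapper is no longer needed
target = torch.tensor([5])  # class index of the target, one entry per input in the batch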
criterion = nn.CrossEntropyLoss()

loss = criterion(out, target)
print(loss)

tensor(2.2150)


The CrossEntropyLoss, on the other hand, is less transparent. It takes the class index directly as the target, so we do not have to build the one-hot encoding ourselves, and I am quite sure that it also applies the softmax to the outputs automatically (we will check this later on).

Having defined the loss, we can backprop directly on it using *.backward()* as usual to compute the gradients. Remember to clear the gradients before each backward pass, because PyTorch accumulates them instead of overwriting them.

In [59]:
net.zero_grad() # Clear the gradients
loss.backward() 
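Why clearing matters can be seen on a tiny example. The sketch below uses a single tensor `w` (an arbitrary stand-in for the network parameters) and calls *.backward()* repeatedly, with and without zeroing the gradient in between.

```python
import torch

w = torch.ones(3, requires_grad=True)

(2 * w).sum().backward()
first = w.grad.clone()       # gradient of sum(2 * w) is 2 for every entry

(2 * w).sum().backward()     # no zeroing in between
second = w.grad.clone()      # the new gradient is added to the old one

w.grad.zero_()               # clear, as net.zero_grad() does for every parameter
(2 * w).sum().backward()
third = w.grad.clone()

print(first, second, third)
```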

We can also follow the computation backward, identifying which functions are called.

In [61]:
print(loss.grad_fn)
print(loss.grad_fn.next_functions[0][0])
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])

<NllLossBackward object at 0x10edf0f98>
<LogSoftmaxBackward object at 0x10edf07f0>
<ReluBackward object at 0x10edf0f98>


We can see that, as I suspected, the CrossEntropyLoss automatically performs a LogSoftmax operation, followed by a negative log-likelihood loss (NllLoss).
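The same conclusion can be checked numerically. The sketch below copies the (rounded) network outputs printed earlier into a tensor and compares *nn.CrossEntropyLoss* with an explicit log-softmax followed by *F.nll_loss*. Because the inputs are rounded to four decimals, the result should only match the loss printed above approximately.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Rounded outputs of the network, copied from the printout above
out = torch.tensor([[0.0000, 0.0110, 0.0000, 0.0355, 0.0000,
                     0.1435, 0.1409, 0.2006, 0.0000, 0.0000]])
target = torch.tensor([5])

builtin = nn.CrossEntropyLoss()(out, target)
manual = F.nll_loss(F.log_softmax(out, dim=1), target)
by_hand = -F.log_softmax(out, dim=1)[0, 5]   # minus the log-probability of class 5

print(builtin.item(), manual.item(), by_hand.item())
```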

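PyTorch gives us a lot of control over the gradients, so after the backward pass we can inspect them directly on each parameter (in graph-mode TF this takes more effort, since the gradient tensors have to be fetched explicitly):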

In [62]:
print(net.conv1.bias.grad)

tensor(1.00000e-02 *
       [ 0.6627,  0.4830, -0.9244,  0.8536,  0.3521,  0.7296,  1.8234,
         2.5440,  0.5693,  0.9661,  0.7899, -1.4543,  0.8642, -0.4157,
        -2.1407, -2.1223,  0.4236, -1.0479,  1.6950,  2.3712, -2.3934,
        -1.7147, -2.8960,  1.3432, -2.8582, -0.8453, -0.7836, -0.2788,
        -1.1999, -0.0280, -1.2987,  0.1399])
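The gradients of all parameters can be listed in the same way. The sketch below uses a small stand-in model (two *Linear* layers with arbitrary sizes, not the network defined above) and prints the norm of each gradient after one backward pass.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(2, 4)
targets = torch.tensor([0, 2])

model.zero_grad()
criterion(model(inputs), targets).backward()

grad_shapes = {}
for name, p in model.named_parameters():
    grad_shapes[name] = tuple(p.grad.shape)   # a gradient has the shape of its parameter
    print(name, grad_shapes[name], p.grad.norm().item())
```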


The last missing piece is the optimizer. We could use the same strategy as in the previous part and implement gradient descent by hand, subtracting from each parameter its gradient scaled by a given learning rate. This time, instead, we will use the built-in optim package, which includes not only SGD but also other optimizers.

The way optimizers are handled in PyTorch is quite similar to TF.

In [64]:
import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.01)

# This is one pass of the optimizer
optimizer.zero_grad()                 # Clear the gradients
output = net(random_input)            # Compute outputs
loss = criterion(output, target)      # Compute loss
loss.backward()                       # Backward pass
optimizer.step()                      # Optimization step
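For plain SGD without momentum or weight decay, the optimizer step should be equivalent to the manual update from the previous part. The sketch below compares the two on a small stand-in model (a single *Linear* layer with random data, chosen only to keep the example short).

```python
import copy
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model_a = nn.Linear(4, 3)
model_b = copy.deepcopy(model_a)          # identical starting weights
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
criterion = nn.CrossEntropyLoss()
lr = 0.01

# Built-in optimizer
optimizer = optim.SGD(model_a.parameters(), lr=lr)
optimizer.zero_grad()
criterion(model_a(x), y).backward()
optimizer.step()

# Manual gradient descent: p <- p - lr * grad
model_b.zero_grad()
criterion(model_b(x), y).backward()
with torch.no_grad():
    for p in model_b.parameters():
        p -= lr * p.grad

same = all(torch.allclose(pa, pb)
           for pa, pb in zip(model_a.parameters(), model_b.parameters()))
print("Same parameters after one step:", same)
```

The advantage of the optim package is that swapping SGD for another optimizer only changes the line that constructs it.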

And that's it, all we need to know to implement basic neural networks is here. In the next part we will train this architecture on the MNIST dataset.