## Typical training procedure for a NN:
1. Define the neural network that has some learnable parameters (or weights)
2. Iterate over a dataset of inputs
3. Process input through the network
4. Compute the loss (how far the output is from being correct)
5. Propagate gradients back into the network's parameters
6. Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
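
The six steps above can be sketched end to end on a toy problem. This is only an illustration under assumed names: the dataset (`xs`, `ys`), the two parameters `w` and `b`, and the learning rate are hypothetical and are not part of the network defined in this notebook.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy dataset: points on the line y = 2x + 1
xs = torch.linspace(-1.0, 1.0, 20).unsqueeze(1)
ys = 2.0 * xs + 1.0

# 1. A "network" with two learnable parameters
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
learning_rate = 0.1

for epoch in range(200):
    # 2-3. Iterate over the data and process it (whole dataset as one batch)
    pred = xs * w + b
    # 4. Compute the loss (mean squared error)
    loss = ((pred - ys) ** 2).mean()
    # 5. Propagate gradients back into w and b
    loss.backward()
    # 6. Update: weight = weight - learning_rate * gradient
    with torch.no_grad():
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
        # Clear the gradients so the next iteration starts from zero
        w.grad.zero_()
        b.grad.zero_()

print(w.item(), b.item())
```

The learned values should approach the slope and intercept used to generate the data.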

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [2]:
if torch.cuda.device_count() > 1:  # use the second GPU (index 1) only when it exists
    torch.cuda.set_device(1)

In [3]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution kernel
        # Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        # Linear(in_features, out_features, bias=True); 16 channels * 6 * 6 spatial positions
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120,84)
        self.fc3 = nn.Linear(84,10)
        
    def forward(self,x):
        # Max pooling over a (2,2) window
        x = F.max_pool2d(F.relu(self.conv1(x)),(2,2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)),2)
        x = x.view(-1,self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self,x):
        size=x.size()[1:] #all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features*=s
        return num_features

net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
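
The `in_features=576` of `fc1` follows from the spatial size left after the two convolution and pooling stages. The sketch below traces the shapes, assuming a 32x32 single-channel input (the size used later in this notebook); it rebuilds the two convolution layers on their own rather than reusing `net`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-alone copies of the two convolution layers, for shape tracing only
conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 1, 32, 32)

# 3x3 conv without padding: 32 -> 30; 2x2 max pooling: 30 -> 15
x = F.max_pool2d(F.relu(conv1(x)), 2)
shape_after_block1 = tuple(x.shape)

# 3x3 conv without padding: 15 -> 13; 2x2 max pooling: 13 -> 6 (rounded down)
x = F.max_pool2d(F.relu(conv2(x)), 2)
shape_after_block2 = tuple(x.shape)

# Flatten everything except the batch dimension: 16 * 6 * 6
flat_features = x.flatten(1).shape[1]

print(shape_after_block1, shape_after_block2, flat_features)
```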


In [4]:
params = list(net.parameters())
print(len(params))
print(params[0].size())

10
torch.Size([6, 1, 3, 3])
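
The count of 10 corresponds to one weight tensor and one bias tensor for each of the five layers. As a rough sketch, the same layers can be placed in an `nn.ModuleDict` (a stand-in container chosen here for illustration, not the `Net` class itself) and listed by name:

```python
import torch.nn as nn

# Same layer sizes as the network above, held in a plain container
layers = nn.ModuleDict({
    "conv1": nn.Conv2d(1, 6, 3),
    "conv2": nn.Conv2d(6, 16, 3),
    "fc1": nn.Linear(16 * 6 * 6, 120),
    "fc2": nn.Linear(120, 84),
    "fc3": nn.Linear(84, 10),
})

# Each layer contributes a weight and a bias
for name, p in layers.named_parameters():
    print(name, tuple(p.shape))

n_tensors = len(list(layers.parameters()))
n_scalars = sum(p.numel() for p in layers.parameters())
print(n_tensors, n_scalars)
```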


In [5]:
input = torch.randn(1, 1,32, 32)
out = net(input)
print(out)

tensor([[ 0.0724, -0.0998, -0.0812,  0.0398,  0.0784, -0.0919, -0.0239,  0.1851,
         -0.0024, -0.0564]], grad_fn=<AddmmBackward>)


In [6]:
net.zero_grad()

In [7]:
out.backward(torch.randn(1, 10))  # backpropagate random gradients; out is not a scalar, so a gradient argument is required

### NOTE
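torch.nn is designed around mini-batches: its layers expect inputs that are a mini-batch of samples rather than a single sample. Some modules in recent PyTorch releases also accept unbatched input, but adding an explicit batch dimension works everywhere.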

For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width.

If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
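
A minimal sketch of adding the batch dimension, assuming a single 32x32 one-channel image (the tensor names here are illustrative only):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)

# One sample: nChannels x Height x Width
single_image = torch.randn(1, 32, 32)

# Add a leading batch dimension of size 1: nSamples x nChannels x Height x Width
batched = single_image.unsqueeze(0)

out = conv(batched)
print(tuple(batched.shape), tuple(out.shape))
```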

### RECAP:
* torch.Tensor - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.
* nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
* nn.Parameter - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a Module.
* autograd.Function - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to functions that created a Tensor and encodes its history.
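
The difference between an `nn.Parameter` and a plain tensor attribute can be seen in a small sketch. The `Scale` module below is hypothetical and exists only to illustrate the registration behaviour described in the recap.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):
    """Hypothetical module used only to show parameter registration."""

    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered automatically and tracked by autograd
        self.gain = nn.Parameter(torch.ones(3))
        # Plain tensor attribute: not registered as a parameter
        self.offset = torch.zeros(3)

    def forward(self, x):
        return x * self.gain + self.offset


m = Scale()
names = [name for name, _ in m.named_parameters()]
print(names)

# backward() fills in .grad on the registered parameter
m(torch.ones(3)).sum().backward()
print(m.gain.grad)
```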

## Loss Function

In [11]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # give it the same shape as output; -1 lets view infer that dimension
criterion = nn.MSELoss()

loss = criterion(output,target)

In [12]:
print(loss)

tensor(0.6488, grad_fn=<MseLossBackward>)
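
With its default settings, `nn.MSELoss` returns the mean of the squared element-wise differences. The sketch below checks that against a hand-written version, using random stand-in tensors rather than the network's actual output:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins with the same shapes as the network output and the target
output = torch.randn(1, 10)
target = torch.randn(1, 10)

loss = nn.MSELoss()(output, target)

# The same quantity written out by hand
manual = ((output - target) ** 2).mean()

print(loss.item(), manual.item())
```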


In [14]:
print(loss.grad_fn)
print(loss.grad_fn.next_functions[0][0])  # Linear (fc3)
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of the Linear node: AccumulateGrad here, not ReLU

<MseLossBackward object at 0x7fe0eca08780>
<AddmmBackward object at 0x7fe0f46d5320>
<AccumulateGrad object at 0x7fe0ecec19e8>
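
Instead of chaining `next_functions` by hand, the whole graph can be walked recursively. This is a rough sketch on a single `nn.Linear` layer; the helper `walk` is a hypothetical name, and the exact node names printed vary between PyTorch versions.

```python
import torch
import torch.nn as nn

fc = nn.Linear(4, 2)
loss = nn.MSELoss()(fc(torch.randn(1, 4)), torch.randn(1, 2))


def walk(fn, depth=0, lines=None):
    """Collect the type names of all autograd nodes reachable from fn."""
    lines = [] if lines is None else lines
    if fn is None:  # inputs that do not require grad have no node
        return lines
    lines.append("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, lines)
    return lines


graph = walk(loss.grad_fn)
print("\n".join(graph))
```

The leaves of the printed tree are `AccumulateGrad` nodes, one for each parameter that receives a gradient.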


## Backprop

To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though; otherwise the new gradients are accumulated into the existing ones.
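
The accumulation behaviour can be seen on a tiny example. This sketch uses a stand-alone tensor `w` (an illustrative name, not a parameter of the network above):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(3 * w) = 3 for every element
(w * 3).sum().backward()
first = w.grad.clone()

# Second backward pass without clearing: gradients add up
(w * 3).sum().backward()
second = w.grad.clone()

# Clearing the buffer restores the expected value
w.grad.zero_()
(w * 3).sum().backward()
third = w.grad.clone()

print(first, second, third)
```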

Now we shall call loss.backward() and look at conv1's bias gradients before and after the backward pass.

In [17]:
net.zero_grad() #zeroes the gradient buffers of all parameters

In [18]:
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])


In [19]:
loss.backward()

In [20]:
print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad after backward
tensor([ 0.0149, -0.0075,  0.0110, -0.0001,  0.0024,  0.0020])


## Update the weights

In [22]:
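learning_rate = 0.01
with torch.no_grad():  # the update itself must not be tracked by autograd
    for p in net.parameters():
        p.sub_(p.grad * learning_rate)  # weight = weight - learning_rate * gradient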

However, as you use neural networks, you will want to try different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. For this, PyTorch provides a small package, torch.optim, that implements all of these methods. Using it is very simple:

In [23]:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
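
For plain SGD without momentum or weight decay, `optimizer.step()` should match the manual update rule shown earlier. The sketch below compares the two on a small `nn.Linear` model; the model, data, and names (`model_a`, `model_b`) are assumptions made for illustration.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Two identical copies of a small model
model_a = nn.Linear(4, 2)
model_b = copy.deepcopy(model_a)

x, target = torch.randn(8, 4), torch.randn(8, 2)
criterion = nn.MSELoss()
lr = 0.01

# Manual update: weight = weight - learning_rate * gradient
criterion(model_a(x), target).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad

# The same step through torch.optim
optimizer = optim.SGD(model_b.parameters(), lr=lr)
optimizer.zero_grad()
criterion(model_b(x), target).backward()
optimizer.step()

same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(same)
```

Other optimizers in `torch.optim`, such as `optim.Adam`, are constructed from `net.parameters()` in the same way, so the training loop itself does not need to change.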