In [1]:
%matplotlib inline
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import transforms,datasets


Neural Networks
===============
# TODO
# networjk on very simple dataset predicting stocks de thymamai kana eukolaki kai meta MNIST
# Let's try diffrent networks and setting
# PLOT acuracies and loss
# Confusion matrix
# predictions on sample image
# display mnist images
# (Optional) use tensorboard for display
Use the torch.nn package to build a neural network.
We will build two neural networks and try to classify images with digits MNIST dataset.
In the last lecture, I have already talked about `` autograd ``, `` nn ``package depends on `` autograd ``
 package to define the model and get derivative.
An ``nn.Module`` contains each layer and a forward (input) method, which returns `` output``.

E.g:

![](https://pytorch.org/tutorials/_images/mnist.png)

It is a simple feed-forward neural network that accepts an input, then passes it layer by layer, and
finally outputs the result of the calculation.

The typical training process of neural network is as follows:

1. Define a neural network model containing some learnable parameters (or weights)

1. Iterate over the dataset
1. Process input through neural network
1. Calculate the loss (the difference between the output and the correct value)
1. Parameters of backpropagating the gradient back to the network
1. Update the network parameters, mainly using the following simple update principle:
``weight = weight - learning_rate * gradient``

Create a network:
------------------





In [2]:
class FCNet(nn.Module):
    def __init__(self):
        super(FCNet, self).__init__()
        self.flatten = nn.Flatten()

        self.fc1 = nn.Linear(28*28, 32)
        self.fc2 = nn.Linear(32, 10)
    def forward(self,x):
        # print('imnput x')
        x = F.relu(self.fc1(self.flatten(x)))

        x = self.fc2(x)
        return x

class ConvNet(nn.Module):

    def __init__(self):
        super(ConvNet, self).__init__()
        # 1 input image channel, 10 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 10, 5)
        self.conv2 = nn.Conv2d(10, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(256, 32)
        self.fc2 = nn.Linear(32, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = FCNet()
net

FCNet(
  (flatten): Flatten()
  (fc1): Linear(in_features=784, out_features=32, bias=True)
  (fc2): Linear(in_features=32, out_features=10, bias=True)
)

The forward function must be defined in the model. The backward function (used to calculate the gradient) is
automatically created by ``autograd``. You can use any operation for Tensor in the forward function.

``net.parameters()`` returns a list and values of parameters (weights) that can be learned



In [3]:
params = list(net.parameters())
len(params)
params[0].size() # conv1's .weight

torch.Size([32, 784])


Note: The expected input size of this network (LeNet) is 32 × 32. If you use the MNIST dataset to train this network,
please resize the image to 32 × 32.


In [4]:
input = torch.randn(1, 1, 28, 28)
out = net(input)
out

tensor([[-0.3378,  0.5920, -0.0722,  0.6209, -0.0205, -0.0661, -0.0077,  0.5482,
         -0.3803, -0.2409]], grad_fn=<AddmmBackward>)

Clear the gradient buffer of all parameters to zero, and then perform the back propagation of the random gradient:



In [5]:
net.zero_grad()
out.backward(torch.randn(1, 10))


## Note
 ``torch.nn`` only supports small batch input. The whole `` torch.nn``
Packages only support small batch samples, not individual samples.

For example, ``nn.Conv2d`` accepts a 4-dimensional tensor,

``Each dimension is numSamples * nChannels * Height * Width (number of samples * number of channels * height * width) ``

If you have a single sample, just use `` input.unsqueeze (0) `` to add other dimensions 

Before continuing, let's review the classes used so far.

**review:**
* `` torch.Tensor``: a used multi-dimensional array * that automatically calls `` backward() `` to
support automatic gradient calculation,
And save the *gradient* w.r.t about this vector.
* `` nn.Module``: neural network module. Package parameters, move to GPU, run, export, load, etc.
* `` nn.Parameter``: A variable, when it is assigned to a `` Module ``, it is *automatically registered as a parameter*.
* `` autograd.Function ``: To achieve the forward and reverse definition of an automatic derivation operation,
each variable operation creates at least one function node, and each `` Tensor `` operation creates and receives one ``Tensor`` and the ``Function`` node of the function that encodes its history.

**The key points are as follows:**


*    Create a network
*    Forward operation of input
* Calculate loss then backward operation
*    Update network weights





Loss function
-------------
A loss function accepts a pair of (output, target) as input and calculates a value to estimate how much the network
 output differs from the target value.

***Translator's Note: output is the output of the network, and target is the actual value***

There are many different [loss functions] in the nn package (https://pytorch.org/docs/nn.html#loss-functions).
`` nn.MSELoss `` is a relatively simple loss function, which calculates the **mean square error** between the output
and the target,
E.g:

In [6]:
output = net(input)
target = torch.randn(10)  
target = target.view(1, -1)  
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(1.4750, grad_fn=<MseLossBackward>)


Now, if you follow `` loss`` in the reverse process, use its
`` .grad_fn`` attribute, you will see the calculation diagram shown below.

::

input-> conv2d-> relu-> maxpool2d-> conv2d-> relu-> maxpool2d
-> view-> linear-> relu-> linear-> relu-> linear
-> MSELoss
-> loss

So, when we call `` loss.backward () ``, the entire calculation graph will be
Differentiate according to loss, and all tensors in the figure set to `` requires_grad = True ``
Will have a `` .grad `` tensor that accumulates with the gradient.

To illustrate, let us take a few steps back:



In [7]:
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU

<MseLossBackward object at 0x7f6c61005278>
<AddmmBackward object at 0x7f6c61005390>
<AccumulateGrad object at 0x7f6c61005278>


Back propagation
--------
Call ``loss.backward()`` to get the error of back propagation.

However, you need to clear the existing gradient before calling, otherwise the gradient will be accumulated to the
existing gradient.
Now, we will call ``loss.backward()`` and look at the gradient of the bias term of the conv1 layer before and after
back-propagation.



In [8]:
# net.zero_grad()     

# print('conv1.bias.grad before backward')
# print(net.conv1.bias.grad)

# loss.backward()

# print('conv1.bias.grad after backward')
# print(net.conv1.bias.grad)

How to use the loss function

**Read later:**

 The `nn` package contains various modules and loss functions used to form the building blocks of deep neural networks.
 For complete documentation, please see [here] (https://pytorch.org/docs/nn).



Update weights
------------------
In practice, the simplest weight update rule is stochastic gradient descent (SGD):

 `` weight = weight-learning_rate * gradient ``

We can implement this rule using simple Python code:


In [9]:
learning_rate = 0.01
for f in net.parameters ():
    f.data.sub_ (f.grad.data * learning_rate)

But when using a neural network to use various update rules, such as SGD, Nesterov-SGD, Adam, RMSPROP, etc., a package
`` torch.optim `` is built in PyTorch to implement all these rules.
Using them is very simple:

In [10]:


# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

**Note:** 

Observe how to use ``optimizer.zero_grad ()`` to manually set the gradient buffer to zero. This is because the gradient
is accumulated as described in the Backprop section.





# Define our dataloaders

We are going to use ``torchvision`` datasets and specifically MNIST dataset to classify images with digits

In [11]:
batch_size  = 128
#transforms.Normalize(mean=0.5,std=1.0)
trans = transforms.Compose([transforms.ToTensor()])
training_set = datasets.MNIST(root='./',train=True,transform=trans,download=True)

test_set = datasets.MNIST(root='./',train=False,transform=trans,download=True)
train_loader = torch.utils.data.DataLoader(
                 dataset=training_set,
                 batch_size=batch_size,
                 shuffle=True)
test_loader = torch.utils.data.DataLoader(
                dataset=test_set,
                batch_size=batch_size,
                shuffle=False)


# Training our neural network

In [12]:
criterion = nn.CrossEntropyLoss()
if torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
total_epochs = 2
log_interval = 100
average_loss = 0.
net = ConvNet()
net = net.to(device)
optimizer = optim.SGD(net.parameters(), lr=0.01)
for epoch in range(total_epochs):
    average_loss = 0.
    total_samples  = 0
    correct_predictions = 0
    for batch_idx, (inputs, target) in enumerate(train_loader):
        optimizer.zero_grad()
        
        outputs = net(inputs.to(device))
       
        loss = criterion(outputs,target.to(device))
        average_loss+=loss.item()
        loss.backward()
        optimizer.step()
        _, pred_label = torch.max(outputs.cpu().data, 1)
        total_samples += inputs.data.size()[0]
        correct_predictions += (pred_label == target.data).sum()
        if (batch_idx+1) % 100 == 0 or (batch_idx+1) == len(train_loader):
            print(f' epoch: {epoch} Loss: {(average_loss/batch_idx):.2f} Acc: {(correct_predictions * 1.0 / total_samples):.2f}')


 epoch: 0 Loss: 2.32 Acc: 0.18
 epoch: 0 Loss: 2.29 Acc: 0.26
 epoch: 0 Loss: 2.25 Acc: 0.33
 epoch: 0 Loss: 2.16 Acc: 0.37
 epoch: 0 Loss: 2.03 Acc: 0.42
 epoch: 1 Loss: 0.82 Acc: 0.78
 epoch: 1 Loss: 0.70 Acc: 0.80
 epoch: 1 Loss: 0.63 Acc: 0.82
 epoch: 1 Loss: 0.59 Acc: 0.83
 epoch: 1 Loss: 0.56 Acc: 0.83


## Testing

In [13]:
test_loss = 0.
total_samples  = 0
correct_predictions = 0
for batch_idx, (inputs, target) in enumerate(test_loader):
    outputs = net(inputs.to(device))
    loss = criterion(outputs,target.to(device))
    test_loss +=loss.item()
    _, pred_label = torch.max(outputs.cpu().data, 1)
    total_samples += inputs.data.size()[0]
    correct_predictions += (pred_label == target.data).sum()
print(f' epoch: {epoch}, test loss: {(test_loss/len(test_loader)):.2f}, acc: {(correct_predictions * 1.0 / total_samples):.2f}')

 epoch: 1, test loss: 0.38, acc: 0.88
