# Classification of Symbols


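- `torch` is the main PyTorch module.
- The `nn` module contains building blocks such as layer definitions, activation functions and loss functions.
- The helper module `functional` (imported as `F`) provides almost the same functionality as the `nn` module, but as plain functions: you don't need to create an object (e.g. `nn.ReLU()`) to apply an activation function.
- The `optim` module contains the optimizers (e.g. SGD, Adam) that update the network's weights during training.
- `torchvision` is PyTorch's companion package for computer vision; it provides datasets such as MNIST and image transforms.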
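A minimal sketch of the difference between the `nn` and `functional` styles, plus one optimizer step. The tensor values, the toy loss `(w - 3)^2` and the learning rate `0.1` are illustrative assumptions, not part of the original notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

x = torch.tensor([[-1.0, 0.0, 2.0]])

# nn style: create a module object first, then call it on the input
relu_module = nn.ReLU()
out_module = relu_module(x)

# functional style: call the function directly, no object needed
out_functional = F.relu(x)
print(torch.equal(out_module, out_functional))  # True

# the same holds for log-softmax (used at the end of the model below)
same_log_softmax = torch.allclose(nn.LogSoftmax(dim=1)(x), F.log_softmax(x, dim=1))
print(same_log_softmax)  # True

# optim: an optimizer updates parameters using the gradients from autograd
w = torch.zeros(1, requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)
loss = (w - 3.0).pow(2).sum()   # toy loss (assumption) with its minimum at w = 3
optimizer.zero_grad()
loss.backward()                 # d(loss)/dw = 2 * (w - 3) = -6 at w = 0
optimizer.step()                # w <- w - lr * grad = 0 - 0.1 * (-6) = 0.6
print(w.item())                 # approximately 0.6
```

The module style is convenient when an activation is part of a layer stack (e.g. `nn.Sequential`); the functional style is handy inside a hand-written `forward` method.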

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
```

## Define the model

- In PyTorch, one way to define a neural network model is to write a class, e.g. `Net`, that inherits from `nn.Module`.
- Another way is the sequential way: list the layers of the network, in order, inside `nn.Sequential`.
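A minimal sketch of the sequential way, assuming the same layer sizes as the `Net` class (784 → 500 → 1000 → 10). `nn.Flatten` replaces the `x.view(-1, 784)` call, and a random tensor stands in for a batch of MNIST images.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),           # (batch, 1, 28, 28) -> (batch, 784)
    nn.Linear(784, 500),
    nn.ReLU(),
    nn.Linear(500, 1000),
    nn.ReLU(),
    nn.Linear(1000, 10),    # 10 output neurons, one per digit class
    nn.LogSoftmax(dim=1),   # log-probabilities over the 10 classes
)

images = torch.randn(4, 1, 28, 28)   # dummy batch of 4 MNIST-sized images (assumption)
out = model(images)
print(out.shape)  # torch.Size([4, 10])

# weights + biases of the three Linear layers; Flatten, ReLU and LogSoftmax have none
n_params = sum(p.numel() for p in model.parameters())
print(n_params)   # 903510
```

The parameter count follows from `in * out + out` per layer: 392500 + 501000 + 10010 = 903510. The sequential form is shorter, while the class form gives full control over `forward`.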

```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # - We have 3 layers (500, 1000 and 10 output nodes). The last one is important:
        #   MNIST has 10 classes (digits 0-9), so we need 10 output neurons.
        # - The args of nn.Linear are the no. of input and output nodes,
        #   e.g. the first layer has 784 input nodes.
        # - Why 784? Because MNIST images are 28x28 px = 784 pixels.
        # - fc = fully connected
        self.fc1 = nn.Linear(784, 500)
        self.fc2 = nn.Linear(500, 1000)
        self.fc3 = nn.Linear(1000, 10)

    def forward(self, x):
        """
        - In PyTorch you only define the forward pass through the network;
          the `autograd` library automatically calculates the backward pass (the gradients).
        - `x` is the input tensor; each image is flattened to a vector of size 784.
        """
        x = x.view(-1, 784)       # x.view(<batch-size>, 784); -1 lets PyTorch infer the batch size
        x = F.relu(self.fc1(x))   # fc layer 1 is connected to the input vector; ReLU is applied to its output
        x = F.relu(self.fc2(x))   # do the same for layer two
        x = self.fc3(x)           # no activation function here, only the linear layer

        # Log-softmax: the output has 10 values per image, each the log-probability
        # of one class. The predicted class is the one with the highest value.
        return F.log_softmax(x, dim=1)
```
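To make the log-softmax step concrete, here is a minimal sketch that applies it to random logits standing in for the output of `fc3` (an assumption; no trained model is involved). It checks the formula `z_i - log(sum_j exp(z_j))`, shows that exponentiating gives probabilities summing to 1, and picks the predicted class with `argmax`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 10)   # stand-in for fc3 output: 3 images, 10 classes

log_probs = F.log_softmax(logits, dim=1)

# log-softmax by hand, per row: z_i - log(sum_j exp(z_j))
manual = logits - torch.logsumexp(logits, dim=1, keepdim=True)
print(torch.allclose(log_probs, manual))  # True

# exponentiating recovers probabilities that sum to 1 for each image
probs = log_probs.exp()
print(probs.sum(dim=1))

# the predicted class is the index with the highest (log-)probability;
# log-softmax is monotonic, so this equals the argmax of the raw logits
predicted = log_probs.argmax(dim=1)
print(predicted.shape)  # torch.Size([3])
```

Returning log-probabilities rather than probabilities is numerically more stable, and it pairs with `F.nll_loss` during training.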