**Neural Networks** can be constructed using the `torch.nn` package.

## Typical Training Procedure for a Neural Network
1. Define the neural network with learnable parameters (weights) that are adjusted as the network learns from the data
2. Iterate over the training dataset of inputs
3. Process input through the network
4. Compute the loss
    - Loss: How far the calculated output is from being correct
5. Propagate gradients back into the network's parameters
6. Update the weights of the network, typically using a simple update rule
    - Simple Update Rule: `weight = weight - learning_rate * gradient`
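
The six steps above can be sketched as a short loop. This is a minimal sketch, not the network defined later in these notes: the tiny `nn.Linear` model, the synthetic data, the learning rate, and the number of passes are all assumptions chosen so the example runs quickly, and the whole dataset is processed as a single batch on each pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Step 1: define a network with learnable parameters (hypothetical tiny model)
model = nn.Linear(3, 1)
criterion = nn.MSELoss()
learning_rate = 0.05  # assumed value, for illustration only

# Hypothetical toy dataset: targets are a fixed linear function of the inputs
inputs = torch.randn(64, 3)
targets = inputs @ torch.tensor([[1.0], [-2.0], [0.5]])

losses = []
for epoch in range(50):                  # Step 2: iterate over the dataset
    output = model(inputs)               # Step 3: process input through the network
    loss = criterion(output, targets)    # Step 4: compute the loss
    model.zero_grad()                    # clear gradients left over from the last pass
    loss.backward()                      # Step 5: propagate gradients back
    with torch.no_grad():                # Step 6: weight = weight - learning_rate * gradient
        for param in model.parameters():
            param -= learning_rate * param.grad
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

In practice the manual update in step 6 is usually delegated to an optimizer from `torch.optim`, such as `torch.optim.SGD`, which applies the same rule.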

## Define the Network

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [10]:
class Net(nn.Module):  # note: nn.Module (capital M), not nn.module
    
    def __init__(self):
        super(Net, self).__init__()
        
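        # Conv layers: (input channels, output channels, square kernel size)
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        
        # Fully connected layers; 16*6*6 = 16 channels times a 6x6 feature map
        self.fc1 = nn.Linear(16*6*6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)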
        

    def forward(self, x): 
        x = F.max_pool2d(F.relu(self.conv1(x)), (2,2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2) 
        
        x = x.view(-1, self.num_flat_features(x))
        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        
        x = self.fc3(x)
        
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        
        for s in size:
            num_features *= s
        
        return num_features

### Neural Net Def: `class Net(nn.Module)`
The first convolution layer, `nn.Conv2d(1, 6, 3)`, has:
- 1 input image channel
- 6 output channels
- 3x3 square convolution kernel
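
To make the three numbers concrete, the sketch below builds the same layer on its own and passes one random image through it. The 32x32 input size is an assumption: it is not stated in these notes, but it is the size that leads to the `16*6*6` features expected by `fc1`.

```python
import torch
import torch.nn as nn

# Same arguments as conv1, written with keyword names for readability
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)

# Assumed input: a batch of one single-channel 32x32 image
image = torch.randn(1, 1, 32, 32)
out = conv1(image)

# A 3x3 kernel without padding trims one pixel from each border: 32 -> 30
print(out.shape)           # torch.Size([1, 6, 30, 30])
print(conv1.weight.shape)  # torch.Size([6, 1, 3, 3])

# 6 * 1 * 3 * 3 weights plus 6 biases
n_params = sum(p.numel() for p in conv1.parameters())
print(n_params)            # 60
```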

### Key Terms:
**Conv**: Convolution\
**FC**: Fully Connected

### Forward Prop: `def forward(self, x)`
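1. The outputs of Conv Layer #1 and #2 each pass through the ReLU activation function
2. Conv Layer #1 is max-pooled over a (2,2) window
3. Conv Layer #2 is max-pooled over a (2,2) window (written as just `2`, since a single number is enough for a square window)
4. The result is flattened and passed through the fully connected layers: `fc1` and `fc2` with ReLU, then `fc3` with no activation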
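
The effect of these steps on the tensor shape can be traced by hand. The sketch below is an illustration under assumptions, not part of the original notebook: it uses standalone copies of the two convolution layers, assumes a 32x32 single-channel input, and uses `torch.flatten(x, 1)` as a shorter stand-in for the `view` / `num_flat_features` combination.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Standalone copies of the two convolution layers (randomly initialised)
conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 1, 32, 32)  # assumption: one 32x32 single-channel image
shapes = [tuple(x.shape)]

x = F.max_pool2d(F.relu(conv1(x)), (2, 2))  # conv: 32 -> 30, pool: 30 -> 15
shapes.append(tuple(x.shape))

x = F.max_pool2d(F.relu(conv2(x)), 2)       # conv: 15 -> 13, pool: 13 -> 6
shapes.append(tuple(x.shape))

x = torch.flatten(x, 1)                     # keep the batch dim, flatten the rest
shapes.append(tuple(x.shape))

stages = ["input", "after conv1 + pool", "after conv2 + pool", "flattened"]
for stage, shape in zip(stages, shapes):
    print(stage, shape)
```

The flattened size, 16 * 6 * 6 = 576, is the `in_features` value that `fc1` expects.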


In [11]:
net = Net()

print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
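
The printed summary lists the layers but not how many values they hold or what the network returns. The sketch below checks both. To stay self-contained it repeats a condensed copy of `Net` (using `torch.flatten` in place of `num_flat_features`), and it again assumes a 32x32 single-channel input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    """Condensed copy of the network defined above."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1)  # flatten everything except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = Net()

# Each of the 5 layers contributes a weight tensor and a bias tensor
params = list(net.parameters())
print(len(params))  # 10

total = sum(p.numel() for p in params)
print(total)        # 81194

# Assumed input: a batch of one 32x32 single-channel image
out = net(torch.randn(1, 1, 32, 32))
print(out.shape)    # torch.Size([1, 10])
```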
