# Building the network


In PyTorch, **all networks** must subclass the `nn.Module` class.

Under the hood, the `Module` base class keeps track of the network's weights, which are contained within each layer.


In [13]:
import torch.nn as nn
import torch

class Network(nn.Module):
    
    def __init__(self):
        
        # call the nn.Module constructor so the layers assigned below get registered
        super().__init__()
    
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5) # 5x5
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        
        # linear layers are often used in the final stages of a network
        self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10) 

    def forward(self, t):
        
        # placeholder: returns the input unchanged, the layers are not applied yet
        return t

Let's break the code down:

## Convolutional layers

The number of layers chosen here is arbitrary; this kind of configuration is best settled through experimentation.

```python
self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
```
### Hyperparameters

- `kernel_size` - sets the filter size. "Kernel" and "filter" are interchangeable terms.
- `out_channels` - sets the number of filters. One filter produces one output channel. Output channels are also called feature maps.
- `in_channels` - the number of input channels.

In `conv1` above, **1 input channel** is convolved by **six different filters**, which produces six output channels.

The `in_channels` hyperparameter of the first conv layer depends on the number of color channels in the input data. It is **1** here because the images are grayscale.

The `in_channels` of the second conv layer (6) must match the `out_channels` of the previous conv layer.
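To see how the channel counts chain together, here is a minimal sketch that pushes a dummy batch through the two conv layers. The 28x28 input size and the variable names are assumptions made for illustration; the text above only states that the images are grayscale.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)

# Assumed input: one 28x28 grayscale image, shaped (batch, channels, height, width)
t = torch.randn(1, 1, 28, 28)

t1 = conv1(t)   # 1 input channel  -> 6 feature maps
t2 = conv2(t1)  # 6 input channels -> 12 feature maps

print(t1.shape)  # torch.Size([1, 6, 24, 24])
print(t2.shape)  # torch.Size([1, 12, 20, 20])
```

Each 5x5 filter with the default stride and no padding trims 4 pixels from the height and width, which is why 28 becomes 24 and then 20.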

## Linear layers

When switching from `Conv2d` layers to `Linear` layers, the tensors must be flattened.

`NOTE`: `fc` stands for fully connected. Linear, dense, and fully connected all refer to the same kind of layer.

```python
self.fc1 = nn.Linear(in_features=12*4*4, out_features=120)
self.fc2 = nn.Linear(in_features=120, out_features=60)
self.out = nn.Linear(in_features=60, out_features=10) 
```
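The `12*4*4` passed to `fc1` is the number of values left per image once the conv output is flattened: 12 channels of 4x4 each. The text does not show where the 4x4 comes from. The sketch below assumes 28x28 inputs and a 2x2 max-pool after each conv layer, which is one common configuration that yields that size; both are assumptions, not something the network definition above enforces.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, kernel_size=5)
conv2 = nn.Conv2d(6, 12, kernel_size=5)
fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)

t = torch.randn(1, 1, 28, 28)  # assumed input size

# Assumed pooling: 2x2 max-pool with stride 2 after each conv layer
t = F.max_pool2d(conv1(t), kernel_size=2, stride=2)  # (1, 6, 12, 12)
t = F.max_pool2d(conv2(t), kernel_size=2, stride=2)  # (1, 12, 4, 4)

# Flatten everything except the batch dimension before the linear layer
flat = t.flatten(start_dim=1)

print(flat.shape)       # torch.Size([1, 192])
print(fc1(flat).shape)  # torch.Size([1, 120])
```

If the input size or pooling differs, `in_features` of `fc1` has to change to match the flattened size.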

### Hyperparameters

- `in_features` - the size of the layer input
- `out_features` - the size of the layer output

The `out_features` of the `self.out` layer corresponds to the number of classes (10 here).


In [22]:
network = Network()
print(network)

Network(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 12, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=192, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=60, bias=True)
  (out): Linear(in_features=60, out_features=10, bias=True)
)


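Two values in the above output were not set explicitly and show their defaults:

- `stride` - how far the filter slides after each step of the convolution; `(1, 1)` means one pixel at a time in each direction
- `bias` - a learnable value added to the layer's output; `bias=True` means the layer has one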
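As a rough illustration of what these two settings change, the sketch below compares the default stride with `stride=2` and builds a linear layer with `bias=False`. The 28x28 input and the layer names are assumptions for the example and are not part of the network above.

```python
import torch
import torch.nn as nn

t = torch.randn(1, 1, 28, 28)  # assumed 28x28 grayscale input

default_conv = nn.Conv2d(1, 6, kernel_size=5)            # stride defaults to 1
strided_conv = nn.Conv2d(1, 6, kernel_size=5, stride=2)  # filter moves 2 pixels per step

print(default_conv(t).shape)  # torch.Size([1, 6, 24, 24])
print(strided_conv(t).shape)  # torch.Size([1, 6, 12, 12])

# With bias=False the layer has no bias parameter at all
no_bias = nn.Linear(in_features=60, out_features=10, bias=False)
print(no_bias.bias)  # None
```

A larger stride shrinks the output faster because the filter is applied at fewer positions.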

In [28]:
for name,param in network.named_parameters():
    print(f"{name}\t\t{param.shape}")

conv1.weight		torch.Size([6, 1, 5, 5])
conv1.bias		torch.Size([6])
conv2.weight		torch.Size([12, 6, 5, 5])
conv2.bias		torch.Size([12])
fc1.weight		torch.Size([120, 192])
fc1.bias		torch.Size([120])
fc2.weight		torch.Size([60, 120])
fc2.bias		torch.Size([60])
out.weight		torch.Size([10, 60])
out.bias		torch.Size([10])
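The shapes above follow directly from the hyperparameters: a conv weight is shaped `(out_channels, in_channels, kernel_height, kernel_width)` and a linear weight is shaped `(out_features, in_features)`. As a small sketch, the number of learnable values can be counted with `numel()`; the `counts` and `total` names are introduced here only for illustration, and the class is repeated without `forward` so the block runs on its own.

```python
import torch.nn as nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

network = Network()

# Number of elements in each parameter tensor, keyed by parameter name
counts = {name: param.numel() for name, param in network.named_parameters()}
total = sum(counts.values())

print(counts["conv1.weight"])  # 150    (6 * 1 * 5 * 5)
print(counts["fc1.weight"])    # 23040  (120 * 192)
print(total)                   # 32998
```

Most of the parameters sit in `fc1`, which is typical when a conv stack feeds into a fully connected layer.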
