# 03 - Define a custom deep Neural Network in PyTorch

These tutorials are inspired by the book "[Deep Learning with PyTorch](https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf)" by Stevens et al. and can be seen as a summary of Part I of the book, which covers PyTorch itself. Following the tutorials should normally be enough; reading the book is not required.

## Contents 

1. Loading data, training and measuring accuracy (see previous tutorial)  
2. Define a simple custom neural network  
    1. Naive (but totally ok) method  
    2. Using the functional API  
    3. Train our custom network (as any other model)  
    4. Measuring accuracy (as any other model)  
3. Going deeper: defining blocks of layers  
    1. Using nn.Sequential  
    2. Using a subclass of nn.Module  

  
  

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import random_split
from datetime import datetime

torch.manual_seed(123)

<torch._C.Generator at 0x7f51a1a33770>

## 1. Loading data, training and measuring accuracy (see previous tutorial)

#### Loading CIFAR-2

In [2]:
def load_cifar(train_val_split=0.9, data_path='../data/', preprocessor=None):
    
    # Define preprocessor if not already given
    if preprocessor is None:
        preprocessor = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.4915, 0.4823, 0.4468),
                                (0.2470, 0.2435, 0.2616))
        ])
    
    # load datasets
    data_train_val = datasets.CIFAR10(
        data_path,       
        train=True,      
        download=True,
        transform=preprocessor)

    data_test = datasets.CIFAR10(
        data_path, 
        train=False,
        download=True,
        transform=preprocessor)

    # train/validation split
    n_train = int(len(data_train_val)*train_val_split)
    n_val =  len(data_train_val) - n_train

    data_train, data_val = random_split(
        data_train_val, 
        [n_train, n_val],
        generator=torch.Generator().manual_seed(123)
    )

    print("Size of the train dataset:        ", len(data_train))
    print("Size of the validation dataset:   ", len(data_val))
    print("Size of the test dataset:         ", len(data_test))
    
    return (data_train, data_val, data_test)

cifar10_train, cifar10_val, cifar10_test = load_cifar()

# Now define a lighter version of CIFAR10: CIFAR-2 (airplanes and birds only)
label_map = {0: 0, 2: 1}
class_names = ['airplane', 'bird']

# For each dataset, keep only airplanes and birds
cifar2_train = [(img, label_map[label]) for img, label in cifar10_train if label in [0, 2]]
cifar2_val = [(img, label_map[label]) for img, label in cifar10_val if label in [0, 2]]
cifar2_test = [(img, label_map[label]) for img, label in cifar10_test if label in [0, 2]]

print('Size of the training dataset: ', len(cifar2_train))
print('Size of the validation dataset: ', len(cifar2_val))
print('Size of the test dataset: ', len(cifar2_test))

Files already downloaded and verified
Files already downloaded and verified
Size of the train dataset:         45000
Size of the validation dataset:    5000
Size of the test dataset:          10000
Size of the training dataset:  9017
Size of the validation dataset:  983
Size of the test dataset:  2000


#### Training loop and compute accuracy

In [3]:
def train(n_epochs, optimizer, model, loss_fn, train_loader):
    
    n_batch = len(train_loader)
    losses_train = []
    model.train()
    optimizer.zero_grad(set_to_none=True)
    
    for epoch in range(1, n_epochs + 1):
        
        loss_train = 0.0
        for imgs, labels in train_loader:

            imgs = imgs.to(device=device) 
            labels = labels.to(device=device)

            outputs = model(imgs)
            
            loss = loss_fn(outputs, labels)
            loss.backward()
            
            optimizer.step()
            optimizer.zero_grad()

            loss_train += loss.item()
            
        losses_train.append(loss_train / n_batch)

        if epoch == 1 or epoch % 10 == 0:
            print('{}  |  Epoch {}  |  Training loss {:.3f}'.format(
                datetime.now().time(), epoch, loss_train / n_batch))
    return losses_train

def compute_accuracy(model, loader):
    model.eval()
    correct = 0
    total = 0

    with torch.no_grad():
        for imgs, labels in loader:
            imgs = imgs.to(device=device)
            labels = labels.to(device=device)

            outputs = model(imgs)
            _, predicted = torch.max(outputs, dim=1)
            total += labels.shape[0]
            correct += int((predicted == labels).sum())

    acc =  correct / total
    print("Accuracy: {:.2f}".format(acc))
    return acc

## 2. Define a simple custom neural network

### 2.1 Naive (but totally ok) method

*(Inspired by 8.3.1 Our network as subclass of an nn.Module)*

We saw earlier how to define a neural network using [nn.Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html#torch.nn.Sequential). This solution is simple and convenient but can lack flexibility. To take full advantage of PyTorch's flexibility we need to define our own [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). 

Since most of the basic building blocks for neural networks are [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) objects in PyTorch, we proceed in the same way whether we want to define a custom layer, a block of layers, a neural network, an activation function, a loss function, etc. It always starts by subclassing the [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) class. 

Let's start with a custom neural network!

In order to subclass nn.Module, at a minimum we need to define a forward function that takes the inputs to the module and returns the output. This is where we define our module’s computation. With PyTorch, if we use standard torch operations, autograd will take care of the backward pass automatically.
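As a minimal sketch of that last point, the toy module below defines only `forward`, yet gradients become available after calling `backward()`. The class name `ScaleShift` and its two parameters are hypothetical and chosen only for illustration; they are not part of the tutorial's models.

```python
import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    """Hypothetical toy module computing y = x * scale + shift."""
    def __init__(self):
        super().__init__()
        # nn.Parameter registers these tensors as trainable parameters
        self.scale = nn.Parameter(torch.tensor(2.0))
        self.shift = nn.Parameter(torch.tensor(0.5))

    def forward(self, x):
        # Only standard torch operations, so autograd can differentiate them
        return x * self.scale + self.shift

toy = ScaleShift()
x = torch.tensor([1.0, 2.0, 3.0])
loss = toy(x).sum()   # scalar value to differentiate
loss.backward()       # no backward method was written by hand

print(toy.scale.grad)  # d(loss)/d(scale) = sum(x) = 6
print(toy.shift.grad)  # d(loss)/d(shift) = number of elements = 3
```

The same mechanism is what lets the optimizer update the weights of the larger networks defined in the rest of this tutorial.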

In [4]:
class MyNet(nn.Module):
    def __init__(self):
        super().__init__()  # run the constructor of 'nn.Module' so that layers get registered
        # Add whatever you want here (e.g. layers and activation functions)
        # The order and names don't matter here but it is easier to understand
        # if you go for layer1, fun1, layer2, fun2, etc.
        # Some conventions:
        # - conv stands for convolution
        # - pool for pooling
        # - fc for fully connected

        self.flat = nn.Flatten()
        # 32*32*3: determined by our dataset: 32x32 RGB images
        self.fc1 = nn.Linear(32*32*3, 256)
        self.act1 = nn.Tanh()
        self.fc2 = nn.Linear(256, 64)
        self.act2 = nn.ReLU()
        # 2: determined by our number of classes (birds and planes)
        self.fc3 = nn.Linear(64, 2)
        
    # Remember, we saw earlier that `forward` defines the 
    # computation performed at every call (the forward pass) and that it
    # should be overridden by all subclasses.
    def forward(self, x):
        # Now the order matters! 
        out = self.flat(x)
        out = self.act1(self.fc1(out))
        out = self.act2(self.fc2(out))
        out = self.fc3(out)
        return out

In [5]:
# Now we can instantiate a model with the architecture defined in the cell above
model = MyNet()

# Our model can be inspected exactly as we inspected our model in the previous tutorial (which was then defined using nn.Sequential) 
numel_list = [p.numel() for p in model.parameters()]
print("Total number of parameters: ", sum(numel_list))
print("Number of parameter per layer: ", numel_list)

img, _ = cifar2_train[0]
# Again we can feed an input and get the output exactly the same way as before
output_tensor = model(img.unsqueeze(0))
print("Output: \n", output_tensor)

Total number of parameters:  803266
Number of parameter per layer:  [786432, 256, 16384, 64, 128, 2]
Output: 
 tensor([[-0.1395, -0.3130]], grad_fn=<AddmmBackward0>)
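The per-layer counts printed above are easier to read when each one is paired with its parameter name. The sketch below repeats the `MyNet` layers (so that it runs on its own) and uses `named_parameters()`; each weight matrix holds `out_features * in_features` values and each bias holds `out_features` values. The variable names `model_named` and `named_counts` are illustrative only.

```python
import torch.nn as nn

class MyNet(nn.Module):
    # Same layers as MyNet above, repeated so this snippet is self-contained
    def __init__(self):
        super().__init__()
        self.flat = nn.Flatten()
        self.fc1 = nn.Linear(32*32*3, 256)
        self.act1 = nn.Tanh()
        self.fc2 = nn.Linear(256, 64)
        self.act2 = nn.ReLU()
        self.fc3 = nn.Linear(64, 2)

    def forward(self, x):
        out = self.flat(x)
        out = self.act1(self.fc1(out))
        out = self.act2(self.fc2(out))
        return self.fc3(out)

model_named = MyNet()
# Map each parameter name to its number of elements
named_counts = {name: p.numel() for name, p in model_named.named_parameters()}
for name, p in model_named.named_parameters():
    print(f"{name:12s} shape={tuple(p.shape)}  numel={p.numel()}")
```

Only `fc1`, `fc2` and `fc3` appear in the listing: `nn.Flatten`, `nn.Tanh` and `nn.ReLU` hold no trainable parameters.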


### 2.2 Using the functional API
*(Inspired by 8.3.3 The functional API)*

We could write a more concise, but equivalent, definition of our custom network. Many things are managed automatically when we use predefined [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) objects. For instance, we don't need to implement the convolution operation ourselves, nor specify which parameters should be trained or how to update them. That said, some modules do more bookkeeping than others: 
- The [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear) layer and the [nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#conv2d) layer automatically instantiate trainable parameters (see [nn.parameter.Parameter](https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html?highlight=parameter#torch.nn.parameter.Parameter)), register them with the network, and define the operation in a way that autograd can differentiate.  
- The [nn.MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html?highlight=maxpool#torch.nn.MaxPool2d) layer has no trainable parameters, and the same holds for activation functions. 

Modules (e.g. layers or activation functions) that have no trainable parameters can be used more concisely in PyTorch through [nn.functional](https://pytorch.org/docs/stable/nn.functional.html#torch-nn-functional) (often imported as ``F``).

For example, the functional counterpart of [nn.MaxPool2d](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d) is [nn.functional.max_pool2d](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.max_pool2d) (usually called as ``F.max_pool2d``), and the functional counterpart of [nn.ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html?highlight=relu#torch.nn.ReLU) is [relu](https://pytorch.org/docs/stable/nn.functional.html?highlight=relu#torch.nn.functional.relu) (usually called as ``F.relu``). Since ``tanh`` is a generic math function and not only an activation function, the counterpart of [nn.Tanh](https://pytorch.org/docs/stable/generated/torch.nn.Tanh.html#torch.nn.Tanh) is available directly as [torch.tanh](https://pytorch.org/docs/stable/generated/torch.tanh.html?highlight=tanh#torch.tanh).
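The correspondence between the modular and functional forms can be checked directly. The short sketch below applies both forms to the same random tensor (a made-up input, not a CIFAR image) and also confirms that pooling and activation modules carry no trainable parameters, whereas a linear layer carries two tensors (weight and bias).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 3, 32, 32)  # a fake batch with one 32x32 RGB image

# Each pair computes the same values; only the calling style differs
same_pool = torch.equal(nn.MaxPool2d(2)(x), F.max_pool2d(x, 2))
same_relu = torch.equal(nn.ReLU()(x), F.relu(x))
same_tanh = torch.equal(nn.Tanh()(x), torch.tanh(x))
print(same_pool, same_relu, same_tanh)

# Count the parameter tensors owned by each kind of module
n_params_pool = len(list(nn.MaxPool2d(2).parameters()))
n_params_relu = len(list(nn.ReLU().parameters()))
n_params_linear = len(list(nn.Linear(8, 4).parameters()))  # weight and bias
print(n_params_pool, n_params_relu, n_params_linear)
```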

We need to keep using [nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) for nn.Linear and nn.Conv2d so that our custom networks manage their trainable parameters automatically. However, we can safely switch to the functional counterparts of pooling and activation, since they have no trainable parameters. 

The resulting definition (``MyNetBis`` below) is more concise than our previous definition of ``MyNet`` and fully equivalent to it.

Whether to use the [functional](https://pytorch.org/docs/stable/nn.functional.html#torch-nn-functional) or the [modular](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) API for operations without trainable parameters is a matter of style and taste. When part of a network is so simple that we want to use nn.Sequential, we're in the modular realm. When we write our own ``forward``, it may be more natural to use the functional interface for things that do not need state in the form of parameters.



In [6]:
# Same model as MyNet but using the functional API
class MyNetBis(nn.Module):
    def __init__(self):
        super().__init__()
        # No need to declare activation functions nor maxpool layers anymore
        self.fc1 = nn.Linear(32*32*3, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 2)
        
    def forward(self, x):
        # Activation functions now come from the functional API 
        out = torch.flatten(x, 1)
        out = torch.tanh(self.fc1(out))
        out = F.relu(self.fc2(out))
        # Note that we don't need a softmax function in the output layer if we
        # use nn.CrossEntropyLoss as the loss function
        out = self.fc3(out)
        return out
    
    

# Another model using maxpool layers (no trainable parameters in such layers)
class MyNet02(nn.Module):
    def __init__(self):
        super().__init__()
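        # Images are 32x32x3 but we start with a maxpool layer that divides H and W by 2
        # (explicit parentheses: '32//2 * 32//2 * 3' only gives 768 by coincidence of precedence)
        self.fc1 = nn.Linear((32//2) * (32//2) * 3, 64)
        # Note: 10 outputs here; the CIFAR-2 subset used in this tutorial would need 2
        self.fc2 = nn.Linear(64, 10)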
        
    def forward(self, x):
        # Maxpool layers and activation functions using the functional API 
        out = torch.flatten(F.max_pool2d(x, 2), 1)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out
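The input size of the first linear layer in `MyNet02` follows from the shapes produced by pooling and flattening. A quick check on a dummy tensor (all zeros, used only for its shape) makes the arithmetic explicit: 3 channels times 16 times 16 gives 768 features.

```python
import torch
import torch.nn.functional as F

x = torch.zeros(1, 3, 32, 32)    # batch of one 32x32 RGB image
pooled = F.max_pool2d(x, 2)      # kernel size 2; the stride defaults to the kernel size
flat = torch.flatten(pooled, 1)  # keep the batch dimension, flatten the rest

print(pooled.shape)  # torch.Size([1, 3, 16, 16])
print(flat.shape)    # torch.Size([1, 768])
```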

In [7]:
# Again we can use our new custom model as any other module! 
img, _ = cifar2_train[0]
model = MyNetBis()
output_tensor = model(img.unsqueeze(0))
print(output_tensor)

tensor([[ 0.1919, -0.1124]], grad_fn=<AddmmBackward0>)
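The outputs of `MyNet` and `MyNetBis` differ above only because each instance gets its own random initial weights. One way to check that the two definitions really are equivalent is to copy the weights from one into the other and compare outputs. The sketch below does this with compact copies of both classes, renamed `ModularNet` and `FunctionalNet` here (hypothetical names) so that the snippet runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModularNet(nn.Module):
    # Same architecture as MyNet: activations declared as modules
    def __init__(self):
        super().__init__()
        self.flat = nn.Flatten()
        self.fc1 = nn.Linear(32*32*3, 256)
        self.act1 = nn.Tanh()
        self.fc2 = nn.Linear(256, 64)
        self.act2 = nn.ReLU()
        self.fc3 = nn.Linear(64, 2)

    def forward(self, x):
        out = self.act1(self.fc1(self.flat(x)))
        out = self.act2(self.fc2(out))
        return self.fc3(out)

class FunctionalNet(nn.Module):
    # Same architecture as MyNetBis: activations from the functional API
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(32*32*3, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 2)

    def forward(self, x):
        out = torch.tanh(self.fc1(torch.flatten(x, 1)))
        out = F.relu(self.fc2(out))
        return self.fc3(out)

torch.manual_seed(0)
modular = ModularNet()
functional = FunctionalNet()

# Parameter-free modules add no entries to the state dict, so both classes
# expose the same keys (fc1, fc2, fc3 weights and biases)
functional.load_state_dict(modular.state_dict())

x = torch.randn(4, 3, 32, 32)  # made-up batch of 4 images
with torch.no_grad():
    same_output = torch.allclose(modular(x), functional(x))
print(same_output)
```

That the state dict loads without key errors is itself a useful observation: a model saved with one style can be reloaded with the other, as long as the parameterized layers keep the same attribute names.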


### 2.3 Train our custom network (as any other model)

In [8]:
device = (torch.device('cuda') if torch.cuda.is_available()
          else torch.device('cpu'))
print(f"Training on device {device}.")

train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=True)
model = MyNetBis().to(device=device) 
optimizer = optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()


# WARNING: this is supposed to be much faster than previously, but it
# might still take a while if no GPU is available.
# AGAIN, STOP YOUR KERNEL IF IT'S TOO SLOW
train(
    n_epochs = 10,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

Training on device cpu.
08:27:34.134459  |  Epoch 1  |  Training loss 0.552
08:27:41.098048  |  Epoch 10  |  Training loss 0.352


[0.5523930487903297,
 0.4804809275248372,
 0.4568690401865235,
 0.437317356150201,
 0.4209292771968436,
 0.40565535151366644,
 0.3901551380647835,
 0.376962959132296,
 0.36580844688500075,
 0.35189006060150496]
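The training cell above feeds the raw outputs of the network (logits) to `nn.CrossEntropyLoss`, without any softmax layer in the model. This works because the loss applies a log-softmax internally. The sketch below illustrates that on made-up logits and labels; it is a sanity check, not part of the training pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 2)              # raw outputs for 5 samples and 2 classes
labels = torch.tensor([0, 1, 1, 0, 1])  # made-up class indices

# CrossEntropyLoss on logits equals NLL loss on log-softmax of the logits
loss_ce = nn.CrossEntropyLoss()(logits, labels)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(loss_ce, loss_manual))

# Softmax is only needed when actual probabilities are wanted
probs = F.softmax(logits, dim=1)
# It does not change which class has the highest score
same_prediction = torch.equal(probs.argmax(dim=1), logits.argmax(dim=1))
print(same_prediction)
```

This is also why `compute_accuracy` can take the maximum over the raw outputs directly.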

### 2.4 Measuring accuracy (as any other model)

In [9]:
train_loader = torch.utils.data.DataLoader(cifar2_train, batch_size=64, shuffle=False)
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64, shuffle=False)

print("Training accuracy:")
compute_accuracy(model, train_loader)
print("Validation accuracy:")
compute_accuracy(model, val_loader)

Training accuracy:
Accuracy: 0.86
Validation accuracy:
Accuracy: 0.82


0.8209562563580874

## 3. Going deeper: defining blocks of layers 

Deep neural networks often contain "blocks" of layers. Blocks are simply groups of layers and make it easier to define deeper neural networks. For example, [ResNet](https://pytorch.org/vision/stable/models.html#id10) uses [Bottleneck](https://github.com/pytorch/vision/blob/65676b4ba1a9fd4417293cb16f690d06a4b2fb4b/torchvision/models/resnet.py#L57) or [BasicBlock](https://github.com/pytorch/vision/blob/65676b4ba1a9fd4417293cb16f690d06a4b2fb4b/torchvision/models/resnet.py#L57) groups of layers.

Since a group of layers and a neural network are exactly the same thing in PyTorch (i.e. an nn.Module), we create a block of layers exactly as we would create an entire model. Again, we can either use nn.Sequential or define a custom class that inherits from nn.Module. 
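As a small sketch of this idea, the helper below (a hypothetical `make_block` function, not taken from the book) builds one block as an nn.Sequential and then stacks several of them inside another nn.Sequential. The linear layer and its activation are listed as two separate entries, which is what makes nn.Sequential apply them one after the other.

```python
import torch
import torch.nn as nn

def make_block(n_features):
    # Hypothetical helper: one block = a linear layer followed by its activation
    return nn.Sequential(nn.Linear(n_features, n_features), nn.ReLU())

torch.manual_seed(0)
block = make_block(8)
stack = nn.Sequential(*[make_block(8) for _ in range(3)])  # a block of blocks

x = torch.randn(2, 8)  # made-up batch: 2 samples, 8 features

# A block behaves like any module: it can be called, indexed and nested
manual = torch.relu(block[0](x))         # apply the linear layer, then ReLU, by hand
print(torch.allclose(block(x), manual))  # the block does the same thing
print(stack(x).shape)

n_params = sum(p.numel() for p in stack.parameters())
print(n_params)  # 3 blocks * (8*8 weights + 8 biases) = 216
```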

### 3.1 Using nn.Sequential

In [10]:
class MyDeepNN(nn.Module):
    def __init__(self, n_blocks=10, n_in_out=128):
        super().__init__()
        self.fc1 = nn.Linear(32*32*3, n_in_out)
        
        # Here we define a block of layers using nn.Sequential.
        # Each entry is itself a small nn.Sequential (linear layer, then ReLU).
        # Note: nn.ReLU(nn.Linear(...)) would not chain the two, because the
        # argument of nn.ReLU is its `inplace` flag: the linear layer would
        # be registered but never applied in forward.
        self.fcblock = nn.Sequential(
            *[
                nn.Sequential(nn.Linear(n_in_out, n_in_out), nn.ReLU())
                for _ in range(n_blocks)
            ]
        )
        self.fc2 = nn.Linear(n_in_out, 2)
        
    def forward(self, x):
        out = torch.flatten(x, 1)
        out = F.relu(self.fc1(out))
        out = self.fcblock(out)
        # No activation on the output layer: nn.CrossEntropyLoss expects raw logits
        out = self.fc2(out)
        return out

In [11]:
img, _ = cifar2_train[0]
model = MyDeepNN(n_blocks=10, n_in_out=128)
output_tensor = model(img.unsqueeze(0))

numel_list = [p.numel() for p in model.parameters()]
print("\nTotal number of parameters: ", sum(numel_list))
print("Number of layers: ", len(numel_list))
print("Number of parameter per layer: ", numel_list)

print("\n", model)


Total number of parameters:  558722
Number of layers:  24
Number of parameter per layer:  [393216, 128, 16384, 128, 16384, 128, 16384, 128, 16384, 128, 16384, 128, 16384, 128, 16384, 128, 16384, 128, 16384, 128, 16384, 128, 256, 2]

 MyDeepNN(
  (fc1): Linear(in_features=3072, out_features=128, bias=True)
  (fcblock): Sequential(
    (0): Sequential(
      (0): Linear(in_features=128, out_features=128, bias=True)
      (1): ReLU()
    )
    (1): Sequential(
      (0): Linear(in_features=128, out_features=128, bias=True)
      (1): ReLU()
    )
    (2): Sequential(
      (0): Linear(in_features=128, out_features=128, bias=True)
      (1): ReLU()
    )
    (3): Sequential(
      (0): Linear(in_features=128, out_features=128, bias=True)
      (1): ReLU()
    )
    (4): Sequential(
      (0): Linear(in_features=128, out_features=128, bias=True)
      (1): ReLU()
    )
    (5): Sequential(
      (0): Linear(in_features=128, out_features=128, bias=True)
      (1): ReLU()
    )
    (6)

### 3.2 Using a subclass of nn.Module

In [12]:
class MyBlock(nn.Module):
    def __init__(
        self, 
        n_in_out = 128, 
        n1 = 1024,
        n2 = 256,
    ):
        super().__init__()
        # If you want to stack your blocks, the input size and the 
        # output size must be consistent, so here n_in_out
        self.fc1 = nn.Linear(n_in_out, n1)
        self.fc2 = nn.Linear(n1, n2)
        self.fc3 = nn.Linear(n2, n_in_out)
        
    def forward(self, x):
        out = F.relu(self.fc1(x))
        out = F.relu(self.fc2(out))
        out = F.relu(self.fc3(out))
        return out

class MyDeepNN_WithMyBlock(nn.Module):
    def __init__(
        self, 
        n_blocks=10, 
        n_in_out=128,
    ):
        super().__init__()
        self.fc1 = nn.Linear(32*32*3, n_in_out)
        # Here we define a block of layers using our custom MyBlock module.
        self.myblocks = nn.Sequential(
            *[
                MyBlock(n_in_out=n_in_out, n1=1024, n2=128*(i+1))
                for i in range(n_blocks)
            ]
        )
        self.fc2 = nn.Linear(n_in_out, 2)

        
    def forward(self, x):
        out = torch.flatten(x, 1)
        out = F.relu(self.fc1(out))
        out = self.myblocks(out)
        # No activation on the output layer: nn.CrossEntropyLoss expects raw logits
        out = self.fc2(out)
        return out
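One reason to prefer a subclass over nn.Sequential for a block is that `forward` can do more than chain layers. As a hedged illustration, the hypothetical `ResidualBlock` below adds its input back to its output (a skip connection, the idea behind the ResNet blocks mentioned earlier). It is a simplified sketch with linear layers and assumed sizes, not the torchvision implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Hypothetical block: two linear layers plus a skip connection."""
    def __init__(self, n_in_out=128, n_hidden=256):
        super().__init__()
        self.fc1 = nn.Linear(n_in_out, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_in_out)

    def forward(self, x):
        out = F.relu(self.fc1(x))
        out = self.fc2(out)
        # The input is added back before the final activation; chaining
        # layers in an nn.Sequential alone would not give access to x here
        return F.relu(out + x)

torch.manual_seed(0)
# Input and output sizes match, so the blocks can be stacked like MyBlock
res_blocks = nn.Sequential(*[ResidualBlock() for _ in range(3)])
x = torch.randn(4, 128)  # made-up batch: 4 samples, 128 features
y = res_blocks(x)
print(y.shape)
```

As with `MyBlock`, the only constraint for stacking is that each block returns a tensor of the size the next block expects.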

In [13]:
img, _ = cifar2_train[0]
model = MyDeepNN_WithMyBlock(n_blocks=5)
output_tensor = model(img.unsqueeze(0))

numel_list = [p.numel() for p in model.parameters()]
print("\nTotal number of parameters: ", sum(numel_list))
print("Number of layers: ", len(numel_list))
print("Number of parameter per layer: ", numel_list)

print("\n", model)


Total number of parameters:  3268482
Number of layers:  34
Number of parameter per layer:  [393216, 128, 131072, 1024, 131072, 128, 16384, 128, 131072, 1024, 262144, 256, 32768, 128, 131072, 1024, 393216, 384, 49152, 128, 131072, 1024, 524288, 512, 65536, 128, 131072, 1024, 655360, 640, 81920, 128, 256, 2]

 MyDeepNN_WithMyBlock(
  (fc1): Linear(in_features=3072, out_features=128, bias=True)
  (myblocks): Sequential(
    (0): MyBlock(
      (fc1): Linear(in_features=128, out_features=1024, bias=True)
      (fc2): Linear(in_features=1024, out_features=128, bias=True)
      (fc3): Linear(in_features=128, out_features=128, bias=True)
    )
    (1): MyBlock(
      (fc1): Linear(in_features=128, out_features=1024, bias=True)
      (fc2): Linear(in_features=1024, out_features=256, bias=True)
      (fc3): Linear(in_features=256, out_features=128, bias=True)
    )
    (2): MyBlock(
      (fc1): Linear(in_features=128, out_features=1024, bias=True)
      (fc2): Linear(in_features=1024, out_fea