# TOC

  __Chapter 8 - Modern network architectures__

1. [Import](#Import)
1. [Modern network architectures](#Modern-network-architectures)
    1. [ResNet](#ResNet)
        1. [Creating PyTorch datasets](#Creating-PyTorch-datasets)
        1. [Creating loaders for training and validation](#Creating-loaders-for-training-and-validation)
        1. [Creating a ResNet model](#Creating-a-ResNet-model)
        1. [Extracting convolutional features](#Extracting-convolutional-features)
        1. [Creating a custom PyTorch dataset class for the pre-convoluted features and loader](#Creating-a-custom-PyTorch-dataset-class-for-the-pre-convoluted-features-and-loader)
        1. [Creating a simple linear model](#Creating-a-simple-linear-model)
        1. [Training and validating the model](#Training-and-validating-the-model)
    1. [Inception](#Inception)
        1. [Creating an Inception model](#Creating-an-Inception-model)
        1. [Extracting convolutional features using register_forward_hook](#Extracting-convolutional-features-using-register_forward_hook)
        1. [Creating a new dataset for the convoluted features](#Creating-a-new-dataset-for-the-convoluted-features)
        1. [Creating a fully connected model](#Creating-a-fully-connected-model)
        1. [Training and validating the model](#Training-and-validating-the-model2)
    1. [DenseNet](#DenseNet)
        1. [Creating a DenseNet model](#Creating-a-DenseNet-model)
        1. [Extracting DenseNet features](#Extracting-DenseNet-features)
        1. [Creating a dataset and loaders](#Creating-a-dataset-and-loaders)
        1. [Creating a fully connected model and train](#Creating-a-fully-connected-model-and-train)
    1. [Model ensembling](#Model-ensembling)
        1. [Creating models](#Creating-models)
        1. [Extracting the image features](#Extracting-the-image-features) 
        1. [Creating a custom dataset along with data loaders](#Creating-a-custom-dataset-along-with-data-loaders)
        1. [Creating an ensembling model](#Creating-an-ensembling-model) 
        1. [Training and validating the ensemble model](#Training-and-validating-the-ensemble-model)        

# Import

<a id = 'Import'></a>

In [3]:
# standard library and settings
import os
import sys
import importlib
import itertools
from PIL import Image
from glob import glob
import warnings

warnings.simplefilter("ignore")
from IPython.display import display, HTML

display(HTML("<style>.container { width:95% !important; }</style>"))

# data extensions and settings
import numpy as np

np.set_printoptions(threshold=np.inf, suppress=True)
import pandas as pd

pd.set_option("display.max_rows", 500)
pd.set_option("display.max_columns", 500)
pd.options.display.float_format = "{:,.6f}".format

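# pytorch tools
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, models, transforms
from torchvision.datasets import ImageFolder

# use the GPU when one is available
is_cuda = torch.cuda.is_available()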

# visualization extensions and settings
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
sns.set_style("whitegrid")

# Modern network architectures

Adding layers to a model can increase its predictive power, but it also introduces problems such as vanishing or exploding gradients. Modern architectures try to solve these problems with a variety of techniques.



<a id = 'Modern-network-architectures'></a>

## ResNet

ResNet addresses these issues by letting the layers in each block fit a residual. In a typical network, we stack layers to learn a function that maps the input $x$ directly to its output $H(x)$. ResNet, instead of trying to learn a mapping from $x$ to $H(x)$, learns the difference between the two, the residual $F(x) = H(x) - x$. To recover $H(x)$, the block adds the input back to the residual: $H(x) = F(x) + x$. The stacked layers therefore only need to learn $F(x)$ rather than $H(x)$ directly; if the best mapping is close to the identity, they can simply push $F(x)$ toward zero.
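To make the identity argument concrete, here is a minimal sketch. The `Residual` wrapper and the layer sizes are illustrative assumptions, not ResNet's actual code: it computes $F(x) + x$ for any shape-preserving $F$, and when $F$ outputs zero the block passes its input through unchanged.

```python
import torch
import torch.nn as nn


class Residual(nn.Module):
    """Hypothetical wrapper: forward(x) = f(x) + x for a shape-preserving module f."""

    def __init__(self, f):
        super().__init__()
        self.f = f

    def forward(self, x):
        # the wrapped layers only have to learn the residual F(x) = H(x) - x
        return self.f(x) + x


# a small residual function F: conv -> ReLU -> conv, keeping 8 channels
f = nn.Sequential(
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
)
block = Residual(f)

# zero the last conv so that F(x) = 0 for every input
nn.init.zeros_(f[2].weight)
nn.init.zeros_(f[2].bias)

x = torch.randn(2, 8, 16, 16)
y = block(x)
print(torch.equal(y, x))  # True: the block now behaves as the identity
```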

Each ResNet block is made up of several layers and a shortcut connection that adds the input of the block to its output. Because the addition is element-wise, the input and output must have the same shape. When they do not (for example, when a block uses a stride of 2 or changes the number of channels), the shortcut has to be adapted, typically with zero-padding or with a strided 1 by 1 convolution that projects the input to the output shape.

In the example below, the `__init__` method creates all of the layers, and the `forward` method is very similar to the implementations seen so far, except that the block's input (passed through the shortcut) is added back to the output before the final ReLU.



<a id = 'ResNet'></a>

In [None]:
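# ResNet block demonstration
class ResNetBasicBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # first 3x3 conv; may downsample through its stride
        self.conv1 = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=3,
            stride=stride,
            padding=1,
            bias=False,
        )
        self.bn1 = nn.BatchNorm2d(out_channels)
        # second 3x3 conv keeps the shape (out_channels -> out_channels, stride 1)
        self.conv2 = nn.Conv2d(
            out_channels,
            out_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False,
        )
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.stride = stride

        # shortcut: identity when shapes match, 1x1 projection otherwise
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(
                    in_channels, out_channels, kernel_size=1, stride=stride, bias=False
                ),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        residual = self.shortcut(x)
        out = self.conv1(x)
        out = F.relu(self.bn1(out), inplace=True)
        out = self.conv2(out)
        out = self.bn2(out)
        # add the input back before the final activation
        out += residual
        return F.relu(out)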

### Creating PyTorch datasets



<a id = 'Creating-PyTorch-datasets'></a>

In [None]:
# create datasets from respective image folders
data_transform = transforms.Compose(
    [
        transforms.Resize((299, 299)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]
)

train_data = datasets.ImageFolder("../../kaggleDogsVsCats/data/train", transform=data_transform)
val_data = datasets.ImageFolder("../../kaggleDogsVsCats/data/valid", transform=data_transform)
classes = 2

### Creating loaders for training and validation

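The exact order of the data needs to be preserved when calculating the pre-convoluted features. Within a single pass, each batch's features and labels are collected together, so they stay paired. However, the ensemble section later runs each model over the data in a separate pass and reuses the labels from the first pass, which only works if every pass sees the images in the same order. Therefore, it is important to set the shuffle argument to False.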
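A small sketch of why `shuffle=False` matters, using a toy `TensorDataset` in place of the image folders (the data and the `collect` helper are illustrative assumptions): two separate passes over an unshuffled loader yield the samples in exactly the same order, so features from different passes can be matched up by position.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset: 10 "images" whose single value equals their index, labels alternate 0/1
toy_images = torch.arange(10, dtype=torch.float32).view(10, 1)
toy_labels = torch.arange(10) % 2
toy_dset = TensorDataset(toy_images, toy_labels)


def collect(loader):
    """Return the concatenated inputs from one full pass over the loader."""
    return torch.cat([x for x, _ in loader]).view(-1)


ordered = DataLoader(toy_dset, batch_size=3, shuffle=False)
pass1 = collect(ordered)
pass2 = collect(ordered)
print(torch.equal(pass1, pass2))  # True: both passes see the same order
print(pass1.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

With `shuffle=True`, each pass draws a fresh permutation, so the two passes would generally disagree.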



<a id = 'Creating-loaders-for-training-and-validation'></a>

In [None]:
# create data loaders
train_loader = DataLoader(train_data, batch_size=32, shuffle=False, num_workers=3)
val_loader = DataLoader(val_data, batch_size=32, shuffle=False, num_workers=3)

### Creating a ResNet model

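An nn.Sequential instance lets us quickly build a model from an existing list of PyTorch layers. Here we keep every child module of the pretrained ResNet except the final linear layer, so the model outputs convolutional features instead of class scores. Because the network is only used as a fixed feature extractor, it is important to set requires_grad to False on its parameters, so that no gradients are computed or stored for them, and to put it in evaluation mode so that batch normalization uses its stored statistics.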
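The same slicing pattern on a small stand-in network, so it runs without downloading weights (`ToyNet` and its layer sizes are hypothetical). Note that `nn.Sequential` only chains the children, so the flatten step that the original `forward` performed is lost; that is why the extraction code flattens the output with `view` afterwards.

```python
import torch
import torch.nn as nn


class ToyNet(nn.Module):
    """Hypothetical stand-in for a pretrained CNN whose last child is the classifier."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(16, 2)

    def forward(self, x):
        x = self.avgpool(self.relu(self.conv(x)))
        return self.fc(torch.flatten(x, 1))


net = ToyNet()
# keep every child except the final linear layer
extractor = nn.Sequential(*list(net.children())[:-1])
for p in extractor.parameters():
    p.requires_grad = False
extractor.eval()

features = extractor(torch.randn(4, 3, 32, 32))
print(features.shape)  # torch.Size([4, 16, 1, 1])
print(features.requires_grad)  # False
```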



<a id = 'Creating-a-ResNet-model'></a>

In [None]:
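# set up a pretrained ResNet-34 model
resnet_model = models.resnet34(pretrained=True)

# set device
if is_cuda:
    resnet_model = resnet_model.cuda()

# discard the last linear layer, keeping the convolutional and pooling layers
resnet_model = nn.Sequential(*list(resnet_model.children())[:-1])

# turn off gradients; the network is only used as a fixed feature extractor
for p in resnet_model.parameters():
    p.requires_grad = False

# use the stored batch-norm statistics instead of per-batch statistics
resnet_model.eval()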

### Extracting convolutional features

Calculating the pre-convoluted features once can save substantial time in the training stage, because the frozen convolutional layers no longer have to be run on every image in every epoch.
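The extraction loops in this notebook all follow one pattern, which can be sketched as a reusable helper. `extract_features` is a hypothetical name, and the toy backbone and random data stand in for the pretrained model and image loaders: run each batch once under `torch.no_grad()`, flatten the outputs, and keep them on the CPU.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def extract_features(model, loader, device="cpu"):
    """Hypothetical helper: return (features, labels) as stacked CPU tensors."""
    model = model.to(device).eval()
    feats, labels = [], []
    with torch.no_grad():  # no autograd graph is needed for fixed features
        for x, y in loader:
            out = model(x.to(device))
            feats.append(out.flatten(1).cpu())
            labels.append(y)
    return torch.cat(feats), torch.cat(labels)


# usage with a toy "backbone" and random data
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1), nn.AdaptiveAvgPool2d(2)
)
toy_data = TensorDataset(torch.randn(10, 3, 8, 8), torch.randint(0, 2, (10,)))
toy_feats, toy_labels = extract_features(
    toy_backbone, DataLoader(toy_data, batch_size=4, shuffle=False)
)
print(toy_feats.shape, toy_labels.shape)  # torch.Size([10, 16]) torch.Size([10])
```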



<a id = 'Extracting-convolutional-features'></a>

In [None]:
# store the training data labels
trn_labels = []

# store the pre-convoluted features of the training data
trn_features = []

# iterate through the training data, storing the computed features and the labels
with torch.no_grad():
    for d, la in train_loader:
        if is_cuda:
            d = d.cuda()
        o = resnet_model(d)
        o = o.view(o.size(0), -1)
        trn_labels.extend(la)
        trn_features.extend(o.cpu())

# iterate through the validation data, storing the computed features and the labels
val_labels = []
val_features = []
with torch.no_grad():
    for d, la in val_loader:
        if is_cuda:
            d = d.cuda()
        o = resnet_model(d)
        o = o.view(o.size(0), -1)
        val_labels.extend(la)
        val_features.extend(o.cpu())

### Creating a custom PyTorch dataset class for the pre-convoluted features and loader

With the pre-convoluted features in hand, we need a custom dataset that returns the stored feature vector and its label for a given index.
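When every stored feature vector has the same length, a lighter alternative to a custom class is to stack the lists into tensors and wrap them in `torch.utils.data.TensorDataset`. This is a sketch with random stand-in features; the custom class below is equally valid and is what the rest of the notebook uses.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# assume features were collected as a list of 1-D tensors of equal length
feature_list = [torch.randn(16) for _ in range(10)]
label_list = [torch.tensor(i % 2) for i in range(10)]

# stack into (N, 16) and (N,) tensors; TensorDataset indexes them together
feat_dset = TensorDataset(torch.stack(feature_list), torch.stack(label_list))
toy_loader = DataLoader(feat_dset, batch_size=4, shuffle=True)

toy_x, toy_y = next(iter(toy_loader))
print(len(feat_dset), toy_x.shape, toy_y.shape)  # 10 torch.Size([4, 16]) torch.Size([4])
```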



<a id = 'Creating-a-custom-PyTorch-dataset-class-for-the-pre-convoluted-features-and-loader'></a>

In [1]:
# custom dataset class for pre-computed features
from torch.utils.data import Dataset


class FeaturesDataset(Dataset):
    def __init__(self, featlst, labellst):
        self.featlst = featlst
        self.labellst = labellst

    def __getitem__(self, index):
        return (self.featlst[index], self.labellst[index])

    def __len__(self):
        return len(self.labellst)

In [None]:
# creating dataset for train and validation
trn_feat_dset = FeaturesDataset(trn_features, trn_labels)
val_feat_dset = FeaturesDataset(val_features, val_labels)

# creating data loader for train and validation
trn_feat_loader = DataLoader(trn_feat_dset, batch_size=64, shuffle=True)
val_feat_loader = DataLoader(val_feat_dset, batch_size=64)

### Creating a simple linear model



<a id = 'Creating-a-simple-linear-model'></a>

In [None]:
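# fully connected model
class FullyConnectedModel(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.fc = nn.Linear(in_size, out_size)

    def forward(self, inp):
        out = self.fc(inp)
        return out


# flattened feature size, read from the extracted features: 512 with current
# torchvision (adaptive average pooling); older releases produced 8192 at 299x299
fc_in_size = trn_features[0].numel()

fc = FullyConnectedModel(fc_in_size, classes)
if is_cuda:
    fc = fc.cuda()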
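The flattened feature size follows from the standard output-size formula, $\lfloor (n + 2p - k)/s \rfloor + 1$. A quick check for ResNet-34 at 299×299, assuming torchvision's ResNet-34 layout (7 by 7 stride-2 stem, 3 by 3 stride-2 max pool, stride-2 first blocks in layer2 to layer4, 512 channels at the end): a fixed 7 by 7 average pool with stride 1 (as in older torchvision releases, consistent with a value of 8192) leaves 4×4, while the adaptive pool in current releases leaves 1×1.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


s = 299
s = conv_out(s, 7, stride=2, padding=3)  # stem conv -> 150
s = conv_out(s, 3, stride=2, padding=1)  # max pool -> 75
for _ in range(3):  # layer2, layer3, layer4: first 3x3 conv has stride 2
    s = conv_out(s, 3, stride=2, padding=1)
print(s)  # 10

old_pool = conv_out(s, 7, stride=1)  # fixed AvgPool2d(7, stride=1) -> 4
print(512 * old_pool * old_pool)  # 8192
print(512 * 1 * 1)  # adaptive pooling to 1x1 -> 512
```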

### Training and validating the model



<a id = 'Training-and-validating-the-model'></a>
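The training loops in the ResNet, Inception, and DenseNet sections call a `fit` helper on single-input batches, but the only `fit` this notebook defines (in the ensemble section) unpacks three inputs per batch. A minimal single-input sketch, under the assumption that the optimizer is passed in explicitly (`fit_single` and the toy data are hypothetical, not the notebook's own helper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def fit_single(epoch, model, data_loader, optimizer=None, phase="training"):
    """One pass over (features, target) batches; returns (mean loss, accuracy %)."""
    training = phase == "training"
    model.train(training)  # train(False) is equivalent to eval()
    device = next(model.parameters()).device
    running_loss, running_correct = 0.0, 0
    for data, target in data_loader:
        data, target = data.to(device), target.to(device)
        with torch.set_grad_enabled(training):
            output = model(data)
            loss = F.cross_entropy(output, target)
        if training:
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        running_loss += loss.item() * target.size(0)  # undo the batch mean
        running_correct += (output.argmax(dim=1) == target).sum().item()
    n = len(data_loader.dataset)
    return running_loss / n, 100.0 * running_correct / n


# usage on random toy features
torch.manual_seed(0)
toy_dset = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
toy_loader = DataLoader(toy_dset, batch_size=8, shuffle=True)
toy_fc = nn.Linear(10, 2)
toy_opt = torch.optim.SGD(toy_fc.parameters(), lr=0.01)
train_loss, train_acc = fit_single(1, toy_fc, toy_loader, optimizer=toy_opt)
val_loss, val_acc = fit_single(1, toy_fc, toy_loader, phase="validation")
print(f"{train_loss:.3f} {train_acc:.1f} {val_loss:.3f} {val_acc:.1f}")
```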

In [None]:
# model training and validation loop
train_losses, train_accuracy = [], []
val_losses, val_accuracy = [], []

# optimizer for the new linear layer (SGD settings are an assumption;
# the original notebook does not define one)
optimizer = optim.SGD(fc.parameters(), lr=0.0001, momentum=0.5)

for epoch in range(1, 10):
    epoch_loss, epoch_accuracy = fit(epoch, fc, trn_feat_loader, phase="training")
    val_epoch_loss, val_epoch_accuracy = fit(
        epoch, fc, val_feat_loader, phase="validation"
    )
    train_losses.append(epoch_loss)
    train_accuracy.append(epoch_accuracy)
    val_losses.append(val_epoch_loss)
    val_accuracy.append(val_epoch_accuracy)

## Inception

The Inception module applies convolutions with several filter sizes to the same input in parallel and concatenates all of their outputs along the channel dimension. In the simplest variant, the convolutions of different sizes are applied directly to the input. A more elaborate variant first passes the input through a 1 by 1 convolution before the 3 by 3 and 5 by 5 convolutions. The 1 by 1 convolution reduces the number of channels, addressing the computational bottleneck of the larger filters. A 1 by 1 convolution looks at one spatial position at a time and combines the values across all channels. For example, ten 1 by 1 filters (each spanning all 100 channels) applied to a 100 by 64 by 64 input produce a 10 by 64 by 64 output.
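To see how much the 1 by 1 reduction saves, count multiply-accumulate operations for a 5 by 5 branch. The numbers are hypothetical: the 100 by 64 by 64 input from the example above, a 5 by 5 convolution with 32 output channels, and a reduction to 10 channels; both convolutions are assumed to be stride 1 with "same" padding.

```python
def conv_macs(h, w, in_ch, out_ch, k):
    """Multiply-accumulates of a stride-1, same-padded k x k convolution."""
    return h * w * out_ch * k * k * in_ch


# direct 5x5 convolution: 100 -> 32 channels
direct = conv_macs(64, 64, 100, 32, 5)
# 1x1 reduction to 10 channels, then the 5x5 convolution: 10 -> 32 channels
reduced = conv_macs(64, 64, 100, 10, 1) + conv_macs(64, 64, 10, 32, 5)
print(direct)  # 327680000
print(reduced)  # 36864000
print(round(direct / reduced, 1))  # 8.9
```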

<a id = 'Inception'></a>

In [4]:
# CNN helper class: convolution -> batch norm -> ReLU
class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)

In [None]:
# Inception model class
class InceptionBasicBlock(nn.Module):
    def __init__(self, in_channels, pool_features):
        super().__init__()
        self.branch1x1 = BasicConv2d(in_channels, 64, kernel_size=1)

        self.branch5x5_1 = BasicConv2d(in_channels, 48, kernel_size=1)
        self.branch5x5_2 = BasicConv2d(48, 64, kernel_size=5, padding=2)

        self.branch3x3dbl_1 = BasicConv2d(in_channels, 64, kernel_size=1)
        self.branch3x3dbl_2 = BasicConv2d(64, 96, kernel_size=3, padding=1)

        self.branch_pool = BasicConv2d(in_channels, pool_features, kernel_size=1)

    def forward(self, x):
        # applies a 1 by 1 conv
        branch1x1 = self.branch1x1(x)

        # 1 by 1 conv followed by a 5 by 5 conv
        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)

        # 1 by 1 conv followed by a 3 by 3 conv
        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)

        # 3 by 3 average pool followed by a 1 by 1 conv
        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        # concatenate along the channel dimension (64 + 64 + 96 + pool_features)
        outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)

### Creating an Inception model

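The Inception v3 model produces two outputs: the main classifier output and the output of an auxiliary classifier attached to an intermediate layer. During training, the auxiliary loss is added to the main loss. In this implementation, we only use the main path to calculate pre-convoluted features. Doing so is less straightforward with Inception than with ResNet: the auxiliary classifier is one of the model's child modules, so slicing the children into an nn.Sequential does not reproduce the network.

Below, we disable the auxiliary branch by setting aux_logits to False.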

<a id = 'Creating-an-Inception-model'></a>

In [None]:
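# instantiate Inception v3
inceptionModel = models.inception_v3(pretrained=True)

# disable aux_logits
inceptionModel.aux_logits = False

# freeze the weights and use the stored batch-norm statistics
for p in inceptionModel.parameters():
    p.requires_grad = False
inceptionModel.eval()

# set device
if is_cuda:
    inceptionModel = inceptionModel.cuda()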

### Extracting convolutional features using register_forward_hook

The technique in this section is similar to how we calculated activations for style transfer: we register a forward hook on the layer of interest.

Since we capture and store the outputs for all of the images, keeping them on the GPU would quickly exhaust its memory, so the class below moves each batch's activations to the CPU.

We extract the output of Mixed_7c, the last convolutional block of the Inception model, which means the average pooling, dropout, and linear layers that follow it are excluded. Pooling is skipped to avoid losing the spatial information in the feature map.
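A minimal, self-contained sketch of `register_forward_hook` on a toy network (the model and its sizes are hypothetical stand-ins for Inception and Mixed_7c): the hook receives each layer output after the layer's `forward` runs, copies it to the CPU, and the returned handle removes the hook when extraction is done.

```python
import torch
import torch.nn as nn

toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # index 0
    nn.ReLU(),  # index 1: the layer we tap
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
)

captured = []


def hook_fn(module, inputs, output):
    # called after module.forward; keep a flattened CPU copy of the activation
    captured.append(output.detach().flatten(1).cpu())


handle = toy_model[1].register_forward_hook(hook_fn)
with torch.no_grad():
    logits = toy_model(torch.randn(5, 3, 4, 4))
handle.remove()  # stop capturing once extraction is done

print(logits.shape)  # torch.Size([5, 2])
print(captured[0].shape)  # torch.Size([5, 128]), i.e. 8 channels x 4 x 4
```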


<a id = 'Extracting-convolutional-features-using-register_forward_hook'></a>

In [None]:
# class for capturing the output of a given layer
class LayerActivations:
    def __init__(self, model):
        self.features = []
        self.hook = model.register_forward_hook(self.hook_fn)

    def hook_fn(self, module, input, output):
        # flatten each image's activations and move them to the CPU
        self.features.extend(output.detach().flatten(1).cpu())

    def remove(self):
        self.hook.remove()


# create LayerActivations object to store inception model output at a particular layer
trn_features = LayerActivations(inceptionModel.Mixed_7c)
trn_labels = []

# run the training images through the model; the hook collects the Mixed_7c outputs
with torch.no_grad():
    for da, la in train_loader:
        if is_cuda:
            da = da.cuda()
        _ = inceptionModel(da)
        trn_labels.extend(la)
trn_features.remove()

# repeat for validation
val_features = LayerActivations(inceptionModel.Mixed_7c)
val_labels = []
with torch.no_grad():
    for da, la in val_loader:
        if is_cuda:
            da = da.cuda()
        _ = inceptionModel(da)
        val_labels.extend(la)
val_features.remove()

### Creating a new dataset for the convoluted features



<a id = 'Creating-a-new-dataset-for-the-convoluted-features'></a>

In [None]:
# dataset for pre computed features for train and validation data sets
trn_feat_dset = FeaturesDataset(trn_features.features, trn_labels)
val_feat_dset = FeaturesDataset(val_features.features, val_labels)

# data loaders for pre computed features for train and validation data sets

trn_feat_loader = DataLoader(trn_feat_dset, batch_size=64, shuffle=True)
val_feat_loader = DataLoader(val_feat_dset, batch_size=64)

### Creating a fully connected model



<a id = 'Creating-a-fully-connected-model'></a>

In [None]:
# fully connected model with dropout on its input
class FullyConnectedModel(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.fc = nn.Linear(in_size, out_size)

    def forward(self, inp):
        # dropout is only active in training mode (self.training is set by train()/eval())
        out = F.dropout(inp, training=self.training)
        out = self.fc(out)
        return out


# size of the flattened Mixed_7c output: 2048 channels x 8 x 8 for 299x299 inputs
fc_in_size = 131072

# instantiate fully connected model
fc = FullyConnectedModel(fc_in_size, classes)
if is_cuda:
    fc = fc.cuda()

### Training and validating the model



<a id = 'Training-and-validating-the-model2'></a>

In [None]:
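# model training loop
train_losses, train_accuracy = [], []
val_losses, val_accuracy = [], []

# optimizer for the new linear layer (SGD settings are an assumption;
# the original notebook does not define one)
optimizer = optim.SGD(fc.parameters(), lr=0.0001, momentum=0.5)

for epoch in range(1, 10):
    epoch_loss, epoch_accuracy = fit(epoch, fc, trn_feat_loader, phase="training")
    val_epoch_loss, val_epoch_accuracy = fit(
        epoch, fc, val_feat_loader, phase="validation"
    )
    train_losses.append(epoch_loss)
    train_accuracy.append(epoch_accuracy)
    val_losses.append(val_epoch_loss)
    val_accuracy.append(val_epoch_accuracy)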

## DenseNet

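DenseNet is a modern architecture in which each layer is connected to every layer that follows it within a dense block. This means that each layer receives the feature maps of all preceding layers, i.e.

$$
x_\ell = H_\ell\left([x_0, x_1, x_2, \ldots, x_{\ell-1}]\right)
$$

where $x_\ell$ is the output of layer $\ell$, $[x_0, x_1, \ldots, x_{\ell-1}]$ is the channel-wise concatenation of the feature maps produced by the preceding layers ($x_0$ being the block's input), and $H_\ell$ is the composite function the layer applies (batch normalization, ReLU, and convolution).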

Below, `_DenseBlock` is an `nn.Sequential` module to which layers are added in order. `num_layers` controls how many `_DenseLayer` objects are added, and each is registered under its own name.

In `_DenseLayer`, the `__init__` method registers the layers the input is passed through (batch norm, ReLU, 1 by 1 convolution, batch norm, ReLU, 3 by 3 convolution). The `forward` method computes the new feature maps by passing the input to the `forward` method of the _super_ class, `nn.Sequential`, which essentially does the following:

```python
def forward(self, input):
    for module in self._modules.values():
        input = module(input)
    return input
```

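The input is passed through all of the layers that were added to the sequential block, optional dropout is applied, and the resulting new feature maps are concatenated to the input along the channel dimension. Because each layer's output becomes part of the next layer's input, the number of channels grows by `growth_rate` with every layer in the block.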
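The channel bookkeeping can be checked with a few lines of arithmetic: with $k_0$ input channels and growth rate $k$, layer $\ell$ of a block receives $k_0 + k(\ell - 1)$ channels, and the block outputs $k_0 + kL$ after $L$ layers. This sketch uses the configuration of torchvision's DenseNet-121 first block (64 input channels, growth rate 32, 6 layers); `dense_block_channels` is a hypothetical helper.

```python
def dense_block_channels(num_input_features, growth_rate, num_layers):
    """Input channel count seen by each layer, plus the block's output channels."""
    inputs = [num_input_features + i * growth_rate for i in range(num_layers)]
    return inputs, num_input_features + num_layers * growth_rate


inputs, out = dense_block_channels(64, 32, 6)
print(inputs)  # [64, 96, 128, 160, 192, 224]
print(out)  # 256
```

The per-layer values are exactly the `num_input_features + i * growth_rate` that the dense block has to pass when it creates layer `i`.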

Some of the advantages of DenseNet are:
- Requires substantially fewer parameters than comparably accurate conventional networks
- Alleviates the vanishing gradient problem
- Encourages feature reuse


<a id = 'DenseNet'></a>

In [5]:
# create DenseBlock class
class _DenseBlock(nn.Sequential):
    def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate):
        super(_DenseBlock, self).__init__()
        for i in range(num_layers):
            # layer i sees the block input plus the i * growth_rate channels added so far
            layer = _DenseLayer(
                num_input_features + i * growth_rate, growth_rate, bn_size, drop_rate
            )
            self.add_module("denselayer%d" % (i + 1), layer)


# create DenseLayer class (module names may not contain ".")
class _DenseLayer(nn.Sequential):
    def __init__(self, num_input_features, growth_rate, bn_size, drop_rate):
        super(_DenseLayer, self).__init__()
        self.add_module("norm1", nn.BatchNorm2d(num_input_features))
        self.add_module("relu1", nn.ReLU(inplace=True))
        self.add_module(
            "conv1",
            nn.Conv2d(
                num_input_features,
                bn_size * growth_rate,
                kernel_size=1,
                stride=1,
                bias=False,
            ),
        )
        self.add_module("norm2", nn.BatchNorm2d(bn_size * growth_rate))
        self.add_module("relu2", nn.ReLU(inplace=True))
        self.add_module(
            "conv2",
            nn.Conv2d(
                bn_size * growth_rate,
                growth_rate,
                kernel_size=3,
                stride=1,
                padding=1,
                bias=False,
            ),
        )
        self.drop_rate = drop_rate

    def forward(self, x):
        # run the registered layers in order (nn.Sequential.forward)
        new_features = super(_DenseLayer, self).forward(x)
        if self.drop_rate > 0:
            new_features = F.dropout(
                new_features, p=self.drop_rate, training=self.training
            )
        # concatenate the new feature maps to the input along the channel dimension
        return torch.cat([x, new_features], 1)

### Creating a DenseNet model

PyTorch's (torchvision's) implementation of DenseNet has two modules: `features`, which contains the dense blocks, and `classifier`, which contains the fully connected layer. Since we only use DenseNet as an image feature extractor, we only need the `features` module.
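The size of the flattened `features` output for 299×299 images can be worked out the same way as for ResNet. This sketch assumes torchvision's DenseNet-121 layout: a 7 by 7 stride-2 stem convolution, a 3 by 3 stride-2 max pool, three transition layers that each end in a 2 by 2 average pool with stride 2, and 1024 channels after the last dense block (dense blocks themselves keep the spatial size).

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


s = out_size(299, 7, stride=2, padding=3)  # conv0 -> 150
s = out_size(s, 3, stride=2, padding=1)  # pool0 -> 75
for _ in range(3):  # transition layers: 75 -> 37 -> 18 -> 9
    s = out_size(s, 2, stride=2)
print(s)  # 9
print(1024 * s * s)  # 82944 flattened features per image
```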



<a id = 'Creating-a-DenseNet-model'></a>

In [None]:
# instantiate DenseNet121 and keep only its features module
model_densenet = models.densenet121(pretrained=True).features
if is_cuda:
    model_densenet = model_densenet.cuda()

# turn off gradients
for p in model_densenet.parameters():
    p.requires_grad = False

# use the stored batch-norm statistics
model_densenet.eval()

### Extracting DenseNet features



<a id = 'Extracting-DenseNet-features'></a>

In [None]:
# For training data
trn_labels = []
trn_features = []

# code to store densenet features for train dataset.
with torch.no_grad():
    for d, la in train_loader:
        if is_cuda:
            d = d.cuda()
        o = model_densenet(d)
        o = o.view(o.size(0), -1)
        trn_labels.extend(la)
        trn_features.extend(o.cpu())

# For validation data
val_labels = []
val_features = []

# code to store densenet features for validation dataset.
with torch.no_grad():
    for d, la in val_loader:
        if is_cuda:
            d = d.cuda()
        o = model_densenet(d)
        o = o.view(o.size(0), -1)
        val_labels.extend(la)
        val_features.extend(o.cpu())

### Creating a dataset and loaders



<a id = 'Creating-a-dataset-and-loaders'></a>

In [None]:
# create dataset for train and validation convolution features
trn_feat_dset = FeaturesDataset(trn_features, trn_labels)
val_feat_dset = FeaturesDataset(val_features, val_labels)

# create data loaders for batching the train and validation datasets
trn_feat_loader = DataLoader(trn_feat_dset, batch_size=64, shuffle=True, drop_last=True)
val_feat_loader = DataLoader(val_feat_dset, batch_size=64)

### Creating a fully connected model and train



<a id = 'Creating-a-fully-connected-model-and-train'></a>

In [None]:
# fully connected model
class FullyConnectedModel(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.fc = nn.Linear(in_size, out_size)

    def forward(self, inp):
        out = self.fc(inp)
        return out


# flattened feature size read from the extracted features
# (1024 channels x 9 x 9 = 82944 for 299x299 inputs)
fc_in_size = trn_features[0].numel()

# instantiate fully connected model
fc = FullyConnectedModel(fc_in_size, classes)
if is_cuda:
    fc = fc.cuda()

In [None]:
# training and validation loop
train_losses, train_accuracy = [], []
val_losses, val_accuracy = [], []

# optimizer for the new linear layer (SGD settings are an assumption;
# the original notebook does not define one)
optimizer = optim.SGD(fc.parameters(), lr=0.0001, momentum=0.5)

for epoch in range(1, 10):
    epoch_loss, epoch_accuracy = fit(epoch, fc, trn_feat_loader, phase="training")
    val_epoch_loss, val_epoch_accuracy = fit(
        epoch, fc, val_feat_loader, phase="validation"
    )
    train_losses.append(epoch_loss)
    train_accuracy.append(epoch_accuracy)
    val_losses.append(val_epoch_loss)
    val_accuracy.append(val_epoch_accuracy)

## Model ensembling

We can combine the features generated by three different models to build a more powerful model. Each image is passed through all three pretrained models, and each model's features go to their own fully connected layer. The outputs of these three layers are summed element-wise and passed to a final fully connected layer, which produces the output.
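A toy version of this fusion scheme, with small hypothetical feature sizes so it runs instantly (`SumFusion` is an illustrative name, not the notebook's class): every projection maps to the same width, which is what makes the element-wise sum possible.

```python
import torch
import torch.nn as nn


class SumFusion(nn.Module):
    """Hypothetical toy: project each feature vector to a shared size, sum, classify."""

    def __init__(self, in_sizes, hidden=16, out_size=2):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(n, hidden) for n in in_sizes)
        self.head = nn.Linear(hidden, out_size)

    def forward(self, *feats):
        # element-wise sum works because every projection has the same width
        fused = sum(p(f) for p, f in zip(self.proj, feats))
        return self.head(fused)


toy_fusion = SumFusion(in_sizes=(12, 20, 8))
toy_out = toy_fusion(torch.randn(4, 12), torch.randn(4, 20), torch.randn(4, 8))
print(toy_out.shape)  # torch.Size([4, 2])
```

Summing keeps the head small; concatenating the three projections instead would be an equally reasonable design that lets the final layer weight each model separately.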

<a id = 'Model-ensembling'></a>

### Creating models



<a id = 'Creating-models'></a>

In [None]:
# ResNet
resnet_model = models.resnet34(pretrained=True)
if is_cuda:
    resnet_model = resnet_model.cuda()

resnet_model = nn.Sequential(*list(resnet_model.children())[:-1])

for p in resnet_model.parameters():
    p.requires_grad = False
resnet_model.eval()

# Inception_v3
modelInception = models.inception_v3(pretrained=True)
modelInception.aux_logits = False
if is_cuda:
    modelInception = modelInception.cuda()

for p in modelInception.parameters():
    p.requires_grad = False
modelInception.eval()

# DenseNet
model_densenet = models.densenet121(pretrained=True).features
if is_cuda:
    model_densenet = model_densenet.cuda()

for p in model_densenet.parameters():
    p.requires_grad = False
model_densenet.eval()

### Extracting the image features



<a id = 'Extracting-the-image-features'></a>

In [None]:
# ResNet
trn_labels = []
trn_resnet_features = []
val_labels = []
val_resnet_features = []
with torch.no_grad():
    for d, la in train_loader:
        if is_cuda:
            d = d.cuda()
        o = resnet_model(d)
        o = o.view(o.size(0), -1)
        trn_labels.extend(la)
        trn_resnet_features.extend(o.cpu())

    for d, la in val_loader:
        if is_cuda:
            d = d.cuda()
        o = resnet_model(d)
        o = o.view(o.size(0), -1)
        val_labels.extend(la)
        val_resnet_features.extend(o.cpu())

# Inception_v3 (labels are reused from the ResNet pass; shuffle=False keeps the order identical)
trn_inception_features = LayerActivations(modelInception.Mixed_7c)
with torch.no_grad():
    for da, la in train_loader:
        if is_cuda:
            da = da.cuda()
        _ = modelInception(da)
trn_inception_features.remove()

val_inception_features = LayerActivations(modelInception.Mixed_7c)
with torch.no_grad():
    for da, la in val_loader:
        if is_cuda:
            da = da.cuda()
        _ = modelInception(da)
val_inception_features.remove()

# DenseNet
trn_densenet_features = []
val_densenet_features = []
with torch.no_grad():
    for d, la in train_loader:
        if is_cuda:
            d = d.cuda()
        o = model_densenet(d)
        o = o.view(o.size(0), -1)
        trn_densenet_features.extend(o.cpu())

    for d, la in val_loader:
        if is_cuda:
            d = d.cuda()
        o = model_densenet(d)
        o = o.view(o.size(0), -1)
        val_densenet_features.extend(o.cpu())

### Creating a custom dataset along with data loaders



<a id = 'Creating-a-custom-dataset-along-with-data-loaders'></a>

In [None]:
# custom dataset returning one feature vector per model plus the label
class FeaturesDataset(Dataset):
    def __init__(self, featlst1, featlst2, featlst3, labellst):
        self.featlst1 = featlst1
        self.featlst2 = featlst2
        self.featlst3 = featlst3
        self.labellst = labellst

    def __getitem__(self, index):
        return (
            self.featlst1[index],
            self.featlst2[index],
            self.featlst3[index],
            self.labellst[index],
        )

    def __len__(self):
        return len(self.labellst)


# load data from separate models
trn_feat_dset = FeaturesDataset(
    trn_resnet_features,
    trn_inception_features.features,
    trn_densenet_features,
    trn_labels,
)
val_feat_dset = FeaturesDataset(
    val_resnet_features,
    val_inception_features.features,
    val_densenet_features,
    val_labels,
)

trn_feat_loader = DataLoader(trn_feat_dset, batch_size=64, shuffle=True)
val_feat_loader = DataLoader(val_feat_dset, batch_size=64)

### Creating an ensembling model



<a id = 'Creating-an-ensembling-model'></a>

In [None]:
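# ensemble class: one linear projection per feature set, summed, then a classifier
class EnsembleModel(nn.Module):
    def __init__(self, in_sizes, out_size, hidden_size=512):
        super().__init__()
        resnet_size, inception_size, densenet_size = in_sizes
        self.fc1 = nn.Linear(resnet_size, hidden_size)
        self.fc2 = nn.Linear(inception_size, hidden_size)
        self.fc3 = nn.Linear(densenet_size, hidden_size)
        self.fc4 = nn.Linear(hidden_size, out_size)

    def forward(self, inp1, inp2, inp3):
        out1 = self.fc1(F.dropout(inp1, training=self.training))
        out2 = self.fc2(F.dropout(inp2, training=self.training))
        out3 = self.fc3(F.dropout(inp3, training=self.training))
        out = out1 + out2 + out3
        out = self.fc4(F.dropout(out, training=self.training))
        return out


# flattened feature size of each model (for 299x299 inputs: 512 or 8192 for ResNet-34
# depending on the torchvision version, 131072 for Inception v3, 82944 for DenseNet-121)
in_sizes = (
    trn_resnet_features[0].numel(),
    trn_inception_features.features[0].numel(),
    trn_densenet_features[0].numel(),
)

# instantiate ensemble
em = EnsembleModel(in_sizes, 2)
if is_cuda:
    em = em.cuda()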

### Training and validating the ensemble model



<a id = 'Training-and-validating-the-ensemble-model'></a>

In [None]:
# function for executing one epoch of training or validation on the ensemble
def fit(epoch, model, data_loader, phase="training"):
    if phase == "training":
        model.train()
    if phase == "validation":
        model.eval()
    running_loss = 0.0
    running_correct = 0

    for batch_idx, (data1, data2, data3, target) in enumerate(data_loader):
        if is_cuda:
            data1, data2, data3, target = (
                data1.cuda(),
                data2.cuda(),
                data3.cuda(),
                target.cuda(),
            )
        if phase == "training":
            optimizer.zero_grad()
        # track gradients only while training (replaces the removed volatile flag)
        with torch.set_grad_enabled(phase == "training"):
            output = model(data1, data2, data3)
            loss = F.cross_entropy(output, target)

        # loss is a batch mean; multiply back to accumulate a per-sample sum
        running_loss += loss.item() * target.size(0)
        preds = output.argmax(dim=1)
        running_correct += (preds == target).sum().item()
        if phase == "training":
            loss.backward()
            optimizer.step()

    loss = running_loss / len(data_loader.dataset)
    accuracy = 100.0 * running_correct / len(data_loader.dataset)

    print(
        f"{phase} loss is {loss:{5}.{2}} and {phase} accuracy is {running_correct}/{len(data_loader.dataset)}{accuracy:{10}.{4}}"
    )
    return loss, accuracy

In [None]:
# train and validation loop
train_losses, train_accuracy = [], []
val_losses, val_accuracy = [], []

# optimizer for the ensemble (SGD settings are an assumption; fit() uses this global name)
optimizer = optim.SGD(em.parameters(), lr=0.0001, momentum=0.5)

for epoch in range(1, 10):
    epoch_loss, epoch_accuracy = fit(epoch, em, trn_feat_loader, phase="training")
    val_epoch_loss, val_epoch_accuracy = fit(
        epoch, em, val_feat_loader, phase="validation"
    )
    train_losses.append(epoch_loss)
    train_accuracy.append(epoch_accuracy)
    val_losses.append(val_epoch_loss)
    val_accuracy.append(val_epoch_accuracy)