## Model Architectures, Training Parameters

### No Meta Data Model Architecture

Models that do not use the meta data follow the architecture below.

<figure>
<img src='https://storage.googleapis.com/reighns/reighns_ml_projects/docs/projects/SIIM-ISIC%20Melanoma%20Classification/images/no_meta_model_architecure.svg' width="800"/>
<figcaption align = "center"><b>No Meta Data Model Architecture.</b></figcaption>
</figure>


### Meta Data Model Architecture

Models that do use the meta data follow the architecture below.

<figure>
<img src='https://storage.googleapis.com/reighns/reighns_ml_projects/docs/projects/SIIM-ISIC%20Melanoma%20Classification/images/meta_model_architecture.svg' width="800"/>
<figcaption align = "center"><b>Meta Data Model Architecture.</b></figcaption>
</figure>


We concatenate the flattened feature maps with the output computed from the following 9 meta features:

```python
meta_features = ['sex', 'age_approx', 'site_head/neck', 'site_lower extremity', 'site_oral/genital', 'site_palms/soles', 'site_torso', 'site_upper extremity', 'site_nan']
```

The meta features are first passed through their own small sequential ANN:

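```python
from collections import OrderedDict

import torch

# Layer specification of the meta branch (inside the model's __init__);
# pass it to torch.nn.Sequential to obtain a callable module.
OrderedDict(
    [
        ("fc1", torch.nn.Linear(self.num_meta_features, 512)),
        ("bn1", torch.nn.BatchNorm1d(512)),
        ("swish1", torch.nn.SiLU()),
        ("dropout1", torch.nn.Dropout(p=0.3)),
        ("fc2", torch.nn.Linear(512, 128)),
        ("bn2", torch.nn.BatchNorm1d(128)),
        ("swish2", torch.nn.SiLU()),
    ]
)
```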
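As a quick sanity check on the shapes, the sketch below wraps the same layers in a `torch.nn.Sequential` (an assumption about how the `OrderedDict` is consumed, consistent with `self.meta_layer(meta_inputs)` in the forward pass) with 9 meta features, and counts the parameters of the meta branch. `NUM_META_FEATURES` is a hypothetical name standing in for `self.num_meta_features`.

```python
from collections import OrderedDict

import torch

NUM_META_FEATURES = 9  # hypothetical stand-in for self.num_meta_features (9 meta features)

# Same layers as above, wrapped in Sequential so they can be called like a module.
meta_layer = torch.nn.Sequential(
    OrderedDict(
        [
            ("fc1", torch.nn.Linear(NUM_META_FEATURES, 512)),
            ("bn1", torch.nn.BatchNorm1d(512)),
            ("swish1", torch.nn.SiLU()),
            ("dropout1", torch.nn.Dropout(p=0.3)),
            ("fc2", torch.nn.Linear(512, 128)),
            ("bn2", torch.nn.BatchNorm1d(128)),
            ("swish2", torch.nn.SiLU()),
        ]
    )
)

meta_inputs = torch.randn(32, NUM_META_FEATURES)  # random stand-in for real meta data
meta_logits = meta_layer(meta_inputs)
print(meta_logits.shape)  # torch.Size([32, 128])

# Linear: weights + biases; BatchNorm1d: learnable scale (weight) + shift (bias).
num_params = sum(p.numel() for p in meta_layer.parameters())
print(num_params)  # 9*512 + 512 + 2*512 + 512*128 + 128 + 2*128 = 72064
```

Note that `BatchNorm1d` needs more than one sample per batch in training mode, so the branch should be switched to `meta_layer.eval()` before predicting on a single image.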




For example, with a batch of 32 images:

- image shape: $[32, 3, 256, 256]$.
- meta_inputs shape: $[32, 9]$, one column per meta feature.
- feature_logits shape: $[32, 1280]$, the flattened feature maps from the last convolutional layer.
- meta_logits shape: $[32, 128]$, the output of the small sequential ANN applied to the meta data.
- concat_logits shape: $[32, 1280 + 128] = [32, 1408]$.

```python
if self.use_meta:
    # from cnn images
    feature_logits = self.extract_features(image)

    # from meta features
    meta_logits = self.meta_layer(meta_inputs)

    # concatenate
    concat_logits = torch.cat((feature_logits, meta_logits), dim=1)

    # classifier head
    classifier_logits = self.architecture["head"](concat_logits)
```
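Using random tensors as stand-ins for the backbone output and the meta branch output (hypothetical placeholders, not the real model), the shape bookkeeping from the example above can be sketched as follows. The point to notice is that the classifier head must accept `1280 + 128` input features, not just the backbone's 1280.

```python
import torch

batch_size, num_backbone_features, num_meta_out, num_classes = 32, 1280, 128, 2

# Stand-ins: in the real model these come from extract_features() and meta_layer().
feature_logits = torch.randn(batch_size, num_backbone_features)
meta_logits = torch.randn(batch_size, num_meta_out)

# Concatenate along the feature dimension (dim=1), keeping the batch dimension.
concat_logits = torch.cat((feature_logits, meta_logits), dim=1)

# The head's in_features must account for both branches.
head = torch.nn.Linear(num_backbone_features + num_meta_out, num_classes)
classifier_logits = head(concat_logits)

print(concat_logits.shape)  # torch.Size([32, 1408])
print(classifier_logits.shape)  # torch.Size([32, 2])
```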


### Activation Functions

As we all know, activation functions turn a neuron's linear output into a non-linear one and, loosely speaking, decide whether the neuron "fires" or not.

When we design or choose an activation function, we want it to have the following properties:

- (Smoothness) Differentiable and continuous: backpropagation relies on gradients, so the activation must be differentiable (at least almost everywhere). For example, the sigmoid (logistic) function $g(x) = \frac{1}{1 + e^{-x}}$ is continuous and differentiable, with the conveniently simple gradient $g'(x) = g(x)\,(1 - g(x))$. The Heaviside step function, in contrast, is discontinuous at $0$ and has zero gradient everywhere else, so we cannot train with it using gradient descent.

- Monotonic: a monotonic activation is often said to help the model converge faster. Spoiler alert: Swish is not monotonic.
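The smoothness point can be checked numerically. The sketch below compares the closed-form sigmoid gradient $g(1-g)$ with a central finite difference, and shows that a finite difference of the Heaviside step is zero away from the jump at $0$, so it gives gradient descent nothing to follow. The evaluation points and step size `h` are arbitrary choices for illustration.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-x))


def heaviside(x: np.ndarray) -> np.ndarray:
    # Step function: 0 for x < 0, 1 for x >= 0 (the second argument is the value at 0).
    return np.heaviside(x, 1.0)


x = np.array([-3.0, -0.5, 0.5, 3.0])  # points away from 0
h = 1e-5

# Central finite-difference approximations of the derivatives.
sigmoid_fd = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
sigmoid_closed_form = sigmoid(x) * (1 - sigmoid(x))
heaviside_fd = (heaviside(x + h) - heaviside(x - h)) / (2 * h)

print(np.allclose(sigmoid_fd, sigmoid_closed_form))  # True
print(heaviside_fd)  # [0. 0. 0. 0.]
```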

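Swish is defined as $\text{Swish}(x) = x \cdot \sigma(x)$, where $\sigma$ is the sigmoid function; this is the same function as `torch.nn.SiLU` in the meta ANN above. Its properties are as follows: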

- Bounded below: Swish never falls below about $-0.278$. The Swish paper claims that being bounded below can have a strong regularization effect.
- Smoothness: Swish is smoother than ReLU, which has a kink at $0$. The argument is that a smoother activation gives a smoother loss landscape, which is easier to traverse when searching for a minimum. Intuitively, compare walking down your neighbourhood hill with descending the Himalayas.

```python
import matplotlib.pyplot as plt
import numpy as np


def swish(x: np.ndarray) -> np.ndarray:
    """Swish(x) = x * sigmoid(x)."""
    sigmoid = 1 / (1 + np.exp(-x))
    return x * sigmoid


# A narrow range with many points makes the small dip below zero visible.
x = np.linspace(-10, 10, 1000)
z = swish(x)
print(z.min())  # about -0.278, the lower bound of Swish

plt.plot(x, z)
plt.xlabel("x")
plt.ylabel("Swish(x)")
plt.show()
```
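A short check (assuming a PyTorch version that ships `torch.nn.SiLU`) confirms that PyTorch's SiLU matches $x \cdot \sigma(x)$, that the function is bounded below, and that it is not monotonic: it decreases down to its minimum near $x \approx -1.28$ and increases afterwards. The grid of 2001 points is an arbitrary choice.

```python
import torch

x = torch.linspace(-10, 10, 2001, dtype=torch.float64)

silu = torch.nn.SiLU()
manual_swish = x * torch.sigmoid(x)

print(torch.allclose(silu(x), manual_swish))  # True

# Bounded below: locate the global minimum on the grid.
min_value, min_index = manual_swish.min(dim=0)
print(min_value.item(), x[min_index].item())

# Not monotonic: consecutive differences take both signs.
diffs = manual_swish[1:] - manual_swish[:-1]
print(bool((diffs < 0).any()), bool((diffs > 0).any()))  # True True
```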

`torchinfo` is needed for the model summary further below:

```bash
pip install torchinfo
```



```python
# Standard library
import os
import random
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict

# Third-party
import numpy as np
import timm
import torch
import torchinfo
```

```python
def seed_all(seed: int = 1992) -> None:
    """Seed all random number generators."""
    print(f"Using Seed Number {seed}")

    # Set PYTHONHASHSEED to a fixed value (only affects Python processes started afterwards).
    os.environ["PYTHONHASHSEED"] = str(seed)
    torch.manual_seed(seed)  # PyTorch on CPU and all CUDA devices
    torch.cuda.manual_seed_all(seed)  # all CUDA devices (silently ignored without a GPU)
    np.random.seed(seed)  # NumPy pseudo-random generator
    random.seed(seed)  # Python built-in pseudo-random generator
    # Make cuDNN deterministic at the cost of speed; enabled=False turns cuDNN off entirely.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.enabled = False
```

```python
seed_all()
# Using Seed Number 1992
```
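To see what seeding buys us, a minimal check (independent of `seed_all`, using only `torch.manual_seed`; the second seed 2023 is arbitrary) shows that re-seeding with the same value reproduces the same random tensor.

```python
import torch

torch.manual_seed(1992)
first = torch.randn(3)

torch.manual_seed(1992)  # reset the generator to the same state
second = torch.randn(3)

torch.manual_seed(2023)  # a different seed gives a different draw
third = torch.randn(3)

print(torch.equal(first, second))  # True
print(torch.equal(first, third))  # False
```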


```python
@dataclass
class ModelParams:
    """A class to track model parameters.

    model_name (str): name of the timm model.
    pretrained (bool): If True, use a pretrained model.
    input_channels (int): 3 for RGB images, 1 for grayscale.
    output_dimension (int): Number of neurons in the final layer, i.e. the number of
        classes in classification. Caution: if you use a sigmoid layer for binary
        classification, it is 1.
    classification_type (str): classification type.
    use_meta (bool): If True, also use the meta data.
    """

    # Alternatives: "resnext50_32x4d", "tf_efficientnet_b0_ns" (for debugging),
    # "tf_efficientnet_b4_ns", "vgg16".
    model_name: str = "resnet50d"

    pretrained: bool = True
    input_channels: int = 3
    output_dimension: int = 2
    classification_type: str = "multiclass"
    use_meta: bool = False

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary."""
        return asdict(self)


MODEL_PARAMS = ModelParams()
```

```python
class CustomNeuralNet(torch.nn.Module):
    def __init__(
        self,
        model_name: str = MODEL_PARAMS.model_name,
        out_features: int = MODEL_PARAMS.output_dimension,
        in_channels: int = MODEL_PARAMS.input_channels,
        pretrained: bool = MODEL_PARAMS.pretrained,
        use_meta: bool = MODEL_PARAMS.use_meta,
    ):
        """Construct a new model.

        Args:
            model_name (str): The name of the timm model to use. Defaults to MODEL_PARAMS.model_name.
            out_features (int): The number of output features. This is usually the number of
                classes, but if you use a sigmoid for binary classification, it is 1.
                Defaults to MODEL_PARAMS.output_dimension.
            in_channels (int): The number of input channels; RGB = 3, Grayscale = 1.
                Defaults to MODEL_PARAMS.input_channels.
            pretrained (bool): If True, use pretrained weights. Defaults to MODEL_PARAMS.pretrained.
            use_meta (bool): If True, the model also uses meta data. It is stored but not used
                in this no-meta-data version. Defaults to MODEL_PARAMS.use_meta.
        """
        super().__init__()

        self.in_channels = in_channels
        self.pretrained = pretrained
        self.use_meta = use_meta

        self.backbone = timm.create_model(
            model_name, pretrained=self.pretrained, in_chans=self.in_channels
        )

        # Remove the classification head from the backbone.
        # TODO: global_pool="avg" vs "" gives differently shaped outputs, caution!
        self.backbone.reset_classifier(num_classes=0, global_pool="avg")

        # Number of features in the backbone's last layer (feature map channels).
        self.in_features = self.backbone.num_features
        self.out_features = out_features

        # Custom head: a single fully connected layer.
        self.single_head_fc = torch.nn.Sequential(
            torch.nn.Linear(self.in_features, self.out_features),
        )

        self.architecture: Dict[str, Callable] = {
            "backbone": self.backbone,
            "bottleneck": None,
            "head": self.single_head_fc,
        }

    def extract_features(self, image: torch.FloatTensor) -> torch.FloatTensor:
        """Extract the feature logits from the model, i.e. the output of the CNN backbone.

        Args:
            image (torch.FloatTensor): The input image.

        Returns:
            feature_logits (torch.FloatTensor): The feature logits.
        """
        # TODO: Rename feature_logits to image embeddings, and find out what an image embedding is.
        feature_logits = self.architecture["backbone"](image)
        print(f"feature logits shape = {feature_logits.shape}")
        return feature_logits

    def forward(self, image: torch.FloatTensor) -> torch.FloatTensor:
        """The forward call of the model.

        Args:
            image (torch.FloatTensor): The input image.

        Returns:
            classifier_logits (torch.FloatTensor): The output logits of the classifier head.
        """
        feature_logits = self.extract_features(image)
        classifier_logits = self.architecture["head"](feature_logits)
        print(f"classifier_logits shape = {classifier_logits.shape}")

        return classifier_logits
```

```python
model = CustomNeuralNet()

batch_size, channel, height, width = 8, 3, 256, 256
X = torch.randn((batch_size, channel, height, width))
y = model(image=X)
# feature logits shape = torch.Size([8, 2048])
# classifier_logits shape = torch.Size([8, 2])
```


```python
_ = torchinfo.summary(
    model,
    (batch_size, channel, height, width),
    col_names=[
        "input_size",
        "output_size",
        "num_params",
        "kernel_size",
        "mult_adds",
    ],
    depth=3,
    verbose=1,
)
```

```
torch.Size([8, 2048])
torch.Size([8, 2])
Layer (type:depth-idx)                        Input Shape               Output Shape              Param #                   Kernel Shape              Mult-Adds
CustomNeuralNet                               --                        --                        --                        --                        --
├─ResNet: 1-1                                 [8, 3, 256, 256]          [8, 2048]                 --                        --                        --
│    └─Sequential: 2-1                        [8, 3, 256, 256]          [8, 64, 128, 128]         --                        --                        --
│    │    └─Conv2d: 3-1                       [8, 3, 256, 256]          [8, 32, 128, 128]         864                       [3, 32, 3, 3]             113,246,208
│    │    └─BatchNorm2d: 3-2                  [8, 32, 128, 128]         [8, 32, 128, 128]         64                        [32]                      512
│    │    └─ReLU: 3-3
```

This model summary tells us a lot about what happens when we pass a batch of $8$ images of size $(3, 256, 256)$ through the model. Let us walk through the first layer of a naive **ResNet50d** as an example.

- Input Shape: $[8, 3, 256, 256]$, the batch entering the first **Sequential** layer's **Conv2d (3-1)**.
- Kernel Shape: $[3, 32, 3, 3]$, read as `[in_channels, out_channels, kernel_size, kernel_size]`: 3 input channels, 32 output channels and a $3 \times 3$ kernel.
- Output Shape: $[8, 32, 128, 128]$, meaning that each input image is transformed into 32 feature maps (one per output channel) of size $128 \times 128$. The spatial size is halved because this convolution has a stride of 2.
- Params: the **Param #** column reports 864 learnable parameters for this layer, i.e. $3 \times 32 \times 3 \times 3 = 864$ weights and no bias.
- Mult-Adds: $113{,}246{,}208 = 864 \times 128 \times 128 \times 8$, since every weight is used once at each output position of each image.
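The numbers in this row can be reproduced with a standalone `Conv2d`. The stride of 2 and padding of 1 are assumptions chosen to match the halved spatial size, $\lfloor (256 + 2 \cdot 1 - 3)/2 \rfloor + 1 = 128$; `bias=False` matches the 864 parameters, since a bias would add 32 more.

```python
import torch

# Assumed configuration of the first stem convolution: 3 -> 32 channels, 3x3 kernel.
conv = torch.nn.Conv2d(
    in_channels=3, out_channels=32, kernel_size=3, stride=2, padding=1, bias=False
)

x = torch.randn(8, 3, 256, 256)
out = conv(x)

num_params = sum(p.numel() for p in conv.parameters())
# Each output element needs in_channels * 3 * 3 = 27 multiply-adds.
kernel_ops = 3 * 3 * 3
mult_adds = kernel_ops * out.numel()

print(out.shape)  # torch.Size([8, 32, 128, 128])
print(tuple(conv.weight.shape))  # (32, 3, 3, 3)
print(num_params)  # 864
print(mult_adds)  # 113246208
```

Note that PyTorch stores the weight as `[out_channels, in_channels, kH, kW]` = `[32, 3, 3, 3]`, whereas the summary table lists the first two dimensions in the opposite order.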

---

Once we know how to read the table, we can see what `extract_features` of our `CustomNeuralNet` returns: the backbone's output after its last convolutional stage and global pooling. In this example, the last feature maps pass through **SelectAdaptivePool2d: 2-9**, where **AdaptiveAvgPool2d: 3-24** first squashes them to $[8, 2048, 1, 1]$, and a **Flatten: 3-25** layer then flattens the last 2 dimensions to give $[8, 2048]$, ready to be passed on to the dense layer.
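The pooling-and-flattening step can be mimicked on a random tensor. The $[8, 2048, 8, 8]$ input is an assumption: for a $256 \times 256$ image, ResNet's overall downsampling factor of 32 leaves an $8 \times 8$ grid.

```python
import torch

# Stand-in for the last convolutional feature maps of ResNet50d on 256x256 inputs.
feature_maps = torch.randn(8, 2048, 8, 8)

pool = torch.nn.AdaptiveAvgPool2d(output_size=1)  # average each 8x8 map to a single value
flatten = torch.nn.Flatten()  # default start_dim=1 keeps the batch dimension

pooled = pool(feature_maps)
flat = flatten(pooled)

print(pooled.shape)  # torch.Size([8, 2048, 1, 1])
print(flat.shape)  # torch.Size([8, 2048])
# Global average pooling is just the mean over the spatial dimensions.
print(torch.allclose(flat, feature_maps.mean(dim=(2, 3)), atol=1e-6))  # True
```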

We can verify this by

```python
X = torch.randn((batch_size, channel, height, width))
y = model(image=X)
```

yielding

```python
feature logits shape = torch.Size([8, 2048])
classifier_logits shape = torch.Size([8, 2])
```

where the second line is the final shape after the classifier head, $[8, 2]$, which one can envision as 2 output neurons (one logit per class) for each of the 8 images.
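The 2 output neurons hold raw logits; to turn them into class probabilities one would apply a softmax over the class dimension. The sketch below also illustrates the `ModelParams` caution about binary classification: with 2 logits, the softmax probability of class 1 equals the sigmoid of the logit difference, which is why a single output neuron with a sigmoid is an equivalent alternative. The random logits are a stand-in, and treating index 1 as the positive (melanoma) class is an assumption for illustration.

```python
import torch

classifier_logits = torch.randn(8, 2)  # random stand-in for the model output above

# Softmax over the class dimension: each row sums to 1.
probs = torch.softmax(classifier_logits, dim=1)

# Probability of class 1 via a sigmoid on the logit difference.
p_class1 = torch.sigmoid(classifier_logits[:, 1] - classifier_logits[:, 0])

print(probs.shape)  # torch.Size([8, 2])
print(torch.allclose(probs.sum(dim=1), torch.ones(8)))  # True
print(torch.allclose(probs[:, 1], p_class1))  # True
```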