# Model Components

The main inputs to a `WideDeep` (i.e. multimodal) model are tabular data, text and images, which are fed into the model via the so-called `wide`, `deeptabular`, `deeptext` and `deepimage` model components.

### 1. `wide`

The `wide` component is a Linear layer "plugged" into the output neuron(s). Here, the non-linearities are captured via crossed columns. Crossed columns are, quoting the paper directly: "*For binary features, a cross-product transformation (e.g., “AND(gender=female, language=en)”) is 1 if and only if the constituent features (“gender=female” and “language=en”) are all 1, and 0 otherwise*".
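
As a minimal sketch of the cross-product transformation quoted above, consider two binary indicator arrays. The names `gender_female` and `language_en` are illustrative only and are not part of the package:

```python
import numpy as np

# Hypothetical binary indicator features for four observations
gender_female = np.array([1, 0, 1, 1])
language_en = np.array([1, 1, 0, 1])

# The crossed column is 1 only where both constituent features are 1
crossed = gender_female * language_en
print(crossed)  # [1 0 0 1]
```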

The only particularity of our implementation is that we have implemented the linear layer via an Embedding layer plus a bias. While the two implementations are equivalent, the latter is faster and far more memory efficient, since we do not need to one-hot encode the categorical features.

Let's assume we have the following dataset:

In [1]:
import torch
import pandas as pd
import numpy as np

from torch import nn


In [2]:
df = pd.DataFrame({"color": ["r", "b", "g"], "size": ["s", "n", "l"]})
df.head()

|   | color | size |
|---|-------|------|
| 0 | r     | s    |
| 1 | b     | n    |
| 2 | g     | l    |


One-hot encoded, the first observation would be:

In [3]:
obs_0_oh = (np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])).astype("float32")

If we instead simply encode the values numerically (label encode, or `le`):

In [4]:
obs_0_le = (np.array([0, 3])).astype("int64")

Note that the actual implementation in the package starts the encoding from 1 and reserves 0 for padding, i.e. for unseen values.
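
This encoding convention can be sketched as follows. It is an illustration only, not the package's own preprocessor: the helper names `fit_label_encoder` and `encode` are hypothetical.

```python
def fit_label_encoder(columns):
    """Map every (column, value) pair to a unique integer, starting at 1."""
    mapping = {}
    for col, values in columns.items():
        for value in values:
            key = (col, value)
            if key not in mapping:
                mapping[key] = len(mapping) + 1  # index 0 stays reserved
    return mapping


def encode(mapping, row):
    """Encode one observation; unseen values fall back to index 0."""
    return [mapping.get((col, value), 0) for col, value in row.items()]


columns = {"color": ["r", "b", "g"], "size": ["s", "n", "l"]}
mapping = fit_label_encoder(columns)

seen = encode(mapping, {"color": "r", "size": "s"})
unseen = encode(mapping, {"color": "purple", "size": "l"})
print(seen)    # [1, 4]
print(unseen)  # [0, 6]
```

Under this convention the first observation of the toy dataset would be encoded as `[1, 4]` rather than the `[0, 3]` used in this notebook for simplicity.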

Now, let's check that the two implementations are equivalent:

In [5]:
# we have 6 different values. Let's assume we are performing a regression, so pred_dim = 1
lin = nn.Linear(6, 1)

In [6]:
emb = nn.Embedding(6, 1)
emb.weight = nn.Parameter(lin.weight.reshape_as(emb.weight))

In [7]:
lin(torch.tensor(obs_0_oh))

tensor([-0.1975], grad_fn=<AddBackward0>)

In [8]:
emb(torch.tensor(obs_0_le)).sum() + lin.bias

tensor([-0.1975], grad_fn=<AddBackward0>)

And this is precisely how the linear model `Wide` is implemented:

In [9]:
from pytorch_widedeep.models import Wide

In [10]:
# ?Wide

In [11]:
wide = Wide(input_dim=10, pred_dim=1)
wide

Wide(
  (wide_linear): Embedding(11, 1, padding_idx=0)
)

Note that even though the input dim is 10, the Embedding layer has 11 weights. Again, this is because we save `0` for padding, which is used for unseen values during the encoding process. 
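
To illustrate the role of the padding index, here is a small sketch using a plain `nn.Embedding` (not the `Wide` class itself) with the same shape as the layer printed above. In PyTorch, the row at `padding_idx` is initialized to zeros and receives no gradient updates, so a value encoded as `0` adds nothing to the linear output.

```python
import torch
from torch import nn

# Same shape as the layer printed above: 10 input values plus the padding row
emb = nn.Embedding(11, 1, padding_idx=0)

# Index 0 (padding / unseen) maps to a zero weight
pad_weight = emb(torch.tensor([0]))
print(pad_weight.item())  # 0.0

# An observation with one seen value (index 3) and one unseen value (index 0)
obs = torch.tensor([3, 0])
out = emb(obs).sum()

# The unseen value does not contribute: the sum equals the weight of index 3
print(torch.isclose(out, emb.weight[3, 0]).item())  # True
```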

The `deeptabular` component has enough complexity on its own that it is described in detail in a separate notebook. The next section therefore gives only a brief overview before moving on to `deeptext`.

### 2. `deeptabular`

The `deeptabular` model alone is what would normally be referred to as Deep Learning for tabular data. As mentioned a number of times throughout the library, each component can be used independently. Therefore, if you wanted to use any of the models below alone, it is perfectly possible. There are just a couple of simple requirements that will be covered in a later notebook.

At the time of writing, there are a number of models available in `pytorch-widedeep` to do DL for tabular data. These are:

1. `TabMlp`
2. `ContextAttentionMLP`
3. `SelfAttentionMLP`
4. `TabResnet`
5. `TabNet`
6. `TabTransformer`
7. `FTTransformer`
8. `SAINT`
9. `TabFastFormer`
10. `TabPerceiver`

Let's have a look at one of them. For more information on each of these models, please have a look at the documentation.

In [12]:
from pytorch_widedeep.models import TabMlp

In [13]:
# toy example just to build a model.
colnames = ["a", "b", "c", "d", "e"]
cat_embed_input = [(u, i, j) for u, i, j in zip(colnames[:4], [4] * 4, [8] * 4)]
column_idx = {k: v for v, k in enumerate(colnames)}
tabmlp = TabMlp(
    column_idx=column_idx,
    cat_embed_input=cat_embed_input,
    continuous_cols=["e"],
    mlp_hidden_dims=[8, 4],
)
tabmlp

TabMlp(
  (cat_and_cont_embed): DiffSizeCatAndContEmbeddings(
    (cat_embed): DiffSizeCatEmbeddings(
      (embed_layers): ModuleDict(
        (emb_layer_a): Embedding(5, 8, padding_idx=0)
        (emb_layer_b): Embedding(5, 8, padding_idx=0)
        (emb_layer_c): Embedding(5, 8, padding_idx=0)
        (emb_layer_d): Embedding(5, 8, padding_idx=0)
      )
      (embedding_dropout): Dropout(p=0.1, inplace=False)
    )
    (cont_norm): BatchNorm1d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (tab_mlp): MLP(
    (mlp): Sequential(
      (dense_layer_0): Sequential(
        (0): Dropout(p=0.1, inplace=False)
        (1): Linear(in_features=33, out_features=8, bias=True)
        (2): ReLU(inplace=True)
      )
      (dense_layer_1): Sequential(
        (0): Dropout(p=0.1, inplace=False)
        (1): Linear(in_features=8, out_features=4, bias=True)
        (2): ReLU(inplace=True)
      )
    )
  )
)
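
The `in_features=33` of the first dense layer in the printout above follows from the inputs. Here is a quick check, assuming each tuple in `cat_embed_input` is (column name, number of unique values, embedding dimension), which is consistent with the `Embedding(5, 8, padding_idx=0)` layers shown:

```python
colnames = ["a", "b", "c", "d", "e"]
cat_embed_input = [(name, 4, 8) for name in colnames[:4]]
continuous_cols = ["e"]

# Each categorical column contributes its embedding dimension,
# each continuous column contributes a single feature
mlp_input_dim = sum(dim for _, _, dim in cat_embed_input) + len(continuous_cols)
print(mlp_input_dim)  # 33

# Each embedding table has one extra row reserved for padding (index 0)
n_embedding_rows = [n_unique + 1 for _, n_unique, _ in cat_embed_input]
print(n_embedding_rows)  # [5, 5, 5, 5]
```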

### 3. `deeptext`

At the time of writing, `pytorch-widedeep` offers three models that can be passed to `WideDeep` as the `deeptext` component. These are:

1. BasicRNN
2. AttentiveRNN
3. StackedAttentiveRNN

For details on each of these models, please have a look at the package documentation. It is also perfectly possible to use custom models for each component (see the corresponding notebook): in general, simply build them and pass them as the corresponding parameters. Note that custom models MUST return a last layer of activations (i.e. not the final prediction) so that these activations can be collected by `WideDeep` and combined accordingly. In addition, the models MUST also have an attribute `output_dim` with the size of this last layer of activations.
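
A minimal sketch of a custom component that satisfies these two requirements is shown below. The class name `MyTextModel` and its parameters are hypothetical. Only the `output_dim` attribute and the convention of returning activations rather than predictions come from the requirements stated here.

```python
import torch
from torch import nn


class MyTextModel(nn.Module):
    """Hypothetical custom text component: embedding + GRU."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Size of the activations returned by forward
        self.output_dim = hidden_dim

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X: (batch_size, seq_len) tensor of token indices
        _, h = self.rnn(self.embed(X))
        # Return the last hidden state, not a prediction
        return h[-1]


model = MyTextModel(vocab_size=20, embed_dim=8, hidden_dim=16)
tokens = torch.randint(1, 20, (2, 5))  # batch of 2 sequences, length 5
activations = model(tokens)
print(activations.shape)  # torch.Size([2, 16])
```

The second dimension of the returned tensor matches `model.output_dim`, which is what lets the activations be combined with those of the other components.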

Let's have a look at the `BasicRNN` model:

In [14]:
from pytorch_widedeep.models import BasicRNN

In [15]:
basic_rnn = BasicRNN(vocab_size=4, hidden_dim=4, n_layers=1, padding_idx=0, embed_dim=4)



In [16]:
basic_rnn

BasicRNN(
  (word_embed): Embedding(4, 4, padding_idx=0)
  (rnn): LSTM(4, 4, batch_first=True, dropout=0.1)
  (rnn_mlp): Identity()
)

You could, if you wanted, add a Fully Connected Head (FC-Head) on top of it.

### 4. `deepimage`

At the time of writing, `pytorch-widedeep` is integrated with torchvision via the `Vision` class. This means that it is possible to use a variant of the following architectures:


1. resnet
2. shufflenet
3. resnext
4. wide_resnet
5. regnet
6. densenet
7. mobilenet
8. mnasnet
9. efficientnet
10. squeezenet

The user can choose which layers will be trainable. Alternatively, if none of these architectures is useful, one could use a simple, fully trained CNN (please see the package documentation) or pass a custom model.
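
The idea of training only some layers can be sketched in plain PyTorch. The snippet below is not how `Vision` is implemented: the helper `freeze_all_but_last` is hypothetical, and the tiny `nn.Sequential` merely stands in for a pretrained backbone.

```python
from torch import nn


def freeze_all_but_last(model: nn.Sequential, n_trainable: int) -> None:
    """Disable gradients for all child modules except the last n_trainable."""
    children = list(model.children())
    n_frozen = len(children) - n_trainable
    for i, child in enumerate(children):
        for param in child.parameters():
            param.requires_grad = i >= n_frozen


# Toy stand-in for a pretrained backbone
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3),
)

freeze_all_but_last(backbone, n_trainable=1)
trainable = [name for name, p in backbone.named_parameters() if p.requires_grad]
print(trainable)  # ['2.weight', '2.bias']
```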

Let's have a look:

In [17]:
from pytorch_widedeep.models import Vision

In [18]:
resnet = Vision(pretrained_model_name="resnet18", n_trainable=0)

In [19]:
resnet

Vision(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)