## Model Components

The 5 main components of a `WideDeep` model are:

1. `wide`
2. `deeptabular`
3. `deeptext`
4. `deepimage`
5. `deephead`

The first 4 of them will be collected and combined by `WideDeep`, while the 5th one can be optionally added to the `WideDeep` model through its corresponding parameters: `deephead` or alternatively `head_layers`, `head_dropout` and `head_batchnorm`.

As the package has developed, the `deeptabular` component has become one of its core features. Currently `pytorch-widedeep` offers three models that can be passed as the `deeptabular` component. Because the possibilities are numerous, that component is discussed on its own in a separate notebook.

### 1. `wide`

The `wide` component is a Linear layer "plugged" into the output neuron(s). This can be implemented in `pytorch-widedeep` via the `Wide` model.

The only particularity of our implementation is that the linear layer is implemented via an Embedding layer plus a bias. While the two implementations are equivalent, the Embedding-based one is faster and far more memory efficient, since we do not need to one-hot encode the categorical features.

Let's assume we have the following dataset:

In [1]:
import torch
import pandas as pd
import numpy as np

from torch import nn

In [2]:
df = pd.DataFrame({'color': ['r', 'b', 'g'], 'size': ['s', 'n', 'l']})
df.head()

|   | color | size |
|---|-------|------|
| 0 | r     | s    |
| 1 | b     | n    |
| 2 | g     | l    |


One-hot encoded, the first observation would be:

In [3]:
obs_0_oh = (np.array([1., 0., 0., 1., 0., 0.])).astype('float32')

If we instead simply encode the values numerically (label encode, or `le`), the first observation would be:

In [4]:
obs_0_le = (np.array([0, 3])).astype('int64')

Note that in the actual implementation in the package the encoding starts from 1, and 0 is reserved for padding, i.e. unseen values.
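As a rough illustration of that convention, the sketch below label encodes the toy dataset starting from 1 and maps any value not seen when building the vocabulary to 0. The helper names (`fit_label_encoder`, `encode`) are hypothetical and are not part of `pytorch-widedeep`; the package's own preprocessing may differ in its details.

```python
# Hypothetical helpers (not the package's encoder): map each (column, value)
# pair to a unique integer starting at 1, keeping 0 for unseen values.
def fit_label_encoder(rows, columns):
    vocab = {}
    for col in columns:
        for row in rows:
            key = (col, row[col])
            if key not in vocab:
                vocab[key] = len(vocab) + 1  # indices start at 1
    return vocab


def encode(row, columns, vocab):
    # Unseen (column, value) pairs fall back to the padding index 0
    return [vocab.get((col, row[col]), 0) for col in columns]


rows = [
    {"color": "r", "size": "s"},
    {"color": "b", "size": "n"},
    {"color": "g", "size": "l"},
]
columns = ["color", "size"]
vocab = fit_label_encoder(rows, columns)

first_obs = encode(rows[0], columns, vocab)
unseen_obs = encode({"color": "purple", "size": "s"}, columns, vocab)
print(first_obs)   # [1, 4]
print(unseen_obs)  # [0, 4]
```

With this scheme the first observation becomes `[1, 4]` rather than `[0, 3]`, and the unknown color `purple` is sent to index 0.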

Now, let's see if the two implementations are equivalent:

In [5]:
# we have 6 different values. Let's assume we are performing a regression, so pred_dim = 1
lin = nn.Linear(6, 1)

In [6]:
emb = nn.Embedding(6, 1) 
emb.weight = nn.Parameter(lin.weight.reshape_as(emb.weight))

In [7]:
lin(torch.tensor(obs_0_oh))

tensor([-0.2856], grad_fn=<AddBackward0>)

In [8]:
emb(torch.tensor(obs_0_le)).sum() + lin.bias

tensor([-0.2856], grad_fn=<AddBackward0>)

And this is precisely how the linear model `Wide` is implemented:

In [10]:
from pytorch_widedeep.models import Wide

In [10]:
?Wide

In [11]:
wide = Wide(wide_dim=10, pred_dim=1)
wide

Wide(
  (wide_linear): Embedding(11, 1, padding_idx=0)
)

Note that even though the input dim is 10, the Embedding layer has 11 weights. Again, this is because we save `0` for padding, which is used for unseen values during the encoding process. 
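To make the padding behaviour concrete, here is a minimal sketch of a wide-style module built from an Embedding layer plus a bias. The class name `WideSketch` is an assumption made for illustration; it mirrors the printed structure above but is not the package's source code.

```python
import torch
from torch import nn


class WideSketch(nn.Module):
    """Illustrative stand-in for a wide (linear) component."""

    def __init__(self, wide_dim: int, pred_dim: int = 1):
        super().__init__()
        # +1 because index 0 is reserved for padding / unseen values
        self.wide_linear = nn.Embedding(wide_dim + 1, pred_dim, padding_idx=0)
        self.bias = nn.Parameter(torch.zeros(pred_dim))

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X holds label-encoded features, shape (batch_size, n_features).
        # Looking up one weight per feature and summing them is equivalent
        # to a linear layer applied to the one-hot encoded input.
        return self.wide_linear(X.long()).sum(dim=1) + self.bias


wide_sketch = WideSketch(wide_dim=6, pred_dim=1)
X_wide = torch.tensor([[1, 4], [2, 5], [0, 6]])  # 0 marks an unseen value
out = wide_sketch(X_wide)
print(out.shape)  # torch.Size([3, 1])
```

Because `padding_idx=0` initialises the weight at index 0 to zeros and excludes it from gradient updates, an unseen value contributes nothing to the prediction in this sketch.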

As I mentioned, `deeptabular` has enough complexity on its own and will be described in a separate notebook. Let's then jump to `deeptext`.

### 3. `deeptext`

`pytorch-widedeep` offers one model that can be passed to `WideDeep` as the `deeptext` component, `DeepText`, which is a standard and simple stack of LSTMs on top of word embeddings. You can also add an FC-Head on top of the LSTMs. The word embeddings can be pre-trained. In the future I aim to include some simple pretrained models so that the combination between text and images is fair.

*While I recommend using the `wide` and `deeptabular` models within this package when building the corresponding wide and deep model components, it is very likely that the user will want to use custom text and image models. That is perfectly possible: simply build them and pass them as the corresponding parameters. Note that the custom models MUST return a last layer of activations (i.e. not the final prediction) so that these activations can be collected by `WideDeep` and combined accordingly. In addition, the models MUST also contain an attribute `output_dim` with the size of this last layer of activations.*
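A minimal sketch of a custom text model that satisfies both requirements could look as follows. The class `MyTextModel` and its GRU-based architecture are hypothetical choices made for illustration; the only points taken from the text above are that `forward` returns activations rather than a prediction and that the model exposes `output_dim`.

```python
import torch
from torch import nn


class MyTextModel(nn.Module):
    """Hypothetical custom text component: embeddings + GRU, no prediction layer."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Size of the activations returned by forward
        self.output_dim = hidden_dim

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        embedded = self.embed(X.long())
        _, h = self.rnn(embedded)
        # Return the last layer's final hidden state, not a final prediction
        return h[-1]


custom_text = MyTextModel(vocab_size=4, embed_dim=4, hidden_dim=8)
X_custom = torch.randint(0, 4, (5, 5))  # (batch_size, sequence length)
activations = custom_text(X_custom)
print(activations.shape)  # torch.Size([5, 8])
```

The last dimension of the returned tensor matches `custom_text.output_dim`, which is the size a combining model needs to know in order to attach further layers.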

Let's have a look at the `DeepText` class within `pytorch-widedeep`:

In [12]:
import torch
from pytorch_widedeep.models import DeepText

In [13]:
?DeepText

In [14]:
X_text = torch.cat((torch.zeros([5,1]), torch.empty(5, 4).random_(1,4)), axis=1)

In [15]:
deeptext = DeepText(vocab_size=4, hidden_dim=4, n_layers=1, padding_idx=0, embed_dim=4)


In [16]:
deeptext

DeepText(
  (word_embed): Embedding(4, 4, padding_idx=0)
  (rnn): LSTM(4, 4, batch_first=True, dropout=0.1)
)

You could, if you wanted, add a Fully Connected Head (FC-Head) on top of it:

In [17]:
deeptext = DeepText(vocab_size=4, hidden_dim=8, n_layers=1, padding_idx=0, embed_dim=4, head_hidden_dims=[8,4])

In [18]:
deeptext

DeepText(
  (word_embed): Embedding(4, 4, padding_idx=0)
  (rnn): LSTM(4, 8, batch_first=True, dropout=0.1)
  (texthead): MLP(
    (mlp): Sequential(
      (dense_layer_0): Sequential(
        (0): Linear(in_features=8, out_features=4, bias=True)
        (1): ReLU(inplace=True)
      )
    )
  )
)

Note that since the FC-Head will receive the activations from the last hidden layer of the stack of RNNs, the corresponding dimensions must be consistent.
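The dimension requirement can be checked with plain PyTorch layers. The snippet below is a sketch that does not use `DeepText` itself: it assumes an LSTM with hidden size 8 followed by a single dense layer, as in the printed model above, and shows that the head's `in_features` has to match the LSTM's hidden size.

```python
import torch
from torch import nn

hidden_dim = 8
rnn = nn.LSTM(input_size=4, hidden_size=hidden_dim, batch_first=True)
# The first head layer's in_features must equal the LSTM hidden size
head = nn.Sequential(nn.Linear(hidden_dim, 4), nn.ReLU())

embedded = torch.rand(5, 5, 4)  # (batch_size, sequence length, embedding dim)
_, (h, _) = rnn(embedded)
last_hidden = h[-1]             # final hidden state of the last LSTM layer
head_out = head(last_hidden)
print(last_hidden.shape)  # torch.Size([5, 8])
print(head_out.shape)     # torch.Size([5, 4])
```

If the first `Linear` layer were built with any other `in_features`, the forward pass would fail with a shape mismatch error.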

### 4. `deepimage`

Similarly to `deeptext`, `pytorch-widedeep` offers one model that can be passed to `WideDeep` as the `deepimage` component, `DeepImage`, which is either a pretrained ResNet (18, 34, or 50; the default is 18) or a stack of CNNs, to which one can add an FC-Head. If it is a pretrained ResNet, you can choose how deep into the network the layers are frozen, and therefore which layers are fine-tuned, with the parameter `freeze_n`.
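The idea of freezing the first few layers of a backbone can be sketched with a small stand-in network. The helper `freeze_first_n` and the toy `backbone` below are assumptions made for illustration; how `DeepImage` counts layers for `freeze_n` internally may differ.

```python
from torch import nn

# Toy stand-in for a pretrained backbone (not a real ResNet)
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)


def freeze_first_n(model: nn.Module, n: int) -> None:
    # Disable gradients for the parameters of the first n child modules
    for i, child in enumerate(model.children()):
        if i < n:
            for param in child.parameters():
                param.requires_grad = False


freeze_first_n(backbone, n=3)
trainable = [name for name, p in backbone.named_parameters() if p.requires_grad]
print(trainable)  # ['3.weight', '3.bias', '4.weight', '4.bias']
```

Only the parameters of the later layers remain trainable, so an optimizer would update those and leave the frozen layers untouched.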

In [19]:
from pytorch_widedeep.models import DeepImage

In [20]:
?DeepImage

In [21]:
X_img = torch.rand((2,3,224,224))

In [22]:
deepimage = DeepImage(head_hidden_dims=[512, 64, 8], head_activation="leaky_relu")

In [23]:
deepimage

DeepImage(
  (backbone): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace=True)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        ...

In [24]:
deepimage(X_img)

tensor([[-1.0338e-03,  5.0809e-01, -5.1775e-04,  2.8709e-01, -5.5744e-03,
         -5.9626e-03,  1.2294e-01,  1.6768e-01],
        [-1.1770e-03,  3.8934e-02, -2.4541e-03,  6.6003e-03, -4.3299e-03,
         -5.0524e-03,  4.7879e-03, -3.5898e-04]], grad_fn=<LeakyReluBackward1>)

If `pretrained=False`, a stack of 4 convolutional layers is used instead:

In [25]:
deepimage = DeepImage(pretrained=False, head_hidden_dims=[512, 64, 8])

In [26]:
deepimage

DeepImage(
  (backbone): Sequential(
    (0): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.01, affine=True, track_running_stats=True)
      (2): LeakyReLU(negative_slope=0.1, inplace=True)
      (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
    (1): Sequential(
      (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.01, affine=True, track_running_stats=True)
      (2): LeakyReLU(negative_slope=0.1, inplace=True)
    )
    (2): Sequential(
      (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.01, affine=True, track_running_stats=True)
      (2): LeakyReLU(negative_slope=0.1, inplace=True)
    )
    (3): Sequential(
      (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.01, affine=True, track_running_stats=True)
      ...

### 5. `deephead`

Unlike the rest of the components, the `deephead` component is not defined outside `WideDeep`.

When defining the `WideDeep` model there is a parameter called `head_layers_dim` (along with the corresponding related parameters; see the package documentation) that defines the FC-Head on top of `DeepDense`, `DeepText` and `DeepImage`.

Of course, you could also choose to define it yourself externally and pass it using the parameter `deephead`. Have a look:

In [27]:
from pytorch_widedeep.models import WideDeep

In [32]:
?WideDeep
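Conceptually, an FC-Head sits on top of the concatenated activations of the deep components. The snippet below is a sketch of that idea in plain PyTorch, not the internals of `WideDeep`: the activation sizes are made up, the tensors are random stand-ins for component outputs, and adding the head's output to the wide output is an assumption about how the two sides are combined.

```python
import torch
from torch import nn

batch_size = 2
# Random stand-ins for the activations returned by each component
tab_out = torch.rand(batch_size, 16)   # deeptabular activations
text_out = torch.rand(batch_size, 8)   # deeptext activations
img_out = torch.rand(batch_size, 8)    # deepimage activations
wide_out = torch.rand(batch_size, 1)   # wide (linear) output, pred_dim = 1

# The head's input size must equal the sum of the components' output sizes
deep_dim = tab_out.shape[1] + text_out.shape[1] + img_out.shape[1]
deephead_sketch = nn.Sequential(
    nn.Linear(deep_dim, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

deep_activations = torch.cat([tab_out, text_out, img_out], dim=1)
pred = wide_out + deephead_sketch(deep_activations)
print(deep_activations.shape)  # torch.Size([2, 32])
print(pred.shape)              # torch.Size([2, 1])
```

This is why an externally defined head needs an input dimension consistent with the combined `output_dim` of the deep components it receives.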