# "pytorch-widedeep: deep learning for tabular data"
> a flexible package to combine tabular data with text and images using wide and deep models.

- author: Javier Rodriguez
- toc: true 
- badges: true
- comments: true

In this post I describe the main components of the `Python` library `pytorch-widedeep`, a flexible package for using Deep Learning (hereafter DL) with tabular data and combining it with text and images via wide and deep models. `pytorch-widedeep` is based on the [paper](https://arxiv.org/abs/1606.07792) by Heng-Tze Cheng et al. (2016).

## 1. Installation 

To install the package simply use pip:

```bash
pip install pytorch-widedeep
```

or install it directly from GitHub:

```bash
pip install git+https://github.com/jrzaurin/pytorch-widedeep.git
```

**Important note for Mac Users**

Note that the following comments are not directly related to the package, but to the interplay between `pytorch` and OSX (more precisely, `pytorch`'s dependency on `OpenMP`, I believe) and, more generally, to parallel processing on Mac.

First, at the time of writing the latest `pytorch` version is `1.7`. This version is known to have some [issues](https://stackoverflow.com/questions/64772335/pytorch-w-parallelnative-cpp206) when running on Mac, and the data-loaders might not run in parallel. Second, since `Python 3.8` the default start method of the `multiprocessing` library on macOS changed from ['fork' to 'spawn'](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods). This also affects the data-loaders (for any torch version), and they will not run in parallel.
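To see which start method a given interpreter uses by default, one can query the standard-library `multiprocessing` module. The following is a minimal sketch; the helper name `default_start_method` is hypothetical and not part of `pytorch` or `pytorch-widedeep`.

```python
import multiprocessing as mp
import platform
import sys


def default_start_method() -> str:
    """Return the start method that new processes will use by default."""
    # allow_none=True avoids fixing the global context as a side effect
    method = mp.get_start_method(allow_none=True)
    if method is None:
        # the first entry of get_all_start_methods() is the platform default
        method = mp.get_all_start_methods()[0]
    return method


if __name__ == "__main__":
    py_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"platform: {platform.system()}, python: {py_version}")
    print(f"default multiprocessing start method: {default_start_method()}")
```

The printed value depends on the operating system and the Python version, so it is worth running on the machine where the data-loaders will actually run.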

Therefore, for Mac users I suggest using `python 3.7` and `torch <= 1.6` (with its corresponding `torchvision` version, i.e. `<= 0.7.0`). I could have enforced this versioning via the `setup.py` file. However, there are a number of unknowns and I preferred to leave it as it is. For example, I developed the package using macOS Catalina, and maybe some of these issues are not present in the new release, Big Sur. Also, I hope that a patch for `pytorch 1.7` is released soon and that some, if not all, of these problems disappear.

Installing `pytorch-widedeep` via `pip` will also install the latest `pytorch` version. Therefore, if these problems are present and the data-loaders do not run in parallel, one can easily downgrade manually:

```bash
pip install torch==1.6.0 torchvision==0.7.0
```

*None of these issues affect Linux users*

## 2. `pytorch-widedeep` DL Architectures

As I mentioned earlier, `pytorch-widedeep` combines tabular data with text and images via wide and deep models. With that in mind, the two main architectures one can build with a few lines of code using `pytorch-widedeep` are:


<p align="center">
  <img width="700" src="figures/pytorch-widedeep/arch_1.png">
</p>

**Architecture 1**: architecture 1 combines the `Wide` linear model with the outputs from the `DeepDense` or `DeepDenseResnet`, `DeepText` and `DeepImage` components, connected to a final output neuron or neurons, depending on whether we are performing a binary classification or regression, or a multi-class classification. The components within the faded-pink rectangles are concatenated. Later in the post I will describe each of the components in detail; for now, let's just move on.

In math terms, and following the notation in the [paper](https://arxiv.org/abs/1606.07792), Architecture 1 can be formulated as:

<p align="center">
  <img width="500" src="figures/pytorch-widedeep/architecture_1_math.png">
</p>


Where $W$ are the weight matrices applied to the wide model and to the final activations of the deep models, $a$ are these final activations, and $\phi(x)$ are the cross product transformations of the original features $x$. In case you are wondering what *"cross product transformations"* are, here is a quote taken directly from the paper: *"For binary features, a cross-product transformation (e.g., “AND(gender=female, language=en)”) is 1 if and only if the constituent features (“gender=female” and “language=en”) are all 1, and 0 otherwise"*.
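The quoted definition can be illustrated with a few lines of plain Python. This is a minimal sketch of the idea only, not how `pytorch-widedeep` computes crossed columns internally; the function name `cross_product` and the toy rows are made up for the example.

```python
from typing import Dict, List


def cross_product(row: Dict[str, str], conditions: Dict[str, str]) -> int:
    """Return 1 if every (feature, value) pair in `conditions` holds for `row`, else 0."""
    return int(all(row.get(feature) == value for feature, value in conditions.items()))


# toy rows, using the features from the quote in the paper
rows: List[Dict[str, str]] = [
    {"gender": "female", "language": "en"},
    {"gender": "female", "language": "es"},
    {"gender": "male", "language": "en"},
]

# AND(gender=female, language=en)
conditions = {"gender": "female", "language": "en"}
crossed = [cross_product(row, conditions) for row in rows]
print(crossed)
```

Only the first row satisfies both conditions, so it is the only one for which the crossed feature is 1.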

<p align="center">
  <img width="700" src="figures/pytorch-widedeep/arch_2.png">
</p>

**Architecture 2**: architecture 2 combines the `Wide` linear model with the `Deep` components of the model connected to the output neuron(s), after the different Deep components have themselves been combined through a FC-Head (that I refer to as `DeepHead`).

In math terms, and following the notation in the [paper](https://arxiv.org/abs/1606.07792), Architecture 2 can be formulated as:

<p align="center">
  <img width="300" src="figures/pytorch-widedeep/architecture_2_math.png">
</p>
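The difference between the two architectures is where the deep outputs are combined. The following toy sketch in plain Python illustrates this for a binary output. It is not the library's implementation: all weights and activations are made-up numbers, the function names are hypothetical, and a single ReLU layer stands in for `DeepHead`.

```python
import math
from typing import Dict, List


def dot(w: List[float], a: List[float]) -> float:
    return sum(wi * ai for wi, ai in zip(w, a))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def architecture_1(wide_logit: float, deep_acts: Dict[str, List[float]],
                   deep_weights: Dict[str, List[float]], bias: float = 0.0) -> float:
    """Each deep component has its own output weights; all logits are summed."""
    logit = wide_logit + bias
    for name, acts in deep_acts.items():
        logit += dot(deep_weights[name], acts)
    return sigmoid(logit)


def architecture_2(wide_logit: float, deep_acts: Dict[str, List[float]],
                   head_weights: List[List[float]], out_weights: List[float],
                   bias: float = 0.0) -> float:
    """Deep activations are concatenated and passed through a FC head first."""
    concat = [a for acts in deep_acts.values() for a in acts]
    hidden = [max(0.0, dot(row, concat)) for row in head_weights]  # ReLU layer
    return sigmoid(wide_logit + dot(out_weights, hidden) + bias)


# made-up final activations of two deep components
deep_acts = {"deepdense": [1.0, 2.0], "deeptext": [0.5]}
wide_logit = 0.3

p1 = architecture_1(wide_logit, deep_acts,
                    deep_weights={"deepdense": [0.1, -0.2], "deeptext": [0.4]})
p2 = architecture_2(wide_logit, deep_acts,
                    head_weights=[[1.0, 0.0, 0.0], [0.0, -1.0, 1.0]],
                    out_weights=[0.5, 0.5])
print(p1, p2)
```

In the first function the deep components never interact before the output neuron, whereas in the second the head layer can mix information from all of them.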

It is important to mention that each individual component, `wide`, `deepdense` (either `DeepDense` or `DeepDenseResnet`), `deeptext` and `deepimage`, can be used independently and in isolation. For example, one could use only `wide`, which is simply a linear model. Or one could use `DeepDense`, which is in essence similar to the [Tabular](https://docs.fast.ai/tabular.learner) API in the fastai library (which I strongly recommend).

## 3. Quick start

Before diving into the details of the library, let's say you just want to quickly run one example and get a feel for how `pytorch-widedeep` works. Let's go through a quick example using the adult census dataset. In this example we will fit a model composed of a `Wide` and a `DeepDense` component.

In [1]:
#collapse-hide
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [2]:
#collapse-hide
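adult = pd.read_csv("data/adult/adult.csv.zip")
# use underscores in column names so they can be accessed as attributes
adult.columns = [c.replace("-", "_") for c in adult.columns]
# binary target: 1 if income is above 50K, 0 otherwise
adult["income_label"] = (adult["income"].apply(lambda x: ">50K" in x)).astype(int)
adult.drop("income", axis=1, inplace=True)

# replace the "?" missing-value marker and lowercase all categorical values
for c in adult.columns:
    if adult[c].dtype == 'O':
        adult[c] = adult[c].apply(lambda x: "unknown" if x == "?" else x)
        adult[c] = adult[c].str.lower()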

In [3]:
adult.head()

| | age | workclass | fnlwgt | education | educational_num | marital_status | occupation | relationship | race | gender | capital_gain | capital_loss | hours_per_week | native_country | income_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | private | 226802 | 11th | 7 | never-married | machine-op-inspct | own-child | black | male | 0 | 0 | 40 | united-states | 0 |
| 1 | 38 | private | 89814 | hs-grad | 9 | married-civ-spouse | farming-fishing | husband | white | male | 0 | 0 | 50 | united-states | 0 |
| 2 | 28 | local-gov | 336951 | assoc-acdm | 12 | married-civ-spouse | protective-serv | husband | white | male | 0 | 0 | 40 | united-states | 1 |
| 3 | 44 | private | 160323 | some-college | 10 | married-civ-spouse | machine-op-inspct | husband | black | male | 7688 | 0 | 40 | united-states | 1 |
| 4 | 18 | unknown | 103497 | some-college | 10 | never-married | unknown | own-child | white | female | 0 | 0 | 30 | united-states | 0 |


In [4]:
from pytorch_widedeep.preprocessing import WidePreprocessor, DensePreprocessor
from pytorch_widedeep.models import Wide, DeepDense, WideDeep
from pytorch_widedeep.metrics import Accuracy

adult_train, adult_test = train_test_split(adult, test_size=0.2, stratify=adult.income_label)

# prepare wide, crossed, embedding and continuous columns and target
wide_cols = ["education", "relationship", "workclass", "occupation", "native_country", "gender"]
cross_cols = [("education", "occupation"), ("native_country", "occupation")]
embed_cols = [("education", 10), ("workclass", 10), ("occupation", 10), ("native_country", 10)]
cont_cols = ["age", "hours_per_week"]
target = adult_train["income_label"].values

# wide component
preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = preprocess_wide.fit_transform(adult_train)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)

# deepdense component
preprocess_deep = DensePreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)
X_deep = preprocess_deep.fit_transform(adult_train)
deepdense = DeepDense(hidden_layers=[64, 32], deep_column_idx=preprocess_deep.deep_column_idx, 
                      embed_input=preprocess_deep.embeddings_input, continuous_cols=cont_cols)

# build, compile and fit
model = WideDeep(wide=wide, deepdense=deepdense)
model.compile(method="binary", metrics=[Accuracy])
model.fit(X_wide=X_wide, X_deep=X_deep, target=target, n_epochs=2, batch_size=256) 

# predict
X_wide_te = preprocess_wide.transform(adult_test)
X_deep_te = preprocess_deep.transform(adult_test)
preds = model.predict(X_wide=X_wide_te, X_deep=X_deep_te)

Training


epoch 1: 100%|██████████| 153/153 [00:02<00:00, 52.44it/s, loss=0.526, metrics={'acc': 0.7471}]
epoch 2: 100%|██████████| 153/153 [00:02<00:00, 57.72it/s, loss=0.409, metrics={'acc': 0.8116}]
predict: 100%|██████████| 39/39 [00:00<00:00, 196.34it/s]
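The example above stops at `preds`. Since `accuracy_score` is imported in the first cell, a natural next step would be something like `accuracy_score(adult_test.income_label, preds)`, assuming `preds` holds 0/1 class labels. The sketch below computes the same quantity in plain Python on toy arrays that stand in for the real labels and predictions; the function name `accuracy` and the numbers are made up for illustration.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty inputs")
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# toy stand-ins for adult_test.income_label and preds
y_true = [0, 0, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0]
test_acc = accuracy(y_true, y_pred)
print(f"test accuracy: {test_acc:.3f}")
```

Because the train/test split in the example is not seeded, the accuracy on the real data will vary slightly from run to run.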
