## Simple Binary Classification with defaults

In this notebook we will use the Adult Census dataset. Download the data from [here](https://www.kaggle.com/wenruliu/adult-income-dataset/downloads/adult.csv/2).

In [1]:
import numpy as np
import pandas as pd
import torch

from pytorch_widedeep.preprocessing import WidePreprocessor, TabPreprocessor
from pytorch_widedeep.training import Trainer
from pytorch_widedeep.models import Wide, TabMlp, TabResnet, TabTransformer, WideDeep
from pytorch_widedeep.metrics import Accuracy, Precision



In [2]:
df = pd.read_csv('data/adult/adult.csv.zip')
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,educational-num,marital-status,occupation,relationship,race,gender,capital-gain,capital-loss,hours-per-week,native-country,income
0,25,Private,226802,11th,7,Never-married,Machine-op-inspct,Own-child,Black,Male,0,0,40,United-States,<=50K
1,38,Private,89814,HS-grad,9,Married-civ-spouse,Farming-fishing,Husband,White,Male,0,0,50,United-States,<=50K
2,28,Local-gov,336951,Assoc-acdm,12,Married-civ-spouse,Protective-serv,Husband,White,Male,0,0,40,United-States,>50K
3,44,Private,160323,Some-college,10,Married-civ-spouse,Machine-op-inspct,Husband,Black,Male,7688,0,40,United-States,>50K
4,18,?,103497,Some-college,10,Never-married,?,Own-child,White,Female,0,0,30,United-States,<=50K


In [3]:
# For convenience, we'll replace '-' with '_'
df.columns = [c.replace("-", "_") for c in df.columns]
#binary target
df['income_label'] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop('income', axis=1, inplace=True)
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,educational_num,marital_status,occupation,relationship,race,gender,capital_gain,capital_loss,hours_per_week,native_country,income_label
0,25,Private,226802,11th,7,Never-married,Machine-op-inspct,Own-child,Black,Male,0,0,40,United-States,0
1,38,Private,89814,HS-grad,9,Married-civ-spouse,Farming-fishing,Husband,White,Male,0,0,50,United-States,0
2,28,Local-gov,336951,Assoc-acdm,12,Married-civ-spouse,Protective-serv,Husband,White,Male,0,0,40,United-States,1
3,44,Private,160323,Some-college,10,Married-civ-spouse,Machine-op-inspct,Husband,Black,Male,7688,0,40,United-States,1
4,18,?,103497,Some-college,10,Never-married,?,Own-child,White,Female,0,0,30,United-States,0
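
The target construction in the cell above can be checked on a few hand-written strings. This is a minimal sketch in plain Python: the sample labels are made up for illustration (they are not taken from the dataset), and the variants with stray whitespace or a trailing period are an assumption about what raw labels might look like.

```python
# Hypothetical label strings, made up for illustration.
raw_labels = ["<=50K", ">50K", " >50K.", "<=50K."]

# Same test as in the notebook: substring match, then cast the boolean to int.
income_label = [int(">50K" in x) for x in raw_labels]

print(income_label)
```

Because the test is a substring match rather than an equality check, it does not depend on the label being formatted exactly as `>50K`.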


### Preparing the data

Have a look at notebooks one and two if you want a good understanding of the next few lines of code (although this is not required to use the package).

In [4]:
wide_cols = ['education', 'relationship','workclass','occupation','native_country','gender']
crossed_cols = [('education', 'occupation'), ('native_country', 'occupation')]
cat_embed_cols = [('education',16), ('relationship',8), ('workclass',16), ('occupation',16),('native_country',16)]
continuous_cols = ["age","hours_per_week"]
target_col = 'income_label'

In [5]:
# TARGET
target = df[target_col].values

# wide
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = wide_preprocessor.fit_transform(df)

# deeptabular
tab_preprocessor = TabPreprocessor(embed_cols=cat_embed_cols, continuous_cols=continuous_cols)
X_tab = tab_preprocessor.fit_transform(df)

In [6]:
print(X_wide)
print(X_wide.shape)

[[  1  17  23 ...  89  91 316]
 [  2  18  23 ...  89  92 317]
 [  3  18  24 ...  89  93 318]
 ...
 [  2  20  23 ...  90 103 323]
 [  2  17  23 ...  89 103 323]
 [  2  21  29 ...  90 115 324]]
(48842, 8)
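
The printed matrix has 8 columns (6 wide columns plus 2 crossed columns), and the integers in different columns do not overlap. This suggests that every (column, value) pair gets its own index, with 0 left unused. The sketch below reproduces that idea on a toy table. It is an illustration only, not the implementation of `WidePreprocessor`; `encode_wide` and the sample rows are hypothetical.

```python
def encode_wide(rows, wide_cols, crossed_cols):
    """Map every (column, value) pair to a unique integer starting at 1."""
    # Add the crossed columns by joining the values of each column pair.
    table = []
    for row in rows:
        full = {c: row[c] for c in wide_cols}
        for a, b in crossed_cols:
            full[f"{a}_{b}"] = f"{row[a]}-{row[b]}"
        table.append(full)

    # Walk the table column by column so each column gets a contiguous block.
    vocab = {}
    for col in table[0]:
        for full in table:
            vocab.setdefault((col, full[col]), len(vocab) + 1)

    encoded = [[vocab[(col, val)] for col, val in full.items()] for full in table]
    return encoded, vocab


# Hypothetical toy rows using two of the notebook's columns.
rows = [
    {"education": "11th", "occupation": "Machine-op-inspct"},
    {"education": "HS-grad", "occupation": "Farming-fishing"},
    {"education": "HS-grad", "occupation": "Machine-op-inspct"},
]
encoded, vocab = encode_wide(
    rows,
    wide_cols=["education", "occupation"],
    crossed_cols=[("education", "occupation")],
)
print(encoded)     # [[1, 3, 5], [2, 4, 6], [2, 3, 7]]
print(len(vocab))  # 7
```

Counting the distinct integers gives the size of this shared vocabulary, which is what `np.unique(X_wide).shape[0]` computes when the `wide` component is defined later in the notebook.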


In [7]:
print(X_tab)
print(X_tab.shape)

[[ 1.          1.          1.         ...  1.         -0.99512893
  -0.03408696]
 [ 2.          2.          1.         ...  1.         -0.04694151
   0.77292975]
 [ 3.          2.          2.         ...  1.         -0.77631645
  -0.03408696]
 ...
 [ 2.          4.          1.         ...  1.          1.41180837
  -0.03408696]
 [ 2.          1.          1.         ...  1.         -1.21394141
  -1.64812038]
 [ 2.          5.          7.         ...  1.          0.97418341
  -0.03408696]]
(48842, 7)
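
The first five columns of `X_tab` hold integer codes for the categorical features, and the last two hold `age` and `hours_per_week` rescaled to values around zero. A minimal sketch of that kind of rescaling follows, assuming standard scaling with the population standard deviation (as in scikit-learn's `StandardScaler`). Whether `TabPreprocessor` does exactly this is an assumption here, and `standardize` is a hypothetical helper.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


ages = [25, 38, 28, 44, 18]  # the five ages shown in df.head()
scaled = standardize(ages)
print([round(s, 3) for s in scaled])
```

The numbers will not match the ones printed for `X_tab`, because the mean and standard deviation here come from five rows rather than from all 48842.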


### Defining the model

In [8]:
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
deeptabular = TabMlp(mlp_hidden_dims=[64,32], 
                      column_idx=tab_preprocessor.column_idx,
                      embed_input=tab_preprocessor.embeddings_input,
                      continuous_cols=continuous_cols)
model = WideDeep(wide=wide, deeptabular=deeptabular)

In [9]:
model

WideDeep(
  (wide): Wide(
    (wide_linear): Embedding(797, 1, padding_idx=0)
  )
  (deeptabular): Sequential(
    (0): TabMlp(
      (embed_layers): ModuleDict(
        (emb_layer_education): Embedding(17, 16, padding_idx=0)
        (emb_layer_native_country): Embedding(43, 16, padding_idx=0)
        (emb_layer_occupation): Embedding(16, 16, padding_idx=0)
        (emb_layer_relationship): Embedding(7, 8, padding_idx=0)
        (emb_layer_workclass): Embedding(10, 16, padding_idx=0)
      )
      (embedding_dropout): Dropout(p=0.1, inplace=False)
      (tab_mlp): MLP(
        (mlp): Sequential(
          (dense_layer_0): Sequential(
            (0): Dropout(p=0.1, inplace=False)
            (1): Linear(in_features=74, out_features=64, bias=True)
            (2): ReLU(inplace=True)
          )
          (dense_layer_1): Sequential(
            (0): Dropout(p=0.1, inplace=False)
            (1): Linear(in_features=64, out_features=32, bias=True)
            (2): ReLU(inplace=True)
     

As you can see, the model is not particularly complex. In mathematical terms (Eq 3 in the [original paper](https://arxiv.org/pdf/1606.07792.pdf)): 

$$
pred = \sigma(W^{T}_{wide}[x, \phi(x)] + W^{T}_{deep}a_{deep}^{(l_f)} +  b) 
$$ 


The architecture above will output the first and second terms inside the parentheses. `WideDeep` will then add them and apply an activation function (`sigmoid` in this case). For more details, please refer to the paper.
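
To make the equation concrete, here is a minimal sketch in plain Python. The weights, the inputs, and the helper `wide_deep_pred` are made-up placeholders, not values or functions from `pytorch_widedeep`. The point is only that each component contributes one number, and that these are summed with the bias and passed through a sigmoid.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def wide_deep_pred(w_wide, x_wide, w_deep, a_deep, b):
    wide_term = dot(w_wide, x_wide)  # W_wide^T [x, phi(x)]
    deep_term = dot(w_deep, a_deep)  # W_deep^T a_deep^(l_f)
    return sigmoid(wide_term + deep_term + b)


# Toy values, chosen only for illustration.
x_wide = [1.0, 0.0, 1.0]    # sparse wide features, including crossed ones
w_wide = [0.5, -0.2, 0.1]
a_deep = [0.3, 0.7]         # activations of the last deep layer
w_deep = [1.0, -1.0]
b = 0.0

pred = wide_deep_pred(w_wide, x_wide, w_deep, a_deep, b)
print(round(pred, 4))
```

With these numbers the wide term is 0.6 and the deep term is -0.4, so the prediction is `sigmoid(0.2)`, a probability slightly above 0.5.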

### Training
Once the model is built, we just need to pass it to a `Trainer` and call `fit`.

In [10]:
trainer = Trainer(model, objective='binary', metrics=[Accuracy, Precision])

In [11]:
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, n_epochs=5, batch_size=64, val_split=0.2)

epoch 1: 100%|██████████| 611/611 [00:06<00:00, 91.05it/s, loss=0.405, metrics={'acc': 0.8076, 'prec': 0.6255}] 
valid: 100%|██████████| 153/153 [00:00<00:00, 155.18it/s, loss=0.36, metrics={'acc': 0.8346, 'prec': 0.6752}] 
epoch 2: 100%|██████████| 611/611 [00:06<00:00, 93.72it/s, loss=0.362, metrics={'acc': 0.8288, 'prec': 0.6783}] 
valid: 100%|██████████| 153/153 [00:00<00:00, 159.87it/s, loss=0.355, metrics={'acc': 0.8354, 'prec': 0.6706}]
epoch 3: 100%|██████████| 611/611 [00:06<00:00, 98.04it/s, loss=0.356, metrics={'acc': 0.8333, 'prec': 0.6885}] 
valid: 100%|██████████| 153/153 [00:00<00:00, 153.29it/s, loss=0.352, metrics={'acc': 0.8357, 'prec': 0.6653}]
epoch 4: 100%|██████████| 611/611 [00:06<00:00, 97.35it/s, loss=0.351, metrics={'acc': 0.8349, 'prec': 0.6918}] 
valid: 100%|██████████| 153/153 [00:00<00:00, 157.49it/s, loss=0.351, metrics={'acc': 0.8368, 'prec': 0.6703}]
epoch 5: 100%|██████████| 611/611 [00:06<00:00, 96.26it/s, loss=0.35, metrics={'acc': 0.8359, 'prec': 0.

As you can see, you can run a wide and deep model in just a few lines of code. 
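
The progress bars above report 611 training batches and 153 validation batches per epoch. The quick check below reproduces those counts, assuming the validation set receives `ceil(val_split * n_rows)` rows and the last, partially filled batch is kept. Both are assumptions about how the split and the data loaders behave, not something stated in this notebook.

```python
import math

n_rows = 48842   # X_wide.shape[0]
val_split = 0.2
batch_size = 64

n_valid = math.ceil(n_rows * val_split)
n_train = n_rows - n_valid

train_batches = math.ceil(n_train / batch_size)
valid_batches = math.ceil(n_valid / batch_size)
print(train_batches, valid_batches)  # 611 153
```

The same arithmetic with `batch_size=512` gives 77 and 20 batches, which matches the `TabTransformer` run further down.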

Using `TabResnet` as the `deeptabular` component

In [12]:
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
deeptabular = TabResnet(blocks_dims=[64,32], 
                        column_idx=tab_preprocessor.column_idx,
                        embed_input=tab_preprocessor.embeddings_input,
                        continuous_cols=continuous_cols)
model = WideDeep(wide=wide, deeptabular=deeptabular)

In [13]:
trainer = Trainer(model, objective='binary', metrics=[Accuracy, Precision])

In [14]:
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, n_epochs=5, batch_size=64, val_split=0.2)

epoch 1: 100%|██████████| 611/611 [00:08<00:00, 71.51it/s, loss=0.398, metrics={'acc': 0.811, 'prec': 0.6357}] 
valid: 100%|██████████| 153/153 [00:01<00:00, 139.38it/s, loss=0.36, metrics={'acc': 0.832, 'prec': 0.6827}]  
epoch 2: 100%|██████████| 611/611 [00:08<00:00, 73.47it/s, loss=0.369, metrics={'acc': 0.8241, 'prec': 0.6637}]
valid: 100%|██████████| 153/153 [00:01<00:00, 103.06it/s, loss=0.354, metrics={'acc': 0.836, 'prec': 0.6872}] 
epoch 3: 100%|██████████| 611/611 [00:10<00:00, 59.24it/s, loss=0.36, metrics={'acc': 0.8287, 'prec': 0.6749}] 
valid: 100%|██████████| 153/153 [00:01<00:00, 151.05it/s, loss=0.351, metrics={'acc': 0.838, 'prec': 0.6892}] 
epoch 4: 100%|██████████| 611/611 [00:08<00:00, 71.69it/s, loss=0.354, metrics={'acc': 0.8324, 'prec': 0.682}] 
valid: 100%|██████████| 153/153 [00:00<00:00, 158.34it/s, loss=0.348, metrics={'acc': 0.8397, 'prec': 0.7025}]
epoch 5: 100%|██████████| 611/611 [00:09<00:00, 62.20it/s, loss=0.351, metrics={'acc': 0.8333, 'prec': 0.684

Using `TabTransformer` as the `deeptabular` component

In [15]:
# for TabTransformer we only need the names of the columns
cat_embed_cols_for_transformer = [el[0] for el in cat_embed_cols]

In [16]:
cat_embed_cols_for_transformer

['education', 'relationship', 'workclass', 'occupation', 'native_country']

In [17]:
# deeptabular
tab_preprocessor = TabPreprocessor(embed_cols=cat_embed_cols_for_transformer, 
                                   continuous_cols=continuous_cols, 
                                   for_tabtransformer=True)
X_tab = tab_preprocessor.fit_transform(df)



In [18]:
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
deeptabular = TabTransformer(column_idx=tab_preprocessor.column_idx,
                             embed_input=tab_preprocessor.embeddings_input,
                             continuous_cols=continuous_cols)
model = WideDeep(wide=wide, deeptabular=deeptabular)

In [19]:
trainer = Trainer(model, objective='binary', metrics=[Accuracy, Precision])

In [21]:
trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, n_epochs=2, batch_size=512, val_split=0.2)

epoch 1: 100%|██████████| 77/77 [00:14<00:00,  5.14it/s, loss=0.356, metrics={'acc': 0.8318, 'prec': 0.6798}]
valid: 100%|██████████| 20/20 [00:01<00:00, 14.60it/s, loss=0.352, metrics={'acc': 0.8361, 'prec': 0.6662}]
epoch 2: 100%|██████████| 77/77 [00:15<00:00,  5.02it/s, loss=0.353, metrics={'acc': 0.8338, 'prec': 0.6832}]
valid: 100%|██████████| 20/20 [00:01<00:00, 14.68it/s, loss=0.352, metrics={'acc': 0.8359, 'prec': 0.66}]  


It is also worth mentioning that one could build a model from any of the individual components on its own. For example, a model made up only of the `wide` component is simply a linear model. This can be done with:

In [22]:
model = WideDeep(wide=wide)

In [23]:
trainer = Trainer(model, objective='binary', metrics=[Accuracy, Precision])

In [24]:
trainer.fit(X_wide=X_wide, target=target, n_epochs=5, batch_size=64, val_split=0.2)

epoch 1: 100%|██████████| 611/611 [00:03<00:00, 160.60it/s, loss=0.603, metrics={'acc': 0.6775, 'prec': 0.3213}]
valid: 100%|██████████| 153/153 [00:00<00:00, 212.39it/s, loss=0.483, metrics={'acc': 0.7679, 'prec': 0.528}] 
epoch 2: 100%|██████████| 611/611 [00:03<00:00, 174.65it/s, loss=0.448, metrics={'acc': 0.7904, 'prec': 0.6199}]
valid: 100%|██████████| 153/153 [00:00<00:00, 224.30it/s, loss=0.418, metrics={'acc': 0.807, 'prec': 0.6645}] 
epoch 3: 100%|██████████| 611/611 [00:03<00:00, 171.36it/s, loss=0.405, metrics={'acc': 0.8142, 'prec': 0.6749}]
valid: 100%|██████████| 153/153 [00:00<00:00, 221.29it/s, loss=0.393, metrics={'acc': 0.8231, 'prec': 0.6871}]
epoch 4: 100%|██████████| 611/611 [00:03<00:00, 174.61it/s, loss=0.386, metrics={'acc': 0.8237, 'prec': 0.6927}]
valid: 100%|██████████| 153/153 [00:00<00:00, 220.15it/s, loss=0.38, metrics={'acc': 0.8295, 'prec': 0.7002}] 
epoch 5: 100%|██████████| 611/611 [00:03<00:00, 173.50it/s, loss=0.376, metrics={'acc': 0.8269, 'prec': 

The only requirement is that the model component must be passed to `WideDeep` before being fed to the `Trainer`. This is because the `Trainer` is coded so that it trains a parent called `model` whose children correspond to the model components: `wide`, `deeptabular`, `deeptext` and `deepimage`. In addition, `WideDeep` builds the last connection between the outputs of those components and the final output neuron(s).
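
The parent and children layout described above can be pictured with a small stand-in class. This is a hypothetical sketch in plain Python, not the real `WideDeep`: `ToyWideDeep` and the lambda components are invented for illustration, and the real class works with PyTorch modules and tensors. It only shows that each component is optional and that the parent adds up the outputs of those that are present.

```python
class ToyWideDeep:
    """Hypothetical stand-in: holds optional components and sums their outputs."""

    COMPONENTS = ("wide", "deeptabular", "deeptext", "deepimage")

    def __init__(self, wide=None, deeptabular=None, deeptext=None, deepimage=None):
        self.wide = wide
        self.deeptabular = deeptabular
        self.deeptext = deeptext
        self.deepimage = deepimage
        if all(getattr(self, name) is None for name in self.COMPONENTS):
            raise ValueError("at least one component is required")

    def __call__(self, inputs):
        # inputs is a dict keyed by component name; absent components are skipped.
        return sum(
            getattr(self, name)(inputs[name])
            for name in self.COMPONENTS
            if getattr(self, name) is not None
        )


# Toy components: plain functions standing in for neural network modules.
wide_only = ToyWideDeep(wide=lambda x: 0.5 * x)
both = ToyWideDeep(wide=lambda x: 0.5 * x, deeptabular=lambda x: x + 1.0)

print(wide_only({"wide": 2.0}))                 # 1.0
print(both({"wide": 2.0, "deeptabular": 3.0}))  # 5.0
```

A training loop written against such a parent only needs to know the component names, which is why a single component still has to be wrapped before training.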