## Binary Classification with different optimizers, schedulers, etc.

In this notebook we will use the Adult Census dataset. Download the data from [here](https://www.kaggle.com/wenruliu/adult-income-dataset/downloads/adult.csv/2).

In [1]:
import numpy as np
import pandas as pd
import torch

from pytorch_widedeep.preprocessing import WidePreprocessor, DensePreprocessor
from pytorch_widedeep.models import Wide, DeepDense, WideDeep
from pytorch_widedeep.metrics import Accuracy, Recall

In [2]:
df = pd.read_csv('data/adult/adult.csv.zip')
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,educational-num,marital-status,occupation,relationship,race,gender,capital-gain,capital-loss,hours-per-week,native-country,income
0,25,Private,226802,11th,7,Never-married,Machine-op-inspct,Own-child,Black,Male,0,0,40,United-States,<=50K
1,38,Private,89814,HS-grad,9,Married-civ-spouse,Farming-fishing,Husband,White,Male,0,0,50,United-States,<=50K
2,28,Local-gov,336951,Assoc-acdm,12,Married-civ-spouse,Protective-serv,Husband,White,Male,0,0,40,United-States,>50K
3,44,Private,160323,Some-college,10,Married-civ-spouse,Machine-op-inspct,Husband,Black,Male,7688,0,40,United-States,>50K
4,18,?,103497,Some-college,10,Never-married,?,Own-child,White,Female,0,0,30,United-States,<=50K


In [3]:
# For convenience, we'll replace '-' with '_'
df.columns = [c.replace("-", "_") for c in df.columns]
# binary target
df['income_label'] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
df.drop('income', axis=1, inplace=True)
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,educational_num,marital_status,occupation,relationship,race,gender,capital_gain,capital_loss,hours_per_week,native_country,income_label
0,25,Private,226802,11th,7,Never-married,Machine-op-inspct,Own-child,Black,Male,0,0,40,United-States,0
1,38,Private,89814,HS-grad,9,Married-civ-spouse,Farming-fishing,Husband,White,Male,0,0,50,United-States,0
2,28,Local-gov,336951,Assoc-acdm,12,Married-civ-spouse,Protective-serv,Husband,White,Male,0,0,40,United-States,1
3,44,Private,160323,Some-college,10,Married-civ-spouse,Machine-op-inspct,Husband,Black,Male,7688,0,40,United-States,1
4,18,?,103497,Some-college,10,Never-married,?,Own-child,White,Female,0,0,30,United-States,0


### Preparing the data

Have a look at notebooks one and two if you want a good understanding of the next few lines of code (although this is not required to use the package).

In [4]:
wide_cols = ['education', 'relationship','workclass','occupation','native_country','gender']
crossed_cols = [('education', 'occupation'), ('native_country', 'occupation')]
cat_embed_cols = [('education',16), ('relationship',8), ('workclass',16), ('occupation',16),('native_country',16)]
continuous_cols = ["age","hours_per_week"]
target_col = 'income_label'

In [5]:
# TARGET
target = df[target_col].values

# WIDE
preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = preprocess_wide.fit_transform(df)

# DEEP
preprocess_deep = DensePreprocessor(embed_cols=cat_embed_cols, continuous_cols=continuous_cols)
X_deep = preprocess_deep.fit_transform(df)

In [6]:
print(X_wide)
print(X_wide.shape)

[[  1  17  23 ...  89  91 316]
 [  2  18  23 ...  89  92 317]
 [  3  18  24 ...  89  93 318]
 ...
 [  2  20  23 ...  90 103 323]
 [  2  17  23 ...  89 103 323]
 [  2  21  29 ...  90 115 324]]
(48842, 8)
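
The output above suggests that every (column, value) pair, including the crossed columns, receives its own integer index starting at 1, which is why `X_wide` has 8 columns (6 wide columns plus 2 crossed ones). Below is a minimal sketch of that idea in plain Python. `encode_wide` is a hypothetical helper written for illustration, not the `WidePreprocessor` implementation, and the order in which the library assigns indices may differ.

```python
# Hypothetical illustration: give each (column, value) pair a unique index,
# leaving 0 unused (e.g. for padding). Not the library's actual code.
def encode_wide(rows, wide_cols, crossed_cols):
    vocab = {}    # maps "column_value" -> integer index
    encoded = []
    for row in rows:
        # Start from the plain wide columns
        feats = {c: row[c] for c in wide_cols}
        # Build each crossed feature by joining the values of the paired columns
        for a, b in crossed_cols:
            feats[f"{a}_{b}"] = f"{row[a]}-{row[b]}"
        # Assign the next free index the first time a pair is seen
        encoded.append(
            [vocab.setdefault(f"{col}_{val}", len(vocab) + 1)
             for col, val in feats.items()]
        )
    return encoded, vocab


rows = [
    {"education": "11th", "occupation": "Machine-op-inspct"},
    {"education": "HS-grad", "occupation": "Farming-fishing"},
    {"education": "11th", "occupation": "Farming-fishing"},
]
X, vocab = encode_wide(
    rows,
    wide_cols=["education", "occupation"],
    crossed_cols=[("education", "occupation")],
)
print(X)           # [[1, 2, 3], [4, 5, 6], [1, 5, 7]]
print(len(vocab))  # 7
```

With such an encoding, the number of distinct indices is what the wide component needs as its input dimension.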


In [7]:
print(X_deep)
print(X_deep.shape)

[[ 0.          0.          0.         ...  0.         -0.99512893
  -0.03408696]
 [ 1.          1.          0.         ...  0.         -0.04694151
   0.77292975]
 [ 2.          1.          1.         ...  0.         -0.77631645
  -0.03408696]
 ...
 [ 1.          3.          0.         ...  0.          1.41180837
  -0.03408696]
 [ 1.          0.          0.         ...  0.         -1.21394141
  -1.64812038]
 [ 1.          4.          6.         ...  0.          0.97418341
  -0.03408696]]
(48842, 7)
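
`X_deep` has 7 columns: the 5 categorical columns in `cat_embed_cols`, which appear to be label-encoded as integers, followed by the 2 continuous columns, which appear to be standardized (zero mean, unit variance). Below is a small sketch of both transformations in plain Python, assuming the population standard deviation is used for scaling. `label_encode` and `standardize` are hypothetical helpers, not the `DensePreprocessor` API.

```python
import statistics


def label_encode(values):
    # Map each distinct value to an integer in order of first appearance
    mapping = {}
    codes = [mapping.setdefault(v, len(mapping)) for v in values]
    return codes, mapping


def standardize(values):
    # Subtract the mean and divide by the population standard deviation
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# First five rows of the dataframe shown earlier
education = ["11th", "HS-grad", "Assoc-acdm", "Some-college", "Some-college"]
age = [25, 38, 28, 44, 18]

codes, mapping = label_encode(education)
scaled_age = standardize(age)
print(codes)  # [0, 1, 2, 3, 3]
print(scaled_age)
```

The scaled values here differ from those in `X_deep` because this sketch uses only five rows, while the preprocessor is fitted on the full dataset.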


As you can see, you can run a wide and deep model in just a few lines of code

Let's now see how to use `WideDeep` with varying parameters

### Dropout and Batchnorm

In [8]:
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
# We can add dropout and batchnorm to the dense layers
deepdense = DeepDense(hidden_layers=[64,32], dropout=[0.5, 0.5], batchnorm=True,
                      deep_column_idx=preprocess_deep.deep_column_idx,
                      embed_input=preprocess_deep.embeddings_input,
                      continuous_cols=continuous_cols)
model = WideDeep(wide=wide, deepdense=deepdense)

In [9]:
model

WideDeep(
  (wide): Wide(
    (wide_linear): Embedding(797, 1, padding_idx=0)
  )
  (deepdense): Sequential(
    (0): DeepDense(
      (embed_layers): ModuleDict(
        (emb_layer_education): Embedding(17, 16)
        (emb_layer_native_country): Embedding(43, 16)
        (emb_layer_occupation): Embedding(16, 16)
        (emb_layer_relationship): Embedding(7, 8)
        (emb_layer_workclass): Embedding(10, 16)
      )
      (embed_dropout): Dropout(p=0.0, inplace=False)
      (dense): Sequential(
        (dense_layer_0): Sequential(
          (0): Linear(in_features=74, out_features=64, bias=True)
          (1): LeakyReLU(negative_slope=0.01, inplace=True)
          (2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (3): Dropout(p=0.5, inplace=False)
        )
        (dense_layer_1): Sequential(
          (0): Linear(in_features=64, out_features=32, bias=True)
          (1): LeakyReLU(negative_slope=0.01, inplace=True)
          (2): BatchN
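
The `in_features=74` reported for the first dense layer can be checked by hand: the dense layers receive the concatenated embeddings plus one input per continuous column. A quick check using the embedding sizes defined in `cat_embed_cols` earlier in this notebook:

```python
# Embedding sizes and continuous columns as defined earlier in this notebook
cat_embed_cols = [('education', 16), ('relationship', 8), ('workclass', 16),
                  ('occupation', 16), ('native_country', 16)]
continuous_cols = ["age", "hours_per_week"]

# Total width of the concatenated embeddings, plus one input per continuous column
embed_total = sum(dim for _, dim in cat_embed_cols)
deep_input_dim = embed_total + len(continuous_cols)
print(embed_total, deep_input_dim)  # 72 74
```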

We can use different initializers, optimizers and learning rate schedulers for each `branch` of the model

###  Optimizers, LR schedulers, Initializers and Callbacks

In [10]:
from pytorch_widedeep.initializers import KaimingNormal, XavierNormal
from pytorch_widedeep.callbacks import ModelCheckpoint, LRHistory, EarlyStopping
from pytorch_widedeep.optim import RAdam

In [15]:
# Optimizers
wide_opt = torch.optim.Adam(model.wide.parameters(), lr=0.03)
deep_opt = RAdam(model.deepdense.parameters(), lr=0.01)
# LR Schedulers
wide_sch = torch.optim.lr_scheduler.StepLR(wide_opt, step_size=3)
deep_sch = torch.optim.lr_scheduler.StepLR(deep_opt, step_size=5)

The component-dependent settings must be passed as dictionaries, while general settings are simply passed as lists.

In [16]:
# Component-dependent settings as Dict
optimizers = {'wide': wide_opt, 'deepdense':deep_opt}
schedulers = {'wide': wide_sch, 'deepdense':deep_sch}
initializers = {'wide': KaimingNormal, 'deepdense':XavierNormal}
# General settings as List
callbacks = [LRHistory(n_epochs=10), EarlyStopping, ModelCheckpoint(filepath='model_weights/wd_out')]
metrics = [Accuracy, Recall]

In [17]:
model.compile(method='binary', optimizers=optimizers, lr_schedulers=schedulers, 
              initializers=initializers,
              callbacks=callbacks,
              metrics=metrics)

In [18]:
model.fit(X_wide=X_wide, X_deep=X_deep, target=target, n_epochs=10, batch_size=256, val_split=0.2)

  0%|          | 0/153 [00:00<?, ?it/s]

Training


epoch 1: 100%|██████████| 153/153 [00:02<00:00, 72.33it/s, loss=0.503, metrics={'acc': 0.7885, 'rec': 0.4864}]
valid: 100%|██████████| 39/39 [00:00<00:00, 127.72it/s, loss=0.386, metrics={'acc': 0.7962, 'rec': 0.4998}]
epoch 2: 100%|██████████| 153/153 [00:02<00:00, 71.76it/s, loss=0.374, metrics={'acc': 0.8268, 'rec': 0.5242}]
valid: 100%|██████████| 39/39 [00:00<00:00, 126.72it/s, loss=0.372, metrics={'acc': 0.8277, 'rec': 0.5281}]
epoch 3: 100%|██████████| 153/153 [00:02<00:00, 73.21it/s, loss=0.367, metrics={'acc': 0.8298, 'rec': 0.5242}]
valid: 100%|██████████| 39/39 [00:00<00:00, 126.68it/s, loss=0.37, metrics={'acc': 0.8303, 'rec': 0.5279}]
epoch 4: 100%|██████████| 153/153 [00:02<00:00, 71.37it/s, loss=0.36, metrics={'acc': 0.8319, 'rec': 0.5372}] 
valid: 100%|██████████| 39/39 [00:00<00:00, 128.64it/s, loss=0.369, metrics={'acc': 0.8324, 'rec': 0.5412}]
epoch 5: 100%|██████████| 153/153 [00:02<00:00, 71.53it/s, loss=0.359, metrics={'acc': 0.8322, 'rec': 0.5378}]
valid: 100%|██

In [19]:
dir(model)

['__call__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattr__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_apply',
 '_backward_hooks',
 '_buffers',
 '_forward_hooks',
 '_forward_pre_hooks',
 '_get_name',
 '_load_from_state_dict',
 '_load_state_dict_pre_hooks',
 '_loss_fn',
 '_lr_scheduler_step',
 '_modules',
 '_named_members',
 '_parameters',
 '_predict',
 '_register_load_state_dict_pre_hook',
 '_register_state_dict_hook',
 '_replicate_for_data_parallel',
 '_save_to_state_dict',
 '_slow_forward',
 '_state_dict_hooks',
 '_train_val_split',
 '_training_step',
 '_validation_step',
 '_version',
 '_warm_up',
 'add_module',
 'apply',
 'batch_size',
 'bfloat16',
 'buffers',
 'callback_c

Among the many methods and attributes listed, note the `history` and `lr_history` attributes.

In [20]:
model.history.epoch

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [21]:
print(model.history._history)

{'train_loss': [0.5026861273385341, 0.37383826573689777, 0.36658557158669614, 0.3601557047538508, 0.3594148938172783, 0.35907501001763187, 0.358282413942362, 0.35823015644659406, 0.35819698957835927, 0.3581014702133104], 'train_acc': [0.788549637857344, 0.8267857599877153, 0.8297545619737414, 0.8318787909809843, 0.8321859084278146, 0.832237094668953, 0.832902515803752, 0.8329792951654595, 0.8329537020448904, 0.8329281089243211], 'train_rec': [0.48636218905448914, 0.5242272019386292, 0.5242272019386292, 0.5371697545051575, 0.5378115177154541, 0.5361000895500183, 0.5396299362182617, 0.5373836755752563, 0.5368488430976868, 0.5369558334350586], 'val_loss': [0.38589231249613637, 0.371902360365941, 0.36999432627971357, 0.36935041348139447, 0.3691598016482133, 0.36905216712218064, 0.36900061674607104, 0.36898223635477895, 0.36896658937136334, 0.36896434120642835], 'val_acc': [0.79624094017444, 0.8277302321772245, 0.8302895049342779, 0.832357397321977, 0.8325211907784285, 0.8325826133245977, 0

In [22]:
print(model.lr_history)

{'lr_wide_0': [0.03, 0.03, 0.03, 0.003, 0.003, 0.003, 0.00030000000000000003, 0.00030000000000000003, 0.00030000000000000003, 3.0000000000000004e-05], 'lr_deepdense_0': [0.01, 0.01, 0.01, 0.01, 0.01, 0.001, 0.001, 0.001, 0.001, 0.001]}


We can see that the learning rate effectively decreases by a factor of 0.1 (the default) after the corresponding `step_size`. Note that the keys of the dictionary have a suffix `_0`. This is because if you pass different parameter groups to the torch optimizers, these will also be recorded. We'll see this in the `Regression` notebook. 
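
The recorded schedules can be reproduced by hand. A step decay multiplies the learning rate by `gamma` once every `step_size` epochs, so the rate at a given epoch is `base_lr * gamma ** (epoch // step_size)`. Below is a small sketch in plain Python (no torch needed). `step_lr` is a helper name introduced here for illustration.

```python
def step_lr(base_lr, step_size, n_epochs, gamma=0.1):
    # Closed form of a step decay: one multiplication by gamma per completed step
    return [base_lr * gamma ** (epoch // step_size) for epoch in range(n_epochs)]


# Same settings as the wide and deepdense schedulers defined above
wide_lrs = step_lr(base_lr=0.03, step_size=3, n_epochs=10)
deep_lrs = step_lr(base_lr=0.01, step_size=5, n_epochs=10)
print(wide_lrs)
print(deep_lrs)
```

Up to floating-point rounding, these lists match the `lr_wide_0` and `lr_deepdense_0` entries printed above.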

By now you should have a good idea of how to use the package. Before leaving this notebook, it is worth mentioning that the `WideDeep` class comes with a useful method to "rescue" the learned embeddings. For example, let's say I want to use the embeddings learned for the different levels of the categorical feature `education`:

In [23]:
model.get_embeddings(col_name='education', cat_encoding_dict=preprocess_deep.label_encoder.encoding_dict)

{'11th': array([ 0.33238176,  0.02123132,  0.42671534, -0.16836806,  0.04070434,
         0.21476945, -0.05866506,  0.09599391,  0.21264766, -0.08261641,
        -0.4364204 ,  0.5176953 , -0.17785792,  0.1990719 ,  0.05055304,
        -0.05390744], dtype=float32),
 'HS-grad': array([ 0.1851779 , -0.0601109 , -0.04134565, -0.17099169,  0.01647249,
         0.1691518 , -0.03775224, -0.01711482, -0.13714994, -0.02202759,
        -0.2350222 ,  0.20368417,  0.06420711,  0.08465873,  0.11443923,
        -0.28585908], dtype=float32),
 'Assoc-acdm': array([-0.2891686 , -0.25329128, -0.03977084,  0.34204823,  0.4393897 ,
         0.24583909, -0.08771466,  0.3398704 ,  0.06197336, -0.09200054,
         0.13266966, -0.27940965, -0.10639463,  0.16516595,  0.20191231,
        -0.11804624], dtype=float32),
 'Some-college': array([ 0.17284533, -0.34509236, -0.22175975, -0.11192639,  0.14154772,
         0.04188053,  0.14860624,  0.28312132,  0.06071718, -0.10315312,
        -0.05902205, -0.03197744, 
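
One possible use of the returned dictionary is to compare categories by the similarity of their embeddings. The sketch below computes the cosine similarity between the `11th` and `HS-grad` vectors printed above, using only the standard library. `cosine_similarity` is a helper defined here for illustration, not part of the package.

```python
import math


def cosine_similarity(u, v):
    # Dot product divided by the product of the Euclidean norms
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Values copied from the output of get_embeddings above
emb_11th = [0.33238176, 0.02123132, 0.42671534, -0.16836806, 0.04070434,
            0.21476945, -0.05866506, 0.09599391, 0.21264766, -0.08261641,
            -0.4364204, 0.5176953, -0.17785792, 0.1990719, 0.05055304,
            -0.05390744]
emb_hs_grad = [0.1851779, -0.0601109, -0.04134565, -0.17099169, 0.01647249,
               0.1691518, -0.03775224, -0.01711482, -0.13714994, -0.02202759,
               -0.2350222, 0.20368417, 0.06420711, 0.08465873, 0.11443923,
               -0.28585908]

similarity = cosine_similarity(emb_11th, emb_hs_grad)
print(similarity)
```

Values close to 1 indicate that the model placed the two education levels near each other in the embedding space, while values near 0 or below indicate that they are unrelated or opposed.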