### Chapter 15

### 15.1 Wide and Deep Regression

Having gone through Chapter 14, we can keep this Chapter mostly to code: a series of experiments followed by a brief discussion of the results.

Let's start, as always, by loading the required libraries and defining some paths.

In [1]:
import pandas as pd
import numpy as np
import pickle
import os
import torch
import torch.nn  as nn
import torch.nn.functional as F

from torch.optim.lr_scheduler import StepLR, MultiStepLR
from torch.utils.data import DataLoader
from recutils.wide_deep import WideDeepLoader, WideDeep
from recutils.average_precision import mapk

WD_DIR = "../datasets/Ponpare/data_processed/wide_deep"
wd_dataset_fname = "wd_dataset.p"
wd_interactions_fname = "interactions_dict.p"

The `WideDeepLoader` and `WideDeep` classes are well explained (I think...) in this [repo](https://github.com/jrzaurin/Wide-and-Deep-PyTorch) that I wrote a while ago. 

In [2]:
with open(os.path.join(WD_DIR, wd_dataset_fname), "rb") as f:
    wd_dataset = pickle.load(f)
with open(os.path.join(WD_DIR, wd_interactions_fname), "rb") as f:
    wd_interactions = pickle.load(f)

Let's define the model inputs

In [3]:
# model inputs
wide_dim = wd_dataset['train_dataset']['wide'].shape[1]
deep_column_idx = wd_dataset['deep_column_idx']
continuous_cols = wd_dataset['continuous_cols']
embeddings_input= wd_dataset['embeddings_input']
encoding_dict   = wd_dataset['encoding_dict']

And prepare the datasets to be loaded to the model

In [4]:
# Interactions during "testing period"
df_all_interactions = wd_interactions['all_valid_interactions']

# datasets
train_dataset = wd_dataset['train_dataset']
widedeep_dataset_tr = WideDeepLoader(train_dataset)

valid_dataset = wd_dataset['valid_dataset']
widedeep_dataset_val = WideDeepLoader(valid_dataset)

test_dataset = wd_dataset['test_dataset']
widedeep_dataset_te = WideDeepLoader(test_dataset, mode='test')

If you ever decide to go to production with a (DL-based) solution similar to the one presented here, **a proper hyperparameter optimization is required**. If you dive deep into the code, you will realize that this is neither an easy nor a quick exercise. For the time being, let's manually define 5 settings:

In [5]:
# Let's manually define some model set_ups for the experiment
set_ups = {}
set_ups['set_up_1'] = {}
set_ups['set_up_1']['batch_size'] = 4096
set_ups['set_up_1']['lr'] = 0.01
set_ups['set_up_1']['hidden_layers'] = [50, 25]
set_ups['set_up_1']['dropout'] = [0.5, 0.2]
set_ups['set_up_1']['n_epochs'] = 3

set_ups['set_up_2'] = {}
set_ups['set_up_2']['batch_size'] = 4096
set_ups['set_up_2']['lr'] = 0.01
set_ups['set_up_2']['hidden_layers'] = [100, 50]
set_ups['set_up_2']['dropout'] = [0.5, 0.5]
set_ups['set_up_2']['n_epochs'] = 6

set_ups['set_up_3'] = {}
set_ups['set_up_3']['batch_size'] = 8192
set_ups['set_up_3']['lr'] = 0.05
set_ups['set_up_3']['hidden_layers'] = [100, 100, 100]
set_ups['set_up_3']['dropout'] = [0.5, 0.5, 0.5]
set_ups['set_up_3']['n_epochs'] = 10

set_ups['set_up_4'] = {}
set_ups['set_up_4']['batch_size'] = 8192
set_ups['set_up_4']['lr'] = 0.05
set_ups['set_up_4']['hidden_layers'] = [100, 50, 25]
set_ups['set_up_4']['dropout'] = [0.5, 0.2, 0]
set_ups['set_up_4']['n_epochs'] = 10

set_ups['set_up_5'] = {}
set_ups['set_up_5']['batch_size'] = 9216
set_ups['set_up_5']['lr'] = 0.05
set_ups['set_up_5']['hidden_layers'] = [100, 50]
set_ups['set_up_5']['dropout'] = [0.5, 0.2]
set_ups['set_up_5']['n_epochs'] = 5

The interest distribution is highly skewed towards low values, so batch sizes need to be large: that way the algorithm has a chance to learn something every time it sees a batch. Nonetheless, feel free to add any set up and see how it goes.
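
To get a feel for why the batch size matters here, consider a back-of-the-envelope sketch: if only a small fraction `p` of the rows carries a non-negligible interest, the chance that a batch of size `B` contains at least one such row is `1 - (1 - p)**B`. The value of `p` below is purely hypothetical (it is not measured on the Ponpare data), and the helper name is made up for illustration.

```python
def prob_informative_batch(p, batch_size):
    """Probability that a batch holds at least one 'informative' row,
    assuming rows are drawn independently and a fraction p is informative."""
    return 1.0 - (1.0 - p) ** batch_size

# hypothetical share of rows with non-negligible interest (an assumption)
p = 0.001

for batch_size in [64, 512, 4096, 8192]:
    prob = prob_informative_batch(p, batch_size)
    print("batch_size={:>5}: P(at least one informative row) = {:.3f}".format(
        batch_size, prob))
```

Under this simplified view, small batches would frequently contain nothing but near-zero targets, while batches in the thousands almost always include a few rows worth learning from.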

Without further ado, let's run the experiments:

In [6]:
results = {}
for set_up_name, params in set_ups.items():
    print("INFO: {}".format(set_up_name))

    batch_size = params['batch_size']
    hidden_layers = params['hidden_layers']
    dropout = params['dropout']
    n_epochs = params['n_epochs']
    lr = params['lr']

    train_loader = DataLoader(dataset=widedeep_dataset_tr,
        batch_size=batch_size,
        shuffle=True,
        num_workers=4)

    eval_loader = DataLoader(dataset=widedeep_dataset_val,
        batch_size=batch_size,
        shuffle=True,
        num_workers=4)

    test_loader = DataLoader(dataset=widedeep_dataset_te,
        batch_size=batch_size,
        shuffle=False,
        num_workers=4)

    model = WideDeep(
        wide_dim,
        embeddings_input,
        continuous_cols,
        deep_column_idx,
        hidden_layers,
        dropout,
        encoding_dict
        )
    model.cuda()

    criterion = F.mse_loss
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Adding schedulers. These need to be defined after the optimizer,
    # so I cannot include them as part of the set ups
    if set_up_name == 'set_up_1':
        lr_scheduler = None
    elif set_up_name == 'set_up_2':
        lr_scheduler = StepLR(optimizer, step_size=2, gamma=0.5)
    elif set_up_name == 'set_up_3':
        lr_scheduler = MultiStepLR(optimizer, milestones=[3,8], gamma=0.1)
    elif set_up_name == 'set_up_4':
        lr_scheduler = MultiStepLR(optimizer, milestones=[3,8], gamma=0.1)
    elif set_up_name == 'set_up_5':
        lr_scheduler = MultiStepLR(optimizer, milestones=[2,4], gamma=0.1)

    model.fit(
        train_loader,
        criterion,
        optimizer,
        n_epochs=n_epochs,
        eval_loader=eval_loader,
        lr_scheduler=lr_scheduler
        )
    preds = model.predict(test_loader)

    df_all_interactions['interest'] = preds
    df_ranked = df_all_interactions.sort_values(['user_id_hash', 'interest'], ascending=[False, False])
    df_ranked = (df_ranked
        .groupby('user_id_hash')['coupon_id_hash']
        .apply(list)
        .reset_index())
    recomendations_dict = pd.Series(df_ranked.coupon_id_hash.values,
        index=df_ranked.user_id_hash).to_dict()
    true_valid_interactions = wd_interactions['true_valid_interactions']

    actual = []
    pred = []
    for user, ranked_coupons in recomendations_dict.items():
        actual.append(list(true_valid_interactions[user]))
        pred.append(list(ranked_coupons))
    # compute the metric once and reuse it
    map_score = mapk(actual, pred)
    print("Mean Average Precision: {}".format(map_score))
    results[set_up_name] = map_score
    del(model, optimizer, criterion)

INFO: set_up_1


epoch 1: 100%|██████████| 286/286 [00:13<00:00, 21.94it/s, loss=0.0668]
valid: 100%|██████████| 96/96 [00:03<00:00, 30.57it/s, loss=0.0751]
  0%|          | 0/286 [00:00<?, ?it/s]

Validation loss: 0.0663


epoch 2: 100%|██████████| 286/286 [00:13<00:00, 21.87it/s, loss=0.0682]
valid: 100%|██████████| 96/96 [00:03<00:00, 30.30it/s, loss=0.061] 
  0%|          | 0/286 [00:00<?, ?it/s]

Validation loss: 0.0647


epoch 3: 100%|██████████| 286/286 [00:13<00:00, 21.86it/s, loss=0.0651]
valid: 100%|██████████| 96/96 [00:03<00:00, 30.65it/s, loss=0.0752]
  0%|          | 0/531 [00:00<?, ?it/s]

Validation loss: 0.0652


predict: 100%|██████████| 531/531 [00:12<00:00, 42.17it/s]
  0%|          | 0/286 [00:00<?, ?it/s]

Mean Average Precision: 0.012845159103252813
INFO: set_up_2


epoch 1: 100%|██████████| 286/286 [00:14<00:00, 20.38it/s, loss=0.0648]
valid: 100%|██████████| 96/96 [00:03<00:00, 26.21it/s, loss=0.0717]
  0%|          | 0/286 [00:00<?, ?it/s]

Validation loss: 0.0653


epoch 2: 100%|██████████| 286/286 [00:14<00:00, 20.27it/s, loss=0.0658]
valid: 100%|██████████| 96/96 [00:03<00:00, 26.24it/s, loss=0.068] 
  0%|          | 0/286 [00:00<?, ?it/s]

Validation loss: 0.0652


epoch 3: 100%|██████████| 286/286 [00:13<00:00, 20.45it/s, loss=0.0661]
valid: 100%|██████████| 96/96 [00:03<00:00, 26.45it/s, loss=0.0647]
  0%|          | 0/286 [00:00<?, ?it/s]

Validation loss: 0.0647


epoch 4: 100%|██████████| 286/286 [00:14<00:00, 20.24it/s, loss=0.0608]
valid: 100%|██████████| 96/96 [00:03<00:00, 26.58it/s, loss=0.0631]
  0%|          | 0/286 [00:00<?, ?it/s]

Validation loss: 0.0643


epoch 5: 100%|██████████| 286/286 [00:14<00:00, 20.32it/s, loss=0.0633]
valid: 100%|██████████| 96/96 [00:04<00:00, 22.88it/s, loss=0.0633]
  0%|          | 0/286 [00:00<?, ?it/s]

Validation loss: 0.0636


epoch 6: 100%|██████████| 286/286 [00:14<00:00, 20.11it/s, loss=0.0581]
valid: 100%|██████████| 96/96 [00:03<00:00, 26.46it/s, loss=0.0733]
  0%|          | 0/531 [00:00<?, ?it/s]

Validation loss: 0.0642


predict: 100%|██████████| 531/531 [00:15<00:00, 35.34it/s]
  0%|          | 0/143 [00:00<?, ?it/s]

Mean Average Precision: 0.013351588304874222
INFO: set_up_3


epoch 1: 100%|██████████| 143/143 [00:12<00:00, 11.19it/s, loss=0.0754]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.23it/s, loss=0.075] 
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0747


epoch 2: 100%|██████████| 143/143 [00:12<00:00, 11.29it/s, loss=0.0743]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.45it/s, loss=0.0688]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0728


epoch 3: 100%|██████████| 143/143 [00:12<00:00, 11.17it/s, loss=0.0705]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.29it/s, loss=0.0722]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0727


epoch 4: 100%|██████████| 143/143 [00:12<00:00, 11.25it/s, loss=0.0722]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.23it/s, loss=0.0743]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0728


epoch 5: 100%|██████████| 143/143 [00:12<00:00, 11.16it/s, loss=0.0698]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.24it/s, loss=0.0706]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0727


epoch 6: 100%|██████████| 143/143 [00:12<00:00, 11.10it/s, loss=0.0712]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.30it/s, loss=0.0711]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0727


epoch 7: 100%|██████████| 143/143 [00:12<00:00, 11.22it/s, loss=0.0691]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.40it/s, loss=0.0714]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0727


epoch 8: 100%|██████████| 143/143 [00:12<00:00, 11.19it/s, loss=0.0734]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.25it/s, loss=0.0706]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0727


epoch 9: 100%|██████████| 143/143 [00:13<00:00, 10.83it/s, loss=0.0716]
valid: 100%|██████████| 48/48 [00:03<00:00, 12.78it/s, loss=0.0753]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0728


epoch 10: 100%|██████████| 143/143 [00:12<00:00, 11.17it/s, loss=0.0718]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.30it/s, loss=0.0746]
  0%|          | 0/266 [00:00<?, ?it/s]

Validation loss: 0.0727


predict: 100%|██████████| 266/266 [00:15<00:00, 17.16it/s]
  0%|          | 0/143 [00:00<?, ?it/s]

Mean Average Precision: 0.01670946541345236
INFO: set_up_4


epoch 1: 100%|██████████| 143/143 [00:12<00:00, 11.33it/s, loss=0.0711]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.39it/s, loss=0.0739]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0727


epoch 2: 100%|██████████| 143/143 [00:12<00:00, 11.32it/s, loss=0.0663]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.54it/s, loss=0.0691]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0681


epoch 3: 100%|██████████| 143/143 [00:12<00:00, 11.33it/s, loss=0.0674]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.38it/s, loss=0.0673]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0685


epoch 4: 100%|██████████| 143/143 [00:12<00:00, 11.34it/s, loss=0.065] 
valid: 100%|██████████| 48/48 [00:03<00:00, 13.63it/s, loss=0.0653]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0664


epoch 5: 100%|██████████| 143/143 [00:12<00:00, 11.36it/s, loss=0.0658]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.44it/s, loss=0.0667]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0657


epoch 6: 100%|██████████| 143/143 [00:12<00:00, 11.45it/s, loss=0.0635]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.41it/s, loss=0.0655]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0659


epoch 7: 100%|██████████| 143/143 [00:12<00:00, 11.28it/s, loss=0.0636]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.54it/s, loss=0.0632]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0659


epoch 8: 100%|██████████| 143/143 [00:12<00:00, 11.21it/s, loss=0.066] 
valid: 100%|██████████| 48/48 [00:03<00:00, 13.15it/s, loss=0.0682]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0663


epoch 9: 100%|██████████| 143/143 [00:13<00:00, 10.59it/s, loss=0.0644]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.06it/s, loss=0.0665]
  0%|          | 0/143 [00:00<?, ?it/s]

Validation loss: 0.0661


epoch 10: 100%|██████████| 143/143 [00:12<00:00, 11.12it/s, loss=0.0644]
valid: 100%|██████████| 48/48 [00:03<00:00, 13.27it/s, loss=0.0681]
  0%|          | 0/266 [00:00<?, ?it/s]

Validation loss: 0.0659


predict: 100%|██████████| 266/266 [00:15<00:00, 17.63it/s]
  0%|          | 0/127 [00:00<?, ?it/s]

Mean Average Precision: 0.011362966813219183
INFO: set_up_5


epoch 1: 100%|██████████| 127/127 [00:12<00:00,  9.97it/s, loss=0.0751]
valid: 100%|██████████| 43/43 [00:03<00:00, 11.76it/s, loss=0.0717]
  0%|          | 0/127 [00:00<?, ?it/s]

Validation loss: 0.0729


epoch 2: 100%|██████████| 127/127 [00:12<00:00,  9.83it/s, loss=0.0741]
valid: 100%|██████████| 43/43 [00:03<00:00, 11.76it/s, loss=0.0709]
  0%|          | 0/127 [00:00<?, ?it/s]

Validation loss: 0.0727


epoch 3: 100%|██████████| 127/127 [00:12<00:00, 10.05it/s, loss=0.073] 
valid: 100%|██████████| 43/43 [00:03<00:00, 11.92it/s, loss=0.0701]
  0%|          | 0/127 [00:00<?, ?it/s]

Validation loss: 0.0727


epoch 4: 100%|██████████| 127/127 [00:12<00:00,  9.98it/s, loss=0.0715]
valid: 100%|██████████| 43/43 [00:03<00:00, 11.74it/s, loss=0.0684]
  0%|          | 0/127 [00:00<?, ?it/s]

Validation loss: 0.0727


epoch 5: 100%|██████████| 127/127 [00:12<00:00, 10.03it/s, loss=0.0703]
valid: 100%|██████████| 43/43 [00:03<00:00, 11.75it/s, loss=0.0747]
  0%|          | 0/236 [00:00<?, ?it/s]

Validation loss: 0.0728


predict: 100%|██████████| 236/236 [00:15<00:00, 15.67it/s]


Mean Average Precision: 0.016635405045086905


All this work, all this deep learning (well, not that much) and this model does not perform better than "most popular" recommendations. 
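
For reference, the metric reported above is the mean average precision at k. The sketch below follows the commonly used definition; it is not necessarily identical to `recutils.average_precision.mapk`, and both the function names and the default `k=10` are assumptions made for illustration.

```python
def average_precision_at_k(actual, predicted, k=10):
    """Average precision at k for a single user.
    actual: collection of relevant items; predicted: ranked list of items."""
    if not actual:
        return 0.0
    relevant = set(actual)
    seen = set()
    hits = 0
    score = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        # count each relevant item only the first time it appears
        if item in relevant and item not in seen:
            hits += 1
            score += hits / rank
        seen.add(item)
    return score / min(len(relevant), k)

def mean_average_precision_at_k(actual, predicted, k=10):
    """Mean of the per-user average precision at k."""
    scores = [average_precision_at_k(a, p, k) for a, p in zip(actual, predicted)]
    return sum(scores) / len(scores) if scores else 0.0

# toy example with two users (made-up coupon ids)
toy_actual = [["c1", "c2"], ["c3"]]
toy_predicted = [["c1", "c9", "c2"], ["c8", "c3"]]
toy_map = mean_average_precision_at_k(toy_actual, toy_predicted)
print("MAP@10 on the toy example: {:.3f}".format(toy_map))
```

In the toy example the first user scores (1/1 + 2/3) / 2 and the second 1/2, so the mean is roughly 0.667.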

Obviously, there are a couple of things to consider. First, there are surely more experiments to run and set ups to include in the search for a better solution. The most straightforward update would be to include user and item embeddings (see experiments 2 and 3 in the `py_scripts` directory), although I can anticipate that the results are not much better.
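
As a starting point for adding more set ups, one could sample them at random instead of writing them by hand. The sketch below produces dictionaries with the same keys as `set_ups`; the helper name `sample_set_up` and the candidate values are illustrative assumptions, not tuned choices.

```python
import random

def sample_set_up(rng):
    """Draw one random set up with the same keys used in `set_ups`."""
    n_layers = rng.choice([2, 3])
    # layer widths shrink (or stay equal) as we go deeper
    hidden_layers = sorted(
        (rng.choice([25, 50, 100]) for _ in range(n_layers)), reverse=True)
    return {
        'batch_size': rng.choice([4096, 8192, 9216]),
        'lr': rng.choice([0.01, 0.05]),
        'hidden_layers': hidden_layers,
        'dropout': [rng.choice([0.0, 0.2, 0.5]) for _ in range(n_layers)],
        'n_epochs': rng.randint(3, 10),
    }

rng = random.Random(0)  # fixed seed so the draw is reproducible
random_set_ups = {
    'random_set_up_{}'.format(i + 1): sample_set_up(rng) for i in range(5)
}

for name, params in random_set_ups.items():
    print(name, params)
```

Note that the training loop above picks the scheduler by set-up name, so any new name needs its own branch there (or a default of `lr_scheduler = None`).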

In addition, you will see that the loss value fluctuates constantly as we go through each epoch. In other words, the learning is not very stable. This might be the result of the set up and/or how we decided to prepare the data, i.e. what we pass through the deep and wide models. However, to me this further illustrates that the Ponpare dataset is not particularly well suited for this type of model. Most likely it is a combination of all of these: an inadequate set up, suboptimal data preprocessing and the nature of the dataset. Nonetheless, I hope you found some of the code here useful for the problems you might want to solve.
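
One part of the set up that is easy to inspect is the learning-rate schedule. The sketch below reproduces, in plain Python, the decay rules of `StepLR` and `MultiStepLR` for the values used in set ups 2 and 3. It assumes that `fit` steps the scheduler once at the end of every epoch, which is an assumption about the `WideDeep` implementation; the function names are made up for illustration.

```python
from bisect import bisect_right

def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate at a 0-indexed epoch under a StepLR-style rule."""
    return base_lr * gamma ** (epoch // step_size)

def multi_step_lr(base_lr, epoch, milestones, gamma):
    """Learning rate at a 0-indexed epoch under a MultiStepLR-style rule."""
    # number of milestones already reached at this epoch
    n_decays = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** n_decays

# set_up_2: lr=0.01, StepLR(step_size=2, gamma=0.5), 6 epochs
lrs_set_up_2 = [step_lr(0.01, e, step_size=2, gamma=0.5) for e in range(6)]
# set_up_3: lr=0.05, MultiStepLR(milestones=[3, 8], gamma=0.1), 10 epochs
lrs_set_up_3 = [multi_step_lr(0.05, e, milestones=[3, 8], gamma=0.1)
                for e in range(10)]

print("set_up_2:", lrs_set_up_2)
print("set_up_3:", lrs_set_up_3)
```

Under these assumptions, set_up_3 trains with a learning rate ten times smaller from its fourth epoch onwards, which is worth keeping in mind when reading its flat validation loss.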

This is the final technique I wanted to show for now. In the future I will include other techniques and other datasets. It is now time to choose one of the techniques from previous Chapters and perform a final test on the original (so far "untouched") test dataset.