## Regression with Images and Text

In this notebook we will go through a series of examples on how to combine all Wide & Deep components.

To that end I will use the Airbnb listings dataset for London, which you can download from [here](http://insideairbnb.com/get-the-data.html). I use this dataset simply because it contains tabular data, images and text.

I have taken a sample of 1000 listings to keep the data tractable in this notebook. Also, I have preprocessed the data and prepared it for this exercise. All preprocessing steps can be found in the notebook `airbnb_data_preprocessing.ipynb` in this `examples` folder. 

In [1]:
import numpy as np
import pandas as pd
import os
import torch

from pytorch_widedeep import Trainer
from pytorch_widedeep.preprocessing import (
    WidePreprocessor,
    TabPreprocessor,
    TextPreprocessor,
    ImagePreprocessor,
)
from pytorch_widedeep.models import (
    Wide,
    TabMlp,
    Vision,
    BasicRNN,
    WideDeep,
)
from pytorch_widedeep.losses import RMSELoss
from pytorch_widedeep.initializers import *
from pytorch_widedeep.callbacks import *


In [2]:
df = pd.read_csv("../tmp_data/airbnb/airbnb_sample.csv")
df.head()

Unnamed: 0,id,host_id,description,host_listings_count,host_identity_verified,neighbourhood_cleansed,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,guests_included,minimum_nights,instant_bookable,cancellation_policy,has_house_rules,host_gender,accommodates_catg,guests_included_catg,minimum_nights_catg,host_listings_count_catg,bathrooms_catg,bedrooms_catg,beds_catg,amenity_24-hour_check-in,amenity__toilet,amenity_accessible-height_bed,amenity_accessible-height_toilet,amenity_air_conditioning,amenity_air_purifier,amenity_alfresco_bathtub,amenity_amazon_echo,amenity_baby_bath,amenity_baby_monitor,amenity_babysitter_recommendations,amenity_balcony,amenity_bath_towel,amenity_bathroom_essentials,amenity_bathtub,amenity_bathtub_with_bath_chair,amenity_bbq_grill,amenity_beach_essentials,amenity_beach_view,amenity_beachfront,amenity_bed_linens,amenity_bedroom_comforts,...,amenity_roll-in_shower,amenity_room-darkening_shades,amenity_safety_card,amenity_sauna,amenity_self_check-in,amenity_shampoo,amenity_shared_gym,amenity_shared_hot_tub,amenity_shared_pool,amenity_shower_chair,amenity_single_level_home,amenity_ski-in_ski-out,amenity_smart_lock,amenity_smart_tv,amenity_smoke_detector,amenity_smoking_allowed,amenity_soaking_tub,amenity_sound_system,amenity_stair_gates,amenity_stand_alone_steam_shower,amenity_standing_valet,amenity_steam_oven,amenity_stove,amenity_suitable_for_events,amenity_sun_loungers,amenity_table_corner_guards,amenity_tennis_court,amenity_terrace,amenity_toilet_paper,amenity_touchless_faucets,amenity_tv,amenity_walk-in_shower,amenity_warming_drawer,amenity_washer,amenity_washer_dryer,amenity_waterfront,amenity_well-lit_path_to_entrance,amenity_wheelchair_accessible,amenity_wide_clearance_to_shower,amenity_wide_doorway_to_guest_bathroom,amenity_wide_entrance,amenity_wide_entrance_for_guests,amenity_wide_entryway,amenity_wide_hallways,amenity_wifi,amenity_window_guards,amenity_wine_cooler,security_deposit,extra_people,yield
0,13913.jpg,54730,My bright double bedroom with a large window has a relaxed feeling! It comfortably fits one or t...,4.0,f,Islington,51.56802,-0.11121,t,apartment,private_room,2,1.0,1.0,0.0,1,1,f,moderate,1,female,2,1,1,3,1,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,...,1,1,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,1,0,0,0,1,0,0,100.0,15.0,12.0
1,15400.jpg,60302,"Lots of windows and light. St Luke's Gardens are at the end of the block, and the river not too...",1.0,t,Kensington and Chelsea,51.48796,-0.16898,t,apartment,entire_home/apt,2,1.0,1.0,1.0,2,3,f,strict_14_with_grace_period,1,female,2,2,3,1,1,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,150.0,0.0,109.5
2,17402.jpg,67564,"Open from June 2018 after a 3-year break, we are delighted to be welcoming guests again to this ...",19.0,t,Westminster,51.52098,-0.14002,t,apartment,entire_home/apt,6,2.0,3.0,3.0,4,3,t,strict_14_with_grace_period,1,female,3,3,3,3,2,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,350.0,10.0,149.65
3,24328.jpg,41759,"Artist house, bright high ceiling rooms, private parking and a communal garden in a conservation...",2.0,t,Wandsworth,51.47298,-0.16376,t,other,entire_home/apt,2,1.5,1.0,1.0,2,30,f,moderate,1,male,2,2,3,2,2,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,250.0,0.0,215.6
4,25023.jpg,102813,"Large, all comforts, 2-bed flat; first floor; lift; pretty communal gardens + off-street parking...",1.0,f,Wandsworth,51.44687,-0.21874,t,apartment,entire_home/apt,4,1.0,2.0,2.0,2,4,f,moderate,1,female,3,2,3,1,1,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,250.0,11.0,79.35


### Regression with the defaults

The set up

In [3]:
# There are a number of columns that are already binary. Therefore, no need to one hot encode them
crossed_cols = [("property_type", "room_type")]
already_dummies = [c for c in df.columns if "amenity" in c] + ["has_house_rules"]
wide_cols = [
    "is_location_exact",
    "property_type",
    "room_type",
    "host_gender",
    "instant_bookable",
] + already_dummies

cat_embed_cols = [(c, 16) for c in df.columns if "catg" in c] + [
    ("neighbourhood_cleansed", 64),
    ("cancellation_policy", 16),
]
continuous_cols = ["latitude", "longitude", "security_deposit", "extra_people"]
# it does not make sense to standardise latitude and longitude
already_standard = ["latitude", "longitude"]

# text and image colnames
text_col = "description"
img_col = "id"

# path to pretrained word embeddings and the images
word_vectors_path = "../tmp_data/glove.6B/glove.6B.100d.txt"
img_path = "../tmp_data/airbnb/property_picture"

# target
target_col = "yield"
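
To make `crossed_cols` concrete: a crossed column is a single categorical feature built from the combination of two or more columns, so that the wide component can learn a weight for each pair (for example, a private room in an apartment). The sketch below builds such a feature by hand for a few made-up rows. The `cross` helper and the `-` separator are assumptions for illustration and are not necessarily what `WidePreprocessor` does internally.

```python
# Hypothetical rows; only the two columns being crossed are shown.
rows = [
    {"property_type": "apartment", "room_type": "private_room"},
    {"property_type": "apartment", "room_type": "entire_home/apt"},
    {"property_type": "other", "room_type": "entire_home/apt"},
]


def cross(row, cols, sep="-"):
    """Join the values of `cols` into one categorical value (illustrative helper)."""
    return sep.join(str(row[c]) for c in cols)


crossed = [cross(r, ("property_type", "room_type")) for r in rows]
print(crossed)
```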

### Prepare the data

I will focus here on how to prepare the data and run the model. Check notebooks 1 and 2 to see what's going on behind the scenes.

Preparing the data is rather simple:

In [4]:
target = df[target_col].values

In [5]:
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = wide_preprocessor.fit_transform(df)

In [6]:
tab_preprocessor = TabPreprocessor(
    cat_embed_cols=cat_embed_cols,
    continuous_cols=continuous_cols,
    already_standard=already_standard,
)
X_tab = tab_preprocessor.fit_transform(df)

In [7]:
text_preprocessor = TextPreprocessor(
    word_vectors_path=word_vectors_path, text_col=text_col
)
X_text = text_preprocessor.fit_transform(df)

The vocabulary contains 2192 tokens
Indexing word vectors...
Loaded 400000 word vectors
Preparing embeddings matrix...
2175 words in the vocabulary had ../tmp_data/glove.6B/glove.6B.100d.txt vectors and appear more than 5 times
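
The log above mentions that only words appearing often enough are kept. As a rough illustration of that kind of frequency filter (not the preprocessor's actual code), the sketch below counts tokens over a few made-up, already tokenised descriptions and keeps those seen at least `min_count` times. The `build_vocab` helper and its `min_count` argument are hypothetical names.

```python
from collections import Counter

# Made-up, already tokenised descriptions.
docs = [
    ["bright", "double", "bedroom", "with", "garden"],
    ["large", "bedroom", "with", "garden", "view"],
    ["bedroom", "with", "private", "parking"],
]


def build_vocab(tokenised_docs, min_count):
    """Keep tokens seen at least `min_count` times, most frequent first."""
    counts = Counter(tok for doc in tokenised_docs for tok in doc)
    return [tok for tok, n in counts.most_common() if n >= min_count]


vocab = build_vocab(docs, min_count=2)
print(vocab)
```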


In [8]:
image_processor = ImagePreprocessor(img_col=img_col, img_path=img_path)
X_images = image_processor.fit_transform(df)

Reading Images from ../tmp_data/airbnb/property_picture


  4%|▍         | 40/1001 [00:00<00:02, 398.46it/s]

Resizing


100%|██████████| 1001/1001 [00:02<00:00, 397.14it/s]


Computing normalisation metrics


### Build the model components

In [9]:
# Linear model
wide = Wide(input_dim=np.unique(X_wide).shape[0], pred_dim=1)

# deeptabular: an MLP with 2 dense layers
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    cat_embed_dropout=0.1,
    continuous_cols=continuous_cols,
    mlp_hidden_dims=[128, 64],
    mlp_dropout=0.1,
)

# DeepText: a stack of 2 LSTMs
basic_rnn = BasicRNN(
    vocab_size=len(text_preprocessor.vocab.itos),
    embed_matrix=text_preprocessor.embedding_matrix,
    n_layers=2,
    hidden_dim=64,
    rnn_dropout=0.5,
)

# Pretrained Resnet 18
resnet = Vision(pretrained_model_name="resnet18", n_trainable=4)

Combine them all with the "collector" class `WideDeep`:

In [10]:
model = WideDeep(
    wide=wide,
    deeptabular=tab_mlp,
    deeptext=basic_rnn,
    deepimage=resnet,
    head_hidden_dims=[256, 128],
)

### Build the trainer and fit

In [11]:
trainer = Trainer(model, objective="rmse")
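
For reference, here is a plain-Python sketch of the metric, assuming the `"rmse"` objective is the usual root mean squared error. The `rmse` helper is written here for illustration, and the predictions are made up; only the three target values come from the `yield` column shown earlier.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# First three `yield` values from the table above, with made-up predictions.
y_true = [12.0, 109.5, 149.65]
y_pred = [20.0, 100.0, 160.0]
error = rmse(y_true, y_pred)
print(round(error, 2))
```

Because the error is expressed in the units of the target, the loss values reported during training can be read directly against the scale of `yield`.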

In [12]:
trainer.fit(
    X_wide=X_wide,
    X_tab=X_tab,
    X_text=X_text,
    X_img=X_images,
    target=target,
    n_epochs=1,
    batch_size=32,
    val_split=0.2,
)

epoch 1: 100%|██████████| 25/25 [01:30<00:00,  3.63s/it, loss=129]
valid: 100%|██████████| 7/7 [00:15<00:00,  2.26s/it, loss=119] 
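
The step counts in the two progress bars follow from `val_split` and `batch_size`. The arithmetic below is a sketch under two assumptions: that the sample has 1001 rows (the count shown in the image progress bar) and that the validation set size is rounded up.

```python
import math

n_rows = 1001  # assumption: taken from the image progress bar above
val_split = 0.2
batch_size = 32

n_val = math.ceil(n_rows * val_split)  # assumed rounding of the validation set
n_train = n_rows - n_val

train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
print(train_batches, val_batches)
```

Under these assumptions the result agrees with the 25 training steps and 7 validation steps reported above.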


Both the text and image components can have their own FC-heads (have a look at the documentation).

Now let's go "kaggle crazy". Let's use different optimizers, initializers and schedulers for different components. Moreover, let's use different learning rates for different parameter groups within the `deeptabular` component.

In [13]:
deep_params = []
for childname, child in model.named_children():
    if childname == "deeptabular":
        for n, p in child.named_parameters():
            if "embed_layer" in n:
                deep_params.append({"params": p, "lr": 1e-4})
            else:
                deep_params.append({"params": p, "lr": 1e-3})
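
The loop above creates one parameter group per parameter and picks the learning rate from the parameter's name. The same selection logic can be sketched without a model, using names in place of tensors. The parameter names below are hypothetical; the real ones come from `child.named_parameters()`.

```python
# Hypothetical parameter names, used here in place of real tensors.
param_names = [
    "embed_layer.emb_layer_beds_catg.weight",
    "embed_layer.emb_layer_neighbourhood_cleansed.weight",
    "mlp.dense_layer_0.weight",
    "mlp.dense_layer_0.bias",
]


def lr_for(name, embed_lr=1e-4, default_lr=1e-3):
    """Pick a learning rate from the parameter name, as the loop above does."""
    return embed_lr if "embed_layer" in name else default_lr


# One group per parameter, mirroring the {"params": ..., "lr": ...} dicts.
groups = [{"params": name, "lr": lr_for(name)} for name in param_names]
for g in groups:
    print(g["lr"], g["params"])
```

Since every parameter gets its own group, the optimizer ends up with as many groups as the component has parameters, which is why the `lr_history` output at the end of this notebook lists many `lr_deeptabular_*` entries.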

In [14]:
wide_opt = torch.optim.Adam(model.wide.parameters(), lr=0.03)
deep_opt = torch.optim.Adam(deep_params)
text_opt = torch.optim.AdamW(model.deeptext.parameters())
img_opt = torch.optim.AdamW(model.deepimage.parameters())
head_opt = torch.optim.Adam(model.deephead.parameters())

In [15]:
wide_sch = torch.optim.lr_scheduler.StepLR(wide_opt, step_size=5)
deep_sch = torch.optim.lr_scheduler.MultiStepLR(deep_opt, milestones=[3, 8])
text_sch = torch.optim.lr_scheduler.StepLR(text_opt, step_size=5)
img_sch = torch.optim.lr_scheduler.MultiStepLR(img_opt, milestones=[3, 8])
head_sch = torch.optim.lr_scheduler.StepLR(head_opt, step_size=5)
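
To see what these schedules do over time, the sketch below reproduces their closed forms in plain Python, assuming the default decay factor `gamma=0.1` and one scheduler step per epoch. The helpers `step_lr` and `multi_step_lr` are written for illustration and are not part of PyTorch.

```python
from bisect import bisect_right


def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Closed form of a step schedule: decay by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


def multi_step_lr(base_lr, epoch, milestones, gamma=0.1):
    """Closed form of a multi-step schedule: decay by `gamma` at each milestone."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


for epoch in range(10):
    wide_lr = step_lr(0.03, epoch, step_size=5)
    embed_lr = multi_step_lr(1e-4, epoch, milestones=[3, 8])
    print(epoch, wide_lr, embed_lr)
```

With these settings nothing changes before epoch 3, so a single-epoch run keeps every learning rate at its initial value.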

In [16]:
# remember: one optimizer per model component; for lr_schedulers and initializers this is not necessary
optimizers = {
    "wide": wide_opt,
    "deeptabular": deep_opt,
    "deeptext": text_opt,
    "deepimage": img_opt,
    "deephead": head_opt,
}
schedulers = {
    "wide": wide_sch,
    "deeptabular": deep_sch,
    "deeptext": text_sch,
    "deepimage": img_sch,
    "deephead": head_sch,
}

# We used pretrained word embeddings, so we do not want to re-initialise
# them. However, we might still want to initialise the other layers in the
# deeptext component. This can be done with the `pattern` parameter and a
# regular expression. Here we tell the KaimingNormal initializer NOT to touch
# the parameters whose name contains the string word_embed.
initializers = {
    "wide": KaimingNormal,
    "deeptabular": KaimingNormal,
    "deeptext": KaimingNormal(pattern=r"^(?!.*word_embed).*$"),
    "deepimage": KaimingNormal,
}

mean = [0.406, 0.456, 0.485]  # BGR
std = [0.225, 0.224, 0.229]  # BGR
transforms = [ToTensor, Normalize(mean=mean, std=std)]
callbacks = [
    LRHistory(n_epochs=10),
    EarlyStopping,
    ModelCheckpoint(filepath="model_weights/wd_out"),
]
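
The regular expression passed to `KaimingNormal` in the cell above relies on a negative lookahead: `(?!.*word_embed)` fails at the start of any string that contains `word_embed`, so only the remaining names match. The check below uses hypothetical parameter names and assumes the initializer is applied to the names that match the pattern.

```python
import re

pattern = re.compile(r"^(?!.*word_embed).*$")

# Hypothetical parameter names for a text component.
names = [
    "word_embed.weight",
    "rnn.weight_ih_l0",
    "rnn.bias_hh_l1",
]

for name in names:
    print(name, bool(pattern.match(name)))

# Names that match the pattern, i.e. those that do not contain "word_embed".
selected = [n for n in names if pattern.match(n)]
```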

In [17]:
trainer = Trainer(
    model,
    objective="rmse",
    initializers=initializers,
    optimizers=optimizers,
    lr_schedulers=schedulers,
    callbacks=callbacks,
    transforms=transforms,
)
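
The `Normalize` transform passed above standardises each colour channel with the given mean and standard deviation. A minimal sketch of that per-channel arithmetic, assuming pixel values have already been scaled to the range [0, 1], is shown below. The `normalise_pixel` helper and the example pixels are made up for illustration.

```python
mean = [0.406, 0.456, 0.485]  # per-channel means from the cell above
std = [0.225, 0.224, 0.229]  # per-channel standard deviations


def normalise_pixel(pixel, mean, std):
    """Apply (value - mean) / std to each channel of a pixel scaled to [0, 1]."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


# A made-up pixel whose channels equal the means maps to zeros.
print(normalise_pixel([0.406, 0.456, 0.485], mean, std))
# A made-up mid-grey pixel.
print(normalise_pixel([0.5, 0.5, 0.5], mean, std))
```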



In [18]:
trainer.fit(
    X_wide=X_wide,
    X_tab=X_tab,
    X_text=X_text,
    X_img=X_images,
    target=target,
    n_epochs=1,
    batch_size=32,
    val_split=0.2,
)

epoch 1: 100%|██████████| 25/25 [01:31<00:00,  3.66s/it, loss=102]
valid: 100%|██████████| 7/7 [00:15<00:00,  2.28s/it, loss=92]  


Model weights after training corresponds to the those of the final epoch which might not be the best performing weights. Usethe 'ModelCheckpoint' Callback to restore the best epoch weights.


We have only run one epoch, but let's check that the `LRHistory` callback records the lr values for each group:

In [19]:
trainer.lr_history

{'lr_wide_0': [0.03, 0.03],
 'lr_deeptabular_0': [0.0001, 0.0001],
 'lr_deeptabular_1': [0.0001, 0.0001],
 'lr_deeptabular_2': [0.0001, 0.0001],
 'lr_deeptabular_3': [0.0001, 0.0001],
 'lr_deeptabular_4': [0.0001, 0.0001],
 'lr_deeptabular_5': [0.0001, 0.0001],
 'lr_deeptabular_6': [0.0001, 0.0001],
 'lr_deeptabular_7': [0.0001, 0.0001],
 'lr_deeptabular_8': [0.0001, 0.0001],
 'lr_deeptabular_9': [0.001, 0.001],
 'lr_deeptabular_10': [0.001, 0.001],
 'lr_deeptabular_11': [0.001, 0.001],
 'lr_deeptabular_12': [0.001, 0.001],
 'lr_deeptabular_13': [0.001, 0.001],
 'lr_deeptabular_14': [0.001, 0.001],
 'lr_deeptext_0': [0.001, 0.001],
 'lr_deepimage_0': [0.001, 0.001],
 'lr_deephead_0': [0.001, 0.001]}