## 1. Regression with Images and Text

In this notebook we will go through a series of examples on how to combine all Wide & Deep components: the Wide component, the stack of dense layers for the categorical embeddings and numerical columns (deepdense), the text data (deeptext) and the images (deepimage). 

To that aim I will use the Airbnb listings dataset for London, which you can download from [here](http://insideairbnb.com/get-the-data.html). I have taken a sample of 1000 listings to keep the data tractable in this notebook. I have also preprocessed the data and prepared it for this exercise. All preprocessing steps can be found in the notebook `airbnb_data_preprocessing.ipynb` in this `examples` folder. Note that you do not need to go through that notebook to understand how to use this library. 

The first 5 rows of the dataset are shown below

In [1]:
import numpy as np
import pandas as pd
import os
import torch

from pytorch_widedeep.preprocessing import WidePreprocessor, DeepPreprocessor, TextPreprocessor, ImagePreprocessor
from pytorch_widedeep.models import Wide, DeepDense, DeepText, DeepImage, WideDeep
from pytorch_widedeep.initializers import *
from pytorch_widedeep.callbacks import *
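from pytorch_widedeep.optim import RAdam
# ToTensor and Normalize are used to build the image transforms in section 1.2
from torchvision.transforms import ToTensor, Normalize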

In [2]:
df = pd.read_csv('data/airbnb/airbnb_sample.csv')
df.head()

Unnamed: 0,id,host_id,description,host_listings_count,host_identity_verified,neighbourhood_cleansed,latitude,longitude,is_location_exact,property_type,room_type,accommodates,bathrooms,bedrooms,beds,guests_included,minimum_nights,instant_bookable,cancellation_policy,has_house_rules,host_gender,accommodates_catg,guests_included_catg,minimum_nights_catg,host_listings_count_catg,bathrooms_catg,bedrooms_catg,beds_catg,amenity_24-hour_check-in,amenity__toilet,amenity_accessible-height_bed,amenity_accessible-height_toilet,amenity_air_conditioning,amenity_air_purifier,amenity_alfresco_bathtub,amenity_amazon_echo,amenity_baby_bath,amenity_baby_monitor,amenity_babysitter_recommendations,amenity_balcony,amenity_bath_towel,amenity_bathroom_essentials,amenity_bathtub,amenity_bathtub_with_bath_chair,amenity_bbq_grill,amenity_beach_essentials,amenity_beach_view,amenity_beachfront,amenity_bed_linens,amenity_bedroom_comforts,...,amenity_roll-in_shower,amenity_room-darkening_shades,amenity_safety_card,amenity_sauna,amenity_self_check-in,amenity_shampoo,amenity_shared_gym,amenity_shared_hot_tub,amenity_shared_pool,amenity_shower_chair,amenity_single_level_home,amenity_ski-in_ski-out,amenity_smart_lock,amenity_smart_tv,amenity_smoke_detector,amenity_smoking_allowed,amenity_soaking_tub,amenity_sound_system,amenity_stair_gates,amenity_stand_alone_steam_shower,amenity_standing_valet,amenity_steam_oven,amenity_stove,amenity_suitable_for_events,amenity_sun_loungers,amenity_table_corner_guards,amenity_tennis_court,amenity_terrace,amenity_toilet_paper,amenity_touchless_faucets,amenity_tv,amenity_walk-in_shower,amenity_warming_drawer,amenity_washer,amenity_washer_dryer,amenity_waterfront,amenity_well-lit_path_to_entrance,amenity_wheelchair_accessible,amenity_wide_clearance_to_shower,amenity_wide_doorway_to_guest_bathroom,amenity_wide_entrance,amenity_wide_entrance_for_guests,amenity_wide_entryway,amenity_wide_hallways,amenity_wifi,amenity_window_guards,amenity_wine_cooler,security_deposit,extra_people,yield
0,13913.jpg,54730,My bright double bedroom with a large window has a relaxed feeling! It comfortably fits one or t...,4.0,f,Islington,51.56802,-0.11121,t,apartment,private_room,2,1.0,1.0,0.0,1,1,f,moderate,1,female,2,1,1,3,1,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0,...,1,1,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,1,0,0,0,1,0,0,100.0,15.0,12.0
1,15400.jpg,60302,"Lots of windows and light. St Luke's Gardens are at the end of the block, and the river not too...",1.0,t,Kensington and Chelsea,51.48796,-0.16898,t,apartment,entire_home/apt,2,1.0,1.0,1.0,2,3,f,strict_14_with_grace_period,1,female,2,2,3,1,1,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,150.0,0.0,109.5
2,17402.jpg,67564,"Open from June 2018 after a 3-year break, we are delighted to be welcoming guests again to this ...",19.0,t,Westminster,51.52098,-0.14002,t,apartment,entire_home/apt,6,2.0,3.0,3.0,4,3,t,strict_14_with_grace_period,1,female,3,3,3,3,2,3,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,350.0,10.0,149.65
3,24328.jpg,41759,"Artist house, bright high ceiling rooms, private parking and a communal garden in a conservation...",2.0,t,Wandsworth,51.47298,-0.16376,t,other,entire_home/apt,2,1.5,1.0,1.0,2,30,f,moderate,1,male,2,2,3,2,2,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,250.0,0.0,215.6
4,25023.jpg,102813,"Large, all comforts, 2-bed flat; first floor; lift; pretty communal gardens + off-street parking...",1.0,f,Wandsworth,51.44687,-0.21874,t,apartment,entire_home/apt,4,1.0,2.0,2.0,2,4,f,moderate,1,female,3,2,3,1,1,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,250.0,11.0,79.35


### 1.1 Regression with the defaults

In [11]:
# property_type x room_type crossed column for the wide component
crossed_cols = (['property_type', 'room_type'],)
# A number of columns are already binary, so there is no need to one-hot encode them
already_dummies = [c for c in df.columns if 'amenity' in c] + ['has_house_rules']
wide_cols = ['is_location_exact', 'property_type', 'room_type', 'host_gender',
             'instant_bookable'] + already_dummies
cat_embed_cols = [(c, 16) for c in df.columns if 'catg' in c] + \
    [('neighbourhood_cleansed', 64), ('cancellation_policy', 16)]
continuous_cols = ['latitude', 'longitude', 'security_deposit', 'extra_people']
# It does not make sense to standardise latitude and longitude; they are already "standard"
already_standard = ['latitude', 'longitude']
# text and image column names
text_col = 'description'
img_col = 'id'
# paths to the pretrained word embeddings and the images
word_vectors_path = 'data/glove.6B/glove.6B.100d.txt'
img_path = 'data/airbnb/property_picture'
# target
target_col = 'yield'
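The comment above says latitude and longitude should not be standardised. To make "standardise the other continuous columns" concrete, here is a minimal pure-Python sketch that z-scores every continuous column except those in `already_standard`. The helper `standardise_columns` is hypothetical and is not `pytorch_widedeep`'s implementation. Using the population standard deviation is an assumption, in the spirit of a standard scaler.

```python
from statistics import mean, pstdev

def standardise_columns(rows, continuous_cols, already_standard):
    """Return copies of `rows` (a list of dicts) in which every continuous
    column is z-scored, except those listed in `already_standard`."""
    to_scale = [c for c in continuous_cols if c not in already_standard]
    stats = {}
    for c in to_scale:
        values = [r[c] for r in rows]
        sd = pstdev(values)  # population std, as a standard scaler would use
        stats[c] = (mean(values), sd if sd > 0 else 1.0)  # guard constant columns
    scaled = []
    for r in rows:
        new = dict(r)
        for c, (mu, sd) in stats.items():
            new[c] = (r[c] - mu) / sd
        scaled.append(new)
    return scaled

# Three listings taken from df.head() above
rows = [
    {"latitude": 51.56802, "longitude": -0.11121, "security_deposit": 100.0, "extra_people": 15.0},
    {"latitude": 51.48796, "longitude": -0.16898, "security_deposit": 150.0, "extra_people": 0.0},
    {"latitude": 51.52098, "longitude": -0.14002, "security_deposit": 350.0, "extra_people": 10.0},
]
scaled = standardise_columns(
    rows,
    continuous_cols=["latitude", "longitude", "security_deposit", "extra_people"],
    already_standard=["latitude", "longitude"],
)
for r in scaled:
    print(r)
```

Latitude and longitude pass through unchanged. `security_deposit` and `extra_people` end up with zero mean and unit standard deviation over these rows.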

### 1.1.1 Prepare the data

I will focus here on how to prepare the data and run the model. In a separate notebook I will show what happens behind the scenes. 

Preparing the data is rather simple:

In [18]:
target = df[target_col].values

In [19]:
wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
X_wide = wide_preprocessor.fit_transform(df)
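The wide component is a linear model over one-hot encoded features, and `crossed_cols` adds interaction features on top of them. Below is a hedged, pure-Python sketch of the idea. It is not how `WidePreprocessor` is implemented, and the joined column name `property_type_room_type` is a hypothetical choice. A crossed column joins the values of a group of columns. Every distinct `(column, value)` pair then receives its own index in the wide input.

```python
def add_crossed_columns(rows, crossed_cols):
    """Hypothetical helper: for each group in `crossed_cols`, add a column whose
    value joins the group's values, e.g. 'apartment-private_room'."""
    out = []
    for r in rows:
        new = dict(r)
        for group in crossed_cols:
            new["_".join(group)] = "-".join(str(r[c]) for c in group)
        out.append(new)
    return out

def one_hot_index(rows, cols):
    """Give every (column, value) pair seen in `rows` its own integer index,
    i.e. one position in the sparse input of a linear (wide) model."""
    index = {}
    for r in rows:
        for c in cols:
            index.setdefault(f"{c}_{r[c]}", len(index))
    return index

rows = [
    {"property_type": "apartment", "room_type": "private_room"},
    {"property_type": "apartment", "room_type": "entire_home/apt"},
    {"property_type": "other", "room_type": "entire_home/apt"},
]
crossed = add_crossed_columns(rows, (["property_type", "room_type"],))
cols = ["property_type", "room_type", "property_type_room_type"]
index = one_hot_index(crossed, cols)
print(index)
# Active feature indices for the first listing
print([index[f"{c}_{crossed[0][c]}"] for c in cols])
```

The crossed feature lets a linear model learn a separate weight for, say, a private room in an apartment, which it could not express as a sum of the two individual effects.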

In [20]:
deep_preprocessor = DeepPreprocessor(embed_cols=cat_embed_cols, continuous_cols=continuous_cols,
                                     already_standard=already_standard)
X_deep = deep_preprocessor.fit_transform(df)
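For the deep side, each categorical column listed in `cat_embed_cols` has to become integer indices into an embedding table. Each table needs to know how many categories the column has. A minimal sketch of that step is shown below, with a hypothetical `label_encode` helper. The library's own encoding may differ in details, such as category order or whether an index is reserved for unseen values.

```python
def label_encode(rows, embed_cols):
    """Hypothetical sketch: map each category of every embedding column to an
    integer and return the encoded rows plus (column, n_categories, embed_dim)."""
    encoders = {}
    for col, _dim in embed_cols:
        categories = sorted({r[col] for r in rows})
        encoders[col] = {cat: i for i, cat in enumerate(categories)}
    encoded = [[encoders[col][r[col]] for col, _ in embed_cols] for r in rows]
    embed_input = [(col, len(encoders[col]), dim) for col, dim in embed_cols]
    return encoded, embed_input

# Values from the five rows shown by df.head()
rows = [
    {"neighbourhood_cleansed": "Islington", "cancellation_policy": "moderate"},
    {"neighbourhood_cleansed": "Kensington and Chelsea", "cancellation_policy": "strict_14_with_grace_period"},
    {"neighbourhood_cleansed": "Westminster", "cancellation_policy": "strict_14_with_grace_period"},
    {"neighbourhood_cleansed": "Wandsworth", "cancellation_policy": "moderate"},
    {"neighbourhood_cleansed": "Wandsworth", "cancellation_policy": "moderate"},
]
encoded, embed_input = label_encode(
    rows, [("neighbourhood_cleansed", 64), ("cancellation_policy", 16)]
)
print(encoded)      # one integer per embedding column and row
print(embed_input)  # sizes needed to build each embedding table
```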

In [21]:
text_preprocessor = TextPreprocessor(word_vectors_path=word_vectors_path)
X_text = text_preprocessor.fit_transform(df, text_col)

The vocabulary contains 6400 words
Indexing word vectors...
Loaded 400000 word vectors
Preparing embeddings matrix...
2175 words in the vocabulary had data/glove.6B/glove.6B.100d.txt vectors and appear more than 5 times
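The log above describes three steps. First, a vocabulary is built from the descriptions, keeping words that appear more than a minimum number of times. Next, each text is turned into a sequence of indices. Finally, an embedding matrix is filled with GloVe vectors where they exist. The sketch below illustrates those steps in plain Python under stated assumptions. The reserved `<pad>`/`<unk>` indices, left padding and random initialisation of missing words are choices of this sketch, not necessarily what `TextPreprocessor` does. A toy dictionary stands in for the GloVe file.

```python
import random
from collections import Counter

def build_vocab(texts, min_count):
    """Keep tokens seen more than `min_count` times. Index 0 is padding and
    index 1 is 'unknown' (both reservations are assumptions of this sketch)."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    itos = ["<pad>", "<unk>"] + sorted(w for w, n in counts.items() if n > min_count)
    stoi = {w: i for i, w in enumerate(itos)}
    return stoi, itos

def encode(text, stoi, maxlen):
    """Map tokens to indices (unknown -> 1), truncate, then left-pad with 0."""
    ids = [stoi.get(tok, stoi["<unk>"]) for tok in text.lower().split()][:maxlen]
    return [stoi["<pad>"]] * (maxlen - len(ids)) + ids

def build_embedding_matrix(itos, pretrained, dim, seed=0):
    """Copy pretrained vectors where they exist, zeros for padding and small
    random vectors otherwise; also return how many words were found."""
    rng = random.Random(seed)
    matrix, hits = [], 0
    for w in itos:
        if w in pretrained:
            matrix.append(list(pretrained[w]))
            hits += 1
        elif w == "<pad>":
            matrix.append([0.0] * dim)
        else:
            matrix.append([rng.gauss(0.0, 0.1) for _ in range(dim)])
    return matrix, hits

texts = ["bright double bedroom", "bright flat near the river", "bright and quiet flat"]
stoi, itos = build_vocab(texts, min_count=1)
print(itos)
print(encode("bright sunny flat", stoi, maxlen=5))
# Toy 2-d "pretrained" vectors standing in for the GloVe file
matrix, hits = build_embedding_matrix(itos, {"bright": [0.1, 0.2], "river": [0.3, 0.4]}, dim=2)
print(hits)
```

`hits` plays the role of the "2175 words ... had ... vectors" count in the log. Here only `bright` is both in the vocabulary and in the toy vectors.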


In [16]:
image_processor = ImagePreprocessor()
X_images = image_processor.fit_transform(df, img_col, img_path)

Reading Images from data/airbnb/property_picture


  4%|▍         | 43/1001 [00:00<00:02, 423.26it/s]

Resizing


100%|██████████| 1001/1001 [00:02<00:00, 421.47it/s]


Computing normalisation metrics
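The last step in the log, "Computing normalisation metrics", refers to per-channel statistics of the images. A minimal sketch of that computation is shown below, assuming the statistics are pooled over all pixels of all images after scaling to [0, 1]. Whether `ImagePreprocessor` pools them this way or averages per image is an assumption. Tiny nested lists stand in for real images.

```python
from statistics import mean, pstdev

def channel_stats(images):
    """images: list of H x W x C nested lists with 0-255 values.
    Returns per-channel mean and std, pooled over all pixels of all images,
    after scaling values to [0, 1]."""
    n_channels = len(images[0][0][0])
    per_channel = [[] for _ in range(n_channels)]
    for img in images:
        for row in img:
            for pixel in row:
                for c, value in enumerate(pixel):
                    per_channel[c].append(value / 255.0)
    return [mean(v) for v in per_channel], [pstdev(v) for v in per_channel]

# Two tiny 1 x 2 images with 3 channels each
img_a = [[[0, 0, 255], [255, 0, 255]]]
img_b = [[[0, 255, 255], [255, 255, 255]]]
means, stds = channel_stats([img_a, img_b])
print(means, stds)
```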


### 1.1.2 Build model components

In [22]:
# Linear model
wide = Wide(wide_dim=X_wide.shape[1], output_dim=1)
# DeepDense: 2 Dense layers
deepdense = DeepDense(hidden_layers=[128,64], dropout=[0.5, 0.5], 
                      deep_column_idx=deep_preprocessor.deep_column_idx,
                      embed_input=deep_preprocessor.embeddings_input,
                      continuous_cols=continuous_cols)
# DeepText: 2 LSTMs
deeptext = DeepText(vocab_size=len(text_preprocessor.vocab.itos), hidden_dim=64, 
                    n_layers=2, rnn_dropout=0.5, 
                    embedding_matrix=text_preprocessor.embedding_matrix)
# Pretrained Resnet 18 plus a FC-Head 512->256->128
deepimage = DeepImage(pretrained=True, head_layers=[512, 256, 128])
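It helps to check the layer sizes these settings imply. The first dense layer of `DeepDense` receives the concatenated embeddings plus the continuous columns. With the `cat_embed_cols` above, that is eight columns of size 16, one of size 64, and four continuous columns. The sketch below does that arithmetic and lists the `(in_features, out_features)` pairs of the resulting Linear layers. For `DeepImage`, it reads the first entry of `head_layers` as the 512 ResNet-18 features, as the comment in the cell suggests (treat this as an assumption about the API).

```python
# Embedding sizes as they appear in the model printout further down;
# the four continuous columns enter the dense layers as they are.
embed_dims = {
    "accommodates_catg": 16, "bathrooms_catg": 16, "bedrooms_catg": 16,
    "beds_catg": 16, "cancellation_policy": 16, "guests_included_catg": 16,
    "host_listings_count_catg": 16, "minimum_nights_catg": 16,
    "neighbourhood_cleansed": 64,
}
continuous_cols = ["latitude", "longitude", "security_deposit", "extra_people"]

def dense_input_dim(embed_dims, continuous_cols):
    """Concatenated embeddings plus one input per continuous column."""
    return sum(embed_dims.values()) + len(continuous_cols)

def linear_shapes(input_dim, hidden_layers):
    """(in_features, out_features) for a chain of Linear layers."""
    dims = [input_dim] + list(hidden_layers)
    return list(zip(dims[:-1], dims[1:]))

d_in = dense_input_dim(embed_dims, continuous_cols)
print(d_in)                            # 8 * 16 + 64 + 4 = 196
print(linear_shapes(d_in, [128, 64]))  # DeepDense hidden_layers=[128, 64]
print(linear_shapes(512, [256, 128]))  # DeepImage head: 512 -> 256 -> 128
```

The 196 agrees with the `in_features=196` of the first dense layer in the model printout.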

In [24]:
model = WideDeep(wide=wide, deepdense=deepdense, deeptext=deeptext, deepimage=deepimage)

### 1.1.3 Compile and Fit

In [26]:
model.compile(method='regression')

In [27]:
model.fit(X_wide=X_wide, X_deep=X_deep, X_text=X_text, X_img=X_images,
    target=target, n_epochs=1, batch_size=32, val_split=0.2)

epoch 1:  96%|█████████▌| 25/26 [02:02<00:04,  4.91s/it, loss=118]
valid: 100%|██████████| 7/7 [00:15<00:00,  2.26s/it, loss=109] 
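The progress bars count batches, not rows. `val_split=0.2` holds out a fraction of the rows for validation, and `batch_size=32` sets how many rows each step sees. The sketch below does that arithmetic under two stated assumptions. The first is 1001 rows, as suggested by the image progress bar (the text mentions a sample of 1000 listings). The second is a validation set of `n_rows * val_split` rows rounded down. Under these assumptions the counts come out as 26 training and 7 validation batches. The library's exact split rule is not shown in this notebook.

```python
import math

def split_sizes(n_rows, val_split):
    # Assumption: the validation set gets n_rows * val_split rows, rounded down
    n_val = int(n_rows * val_split)
    return n_rows - n_val, n_val

def n_batches(n_rows, batch_size):
    # A loader that keeps the final, smaller batch yields ceil(n / batch_size) batches
    return math.ceil(n_rows / batch_size)

n_train, n_val = split_sizes(1001, 0.2)
print(n_train, n_val, n_batches(n_train, 32), n_batches(n_val, 32))
```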


### 1.2 Regression with varying parameters and a FC-head receiving the whole deep side (deepdense, deeptext and deepimage)

This is the second architecture shown in the README file.

In [31]:
wide = Wide(wide_dim=X_wide.shape[1], output_dim=1)
deepdense = DeepDense(hidden_layers=[128,64], dropout=[0.5, 0.5], 
                      deep_column_idx=deep_preprocessor.deep_column_idx,
                      embed_input=deep_preprocessor.embeddings_input,
                      continuous_cols=continuous_cols)
deeptext = DeepText(vocab_size=len(text_preprocessor.vocab.itos), hidden_dim=128, 
                    n_layers=2, rnn_dropout=0.5, 
                    embedding_matrix=text_preprocessor.embedding_matrix)
deepimage = DeepImage(pretrained=True, head_layers=[512, 256, 128])

In [32]:
model = WideDeep(wide=wide, deepdense=deepdense, deeptext=deeptext, deepimage=deepimage, head_layers=[128, 64])

Let's have a look at the model:

In [33]:
model

WideDeep(
  (wide): Wide(
    (wide_linear): Linear(in_features=356, out_features=1, bias=True)
  )
  (deepdense): DeepDense(
    (embed_layers): ModuleDict(
      (emb_layer_accommodates_catg): Embedding(3, 16)
      (emb_layer_bathrooms_catg): Embedding(3, 16)
      (emb_layer_bedrooms_catg): Embedding(4, 16)
      (emb_layer_beds_catg): Embedding(4, 16)
      (emb_layer_cancellation_policy): Embedding(5, 16)
      (emb_layer_guests_included_catg): Embedding(3, 16)
      (emb_layer_host_listings_count_catg): Embedding(4, 16)
      (emb_layer_minimum_nights_catg): Embedding(3, 16)
      (emb_layer_neighbourhood_cleansed): Embedding(32, 64)
    )
    (dense): Sequential(
      (dense_layer_0): Sequential(
        (0): Linear(in_features=196, out_features=128, bias=True)
        (1): LeakyReLU(negative_slope=0.01, inplace=True)
        (2): Dropout(p=0.5, inplace=False)
      )
      (dense_layer_1): Sequential(
        (0): Linear(in_features=128, out_features=64, bias=True)
        (1

As we can see, the model is Wide + FC-Head(DeepDense + LSTMs + ResNet18).

Both the text and image components allow FC-heads of their own (referred to, very creatively, as `texthead` and `imagehead`). Following this nomenclature, the FC-head that receives the concatenation of the whole deep side is called `deephead`. 
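The printout is cut off before the `deephead`, so its input size has to be inferred. The sketch below assumes three things: the deep components' outputs are concatenated; each component's output size is its last layer (64 for `deepdense`, `hidden_dim=128` for the unidirectional LSTM in `deeptext`, and 128 for the `deepimage` head); and `head_layers=[128, 64]` is chained after that concatenation. These sizes are an inference, not something read from the library. The final projection of the head to the regression output and its combination with the wide output are not covered here.

```python
def linear_shapes(input_dim, hidden_layers):
    """(in_features, out_features) for a chain of Linear layers."""
    dims = [input_dim] + list(hidden_layers)
    return list(zip(dims[:-1], dims[1:]))

# Assumed output sizes of each deep component in section 1.2
deep_output_dims = {
    "deepdense": 64,   # last entry of hidden_layers=[128, 64]
    "deeptext": 128,   # hidden_dim=128 of a unidirectional LSTM
    "deepimage": 128,  # last entry of head_layers=[512, 256, 128]
}
deephead_input = sum(deep_output_dims.values())
print(deephead_input)
print(linear_shapes(deephead_input, [128, 64]))
```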

Now let's go "Kaggle crazy" and use different optimizers, initializers and schedulers for different components. 

In [36]:
wide_opt = torch.optim.Adam(model.wide.parameters())
deep_opt = torch.optim.Adam(model.deepdense.parameters())
text_opt = RAdam(model.deeptext.parameters())
img_opt  = RAdam(model.deepimage.parameters())
head_opt = torch.optim.Adam(model.deephead.parameters())

In [37]:
wide_sch = torch.optim.lr_scheduler.StepLR(wide_opt, step_size=5)
deep_sch = torch.optim.lr_scheduler.MultiStepLR(deep_opt, milestones=[3,8])
text_sch = torch.optim.lr_scheduler.StepLR(text_opt, step_size=5)
img_sch  = torch.optim.lr_scheduler.MultiStepLR(img_opt, milestones=[3,8])
head_sch = torch.optim.lr_scheduler.StepLR(head_opt, step_size=5)
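Both PyTorch schedulers used above have a simple closed form. `StepLR` multiplies the learning rate by `gamma` (0.1 by default) every `step_size` epochs. `MultiStepLR` multiplies it by `gamma` each time a milestone is reached. The sketch below reproduces those formulas to show the schedule each component would follow over ten epochs. It assumes each scheduler is stepped once per epoch, and uses a base rate of 1e-3 (torch's Adam default) as an illustrative value for every optimizer.

```python
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Closed form of StepLR: decay by gamma every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

def multistep_lr(base_lr, epoch, milestones, gamma=0.1):
    """Closed form of MultiStepLR: decay by gamma at each milestone reached."""
    return base_lr * gamma ** sum(1 for m in milestones if m <= epoch)

base_lr = 1e-3  # illustrative value for all optimizers
for epoch in range(10):
    print(epoch,
          step_lr(base_lr, epoch, step_size=5),             # wide, deeptext, deephead
          multistep_lr(base_lr, epoch, milestones=[3, 8]))  # deepdense, deepimage
```

Each scheduler must wrap the optimizer of its own component. Otherwise that component's learning rate is never decayed, and another component's rate is decayed twice.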

In [40]:
# Remember: one optimizer per model component; for lr_schedulers and initializers this is not necessary
optimizers = {'wide': wide_opt, 'deepdense':deep_opt, 'deeptext':text_opt, 'deepimage': img_opt, 'deephead': head_opt}
schedulers = {'wide': wide_sch, 'deepdense':deep_sch, 'deeptext':text_sch, 'deepimage': img_sch, 'deephead': head_sch}

# Now... we have used pretrained word embeddings, so you do not want to
# initialise these embeddings. However, you might still want to initialise the
# other layers in the DeepText component. No problem: you can do that with the
# `pattern` parameter and your knowledge of regular expressions. Here we are
# telling the KaimingNormal initializer NOT to touch the parameters whose
# name contains the string word_embed.
initializers = {'wide': KaimingNormal, 'deepdense':KaimingNormal, 
                'deeptext':KaimingNormal(pattern=r"^(?!.*word_embed).*$"), 
                'deepimage':KaimingNormal}

mean = [0.406, 0.456, 0.485]  #BGR
std =  [0.225, 0.224, 0.229]  #BGR
transforms = [ToTensor, Normalize(mean=mean, std=std)]
callbacks = [LRHistory(n_epochs=10), EarlyStopping, ModelCheckpoint(filepath='model_weights/wd_out')]
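The pattern `^(?!.*word_embed).*$` uses a negative lookahead. At the start of the name it checks that `word_embed` does not occur anywhere, and only then matches the whole name. The sketch below applies it to a few parameter names to show which ones would be initialised. Only `word_embed` comes from the notebook. The `rnn.` names are hypothetical and follow the `weight_ih_l{k}`/`weight_hh_l{k}` naming of `torch.nn.LSTM`. Whether the library applies the pattern with `re.match` or `re.search` is not shown here; because of the `^` anchor, both give the same result.

```python
import re

# Same pattern as in the cell above: match any name that does NOT contain "word_embed"
pattern = re.compile(r"^(?!.*word_embed).*$")

# Hypothetical parameter names (only "word_embed" comes from the notebook)
names = [
    "word_embed.weight",
    "rnn.weight_ih_l0",
    "rnn.weight_hh_l0",
    "rnn.bias_ih_l1",
]
to_initialise = [n for n in names if pattern.match(n)]
print(to_initialise)  # every name except "word_embed.weight"
```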

In [41]:
model.compile(method='regression', initializers=initializers, optimizers=optimizers,
    lr_schedulers=schedulers, callbacks=callbacks, transforms=transforms)
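The `mean` and `std` lists above are the standard ImageNet statistics, RGB mean `[0.485, 0.456, 0.406]` and std `[0.229, 0.224, 0.225]`, written in reverse. Presumably this is because the images are loaded in BGR channel order, as OpenCV does. torchvision's `Normalize` computes `(x - mean) / std` per channel, after `ToTensor` has scaled 8-bit pixels to [0, 1]. The plain-Python sketch below shows the channel reversal and the normalisation of a single pixel. The helper `normalise_pixel` is for illustration only.

```python
# Standard ImageNet statistics in RGB order
IMAGENET_MEAN_RGB = [0.485, 0.456, 0.406]
IMAGENET_STD_RGB = [0.229, 0.224, 0.225]

# Reversing the channel order gives the BGR lists used in the notebook
mean_bgr = IMAGENET_MEAN_RGB[::-1]
std_bgr = IMAGENET_STD_RGB[::-1]

def normalise_pixel(pixel, mean, std):
    """Per-channel (x - mean) / std, the operation Normalize applies."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]

print(mean_bgr, std_bgr)
# A BGR pixel already scaled to [0, 1]
print(normalise_pixel([0.631, 0.456, 0.256], mean_bgr, std_bgr))
```

Using RGB statistics on BGR images would swap the blue and red normalisation. This matters with `pretrained=True`, because the ResNet weights were trained on inputs normalised with these statistics.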

In [42]:
model.fit(X_wide=X_wide, X_deep=X_deep, X_text=X_text, X_img=X_images,
    target=target, n_epochs=1, batch_size=32, val_split=0.2)

epoch 1:   8%|▊         | 2/26 [00:13<02:41,  6.74s/it, loss=137]


KeyboardInterrupt: 