## Chapter 10 

### 1. Non-negative matrix factorization (NMF)

If you have worked or are working with recommendation algorithms, I'd say you are very familiar with matrix factorization. Just in case let's go quickly through some formulation before jumping into the code. 

Given a ratings (or scores) matrix $R$ with dimensions $M \times N$, we aim to find two matrices $C$ and $U$ with dimensions $M \times K$ and $N \times K$ respectively such that

$R \approx C \times U^T = \hat{R}$

$K$ is the number of latent factors (or latent dimensions), which we choose at our convenience. Then the rating of item $i$ by user $j$ can be computed as the dot product 

$ \hat{r}_{ij} = c_i u_j^T = \sum_{k=1}^{K}{c_{ik}u_{jk}}$


In our case, $R$, $C$ and $U$ are our interest, coupon and user matrices respectively. Since we have no measure of negative interest, all matrices will be non-negative, hence the name non-negative matrix factorization. You can find a nice tutorial in Python [here](http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/).
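To make the formulation a bit more tangible, here is a minimal sketch of NMF on a toy matrix. It uses the classic multiplicative update rules for the squared error, which is not necessarily what `sklearn` runs under the hood, and the function name `nmf_multiplicative` and the toy data are my own assumptions for illustration.

```python
import numpy as np

def nmf_multiplicative(R, K, n_iter=500, eps=1e-9, seed=0):
    """Factorize a non-negative R (M x N) into C (M x K) and U (N x K)."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    C = rng.random((M, K))
    U = rng.random((N, K))
    for _ in range(n_iter):
        # Multiplicative updates for ||R - C U^T||^2; factors stay non-negative
        C *= (R @ U) / (C @ (U.T @ U) + eps)
        U *= (R.T @ C) / (U @ (C.T @ C) + eps)
    return C, U

# Toy, exactly rank-2, non-negative "interest" matrix (6 items x 5 users)
rng = np.random.default_rng(42)
R = rng.random((6, 2)) @ rng.random((5, 2)).T

C, U = nmf_multiplicative(R, K=2)
R_hat = C @ U.T

# A single predicted score is just the dot product of two latent vectors
i, j = 3, 1
r_hat_ij = C[i] @ U[j]
rel_err = np.linalg.norm(R - R_hat) / np.linalg.norm(R)
print(r_hat_ij, R_hat[i, j], rel_err)
```

Note that every entry of `C` and `U` remains non-negative, because the updates only ever multiply non-negative numbers.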

Once we have computed $\hat{R}$ we will be in a position where we can recommend existing coupons to customers based on past interactions. **However**, let's emphasize once more that this is **NOT** the problem we are solving here. Here we have a batch of new, unseen coupons and we need to recommend them to existing customers. In this context, there are two ways forward. 

**Approach 1**
1. Compute $\hat{R}$ and $C$ and $U$
2. Compute similarity between new and old (new --> old) coupons based on features (price, category, etc)
3. Recommend and rank based on $\hat{R}$
4. Map old to new coupons based on previously computed similarity (old --> new)
5. Compute MAP

This approach is very similar to the one described in Chapter 9. As we know, the forward and backwards mapping is computationally expensive.

and 

**Approach 2**
1. Compute $\hat{R}$ and $C$ and $U$
2. Compute similarity between new and old (new --> old) coupons based on features (price, category, etc), and assign the latent factors of the old coupons to the most similar new coupons. 
3. Build a dataset horizontally stacking user and item latent factors.
4. Use a regressor to predict interest and rank. 
5. Compute MAP

This second approach involves only one mapping step and uses both user and coupon latent factors. Therefore, we will use this approach here.
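Steps 2 and 3 of Approach 2 can be sketched on toy data as follows. All arrays below are hypothetical, and plain Euclidean distance stands in for the actual coupon similarity used later in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 4 old items and 2 new items described by 3 features
old_item_feats = np.array([[0.1, 0.9, 0.3],
                           [0.8, 0.2, 0.5],
                           [0.4, 0.4, 0.9],
                           [0.7, 0.7, 0.1]])
new_item_feats = np.array([[0.42, 0.41, 0.88],
                           [0.12, 0.88, 0.31]])

# Pretend these K=2 latent factors came out of the factorization
old_item_factors = rng.random((4, 2))
user_factors = rng.random((3, 2))

# Step 2: map each new item to its most similar old item and borrow its factors
dists = np.linalg.norm(
    new_item_feats[:, None, :] - old_item_feats[None, :, :], axis=2)
nearest_old = dists.argmin(axis=1)
new_item_factors = old_item_factors[nearest_old]

# Step 3: one row per (user, new item) pair, item factors first, then user factors
n_users, n_new = len(user_factors), len(new_item_factors)
item_part = np.tile(new_item_factors, (n_users, 1))
user_part = np.repeat(user_factors, n_new, axis=0)
X_new = np.hstack([item_part, user_part])
print(nearest_old, X_new.shape)
```

The resulting `X_new` is what a regressor trained on stacked (item, user) factors would score and rank.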

In [1]:
import numpy as np
import pandas as pd
import os
import pickle
import multiprocessing
import lightgbm as lgb

from sklearn.metrics.pairwise import pairwise_distances
from sklearn.decomposition import NMF
from scipy.sparse import load_npz
from sklearn.neighbors import NearestNeighbors
from sklearn.model_selection import train_test_split
from recutils.average_precision import mapk
from recutils.utils import coupon_similarity_function


inp_dir = "../datasets/Ponpare/data_processed/"
train_dir = "train"
valid_dir = "valid"

In [2]:
# train and validation coupons
df_coupons_train_feat = pd.read_pickle(os.path.join(inp_dir, train_dir, 'df_coupons_train_feat.p'))
df_coupons_valid_feat = pd.read_pickle(os.path.join(inp_dir, valid_dir, 'df_coupons_valid_feat.p'))

# train and validation coupon ids
coupons_train_ids = df_coupons_train_feat.coupon_id_hash.values
coupons_valid_ids = df_coupons_valid_feat.coupon_id_hash.values

In the `recutils` module there is a submodule simply called `utils` that contains the `coupon_similarity_function` method. All the code in this function is shown in previous chapters. For convenience I decided to wrap it up in a method and use it here.  

In [3]:
# validation to train coupon similarity
train_coupons_path = os.path.join(inp_dir, train_dir, 'df_coupons_train_feat.p')
valid_coupons_path = os.path.join(inp_dir, valid_dir, 'df_coupons_valid_feat.p')

valid_to_train_most_similar = coupon_similarity_function(train_coupons_path, valid_coupons_path)



For example, the most similar training coupon to the validation coupon `f1540e7a08cce1a8d5a5ebd8233e1db0` is:

In [4]:
valid_to_train_most_similar['f1540e7a08cce1a8d5a5ebd8233e1db0']

'bac9eefb777645cdc30eec34a9a4fe1f'

In [5]:
df_coupons_train_feat[df_coupons_train_feat.coupon_id_hash == 'bac9eefb777645cdc30eec34a9a4fe1f']

Unnamed: 0,price_rate,catalog_price,discount_price,dispperiod,validperiod,usable_date_mon_cat,usable_date_tue_cat,usable_date_wed_cat,usable_date_thu_cat,usable_date_fri_cat,usable_date_sat_cat,usable_date_sun_cat,usable_date_holiday_cat,usable_date_before_holiday_cat,coupon_id_hash,validperiod_method1_cat,validperiod_method2_cat,validfrom_method1_cat,validfrom_method2_cat,validend_method1_cat,validend_method2_cat,dispfrom_cat,dispend_cat,dispperiod_cat,price_rate_cat,catalog_price_cat,discount_price_cat,capsule_text_cat,genre_name_cat,large_area_name_cat,ken_name_cat,small_area_name_cat
17478,62,3190,1200,3,16,3,3,3,3,3,3,3,3,3,bac9eefb777645cdc30eec34a9a4fe1f,4,0,7,3,7,5,1,4,1,2,0,0,6,6,0,2,5


In [6]:
df_coupons_valid_feat[df_coupons_valid_feat.coupon_id_hash == "f1540e7a08cce1a8d5a5ebd8233e1db0"]

Unnamed: 0,price_rate,catalog_price,discount_price,dispperiod,validperiod,usable_date_mon_cat,usable_date_tue_cat,usable_date_wed_cat,usable_date_thu_cat,usable_date_fri_cat,usable_date_sat_cat,usable_date_sun_cat,usable_date_holiday_cat,usable_date_before_holiday_cat,coupon_id_hash,validperiod_method1_cat,validperiod_method2_cat,validfrom_method1_cat,validfrom_method2_cat,validend_method1_cat,validend_method2_cat,dispfrom_cat,dispend_cat,dispperiod_cat,price_rate_cat,catalog_price_cat,discount_price_cat,capsule_text_cat,genre_name_cat,large_area_name_cat,ken_name_cat,small_area_name_cat
210,55,2200,980,3,16,3,3,3,3,3,3,3,3,3,f1540e7a08cce1a8d5a5ebd8233e1db0,4,0,7,3,7,2,1,4,1,1,0,0,6,6,0,2,5


Overall, very similar. 

Let's now load the interaction matrix

In [7]:
# let's load the activity matrix and dict of indexes
interactions_mtx = load_npz(os.path.join(inp_dir, train_dir, "interactions_mtx.npz"))
# context managers make sure the pickle files are closed after loading
with open(os.path.join(inp_dir, train_dir, "items_idx_dict.p"), 'rb') as f:
    items_idx_dict = pickle.load(f)
with open(os.path.join(inp_dir, train_dir, "users_idx_dict.p"), 'rb') as f:
    users_idx_dict = pickle.load(f)
interactions_mtx

<22623x18622 sparse matrix of type '<class 'numpy.float64'>'
	with 1560464 stored elements in Compressed Sparse Row format>

Non-negative matrix factorization with default values and 100 components/factors.

In [8]:
n_comp = 100
nmf_model = NMF(n_components=n_comp, init='random', random_state=1981)
user_factors = nmf_model.fit_transform(interactions_mtx)
item_factors = nmf_model.components_.T

And just like that we have our item and user projections onto our latent space

In [9]:
print(user_factors.shape)
print(item_factors.shape)

(22623, 100)
(18622, 100)


Let's make sure every user/item points to the right latent vector

In [10]:
# make sure every user/item points to the right factors:
# the idx dicts map each id hash to its row index in the factor matrices
user_factors_dict = {k: user_factors[v] for k, v in users_idx_dict.items()}
item_factors_dict = {k: item_factors[v] for k, v in items_idx_dict.items()}

And now the only thing left to do is to train a regressor, more precisely, our favourite lightGBM. Let's build the training/testing datasets and build the model. By the way, now there are no categorical features, and our life is just a bit easier.

In [11]:
df_interest = pd.read_pickle(os.path.join(inp_dir, train_dir, 'df_interest.p'))
df_user_factors = (pd.DataFrame.from_dict(user_factors_dict, orient="index")
    .reset_index())
df_user_factors.columns = ['user_id_hash'] + ['user_factor_'+str(i) for i in range(n_comp)]
df_item_factors = (pd.DataFrame.from_dict(item_factors_dict, orient="index")
    .reset_index())
df_item_factors.columns = ['coupon_id_hash'] + ['item_factor_'+str(i) for i in range(n_comp)]
df_train = pd.merge(df_interest[['user_id_hash','coupon_id_hash','interest']],
    df_item_factors, on='coupon_id_hash')
df_train = pd.merge(df_train, df_user_factors, on='user_id_hash')
df_train.shape

(1560464, 203)

In [12]:
X = df_train.iloc[:,3:].values
y = df_train.interest.values
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=1980)

In [13]:
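model = lgb.LGBMRegressor(n_estimators=1000)
# note: LightGBM >= 4.0 removed the early_stopping_rounds argument from fit();
# with those versions pass callbacks=[lgb.early_stopping(10)] instead
model.fit(X_train,y_train,
    eval_set = [(X_valid,y_valid)],
    early_stopping_rounds=10,
    eval_metric="rmse")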

[1]	valid_0's rmse: 0.27389
Training until validation scores don't improve for 10 rounds.
[2]	valid_0's rmse: 0.272165
[3]	valid_0's rmse: 0.270627
[4]	valid_0's rmse: 0.269388
[5]	valid_0's rmse: 0.268348
[6]	valid_0's rmse: 0.267333
[7]	valid_0's rmse: 0.266564
[8]	valid_0's rmse: 0.265698
[9]	valid_0's rmse: 0.265071
[10]	valid_0's rmse: 0.26431
[11]	valid_0's rmse: 0.263691
[12]	valid_0's rmse: 0.263187
[13]	valid_0's rmse: 0.262618
[14]	valid_0's rmse: 0.262151
[15]	valid_0's rmse: 0.261611
[16]	valid_0's rmse: 0.261167
[17]	valid_0's rmse: 0.260688
[18]	valid_0's rmse: 0.260246
[19]	valid_0's rmse: 0.259751
[20]	valid_0's rmse: 0.259429
[21]	valid_0's rmse: 0.259055
[22]	valid_0's rmse: 0.25864
[23]	valid_0's rmse: 0.258305
[24]	valid_0's rmse: 0.257903
[25]	valid_0's rmse: 0.257538
[26]	valid_0's rmse: 0.257163
[27]	valid_0's rmse: 0.25685
[28]	valid_0's rmse: 0.256529
[29]	valid_0's rmse: 0.256221
[30]	valid_0's rmse: 0.255909
[31]	valid_0's rmse: 0.255589
[32]	valid_0's rmse: 

[270]	valid_0's rmse: 0.227396
[271]	valid_0's rmse: 0.227349
[272]	valid_0's rmse: 0.227293
[273]	valid_0's rmse: 0.227267
[274]	valid_0's rmse: 0.227221
[275]	valid_0's rmse: 0.227174
[276]	valid_0's rmse: 0.227116
[277]	valid_0's rmse: 0.227078
[278]	valid_0's rmse: 0.227049
[279]	valid_0's rmse: 0.227025
[280]	valid_0's rmse: 0.226972
[281]	valid_0's rmse: 0.226926
[282]	valid_0's rmse: 0.226889
[283]	valid_0's rmse: 0.226832
[284]	valid_0's rmse: 0.226818
[285]	valid_0's rmse: 0.226795
[286]	valid_0's rmse: 0.226757
[287]	valid_0's rmse: 0.226707
[288]	valid_0's rmse: 0.226643
[289]	valid_0's rmse: 0.226588
[290]	valid_0's rmse: 0.226539
[291]	valid_0's rmse: 0.226504
[292]	valid_0's rmse: 0.226468
[293]	valid_0's rmse: 0.226441
[294]	valid_0's rmse: 0.226378
[295]	valid_0's rmse: 0.226331
[296]	valid_0's rmse: 0.226295
[297]	valid_0's rmse: 0.226223
[298]	valid_0's rmse: 0.226187
[299]	valid_0's rmse: 0.226161
[300]	valid_0's rmse: 0.22612
[301]	valid_0's rmse: 0.226077
[302]	val

[538]	valid_0's rmse: 0.219048
[539]	valid_0's rmse: 0.219035
[540]	valid_0's rmse: 0.219017
[541]	valid_0's rmse: 0.218995
[542]	valid_0's rmse: 0.218988
[543]	valid_0's rmse: 0.218961
[544]	valid_0's rmse: 0.218951
[545]	valid_0's rmse: 0.218911
[546]	valid_0's rmse: 0.218899
[547]	valid_0's rmse: 0.21889
[548]	valid_0's rmse: 0.218864
[549]	valid_0's rmse: 0.218861
[550]	valid_0's rmse: 0.218846
[551]	valid_0's rmse: 0.218836
[552]	valid_0's rmse: 0.218829
[553]	valid_0's rmse: 0.218825
[554]	valid_0's rmse: 0.218809
[555]	valid_0's rmse: 0.218782
[556]	valid_0's rmse: 0.218768
[557]	valid_0's rmse: 0.218738
[558]	valid_0's rmse: 0.218714
[559]	valid_0's rmse: 0.218702
[560]	valid_0's rmse: 0.218662
[561]	valid_0's rmse: 0.218636
[562]	valid_0's rmse: 0.218611
[563]	valid_0's rmse: 0.218599
[564]	valid_0's rmse: 0.218583
[565]	valid_0's rmse: 0.218558
[566]	valid_0's rmse: 0.218517
[567]	valid_0's rmse: 0.2185
[568]	valid_0's rmse: 0.218485
[569]	valid_0's rmse: 0.218458
[570]	valid

[807]	valid_0's rmse: 0.214564
[808]	valid_0's rmse: 0.214556
[809]	valid_0's rmse: 0.214544
[810]	valid_0's rmse: 0.214522
[811]	valid_0's rmse: 0.214501
[812]	valid_0's rmse: 0.214491
[813]	valid_0's rmse: 0.214478
[814]	valid_0's rmse: 0.214477
[815]	valid_0's rmse: 0.214469
[816]	valid_0's rmse: 0.214466
[817]	valid_0's rmse: 0.214454
[818]	valid_0's rmse: 0.214435
[819]	valid_0's rmse: 0.214427
[820]	valid_0's rmse: 0.214422
[821]	valid_0's rmse: 0.214408
[822]	valid_0's rmse: 0.214405
[823]	valid_0's rmse: 0.214382
[824]	valid_0's rmse: 0.214378
[825]	valid_0's rmse: 0.214363
[826]	valid_0's rmse: 0.214338
[827]	valid_0's rmse: 0.214329
[828]	valid_0's rmse: 0.214313
[829]	valid_0's rmse: 0.214304
[830]	valid_0's rmse: 0.214283
[831]	valid_0's rmse: 0.214253
[832]	valid_0's rmse: 0.214229
[833]	valid_0's rmse: 0.214216
[834]	valid_0's rmse: 0.214202
[835]	valid_0's rmse: 0.214169
[836]	valid_0's rmse: 0.214142
[837]	valid_0's rmse: 0.21411
[838]	valid_0's rmse: 0.214085
[839]	val

LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
       learning_rate=0.1, max_depth=-1, min_child_samples=20,
       min_child_weight=0.001, min_split_gain=0.0, n_estimators=1000,
       n_jobs=-1, num_leaves=31, objective=None, random_state=None,
       reg_alpha=0.0, reg_lambda=0.0, silent=True, subsample=1.0,
       subsample_for_bin=200000, subsample_freq=0)

I chose a large number of trees/rounds (1000) and "aggressive" early stopping (10) to avoid overfitting. However, training actually reached the 1000 iterations, which means the validation RMSE never went 10 consecutive rounds without improving. This number is not "insanely" high but is higher than what I normally use with lightGBM. 
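In case the early stopping rule is not obvious, the following is a simplified sketch of the patience logic. It is not lightGBM's actual code, and `early_stopping_round` and the two score lists are hypothetical names made up for this illustration.

```python
def early_stopping_round(scores, patience):
    """Return the 1-based round at which training stops, or None if it never does.

    Training stops once `patience` rounds have passed since the best
    (lowest) validation score was seen.
    """
    best = float("inf")
    best_round = 0
    for rnd, score in enumerate(scores, start=1):
        if score < best:
            best, best_round = score, rnd
        elif rnd - best_round >= patience:
            return rnd
    return None

# A validation curve that keeps improving, however slowly, never triggers the rule
improving = [1.0 / r for r in range(1, 1001)]
# A curve whose best score is at round 3 and then plateaus
plateau = [0.5, 0.4, 0.3] + [0.31] * 20

print(early_stopping_round(improving, patience=10))
print(early_stopping_round(plateau, patience=10))
```

The first case is what happened in our run: tiny but steady improvements all the way to the last round.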

My intention here is to "quickly" go through the process. However, if you think this might be your favourite final solution for your recommendation algorithm, you would need to explore more. For example, we have used 100 components, but maybe (probably) 150 or 200 would better capture the local (or "regional") effects. If you choose to use lightGBM, you would want to perform a proper optimization process, tuning some of the relevant parameters, as illustrated in the previous chapter. Maybe more regularization or a higher number of leaves leads to a lower number of boosting rounds.
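As a rough sketch of what sweeping the number of components could look like, the loop below fits `NMF` for a few values of $K$ on a hypothetical toy matrix and stores the reconstruction error. It only shows the mechanics: the reconstruction error tends to drop as $K$ grows, so in practice you would score each $K$ by the validation MAP of the full pipeline instead.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Hypothetical toy interaction matrix: 40 users x 30 items, mostly zeros
toy_interactions = rng.random((40, 30)) * (rng.random((40, 30)) < 0.2)

errors = {}
for k in (2, 5, 10):
    nmf = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
    nmf.fit(toy_interactions)
    # Frobenius norm of the difference between the data and its reconstruction
    errors[k] = nmf.reconstruction_err_

for k, err in errors.items():
    print(f"K={k}: reconstruction error {err:.3f}")
```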

Also, remember these are all numerical features, 200 in total. This normally makes things slightly simpler. In this scenario you might want to try libraries like [tpot](https://epistasislab.github.io/tpot/) for automatic ML with genetic programming (if you have the time and the memory) or [mlens](http://ml-ensemble.com/info/start/ensembles.html) to build ensemble algorithms. 

Nonetheless, for the time being, let's move forward and load the dictionary of interactions during validation 

In [14]:
# Read the interactions during validation
with open(os.path.join(inp_dir, valid_dir, "interactions_valid_dict.p"), "rb") as f:
    interactions_valid_dict = pickle.load(f)

In [15]:
left = pd.DataFrame({'user_id_hash':list(interactions_valid_dict.keys())})
left['key'] = 0
# .copy() avoids pandas' SettingWithCopyWarning when adding the key column
right = df_coupons_valid_feat[['coupon_id_hash']].copy()
right['key'] = 0
# merging on a constant key gives the cross join:
# every validation user paired with every validation coupon
df_valid = (pd.merge(left, right, on='key', how='outer')
    .drop('key', axis=1))
print(df_valid.shape)
df_valid.head()


(2173418, 2)


Unnamed: 0,user_id_hash,coupon_id_hash
0,002ae30377cd30f65652e52618e8b2d6,282b5bda1758e147589ca517e02195c3
1,002ae30377cd30f65652e52618e8b2d6,0f43ef71c25d409c250f5a5042806342
2,002ae30377cd30f65652e52618e8b2d6,28ff0fb4b561a2fd6a360fe28f465e07
3,002ae30377cd30f65652e52618e8b2d6,864f351e66cd3aeece5d06987fc2ed4b
4,002ae30377cd30f65652e52618e8b2d6,279ba64539609d30114b68874cd0fb42


Now let's add the user latent factors and map validation into training coupons to use coupon latent factors

In [16]:
# Coupon factors
df_valid['mapped_coupons'] = (df_valid.coupon_id_hash
    .apply(lambda x: valid_to_train_most_similar[x]))
df_valid = pd.merge(df_valid, df_item_factors,
    left_on='mapped_coupons', right_on='coupon_id_hash')
df_valid.drop('coupon_id_hash_y', axis=1, inplace=True)
df_valid.rename(index=str, columns={'coupon_id_hash_x': 'coupon_id_hash'}, inplace=True)

# User
df_valid = pd.merge(df_valid, df_user_factors, on='user_id_hash')
print(df_valid.shape)

(2173060, 203)


In [17]:
X_valid = df_valid.iloc[:, 3:].values
preds = model.predict(X_valid)

Let's add the interest column and rank

In [20]:
# .copy() so that adding a column does not write into a slice of df_valid
df_preds = df_valid[['user_id_hash', 'coupon_id_hash']].copy()
df_preds['interest'] = preds

In [21]:
df_ranked = df_preds.sort_values(['user_id_hash', 'interest'], ascending=[False, False])
df_ranked = (df_ranked
    .groupby('user_id_hash')['coupon_id_hash']
    .apply(list)
    .reset_index())
recomendations_dict = pd.Series(df_ranked.coupon_id_hash.values,
    index=df_ranked.user_id_hash).to_dict()

actual = []
pred = []
for k,_ in recomendations_dict.items():
    actual.append(list(interactions_valid_dict[k]))
    pred.append(list(recomendations_dict[k]))

print(mapk(actual,pred))

0.021966114031188165


$\sim 0.022$, not as good as some previous techniques. However, remember that the first MAP value we obtained in Chapter 10 during our optimization process was 0.020, and we eventually managed to push it up to 0.032. Therefore, as I mentioned before, if you think this is your favourite technique, you should carry out a proper optimization process. That process should also include the number of components/factors as a hyperparameter to be tuned. 
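As a reminder of what that number measures, here is a minimal sketch of mean average precision at $k$. It assumes `recutils`' `mapk` follows the commonly used definition; I have not reproduced that code here, and the names `apk_sketch` and `mapk_sketch` are made up for this example.

```python
def apk_sketch(actual, predicted, k=10):
    """Average precision at k for one user."""
    if not actual:
        return 0.0
    predicted = predicted[:k]
    score, hits = 0.0, 0
    for i, p in enumerate(predicted):
        # count each relevant item once, at the rank where it first appears
        if p in actual and p not in predicted[:i]:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(actual), k)

def mapk_sketch(actual, predicted, k=10):
    """Mean of the per-user average precision at k."""
    return sum(apk_sketch(a, p, k) for a, p in zip(actual, predicted)) / len(actual)

# Two toy users: what they interacted with vs. what we ranked for them
toy_actual = [["a", "c"], ["b"]]
toy_pred = [["a", "b", "c"], ["a", "b"]]
toy_map = mapk_sketch(toy_actual, toy_pred)
print(toy_map)
```

The first user scores $(1/1 + 2/3)/2$ and the second $1/2$, so relevant coupons ranked near the top count much more than those further down.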

Note that the latent factors can be useful for a number of things other than recommending. They have been learned based on users' behaviour. Therefore, you might want to use them for campaign targeting instead of demographic-based features (such as age, location, etc.), for example. In this scenario, you will be targeting your users based on their behaviour instead of some "human-readable" features, which is possibly more adequate. 
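For instance, one simple way of using the user factors for targeting is to look for users that behave like a given "seed" user. The sketch below does this with cosine similarity on a toy factor matrix; `most_similar_users` and the data are hypothetical, and cosine similarity is just one reasonable choice.

```python
import numpy as np

def most_similar_users(user_factors, seed_idx, top_n=3):
    """Indices of the top_n users closest to the seed user in latent space."""
    norms = np.linalg.norm(user_factors, axis=1, keepdims=True)
    normalized = user_factors / np.clip(norms, 1e-12, None)
    sims = normalized @ normalized[seed_idx]
    sims[seed_idx] = -np.inf  # never return the seed user itself
    return np.argsort(-sims)[:top_n]

# Toy latent factors for 4 users (K=2)
toy_factors = np.array([[1.0, 0.0],
                        [0.9, 0.1],
                        [0.0, 1.0],
                        [0.5, 0.5]])
lookalikes = most_similar_users(toy_factors, seed_idx=0, top_n=2)
print(lookalikes)
```

With the real data, `toy_factors` would be replaced by the `user_factors` array computed earlier in this notebook.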

Before we leave this notebook make sure you are familiar with the concept of latent factors, since similar principles with a different formulation will be applied when using our next technique: Factorization Machines.