# Menu Recommender Using Alternating Least Squares (ALS)

Here I build a recommender system to support upselling, using collaborative filtering, which predicts the interests of a user from the preferences of many other users.

The data we have is purchase data, which is implicit feedback: user-item interactions express positive-only preferences, in contrast to ratings, which are explicit feedback. I use Alternating Least Squares (ALS), which is well suited to implicit feedback.
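One common way to treat purchase counts as implicit feedback, following the formulation of Hu, Koren and Volinsky, is to split each count into a binary preference and a confidence weight. The sketch below is a minimal illustration only; the toy counts and the value of `alpha` are assumptions, not values taken from this dataset.

```python
import numpy as np

# Hypothetical purchase counts for one user across five items
quantity = np.array([0.0, 1.0, 0.0, 4.0, 2.0])

# alpha is a tunable scaling constant (assumed value, for illustration only)
alpha = 15

# Binary preference: 1 if the user ever bought the item, else 0
preference = (quantity > 0).astype(float)

# Confidence grows with the number of purchases; unseen items keep confidence 1
confidence = 1 + alpha * quantity

print(preference)
print(confidence)
```

Under this reading, an item that was never bought is treated as a weak negative signal rather than a missing value, which is what distinguishes the implicit-feedback setting from rating prediction.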

We take a matrix of user-item interactions and find the latent features that relate users and items to each other. This matrix factorisation method reduces the dimensionality (while keeping relevant information) by splitting the original matrix into a smaller matrix of user features and a smaller matrix of item features.

The matrix factorisation yields:
1. One smaller matrix with dimensions (number of users × number of latent features), containing a latent feature vector for each user
2. Another matrix with dimensions (number of items × number of latent features), containing a latent feature vector for each item

Multiplying the user feature matrix by the transpose of the item feature matrix approximates the original matrix, but now we have two dense matrices holding a small number of latent features for each of our users and items.
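The shapes involved can be sketched with random numbers standing in for learned factors. The sizes below are hypothetical and chosen only to make the dimensions easy to follow; nothing here is fitted to the purchase data.

```python
import numpy as np

# Hypothetical toy sizes, chosen only for illustration
n_users, n_items, n_factors = 4, 5, 2

rng = np.random.default_rng(0)
user_factors = rng.normal(size=(n_users, n_factors))  # one latent vector per user
item_factors = rng.normal(size=(n_items, n_factors))  # one latent vector per item

# Multiplying the two small matrices gives a dense users x items score matrix
approx = user_factors @ item_factors.T

# A single predicted preference is the dot product of two latent vectors
score_u1_i3 = user_factors[1] @ item_factors[3]

print(approx.shape)
```

Storing the two factor matrices takes (4 + 5) × 2 numbers here instead of 4 × 5, and the saving grows quickly as the number of users and items increases.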

Reference taken from: https://nbviewer.jupyter.org/github/jmsteinw/Notebooks/blob/master/RecEngine_NB.ipynb

## Data Import and Pre-processing

In [1]:
#Import libraries
import pandas as pd
import numpy as np
import random
import scipy.sparse as sparse
from tqdm import tqdm_notebook as tqdm
import implicit
from sklearn.preprocessing import MinMaxScaler
from sklearn import metrics

In [2]:
#Import data
df = pd.read_csv('../inputs/profile_features_clean.csv')
menu = pd.read_csv('../raw/meals.csv')
partyorg = pd.read_csv('../output/partyorgusers.csv')
smalltimeinf = pd.read_csv('../output/smallinfluencerusers.csv')

In [3]:
#Check for missing values
df.isnull().sum()

account_balance                 0
card_brand_users             6670
has_coupon                      0
birthday_year               50475
birthday_month              50332
age                         50475
user_id                         0
address                         5
zone_id                         0
salesperson_id                  0
discount                        0
due_dates_only                  0
card_details                35326
card_brand_delivery_info    18550
source                          0
delivery_fee                    0
meal_wave                       0
surcharge_amount                0
promo_code_used                 0
gave_feedback                   0
district                        5
delivery_order_id               0
item_id                         5
item_type                       6
quantity                        0
unit_price                      5
name                          136
macros                        136
ingredients                   278
temperature   

In [4]:
#Inspect rows where item_id is missing
df[df.item_id.isnull()].name

1324     NaN
15223    NaN
54305    NaN
54306    NaN
54307    NaN
Name: name, dtype: object

The rows above carry no information about the item purchased, so I drop them.

In [5]:
#Drop rows where item_id is unknown
df = df[df.item_id.notnull()]

#Convert item_id into integer type
df.item_id = df.item_id.astype(int)

In [6]:
#Print number of data points 
print('We have {} rows of transaction data relating to {} unique items.'.format(df.shape[0],df.item_id.nunique()))

We have 57494 rows of transaction data relating to 368 unique items.


### Create a Sparse Matrix

In [7]:
#Define quantity for each SKU bought per user
data = df.groupby(['user_id','item_id']).quantity.sum().reset_index()

#Inspect dataframe
data.sample(1)

       user_id  item_id  quantity
10794     9721      202       1.0


For user-item pairs whose quantity sums to 0 (perhaps the order was cancelled), we set the quantity to 1 to retain the information that an interaction took place.

In [8]:
#Print number of rows with quantity sum = 0
print('There are {} rows with quantity sum = 0.'.format(data[data.quantity == 0].shape[0]))

#Replace rows of quantity sum = 0 with 1 (a single .loc call avoids chained assignment)
data.loc[data.quantity == 0, 'quantity'] = 1

There are 19 rows with quantity sum = 0.


In [9]:
#Create a sparse matrix of user_id by item_id, with values as quantity
sparse_mat = sparse.csr_matrix((data['quantity'], (data['user_id'], data['item_id'])))

Find sparsity of the matrix created:

In [10]:
#Number of possible interactions in the matrix
matrix_size = sparse_mat.shape[0]*sparse_mat.shape[1]

#Number of user-item pairs with an interaction
count_interactions = sparse_mat.size

#Compute matrix sparsity
sparsity = 100*(1 - (float(count_interactions)/float(matrix_size)))

print('{}% of the matrix is sparse.'.format(round(sparsity,2)))

99.89% of the matrix is sparse.


The matrix is very sparse, which is likely to limit how well the recommender system performs.
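Because the matrix is indexed directly by `user_id` and `item_id`, it has a row and a column for every integer up to the largest id, whether or not that id occurs in the data. If the ids are not contiguous (an assumption, since their range is not shown here), part of the measured sparsity comes from those empty rows and columns. A minimal sketch of one alternative, mapping ids to contiguous positions with pandas category codes on hypothetical toy data:

```python
import pandas as pd
import scipy.sparse as sparse

# Hypothetical purchases with large, non-contiguous ids
toy = pd.DataFrame({'user_id': [9721, 9721, 15003],
                    'item_id': [202, 310, 202],
                    'quantity': [1.0, 2.0, 1.0]})

# Raw ids as indices: shape is (max user_id + 1, max item_id + 1)
raw = sparse.csr_matrix((toy['quantity'], (toy['user_id'], toy['item_id'])))

# Category codes give each distinct id a position 0..n-1
user_cat = toy['user_id'].astype('category')
item_cat = toy['item_id'].astype('category')
compact = sparse.csr_matrix((toy['quantity'],
                             (user_cat.cat.codes.to_numpy(),
                              item_cat.cat.codes.to_numpy())))

print(raw.shape, compact.shape)

# Map a row position back to the original user_id
print(user_cat.cat.categories[1])
```

With this mapping, row and column positions are no longer the ids themselves, so any later lookup (for example of item names) would need to translate positions back through `cat.categories`.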

### Create a Training and a Validation Set from data

A training set and a validation set are created for model evaluation later on. 

The training set has a percentage of interactions masked (set to zero), as if the user had never purchased the item. The validation set is a copy of the original data with interactions stored in binary form. The list of unique user_ids whose interactions were masked is recorded.

In [11]:
def make_trg_and_val_sets(data, pct_test = 0.2):
    
    #Make a copy of the original data to be the test set and store as binary preference matrix
    test_set = data.copy()
    test_set[test_set != 0] = 1 
    
    #Make a copy of the original data to be the training set. 
    training_set = data.copy() 
    
    #Find indices in the data where an interaction exists
    nonzero_inds = training_set.nonzero() 
    
    #Where an interaction exists, zip user,item index into a list
    nonzero_pairs = list(zip(nonzero_inds[0], nonzero_inds[1]))
    
    #Initiate random seed
    random.seed(0)
    
    #Round number of samples needed up to the next integer
    num_samples = int(np.ceil(pct_test*len(nonzero_pairs))) 
    
    #Sample a random number of user-item pairs without replacement
    samples = random.sample(nonzero_pairs, num_samples) 
    
    #Get user and item indices respectively
    user_inds = [index[0] for index in samples] 
    item_inds = [index[1] for index in samples] 
    
    #Mask interaction of randomly chosen user-item pairs by assigning them as zero
    training_set[user_inds, item_inds] = 0 
    
    #Get rid of zeros in sparse array storage to save space
    training_set.eliminate_zeros()
    
    #Output training, validation set and list of unique user_ids of rows that were altered
    return training_set, test_set, list(set(user_inds)) 

In [12]:
#Call function 
train, test, users_with_altered_data = make_trg_and_val_sets(sparse_mat)
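The effect of the masking can be checked on a toy matrix. The sketch below repeats the same steps on hypothetical data (3 users, 4 items) rather than calling the function above, so that it runs on its own.

```python
import random
import numpy as np
import scipy.sparse as sparse

# Toy user-item purchase counts (hypothetical data, 3 users x 4 items)
toy = sparse.csr_matrix(np.array([[2, 0, 1, 0],
                                  [0, 3, 0, 1],
                                  [1, 1, 0, 0]], dtype=float))

# Validation copy: same interactions, binarised
toy_test = toy.copy()
toy_test.data[:] = 1

# Training copy: hide 2 of the 6 known interactions
toy_train = toy.copy()
rows, cols = toy_train.nonzero()
pairs = list(zip(rows, cols))
random.seed(0)
masked = random.sample(pairs, 2)
toy_train[[r for r, _ in masked], [c for _, c in masked]] = 0
toy_train.eliminate_zeros()

# Stored interactions: original, training, validation
print(toy.nnz, toy_train.nnz, toy_test.nnz)
```

The training copy loses exactly the masked pairs, while the validation copy still records them as 1, which is what lets the evaluation check whether hidden purchases are ranked highly.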

### Model Training & Evaluation

We train the model on the training data and check its recommendations against the validation set to see how many of the recommended items were actually purchased by the user (and masked in the training set). The higher the score, the better the recommender system.

The benchmark used here is a popularity recommender where we recommend the most popular items to every user (same for all users).
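A popularity score can be sketched as a column sum over a binary purchase matrix. The toy matrix below is hypothetical and only illustrates the idea.

```python
import numpy as np
import scipy.sparse as sparse

# Hypothetical binary purchase matrix (3 users x 4 items)
purchases = sparse.csr_matrix(np.array([[1, 0, 1, 0],
                                        [1, 1, 0, 0],
                                        [1, 0, 0, 1]]))

# Popularity score of an item = number of users who bought it
popularity = np.asarray(purchases.sum(axis=0)).reshape(-1)

# Every user gets the same ranking, most popular item first
ranking = np.argsort(-popularity, kind='stable')

print(popularity)
print(ranking)
```

Item 0 is bought by all three users and therefore heads the list for everyone, regardless of individual history.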

In [13]:
#Set parameters
confidence_coef = 15
factors = 2
regularization = 0.1
iterations = 25

#Initialise model
model = implicit.als.AlternatingLeastSquares(factors=factors, regularization=regularization, iterations=iterations)

#Fit model on training data
model.fit((train.T*confidence_coef).astype('double'))

#Get user and item vectors from trained model
user_vecs = model.user_factors
item_vecs = model.item_factors

100%|██████████| 25.0/25 [00:00<00:00, 31.71it/s]


The metric used here is the area under the Receiver Operating Characteristic (ROC) curve, or AUC. A greater area under the curve means the items the user actually purchased sit higher up the list of recommended items.
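How AUC rewards ranking purchased items near the top can be seen on a small example. The scores below are hypothetical, and the sketch uses `metrics.roc_auc_score`, which gives the same value as computing `roc_curve` followed by `auc`.

```python
import numpy as np
from sklearn import metrics

# 1 = item was actually purchased (hidden from training), 0 = not purchased
actual = np.array([1, 0, 1, 0, 0])

# Hypothetical scores from three recommenders for the same five items
good_scores = np.array([0.9, 0.2, 0.8, 0.1, 0.3])   # both purchases ranked on top
mixed_scores = np.array([0.9, 0.8, 0.3, 0.1, 0.2])  # one purchase outranked by a non-purchase
poor_scores = np.array([0.1, 0.9, 0.2, 0.8, 0.7])   # both purchases ranked last

auc_good = metrics.roc_auc_score(actual, good_scores)    # 1.0
auc_mixed = metrics.roc_auc_score(actual, mixed_scores)  # 5/6, about 0.83
auc_poor = metrics.roc_auc_score(actual, poor_scores)    # 0.0

print(auc_good, auc_mixed, auc_poor)
```

AUC equals the fraction of (purchased, not purchased) item pairs in which the purchased item receives the higher score, so 0.5 corresponds to a random ordering.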

In [14]:
#Calculate area under the curve for users with masked data 
def auc_score(predictions, test):
    fpr, tpr, thresholds = metrics.roc_curve(test, predictions)
    return metrics.auc(fpr, tpr)

def calc_mean_auc(training_set, altered_users, predictions, test_set):
    
    #Start empty list to store AUC for each user with masked data using ALS and popularity
    als_auc = []
    popularity_auc = []
    
    #Sum interactions for most popular items
    pop_items = np.array(test_set.sum(axis = 0)).reshape(-1) 
    #Use items in prediction outcome
    item_vecs = predictions[1]
    
    for user in altered_users: 
        #Get interactions from training data
        training_row = training_set[user,:].toarray().reshape(-1)
        
        #Find where no interaction
        zero_inds = np.where(training_row == 0)
        
        #Get predictions for user
        user_vec = predictions[0][user,:]
        #Keep only predictions for items with no interaction in the training data
        pred = user_vec.dot(item_vecs).toarray()[0,zero_inds].reshape(-1)
        
        #Get actual interactions from the validation set for those same items
        actual = test_set[user,:].toarray()[0,zero_inds].reshape(-1) 

        #Get item popularity for chosen items
        pop = pop_items[zero_inds] 
        
        #Calculate AUC for user using ALS and popularity systems respectively
        als_auc.append(auc_score(pred, actual))
        popularity_auc.append(auc_score(pop, actual))
    
    #Return mean AUC rounded to two decimal places 
    return float('%.2f'%np.mean(als_auc)), float('%.2f'%np.mean(popularity_auc))  


In [15]:
#Call functions
#Compute both mean AUC scores in a single pass
als_mean_auc, popularity_mean_auc = calc_mean_auc(train, users_with_altered_data, 
              [sparse.csr_matrix(user_vecs), sparse.csr_matrix(item_vecs.T)], test)

print('AUC of ALS recommender: {}'.format(als_mean_auc))
print('AUC of Popularity recommender: {}'.format(popularity_mean_auc))

AUC of ALS recommender: 0.88
AUC of Popularity recommender: 0.93


The AUC of the ALS recommender is lower than that of the popularity benchmark: the recommender has a mean AUC of 0.88, while simply recommending popular items scores 0.93. This suggests that it may be more useful to recommend popular items than to make personalised recommendations.

Since the scores are close, it is worth A/B testing the personalised recommendations and measuring take-up in a real-world context to determine how well they actually perform.

Moving forward, ratings of menu items can be collected to build a recommender system on explicit data instead. 

### Modelling

We train the model on all of the original data and get recommendations for users in The Party Organiser cluster to assist upselling.

In [16]:
#Set parameters
confidence_coef = 15
factors = 2
regularization = 0.1
iterations = 25

#Initialise model
model = implicit.als.AlternatingLeastSquares(factors=factors, regularization=regularization, iterations=iterations)

#Fit model
model.fit((sparse_mat.T*confidence_coef).astype('double'))

#Get user and item vectors from trained model
user_vecs = model.user_factors
item_vecs = model.item_factors

100%|██████████| 25.0/25 [00:00<00:00, 37.48it/s]


Get top 10 recommendations.

In [17]:
def recommend(user_id, sparse_mat, user_vecs, item_vecs, num_items=10):
    
    #Get user interactions data from sparse matrix 
    user_interactions = sparse_mat[user_id,:].toarray()
    
    #Add 1 to everything, so that items not purchased yet can be non-zero
    user_interactions = user_interactions.reshape(-1) + 1
    
    #Make items already interacted zero
    user_interactions[user_interactions > 1] = 0
    
    #Get dot product of user vector and all item vectors
    rec_vector = user_vecs[user_id,:].dot(item_vecs.T)
    
    #Scale dot product result between 0 and 1
    min_max = MinMaxScaler()
    rec_vector_scaled = min_max.fit_transform(rec_vector.reshape(-1,1))[:,0]
    
    #Get recommendation vector
    recommend_vector = user_interactions * rec_vector_scaled 
    
    #Sort into order of best recommendations
    item_idx = np.argsort(recommend_vector)[::-1][:num_items]
    
    #Start empty list to store titles and scores
    items = []
    scores = []
    
    #Append recommended item names and scores to lists
    for idx in item_idx:
        items.append(menu.name.loc[menu.id == idx].iloc[0])
        scores.append(recommend_vector[idx])
    
    #Define recommendations dataframe
    recommendations = pd.DataFrame({'user_id':user_id, 'name': items, 'score': scores})

    return recommendations
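The function above hides purchased items by multiplying their scaled score by zero. A closely related variant, sketched here on hypothetical toy vectors, pushes purchased items to negative infinity before sorting, so they can never tie with the lowest-scoring unpurchased item.

```python
import numpy as np

# Hypothetical 2-factor vectors for one user and five items
user_vec = np.array([0.5, 1.0])
item_vecs_toy = np.array([[1.0, 0.0],
                          [0.0, 1.0],
                          [1.0, 1.0],
                          [0.2, 0.1],
                          [0.0, 0.5]])

# Items 2 and 4 were already purchased by this user
purchased = np.array([0, 0, 1, 0, 1], dtype=bool)

# Raw affinity scores: dot product of the user vector with every item vector
scores = item_vecs_toy @ user_vec

# Exclude purchased items by setting their score to -inf, then take the top 2
scores_masked = np.where(purchased, -np.inf, scores)
top_items = np.argsort(scores_masked)[::-1][:2]

print(top_items)
```

Item 2 has the highest raw score but is skipped because it was already bought, leaving items 1 and 0 as the top two suggestions.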

We retrieve name information of recommended menu items for interpretability.

In [18]:
#Convert id to integer type
menu.id = menu.id.astype(int)

#Retrieve name information of menu items involved in recommender system
menu = menu[menu['id'].isin(data.item_id)]
item_name = pd.DataFrame(menu[['id','name']])
data_with_name = data.merge(item_name,how='left',left_on='item_id',right_on='id')

Recommendations for users in The Party Organiser cluster are exported to csv.

In [19]:
#Get recommendations for users in The Party Organiser cluster
rec_list = []
for user in partyorg['0'].unique():
    recommendations = recommend(user, sparse_mat, user_vecs, item_vecs)
    rec_list.append(recommendations[['user_id','name']])

#Combine per-user results (DataFrame.append was removed in pandas 2.0)
rec = pd.concat(rec_list, sort=False)

In [20]:
#Output to csv
rec.to_csv('../output/partyorg_recommendations.csv')

Uncomment below to view recommendations:

In [23]:
# #Print recommendations for users in The Party Organiser cluster
# for user in partyorg['0'].unique():
#     recommendations = recommend(user, sparse_mat, user_vecs, item_vecs)

#     print( '\n\nTRANSACTION HISTORY FOR USER : ' + str(user) + '\n' + '-'*80)
#     print( data_with_name[data_with_name['user_id']==user][['name','item_id','quantity',]])
#     print( '\nRECOMMEND FOLLOWING ITEMS: \n')
#     print( recommendations['name'])
#     print( '='*80)