<div class="alert alert-success alertinfo" style="margin-top: 0px">
<h1>  Model 2. Hybrid (Collaborative filtering/Content based) LightFM </h1>  
</div>

# 1. Imports

In [1]:
# Turning off warnings
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Data Manipulation
import sys
import random
import pandas as pd
import numpy as np

# Preprocessing
import missingno
import scipy.sparse as sparse
from scipy.sparse.linalg import spsolve
from sklearn.preprocessing import MaxAbsScaler

# Visualization 
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
plt.style.use('seaborn-whitegrid')

# Model
from lightfm import LightFM
from lightfm.evaluation import precision_at_k
from lightfm.evaluation import auc_score
from lightfm.evaluation import reciprocal_rank

In [3]:
# Import data
train = pd.read_csv(r"C:\Users\giuse\Desktop\job seeking\DATAscience_interview\train_modelled.csv")
test = pd.read_csv(r"C:\Users\giuse\Desktop\job seeking\DATAscience_interview\test_modelled.csv")

# 2. Data View

In [4]:
train.head()

Unnamed: 0,USER_ID,CURRENCY,CURRENCY_IS_CRYPTO,GAME_TITLE,GAME_TYPE,GAME_PROVIDER,USER_CODE,GAME_CODE,unique,FEEDBACK_bet_count,FEEDBACK_bet_amount_euro,FEEDBACK_bet_amount_currency,FEEDBACK
0,1,BTC,Y,Slotomon Go,slots,enigmatic,0,2027,1,-1,-1,-1,-1
1,5,EUR,N,Fire Lightning,slots,enigmatic,1,839,26,-1,-1,-1,-1
2,181,DOG,Y,4 Horsemen,slots,spinomenal,2,31,46,1,-1,2,2
3,1939,BTC,Y,Boomerang Bonanza,slots,booming,3,403,57,-1,-1,-1,-1
4,6784,BTC,Y,Fantasy Park,slots,enigmatic,4,818,75,-1,-1,-1,-1


In [5]:
train.shape

(787487, 13)

In [6]:
test.head()

Unnamed: 0,USER_ID,CURRENCY,CURRENCY_IS_CRYPTO,GAME_TITLE,GAME_TYPE,GAME_PROVIDER,USER_CODE,GAME_CODE,unique,FEEDBACK_bet_count,FEEDBACK_bet_amount_euro,FEEDBACK_bet_amount_currency,FEEDBACK
0,5,BTC,Y,Bac Agin,card,asiagaming,1,198,3,-1,-1,-1,-1
1,5,BTC,Y,Local Pub,slots,belatra,1,1385,12,-1,-1,-1,-1
2,5,BTC,Y,Sic Bo,craps,enigmatic,1,1993,16,-1,-1,-1,-1
3,5,BTC,Y,Triple Star,slots,wazdan,1,2340,18,-1,-1,-1,-1
4,5,EUR,N,Speed Auto Roulette,roulette,evolution,1,2056,32,-1,-1,-1,-1


In [7]:
test.shape

(196872, 13)

In [8]:
# Data info

print('TRAIN')
print('          {} rows and {} columns .'.format(train.shape[0],train.shape[1]))
print('          {} unique users.'.format(len(train['USER_ID'].unique())))
print('          {} unique game titles.'.format(len(train['GAME_TITLE'].unique())))
print('TEST')
print('          {} rows and {} columns .'.format(test.shape[0],test.shape[1]))
print('          {} unique users.'.format(len(test['USER_ID'].unique())))
print('          {} unique game titles.'.format(len(test['GAME_TITLE'].unique())))

TRAIN
          787487 rows and 13 columns .
          39326 unique users.
          2569 unique game titles.
TEST
          196872 rows and 13 columns .
          24754 unique users.
          2450 unique game titles.


In [9]:
#train, test = train.align(test, join='left', axis=1)

# 3. Methodology

<font size="4"> Most recommender systems are either content-based or based on collaborative filtering.  </font>  
 
<h2>Content based:</h2>

<h3>$Advantages$</h3>

    * The model doesn't need any data about other users, since the recommendations are specific to this user. This makes it easier to scale to a large number of users.
    * The model can capture the specific interests of a user, and can recommend niche items that very few other users are interested in.
    
<h3>$Disadvantages$</h3>

    * Since the feature representation of the items is hand-engineered to some extent, this technique requires a lot of domain knowledge. Therefore, the model can only be as good as the hand-engineered features.
    * The model can only make recommendations based on existing interests of the user. In other words, the model has limited ability to expand on the users' existing interests.
    
<h2>Collaborative filtering:</h2>

<h3>$Advantages$</h3>

    * No domain knowledge needed - the embeddings are automatically learned.
    * Serendipity. The model can help users discover new interests. In isolation, the ML system may not know the user is interested in a given item, but the model might still recommend it because similar users are interested in that item.
    * Great starting point - To some extent, the system needs only the feedback matrix to train a matrix factorization model. In particular, the system doesn't need contextual features. In practice, this can be used as one of multiple candidate generators.
    
<h3>$Disadvantages$</h3>

    * Cold-start problem - The prediction of the model for a given (user, item) pair is the dot product of the corresponding embeddings. So, if an item is not seen during training, the system can't create an embedding for it and can't query the model with this item.
    * No side features - The model does not take any side features into account and may therefore miss valuable information.   

<font size="4">As we can see, these two types of recommender systems complement one another. Ideally we want both, to gain the advantages of each and offset their disadvantages. The LightFM model, a hybrid recommender system, allows us to do exactly that: it combines collaborative filtering with a content-based recommender system.</font>
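
To make the hybrid idea concrete, here is a minimal NumPy sketch of how a LightFM-style model scores a (user, item) pair: each user and each item is represented as the sum of the embeddings of its active features, and the score is the dot product of the two representations plus bias terms. The embedding values, the feature layout, and all names below are made up for illustration; this is not LightFM's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_components = 4

# Hypothetical layout: 3 per-user identity features + 2 metadata features.
user_feature_emb = rng.normal(size=(5, n_components))
user_feature_bias = np.zeros(5)
# Hypothetical layout: 4 per-item identity features + 2 metadata features.
item_feature_emb = rng.normal(size=(6, n_components))
item_feature_bias = np.zeros(6)

def represent(feature_row, embeddings, biases):
    """Weighted sum of the embeddings (and biases) of the active features."""
    return feature_row @ embeddings, feature_row @ biases

def score(user_row, item_row):
    """Dot product of user and item representations plus both biases."""
    q_u, b_u = represent(user_row, user_feature_emb, user_feature_bias)
    p_i, b_i = represent(item_row, item_feature_emb, item_feature_bias)
    return float(q_u @ p_i + b_u + b_i)

# Known user 0 with metadata feature 3 active.
known_user = np.array([1, 0, 0, 1, 0], dtype=float)
# Brand-new item: no identity feature, only metadata feature 4 is active.
new_item = np.array([0, 0, 0, 0, 1, 0], dtype=float)
print(score(known_user, new_item))
```

Because the new item is described only by a metadata feature, it still receives a score even though it has no learned per-item embedding, which is how the hybrid setup softens the cold-start problem.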

# Creation of sparse matrix (interactions)

### preview interaction matrices as data frames

In [9]:
# Training set USER_ITEM
train_interactions_df = sparse.csr_matrix((train['FEEDBACK'].astype(float), (train['USER_CODE'], train['GAME_CODE'])))
train_interactions_df = train_interactions_df.toarray()
train_interactions_df = pd.DataFrame(data=train_interactions_df)
train_interactions_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2559,2560,2561,2562,2563,2564,2565,2566,2567,2568
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39321,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
39322,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
39323,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
39324,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [10]:
# Testing set USER_ITEM
test_interactions_df = sparse.csr_matrix((test['FEEDBACK'].astype(float), (test['USER_CODE'], test['GAME_CODE'])))
test_interactions_df = test_interactions_df.toarray()
test_interactions_df = pd.DataFrame(test_interactions_df)
test_interactions_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,2559,2560,2561,2562,2563,2564,2565,2566,2567,2568
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39321,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
39322,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
39323,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
39324,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### If there is a mismatch, uncomment the cell below

In [12]:
#train_interactions_df, test_interactions_df = train_interactions_df.align(test_interactions_df, join='right', axis=0)

In [11]:
train_interactions_df.shape

(39326, 2569)

In [12]:
test_interactions_df.shape

(39326, 2569)

### back to matrices

In [13]:
train_interactions = sparse.csr_matrix(train_interactions_df.values)
test_interactions = sparse.csr_matrix(test_interactions_df.values)
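
The round trip through a dense DataFrame is convenient for previewing, but it materializes every cell of the 39326 x 2569 matrix. A possible alternative, sketched below on toy data, builds the CSR matrix directly and passes an explicit `shape`, so that the train and test matrices have the same dimensions even when one of them lacks the highest user or game code. The helper name `build_interactions` and the toy arrays are illustrative and not part of the notebook.

```python
import numpy as np
import scipy.sparse as sparse

def build_interactions(user_codes, game_codes, feedback, n_users, n_items):
    """Build a user x item CSR matrix; duplicate (user, item) pairs are summed."""
    return sparse.csr_matrix(
        (np.asarray(feedback, dtype=float),
         (np.asarray(user_codes), np.asarray(game_codes))),
        shape=(n_users, n_items),
    )

n_users, n_items = 4, 5
# User 1 / game 0 appears twice, so its feedback values are added together.
train_m = build_interactions([0, 1, 1, 3], [2, 0, 0, 4], [1, 2, 3, 1], n_users, n_items)
test_m = build_interactions([0, 2], [1, 3], [1, 1], n_users, n_items)
print(train_m.shape, test_m.shape)
```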

In [14]:
train_interactions

<39326x2569 sparse matrix of type '<class 'numpy.float64'>'
	with 755337 stored elements in Compressed Sparse Row format>

In [15]:
test_interactions

<39326x2569 sparse matrix of type '<class 'numpy.float64'>'
	with 194383 stored elements in Compressed Sparse Row format>

### Normalizing the interactions (skip the cell below if not needed)

In [16]:
scaler1 = MaxAbsScaler().fit(train_interactions)
train_interactions = scaler1.transform(train_interactions)
scaler2 = MaxAbsScaler().fit(test_interactions)
test_interactions = scaler2.transform(test_interactions)
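
For reference, `MaxAbsScaler` divides each column by that column's largest absolute value, so every game's feedback ends up in [-1, 1] and zeros stay zero (sparsity is preserved). The sketch below reproduces that arithmetic with SciPy on a toy matrix; the function name `max_abs_scale` is illustrative. Note that the cell above fits one scaler on the train matrix and another on the test matrix, so each set is scaled by its own column maxima.

```python
import numpy as np
import scipy.sparse as sparse

def max_abs_scale(matrix):
    """Divide every column of a sparse matrix by its maximum absolute value."""
    matrix = sparse.csr_matrix(matrix, dtype=float)
    col_max = abs(matrix).max(axis=0).toarray().ravel()
    col_max[col_max == 0] = 1.0  # leave all-zero columns untouched
    return (matrix @ sparse.diags(1.0 / col_max)).tocsr()

toy = sparse.csr_matrix(np.array([[2.0, 0.0, -1.0],
                                  [4.0, 0.0, 0.5]]))
scaled = max_abs_scale(toy)
print(scaled.toarray())
```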

# 4. Model 1 LightFM - collaborative filtering without user features

In [17]:
model1 = LightFM(learning_rate=0.01, loss='warp')
model1.fit(train_interactions,          
          epochs=20)

<lightfm.lightfm.LightFM at 0x20b16ce4e88>

### evaluation

In [18]:
train_precision = precision_at_k(model1, train_interactions, k=3).mean()
test_precision = precision_at_k(model1, test_interactions,  k=3).mean()
print('train precision_at_k is {}'.format(train_precision))
print('test precision_at_k is {}'.format(test_precision))
print('-------------------------')
train_auc = auc_score(model1, train_interactions).mean()
test_auc = auc_score(model1, test_interactions).mean()
print('train auc is {}'.format(train_auc))
print('test auc is {}'.format(test_auc))
print('-------------------------')
train_MRR = reciprocal_rank(model1, train_interactions).mean()
test_MRR = reciprocal_rank(model1, test_interactions).mean()
print('train MRR is {}'.format(train_MRR))
print('test MRR is {}'.format(test_MRR ))
print('-------------------------')

train precision_at_k is 0.2684977948665619
test precision_at_k is 0.09728506952524185
-------------------------
train auc is 0.9173497557640076
test auc is 0.907015860080719
-------------------------
train MRR is 0.4475768804550171
test MRR is 0.22650407254695892
-------------------------
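
The same evaluation block is repeated for every model that follows. One way to keep it in a single place is a small helper, sketched here. `evaluate` and the `metrics` dictionary are hypothetical names, and each metric is assumed to be a callable that takes `(model, interactions)` and returns one score per user, as the `lightfm.evaluation` functions used above do.

```python
def evaluate(model, train_interactions, test_interactions, metrics):
    """Return {metric_name: (train_mean, test_mean)} and print a short report."""
    results = {}
    for name, metric in metrics.items():
        # Each metric returns per-user scores; report their mean.
        train_score = float(metric(model, train_interactions).mean())
        test_score = float(metric(model, test_interactions).mean())
        results[name] = (train_score, test_score)
        print('train {} is {}'.format(name, train_score))
        print('test {} is {}'.format(name, test_score))
        print('-------------------------')
    return results
```

With the imports from section 1, it could be called as `evaluate(model1, train_interactions, test_interactions, {'precision_at_k': lambda m, x: precision_at_k(m, x, k=3), 'auc': auc_score, 'MRR': reciprocal_rank})`.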


### Hyperparameter tuning

<font size='4'>Available:</font>
 - <font size='4'> learning rate </font>
 - <font size='4'> learning schedule </font>
 - <font size='4'> number of components </font>
 - <font size='4'> loss </font> 
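
The cells in this section try a few combinations by hand. A compact way to enumerate such combinations is sketched here with `itertools.product`. The grid values are examples only (the learning schedules and component counts mirror those tried in this notebook), and each resulting dictionary is meant to be passed to the constructor as `LightFM(**config)`.

```python
from itertools import product

param_grid = {
    'no_components': [10, 20, 30, 40],
    'learning_schedule': ['adagrad', 'adadelta'],
    'loss': ['warp'],
    'learning_rate': [0.01],
}

def iter_configs(grid):
    """Yield one keyword-argument dict per combination of grid values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

configs = list(iter_configs(param_grid))
print(len(configs), configs[0])
```

According to the LightFM documentation, `learning_rate` is the initial rate for the adagrad schedule, while the adadelta schedule is controlled by `rho` and `epsilon`, so varying `learning_rate` is mainly meaningful for the adagrad configurations.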

In [19]:
model1 = LightFM(learning_rate=0.01, loss='warp',learning_schedule='adagrad')
model1.fit(train_interactions,epochs=20)

<lightfm.lightfm.LightFM at 0x20b16cb7d48>

In [20]:
train_precision = precision_at_k(model1, train_interactions, k=3).mean()
test_precision = precision_at_k(model1, test_interactions,  k=3).mean()
print('train precision_at_k is {}'.format(train_precision))
print('test precision_at_k is {}'.format(test_precision))
print('-------------------------')
train_auc = auc_score(model1, train_interactions).mean()
test_auc = auc_score(model1, test_interactions).mean()
print('train auc is {}'.format(train_auc))
print('test auc is {}'.format(test_auc))
print('-------------------------')
train_MRR = reciprocal_rank(model1, train_interactions).mean()
test_MRR = reciprocal_rank(model1, test_interactions).mean()
print('train MRR is {}'.format(train_MRR))
print('test MRR is {}'.format(test_MRR ))
print('-------------------------')

train precision_at_k is 0.26418277621269226
test precision_at_k is 0.09681372344493866
-------------------------
train auc is 0.9167113304138184
test auc is 0.9061096906661987
-------------------------
train MRR is 0.43743377923965454
test MRR is 0.2225571870803833
-------------------------


In [21]:
model2 = LightFM(learning_rate=0.01, loss='warp',learning_schedule='adadelta')
model2.fit(train_interactions,epochs=20)

<lightfm.lightfm.LightFM at 0x20b16cb0448>

In [22]:
train_precision = precision_at_k(model2, train_interactions, k=3).mean()
test_precision = precision_at_k(model2, test_interactions,  k=3).mean()
print('train precision_at_k is {}'.format(train_precision))
print('test precision_at_k is {}'.format(test_precision))
print('-------------------------')
train_auc = auc_score(model2, train_interactions).mean()
test_auc = auc_score(model2, test_interactions).mean()
print('train auc is {}'.format(train_auc))
print('test auc is {}'.format(test_auc))
print('-------------------------')
train_MRR = reciprocal_rank(model2, train_interactions).mean()
test_MRR = reciprocal_rank(model2, test_interactions).mean()
print('train MRR is {}'.format(train_MRR))
print('test MRR is {}'.format(test_MRR ))
print('-------------------------')

train precision_at_k is 0.2668277323246002
test precision_at_k is 0.09943976998329163
-------------------------
train auc is 0.9329794049263
test auc is 0.9203798770904541
-------------------------
train MRR is 0.44007936120033264
test MRR is 0.23454837501049042
-------------------------


In [23]:
model3 = LightFM(no_components=20, learning_rate=0.01, loss='warp',learning_schedule='adadelta')
model3.fit(train_interactions,epochs=20)

<lightfm.lightfm.LightFM at 0x20b17758088>

In [24]:
train_precision = precision_at_k(model3, train_interactions, k=3).mean()
test_precision = precision_at_k(model3, test_interactions,  k=3).mean()
print('train precision_at_k is {}'.format(train_precision))
print('test precision_at_k is {}'.format(test_precision))
print('-------------------------')
train_auc = auc_score(model3, train_interactions).mean()
test_auc = auc_score(model3, test_interactions).mean()
print('train auc is {}'.format(train_auc))
print('test auc is {}'.format(test_auc))
print('-------------------------')
train_MRR = reciprocal_rank(model3, train_interactions).mean()
test_MRR = reciprocal_rank(model3, test_interactions).mean()
print('train MRR is {}'.format(train_MRR))
print('test MRR is {}'.format(test_MRR ))
print('-------------------------')

train precision_at_k is 0.2757968604564667
test precision_at_k is 0.0980122908949852
-------------------------
train auc is 0.9357632398605347
test auc is 0.9199570417404175
-------------------------
train MRR is 0.45170366764068604
test MRR is 0.2321995347738266
-------------------------


In [25]:
model4 = LightFM(no_components=30, learning_rate=0.01, loss='warp',learning_schedule='adadelta')
model4.fit(train_interactions,epochs=20)

<lightfm.lightfm.LightFM at 0x20b1775c088>

In [26]:
train_precision = precision_at_k(model4, train_interactions, k=3).mean()
test_precision = precision_at_k(model4, test_interactions,  k=3).mean()
print('train precision_at_k is {}'.format(train_precision))
print('test precision_at_k is {}'.format(test_precision))
print('-------------------------')
train_auc = auc_score(model4, train_interactions).mean()
test_auc = auc_score(model4, test_interactions).mean()
print('train auc is {}'.format(train_auc))
print('test auc is {}'.format(test_auc))
print('-------------------------')
train_MRR = reciprocal_rank(model4, train_interactions).mean()
test_MRR = reciprocal_rank(model4, test_interactions).mean()
print('train MRR is {}'.format(train_MRR))
print('test MRR is {}'.format(test_MRR ))
print('-------------------------')

train precision_at_k is 0.28776705265045166
test precision_at_k is 0.0950360968708992
-------------------------
train auc is 0.9366611242294312
test auc is 0.9190268516540527
-------------------------
train MRR is 0.46155914664268494
test MRR is 0.2291492521762848
-------------------------


In [27]:
model5 = LightFM(no_components=40, learning_rate=0.01, loss='warp',learning_schedule='adadelta')
model5.fit(train_interactions,epochs=20)

<lightfm.lightfm.LightFM at 0x20b16ce2b48>

In [28]:
train_precision = precision_at_k(model5, train_interactions, k=3).mean()
test_precision = precision_at_k(model5, test_interactions,  k=3).mean()
print('train precision_at_k is {}'.format(train_precision))
print('test precision_at_k is {}'.format(test_precision))
print('-------------------------')
train_auc = auc_score(model5, train_interactions).mean()
test_auc = auc_score(model5, test_interactions).mean()
print('train auc is {}'.format(train_auc))
print('test auc is {}'.format(test_auc))
print('-------------------------')
train_MRR = reciprocal_rank(model5, train_interactions).mean()
test_MRR = reciprocal_rank(model5, test_interactions).mean()
print('train MRR is {}'.format(train_MRR))
print('test MRR is {}'.format(test_MRR ))
print('-------------------------')

train precision_at_k is 0.29342150688171387
test precision_at_k is 0.09401261806488037
-------------------------
train auc is 0.9361026287078857
test auc is 0.9165264964103699
-------------------------
train MRR is 0.4665970504283905
test MRR is 0.22753047943115234
-------------------------


### Evaluation findings
<font size="4">Precision@k measures the proportion of positive items among the k highest-ranked items. Here k=3: on the test set, roughly 10% of the three games recommended to a player are good recommendations.</font>
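
As a worked example of this definition, the sketch below computes precision@3 for a single hypothetical user with NumPy. The scores and the set of positive items are invented; `precision_at_k` from `lightfm.evaluation` follows the same idea per user on the real matrices.

```python
import numpy as np

def precision_at_k_single(scores, positive_items, k=3):
    """Fraction of the k highest-scored items that are in positive_items."""
    top_k = np.argsort(-np.asarray(scores, dtype=float))[:k]
    hits = sum(1 for item in top_k if int(item) in positive_items)
    return hits / k

toy_scores = [0.1, 2.3, 0.7, 1.9, -0.4, 1.2]  # hypothetical scores for 6 games
toy_positives = {3, 4}                         # games the user actually liked
toy_precision = precision_at_k_single(toy_scores, toy_positives, k=3)
print(toy_precision)
```

The three best-scored games are 1, 3 and 5; only game 3 is a positive, so the result is 1/3.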

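<font size="4">AUC measures the quality of the overall ranking: it is the probability that a randomly chosen positive game is ranked above a randomly chosen negative one. Here the test AUC is between 0.91 and 0.92, so roughly nine out of ten such pairs are ordered correctly.</font>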
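
The pairwise reading of AUC can be checked by hand on a toy example. The sketch below counts, for one hypothetical user, the share of (positive, negative) game pairs in which the positive game has the higher score; ties are counted as incorrect, and all values are invented for illustration.

```python
import numpy as np

def auc_single(scores, positive_items):
    """Share of (positive, negative) item pairs where the positive scores higher."""
    scores = np.asarray(scores, dtype=float)
    is_pos = np.zeros(len(scores), dtype=bool)
    is_pos[list(positive_items)] = True
    pos, neg = scores[is_pos], scores[~is_pos]
    # Compare every positive score with every negative score.
    correct = (pos[:, None] > neg[None, :]).sum()
    return float(correct) / (len(pos) * len(neg))

auc_scores = [0.1, 2.3, 0.7, 1.9, -0.4, 1.2]  # hypothetical scores for 6 games
auc_value = auc_single(auc_scores, {3, 4})
print(auc_value)
```

Game 3 beats three of the four negatives and game 4 beats none, so 3 of 8 pairs are ordered correctly.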

### Predictions

In [29]:
games = test['GAME_CODE'].unique()

def get_recommendations(user_number, model):
    '''Return the 3 highest-scored games (title and score) for a USER_ID.'''
    # Map the external USER_ID to the internal USER_CODE used by the model
    user_code = int(train.USER_CODE.loc[train.USER_ID == user_number].iloc[0])
    # Score every game that appears in the test set for this user
    my_preds = model.predict(user_code, games)
    recommendations = []
    for i, pred in enumerate(my_preds):
        gamecode = games[i]
        game_title = train[train.GAME_CODE == gamecode].GAME_TITLE.values[0]
        recommendations.append([game_title, pred])
    res = pd.DataFrame(recommendations, columns=['game_title', 'score'])
    # Keep the three best-scored games
    top3 = res.sort_values(by='score', ascending=False).iloc[:3]
    return top3
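
`get_recommendations` looks up each title by filtering `train` once per game, which is slow for large catalogs. A possible refinement, sketched below with NumPy only, ranks the scores first so that titles need to be resolved for just the top `k` games; it also shows how games the user has already played could be excluded. The function name, the `exclude` argument, and the toy data are assumptions for illustration.

```python
import numpy as np

def top_k_items(item_codes, scores, k=3, exclude=()):
    """Return (item_code, score) pairs for the k best-scored items not in exclude."""
    item_codes = np.asarray(item_codes)
    scores = np.asarray(scores, dtype=float)
    # Drop excluded items (for example, games the user has already played).
    keep = ~np.isin(item_codes, list(exclude))
    item_codes, scores = item_codes[keep], scores[keep]
    order = np.argsort(-scores)[:k]
    return [(int(item_codes[i]), float(scores[i])) for i in order]

toy_codes = [839, 31, 403, 818, 2027]   # toy GAME_CODE values
toy_preds = [1.9, 0.3, 2.4, 1.1, 2.0]   # toy model scores
toy_top = top_k_items(toy_codes, toy_preds, k=3, exclude={403})
print(toy_top)
```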

In [30]:
test['USER_ID'].unique()

array([      5,     181,    1939, ..., 2583538, 2583547, 2583705],
      dtype=int64)

In [31]:
get_recommendations(5,model1)

Unnamed: 0,game_title,score
783,Wolf Treasure,2.030239
47,Fire Lightning,1.908777
34,Wolf Gold,1.896477


In [32]:
get_recommendations(181,model1)

Unnamed: 0,game_title,score
34,Wolf Gold,2.253771
47,Fire Lightning,2.241542
14,Aztec Magic Deluxe,2.099932


In [33]:
get_recommendations(1939,model1)

Unnamed: 0,game_title,score
47,Fire Lightning,2.497816
14,Aztec Magic Deluxe,2.399504
97,Platinum Lightning,2.248054


In [34]:
get_recommendations(2583538,model1)

Unnamed: 0,game_title,score
34,Wolf Gold,2.030888
47,Fire Lightning,1.948727
783,Wolf Treasure,1.900203


In [35]:
get_recommendations(2583547,model1)

Unnamed: 0,game_title,score
47,Fire Lightning,1.985997
783,Wolf Treasure,1.985149
34,Wolf Gold,1.976935


In [36]:
# Predictions for the whole test set
#for user_id in test['USER_ID'].unique():
#    get_recommendations(user_id, model1)

# 5. Preparation for Hybrid

### Adding User features

In [37]:
def unify(data):
    '''
    Converts all values of a column to binary 0/1:
    1 for any positive value and 0 otherwise.
    Note that this function is designed only for
    non-negative values, which is sufficient in our example.
    
    Parameters
    ----------
    data : Series
        One column of the user (or item) feature table.
            
    Returns
    -------
    data : Series
        Column with binary values only.    
    
    '''
    return data.apply(lambda item: 1 if item > 0 else 0)

In [38]:
# Train USER_FEATURES
user_features = train[['USER_CODE','GAME_TYPE']].rename(columns={'GAME_TYPE':'Feature'})
user_features = pd.crosstab(user_features.USER_CODE,user_features.Feature)
user_features = user_features.apply(unify,axis=0)

In [39]:
user_features

Feature,card,casual,craps,lottery,poker,roulette,slots,video_poker
USER_CODE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,0,0,0,0,0,0,1,0
1,1,1,0,0,0,1,1,0
2,0,0,0,0,0,0,1,0
3,0,0,0,0,0,0,1,0
4,0,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...
39321,1,0,0,0,0,0,1,0
39322,1,0,0,0,0,1,1,0
39323,0,0,0,0,0,0,1,0
39324,0,0,0,0,0,0,1,0


### Adding item features

In [40]:
item_features = train[['GAME_CODE','GAME_TYPE']].rename(columns={'GAME_TYPE':'Feature'})
item_features = pd.crosstab(item_features.GAME_CODE,item_features.Feature)
item_features = item_features.apply(unify,axis=0)
item_features_matrix = sparse.csr_matrix(item_features.values)
item_features

Feature,card,casual,craps,lottery,poker,roulette,slots,video_poker
GAME_CODE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,0,0,0,0,0,0,1,0
1,0,0,0,0,0,0,1,0
2,0,0,0,0,0,0,1,0
3,0,0,0,0,0,0,1,0
4,0,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...
2564,0,0,0,0,0,0,1,0
2565,0,0,0,0,0,0,1,0
2566,0,0,0,0,0,0,1,0
2567,0,0,0,0,0,0,1,0


<div class="alert alert-danger alertinfo" style="margin-top: 0px">
<h1>  Important! </h1>
<h3>  When supplying feature matrices, an implicit identity feature
    matrix will no longer be used. This may result in a less expressive model:
    because no per-user features are estimated, the model may underfit. To
    combat this, we must include per-user (per-item) features (that is, an identity
    matrix) as part of the feature matrix we supply. </h3>    
</div>   
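
A minimal sketch of what the note above describes: the metadata columns are stacked next to an identity matrix so that every user (or item) keeps its own feature. The toy `game_type_flags` matrix and the helper name `with_identity` are made up for illustration; `scipy.sparse.identity` and `scipy.sparse.hstack` are the only library calls used.

```python
import numpy as np
import scipy.sparse as sparse

def with_identity(features):
    """Prepend an identity block to an (n_entities x n_features) sparse matrix."""
    features = sparse.csr_matrix(features, dtype=float)
    identity = sparse.identity(features.shape[0], dtype=float, format='csr')
    return sparse.hstack([identity, features], format='csr')

# 3 users x 2 binary game-type flags (toy data)
game_type_flags = np.array([[1, 0],
                            [1, 1],
                            [0, 1]])
user_feature_matrix = with_identity(game_type_flags)
print(user_feature_matrix.shape)
print(user_feature_matrix.toarray())
```

The result has one row per user and `n_users + n_features` columns: the first block gives each user an individual embedding, the second block carries the shared metadata.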

### Creating item-user interaction matrix

In [41]:
# Training set ITEM_USER
item_user_df = sparse.csr_matrix((train['FEEDBACK'].astype(float), (train['GAME_CODE'], train['USER_CODE'])))
item_user_df = item_user_df.toarray()
item_user_df = pd.DataFrame(data=item_user_df)
item_user_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,39316,39317,39318,39319,39320,39321,39322,39323,39324,39325
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2564,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2565,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2566,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2567,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Merging features with interactions

In [42]:
# Train USER_FEATURES Merge with user-item (train interactions)
user_features = pd.concat([user_features, train_interactions_df], axis=1, sort=False)
user_features

Unnamed: 0,card,casual,craps,lottery,poker,roulette,slots,video_poker,0,1,...,2559,2560,2561,2562,2563,2564,2565,2566,2567,2568
0,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,1,0,0,0,1,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
39321,1,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
39322,1,0,0,0,0,1,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
39323,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
39324,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [43]:
# Train ITEM_FEATURES Merge with item-user
item_features = pd.concat([item_features, item_user_df], axis=1, sort=False)
item_features

Unnamed: 0,card,casual,craps,lottery,poker,roulette,slots,video_poker,0,1,...,39316,39317,39318,39319,39320,39321,39322,39323,39324,39325
0,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2564,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2565,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2566,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2567,0,0,0,0,0,0,1,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [44]:
# Back to sparse matrix and normalize
user_features = sparse.csr_matrix(user_features.values)
scaler3 = MaxAbsScaler().fit(user_features)
user_features = scaler3.transform(user_features)

item_features = sparse.csr_matrix(item_features.values)
scaler4 = MaxAbsScaler().fit(item_features)
item_features = scaler4.transform(item_features)

In [45]:
user_features

<39326x2577 sparse matrix of type '<class 'numpy.float64'>'
	with 810644 stored elements in Compressed Sparse Row format>

In [46]:
item_features

<2569x39334 sparse matrix of type '<class 'numpy.float64'>'
	with 757908 stored elements in Compressed Sparse Row format>

# 6. Model 2 Hybrid LightFM

In [47]:
model6 = LightFM(learning_rate=0.05, loss='warp')
model6.fit(train_interactions,
           user_features,
           item_features,
           epochs=10)

<lightfm.lightfm.LightFM at 0x20b17754b88>

In [48]:
train_interactions.shape

(39326, 2569)

In [49]:
user_features.shape

(39326, 2577)

In [50]:
item_features.shape

(2569, 39334)

In [None]:
#train_auc = auc_score(model6, train_interactions, user_features=user_features, item_features=item_features, check_intersections=False).mean()
#test_auc = auc_score(model6, test_interactions, user_features=user_features, item_features=item_features).mean()

In [52]:
get_recommendations(5,model6)

Unnamed: 0,game_title,score
27,Mr. Vegas,132109.8125
223,Princess Of Swamp,53448.289062
774,Secret of the Stones Touch,48048.535156


In [53]:
get_recommendations(181,model6)

Unnamed: 0,game_title,score
27,Mr. Vegas,1977468.0
223,Princess Of Swamp,779169.3
774,Secret of the Stones Touch,710044.6


In [54]:
get_recommendations(1939,model6)

Unnamed: 0,game_title,score
27,Mr. Vegas,40069.511719
223,Princess Of Swamp,18890.115234
774,Secret of the Stones Touch,16763.443359
