# Anime Recommender 04: Modelling and Conclusion

## LightFM model with user features and item features

Six candidate definitions of user features were identified; the one that yields the best model scores will be selected.
- User feature type 1: Genres that the user likes (defined as genres of animes that the user has rated equal to or greater than their average rating)
- User feature type 2: Genres that the user likes (defined as genres of animes that the user has rated equal to or greater than their 75th percentile rating)
- User feature type 3: Top 3 genres among the genres that the user likes (defined as top 3 genres of animes that the user has rated equal to or greater than their average rating)
- User feature type 4: Top 5 genres among the genres that the user likes (defined as top 5 genres of animes that the user has rated equal to or greater than their average rating)
- User feature type 5: Top 3 genres among the genres that the user likes (defined as top 3 genres of animes that the user has rated equal to or greater than their 75th percentile rating)
- User feature type 6: Top 5 genres among the genres that the user likes (defined as top 5 genres of animes that the user has rated equal to or greater than their 75th percentile rating)
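
The six variants differ only in the rating threshold (the user's mean or 75th percentile rating) and in whether the liked genres are truncated to the top N. The sketch below shows one way such features could be derived for a single user. The function name `liked_genres`, its parameters, and the toy ratings are assumptions for illustration, not the feature-engineering code that produced `users_db_feateng.pkl`; ranking genres by how often they occur among the liked animes is likewise an assumption.

```python
from collections import Counter

import numpy as np


def liked_genres(ratings, genres, threshold="avg", top_n=None):
    """Return the genres of animes rated at or above the user's threshold.

    ratings   : list of one user's ratings
    genres    : list of genre lists, aligned with ratings
    threshold : "avg" (user mean) or "75pct" (user 75th percentile)
    top_n     : keep only the N most frequent liked genres (None keeps all)
    """
    cutoff = np.mean(ratings) if threshold == "avg" else np.percentile(ratings, 75)
    counts = Counter(
        g for r, gs in zip(ratings, genres) if r >= cutoff for g in gs
    )
    if top_n is None:
        return sorted(counts)
    # Break ties alphabetically so the result is deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(g for g, _ in ranked[:top_n])


# Toy user with four rated animes (hypothetical data)
ratings = [90, 80, 60, 50]
genres = [
    ["Action", "Drama"],
    ["Action", "Comedy"],
    ["Romance"],
    ["Horror"],
]

type1 = liked_genres(ratings, genres, "avg")           # cutoff is the mean, 70
type2 = liked_genres(ratings, genres, "75pct")         # cutoff is 82.5
type3 = liked_genres(ratings, genres, "avg", top_n=1)  # most frequent liked genre
print(type1, type2, type3)
```

The stricter 75th percentile cutoff keeps fewer animes, so the resulting genre list is a subset of the one produced by the mean cutoff.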

## Imports

In [2]:
import time
import itertools
import pandas as pd
import numpy as np
import pickle

from lightfm import LightFM
from lightfm.data import Dataset
from lightfm import cross_validation
from lightfm.evaluation import precision_at_k as lightfm_prec_at_k
from lightfm.evaluation import recall_at_k as lightfm_recall_at_k, auc_score
from lightfm.evaluation import reciprocal_rank as lightfm_reciprocal_rank
from recommenders.models.lightfm.lightfm_utils import similar_items
from scipy import sparse
from tqdm import tqdm
from tqdm.auto import trange



In [4]:
# default number of recommendations
K = 10
# percentage of data used for testing
TEST_PERCENTAGE = 0.25
# number of threads used to fit the model
NO_THREADS = 32
# seed for pseudorandom number generation
SEEDNO = 42

In [5]:
data = pd.read_pickle("../data/users_db_feateng.pkl")
print(data.shape)
data.head()

(912234, 10)


Unnamed: 0,user_id,media_id,rating,genres,liked_genres_avg,liked_genres_75pct,top_3_genres_avg,top_5_genres_avg,top_3_genres_75pct,top_5_genres_75pct
0,710080,1535,70,"[Mystery, Psychological, Supernatural, Thriller]","[Action, Adventure, Comedy, Drama, Ecchi, Fant...","[Action, Adventure, Comedy, Drama, Ecchi, Fant...","[Action, Comedy, Drama]","[Action, Comedy, Drama, Fantasy, Supernatural]","[Action, Comedy, Drama]","[Action, Comedy, Drama, Fantasy, Slice of Life]"
1,710080,21459,70,"[Action, Adventure, Comedy]","[Action, Adventure, Comedy, Drama, Ecchi, Fant...","[Action, Adventure, Comedy, Drama, Ecchi, Fant...","[Action, Comedy, Drama]","[Action, Comedy, Drama, Fantasy, Supernatural]","[Action, Comedy, Drama]","[Action, Comedy, Drama, Fantasy, Slice of Life]"
2,710080,113415,70,"[Action, Drama, Supernatural]","[Action, Adventure, Comedy, Drama, Ecchi, Fant...","[Action, Adventure, Comedy, Drama, Ecchi, Fant...","[Action, Comedy, Drama]","[Action, Comedy, Drama, Fantasy, Supernatural]","[Action, Comedy, Drama]","[Action, Comedy, Drama, Fantasy, Slice of Life]"
3,710080,11757,70,"[Action, Adventure, Fantasy, Romance]","[Action, Adventure, Comedy, Drama, Ecchi, Fant...","[Action, Adventure, Comedy, Drama, Ecchi, Fant...","[Action, Comedy, Drama]","[Action, Comedy, Drama, Fantasy, Supernatural]","[Action, Comedy, Drama]","[Action, Comedy, Drama, Fantasy, Slice of Life]"
4,710080,5114,70,"[Action, Adventure, Drama, Fantasy]","[Action, Adventure, Comedy, Drama, Ecchi, Fant...","[Action, Adventure, Comedy, Drama, Ecchi, Fant...","[Action, Comedy, Drama]","[Action, Comedy, Drama, Fantasy, Supernatural]","[Action, Comedy, Drama]","[Action, Comedy, Drama, Fantasy, Slice of Life]"


In [21]:
anime_id_map = pd.read_pickle("../data/anime_id_mapping.pkl")
print(anime_id_map.shape)
anime_id_map.head()

(6195, 4)


Unnamed: 0,id,title_romaji,title_english,genres
0,1,Cowboy Bebop,Cowboy Bebop,"[Action, Adventure, Drama, Sci-Fi]"
1,5,Cowboy Bebop: Tengoku no Tobira,Cowboy Bebop: The Movie - Knockin' on Heaven's...,"[Action, Drama, Mystery, Sci-Fi]"
2,6,TRIGUN,Trigun,"[Action, Adventure, Comedy, Drama, Sci-Fi]"
3,7,Witch Hunter ROBIN,Witch Hunter ROBIN,"[Action, Drama, Mystery, Supernatural]"
4,8,Bouken Ou Beet,Beet the Vandel Buster,"[Adventure, Fantasy, Supernatural]"


In [22]:
anime_id_map.rename(columns={'id':'media_id'},inplace=True)

In [8]:
def format_newuser_input(user_feature_map, user_feature_list):
    """Build a 1 x n_features sparse row for a user who is not in the training data.

    user_feature_map maps feature name -> column index (from Dataset.mapping());
    user_feature_list holds the genres the new user likes.
    """
    normalised_val = 1.0
    target_indices = []
    for feature in user_feature_list:
        try:
            target_indices.append(user_feature_map[feature])
        except KeyError:
            # Features never seen during fitting have no column, so skip them
            print(f"new user feature encountered '{feature}'")
    new_user_features = np.zeros(len(user_feature_map))
    for i in target_indices:
        new_user_features[i] = normalised_val
    new_user_features = sparse.csr_matrix(new_user_features)
    return new_user_features
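
As a usage sketch, the same one-row feature matrix can also be built directly in sparse form, without first allocating a dense vector. The helper name `new_user_row` and the toy `feature_map` below are hypothetical; in the notebook the mapping comes from `Dataset.mapping()` and, with LightFM's default settings, also contains one identity feature per known user, which a new user leaves at zero.

```python
import numpy as np
from scipy import sparse


def new_user_row(feature_map, liked):
    """Build a 1 x n_features CSR row with 1.0 at each known liked feature."""
    # Unknown features have no column in the mapping and are skipped
    cols = [feature_map[f] for f in liked if f in feature_map]
    data = np.ones(len(cols), dtype=np.float32)
    rows = np.zeros(len(cols), dtype=np.int32)
    return sparse.csr_matrix((data, (rows, cols)), shape=(1, len(feature_map)))


# Hypothetical mapping: two identity features for known users, then genres
feature_map = {710080: 0, 710081: 1, "Comedy": 2, "Drama": 3, "Romance": 4}

row = new_user_row(feature_map, ["Comedy", "Romance", "Unknown genre"])
print(row.toarray())
```

Only the genre columns are set, so the prediction for a new user relies entirely on the genre embeddings learned from other users.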

## Baseline model

In [6]:
dataset_baseline = Dataset()

In [7]:
dataset_baseline.fit(data['user_id'],data['media_id'])

In [8]:
(interactions_baseline, weights_baseline) = dataset_baseline.build_interactions(data.iloc[:, 0:3].values)

In [9]:
train_interactions_baseline, test_interactions_baseline = cross_validation.random_train_test_split(interactions_baseline, 
                                                                                 test_percentage=TEST_PERCENTAGE, 
                                                                                 random_state = np.random.RandomState(SEEDNO))

In [14]:
model_baseline = LightFM(loss='warp', no_components=10,
                         learning_rate=.25,
                         item_alpha=1e-6,
                         random_state=np.random.RandomState(SEEDNO))

In [15]:
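# Note: epochs is left at its default of 1 here, while the feature models below train for 20 epochs
model_baseline.fit(interactions=train_interactions_baseline);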

In [16]:
train_roc_auc_baseline = auc_score(model_baseline,train_interactions_baseline).mean()
test_roc_auc_baseline = auc_score(model_baseline,test_interactions_baseline).mean()
train_precision_lfm_baseline = lightfm_prec_at_k(model_baseline, train_interactions_baseline, k=K).mean()
test_precision_lfm_baseline = lightfm_prec_at_k(model_baseline, test_interactions_baseline, k=K).mean()
train_recall_lfm_baseline = lightfm_recall_at_k(model_baseline, train_interactions_baseline, k=K).mean()
test_recall_lfm_baseline = lightfm_recall_at_k(model_baseline, test_interactions_baseline, k=K).mean()
print("Baseline model")
print("=====AUC=====")
print(f'Train AUC: {train_roc_auc_baseline}')
print(f'Test AUC: {test_roc_auc_baseline}')
print("=====Precision@K=====")
print(f'Train Precision: {train_precision_lfm_baseline}')
print(f'Test Precision: {test_precision_lfm_baseline}')
print("=====Recall@K=====")
print(f'Train Recall: {train_recall_lfm_baseline}')
print(f'Test Recall: {test_recall_lfm_baseline}')

Baseline model
=====AUC=====
Train AUC: 0.5814570784568787
Test AUC: 0.5537643432617188
=====Precision@K=====
Train Precision: 0.0014272527769207954
Test Precision: 0.00034896068973466754
=====Recall@K=====
Train Recall: 0.0012337842748706763
Test Recall: 0.0009256970245295735
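
For reference, precision@K is the fraction of the top K ranked items that are held-out positives, and recall@K is the fraction of a user's held-out positives that appear in the top K. The sketch below computes both for one user with NumPy. The scores and positives are made up, and the helper `precision_recall_at_k` is illustrative rather than LightFM's implementation.

```python
import numpy as np


def precision_recall_at_k(scores, positives, k):
    """Precision@k and recall@k for one user.

    scores    : predicted score per item (higher is better)
    positives : set of item indices the user actually interacted with
    """
    top_k = np.argsort(-np.asarray(scores))[:k]
    hits = sum(1 for i in top_k if i in positives)
    return hits / k, hits / len(positives)


# Hypothetical scores for 6 items; items 1 and 4 are the held-out positives
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
precision, recall = precision_recall_at_k(scores, {1, 4}, k=3)
print(precision, recall)
```

Here the top 3 items are 1, 3 and 5, of which only item 1 is a positive, so precision@3 is 1/3 and recall@3 is 1/2. The notebook reports the mean of these per-user values.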


## User feature type 1: Genres that the user likes (defined as genres of animes that the user has rated equal to or greater than their average rating)

In [4]:
data1 = data[['user_id','media_id','rating','genres','liked_genres_avg']]
data1.head()

Unnamed: 0,user_id,media_id,rating,genres,liked_genres_avg
0,710080,1535,70,"[Mystery, Psychological, Supernatural, Thriller]","[Action, Adventure, Comedy, Drama, Ecchi, Fant..."
1,710080,21459,70,"[Action, Adventure, Comedy]","[Action, Adventure, Comedy, Drama, Ecchi, Fant..."
2,710080,113415,70,"[Action, Drama, Supernatural]","[Action, Adventure, Comedy, Drama, Ecchi, Fant..."
3,710080,11757,70,"[Action, Adventure, Fantasy, Romance]","[Action, Adventure, Comedy, Drama, Ecchi, Fant..."
4,710080,5114,70,"[Action, Adventure, Drama, Fantasy]","[Action, Adventure, Comedy, Drama, Ecchi, Fant..."


In [5]:
all_anime_genres1 = sorted(list(set(itertools.chain.from_iterable(data1['genres']))))

In [6]:
all_liked_genres1 = sorted(list(set(itertools.chain.from_iterable(data1['liked_genres_avg']))))

In [7]:
dataset1 = Dataset()

In [8]:
dataset1.fit(data1['user_id'],data1['media_id'],user_features=all_liked_genres1,item_features=all_anime_genres1)

In [9]:
item_features1 = dataset1.build_item_features((x,y) for x,y in zip(data1['media_id'],data1['genres']))

In [10]:
user_features1 = dataset1.build_user_features((x,y) for x,y in zip(data1['user_id'],data1['liked_genres_avg']))

In [11]:
(interactions1, weights1) = dataset1.build_interactions(data1.iloc[:, 0:3].values)

In [12]:
train_interactions1, test_interactions1 = cross_validation.random_train_test_split(interactions1, 
                                                                                 test_percentage=TEST_PERCENTAGE, 
                                                                                 random_state = np.random.RandomState(SEEDNO))

In [13]:
model1 = LightFM(loss='warp', no_components=10,
                 learning_rate=.25,
                 item_alpha=1e-6,
                 random_state=np.random.RandomState(SEEDNO))

In [14]:
model1.fit(interactions=train_interactions1,item_features=item_features1,user_features=user_features1,epochs=20);

In [15]:
train_roc_auc1 = auc_score(model1,train_interactions1,item_features=item_features1,user_features=user_features1).mean()
test_roc_auc1 = auc_score(model1,test_interactions1,item_features=item_features1,user_features=user_features1).mean()
train_precision_lfm1 = lightfm_prec_at_k(model1, train_interactions1,item_features=item_features1,user_features=user_features1, k=K).mean()
test_precision_lfm1 = lightfm_prec_at_k(model1, test_interactions1,item_features=item_features1,user_features=user_features1, k=K).mean()
train_recall_lfm1 = lightfm_recall_at_k(model1, train_interactions1,item_features=item_features1,user_features=user_features1, k=K).mean()
test_recall_lfm1 = lightfm_recall_at_k(model1, test_interactions1,item_features=item_features1,user_features=user_features1, k=K).mean()
print("User feature type 1: Genres that the user likes (defined as genres of animes that the user has rated equal to or greater than their average rating)")
print("=====AUC=====")
print(f'Train AUC: {train_roc_auc1}')
print(f'Test AUC: {test_roc_auc1}')
print("=====Precision@K=====")
print(f'Train Precision: {train_precision_lfm1}')
print(f'Test Precision: {test_precision_lfm1}')
print("=====Recall@K=====")
print(f'Train Recall: {train_recall_lfm1}')
print(f'Test Recall: {test_recall_lfm1}')

User feature type 1: Genres that the user likes (defined as genres of animes that the user has rated equal to or greater than their average rating)
=====AUC=====
Train AUC: 0.9531480073928833
Test AUC: 0.9488025307655334
=====Precision@K=====
Train Precision: 0.2186678946018219
Test Precision: 0.07796996086835861
=====Recall@K=====
Train Recall: 0.07146577015407743
Test Recall: 0.06673994254823816


In [16]:
uid_map1, ufeature_map1, iid_map1, ifeature_map1 = dataset1.mapping()

In [46]:
iid_map1_reverse = dict((value,key) for key,value in iid_map1.items())

### Predict for new user

In [19]:
# predict for new user
user_feature_list = ['Comedy','Romance','Slice of Life']

In [20]:
new_user_features = format_newuser_input(ufeature_map1, user_feature_list)

In [21]:
n_users1, n_items1 = interactions1.shape

In [22]:
# 0 selects the first row of new_user_features; item_features is not passed, so the genre item features used in training are likely ignored here
model1_pred = model1.predict(0, np.arange(n_items1), user_features=new_user_features)

In [28]:
pred_dict1 = {'media_id':data1['media_id'].unique(),'pred':model1_pred}

In [331]:
pred_df1 = pd.DataFrame(pred_dict1)

In [332]:
pred_df1 = pd.merge(pred_df1,anime_id_map,on='media_id')

In [37]:
pred_df1.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
58,20923,5475.54248,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
221,20954,5404.202148,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
101,21087,3992.224854,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
199,21519,3892.060059,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
52,20665,3788.651367,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
11,19815,3738.680176,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
3,11757,3112.205566,Sword Art Online,Sword Art Online,"[20021, 20594, 124140]","[Action, Adventure, Fantasy, Romance]"
18,20594,2958.083252,Sword Art Online II,Sword Art Online II,"[20021, 21403, 100183, 11757]","[Action, Adventure, Fantasy, Sci-Fi]"
104,21202,2848.407227,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
228,11061,2482.854248,HUNTER×HUNTER (2011),Hunter x Hunter (2011),"[13271, 136, 137, 138, 139, 19951]","[Action, Adventure, Fantasy]"


### Predict for existing user

In [39]:
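user_x = uid_map1[710080]
# Neither user_features nor item_features is passed, so this likely scores with the per-user and per-item identity embeddings only
model1_pred_existing = model1.predict(user_x, np.arange(n_items1))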

In [40]:
pred_dict1_existing = {'media_id':data1['media_id'].unique(),'pred':model1_pred_existing}

In [41]:
pred_df1_existing = pd.DataFrame(pred_dict1_existing)

In [42]:
pred_df1_existing = pd.merge(pred_df1_existing,anime_id_map,on='media_id')

In [43]:
pred_df1_existing.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
58,20923,8.502268,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
221,20954,6.655103,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
199,21519,5.462017,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
454,21518,5.308857,Shokugeki no Souma: Ni no Sara,Food Wars! The Second Plate,"[20923, 98702, 99255]","[Comedy, Ecchi]"
52,20665,5.293135,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
101,21087,5.286254,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
11,19815,5.257683,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
23,124080,4.928625,Horimiya,Horimiya,[14753],"[Comedy, Romance, Slice of Life]"
104,21202,4.909211,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
18,20594,4.753656,Sword Art Online II,Sword Art Online II,"[20021, 21403, 100183, 11757]","[Action, Adventure, Fantasy, Sci-Fi]"


### Predict similar items

In [59]:
# Find items similar to Yahari Ore no Seishun Love Come wa Machigatteiru. (media_id 14813)
# Genres: Comedy, Drama, Romance, Slice of Life
item_x = iid_map1[14813]

In [117]:
similar_pred1 = similar_items(item_id=item_x, item_features=item_features1, model=model1, N = 100)
similar_pred1['media_id'] = similar_pred1['itemID'].map(iid_map1_reverse)

In [118]:
similar_pred1 = pd.merge(similar_pred1,anime_id_map,on='media_id')

In [119]:
similar_pred1.sort_values('score',ascending=False).head(10)

Unnamed: 0,itemID,score,media_id,title_romaji,title_english,relations,genres
0,54,1.0,4224,Toradora!,Toradora!,"[6127, 11553]","[Comedy, Drama, Romance, Slice of Life]"
4,118,1.0,6045,Kimi ni Todoke,Kimi ni Todoke: From Me to You,[9656],"[Comedy, Drama, Romance, Slice of Life]"
6,721,1.0,18671,Chuunibyou demo Koi ga Shitai! Ren,"Love, Chunibyo & Other Delusions - Heart Throb -","[16934, 20582, 20777, 20889]","[Comedy, Drama, Romance, Slice of Life]"
5,1134,1.0,109261,5-Toubun no Hanayome ∬,The Quintessential Quintuplets 2,"[103572, 131520]","[Comedy, Drama, Romance, Slice of Life]"
1,63,1.0,14813,Yahari Ore no Seishun Love Come wa Machigatteiru.,My Teen Romantic Comedy SNAFU,"[18753, 20698]","[Comedy, Drama, Romance, Slice of Life]"
3,765,1.0,108489,Yahari Ore no Seishun Love Come wa Machigattei...,My Teen Romantic Comedy SNAFU Climax!,"[20698, 128643]","[Comedy, Drama, Romance, Slice of Life]"
2,65,1.0,14741,Chuunibyou demo Koi ga Shitai!,"Love, Chunibyo & Other Delusions","[16934, 15687, 15879, 19021]","[Comedy, Drama, Romance, Slice of Life]"
10,451,1.0,853,Ouran Koukou Host Club,Ouran High School Host Club,,"[Comedy, Drama, Romance, Slice of Life]"
12,727,1.0,97766,Gamers!,GAMERS!,,"[Comedy, Drama, Romance, Slice of Life]"
11,224,1.0,103572,5-Toubun no Hanayome,The Quintessential Quintuplets,[109261],"[Comedy, Drama, Romance, Slice of Life]"


### Preliminary analysis for user feature type 1

The top 10 predictions for the new user and the existing user are almost the same (8 of 10 titles are shared), although some titles appear in a different order. The model may simply be recommending highly rated anime: most of the top 10 predictions are also among the most popular titles identified in the EDA of the dataset.
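
One quick way to quantify how close the two lists are is to count the shared titles. The sketch below uses the `media_id` values from the new-user and existing-user top-10 tables above; the variable names are illustrative.

```python
# media_id values copied from the new-user and existing-user top-10 tables
new_user_top10 = [20923, 20954, 21087, 21519, 20665, 19815, 11757, 20594, 21202, 11061]
existing_top10 = [20923, 20954, 21519, 21518, 20665, 21087, 19815, 124080, 21202, 20594]

shared = set(new_user_top10) & set(existing_top10)
only_new = set(new_user_top10) - shared
only_existing = set(existing_top10) - shared

print(f"shared: {len(shared)} of 10")
print(f"only in new-user list: {sorted(only_new)}")
print(f"only in existing-user list: {sorted(only_existing)}")
```

A high overlap regardless of the input genres is consistent with the concern that the model leans towards popular titles.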

The similar-item predictions look very good: every title in the top 10 has the same genres as the query title (Comedy, Drama, Romance, Slice of Life).
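
Item-to-item similarity of this kind is typically computed as the cosine similarity between item representation vectors. The sketch below illustrates the idea on made-up 2-D vectors; it is not the `similar_items` implementation from `recommenders`, and the array values and the helper `cosine_scores` are hypothetical.

```python
import numpy as np

# Hypothetical item representations (one row per item)
item_vectors = np.array([
    [1.0, 0.0],   # item 0: the query item
    [2.0, 0.0],   # item 1: same direction as item 0
    [1.0, 1.0],   # item 2: 45 degrees away
    [0.0, 3.0],   # item 3: orthogonal
])


def cosine_scores(vectors, query_idx):
    """Cosine similarity between one item and every item."""
    norms = np.linalg.norm(vectors, axis=1)
    return vectors @ vectors[query_idx] / (norms * norms[query_idx])


scores = cosine_scores(item_vectors, query_idx=0)
ranking = np.argsort(-scores)  # most similar first
print(scores.round(3), ranking)
```

A score of 1.0 means the two vectors point in the same direction, not that the items are identical, and the query item always scores 1.0 against itself.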

## User feature type 2: Genres that the user likes (defined as genres of animes that the user has rated equal to or greater than their 75th percentile rating)

In [63]:
data2 = data[['user_id','media_id','rating','genres','liked_genres_75pct']]

In [64]:
all_anime_genres2 = sorted(list(set(itertools.chain.from_iterable(data2['genres']))))

In [65]:
all_liked_genres2 = sorted(list(set(itertools.chain.from_iterable(data2['liked_genres_75pct']))))

In [66]:
dataset2 = Dataset()

In [67]:
dataset2.fit(data2['user_id'],data2['media_id'],user_features=all_liked_genres2,item_features=all_anime_genres2)

In [68]:
item_features2 = dataset2.build_item_features((x,y) for x,y in zip(data2['media_id'],data2['genres']))

In [74]:
user_features2 = dataset2.build_user_features((x,y) for x,y in zip(data2['user_id'],data2['liked_genres_75pct']))

In [75]:
(interactions2, weights2) = dataset2.build_interactions(data2.iloc[:, 0:3].values)

In [76]:
train_interactions2, test_interactions2 = cross_validation.random_train_test_split(interactions2, 
                                                                                 test_percentage=TEST_PERCENTAGE, 
                                                                                 random_state = np.random.RandomState(SEEDNO))

In [77]:
model2 = LightFM(loss='warp', no_components=10,
                 learning_rate=.25,
                 item_alpha=1e-6,
                 random_state=np.random.RandomState(SEEDNO))

In [78]:
model2.fit(interactions=train_interactions2,item_features=item_features2,user_features=user_features2,epochs=20);

In [123]:
train_roc_auc2 = auc_score(model2,train_interactions2,item_features=item_features2,user_features=user_features2).mean()
test_roc_auc2 = auc_score(model2,test_interactions2,item_features=item_features2,user_features=user_features2).mean()
train_precision_lfm2 = lightfm_prec_at_k(model2, train_interactions2,item_features=item_features2,user_features=user_features2, k=K).mean()
test_precision_lfm2 = lightfm_prec_at_k(model2, test_interactions2,item_features=item_features2,user_features=user_features2, k=K).mean()
train_recall_lfm2 = lightfm_recall_at_k(model2, train_interactions2,item_features=item_features2,user_features=user_features2, k=K).mean()
test_recall_lfm2 = lightfm_recall_at_k(model2, test_interactions2,item_features=item_features2,user_features=user_features2, k=K).mean()
print("User feature type 2: Genres that the user likes")
print("(defined as genres of animes that the user has rated equal to or greater than their 75th percentile rating)")
print("=====AUC=====")
print(f'Train AUC: {train_roc_auc2}')
print(f'Test AUC: {test_roc_auc2}')
print("=====Precision@K=====")
print(f'Train Precision: {train_precision_lfm2}')
print(f'Test Precision: {test_precision_lfm2}')
print("=====Recall@K=====")
print(f'Train Recall: {train_recall_lfm2}')
print(f'Test Recall: {test_recall_lfm2}')

User feature type 2: Genres that the user likes
(defined as genres of animes that the user has rated equal to or greater than their 75th percentile rating)
=====AUC=====
Train AUC: 0.9528266787528992
Test AUC: 0.9484575986862183
=====Precision@K=====
Train Precision: 0.22304196655750275
Test Precision: 0.07997269928455353
=====Recall@K=====
Train Recall: 0.07665977761628706
Test Recall: 0.07269271179681663


In [80]:
uid_map2, ufeature_map2, iid_map2, ifeature_map2 = dataset2.mapping()

In [81]:
iid_map2_reverse = dict((value,key) for key,value in iid_map2.items())

### Predict for new user

In [82]:
# predict for new user
user_feature_list = ['Comedy','Romance','Slice of Life']

In [83]:
new_user_features = format_newuser_input(ufeature_map2, user_feature_list)

In [84]:
n_users2, n_items2 = interactions2.shape

In [85]:
model2_pred = model2.predict(0, np.arange(n_items2), user_features=new_user_features) # Here 0 means pick the first row of the user_features sparse matrix

In [86]:
pred_dict2 = {'media_id':data2['media_id'].unique(),'pred':model2_pred}

In [329]:
pred_df2 = pd.DataFrame(pred_dict2)

In [330]:
pred_df2 = pd.merge(pred_df2,anime_id_map,on='media_id')

In [89]:
pred_df2.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
58,20923,5480.367676,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
221,20954,5147.679688,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
199,21519,3706.275879,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
101,21087,3647.330566,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
52,20665,3589.352783,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
11,19815,3537.028076,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
3,11757,2887.373047,Sword Art Online,Sword Art Online,"[20021, 20594, 124140]","[Action, Adventure, Fantasy, Romance]"
18,20594,2771.747559,Sword Art Online II,Sword Art Online II,"[20021, 21403, 100183, 11757]","[Action, Adventure, Fantasy, Sci-Fi]"
104,21202,2555.984619,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
228,11061,2314.376953,HUNTER×HUNTER (2011),Hunter x Hunter (2011),"[13271, 136, 137, 138, 139, 19951]","[Action, Adventure, Fantasy]"


### Predict for existing user

In [128]:
user_x = uid_map2[710080]
model2_pred_existing = model2.predict(user_x,np.arange(n_items2))

In [129]:
pred_dict2_existing = {'media_id':data2['media_id'].unique(),'pred':model2_pred_existing}

In [130]:
pred_df2_existing = pd.DataFrame(pred_dict2_existing)

In [131]:
pred_df2_existing = pd.merge(pred_df2_existing,anime_id_map,on='media_id')

In [132]:
pred_df2_existing.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
58,20923,9.21361,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
221,20954,7.38326,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
199,21519,6.11255,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
52,20665,5.961819,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
11,19815,5.902261,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
101,21087,5.864925,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
454,21518,5.687229,Shokugeki no Souma: Ni no Sara,Food Wars! The Second Plate,"[20923, 98702, 99255]","[Comedy, Ecchi]"
23,124080,5.399632,Horimiya,Horimiya,[14753],"[Comedy, Romance, Slice of Life]"
104,21202,5.32418,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
18,20594,5.302322,Sword Art Online II,Sword Art Online II,"[20021, 21403, 100183, 11757]","[Action, Adventure, Fantasy, Sci-Fi]"


### Predict similar items

In [124]:
# Find items similar to Yahari Ore no Seishun Love Come wa Machigatteiru. (media_id 14813)
# Genres: Comedy, Drama, Romance, Slice of Life
item_x = iid_map2[14813]

In [125]:
similar_pred2 = similar_items(item_id=item_x, item_features=item_features2, model=model2,N=100)
similar_pred2['media_id'] = similar_pred2['itemID'].map(iid_map2_reverse)

In [126]:
similar_pred2 = pd.merge(similar_pred2,anime_id_map,on='media_id')

In [127]:
similar_pred2.sort_values('score',ascending=False).head(10)

Unnamed: 0,itemID,score,media_id,title_romaji,title_english,relations,genres
0,54,1.0,4224,Toradora!,Toradora!,"[6127, 11553]","[Comedy, Drama, Romance, Slice of Life]"
2,483,1.0,98291,Tsurezure Children,Tsuredure Children,,"[Comedy, Drama, Romance, Slice of Life]"
3,721,1.0,18671,Chuunibyou demo Koi ga Shitai! Ren,"Love, Chunibyo & Other Delusions - Heart Throb -","[16934, 20582, 20777, 20889]","[Comedy, Drama, Romance, Slice of Life]"
4,455,1.0,20596,Ao Haru Ride,Blue Spring Ride,"[20900, 20837]","[Comedy, Drama, Romance, Slice of Life]"
5,515,1.0,21049,ReLIFE,ReLIFE,[98635],"[Comedy, Drama, Romance, Slice of Life]"
6,1134,1.0,109261,5-Toubun no Hanayome ∬,The Quintessential Quintuplets 2,"[103572, 131520]","[Comedy, Drama, Romance, Slice of Life]"
1,727,1.0,97766,Gamers!,GAMERS!,,"[Comedy, Drama, Romance, Slice of Life]"
10,765,1.0,108489,Yahari Ore no Seishun Love Come wa Machigattei...,My Teen Romantic Comedy SNAFU Climax!,"[20698, 128643]","[Comedy, Drama, Romance, Slice of Life]"
13,118,1.0,6045,Kimi ni Todoke,Kimi ni Todoke: From Me to You,[9656],"[Comedy, Drama, Romance, Slice of Life]"
12,112,1.0,13759,Sakurasou no Pet na Kanojo,The Pet Girl of Sakurasou,,"[Comedy, Drama, Romance, Slice of Life]"


### Preliminary analysis for user feature type 2

As with type 1, the top 10 predictions for the new user and the existing user are almost the same (8 of 10 titles are shared), although in a different order. The model may simply be recommending highly rated anime: most of the top 10 predictions are also among the most popular titles identified in the EDA of the dataset.

The results of the similar item predictions also look extremely good.

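In terms of model scores, types 1 and 2 have near-identical test AUC (0.9488 vs 0.9485), but type 2 has better test precision@10 (0.0800 vs 0.0780) and better test recall@10 (0.0727 vs 0.0667).

Between types 1 and 2, I would select type 2.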
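
The comparison so far can be tabulated from the test scores printed above. The sketch below is only a convenience for reading the numbers side by side; the DataFrame layout and column names are assumptions, and the values are rounded copies of the printed outputs.

```python
import pandas as pd

# Test-set scores copied (rounded) from the printed outputs above
scores = pd.DataFrame(
    {
        "test_auc": [0.5538, 0.9488, 0.9485],
        "test_precision_at_10": [0.00035, 0.07797, 0.07997],
        "test_recall_at_10": [0.00093, 0.06674, 0.07269],
    },
    index=["baseline", "type 1", "type 2"],
)

# Pick the variant with the highest test recall@10
best = scores["test_recall_at_10"].idxmax()
print(scores)
print(f"Best test recall@10: {best}")
```

Note that the baseline was fitted with the default number of epochs, so its scores are not directly comparable with the 20-epoch feature models.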

## User feature type 3: Top 3 genres among the genres that the user likes (defined as top 3 genres of animes that the user has rated equal to or greater than their average rating)

In [133]:
data3 = data[['user_id','media_id','rating','genres','top_3_genres_avg']]

In [134]:
all_anime_genres3 = sorted(list(set(itertools.chain.from_iterable(data3['genres']))))

In [135]:
all_liked_genres3 = sorted(list(set(itertools.chain.from_iterable(data3['top_3_genres_avg']))))

In [136]:
dataset3 = Dataset()

In [137]:
dataset3.fit(data3['user_id'],data3['media_id'],user_features=all_liked_genres3,item_features=all_anime_genres3)

In [138]:
item_features3 = dataset3.build_item_features((x,y) for x,y in zip(data3['media_id'],data3['genres']))

In [139]:
user_features3 = dataset3.build_user_features((x,y) for x,y in zip(data3['user_id'],data3['top_3_genres_avg']))

In [140]:
(interactions3, weights3) = dataset3.build_interactions(data3.iloc[:, 0:3].values)

In [141]:
train_interactions3, test_interactions3 = cross_validation.random_train_test_split(interactions3, 
                                                                                 test_percentage=TEST_PERCENTAGE, 
                                                                                 random_state = np.random.RandomState(SEEDNO))

In [142]:
model3 = LightFM(loss='warp', no_components=10,
                 learning_rate=.25,
                 item_alpha=1e-6,
                 random_state=np.random.RandomState(SEEDNO))

In [143]:
model3.fit(interactions=train_interactions3,item_features=item_features3,user_features=user_features3,epochs=20);

In [144]:
train_roc_auc3 = auc_score(model3,train_interactions3,item_features=item_features3,user_features=user_features3).mean()
test_roc_auc3 = auc_score(model3,test_interactions3,item_features=item_features3,user_features=user_features3).mean()
train_precision_lfm3 = lightfm_prec_at_k(model3, train_interactions3,item_features=item_features3,user_features=user_features3, k=K).mean()
test_precision_lfm3 = lightfm_prec_at_k(model3, test_interactions3,item_features=item_features3,user_features=user_features3, k=K).mean()
train_recall_lfm3 = lightfm_recall_at_k(model3, train_interactions3,item_features=item_features3,user_features=user_features3, k=K).mean()
test_recall_lfm3 = lightfm_recall_at_k(model3, test_interactions3,item_features=item_features3,user_features=user_features3, k=K).mean()
print("User feature type 3: Top 3 genres among the genres that the user likes")
print("(defined as top 3 genres of animes that the user has rated equal to or greater than their average rating)")
print("=====AUC=====")
print(f'Train AUC: {train_roc_auc3}')
print(f'Test AUC: {test_roc_auc3}')
print("=====Precision@K=====")
print(f'Train Precision: {train_precision_lfm3}')
print(f'Test Precision: {test_precision_lfm3}')
print("=====Recall@K=====")
print(f'Train Recall: {train_recall_lfm3}')
print(f'Test Recall: {test_recall_lfm3}')

User feature type 3: Top 3 genres among the genres that the user likes
(defined as top 3 genres of animes that the user has rated equal to or greater than their average rating)
=====AUC=====
Train AUC: 0.9543743133544922
Test AUC: 0.9491682052612305
=====Precision@K=====
Train Precision: 0.23796775937080383
Test Precision: 0.08411470800638199
=====Recall@K=====
Train Recall: 0.09747065969652548
Test Recall: 0.08289530351808082


In [145]:
uid_map3, ufeature_map3, iid_map3, ifeature_map3 = dataset3.mapping()

In [146]:
iid_map3_reverse = dict((value,key) for key,value in iid_map3.items())

### Predict for new user

In [147]:
# predict for new user
user_feature_list = ['Comedy','Romance','Slice of Life']

In [148]:
new_user_features = format_newuser_input(ufeature_map3, user_feature_list)

In [149]:
n_users3, n_items3 = interactions3.shape

In [150]:
model3_pred = model3.predict(0, np.arange(n_items3), user_features=new_user_features) # Here 0 means pick the first row of the user_features sparse matrix

In [151]:
pred_dict3 = {'media_id':data3['media_id'].unique(),'pred':model3_pred}

In [327]:
pred_df3 = pd.DataFrame(pred_dict3)

In [328]:
pred_df3 = pd.merge(pred_df3,anime_id_map,on='media_id')

In [154]:
pred_df3.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
221,20954,4559.840332,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
58,20923,4466.522461,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
52,20665,3693.887939,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
101,21087,3646.187256,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
11,19815,3543.696777,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
199,21519,3533.456543,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
3,11757,3231.106445,Sword Art Online,Sword Art Online,"[20021, 20594, 124140]","[Action, Adventure, Fantasy, Romance]"
18,20594,2963.987305,Sword Art Online II,Sword Art Online II,"[20021, 21403, 100183, 11757]","[Action, Adventure, Fantasy, Sci-Fi]"
104,21202,2851.133301,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
24,15809,2706.138916,Hataraku Maou-sama!,The Devil is a Part-Timer!,[130592],"[Comedy, Fantasy, Romance, Slice of Life]"


### Predict for existing user

In [155]:
# user id 710080 is my own account
user_x = uid_map3[710080]
model3_pred_existing = model3.predict(user_x,np.arange(n_items3))

In [156]:
pred_dict3_existing = {'media_id':data3['media_id'].unique(),'pred':model3_pred_existing}

In [157]:
pred_df3_existing = pd.DataFrame(pred_dict3_existing)

In [158]:
pred_df3_existing = pd.merge(pred_df3_existing,anime_id_map,on='media_id')

In [159]:
pred_df3_existing.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
58,20923,0.966706,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
221,20954,-0.648251,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
52,20665,-1.229392,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
66,12189,-1.286155,Hyouka,Hyouka,[13469],"[Mystery, Slice of Life]"
11,19815,-1.357756,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
199,21519,-1.369898,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
23,124080,-1.430883,Horimiya,Horimiya,[14753],"[Comedy, Romance, Slice of Life]"
104,21202,-1.48374,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
101,21087,-1.494548,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
454,21518,-1.50925,Shokugeki no Souma: Ni no Sara,Food Wars! The Second Plate,"[20923, 98702, 99255]","[Comedy, Ecchi]"


### Predict similar items

In [160]:
# Find titles similar to Yahari Ore no Seishun Love Come wa Machigatteiru.
# Genres: Comedy, Drama, Romance, Slice of Life
item_x = iid_map3[14813]

In [161]:
similar_pred3 = similar_items(item_id=item_x, item_features=item_features3, model=model3,N=100)
similar_pred3['media_id'] = similar_pred3['itemID'].map(iid_map3_reverse)

In [162]:
similar_pred3 = pd.merge(similar_pred3,anime_id_map,on='media_id')

In [163]:
similar_pred3.sort_values('score',ascending=False).head(10)

Unnamed: 0,itemID,score,media_id,title_romaji,title_english,relations,genres
0,642,1.0,20698,Yahari Ore no Seishun Love Come wa Machigattei...,My Teen Romantic Comedy SNAFU TOO!,"[14813, 21769, 108489]","[Comedy, Drama, Romance, Slice of Life]"
1,224,1.0,103572,5-Toubun no Hanayome,The Quintessential Quintuplets,[109261],"[Comedy, Drama, Romance, Slice of Life]"
2,54,1.0,4224,Toradora!,Toradora!,"[6127, 11553]","[Comedy, Drama, Romance, Slice of Life]"
3,63,1.0,14813,Yahari Ore no Seishun Love Come wa Machigatteiru.,My Teen Romantic Comedy SNAFU,"[18753, 20698]","[Comedy, Drama, Romance, Slice of Life]"
4,65,1.0,14741,Chuunibyou demo Koi ga Shitai!,"Love, Chunibyo & Other Delusions","[16934, 15687, 15879, 19021]","[Comedy, Drama, Romance, Slice of Life]"
5,727,1.0,97766,Gamers!,GAMERS!,,"[Comedy, Drama, Romance, Slice of Life]"
6,765,1.0,108489,Yahari Ore no Seishun Love Come wa Machigattei...,My Teen Romantic Comedy SNAFU Climax!,"[20698, 128643]","[Comedy, Drama, Romance, Slice of Life]"
7,451,1.0,853,Ouran Koukou Host Club,Ouran High School Host Club,,"[Comedy, Drama, Romance, Slice of Life]"
8,1134,1.0,109261,5-Toubun no Hanayome ∬,The Quintessential Quintuplets 2,"[103572, 131520]","[Comedy, Drama, Romance, Slice of Life]"
9,112,1.0,13759,Sakurasou no Pet na Kanojo,The Pet Girl of Sakurasou,,"[Comedy, Drama, Romance, Slice of Life]"
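Every row above has a score of 1.0 and the same four genres as the query, and the query title itself (media_id 14813) shows up in its own list. A rough sketch of why ties like this can happen follows. It assumes `similar_items` ranks by cosine similarity between item representations (my reading of the helper, not verified here) and, as a simplification, that an item's representation is just the sum of its genre embeddings. The embeddings and item names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
genres = ["Comedy", "Drama", "Romance", "Slice of Life", "Action"]
# Hypothetical 10-dimensional genre embeddings (random, for illustration only).
genre_emb = {g: rng.normal(size=10) for g in genres}

items = {
    "A": ["Comedy", "Drama", "Romance", "Slice of Life"],  # query item
    "B": ["Comedy", "Drama", "Romance", "Slice of Life"],  # same genres as A
    "C": ["Action", "Comedy"],
}
# Simplifying assumption: item representation = sum of its genre embeddings.
item_vecs = {
    name: np.sum([genre_emb[g] for g in gs], axis=0) for name, gs in items.items()
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


query = "A"
# Exclude the query by name rather than by position, so ties cannot leak it back in.
similar = {
    name: cosine(item_vecs[query], vec)
    for name, vec in item_vecs.items()
    if name != query
}
print(similar)
```

Under these assumptions, items with identical genre sets are indistinguishable, so their relative order in the table carries no information. Filtering on `media_id != 14813` after the merge would remove the query title from the real results.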


### Preliminary analysis for user feature type 3

The top 10 predictions for the new and existing user have fewer titles in common than for the previous types, but most titles still appear in both lists. For the new user, the model may simply be recommending highly rated anime, which is expected because it has no interaction data for that user. For the existing user, some values in the `pred` column are negative, but LightFM's prediction scores are not interpretable on their own and should be used only to rank titles.
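Since only the order of the scores matters, the two lists can be compared by rank instead of raw value. The sketch below does this for the first five rows of each type 3 table above. The ranks are positions within this five-row excerpt, not within the full catalogue, and `to_rank` is an illustrative helper.

```python
import pandas as pd

# First five rows of the type 3 new-user and existing-user tables (media_id -> pred).
new_user = pd.Series(
    {20954: 4559.840332, 20923: 4466.522461, 20665: 3693.887939,
     21087: 3646.187256, 19815: 3543.696777},
    name="new_user",
)
existing_user = pd.Series(
    {20923: 0.966706, 20954: -0.648251, 20665: -1.229392,
     12189: -1.286155, 19815: -1.357756},
    name="existing_user",
)


def to_rank(scores):
    """Convert raw scores to ranks, where 1 is the highest score."""
    return scores.rank(ascending=False, method="first").astype(int)


# Titles missing from one excerpt show up as NaN in that column.
ranks = pd.concat([to_rank(new_user), to_rank(existing_user)], axis=1)
print(ranks)
```

The very different score scales (thousands versus values near zero) disappear once both columns are expressed as ranks.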

The similar-item predictions also look very good: every title in the top 10 shares the query's four genres. They are nearly identical across the 3 types, only in a different order, and since every score shown is 1.0 the order itself carries little information.

In terms of model scores, types 1, 2 and 3 have similar AUC scores, but type 3 has the best recall.

Among types 1, 2 and 3, I would select type 3 for its higher recall and for the greater variety between new-user and existing-user recommendations.

### User feature type 4: Top 5 genres among the genres that the user likes (defined as top 5 genres of animes that the user has rated equal to or greater than their average rating)

In [164]:
data4 = data[['user_id','media_id','rating','genres','top_5_genres_avg']]

In [165]:
all_anime_genres4 = sorted(list(set(itertools.chain.from_iterable(data4['genres']))))

In [166]:
all_liked_genres4 = sorted(list(set(itertools.chain.from_iterable(data4['top_5_genres_avg']))))

In [167]:
dataset4 = Dataset()

In [168]:
dataset4.fit(data4['user_id'],data4['media_id'],user_features=all_liked_genres4,item_features=all_anime_genres4)

In [169]:
item_features4 = dataset4.build_item_features((x,y) for x,y in zip(data4['media_id'],data4['genres']))

In [170]:
user_features4 = dataset4.build_user_features((x,y) for x,y in zip(data4['user_id'],data4['top_5_genres_avg']))

In [171]:
(interactions4, weights4) = dataset4.build_interactions(data4.iloc[:, 0:3].values)

In [172]:
train_interactions4, test_interactions4 = cross_validation.random_train_test_split(interactions4, 
                                                                                 test_percentage=TEST_PERCENTAGE, 
                                                                                 random_state = np.random.RandomState(SEEDNO))

In [173]:
model4 = LightFM(loss='warp', no_components = 10,
               learning_rate=.25,
               item_alpha=1e-6,
                random_state=np.random.RandomState(SEEDNO))

In [174]:
model4.fit(interactions=train_interactions4,item_features=item_features4,user_features=user_features4,epochs=20);

In [175]:
train_roc_auc4 = auc_score(model4,train_interactions4,item_features=item_features4,user_features=user_features4).mean()
test_roc_auc4 = auc_score(model4,test_interactions4,item_features=item_features4,user_features=user_features4).mean()
train_precision_lfm4 = lightfm_prec_at_k(model4, train_interactions4,item_features=item_features4,user_features=user_features4, k=K).mean()
test_precision_lfm4 = lightfm_prec_at_k(model4, test_interactions4,item_features=item_features4,user_features=user_features4, k=K).mean()
train_recall_lfm4 = lightfm_recall_at_k(model4, train_interactions4,item_features=item_features4,user_features=user_features4, k=K).mean()
test_recall_lfm4 = lightfm_recall_at_k(model4, test_interactions4,item_features=item_features4,user_features=user_features4, k=K).mean()
print("User feature type 4: Top 5 genres among the genres that the user likes")
print("(defined as top 5 genres of animes that the user has rated equal to or greater than their average rating)")
print("=====AUC=====")
print(f'Train AUC: {train_roc_auc4}')
print(f'Test AUC: {test_roc_auc4}')
print("=====Precision@K=====")
print(f'Train Precision: {train_precision_lfm4}')
print(f'Test Precision: {test_precision_lfm4}')
print("=====Recall@K=====")
print(f'Train Recall: {train_recall_lfm4}')
print(f'Test Recall: {test_recall_lfm4}')

User feature type 4: Top 5 genres among the genres that the user likes
(defined as top 5 genres of animes that the user has rated equal to or greater than their average rating)
=====AUC=====
Train AUC: 0.9548159241676331
Test AUC: 0.9498434662818909
=====Precision@K=====
Train Precision: 0.281594842672348
Test Precision: 0.10039448738098145
=====Recall@K=====
Train Recall: 0.1172254678840192
Test Recall: 0.103140321035574
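For reference, here is a plain-NumPy sketch of how precision@k and recall@k are commonly defined for a single user. It is not LightFM's implementation, and the scores and positive items are hypothetical toy values.

```python
import numpy as np


def precision_recall_at_k(scores, positives, k):
    """Precision@k and recall@k for one user.

    scores: array with one score per item index.
    positives: set of item indices the user actually interacted with.
    """
    top_k = np.argsort(-scores)[:k]  # indices of the k highest-scoring items
    hits = len(set(top_k.tolist()) & set(positives))
    return hits / k, hits / len(positives)


scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7, 0.2])
positives = {0, 3, 4, 5}
precision, recall = precision_recall_at_k(scores, positives, k=3)
print(precision, recall)
```

With these definitions, precision@k for a user can never exceed (number of positives) / k. The test split holds only 25% of the interactions, so users with few held-out titles cap the achievable test precision at K = 10, which may be one reason test precision sits well below train precision.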


In [176]:
uid_map4, ufeature_map4, iid_map4, ifeature_map4 = dataset4.mapping()

In [177]:
iid_map4_reverse = dict((value,key) for key,value in iid_map4.items())

### Predict for new user

In [178]:
# predict for new user
user_feature_list = ['Comedy','Romance','Slice of Life']

In [179]:
new_user_features = format_newuser_input(ufeature_map4, user_feature_list)

In [180]:
n_users4, n_items4 = interactions4.shape

In [181]:
model4_pred = model4.predict(0, np.arange(n_items4), user_features=new_user_features) # Here 0 means pick the first row of the user_features sparse matrix

In [182]:
pred_dict4 = {'media_id':data4['media_id'].unique(),'pred':model4_pred}

In [325]:
pred_df4 = pd.DataFrame(pred_dict4)

In [376]:
pred_df4 = pd.merge(pred_df4,anime_id_map,on='media_id')

In [377]:
pred_df4.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
58,20923,4196.525879,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
221,20954,4115.89502,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
52,20665,3187.632324,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
101,21087,3118.401855,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
199,21519,3014.661621,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
11,19815,2977.427734,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
3,11757,2720.343262,Sword Art Online,Sword Art Online,"[20021, 20594, 124140]","[Action, Adventure, Fantasy, Romance]"
18,20594,2480.673828,Sword Art Online II,Sword Art Online II,"[20021, 21403, 100183, 11757]","[Action, Adventure, Fantasy, Sci-Fi]"
104,21202,2314.435547,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
24,15809,2121.20459,Hataraku Maou-sama!,The Devil is a Part-Timer!,[130592],"[Comedy, Fantasy, Romance, Slice of Life]"


### Predict for existing user

In [186]:
# user id 710080 is my own account
user_x = uid_map4[710080]
model4_pred_existing = model4.predict(user_x,np.arange(n_items4))

In [187]:
pred_dict4_existing = {'media_id':data4['media_id'].unique(),'pred':model4_pred_existing}

In [188]:
pred_df4_existing = pd.DataFrame(pred_dict4_existing)

In [189]:
pred_df4_existing = pd.merge(pred_df4_existing,anime_id_map,on='media_id')

In [190]:
pred_df4_existing.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
58,20923,6.543212,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
221,20954,3.257215,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
66,12189,3.23957,Hyouka,Hyouka,[13469],"[Mystery, Slice of Life]"
454,21518,3.169772,Shokugeki no Souma: Ni no Sara,Food Wars! The Second Plate,"[20923, 98702, 99255]","[Comedy, Ecchi]"
23,124080,2.582151,Horimiya,Horimiya,[14753],"[Comedy, Romance, Slice of Life]"
52,20665,2.347939,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
199,21519,2.196276,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
104,21202,2.044642,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
11,19815,2.00492,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
69,21776,1.791389,Kobayashi-san Chi no Maidragon,Miss Kobayashi's Dragon Maid,"[98502, 98580]","[Comedy, Fantasy, Slice of Life]"


### Predict similar items

In [191]:
# Find titles similar to Yahari Ore no Seishun Love Come wa Machigatteiru.
# Genres: Comedy, Drama, Romance, Slice of Life
item_x = iid_map4[14813]

In [192]:
similar_pred4 = similar_items(item_id=item_x, item_features=item_features4, model=model4,N=100)
similar_pred4['media_id'] = similar_pred4['itemID'].map(iid_map4_reverse)

In [193]:
similar_pred4 = pd.merge(similar_pred4,anime_id_map,on='media_id')

In [194]:
similar_pred4.sort_values('score',ascending=False).head(10)

Unnamed: 0,itemID,score,media_id,title_romaji,title_english,relations,genres
0,224,1.0,103572,5-Toubun no Hanayome,The Quintessential Quintuplets,[109261],"[Comedy, Drama, Romance, Slice of Life]"
1,515,1.0,21049,ReLIFE,ReLIFE,[98635],"[Comedy, Drama, Romance, Slice of Life]"
2,65,1.0,14741,Chuunibyou demo Koi ga Shitai!,"Love, Chunibyo & Other Delusions","[16934, 15687, 15879, 19021]","[Comedy, Drama, Romance, Slice of Life]"
3,63,1.0,14813,Yahari Ore no Seishun Love Come wa Machigatteiru.,My Teen Romantic Comedy SNAFU,"[18753, 20698]","[Comedy, Drama, Romance, Slice of Life]"
4,451,1.0,853,Ouran Koukou Host Club,Ouran High School Host Club,,"[Comedy, Drama, Romance, Slice of Life]"
5,455,1.0,20596,Ao Haru Ride,Blue Spring Ride,"[20900, 20837]","[Comedy, Drama, Romance, Slice of Life]"
6,483,1.0,98291,Tsurezure Children,Tsuredure Children,,"[Comedy, Drama, Romance, Slice of Life]"
7,54,1.0,4224,Toradora!,Toradora!,"[6127, 11553]","[Comedy, Drama, Romance, Slice of Life]"
8,721,1.0,18671,Chuunibyou demo Koi ga Shitai! Ren,"Love, Chunibyo & Other Delusions - Heart Throb -","[16934, 20582, 20777, 20889]","[Comedy, Drama, Romance, Slice of Life]"
9,765,1.0,108489,Yahari Ore no Seishun Love Come wa Machigattei...,My Teen Romantic Comedy SNAFU Climax!,"[20698, 128643]","[Comedy, Drama, Romance, Slice of Life]"


### Preliminary analysis for user feature type 4

The top 10 predictions for the new and existing user overlap less than for types 1-3, although most titles still appear in both lists.

The similar-item predictions also look very good. They are nearly identical across the 4 types, only in a different order.

In terms of model scores, types 1-4 have similar AUC scores, but type 4 has the best recall.

Among types 1-4, I would select type 4 for its higher recall and for the greater variety between new-user and existing-user recommendations.
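To put a number on how similar the two lists are, the sketch below counts the titles shared between the new-user and existing-user top 10 for types 3 and 4, using the `media_id` values copied from the output tables above. The `overlap` helper is illustrative only.

```python
# media_id values copied from the top 10 tables for types 3 and 4.
top10 = {
    "type 3": {
        "new": [20954, 20923, 20665, 21087, 19815, 21519, 11757, 20594, 21202, 15809],
        "existing": [20923, 20954, 20665, 12189, 19815, 21519, 124080, 21202, 21087, 21518],
    },
    "type 4": {
        "new": [20923, 20954, 20665, 21087, 21519, 19815, 11757, 20594, 21202, 15809],
        "existing": [20923, 20954, 12189, 21518, 124080, 20665, 21519, 21202, 19815, 21776],
    },
}


def overlap(a, b):
    """Number of titles that appear in both lists, ignoring order."""
    return len(set(a) & set(b))


overlaps = {name: overlap(lists["new"], lists["existing"]) for name, lists in top10.items()}
print(overlaps)  # {'type 3': 7, 'type 4': 6}
```

The count ignores ranking position, so it only measures how many titles are shared, not how far they moved within the list.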

### User feature type 5: Top 3 genres among the genres that the user likes (defined as top 3 genres of animes that the user has rated equal to or greater than their 75th percentile rating)

In [195]:
data5 = data[['user_id','media_id','rating','genres','top_3_genres_75pct']]

In [196]:
all_anime_genres5 = sorted(list(set(itertools.chain.from_iterable(data5['genres']))))

In [197]:
all_liked_genres5 = sorted(list(set(itertools.chain.from_iterable(data5['top_3_genres_75pct']))))

In [198]:
dataset5 = Dataset()

In [199]:
dataset5.fit(data5['user_id'],data5['media_id'],user_features=all_liked_genres5,item_features=all_anime_genres5)

In [200]:
item_features5 = dataset5.build_item_features((x,y) for x,y in zip(data5['media_id'],data5['genres']))

In [201]:
user_features5 = dataset5.build_user_features((x,y) for x,y in zip(data5['user_id'],data5['top_3_genres_75pct']))

In [202]:
(interactions5, weights5) = dataset5.build_interactions(data5.iloc[:, 0:3].values)

In [203]:
train_interactions5, test_interactions5 = cross_validation.random_train_test_split(interactions5, 
                                                                                 test_percentage=TEST_PERCENTAGE, 
                                                                                 random_state = np.random.RandomState(SEEDNO))

In [204]:
model5 = LightFM(loss='warp', no_components = 10,
               learning_rate=.25,
               item_alpha=1e-6,
                random_state=np.random.RandomState(SEEDNO))

In [205]:
model5.fit(interactions=train_interactions5,item_features=item_features5,user_features=user_features5,epochs=20);

In [206]:
train_roc_auc5 = auc_score(model5,train_interactions5,item_features=item_features5,user_features=user_features5).mean()
test_roc_auc5 = auc_score(model5,test_interactions5,item_features=item_features5,user_features=user_features5).mean()
train_precision_lfm5 = lightfm_prec_at_k(model5, train_interactions5,item_features=item_features5,user_features=user_features5, k=K).mean()
test_precision_lfm5 = lightfm_prec_at_k(model5, test_interactions5,item_features=item_features5,user_features=user_features5, k=K).mean()
train_recall_lfm5 = lightfm_recall_at_k(model5, train_interactions5,item_features=item_features5,user_features=user_features5, k=K).mean()
test_recall_lfm5 = lightfm_recall_at_k(model5, test_interactions5,item_features=item_features5,user_features=user_features5, k=K).mean()
print("User feature type 5: Top 3 genres among the genres that the user likes")
print("(defined as top 3 genres of animes that the user has rated equal to or greater than their 75th percentile rating)")
print("=====AUC=====")
print(f'Train AUC: {train_roc_auc5}')
print(f'Test AUC: {test_roc_auc5}')
print("=====Precision@K=====")
print(f'Train Precision: {train_precision_lfm5}')
print(f'Test Precision: {test_precision_lfm5}')
print("=====Recall@K=====")
print(f'Train Recall: {train_recall_lfm5}')
print(f'Test Recall: {test_recall_lfm5}')

User feature type 5: Top 3 genres among the genres that the user likes
(defined as top 3 genres of animes that the user has rated equal to or greater than their 75th percentile rating)
=====AUC=====
Train AUC: 0.9534348249435425
Test AUC: 0.9484164118766785
=====Precision@K=====
Train Precision: 0.25572675466537476
Test Precision: 0.0902518630027771
=====Recall@K=====
Train Recall: 0.09997262009890688
Test Recall: 0.08649129588169902


In [207]:
uid_map5, ufeature_map5, iid_map5, ifeature_map5 = dataset5.mapping()

In [208]:
iid_map5_reverse = dict((value,key) for key,value in iid_map5.items())

### Predict for new user

In [209]:
# predict for new user
user_feature_list = ['Comedy','Romance','Slice of Life']

In [210]:
new_user_features = format_newuser_input(ufeature_map5, user_feature_list)

In [211]:
n_users5, n_items5 = interactions5.shape

In [212]:
model5_pred = model5.predict(0, np.arange(n_items5), user_features=new_user_features) # Here 0 means pick the first row of the user_features sparse matrix

In [213]:
pred_dict5 = {'media_id':data5['media_id'].unique(),'pred':model5_pred}

In [321]:
pred_df5 = pd.DataFrame(pred_dict5)

In [322]:
pred_df5 = pd.merge(pred_df5,anime_id_map,on='media_id')

In [216]:
pred_df5.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
221,20954,4851.352051,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
58,20923,4811.720215,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
52,20665,3903.775391,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
101,21087,3849.635498,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
11,19815,3698.025146,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
199,21519,3686.38208,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
3,11757,3354.629883,Sword Art Online,Sword Art Online,"[20021, 20594, 124140]","[Action, Adventure, Fantasy, Romance]"
18,20594,3217.090576,Sword Art Online II,Sword Art Online II,"[20021, 21403, 100183, 11757]","[Action, Adventure, Fantasy, Sci-Fi]"
104,21202,3064.390625,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
228,11061,2971.944092,HUNTER×HUNTER (2011),Hunter x Hunter (2011),"[13271, 136, 137, 138, 139, 19951]","[Action, Adventure, Fantasy]"


### Predict for existing user

In [217]:
# user id 710080 is my own account
user_x = uid_map5[710080]
model5_pred_existing = model5.predict(user_x,np.arange(n_items5))

In [218]:
pred_dict5_existing = {'media_id':data5['media_id'].unique(),'pred':model5_pred_existing}

In [219]:
pred_df5_existing = pd.DataFrame(pred_dict5_existing)

In [220]:
pred_df5_existing = pd.merge(pred_df5_existing,anime_id_map,on='media_id')

In [221]:
pred_df5_existing.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
58,20923,1.423041,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
221,20954,-0.046106,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
52,20665,-0.937143,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
199,21519,-1.0586,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
11,19815,-1.173374,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
23,124080,-1.214637,Horimiya,Horimiya,[14753],"[Comedy, Romance, Slice of Life]"
101,21087,-1.229956,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
66,12189,-1.272346,Hyouka,Hyouka,[13469],"[Mystery, Slice of Life]"
104,21202,-1.275169,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
454,21518,-1.316252,Shokugeki no Souma: Ni no Sara,Food Wars! The Second Plate,"[20923, 98702, 99255]","[Comedy, Ecchi]"


### Predict similar items

In [222]:
# Find titles similar to Yahari Ore no Seishun Love Come wa Machigatteiru.
# Genres: Comedy, Drama, Romance, Slice of Life
item_x = iid_map5[14813]

In [223]:
similar_pred5 = similar_items(item_id=item_x, item_features=item_features5, model=model5,N=100)
similar_pred5['media_id'] = similar_pred5['itemID'].map(iid_map5_reverse)

In [224]:
similar_pred5 = pd.merge(similar_pred5,anime_id_map,on='media_id')

In [225]:
similar_pred5.sort_values('score',ascending=False).head(10)

Unnamed: 0,itemID,score,media_id,title_romaji,title_english,relations,genres
0,451,1.0,853,Ouran Koukou Host Club,Ouran High School Host Club,,"[Comedy, Drama, Romance, Slice of Life]"
2,54,1.0,4224,Toradora!,Toradora!,"[6127, 11553]","[Comedy, Drama, Romance, Slice of Life]"
3,727,1.0,97766,Gamers!,GAMERS!,,"[Comedy, Drama, Romance, Slice of Life]"
4,1134,1.0,109261,5-Toubun no Hanayome ∬,The Quintessential Quintuplets 2,"[103572, 131520]","[Comedy, Drama, Romance, Slice of Life]"
5,515,1.0,21049,ReLIFE,ReLIFE,[98635],"[Comedy, Drama, Romance, Slice of Life]"
6,112,1.0,13759,Sakurasou no Pet na Kanojo,The Pet Girl of Sakurasou,,"[Comedy, Drama, Romance, Slice of Life]"
7,455,1.0,20596,Ao Haru Ride,Blue Spring Ride,"[20900, 20837]","[Comedy, Drama, Romance, Slice of Life]"
1,63,1.0,14813,Yahari Ore no Seishun Love Come wa Machigatteiru.,My Teen Romantic Comedy SNAFU,"[18753, 20698]","[Comedy, Drama, Romance, Slice of Life]"
10,765,1.0,108489,Yahari Ore no Seishun Love Come wa Machigattei...,My Teen Romantic Comedy SNAFU Climax!,"[20698, 128643]","[Comedy, Drama, Romance, Slice of Life]"
11,483,1.0,98291,Tsurezure Children,Tsuredure Children,,"[Comedy, Drama, Romance, Slice of Life]"


### Preliminary analysis for user feature type 5

Compared to type 4, the top 10 predictions for the new and existing user are back to having more titles in common.

The similar-item predictions also look very good. They are nearly identical across the 5 types, only in a different order.

In terms of model scores, types 1-5 have similar AUC scores. Type 4 still has the best recall, but type 5 has better recall than types 1-3.

Among types 1-5, I would select type 4 for its higher recall and for the greater variety between new-user and existing-user recommendations.

### User feature type 6: Top 5 genres among the genres that the user likes (defined as top 5 genres of animes that the user has rated equal to or greater than their 75th percentile rating)

In [226]:
data6 = data[['user_id','media_id','rating','genres','top_5_genres_75pct']]

In [227]:
all_anime_genres6 = sorted(list(set(itertools.chain.from_iterable(data6['genres']))))

In [228]:
all_liked_genres6 = sorted(list(set(itertools.chain.from_iterable(data6['top_5_genres_75pct']))))

In [229]:
dataset6 = Dataset()

In [230]:
dataset6.fit(data6['user_id'],data6['media_id'],user_features=all_liked_genres6,item_features=all_anime_genres6)

In [231]:
item_features6 = dataset6.build_item_features((x,y) for x,y in zip(data6['media_id'],data6['genres']))

In [232]:
user_features6 = dataset6.build_user_features((x,y) for x,y in zip(data6['user_id'],data6['top_5_genres_75pct']))

In [233]:
(interactions6, weights6) = dataset6.build_interactions(data6.iloc[:, 0:3].values)

In [234]:
train_interactions6, test_interactions6 = cross_validation.random_train_test_split(interactions6, 
                                                                                 test_percentage=TEST_PERCENTAGE, 
                                                                                 random_state = np.random.RandomState(SEEDNO))

In [235]:
model6 = LightFM(loss='warp', no_components = 10,
               learning_rate=.25,
               item_alpha=1e-6,
                random_state=np.random.RandomState(SEEDNO))

In [236]:
model6.fit(interactions=train_interactions6,item_features=item_features6,user_features=user_features6,epochs=20);

In [237]:
train_roc_auc6 = auc_score(model6,train_interactions6,item_features=item_features6,user_features=user_features6).mean()
test_roc_auc6 = auc_score(model6,test_interactions6,item_features=item_features6,user_features=user_features6).mean()
train_precision_lfm6 = lightfm_prec_at_k(model6, train_interactions6,item_features=item_features6,user_features=user_features6, k=K).mean()
test_precision_lfm6 = lightfm_prec_at_k(model6, test_interactions6,item_features=item_features6,user_features=user_features6, k=K).mean()
train_recall_lfm6 = lightfm_recall_at_k(model6, train_interactions6,item_features=item_features6,user_features=user_features6, k=K).mean()
test_recall_lfm6 = lightfm_recall_at_k(model6, test_interactions6,item_features=item_features6,user_features=user_features6, k=K).mean()
print("User feature type 6: Top 5 genres among the genres that the user likes")
print("(defined as top 5 genres of animes that the user has rated equal to or greater than their 75th percentile rating)")
print("=====AUC=====")
print(f'Train AUC: {train_roc_auc6}')
print(f'Test AUC: {test_roc_auc6}')
print("=====Precision@K=====")
print(f'Train Precision: {train_precision_lfm6}')
print(f'Test Precision: {test_precision_lfm6}')
print("=====Recall@K=====")
print(f'Train Recall: {train_recall_lfm6}')
print(f'Test Recall: {test_recall_lfm6}')

User feature type 6: Top 5 genres among the genres that the user likes
(defined as top 5 genres of animes that the user has rated equal to or greater than their 75th percentile rating)
=====AUC=====
Train AUC: 0.9534088373184204
Test AUC: 0.948606550693512
=====Precision@K=====
Train Precision: 0.24548037350177765
Test Precision: 0.08721741288900375
=====Recall@K=====
Train Recall: 0.09267026219635398
Test Recall: 0.08100045402382826


In [238]:
uid_map6, ufeature_map6, iid_map6, ifeature_map6 = dataset6.mapping()

In [239]:
iid_map6_reverse = dict((value,key) for key,value in iid_map6.items())

### Predict for new user

In [240]:
# predict for new user
user_feature_list = ['Comedy','Romance','Slice of Life']

In [241]:
new_user_features = format_newuser_input(ufeature_map6, user_feature_list)

In [242]:
n_users6, n_items6 = interactions6.shape

In [243]:
model6_pred = model6.predict(0, np.arange(n_items6), user_features=new_user_features) # Here 0 means pick the first row of the user_features sparse matrix

In [244]:
pred_dict6 = {'media_id':data6['media_id'].unique(),'pred':model6_pred}

In [323]:
pred_df6 = pd.DataFrame(pred_dict6)

In [324]:
pred_df6 = pd.merge(pred_df6,anime_id_map,on='media_id')

In [247]:
pred_df6.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
58,20923,4495.072754,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
221,20954,4393.568848,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
52,20665,3413.434082,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
101,21087,3335.56543,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
199,21519,3254.538574,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
11,19815,3218.475098,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
3,11757,2808.014404,Sword Art Online,Sword Art Online,"[20021, 20594, 124140]","[Action, Adventure, Fantasy, Romance]"
18,20594,2677.648682,Sword Art Online II,Sword Art Online II,"[20021, 21403, 100183, 11757]","[Action, Adventure, Fantasy, Sci-Fi]"
104,21202,2596.04834,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
23,124080,2398.595703,Horimiya,Horimiya,[14753],"[Comedy, Romance, Slice of Life]"


### Predict for existing user

In [248]:
# user id 710080 is my own account
user_x = uid_map6[710080]
model6_pred_existing = model6.predict(user_x,np.arange(n_items6))

In [249]:
pred_dict6_existing = {'media_id':data6['media_id'].unique(),'pred':model6_pred_existing}

In [250]:
pred_df6_existing = pd.DataFrame(pred_dict6_existing)

In [251]:
pred_df6_existing = pd.merge(pred_df6_existing,anime_id_map,on='media_id')

In [252]:
pred_df6_existing.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,relations,genres
58,20923,7.995938,Shokugeki no Souma,Food Wars! Shokugeki no Soma,"[21518, 21691]","[Comedy, Ecchi]"
221,20954,5.742288,Koe no Katachi,A Silent Voice,,"[Drama, Romance, Slice of Life]"
52,20665,4.353853,Shigatsu wa Kimi no Uso,Your Lie in April,[21039],"[Drama, Music, Romance, Slice of Life]"
199,21519,4.160835,Kimi no Na wa.,Your Name.,,"[Drama, Romance, Supernatural]"
11,19815,4.019316,No Game No Life,"No Game, No Life","[20769, 21875]","[Adventure, Comedy, Ecchi, Fantasy]"
101,21087,3.927083,One Punch Man,One-Punch Man,"[21416, 97668, 21386]","[Action, Comedy, Sci-Fi, Supernatural]"
23,124080,3.866499,Horimiya,Horimiya,[14753],"[Comedy, Romance, Slice of Life]"
454,21518,3.750037,Shokugeki no Souma: Ni no Sara,Food Wars! The Second Plate,"[20923, 98702, 99255]","[Comedy, Ecchi]"
104,21202,3.599964,Kono Subarashii Sekai ni Shukufuku wo!,KONOSUBA -God's blessing on this wonderful world!,"[21574, 21699]","[Adventure, Comedy, Fantasy]"
18,20594,3.42268,Sword Art Online II,Sword Art Online II,"[20021, 21403, 100183, 11757]","[Action, Adventure, Fantasy, Sci-Fi]"


### Predict similar items

In [253]:
# Find titles similar to Yahari Ore no Seishun Love Come wa Machigatteiru.
# Genres: Comedy, Drama, Romance, Slice of Life
item_x = iid_map6[14813]

In [254]:
similar_pred6 = similar_items(item_id=item_x, item_features=item_features6, model=model6,N=100)
similar_pred6['media_id'] = similar_pred6['itemID'].map(iid_map6_reverse)

In [255]:
similar_pred6 = pd.merge(similar_pred6,anime_id_map,on='media_id')

In [256]:
similar_pred6.sort_values('score',ascending=False).head(10)

Unnamed: 0,itemID,score,media_id,title_romaji,title_english,relations,genres
0,515,1.0,21049,ReLIFE,ReLIFE,[98635],"[Comedy, Drama, Romance, Slice of Life]"
1,451,1.0,853,Ouran Koukou Host Club,Ouran High School Host Club,,"[Comedy, Drama, Romance, Slice of Life]"
2,721,1.0,18671,Chuunibyou demo Koi ga Shitai! Ren,"Love, Chunibyo & Other Delusions - Heart Throb -","[16934, 20582, 20777, 20889]","[Comedy, Drama, Romance, Slice of Life]"
3,63,1.0,14813,Yahari Ore no Seishun Love Come wa Machigatteiru.,My Teen Romantic Comedy SNAFU,"[18753, 20698]","[Comedy, Drama, Romance, Slice of Life]"
4,642,1.0,20698,Yahari Ore no Seishun Love Come wa Machigattei...,My Teen Romantic Comedy SNAFU TOO!,"[14813, 21769, 108489]","[Comedy, Drama, Romance, Slice of Life]"
5,455,1.0,20596,Ao Haru Ride,Blue Spring Ride,"[20900, 20837]","[Comedy, Drama, Romance, Slice of Life]"
6,54,1.0,4224,Toradora!,Toradora!,"[6127, 11553]","[Comedy, Drama, Romance, Slice of Life]"
7,65,1.0,14741,Chuunibyou demo Koi ga Shitai!,"Love, Chunibyo & Other Delusions","[16934, 15687, 15879, 19021]","[Comedy, Drama, Romance, Slice of Life]"
8,224,1.0,103572,5-Toubun no Hanayome,The Quintessential Quintuplets,[109261],"[Comedy, Drama, Romance, Slice of Life]"
9,765,1.0,108489,Yahari Ore no Seishun Love Come wa Machigattei...,My Teen Romantic Comedy SNAFU Climax!,"[20698, 128643]","[Comedy, Drama, Romance, Slice of Life]"


### Preliminary analysis for user feature type 6

The top 10 predictions for the new user and the existing user are largely similar.

The similar item predictions also look good. They overlap heavily across the 6 types, differing mainly in order.

In terms of model scores, types 1-6 have similar AUC scores, but type 4 has the best recall.

Among types 1-6, I would select type 4 for its higher recall and for the greater variety between its new user and existing user recommendations.

## User type evaluation

### Model scores

|        | Train AUC | Test AUC | Train Precision | Test Precision | Train Recall | Test Recall |
|--------|-----------|----------|-----------------|----------------|--------------|-------------|
| Type 1 | 0.953     | 0.949    | 0.219           | 0.078          | 0.071        | 0.066       |
| Type 2 | 0.953     | 0.948    | 0.223           | 0.080          | 0.077        | 0.073       |
| Type 3 | 0.954     | 0.949    | 0.238           | 0.084          | 0.097        | 0.083       |
| Type 4 | 0.955     | 0.950    | 0.282           | 0.100          | 0.117        | 0.103       |
| Type 5 | 0.953     | 0.948    | 0.256           | 0.090          | 0.100        | 0.086       |
| Type 6 | 0.953     | 0.949    | 0.245           | 0.087          | 0.093        | 0.081       |

Types 1-6 have similar AUC scores, but type 4 has the best precision and recall on both the train and test sets.

### New user predictions

In [334]:
pred_df1.columns = pred_df1.columns + "_1"
pred_df2.columns = pred_df2.columns + "_2"
pred_df3.columns = pred_df3.columns + "_3"
pred_df4.columns = pred_df4.columns + "_4"
pred_df5.columns = pred_df5.columns + "_5"
pred_df6.columns = pred_df6.columns + "_6"

In [355]:
new_user_preds = {'Type 1 preds':pred_df1.sort_values('pred_1',ascending = False).head(10)['title_romaji_1'].values,
                  'Type 2 preds':pred_df2.sort_values('pred_2',ascending = False).head(10)['title_romaji_2'].values,
                  'Type 3 preds':pred_df3.sort_values('pred_3',ascending = False).head(10)['title_romaji_3'].values,
                  'Type 4 preds':pred_df4.sort_values('pred_4',ascending = False).head(10)['title_romaji_4'].values,
                  'Type 5 preds':pred_df5.sort_values('pred_5',ascending = False).head(10)['title_romaji_5'].values,
                  'Type 6 preds':pred_df6.sort_values('pred_6',ascending = False).head(10)['title_romaji_6'].values}

In [356]:
new_user_preds = pd.DataFrame(new_user_preds)
new_user_preds

Unnamed: 0,Type 1 preds,Type 2 preds,Type 3 preds,Type 4 preds,Type 5 preds,Type 6 preds
0,Shokugeki no Souma,Shokugeki no Souma,Koe no Katachi,Shokugeki no Souma,Koe no Katachi,Shokugeki no Souma
1,Koe no Katachi,Koe no Katachi,Shokugeki no Souma,Koe no Katachi,Shokugeki no Souma,Koe no Katachi
2,One Punch Man,Kimi no Na wa.,Shigatsu wa Kimi no Uso,Shigatsu wa Kimi no Uso,Shigatsu wa Kimi no Uso,Shigatsu wa Kimi no Uso
3,Kimi no Na wa.,One Punch Man,One Punch Man,One Punch Man,One Punch Man,One Punch Man
4,Shigatsu wa Kimi no Uso,Shigatsu wa Kimi no Uso,No Game No Life,Kimi no Na wa.,No Game No Life,Kimi no Na wa.
5,No Game No Life,No Game No Life,Kimi no Na wa.,No Game No Life,Kimi no Na wa.,No Game No Life
6,Sword Art Online,Sword Art Online,Sword Art Online,Sword Art Online,Sword Art Online,Sword Art Online
7,Sword Art Online II,Sword Art Online II,Sword Art Online II,Sword Art Online II,Sword Art Online II,Sword Art Online II
8,Kono Subarashii Sekai ni Shukufuku wo!,Kono Subarashii Sekai ni Shukufuku wo!,Kono Subarashii Sekai ni Shukufuku wo!,Kono Subarashii Sekai ni Shukufuku wo!,Kono Subarashii Sekai ni Shukufuku wo!,Kono Subarashii Sekai ni Shukufuku wo!
9,HUNTER×HUNTER (2011),HUNTER×HUNTER (2011),Hataraku Maou-sama!,Hataraku Maou-sama!,HUNTER×HUNTER (2011),Horimiya


For new user predictions, all 6 types share the same first 9 titles, and only the order of the first 6 varies. Only the 10th prediction differs: HUNTER×HUNTER (2011) for types 1, 2 and 5, Hataraku Maou-sama! for types 3 and 4, and Horimiya for type 6.

### Existing user predictions

In [357]:
pred_df1_existing.columns = pred_df1_existing.columns + "_1"
pred_df2_existing.columns = pred_df2_existing.columns + "_2"
pred_df3_existing.columns = pred_df3_existing.columns + "_3"
pred_df4_existing.columns = pred_df4_existing.columns + "_4"
pred_df5_existing.columns = pred_df5_existing.columns + "_5"
pred_df6_existing.columns = pred_df6_existing.columns + "_6"

In [358]:
existing_user_preds = {'Type 1 preds':pred_df1_existing.sort_values('pred_1',ascending = False).head(10)['title_romaji_1'].values,
                  'Type 2 preds':pred_df2_existing.sort_values('pred_2',ascending = False).head(10)['title_romaji_2'].values,
                  'Type 3 preds':pred_df3_existing.sort_values('pred_3',ascending = False).head(10)['title_romaji_3'].values,
                  'Type 4 preds':pred_df4_existing.sort_values('pred_4',ascending = False).head(10)['title_romaji_4'].values,
                  'Type 5 preds':pred_df5_existing.sort_values('pred_5',ascending = False).head(10)['title_romaji_5'].values,
                  'Type 6 preds':pred_df6_existing.sort_values('pred_6',ascending = False).head(10)['title_romaji_6'].values}

In [359]:
existing_user_preds = pd.DataFrame(existing_user_preds)
existing_user_preds

Unnamed: 0,Type 1 preds,Type 2 preds,Type 3 preds,Type 4 preds,Type 5 preds,Type 6 preds
0,Shokugeki no Souma,Shokugeki no Souma,Shokugeki no Souma,Shokugeki no Souma,Shokugeki no Souma,Shokugeki no Souma
1,Koe no Katachi,Koe no Katachi,Koe no Katachi,Koe no Katachi,Koe no Katachi,Koe no Katachi
2,Kimi no Na wa.,Kimi no Na wa.,Shigatsu wa Kimi no Uso,Hyouka,Shigatsu wa Kimi no Uso,Shigatsu wa Kimi no Uso
3,Shokugeki no Souma: Ni no Sara,Shigatsu wa Kimi no Uso,Hyouka,Shokugeki no Souma: Ni no Sara,Kimi no Na wa.,Kimi no Na wa.
4,Shigatsu wa Kimi no Uso,No Game No Life,No Game No Life,Horimiya,No Game No Life,No Game No Life
5,One Punch Man,One Punch Man,Kimi no Na wa.,Shigatsu wa Kimi no Uso,Horimiya,One Punch Man
6,No Game No Life,Shokugeki no Souma: Ni no Sara,Horimiya,Kimi no Na wa.,One Punch Man,Horimiya
7,Horimiya,Horimiya,Kono Subarashii Sekai ni Shukufuku wo!,Kono Subarashii Sekai ni Shukufuku wo!,Hyouka,Shokugeki no Souma: Ni no Sara
8,Kono Subarashii Sekai ni Shukufuku wo!,Kono Subarashii Sekai ni Shukufuku wo!,One Punch Man,No Game No Life,Kono Subarashii Sekai ni Shukufuku wo!,Kono Subarashii Sekai ni Shukufuku wo!
9,Sword Art Online II,Sword Art Online II,Shokugeki no Souma: Ni no Sara,Kobayashi-san Chi no Maidragon,Shokugeki no Souma: Ni no Sara,Sword Art Online II


For existing user predictions, there is more variety than in the new user predictions, although the lists are still largely similar. Type 4 gives the most variety: it recommends Hyouka, which only types 3-5 recommend, and Kobayashi-san Chi no Maidragon, which none of the other types recommend. Interestingly, type 4 does not recommend One Punch Man while the other 5 types do.

As the user used to generate these predictions is me, I would say all 6 types give decent predictions, as they contain animes that I have already watched and enjoyed, or animes that I know are good and plan to watch. However, I would say that type 4 performs best, as it recommended Hyouka and Kobayashi-san Chi no Maidragon.
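The comparison above was done by eye. As a rough sketch, the same overlap check can be done with set operations. The helper name `compare_top_k` is made up for illustration, and only the type 1, 4 and 6 columns of the existing user table are copied in to keep the example short.

```python
from typing import Dict, List, Set, Tuple


def compare_top_k(preds: Dict[str, List[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return the titles shared by every list, and the titles unique to each list."""
    title_sets = {name: set(titles) for name, titles in preds.items()}
    common = set.intersection(*title_sets.values())
    unique = {}
    for name, titles in title_sets.items():
        # Union of every other list, so we can find titles only this list has
        others = set().union(*(s for n, s in title_sets.items() if n != name))
        unique[name] = titles - others
    return common, unique


# Top-10 titles copied from the existing user table (types 1, 4 and 6 only)
existing_user_top10 = {
    "Type 1": ["Shokugeki no Souma", "Koe no Katachi", "Kimi no Na wa.",
               "Shokugeki no Souma: Ni no Sara", "Shigatsu wa Kimi no Uso",
               "One Punch Man", "No Game No Life", "Horimiya",
               "Kono Subarashii Sekai ni Shukufuku wo!", "Sword Art Online II"],
    "Type 4": ["Shokugeki no Souma", "Koe no Katachi", "Hyouka",
               "Shokugeki no Souma: Ni no Sara", "Horimiya",
               "Shigatsu wa Kimi no Uso", "Kimi no Na wa.",
               "Kono Subarashii Sekai ni Shukufuku wo!", "No Game No Life",
               "Kobayashi-san Chi no Maidragon"],
    "Type 6": ["Shokugeki no Souma", "Koe no Katachi", "Shigatsu wa Kimi no Uso",
               "Kimi no Na wa.", "No Game No Life", "One Punch Man", "Horimiya",
               "Shokugeki no Souma: Ni no Sara",
               "Kono Subarashii Sekai ni Shukufuku wo!", "Sword Art Online II"],
}

common, unique = compare_top_k(existing_user_top10)
print(len(common), "titles appear in all lists")
for name, titles in unique.items():
    print(name, "only:", sorted(titles))
```

In this sketch, only type 4 ends up with titles that the other two lists lack, which matches the observation above.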

### Similar item predictions

In [360]:
similar_pred1.columns = similar_pred1.columns + "_1"
similar_pred2.columns = similar_pred2.columns + "_2"
similar_pred3.columns = similar_pred3.columns + "_3"
similar_pred4.columns = similar_pred4.columns + "_4"
similar_pred5.columns = similar_pred5.columns + "_5"
similar_pred6.columns = similar_pred6.columns + "_6"

In [362]:
similar_item_preds = {'Type 1 preds':similar_pred1.sort_values('score_1',ascending = False).head(10)['title_romaji_1'].values,
                  'Type 2 preds':similar_pred2.sort_values('score_2',ascending = False).head(10)['title_romaji_2'].values,
                  'Type 3 preds':similar_pred3.sort_values('score_3',ascending = False).head(10)['title_romaji_3'].values,
                  'Type 4 preds':similar_pred4.sort_values('score_4',ascending = False).head(10)['title_romaji_4'].values,
                  'Type 5 preds':similar_pred5.sort_values('score_5',ascending = False).head(10)['title_romaji_5'].values,
                  'Type 6 preds':similar_pred6.sort_values('score_6',ascending = False).head(10)['title_romaji_6'].values}

In [363]:
similar_item_preds = pd.DataFrame(similar_item_preds)
similar_item_preds

Unnamed: 0,Type 1 preds,Type 2 preds,Type 3 preds,Type 4 preds,Type 5 preds,Type 6 preds
0,Toradora!,Toradora!,Yahari Ore no Seishun Love Come wa Machigattei...,5-Toubun no Hanayome,Ouran Koukou Host Club,ReLIFE
1,Kimi ni Todoke,Tsurezure Children,5-Toubun no Hanayome,ReLIFE,Toradora!,Ouran Koukou Host Club
2,Chuunibyou demo Koi ga Shitai! Ren,Chuunibyou demo Koi ga Shitai! Ren,Toradora!,Chuunibyou demo Koi ga Shitai!,Gamers!,Chuunibyou demo Koi ga Shitai! Ren
3,5-Toubun no Hanayome ∬,Ao Haru Ride,Yahari Ore no Seishun Love Come wa Machigatteiru.,Yahari Ore no Seishun Love Come wa Machigatteiru.,5-Toubun no Hanayome ∬,Yahari Ore no Seishun Love Come wa Machigatteiru.
4,Yahari Ore no Seishun Love Come wa Machigatteiru.,ReLIFE,Chuunibyou demo Koi ga Shitai!,Ouran Koukou Host Club,ReLIFE,Yahari Ore no Seishun Love Come wa Machigattei...
5,Yahari Ore no Seishun Love Come wa Machigattei...,5-Toubun no Hanayome ∬,Gamers!,Ao Haru Ride,Sakurasou no Pet na Kanojo,Ao Haru Ride
6,Chuunibyou demo Koi ga Shitai!,Gamers!,Yahari Ore no Seishun Love Come wa Machigattei...,Tsurezure Children,Ao Haru Ride,Toradora!
7,Ouran Koukou Host Club,Yahari Ore no Seishun Love Come wa Machigattei...,Ouran Koukou Host Club,Toradora!,Yahari Ore no Seishun Love Come wa Machigatteiru.,Chuunibyou demo Koi ga Shitai!
8,Gamers!,Kimi ni Todoke,5-Toubun no Hanayome ∬,Chuunibyou demo Koi ga Shitai! Ren,Yahari Ore no Seishun Love Come wa Machigattei...,5-Toubun no Hanayome
9,5-Toubun no Hanayome,Sakurasou no Pet na Kanojo,Sakurasou no Pet na Kanojo,Yahari Ore no Seishun Love Come wa Machigattei...,Tsurezure Children,Yahari Ore no Seishun Love Come wa Machigattei...


For similar item predictions, we asked the model to recommend animes similar to [Yahari Ore no Seishun Love Come wa Machigatteiru.](https://anilist.co/anime/14813/Yahari-Ore-no-Seishun-Love-Come-wa-Machigatteiru). Overall, all 6 user feature types lead to excellent predictions, as every recommendation is a romantic comedy. However, the model includes the input item in its output, so we will need to filter it out when building the web app.

Some predictions are present in all 6 types, although the order differs considerably between types. Type 2 is the only type whose top 10 does not include the input itself, and in terms of genre similarity, all the predictions are good. Apart from the sequels of the input anime, none of them are truly similar in content, but that is not unexpected since the model is given no information about the animes beyond their genres and the user ratings. From these results alone, all 6 types are viable.
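Since the input anime shows up in its own similar-item list, the web app will need to drop it before display. Below is a minimal sketch, assuming the results have been converted to a list of dicts (e.g. via `similar_pred.to_dict('records')`). The function name `drop_query_item` is hypothetical, and the rows are a small subset of the type 6 output.

```python
def drop_query_item(rows, query_media_id, top_n=10):
    """Remove the query anime from similar-item rows and keep the top_n by score."""
    kept = [row for row in rows if row["media_id"] != query_media_id]
    # sorted() is stable, so rows with equal scores keep their original order
    kept = sorted(kept, key=lambda row: row["score"], reverse=True)
    return kept[:top_n]


# A few rows taken from the type 6 similar-item output
rows = [
    {"media_id": 21049, "title_romaji": "ReLIFE", "score": 1.0},
    {"media_id": 853, "title_romaji": "Ouran Koukou Host Club", "score": 1.0},
    {"media_id": 14813,
     "title_romaji": "Yahari Ore no Seishun Love Come wa Machigatteiru.",
     "score": 1.0},
    {"media_id": 4224, "title_romaji": "Toradora!", "score": 1.0},
]

filtered = drop_query_item(rows, query_media_id=14813, top_n=3)
for row in filtered:
    print(row["title_romaji"])
```

On the DataFrame itself, the equivalent filter would be a boolean mask such as `similar_pred[similar_pred['media_id'] != 14813]`.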

## Model tuning with selected user feature type

Given that all 6 types give similar AUC scores but type 4 has the highest recall, and that type 4 gave the most satisfying predictions for existing users, we will proceed with user feature type 4: <br>
Top 5 genres among the genres that the user likes (defined as top 5 genres of animes that the user has rated equal to or greater than their average rating)

We will now tune the hyperparameters of the model to improve its scores. We will tune the following hyperparameters: learning_rate, item_alpha, no_components, learning_schedule and no_epochs. As increasing no_epochs substantially increases the run time, we will first tune the other 4 hyperparameters at a fixed 10 epochs, then tune no_epochs using the best combination of those 4.
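To get a feel for the size of the first search, the grid can be enumerated up front with `itertools.product`. This is only a sketch: the value lists are the ones searched in this notebook, while the `grid` variable and its dict layout are made up for illustration.

```python
import itertools

learning_rates = [.15, .25, .35]
item_alphas = [1e-6, 1e-8, 1e-10]
no_components = [10, 20, 50]
learning_schedules = ['adagrad', 'adadelta']

# One dict per combination, in the same order as four nested for-loops
grid = [
    {'learning_rate': lr, 'item_alpha': alpha,
     'no_components': n, 'learning_schedule': sched}
    for lr, alpha, n, sched in itertools.product(
        learning_rates, item_alphas, no_components, learning_schedules)
]

print(len(grid))  # 3 * 3 * 3 * 2 combinations
print(grid[0])
```

With 54 fits, keeping the epoch count low during this first pass is what keeps the search affordable.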

In [11]:
data_final = data[['user_id','media_id','rating','genres','top_5_genres_avg']]

In [12]:
all_anime_genres = sorted(list(set(itertools.chain.from_iterable(data_final['genres']))))

In [13]:
all_liked_genres = sorted(list(set(itertools.chain.from_iterable(data_final['top_5_genres_avg']))))

In [14]:
dataset = Dataset()

In [15]:
dataset.fit(data_final['user_id'],data_final['media_id'],user_features=all_liked_genres,item_features=all_anime_genres)

In [16]:
item_features = dataset.build_item_features((x,y) for x,y in zip(data_final['media_id'],data_final['genres']))

In [17]:
user_features = dataset.build_user_features((x,y) for x,y in zip(data_final['user_id'],data_final['top_5_genres_avg']))

In [18]:
(interactions, weights) = dataset.build_interactions(data_final.iloc[:, 0:3].values)

In [31]:
train_interactions, test_interactions = cross_validation.random_train_test_split(interactions, 
                                                                                 test_percentage=TEST_PERCENTAGE, 
                                                                                 random_state = np.random.RandomState(SEEDNO))

In [284]:
learning_rates = [.15,.25,.35]
item_alphas = [1e-6,1e-8,1e-10]
no_components = [10,20,50]
learning_schedules = ['adagrad','adadelta']

In [285]:
results = []
results_dict = {}
t0 = time.time()

for learning_rate in learning_rates:
    for item_alpha in item_alphas:
        for no_component in no_components:
            for learning_schedule in learning_schedules:
                t1 = time.time()
                results_dict = {'learning_rate':learning_rate,
                                'item_alpha':item_alpha,
                                'no_component':no_component,
                                'learning_schedule':learning_schedule,
                                'train_auc':.0,
                                'test_auc':.0,
                                'train_precision':.0,
                                'test_precision':.0,
                                'train_recall':.0,
                                'test_recall':.0,
                                'runtime':0}
                model = LightFM(loss='warp',
                               no_components = no_component,
                               learning_rate = learning_rate,
                               item_alpha = item_alpha,
                               learning_schedule=learning_schedule,
                               random_state = np.random.RandomState(SEEDNO))
                model.fit(interactions=  train_interactions,item_features = item_features,user_features=user_features,epochs = 10)
                results_dict['train_auc'] = auc_score(model,train_interactions,item_features=item_features,user_features=user_features).mean()
                results_dict['test_auc'] = auc_score(model,test_interactions,item_features=item_features,user_features=user_features).mean()
                results_dict['train_precision'] = lightfm_prec_at_k(model, train_interactions,item_features=item_features,user_features=user_features, k=K).mean()
                results_dict['test_precision'] = lightfm_prec_at_k(model, test_interactions,item_features=item_features,user_features=user_features, k=K).mean()
                results_dict['train_recall'] = lightfm_recall_at_k(model, train_interactions,item_features=item_features,user_features=user_features, k=K).mean()
                results_dict['test_recall'] = lightfm_recall_at_k(model, test_interactions,item_features=item_features,user_features=user_features, k=K).mean()
                t2 = time.time()
                results_dict['runtime'] = round(t2 - t1)

                results.append(results_dict)
                print(f"Fitted {results_dict}. Took {round(t2 - t1)} seconds.")
                print(f"Total time elapsed: {round(time.time() - t0)} seconds.")
                print("==========")
print("Completed")

Fitted {'learning_rate': 0.15, 'item_alpha': 1e-06, 'no_component': 10, 'learning_schedule': 'adagrad', 'train_auc': 0.95100766, 'test_auc': 0.9463143, 'train_precision': 0.21280977, 'test_precision': 0.07402519, 'train_recall': 0.07342505922590963, 'test_recall': 0.06721637021525063, 'runtime': 88}. Took 88 seconds.
Total time elapsed: 88 seconds.
Fitted {'learning_rate': 0.15, 'item_alpha': 1e-06, 'no_component': 10, 'learning_schedule': 'adadelta', 'train_auc': 0.9501498, 'test_auc': 0.9458428, 'train_precision': 0.2533693, 'test_precision': 0.09098771, 'train_recall': 0.08314451057405499, 'test_recall': 0.07768569438050976, 'runtime': 125}. Took 125 seconds.
Total time elapsed: 212 seconds.
Fitted {'learning_rate': 0.15, 'item_alpha': 1e-06, 'no_component': 20, 'learning_schedule': 'adagrad', 'train_auc': 0.95217055, 'test_auc': 0.94746685, 'train_precision': 0.2289001, 'test_precision': 0.080359586, 'train_recall': 0.0903910345491264, 'test_recall': 0.07806083861391849, 'runtime':

In [286]:
results_df = pd.DataFrame(results)

In [287]:
results_df['test_auc_rank'] = results_df['test_auc'].rank(ascending=False)
results_df['test_precision_rank'] = results_df['test_precision'].rank(ascending=False)
results_df['test_recall_rank'] = results_df['test_recall'].rank(ascending=False)

In [288]:
results_df.sort_values('test_recall_rank')

Unnamed: 0,learning_rate,item_alpha,no_component,learning_schedule,train_auc,test_auc,train_precision,test_precision,train_recall,test_recall,runtime,test_auc_rank,test_precision_rank,test_recall_rank
40,0.35,1e-06,50,adagrad,0.954543,0.949252,0.282177,0.100918,0.120092,0.101888,271,2.0,1.0,1.0
22,0.25,1e-06,50,adagrad,0.954024,0.948906,0.267976,0.094053,0.116455,0.097148,274,9.0,8.0,2.0
46,0.35,1e-08,50,adagrad,0.954507,0.949029,0.260108,0.093537,0.113232,0.09688,261,7.0,9.0,3.0
38,0.35,1e-06,20,adagrad,0.954069,0.948995,0.26429,0.092854,0.113167,0.096477,135,8.0,11.0,4.0
45,0.35,1e-08,20,adadelta,0.953068,0.948719,0.256607,0.091928,0.114378,0.096156,204,14.0,16.0,6.0
9,0.15,1e-08,20,adadelta,0.953068,0.948719,0.256607,0.091928,0.114378,0.096156,204,14.0,16.0,6.0
27,0.25,1e-08,20,adadelta,0.953068,0.948719,0.256607,0.091928,0.114378,0.096156,204,14.0,16.0,6.0
42,0.35,1e-08,10,adagrad,0.953464,0.948332,0.26128,0.092482,0.109623,0.094471,91,25.0,12.0,8.0
32,0.25,1e-10,20,adagrad,0.953255,0.948268,0.255549,0.090669,0.108515,0.093926,139,27.0,26.0,9.0
44,0.35,1e-08,20,adagrad,0.953682,0.948635,0.257395,0.092012,0.108258,0.093853,135,16.0,14.0,10.0


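We will take the parameters of the model with the best test recall, which also has the best test precision: learning_rate = 0.35, item_alpha = 1e-6, no_components = 50, learning_schedule = adagrad. Note that the adadelta rows are identical across learning rates (e.g. rows 9, 27 and 45), since LightFM's learning_rate applies only to the adagrad schedule.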
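Three rows in the table share a test_recall_rank of 6.0 because pandas' `rank` uses the average method by default: tied values occupying positions 5, 6 and 7 each receive (5 + 6 + 7) / 3 = 6. The sketch below reimplements that rule in plain Python, using the ten test_recall values shown in the table. The function name is made up for illustration.

```python
def average_rank_desc(values):
    """Rank values from largest to smallest, giving ties the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values tied with the one at `pos`
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions are 1-based: the run covers pos+1 .. end+1
        avg = (pos + 1 + end + 1) / 2
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


# test_recall values of the ten rows shown in the table, in table order
test_recall = [0.101888, 0.097148, 0.09688, 0.096477, 0.096156,
               0.096156, 0.096156, 0.094471, 0.093926, 0.093853]

ranks = average_rank_desc(test_recall)
print(ranks)
```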

Next we will tune the number of epochs.

In [289]:
results = []
results_dict = {}
no_epochs = [10,20,60,100]
t0 = time.time()

for no_epoch in no_epochs:
    t1 = time.time()
    results_dict = {'no_epochs':no_epoch,
                    'train_auc':.0,
                    'test_auc':.0,
                    'train_precision':.0,
                    'test_precision':.0,
                    'train_recall':.0,
                    'test_recall':.0,
                    'runtime':0}
    model = LightFM(loss='warp',
                   no_components = 50,
                   learning_rate = .35,
                   item_alpha = 1e-6,
                   learning_schedule='adagrad',
                   random_state = np.random.RandomState(SEEDNO))
    model.fit(interactions=  train_interactions,item_features = item_features,user_features=user_features,epochs = no_epoch)
    results_dict['train_auc'] = auc_score(model,train_interactions,item_features=item_features,user_features=user_features).mean()
    results_dict['test_auc'] = auc_score(model,test_interactions,item_features=item_features,user_features=user_features).mean()
    results_dict['train_precision'] = lightfm_prec_at_k(model, train_interactions,item_features=item_features,user_features=user_features, k=K).mean()
    results_dict['test_precision'] = lightfm_prec_at_k(model, test_interactions,item_features=item_features,user_features=user_features, k=K).mean()
    results_dict['train_recall'] = lightfm_recall_at_k(model, train_interactions,item_features=item_features,user_features=user_features, k=K).mean()
    results_dict['test_recall'] = lightfm_recall_at_k(model, test_interactions,item_features=item_features,user_features=user_features, k=K).mean()
    t2 = time.time()
    results_dict['runtime'] = round(t2 - t1)
    results.append(results_dict)
    print(f"Fitted {results_dict}. Took {round(t2-t1)} seconds.")
    print(f"Total time elapsed: {round(time.time()-t0)} seconds.")
    print("==========")
print("Completed")

Fitted {'no_epochs': 10, 'train_auc': 0.9545434, 'test_auc': 0.94925225, 'train_precision': 0.2821771, 'test_precision': 0.10091792, 'train_recall': 0.12009213852298505, 'test_recall': 0.10188772224542005, 'runtime': 257}. Took 257 seconds.
Total time elapsed: 257 seconds.
Fitted {'no_epochs': 20, 'train_auc': 0.9561822, 'test_auc': 0.95052236, 'train_precision': 0.28527302, 'test_precision': 0.10167653, 'train_recall': 0.11922310437088121, 'test_recall': 0.10113011971781795, 'runtime': 416}. Took 416 seconds.
Total time elapsed: 673 seconds.
Fitted {'no_epochs': 60, 'train_auc': 0.95873404, 'test_auc': 0.95196784, 'train_precision': 0.3397998, 'test_precision': 0.121643156, 'train_recall': 0.13790965131970803, 'test_recall': 0.11950063231451379, 'runtime': 980}. Took 980 seconds.
Total time elapsed: 1653 seconds.
Fitted {'no_epochs': 100, 'train_auc': 0.9591967, 'test_auc': 0.95199686, 'train_precision': 0.34031808, 'test_precision': 0.12146109, 'train_recall': 0.13894763956562906, 't

In [290]:
results_epoch_df = pd.DataFrame(results)

In [291]:
results_epoch_df['test_auc_rank'] = results_epoch_df['test_auc'].rank(ascending=False)
results_epoch_df['test_precision_rank'] = results_epoch_df['test_precision'].rank(ascending=False)
results_epoch_df['test_recall_rank'] = results_epoch_df['test_recall'].rank(ascending=False)

In [292]:
results_epoch_df.sort_values('test_recall_rank')

Unnamed: 0,no_epochs,train_auc,test_auc,train_precision,test_precision,train_recall,test_recall,runtime,test_auc_rank,test_precision_rank,test_recall_rank
3,100,0.959197,0.951997,0.340318,0.121461,0.138948,0.120816,1538,1.0,2.0,1.0
2,60,0.958734,0.951968,0.3398,0.121643,0.13791,0.119501,980,2.0,1.0,2.0
0,10,0.954543,0.949252,0.282177,0.100918,0.120092,0.101888,257,4.0,4.0,3.0
1,20,0.956182,0.950522,0.285273,0.101677,0.119223,0.10113,416,3.0,3.0,4.0


While the model scores improved with more epochs, the runtime grew substantially. Going from 60 to 100 epochs raises test recall only from 0.1195 to 0.1208 while increasing the runtime from 980 to 1538 seconds, so we will go with 60 epochs.
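The trade-off can be made concrete by computing the percentage change in test recall and runtime between consecutive epoch settings. This is a small sketch using the figures from the results table. The names `pct_change` and `steps` are made up for illustration.

```python
# (no_epochs, test_recall, runtime in seconds), copied from the results table
epoch_results = [
    (10, 0.101888, 257),
    (20, 0.101130, 416),
    (60, 0.119501, 980),
    (100, 0.120816, 1538),
]


def pct_change(new, old):
    """Percentage change from old to new."""
    return (new - old) / old * 100


steps = {}
for (e0, r0, t0), (e1, r1, t1) in zip(epoch_results, epoch_results[1:]):
    steps[(e0, e1)] = {
        'recall_pct': pct_change(r1, r0),
        'runtime_pct': pct_change(t1, t0),
    }
    print(f"{e0} -> {e1} epochs: "
          f"recall {steps[(e0, e1)]['recall_pct']:+.1f}%, "
          f"runtime {steps[(e0, e1)]['runtime_pct']:+.1f}%")
```

The last step buys roughly 1% more recall for over 50% more runtime, which is why 60 epochs looks like the better stopping point here.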

In [32]:
model = LightFM(loss='warp',
                no_components=50,
                learning_rate=.35,
                item_alpha=1e-6,
                learning_schedule='adagrad',  # the tuned schedule, stated explicitly
                random_state=np.random.RandomState(SEEDNO))

In [None]:
model.fit(interactions=train_interactions,item_features=item_features,user_features=user_features,epochs=60);

In [None]:
uid_map, ufeature_map, iid_map, ifeature_map = dataset.mapping()

## Model Evaluation

In [297]:
train_roc_auc = auc_score(model,train_interactions,item_features=item_features,user_features=user_features).mean()
test_roc_auc = auc_score(model,test_interactions,item_features=item_features,user_features=user_features).mean()
train_precision_lfm = lightfm_prec_at_k(model, train_interactions,item_features=item_features,user_features=user_features, k=K).mean()
test_precision_lfm = lightfm_prec_at_k(model, test_interactions,item_features=item_features,user_features=user_features, k=K).mean()
train_recall_lfm = lightfm_recall_at_k(model, train_interactions,item_features=item_features,user_features=user_features, k=K).mean()
test_recall_lfm = lightfm_recall_at_k(model, test_interactions,item_features=item_features,user_features=user_features, k=K).mean()
print("=====AUC=====")
print(f'Train AUC: {train_roc_auc}')
print(f'Test AUC: {test_roc_auc}')
print("=====Precision@K=====")
print(f'Train Precision: {train_precision_lfm}')
print(f'Test Precision: {test_precision_lfm}')
print("=====Recall@K=====")
print(f'Train Recall: {train_recall_lfm}')
print(f'Test Recall: {test_recall_lfm}')

=====AUC=====
Train AUC: 0.9587340354919434
Test AUC: 0.9519678354263306
=====Precision@K=====
Train Precision: 0.33979979157447815
Test Precision: 0.12164315581321716
=====Recall@K=====
Train Recall: 0.13790965131970803
Test Recall: 0.11950063231451379


A test AUC of 0.952 means that there is a 95.2% probability that a randomly chosen relevant item has a higher score than a randomly chosen irrelevant item.

A test precision@k of 0.122 at k = 10 means 12.2% of the recommended items in the top-10 set are relevant.

A test recall@k of 0.120 at k = 10 means that, on average, 12.0% of a user's relevant items are found in the top-10 recommendations.

The model is performing fairly well as a recommender.
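To make the three metrics above concrete, here is a toy calculation for a single user with six items. It follows the usual textbook definitions and is only a sketch: the item names and scores are invented, and LightFM's own implementation averages these per-user values over all users with test interactions.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def pairwise_auc(scores, relevant):
    """Probability that a random relevant item outscores a random irrelevant one."""
    pos = [s for item, s in scores.items() if item in relevant]
    neg = [s for item, s in scores.items() if item not in relevant]
    wins = sum(1 for p in pos for n in neg if p > n)
    return wins / (len(pos) * len(neg))


# Invented scores for one user; higher means ranked earlier
scores = {'a': 0.9, 'b': 0.8, 'c': 0.7, 'd': 0.6, 'e': 0.5, 'f': 0.4}
relevant = {'a', 'd', 'f'}
ranked = sorted(scores, key=scores.get, reverse=True)

p_at_2 = precision_at_k(ranked, relevant, k=2)
r_at_2 = recall_at_k(ranked, relevant, k=2)
auc = pairwise_auc(scores, relevant)
print(p_at_2, r_at_2, auc)
```

Here one of the top 2 items is relevant (precision 1/2), one of the three relevant items is retrieved (recall 1/3), and 4 of the 9 relevant/irrelevant pairs are ordered correctly (AUC 4/9).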

### Model Predictions

#### Predict for new user

In [6]:
# predict for new user
user_feature_list = ['Comedy','Romance','Slice of Life']

In [9]:
new_user_features = format_newuser_input(ufeature_map, user_feature_list)

In [19]:
n_users, n_items = interactions.shape

In [20]:
model_pred = model.predict(0, np.arange(n_items), user_features=new_user_features) # Here 0 means pick the first row of the user_features sparse matrix

In [21]:
pred_dict = {'media_id':data['media_id'].unique(),'pred':model_pred}

In [39]:
pred_df = pd.DataFrame(pred_dict)

In [40]:
pred_df = pd.merge(pred_df,anime_db[['media_id','title_romaji','title_english','averageScore','genres']],on='media_id')

In [41]:
pred_df.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,averageScore,genres
101,21087,20204.013672,One Punch Man,One-Punch Man,83.0,"Action, Comedy, Sci-Fi, Supernatural"
221,20954,19922.732422,Koe no Katachi,A Silent Voice,88.0,"Drama, Romance, Slice of Life"
11,19815,17458.646484,No Game No Life,"No Game, No Life",78.0,"Adventure, Comedy, Ecchi, Fantasy"
3,11757,16540.271484,Sword Art Online,Sword Art Online,69.0,"Action, Adventure, Fantasy, Romance"
52,20665,15044.25,Shigatsu wa Kimi no Uso,Your Lie in April,84.0,"Drama, Music, Romance, Slice of Life"
199,21519,14915.349609,Kimi no Na wa.,Your Name.,86.0,"Drama, Romance, Supernatural"
50,16498,12584.092773,Shingeki no Kyojin,Attack on Titan,85.0,"Action, Drama, Fantasy, Mystery"
18,20594,12056.78418,Sword Art Online II,Sword Art Online II,65.0,"Action, Adventure, Fantasy, Sci-Fi"
1,21459,11505.833008,Boku no Hero Academia,My Hero Academia,78.0,"Action, Adventure, Comedy"
14,101921,11477.990234,Kaguya-sama wa Kokurasetai: Tensaitachi no Ren...,Kaguya-sama: Love is War,83.0,"Comedy, Psychological, Romance, Slice of Life"


#### Predict for existing user

In [29]:
# user id 710080 is my own account
user_x = uid_map[710080]
model_pred_existing = model.predict(user_x,np.arange(n_items))

In [30]:
pred_dict_existing = {'media_id':data['media_id'].unique(),'pred':model_pred_existing}

In [31]:
pred_df_existing = pd.DataFrame(pred_dict_existing)

In [32]:
pred_df_existing = pd.merge(pred_df_existing,anime_db[['media_id','title_romaji','title_english','averageScore']],on='media_id')

In [33]:
pred_df_existing.sort_values('pred',ascending=False).head(10)

Unnamed: 0,media_id,pred,title_romaji,title_english,averageScore
1796,10155,-5.627423,Dog Days,Dog Days,65.0
1332,21361,-5.757196,GRANBLUE FANTASY The Animation,Granblue Fantasy: The Animation,64.0
1578,21319,-5.830634,Ao no Kanata no Four Rhythm,AOKANA: Four Rhythm Across the Blue,63.0
2487,18195,-5.911045,Little Busters!: Refrain,Little Busters! Refrain,80.0
801,20991,-5.918883,Hidan no Aria AA,Aria the Scarlet Ammo AA,58.0
866,20774,-5.9401,Kuusen Madoushi Kouhosei no Kyoukan,Sky Wizards Academy,58.0
541,20031,-6.064839,D-Frag!,D-Frag!,73.0
2429,18893,-6.213651,Aoki Hagane no Arpeggio: Ars Nova,Arpeggio of Blue Steel,70.0
2740,16444,-6.890656,Mondaiji-tachi ga Isekai kara Kuru Sou Desu yo...,Problem Children Are Coming From Another World...,69.0
66,12189,-7.045213,Hyouka,Hyouka,79.0


#### Predict similar items

In [34]:
# Predicting for Yahari Ore no Seishun Love Come wa Machigatteiru.
# Comedy, Drama, Romance, Slice of Life
item_x = iid_map[14813]

In [35]:
similar_pred = similar_items(item_id=item_x, item_features=item_features, model=model,N=100)
similar_pred['media_id'] = similar_pred['itemID'].map(iid_map_reverse)

In [37]:
similar_pred = pd.merge(similar_pred,anime_db[['media_id','title_romaji','title_english','averageScore']],on='media_id')

In [38]:
similar_pred.sort_values('score',ascending=False).head(10)

Unnamed: 0,itemID,score,media_id,title_romaji,title_english,averageScore
0,63,1.0,14813,Yahari Ore no Seishun Love Come wa Machigatteiru.,My Teen Romantic Comedy SNAFU,78.0
1,642,1.0,20698,Yahari Ore no Seishun Love Come wa Machigattei...,My Teen Romantic Comedy SNAFU TOO!,81.0
2,224,1.0,103572,5-Toubun no Hanayome,The Quintessential Quintuplets,76.0
3,54,1.0,4224,Toradora!,Toradora!,79.0
4,112,1.0,13759,Sakurasou no Pet na Kanojo,The Pet Girl of Sakurasou,79.0
5,451,1.0,853,Ouran Koukou Host Club,Ouran High School Host Club,81.0
6,515,0.999999,21049,ReLIFE,ReLIFE,78.0
7,455,0.999999,20596,Ao Haru Ride,Blue Spring Ride,75.0
8,727,0.999999,97766,Gamers!,GAMERS!,66.0
9,483,0.999999,98291,Tsurezure Children,Tsuredure Children,75.0


## Conclusion & Recommendations

The LightFM model, when given user and item features, performs really well compared to the baseline model (which did not have any user or item features).

In recommending animes for new users, it seems the model is predicting popular animes which may or may not be relevant to the user input. E.g. "Romance" is part of the user input, but "Romance" only appears in 5 out of the 10 recommendations. Furthermore, some recommendations do not include any of the genres in the user input, e.g. Shingeki no Kyojin and Sword Art Online II have none of the "Comedy", "Romance" or "Slice of Life" genres. There is, however, 1 recommendation that matches all user inputs: Kaguya-sama wa Kokurasetai. If relevance is defined strictly as matching all selected genres, 1 out of 10 recommendations is relevant, which is in the same range as the model's precision@k score of 12.2%, although that metric measures held-out interactions rather than genre overlap.

In recommending animes for existing users or similar animes, the model is performing as desired, as the top 10 recommendations contain animes that are not available on Netflix at this moment.

Overall, the model is performing decently. The issue with recommending animes for new users might be attributed to the cold start problem, seeing how recommendations for existing users are working as desired. There may be merit in applying an additional filter to remove irrelevant recommendations based on user input, such as excluding animes that do not match any of the selected genres, or that do not match all of them. On that note, it might be useful to let the user choose whether the recommendations must contain all or at least one of the genres they have selected.
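The proposed filter could look something like the sketch below, which is not part of the current model pipeline. The function name `filter_by_genres` and its `mode` argument are hypothetical. The rows are the 10 new user recommendations from the Model Predictions section, using their English titles and with the genre strings split into lists.

```python
def filter_by_genres(recs, selected, mode="any"):
    """Keep recommendations matching any (or all) of the selected genres."""
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    selected = set(selected)
    kept = []
    for rec in recs:
        genres = set(rec["genres"])
        if mode == "all":
            matches = selected <= genres      # every selected genre is present
        else:
            matches = bool(selected & genres)  # at least one genre is present
        if matches:
            kept.append(rec)
    return kept


recs = [
    {"title": "One-Punch Man", "genres": ["Action", "Comedy", "Sci-Fi", "Supernatural"]},
    {"title": "A Silent Voice", "genres": ["Drama", "Romance", "Slice of Life"]},
    {"title": "No Game, No Life", "genres": ["Adventure", "Comedy", "Ecchi", "Fantasy"]},
    {"title": "Sword Art Online", "genres": ["Action", "Adventure", "Fantasy", "Romance"]},
    {"title": "Your Lie in April", "genres": ["Drama", "Music", "Romance", "Slice of Life"]},
    {"title": "Your Name.", "genres": ["Drama", "Romance", "Supernatural"]},
    {"title": "Attack on Titan", "genres": ["Action", "Drama", "Fantasy", "Mystery"]},
    {"title": "Sword Art Online II", "genres": ["Action", "Adventure", "Fantasy", "Sci-Fi"]},
    {"title": "My Hero Academia", "genres": ["Action", "Adventure", "Comedy"]},
    {"title": "Kaguya-sama: Love is War",
     "genres": ["Comedy", "Psychological", "Romance", "Slice of Life"]},
]

user_genres = ["Comedy", "Romance", "Slice of Life"]
any_match = filter_by_genres(recs, user_genres, mode="any")
all_match = filter_by_genres(recs, user_genres, mode="all")
print(len(any_match), [r["title"] for r in all_match])
```

Because the "all" mode can shrink the list sharply, the filter would probably need to run over more than the top 10 predictions so that enough recommendations remain.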

## Exports

Data items required for predictions:<br>
- model
- item_features
- uid_map
- iid_map
- iid_map_reverse
- ufeature_map
- n_items
- anime data
- data['media_id'].unique()

In [305]:
# pickle model
with open('../model.pkl', 'wb') as f:
    pickle.dump(model, f)

In [299]:
iid_map_reverse = dict((value,key) for key,value in iid_map.items())

In [301]:
n_users, n_items = interactions.shape

In [302]:
# Combining remaining required data items into 1 dictionary for pickle
data_items_for_predictions = {'item_features' : item_features,
                             'uid_map':uid_map,
                             'iid_map':iid_map,
                             'iid_map_reverse':iid_map_reverse,
                             'ufeature_map':ufeature_map,
                             'n_items':n_items,
                             'unique_media_id':data['media_id'].unique()}

In [303]:
with open('../data/data_items_for_predictions.pkl', 'wb') as f:
    pickle.dump(data_items_for_predictions, f)
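On the web app side, the exported dictionary would be read back with `pickle.load`. The sketch below shows the round trip on a toy payload written to a temporary directory. The keys mirror a few of the exported items, but the index values are invented for illustration.

```python
import os
import pickle
import tempfile

# Toy stand-in for data_items_for_predictions (index values are illustrative)
payload = {'n_items': 3, 'iid_map': {14813: 0, 20698: 1, 4224: 2}}
payload['iid_map_reverse'] = {v: k for k, v in payload['iid_map'].items()}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data_items_for_predictions.pkl')
    # Context managers make sure the file is closed before it is read back
    with open(path, 'wb') as f:
        pickle.dump(payload, f)
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

print(loaded['iid_map_reverse'][0])
```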

In [306]:
anime_db = pd.read_pickle("../data/anime_db_cleaned.pkl")

In [307]:
# take an explicit copy so the edits below do not trigger SettingWithCopyWarning
anime_db_export = anime_db[['id','description','genres','startDate','endDate','duration','averageScore',
                            'siteUrl','title_romaji','title_english','coverImage_large','coverImage_medium']].copy()

In [309]:
anime_db_export['genres'] = anime_db_export['genres'].apply(lambda x: ", ".join(x))

In [312]:
anime_db_export.rename(columns={'id':'media_id'},inplace=True)


In [313]:
anime_db_export.to_pickle("../data/anime_db_export.pkl")