


**Introduction**.

The task proposed to compare two approaches to recommendation systems: collaborative and hybrid (collaborative+content). The data used for training and validation was MOvieLens20M, a dataset containing 138493 user ratings for 27278 movies, as well as some additional information about these movies.

**Forming dataset**.

Let's start with the construction of a dataframe containing the data needed for further operations. From all MOvieLens20M we need only two csv files - "rating.csv" and "movie.csv". "rating.csv" contains information about user ratings for specific movies, the data from this file will be used to build a purely collaborative model. "movie.csv" contains information about movie titles and genres, the data from this file together with the data from "rating.csv" will be used to build a hybrid (collaborative+content) model.

The final dataframe has the following data fields: 'userId' - user id; 'movieId' - movie id; 'rating' - the rating given by the user with userId to the movie with movieId; 'timestamp' - the time when the rating was given; 19 additional fields containing information about movie genres with movieId (Adventure, Animation, Children, Comedy .....). One-hot coding was used for genres: 1 - if the movie has this genre in the list of genres; 0 - if not.

I would also like to note that due to the lack of high computing resources on my laptop and in order to reduce training time, the original dataset was reduced. Only the 15000 most frequent users and 3000 most frequent movies were selected.

The function "create_dataframe" creates a dataframe, the dataframe itself can be seen in cell [4].

In [1]:
import pandas as pd
import numpy as np


In [2]:
def create_dataframe(n_users, n_items, rating_file_name, movie_file_name):
    
    """Return dataframe with ratings and one-hot encoded movies' genres. 
    
    Size of the returned dataframe is reduced in comparison with the original datasets: in the final dataframe 
    only n_users most frequent users and n_items most frequent movies are taken into account. """

    df_moive_with_genres=pd.read_csv(movie_file_name)
    df_init=pd.read_csv(rating_file_name)

    df_moive_with_genres['genres']=df_moive_with_genres['genres'].apply(lambda x: x.split('|'))



    for index, row in df_moive_with_genres.iterrows():

        for genre in row['genres']:
            df_moive_with_genres.at[index, genre] = 1

    df_moive_with_genres=df_moive_with_genres.fillna(0)


    
    df_with_feat_full=df_init.merge(df_moive_with_genres, on='movieId')
    
    from collections import Counter
    ucount = Counter(df_init['userId'])
    mcount = Counter(df_init['movieId'])
    
    top_userid = [u for u,c in ucount.most_common(n_users)]
    top_movieid = [i for i, c in mcount.most_common(n_items)]
    
    df_with_feat= df_with_feat_full[df_with_feat_full['userId'].isin(top_userid) & df_with_feat_full['movieId'].isin(top_movieid)].copy()


    df_with_feat.drop(['title', 'genres','(no genres listed)'], axis=1, inplace=True)
    
    return df_with_feat
    

In [3]:
df=create_dataframe(15000,3000,'rating.csv','movie.csv')

In [4]:
df

Unnamed: 0,userId,movieId,rating,timestamp,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Thriller,Horror,Mystery,Sci-Fi,IMAX,Documentary,War,Musical,Western,Film-Noir
5,54,2,3.0,2000-11-22 18:36:16,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,91,2,3.5,2005-03-29 01:55:58,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,116,2,2.0,2005-11-23 06:41:08,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
15,131,2,1.0,2009-03-29 11:41:01,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
16,132,2,3.0,2005-04-22 12:29:57,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19572775,137277,8948,3.0,2014-03-07 22:43:33,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
19572779,137893,8948,4.0,2008-12-19 03:51:01,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
19572782,138067,8948,1.5,2005-06-08 07:20:14,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
19572783,138200,8948,3.0,2009-03-18 23:37:36,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


**The splitting into train, validation, and test sets was performed based on the following considerations:** initial dataset is sorted in ascending order by "timestamp" field to avoid data-leaks; "train_val_frac" parameter shows the portion of the sorted dataset (counting from its beginning) used for training and validation subsets; (1 - "train_val_frac") is the portion of the sorted dataset (counting from its end) used for the test subset. In the second step, the unified dataset for training and validation is partitioned into train and validation sets with the partition fraction "train_frac" for training.

The "trainval_test_split" function splits the initial dataset into train-validation and test subsets.

The "train_val_split" function splits the train-validation subset (from the previous function) into train and validation sets.

Train, validation and test sets can be seen in cells [9], [10] and [11] respectively.

In [5]:
def trainval_test_split(df, train_val_frac=0.8):
    
    """Sort initial dataframe by timestamp, remove timestamp column and return train-validation and test subsets. 
    train_val_frac - is a part of the entire dataset which is used as a train-validation set."""
    
    
    df=df.sort_values('timestamp', ascending=True)

    test_frac=1-train_val_frac
    train_val=df.iloc[:int(train_val_frac*len(df))].copy()
    test=df.iloc[int(train_val_frac*len(df))+1:].copy()
    train_val.drop('timestamp', axis=1, inplace=True)
    test.drop('timestamp', axis=1, inplace=True)
    
    return train_val, test

In [6]:
train_val, test=trainval_test_split(df, train_val_frac=0.8)

In [7]:
def train_val_split(train_val, train_frac=0.9):
    
    """Split train-validation set (sorted by timestamp) into train and validation subsets 
    with "train_frac" being the portion of the train subset."""
    
    train=train_val.iloc[:int(train_frac*len(train_val))]
    validation=train_val.iloc[int(train_frac*len(train_val))+1:]
    
    return train, validation

In [8]:
train, validation = train_val_split(train_val, train_frac=0.9)

In [9]:
train

Unnamed: 0,userId,movieId,rating,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,...,Thriller,Horror,Mystery,Sci-Fi,IMAX,Documentary,War,Musical,Western,Film-Noir
163287,130558,50,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8087470,130558,25,5.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7371005,130558,21,4.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5181524,130558,17,5.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3127224,130558,24,3.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5423970,47866,508,3.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3259927,47866,442,3.5,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
4575145,47866,2916,3.5,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
2406906,47866,7153,4.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [10]:
validation

Unnamed: 0,userId,movieId,rating,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,...,Thriller,Horror,Mystery,Sci-Fi,IMAX,Documentary,War,Musical,Western,Film-Noir
1214179,47866,1246,4.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1630333,47866,2291,3.5,0.0,0.0,0.0,0.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1155157,47866,1222,3.5,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
6300450,47866,1183,2.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
6396271,47866,1391,3.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16176008,76987,33162,4.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
9272725,76987,8641,4.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93825,58222,47,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
350268,58222,296,4.5,0.0,0.0,0.0,1.0,0.0,0.0,1.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [11]:
test

Unnamed: 0,userId,movieId,rating,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,...,Thriller,Horror,Mystery,Sci-Fi,IMAX,Documentary,War,Musical,Western,Film-Noir
13481060,76987,7373,4.5,1.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
307223,58222,293,5.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
798780,58222,1089,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3693819,58222,1213,4.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1156816,58222,1222,4.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16651637,70232,58998,2.5,0.0,0.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4243329,16978,2093,3.5,1.0,0.0,1.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9618691,89081,55232,3.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
18690497,89081,52458,4.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


**To build both collaborative and hybrid models, we will use LightFM library.** The functions "get_movie_features", "create_skeleton", "build_interactions_weights", "build_movie_features" are used to convert data into the format appropriate for LightFM. We will not dwell on these functions in detail, as they do not change the essence of the data, they only change the form. Their approximate description is presented in the code itself.

In [12]:
def get_movie_features(data):
    
    """Return list of all additional movie features (genres) and all possible values for them"""
    
    movie_features_names=[]
    for feature in data.columns:
        if feature[0].istitle():
            
            movie_features_names+=[feature]*len(data[feature].unique())
            
            
    unique_feature_values=[]   
    
    for feature in movie_features_names:
         unique_feature_values+=list(data[feature].unique())
            
    movie_features_list=[]
            
    for x,y in zip(movie_features_names, unique_feature_values):
        res = str(x)+ ":" +str(y)
        movie_features_list.append(res)
    
            
    
    
    return movie_features_list
    

In [13]:
movie_features=get_movie_features(df)

In [14]:
def create_skeleton(data, item_features):
    
    """Return a special structure of the dataset required for LightFM"""
    
    from lightfm.data import Dataset
    
    skeleton = Dataset()
    users=list(data['userId'].unique())
    items=list(data['movieId'].unique())
    
    skeleton.fit(users, items, item_features=item_features)
    
    return skeleton

In [15]:
skeleton=create_skeleton(df, movie_features)

  "LightFM was compiled without OpenMP support. "


In [16]:
def build_interactions_weights(data, skeleton):
    
    """Return two sparse matrices: "interactions" and "weights".
    
    "interactions" matrix contains information about the presence of interaction between a user and an item.
    
    "weights" matrix contains quantifies for that interactions."""
    
    
    interactions, weights=skeleton.build_interactions(((data.iloc[i]['userId'],data.iloc[i]['movieId'],
                                                         data.iloc[i]['rating']) for i in range(len(data))))
    
    return interactions, weights

In [17]:
train_interactions, train_weights=build_interactions_weights(train, skeleton)

In [18]:
def build_movie_features(data,skeleton):
    
    """Return list of movies and corresponing additional features (genres) in the appropriate for LightFM format"""
    
    feature_list=[]
    
    movies=list(data['movieId'].unique())
    for movie in movies:

        
        temp=data[data['movieId']==movie].iloc[0]
        
        add_list=[]
        
        for feature in data.columns:           
            if feature[0].istitle():
                res=feature+':'+str(temp[feature])
                add_list.append(res)
                
            
        feature_list.append((movie,add_list))
        
        
                
    movie_features=skeleton.build_item_features(feature_list, normalize= True)
    
    return movie_features
                

In [19]:
item_features=build_movie_features(df,skeleton)

**We will use nDCG10 as the metric for the following reasons:**
1. This metric evaluates the quality of the ranking, not the accuracy of the rating prediction for a particular movie (like RMSE, for example).
2. The metric is normalized (varies in the range [0,1]). Furthermore, nDCG10 also takes into account the position of relevant elements.

The function "ndcg_score" calculates nDCG10 metric, "dcg_score" is an auxiliary function to calculate nDCG10. 

In [20]:
def dcg_score(y_true, y_score, k):
    
    """Return dcg_score at k for y_true and y_score arrays"""
    
    order = np.argsort(y_score)[::-1]
    y_true = np.take(y_true, order[:k])

    gains = 2 ** y_true - 1

    discounts = np.log2(np.arange(len(y_true)) + 2)
    
    return np.sum(gains / discounts)

In [21]:
def ndcg_score(df_pred, k_val=10):
    
    """Return average (for all users) ndcg_score at k for the test or validation set"""
    
    
    users_test=df_pred['uid'].unique()
    ndcg=0
    count=0
    
    for user in users_test:
        y_true=np.array(df_pred[df_pred['uid']==user]['r_ui'])
        y_score=np.array(df_pred[df_pred['uid']==user]['scores'])
        
        dcg=dcg_score(y_true, y_score, k_val)
        idcg=dcg_score(y_true, y_true, k_val)
        
        if idcg!=0:
            ndcg+=dcg/idcg
            count+=1
            
    if count!=0:
        return ndcg/count
    else:
        return None

**At first, let's estimate our metric on a baseline model**. The baseline will be the recommendation of Top10 most viewed (popular) movies from the train set. We will recommend the 10 most popular movies from the train set to each user from the test set, ranking them in descending order of popularity. That is, the most frequently viewed movie will be the first in the list of recommendations, and the least frequently watched of these Top10 movies will be the last.

The functions "match_test_rows_with_top_n_movies" and "create_df_pred_for_baseline" are auxiliary functions. They add the 'score' field to the test set, which in the case of baseline imitates model predictions, and transform the test set with the 'score' field into the form necessary to calculate the "ndcg_score" function.

For baseline nDCG10 = 0.576 (see in cell 25).

In [22]:
#find the most popular (frequently watched) movies in the train set

from collections import Counter

top_n=10

mcount = Counter(train['movieId'])

top_movieid_train = [i for i, c in mcount.most_common(top_n)]



In [23]:
def match_test_rows_with_top_n_movies(x, top_movieid_train):
    
    """This function is additional for "create_df_pred_for_baseline". Calculates scores for the 10 most popular 
    (frequently watched) movies based on their popularity."""
    
    if x['movieId'] in top_movieid_train:
        
        score=(10-top_movieid_train.index(x['movieId']))/2
        
        return score
    else:
        return 0

In [24]:
def create_df_pred_for_baseline(test):
    
    """Returns dataframe with "userId", "movieId", "r_ui" (true rating) and "scores" predicted by the baseline model."""
    
    df_pred=test.copy()
    
    df_pred['scores']=df_pred.apply(lambda x: match_test_rows_with_top_n_movies(x, top_movieid_train), axis=1)
    
    df_pred.rename(columns={"userId": "uid","movieId": "iid", 'rating':'r_ui'}, inplace=True)
    
    return df_pred

In [25]:
# nDCG10 value for the baseline model

df_pred=create_df_pred_for_baseline(test)
print("For baseline NDCG10 =",ndcg_score(df_pred, k_val=10))

For baseline NDCG10 = 0.5763855341638605


**Models building.**

The function "get_trained_LightFM" creates, fits (on the preprocessed data from the train set) and returns a trained LightFM model. The 'item_features' parameter of the function is responsible for the presence of additional features of the movies (genres in our case).

*If 'item_features' = None*, then no additional information besides ratings is transeferred to the model and the model implements a purely collaborative approach.

*If 'item_features' is not None*, then the model gets additional information about the movies and implements a hybrid approach (collaborative+content).

The function "create_df_pred_for_LightFM" is an auxiliary function. It adds the 'score' field to the validation and test sets, which contains information about the model predictions, and transforms the validation and test sets with the 'score' field into the form needed to calculate the metric.

In [26]:
def get_trained_LightFM(train_interactions, train_weights, item_features, params):
    
    """Create and fit LightFM model, and then return the fitted model"""
    
    from lightfm import LightFM
    
    no_components=params['no_components']
    epochs=params['epochs']
    item_alpha =params['item_alpha']
    user_alpha=params['user_alpha']
    
    model = LightFM(no_components=no_components,loss='warp', 
                    item_alpha=item_alpha, user_alpha=user_alpha)
    
    model.fit(train_interactions,
      item_features= item_features,
      sample_weight= train_weights,
      epochs=epochs,num_threads=3)
    
    return model
    

In [27]:
def create_df_pred_for_LightFM(model, test, skeleton):
    
    """Return dataframe with "userId", "movieId", "r_ui" (true rating) and "scores" predicted by the LightFM model"""
    
    user_id_map, user_feature_map, item_id_map, item_feature_map=skeleton.mapping()
    
    df_pred=test.copy()
    scores=[]
    
    for i in range(len(test)):
        
        temp=test.iloc[i]
        score=model.predict(np.array([user_id_map[temp['userId']]]), np.array([item_id_map[temp['movieId']]]))
        scores.append(score[0])
    
    
    df_pred['scores']=scores
    
    df_pred.rename(columns={"userId": "uid","movieId": "iid", 'rating':'r_ui'}, inplace=True)
    
    return df_pred

**Let's consider model with the collaborative approach. In this model, genre-related extra features of movies are not taken into account (item_features=None).**

Let's vary different parameters of the model, the parameter sets are presented in the 'list_params' (see the cell below). For each set of parameters we will display the following results: the model's training time and the value of our nDCG10 metric for the validation set .

The following conclusions can be drawn from these results:
1. Increasing the 'no_components' parameter, which is responsible for the size of user and movie representation vectors, greatly increases the model training time and, other things being equal, practically does not improve the metric value (and in some cases even worsens it).

2. Increasing 'item_alpha' and 'user_alpha' parameters, which are responsible for the regularization of the loss function, also increases training time (though less significantly than 'no_components') and worsens metric value.

The set of parameters {'no_components':10, 'epochs':20, 'item_alpha':0, 'user_alpha':0} is optimal in terms of training time and metric value. For these parametrs learning time was 6 min and nDCG10=0.634 for the validation set.

In [28]:
#fitting of the collaborative model

list_params=[{'no_components':10, 'epochs':20, 'item_alpha':0, 'user_alpha':0},
             {'no_components':10, 'epochs':20, 'item_alpha':0.01, 'user_alpha':0.01},
             {'no_components':50, 'epochs':20, 'item_alpha':0, 'user_alpha':0},
             {'no_components':50, 'epochs':20, 'item_alpha':0.01, 'user_alpha':0.01}]

print('For pure collaborative model:\n\n')

import time

for params in list_params:
    
    start_time=time.monotonic()
    
    model=get_trained_LightFM(train_interactions, train_weights, None, params)
    
    finish_time=time.monotonic()
    
    df_pred=create_df_pred_for_LightFM(model, validation, skeleton)
        
    print('==========================')
    print('For params:', params,'\n\n' 'Training time (min):',round(2*(finish_time-start_time)/60)/2 ,'       NDCG10=',ndcg_score(df_pred, k_val=10),'\n==========================\n')

For pure collaborative model:


For params: {'no_components': 10, 'epochs': 20, 'item_alpha': 0, 'user_alpha': 0} 

Training time (min): 6.0        NDCG10= 0.6343169332443236 

For params: {'no_components': 10, 'epochs': 20, 'item_alpha': 0.01, 'user_alpha': 0.01} 

Training time (min): 7.0        NDCG10= 0.6188044006276578 

For params: {'no_components': 50, 'epochs': 20, 'item_alpha': 0, 'user_alpha': 0} 

Training time (min): 19.5        NDCG10= 0.6375469138306958 

For params: {'no_components': 50, 'epochs': 20, 'item_alpha': 0.01, 'user_alpha': 0.01} 

Training time (min): 27.0        NDCG10= 0.6098998690706406 



Let's perform the final estimation of the model with the collaborative approach on the test set (see the cell below). The value of nDCG10 = 0.601, which exceeds the value of the metric for baseline (0.576) by about 4.3%. It may seem that the increase in the metric is not great, but it is very important to understand that the recommendation of Top10 movies from the train set is quite a strong baseline, which can be difficult to overcome.

In [29]:
# nDCG10 value for the collaborative model

params= {'no_components': 10, 'epochs': 20, 'item_alpha': 0, 'user_alpha': 0} 
model=get_trained_LightFM(train_interactions, train_weights, None, params)
df_pred=create_df_pred_for_LightFM(model, test, skeleton)

print('For pure collaborative model:')
print('Test NDCG10 =', ndcg_score(df_pred, k_val=10))

For pure collaborative model:
Test NDCG10 = 0.6014518719202534


**Now, let's consider the hybrid model. In it item_features are taken into account.** 

Let's fit the hybrid model on the parameter set optimal for the collaborative model and print the training time and metric value on the validation sample. (The parameter set for the collaborative model has turned out to be optimal for this model as well). The results are presented in the cell below.

The training time was 40 min, which is significantly higher than the training time of the collaborative model. The value of the metric on the validation sample for the hybrid model (0.631) is about 0.5% less than the value for the collaborative model (0.634). This is probably due to the fact that the hybrid model is much more complex than the collaborative, and, accordingly, the hybrid model may be worse trained on the same train set.

In [32]:
#fitting of the hybrid model


list_params=[{'no_components':10, 'epochs':20, 'item_alpha':0, 'user_alpha':0}]

print('For hybrid model:\n\n')

import time

for params in list_params:
    
    start_time=time.monotonic()
    
    model=get_trained_LightFM(train_interactions, train_weights, item_features, params)
    
    finish_time=time.monotonic()

    df_pred=create_df_pred_for_LightFM(model, validation, skeleton)
    
    print('==========================')
    print('For params:', params,'\n\n' 'Training time (min):',round(2*(finish_time-start_time)/60)/2 ,'       NDCG10=',ndcg_score(df_pred, k_val=10),'\n==========================\n')

For hybrid model:


For params: {'no_components': 10, 'epochs': 20, 'item_alpha': 0, 'user_alpha': 0} 

Training time (min): 40.0        NDCG10= 0.6307978126973641 



Let's perform the final estimation of the hybrid model on the test sample (see the cell below). nDCG10 = 0.590, which is about 2.4% higher than the baseline metric value (0.576). Like on the validation set, on the test set nDCG10 value for the hybrid model is worse than the value for the collaborative model, which is again related to the peculiarities mentioned above.

In [33]:
# nDCG10 value for the hybrid model

df_pred=create_df_pred_for_LightFM(model, test, skeleton)

print('For hybrid model:')
print('Test NDCG10 =', ndcg_score(df_pred, k_val=10))

For hybrid model:
Test NDCG10 = 0.5904613576105188


**Conclusions:** 

1. The following values of the nDCG10 metric on the test set were obtained: for the baseline - 0.576; for the collaborative model - 0.601; for the hybrid model - 0.590.  

2. Both collaborative and hybrid models outperform the baseline model (recommendation of the 10 most popular movies from the train set) in terms of the metric value. However, the superiority is not large, so it is worth performing statistical evaluation of the results, for example, using the bootstrap method. 

3. The hybrid model slightly loses to the collaborative model in the value of nDCG10. However, the collaborative model suffers from so-called "cold-start" problem, due to the lack of information when adding a new object. The hybrid model helps to solve this problem because it contains some additional information about movies, so a slightly lower metric value in the hybrid model is compensated by its ability to interact with new objects.

**What the allotted time was not enough:** 

1. Perform a statistical evaluation of the results (e.g., with bootstrap).

2. Perhaps try a more exotic split into the train, validation, and test sets. However, at this stage of my understanding of the task, it seems that all other options, other than trivial sorting by timestamp and then splitting in timestamp order, will more or less lead to data leaks.

3. Try models from other libraries (e.g. SVD, SVD++, NMF from scikit-surprise) and evaluate them.