## MOVIE RECOMMENDER SYSTEM - MODEL EVALUATION

#### SREENATH S

**NOTE: It is assumed that all the required input files are present in the same folder as this notebook.**

This notebook is part of the Movie Recommendation System project. It evaluates a model in terms of hit rate and recall, so that different models (Content Based Filtering Model, Collaborative Filtering Model, Hybrid Filtering Model) can be compared.

Also note that this notebook is reused from the code walkthrough session, with the required changes made on top of it.

Functionalities provided as part of ModelEvaluator class:

1. On instantiation, the class loads the following data through the DataGenerator module:<br>
    a. Indexed movie metadata.<br>
    b. Indexed user ratings dataset (complete dataset).<br>
    c. Indexed user ratings train dataset.<br>
    d. Indexed user ratings test dataset.<br>
    
2. get_not_interacted_movie_sample():<br>
    The inputs to this method are the user id, the sample size and an optional random seed. It builds the list of movies the user has not interacted with in the complete dataset and returns a random sample of them. The sample size used during evaluation is configurable through EVAL_RANDOM_SAMPLE_NON_INTERACTED_ITEMS in the config file.
    
3. verify_hit_top_n():<br>
    This method accepts 'item_id', the list of recommendations and the maximum position to be considered for the hit rate calculation (topn).<br>
    For example, if topn = 5, it checks whether the given movieId is among the first 5 recommended items. It returns a flag indicating whether there is a hit within topn, together with the index at which the movie appears in the recommended list (-1 if it is not in the list).
    
4. evaluate_model_for_user() - Accepts the model, the userId and the maximum number of recommendations needed.<br>
    a. Get the list of movies the user has interacted with in the test dataset.<br>
    b. Get the list of movies the user has interacted with in the training dataset.<br>
    c. Invoke the item recommendation API of the given model with the user id, passing the movies identified in step b as the items to ignore.<br>
    d. For each interacted movie in the test set, mix it with a random sample of non-interacted movies and check for a hit at 5 and a hit at 10.<br>
    e. Compute recall@n = number of hits@n / number of movies interacted with in the test set.<br>

5. evaluate_model() - Accepts the model, the name of the model (collaborative, content based or hybrid) and the maximum number of recommendations needed.<br>
    a. Compute the hits for every user in the test set by invoking evaluate_model_for_user().<br>
    b. Compute the global recall@n for the given model as total hits@n over all users / total number of test-set interactions.<br>
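
The per-item protocol described in steps 2 to 4 can be sketched in isolation as follows. This is a minimal illustration, not the notebook's implementation: the function names `sample_non_interacted` and `hit_in_top_n`, the toy movie ids and the ranked list are all assumptions made for the example, and a local `random.Random` generator is used in place of the global seed.

```python
import random


def sample_non_interacted(all_items, interacted, sample_size, seed=42):
    """Draw a reproducible random sample of items the user has not interacted with."""
    # Sort the candidates: random.sample() needs a sequence, and a fixed order
    # makes the sample depend only on the seed.
    candidates = sorted(set(all_items) - set(interacted))
    rng = random.Random(seed)  # local generator, global random state is untouched
    return set(rng.sample(candidates, sample_size))


def hit_in_top_n(item_id, ranked_items, topn):
    """Return (hit, index); index is the 0-based rank of item_id, or -1 if absent."""
    for index, candidate in enumerate(ranked_items):
        if candidate == item_id:
            return index < topn, index
    return False, -1


# Toy data (hypothetical ids, for illustration only).
all_items = range(1, 21)                 # 20 movies with ids 1..20
interacted = {3, 7, 15}                  # the user's full interaction history
test_item = 7                            # one movie from the user's test set
ranked_recs = [12, 7, 5, 18, 1, 9, 20, 2, 14, 11]  # model output, best first

# Mix the test item with a sample of non-interacted movies, keep only those
# entries of the ranked list, and check where the test item lands.
negatives = sample_non_interacted(all_items, interacted, sample_size=5,
                                  seed=test_item % 100)
keep = negatives | {test_item}
valid_recs = [movie for movie in ranked_recs if movie in keep]
hit, index = hit_in_top_n(test_item, valid_recs, topn=5)
print(hit, index)
```

Because the ranked list is filtered down to the test item plus the sampled movies, the test item only has to outrank those sampled movies to count as a hit, which is why the sample size has a direct effect on the reported recall.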

**NOTE: We will be reusing the notebook given in the code walkthrough session; only the required changes are made on top of that.**
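
As a small worked example of the per-user recall in step 4e, the sketch below computes recall@5 and recall@10 from the rank of each test-set movie. The helper name `recall_at_n` and the rank values are assumptions for illustration. The denominator follows the code in evaluate_model_for_user(), which divides by the number of movies the user interacted with in the test set.

```python
def recall_at_n(hit_indices, topn):
    """Fraction of test items ranked within the first topn positions.

    hit_indices holds the 0-based rank of each test item in its filtered
    recommendation list, or -1 when the item was not recommended at all.
    """
    if not hit_indices:
        return 0.0  # no test items: avoid dividing by zero
    hits = sum(1 for index in hit_indices if 0 <= index < topn)
    return hits / len(hit_indices)


# Hypothetical ranks for a user with four movies in the test set.
ranks = [0, 4, 7, -1]
recall_at_5 = recall_at_n(ranks, 5)    # ranks 0 and 4 are hits: 2 / 4
recall_at_10 = recall_at_n(ranks, 10)  # ranks 0, 4 and 7 are hits: 3 / 4
print(recall_at_5, recall_at_10)
```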

In [1]:
import import_ipynb
import pandas as pd
import random
import configs as config
EVAL_RANDOM_SAMPLE_NON_INTERACTED_ITEMS = config.EVAL_RANDOM_SAMPLE_NON_INTERACTED_ITEMS

In [2]:
%%capture
import MovieRecommender_TrainTestDataGenerator as DataGenerator

In [3]:
class ModelEvaluator:
    
    def __init__(self):
        
        self.movie_metadata_indexed_df = DataGenerator.get_indexed_movie_metadata()       
        self.user_behaviour_full_indexed_df,\
        self.user_behaviour_train_indexed_df,\
        self.user_behaviour_test_indexed_df = DataGenerator.get_user_behaviour_indexed()
        print('Shape of movie metadata dataset: ', self.movie_metadata_indexed_df.shape)
        print('Shape of ratings dataset full: ', self.user_behaviour_full_indexed_df.shape)
        print('Shape of ratings dataset train: ', self.user_behaviour_train_indexed_df.shape)
        print('Shape of ratings dataset test: ', self.user_behaviour_test_indexed_df.shape)


    def get_not_interacted_movie_sample(self, userId, sample_size, seed=42):
        #Get list of interacted movies for given user from the complete dataset.
        interacted_movies = DataGenerator.get_users_interaction_data(userId, self.user_behaviour_full_indexed_df)
        # Find out the list of all movies present in the dataset.
        all_movies = set(self.movie_metadata_indexed_df.index)
        # Filter the list of non interacted movies for the user
        non_interacted_movies = all_movies - interacted_movies

        random.seed(seed)
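        # Sample the non interacted movies randomly.
        # random.sample() requires a sequence (sets are rejected from Python 3.11),
        # so sample from a sorted list; sorting also keeps the result reproducible.
        non_interacted_movie_sample = random.sample(sorted(non_interacted_movies), sample_size)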
        return set(non_interacted_movie_sample)

    def verify_hit_top_n(self, item_id, recommended_items, topn):
        hit = False
        hit_index = -1
        # Iterate through the recommendation list and check whether the given item is in it.
        # If it is, store the index at which the item appears in the list.
        for index, movieId in enumerate(recommended_items):
            if movieId == item_id:
                hit_index = index
                break
        # If the item is present and its index < topn, indicate a hit.
        if hit_index != -1 and hit_index < topn:
            hit = True
        return hit, hit_index

    def evaluate_model_for_user(self, model, userId, top_n = 8000):
        # Filter the user interacted movies in test set
        interacted_values_in_testset = self.user_behaviour_test_indexed_df[self.user_behaviour_test_indexed_df.index == userId]
        
        # Get the unique list of movie ids (imdbId) user has interacted within test set
        user_interacted_movies_in_test = set(interacted_values_in_testset.imdbId)
        
        # Number of movies user has interacted with in test dataset
        interacted_movies_count_testset = len(user_interacted_movies_in_test) 

        # Filter the list of movies user has interacted in train set
        user_interacted_movies_in_train = DataGenerator.get_users_interaction_data(userId, self.user_behaviour_train_indexed_df)

        # Getting a ranked recommendation list from a model for a given user
        person_recs_df = model.get_item_recommendations(userId, items_to_ignore = user_interacted_movies_in_train, topn=top_n)

        hits_at_5_count = 0
        hits_at_10_count = 0
        #For each item the user has interacted in test set
        for item_id in user_interacted_movies_in_test:
            hit_at_5 = False
            hit_at_10 = False
            #Getting a random sample of items the user has not interacted with
            #(sample size set in config, to represent items assumed to be not relevant to the user)
            non_interacted_items_sample = \
            self.get_not_interacted_movie_sample(userId, 
                                                 sample_size=EVAL_RANDOM_SAMPLE_NON_INTERACTED_ITEMS, 
                                                 seed=item_id%(100))

            #Combining the current interacted item with the 100 random items
            items_to_filter_recs = non_interacted_items_sample.union(set([item_id]))

            #Filtering only recommendations that are either the interacted item or from a random sample of 100 non-interacted items
            valid_recs_df = person_recs_df[person_recs_df['imdbId'].isin(items_to_filter_recs)]                    
            valid_recs = valid_recs_df['imdbId'].values
            #Verifying if the current interacted item is among the Top-N recommended items
            hit_at_5, index_at_5 = self.verify_hit_top_n(item_id, valid_recs, 5)
            if hit_at_5:
                hits_at_5_count =  hits_at_5_count +1
            hit_at_10, index_at_10 = self.verify_hit_top_n(item_id, valid_recs, 10)
            if hit_at_10:
                hits_at_10_count = hits_at_10_count+1

        #Recall is the rate of the interacted items that are ranked among the Top-N recommended items, 
        #when mixed with a set of non-relevant items
        recall_at_5 = hits_at_5_count / float(interacted_movies_count_testset)
        recall_at_10 = hits_at_10_count / float(interacted_movies_count_testset)

        person_metrics = {'hitrate@5_count':hits_at_5_count, 
                          'hitrate@10_count':hits_at_10_count, 
                          'interacted_count': interacted_movies_count_testset,
                          'recallscore@5': recall_at_5,
                          'recallscore@10': recall_at_10}
        return person_metrics

    def evaluate_model(self, model, recommendation_model_type, top_n = 8000):
        people_metrics = []
        # Evaluate the model performance for each user in the test dataset
        # Store the model performance details for each user to a list
        for idx, userId in enumerate(list(self.user_behaviour_test_indexed_df.index.unique().values)):
            person_metrics = self.evaluate_model_for_user(model, userId, top_n)  
            person_metrics['userId'] = userId
            people_metrics.append(person_metrics)
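        # Report the number of users evaluated (the loop index is 0-based and would be one short)
        print('Number of users processed : ', len(people_metrics))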
        # Create a dataset from the list of evaluation results.
        detailed_results_df = pd.DataFrame(people_metrics).sort_values('interacted_count', ascending=False)
        #Compute Global Recall rate @ 5 and @10
        global_recall_at_5 = detailed_results_df['hitrate@5_count'].sum() / float(detailed_results_df['interacted_count'].sum())
        global_recall_at_10 = detailed_results_df['hitrate@10_count'].sum() / float(detailed_results_df['interacted_count'].sum())
        
        global_metrics = {'model_type': recommendation_model_type,
                          'recallscore@5': global_recall_at_5,
                          'recallscore@10': global_recall_at_10}    
        return global_metrics, detailed_results_df

    
model_evaluator = ModelEvaluator()

Shape of movie metadata dataset:  (8989, 8)
Shape of ratings dataset full:  (99752, 2)
Shape of ratings dataset train:  (84789, 2)
Shape of ratings dataset test:  (14963, 2)
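
The global recall returned by evaluate_model() pools hits and interactions across all users, so users with many test-set interactions weigh more than they would in a plain average of per-user recall scores. The sketch below illustrates the difference on made-up numbers; the two rows are hypothetical and only reuse the key names of the per-user metrics.

```python
# Hypothetical per-user results, for illustration only.
per_user = [
    {"userId": 1, "hitrate@10_count": 9, "interacted_count": 10},
    {"userId": 2, "hitrate@10_count": 0, "interacted_count": 2},
]

# Global recall@10: total hits divided by total test-set interactions.
total_hits = sum(row["hitrate@10_count"] for row in per_user)
total_interacted = sum(row["interacted_count"] for row in per_user)
global_recall_at_10 = total_hits / total_interacted          # 9 / 12

# Mean of per-user recall@10: every user counts equally.
mean_user_recall_at_10 = sum(
    row["hitrate@10_count"] / row["interacted_count"] for row in per_user
) / len(per_user)                                            # (0.9 + 0.0) / 2

print(global_recall_at_10, mean_user_recall_at_10)
```

The two aggregates can differ noticeably when activity is skewed, so it helps to state which one is being reported when comparing models.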
