# Hybrid Recommenders

### Why hybrid recommendations: combine the strengths of different types of recommenders, mainly to:
        - solve the cold-start problem
        - predict future user behavior more accurately
### Hybrid recommenders can be classified into:
        - Weighted: Takes the output of each model and combines the results using static weights, i.e. weights that do not change between the training and test sets.
<img src='./image/WeightedRecSys.png'>
        
        - Switching: Selects a single recommender system based on the situation and on the kind of user profile that is available.

<img src='./image/switch.png'> 

        - Mixed: First uses the user profile and features to generate different sets of candidates. Each candidate set is then passed to the corresponding recommendation model, and the predictions are combined to produce the final recommendation.

<img src='./image/mixed.png'>

        - Feature combination: We add a virtual contributing recommendation model to the system, which acts as a feature engineering step on the original user profile dataset. For example, we can inject features from a collaborative recommendation model into a content-based recommendation model.
<img src='./image/Features_combination.png'>

        - Feature augmentation: A contributing recommendation model generates a rating or classification for the user/item profile, which the main recommendation system then uses to produce the final prediction. A feature augmentation hybrid can improve the performance of the core system without changing the main recommendation model. For example, association rules can be used to enrich the user profile dataset.
<img src='./image/hybrid_aug.png'>

        - Cascade: A cascade hybrid defines a strictly hierarchical recommendation system: the main recommendation system produces the primary result, and a secondary model resolves minor issues in that result, such as breaking ties in the scoring.
        In practice, most datasets are sparse, so the secondary recommendation model can be effective against tied scores or missing data.
<img src='./image/cascade.png'>     
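
Three of the strategies above (weighted, switching, cascade) can be illustrated with a minimal sketch. The function names, the weights, the `min_ratings` threshold, and the movie scores below are all hypothetical and chosen only for illustration; a real system would take its scores from trained models.

```python
from typing import Dict, List

Scores = Dict[str, float]  # maps an item to its predicted score


def weighted_hybrid(content: Scores, collab: Scores,
                    w_content: float = 0.4, w_collab: float = 0.6) -> Scores:
    """Weighted hybrid: blend two score dictionaries with fixed weights."""
    items = set(content) | set(collab)
    return {
        item: w_content * content.get(item, 0.0) + w_collab * collab.get(item, 0.0)
        for item in items
    }


def switching_hybrid(n_user_ratings: int, content: Scores, collab: Scores,
                     min_ratings: int = 5) -> Scores:
    """Switching hybrid: fall back to content-based scores for cold-start users."""
    return collab if n_user_ratings >= min_ratings else content


def cascade_rank(primary: Scores, secondary: Scores) -> List[str]:
    """Cascade hybrid: rank by the primary score, break ties with the secondary one."""
    return sorted(
        primary,
        key=lambda item: (primary[item], secondary.get(item, 0.0)),
        reverse=True,
    )


# Hypothetical scores for three movies
content_scores = {"A": 0.9, "B": 0.5, "C": 0.5}
collab_scores = {"A": 0.2, "B": 0.8, "C": 0.6}

blended = weighted_hybrid(content_scores, collab_scores)
cold_start = switching_hybrid(0, content_scores, collab_scores)
warm_user = switching_hybrid(10, content_scores, collab_scores)
ranking = cascade_rank(content_scores, collab_scores)
```

In this toy example, movies B and C are tied in the primary (content-based) scores, so the cascade uses the collaborative scores to place B ahead of C.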

We have discussed a few problems that different recommender approaches have, such as the cold-start problem in collaborative filtering. Some of these problems can be resolved by using a different recommender approach in the start-up phase (e.g., a content-based approach). In this Python notebook, I will present a simple hybrid recommender that combines the content-based and collaborative filters that we've built so far.

#### Popular Example
Netflix is a very good example of a hybrid recommender. It employs content-based techniques when it shows you movies similar to the one you're watching (the "More like this" section). Most of the time, however, it relies on a collaborative filter (e.g., "Top picks for you").

#### Case Study
Imagine that you've built a Netflix-like website. Each time a user watches a movie, you want to display a list of recommendations in the side pane (a bit like YouTube). A content-based recommender would seem appropriate here. However, if a user is watching The Dark Knight, this would mostly lead to more Batman movies (and not necessarily other superhero movies), some of which might be of poor quality. This is where a collaborative filter comes in: it predicts the ratings that the user would give the movies recommended by our content-based model, and we return the few movies with the highest predicted ratings.

#### Workflow
1. Take in a movie title and a user as input.
2. Use a content-based model to compute the 25 most similar movies.
3. Compute the predicted ratings that the user might give these 25 movies using a collaborative filter.
4. Return the top 10 movies with the highest predicted rating.
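
The four steps above can be sketched on toy data before touching the real files. Everything in this block is hypothetical: the function `hybrid_sketch`, the similarity dictionary, and the stand-in rating predictor are illustrative only and are not the models used later in this notebook.

```python
def hybrid_sketch(user, title, similarity, predict_rating,
                  n_candidates=25, n_top=10):
    """Two-stage recommender: content-based candidates, collaborative re-ranking."""
    # Step 2: rank all other titles by their similarity to the input title
    candidates = sorted(
        (t for t in similarity[title] if t != title),
        key=lambda t: similarity[title][t],
        reverse=True,
    )[:n_candidates]

    # Step 3: predict the rating this user would give each candidate
    scored = [(t, predict_rating(user, t)) for t in candidates]

    # Step 4: keep the candidates with the highest predicted ratings
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n_top]


# Toy similarity scores and ratings (made up for illustration)
toy_similarity = {
    "Avatar": {"Avatar": 1.0, "Aliens": 0.8, "Predator": 0.7, "Jumanji": 0.1},
}
toy_ratings = {(1, "Aliens"): 3.0, (1, "Predator"): 4.5, (1, "Jumanji"): 5.0}


def toy_predict(user, title):
    """Stand-in for a collaborative filter: look up a stored rating."""
    return toy_ratings.get((user, title), 0.0)


result = hybrid_sketch(1, "Avatar", toy_similarity, toy_predict,
                       n_candidates=2, n_top=2)
```

Note that "Jumanji" has the highest predicted rating for this user but never reaches the re-ranking step, because the content-based stage filters it out as dissimilar. The content-based stage decides what is eligible, and the collaborative stage only decides the order.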

In [1]:
#Import the relevant packages
import numpy as np
import pandas as pd

In [2]:
#Import or compute the cosine_sim matrix
cosine_sim = pd.read_csv('./cosine_sim.csv')

Normally I would ask you to compute the cosine similarity matrix yourself, but the file above already contains the scores. You can try to do it yourself in your own time! (First compute the TF-IDF vectors, then the cosine similarity between them.)
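
If you want to see the two steps in action, here is a minimal pure-Python sketch. It assumes whitespace-separated text descriptions and uses a plain, unsmoothed TF-IDF variant, so the numbers will not match what a library such as scikit-learn produces; the helper names and the three example strings are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dictionary per document (plain TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Three short, made-up keyword strings
soups = [
    "toy boy friendship rivalri",
    "toy friendship friend",
    "fish bestfriend",
]
vectors = tfidf_vectors(soups)
sim_matrix = [[cosine_similarity(a, b) for b in vectors] for a in vectors]
```

Each row of `sim_matrix` holds the similarity of one document to every other document, which is the same layout the hybrid recommender later reads from `cosine_sim`.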

In [8]:
#Import or compute the cosine sim mapping matrix (mapping movie into an index)
cosine_sim_map = pd.read_csv('./cosine_sim_map.csv', header=None)

#Convert cosine_sim_map into a Pandas Series
cosine_sim_map = cosine_sim_map.set_index(0)
cosine_sim_map = cosine_sim_map[1]


Now we import another CSV file to build a CF model. We will use the SVD model from the last chapter for this purpose, albeit with slightly different syntax.

In [13]:
#Build the SVD based Collaborative filter
from surprise import SVD, Reader, Dataset

reader = Reader()
ratings = pd.read_csv('./ratings_small.csv')
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
svd = SVD()
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f9d608c85b0>

In [15]:
#Load the ratings again and inspect the first rows
from surprise import SVD, Reader, Dataset

reader = Reader()
ratings = pd.read_csv('./ratings_small.csv')
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


In [16]:
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
svd = SVD()
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7f9d608c82e0>
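
To make the later `est` values less of a black box, the sketch below reproduces by hand the formula behind an SVD-style prediction: global mean plus user bias plus item bias plus the dot product of the latent factors, clipped to the rating scale. The function name and all numbers are made up for illustration; the fitted model stores its own learned values internally.

```python
def svd_estimate(global_mean, user_bias, item_bias,
                 user_factors, item_factors, rating_scale=(1.0, 5.0)):
    """Biased matrix-factorization estimate, clipped to the rating scale."""
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    raw = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    return min(high, max(low, raw))


# Known user and known item (hypothetical parameters)
est_known = svd_estimate(3.5, 0.2, -0.1, [0.5, -0.2], [0.4, 0.1])

# Unknown user: bias and factors are treated as zero,
# so only the global mean and the item bias remain
est_new_user = svd_estimate(3.5, 0.0, -0.1, [0.0, 0.0], [0.4, 0.1])

# A raw estimate above the scale is clipped to the maximum rating
est_clipped = svd_estimate(4.8, 0.5, 0.0, [0.0, 0.0], [0.0, 0.0])
```

The unknown-user case shows why a pure collaborative filter struggles with cold start: without any ratings, every new user receives the same estimate for a given movie.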

Yet another file to import, this time to map the metadata to the CF data.

In [17]:
#Build title to ID and ID to title mappings
id_map = pd.read_csv('./movie_ids.csv')
id_to_title = id_map.set_index('id')
title_to_id = id_map.set_index('title')
id_map.head()

Unnamed: 0,title,movieId,id
0,Toy Story,1,862.0
1,Jumanji,2,8844.0
2,Grumpier Old Men,3,15602.0
3,Waiting to Exhale,4,31357.0
4,Father of the Bride Part II,5,11862.0


Import metadata so that you can inspect the year of release and the IMDB rating

In [18]:
#Import or compute relevant metadata of the movies
smd = pd.read_csv('./metadata_small.csv')
smd.head()

Unnamed: 0.1,Unnamed: 0,index,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,...,vote_average,vote_count,year,cast,crew,keywords,cast_size,crew_size,director,soup
0,0,0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"['Animation', 'Comedy', 'Family']",http://toystory.disney.com/toy-story,862,tt0114709,en,...,7.7,5415.0,1995,"['tomhanks', 'timallen', 'donrickles']","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...","['jealousi', 'toy', 'boy', 'friendship', 'frie...",13,106,"['johnlasseter', 'johnlasseter']",jealousi toy boy friendship friend rivalri boy...
1,1,1,False,,65000000,"['Adventure', 'Fantasy', 'Family']",,8844,tt0113497,en,...,6.9,2413.0,1995,"['robinwilliams', 'jonathanhyde', 'kirstendunst']","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...","['boardgam', 'disappear', ""basedonchildren'sbo...",26,16,"['joejohnston', 'joejohnston']",boardgam disappear basedonchildren'sbook newho...
2,2,2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"['Romance', 'Comedy']",,15602,tt0113228,en,...,6.5,92.0,1995,"['waltermatthau', 'jacklemmon', 'ann-margret']","[{'credit_id': '52fe466a9251416c75077a89', 'de...","['fish', 'bestfriend', 'duringcreditssting']",7,4,"['howarddeutch', 'howarddeutch']",fish bestfriend duringcreditssting waltermatth...
3,3,3,False,,16000000,"['Comedy', 'Drama', 'Romance']",,31357,tt0114885,en,...,6.1,34.0,1995,"['whitneyhouston', 'angelabassett', 'lorettade...","[{'credit_id': '52fe44779251416c91011acb', 'de...","['basedonnovel', 'interracialrelationship', 's...",10,10,"['forestwhitaker', 'forestwhitaker']",basedonnovel interracialrelationship singlemot...
4,4,4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,['Comedy'],,11862,tt0113041,en,...,5.7,173.0,1995,"['stevemartin', 'dianekeaton', 'martinshort']","[{'credit_id': '52fe44959251416c75039ed7', 'de...","['babi', 'midlifecrisi', 'confid', 'age', 'dau...",12,7,"['charlesshyer', 'charlesshyer']",babi midlifecrisi confid age daughter motherda...


Below is the hybrid recommender according to the workflow described earlier

In [19]:
def hybrid(userId, title):
    #Extract the cosine_sim index of the movie
    idx = cosine_sim_map[title]
    
    #Extract the similarity scores and their corresponding index for every movie from the cosine_sim matrix
    #(cosine_sim was read from a CSV file, so its column labels are strings)
    sim_scores = list(enumerate(cosine_sim[str(int(idx))]))
    
    #Sort the (index, score) tuples in decreasing order of similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    
    #Select the top 25 tuples, excluding the first 
    #(as it is the similarity score of the movie with itself)
    sim_scores = sim_scores[1:26]
    
    #Store the cosine_sim indices of the top 25 movies in a list
    movie_indices = [i[0] for i in sim_scores]

    #Extract the metadata of the aforementioned movies
    #(copy the slice so that adding a column does not trigger a SettingWithCopyWarning)
    movies = smd.iloc[movie_indices][['title', 'vote_count', 'vote_average', 'year', 'id']].copy()
    
    #Compute the predicted ratings using the SVD filter
    #(map each TMDB ID to the dataset's internal movieId first)
    movies['est'] = movies['id'].apply(lambda x: svd.predict(userId, id_to_title.loc[x]['movieId']).est)
    
    #Sort the movies in decreasing order of predicted rating
    movies = movies.sort_values('est', ascending=False)
    
    #Return the top 10 movies as recommendations
    return movies.head(10)

Below, you can test the hybrid recommender model. Let's imagine that the users with IDs 1 and 2 are both watching the movie Avatar. You can see that both the content and the order of the recommendations differ between them. This is due to the collaborative filter. However, all the movies are similar to Avatar, due to the content-based approach.

In [20]:
hybrid(1, 'Avatar')

Unnamed: 0,title,vote_count,vote_average,year,id,est
1011,The Terminator,4208.0,7.4,1984,218,3.193356
1668,Return from Witch Mountain,38.0,5.6,1978,14822,3.112402
522,Terminator 2: Judgment Day,4274.0,7.7,1991,280,3.045058
1621,Darby O'Gill and the Little People,35.0,6.7,1959,18887,3.030548
974,Aliens,3282.0,7.7,1986,679,3.016155
2834,Predator,2129.0,7.3,1987,106,3.007958
2014,Fantastic Planet,140.0,7.6,1973,16306,2.943413
8865,Star Wars: The Force Awakens,7993.0,7.5,2015,140607,2.908581
8658,X-Men: Days of Future Past,6155.0,7.5,2014,127585,2.894691
8401,Star Trek Into Darkness,4479.0,7.4,2013,54138,2.881366


In [21]:
hybrid(2, 'Avatar')

Unnamed: 0,title,vote_count,vote_average,year,id,est
522,Terminator 2: Judgment Day,4274.0,7.7,1991,280,4.053542
1011,The Terminator,4208.0,7.4,1984,218,4.028717
8401,Star Trek Into Darkness,4479.0,7.4,2013,54138,3.776965
7705,Alice in Wonderland,8.0,5.4,1933,25694,3.644369
8658,X-Men: Days of Future Past,6155.0,7.5,2014,127585,3.605292
974,Aliens,3282.0,7.7,1986,679,3.586829
1668,Return from Witch Mountain,38.0,5.6,1978,14822,3.574871
2834,Predator,2129.0,7.3,1987,106,3.526883
8865,Star Wars: The Force Awakens,7993.0,7.5,2015,140607,3.516028
2014,Fantastic Planet,140.0,7.6,1973,16306,3.479192
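
To quantify how much the two lists differ, the following sketch compares them. The titles are copied from the two outputs above, in the order shown; the variable names are illustrative.

```python
# Top 10 recommendations for user 1, in the order returned by hybrid(1, 'Avatar')
user_1 = [
    "The Terminator", "Return from Witch Mountain", "Terminator 2: Judgment Day",
    "Darby O'Gill and the Little People", "Aliens", "Predator",
    "Fantastic Planet", "Star Wars: The Force Awakens",
    "X-Men: Days of Future Past", "Star Trek Into Darkness",
]

# Top 10 recommendations for user 2, in the order returned by hybrid(2, 'Avatar')
user_2 = [
    "Terminator 2: Judgment Day", "The Terminator", "Star Trek Into Darkness",
    "Alice in Wonderland", "X-Men: Days of Future Past", "Aliens",
    "Return from Witch Mountain", "Predator", "Star Wars: The Force Awakens",
    "Fantastic Planet",
]

# Movies recommended to both users, and movies unique to each user
shared = set(user_1) & set(user_2)
only_user_1 = set(user_1) - shared
only_user_2 = set(user_2) - shared

# Change in rank position for each shared movie (positive = lower for user 2)
rank_shift = {t: user_2.index(t) - user_1.index(t) for t in shared}
```

Nine of the ten movies appear in both lists, which reflects the shared pool of 25 content-based candidates. The collaborative filter changes the ordering and swaps one movie per user.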
