# Hybrid Filtering
A hybrid recommender system combines multiple recommendation techniques to produce its output. Compared with purely collaborative or purely content-based systems, hybrid systems usually achieve higher recommendation accuracy. The reason is that collaborative filtering lacks information about domain dependencies, while content-based filtering lacks information about people's preferences. Combining the two pools both kinds of knowledge, which contributes to better recommendations. This makes it especially promising to explore new ways of extending collaborative filtering algorithms with content data, and content-based algorithms with user behavior data. (https://www.researchgate.net/publication/324763207_A_Hybrid_Approach_using_Collaborative_filtering_and_Content_based_Filtering_for_Recommender_System)

### 1. Import libraries

In [2]:
from recommendation_data import dataset
from math import sqrt
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

### 2. Load dataset

In [8]:
film=pd.read_csv('../Dataset/data_contents.csv',delimiter=';') #dataset of film
usr_rating=pd.DataFrame(dataset) #dataset of user's rating

In [9]:
film

Unnamed: 0,Film,Description
0,Ada Apa dengan Cinta 2,Budi Tom Romance ID
1,Aladdin,Budi Cruise Romance US
2,Avengers: End Game,Andi Michael Action US
3,Bumi Manusia,Charles Tom Drama ID
4,Captain Marvel,Andi Cruise Action US
5,Dilan 1991,Budi Tom Romance ID
6,Dua Garis Biru,Charles Tom Drama ID
7,Gundala,Andi Cruiser Action ID
8,Spiderman: Far From Home,Charles Michael Action US
9,The Lion King,Budi Michael Drama US


In [10]:
usr_rating

Unnamed: 0,ANI,AhokTemanFirli,Damar Teman Firli,Dpv,Febi ganteng gak ada obat,Genjeh,Hania,Indra 1991 SM,Indra Junior,Jawaharal,...,Putrisqiana,Rima,Romantika,Star,Topik Zulkarnain,bunga,faizah,franadek,jul,luck
Ada Apa dengan Cinta 2,4,0,5,5,4,5,3,0,4,2,...,4,5,5,4,0,0,3,4,0,3
Aladdin,4,0,0,0,5,5,0,0,5,5,...,0,5,0,5,0,5,0,5,3,0
Avengers: End Game,0,3,5,5,5,5,0,0,5,5,...,5,5,0,5,5,5,5,5,3,4
Bumi Manusia,5,0,0,0,0,0,4,0,0,0,...,4,4,0,0,0,0,5,5,0,0
Captain Marvel,4,4,0,5,4,4,0,0,5,4,...,3,5,0,5,2,5,0,4,3,2
Dilan 1991,4,0,0,4,4,3,4,0,0,3,...,2,5,5,0,0,4,5,4,0,0
Dua Garis Biru,0,0,0,0,0,0,4,0,4,5,...,3,3,0,0,0,0,4,3,3,0
Gundala,0,0,0,4,3,4,5,5,0,4,...,3,5,0,4,0,4,0,4,3,0
Spiderman: Far From Home,3,0,5,5,5,4,0,0,5,5,...,4,5,0,0,4,5,0,4,3,0
The Lion King,0,0,0,0,0,0,0,0,5,4,...,0,4,0,5,5,5,0,4,3,0
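
The functions in the next section index `dataset` as `dataset[user][film]` and treat a rating of 0 as "not watched yet". The following is a minimal sketch of that layout, assuming `dataset` is a nested dict (the `recommendation_data` module itself is not shown here). The ratings are Hania's column from the table above; `toy_dataset` and `unwatched` are hypothetical names used only for illustration.

```python
# Assumed layout: {user: {film: rating}}, where 0 means "not watched yet".
# The values are Hania's column from the usr_rating table.
toy_dataset = {
    "Hania": {
        "Ada Apa dengan Cinta 2": 3,
        "Aladdin": 0,
        "Avengers: End Game": 0,
        "Bumi Manusia": 4,
        "Captain Marvel": 0,
        "Dilan 1991": 4,
        "Dua Garis Biru": 4,
        "Gundala": 5,
        "Spiderman: Far From Home": 0,
        "The Lion King": 0,
    }
}


def unwatched(ratings_by_user, person):
    """Return the films that `person` has rated 0, in insertion order."""
    return [film for film, rating in ratings_by_user[person].items() if rating == 0]


hania_unwatched = unwatched(toy_dataset, "Hania")
print(hania_unwatched)
```

Only films from this list can appear in Hania's recommendations, since both filtering steps skip films that already have a non-zero rating.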


### 3. Hybrid Filtering Steps

##### 3.1 Calculate the content-based filtering

In [11]:
# Content-based step: shortlist the films the person has not watched yet, scored by
# how similar their descriptions are to the rest of the catalogue.
def content_based(person, min_content_score):

    # 1 - START - from usr_rating, collect the films the person has not watched yet (rating 0)
    not_watch = [title for title, rating in usr_rating[person].items() if rating == 0]
    # 1 - END - not_watch now holds the films the person has not watched yet
    # example: not_watch = ['Aladdin', 'Bumi Manusia', 'Dua Garis Biru', 'The Lion King']

    # 2 - START - convert the raw documents (film descriptions) into a matrix of TF-IDF features
    tf = TfidfVectorizer(analyzer='word',
                         ngram_range=(1, 3),
                         min_df=1,  # keep every term (this is the default)
                         stop_words='english')
    tfidf_matrix = tf.fit_transform(film['Description'])
    # 2 - END - a matrix of TF-IDF features
    # tfidf = [0.2847195009092169, 0.2847195009092169, 0.32024359109404055 ..........]

    # 3 - START - compute the similarities between films
    # TF-IDF rows are L2-normalized by default, so this dot product is the cosine similarity
    cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)
    # 3 - END - matrix of film-to-film similarities; see the output at: https://imgur.com/GOW1KmH
    # from this matrix we conclude that Bumi Manusia and Dua Garis Biru are similar (judging by their descriptions)

    # 4 - START - create a dataframe from the cosine similarities
    new = pd.DataFrame(cosine_similarities, index=film['Film'], columns=film['Film'])
    # 4 - END - see the output of new at: https://imgur.com/c3405is

    # 5 - START - for each unwatched film, take its mean similarity to all films and sort in descending order
    final = pd.DataFrame(new[not_watch].mean().sort_values(ascending=False), columns=['Score'])
    # 5 - END - see the output of final at: https://imgur.com/x27svfw

    # 6 - START - keep only the films whose score (similarity) is at least min_content_score
    final2 = final[final.Score >= min_content_score]
    indeks = final2.index
    # 6 - END - return the filtered films
    return indeks
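
Step 3 works because `TfidfVectorizer` L2-normalizes each row by default, so the plain dot product computed by `linear_kernel` equals the cosine similarity. The sketch below illustrates the same idea with raw word counts instead of TF-IDF weights. That is a deliberate simplification, so the numbers will not match the notebook's matrix, and `cosine_from_counts` is a hypothetical helper rather than part of the original code.

```python
from collections import Counter
from math import sqrt


def cosine_from_counts(text_a, text_b):
    """Cosine similarity between two texts using raw word counts (not TF-IDF)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a)
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Descriptions taken from the film table.
bumi_manusia = "Charles Tom Drama ID"
dua_garis_biru = "Charles Tom Drama ID"
dilan_1991 = "Budi Tom Romance ID"
avengers = "Andi Michael Action US"

same = cosine_from_counts(bumi_manusia, dua_garis_biru)   # identical descriptions
partial = cosine_from_counts(dilan_1991, bumi_manusia)    # two of four words shared
disjoint = cosine_from_counts(dilan_1991, avengers)       # no words shared
print(same, partial, disjoint)
```

Identical descriptions score 1, descriptions with no words in common score 0, and partial overlap lands in between, which is the property the content-based step relies on.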

##### 3.2 Calculate the collaborative filtering

In [12]:
def person_correlation(person1, person2):

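    # Collect the items that both users have a rating for.
    # Unwatched films are stored with rating 0, so they count as rated here.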
    both_rated = {}
    for item in dataset[person1]:
        if item in dataset[person2]:
            both_rated[item] = 1

    number_of_ratings = len(both_rated)

    # Checking for ratings in common
    if number_of_ratings == 0:
        return 0

    # Add up all the preferences of each user
    person1_preferences_sum = sum([dataset[person1][item] for item in both_rated])
    person2_preferences_sum = sum([dataset[person2][item] for item in both_rated])

    # Sum up the squares of preferences of each user
    person1_square_preferences_sum = sum([pow(dataset[person1][item],2) for item in both_rated])
    person2_square_preferences_sum = sum([pow(dataset[person2][item],2) for item in both_rated])

    # Sum up the product value of both preferences for each item
    product_sum_of_both_users = sum([dataset[person1][item] * dataset[person2][item] for item in both_rated])

    # Calculate the pearson score
    numerator_value = product_sum_of_both_users - (person1_preferences_sum*person2_preferences_sum/number_of_ratings)
    denominator_value = sqrt((person1_square_preferences_sum - pow(person1_preferences_sum,2)/number_of_ratings) * (person2_square_preferences_sum -pow(person2_preferences_sum,2)/number_of_ratings))

    if denominator_value == 0:
        return 0
    else:
        r = numerator_value / denominator_value
        return r
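
`person_correlation` computes the Pearson coefficient from running sums. As a cross-check, the sketch below uses the mean-centered form, which is algebraically equivalent. The function name `pearson` and the three toy users are assumptions made for illustration and are not part of the notebook's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long rating lists (mean-centered form)."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    denominator = sqrt(spread_x * spread_y)
    # Same convention as the notebook: no variation means no usable correlation.
    return covariance / denominator if denominator else 0.0


# Hypothetical ratings for three films, in the same order for every user.
alice = [5, 3, 1]
bob = [4, 3, 2]      # same taste as alice, narrower range
carol = [1, 3, 5]    # opposite taste
flat = [3, 3, 3]     # no variation at all

r_same = pearson(alice, bob)
r_opposite = pearson(alice, carol)
r_flat = pearson(alice, flat)
print(r_same, r_opposite, r_flat)
```

Users with the same ordering of preferences get a coefficient near 1 even when their rating scales differ, which is why the hybrid step later keeps only neighbours with a positive score.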

##### 3.3 Calculate the hybrid filtering

In [13]:
def user_recommendations(person,min_content_score):

    # Gets recommendations for a person by using a weighted average of every other user's rankings
    totals = {}
    simSums = {}
    rankings_list =[]
    for other in dataset:
        # don't compare me to myself
        if other == person:
            continue
        sim = person_correlation(person,other)
        #print ">>>>>>>",sim. see the output at : 

        # ignore scores of zero or lower
        if sim <=0: 
            continue
        for item in dataset[other]:

            # only score films the person has not seen yet
            if item not in dataset[person] or dataset[person][item] == 0:
                # similarity * score
                totals.setdefault(item, 0)
                totals[item] += dataset[other][item] * sim
                # sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim
    # final output of simSums : https://imgur.com/ENrTS3h
    
    # Create the normalized list
    rankings = [(total/simSums[item],item) for item,total in totals.items()]
    # sort in descending order of predicted score
    rankings.sort(reverse=True)

    # START - loop through the user-based recommendation list and keep only the films that are
    # also recommended by the content-based step. This is where the two approaches are combined (the hybrid).
    new_rankings=[]
    content_based_recommendation_list = content_based(person,min_content_score)
    for i in rankings:
        if i[1] in content_based_recommendation_list:
            new_rankings.append(i)
    # END
        
    return new_rankings
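
The two stages of `user_recommendations` can be sketched in isolation: a similarity-weighted average of the neighbours' ratings, followed by a filter against the content-based shortlist. The code below is a minimal sketch under stated assumptions. The similarity scores are passed in directly instead of being computed, and the names `weighted_scores`, `keep_shortlisted`, and the toy users and items are all hypothetical.

```python
def weighted_scores(similarities, ratings, seen):
    """Predict a score per unseen item as sum(sim * rating) / sum(sim)."""
    totals, sim_sums = {}, {}
    for other, sim in similarities.items():
        if sim <= 0:  # ignore neighbours with zero or negative similarity
            continue
        for item, rating in ratings[other].items():
            if item in seen:
                continue
            totals[item] = totals.get(item, 0.0) + rating * sim
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    # Highest predicted score first.
    return sorted(((totals[i] / sim_sums[i], i) for i in totals), reverse=True)


def keep_shortlisted(ranked, shortlist):
    """Keep only the ranked items that the content-based step also selected."""
    return [(score, item) for score, item in ranked if item in shortlist]


# Hypothetical neighbours: dave is ignored because his similarity is negative.
similarities = {"bob": 1.0, "carol": 0.5, "dave": -0.3}
ratings = {
    "bob": {"X": 4, "Y": 2, "Z": 5},
    "carol": {"X": 2, "Y": 5, "Z": 1},
    "dave": {"X": 1, "Y": 1, "Z": 1},
}

ranked = weighted_scores(similarities, ratings, seen={"Z"})
hybrid = keep_shortlisted(ranked, shortlist={"Y"})
print(ranked)
print(hybrid)
```

Because the content-based shortlist acts as a filter rather than as a second score, a film ranked highly by similar users is still dropped if its content score falls below the threshold.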

##### 3.4 Hybrid filtering function

In [14]:
def hybrid_filtering(person,min_content_score):
    return user_recommendations(person,min_content_score)

### 4. User input
The user enters their name (which must appear in the user list) and the minimum content score for a film.

In [26]:
print('List of user : ', usr_rating.columns)

List of user :  Index(['ANI', 'AhokTemanFirli', 'Damar Teman Firli', 'Dpv',
       'Febi ganteng gak ada obat', 'Genjeh', 'Hania', 'Indra 1991 SM',
       'Indra Junior', 'Jawaharal', 'Maria O.', 'Mulya', 'Nonton_Saat_Diskon',
       'OM INDRA', 'Putrisqiana', 'Rima', 'Romantika', 'Star',
       'Topik Zulkarnain', 'bunga', 'faizah', 'franadek', 'jul', 'luck'],
      dtype='object')


In [22]:
name = input('Your name :')
min_content_score = input('Minimum Score Film :')
hybrid_filtering(name,float(min_content_score))

Your name :Hania
Minimum Score Film :0.18


[(2.188279355216407, 'Avengers: End Game'),
 (0.5623441289567186, 'Spiderman: Far From Home')]
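
The input cell passes whatever is typed straight to `hybrid_filtering`, so a name that is not in the user list, or a non-numeric score, would raise an error partway through the computation. One possible way to check the inputs first is sketched below. `parse_request` is a hypothetical helper, `known_users` is only a subset of the real user list, and the 0 to 1 range is an assumption based on the content score being a mean of cosine similarities between non-negative TF-IDF vectors.

```python
def parse_request(raw_name, raw_score, known_users):
    """Validate the typed name and minimum score before running the recommender."""
    name = raw_name.strip()
    if name not in known_users:
        raise ValueError(f"unknown user: {name!r}")
    score = float(raw_score)  # raises ValueError on non-numeric input
    if not 0.0 <= score <= 1.0:
        raise ValueError("minimum score must be between 0 and 1")
    return name, score


# Subset of the user list printed above, for illustration only.
known_users = ["ANI", "Hania", "Rima"]

request = parse_request(" Hania ", "0.18", known_users)
print(request)
```

In the notebook, the validated pair could then be passed on as `hybrid_filtering(*request)`.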

------