# HYBRID FILTERING
Hybrid Filtering is the combination of Collaborative Filtering and Content-Based Filtering. <br>
- Content-based filtering builds each user's profile from an analysis of the content (features) of the items the user has rated.
- Collaborative filtering compares user profiles directly to find similar users, whose ratings are then used to make recommendations.

There are two ways to build a hybrid recommender system:
1. Content-Based → Collaborative Filtering <br>
   Use content-based predictions for users with many training examples and collaborative filtering for the others. <br>
   The predictions of the content-based model can also be used in recursive collaborative filtering. <br>
<br>
2. Collaborative Filtering → Content-Based<br>
   Features can be extracted from other users’ ratings.
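
A minimal sketch of the second direction, Collaborative Filtering → Content-Based, assuming a small hypothetical rating matrix: each movie is described by the vector of ratings it received from all users, and those rating-based features yield an item-item cosine similarity that a content-based model could then use. The matrix values and variable names are illustrative only.

```python
import numpy as np

# Hypothetical toy ratings: rows are users, columns are movies (0 = not rated)
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Each movie's feature vector is its column of ratings from all users
item_features = ratings.T

# Cosine similarity between movies, computed from the rating-based features
norms = np.linalg.norm(item_features, axis=1, keepdims=True)
unit = item_features / norms
item_similarity = unit @ unit.T

print(np.round(item_similarity, 2))
```

Movies 0 and 1 (and likewise 2 and 3) are liked by the same users, so they come out as similar even though no genre or other content feature was used.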
   
An ensemble of several individual models often performs better than the best single model, and even weaker models can contribute. Two common ways to combine models are:<br>
- **Voting**: the weights of the votes can be calibrated on a validation set.
- **Stacking**: the predictions of the individual models become features for a second-stage model.
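
The two combination strategies can be sketched with NumPy, assuming hypothetical validation ratings and predictions from two models. Voting here is a weighted average whose weight is picked by a grid search on the validation set; stacking fits a linear second-stage model (ordinary least squares with an intercept) on the two predictions. The data and the use of RMSE as the metric are assumptions for illustration.

```python
import numpy as np

# Hypothetical validation data: true ratings and predictions from two models
y_val = np.array([4.0, 3.0, 5.0, 2.0, 4.0])
pred_cb = np.array([3.5, 3.5, 4.0, 2.5, 3.0])   # e.g. a content-based model
pred_cf = np.array([4.5, 2.0, 5.0, 1.5, 4.5])   # e.g. a collaborative model


def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


# Voting: grid-search the weight w of the blend w*cb + (1-w)*cf on the validation set
weights = np.linspace(0.0, 1.0, 11)
errors = [rmse(w * pred_cb + (1 - w) * pred_cf, y_val) for w in weights]
best_w = float(weights[int(np.argmin(errors))])

# Stacking: the model predictions become features of a linear second-stage
# model, fitted by least squares with an intercept column. It is evaluated
# on the same data here; in practice it would be fitted on held-out predictions.
X = np.column_stack([np.ones_like(pred_cb), pred_cb, pred_cf])
coef, *_ = np.linalg.lstsq(X, y_val, rcond=None)
stacked = X @ coef

print("best voting weight:", best_w, "RMSE:", min(errors))
print("stacking RMSE (in-sample):", rmse(stacked, y_val))
```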

## Dataset 
This model uses two datasets: 'film.csv' for collaborative filtering and 'hybridCB.csv' for content-based filtering.<br>

### 'film.csv'
This dataset contains the ratings of 24 users for 10 movies:
- Ada Apa dengan Cinta 2
- Aladdin
- Avengers: End Game
- Bumi Manusia
- Captain Marvel
- Dilan 1991
- Dua Garis Biru
- Gundala
- Spiderman: Far From Home
- The Lion King

### 'hybridCB.csv'
This dataset describes the same 10 movies as 'film.csv', with the following features:
- Movie
- Jenis Film (film type: Indonesia or Hollywood)
- Genre
- Rating
- Duration

# I. Import Libraries
To get started, let's import the libraries.

```python
import pandas as pd
import numpy as np
from math import sqrt
from datafilm import dataset   # local module that holds the user ratings
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
```

# II. Load Data
Load both datasets for the hybrid filtering model: the movie features from 'hybridCB.csv' and the user ratings, which are available as `dataset` from the local `datafilm` module imported above.

```python
datamovie = pd.read_csv('hybridCB.csv')
datamovie.head()
```

| | Movie | Jenis Film | Genre | Rating | Duration |
|---|---|---|---|---|---|
| 0 | Ada Apa dengan Cinta 2 | Indonesia | Romance | 10.0 | 180 |
| 1 | Aladdin | Hollywood | Drama | 8.9 | 120 |
| 2 | Avengers: End Game | Hollywood | Hero | 7.8 | 130 |
| 3 | Bumi Manusia | Indonesia | Drama | 8.5 | 120 |
| 4 | Captain Marvel | Hollywood | Hero | 8.5 | 130 |


```python
datarating = pd.DataFrame(dataset)
datarating
```

| | ANI | AhokTemanFirli | Damar Teman Firli | Dpv | Febi ganteng gak ada obat | Genjeh | Hania | Indra 1991 SM | Indra Junior | Jawaharal | ... | Putrisqiana | Rima | Romantika | Star | Topik Zulkarnain | bunga | faizah | franadek | jul | luck |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ada Apa dengan Cinta 2 | 4.0 | 0.0 | 5.0 | 5.0 | 4.0 | 5.0 | 3.0 | 0.0 | 4.0 | 2.0 | ... | 4.0 | 5.0 | 5.0 | 4.0 | 0.0 | 0.0 | 3.0 | 4.0 | 0.0 | 3.0 |
| Aladdin | 4.0 | 0.0 | 0.0 | 0.0 | 5.0 | 5.0 | 0.0 | 0.0 | 5.0 | 5.0 | ... | 0.0 | 5.0 | 0.0 | 5.0 | 0.0 | 5.0 | 0.0 | 5.0 | 3.0 | 0.0 |
| Avengers: End Game | 0.0 | 3.0 | 5.0 | 5.0 | 5.0 | 5.0 | 0.0 | 0.0 | 5.0 | 5.0 | ... | 5.0 | 5.0 | 0.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 3.0 | 4.0 |
| Bumi Manusia | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 5.0 | 0.0 | 0.0 |
| Captain Marvel | 4.0 | 4.0 | 0.0 | 5.0 | 4.0 | 4.0 | 0.0 | 0.0 | 5.0 | 4.0 | ... | 3.0 | 5.0 | 0.0 | 5.0 | 2.0 | 5.0 | 0.0 | 4.0 | 3.0 | 2.0 |
| Dilan 1991 | 4.0 | 0.0 | 0.0 | 4.0 | 4.0 | 3.0 | 4.0 | 0.0 | 0.0 | 3.0 | ... | 2.0 | 5.0 | 5.0 | 0.0 | 0.0 | 4.0 | 5.0 | 4.0 | 0.0 | 0.0 |
| Dua Garis Biru | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.0 | 5.0 | ... | 3.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 3.0 | 3.0 | 0.0 |
| Gundala | 0.0 | 0.0 | 0.0 | 4.0 | 3.0 | 4.0 | 5.0 | 5.0 | 0.0 | 4.0 | ... | 3.0 | 5.0 | 0.0 | 4.0 | 0.0 | 4.0 | 0.0 | 4.0 | 3.0 | 0.0 |
| Spiderman: Far From Home | 3.0 | 0.0 | 5.0 | 5.0 | 5.0 | 4.0 | 0.0 | 0.0 | 5.0 | 5.0 | ... | 4.0 | 5.0 | 0.0 | 0.0 | 4.0 | 5.0 | 0.0 | 4.0 | 3.0 | 0.0 |
| The Lion King | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 4.0 | ... | 0.0 | 4.0 | 0.0 | 5.0 | 5.0 | 5.0 | 0.0 | 4.0 | 3.0 | 0.0 |
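
The functions below index the ratings as `dataset[user][movie]`, and `pd.DataFrame(dataset)` puts users in the columns and movies in the rows, which suggests that `dataset` is a nested dictionary of the form `{user: {movie: rating}}` with 0 for unwatched movies. A minimal sketch under that assumption, using a three-movie excerpt of two columns from the table above (`dataset_excerpt` is a hypothetical name):

```python
import pandas as pd

# Assumed layout of `dataset`: {user: {movie: rating}}, where 0 means "not watched".
# The values are a three-movie excerpt of the rating table above.
dataset_excerpt = {
    'ANI': {'Ada Apa dengan Cinta 2': 4.0, 'Aladdin': 4.0, 'Avengers: End Game': 0.0},
    'AhokTemanFirli': {'Ada Apa dengan Cinta 2': 0.0, 'Aladdin': 0.0, 'Avengers: End Game': 3.0},
}

# Outer keys become the columns (users), inner keys become the index (movies)
ratings = pd.DataFrame(dataset_excerpt)
print(ratings)

# The movies a user has not watched are the rows with a rating of 0
print(ratings[ratings['AhokTemanFirli'] == 0].index.tolist())
```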


# III. Data Preparation
Prepare the data for modeling.<br>
The only feature used in this model is **Genre**, so the other descriptive columns are dropped.

```python
# Keep only the movie title and its genre
datamovie = datamovie.drop(columns=['Jenis Film', 'Rating', 'Duration'])
datamovie
```

| | Movie | Genre |
|---|---|---|
| 0 | Ada Apa dengan Cinta 2 | Romance |
| 1 | Aladdin | Drama |
| 2 | Avengers: End Game | Hero |
| 3 | Bumi Manusia | Drama |
| 4 | Captain Marvel | Hero |
| 5 | Dilan 1991 | Romance |
| 6 | Dua Garis Biru | Drama |
| 7 | Gundala | Hero |
| 8 | Spiderman: Far From Home | Hero |
| 9 | The Lion King | Drama |


```python
# Use the movie title as the index
datamovie1 = datamovie.set_index('Movie')
datamovie1.head()
```

| Movie | Genre |
|---|---|
| Ada Apa dengan Cinta 2 | Romance |
| Aladdin | Drama |
| Avengers: End Game | Hero |
| Bumi Manusia | Drama |
| Captain Marvel | Hero |


# IV. Hybrid Filtering Learning Method
This model uses the first type of hybrid recommender system: <br>

Content-Based → Collaborative Filtering: <br>
A content-based step scores the movies by **Genre**, and a collaborative filtering step predicts ratings from similar users. <br>
The collaborative predictions are kept only for the movies that pass the content-based filter.

The model therefore produces recommendations for one user at a time.

## Define and Compute Content Based
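The content-based step measures how similar movies are based on their **Genre**. For a given user, it collects the movies the user has not watched (rating 0), computes the TF-IDF cosine similarity between the genres of all movies, and scores each unwatched movie by its average similarity to all movies. Only the movies whose score is at least `min_content_score` are kept.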

```python
def content_based(person, min_content_score):
    # Movies the user has not watched (stored with a rating of 0)
    not_watch = datarating[datarating[person] == 0].index.tolist()

    # TF-IDF vectors of the genre text; min_df=1 keeps every term
    tf = TfidfVectorizer(analyzer='word',
                         ngram_range=(1, 3),
                         min_df=1,
                         stop_words='english')
    tfidf_matrix = tf.fit_transform(datamovie['Genre'])
    # TF-IDF rows are L2-normalized, so the linear kernel equals cosine similarity
    cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)

    result = pd.DataFrame(cosine_similarities,
                          index=datamovie['Movie'],
                          columns=datamovie['Movie'])

    # Score each unwatched movie by its mean similarity to all movies
    final = pd.DataFrame(result[not_watch].mean().sort_values(ascending=False),
                         columns=['Score'])
    final1 = final[final.Score >= min_content_score]
    indeks = final1.index

    return indeks
```
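
Because every movie in this dataset has a single-word genre, the TF-IDF cosine similarity in `content_based` reduces to 1 for two movies of the same genre and 0 otherwise, so each movie's score is simply the share of all movies that have its genre. A standalone sketch with the Movie/Genre pairs from the prepared table above illustrates this; it uses the default `ngram_range`, which yields the same vocabulary as the notebook's `(1, 3)` for one-word genres.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Movie/Genre pairs from the prepared `datamovie` table above
movies = pd.DataFrame({
    'Movie': ['Ada Apa dengan Cinta 2', 'Aladdin', 'Avengers: End Game', 'Bumi Manusia',
              'Captain Marvel', 'Dilan 1991', 'Dua Garis Biru', 'Gundala',
              'Spiderman: Far From Home', 'The Lion King'],
    'Genre': ['Romance', 'Drama', 'Hero', 'Drama', 'Hero',
              'Romance', 'Drama', 'Hero', 'Hero', 'Drama'],
})

tfidf = TfidfVectorizer(stop_words='english').fit_transform(movies['Genre'])
sim = pd.DataFrame(linear_kernel(tfidf, tfidf),
                   index=movies['Movie'], columns=movies['Movie'])

# One-word genres: similarity is 1 for the same genre and 0 otherwise,
# so each column mean is the share of movies that have that genre
scores = sim.mean().round(2)
print(scores.sort_values(ascending=False))
```

Romance movies score 0.2 while Drama and Hero movies score 0.4, so a `min_content_score` of 0.2 keeps every unwatched movie, whereas any threshold above 0.2 would drop the two Romance titles.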

## Define and Compute Pearson Correlation
Correlation measures the association between two variables numerically. The Pearson correlation coefficient quantifies the linear relationship between two variables; here, the two variables are the ratings of two users. Like other correlation measures, it lies between -1 and +1. A positive coefficient means that one variable tends to increase as the other increases, while a negative coefficient means that one variable tends to decrease as the other increases. A coefficient of -1 or +1 means the relationship is exactly linear.
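
For two users with ratings $x_i$ and $y_i$ on the $n$ movies present in both of their rating dictionaries, `person_correlation` evaluates the computational form of the Pearson coefficient, which is algebraically equal to the familiar $\sum_i (x_i - \bar{x})(y_i - \bar{y}) \big/ \sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}$:

$$
r_{xy} = \frac{\sum_i x_i y_i - \dfrac{\sum_i x_i \sum_i y_i}{n}}{\sqrt{\left(\sum_i x_i^2 - \dfrac{\left(\sum_i x_i\right)^2}{n}\right)\left(\sum_i y_i^2 - \dfrac{\left(\sum_i y_i\right)^2}{n}\right)}}
$$

When the denominator is 0 (one user gave the same rating to every movie), the function returns 0.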

```python
def person_correlation(person1, person2):

    # Items present in both users' rating dictionaries. Unwatched movies are
    # stored with a rating of 0, so they are counted here as well.
    both_rated = {}
    for item in dataset[person1]:
        if item in dataset[person2]:
            both_rated[item] = 1

    number_of_ratings = len(both_rated)

    # No items in common
    if number_of_ratings == 0:
        return 0

    # Sum of each user's ratings
    person1_preferences_sum = sum([dataset[person1][item] for item in both_rated])
    person2_preferences_sum = sum([dataset[person2][item] for item in both_rated])

    # Sum of the squared ratings of each user
    person1_square_preferences_sum = sum([pow(dataset[person1][item], 2) for item in both_rated])
    person2_square_preferences_sum = sum([pow(dataset[person2][item], 2) for item in both_rated])

    # Sum of the products of both users' ratings for each item
    product_sum_of_both_users = sum([dataset[person1][item] * dataset[person2][item] for item in both_rated])

    # Pearson score (computational form)
    numerator_value = product_sum_of_both_users - (person1_preferences_sum * person2_preferences_sum / number_of_ratings)
    denominator_value = sqrt((person1_square_preferences_sum - pow(person1_preferences_sum, 2) / number_of_ratings)
                             * (person2_square_preferences_sum - pow(person2_preferences_sum, 2) / number_of_ratings))

    if denominator_value == 0:
        return 0
    return numerator_value / denominator_value
```
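
As a quick sanity check of that computational form, a standalone copy of the formula can be compared with `numpy.corrcoef` on made-up ratings. The helper `pearson_from_dicts` is hypothetical: it takes two rating dictionaries instead of reading the global `dataset`.

```python
import numpy as np
from math import sqrt


def pearson_from_dicts(ratings1, ratings2):
    """Computational-form Pearson r over the movies present in both dicts
    (same formula as person_correlation, but taking the dicts directly)."""
    common = [m for m in ratings1 if m in ratings2]
    n = len(common)
    if n == 0:
        return 0
    x = [ratings1[m] for m in common]
    y = [ratings2[m] for m in common]
    num = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    den = sqrt((sum(a * a for a in x) - sum(x) ** 2 / n) *
               (sum(b * b for b in y) - sum(y) ** 2 / n))
    return 0 if den == 0 else num / den


# Hypothetical ratings of two users (a 0 is still counted, as in the notebook)
u1 = {'A': 5.0, 'B': 3.0, 'C': 0.0, 'D': 4.0}
u2 = {'A': 4.0, 'B': 2.0, 'C': 1.0, 'D': 5.0}

r = pearson_from_dicts(u1, u2)
# Cross-check against NumPy's Pearson correlation matrix
r_numpy = np.corrcoef([u1[m] for m in u1], [u2[m] for m in u1])[0, 1]
print(round(r, 4), round(r_numpy, 4))
```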

## Define and Compute User Recommendation
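Finally, the recommendations are computed in two steps. The collaborative step predicts a score for every movie the user has not watched as a similarity-weighted average of the ratings of the positively correlated users. The content-based step then keeps only the movies whose **Genre** score reaches `min_content_score`.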

```python
def user_recommendations(person, min_content_score):
    # Predict a score for each unwatched movie as the similarity-weighted
    # average of every other user's rating of it
    totals = {}
    simSums = {}
    for other in dataset:
        # Don't compare the user to themselves
        if other == person:
            continue
        sim = person_correlation(person, other)

        # Ignore users with zero or negative correlation
        if sim <= 0:
            continue
        for item in dataset[other]:
            # Only score movies the user hasn't watched yet
            if item not in dataset[person] or dataset[person][item] == 0:
                # Similarity * rating
                totals.setdefault(item, 0)
                totals[item] += dataset[other][item] * sim
                # Sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim

    # Normalize and sort from highest to lowest predicted score
    rankings = sorted(((total / simSums[item], item) for item, total in totals.items()),
                      reverse=True)

    # Keep only the movies that pass the content-based (Genre) filter,
    # computing that filter once instead of once per movie
    allowed = set(content_based(person, min_content_score))
    new_rankings = [(score, item) for score, item in rankings if item in allowed]

    return new_rankings
```
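
The prediction step in `user_recommendations` is a similarity-weighted average. A small worked example with hypothetical neighbours and similarity values (chosen to be exact in binary floating point) shows the arithmetic for one unwatched movie:

```python
# Hypothetical neighbours of one user: (Pearson similarity, their rating of one unwatched movie)
neighbours = [
    (0.75, 5.0),
    (0.5, 3.0),
    (0.25, 0.0),   # this neighbour has not watched the movie either (stored as 0)
    (-0.5, 5.0),   # non-positive similarity: skipped, as in user_recommendations
]

positive = [(sim, rating) for sim, rating in neighbours if sim > 0]
total = sum(sim * rating for sim, rating in positive)  # 0.75*5 + 0.5*3 + 0.25*0 = 5.25
sim_sum = sum(sim for sim, _ in positive)              # 0.75 + 0.5 + 0.25 = 1.5
predicted = total / sim_sum
print(predicted)  # 3.5
```

Without the third neighbour's stored 0, the prediction would be 5.25 / 1.25 = 4.2, so in this implementation counting unwatched movies as 0 pulls the prediction down for movies that few similar users have seen.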

# V. Hybrid Filtering Result
Compute the recommendations for a user with the hybrid filter.

```python
def hybrid_filtering(person, min_content_score):
    # Collaborative predictions filtered by the content-based Genre score
    return user_recommendations(person, min_content_score)
```

```python
hybrid_filtering('AhokTemanFirli', 0.2)
```

```
[(3.6730461435999056, 'Spiderman: Far From Home'),
 (2.97952988065743, 'Aladdin'),
 (2.97748214528544, 'Ada Apa dengan Cinta 2'),
 (2.353179716353117, 'The Lion King'),
 (2.217889585545292, 'Gundala'),
 (1.855138094497258, 'Dilan 1991'),
 (1.3926543165428198, 'Dua Garis Biru'),
 (0.5409954083704469, 'Bumi Manusia ')]
```

# VI. Conclusion
From the result, the movies recommended for **AhokTemanFirli**, from highest to lowest predicted score, are:
1. Spiderman: Far From Home
2. Aladdin
3. Ada Apa dengan Cinta 2
4. The Lion King
5. Gundala
6. Dilan 1991
7. Dua Garis Biru
8. Bumi Manusia

**AhokTemanFirli** has rated only Avengers: End Game and Captain Marvel, so the hybrid filter ranks the remaining eight movies by their predicted score. All eight pass the minimum content score of **0.2** used in the call above, which is a threshold on genre similarity rather than on the predicted rating.