# Hybrid Filtering

This notebook performs a hybrid analysis of film ratings given by 24 customers of a cinema.

The goal is to understand the character of each customer and of the available film catalogue, so that the hybrid analysis can recommend films that match each customer's character and taste.

Two datasets are used. The first lists the films along with their type (`Jenis`), genre (`Gendre`), rating, rating code (`Kode_Rating`), and duration. The second contains the ratings that the 24 customers gave to each film.

## Import Libraries

In [40]:
# import libraries
from recommendation_data import dataset
from math import sqrt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

## Load Data

In [41]:
# Load the film dataset with the details of each film.
# The supporting columns are: Jenis, Gendre, Rating, Kode_Rating, and Duration
df = pd.read_csv('../data/data_movie.csv', delimiter=';')
df.head()

Unnamed: 0,Movie,Jenis,Gendre,Rating,Kode_Rating,Duration
0,Avengers: End Game,Non Indo,Action,8.6,1.0,181
1,Ada Apa dengan Cinta 2,Indo,Horror,7.0,0.36,169
2,Aladdin,Non Indo,Animation,7.5,0.56,102
3,Gundala,Indo,Action,7.6,0.6,183
4,Captain Marvel,Non Indo,Adventure,6.1,1.0,102


In [42]:
# The hybrid analysis only needs the Gendre (genre) of each film,
# so the other features are dropped.
movie = df.drop(columns=['Jenis', 'Rating', 'Kode_Rating', 'Duration'])
movie.head()

Unnamed: 0,Movie,Gendre
0,Avengers: End Game,Action
1,Ada Apa dengan Cinta 2,Horror
2,Aladdin,Animation
3,Gundala,Action
4,Captain Marvel,Adventure


In [43]:
# Ratings given to each film by every customer
rating = pd.DataFrame(dataset)
rating.head()

Unnamed: 0,Aika,Bika,Cika,Dika,Eika,Fika,Gika,Hika,Iika,Jika,...,Oika,Pika,Qika,Rika,Sika,Tika,Uika,Vika,Wika,Xika
Ada Apa dengan Cinta 2,3,0,0,4,3,0,5,3,5,4,...,0,2,4,4,4,0,5,0,3,0
Aladdin,0,0,0,5,1,5,5,0,0,0,...,5,5,5,5,5,0,0,3,0,4
Avengers: End Game,0,5,3,5,5,4,5,5,5,5,...,5,5,5,5,5,0,5,3,4,0
Bumi Manusia,4,0,0,5,0,0,0,5,0,4,...,0,0,0,0,0,0,0,0,0,0
Captain Marvel,0,2,4,4,5,3,4,0,5,3,...,5,4,4,5,5,0,0,3,2,0
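
The `dataset` object imported from `recommendation_data` is not shown in this notebook. Judging from the table above (people as columns, films as the index), it is presumably a dictionary of dictionaries keyed first by person and then by film. The sketch below uses a hypothetical `toy_dataset` with made-up ratings to illustrate how `pd.DataFrame` lays out such a structure and how unrated films (rating 0) can be selected.

```python
import pandas as pd

# Hypothetical stand-in for recommendation_data.dataset (made-up ratings).
# Outer keys are people, inner keys are films, and 0 means "not rated yet".
toy_dataset = {
    "Aika": {"Aladdin": 0, "Gundala": 4},
    "Bika": {"Aladdin": 5, "Gundala": 0},
}

# A dict of dicts becomes one column per outer key and one row per inner key.
toy_rating = pd.DataFrame(toy_dataset)
print(toy_rating)

# Films a given person has not rated yet (rating equal to 0).
unrated_by_aika = toy_rating.index[toy_rating["Aika"] == 0].tolist()
print(unrated_by_aika)
```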


In [31]:
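def content_based(person, min_content_score):
    """Return the films `person` has not rated (rating 0) whose mean genre
    similarity to the whole catalogue is at least `min_content_score`."""

    # Films the person has not rated yet (a rating of 0)
    not_watch = rating.index[rating[person] == 0].tolist()

    # TF-IDF on the genre text; min_df=1 keeps every term
    # (recent scikit-learn releases reject an integer min_df of 0)
    tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
    tfidf_matrix = tf.fit_transform(movie['Gendre'])
    # TF-IDF rows are L2-normalised, so the linear kernel is the cosine similarity
    cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)

    hasilbaru = pd.DataFrame(cosine_similarities, index=movie['Movie'], columns=movie['Movie'])

    # Mean similarity of each unrated film to all films, highest first
    hasilfinal = pd.DataFrame(hasilbaru[not_watch].mean().sort_values(ascending=False), columns=['Score'])
    hasilfinal1 = hasilfinal[hasilfinal.Score >= min_content_score]
    indeks = hasilfinal1.index

    return indeks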
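
To make the similarity step in `content_based` concrete, here is a minimal sketch on a made-up list of genres (`toy_genres` is hypothetical and not taken from `data_movie.csv`). It relies on the fact that `TfidfVectorizer` L2-normalises its rows by default, so `linear_kernel` yields cosine similarities.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Made-up genre labels, one per film (illustrative only).
toy_genres = ["Action", "Horror", "Animation", "Action"]

# TfidfVectorizer L2-normalises each row by default, so the plain dot product
# computed by linear_kernel equals the cosine similarity.
tfidf = TfidfVectorizer(analyzer="word", stop_words="english")
toy_matrix = tfidf.fit_transform(toy_genres)
toy_similarity = linear_kernel(toy_matrix, toy_matrix)

# Films with the same genre score 1, films with different genres score 0.
print(toy_similarity.round(2))
```

When every genre is a single word, as in this toy list, each pairwise similarity is either 1 or 0, so the mean taken in `content_based` amounts to the share of the catalogue that has the same genre as the film in question.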

In [32]:
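def person_correlation(person1, person2):
    """Pearson correlation between two people's ratings over the films that
    both have an entry for (entries with a rating of 0 are included)."""

    # To get both rated items
    both_rated = {}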
    for item in dataset[person1]:
        if item in dataset[person2]:
            both_rated[item] = 1

    number_of_ratings = len(both_rated)

    # Checking for ratings in common
    if number_of_ratings == 0:
        return 0

    # Add up all the preferences of each user
    person1_preferences_sum = sum([dataset[person1][item] for item in both_rated])
    person2_preferences_sum = sum([dataset[person2][item] for item in both_rated])

    # Sum up the squares of preferences of each user
    person1_square_preferences_sum = sum([pow(dataset[person1][item],2) for item in both_rated])
    person2_square_preferences_sum = sum([pow(dataset[person2][item],2) for item in both_rated])

    # Sum up the product value of both preferences for each item
    product_sum_of_both_users = sum([dataset[person1][item] * dataset[person2][item] for item in both_rated])

    # Calculate the pearson score
    numerator_value = product_sum_of_both_users - (person1_preferences_sum*person2_preferences_sum/number_of_ratings)
    denominator_value = sqrt((person1_square_preferences_sum - pow(person1_preferences_sum,2)/number_of_ratings) * (person2_square_preferences_sum -pow(person2_preferences_sum,2)/number_of_ratings))

    if denominator_value == 0:
        return 0
    else:
        r = numerator_value / denominator_value
        return r
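
As a sanity check on the sum-based Pearson formula used above, the following sketch re-implements it for two plain lists and compares the result with `numpy.corrcoef`. The helper name `pearson` and the two rating lists are made up for illustration.

```python
from math import sqrt

import numpy as np


def pearson(x, y):
    """Pearson correlation using the same sum-based formula as above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))

    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return 0 if denominator == 0 else numerator / denominator


# Made-up ratings from two hypothetical users for the same five films.
ratings_a = [5, 3, 0, 4, 1]
ratings_b = [4, 2, 1, 5, 0]

r_manual = pearson(ratings_a, ratings_b)
r_numpy = np.corrcoef(ratings_a, ratings_b)[0, 1]
print(r_manual, r_numpy)
```

The two values should agree up to floating-point error, which suggests the hand-written formula is a standard Pearson coefficient.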

In [33]:
def user_recommendations(person, min_content_score):
    """Rank films `person` has not seen by a similarity-weighted average of
    other users' ratings, keeping only films that pass the content filter."""

    totals = {}
    simSums = {}
    for other in dataset:
        # Don't compare the person to themselves
        if other == person:
            continue
        sim = person_correlation(person, other)

        # Ignore similarities of zero or lower
        if sim <= 0:
            continue
        for item in dataset[other]:
            # Only score films the person hasn't seen yet
            if item not in dataset[person] or dataset[person][item] == 0:
                # Similarity * rating
                totals.setdefault(item, 0)
                totals[item] += dataset[other][item] * sim
                # Sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim

    # Normalised scores, highest first
    rankings = sorted(((total / simSums[item], item) for item, total in totals.items()), reverse=True)

    # Run the content-based filter once and keep only the films it returns
    allowed = set(content_based(person, min_content_score))
    new_rankings = [pair for pair in rankings if pair[1] in allowed]

    return new_rankings
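
The score that `user_recommendations` assigns to a film is a similarity-weighted average of the neighbours' ratings. A small worked example, with made-up similarities and ratings, shows the arithmetic:

```python
# Made-up similarities between the target user and two neighbours.
similarities = {"Bika": 0.8, "Cika": 0.2}

# Made-up ratings the neighbours gave to a film the target has not seen.
neighbour_ratings = {"Bika": 5, "Cika": 1}

# Numerator: each rating weighted by how similar that neighbour is.
weighted_total = sum(similarities[u] * neighbour_ratings[u] for u in similarities)
# Denominator: total similarity, so the result stays on the rating scale.
similarity_sum = sum(similarities.values())

predicted_score = weighted_total / similarity_sum
print(predicted_score)
```

The more similar neighbour dominates the prediction. Note that the function above does not skip neighbours whose own rating is 0, so such entries pull the predicted score down. This is consistent with the very low score for Bumi Manusia in the case example below.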

In [34]:
def hybrid_filtering(person, min_content_score):
    # user_recommendations already applies the content-based filter internally
    return user_recommendations(person, min_content_score)

### Case Example

We want to find film recommendations for Sika.

In [39]:
hybrid_filtering('Sika',0.18)

[(1.703577783780036, 'Gundala'),
 (1.232618231009842, 'Dilan 1991'),
 (0.008347029626296167, 'Bumi Manusia')]
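
The returned list of `(score, film)` tuples can be awkward to read. One possible way to present it, sketched here with the output above copied in by hand, is to load it into a small DataFrame (the names `recommendations` and `result_table` are illustrative):

```python
import pandas as pd

# Output of hybrid_filtering('Sika', 0.18) as shown above.
recommendations = [
    (1.703577783780036, "Gundala"),
    (1.232618231009842, "Dilan 1991"),
    (0.008347029626296167, "Bumi Manusia"),
]

# Put the tuples into a table with the film first and the score rounded.
result_table = pd.DataFrame(recommendations, columns=["Score", "Movie"])
result_table = result_table[["Movie", "Score"]].round({"Score": 3})
print(result_table)
```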

**Conclusion** <br>
The rating data shows that Sika has not yet watched several films. Using the hybrid filtering algorithm with a minimum content score of 0.18, the recommended films for Sika are Gundala, Dilan 1991, and Bumi Manusia (ordered from the best match for Sika's character and taste).