# Firli Ilhami

## Collaborative Filtering Algorithm
## Objective
Collaborative filtering is an algorithm commonly used in recommendation systems. Here we use it to recommend movies: a film is recommended to a user based on the ratings that other users have given that film.

### Import Library & Dataset

In [17]:
from recommendation_data import dataset
from math import sqrt
import pandas as pd

### Dataset
Our data contains viewers' ratings of films they have watched. There are 10 films:
1. Ada Apa dengan Cinta 2
2. Aladdin
3. Avengers: End Game
4. Bumi Manusia
5. Captain Marvel
6. Dilan 1991
7. Dua Garis Biru
8. Gundala
9. Spiderman: Far From Home
10. The Lion King

Ratings range from 1 to 5. A score of 0 means the viewer has not watched that film. There are 24 respondents.

The dataset is a dictionary rather than a DataFrame, so I will convert it to a DataFrame to make the data easier to read.

In [18]:
dataset

{'ANI': {'Ada Apa dengan Cinta 2': 4,
  'Aladdin': 4,
  'Avengers: End Game': 0,
  'Bumi Manusia': 5,
  'Captain Marvel': 4,
  'Dilan 1991': 4,
  'Dua Garis Biru': 0,
  'Gundala': 0,
  'Spiderman: Far From Home': 3,
  'The Lion King': 0},
 'AhokTemanFirli': {'Ada Apa dengan Cinta 2': 0,
  'Aladdin': 0,
  'Avengers: End Game': 3,
  'Bumi Manusia': 0,
  'Captain Marvel': 4,
  'Dilan 1991': 0,
  'Dua Garis Biru': 0,
  'Gundala': 0,
  'Spiderman: Far From Home': 0,
  'The Lion King': 0},
 'Damar Teman Firli': {'Ada Apa dengan Cinta 2': 5,
  'Aladdin': 0,
  'Avengers: End Game': 5,
  'Bumi Manusia': 0,
  'Captain Marvel': 0,
  'Dilan 1991': 0,
  'Dua Garis Biru': 0,
  'Gundala': 0,
  'Spiderman: Far From Home': 5,
  'The Lion King': 0},
 'Dpv': {'Ada Apa dengan Cinta 2': 5,
  'Aladdin': 0,
  'Avengers: End Game': 5,
  'Bumi Manusia': 0,
  'Captain Marvel': 5,
  'Dilan 1991': 4,
  'Dua Garis Biru': 0,
  'Gundala': 4,
  'Spiderman: Far From Home': 5,
  'The Lion King': 0},
 'Febi ganteng gak 

In [19]:
df=pd.DataFrame(dataset)
df.index.name='Person'

In [20]:
df.transpose().head()

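| Person | Ada Apa dengan Cinta 2 | Aladdin | Avengers: End Game | Bumi Manusia | Captain Marvel | Dilan 1991 | Dua Garis Biru | Gundala | Spiderman: Far From Home | The Lion King |
|---|---|---|---|---|---|---|---|---|---|---|
| ANI | 4 | 4 | 0 | 5 | 4 | 4 | 0 | 0 | 3 | 0 |
| AhokTemanFirli | 0 | 0 | 3 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
| Damar Teman Firli | 5 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 5 | 0 |
| Dpv | 5 | 0 | 5 | 0 | 5 | 4 | 0 | 4 | 5 | 0 |
| Febi ganteng gak ada obat | 4 | 5 | 5 | 0 | 4 | 4 | 0 | 3 | 5 | 0 |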
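
Because 0 stands for "not watched" rather than a real rating, it can help to see which films a viewer has actually rated. The sketch below uses two rows copied from the output above; the helper names `watched_films` and `common_films` are hypothetical and are not part of the notebook, whose functions work on the raw dictionary with the zeros included.

```python
# Two rows copied from the dataset output above; 0 means "not watched".
sample = {
    'ANI': {'Ada Apa dengan Cinta 2': 4, 'Aladdin': 4, 'Avengers: End Game': 0,
            'Bumi Manusia': 5, 'Captain Marvel': 4, 'Dilan 1991': 4,
            'Dua Garis Biru': 0, 'Gundala': 0, 'Spiderman: Far From Home': 3,
            'The Lion King': 0},
    'AhokTemanFirli': {'Ada Apa dengan Cinta 2': 0, 'Aladdin': 0,
                       'Avengers: End Game': 3, 'Bumi Manusia': 0,
                       'Captain Marvel': 4, 'Dilan 1991': 0, 'Dua Garis Biru': 0,
                       'Gundala': 0, 'Spiderman: Far From Home': 0,
                       'The Lion King': 0},
}


def watched_films(ratings):
    """Hypothetical helper: keep only films with a real rating (1 to 5)."""
    return {film: score for film, score in ratings.items() if score > 0}


def common_films(ratings_a, ratings_b):
    """Hypothetical helper: films that both viewers have actually rated."""
    return sorted(set(watched_films(ratings_a)) & set(watched_films(ratings_b)))


print(len(watched_films(sample['ANI'])))                      # 6
print(common_films(sample['ANI'], sample['AhokTemanFirli']))  # ['Captain Marvel']
```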


## Function

### Similarity Score Function
We use this function to measure how similar the film opinions of `person1` and `person2` are.

The output is a score between 0 and 1. A score of 1 means the two viewers gave identical ratings; the closer the score is to 0, the more their ratings differ.
The function computes similarity as $1/(1+distance)$, where the distance is the Euclidean distance between the two viewers' ratings. Other distance measures, such as Manhattan or Minkowski distance, could be used instead.

The distance is calculated from the rating scores. If two viewers give the same rating to every film, the distance is 0, so $1/(1+distance) = 1/(1+0) = 1$, which means they have the same opinion of the films.

This function takes two parameters:
1. `person1`: name of the first person
2. `person2`: name of the second person

In [21]:
def similarity_score(person1, person2):

    # Returns a similarity score for person1 and person2 based on Euclidean distance

    # Collect the items rated by both person1 and person2
    both_viewed = {}

    for item in dataset[person1]:
        if item in dataset[person2]:
            both_viewed[item] = 1

    # If they have no rated items in common, the similarity is 0
    if len(both_viewed) == 0:
        return 0

    # Sum the squared rating differences over the common items
    sum_of_squared_differences = sum(
        pow(dataset[person1][item] - dataset[person2][item], 2)
        for item in both_viewed
    )

    return 1/(1+sqrt(sum_of_squared_differences))
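
As a small worked example of the $1/(1+distance)$ idea, the sketch below generalizes the distance to the Minkowski form, where `p=2` gives Euclidean distance and `p=1` gives Manhattan distance. The function name `minkowski_similarity` and the toy ratings are assumptions made for illustration; the function takes two rating dictionaries directly instead of looking names up in `dataset`.

```python
def minkowski_similarity(ratings_a, ratings_b, p=2):
    """Hypothetical helper: 1 / (1 + Minkowski distance) over shared films.

    p=2 is the Euclidean distance, p=1 is the Manhattan distance.
    """
    shared = [film for film in ratings_a if film in ratings_b]
    if not shared:
        return 0
    distance = sum(abs(ratings_a[film] - ratings_b[film]) ** p
                   for film in shared) ** (1 / p)
    return 1 / (1 + distance)


# Toy ratings, invented for this example.
a = {'Aladdin': 4, 'Gundala': 1}
b = {'Aladdin': 1, 'Gundala': 5}

print(minkowski_similarity(a, a))       # identical ratings: distance 0, score 1.0
print(minkowski_similarity(a, b))       # differences 3 and 4: distance 5, score 1/6
print(minkowski_similarity(a, b, p=1))  # Manhattan distance 7, score 0.125
```

Identical ratings always give a score of 1, and the score shrinks toward 0 as the ratings move apart, whichever value of `p` is used.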

### Pearson Correlation Function
Like the similarity score, this function (`person_correlation` in the code) compares the film opinions of `person1` and `person2`, but it uses the Pearson correlation coefficient instead of a distance.

A correlation close to 1 means the two viewers have similar opinions about the films; a correlation close to -1 means their opinions are opposed.
This function takes two parameters:
1. `person1`: name of the first person
2. `person2`: name of the second person


In [22]:
def person_correlation(person1, person2):

    # To get both rated items
    both_rated = {}
    for item in dataset[person1]:
        if item in dataset[person2]:
            both_rated[item] = 1

    number_of_ratings = len(both_rated)

    # Checking for ratings in common
    if number_of_ratings == 0:
        return 0

    # Add up all the preferences of each user
    person1_preferences_sum = sum([dataset[person1][item] for item in both_rated])
    person2_preferences_sum = sum([dataset[person2][item] for item in both_rated])

    # Sum up the squares of preferences of each user
    person1_square_preferences_sum = sum([pow(dataset[person1][item],2) for item in both_rated])
    person2_square_preferences_sum = sum([pow(dataset[person2][item],2) for item in both_rated])

    # Sum up the product value of both preferences for each item
    product_sum_of_both_users = sum([dataset[person1][item] * dataset[person2][item] for item in both_rated])

    # Calculate the pearson score
    numerator_value = product_sum_of_both_users - (person1_preferences_sum*person2_preferences_sum/number_of_ratings)
    denominator_value = sqrt((person1_square_preferences_sum - pow(person1_preferences_sum,2)/number_of_ratings) * (person2_square_preferences_sum -pow(person2_preferences_sum,2)/number_of_ratings))

    if denominator_value == 0:
        return 0
    else:
        r = numerator_value / denominator_value
        return r
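
The function above evaluates the computational form of the Pearson coefficient, $r = \dfrac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$, where $x$ and $y$ are the two viewers' ratings and $n$ is the number of common items. The sketch below applies the same formula to plain lists so the result can be checked by hand; the name `pearson` and the toy lists are assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length lists."""
    n = len(xs)
    if n == 0:
        return 0
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
    # A constant list has zero variance, so the correlation is undefined;
    # like person_correlation, this sketch returns 0 in that case.
    return numerator / denominator if denominator else 0


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0  (ratings rise together)
print(pearson([1, 2, 3], [3, 2, 1]))  # -1.0 (ratings move in opposite directions)
print(pearson([3, 3, 3], [1, 2, 3]))  # 0    (first list is constant)
```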

### Most Similar Users Function
This function returns the viewers whose film tastes are closest to a given person's, and lets us choose how many of them to return. It uses the <b>person_correlation function</b> and sorts the resulting correlation scores from highest to lowest.

This function takes two parameters:
1. `person`: name of the person
2. `number_of_users`: how many of the most similar viewers to return

In [23]:
def most_similar_users(person, number_of_users):

    # returns the number_of_users (similar persons) for a given specific person
    scores = [(person_correlation(person, other_person), other_person) for other_person in dataset if other_person != person]

    # Sort the similar persons so the highest scores person will appear at the first
    scores.sort()
    scores.reverse()
    return scores[0:number_of_users]
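
The sort-then-slice step works because Python compares tuples element by element, so `(score, name)` pairs are ordered by score first. As a sketch of an alternative, `heapq.nlargest` from the standard library picks the top entries directly; the helper name `top_scores` and the toy scores below are invented for this example.

```python
import heapq


def top_scores(scores, n):
    """Hypothetical helper: the n largest (score, name) pairs, highest first."""
    return heapq.nlargest(n, scores)


# Toy (score, name) pairs, invented for this example.
toy_scores = [(0.2, 'viewer_a'), (0.9, 'viewer_b'),
              (-0.4, 'viewer_c'), (0.5, 'viewer_d')]

print(top_scores(toy_scores, 2))             # [(0.9, 'viewer_b'), (0.5, 'viewer_d')]
print(sorted(toy_scores, reverse=True)[:2])  # same result via sort and slice
```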

### User_Recommendations Function
This function recommends films that a person has not watched yet, based on other viewers' ratings of those films.<br>
The higher a film's score, the more strongly it is recommended.<br>

It uses the <b>person_correlation function</b> discussed above.
This function takes one parameter:
1. `person`: name of the person
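
The score for each film is a similarity-weighted average: every other viewer's rating is multiplied by that viewer's similarity to the person, and the total is divided by the sum of the similarities used. A minimal sketch of that arithmetic for a single film follows; the helper name `weighted_rating` and the numbers are assumptions chosen so the result is easy to verify by hand.

```python
def weighted_rating(ratings_and_similarities):
    """Hypothetical helper: similarity-weighted average rating for one film.

    Takes (rating, similarity) pairs and skips viewers whose similarity
    is zero or negative.
    """
    total = 0.0
    similarity_sum = 0.0
    for rating, similarity in ratings_and_similarities:
        if similarity <= 0:
            continue
        total += rating * similarity
        similarity_sum += similarity
    return total / similarity_sum if similarity_sum else None


# Toy data: three viewers rated the film 5, 2 and 1,
# with similarities 0.5, 0.25 and -0.5 to the target person.
pairs = [(5, 0.5), (2, 0.25), (1, -0.5)]

# (5 * 0.5 + 2 * 0.25) / (0.5 + 0.25) = 3.0 / 0.75 = 4.0
print(weighted_rating(pairs))  # 4.0
```

The viewer with negative similarity is ignored, and the rating from the most similar viewer pulls the average closest to its own value.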


In [41]:
def user_recommendations(person):

    # Gets recommendations for a person by using a weighted average of every other user's ratings
    totals = {}
    simSums = {}
    for other in dataset:
        # don't compare me to myself
        if other == person:
            continue
        sim = person_correlation(person, other)

        # ignore scores of zero or lower
        if sim <= 0:
            continue
        for item in dataset[other]:

            # only score movies I haven't seen yet
            if item not in dataset[person] or dataset[person][item] == 0:

                # similarity * score
                totals.setdefault(item, 0)
                totals[item] += dataset[other][item] * sim
                # sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim

    # Create the normalized list of (weighted average score, film) pairs
    rankings = [(total/simSums[item], item) for item, total in totals.items()]
    rankings.sort()
    rankings.reverse()

    # Return the recommended films, best first, as a DataFrame
    recommendations_list = [recommend_item for score, recommend_item in rankings]
    final = pd.DataFrame(recommendations_list, columns=['Film'], index=list(range(1, len(recommendations_list)+1)))
    print('Top Recommendation Film Based on User Recommendations: ')
    return final

In [42]:
print(user_recommendations('Indra Junior'))

Top Recommendation Film Based on User Recommendations: 
           Film
1       Gundala
2    Dilan 1991
3  Bumi Manusia


In [26]:
print(most_similar_users('Indra Junior', 3))

[(0.7195400397487958, 'Mulya'), (0.6138541929301831, 'Jawaharal'), (0.5885791599322122, 'Topik Zulkarnain')]


## Conclusion

* There are 3 films that Indra Junior has not watched: Bumi Manusia, Dilan 1991 and Gundala. Based on the user_recommendations function (collaborative filtering), the recommended order is Gundala, Dilan 1991, then Bumi Manusia.

* The 3 people most similar to Indra Junior are Mulya, Jawaharal and Topik Zulkarnain. This result is calculated with the correlation function, so the closer a score is to 1, the more similar that person's film opinions are to Indra Junior's. In my opinion, Mulya and Jawaharal have high scores (above 0.6) and Topik Zulkarnain's score is moderate.