# COLLABORATIVE FILTERING
Collaborative filtering (CF) is a recommendation technique that predicts the "rating" or "preference" a user would give to an item.<br>
It filters information by collecting human judgments (ratings).<br>
A user is any individual who provides ratings to the system, and an item is anything a human can rate.<br>
The collaborative filtering problem is to predict how much a user will like an item they have not yet rated, given a set of historical preference judgments from a community of users.

## Dataset
The dataset used in this model contains movie ratings from 24 users for the following 10 movies:
- Ada Apa dengan Cinta 2
- Aladdin
- Avengers: End Game
- Bumi Manusia
- Captain Marvel
- Dilan 1991
- Dua Garis Biru
- Gundala
- Spiderman: Far From Home
- The Lion King

# I. Import Libraries
To get started, let's import the libraries.

In [1]:
from datafilm import dataset
from math import sqrt

## Notes
The data is already defined in datafilm.py as the variable 'dataset', stored in a JSON-like nested structure. So, we only need to import 'dataset' from datafilm.py.
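The contents of datafilm.py are not shown here. Since the functions below index the data as `dataset[user][movie]`, it is presumably a dictionary of dictionaries. The following is only a sketch of that shape: the name `example_dataset`, the user names, the ratings, and the rating scale are all invented for illustration.

```python
# Hypothetical illustration of the expected structure: {user: {movie: rating}}.
# The user names and ratings here are invented and are not the real data.
example_dataset = {
    "user_a": {"Aladdin": 4.0, "Gundala": 3.5, "The Lion King": 5.0},
    "user_b": {"Aladdin": 3.0, "Gundala": 4.0},
}

# Ratings are looked up by user first, then by movie title.
print(example_dataset["user_a"]["Aladdin"])

# Movies that both users rated; similarity measures only compare these.
common = sorted(set(example_dataset["user_a"]) & set(example_dataset["user_b"]))
print(common)
```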

# II. Define and Compute Similarity Score
The similarity score measures how alike two data objects are (in this case, two users). In a data mining context, a similarity measure is based on a distance whose dimensions represent features of the objects. A small distance means a high degree of similarity, and a large distance means a low degree of similarity. The function below converts the Euclidean distance d between two users' ratings into the score 1 / (1 + d), so identical ratings give a score of 1 and the score approaches 0 as the distance grows.

In [2]:
def similarity_score(person1, person2):

    # Returns a similarity score in (0, 1] based on the Euclidean distance between person1 and person2

    # Collect the items rated by both person1 and person2
    both_viewed = {}

    for item in dataset[person1]:
        if item in dataset[person2]:
            both_viewed[item] = 1

    # If they have no rated items in common, the similarity is 0
    if len(both_viewed) == 0:
        return 0

    # Sum the squared rating differences over the common items
    sum_of_euclidean_distance = sum(
        pow(dataset[person1][item] - dataset[person2][item], 2) for item in both_viewed
    )

    return 1 / (1 + sqrt(sum_of_euclidean_distance))
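To make the arithmetic concrete, here is a minimal self-contained sketch of the same idea. It takes the two users' rating dictionaries directly instead of reading the global `dataset`; the function name `euclidean_similarity` and the ratings are assumptions made up for this example.

```python
from math import sqrt

def euclidean_similarity(ratings1, ratings2):
    """Return 1 / (1 + Euclidean distance) over the items both users rated."""
    common = [item for item in ratings1 if item in ratings2]
    if not common:
        return 0
    distance = sqrt(sum((ratings1[i] - ratings2[i]) ** 2 for i in common))
    return 1 / (1 + distance)

# Hypothetical ratings, invented for illustration.
user_a = {"Aladdin": 4, "Gundala": 3, "The Lion King": 5}
user_b = {"Aladdin": 1, "Gundala": 3, "Dilan 1991": 2}

# Common items: Aladdin (difference 3) and Gundala (difference 0).
# Distance = sqrt(3**2 + 0**2) = 3, so the score is 1 / (1 + 3).
print(euclidean_similarity(user_a, user_b))  # 0.25
print(euclidean_similarity(user_a, user_a))  # 1.0
```

Movies rated by only one of the two users ("The Lion King" and "Dilan 1991" here) do not affect the score.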

# III. Define and Compute Pearson Correlation
Correlation is used to understand the relationship between two or more variables. It captures the association between two variables numerically. The Pearson correlation quantifies the linear relationship between two variables (in this case, the ratings of two users). Like other correlation measures, the Pearson correlation coefficient lies between -1 and +1. A positive Pearson correlation means that one variable's value increases as the other's increases. A negative Pearson correlation means that one variable decreases as the other increases. A coefficient of -1 or +1 means the relationship is exactly linear.
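For reference, the computational form of the Pearson coefficient can be written as follows. The symbols are notation introduced here, not taken from the code: $x_i$ and $y_i$ are the two users' ratings of the $i$-th movie they have both rated, and $n$ is the number of such movies.

```latex
r = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}
         {\sqrt{\left(\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)
                \left(\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right)}}
```

When the denominator is zero, for example because one user gave every common movie the same rating, the coefficient is undefined; returning 0 in that case is a common convention.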

In [3]:
def person_correlation(person1, person2):

    # Collect the items rated by both users
    both_rated = {}
    for item in dataset[person1]:
        if item in dataset[person2]:
            both_rated[item] = 1

    number_of_ratings = len(both_rated)

    # Checking for ratings in common
    if number_of_ratings == 0:
        return 0

    # Add up all the preferences of each user
    person1_preferences_sum = sum([dataset[person1][item] for item in both_rated])
    person2_preferences_sum = sum([dataset[person2][item] for item in both_rated])

    # Sum up the squares of preferences of each user
    person1_square_preferences_sum = sum([pow(dataset[person1][item],2) for item in both_rated])
    person2_square_preferences_sum = sum([pow(dataset[person2][item],2) for item in both_rated])

    # Sum up the product value of both preferences for each item
    product_sum_of_both_users = sum([dataset[person1][item] * dataset[person2][item] for item in both_rated])

    # Calculate the pearson score
    numerator_value = product_sum_of_both_users - (person1_preferences_sum*person2_preferences_sum/number_of_ratings)
    denominator_value = sqrt((person1_square_preferences_sum - pow(person1_preferences_sum,2)/number_of_ratings) * (person2_square_preferences_sum -pow(person2_preferences_sum,2)/number_of_ratings))

    if denominator_value == 0:
        return 0
    else:
        r = numerator_value / denominator_value
        return r

# IV. Define and Compute Most Similar Users
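The 'most similar users' computation finds the users whose ratings correlate most strongly with those of a given person. These nearest neighbours play a role similar to "training" data: their ratings are the ones most relevant for predicting what the person will like.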

In [4]:
def most_similar_users(person, number_of_users):

    # Returns the number_of_users most similar persons for a given person,
    # as (correlation, person) tuples
    scores = [(person_correlation(person, other_person), other_person) for other_person in dataset if other_person != person]

    # Sort so that the person with the highest score appears first
    scores.sort(reverse=True)
    return scores[0:number_of_users]
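When only a few neighbours are needed, the top entries can also be selected without sorting the whole list. The sketch below uses the standard library's `heapq.nlargest` on invented (similarity, user) pairs; the user names and scores are hypothetical.

```python
import heapq

# Hypothetical (similarity, user) pairs, as a scoring step might produce them.
scores = [(0.56, "user_w"), (-0.20, "user_x"), (0.91, "user_y"), (0.10, "user_z")]

# Tuples compare element by element, so nlargest ranks by similarity first.
top_two = heapq.nlargest(2, scores)
print(top_two)  # [(0.91, 'user_y'), (0.56, 'user_w')]
```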

# V. Calculate User Recommendation
Finally, we can calculate recommendations for a user. Collaborative filtering predicts the interests of one user by collecting preference or taste information from many users (collaborating).

In [5]:
def user_recommendations(person):

    # Gets recommendations for a person by using a weighted average of every other user's ratings
    totals = {}
    simSums = {}
    for other in dataset:
        # Don't compare the person to themselves
        if other == person:
            continue
        sim = person_correlation(person, other)

        # Ignore similarity scores of zero or lower
        if sim <= 0:
            continue
        for item in dataset[other]:

            # Only score movies the person hasn't rated yet
            if item not in dataset[person] or dataset[person][item] == 0:

                # Sum of similarity * rating
                totals.setdefault(item, 0)
                totals[item] += dataset[other][item] * sim
                # Sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim

    # Create the normalized list of (predicted rating, item) pairs
    rankings = [(total / simSums[item], item) for item, total in totals.items()]
    # Sort so that the highest predicted rating comes first
    rankings.sort(reverse=True)
    # Return the recommended items along with their predicted ratings
    recommendations_list = [recommend_item for score, recommend_item in rankings]
    return recommendations_list, rankings
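The normalization step is a similarity-weighted average. As a small worked sketch for a single movie, with invented similarities and ratings, the prediction can be computed like this:

```python
# Hypothetical neighbours: (similarity to the target user, their rating of one movie).
neighbours = [(0.5, 4.0), (0.25, 2.0), (0.25, 2.0)]

# Each rating counts in proportion to how similar its author is to the target user.
weighted_total = sum(sim * rating for sim, rating in neighbours)
similarity_total = sum(sim for sim, _ in neighbours)
predicted = weighted_total / similarity_total
print(predicted)  # 3.0

# For comparison, a plain average treats every neighbour equally.
plain_mean = sum(rating for _, rating in neighbours) / len(neighbours)
print(round(plain_mean, 3))  # 2.667
```

The weighted prediction sits closer to the rating given by the most similar neighbour, which is the point of weighting by similarity.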

In [10]:
print(user_recommendations('bunga'))
print(person_correlation('bunga', 'Rima'))
print(similarity_score('bunga', 'Rima'))

(['Ada Apa dengan Cinta 2', 'Dua Garis Biru', 'Bumi Manusia '], [(2.4392626020510484, 'Ada Apa dengan Cinta 2'), (1.2497054043395917, 'Dua Garis Biru'), (0.4997733439738271, 'Bumi Manusia ')])
0.5636570923081838
0.12077134402462537


# VI. Conclusion
**User Recommendation**
From the results above, we can see that the model recommends movies that the user has not watched or rated yet. The recommendations are sorted from the highest to the lowest predicted rating, where each predicted rating is a similarity-weighted average of the ratings given by other users whose ratings correlate positively with those of the active user.<br>
For 'bunga', the model recommends:
1. Ada Apa dengan Cinta 2
2. Dua Garis Biru
3. Bumi Manusia
    
**Pearson Correlation**
The Pearson correlation between the movie ratings of 'bunga' and 'Rima' is 0.564. This is a moderate positive correlation: movies that one of them rates higher tend to be rated higher by the other as well.

**Similarity Score**
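The Euclidean-based similarity score between the movie ratings of 'bunga' and 'Rima' is 0.12. On this scale, 1 means identical ratings and values near 0 mean very different ratings, so their ratings of the movies they have both rated are fairly far apart in absolute terms.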
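One possible reason the two measures can disagree for the same pair of users is that the Pearson correlation ignores a constant offset between two users' ratings, while the Euclidean-based score does not. The sketch below illustrates this with invented ratings; the helper functions `pearson` and `euclidean_similarity` are written for this example and work on plain lists rather than on `dataset`.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0
    return cov / sqrt(var_x * var_y)

def euclidean_similarity(xs, ys):
    """1 / (1 + Euclidean distance) between two equal-length rating lists."""
    return 1 / (1 + sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys))))

# Hypothetical ratings: the second user has the same taste,
# but rates every movie two points lower.
generous = [5, 4, 3, 5]
strict = [3, 2, 1, 3]

print(pearson(generous, strict))               # 1.0
print(euclidean_similarity(generous, strict))  # 0.2
```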

We can conclude that this model can be used to make recommendations for any kind of item that can be rated. Possible uses include:
- To recommend movies, skin care, makeup, and any other rateable items to users or viewers.
- To use ratings from social media users to recommend a social media platform to others.
- To recommend apps in the App Store or Google Play so that they gain new users.
- etc.