# Bonus Project 2: Recommender System for Movies

In this project, you will implement a recommender system for your classmates, professor, and TAs based on the movie survey we have conducted. The movie preference file is at **./data/movie_preference.csv**.

## Recommender System

The objective of a recommender system is to recommend relevant items to users based on their preferences. Recommender systems are prevalent in the digital space. For example, when you shop on Amazon, you will notice that Amazon recommends products on the front page before you type anything in the search box. Similarly, when you open YouTube, the top bar typically shows "videos recommended to you." All these features are powered by recommender systems.

What item to recommend to which user is arguably the most important business decision on many digital platforms. For instance, YouTube cannot control which videos users upload, nor which videos users like to watch. Moreover, since watching videos is free, YouTube cannot change the price of its items. It has no inventory constraints either, since each video can be viewed any number of times. What, then, can YouTube control? In other words, what differentiates a good video streaming service from a bad one? The answer is the recommender system.

## Types of Recommender Systems

There are **three** types of recommender systems. **In this bonus project, we will implement a user-based version of the third one, collaborative filtering.**

### Popularity-based Recommendation 

The most obvious system is popularity-based recommendation. This model recommends to a user the most popular items that the user has not previously consumed. In the movie setting, we recommend the movie that the most users have watched and liked. In other words, this system utilizes the "wisdom of the crowds." It usually provides good recommendations for most people. Since it is easy to implement, popularity-based recommendation is commonly used as a baseline. *Note: this system is not personalized. If two users have both not watched Movie A and Movie A is the most popular movie, both of them will be recommended Movie A.*
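To make the idea concrete, here is a minimal sketch of a popularity-based recommender. It assumes the same encoding used later in this project (1 = liked, 0 = not watched, -1 = watched but not liked, which is our reading of the survey values); the function name `most_popular_unseen` and the toy data are hypothetical and only for illustration.

```python
def most_popular_unseen(preference, name):
    """Return the index of the most-liked movie that `name` has not watched, or None."""
    n_movies = len(preference[name])
    # Count how many users like (value 1) each movie.
    likes = [sum(1 for votes in preference.values() if votes[i] == 1)
             for i in range(n_movies)]
    # Candidates are the movies this user has not watched (assumed value 0).
    unseen = [i for i in range(n_movies) if preference[name][i] == 0]
    if not unseen:
        return None
    # max() keeps the first maximum, so ties go to the earliest column.
    return max(unseen, key=lambda i: likes[i])


# Hypothetical toy survey: three movies, four users.
toy = {
    "ann": [1, 0, 0],
    "bob": [1, 1, 0],
    "cat": [0, 1, 1],
    "dan": [0, 1, -1],
}
print(most_popular_unseen(toy, "ann"))
```

Note that the only user-specific input is the list of unwatched movies, which is why any two users who have both not watched the most popular movie receive the same recommendation.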

### Content-based Recommendation 

This recommender system leverages one customer's historical actions. It first uses a set of features to describe each item (for movies, for example, the director, main actor, main actress, genre, etc.). When a user comes in, the system recommends the movies that are closest, in terms of these features, to the movies that the user has consumed and liked before. For instance, if a user likes action movies from Nolan the most, this system will recommend another action movie from Nolan that the user has not yet consumed. *Note: we will not implement this system in this bonus project since it requires knowledge about supervised learning. We will come back to this topic at the end of this semester.*
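Although we do not implement this system here, the "closest in terms of features" idea can be sketched without any learning. The snippet below simply counts matching feature values; the function name `closest_unseen`, the feature keys, and the simplified genre labels are all assumptions made for illustration, not part of the survey data.

```python
def closest_unseen(liked_title, catalog, seen):
    """Return the unseen title sharing the most feature values with `liked_title`."""
    target = catalog[liked_title]
    best_title, best_overlap = None, -1
    for title, features in catalog.items():
        if title == liked_title or title in seen:
            continue
        # Similarity here is just the number of identical feature values.
        overlap = sum(1 for key in target if features.get(key) == target[key])
        if overlap > best_overlap:
            best_title, best_overlap = title, overlap
    return best_title


# Hypothetical, simplified feature table.
catalog = {
    "Inception": {"director": "Nolan", "genre": "action"},
    "The Dark Knight": {"director": "Nolan", "genre": "action"},
    "Interstellar": {"director": "Nolan", "genre": "sci-fi"},
    "La La Land": {"director": "Chazelle", "genre": "musical"},
}
print(closest_unseen("Inception", catalog, seen={"Inception"}))
```

A real content-based system would typically learn how much each feature matters, which is where supervised learning comes in.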

### Collaborative Filtering Recommendation

The last type of recommender system is collaborative filtering. This approach uses the record of previous users' interactions to compute similarities between users based on the items they have interacted with (user-based approach), or similarities between items based on the users who have interacted with them (item-based approach).

A typical example of this approach is user neighbourhood-based CF, in which the top-N most similar users to a given user (similarity is usually computed using Pearson correlation) are selected and used to recommend items that those similar users liked but the current user has not yet interacted with.
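The neighbour-selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `pearson` and `top_n_neighbours` and the toy data are hypothetical, and treating a constant vector as having zero correlation is a convention we chose, not part of the method's definition. (The project itself uses Jaccard similarity rather than Pearson correlation.)

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def top_n_neighbours(preference, name, n):
    """Names of the n users most correlated with `name`."""
    scores = [(pearson(preference[name], votes), other)
              for other, votes in preference.items() if other != name]
    # Highest correlation first; ties broken by ascending name.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:n]]


# Hypothetical toy survey: four movies, four users.
toy = {
    "ann": [1, 1, -1, 0],
    "bob": [1, 1, -1, 1],
    "cat": [-1, -1, 1, 0],
    "dan": [0, 1, 0, -1],
}
print(top_n_neighbours(toy, "ann", 2))
```

The recommendation step would then look for items that these neighbours liked and that the current user has not interacted with.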


## In this bonus project, we will implement the user-based collaborative filtering recommendation algorithm 

## 1. Read in the preference file

The first exercise is to read in the movie preference csv file. 

It returns three things, as the list `[df, column_names, preference]`:

1. A dictionary where the key is the username and the value is a vector of values in (-1, 0, 1) that indicates the user's preference across movies (in the order of the csv file).

2. A list of strings that gives the column names (movie titles) in order.

3. A data frame that contains the csv file.


In [1]:
import pandas as pd

def read_in_movie_preference():
    file_location = "./data/movie_preference.csv"

    # The first column holds the user name; the remaining columns are movies.
    df = pd.read_csv(file_location)
    column_names = list(df.columns[1:])

    # Map each user name to that user's list of -1/0/1 preferences.
    preference = {}
    for row in df.values.tolist():
        preference[row[0]] = row[1:]

    return [df, column_names, preference]

In [2]:
[df, column_names, preference] = read_in_movie_preference()
assert df.shape == (186, 21)

In [3]:
assert column_names == ['The Shawshank Redemption', 'The Godfather',
                       'The Dark Knight ', 'Star Wars: The Force Awakens',
                       'The Lord of the Rings: The Return of the King',
                       'Inception', 'The Matrix ', 'Avengers: Infinity War ',
                       'Interstellar ', 'Spirited Away', 'Coco', 'The Dark Knight Rises',
                       'Braveheart', 'The Wolf of Wall Street', 'Gone Girl ', 'La La Land',
                       'Shutter Island', 'Ex Machina', 'The Martian', 'Kingsman: The Secret Service']

In [4]:
assert preference["DJZ"] == [0, 1, 1, 0, 1, 1, 1, -1, 1, 1, 0, -1, -1, -1, 1, -1, 1, -1, 1, -1]

## 2. Compute the Jaccard similarity of any two people

Your next task is to write a function that computes the Jaccard similarity of two people. The function takes in two vectors with entries in (-1, 0, 1) representing the two people's movie preferences, where 1 means the person likes the movie. The Jaccard similarity of any two people is equal to

$$ \frac{\text{Number of Movies both people like}}{\text{Number of Movies at least one person likes}} $$

If there is no movie liked by either of the two people, the Jaccard similarity is defined to be 0.

For example:
    
    Input: v1 = [1, 0, 1, -1], v2 = [1, 1, 0, 0]
    Output: js = 1 / 3 = 0.333
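The worked example above can also be checked with Python sets, which mirrors the formula directly: the numerator is the intersection of the two "liked" sets and the denominator is their union. This is just a sketch of one equivalent formulation; the helper names `liked_set` and `jaccard_from_sets` are hypothetical.

```python
def liked_set(vector):
    """Indices of the movies a person likes (entries equal to 1)."""
    return {i for i, value in enumerate(vector) if value == 1}


def jaccard_from_sets(v1, v2):
    """Jaccard similarity computed from the two sets of liked movies."""
    likes_1, likes_2 = liked_set(v1), liked_set(v2)
    union = likes_1 | likes_2
    if not union:
        return 0  # neither person likes any movie
    return len(likes_1 & likes_2) / len(union)


# v1 likes movies {0, 2}, v2 likes movies {0, 1}: one shared out of three.
print(jaccard_from_sets([1, 0, 1, -1], [1, 1, 0, 0]))
```

Entries equal to 0 or -1 never enter either set, so they affect neither the numerator nor the denominator.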
   


In [5]:
def jaccard_similarity(preference_1, preference_2):
    # Count movies that both people like (value 1) and movies that
    # at least one of them likes.
    both_like = 0
    either_likes = 0
    for p1, p2 in zip(preference_1, preference_2):
        if p1 == 1 and p2 == 1:
            both_like += 1
        if p1 == 1 or p2 == 1:
            either_likes += 1

    # By convention the similarity is 0 when neither person likes any movie.
    if either_likes == 0:
        return 0
    return both_like / either_likes


In [6]:
assert round(jaccard_similarity([1, 0, 1, -1], [1, 1, 0, 0]), 2) == 0.33
assert jaccard_similarity(preference["123"], preference["DJZ"]) == 0.25

## 3. Finding Soulmates

Given a person's name, implement a function that finds the person's movie soulmate. The soulmate is defined as the other person who has the highest Jaccard similarity with the focal person, among similarities that are less than 1. If multiple people have the same Jaccard similarity with the focal person, pick the person with the smallest name (sorting names in ascending order). The function should return the soulmate's name, the soulmate's movie preferences, and the soulmate's Jaccard similarity score.
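The selection rule (highest similarity below 1, ties broken by the smallest name) can be illustrated on precomputed scores. The sketch below assumes a dictionary that maps each other person's name to their similarity with the focal person; the function name `pick_soulmate` and the toy scores are hypothetical.

```python
def pick_soulmate(scores):
    """Pick the name with the highest score below 1; ties go to the smallest name."""
    eligible = {person: js for person, js in scores.items() if js < 1}
    if not eligible:
        return None
    # Negating the score makes min() prefer high similarity first,
    # and the name as second key breaks ties in ascending order.
    return min(eligible, key=lambda person: (-eligible[person], person))


# Hypothetical scores: "Bea" is excluded (similarity 1), "Amy" and "Zed" tie.
scores = {"Zed": 0.5, "Amy": 0.5, "Bea": 1.0, "Cal": 0.2}
print(pick_soulmate(scores))
```

Using a single composite key avoids sorting the whole table when only the best candidate is needed.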



In [7]:
def Find_Soul_Mate(preference, name):
    soulmate = ""
    soulmate_preference = []
    max_js = -1  # below any valid Jaccard score, so a score of 0 can still win

    # Iterate in ascending name order: on ties the smallest name is kept,
    # because only a strictly higher score replaces the current best.
    for other in sorted(preference):
        if other == name:
            continue
        js = jaccard_similarity(preference[other], preference[name])
        # Scores equal to 1 are excluded by the definition of a soulmate.
        if js < 1 and js > max_js:
            soulmate = other
            soulmate_preference = preference[other]
            max_js = js

    # No eligible candidate: report a similarity of 0.
    if soulmate == "":
        max_js = 0

    return [soulmate, soulmate_preference, max_js]


In [8]:
[soulmate, soulmate_preference, js] = Find_Soul_Mate(preference, "DJZ")
[soulmate, soulmate_preference, js]
assert soulmate == 'Jade'
assert soulmate_preference == [1, 1, 1, 0, 1, 1, 1, -1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]
assert js == 0.7272727272727273

## 4. Recommendation
This function takes in a name and recommends a movie. The recommended movie is the first movie (in column order) that this person's soulmate has liked (a 1 in the soulmate's vector) but this person has not watched (a 0 in this person's vector). If no such movie exists, it returns an empty string; otherwise, it returns the name of the movie.

**Note:** from the test case we can see that this recommendation method generates the same outcome as the popularity-based recommendation. 

In [9]:
def Recommendation(preference, name, movie_names):
    recommendation = ""
    [soulmate, soulmate_preference, js] = Find_Soul_Mate(preference, name)

    # Return the first movie (in column order) that the soulmate liked (1)
    # and that this person has not watched (0); otherwise keep "".
    for movie, mine, theirs in zip(movie_names, preference[name], soulmate_preference):
        if mine == 0 and theirs == 1:
            recommendation = movie
            break

    return recommendation


In [10]:
assert Recommendation(preference, "DJZ", column_names) == 'The Shawshank Redemption'