# Recommender System for Movies

### Recommender System

The objective of a recommender system is to recommend relevant items to users based on their preferences. Recommender systems are prevalent in the digital space. For example, when you shop on Amazon, the front page recommends products before you type anything into the search box. Similarly, when you open YouTube, the top bar is typically "videos recommended to you." All of these features are built on recommender systems.

Which item to recommend to which user is arguably the most important business decision on many digital platforms. For instance, YouTube cannot control which videos users upload, nor which videos users like to watch. Moreover, since watching videos is free, YouTube cannot change the price of its items. It has no inventory to manage either, since each video can be viewed any number of times. So what can YouTube control? In other words, what differentiates a good video streaming service from a bad one? The answer is the recommender system.

## Approach 1: Popularity-based Recommendation 

The most obvious system is popularity-based recommendation. This model recommends to a user the most popular items that the user has not previously consumed. In the movie setting, we recommend the movie that the most users have liked, among the movies the user has not yet watched. In other words, this system utilizes the "wisdom of the crowds." It usually provides good recommendations for most people. Since it is easy to implement, popularity-based recommendation is commonly used as a baseline. *Note: this system is not personalized. If two users have both not watched Movie A and Movie A is the most popular movie, both of them will be recommended Movie A.*

### 1. Read in the preference file

In [None]:
import pandas as pd

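def read_in_movie_preference():
    file_location = "./data/movie_preference.csv"
    df = pd.read_csv(file_location)

    # The first column identifies the person; the remaining columns are movies.
    column_names = list(df.columns[1:])

    # Map each person (first value in a row) to their list of movie scores.
    preference = {}
    data = df.values.tolist()

    for person in data:
        preference[person[0]] = person[1:]
    return [df, column_names, preference]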
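
The function above relies on a particular file layout. The sketch below illustrates that layout with a tiny in-memory CSV, so the resulting structures can be inspected without the real data file. The column header `Name`, the people, and the movie titles are all made up for illustration, and the 1/0 encoding (1 = watched and liked, 0 = not watched) is an assumption inferred from how the scores are used later in the notebook.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for ./data/movie_preference.csv.
# Assumed layout: first column = person, remaining columns = movies,
# 1 = watched and liked, 0 = not watched.
toy_csv = io.StringIO(
    "Name,Movie A,Movie B,Movie C\n"
    "Alice,1,0,1\n"
    "Bob,0,0,1\n"
    "Carol,1,1,0\n"
)

toy_df = pd.read_csv(toy_csv)

# Same transformations as in read_in_movie_preference().
toy_movie_names = list(toy_df.columns[1:])
toy_preference = {row[0]: row[1:] for row in toy_df.values.tolist()}

print(toy_movie_names)
print(toy_preference["Alice"])
```

Here `toy_preference` maps each person to a list of scores in the same order as `toy_movie_names`, which is the shape the later functions expect.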

### 2. Compute the ranking of most popular movies

In [None]:
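def movies_popularity_ranking(df, movie_names):
    # Popularity score of a movie = sum of its column across all users.
    movie_score = [df.loc[:, name].sum() for name in movie_names]

    # Scores ordered from most to least popular.
    sorted_score = sorted(movie_score, reverse=True)

    # Rank of a movie = 1-based position of its score in the sorted list.
    # Movies with equal scores share the best rank among them.
    movie_popularity_rank = [sorted_score.index(score) + 1 for score in movie_score]

    return movie_popularity_rank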
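
As a small worked example of the ranking step, the sketch below computes the same kind of ranking on a toy table. It uses pandas' built-in `Series.rank` rather than the loop above; with `method="min"` and `ascending=False`, tied movies share the best rank, which matches the behaviour of `movies_popularity_ranking`. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: 1 = watched and liked, 0 = not watched.
toy_df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Carol"],
        "Movie A": [1, 0, 1],
        "Movie B": [0, 0, 1],
        "Movie C": [1, 1, 0],
    }
)
toy_movie_names = list(toy_df.columns[1:])

# Popularity score per movie: column sums (A = 2, B = 1, C = 2).
toy_scores = toy_df[toy_movie_names].sum()

# Rank 1 = most popular; ties share the smallest rank.
toy_ranks = toy_scores.rank(method="min", ascending=False).astype(int).tolist()

print(dict(zip(toy_movie_names, toy_ranks)))
```

Movie A and Movie C tie for first place, so both receive rank 1 and Movie B receives rank 3.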

### 3. Recommendation

In [None]:
def Recommendation(movie_popularity_ranking, preference, movie_names, name):
    recommended_movie = ""

    if name in preference:
        movie_scores = preference[name]

        # Column indices of the movies this person has not watched (score 0).
        unwatched = [n for n in range(len(movie_scores)) if movie_scores[n] == 0]

        if unwatched:
            # Among the unwatched movies, pick the one with the best
            # (smallest) popularity rank and look up its title by column index.
            index = min(unwatched, key=lambda n: movie_popularity_ranking[n])
            recommended_movie = movie_names[index]
            print(recommended_movie)

    # An empty string is returned for unknown names or when nothing is unwatched.
    return recommended_movie
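
To make the selection rule concrete, here is a minimal, self-contained sketch of the same idea on hand-written inputs. The helper name `recommend_most_popular_unwatched` and the toy ranks are hypothetical and not part of the notebook; the ranks are assumed to be in the same column order as the movie names.

```python
def recommend_most_popular_unwatched(ranks, scores, movie_names):
    """Return the best-ranked movie whose score is 0, or "" if there is none."""
    # Indices of unwatched movies (score 0).
    unwatched = [i for i, score in enumerate(scores) if score == 0]
    if not unwatched:
        return ""
    # Smallest rank number = most popular.
    best = min(unwatched, key=lambda i: ranks[i])
    return movie_names[best]


toy_movie_names = ["Movie A", "Movie B", "Movie C"]
toy_ranks = [1, 3, 2]  # Movie A is the most popular, then Movie C, then Movie B

# This user has only seen Movie A, so the most popular unseen movie is Movie C.
print(recommend_most_popular_unwatched(toy_ranks, [1, 0, 0], toy_movie_names))

# This user has seen everything, so there is nothing to recommend.
print(repr(recommend_most_popular_unwatched(toy_ranks, [1, 1, 1], toy_movie_names)))
```

The key detail is that the minimum is taken over column indices, so the chosen index can be used directly to look up the movie title.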

## Approach 2: Collaborative Filtering Recommendation

This approach uses the memory of previous users' interactions to compute similarities between users based on the items they have interacted with (user-based approach), or similarities between items based on the users who have interacted with them (item-based approach).

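A typical example of this approach is user neighbourhood-based CF, in which the top-N users most similar to a given user (usually computed using Pearson correlation) are selected, and the items those similar users liked but the current user has not yet interacted with are recommended. The implementation in this section is a simplified version: similarity is measured with Jaccard similarity, and only the single most similar user (the "soulmate") is used.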
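
Since Pearson correlation is mentioned as the usual similarity measure, a minimal sketch of it is given here for reference. It assumes each user is represented by a list of numeric ratings over the same items (the 1 to 5 values below are invented), and it returns 0.0 when either user's ratings have no variance, which is a convention chosen for this sketch rather than something the notebook specifies.

```python
import math


def pearson_similarity(ratings_1, ratings_2):
    """Pearson correlation between two equal-length rating lists."""
    n = len(ratings_1)
    mean_1 = sum(ratings_1) / n
    mean_2 = sum(ratings_2) / n

    # Covariance and variances (unnormalised; the 1/n factors cancel out).
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(ratings_1, ratings_2))
    var_1 = sum((a - mean_1) ** 2 for a in ratings_1)
    var_2 = sum((b - mean_2) ** 2 for b in ratings_2)

    # Assumption: treat a user with constant ratings as uncorrelated.
    if var_1 == 0 or var_2 == 0:
        return 0.0
    return cov / math.sqrt(var_1 * var_2)


# Hypothetical 1-5 ratings over the same four movies.
user_a = [5, 3, 4, 1]
user_b = [4, 2, 3, 1]  # similar taste to user_a
user_c = [1, 4, 2, 5]  # roughly opposite taste

print(pearson_similarity(user_a, user_b))
print(pearson_similarity(user_a, user_c))
```

A value near 1 indicates similar taste and a value near -1 indicates opposite taste. Pearson correlation suits graded ratings; for 0/1 data such as this notebook's, a set-overlap measure like Jaccard similarity is a natural alternative.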

### 1. Read in the preference file

In [None]:
import pandas as pd

def read_in_movie_preference():
    file_location = "./data/movie_preference.csv"
    df = pd.read_csv(file_location)
    column_names = list(df.columns[1:])
    preference = {}
    data = df.values.tolist()
    
    for person in data:
        preference[person[0]] = person[1:]
    return [df, column_names, preference]

### 2. Compute the Jaccard similarity of any two persons

In [None]:
def jaccard_similarity(preference_1, preference_2):
    js = 0
    
    nu = 0
    de = 0
    
    for i in range(len(preference_1)):
        nu += (preference_1[i]==1 and preference_2[i]==1)
        de += (preference_1[i]==1 or preference_2[i]==1)
    
    if de == 0:
        js = 0
    else:
        js = nu/de
    
    return js
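
The loop above counts the movies both people watched (the numerator) and the movies at least one of them watched (the denominator). An equivalent way to see this is as set overlap: the size of the intersection divided by the size of the union. The sketch below expresses that view with Python sets; the function name `jaccard_from_sets` and the two preference lists are hypothetical.

```python
def jaccard_from_sets(preference_1, preference_2):
    """Jaccard similarity of two 0/1 preference lists, computed via sets."""
    # Sets of column indices each person has watched (score 1).
    watched_1 = {i for i, v in enumerate(preference_1) if v == 1}
    watched_2 = {i for i, v in enumerate(preference_2) if v == 1}

    union = watched_1 | watched_2
    if not union:
        # Neither person has watched anything: define the similarity as 0.
        return 0
    return len(watched_1 & watched_2) / len(union)


alice = [1, 0, 1, 1]  # watched movies 0, 2, 3
bob = [1, 1, 0, 1]    # watched movies 0, 1, 3

# Intersection = {0, 3} (2 movies), union = {0, 1, 2, 3} (4 movies).
print(jaccard_from_sets(alice, bob))
```

In this example the two people share 2 of the 4 movies that either has watched, so the similarity is 2 / 4 = 0.5.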

### 3. Finding soulmates

In [None]:
def Find_Soul_Mate(preference, name):
    soulmate = ""
    soulmate_preference = []
    max_js = 0
    
    for other in preference:
        if other != name:
            js = jaccard_similarity(preference[other],preference[name])
        
            if js > max_js or (js == max_js and other < soulmate):
                max_js = js 
                soulmate = other
                soulmate_preference = preference[other]                

    return [soulmate, soulmate_preference, max_js]
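
To illustrate the tie-breaking rule used above (higher similarity first, then the alphabetically smaller name), the following sketch ranks every other user for a given person on toy data. The helper names `jaccard` and `rank_neighbours` and the four people are hypothetical. Note one difference from `Find_Soul_Mate`: this variant lists all other users, including those with similarity 0.

```python
def jaccard(preference_1, preference_2):
    """Jaccard similarity of two equal-length 0/1 preference lists."""
    both = sum(1 for a, b in zip(preference_1, preference_2) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(preference_1, preference_2) if a == 1 or b == 1)
    return both / either if either else 0


def rank_neighbours(preference, name):
    """Return (other, similarity) pairs, most similar first, ties by name."""
    scored = [
        (other, jaccard(preference[other], preference[name]))
        for other in preference
        if other != name
    ]
    # Sort by descending similarity, then ascending name.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


toy_preference = {
    "Alice": [1, 0, 1, 1],
    "Bob": [1, 1, 0, 1],
    "Dave": [1, 0, 1, 0],
    "Carol": [1, 0, 1, 0],
}

neighbours = rank_neighbours(toy_preference, "Alice")
for other, similarity in neighbours:
    print(other, round(similarity, 3))
```

Carol and Dave are equally similar to Alice, so the alphabetical tie-break puts Carol first; taking the first N entries of this list would give the top-N neighbourhood described at the start of this section.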

### 4. Recommendation
This function takes a name and recommends a movie. The recommended movie is the first movie (in column order) that this person's soulmate has watched but this person has not. If no such movie exists, the function returns an empty string; otherwise, it returns the name of the movie.

In [None]:
def Recommendation(preference, name, movie_names):
    recommendation = ""

    mymovie = preference[name]
    soulmovie = Find_Soul_Mate(preference, name)[1]

    # Find the first movie (in column order) that the soulmate has watched
    # but this person has not.
    for i in range(len(soulmovie)):
        if mymovie[i] == 0 and soulmovie[i] == 1:
            recommendation = movie_names[i]
            break

    # Stays an empty string when no such movie exists.
    return recommendation
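
Putting the steps of this approach together, the sketch below runs the whole pipeline on a toy preference table without reading any file. All names (`jaccard`, `closest_user`, `recommend_from_closest_user`) and the data are hypothetical. As a simplifying assumption, `closest_user` always returns some other user, even when every similarity is 0, which may differ from `Find_Soul_Mate` in that edge case.

```python
def jaccard(preference_1, preference_2):
    """Jaccard similarity of two equal-length 0/1 preference lists."""
    both = sum(1 for a, b in zip(preference_1, preference_2) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(preference_1, preference_2) if a == 1 or b == 1)
    return both / either if either else 0


def closest_user(preference, name):
    """Most similar other user; ties go to the alphabetically smaller name."""
    others = [other for other in preference if other != name]
    return min(others, key=lambda o: (-jaccard(preference[o], preference[name]), o))


def recommend_from_closest_user(preference, name, movie_names):
    """First movie the closest user watched that `name` has not, else ""."""
    mine = preference[name]
    theirs = preference[closest_user(preference, name)]
    for i, title in enumerate(movie_names):
        if mine[i] == 0 and theirs[i] == 1:
            return title
    return ""


toy_movie_names = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_preference = {
    "Alice": [1, 0, 1, 0],
    "Bob": [0, 1, 0, 0],
    "Carol": [1, 0, 1, 1],
}

# Carol is Alice's closest user (similarity 2/3) and has watched Movie D.
print(recommend_from_closest_user(toy_preference, "Alice", toy_movie_names))

# Alice is Carol's closest user but has watched nothing that Carol has not.
print(repr(recommend_from_closest_user(toy_preference, "Carol", toy_movie_names)))
```

Unlike the popularity-based approach, the result here depends on who the user is: two users who have both missed the same popular movie can receive different recommendations.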
