# Week 10 Exercise: Building a Recommender System

In [42]:
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

## Data Preparation

### Importing and Viewing Data

In [59]:
# import data
movies = pd.read_csv("movies.csv")
ratings = pd.read_csv("ratings.csv")

In [3]:
# view
movies

,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


In [4]:
ratings

,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
...,...,...,...,...
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352


In [5]:
ratings.reset_index(inplace=True)

### Transform Data for Model

In order to use the data, we will pivot the ratings data so that user IDs are the columns and movie IDs are the index.

In [60]:
final_dat = ratings.pivot(index='movieId', columns='userId', values='rating')

In [61]:
# fill NaNs (movies a user has not rated) with 0
final_dat.fillna(0, inplace=True)
final_dat.head()

movieId \ userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
1,4.0,0.0,0.0,0.0,4.0,0.0,4.5,0.0,0.0,0.0,...,4.0,0.0,4.0,3.0,4.0,2.5,4.0,2.5,3.0,5.0
2,0.0,0.0,0.0,0.0,0.0,4.0,0.0,4.0,0.0,0.0,...,0.0,4.0,0.0,5.0,3.5,0.0,0.0,2.0,0.0,0.0
3,4.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0
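
To make the reshaping step concrete, here is a minimal sketch of the same pivot and fill on a tiny table. The ratings in `toy_ratings` are made up for illustration and are not taken from `ratings.csv`.

```python
import pandas as pd

# Toy ratings in the same long format as ratings.csv (values are made up)
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3],
    "movieId": [10, 20, 10, 20, 30],
    "rating":  [4.0, 5.0, 3.0, 2.5, 4.5],
})

# One row per movie, one column per user; unrated pairs start out as NaN
toy_matrix = toy_ratings.pivot(index="movieId", columns="userId", values="rating")

# Replace the NaNs with 0, as done for final_dat
toy_matrix = toy_matrix.fillna(0)
print(toy_matrix)
```

Each row of the result is one movie's rating vector across all users, which is the representation the nearest-neighbor model works on later.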


I need to account for the fact that fewer reviews might make a movie seem more appealing when it's really just less reviewed. So we will create a qualified dataset that keeps only movies with more than 10 ratings and users who have rated more than 50 movies.

In [64]:
# number of users who rated each movie
no_user_voted = ratings.groupby('movieId')['rating'].count()
# number of movies rated by each user
no_movies_voted = ratings.groupby('userId')['rating'].count()

In [68]:
# keep only movie rows for movies that have been rated more than 10 times
final_dat = final_dat.loc[no_user_voted[no_user_voted > 10].index, :]

In [67]:
# keep only user columns for users who have rated more than 50 movies
final_dat = final_dat.loc[:, no_movies_voted[no_movies_voted > 50].index]
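
The same count-then-filter pattern can be sketched on toy data. The thresholds `min_movie_ratings` and `min_user_ratings` below are hypothetical names with small values chosen so the toy table survives the filter; the notebook itself uses 10 and 50. Note that the comparison is strict (`>`), so a movie with exactly the threshold number of ratings is dropped.

```python
import pandas as pd

# Made-up ratings: movie 10 has 3 ratings, movie 20 has 2, movie 30 has 1
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3],
    "movieId": [10, 20, 30, 10, 20, 10],
    "rating":  [4.0, 5.0, 3.0, 3.5, 2.0, 4.5],
})
toy_matrix = toy_ratings.pivot(index="movieId", columns="userId", values="rating").fillna(0)

# Hypothetical thresholds (assumptions for this sketch only)
min_movie_ratings = 1   # keep movies with MORE than 1 rating
min_user_ratings = 1    # keep users with MORE than 1 rating

ratings_per_movie = toy_ratings.groupby("movieId")["rating"].count()
ratings_per_user = toy_ratings.groupby("userId")["rating"].count()

keep_movies = ratings_per_movie[ratings_per_movie > min_movie_ratings].index
keep_users = ratings_per_user[ratings_per_user > min_user_ratings].index

# Select the surviving rows (movies) and columns (users) in one step
filtered = toy_matrix.loc[keep_movies, keep_users]
print(filtered)
```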

Convert the filtered ratings to a sparse matrix, which stores only the non-zero entries and so uses memory more efficiently.

In [69]:
# create sparse matrix
sparsed = csr_matrix(final_dat.values)
# make movieId a regular column so row i of final_dat lines up with row i of sparsed
final_dat.reset_index(inplace=True)
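
As a rough illustration of why the sparse format helps, the sketch below builds a small movie-by-user array (made-up values, 0 meaning "no rating") and checks how many entries are actually stored.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy movie-by-user matrix; 0 means "no rating"
dense = np.array([
    [4.0, 0.0, 0.0, 5.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 2.5, 0.0, 0.0],
])

toy_sparse = csr_matrix(dense)

# Only the non-zero entries are stored
stored = toy_sparse.nnz
density = stored / dense.size
print(f"{stored} stored values, density {density:.2f}")

# Converting back recovers the original array
round_trip = toy_sparse.toarray()
```

In a ratings matrix most users have rated only a small share of the movies, so most cells are 0 and never need to be stored.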

## Building Recommender System

For my recommender system, I am using item-based collaborative filtering. Given a movie that someone has rated highly, the system looks for other movies that the same users rated in a similar way and recommends those. Each movie is represented by its vector of user ratings, and a K-Nearest Neighbors model finds the movies whose vectors are closest to it.

In [74]:
# create nearest neighbor model
knn = NearestNeighbors(metric="cosine", algorithm="brute", n_neighbors=20, n_jobs=-1)

In [75]:
# fit it to sparsed data
knn.fit(sparsed)
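
To show what the cosine metric measures here, the sketch below compares movie rating vectors by hand and then with `NearestNeighbors`. The ratings and the helper `cosine_distance` are assumptions made for illustration, not part of the notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Rows are movies, columns are users (made-up ratings, 0 = not rated)
toy = np.array([
    [5.0, 4.0, 0.0, 0.0],   # movie 0
    [4.0, 5.0, 0.0, 0.0],   # movie 1: rated by the same users as movie 0
    [0.0, 0.0, 5.0, 4.0],   # movie 2: rated by different users
])

def cosine_distance(a, b):
    """1 minus the cosine similarity of two rating vectors."""
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_distance(toy[0], toy[1]))  # small: overlapping audiences
print(cosine_distance(toy[0], toy[2]))  # large: no users in common

# The same comparison through a brute-force cosine nearest-neighbor search
toy_sparse = csr_matrix(toy)
toy_knn = NearestNeighbors(metric="cosine", algorithm="brute")
toy_knn.fit(toy_sparse)
distances, indices = toy_knn.kneighbors(toy_sparse[0], n_neighbors=3)
print(indices[0], distances[0])
```

The query movie comes back as its own nearest neighbor with a distance of (approximately) zero, which is why one extra neighbor has to be requested and then discarded when building recommendations.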

In [78]:
def get_recommendation(movie_name):
    """
    Takes a full or partial movie name and returns a DataFrame of 10
    recommendations based on the first matching title.
    """
    n_movies = 10
    # all titles containing the given text (plain substring match, not a regex)
    movie_list = movies[movies["title"].str.contains(movie_name, regex=False)]

    if len(movie_list):
        # movieId of the first matching title
        movie_id = movie_list.iloc[0]['movieId']
        # row position of that movie in final_dat (and therefore in sparsed)
        matches = final_dat[final_dat['movieId'] == movie_id].index
        if len(matches) == 0:
            # the movie exists but was dropped by the rating-count filter
            return "Movie found, but it has too few ratings to make recommendations"
        movie_idx = matches[0]
        # get distances and indices from knn model (+1 because the movie itself is returned)
        distances, indices = knn.kneighbors(sparsed[movie_idx], n_neighbors=n_movies+1)
        # sort by distance, drop the query movie (position 0) and reverse,
        # so the rows run from least similar to most similar
        rec_movie_indices = sorted(zip(indices.squeeze().tolist(), distances.squeeze().tolist()), key=lambda x: x[1])[:0:-1]
        recommend_frame = []
        for row_idx, distance in rec_movie_indices:
            # map the matrix row back to a movieId, then to a title
            rec_id = final_dat.iloc[row_idx]['movieId']
            title = movies.loc[movies['movieId'] == rec_id, 'title'].values[0]
            recommend_frame.append({'Title': title, 'Distance': distance})
        df = pd.DataFrame(recommend_frame, index=range(1, n_movies+1))
        return df
    else:
        return "No movies found. Please check your input"
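
The slice `[:0:-1]` in `get_recommendation` is easy to misread, so here is a small sketch of what it does. The indices and distances are made up and stand in for the output of `knn.kneighbors`.

```python
# Made-up neighbor search result for a query movie sitting in row 7
indices = [7, 3, 12, 5]
distances = [0.0, 0.21, 0.35, 0.40]

# Ascending by distance: the query movie itself comes first
ranked = sorted(zip(indices, distances), key=lambda pair: pair[1])

# [:0:-1] walks backwards from the last element and stops before position 0,
# so the query movie is dropped and the farthest neighbor comes first
farthest_first = ranked[:0:-1]

# [1:] also drops the query movie but keeps the closest neighbor first
closest_first = ranked[1:]

print(farthest_first)
print(closest_first)
```

With `[:0:-1]`, the most similar movie ends up in the last row of the returned table. Using `[1:]` instead would be one way to put it in the first row.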

### Testing System

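Now I will take some movies and see what the model recommends to me. Each table lists the 10 nearest neighbors by cosine distance, from the farthest (row 1) to the closest (row 10).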

In [80]:
get_recommendation("Ant-Man")

,Title,Distance
1,Thor: The Dark World (2013),0.383046
2,Kingsman: The Secret Service (2015),0.377068
3,Captain America: The Winter Soldier (2014),0.376943
4,Untitled Spider-Man Reboot (2017),0.368109
5,X-Men: Apocalypse (2016),0.338759
6,Man of Steel (2013),0.335081
7,Guardians of the Galaxy (2014),0.333083
8,Star Wars: Episode VII - The Force Awakens (2015),0.328174
9,Iron Man 3 (2013),0.322229
10,Captain America: Civil War (2016),0.287791


In [87]:
get_recommendation("Black Beauty")

,Title,Distance
1,Midnight Run (1988),0.7073
2,"Paper, The (1994)",0.704013
3,Just Cause (1995),0.703694
4,Don Juan DeMarco (1995),0.701168
5,It Could Happen to You (1994),0.70007
6,Ladyhawke (1985),0.696508
7,"Secret of Roan Inish, The (1994)",0.695788
8,Little Women (1994),0.691002
9,Wallace & Gromit: The Best of Aardman Animatio...,0.688023
10,"Indian in the Cupboard, The (1995)",0.684798


In [88]:
get_recommendation("Little Women")

,Title,Distance
1,Dances with Wolves (1990),0.562984
2,Dave (1993),0.557968
3,Ghost (1990),0.556694
4,Mr. Holland's Opus (1995),0.555646
5,Quiz Show (1994),0.547318
6,Sabrina (1995),0.541068
7,Legends of the Fall (1994),0.530763
8,Sleepless in Seattle (1993),0.530663
9,Crimson Tide (1995),0.527508
10,While You Were Sleeping (1995),0.513346


In [97]:
get_recommendation("Last of the Mohicans")

,Title,Distance
1,Platoon (1986),0.592466
2,Con Air (1997),0.591426
3,Heat (1995),0.577661
4,Jaws (1975),0.570005
5,Glory (1989),0.567197
6,Batman Returns (1992),0.565448
7,"Untouchables, The (1987)",0.565345
8,Die Hard 2 (1990),0.562082
9,Beverly Hills Cop II (1987),0.558577
10,Sneakers (1992),0.558473


In [98]:
get_recommendation('Ace Ventura')

,Title,Distance
1,"Lion King, The (1994)",0.503122
2,Billy Madison (1995),0.50164
3,Interview with the Vampire: The Vampire Chroni...,0.497001
4,Jumanji (1995),0.487712
5,Mrs. Doubtfire (1993),0.485233
6,Happy Gilmore (1996),0.483602
7,Die Hard: With a Vengeance (1995),0.472779
8,Dumb & Dumber (Dumb and Dumber) (1994),0.465048
9,"Mask, The (1994)",0.453656
10,Ace Ventura: Pet Detective (1994),0.342129
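
The last row above lists "Ace Ventura: Pet Detective (1994)" as a recommendation for the query "Ace Ventura", which suggests the partial-name lookup resolved to a different Ace Ventura title. A minimal sketch of that lookup behavior, using a made-up `toy_movies` table (the IDs and row order are assumptions for illustration):

```python
import pandas as pd

# Made-up titles table; IDs and row order are for illustration only
toy_movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": [
        "Ace Ventura: When Nature Calls (1995)",
        "Toy Story (1995)",
        "Ace Ventura: Pet Detective (1994)",
    ],
})

# Every title containing the query text is a match...
matches = toy_movies[toy_movies["title"].str.contains("Ace Ventura", regex=False)]
print(matches["title"].tolist())

# ...but only the first match is used as the query movie
first_match = matches.iloc[0]["title"]
print(first_match)
```

Passing a more specific string, such as the full title with its year, avoids the ambiguity.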


## Conclusions

It appears that my movie recommender system is working as intended. Based on the testing above, the system adequately recommends movies that are similar to the movie that was entered. There are some odd recommendations (Die Hard: With a Vengeance for Ace Ventura), but overall I am satisfied with the results.

If I were to go further, I would want to test other models against this one. I saw that Support Vector Machines can also be used to build recommender systems. I would also want to try other types of filtering to see how the recommendations for the same movie differ.

In summary, I built a movie recommender system using item-based collaborative filtering that recommends 10 movies based on what similar users also rated highly. It was implemented using a K-Nearest Neighbors model, which returns the 10 nearest neighbors for any given movie.

## References

https://www.datasciencecentral.com/profiles/blogs/5-types-of-recommenders

https://towardsdatascience.com/brief-on-recommender-systems-b86a1068a4dd

https://www.analyticsvidhya.com/blog/2020/11/create-your-own-movie-movie-recommendation-system/