# Movie Referral System

In this project, we'll build a movie recommendation system based on user ratings. We've collected datasets with over 100000 users who've rated movies of different genres. The system will find movies similar to the ones a user has already rated.

First, we'll import the pandas and numpy libraries. We have two separate datasets: one with the movie id and title columns, and one with the user id, movie id and rating columns. We'll merge them into a single dataframe.

In [23]:
import pandas as pd
import numpy as np

movcols = ['movieId', 'title']
movies = pd.read_csv('movies.csv', usecols = movcols)

ratcols = ['userId', 'movieId', 'rating']
ratings = pd.read_csv('ratings.csv', usecols = ratcols)

ratings = pd.merge(movies, ratings)

ratings.head()

   movieId             title  userId  rating
0        1  Toy Story (1995)       1     4.0
1        1  Toy Story (1995)       5     4.0
2        1  Toy Story (1995)       7     4.5
3        1  Toy Story (1995)      15     2.5
4        1  Toy Story (1995)      17     4.5
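
To make the merge step concrete, here is a minimal sketch on two tiny made-up frames. The names `toy_movies`, `toy_ratings` and `toy_merged`, and all the values in them, are invented for illustration and are not part of the real dataset. When no `on` argument is given, `pd.merge` joins on the columns the two frames share (here only `movieId`) and performs an inner join, so a movie with no ratings drops out of the result.

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (values invented for illustration)
toy_movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['Movie A', 'Movie B', 'Movie C'],
})
toy_ratings = pd.DataFrame({
    'userId': [10, 11, 10],
    'movieId': [1, 1, 2],
    'rating': [4.0, 3.5, 5.0],
})

# No `on` argument: the join key is the shared column 'movieId',
# and the default join type is an inner join.
toy_merged = pd.merge(toy_movies, toy_ratings)

# 'Movie C' has no ratings, so it does not appear in the merged frame
print(toy_merged)
```

If unrated movies should be kept, passing `how='left'` would retain them with missing values in the rating columns.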


Now, we'll create another dataframe which will have the user ids as the index and the movie titles as the column names. The user ratings will be the values of that dataframe.

In [24]:
userRatings = ratings.pivot_table(index=['userId'], columns=['title'], values='rating')

userRatings.head()

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...,Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
userId,,,,,,,,,,,...,,,,,,,,,,
1,,,,,,,,,,,...,,,,,,,,,4.0,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,,,,,,
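
The reshaping can be seen more easily on a small example. The sketch below uses a hypothetical four-row ratings table (`toy` and `toy_matrix` are illustrative names) and shows that any user and movie combination without a rating becomes NaN in the pivoted frame, which is why the real output above is mostly empty.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) rating
toy = pd.DataFrame({
    'userId': [1, 1, 2, 3],
    'title': ['Movie A', 'Movie B', 'Movie A', 'Movie B'],
    'rating': [4.0, 5.0, 3.0, 2.0],
})

# Rows become users, columns become movie titles, cells hold the rating.
# pivot_table averages duplicate (userId, title) pairs by default.
toy_matrix = toy.pivot_table(index='userId', columns='title', values='rating')

# User 2 never rated 'Movie B' and user 3 never rated 'Movie A',
# so those two cells are NaN.
print(toy_matrix)
```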


Now, we'll use the pandas correlation function to measure the similarity between every pair of movies in the dataframe. Setting min_periods=150 keeps a correlation only when at least 150 users have rated both movies; pairs with fewer shared ratings are left as NaN, so similarities backed by only a handful of users don't distort the results.

In [25]:
ratingCorr = userRatings.corr(method='pearson', min_periods=150)

ratingCorr.head()

title,'71 (2014),'Hellboy': The Seeds of Creation (2004),'Round Midnight (1986),'Salem's Lot (2004),'Til There Was You (1997),'Tis the Season for Love (2015),"'burbs, The (1989)",'night Mother (1986),(500) Days of Summer (2009),*batteries not included (1987),...,Zulu (2013),[REC] (2007),[REC]² (2009),[REC]³ 3 Génesis (2012),anohana: The Flower We Saw That Day - The Movie (2013),eXistenZ (1999),xXx (2002),xXx: State of the Union (2005),¡Three Amigos! (1986),À nous la liberté (Freedom for Us) (1931)
title,,,,,,,,,,,...,,,,,,,,,,
'71 (2014),,,,,,,,,,,...,,,,,,,,,,
'Hellboy': The Seeds of Creation (2004),,,,,,,,,,,...,,,,,,,,,,
'Round Midnight (1986),,,,,,,,,,,...,,,,,,,,,,
'Salem's Lot (2004),,,,,,,,,,,...,,,,,,,,,,
'Til There Was You (1997),,,,,,,,,,,...,,,,,,,,,,
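
The effect of `min_periods` is easier to see on a small matrix. The following sketch uses a hypothetical four-user frame (the values and the names `toy_matrix`, `loose` and `strict` are assumptions for illustration) and a much smaller threshold than the 150 used above. It also suggests why the first rows of the real correlation matrix are entirely NaN: a movie rated by fewer users than the threshold cannot reach it with any partner, not even with itself.

```python
import numpy as np
import pandas as pd

# Hypothetical user x movie ratings; 'Movie C' was rated by only two users
toy_matrix = pd.DataFrame({
    'Movie A': [5.0, 4.0, 1.0, 2.0],
    'Movie B': [4.0, 5.0, 2.0, 1.0],
    'Movie C': [1.0, np.nan, np.nan, 5.0],
}, index=pd.Index([1, 2, 3, 4], name='userId'))

# min_periods=1: any pair with at least one shared rater gets a value
loose = toy_matrix.corr(method='pearson', min_periods=1)

# min_periods=3: a pair needs at least three users who rated both movies
strict = toy_matrix.corr(method='pearson', min_periods=3)

# A and B share four raters, so their correlation survives the threshold.
# A and C share only two raters, so that cell becomes NaN in `strict`.
print(loose)
print(strict)
```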


We'll now take a user from the dataset as an example, say user id 1, and look at the list of movies they have rated.

In [26]:
sampRatings = userRatings.loc[1].dropna()

sampRatings

title
13th Warrior, The (1999)                4.0
20 Dates (1998)                         4.0
Abyss, The (1989)                       4.0
Adventures of Robin Hood, The (1938)    5.0
Alice in Wonderland (1951)              5.0
                                       ... 
Wolf Man, The (1941)                    5.0
X-Men (2000)                            5.0
Young Frankenstein (1974)               5.0
Young Sherlock Holmes (1985)            3.0
¡Three Amigos! (1986)                   4.0
Name: 1, Length: 232, dtype: float64

Based on this list, we'll use the correlation matrix to find movies similar to each movie the user rated. Each similarity score is multiplied by the user's rating of that movie, and all the weighted scores are collected into a series called suggestions.

In [27]:
# Collect one weighted similarity series per rated movie
weighted = []

for title, rating in sampRatings.items():
    # Correlations between this rated movie and every other movie
    sim = ratingCorr[title].dropna()

    # Weight each correlation by the user's rating of this movie
    weighted.append(sim * rating)

# Series.append was removed in pandas 2.0; pd.concat does the same job
suggestions = pd.concat(weighted)

print()
print('Sorting..')
print()

suggestions.sort_values(inplace=True, ascending=False)
suggestions.head(10)




Sorting..



Usual Suspects, The (1995)                                                        5.0
Star Wars: Episode IV - A New Hope (1977)                                         5.0
Back to the Future (1985)                                                         5.0
Fargo (1996)                                                                      5.0
Fight Club (1999)                                                                 5.0
Fugitive, The (1993)                                                              5.0
Gladiator (2000)                                                                  5.0
Matrix, The (1999)                                                                5.0
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)    5.0
Schindler's List (1993)                                                           5.0
dtype: float64

The suggestions series contains every suggested movie title alongside a score that combines its correlation with a rated movie and the user's own rating of that movie. A movie can appear several times, once for each rated movie it correlates with. Now, we use the groupby function to sum the scores for each movie and sort them to find the most strongly suggested movies for the user.

In [28]:
suggestions = suggestions.groupby(suggestions.index).sum()

suggestions.sort_values(inplace=True, ascending=False)
suggestions.head(10)

Matrix, The (1999)                                                                19.901289
Star Wars: Episode IV - A New Hope (1977)                                         17.755124
Forrest Gump (1994)                                                               15.096937
Pulp Fiction (1994)                                                               14.040461
Star Wars: Episode V - The Empire Strikes Back (1980)                             13.803897
Star Wars: Episode VI - Return of the Jedi (1983)                                 13.245080
Shawshank Redemption, The (1994)                                                  12.320133
Silence of the Lambs, The (1991)                                                  12.274909
Fight Club (1999)                                                                  9.468755
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)     8.869588
dtype: float64
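
As a small illustration of the aggregation step, the sketch below builds a hypothetical series in which one title appears twice, as it would if two of the user's rated movies both correlated with it. The titles, correlations and ratings are invented, and `toy_scores` and `toy_totals` are illustrative names.

```python
import pandas as pd

# Hypothetical weighted scores (correlation * user's rating).
# 'Movie X' is suggested by two different rated movies.
toy_scores = pd.concat([
    pd.Series({'Movie X': 0.5 * 4.0, 'Movie Y': 0.2 * 4.0}),
    pd.Series({'Movie X': 0.3 * 5.0}),
])

# Group by title (the index) and add up every score for the same movie
toy_totals = toy_scores.groupby(toy_scores.index).sum()
toy_totals = toy_totals.sort_values(ascending=False)

# 'Movie X' ends up first because its two scores are summed
print(toy_totals)
```

One consequence of summing is that movies correlated with many of the user's rated titles rise to the top, which tends to favour widely rated films.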

The system has now suggested movies based on how strongly they correlate with the movies the user rated.
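
The scores of exactly 5.0 in the first sorted list most likely come from a movie's correlation with itself (1.0) multiplied by a 5.0 rating, which would mean some of the top suggestions are movies the user has already rated. A possible follow-up step, not part of the original notebook, is to drop those titles before presenting the list. The sketch below shows the idea on made-up data; `toy_totals`, `toy_rated` and `unseen` are hypothetical names.

```python
import pandas as pd

# Hypothetical summed suggestion scores
toy_totals = pd.Series({'Movie X': 3.5, 'Movie Y': 0.8, 'Movie Z': 2.1})

# Hypothetical ratings the user has already given
toy_rated = pd.Series({'Movie X': 5.0, 'Movie Q': 3.0})

# Remove titles the user has already rated; errors='ignore' skips
# rated titles that never appeared among the suggestions
unseen = toy_totals.drop(toy_rated.index, errors='ignore')
unseen = unseen.sort_values(ascending=False)

print(unseen)
```

In the notebook itself, the equivalent call would presumably be `suggestions.drop(sampRatings.index, errors='ignore')`, applied after the groupby step.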