# Data Exploration and Analysis

We are using GroupLens' ml-latest-small dataset.
It contains 4 CSV files: links, movies, ratings and tags.
We will be using only the "movies" and "ratings" files.

# Movies

Attributes:

1. MovieID

---> Nominal Attribute

---> Type: Positive Integer

---> Is Unique

---> Movie Identifier

2. Title

---> Nominal Attribute

---> Type: Alphanumeric String

---> Is Unique

---> Movie Name + Release Year

3. Genres

---> Nominal Attribute

---> Type: String

---> Not Unique

---> Combinations of different genres separated by "|"
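As a small illustration of the Genres format, the sketch below splits the "|"-separated string into a list and counts genres with pandas. The two rows are a toy sample typed in by hand in the movies.csv layout, not read from the file.

```python
import pandas as pd

# Toy rows in the movies.csv layout (illustrative sample, not loaded from disk)
movies_sample = pd.DataFrame({
    "movieId": [1, 2],
    "title": ["Toy Story (1995)", "Jumanji (1995)"],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy", "Adventure|Children|Fantasy"],
})

# Split the "|"-separated genre string into a Python list per movie
movies_sample["genre_list"] = movies_sample["genres"].str.split("|")

# One row per (movie, genre) pair, then count how often each genre occurs
genre_counts = movies_sample.explode("genre_list")["genre_list"].value_counts()
print(genre_counts)
```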

# Ratings

Attributes:

1. UserId

---> Nominal Attribute

---> Type: Positive Integer

---> Unique per user, but repeated across rows (one row per rating)

---> User Identifier

2. MovieId

---> Nominal Attribute

---> Type: Positive Integer

---> Unique per movie, but repeated across rows (one row per rating)

---> Movie Identifier

3. Rating

---> Ordinal Attribute

---> Type: Floating point number from 0.5 to 5.0, in half-star (0.5) steps

---> Not Unique

---> Rating given by user (UserId) to a movie (MovieId)

4. Timestamp

---> Ordinal Attribute

---> Type: Integer (seconds since midnight UTC, January 1, 1970)

---> Not Unique

---> Time at which user (UserId) rated a movie (MovieId)
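A minimal sketch of the Rating and Timestamp conventions, assuming the MovieLens encoding of timestamps as seconds since the Unix epoch (UTC) and ratings on a half-star scale. The rows are hand-made toy values, not taken from ratings.csv.

```python
import pandas as pd

# Toy rows in the ratings.csv layout (illustrative values only)
ratings_sample = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [1, 3, 1],
    "rating": [4.0, 3.5, 5.0],
    "timestamp": [0, 86400, 946684800],
})

# Timestamps are seconds since the Unix epoch (UTC); convert them to datetimes
ratings_sample["rated_at"] = pd.to_datetime(ratings_sample["timestamp"], unit="s", utc=True)

# Assumed scale: 0.5 to 5.0 in half-star steps, so rating * 2 must be a whole number
valid = ratings_sample["rating"].between(0.5, 5.0) & ((ratings_sample["rating"] * 2) % 1 == 0)

print(ratings_sample[["userId", "movieId", "rating", "rated_at"]])
print("all ratings valid:", valid.all())
```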

# Dimensionality Reduction

Removing irrelevant attributes from our data set.
Note: "irrelevant" here is relative to our approach, not in general.

Removing Genres from Movies:

Although genres are important when designing a recommendation system, our item-based collaborative filtering approach only considers the ratings users give to movies.
We will test whether we get accurate results using ratings alone. If the results are poor, we will add genres back into the data set.

Removing Timestamps from Ratings:

We don't consider time to be relevant to our approach.

# ML Project Final Submission

In [3]:
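import pandas as pd
import numpy as np

# Load movies and drop genres (not used by our ratings-only approach)
movies = pd.read_csv("movies.csv")
movies.drop(["genres"], axis=1, inplace=True)

# Load ratings and drop timestamps (time is not used either)
ratings = pd.read_csv("ratings.csv")
ratings.drop(["timestamp"], axis=1, inplace=True)

# Inner join on the shared movieId column: one row per (movie, user, rating)
merged = pd.merge(movies, ratings, on="movieId")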
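`pd.merge` joins these two frames on the only column they share, movieId, and performs an inner join by default. A minimal sketch on hypothetical toy frames makes both choices explicit and adds `validate="one_to_many"`, which makes pandas raise a `MergeError` if a movieId repeats on the movies side. Movies that nobody rated drop out of the result.

```python
import pandas as pd

# Toy frames in the same layout as the cleaned movies/ratings tables (illustrative only)
movies_toy = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A (2000)", "B (2001)", "C (2002)"]})
ratings_toy = pd.DataFrame({"userId": [10, 11, 10], "movieId": [1, 1, 2], "rating": [4.0, 3.0, 5.0]})

# Inner join on movieId; validate raises MergeError if movieId repeats in movies_toy
merged_toy = pd.merge(movies_toy, ratings_toy, on="movieId", how="inner", validate="one_to_many")

# Movie 3 has no ratings, so it does not appear in the inner join
print(merged_toy)
```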


In [4]:
merged

,movieId,title,userId,rating
0,1,Toy Story (1995),1,4.0
1,1,Toy Story (1995),5,4.0
2,1,Toy Story (1995),7,4.5
3,1,Toy Story (1995),15,2.5
4,1,Toy Story (1995),17,4.5
...,...,...,...,...
100831,193581,Black Butler: Book of the Atlantic (2017),184,4.0
100832,193583,No Game No Life: Zero (2017),184,3.5
100833,193585,Flint (2017),184,3.5
100834,193587,Bungo Stray Dogs: Dead Apple (2018),184,3.5


Considering all movies gives vague recommendations, because the dataset also contains unpopular movies with only a handful of ratings.

If we consider only popular movies, we reduce the amount of computation and get more meaningful recommendations.

In [5]:
# For each title: number of ratings (size) and average rating (mean)
new = merged.groupby("title").agg({"rating":[np.size,np.mean]})

We keep only movies with more than 20 ratings. (Since this is a small dataset, movies with more than 20 ratings can be considered popular.)

This threshold (20) was not chosen arbitrarily. We tried values from 1 to 100: thresholds above 20 eliminated several popular movies from the data frame, while thresholds below 20 let unpopular movies in. So we chose 20.
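The threshold search described above can be sketched as a sweep: for each candidate cutoff, count how many titles have strictly more than that many ratings. `titles_kept_per_threshold` is a hypothetical helper and the counts below are toy numbers; on the notebook's data, the counts would be `new["rating"]["size"]` from In [5], before the filter in In [6] is applied.

```python
import pandas as pd

def titles_kept_per_threshold(counts: pd.Series, thresholds) -> pd.Series:
    """For each threshold t, count titles with strictly more than t ratings (as in `size > 20`)."""
    return pd.Series({t: int((counts > t).sum()) for t in thresholds}, name="titles_kept")

# Toy rating counts per title (illustrative, not from ml-latest-small)
toy_counts = pd.Series({"A": 5, "B": 18, "C": 21, "D": 40, "E": 215})

kept = titles_kept_per_threshold(toy_counts, thresholds=[1, 10, 20, 50, 100])
print(kept)
```

Inspecting which titles disappear between neighboring cutoffs is what lets a popular-but-dropped movie be spotted by hand.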

In [6]:
new = new[new["rating"]["size"]>20]

In [7]:
new

,rating,rating
,size,mean
title,,
(500) Days of Summer (2009),42.0,3.666667
10 Things I Hate About You (1999),54.0,3.527778
101 Dalmatians (1996),47.0,3.074468
101 Dalmatians (One Hundred and One Dalmatians) (1961),44.0,3.431818
12 Angry Men (1957),57.0,4.149123
...,...,...
Zoolander (2001),54.0,3.509259
Zootopia (2016),32.0,3.890625
eXistenZ (1999),22.0,3.863636
xXx (2002),24.0,2.770833


Generating a data frame that contains only the ratings of popular movies.

In [8]:
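# Inner join on title: keeps only ratings of movies that passed the popularity filter
# and adds their (rating, size) and (rating, mean) columns
newratings = pd.merge(merged,new,on = "title")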



In [9]:
newratings

,movieId,title,userId,rating,"(rating, size)","(rating, mean)"
0,1,Toy Story (1995),1,4.0,215.0,3.92093
1,1,Toy Story (1995),5,4.0,215.0,3.92093
2,1,Toy Story (1995),7,4.5,215.0,3.92093
3,1,Toy Story (1995),15,2.5,215.0,3.92093
4,1,Toy Story (1995),17,4.5,215.0,3.92093
...,...,...,...,...,...,...
66656,168252,Logan (2017),567,4.0,25.0,4.28000
66657,168252,Logan (2017),586,5.0,25.0,4.28000
66658,168252,Logan (2017),596,5.0,25.0,4.28000
66659,168252,Logan (2017),599,3.5,25.0,4.28000


In [10]:
ratings

,userId,movieId,rating
0,1,1,4.0
1,1,3,4.0
2,1,6,4.0
3,1,47,5.0
4,1,50,5.0
...,...,...,...
100831,610,166534,4.0
100832,610,168248,5.0
100833,610,168250,5.0
100834,610,168252,5.0


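Generating a pivot table containing only popular movies (more than 20 ratings).

We are also going to use the cosine similarity measure on this dataset. We convert the rating values to binary: 1 if the user rated the movie and 0 if not (NaN -> 0). Each movie is then represented by which users rated it, not by how highly they rated it. (Cosine similarity itself works on any numeric vectors; binarizing is our modeling choice.)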
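A minimal sketch of the binarization and of cosine similarity between movie columns, on a hypothetical toy set of ratings. It uses `aggfunc="count"` plus `.clip(upper=1)` in place of the notebook's `len(x.unique())` lambda (both give 1 for a user who rated a title once), and computes cosine similarity directly with NumPy as the dot product divided by the product of the norms.

```python
import numpy as np
import pandas as pd

# Toy long-format ratings (illustrative only): which users rated which titles
toy = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "title":  ["X", "Y", "X", "Y", "Z"],
    "rating": [4.0, 2.5, 5.0, 3.0, 4.5],
})

# Binary user x title matrix: 1 if the user rated the title, 0 otherwise (NaN -> 0)
binary = toy.pivot_table(index="userId", columns="title", values="rating",
                         aggfunc="count", fill_value=0).clip(upper=1)

# Cosine similarity between title columns: dot products divided by column norms
m = binary.to_numpy(dtype=float)
norms = np.linalg.norm(m, axis=0)
cosine = pd.DataFrame(m.T @ m / np.outer(norms, norms),
                      index=binary.columns, columns=binary.columns)
print(cosine.round(3))
```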

In [11]:
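# Rows: users; columns: popular movie titles.
# Each cell holds the number of distinct ratings the user gave that title (1 if rated),
# and fill_value=0 turns missing (NaN) cells into 0, giving a binary matrix.
movieratingpivot = newratings.pivot_table(index=["userId"], columns=["title"], values="rating",
                                          aggfunc=lambda x: len(x.unique()), fill_value=0)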

In [12]:
movieratingpivot

title,(500) Days of Summer (2009),10 Things I Hate About You (1999),101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),12 Angry Men (1957),13 Going on 30 (2004),"13th Warrior, The (1999)",1408 (2007),2001: A Space Odyssey (1968),2012 (2009),...,Young Frankenstein (1974),Young Guns (1988),Zack and Miri Make a Porno (2008),Zodiac (2007),Zombieland (2009),Zoolander (2001),Zootopia (2016),eXistenZ (1999),xXx (2002),¡Three Amigos! (1986)
userId,,,,,,,,,,,,,,,,,,,,,
1,0,0,0,0,0,0,1,0,0,0,...,1,0,0,0,0,0,0,0,0,1
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,0,0,0,0,0,0,0,0,1,0,...,1,0,0,0,0,0,0,0,0,0
607,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
608,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,1,0,1,1,0
609,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


# Correlation Approach

## Pearson's Coefficient

Calculating the correlation matrix among movies from user ratings, using Pearson's correlation.
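Because the pivot table holds only 0/1 values, Pearson's r between two movie columns equals the phi coefficient of their 2x2 "rated / not rated" table. In other words, these correlations measure how often two movies are rated by the same users, not whether users rated them similarly. The sketch below checks the identity on toy vectors; `phi_coefficient` is a hypothetical helper.

```python
import numpy as np

def phi_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """Phi coefficient of two 0/1 vectors, computed from their 2x2 contingency counts."""
    a = a.astype(bool)
    b = b.astype(bool)
    n11 = np.sum(a & b)    # both movies rated
    n00 = np.sum(~a & ~b)  # neither rated
    n10 = np.sum(a & ~b)   # only the first rated
    n01 = np.sum(~a & b)   # only the second rated
    denom = np.sqrt(float((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)))
    return float((n11 * n00 - n10 * n01) / denom)

# Toy "rated / not rated" columns for two movies across eight users (illustrative)
movie_a = np.array([1, 1, 1, 0, 0, 1, 0, 0])
movie_b = np.array([1, 1, 0, 0, 0, 1, 1, 0])

pearson_r = np.corrcoef(movie_a, movie_b)[0, 1]
print(round(pearson_r, 6), round(phi_coefficient(movie_a, movie_b), 6))
```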

In [13]:
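# Pairwise Pearson correlation between movie columns. min_periods=20 requires at least
# 20 users with values for both movies; this is always met here, because fill_value=0
# left no NaNs in the pivot table.
pearson = movieratingpivot.corr(method = "pearson",min_periods = 20)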

In [14]:
pearson

title,(500) Days of Summer (2009),10 Things I Hate About You (1999),101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),12 Angry Men (1957),13 Going on 30 (2004),"13th Warrior, The (1999)",1408 (2007),2001: A Space Odyssey (1968),2012 (2009),...,Young Frankenstein (1974),Young Guns (1988),Zack and Miri Make a Porno (2008),Zodiac (2007),Zombieland (2009),Zoolander (2001),Zootopia (2016),eXistenZ (1999),xXx (2002),¡Three Amigos! (1986)
title,,,,,,,,,,,,,,,,,,,,,
(500) Days of Summer (2009),1.000000,0.279940,0.139942,0.149419,0.112902,0.268254,0.102878,0.172389,0.143571,0.303765,...,0.107300,0.107074,0.410298,0.412097,0.352856,0.234354,0.226414,0.086298,0.211389,0.134928
10 Things I Hate About You (1999),0.279940,1.000000,0.256224,0.203113,0.018919,0.321002,0.248508,0.110238,0.140868,0.099425,...,0.162018,0.226679,0.226041,0.134580,0.149741,0.268585,0.056099,0.156386,0.144728,0.191369
101 Dalmatians (1996),0.139942,0.256224,1.000000,0.299641,0.118454,0.215186,0.030332,0.219346,0.138031,0.114033,...,0.207358,0.095313,0.114033,0.078141,0.107302,0.191299,0.069882,0.043024,0.194494,0.121629
101 Dalmatians (One Hundred and One Dalmatians) (1961),0.149419,0.203113,0.299641,1.000000,0.106432,0.225400,0.160754,0.230029,0.134606,0.086377,...,0.200525,0.166103,0.121133,0.085446,0.138967,0.270037,0.104929,0.082011,0.269517,0.160754
12 Angry Men (1957),0.112902,0.018919,0.118454,0.106432,1.000000,-0.029728,0.127438,0.160916,0.232520,0.186524,...,0.134306,0.160916,0.093844,0.220209,0.120933,0.118062,0.050775,0.028525,0.050914,0.099555
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zoolander (2001),0.234354,0.268585,0.191299,0.270037,0.118062,0.226041,0.134230,0.284900,0.201127,0.226041,...,0.216681,0.110238,0.289348,0.253973,0.293167,1.000000,0.107870,0.094481,0.382210,0.105660
Zootopia (2016),0.226414,0.056099,0.069882,0.104929,0.050775,0.076560,0.059552,0.173892,0.120574,0.237878,...,0.032044,0.025537,0.278208,0.182737,0.292891,0.107870,1.000000,0.033359,0.179310,0.023153
eXistenZ (1999),0.086298,0.156386,0.043024,0.082011,0.028525,0.011700,0.263860,0.181757,0.299941,0.011700,...,0.152994,0.137409,0.059924,0.168413,0.065193,0.094481,0.033359,1.000000,0.141753,0.176811
xXx (2002),0.211389,0.144728,0.194494,0.269517,0.050914,0.146786,0.124271,0.298429,0.169739,0.239285,...,0.087462,0.128297,0.285535,0.226949,0.207008,0.382210,0.179310,0.141753,1.000000,0.082528


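Testing our system on different movies: for each one, we take the 10 most correlated movies and their correlation values. We slice `[1:11]` because position 0 after sorting is the movie itself (correlation 1.0).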
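Slicing `[1:11]` assumes that, after sorting, the movie itself sits at position 0. A slightly more defensive variant drops the movie by label instead and skips NaN values, which `min_periods` can produce. `top_similar` is a hypothetical helper shown on a toy matrix; on the notebook's data it would be called as `top_similar(pearson, "Star Wars: Episode IV - A New Hope (1977)")`.

```python
import pandas as pd

def top_similar(corr: pd.DataFrame, title: str, n: int = 10) -> pd.Series:
    """Return the n titles most correlated with `title`, excluding the title itself."""
    scores = corr[title].drop(labels=[title])  # drop self-correlation by label, not position
    return scores.dropna().sort_values(ascending=False).head(n)

# Toy symmetric correlation matrix (illustrative values)
titles = ["A", "B", "C", "D"]
toy_corr = pd.DataFrame(
    [[1.0, 0.7, 0.2, 0.5],
     [0.7, 1.0, 0.1, 0.4],
     [0.2, 0.1, 1.0, 0.3],
     [0.5, 0.4, 0.3, 1.0]],
    index=titles, columns=titles,
)

print(top_similar(toy_corr, "A", n=2))
```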

In [15]:
pearson["Star Wars: Episode IV - A New Hope (1977)"].sort_values(ascending = False)[1:11]

title
Star Wars: Episode V - The Empire Strikes Back (1980)                             0.722617
Star Wars: Episode VI - Return of the Jedi (1983)                                 0.673075
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)    0.544330
Indiana Jones and the Last Crusade (1989)                                         0.502200
Star Wars: Episode I - The Phantom Menace (1999)                                  0.478435
Matrix, The (1999)                                                                0.458924
Indiana Jones and the Temple of Doom (1984)                                       0.432533
Terminator, The (1984)                                                            0.430735
Back to the Future (1985)                                                         0.427486
Aliens (1986)                                                                     0.420938
Name: Star Wars: Episode IV - A New Hope (1977), dtype: float64

In [16]:
pearson["Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)"].sort_values(ascending = False)[1:11]

title
Indiana Jones and the Last Crusade (1989)                0.640258
Star Wars: Episode V - The Empire Strikes Back (1980)    0.578669
Star Wars: Episode IV - A New Hope (1977)                0.544330
Indiana Jones and the Temple of Doom (1984)              0.499429
Star Wars: Episode VI - Return of the Jedi (1983)        0.469163
Terminator, The (1984)                                   0.468134
Back to the Future (1985)                                0.465982
Princess Bride, The (1987)                               0.458141
Monty Python and the Holy Grail (1975)                   0.448111
Aliens (1986)                                            0.445871
Name: Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981), dtype: float64

In [17]:
pearson["Star Trek II: The Wrath of Khan (1982)"].sort_values(ascending = False)[1:11]

title
Star Trek IV: The Voyage Home (1986)             0.712752
Star Trek III: The Search for Spock (1984)       0.701304
Star Trek VI: The Undiscovered Country (1991)    0.615599
Star Trek: The Motion Picture (1979)             0.589973
Star Trek: First Contact (1996)                  0.574865
Star Trek V: The Final Frontier (1989)           0.500530
Superman (1978)                                  0.484649
Star Trek: Insurrection (1998)                   0.471163
Escape from New York (1981)                      0.466507
Superman II (1980)                               0.459477
Name: Star Trek II: The Wrath of Khan (1982), dtype: float64

In [18]:
pearson["Dark Knight Rises, The (2012)"].sort_values(ascending = False)[1:11]

title
Dark Knight, The (2008)             0.594260
Django Unchained (2012)             0.590568
Avengers, The (2012)                0.586191
Inception (2010)                    0.576297
Interstellar (2014)                 0.564401
Grand Budapest Hotel, The (2014)    0.560322
Wolf of Wall Street, The (2013)     0.529007
Iron Man (2008)                     0.526420
Mad Max: Fury Road (2015)           0.523889
Sherlock Holmes (2009)              0.520774
Name: Dark Knight Rises, The (2012), dtype: float64

In [19]:
pearson["X-Men: Days of Future Past (2014)"].sort_values(ascending = False)[1:11]

title
X-Men: First Class (2011)                     0.648158
Captain America: The First Avenger (2011)     0.626546
Avengers: Age of Ultron (2015)                0.614512
Guardians of the Galaxy (2014)                0.592427
Thor: The Dark World (2013)                   0.580761
Star Trek Into Darkness (2013)                0.577653
Thor (2011)                                   0.572598
Captain America: The Winter Soldier (2014)    0.568684
Avengers, The (2012)                          0.565021
X-Men Origins: Wolverine (2009)               0.552470
Name: X-Men: Days of Future Past (2014), dtype: float64

In [20]:
pearson["Avengers, The (2012)"].sort_values(ascending = False)[1:11]

title
Iron Man 2 (2010)                             0.653766
Guardians of the Galaxy (2014)                0.636100
Iron Man (2008)                               0.607362
X-Men: First Class (2011)                     0.589131
Dark Knight Rises, The (2012)                 0.586191
Avengers: Age of Ultron (2015)                0.577425
Captain America: The Winter Soldier (2014)    0.577213
Thor (2011)                                   0.567498
Captain America: The First Avenger (2011)     0.565988
Iron Man 3 (2013)                             0.565988
Name: Avengers, The (2012), dtype: float64

In [21]:
pearson["Logan (2017)"].sort_values(ascending = False)[1:11]

title
Guardians of the Galaxy 2 (2017)                     0.638975
Rogue One: A Star Wars Story (2016)                  0.598771
Captain America: Civil War (2016)                    0.580897
Doctor Strange (2016)                                0.580897
Arrival (2016)                                       0.529463
X-Men Origins: Wolverine (2009)                      0.488529
X-Men: Days of Future Past (2014)                    0.488332
Avengers: Age of Ultron (2015)                       0.478160
Star Wars: Episode VII - The Force Awakens (2015)    0.472897
Captain America: The First Avenger (2011)            0.470602
Name: Logan (2017), dtype: float64

In [22]:
pearson["Inception (2010)"].sort_values(ascending = False)[1:11]

title
Dark Knight, The (2008)          0.658115
Dark Knight Rises, The (2012)    0.576297
Inglourious Basterds (2009)      0.565771
Shutter Island (2010)            0.548158
Up (2009)                        0.536997
Avengers, The (2012)             0.535424
Django Unchained (2012)          0.535217
Interstellar (2014)              0.535148
Social Network, The (2010)       0.487115
Iron Man (2008)                  0.481924
Name: Inception (2010), dtype: float64

In [23]:
pearson["Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)"].sort_values(ascending = False)[1:11]

title
Harry Potter and the Chamber of Secrets (2002)                            0.728967
Harry Potter and the Goblet of Fire (2005)                                0.692827
Harry Potter and the Prisoner of Azkaban (2004)                           0.691742
Shrek (2001)                                                              0.559344
Pirates of the Caribbean: Dead Man's Chest (2006)                         0.552697
Chronicles of Narnia: The Lion, the Witch and the Wardrobe, The (2005)    0.543842
Harry Potter and the Order of the Phoenix (2007)                          0.541162
Harry Potter and the Half-Blood Prince (2009)                             0.541162
Ice Age (2002)                                                            0.536343
Harry Potter and the Deathly Hallows: Part 1 (2010)                       0.529466
Name: Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001), dtype: float64

In [24]:
pearson["Bourne Ultimatum, The (2007)"].sort_values(ascending = False)[1:11]

title
Bourne Supremacy, The (2004)    0.589056
Bourne Identity, The (2002)     0.550616
Taken (2008)                    0.504129
Sherlock Holmes (2009)          0.498994
Dark Knight, The (2008)         0.485901
Iron Man (2008)                 0.475247
Batman Begins (2005)            0.475135
Casino Royale (2006)            0.473267
Blood Diamond (2006)            0.462435
V for Vendetta (2006)           0.453075
Name: Bourne Ultimatum, The (2007), dtype: float64

## Spearman's Coefficient

Calculating the correlation matrix among movies from user ratings, using Spearman's correlation.

In [25]:
spearman = movieratingpivot.corr(method = "spearman",min_periods = 20)

In [26]:
spearman

title,(500) Days of Summer (2009),10 Things I Hate About You (1999),101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),12 Angry Men (1957),13 Going on 30 (2004),"13th Warrior, The (1999)",1408 (2007),2001: A Space Odyssey (1968),2012 (2009),...,Young Frankenstein (1974),Young Guns (1988),Zack and Miri Make a Porno (2008),Zodiac (2007),Zombieland (2009),Zoolander (2001),Zootopia (2016),eXistenZ (1999),xXx (2002),¡Three Amigos! (1986)
title,,,,,,,,,,,,,,,,,,,,,
(500) Days of Summer (2009),1.000000,0.279940,0.139942,0.149419,0.112902,0.268254,0.102878,0.172389,0.143571,0.303765,...,0.107300,0.107074,0.410298,0.412097,0.352856,0.234354,0.226414,0.086298,0.211389,0.134928
10 Things I Hate About You (1999),0.279940,1.000000,0.256224,0.203113,0.018919,0.321002,0.248508,0.110238,0.140868,0.099425,...,0.162018,0.226679,0.226041,0.134580,0.149741,0.268585,0.056099,0.156386,0.144728,0.191369
101 Dalmatians (1996),0.139942,0.256224,1.000000,0.299641,0.118454,0.215186,0.030332,0.219346,0.138031,0.114033,...,0.207358,0.095313,0.114033,0.078141,0.107302,0.191299,0.069882,0.043024,0.194494,0.121629
101 Dalmatians (One Hundred and One Dalmatians) (1961),0.149419,0.203113,0.299641,1.000000,0.106432,0.225400,0.160754,0.230029,0.134606,0.086377,...,0.200525,0.166103,0.121133,0.085446,0.138967,0.270037,0.104929,0.082011,0.269517,0.160754
12 Angry Men (1957),0.112902,0.018919,0.118454,0.106432,1.000000,-0.029728,0.127438,0.160916,0.232520,0.186524,...,0.134306,0.160916,0.093844,0.220209,0.120933,0.118062,0.050775,0.028525,0.050914,0.099555
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zoolander (2001),0.234354,0.268585,0.191299,0.270037,0.118062,0.226041,0.134230,0.284900,0.201127,0.226041,...,0.216681,0.110238,0.289348,0.253973,0.293167,1.000000,0.107870,0.094481,0.382210,0.105660
Zootopia (2016),0.226414,0.056099,0.069882,0.104929,0.050775,0.076560,0.059552,0.173892,0.120574,0.237878,...,0.032044,0.025537,0.278208,0.182737,0.292891,0.107870,1.000000,0.033359,0.179310,0.023153
eXistenZ (1999),0.086298,0.156386,0.043024,0.082011,0.028525,0.011700,0.263860,0.181757,0.299941,0.011700,...,0.152994,0.137409,0.059924,0.168413,0.065193,0.094481,0.033359,1.000000,0.141753,0.176811
xXx (2002),0.211389,0.144728,0.194494,0.269517,0.050914,0.146786,0.124271,0.298429,0.169739,0.239285,...,0.087462,0.128297,0.285535,0.226949,0.207008,0.382210,0.179310,0.141753,1.000000,0.082528
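The Spearman matrix above matches the Pearson matrix value for value, and that is expected for this data rather than a bug. Spearman's rho is Pearson's r computed on ranks, and ranking a 0/1 column with averaged ties only maps 0 and 1 to two other values through an increasing linear transformation, which leaves Pearson's r unchanged. A minimal check on toy vectors:

```python
import numpy as np
import pandas as pd

# Toy "rated / not rated" columns for two movies across eight users (illustrative)
movie_a = pd.Series([1, 1, 1, 0, 0, 1, 0, 0])
movie_b = pd.Series([1, 1, 0, 0, 0, 1, 0, 0])

# Pearson's r on the raw 0/1 values
pearson_r = np.corrcoef(movie_a, movie_b)[0, 1]

# Spearman's rho = Pearson's r on average ranks (ties share the mean rank)
ranks_a = movie_a.rank(method="average")
ranks_b = movie_b.rank(method="average")
spearman_rho = np.corrcoef(ranks_a, ranks_b)[0, 1]

print(ranks_a.tolist())
print(round(pearson_r, 6), round(spearman_rho, 6))
```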


In [27]:
spearman["Star Wars: Episode IV - A New Hope (1977)"].sort_values(ascending = False)[1:11]

title
Star Wars: Episode V - The Empire Strikes Back (1980)                             0.722617
Star Wars: Episode VI - Return of the Jedi (1983)                                 0.673075
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)    0.544330
Indiana Jones and the Last Crusade (1989)                                         0.502200
Star Wars: Episode I - The Phantom Menace (1999)                                  0.478435
Matrix, The (1999)                                                                0.458924
Indiana Jones and the Temple of Doom (1984)                                       0.432533
Terminator, The (1984)                                                            0.430735
Back to the Future (1985)                                                         0.427486
Aliens (1986)                                                                     0.420938
Name: Star Wars: Episode IV - A New Hope (1977), dtype: float64

In [28]:
spearman["Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)"].sort_values(ascending = False)[1:11]

title
Indiana Jones and the Last Crusade (1989)                0.640258
Star Wars: Episode V - The Empire Strikes Back (1980)    0.578669
Star Wars: Episode IV - A New Hope (1977)                0.544330
Indiana Jones and the Temple of Doom (1984)              0.499429
Star Wars: Episode VI - Return of the Jedi (1983)        0.469163
Terminator, The (1984)                                   0.468134
Back to the Future (1985)                                0.465982
Princess Bride, The (1987)                               0.458141
Monty Python and the Holy Grail (1975)                   0.448111
Aliens (1986)                                            0.445871
Name: Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981), dtype: float64

In [29]:
spearman["Star Trek II: The Wrath of Khan (1982)"].sort_values(ascending = False)[1:11]

title
Star Trek IV: The Voyage Home (1986)             0.712752
Star Trek III: The Search for Spock (1984)       0.701304
Star Trek VI: The Undiscovered Country (1991)    0.615599
Star Trek: The Motion Picture (1979)             0.589973
Star Trek: First Contact (1996)                  0.574865
Star Trek V: The Final Frontier (1989)           0.500530
Superman (1978)                                  0.484649
Star Trek: Insurrection (1998)                   0.471163
Escape from New York (1981)                      0.466507
Superman II (1980)                               0.459477
Name: Star Trek II: The Wrath of Khan (1982), dtype: float64

In [30]:
spearman["Dark Knight Rises, The (2012)"].sort_values(ascending = False)[1:11]

title
Dark Knight, The (2008)             0.594260
Django Unchained (2012)             0.590568
Avengers, The (2012)                0.586191
Inception (2010)                    0.576297
Interstellar (2014)                 0.564401
Grand Budapest Hotel, The (2014)    0.560322
Wolf of Wall Street, The (2013)     0.529007
Iron Man (2008)                     0.526420
Mad Max: Fury Road (2015)           0.523889
Sherlock Holmes (2009)              0.520774
Name: Dark Knight Rises, The (2012), dtype: float64

In [31]:
spearman["X-Men: Days of Future Past (2014)"].sort_values(ascending = False)[1:11]

title
X-Men: First Class (2011)                     0.648158
Captain America: The First Avenger (2011)     0.626546
Avengers: Age of Ultron (2015)                0.614512
Guardians of the Galaxy (2014)                0.592427
Thor: The Dark World (2013)                   0.580761
Star Trek Into Darkness (2013)                0.577653
Thor (2011)                                   0.572598
Captain America: The Winter Soldier (2014)    0.568684
Avengers, The (2012)                          0.565021
X-Men Origins: Wolverine (2009)               0.552470
Name: X-Men: Days of Future Past (2014), dtype: float64

In [32]:
spearman["Avengers, The (2012)"].sort_values(ascending = False)[1:11]

title
Iron Man 2 (2010)                             0.653766
Guardians of the Galaxy (2014)                0.636100
Iron Man (2008)                               0.607362
X-Men: First Class (2011)                     0.589131
Dark Knight Rises, The (2012)                 0.586191
Avengers: Age of Ultron (2015)                0.577425
Captain America: The Winter Soldier (2014)    0.577213
Thor (2011)                                   0.567498
Captain America: The First Avenger (2011)     0.565988
Iron Man 3 (2013)                             0.565988
Name: Avengers, The (2012), dtype: float64

In [33]:
spearman["Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)"].sort_values(ascending = False)[1:11]

title
Harry Potter and the Chamber of Secrets (2002)                            0.728967
Harry Potter and the Goblet of Fire (2005)                                0.692827
Harry Potter and the Prisoner of Azkaban (2004)                           0.691742
Shrek (2001)                                                              0.559344
Pirates of the Caribbean: Dead Man's Chest (2006)                         0.552697
Chronicles of Narnia: The Lion, the Witch and the Wardrobe, The (2005)    0.543842
Harry Potter and the Order of the Phoenix (2007)                          0.541162
Harry Potter and the Half-Blood Prince (2009)                             0.541162
Ice Age (2002)                                                            0.536343
Harry Potter and the Deathly Hallows: Part 1 (2010)                       0.529466
Name: Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001), dtype: float64

In [34]:
spearman["Bourne Ultimatum, The (2007)"].sort_values(ascending = False)[1:11]

title
Bourne Supremacy, The (2004)    0.589056
Bourne Identity, The (2002)     0.550616
Taken (2008)                    0.504129
Sherlock Holmes (2009)          0.498994
Dark Knight, The (2008)         0.485901
Iron Man (2008)                 0.475247
Batman Begins (2005)            0.475135
Casino Royale (2006)            0.473267
Blood Diamond (2006)            0.462435
V for Vendetta (2006)           0.453075
Name: Bourne Ultimatum, The (2007), dtype: float64

In [35]:
spearman["Inception (2010)"].sort_values(ascending = False)[1:11]

title
Dark Knight, The (2008)          0.658115
Dark Knight Rises, The (2012)    0.576297
Inglourious Basterds (2009)      0.565771
Shutter Island (2010)            0.548158
Up (2009)                        0.536997
Avengers, The (2012)             0.535424
Django Unchained (2012)          0.535217
Interstellar (2014)              0.535148
Social Network, The (2010)       0.487115
Iron Man (2008)                  0.481924
Name: Inception (2010), dtype: float64

In [36]:
spearman["Logan (2017)"].sort_values(ascending = False)[1:11]

title
Guardians of the Galaxy 2 (2017)                     0.638975
Rogue One: A Star Wars Story (2016)                  0.598771
Captain America: Civil War (2016)                    0.580897
Doctor Strange (2016)                                0.580897
Arrival (2016)                                       0.529463
X-Men Origins: Wolverine (2009)                      0.488529
X-Men: Days of Future Past (2014)                    0.488332
Avengers: Age of Ultron (2015)                       0.478160
Star Wars: Episode VII - The Force Awakens (2015)    0.472897
Captain America: The First Avenger (2011)            0.470602
Name: Logan (2017), dtype: float64

## Kendall's Coefficient

Calculating the correlation matrix among movies from user ratings, using Kendall's correlation.

In [37]:
kendall = movieratingpivot.corr(method = "kendall",min_periods = 20)

In [38]:
kendall

title,(500) Days of Summer (2009),10 Things I Hate About You (1999),101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),12 Angry Men (1957),13 Going on 30 (2004),"13th Warrior, The (1999)",1408 (2007),2001: A Space Odyssey (1968),2012 (2009),...,Young Frankenstein (1974),Young Guns (1988),Zack and Miri Make a Porno (2008),Zodiac (2007),Zombieland (2009),Zoolander (2001),Zootopia (2016),eXistenZ (1999),xXx (2002),¡Three Amigos! (1986)
title,,,,,,,,,,,,,,,,,,,,,
(500) Days of Summer (2009),1.000000,0.279940,0.139942,0.149419,0.112902,0.268254,0.102878,0.172389,0.143571,0.303765,...,0.107300,0.107074,0.410298,0.412097,0.352856,0.234354,0.226414,0.086298,0.211389,0.134928
10 Things I Hate About You (1999),0.279940,1.000000,0.256224,0.203113,0.018919,0.321002,0.248508,0.110238,0.140868,0.099425,...,0.162018,0.226679,0.226041,0.134580,0.149741,0.268585,0.056099,0.156386,0.144728,0.191369
101 Dalmatians (1996),0.139942,0.256224,1.000000,0.299641,0.118454,0.215186,0.030332,0.219346,0.138031,0.114033,...,0.207358,0.095313,0.114033,0.078141,0.107302,0.191299,0.069882,0.043024,0.194494,0.121629
101 Dalmatians (One Hundred and One Dalmatians) (1961),0.149419,0.203113,0.299641,1.000000,0.106432,0.225400,0.160754,0.230029,0.134606,0.086377,...,0.200525,0.166103,0.121133,0.085446,0.138967,0.270037,0.104929,0.082011,0.269517,0.160754
12 Angry Men (1957),0.112902,0.018919,0.118454,0.106432,1.000000,-0.029728,0.127438,0.160916,0.232520,0.186524,...,0.134306,0.160916,0.093844,0.220209,0.120933,0.118062,0.050775,0.028525,0.050914,0.099555
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zoolander (2001),0.234354,0.268585,0.191299,0.270037,0.118062,0.226041,0.134230,0.284900,0.201127,0.226041,...,0.216681,0.110238,0.289348,0.253973,0.293167,1.000000,0.107870,0.094481,0.382210,0.105660
Zootopia (2016),0.226414,0.056099,0.069882,0.104929,0.050775,0.076560,0.059552,0.173892,0.120574,0.237878,...,0.032044,0.025537,0.278208,0.182737,0.292891,0.107870,1.000000,0.033359,0.179310,0.023153
eXistenZ (1999),0.086298,0.156386,0.043024,0.082011,0.028525,0.011700,0.263860,0.181757,0.299941,0.011700,...,0.152994,0.137409,0.059924,0.168413,0.065193,0.094481,0.033359,1.000000,0.141753,0.176811
xXx (2002),0.211389,0.144728,0.194494,0.269517,0.050914,0.146786,0.124271,0.298429,0.169739,0.239285,...,0.087462,0.128297,0.285535,0.226949,0.207008,0.382210,0.179310,0.141753,1.000000,0.082528
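Kendall's matrix is identical as well. pandas computes Kendall's tau through SciPy's `kendalltau`, whose default variant is tau-b, and for two binary variables tau-b reduces to the phi coefficient, i.e. Pearson's r. The sketch below uses a hypothetical `kendall_tau_b` helper that counts concordant and discordant pairs directly (O(n^2), fine for toy vectors) and compares it with `np.corrcoef`.

```python
from itertools import combinations
import math

import numpy as np

def kendall_tau_b(x, y) -> float:
    """Kendall's tau-b by direct pair counting (hypothetical helper for small inputs)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every term
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    # tau-b denominator: sqrt((pairs not tied in x) * (pairs not tied in y))
    denom = math.sqrt((concordant + discordant + ties_y_only) * (concordant + discordant + ties_x_only))
    return (concordant - discordant) / denom

# Toy "rated / not rated" columns for two movies across eight users (illustrative)
movie_a = np.array([1, 1, 1, 0, 0, 1, 0, 0])
movie_b = np.array([1, 1, 0, 0, 0, 1, 0, 0])

tau_b = kendall_tau_b(movie_a, movie_b)
pearson_r = np.corrcoef(movie_a, movie_b)[0, 1]
print(round(tau_b, 6), round(pearson_r, 6))
```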


In [39]:
kendall["Star Wars: Episode IV - A New Hope (1977)"].sort_values(ascending = False)[1:11]

title
Star Wars: Episode V - The Empire Strikes Back (1980)                             0.722617
Star Wars: Episode VI - Return of the Jedi (1983)                                 0.673075
Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)    0.544330
Indiana Jones and the Last Crusade (1989)                                         0.502200
Star Wars: Episode I - The Phantom Menace (1999)                                  0.478435
Matrix, The (1999)                                                                0.458924
Indiana Jones and the Temple of Doom (1984)                                       0.432533
Terminator, The (1984)                                                            0.430735
Back to the Future (1985)                                                         0.427486
Aliens (1986)                                                                     0.420938
Name: Star Wars: Episode IV - A New Hope (1977), dtype: float64

In [40]:
kendall["Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)"].sort_values(ascending = False)[1:11]

title
Indiana Jones and the Last Crusade (1989)                0.640258
Star Wars: Episode V - The Empire Strikes Back (1980)    0.578669
Star Wars: Episode IV - A New Hope (1977)                0.544330
Indiana Jones and the Temple of Doom (1984)              0.499429
Star Wars: Episode VI - Return of the Jedi (1983)        0.469163
Terminator, The (1984)                                   0.468134
Back to the Future (1985)                                0.465982
Princess Bride, The (1987)                               0.458141
Monty Python and the Holy Grail (1975)                   0.448111
Aliens (1986)                                            0.445871
Name: Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981), dtype: float64

In [41]:
kendall["Star Trek II: The Wrath of Khan (1982)"].sort_values(ascending = False)[1:11]

title
Star Trek IV: The Voyage Home (1986)             0.712752
Star Trek III: The Search for Spock (1984)       0.701304
Star Trek VI: The Undiscovered Country (1991)    0.615599
Star Trek: The Motion Picture (1979)             0.589973
Star Trek: First Contact (1996)                  0.574865
Star Trek V: The Final Frontier (1989)           0.500530
Superman (1978)                                  0.484649
Star Trek: Insurrection (1998)                   0.471163
Escape from New York (1981)                      0.466507
Superman II (1980)                               0.459477
Name: Star Trek II: The Wrath of Khan (1982), dtype: float64

In [42]:
kendall["Dark Knight Rises, The (2012)"].sort_values(ascending = False)[1:11]

title
Dark Knight, The (2008)             0.594260
Django Unchained (2012)             0.590568
Avengers, The (2012)                0.586191
Inception (2010)                    0.576297
Interstellar (2014)                 0.564401
Grand Budapest Hotel, The (2014)    0.560322
Wolf of Wall Street, The (2013)     0.529007
Iron Man (2008)                     0.526420
Mad Max: Fury Road (2015)           0.523889
Sherlock Holmes (2009)              0.520774
Name: Dark Knight Rises, The (2012), dtype: float64

In [43]:
kendall["X-Men: Days of Future Past (2014)"].sort_values(ascending = False)[1:11]

title
X-Men: First Class (2011)                     0.648158
Captain America: The First Avenger (2011)     0.626546
Avengers: Age of Ultron (2015)                0.614512
Guardians of the Galaxy (2014)                0.592427
Thor: The Dark World (2013)                   0.580761
Star Trek Into Darkness (2013)                0.577653
Thor (2011)                                   0.572598
Captain America: The Winter Soldier (2014)    0.568684
Avengers, The (2012)                          0.565021
X-Men Origins: Wolverine (2009)               0.552470
Name: X-Men: Days of Future Past (2014), dtype: float64

In [44]:
kendall["Avengers, The (2012)"].sort_values(ascending = False)[1:11]

title
Iron Man 2 (2010)                             0.653766
Guardians of the Galaxy (2014)                0.636100
Iron Man (2008)                               0.607362
X-Men: First Class (2011)                     0.589131
Dark Knight Rises, The (2012)                 0.586191
Avengers: Age of Ultron (2015)                0.577425
Captain America: The Winter Soldier (2014)    0.577213
Thor (2011)                                   0.567498
Captain America: The First Avenger (2011)     0.565988
Iron Man 3 (2013)                             0.565988
Name: Avengers, The (2012), dtype: float64

In [45]:
kendall["Logan (2017)"].sort_values(ascending = False)[1:11]

title
Guardians of the Galaxy 2 (2017)                     0.638975
Rogue One: A Star Wars Story (2016)                  0.598771
Doctor Strange (2016)                                0.580897
Captain America: Civil War (2016)                    0.580897
Arrival (2016)                                       0.529463
X-Men Origins: Wolverine (2009)                      0.488529
X-Men: Days of Future Past (2014)                    0.488332
Avengers: Age of Ultron (2015)                       0.478160
Star Wars: Episode VII - The Force Awakens (2015)    0.472897
Captain America: The First Avenger (2011)            0.470602
Name: Logan (2017), dtype: float64

In [46]:
kendall["Inception (2010)"].sort_values(ascending = False)[1:11]

title
Dark Knight, The (2008)          0.658115
Dark Knight Rises, The (2012)    0.576297
Inglourious Basterds (2009)      0.565771
Shutter Island (2010)            0.548158
Up (2009)                        0.536997
Avengers, The (2012)             0.535424
Django Unchained (2012)          0.535217
Interstellar (2014)              0.535148
Social Network, The (2010)       0.487115
Iron Man (2008)                  0.481924
Name: Inception (2010), dtype: float64

In [47]:
kendall["Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)"].sort_values(ascending = False)[1:11]

title
Harry Potter and the Chamber of Secrets (2002)                            0.728967
Harry Potter and the Goblet of Fire (2005)                                0.692827
Harry Potter and the Prisoner of Azkaban (2004)                           0.691742
Shrek (2001)                                                              0.559344
Pirates of the Caribbean: Dead Man's Chest (2006)                         0.552697
Chronicles of Narnia: The Lion, the Witch and the Wardrobe, The (2005)    0.543842
Harry Potter and the Order of the Phoenix (2007)                          0.541162
Harry Potter and the Half-Blood Prince (2009)                             0.541162
Ice Age (2002)                                                            0.536343
Harry Potter and the Deathly Hallows: Part 1 (2010)                       0.529466
Name: Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001), dtype: float64

In [48]:
kendall["Star Trek: The Motion Picture (1979)"].sort_values(ascending = False)[1:11]

title
Star Trek VI: The Undiscovered Country (1991)                         0.691083
Star Trek III: The Search for Spock (1984)                            0.666315
Star Trek II: The Wrath of Khan (1982)                                0.589973
Star Trek V: The Final Frontier (1989)                                0.573855
Star Trek IV: The Voyage Home (1986)                                  0.559292
Star Trek: First Contact (1996)                                       0.462737
Star Trek: Insurrection (1998)                                        0.459262
Superman III (1983)                                                   0.449945
Adventures of Buckaroo Banzai Across the 8th Dimension, The (1984)    0.428282
Superman II (1980)                                                    0.424218
Name: Star Trek: The Motion Picture (1979), dtype: float64

# Nearest Neighbor Approach

In [49]:
import numpy as np
import operator

Implementing cosine distance (1 minus cosine similarity) from scratch:

In [50]:
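def cosine(l1,l2):
    # Cosine similarity of the two rating vectors
    cosine_sim = np.dot(l1,l2) / (np.linalg.norm(l1) * np.linalg.norm(l2))
    # Convert to a distance: 0 = same direction, 1 = no users in common
    return 1 - cosine_sim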
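
Cosine distance depends only on the direction of two rating vectors, not on their length. A vector and a uniformly scaled copy of it are at distance 0, and two movies that no user rated in common are at distance 1. The sketch below restates the idea as a standalone function. The name `cosine_distance` and the guard for all-zero vectors (which the notebook's `cosine` does not have) are assumptions added for illustration.

```python
import numpy as np

def cosine_distance(l1, l2):
    """Cosine distance = 1 - cosine similarity of two rating vectors."""
    a = np.asarray(l1, dtype=float)
    b = np.asarray(l2, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        # Assumption: treat an all-zero vector as maximally distant
        return 1.0
    return 1 - np.dot(a, b) / denom

# Same direction ([3, 0, 0] = 3 * [1, 0, 0]): distance 0
print(cosine_distance([1, 0, 0], [3, 0, 0]))   # 0.0
# No user rated both movies: distance 1
print(cosine_distance([5, 0, 0], [0, 3, 4]))   # 1.0
```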

Implementing Euclidean distance from scratch:

In [51]:
def euclidean(l1,l2):
    # Accumulate the squared rating differences over all users
    euclidean_dist = 0
    for i in range(len(l1)):
        euclidean_dist = euclidean_dist + np.square(l1[i] - l2[i])
    # The square root of the sum is the Euclidean (L2) distance
    euclidean_dist = np.sqrt(euclidean_dist)
    return euclidean_dist
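
The loop above computes sqrt(sum_i (l1[i] - l2[i])^2) one user at a time. NumPy can compute the same quantity in a single vectorized call, which is typically much faster on a table with hundreds of users. A minimal sketch; the name `euclidean_vectorized` is hypothetical and not part of the notebook.

```python
import numpy as np

def euclidean_vectorized(l1, l2):
    # Same quantity as the loop version: sqrt(sum_i (l1[i] - l2[i])**2)
    a = np.asarray(l1, dtype=float)
    b = np.asarray(l2, dtype=float)
    return float(np.linalg.norm(a - b))

print(euclidean_vectorized([0, 0], [3, 4]))   # 5.0
```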

Implementing Manhattan distance from scratch:

In [52]:
def manhattan(l1,l2):
    # Sum of absolute rating differences over all users (L1 distance)
    manhattan_dist = 0
    for i in range(len(l1)):
        manhattan_dist = manhattan_dist + abs(l1[i] - l2[i])
    return manhattan_dist
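
As with the Euclidean case, the Manhattan loop has a one-line NumPy equivalent: take the element-wise absolute differences and sum them. The function name below is hypothetical.

```python
import numpy as np

def manhattan_vectorized(l1, l2):
    # sum_i |l1[i] - l2[i]|, computed on whole arrays at once
    a = np.asarray(l1, dtype=float)
    b = np.asarray(l2, dtype=float)
    return float(np.abs(a - b).sum())

print(manhattan_vectorized([5, 0, 3.5], [4, 2, 3.5]))   # 3.0
```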

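Implementing normalized Euclidean distance from scratch (the Euclidean distance divided by the sum of the two vectors' norms):

In [53]:
def normeuclid(l1,l2):
    l1 = np.asarray(l1)
    l2 = np.asarray(l2)
    # Normalizing constant: sum of the two vector lengths
    v = np.linalg.norm(l1) + np.linalg.norm(l2)
    if v != 0:
        distance = np.linalg.norm(l1-l2) / v
    else:
        # Both vectors are all zeros, so they are identical
        distance = 0
    return distance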
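
By the triangle inequality, ||l1 - l2|| <= ||l1|| + ||l2||, so this normalized distance always lies between 0 and 1, unlike the raw Euclidean and Manhattan distances, whose scale depends on how many users rated each movie. The sketch below checks the bound empirically on random non-negative "rating" vectors (values 0 to 5 in steps of 0.5). The function name and the random data are illustrative assumptions.

```python
import numpy as np

def normeuclid_sketch(l1, l2):
    a = np.asarray(l1, dtype=float)
    b = np.asarray(l2, dtype=float)
    v = np.linalg.norm(a) + np.linalg.norm(b)
    # Both vectors all zeros: treat them as identical (distance 0)
    return float(np.linalg.norm(a - b) / v) if v != 0 else 0.0

rng = np.random.default_rng(0)
samples = [normeuclid_sketch(rng.integers(0, 11, 50) / 2, rng.integers(0, 11, 50) / 2)
           for _ in range(1000)]
print(min(samples) >= 0, max(samples) <= 1)   # True True
print(normeuclid_sketch([3, 0], [0, 4]))      # 0.7142857142857143 (= 5/7)
```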

Generating the pivot table exp_table, with one row per user and one column per movie title:

In [54]:
exp_table = newratings.pivot_table(index = ["userId"],columns = ["title"],values = "rating")

Substituting all NaN values (movies a user has not rated) with zeros:

In [55]:
exp_table = exp_table.fillna(0)
exp_table

title,(500) Days of Summer (2009),10 Things I Hate About You (1999),101 Dalmatians (1996),101 Dalmatians (One Hundred and One Dalmatians) (1961),12 Angry Men (1957),13 Going on 30 (2004),"13th Warrior, The (1999)",1408 (2007),2001: A Space Odyssey (1968),2012 (2009),...,Young Frankenstein (1974),Young Guns (1988),Zack and Miri Make a Porno (2008),Zodiac (2007),Zombieland (2009),Zoolander (2001),Zootopia (2016),eXistenZ (1999),xXx (2002),¡Three Amigos! (1986)
userId
1,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,...,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,5.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,...,3.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
607,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,...,0.0,0.0,0.0,0.0,0.0,3.0,0.0,4.5,3.5,0.0
609,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
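
To illustrate what the pivot and fill steps produce, here is a sketch on a tiny hypothetical ratings frame with the same three columns used above (`userId`, `title`, `rating`); the data is invented purely for illustration. Note that `pivot_table` aggregates duplicate (user, title) pairs with its default `aggfunc="mean"`.

```python
import pandas as pd

# Hypothetical long-format ratings, shaped like newratings
toy = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "title":  ["A", "B", "A", "B", "C"],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
})
# Rows = users, columns = movie titles, cells = ratings (NaN if unrated)
table = toy.pivot_table(index=["userId"], columns=["title"], values="rating")
# Replace "not rated" with 0, as done for exp_table
table = table.fillna(0)
print(table)
```

After filling, the distance functions treat an unrated movie exactly as if the user had rated it 0.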


Implementing nearest neighbor algorithm from scratch:

The nearestNeighbor function takes a movie title and a distance metric as input. It first builds a list l holding the rating every user gave to the target movie (0 where the user did not rate it). It then iterates over every column (movie) of the pivot table, skips movies whose ratings sum to less than 20 (presumably a rough filter against rarely rated movies), computes the distance between each remaining movie and the target, and stores it in a dictionary d keyed by movie title. Finally it sorts the entries of d by distance and returns them as a list of (movie title, distance) tuples, closest first. Because the target movie is at distance 0 from itself, it normally appears first, which is why the calls below slice the result with [1:11].

The smaller the distance between two movies, the more similar they are.

In [56]:
def nearestNeighbor(movie,metric):
    # Ratings vector of the target movie (one entry per user, 0 = not rated)
    l = list(exp_table[movie])
    # Map metric names to the distance functions defined above
    metrics = {"cosine": cosine, "euclidean": euclidean,
               "normeuclid": normeuclid, "manhattan": manhattan}
    if metric not in metrics:
        raise ValueError("Unknown metric: " + metric)
    dist = metrics[metric]
    d = dict()
    for i in exp_table:
        temp = list(exp_table[i])
        # Skip movies whose ratings sum to less than 20
        if sum(temp) >= 20:
            d[i] = dist(l, temp)
    # Sort (title, distance) pairs by ascending distance
    sorted_d = sorted(d.items(), key=operator.itemgetter(1))
    return sorted_d
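
nearestNeighbor converts every column to a Python list and computes one distance at a time. For the cosine metric, the same ranking can be obtained with a single matrix-vector product. The sketch below is a hypothetical vectorized variant (`nearest_neighbors_cosine` is not part of the notebook). It keeps the same "ratings sum >= 20" filter as a parameter and returns a pandas Series instead of a list of tuples. It assumes the target movie has at least one nonzero rating.

```python
import numpy as np
import pandas as pd

def nearest_neighbors_cosine(table, movie, min_rating_sum=20):
    """Cosine distances from `movie` to every sufficiently rated movie.

    table: users x movies DataFrame with 0 for missing ratings (like exp_table).
    Returns a Series of distances indexed by title, sorted ascending.
    """
    # Same filter as nearestNeighbor: keep movies whose ratings sum to >= min_rating_sum
    kept = table.loc[:, table.sum(axis=0) >= min_rating_sum]
    target = table[movie].to_numpy(dtype=float)      # shape (n_users,)
    candidates = kept.to_numpy(dtype=float)          # shape (n_users, n_movies)
    # Dot product of the target with every candidate column at once
    dots = target @ candidates
    norms = np.linalg.norm(candidates, axis=0) * np.linalg.norm(target)
    distances = 1 - dots / norms
    return pd.Series(distances, index=kept.columns).sort_values()

# Tiny hypothetical table: rows are users, columns are movies
toy = pd.DataFrame({"A": [5, 4, 0, 5], "B": [5, 4, 0, 4], "C": [0, 0, 5, 1]},
                   index=[1, 2, 3, 4])
result = nearest_neighbors_cosine(toy, "A", min_rating_sum=5)
print(result.index.tolist())   # ['A', 'B', 'C']
```

In the notebook, `nearest_neighbors_cosine(exp_table, "Star Wars: Episode IV - A New Hope (1977)").iloc[1:11]` would be the analogue of the cosine calls that follow. The other metrics would each need their own vectorized formula.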

Applying our algorithm to Star Wars: Episode IV - A New Hope (1977) with each of our distance metrics:

In [57]:
nearestNeighbor("Star Wars: Episode IV - A New Hope (1977)","manhattan")[1:11]

[('Star Wars: Episode V - The Empire Strikes Back (1980)', 383.5),
 ('Star Wars: Episode VI - Return of the Jedi (1983)', 460.0),
 ('Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)',
  631.5),
 ('Indiana Jones and the Last Crusade (1989)', 693.5),
 ('Terminator, The (1984)', 755.5),
 ('Back to the Future (1985)', 758.5),
 ('Star Wars: Episode I - The Phantom Menace (1999)', 767.0),
 ('Indiana Jones and the Temple of Doom (1984)', 786.0),
 ('Aliens (1986)', 797.5),
 ('Groundhog Day (1993)', 802.0)]

In [58]:
nearestNeighbor("Star Wars: Episode IV - A New Hope (1977)","euclidean")[1:11]

[('Star Wars: Episode V - The Empire Strikes Back (1980)', 38.31775045589185),
 ('Star Wars: Episode VI - Return of the Jedi (1983)', 42.20781918081056),
 ('Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)',
  49.751884386422994),
 ('Indiana Jones and the Last Crusade (1989)', 52.70910737244561),
 ('Star Wars: Episode I - The Phantom Menace (1999)', 54.57105459856901),
 ('Back to the Future (1985)', 54.742579405797095),
 ('Terminator, The (1984)', 55.111251119893836),
 ('Indiana Jones and the Temple of Doom (1984)', 56.42694391866354),
 ('Independence Day (a.k.a. ID4) (1996)', 56.60388679233962),
 ('Men in Black (a.k.a. MIB) (1997)', 56.817690906970164)]

In [59]:
nearestNeighbor("Star Wars: Episode IV - A New Hope (1977)","normeuclid")[1:11]

[('Star Wars: Episode V - The Empire Strikes Back (1980)', 0.2927788834806333),
 ('Star Wars: Episode VI - Return of the Jedi (1983)', 0.3307465113677841),
 ('Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)',
  0.3857163755952979),
 ('Matrix, The (1999)', 0.4107853849442627),
 ('Back to the Future (1985)', 0.4470353699334789),
 ('Indiana Jones and the Last Crusade (1989)', 0.4497626893105888),
 ('Godfather, The (1972)', 0.45296799208382943),
 ('Terminator 2: Judgment Day (1991)', 0.4555967052348985),
 ('Saving Private Ryan (1998)', 0.457312122887015),
 ('Silence of the Lambs, The (1991)', 0.4683382554202458)]

In [60]:
nearestNeighbor("Star Wars: Episode IV - A New Hope (1977)","cosine")[1:11]

[('Star Wars: Episode V - The Empire Strikes Back (1980)', 0.1675926447766266),
 ('Star Wars: Episode VI - Return of the Jedi (1983)', 0.20936143443836774),
 ('Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)',
  0.291165883452313),
 ('Matrix, The (1999)', 0.33655325375593426),
 ('Indiana Jones and the Last Crusade (1989)', 0.3582813138541153),
 ('Back to the Future (1985)', 0.3771935100409338),
 ('Star Wars: Episode I - The Phantom Menace (1999)', 0.3956233322589974),
 ('Terminator, The (1984)', 0.4030132965528985),
 ('Godfather, The (1972)', 0.4046831966011222),
 ('Saving Private Ryan (1998)', 0.40724673685871127)]
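
The four cells above differ only in the metric argument. A small loop could produce all four top-10 lists at once and count how many titles each pair of metrics agrees on. The sketch takes the neighbor function as a parameter so it runs on its own. The helper names and the stand-in data are hypothetical, and in the notebook one would pass nearestNeighbor instead.

```python
from itertools import combinations

def top_k_by_metric(neighbor_fn, movie, metrics, k=10):
    """Run neighbor_fn(movie, metric) per metric; skip the first entry (the movie itself)."""
    return {m: [title for title, _ in neighbor_fn(movie, m)[1:k + 1]] for m in metrics}

def pairwise_overlap(top_lists):
    """Number of titles shared by each pair of top-k lists."""
    return {(m1, m2): len(set(top_lists[m1]) & set(top_lists[m2]))
            for m1, m2 in combinations(top_lists, 2)}

# Stand-in neighbor function with made-up rankings, so the sketch runs standalone
def fake_neighbors(movie, metric):
    ranked = {"manhattan": ["X", "A", "B", "C"], "cosine": ["X", "A", "C", "D"]}
    return [(title, rank) for rank, title in enumerate(ranked[metric])]

tops = top_k_by_metric(fake_neighbors, "X", ["manhattan", "cosine"], k=3)
print(tops)                    # {'manhattan': ['A', 'B', 'C'], 'cosine': ['A', 'C', 'D']}
print(pairwise_overlap(tops))  # {('manhattan', 'cosine'): 2}
```

With the notebook's function this would read `top_k_by_metric(nearestNeighbor, "Star Wars: Episode IV - A New Hope (1977)", ["manhattan", "euclidean", "normeuclid", "cosine"])`.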

Applying our algorithm to Raiders of the Lost Ark (1981) with each of our distance metrics:

In [61]:
nearestNeighbor("Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)","manhattan")[1:11]

[('Indiana Jones and the Last Crusade (1989)', 422.0),
 ('Star Wars: Episode V - The Empire Strikes Back (1980)', 553.0),
 ('Indiana Jones and the Temple of Doom (1984)', 564.5),
 ('Terminator, The (1984)', 603.0),
 ('Back to the Future (1985)', 615.0),
 ('Star Wars: Episode IV - A New Hope (1977)', 631.5),
 ('Princess Bride, The (1987)', 640.5),
 ('Aliens (1986)', 644.0),
 ('Ghostbusters (a.k.a. Ghost Busters) (1984)', 645.5),
 ('Monty Python and the Holy Grail (1975)', 648.5)]

In [62]:
nearestNeighbor("Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)","euclidean")[1:11]

[('Indiana Jones and the Last Crusade (1989)', 39.585350825778974),
 ('Star Wars: Episode V - The Empire Strikes Back (1980)', 45.77116996538323),
 ('Indiana Jones and the Temple of Doom (1984)', 46.89083066016212),
 ('Back to the Future (1985)', 48.68778080791935),
 ('Terminator, The (1984)', 48.75961443653959),
 ('Star Wars: Episode IV - A New Hope (1977)', 49.751884386422994),
 ('Aliens (1986)', 50.48762224545735),
 ('Ghostbusters (a.k.a. Ghost Busters) (1984)', 50.54948070949889),
 ('Die Hard (1988)', 50.584088407324295),
 ('Star Wars: Episode VI - Return of the Jedi (1983)', 50.62854925829892)]

In [63]:
nearestNeighbor("Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)","normeuclid")[1:11]

[('Indiana Jones and the Last Crusade (1989)', 0.36214962848218696),
 ('Star Wars: Episode V - The Empire Strikes Back (1980)', 0.3721549884803157),
 ('Star Wars: Episode IV - A New Hope (1977)', 0.3857163755952979),
 ('Star Wars: Episode VI - Return of the Jedi (1983)', 0.422865662796908),
 ('Back to the Future (1985)', 0.4249595235559777),
 ('Matrix, The (1999)', 0.4362373936111975),
 ('Princess Bride, The (1987)', 0.45586830367779396),
 ('Terminator, The (1984)', 0.45916380311413585),
 ('Lord of the Rings: The Fellowship of the Ring, The (2001)',
  0.4658256537573094),
 ('Monty Python and the Holy Grail (1975)', 0.4661285677583408)]

In [64]:
nearestNeighbor("Raiders of the Lost Ark (Indiana Jones and the Raiders of the Lost Ark) (1981)","cosine")[1:11]

[('Indiana Jones and the Last Crusade (1989)', 0.2418412926777287),
 ('Star Wars: Episode V - The Empire Strikes Back (1980)', 0.2765915143138685),
 ('Star Wars: Episode IV - A New Hope (1977)', 0.291165883452313),
 ('Back to the Future (1985)', 0.35584222883954875),
 ('Star Wars: Episode VI - Return of the Jedi (1983)', 0.3574150791639239),
 ('Indiana Jones and the Temple of Doom (1984)', 0.36727167604273825),
 ('Matrix, The (1999)', 0.3688930918516803),
 ('Terminator, The (1984)', 0.3899353831820861),
 ('Princess Bride, The (1987)', 0.40485472971444414),
 ('Die Hard (1988)', 0.4140598317415035)]

Applying our algorithm to Star Trek II: The Wrath of Khan (1982) with each of our distance metrics:

In [65]:
nearestNeighbor("Star Trek II: The Wrath of Khan (1982)","manhattan")[1:11]

[('Star Trek IV: The Voyage Home (1986)', 126.0),
 ('Star Trek III: The Search for Spock (1984)', 126.5),
 ('Star Trek VI: The Undiscovered Country (1991)', 159.0),
 ('Star Trek: The Motion Picture (1979)', 175.5),
 ('Star Trek: Insurrection (1998)', 182.5),
 ('Star Trek V: The Final Frontier (1989)', 194.5),
 ('Midnight Run (1988)', 204.5),
 ('Adventures of Buckaroo Banzai Across the 8th Dimension, The (1984)', 206.0),
 ('Escape from New York (1981)', 211.0),
 ('Superman III (1983)', 216.0)]

In [66]:
nearestNeighbor("Star Trek II: The Wrath of Khan (1982)","euclidean")[1:11]

[('Star Trek IV: The Voyage Home (1986)', 20.161845153655953),
 ('Star Trek III: The Search for Spock (1984)', 20.63371028196335),
 ('Star Trek VI: The Undiscovered Country (1991)', 23.695991222145572),
 ('Star Trek: The Motion Picture (1979)', 24.814310387355118),
 ('Star Trek: Insurrection (1998)', 25.46075411294803),
 ('Star Trek V: The Final Frontier (1989)', 26.622359023948274),
 ('Superman II (1980)', 27.83433131943356),
 ('Escape from New York (1981)', 28.0178514522438),
 ('Midnight Run (1988)', 28.0579756931964),
 ('Adventures of Buckaroo Banzai Across the 8th Dimension, The (1984)',
  28.36370920736567)]

In [67]:
nearestNeighbor("Star Trek II: The Wrath of Khan (1982)","normeuclid")[1:11]

[('Star Trek IV: The Voyage Home (1986)', 0.3690260284105403),
 ('Star Trek III: The Search for Spock (1984)', 0.39333045737526373),
 ('Star Trek VI: The Undiscovered Country (1991)', 0.44165810721189747),
 ('Star Trek: First Contact (1996)', 0.4593347477387632),
 ('Star Trek: The Motion Picture (1979)', 0.4759702011083696),
 ('Superman (1978)', 0.4799043868957559),
 ('Star Trek: Insurrection (1998)', 0.5103469501556018),
 ('Abyss, The (1989)', 0.5132279258574435),
 ('Lethal Weapon (1987)', 0.5181805058492247),
 ('RoboCop (1987)', 0.5215113068494373)]

In [68]:
nearestNeighbor("Star Trek II: The Wrath of Khan (1982)","cosine")[1:11]

[('Star Trek IV: The Voyage Home (1986)', 0.24473213326418264),
 ('Star Trek III: The Search for Spock (1984)', 0.25778544405099957),
 ('Star Trek VI: The Undiscovered Country (1991)', 0.3550213048193799),
 ('Star Trek: The Motion Picture (1979)', 0.4016216877556634),
 ('Star Trek: First Contact (1996)', 0.40341065485638683),
 ('Star Trek: Insurrection (1998)', 0.4363615667374041),
 ('Superman (1978)', 0.45910744317171337),
 ('Star Trek V: The Final Frontier (1989)', 0.49966071650735555),
 ('Aliens (1986)', 0.5030132808349197),
 ('Superman II (1980)', 0.5159407895998822)]

Applying our algorithm to The Dark Knight Rises (2012) with each of our distance metrics:

In [69]:
nearestNeighbor("Dark Knight Rises, The (2012)","manhattan")[1:11]

[('Avengers, The (2012)', 236.5),
 ('Star Trek Into Darkness (2013)', 241.0),
 ('Grand Budapest Hotel, The (2014)', 242.0),
 ('Amazing Spider-Man, The (2012)', 247.0),
 ('Skyfall (2012)', 248.0),
 ('Star Wars: Episode VII - The Force Awakens (2015)', 248.5),
 ('Mad Max: Fury Road (2015)', 249.0),
 ('Prometheus (2012)', 250.5),
 ('John Wick (2014)', 252.0),
 ('X-Men: First Class (2011)', 252.5)]

In [70]:
nearestNeighbor("Dark Knight Rises, The (2012)","euclidean")[1:11]

[('Avengers, The (2012)', 30.426140077242792),
 ('Skyfall (2012)', 30.83828789021855),
 ('Grand Budapest Hotel, The (2014)', 30.894983411550815),
 ('Star Trek Into Darkness (2013)', 31.080540535840107),
 ('Amazing Spider-Man, The (2012)', 31.464265445104548),
 ('Mad Max: Fury Road (2015)', 31.488092987667578),
 ('Prometheus (2012)', 31.5),
 ('X-Men: First Class (2011)', 31.539657575820318),
 ('The Hunger Games (2012)', 31.583223394707513),
 ('Star Wars: Episode VII - The Force Awakens (2015)', 31.65043443619692)]

In [71]:
nearestNeighbor("Dark Knight Rises, The (2012)","normeuclid")[1:11]

[('Avengers, The (2012)', 0.4415189649055235),
 ('Dark Knight, The (2008)', 0.4438191879377271),
 ('Interstellar (2014)', 0.45675383357591387),
 ('Django Unchained (2012)', 0.4584448511845458),
 ('Inception (2010)', 0.4620743783189068),
 ('Iron Man (2008)', 0.47641921203007914),
 ('Shutter Island (2010)', 0.47739975650541294),
 ('Batman Begins (2005)', 0.47774007036751404),
 ('Guardians of the Galaxy (2014)', 0.48384439084578423),
 ('Wolf of Wall Street, The (2013)', 0.4849402118984488)]

In [72]:
nearestNeighbor("Dark Knight Rises, The (2012)","cosine")[1:11]

[('Dark Knight, The (2008)', 0.33394556662760544),
 ('Inception (2010)', 0.3824955344787052),
 ('Avengers, The (2012)', 0.38764447366883426),
 ('Interstellar (2014)', 0.41712086419487815),
 ('Django Unchained (2012)', 0.4194602836094039),
 ('Batman Begins (2005)', 0.4447542953971322),
 ('Grand Budapest Hotel, The (2014)', 0.4474060740689192),
 ('Skyfall (2012)', 0.4492845508571931),
 ('Iron Man (2008)', 0.45255617633555323),
 ('Shutter Island (2010)', 0.45461638133434823)]

Applying our algorithm to X-Men: Days of Future Past (2014) with each of our distance metrics:

In [73]:
nearestNeighbor("X-Men: Days of Future Past (2014)","manhattan")[1:11]

[('Avengers: Age of Ultron (2015)', 81.0),
 ('Thor: The Dark World (2013)', 85.5),
 ('Captain America: The First Avenger (2011)', 87.5),
 ('Star Trek Into Darkness (2013)', 92.5),
 ('Captain America: The Winter Soldier (2014)', 95.0),
 ('Captain America: Civil War (2016)', 96.5),
 ('X-Men Origins: Wolverine (2009)', 99.5),
 ('Mission: Impossible - Ghost Protocol (2011)', 100.5),
 ('Thor (2011)', 100.5),
 ('X-Men: First Class (2011)', 101.0)]

In [74]:
nearestNeighbor("X-Men: Days of Future Past (2014)","euclidean")[1:11]

[('Avengers: Age of Ultron (2015)', 17.277152543170995),
 ('Captain America: The First Avenger (2011)', 17.514279888136993),
 ('Thor: The Dark World (2013)', 17.741194999210173),
 ('Star Trek Into Darkness (2013)', 18.214005600086985),
 ('Thor (2011)', 18.580904176062045),
 ('X-Men Origins: Wolverine (2009)', 18.621224449536072),
 ('Captain America: The Winter Soldier (2014)', 18.66815470259447),
 ('Iron Man 3 (2013)', 19.0),
 ('X-Men: First Class (2011)', 19.091883092036785),
 ('Captain America: Civil War (2016)', 19.12459149890528)]

In [75]:
nearestNeighbor("X-Men: Days of Future Past (2014)","normeuclid")[1:11]

[('X-Men: First Class (2011)', 0.40760674467680336),
 ('Captain America: The First Avenger (2011)', 0.4188534199617134),
 ('Avengers: Age of Ultron (2015)', 0.42668699442802044),
 ('Captain America: The Winter Soldier (2014)', 0.4342636951702117),
 ('Thor (2011)', 0.4375465844627364),
 ('Star Trek Into Darkness (2013)', 0.44312022374690757),
 ('Iron Man 3 (2013)', 0.44924451836943896),
 ('Guardians of the Galaxy (2014)', 0.45483203990627574),
 ('Avengers, The (2012)', 0.4754275703458758),
 ('Iron Man 2 (2010)', 0.47619173143198984)]

In [76]:
nearestNeighbor("X-Men: Days of Future Past (2014)","cosine")[1:11]

[('X-Men: First Class (2011)', 0.31994954960473043),
 ('Captain America: The First Avenger (2011)', 0.3499129021223577),
 ('Guardians of the Galaxy (2014)', 0.35126180926750894),
 ('Avengers: Age of Ultron (2015)', 0.3586709247729726),
 ('Avengers, The (2012)', 0.3767046650721667),
 ('Captain America: The Winter Soldier (2014)', 0.3771466284756678),
 ('Thor (2011)', 0.3827783789057231),
 ('Star Trek Into Darkness (2013)', 0.38988765836854467),
 ('Iron Man 3 (2013)', 0.40338864685613873),
 ('Thor: The Dark World (2013)', 0.42089709150835863)]

Applying our algorithm to The Avengers (2012) with each of our distance metrics:

In [77]:
nearestNeighbor("Avengers, The (2012)","manhattan")[1:11]

[('Guardians of the Galaxy (2014)', 164.0),
 ('Iron Man 2 (2010)', 168.0),
 ('Captain America: The Winter Soldier (2014)', 172.0),
 ('X-Men: Days of Future Past (2014)', 177.0),
 ('X-Men: First Class (2011)', 178.0),
 ('Avengers: Age of Ultron (2015)', 181.0),
 ('Captain America: The First Avenger (2011)', 182.5),
 ('Iron Man 3 (2013)', 183.0),
 ('Thor (2011)', 187.5),
 ('Ant-Man (2015)', 195.5)]

In [78]:
nearestNeighbor("Avengers, The (2012)","euclidean")[1:11]

[('Iron Man 2 (2010)', 23.895606290697042),
 ('Guardians of the Galaxy (2014)', 24.176434807473164),
 ('Captain America: The Winter Soldier (2014)', 25.563646062328434),
 ('X-Men: First Class (2011)', 25.777897509300484),
 ('Iron Man 3 (2013)', 25.80697580112788),
 ('X-Men: Days of Future Past (2014)', 25.95187854472196),
 ('Avengers: Age of Ultron (2015)', 26.153393661244042),
 ('Captain America: The First Avenger (2011)', 26.158172719056658),
 ('Thor (2011)', 26.186828750346997),
 ('Ant-Man (2015)', 27.225906780123964)]

In [79]:
nearestNeighbor("Avengers, The (2012)","normeuclid")[1:11]

[('Guardians of the Galaxy (2014)', 0.3722971424861407),
 ('Iron Man 2 (2010)', 0.4131216208662139),
 ('Iron Man (2008)', 0.4176160225369672),
 ('X-Men: First Class (2011)', 0.43989474267173556),
 ('Dark Knight Rises, The (2012)', 0.4415189649055235),
 ('Sherlock Holmes (2009)', 0.46519018823733443),
 ('Captain America: The Winter Soldier (2014)', 0.4669224438186888),
 ('Avatar (2009)', 0.4717389635085356),
 ('Interstellar (2014)', 0.4720410781645153),
 ('Edge of Tomorrow (2014)', 0.47208775327482094)]

In [80]:
nearestNeighbor("Avengers, The (2012)","cosine")[1:11]

[('Guardians of the Galaxy (2014)', 0.27639886157188664),
 ('Iron Man 2 (2010)', 0.30467642941271655),
 ('Iron Man (2008)', 0.3413244923996469),
 ('X-Men: First Class (2011)', 0.3583198256869926),
 ('Captain America: The Winter Soldier (2014)', 0.36254723189739635),
 ('Iron Man 3 (2013)', 0.371651602660236),
 ('X-Men: Days of Future Past (2014)', 0.3767046650721667),
 ('Avengers: Age of Ultron (2015)', 0.38340897847907196),
 ('Captain America: The First Avenger (2011)', 0.3849901098965691),
 ('Thor (2011)', 0.3857602802793374)]

Applying our algorithm to Harry Potter and the Sorcerer's Stone (2001) with each of our distance metrics:

In [81]:
nearestNeighbor("Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)","manhattan")[1:11]

[('Harry Potter and the Chamber of Secrets (2002)', 184.5),
 ('Harry Potter and the Goblet of Fire (2005)', 213.5),
 ('Harry Potter and the Prisoner of Azkaban (2004)', 225.5),
 ('Harry Potter and the Half-Blood Prince (2009)', 280.0),
 ('Harry Potter and the Order of the Phoenix (2007)', 283.5),
 ('Harry Potter and the Deathly Hallows: Part 1 (2010)', 285.0),
 ('Chronicles of Narnia: The Lion, the Witch and the Wardrobe, The (2005)',
  299.0),
 ('Harry Potter and the Deathly Hallows: Part 2 (2011)', 311.0),
 ("Pirates of the Caribbean: Dead Man's Chest (2006)", 311.0),
 ('Charlie and the Chocolate Factory (2005)', 317.5)]

In [82]:
nearestNeighbor("Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)","euclidean")[1:11]

[('Harry Potter and the Chamber of Secrets (2002)', 25.763346055976502),
 ('Harry Potter and the Goblet of Fire (2005)', 27.608875384557045),
 ('Harry Potter and the Prisoner of Azkaban (2004)', 28.99568933479596),
 ('Harry Potter and the Half-Blood Prince (2009)', 31.85906464414798),
 ('Harry Potter and the Deathly Hallows: Part 1 (2010)', 32.54228019054596),
 ('Harry Potter and the Order of the Phoenix (2007)', 32.737593069741706),
 ('Chronicles of Narnia: The Lion, the Witch and the Wardrobe, The (2005)',
  33.279122584587476),
 ('Charlie and the Chocolate Factory (2005)', 33.737960815674676),
 ('Harry Potter and the Deathly Hallows: Part 2 (2011)', 34.08812109811862),
 ("Pirates of the Caribbean: Dead Man's Chest (2006)", 34.132096331752024)]

In [83]:
nearestNeighbor("Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)","normeuclid")[1:11]

[('Harry Potter and the Chamber of Secrets (2002)', 0.3331817716872032),
 ('Harry Potter and the Prisoner of Azkaban (2004)', 0.37075138451674855),
 ('Harry Potter and the Goblet of Fire (2005)', 0.38075554624164615),
 ('Harry Potter and the Half-Blood Prince (2009)', 0.45479317296940724),
 ('Shrek (2001)', 0.45592569618602313),
 ('Spider-Man (2002)', 0.4586580429333544),
 ('Monsters, Inc. (2001)', 0.4603652918276173),
 ('Pirates of the Caribbean: The Curse of the Black Pearl (2003)',
  0.464674876965993),
 ('Harry Potter and the Order of the Phoenix (2007)', 0.4678041741025246),
 ('Ice Age (2002)', 0.46904643164204546)]

In [84]:
nearestNeighbor("Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)","cosine")[1:11]

[('Harry Potter and the Chamber of Secrets (2002)', 0.22043861780495633),
 ('Harry Potter and the Goblet of Fire (2005)', 0.2733029405065126),
 ('Harry Potter and the Prisoner of Azkaban (2004)', 0.2743422887231457),
 ('Harry Potter and the Half-Blood Prince (2009)', 0.38344923516638296),
 ('Shrek (2001)', 0.3880149107885582),
 ('Harry Potter and the Order of the Phoenix (2007)', 0.40740511473501595),
 ('Harry Potter and the Deathly Hallows: Part 1 (2010)', 0.41244404564481796),
 ('Monsters, Inc. (2001)', 0.4172153312267154),
 ('Pirates of the Caribbean: The Curse of the Black Pearl (2003)',
  0.42040375223887283),
 ('Spider-Man (2002)', 0.4206266548377473)]