# Introduction
Recommender Systems:
1. Rating-user Based Recommender Systems
2. Tag-user Based Recommender Systems

**The purpose of a recommender system is to suggest new items that a person has not seen before.**

We will use collaborative filtering to make the recommendations.

**Collaborative filtering means recommending items based on a combination of your own experience and the experiences of other people.**
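
To make the idea concrete, here is a minimal sketch on made-up data. The user names, ratings, and the helpers `taste_gap` and `recommend_for` are all hypothetical. It uses the user-based flavour of collaborative filtering (find the person whose taste is closest to yours, then suggest what they rated and you have not), which is only one way to apply the definition above.

```python
# Toy ratings: user -> {movie: rating}. All values are invented for illustration.
ratings = {
    "ann": {"Heat (1995)": 5.0, "Sabrina (1995)": 1.0, "GoldenEye (1995)": 4.0},
    "bob": {"Heat (1995)": 5.0, "Sabrina (1995)": 1.0, "Jumanji (1995)": 5.0},
    "cat": {"Heat (1995)": 1.0, "Sabrina (1995)": 5.0, "Toy Story (1995)": 4.0},
}

def taste_gap(user_a, user_b):
    """Mean absolute rating difference over the movies both users rated."""
    shared = ratings[user_a].keys() & ratings[user_b].keys()
    if not shared:
        return float("inf")
    return sum(abs(ratings[user_a][m] - ratings[user_b][m]) for m in shared) / len(shared)

def recommend_for(user):
    """Suggest movies rated by the most similar other user that `user` has not rated."""
    others = [u for u in ratings if u != user]
    closest = min(others, key=lambda u: taste_gap(user, u))
    return sorted(m for m in ratings[closest] if m not in ratings[user])

print(recommend_for("ann"))
```

Here "ann" and "bob" agree on both movies they have in common, so the one movie only "bob" has rated is what gets suggested to "ann".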

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.

['tag.csv', 'rating.csv', 'movie.csv', 'genome_scores.csv', 'link.csv', 'genome_tags.csv']


Importing the necessary datasets

In [2]:
#importing the genome_scores dataset to get the tag scores
scores=pd.read_csv('../input/genome_scores.csv')
scores.columns

Index(['movieId', 'tagId', 'relevance'], dtype='object')

In [3]:
#importing the link dataset to get the IMDb and TMDb IDs of each movie
link=pd.read_csv('../input/link.csv')
link.columns

Index(['movieId', 'imdbId', 'tmdbId'], dtype='object')

In [4]:
# we need all the columns
scores.head()

Unnamed: 0,movieId,tagId,relevance
0,1,1,0.025
1,1,2,0.025
2,1,3,0.05775
3,1,4,0.09675
4,1,5,0.14675


In [5]:
# import movie data set and look at columns
movie = pd.read_csv("../input/movie.csv")
movie.columns

Index(['movieId', 'title', 'genres'], dtype='object')

In [6]:
# we only need the movie ID and the title
movie = movie.loc[:,["movieId","title"]]
movie.head(10)

Unnamed: 0,movieId,title
0,1,Toy Story (1995)
1,2,Jumanji (1995)
2,3,Grumpier Old Men (1995)
3,4,Waiting to Exhale (1995)
4,5,Father of the Bride Part II (1995)
5,6,Heat (1995)
6,7,Sabrina (1995)
7,8,Tom and Huck (1995)
8,9,Sudden Death (1995)
9,10,GoldenEye (1995)


In [7]:
#importing rating data set
rating = pd.read_csv("../input/rating.csv")
rating.columns

Index(['userId', 'movieId', 'rating', 'timestamp'], dtype='object')

In [8]:
# we only need the user ID, the movie ID and the rating
rating = rating.loc[:,["userId","movieId","rating"]]
rating.head(10)

Unnamed: 0,userId,movieId,rating
0,1,2,3.5
1,1,29,3.5
2,1,32,3.5
3,1,47,3.5
4,1,50,3.5
5,1,112,3.5
6,1,151,4.0
7,1,223,4.0
8,1,253,4.0
9,1,260,4.0


*1.Starting with* **rating-user recommendation system.**

In [9]:
# then merge movie and rating data
data = pd.merge(movie,rating,on='movieId')

In [10]:
# now lets look at our data 
data.head()

Unnamed: 0,movieId,title,userId,rating
0,1,Toy Story (1995),3,4.0
1,1,Toy Story (1995),6,5.0
2,1,Toy Story (1995),8,4.0
3,1,Toy Story (1995),10,4.0
4,1,Toy Story (1995),11,4.5


In [11]:
data.shape

(20000263, 4)

That is a huge amount of data, and working with all of it would take a lot of time. So we will keep only the data that helps us recommend and drop whatever takes up unnecessary space.

There are a lot of user IDs here. We will keep only the users who have rated more than 300 movies.

**Reason -** A user who has rated many movies has probably seen a wide range of them, so each of these users gives us more information to compare movies with.


*This filter shrinks the data considerably and should not cause problems for the analysis and predictions.*
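
The filtering idea can be sketched on a toy table. This is only an illustration: the data is invented, the names `toy`, `min_ratings` and `active` are hypothetical, and the threshold is 2 instead of 300 so that the effect is visible on six rows.

```python
import pandas as pd

# Toy stand-in for the merged ratings table; values are invented.
toy = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 3, 3],
    "movieId": [10, 20, 30, 10, 20, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})

min_ratings = 2  # hypothetical threshold; the notebook uses 300

# For every row, count how many ratings that row's user has given.
ratings_per_user = toy.groupby("userId")["rating"].transform("count")

# Keep only the rows that belong to users above the threshold.
active = toy[ratings_per_user > min_ratings]
print(active)
```

Only user 1 has more than two ratings, so only that user's three rows survive.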

In [12]:
#getting the number of ratings given by each user

# naming the index and the count column explicitly works the same on old and new pandas versions
a=data['userId'].value_counts().rename_axis('userId').reset_index(name='count')
a

Unnamed: 0,userId,count
0,118205,9254
1,8405,7515
2,82418,5646
3,121535,5520
4,125794,5491
5,74142,5447
6,34576,5356
7,131904,5330
8,83090,5169
9,59477,4988


In [13]:
a.shape

(138493, 2)

In [14]:
#we will keep only the users who have rated more than 300 movies

a = a[a['count']>300]
a.shape

(16184, 2)

In [15]:
#we will keep only the selected users in our analysis

data = data[data['userId'].isin(a['userId'])]
data.shape

(9891292, 4)

Our data is now reduced from about 20 million rows to about 9.9 million.

Now we can move on to the recommendations.

In [16]:
# let's make a pivot table so that recommending is easier
ptable = data.pivot_table(index = ["movieId"],columns = ["userId"],values = "rating")
ptable.head()

userId,11,24,54,58,91,96,104,116,131,132,133,134,137,147,156,208,215,220,245,247,251,258,271,278,283,294,295,298,309,311,318,337,347,348,359,367,370,375,387,388,...,138166,138170,138177,138186,138191,138196,138200,138201,138202,138205,138208,138211,138254,138259,138267,138270,138285,138289,138295,138301,138307,138317,138325,138335,138343,138382,138386,138387,138397,138404,138406,138411,138414,138436,138437,138454,138456,138472,138474,138493
movieId
1,4.5,4.0,4.0,5.0,4.0,3.5,,3.0,2.0,,4.0,4.0,4.0,4.5,5.0,4.0,2.0,4.0,,,4.0,4.0,1.5,,,4.5,4.0,4.0,4.0,,5.0,4.5,4.0,,5.0,3.0,4.5,4.0,,2.5,...,4.0,5.0,4.0,,2.5,4.0,3.0,5.0,,4.5,3.0,4.0,4.0,4.0,4.0,,,1.5,3.5,2.5,3.5,3.0,5.0,,,3.0,3.5,2.5,,5.0,4.0,5.0,,3.5,4.0,5.0,1.0,3.0,5.0,3.5
2,,,3.0,,3.5,,,2.0,1.0,3.0,,,3.0,,5.0,,,3.0,,,4.0,,2.5,3.5,,4.5,3.0,3.0,4.0,,,,2.0,,,2.0,4.0,,,1.5,...,,4.5,4.0,,3.5,,,,3.5,,2.0,,3.5,,,,,,2.0,2.5,2.5,3.0,3.0,0.5,2.5,4.0,,3.5,,2.5,3.0,,,,,,3.0,,4.0,4.0
3,,,,,3.0,4.0,,2.0,,,,,,,2.0,,,,,,5.0,,3.5,,,,,3.0,3.0,,,,,,,3.0,,,,,...,,,,,,,,4.0,,,2.0,,2.0,,,,,,,,,,,,,3.0,,3.5,5.0,,,,,,,,,,,
4,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,3.0,,3.0,,,,,,,,,,,,...,,,,,,3.0,,,,,2.0,4.5,,,,1.0,,,,,,,,,,4.0,,,,,,,,,,,,,,
5,,2.0,3.0,,,,,,,,,,,,3.0,,,,,,4.0,,,,,2.5,3.0,3.0,,,,,,,,3.0,,,,,...,,4.5,,,,3.0,,,2.5,,2.0,3.5,,,,,,,,,,,,,,,,2.5,,,,,,,,,,,4.0,


**We will fill all the NaN values with 0**

In [17]:
ptable=ptable.fillna(0)

In [18]:
ptable.head()

userId,11,24,54,58,91,96,104,116,131,132,133,134,137,147,156,208,215,220,245,247,251,258,271,278,283,294,295,298,309,311,318,337,347,348,359,367,370,375,387,388,...,138166,138170,138177,138186,138191,138196,138200,138201,138202,138205,138208,138211,138254,138259,138267,138270,138285,138289,138295,138301,138307,138317,138325,138335,138343,138382,138386,138387,138397,138404,138406,138411,138414,138436,138437,138454,138456,138472,138474,138493
movieId
1,4.5,4.0,4.0,5.0,4.0,3.5,0.0,3.0,2.0,0.0,4.0,4.0,4.0,4.5,5.0,4.0,2.0,4.0,0.0,0.0,4.0,4.0,1.5,0.0,0.0,4.5,4.0,4.0,4.0,0.0,5.0,4.5,4.0,0.0,5.0,3.0,4.5,4.0,0.0,2.5,...,4.0,5.0,4.0,0.0,2.5,4.0,3.0,5.0,0.0,4.5,3.0,4.0,4.0,4.0,4.0,0.0,0.0,1.5,3.5,2.5,3.5,3.0,5.0,0.0,0.0,3.0,3.5,2.5,0.0,5.0,4.0,5.0,0.0,3.5,4.0,5.0,1.0,3.0,5.0,3.5
2,0.0,0.0,3.0,0.0,3.5,0.0,0.0,2.0,1.0,3.0,0.0,0.0,3.0,0.0,5.0,0.0,0.0,3.0,0.0,0.0,4.0,0.0,2.5,3.5,0.0,4.5,3.0,3.0,4.0,0.0,0.0,0.0,2.0,0.0,0.0,2.0,4.0,0.0,0.0,1.5,...,0.0,4.5,4.0,0.0,3.5,0.0,0.0,0.0,3.5,0.0,2.0,0.0,3.5,0.0,0.0,0.0,0.0,0.0,2.0,2.5,2.5,3.0,3.0,0.5,2.5,4.0,0.0,3.5,0.0,2.5,3.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,4.0,4.0
3,0.0,0.0,0.0,0.0,3.0,4.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,5.0,0.0,3.5,0.0,0.0,0.0,0.0,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,2.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,3.5,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,2.0,4.5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,2.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,2.5,3.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,...,0.0,4.5,0.0,0.0,0.0,3.0,0.0,0.0,2.5,0.0,2.0,3.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0


* As can be seen from the table above, the rows are movie IDs, the columns are user IDs and the values are ratings.
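
The reshaping step can be seen more easily on a tiny example. This is a minimal sketch with invented ratings; `long_ratings` and `wide` are hypothetical names.

```python
import pandas as pd

# Long-format toy ratings: one row per (movie, user) pair; values are invented.
long_ratings = pd.DataFrame({
    "movieId": [1, 1, 2],
    "userId":  [10, 11, 10],
    "rating":  [4.0, 5.0, 3.0],
})

# One row per movie, one column per user; missing pairs become NaN.
wide = long_ratings.pivot_table(index="movieId", columns="userId", values="rating")

# Replace NaN with 0 so every movie is a complete numeric vector.
wide = wide.fillna(0)
print(wide)
```

One thing to keep in mind: after `fillna(0)`, a 0 means "this user did not rate this movie", but any distance computed on the table treats it as an ordinary number.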

**Getting inside my algorithm**

In [19]:
#importing the necessary package

from sklearn.neighbors import NearestNeighbors

In [20]:
model=NearestNeighbors(algorithm='brute')

In [21]:
model.fit(ptable)

NearestNeighbors(algorithm='brute', leaf_size=30, metric='minkowski',
         metric_params=None, n_jobs=1, n_neighbors=5, p=2, radius=1.0)

Function to recommend movies

In [22]:
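def recommends(movie_id):
    # the query movie itself comes back as the nearest neighbour, so 16 neighbours give 15 recommendations
    distances,suggestions=model.kneighbors(ptable.loc[movie_id,:].values.reshape(1,-1),n_neighbors=16)
    # kneighbors returns row positions; map them back to movie IDs
    return ptable.iloc[suggestions[0]].index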
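
What `kneighbors` hands back is easier to follow on a small table. The sketch below uses invented ratings and hypothetical names (`toy_table`, `toy_model`, `neighbour_ids`); it mirrors the steps of the function above but is not the notebook's model.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy movie-by-user table (rows: movieId, columns: userId); values are invented.
toy_table = pd.DataFrame(
    [[5.0, 4.0, 0.0],
     [5.0, 5.0, 0.0],
     [0.0, 1.0, 5.0],
     [1.0, 0.0, 4.0]],
    index=pd.Index([10, 20, 30, 40], name="movieId"),
    columns=pd.Index([1, 2, 3], name="userId"),
)

toy_model = NearestNeighbors(algorithm="brute")
toy_model.fit(toy_table.to_numpy())

# Query with the rating vector of movie 10, shaped as a single-row 2-D array.
query = toy_table.loc[10].to_numpy().reshape(1, -1)
distances, positions = toy_model.kneighbors(query, n_neighbors=3)

# kneighbors returns row positions, so map them back to movie IDs.
neighbour_ids = toy_table.index[positions[0]]
print(list(neighbour_ids))
print(np.round(distances[0], 2))
```

The first neighbour is movie 10 itself at distance 0, which is why the first entry of the result is skipped when printing recommendations.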

In [23]:
l=movie[movie['movieId'].isin(ptable.index)]

Checking for the movie 'Avengers'

In [24]:
l[l['title'].str.contains('avengers',case=False)]

Unnamed: 0,movieId,title
2069,2153,"Avengers, The (1998)"
10872,44020,Ultimate Avengers (2006)
17874,89745,"Avengers, The (2012)"
23096,110132,Avengers Confidential: Black Widow & Punisher ...
24424,115727,Crippled Avengers (Can que) (Return of the 5 D...


We take the movie ID from the table above and pass it to the recommender.

In [25]:
recommendation=recommends(89745)

In [26]:
#getting the IDs of the recommended movies

recommendation

Int64Index([ 89745,  91529,  87232,  86332,  88140,  91500, 102125,  95510,
             77561,  96610,  98809,  96079,  94864,  91542,  91630, 102445],
           dtype='int64', name='movieId')

In [27]:
#getting the movie names from their IDs

for movie_id  in recommendation[1:]:
    print(movie[movie['movieId']==movie_id]['title'].values[0])

Dark Knight Rises, The (2012)
X-Men: First Class (2011)
Thor (2011)
Captain America: The First Avenger (2011)
Hunger Games, The (2012)
Iron Man 3 (2013)
Amazing Spider-Man, The (2012)
Iron Man 2 (2010)
Looper (2012)
Hobbit: An Unexpected Journey, The (2012)
Skyfall (2012)
Prometheus (2012)
Sherlock Holmes: A Game of Shadows (2011)
Mission: Impossible - Ghost Protocol (2011)
Star Trek Into Darkness (2013)
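
The ID-to-title lookup can also be done in one step instead of filtering the movie table once per ID. This is a sketch on a toy slice of the movie table; `toy_movie`, `title_by_id` and `recommended_ids` are hypothetical names, and the ID list is made up.

```python
import pandas as pd

# Toy slice of the movie table, using the first three rows shown earlier.
toy_movie = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
})

# Build the lookup once: a Series of titles indexed by movieId.
title_by_id = toy_movie.set_index("movieId")["title"]

# Hypothetical recommendation result, in ranked order.
recommended_ids = [3, 1]

# .loc with a list keeps the order of the IDs that were asked for.
titles = title_by_id.loc[recommended_ids].tolist()
print(titles)
```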


We will check for Harry Potter now

In [28]:
l[l['title'].str.contains('harry potter',case=False)]

Unnamed: 0,movieId,title
4800,4896,Harry Potter and the Sorcerer's Stone (a.k.a. ...
5717,5816,Harry Potter and the Chamber of Secrets (2002)
7769,8368,Harry Potter and the Prisoner of Azkaban (2004)
10600,40815,Harry Potter and the Goblet of Fire (2005)
11974,54001,Harry Potter and the Order of the Phoenix (2007)
13935,69844,Harry Potter and the Half-Blood Prince (2009)
16191,81834,Harry Potter and the Deathly Hallows: Part 1 (...
17499,88125,Harry Potter and the Deathly Hallows: Part 2 (...


In [29]:
recommendation=recommends(4896)

#getting the movie names from their IDs

for movie_id  in recommendation[1:]:
    print(movie[movie['movieId']==movie_id]['title'].values[0])

Harry Potter and the Chamber of Secrets (2002)
Harry Potter and the Prisoner of Azkaban (2004)
Harry Potter and the Goblet of Fire (2005)
Spider-Man (2002)
Star Wars: Episode II - Attack of the Clones (2002)
Monsters, Inc. (2001)
Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Matrix Reloaded, The (2003)
Shrek 2 (2004)
X2: X-Men United (2003)
Men in Black II (a.k.a. MIIB) (a.k.a. MIB 2) (2002)
Spider-Man 2 (2004)
Shrek (2001)
Ice Age (2002)
Finding Nemo (2003)


*2.Starting with* **tag-user recommendation system.**

We repeat the same procedure here, using the tag relevance scores instead of the ratings.

In [30]:
scores.head()

Unnamed: 0,movieId,tagId,relevance
0,1,1,0.025
1,1,2,0.025
2,1,3,0.05775
3,1,4,0.09675
4,1,5,0.14675


In [31]:
scores.shape

(11709768, 3)

In [32]:
movie_tag_pivot=pd.pivot_table(columns='tagId',index='movieId',values='relevance',data=scores)

In [33]:
movie_tag_pivot

tagId,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,...,1089,1090,1091,1092,1093,1094,1095,1096,1097,1098,1099,1100,1101,1102,1103,1104,1105,1106,1107,1108,1109,1110,1111,1112,1113,1114,1115,1116,1117,1118,1119,1120,1121,1122,1123,1124,1125,1126,1127,1128
movieId
1,0.02500,0.02500,0.05775,0.09675,0.14675,0.21700,0.06700,0.26275,0.26200,0.03200,0.57700,0.11625,0.18800,0.00800,0.03675,0.28175,0.00700,0.11050,0.67050,0.18450,0.33025,0.28250,0.05700,0.01550,0.08500,0.08100,0.19500,0.07150,0.89200,0.67625,0.03875,0.22800,0.40200,0.03875,0.02675,0.33025,0.10100,0.01250,0.01850,0.01425,...,0.10850,0.60425,0.41050,0.44500,0.33725,0.02175,0.04075,0.06250,0.04375,0.10075,0.00475,0.19225,0.25850,0.00900,0.02775,0.29925,0.04675,0.01025,0.02725,0.73700,0.11200,0.05125,0.04375,0.05350,0.12575,0.77675,0.14500,0.11275,0.04200,0.10250,0.03950,0.01800,0.04575,0.03275,0.12500,0.04150,0.01925,0.03625,0.07775,0.02300
2,0.03975,0.04375,0.03775,0.04800,0.11025,0.07250,0.04775,0.10975,0.09925,0.02050,0.06775,0.08900,0.22575,0.00625,0.00300,0.03475,0.00950,0.18975,0.64600,0.40025,0.45100,0.60200,0.15100,0.28100,0.07600,0.14350,0.11675,0.04350,0.98100,0.10550,0.00825,0.06650,0.08575,0.05300,0.04525,0.14650,0.07750,0.02900,0.02275,0.02475,...,0.08925,0.32475,0.19125,0.32550,0.17675,0.02650,0.03500,0.04125,0.02775,0.04750,0.00575,0.04700,0.15700,0.01425,0.02400,0.31600,0.08850,0.02375,0.01300,0.24450,0.10075,0.03425,0.02475,0.26450,0.39025,0.18000,0.18725,0.14750,0.01500,0.05700,0.04175,0.01925,0.01725,0.02425,0.12550,0.02250,0.01550,0.01475,0.09025,0.01875
3,0.04350,0.05475,0.02800,0.07700,0.05400,0.06850,0.05600,0.18500,0.04925,0.02675,0.02225,0.07100,0.09050,0.00475,0.00250,0.02875,0.01175,0.10400,0.16475,0.17375,0.28000,0.20775,0.08675,0.01700,0.04250,0.08375,0.12225,0.10800,0.41200,0.09525,0.00900,0.04750,0.06475,0.05925,0.02600,0.20950,0.04375,0.02350,0.05675,0.01300,...,0.10525,0.11400,0.21250,0.16575,0.03550,0.02450,0.04925,0.03825,0.03400,0.07225,0.01125,0.03875,0.13875,0.41600,0.04175,0.16800,0.06175,0.02050,0.04275,0.15525,0.17250,0.04275,0.04450,0.02325,0.03375,0.19950,0.02825,0.37075,0.02625,0.07325,0.04150,0.02675,0.02775,0.03425,0.15550,0.03675,0.01700,0.01950,0.09700,0.01850
4,0.03725,0.03950,0.03675,0.03100,0.06825,0.04050,0.02325,0.08700,0.05125,0.03025,0.02125,0.03325,0.11750,0.01075,0.00275,0.02875,0.01400,0.16875,0.13950,0.12900,0.49875,0.44650,0.09050,0.01525,0.06175,0.06175,0.08850,0.55575,0.13800,0.10575,0.01050,0.03925,0.07950,0.05300,0.01725,0.05650,0.09025,0.01275,0.01125,0.00550,...,0.09350,0.12250,0.18575,0.18000,0.04800,0.01625,0.01675,0.03625,0.02050,0.05800,0.01700,0.03025,0.31475,0.09275,0.02825,0.17850,0.06750,0.02875,0.02975,0.16875,0.06225,0.16425,0.03925,0.01875,0.03400,0.10725,0.02825,0.97675,0.01800,0.28825,0.05750,0.03375,0.02275,0.03975,0.18525,0.05925,0.01500,0.01525,0.06450,0.01300
5,0.04200,0.05275,0.05925,0.03675,0.07525,0.12525,0.02850,0.08500,0.02950,0.02875,0.03125,0.06150,0.07175,0.00650,0.00225,0.02775,0.01425,0.16100,0.16350,0.22650,0.27575,0.21075,0.10275,0.02250,0.04150,0.05100,0.19200,0.17150,0.15375,0.12225,0.01000,0.04175,0.07950,0.04050,0.03175,0.07250,0.03400,0.01925,0.04200,0.00600,...,0.10825,0.14700,0.18750,0.22100,0.08950,0.01350,0.04075,0.03550,0.03075,0.06625,0.00775,0.03850,0.13150,0.80675,0.02650,0.26975,0.05575,0.02100,0.02175,0.20825,0.11850,0.14150,0.03875,0.03475,0.04675,0.12000,0.02925,0.48900,0.02150,0.07450,0.04250,0.02825,0.02150,0.02600,0.14275,0.02075,0.01650,0.01675,0.10750,0.01825
6,0.02825,0.02550,0.01850,0.04550,0.09575,0.05500,0.04400,0.24200,0.12850,0.02550,0.01550,0.04400,0.08000,0.00675,0.00350,0.03225,0.01250,0.18250,0.92475,0.60200,0.33625,0.24075,0.04125,0.01150,0.21450,0.06775,0.07200,0.26600,0.08050,0.25275,0.01850,0.03300,0.20750,0.00975,0.02025,0.07000,0.06700,0.02250,0.11575,0.00650,...,0.84025,0.35350,0.32900,0.40500,0.07050,0.02225,0.04600,0.04125,0.06275,0.06550,0.00775,0.05675,0.67900,0.02050,0.03050,0.22875,0.06900,0.01725,0.02875,0.08600,0.07000,0.10675,0.04675,0.00900,0.02550,0.40750,0.03375,0.10925,0.06375,0.18800,0.04900,0.01825,0.02075,0.06000,0.29975,0.15525,0.03525,0.01950,0.06650,0.01900
7,0.04575,0.05275,0.16675,0.08275,0.11450,0.15625,0.05025,0.11175,0.03950,0.08000,0.02250,0.08800,0.05725,0.00600,0.00275,0.02700,0.01600,0.13250,0.13350,0.14100,0.37800,0.16625,0.06050,0.02125,0.06625,0.05975,0.06150,0.15500,0.22850,0.09350,0.00825,0.02900,0.07150,0.04200,0.01800,0.11100,0.06575,0.03325,0.03875,0.00800,...,0.09175,0.19025,0.20575,0.21175,0.06825,0.01475,0.03400,0.04500,0.04200,0.07025,0.03675,0.02175,0.12625,0.35525,0.01825,0.20725,0.06375,0.02225,0.02075,0.44500,0.16875,0.41125,0.15875,0.03350,0.06575,0.28225,0.02975,0.44475,0.04550,0.06950,0.03750,0.02825,0.01200,0.03575,0.13000,0.04875,0.01975,0.01050,0.10925,0.01850
8,0.03075,0.03550,0.04675,0.02175,0.05600,0.03650,0.01675,0.07325,0.02950,0.04100,0.03025,0.02125,0.08000,0.00525,0.00425,0.04475,0.01050,0.12175,0.49425,0.54100,0.51375,0.78525,0.13025,0.01975,0.05275,0.13700,0.37100,0.05175,0.96325,0.09625,0.00800,0.02400,0.03825,0.22475,0.01450,0.04775,0.03325,0.00550,0.00475,0.02375,...,0.05250,0.15275,0.17500,0.21625,0.18400,0.01050,0.04275,0.01700,0.01950,0.05350,0.01575,0.02250,0.14325,0.01450,0.05725,0.16475,0.08150,0.02200,0.01925,0.14450,0.11050,0.05350,0.02425,0.05825,0.10275,0.09550,0.02825,0.37825,0.01025,0.03875,0.03700,0.01925,0.01625,0.02325,0.20975,0.02825,0.01675,0.01125,0.07000,0.01500
9,0.03500,0.04050,0.01825,0.01800,0.03650,0.01750,0.01300,0.04225,0.01675,0.01525,0.02225,0.02875,0.09950,0.01050,0.00200,0.01975,0.01050,0.11750,0.97450,0.93750,0.23525,0.18875,0.10400,0.20475,0.04325,0.05600,0.04475,0.03900,0.15150,0.06725,0.00650,0.02400,0.04200,0.02600,0.01450,0.01675,0.03000,0.02500,0.00775,0.00725,...,0.12000,0.11475,0.11250,0.10675,0.03700,0.01125,0.03725,0.02575,0.02125,0.05000,0.01000,0.01475,0.46925,0.01225,0.02250,0.13500,0.07125,0.03575,0.01350,0.11625,0.06525,0.01750,0.02275,0.01625,0.03775,0.07625,0.02275,0.14850,0.01300,0.04950,0.02225,0.01075,0.01175,0.01525,0.14100,0.02225,0.01100,0.00700,0.07275,0.01550
10,0.99975,0.99975,0.01950,0.03675,0.06675,0.05450,0.04550,0.12950,0.08550,0.01925,0.02150,0.07225,0.23225,0.01100,0.00350,0.02975,0.01350,0.28375,0.96600,0.60500,0.26025,0.19550,0.10800,0.01900,0.05625,0.07250,0.06150,0.05375,0.82725,0.08575,0.00675,0.03650,0.09025,0.03025,0.02200,0.08575,0.09450,0.07275,0.03625,0.00900,...,0.11000,0.20500,0.22400,0.12825,0.04750,0.11750,0.02975,0.04775,0.04225,0.12875,0.01650,0.12850,0.69050,0.03000,0.03075,0.22575,0.07350,0.02100,0.02450,0.10625,0.11750,0.08325,0.02150,0.01550,0.03850,0.26050,0.04150,0.19075,0.02950,0.07225,0.46750,0.02325,0.02150,0.03125,0.18400,0.03750,0.01775,0.01775,0.07300,0.01825


In [34]:
movie_tag_pivot.fillna(0,inplace=True)

In [35]:
model1=NearestNeighbors(algorithm='brute')

In [36]:
model1.fit(movie_tag_pivot)

NearestNeighbors(algorithm='brute', leaf_size=30, metric='minkowski',
         metric_params=None, n_jobs=1, n_neighbors=5, p=2, radius=1.0)

In [37]:
def recommend(movie_id):
    # same idea as recommends(), but on the movie-by-tag relevance table
    distances,suggestions=model1.kneighbors(movie_tag_pivot.loc[movie_id,:].values.reshape(1,-1),n_neighbors=16)
    return movie_tag_pivot.iloc[suggestions[0]].index

In [38]:
#we will merge the movie and link datasets now

movie = pd.merge(movie,link,on='movieId')

In [39]:
scores_movie=movie[movie['movieId'].isin(movie_tag_pivot.index)]

Recommendations for the movie 'Avengers'

In [40]:
scores_movie[scores_movie['title'].str.contains('avengers',case=False)]

Unnamed: 0,movieId,title,imdbId,tmdbId
2069,2153,"Avengers, The (1998)",118661,9320.0
10872,44020,Ultimate Avengers (2006),491703,14609.0
17874,89745,"Avengers, The (2012)",848228,24428.0


In [41]:
recommendations=recommend(89745)

In [42]:
recommendations

Int64Index([ 89745,  59315, 112852,  86332, 110102, 103042,   3793,  87232,
              6333, 102125, 106072,   5349,   8636,  77561, 111362,  88140],
           dtype='int64', name='movieId')

In [43]:
for movie_id  in recommendations[1:]:
    print(movie[movie['movieId']==movie_id]['title'].values[0])

Iron Man (2008)
Guardians of the Galaxy (2014)
Thor (2011)
Captain America: The Winter Soldier (2014)
Man of Steel (2013)
X-Men (2000)
X-Men: First Class (2011)
X2: X-Men United (2003)
Iron Man 3 (2013)
Thor: The Dark World (2013)
Spider-Man (2002)
Spider-Man 2 (2004)
Iron Man 2 (2010)
X-Men: Days of Future Past (2014)
Captain America: The First Avenger (2011)


In [44]:
scores_movie[scores_movie['title'].str.contains('harry potter',case=False)]

Unnamed: 0,movieId,title,imdbId,tmdbId
4800,4896,Harry Potter and the Sorcerer's Stone (a.k.a. ...,241527,671.0
5717,5816,Harry Potter and the Chamber of Secrets (2002),295297,672.0
7769,8368,Harry Potter and the Prisoner of Azkaban (2004),304141,673.0
10600,40815,Harry Potter and the Goblet of Fire (2005),330373,674.0
11974,54001,Harry Potter and the Order of the Phoenix (2007),373889,675.0
13935,69844,Harry Potter and the Half-Blood Prince (2009),417741,767.0
16191,81834,Harry Potter and the Deathly Hallows: Part 1 (...,926084,12444.0


In [45]:
recommendation=recommend(4896)

for movie_id  in recommendation[1:]:
    print(movie[movie['movieId']==movie_id]['title'].values[0])

Harry Potter and the Chamber of Secrets (2002)
Harry Potter and the Prisoner of Azkaban (2004)
Harry Potter and the Goblet of Fire (2005)
Harry Potter and the Order of the Phoenix (2007)
Harry Potter and the Half-Blood Prince (2009)
Harry Potter and the Deathly Hallows: Part 1 (2010)
Spiderwick Chronicles, The (2008)
Chronicles of Narnia: The Lion, the Witch and the Wardrobe, The (2005)
Golden Compass, The (2007)
Chronicles of Narnia: Prince Caspian, The (2008)
Chronicles of Narnia: The Voyage of the Dawn Treader, The (2010)
Eragon (2006)
NeverEnding Story, The (1984)
Inkheart (2008)
Seeker: The Dark Is Rising, The (2007)


# We see that both models recommend more or less the same kind of movies.

(In any case, from a user's perspective, our recommendations look good.)
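
One rough way to put a number on this is to count how many movie IDs the two outputs for "Avengers, The (2012)" have in common. The IDs below are copied from the two results printed earlier; the variable names and the choice of Jaccard overlap as the measure are assumptions made for this sketch.

```python
# Movie IDs copied from the two outputs above for "Avengers, The (2012)" (movieId 89745).
rating_based = [89745, 91529, 87232, 86332, 88140, 91500, 102125, 95510,
                77561, 96610, 98809, 96079, 94864, 91542, 91630, 102445]
tag_based = [89745, 59315, 112852, 86332, 110102, 103042, 3793, 87232,
             6333, 102125, 106072, 5349, 8636, 77561, 111362, 88140]

# Drop the query movie itself, which both lists return in first place.
rating_set = set(rating_based[1:])
tag_set = set(tag_based[1:])

shared = rating_set & tag_set
jaccard = len(shared) / len(rating_set | tag_set)

print(sorted(shared))
print(len(shared), round(jaccard, 2))
```

For this movie, 5 of the 15 recommendations appear in both lists, so the two models agree on the genre more than on the exact titles.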

Model Dumping

In [46]:
import pickle as pkl

In [47]:
#for tag-user recommendation

with open('engine_tu.pkl','wb') as f:
    pkl.dump(model1,f)
with open('movie_tag_pivot_table_tu.pkl','wb') as f:
    pkl.dump(movie_tag_pivot,f)
with open('movie_names_tu.pkl','wb') as f:
    pkl.dump(scores_movie,f)

In [48]:
#one problem remains when dumping the rating-user pivot table: we have seen that this table is huge, so we might run out of space in this IDE.

#so we will dump only the tag-user files.
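
A dumped model is only useful if it can be loaded again. The sketch below shows a save-and-load round trip on a tiny invented feature table; `toy_features`, `toy_engine` and the file name `toy_engine.pkl` are hypothetical and stand in for the real files. It is generally safest to load a pickle with the same library versions that wrote it.

```python
import os
import pickle as pkl
import tempfile

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Tiny stand-in for the tag pivot table; values are invented.
toy_features = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]])
toy_engine = NearestNeighbors(algorithm="brute").fit(toy_features)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_engine.pkl")  # hypothetical file name

    # Save the fitted model ...
    with open(path, "wb") as f:
        pkl.dump(toy_engine, f)

    # ... and load it back, as a separate script or app would.
    with open(path, "rb") as f:
        loaded_engine = pkl.load(f)

# The loaded model should return the same neighbours as the original.
_, before = toy_engine.kneighbors(toy_features[:1], n_neighbors=2)
_, after = loaded_engine.kneighbors(toy_features[:1], n_neighbors=2)
print(before[0], after[0])
```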