# Recommender Systems

Basic recommendation systems using Python and pandas. 

In this notebook, focus is on providing a basic recommendation system by suggesting items that are most similar to a particular item, in this case, movies. This is not a true robust recommendation system, to describe it more accurately,it just tells you what movies/items are most similar to your movie choice.


In [1]:
# Import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [27]:
# Creating list of column names
columns_names= ['user_id','item_id','rating','timestamp']

In [28]:
# Reading u data File 
df=pd.read_csv('./Data/u.data',sep='\t',names=columns_names)

In [29]:
# To view first 5 rows of our dataset
df.head()

Unnamed: 0,user_id,item_id,rating,timestamp
0,0,50,5,881250949
1,0,172,5,881250949
2,0,133,1,881250949
3,196,242,3,881250949
4,186,302,3,891717742


In [30]:
# Reading Movie_Id_Titles data File 
movie_titles =pd.read_csv('./Data/Movie_Id_Titles.csv')

In [31]:
# To view first 5 rows of our dataset
movie_titles.head()

Unnamed: 0,item_id,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)


In [32]:
# Merging both the files
df = pd.merge(df,movie_titles,on='item_id')

df.head()

Unnamed: 0,user_id,item_id,rating,timestamp,title
0,0,50,5,881250949,Star Wars (1977)
1,290,50,5,880473582,Star Wars (1977)
2,79,50,4,891271545,Star Wars (1977)
3,2,50,5,888552084,Star Wars (1977)
4,8,50,5,879362124,Star Wars (1977)


In [33]:
# Using groupby and checking descriptive statistics
df.groupby('title')['rating'].mean().sort_values(ascending=False).head()

title
They Made Me a Criminal (1939)                5.0
Marlene Dietrich: Shadow and Light (1996)     5.0
Saint of Fort Washington, The (1993)          5.0
Someone Else's America (1995)                 5.0
Star Kid (1997)                               5.0
Name: rating, dtype: float64

In [34]:
# Using groupby and checking descriptive statistics
df.groupby('title')['rating'].count().sort_values(ascending=False).head()

title
Star Wars (1977)             584
Contact (1997)               509
Fargo (1996)                 508
Return of the Jedi (1983)    507
Liar Liar (1997)             485
Name: rating, dtype: int64

In [35]:
# Using groupby and checking descriptive statistics
ratings = pd.DataFrame(df.groupby('title')['rating'].mean(),df.groupby('title')['rating'].count())

ratings.head()

Unnamed: 0_level_0,rating
rating,Unnamed: 1_level_1
9,
5,
109,
125,
41,


In [36]:
ratings['num of ratings']=pd.DataFrame(df.groupby('title')['rating'].count())

ratings.head()

Unnamed: 0_level_0,rating,num of ratings
rating,Unnamed: 1_level_1,Unnamed: 2_level_1
9,,
5,,
109,,
125,,
41,,


In [37]:
# Using pivot table to get rating of movies by single user in one row
movie_mat = df.pivot_table(index='user_id',columns='title',values='rating')

movie_mat.head()

title,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,,,,,,,,,,,...,,,,,,,,,,
1,,,2.0,5.0,,,3.0,4.0,,,...,,,,5.0,3.0,,,,4.0,
2,,,,,,,,,1.0,,...,,,,,,,,,,
3,,,,,2.0,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,


In [38]:
starwars_user_ratings =movie_mat['Star Wars (1977)']
liarliar_user_ratings =movie_mat['Liar Liar (1997)']

starwars_user_ratings.head()

user_id
0    5.0
1    5.0
2    5.0
3    NaN
4    5.0
Name: Star Wars (1977), dtype: float64

In [39]:
# Usiing correlation function to get correlation of a moview rating with other
similar_to_starwars = movie_mat.corrwith(starwars_user_ratings)

  c = cov(x, y, rowvar)
  c *= np.true_divide(1, fact)


In [40]:
# Usiing correlation function to get correlation of a moview rating with other
similar_to_liarliar =movie_mat.corrwith(liarliar_user_ratings)

In [41]:
# Usiing correlation function to get correlation of a moview rating with other
corr_starwars =pd.DataFrame(similar_to_starwars,columns=['correlation'])

In [42]:
corr_starwars.dropna(inplace=True)

corr_starwars.head()

Unnamed: 0_level_0,correlation
title,Unnamed: 1_level_1
'Til There Was You (1997),0.872872
1-900 (1994),-0.645497
101 Dalmatians (1996),0.211132
12 Angry Men (1957),0.184289
187 (1997),0.027398


In [26]:
corr_starwars = corr_starwars.join(ratings['num of ratings'])

ValueError: columns overlap but no suffix specified: Index(['num of ratings'], dtype='object')

In [23]:
corr_starwars

Unnamed: 0,correlation,num of ratings
'Til There Was You (1997),0.872872,
1-900 (1994),-0.645497,
101 Dalmatians (1996),0.211132,
12 Angry Men (1957),0.184289,
187 (1997),0.027398,
...,...,...
Young Guns (1988),0.186377,
Young Guns II (1990),0.228615,
"Young Poisoner's Handbook, The (1995)",-0.007374,
Zeus and Roxanne (1997),0.818182,


In [22]:
# Minimum rating by atleast 10 users
corr_starwars[corr_starwars['num of ratings']>10].sort_values('correlation',ascending=False).head()

Unnamed: 0,correlation,num of ratings
