# Recommender Systems

A basic recommender system built with Python and pandas.

This notebook builds a basic recommender that suggests the items most similar to a given item, in this case movies. It is not a robust recommender system; more precisely, it tells you which movies are most similar to the movie you pick.
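
To make "most similar" concrete before the real data is loaded, here is a minimal sketch. It assumes that similarity means the Pearson correlation between two movies' rating columns, computed over the users who rated both. The ratings and movie names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings: rows are users, columns are movies, NaN = not rated
toy = pd.DataFrame(
    {
        "Movie A": [5, 4, np.nan, 1, 2],
        "Movie B": [4, 5, 1, 2, np.nan],
        "Movie C": [1, 2, 5, np.nan, 4],
    },
    index=["u1", "u2", "u3", "u4", "u5"],
)

# Correlate every movie's column with Movie A's column.
# Each pair uses only the users who rated both movies.
similarity = toy.corrwith(toy["Movie A"]).sort_values(ascending=False)
print(similarity)
```

Movie B tends to be rated like Movie A and ends up near the top; Movie C is rated the opposite way and ends up at the bottom.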


In [1]:
# Import the libraries
# pandas provides high-performance, easy-to-use data structures and data analysis tools for Python
import pandas as pd

# NumPy is the fundamental package for scientific computing with Python
import numpy as np

In [2]:
# Column names for the ratings file, which has no header row
columns_names = ['user_id', 'item_id', 'rating', 'timestamp']

In [3]:
# Read the tab-separated u.data ratings file
df = pd.read_csv('u.data', sep='\t', names=columns_names)

In [4]:
# View the first 5 rows of the ratings dataset
df.head()

,user_id,item_id,rating,timestamp
0,0,50,5,881250949
1,0,172,5,881250949
2,0,133,1,881250949
3,196,242,3,881250949
4,186,302,3,891717742


In [6]:
# Read the Movie_Id_Titles file, which maps item_id to movie title
movie_titles = pd.read_csv('Movie_Id_Titles')

In [8]:
# View the first 5 rows of the titles dataset
movie_titles.head()

,item_id,title
0,1,Toy Story (1995)
1,2,GoldenEye (1995)
2,3,Four Rooms (1995)
3,4,Get Shorty (1995)
4,5,Copycat (1995)


In [9]:
# Merge ratings with titles on item_id so every rating row carries its movie title
df = pd.merge(df, movie_titles, on='item_id')

In [10]:
df.head()

,user_id,item_id,rating,timestamp,title
0,0,50,5,881250949,Star Wars (1977)
1,290,50,5,880473582,Star Wars (1977)
2,79,50,4,891271545,Star Wars (1977)
3,2,50,5,888552084,Star Wars (1977)
4,8,50,5,879362124,Star Wars (1977)
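
`pd.merge` performs an inner join by default, so a rating whose `item_id` has no matching title would be dropped without notice. The sketch below, on made-up frames, shows one way to check for that using the `how`, `validate`, and `indicator` parameters of `pd.merge`. The toy data and variable names are hypothetical and not part of the notebook.

```python
import pandas as pd

# Hypothetical ratings: item 99 has no entry in the titles table
toy_ratings = pd.DataFrame(
    {"user_id": [1, 2, 3], "item_id": [10, 10, 99], "rating": [5, 4, 3]}
)
toy_titles = pd.DataFrame({"item_id": [10, 20], "title": ["Film X", "Film Y"]})

# Default inner join: the rating for item 99 disappears
inner = pd.merge(toy_ratings, toy_titles, on="item_id")

# Left join keeps every rating; validate checks that each item_id has at most
# one title; indicator adds a _merge column showing where each row came from
left = pd.merge(
    toy_ratings,
    toy_titles,
    on="item_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)
unmatched = left[left["_merge"] == "left_only"]

print(len(inner), len(left))
print(unmatched)
```

If `unmatched` is empty on the real data, the inner join used in the notebook loses nothing.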


In [11]:
# Matplotlib is used for plotting
import matplotlib.pyplot as plt

# Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics
import seaborn as sns


In [12]:
# %matplotlib inline is an IPython magic command that renders matplotlib figures inline in the notebook
%matplotlib inline

In [15]:
# Mean rating per title, highest first
df.groupby('title')['rating'].mean().sort_values(ascending=False).head()

title
Marlene Dietrich: Shadow and Light (1996)     5.0
Prefontaine (1997)                            5.0
Santa with Muscles (1996)                     5.0
Star Kid (1997)                               5.0
Someone Else's America (1995)                 5.0
Name: rating, dtype: float64

In [16]:
# Number of ratings per title, highest first
df.groupby('title')['rating'].count().sort_values(ascending=False).head()

title
Star Wars (1977)             584
Contact (1997)               509
Fargo (1996)                 508
Return of the Jedi (1983)    507
Liar Liar (1997)             485
Name: rating, dtype: int64
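
Sorting by mean alone and by count alone gives two very different top-fives, and a title can reach a 5.0 mean from a single rating. A minimal sketch, on made-up ratings, of looking at both numbers together with `groupby(...).agg` and keeping only titles with enough ratings. The column names `mean_rating` and `n_ratings` and the threshold `min_ratings` are hypothetical.

```python
import pandas as pd

# Hypothetical ratings: one title rated once, another rated four times
toy = pd.DataFrame(
    {
        "title": ["Obscure Film", "Popular Film", "Popular Film",
                  "Popular Film", "Popular Film"],
        "rating": [5, 5, 4, 4, 3],
    }
)

# Mean and count per title in one pass (named aggregation)
summary = toy.groupby("title")["rating"].agg(mean_rating="mean", n_ratings="count")

# Only trust means that rest on at least min_ratings ratings
min_ratings = 3  # illustrative threshold, not taken from the notebook
trusted = summary[summary["n_ratings"] >= min_ratings].sort_values(
    "mean_rating", ascending=False
)

print(summary)
print(trusted)
```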

In [21]:
# Mean rating per title, plus the number of ratings each title received
ratings = pd.DataFrame(df.groupby('title')['rating'].mean())
ratings['num of ratings'] = df.groupby('title')['rating'].count()

In [22]:
ratings.head()

title,rating,num of ratings
'Til There Was You (1997),2.333333,9
1-900 (1994),2.6,5
101 Dalmatians (1996),2.908257,109
12 Angry Men (1957),4.344,125
187 (1997),3.02439,41


In [19]:
# Number of ratings per title, stored alongside the mean rating
ratings['num of ratings'] = df.groupby('title')['rating'].count()

In [20]:
ratings.head()

title,rating,num of ratings
'Til There Was You (1997),2.333333,9
1-900 (1994),2.6,5
101 Dalmatians (1996),2.908257,109
12 Angry Men (1957),4.344,125
187 (1997),3.02439,41


In [23]:
# Pivot to a user-by-title matrix: one row per user, one column per movie, NaN where the user has not rated the movie
movie_mat = df.pivot_table(index='user_id', columns='title', values='rating')

In [24]:
movie_mat.head()

user_id,'Til There Was You (1997),1-900 (1994),101 Dalmatians (1996),12 Angry Men (1957),187 (1997),2 Days in the Valley (1996),"20,000 Leagues Under the Sea (1954)",2001: A Space Odyssey (1968),3 Ninjas: High Noon At Mega Mountain (1998),"39 Steps, The (1935)",...,Yankee Zulu (1994),Year of the Horse (1997),You So Crazy (1994),Young Frankenstein (1974),Young Guns (1988),Young Guns II (1990),"Young Poisoner's Handbook, The (1995)",Zeus and Roxanne (1997),unknown,Á köldum klaka (Cold Fever) (1994)
0,,,,,,,,,,,...,,,,,,,,,,
1,,,2.0,5.0,,,3.0,4.0,,,...,,,,5.0,3.0,,,,4.0,
2,,,,,,,,,1.0,,...,,,,,,,,,,
3,,,,,2.0,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
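
A small sketch of what `pivot_table` does to long-format ratings, using made-up data. Two details are worth seeing on a toy example: unrated user/title pairs become NaN, and because `pivot_table` aggregates with the mean by default, a user who rated the same title twice gets the average in that cell. The names `long_ratings`, `wide`, and `sparsity` are hypothetical.

```python
import pandas as pd

# Hypothetical long-format ratings; user 2 rated Film X twice
long_ratings = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3],
        "title": ["Film X", "Film Y", "Film X", "Film X", "Film Y"],
        "rating": [5, 3, 4, 2, 1],
    }
)

# One row per user, one column per title; duplicates are averaged
wide = long_ratings.pivot_table(index="user_id", columns="title", values="rating")
print(wide)

# Fraction of user/title cells with no rating
sparsity = wide.isna().to_numpy().mean()
print(f"share of empty cells: {sparsity:.2f}")
```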


In [27]:
# Each column of movie_mat holds every user's rating for one movie
starwars_user_ratings = movie_mat['Star Wars (1977)']
liarliar_user_ratings = movie_mat['Liar Liar (1997)']

In [28]:
starwars_user_ratings.head()

user_id
0    5.0
1    5.0
2    5.0
3    NaN
4    5.0
Name: Star Wars (1977), dtype: float64

In [30]:
# Correlate every movie's ratings with the Star Wars ratings
similar_to_starwars = movie_mat.corrwith(starwars_user_ratings)

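
Each correlation is computed only over the users who rated both the target movie and the other movie. When fewer than two users overlap, the result is NaN, which is why `dropna` is applied to the correlations later. A minimal sketch on made-up data that reports the overlap size next to each correlation; the column names and the `n_common` helper column are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings matrix with different amounts of overlap with "Target"
toy = pd.DataFrame(
    {
        "Target": [5, 4, 1, np.nan, 2],
        "Overlap3": [4, 5, 2, 3, np.nan],
        "Overlap1": [np.nan, np.nan, np.nan, 4, 1],
        "Overlap0": [np.nan, np.nan, np.nan, 2, np.nan],
    }
)
target = toy["Target"]

# Pairwise correlation with the target column (may emit NumPy runtime warnings)
corr = toy.corrwith(target)

# Number of users who rated both the target and each other movie:
# keep only rows where the target is rated, then count non-NaN per column
n_common = toy[target.notna()].count()

report = pd.DataFrame({"correlation": corr, "n_common": n_common})
print(report)
```

A correlation backed by only a handful of common raters is unreliable even when it is not NaN, which motivates filtering by the number of ratings.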


In [31]:
# Correlate every movie's ratings with the Liar Liar ratings
similar_to_liarliar = movie_mat.corrwith(liarliar_user_ratings)



In [32]:
# Wrap the Star Wars correlations in a DataFrame with a named column
corr_starwars = pd.DataFrame(similar_to_starwars, columns=['correlation'])

In [33]:
corr_starwars.dropna(inplace=True)

In [34]:
corr_starwars.head()

title,correlation
'Til There Was You (1997),0.872872
1-900 (1994),-0.645497
101 Dalmatians (1996),0.211132
12 Angry Men (1957),0.184289
187 (1997),0.027398


In [35]:
corr_starwars = corr_starwars.join(ratings['num of ratings'])

In [37]:
# Keep only movies with more than 100 ratings, then sort by correlation
corr_starwars[corr_starwars['num of ratings'] > 100].sort_values('correlation', ascending=False).head()

title,correlation,num of ratings
Star Wars (1977),1.0,584
"Empire Strikes Back, The (1980)",0.748353,368
Return of the Jedi (1983),0.672556,507
Raiders of the Lost Ark (1981),0.536117,420
Austin Powers: International Man of Mystery (1997),0.377433,130
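
The steps used for Star Wars (take one title's column, correlate it with every other column, drop NaN, attach the rating counts, filter, sort) could be wrapped in one function so they can be repeated for any title. This is a sketch only: `similar_titles`, `min_ratings`, and `top_n` are hypothetical names that do not appear in the notebook, and the toy matrix is made up. It also assumes each user rated a title at most once, so that counting non-NaN cells per column equals the number of ratings. Unlike the cell above, it drops the queried title from its own results.

```python
import numpy as np
import pandas as pd


def similar_titles(movie_mat, title, min_ratings=100, top_n=5):
    """Rank titles by Pearson correlation with `title`, skipping rarely rated ones."""
    target = movie_mat[title]
    # Correlation of every column with the target, NaN results removed
    corr = movie_mat.corrwith(target).rename("correlation").dropna()
    # Non-NaN cells per column = number of users who rated each title
    counts = movie_mat.count().rename("num of ratings")
    result = pd.concat([corr, counts], axis=1, join="inner")
    # Same strict threshold as the notebook: more than min_ratings
    result = result[result["num of ratings"] > min_ratings]
    # The queried title always correlates perfectly with itself
    result = result.drop(index=title, errors="ignore")
    return result.sort_values("correlation", ascending=False).head(top_n)


# Hypothetical user-by-title matrix
toy_mat = pd.DataFrame(
    {
        "Film A": [5, 4, 1, 2, np.nan],
        "Film B": [4, 5, 2, 1, 3],
        "Film C": [1, 2, 5, 4, np.nan],
        "Film D": [np.nan, np.nan, np.nan, 5, 4],
    }
)

top = similar_titles(toy_mat, "Film A", min_ratings=2)
print(top)
```

On the notebook's data the equivalent call would be `similar_titles(movie_mat, 'Star Wars (1977)')`, again under the assumptions stated above.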
