# COLLABORATIVE FILTERING - Recommending Books and Movies

We'll start by loading up the Goodreads dataset. Using Pandas, we can very quickly load the rows of the rating and item files that we care about, and merge them together so we can work with book names instead of IDs. (In a real production job, you'd stick with IDs and worry about the names at the display layer to make things more efficient. But this lets us understand what's going on better for now.)
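As a minimal sketch of the merge described above, the snippet below joins a tiny ratings table to an item table on `itemId`. The two titles are taken from the book data used later in this notebook; the user IDs and ratings are made up for illustration, and the variable name `named` is an assumption.

```python
import pandas as pd

# Made-up ratings in the same (userId, itemId, rating) layout as the real files.
ratings = pd.DataFrame({
    "userId": [1, 1, 2],
    "itemId": [1, 2, 1],
    "rating": [5, 3, 4],
})

# Item lookup table: one row per item.
items = pd.DataFrame({
    "itemId": [1, 2],
    "title": ["The Hunger Games", "Harry Potter and the Philosopher's Stone"],
})

# A left join keeps every rating and attaches the matching title.
named = ratings.merge(items, on="itemId", how="left")
print(named)
```

Keeping the join on the ID column means the titles can be dropped again at any point without losing information.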

In [1]:
import warnings
warnings.simplefilter('ignore')

In [2]:
import pandas as pd
import numpy as np


#### Illustration of the Principle

In [3]:
# Fix the seed so the toy ratings matrix (values 0-4) is reproducible.
np.random.seed(5)
ratings = np.random.randint(5, size=(5, 5))
ratings

array([[3, 0, 1, 0, 4],
       [3, 0, 0, 4, 1],
       [0, 3, 4, 3, 1],
       [4, 2, 1, 1, 2],
       [1, 1, 1, 2, 0]])

In [4]:
ratingsDF = pd.DataFrame(ratings, columns=['BK1', 'BK2', 'BK3', 'BK4', 'BK5'])

In [5]:
ratingsDF['user'] = ['user1', 'user2', 'user3', 'user4', 'user5']
ratingsDF.set_index('user', inplace=True)

In [6]:
ratingsDF

user,BK1,BK2,BK3,BK4,BK5
user1,3,0,1,0,4
user2,3,0,0,4,1
user3,0,3,4,3,1
user4,4,2,1,1,2
user5,1,1,1,2,0


Pearson's correlation coefficient for a sample:

$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$
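To make the coefficient concrete, here is a small sketch that computes Pearson's r by hand for the BK2 and BK3 columns of the toy table and compares it with NumPy's built-in. The variable names (`bk2`, `bk3`, `r_manual`, `r_numpy`) are assumptions introduced for this example.

```python
import numpy as np

# BK2 and BK3 ratings from the toy table (user1..user5).
bk2 = np.array([0, 0, 3, 2, 1], dtype=float)
bk3 = np.array([1, 0, 4, 1, 1], dtype=float)

# Deviations from each column's mean.
dx = bk2 - bk2.mean()
dy = bk3 - bk3.mean()

# Numerator: sum of products of deviations.
# Denominator: product of the square roots of the summed squared deviations.
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Cross-check against NumPy's correlation matrix.
r_numpy = np.corrcoef(bk2, bk3)[0, 1]
print(r_manual, r_numpy)
```

Both values should agree with the BK2/BK3 entry that `ratingsDF.corr()` produces.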

In [7]:
corrTable = ratingsDF.corr()

In [8]:
ratingsDF[['BK2', 'BK3']]

user,BK2,BK3
user1,0,1
user2,0,0
user3,3,4
user4,2,1
user5,1,1


In [9]:
corrTable['BK2']['BK3']

0.8344408667498864

In [10]:
ratingsDF[['BK4', 'BK5']]

user,BK4,BK5
user1,0,4
user2,4,1
user3,3,1
user4,1,2
user5,2,0


In [11]:
corrTable['BK4']['BK5']

-0.7298004491997616

In [12]:
ratingsDF[['BK2', 'BK4']]

user,BK2,BK4
user1,0,0
user2,0,4
user3,3,3
user4,2,1
user5,1,2


In [13]:
corrTable['BK2']['BK4']

0.12126781251816648

<div class="alert alert-info"> 

<h3> Hypotheses behind a collaborative-filtering recommendation system </h3>

<h4> 1. If the ratings of two items I1 and I2 are similar, i.e. their correlation is a large positive number, and a user likes I1, then the user will probably like I2. </h4>

<h4> 2. If the ratings of two items I1 and I2 are opposed, i.e. their correlation is a large negative number, and a user likes I1, then the user will probably dislike I2. </h4>

<h4> 3. If the ratings of two items I1 and I2 show no pattern, i.e. their correlation is close to zero, then a user liking I1 tells us little about whether they will like I2. </h4>
</div>
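The three cases can be sketched as a simple rule of thumb. The snippet below takes the toy correlations of BK3 with the other books and labels each one for a user who likes BK3. The function name `interpret` and the 0.5 cut-off are assumptions chosen for illustration, not part of the method itself.

```python
import pandas as pd

# Correlations of BK3 with the other toy books (from the toy ratings table).
bk3Corr = pd.Series({
    "BK1": -0.742379,
    "BK2": 0.834441,
    "BK4": 0.104257,
    "BK5": -0.130435,
})

def interpret(corr, threshold=0.5):
    """Hypothetical rule of thumb; the 0.5 threshold is an assumption."""
    if corr >= threshold:
        return "likely to like"
    if corr <= -threshold:
        return "likely to dislike"
    return "inconclusive"

# Label every other book for a user who likes BK3.
verdicts = bk3Corr.apply(interpret)
print(verdicts)
```

With only five users the toy correlations are very noisy, which is why the real data sets below require a minimum number of common raters before a correlation is trusted.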

In [14]:
corrTable = ratingsDF.corr()
corrTable

Unnamed: 0,BK1,BK2,BK3,BK4,BK5
BK1,1.0,-0.490098,-0.742379,-0.3849,0.541736
BK2,-0.490098,1.0,0.834441,0.121268,-0.328719
BK3,-0.742379,0.834441,1.0,0.104257,-0.130435
BK4,-0.3849,0.121268,0.104257,1.0,-0.7298
BK5,0.541736,-0.328719,-0.130435,-0.7298,1.0


### Load the movie ratings data set from MovieLens

In [15]:
# ratings=pd.read_csv('movies/ml-1m/ratings.dat', sep='::', header=None)
# ratings.columns = ['userId', 'movieId', 'rating', 'timeStamp']

corrTableName = "moviesCorrTable.pkl"
pathToRatings = '/Data/movieRatings.csv'
pathToDetails = '/Data/movieInfo.csv'

In [16]:
ratings = pd.read_csv(pathToRatings)
#ratings.drop('Unnamed: 0', axis=1, inplace=True)
ratings.shape

(1000209, 3)

In [17]:
ratings.sample(n=5, random_state=10)

Unnamed: 0,userId,itemId,rating
511275,3154,1625,3
467574,2881,1079,5
982145,5926,3256,4
867130,5232,1968,3
880583,5319,2294,4


In [18]:
# movies=pd.read_csv('movies/ml-1m/movies.dat', sep='::', header=None)
# movies.columns = ['MovieID', 'Title', 'Genres']
items=pd.read_csv(pathToDetails)
#items.drop('Unnamed: 0', axis=1, inplace=True)
items.shape

(3883, 3)

In [19]:
items.head(10) #(n=5, random_state=4)

Unnamed: 0,itemId,title,details
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy
5,6,Heat (1995),Action|Crime|Thriller
6,7,Sabrina (1995),Comedy|Romance
7,8,Tom and Huck (1995),Adventure|Children's
8,9,Sudden Death (1995),Action
9,10,GoldenEye (1995),Action|Adventure|Thriller


### Find out how many users in total

In [20]:
ratings['userId'].unique().size

6040

### Find out how many movies in total

In [21]:
ratings['itemId'].unique().size

3706

# Build the Pivot Table

<div class="alert alert-warning">
<b> Example Pivot Table. </b>
<br>
<br>
<img src="pivTable.JPG"/>
</div>

In [22]:
pivotTable = ratings.pivot_table(index=['userId'],columns=['itemId'],values='rating')
pivotTable.shape

(6040, 3706)
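To see what the pivot does on a small scale, the sketch below reshapes a made-up long-format ratings table into a user-by-item matrix. The data and the names `longRatings` and `wide` are assumptions for illustration.

```python
import pandas as pd

# Long format: one row per (user, item, rating), as in the ratings file.
longRatings = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "itemId": [10, 20, 10, 20, 30],
    "rating": [4, 5, 3, 2, 5],
})

# Wide format: one row per user, one column per item.
# Items a user never rated show up as NaN.
wide = longRatings.pivot_table(index="userId", columns="itemId", values="rating")
print(wide)
```

Most cells of the real pivot table are NaN for the same reason: each user has rated only a small fraction of the items.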

### Inspect a random sample

In [23]:
pivotTable.sample(n=5)

itemId,1,2,3,4,5,6,7,8,9,10,...,3943,3944,3945,3946,3947,3948,3949,3950,3951,3952
userId,,,,,,,,,,,...,,,,,,,,,,
2044,,,,,,,,,,,...,,,,,,,,,,
4898,4.0,,,,,3.0,,,,,...,,,,,,,,,,
3555,,4.0,,,,,,,,3.0,...,,,,,,,,,,
4172,,,,,,,,,,,...,,,,,,,,,,
454,,,,,,,,,,,...,,,,,,4.0,,,,


## Find the correlation matrix of all movies with each other

In [24]:
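computeCorr = False
if not computeCorr:
    # Reuse the correlation table cached by an earlier run.
    corrTable = pd.read_pickle(corrTableName)
else:
    # Only correlate pairs of movies that at least 350 users have both rated.
    corrTable = pivotTable.corr(min_periods=350)
    corrTable.to_pickle(corrTableName)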
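The `min_periods` argument is what produces the many NaN entries in the correlation table. As a minimal sketch with made-up data, the example below shows that a pair of columns with too few common raters gets NaN instead of a correlation. The column names `A`, `B`, `C` and the threshold of 3 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Six users, three items; NaN means "not rated".
toy = pd.DataFrame({
    "A": [5, 4, 1, 2, np.nan, np.nan],
    "B": [4, 5, 2, 1, 3, np.nan],
    "C": [np.nan, np.nan, np.nan, 2, 4, 5],
})

# A and B share 4 raters, A and C share 1, B and C share 2.
# With min_periods=3 only the A/B pair yields a valid correlation.
toyCorr = toy.corr(min_periods=3)
print(toyCorr)
```

A higher threshold gives fewer but more trustworthy correlations, since each one is backed by more common raters.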

In [25]:
#View the corrtable

corrTable.head()

itemId,1,2,3,4,5,6,7,8,9,10,...,3943,3944,3945,3946,3947,3948,3949,3950,3951,3952
itemId,,,,,,,,,,,...,,,,,,,,,,
1,1.0,0.187467,,,,0.051097,,,,0.143598,...,,,,,,0.139323,,,,
2,0.187467,1.0,,,,,,,,,...,,,,,,,,,,
3,,,1.0,,,,,,,,...,,,,,,,,,,
4,,,,,,,,,,,...,,,,,,,,,,
5,,,,,,,,,,,...,,,,,,,,,,


In [26]:
corrTable.shape

(3706, 3706)

In [27]:
def itemsFromIDs(items, IDlist):
    # Collect the matching rows in the order given by IDlist, then concatenate once.
    frames = [items[items.itemId == itemId] for itemId in IDlist]
    if not frames:
        return pd.DataFrame(columns=items.columns)
    return pd.concat(frames, axis=0).reset_index(drop=True)

In [28]:
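def relatedRecos(itemName):
    # Look up the item's ID; .iloc[0] raises IndexError if the title is not in `items`.
    ItemID = items[items.title == itemName]["itemId"].iloc[0]
    my_corr = corrTable.loc[ItemID]

    # Drop the item itself and pairs without a valid correlation,
    # then keep the 10 most strongly correlated items.
    top10 = my_corr.drop(ItemID).dropna().sort_values(ascending=False).head(10)
    top10itemIDs = list(top10.index)

    return itemsFromIDs(items, top10itemIDs)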

In [76]:
itemName ='Goldfinger (1964)'
relatedRecos(itemName)

Unnamed: 0,itemId,title,details
0,2949,Dr. No (1962),Action
1,2948,From Russia with Love (1963),Action
2,10,GoldenEye (1995),Action|Adventure|Thriller
3,1287,Ben-Hur (1959),Action|Adventure|Drama
4,1291,Indiana Jones and the Last Crusade (1989),Action|Adventure
5,1198,Raiders of the Lost Ark (1981),Action|Adventure
6,1201,"Good, The Bad and The Ugly, The (1966)",Action|Western
7,3793,X-Men (2000),Action|Sci-Fi
8,474,In the Line of Fire (1993),Action|Thriller
9,2640,Superman (1978),Action|Adventure|Sci-Fi


In [30]:
itemName = "Lost World: Jurassic Park, The (1997)"
#itemName='Lion King, The (1994)'

top10Recos = relatedRecos(itemName)
top10Recos

Unnamed: 0,itemId,title,details
0,480,Jurassic Park (1993),Action|Adventure|Sci-Fi
1,1882,Godzilla (1998),Action|Sci-Fi
2,1562,Batman & Robin (1997),Action|Adventure|Crime
3,160,Congo (1995),Action|Adventure|Mystery|Sci-Fi
4,736,Twister (1996),Action|Adventure|Romance|Thriller
5,1831,Lost in Space (1998),Action|Sci-Fi|Thriller
6,434,Cliffhanger (1993),Action|Adventure|Crime
7,780,Independence Day (ID4) (1996),Action|Sci-Fi|War
8,1917,Armageddon (1998),Action|Adventure|Sci-Fi|Thriller
9,196,Species (1995),Horror|Sci-Fi


In [31]:
itemName='Indiana Jones and the Temple of Doom (1984)'
relatedRecos(itemName)

# Harrison Ford (Indiana Jones), Michael Douglas (Jewel of the Nile)

Unnamed: 0,itemId,title,details
0,2405,"Jewel of the Nile, The (1985)",Action|Adventure|Comedy|Romance
1,1291,Indiana Jones and the Last Crusade (1989),Action|Adventure
2,2002,Lethal Weapon 3 (1992),Action|Comedy|Crime|Drama
3,3033,Spaceballs (1987),Comedy|Sci-Fi
4,1375,Star Trek III: The Search for Spock (1984),Action|Adventure|Sci-Fi
5,3638,Moonraker (1979),Action|Romance|Sci-Fi
6,1210,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Romance|Sci-Fi|War
7,2001,Lethal Weapon 2 (1989),Action|Comedy|Crime|Drama
8,1373,Star Trek V: The Final Frontier (1989),Action|Adventure|Sci-Fi
9,2989,For Your Eyes Only (1981),Action


In [32]:
itemName='Lion King, The (1994)'
relatedRecos(itemName)


Unnamed: 0,itemId,title,details
0,588,Aladdin (1992),Animation|Children's|Comedy|Musical
1,595,Beauty and the Beast (1991),Animation|Children's|Musical
2,2081,"Little Mermaid, The (1989)",Animation|Children's|Comedy|Musical|Romance
3,1022,Cinderella (1950),Animation|Children's|Musical
4,2080,Lady and the Tramp (1955),Animation|Children's|Comedy|Musical|Romance
5,1907,Mulan (1998),Animation|Children's
6,2085,101 Dalmatians (1961),Animation|Children's
7,587,Ghost (1990),Comedy|Romance|Thriller
8,1,Toy Story (1995),Animation|Children's|Comedy
9,2355,"Bug's Life, A (1998)",Animation|Children's|Comedy


In [33]:
itemName = "Superman (1978)"

top10Recos = relatedRecos(itemName)
top10Recos

Unnamed: 0,itemId,title,details
0,2641,Superman II (1980),Action|Adventure|Sci-Fi
1,2642,Superman III (1983),Action|Adventure|Sci-Fi
2,1270,Back to the Future (1985),Comedy|Sci-Fi
3,2011,Back to the Future Part II (1989),Comedy|Sci-Fi
4,2054,"Honey, I Shrunk the Kids (1989)",Adventure|Children's|Comedy|Fantasy|Sci-Fi
5,1375,Star Trek III: The Search for Spock (1984),Action|Adventure|Sci-Fi
6,2407,Cocoon (1985),Comedy|Sci-Fi
7,1097,E.T. the Extra-Terrestrial (1982),Children's|Drama|Fantasy|Sci-Fi
8,2406,Romancing the Stone (1984),Action|Adventure|Comedy|Romance
9,1721,Titanic (1997),Drama|Romance


In [34]:
itemName = "Back to the Future (1985)"
#itemName='Lion King, The (1994)'
relatedRecos(itemName)


Unnamed: 0,itemId,title,details
0,2011,Back to the Future Part II (1989),Comedy|Sci-Fi
1,2012,Back to the Future Part III (1990),Comedy|Sci-Fi|Western
2,2640,Superman (1978),Action|Adventure|Sci-Fi
3,2054,"Honey, I Shrunk the Kids (1989)",Adventure|Children's|Comedy|Fantasy|Sci-Fi
4,2420,"Karate Kid, The (1984)",Drama
5,2797,Big (1988),Comedy|Fantasy
6,2407,Cocoon (1985),Comedy|Sci-Fi
7,2716,Ghostbusters (1984),Comedy|Horror
8,587,Ghost (1990),Comedy|Romance|Thriller
9,2470,Crocodile Dundee (1986),Adventure|Comedy


In [35]:
itemName = 'Godfather, The (1972)'
relatedRecos(itemName)


Unnamed: 0,itemId,title,details
0,1221,"Godfather: Part II, The (1974)",Action|Crime|Drama
1,1213,GoodFellas (1990),Crime|Drama
2,1263,"Deer Hunter, The (1978)",Drama|War
3,1997,"Exorcist, The (1973)",Horror
4,1084,Bonnie and Clyde (1967),Crime|Drama
5,1276,Cool Hand Luke (1967),Comedy|Drama
6,1953,"French Connection, The (1971)",Action|Crime|Drama|Thriller
7,2019,Seven Samurai (The Magnificent Seven) (Shichin...,Action|Drama
8,1954,Rocky (1976),Action|Drama
9,1208,Apocalypse Now (1979),Drama|War


In [36]:
itemName = 'Lawrence of Arabia (1962)'
relatedRecos(itemName)


Unnamed: 0,itemId,title,details
0,1953,"French Connection, The (1971)",Action|Crime|Drama|Thriller
1,1250,"Bridge on the River Kwai, The (1957)",Drama|War
2,1207,To Kill a Mockingbird (1962),Drama
3,1208,Apocalypse Now (1979),Drama|War
4,913,"Maltese Falcon, The (1941)",Film-Noir|Mystery
5,1233,"Boat, The (Das Boot) (1981)",Action|Drama|War
6,1262,"Great Escape, The (1963)",Adventure|War
7,1193,One Flew Over the Cuckoo's Nest (1975),Drama
8,924,2001: A Space Odyssey (1968),Drama|Mystery|Sci-Fi|Thriller
9,1276,Cool Hand Luke (1967),Comedy|Drama


In [37]:
itemName = 'Good, The Bad and The Ugly, The (1966)'
relatedRecos(itemName)


Unnamed: 0,itemId,title,details
0,2951,"Fistful of Dollars, A (1964)",Action|Western
1,2944,"Dirty Dozen, The (1967)",Action|War
2,1266,Unforgiven (1992),Western
3,1220,"Blues Brothers, The (1980)",Action|Comedy|Musical
4,1276,Cool Hand Luke (1967),Comedy|Drama
5,1208,Apocalypse Now (1979),Drama|War
6,2947,Goldfinger (1964),Action
7,3421,Animal House (1978),Comedy
8,1953,"French Connection, The (1971)",Action|Crime|Drama|Thriller
9,1968,"Breakfast Club, The (1985)",Comedy|Drama


#### Netflix Recommendation Engine (much more sophisticated)

1. Explicit behavior (such as the ratings you give).
2. Implicit behavior (how many sittings it took you to finish a movie, what time of day you watched it).
3. Tagging of movie content (crime thriller, adventure, female leads, etc.). These tags are generated by human taggers.


#### Amazon recommendation engine.

1. Purchased shopping carts = real money from real people spent on real items = powerful data and a lot of it.
2. Items added to carts but abandoned.
3. Pricing experiments online (A/B testing, etc.) where they offer the same products at different prices and see the results
4. Wishlists - what's on them specifically for you - and in aggregate it can be treated similarly to another stream of basket analysis data
5. Referral sites (identification of where you came in from can hint other items of interest)
6. Dwell times (how long before you click back and pick a different item)
7. Ratings by you or those in your social network/buying circles - if you rate things you like, you get more of what you like, and if you confirm with the "I already own it" button, they build a very complete profile of you.
8. Demographic information (your shipping address, etc.) - they know what is popular in your general area for your kids, yourself, your spouse, etc.
9. User segmentation - did you buy 3 books in separate months for a toddler? You likely have at least one kid.
10. Direct marketing click-through data - did you get an email from them and click through? They know which email it was, what you clicked through on, and whether you bought it as a result.
11. Click paths in session - what you viewed, regardless of whether it went in your cart.
12. Number of times you viewed an item before the final purchase.

#### Market Basket Analysis
See what is in your basket and provide recommendations based on what people with a similar basket might additionally buy.

1. For example, if the basket has milk, then possibly eggs, bread and cereal.
2. If the basket has beer, then maybe peanuts, etc.
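The idea can be sketched by counting how often pairs of items appear in the same basket. The baskets below are made up, and the helper name `alsoBought` is an assumption. Real market basket analysis typically uses measures such as support, confidence and lift rather than raw counts.

```python
from collections import Counter
from itertools import combinations

# Made-up baskets, purely for illustration.
baskets = [
    {"milk", "eggs", "bread"},
    {"milk", "bread", "cereal"},
    {"beer", "peanuts"},
    {"milk", "eggs"},
    {"beer", "peanuts", "bread"},
]

# Count how often each unordered pair of items is bought together.
pairCounts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pairCounts[pair] += 1

def alsoBought(item, topN=3):
    """Hypothetical helper: items most often co-purchased with `item`."""
    scores = Counter()
    for (a, b), count in pairCounts.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return scores.most_common(topN)

milkRecos = alsoBought("milk")
beerRecos = alsoBought("beer")
print(milkRecos)
print(beerRecos)
```

Unlike the rating-based approach in this notebook, this needs no explicit ratings at all, only purchase records.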

### Try to find recommendations based on your favorite movies

In [38]:
def searchForItems(items, searchStr):
    df = items[items['title'].str.contains(searchStr, case=False)]
    return list(df['title'])

In [39]:
searchForItems(items, "rear")

['Rear Window (1954)']

In [40]:
# Specify a movie that exists in the database as itemName in the code below.

In [41]:
itemName = 'Rear Window (1954)'

top10Recos = relatedRecos(itemName)
top10Recos

Unnamed: 0,itemId,title,details
0,903,Vertigo (1958),Mystery|Thriller
1,908,North by Northwest (1959),Drama|Thriller
2,1250,"Bridge on the River Kwai, The (1957)",Drama|War
3,1219,Psycho (1960),Horror|Thriller
4,923,Citizen Kane (1941),Drama
5,1953,"French Connection, The (1971)",Action|Crime|Drama|Thriller
6,910,Some Like It Hot (1959),Comedy|Crime
7,912,Casablanca (1942),Drama|Romance|War
8,1204,Lawrence of Arabia (1962),Adventure|War
9,913,"Maltese Falcon, The (1941)",Film-Noir|Mystery


In [42]:
itemName = 'My Fair Lady (1964)'

top10Recos = relatedRecos(itemName)
top10Recos

Unnamed: 0,itemId,title,details
0,1947,West Side Story (1961),Musical|Romance
1,1035,"Sound of Music, The (1965)",Musical
2,899,Singin' in the Rain (1952),Musical|Romance
3,1028,Mary Poppins (1964),Children's|Comedy|Musical
4,920,Gone with the Wind (1939),Drama|Romance|War
5,1097,E.T. the Extra-Terrestrial (1982),Children's|Drama|Fantasy|Sci-Fi
6,1307,When Harry Met Sally... (1989),Comedy|Romance
7,919,"Wizard of Oz, The (1939)",Adventure|Children's|Drama|Musical
8,356,Forrest Gump (1994),Comedy|Romance|War
9,2396,Shakespeare in Love (1998),Comedy|Romance


### Load the book ratings data set from Goodreads

In [43]:

corrTableName = "booksCorrTable.pkl"
pathToRatings = 'd:/ml/Data/booksV2/bookRatings.csv'
pathToDetails = 'd:/ml/Data/booksV2/bookInfo.csv'

In [44]:
ratings = pd.read_csv(pathToRatings)

In [45]:
ratings.head(n=5)

Unnamed: 0,userId,itemId,rating
0,22,264,2
1,1138,264,5
2,1160,264,3
3,1217,264,3
4,1572,264,3


In [46]:
items=pd.read_csv(pathToDetails)

In [47]:
items.sample(n=5)

Unnamed: 0,itemId,title,details
1214,625,Midwives,Chris Bohjalian
1598,821,The Best of Me,Nicholas Sparks
1719,1523,"Wither (The Chemical Garden, #1)",Lauren DeStefano
709,1371,Predictably Irrational: The Hidden Forces That...,Dan Ariely
182,195,The Guernsey Literary and Potato Peel Pie Society,"Mary Ann Shaffer, Annie Barrows"


### Find out how many users in total

In [48]:
ratings['userId'].unique().size

3000

### Find out how many books in total

In [49]:
items.shape[0]

1891

# Build the Pivot Table

In [50]:
pivotTable = ratings.pivot_table(index=['userId'],columns=['itemId'],values='rating')
pivotTable.shape

(3000, 1891)

### Inspect a random sample

In [51]:
pivotTable.sample(n=5)

itemId,1,2,3,4,5,6,7,8,10,11,...,2998,3105,3132,3150,3231,3345,3384,3422,3436,7373
userId,,,,,,,,,,,...,,,,,,,,,,
23262,,,,3.0,,,,2.0,3.0,,...,,,,,,,,,,
1138,,3.0,,,4.0,,3.0,4.0,,3.0,...,,,,,,,,,,
47326,4.0,,,,,,,,,3.0,...,,,,,,,,,,
19328,,,,5.0,,,,4.0,,,...,,,,,,,,,,
11524,5.0,5.0,5.0,,4.0,5.0,,4.0,5.0,5.0,...,,,,,,,,,,


## Find the correlation matrix of all books with each other

In [52]:
%%time
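computeCorr = False
if not computeCorr:
    # Reuse the correlation table cached by an earlier run.
    corrTable = pd.read_pickle(corrTableName)
else:
    # Only correlate pairs of books that at least 250 users have both rated.
    corrTable = pivotTable.corr(min_periods=250)
    corrTable.to_pickle(corrTableName)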

Wall time: 52.3 ms


In [53]:
#View the corrtable
corrTable.head()

itemId,1,2,3,4,5,6,7,8,10,11,...,2998,3105,3132,3150,3231,3345,3384,3422,3436,7373
itemId,,,,,,,,,,,...,,,,,,,,,,
1,1.0,0.234702,0.388496,0.031843,-0.015047,0.18635,-0.008045,0.066032,0.090484,0.245843,...,,,,,,,,,,
2,0.234702,1.0,0.135253,0.119011,0.068083,0.150993,0.278121,0.01233,0.251825,0.165599,...,,,,,,,,,,
3,0.388496,0.135253,1.0,0.008118,-0.090979,0.170944,0.02874,0.116441,0.174361,0.278934,...,,,,,,,,,,
4,0.031843,0.119011,0.008118,1.0,0.333149,,0.123229,0.273784,0.296222,0.172932,...,,,,,,,,,,
5,-0.015047,0.068083,-0.090979,0.333149,1.0,,0.164837,0.397155,0.15106,0.048222,...,,,,,,,,,,


In [54]:
itemName = 'Harry Potter and the Deathly Hallows' 

top10Recos = relatedRecos(itemName)
top10Recos

Unnamed: 0,itemId,title,details
0,27,Harry Potter and the Half-Blood Prince (Harry ...,"J.K. Rowling, Mary GrandPré"
1,24,Harry Potter and the Goblet of Fire,"J.K. Rowling, Mary GrandPré"
2,21,Harry Potter and the Order of the Phoenix,"J.K. Rowling, Mary GrandPré"
3,18,Harry Potter and the Prisoner of Azkaban,"J.K. Rowling, Mary GrandPré, Rufus Beck"
4,23,Harry Potter and the Chamber of Secrets,"J.K. Rowling, Mary GrandPré"
5,2,Harry Potter and the Philosopher's Stone,"J.K. Rowling, Mary GrandPré"
6,10,Pride and Prejudice,Jane Austen
7,17,"Catching Fire (The Hunger Games, #2)",Suzanne Collins
8,26,"The Da Vinci Code (Robert Langdon, #2)",Dan Brown
9,1,The Hunger Games,Suzanne Collins


In [55]:
itemName = 'Twilight (Twilight, #1)'

top10Recos = relatedRecos(itemName)
top10Recos

Unnamed: 0,itemId,title,details
0,49,"New Moon (Twilight, #2)",Stephenie Meyer
1,52,"Eclipse (Twilight, #3)",Stephenie Meyer
2,56,"Breaking Dawn (Twilight, #4)",Stephenie Meyer
3,1,The Hunger Games,Suzanne Collins
4,26,"The Da Vinci Code (Robert Langdon, #2)",Dan Brown
5,12,"Divergent (Divergent, #1)",Veronica Roth
6,20,"Mockingjay (The Hunger Games, #3)",Suzanne Collins
7,17,"Catching Fire (The Hunger Games, #2)",Suzanne Collins
8,11,The Kite Runner,Khaled Hosseini
9,33,Memoirs of a Geisha,Arthur Golden


In [56]:
itemName = 'The Kite Runner'

top10Recos = relatedRecos(itemName)
top10Recos

Unnamed: 0,itemId,title,details
0,67,A Thousand Splendid Suns,Khaled Hosseini
1,31,The Help,Kathryn Stockett
2,33,Memoirs of a Geisha,Arthur Golden
3,57,The Secret Life of Bees,Sue Monk Kidd
4,3,"Twilight (Twilight, #1)",Stephenie Meyer
5,46,Water for Elephants,Sara Gruen
6,1,The Hunger Games,Suzanne Collins
7,15,The Diary of a Young Girl,"Anne Frank, Eleanor Roosevelt, B.M. Mooyaart-D..."
8,26,"The Da Vinci Code (Robert Langdon, #2)",Dan Brown
9,10,Pride and Prejudice,Jane Austen


In [57]:
itemName = 'The Hunger Games'

top10Recos = relatedRecos(itemName)
top10Recos

Unnamed: 0,itemId,title,details
0,17,"Catching Fire (The Hunger Games, #2)",Suzanne Collins
1,20,"Mockingjay (The Hunger Games, #3)",Suzanne Collins
2,3,"Twilight (Twilight, #1)",Stephenie Meyer
3,12,"Divergent (Divergent, #1)",Veronica Roth
4,73,"The Host (The Host, #1)",Stephenie Meyer
5,64,My Sister's Keeper,Jodi Picoult
6,52,"Eclipse (Twilight, #3)",Stephenie Meyer
7,69,"Insurgent (Divergent, #2)",Veronica Roth
8,53,"Eragon (The Inheritance Cycle, #1)",Christopher Paolini
9,51,"City of Bones (The Mortal Instruments, #1)",Cassandra Clare


In [60]:
itemName = 'Divergent (Divergent, #1)'

top10Recos = relatedRecos(itemName)
top10Recos

Unnamed: 0,itemId,title,details
0,69,"Insurgent (Divergent, #2)",Veronica Roth
1,105,"Allegiant (Divergent, #3)",Veronica Roth
2,1,The Hunger Games,Suzanne Collins
3,17,"Catching Fire (The Hunger Games, #2)",Suzanne Collins
4,3,"Twilight (Twilight, #1)",Stephenie Meyer
5,20,"Mockingjay (The Hunger Games, #3)",Suzanne Collins
6,6,The Fault in Our Stars,John Green
7,21,Harry Potter and the Order of the Phoenix,"J.K. Rowling, Mary GrandPré"
8,2,Harry Potter and the Philosopher's Stone,"J.K. Rowling, Mary GrandPré"


### Try to find recommendations based on your favorite books

In [62]:
searchForItems(items, "Gatsby")

['The Great Gatsby']

In [59]:
# Specify a book that exists in the database as itemName in the code below.

In [63]:
itemName = 'The Great Gatsby'

top10Recos = relatedRecos(itemName)
top10Recos

Unnamed: 0,itemId,title,details
0,8,The Catcher in the Rye,J.D. Salinger
1,14,Animal Farm,George Orwell
2,4,To Kill a Mockingbird,Harper Lee
3,55,Brave New World,Aldous Huxley
4,28,Lord of the Flies,William Golding
5,13,1984,"George Orwell, Erich Fromm, Celâl Üster"
6,63,Wuthering Heights,"Emily Brontë, Richard J. Dunn"
7,15,The Diary of a Young Girl,"Anne Frank, Eleanor Roosevelt, B.M. Mooyaart-D..."
8,29,Romeo and Juliet,"William Shakespeare, Robert Jackson"
9,65,Slaughterhouse-Five,Kurt Vonnegut Jr.


In [65]:
searchForItems(items, "Pride")

['Pride and Prejudice',
 'Pride and Prejudice and Zombies (Pride and Prejudice and Zombies, #1)']

In [68]:
itemName = ''

top10Recos = relatedRecos(itemName)
top10Recos

IndexError: single positional indexer is out-of-bounds
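
The `IndexError` above comes from calling `.iloc[0]` on an empty selection: no title in `items` equals the empty string. One way to guard against this, sketched here on a toy stand-in for `items`, is to check for a match before indexing. The helper name `findItemId` is hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the notebook's `items` table (same column names).
items = pd.DataFrame({
    "itemId": [10, 26],
    "title": ["Pride and Prejudice", "The Da Vinci Code (Robert Langdon, #2)"],
    "details": ["Jane Austen", "Dan Brown"],
})

def findItemId(items, itemName):
    """Hypothetical helper: return the itemId for an exact title, or None."""
    matches = items.loc[items["title"] == itemName, "itemId"]
    if matches.empty:
        return None
    return matches.iloc[0]

found = findItemId(items, "Pride and Prejudice")
missing = findItemId(items, "")
print(found, missing)
```

Calling `searchForItems` first, as in the cells above, is the simplest way to get a title that is spelled exactly as it appears in the database.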