# Amazon Recommendation System - Lab

## Introduction

Now that you've been introduced to collaborative filtering and recommendation systems, it's time to put your skills to the test and build a recommendation system for a real-world dataset! For this lab, you'll use a dataset of book reviews from the Amazon marketplace. While the previous lesson focused on user-based recommendation systems, here you'll apply a parallel process to build an item-based recommendation system that recommends similar books at the bottom of a product page.

## Objectives

In this lab you will: 

- Use graph-based similarity metrics to create a collaborative filtering recommender system

## Load the Dataset

In [1]:
import pandas as pd
import networkx as nx
G = nx.Graph()

df = pd.read_csv('books_data.edgelist', names=['source', 'target', 'weight'], delimiter=' ')
df.head()

|   | source | target | weight |
|---|---|---|---|
| 0 | 827229534 | 0804215715 | 0.7 |
| 1 | 827229534 | 156101074X | 0.5 |
| 2 | 827229534 | 0687023955 | 0.8 |
| 3 | 827229534 | 0687074231 | 0.8 |
| 4 | 827229534 | 082721619X | 0.7 |
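
Several ASINs in the preview above start with `0` or end in `X`, so they behave like text labels rather than numbers. The following is a minimal sketch, not part of the original solution, of reading an edge list with the ID columns forced to strings and then turning it into a weighted `networkx` graph. The in-memory sample text stands in for `'books_data.edgelist'`, and the leading zero on the source ASIN is an assumption made for illustration.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical stand-in for 'books_data.edgelist' (space-separated:
# source, target, weight). The source ASIN's leading zero is assumed.
sample = io.StringIO(
    "0827229534 0804215715 0.7\n"
    "0827229534 156101074X 0.5\n"
    "0827229534 0687023955 0.8\n"
)

# Force the ID columns to str so values such as '0804215715' keep
# their leading zero and mix cleanly with IDs such as '156101074X'.
edges = pd.read_csv(
    sample,
    sep=" ",
    names=["source", "target", "weight"],
    dtype={"source": str, "target": str, "weight": float},
)

# Build an undirected graph whose edges carry the 'weight' attribute.
G = nx.from_pandas_edgelist(
    edges, source="source", target="target", edge_attr="weight"
)

# Neighbors of one book, highest weight first.
neighbors = sorted(
    G["0827229534"].items(), key=lambda kv: kv[1]["weight"], reverse=True
)
for asin, attrs in neighbors:
    print(asin, attrs["weight"])
```

Keeping IDs as strings in both the edge list and the metadata makes it more likely that lookups between the two files match.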


## Load the Metadata 

Import the metadata available in the file `'books_meta.txt'` (note that it is tab-separated, `'\t'`).

In [4]:
df_meta = pd.read_csv('./books_meta.txt', sep='\t')
df_meta.head()

|   | Id | ASIN | Title | Categories | Group | SalesRank | TotalReviews | AvgRating | DegreeCentrality | ClusteringCoeff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 827229534 | Patterns of Preaching: A Sermon Sampler | clergi sermon subject religion preach spiritu ... | Book | 396585 | 2 | 5.0 | 8 | 0.8 |
| 1 | 2 | 738700797 | Candlemas: Feast of Flames | subject witchcraft earth religion spiritu base... | Book | 168596 | 12 | 4.5 | 9 | 0.85 |
| 2 | 3 | 486287785 | World War II Allied Fighter Planes Trading Cards | general hobbi subject craft home garden book | Book | 1270652 | 1 | 5.0 | 0 | 0.0 |
| 3 | 4 | 842328327 | Life Application Bible Commentary: 1 and 2 Tim... | spiritu translat commentari christian book gui... | Book | 631289 | 1 | 4.0 | 6 | 0.79 |
| 4 | 5 | 1577943082 | Prayers That Avail Much for Business: Executive | subject religion spiritu busi christian live w... | Book | 455160 | 0 | 0.0 | 4 | 1.0 |


## Select Books to Test Your Recommender On

Select a small subset of books that you are interested in generating recommendations for. 

In [7]:
df_meta[df_meta['Title'].str.contains('Star Wars')]

|   | Id | ASIN | Title | Categories | Group | SalesRank | TotalReviews | AvgRating | DegreeCentrality | ClusteringCoeff |
|---|---|---|---|---|---|---|---|---|---|---|
| 976 | 1371 | 0553298046 | Specter of the Past (Star Wars: The Hand of Th... | general subject media war seri fantasi book sc... | Book | 18558 | 221 | 4.5 | 16 | 0.47 |
| 2283 | 3218 | 0553486519 | A New Hope (Choose Your Own Star Wars Adventures) | literatur action book com media fiction genera... | Book | 429785 | 2 | 4.5 | 0 | 0.00 |
| 2377 | 3318 | 0375800050 | Learning Word Sounds: Kindergarten (Star Wars ... | art general subject media nonfict episod war c... | Book | 710560 | 1 | 4.0 | 2 | 0.00 |
| 2547 | 3562 | 0425168255 | The Golden Globe (Star Wars: Junior Jedi Knights) | adventur general subject literatur magic actio... | Book | 393200 | 18 | 4.0 | 6 | 0.79 |
| 2708 | 3771 | 0553564927 | The Last Command (Star Wars: The Thrawn Trilog... | general subject media war trilog seri fantasi ... | Book | 1949 | 136 | 4.5 | 14 | 0.39 |
| 3255 | 4559 | 0345435389 | Rogue Planet (Star Wars) | general subject media b author bear war z seri... | Book | 152907 | 171 | 3.5 | 4 | 0.90 |
| 4099 | 5747 | 0345420675 | The Essential Guide to Droids (Star Wars) | nonfict general entertain subject media war se... | Book | 104889 | 12 | 5.0 | 7 | 0.83 |
| 5042 | 7083 | 0345440749 | Galactic Phrase Book & Travel Guide: Beeps, Bl... | general entertain subject media war critic ser... | Book | 100113 | 7 | 4.5 | 4 | 0.43 |
| 7935 | 11312 | 0786928794 | Coruscant and the Core Worlds (Star Wars Rolep... | general entertain subject media war seri fanta... | Book | 15656 | 7 | 4.5 | 11 | 0.75 |
| 9645 | 13831 | 0553472046 | Star Wars: Showdown at Centerpoint (Corellian ... | general subject media author macbrid roger war... | Book | 599320 | 25 | 3.5 | 5 | 0.80 |
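
The title filter above can be made a little more forgiving. This is a small sketch, assuming a toy metadata frame (`toy_meta` and its placeholder ASINs are hypothetical), that matches case-insensitively, treats the pattern as a literal string, and skips rows whose title is missing instead of raising an error.

```python
import pandas as pd

# Hypothetical miniature of the metadata table; ASINs '0000000000' and
# '0000000001' are placeholders, not real catalog entries.
toy_meta = pd.DataFrame({
    "ASIN": ["0553298046", "0345435389", "0000000000", "0000000001"],
    "Title": [
        "Specter of the Past (Star Wars: The Hand of Thrawn, Book One)",
        "Rogue Planet (Star Wars)",
        None,  # a missing title
        "Candlemas: Feast of Flames",
    ],
})

# case=False  -> ignore capitalization
# regex=False -> treat the pattern as plain text, not a regular expression
# na=False    -> rows with a missing title count as "no match"
mask = toy_meta["Title"].str.contains(
    "star wars", case=False, na=False, regex=False
)
picks = toy_meta[mask]
print(picks)
```

Without `na=False`, a missing title produces a missing value in the mask, and pandas refuses to use such a mask for boolean indexing.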


In [13]:
# Select the first 10 titles containing 'Star Wars' as the test set
myfav = df_meta[df_meta.Title.str.contains('Star Wars')][:10]
myfav

|   | Id | ASIN | Title | Categories | Group | SalesRank | TotalReviews | AvgRating | DegreeCentrality | ClusteringCoeff |
|---|---|---|---|---|---|---|---|---|---|---|
| 976 | 1371 | 553298046 | Specter of the Past (Star Wars: The Hand of Th... | general subject media war seri fantasi book sc... | Book | 18558 | 221 | 4.5 | 16 | 0.47 |
| 2283 | 3218 | 553486519 | A New Hope (Choose Your Own Star Wars Adventures) | literatur action book com media fiction genera... | Book | 429785 | 2 | 4.5 | 0 | 0.0 |
| 2377 | 3318 | 375800050 | Learning Word Sounds: Kindergarten (Star Wars ... | art general subject media nonfict episod war c... | Book | 710560 | 1 | 4.0 | 2 | 0.0 |
| 2547 | 3562 | 425168255 | The Golden Globe (Star Wars: Junior Jedi Knights) | adventur general subject literatur magic actio... | Book | 393200 | 18 | 4.0 | 6 | 0.79 |
| 2708 | 3771 | 553564927 | The Last Command (Star Wars: The Thrawn Trilog... | general subject media war trilog seri fantasi ... | Book | 1949 | 136 | 4.5 | 14 | 0.39 |
| 3255 | 4559 | 345435389 | Rogue Planet (Star Wars) | general subject media b author bear war z seri... | Book | 152907 | 171 | 3.5 | 4 | 0.9 |
| 4099 | 5747 | 345420675 | The Essential Guide to Droids (Star Wars) | nonfict general entertain subject media war se... | Book | 104889 | 12 | 5.0 | 7 | 0.83 |
| 5042 | 7083 | 345440749 | Galactic Phrase Book & Travel Guide: Beeps, Bl... | general entertain subject media war critic ser... | Book | 100113 | 7 | 4.5 | 4 | 0.43 |
| 7935 | 11312 | 786928794 | Coruscant and the Core Worlds (Star Wars Rolep... | general entertain subject media war seri fanta... | Book | 15656 | 7 | 4.5 | 11 | 0.75 |
| 9645 | 13831 | 553472046 | Star Wars: Showdown at Centerpoint (Corellian ... | general subject media author macbrid roger war... | Book | 599320 | 25 | 3.5 | 5 | 0.8 |


## Generate Recommendations for a Few Books of Choice

The `'books_data.edgelist'` file conveniently contains precomputed pairwise weights between items, so you don't need to calculate them yourself. The solution below treats a higher weight as a closer match. Given this preprocessed data, it's time to employ collaborative filtering to generate recommendations! Generate the top 10 recommendations for each book in the subset you chose. Be sure to print the title of the book you are generating recommendations for as well as the titles of the books being recommended.

In [14]:
# Well, we got a couple of extraneous results in there, but perhaps they are a good measure for comparison.
# What does our recommender return for these books?
rec_dict = {}
id_name_dict = dict(zip(df_meta.ASIN, df_meta.Title))
for row in myfav.index:
    book_id = myfav.ASIN[row]
    book_name = id_name_dict[book_id]
    # Edges touching this book in either direction, highest weight first.
    # .copy() gives an independent frame before new columns are added.
    most_similar = df[(df.source == book_id)
                      | (df.target == book_id)
                     ].sort_values(by='weight', ascending=False).head(10).copy()
    most_similar['source_name'] = most_similar['source'].map(id_name_dict)
    most_similar['target_name'] = most_similar['target'].map(id_name_dict)
    recommendations = []
    # Use a distinct loop variable so the outer `row` is not shadowed
    for idx in most_similar.index:
        # The recommended book is whichever endpoint is not book_id
        if most_similar.source[idx] == book_id:
            recommendations.append((most_similar.target_name[idx], most_similar.weight[idx]))
        else:
            recommendations.append((most_similar.source_name[idx], most_similar.weight[idx]))
    rec_dict[book_name] = recommendations
    print('Recommendations for:', book_name)
    for r in recommendations:
        print(r)
    print('\n\n')

Recommendations for: Specter of the Past (Star Wars: The Hand of Thrawn, Book One)
('The Last Command (Star Wars: The Thrawn Trilogy, Vol. 3)', 0.67)
('Dark Force Rising (Star Wars: The Thrawn Trilogy, Vol. 2)', 0.67)
('Heir to the Empire (Star Wars: The Thrawn Trilogy, Vol. 1)', 0.67)
('Heir to the Empire (Star Wars: The Thrawn Trilogy, Vol. 1)', 0.62)
('Dark Force Rising (Star Wars Vol. 2)', 0.62)
('Star Wars : Thrawn Omnibus', 0.62)
('Star Wars: Showdown at Centerpoint (Corellian Trilogy, No 3)', 0.59)
('The Corellian Trilogy Value Collection : Ambush at Corellia, Assault at Selonia, and Showdown at Centerpoint', 0.59)
('The Last Command (Star Wars: Thrawn Trilogy, Vol. 3)', 0.59)
('Star Wars: Champions of the Force/Dark Apprentice/Jedi Search', 0.56)



Recommendations for: A New Hope (Choose Your Own Star Wars Adventures)



Recommendations for: Learning Word Sounds: Kindergarten (Star Wars Fun-To-Learn Books)
('Writing Numbers 1 to 10: Preschool-Kindergarten (Star Wars Learning F
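
The lookup in the loop above can also be wrapped in a reusable helper. The following is a sketch under stated assumptions rather than the lab's reference solution: `top_k_neighbors` is a hypothetical name, the toy edge list uses made-up single-letter IDs, and collapsing repeated neighbor IDs to their highest weight is a design choice that the original loop does not make.

```python
import numpy as np
import pandas as pd


def top_k_neighbors(edges, book_id, k=10):
    """Return up to k (neighbor, weight) rows for book_id, best first."""
    # Keep edges where the book appears on either end.
    touching = edges[(edges["source"] == book_id) | (edges["target"] == book_id)]
    # The neighbor is whichever endpoint is not book_id.
    other = np.where(
        touching["source"] == book_id, touching["target"], touching["source"]
    )
    result = pd.DataFrame(
        {"neighbor": other, "weight": touching["weight"].to_numpy()}
    )
    # Sort by weight, then keep only the best row per neighbor ID
    # (assumption: an edge may be listed in both directions).
    result = result.sort_values("weight", ascending=False)
    result = result.drop_duplicates("neighbor")
    return result.head(k).reset_index(drop=True)


# Hypothetical toy edge list with made-up IDs.
toy_edges = pd.DataFrame({
    "source": ["A", "C", "B", "A", "B"],
    "target": ["B", "A", "C", "D", "A"],
    "weight": [0.9, 0.7, 0.4, 0.2, 0.5],
})

recs = top_k_neighbors(toy_edges, "A", k=2)
print(recs)
```

A book with no edges yields an empty frame, which corresponds to a title that prints with no recommendations beneath it.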

## Summary

Well done! In this lab, you created an item-based recommendation system for a real-world dataset!