# Amazon Recommendation System - Lab

## Introduction

Now that you've been introduced to collaborative filtering and recommendation systems, it's time to put your skills to the test and build a recommendation system for a real-world dataset! For this lab, you'll be using a dataset of book reviews from the Amazon marketplace. While the previous lesson focused on user-based recommendation systems, here you'll apply a parallel process to build an item-based recommendation system that recommends similar books at the bottom of a product page.

## Objectives

In this lab you will: 

- Use graph-based similarity metrics to create a collaborative filtering recommender system

## Load the Dataset

In [1]:
import pandas as pd
import networkx as nx
G = nx.Graph()

df = pd.read_csv('books_data.edgelist', names=['source', 'target', 'weight'], delimiter=' ')
df.head()

| | source | target | weight |
|---|---|---|---|
| 0 | 827229534 | 0804215715 | 0.7 |
| 1 | 827229534 | 156101074X | 0.5 |
| 2 | 827229534 | 0687023955 | 0.8 |
| 3 | 827229534 | 0687074231 | 0.8 |
| 4 | 827229534 | 082721619X | 0.7 |
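
The `source` values in the output above lack the leading zero that the metadata file lists for the same book (ASIN `0827229534`), which suggests pandas parsed at least some of those IDs as numbers. The following is a minimal sketch of one way to avoid that, assuming the edge list really is three space-delimited columns as in the cell above. It reads a small in-memory stand-in (a few rows copied from the output, with the leading zero restored as an assumption) rather than the real file.

```python
import io
import pandas as pd

# In-memory stand-in for 'books_data.edgelist' (assumed layout: source target weight)
sample = io.StringIO(
    "0827229534 0804215715 0.7\n"
    "0827229534 156101074X 0.5\n"
    "0827229534 0687023955 0.8\n"
)

edges = pd.read_csv(
    sample,
    names=["source", "target", "weight"],
    delimiter=" ",
    # Read ASINs as text so leading zeros and trailing letters are preserved
    dtype={"source": str, "target": str},
)
print(edges)
```

With both ID columns kept as strings, comparisons against the `ASIN` column of the metadata should match on the full ten-character identifier.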


## Load the Metadata 

Import the metadata available in the file `'books_meta.txt'` (note that it is `'\t'` separated).

In [2]:
# Your code here
df_meta = pd.read_csv("books_meta.txt", sep="\t")
df_meta

| | Id | ASIN | Title | Categories | Group | SalesRank | TotalReviews | AvgRating | DegreeCentrality | ClusteringCoeff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0827229534 | Patterns of Preaching: A Sermon Sampler | clergi sermon subject religion preach spiritu ... | Book | 396585 | 2 | 5.0 | 8 | 0.80 |
| 1 | 2 | 0738700797 | Candlemas: Feast of Flames | subject witchcraft earth religion spiritu base... | Book | 168596 | 12 | 4.5 | 9 | 0.85 |
| 2 | 3 | 0486287785 | World War II Allied Fighter Planes Trading Cards | general hobbi subject craft home garden book | Book | 1270652 | 1 | 5.0 | 0 | 0.00 |
| 3 | 4 | 0842328327 | Life Application Bible Commentary: 1 and 2 Tim... | spiritu translat commentari christian book gui... | Book | 631289 | 1 | 4.0 | 6 | 0.79 |
| 4 | 5 | 1577943082 | Prayers That Avail Much for Business: Executive | subject religion spiritu busi christian live w... | Book | 455160 | 0 | 0.0 | 4 | 1.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 393556 | 548541 | 9700507734 | Para alcanzar el orgasmo | mind general subject health bodi book | Book | 0 | 1 | 4.0 | 0 | 0.00 |
| 393557 | 548542 | 9627762644 | Starting a Hedge Fund : A US Perspective | general subject busi book invest | Book | 0 | 3 | 2.5 | 0 | 0.00 |
| 393558 | 548543 | 0970020503 | Facts Every Injured Worker Should Know | general subject busi book law practic guid lab... | Book | 0 | 5 | 4.5 | 0 | 0.00 |
| 393559 | 548546 | 1930519206 | Adobe Photoshop 6 VTC Training CD | book com offic graphic subject photoshop inter... | Book | 0 | 2 | 5.0 | 0 | 0.00 |


## Select Books to Test Your Recommender On

Select a small subset of books that you are interested in generating recommendations for. 

In [3]:
# Your code here
hf = df_meta[df_meta["Title"].str.contains("Hedge Funds:")]
hf

| | Id | ASIN | Title | Categories | Group | SalesRank | TotalReviews | AvgRating | DegreeCentrality | ClusteringCoeff |
|---|---|---|---|---|---|---|---|---|---|---|
| 107224 | 149464 | 155738861X | Hedge Funds: Investment and Portfolio Strategi... | general subject home invest futur busi amazon ... | Book | 48790 | 1 | 5.0 | 5 | 0.84 |
| 128487 | 179332 | 0471899739 | Hedge Funds: Courtesans of Capitalism | general financ subject home invest futur accou... | Book | 621163 | 1 | 1.0 | 1 | 0.0 |
| 163036 | 227380 | 3906765334 | Marketing of Hedge Funds: A Key Strategic Vari... | general financ subject home invest busi amazon... | Book | 737868 | 0 | 0.0 | 2 | 1.0 |
| 224689 | 312295 | 0470844779 | Hedge Funds: Myths and Limits | general financ subject home invest futur accou... | Book | 167312 | 6 | 4.5 | 5 | 0.82 |
| 368971 | 508564 | 1557389179 | The Handbook of Managed Futures and Hedge Fund... | general subject home option invest futur busi ... | Book | 417003 | 0 | 0.0 | 2 | 0.0 |
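
By default, `str.contains` interprets its pattern as a regular expression and returns a missing value for rows whose title is missing, which can make the boolean filter fail. The sketch below shows a slightly more defensive version of the same selection on a hypothetical miniature of `df_meta`; the last row, including its placeholder ASIN, is invented purely to illustrate a missing title.

```python
import pandas as pd

# Hypothetical miniature of df_meta with only the columns needed here
meta = pd.DataFrame({
    "ASIN": ["0470844779", "0471899739", "0486287785", "0000000000"],
    "Title": [
        "Hedge Funds: Myths and Limits",
        "Hedge Funds: Courtesans of Capitalism",
        "World War II Allied Fighter Planes Trading Cards",
        None,  # invented row with a missing title
    ],
})

# regex=False matches the text literally; na=False treats missing titles as non-matches
mask = meta["Title"].str.contains("Hedge Funds:", regex=False, na=False)
subset = meta[mask]
print(subset[["ASIN", "Title"]])
```

Whether the real metadata contains missing titles is not shown in the lab output, so treat `na=False` as a precaution rather than a required fix.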


## Generate Recommendations for a Few Books of Choice

The `'books_data.edgelist'` file conveniently already contains the calculated distance between items. Given this preprocessed data, it's time to employ collaborative filtering to generate recommendations! Generate the top 10 recommendations for each book in the subset you chose. Be sure to print the title of the book you are generating recommendations for as well as the titles of the books being recommended.

In [4]:
# Your code here

# Map each ASIN to its title for readable output
name_dict = dict(zip(df_meta["ASIN"], df_meta["Title"]))

for book_id in hf["ASIN"]:
    # The book may appear in either column of the edge list, so check both
    top_similar = (
        df[(df["source"] == book_id) | (df["target"] == book_id)]
        .sort_values(by="weight", ascending=False)
        .head(10)
    )
    print(name_dict[book_id])
    print("--------------------------------------")
    print("Recommendations:\n")

    for source, target in zip(top_similar["source"], top_similar["target"]):
        # The recommended book is whichever endpoint is not the query book
        other_id = target if source == book_id else source
        print(name_dict.get(other_id))

    print("\n")

Hedge Funds: Investment and Portfolio Strategies for the Institutional Investor
--------------------------------------
Recommendations:

Getting Started in Hedge Funds
How to Create and Manage a Hedge Fund: A Professional's Guide
Hedge Funds: Myths and Limits
Hedge Fund of Funds Investing: An Investor's Guide
All About Hedge Funds : The Easy Way to Get Started


Hedge Funds: Courtesans of Capitalism
--------------------------------------
Recommendations:

When Genius Failed : The Rise and Fall of Long-Term Capital Management


Marketing of Hedge Funds: A Key Strategic Variable in Defining Possible Roles of an Emerging Investment Force
--------------------------------------
Recommendations:

How to Create and Manage a Hedge Fund: A Professional's Guide
All About Hedge Funds : The Easy Way to Get Started


Hedge Funds: Myths and Limits
--------------------------------------
Recommendations:

How to Create and Manage a Hedge Fund: A Professional's Guide
Hedge Fund of Funds Investing: An I
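
Because the data is an edge list, the same lookup can also be expressed on a graph, in line with the lab's objective. The following is a minimal sketch using `networkx` on a toy edge list with made-up node names and weights; the helper `top_neighbors` is hypothetical and not part of the lab. It ranks neighbors by highest weight first, mirroring the `ascending=False` sort in the cell above; if the weights in your data are distances rather than similarities, the order would need to be reversed.

```python
import networkx as nx
import pandas as pd

# Toy edge list with invented node names and weights
edges = pd.DataFrame({
    "source": ["A", "A", "D", "B"],
    "target": ["B", "C", "A", "C"],
    "weight": [0.9, 0.4, 0.7, 0.5],
})

# Build an undirected graph; each edge carries its 'weight' attribute
G = nx.from_pandas_edgelist(edges, "source", "target", edge_attr="weight")

def top_neighbors(graph, node, n=10):
    """Return up to n (neighbor, weight) pairs, highest weight first."""
    ranked = sorted(
        graph[node].items(),              # neighbor -> edge attribute dict
        key=lambda item: item[1]["weight"],
        reverse=True,
    )
    return [(nbr, attrs["weight"]) for nbr, attrs in ranked[:n]]

recs = top_neighbors(G, "A", n=2)
print(recs)
```

On an undirected graph, a book's neighbors are found regardless of which column it appeared in, so the explicit check of both `source` and `target` is no longer needed.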

## Summary

Well done! In this lab, you created a recommendation system for a real-world dataset!