# Recommendation Systems - Lab

## Introduction

Now that you've been introduced to collaborative filtering and recommendation systems, it's time to put your skills to the test and build a recommendation system for a real-world dataset! For this exercise, you'll use a dataset of book reviews from the Amazon marketplace. While the previous lesson focused on user-based recommendation systems, you'll apply a parallel process to build an item-based recommendation system that recommends similar books at the bottom of a product page.

## Objectives

You will be able to:
* Implement a recommendation system on a real-world dataset

In [1]:
import networkx as nx
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Load the Dataset

In [20]:
# sep=r'\s+' splits on any run of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv('books_data.edgelist', sep=r'\s+', names=['source', 'target', 'weight'])
df.head()

|  | source | target | weight |
|---|---|---|---|
| 0 | 827229534 | 0804215715 | 0.7 |
| 1 | 827229534 | 156101074X | 0.5 |
| 2 | 827229534 | 0687023955 | 0.8 |
| 3 | 827229534 | 0687074231 | 0.8 |
| 4 | 827229534 | 082721619X | 0.7 |
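
`networkx` is imported at the top of this lab but is not used in the cells shown. One possible use, sketched here on a made-up toy edge list rather than the real file, is to load the edges into a weighted graph and rank a book's neighbors by edge weight. The toy node labels and the variable names `toy_edges`, `G`, and `ranked` are assumptions for illustration only.

```python
import networkx as nx
import pandas as pd

# Toy stand-in for the edge list (made-up IDs and weights, not real data).
toy_edges = pd.DataFrame({
    'source': ['A', 'A', 'B'],
    'target': ['B', 'C', 'C'],
    'weight': [0.7, 0.5, 0.8],
})

# Build an undirected graph; each edge keeps its 'weight' attribute.
G = nx.from_pandas_edgelist(toy_edges, source='source', target='target', edge_attr='weight')

# Rank the neighbors of node 'C' from highest to lowest edge weight.
neighbors_of_c = sorted(G['C'].items(), key=lambda item: item[1]['weight'], reverse=True)
ranked = [(node, attrs['weight']) for node, attrs in neighbors_of_c]
print(ranked)
```

Because the graph is undirected, a book's neighbors include edges where it appears as either `source` or `target`, which mirrors the either-end filter used later in this lab.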


In [17]:
len(df.source.unique())

161885

In [18]:
len(df.target.unique())

251247
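
The two counts above differ, so some books appear on only one end of an edge. A minimal sketch of counting the distinct books across both columns follows, using a made-up toy frame (`toy_edges` and its values are assumptions, not the real data).

```python
import pandas as pd

# Toy stand-in for the edge list (made-up IDs and weights).
toy_edges = pd.DataFrame({
    'source': ['A', 'A', 'B'],
    'target': ['B', 'C', 'D'],
    'weight': [0.7, 0.5, 0.8],
})

n_sources = toy_edges['source'].nunique()
n_targets = toy_edges['target'].nunique()

# Union of both columns: every book that appears on either end of an edge.
all_books = pd.unique(pd.concat([toy_edges['source'], toy_edges['target']]))
n_books = len(all_books)

print(n_sources, n_targets, n_books)
```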

## Load the Metadata

In [4]:
meta = pd.read_csv('books_meta.txt', delimiter='\t')
meta.head()

|  | Id | ASIN | Title | Categories | Group | SalesRank | TotalReviews | AvgRating | DegreeCentrality | ClusteringCoeff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 827229534 | Patterns of Preaching: A Sermon Sampler | clergi sermon subject religion preach spiritu ... | Book | 396585 | 2 | 5.0 | 8 | 0.8 |
| 1 | 2 | 738700797 | Candlemas: Feast of Flames | subject witchcraft earth religion spiritu base... | Book | 168596 | 12 | 4.5 | 9 | 0.85 |
| 2 | 3 | 486287785 | World War II Allied Fighter Planes Trading Cards | general hobbi subject craft home garden book | Book | 1270652 | 1 | 5.0 | 0 | 0.0 |
| 3 | 4 | 842328327 | Life Application Bible Commentary: 1 and 2 Tim... | spiritu translat commentari christian book gui... | Book | 631289 | 1 | 4.0 | 6 | 0.79 |
| 4 | 5 | 1577943082 | Prayers That Avail Much for Business: Executive | subject religion spiritu busi christian live w... | Book | 455160 | 0 | 0.0 | 4 | 1.0 |


## Select Books to Test Your Recommender On

Select a small subset of books that you are interested in generating recommendations for. 

In [29]:
subset = meta[meta.Title.str.contains('Goosebumps')][:5]
subset

|  | Id | ASIN | Title | Categories | Group | SalesRank | TotalReviews | AvgRating | DegreeCentrality | ClusteringCoeff |
|---|---|---|---|---|---|---|---|---|---|---|
| 37 | 47 | 590568833 | How to Kill a Monster (Goosebumps) | literatur book r fiction general subject l z a... | Book | 654748 | 10 | 5.0 | 0 | 0.0 |
| 3012 | 4217 | 590568906 | Chicken Chicken (Goosebumps) | general subject literatur author illustr l z h... | Book | 318036 | 22 | 4.0 | 2 | 1.0 |
| 9384 | 13432 | 590673203 | Beware of the Purple Peanut Butter (Give Yours... | general subject literatur author illustr l z h... | Book | 212201 | 15 | 4.5 | 3 | 1.0 |
| 10640 | 15194 | 590568841 | Legend of the Lost Legend (Goosebumps) | general subject literatur author illustr l z h... | Book | 484234 | 6 | 4.5 | 2 | 1.0 |
| 10641 | 15195 | 590494503 | You Can't Scare Me! (Goosebumps, No 15) | literatur book r fiction general subject l z a... | Book | 723894 | 4 | 5.0 | 2 | 0.0 |
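
The title filter above is case-sensitive and assumes every row has a title. A slightly more defensive variant is sketched below on a made-up toy frame; `toy_meta`, its ASINs, and its titles are placeholders for illustration, and whether the real metadata has missing titles is not known from this lab.

```python
import pandas as pd

# Toy stand-in for the metadata table (made-up ASINs; one title is missing).
toy_meta = pd.DataFrame({
    'ASIN': ['111', '222', '333', '444'],
    'Title': [
        'Chicken Chicken (Goosebumps)',
        'Candlemas: Feast of Flames',
        None,
        'goosebumps collection',
    ],
})

# case=False ignores capitalization; na=False treats missing titles as non-matches.
mask = toy_meta['Title'].str.contains('goosebumps', case=False, na=False)
toy_subset = toy_meta[mask].head(5)
print(toy_subset)
```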


## Generate Recommendations for a Few Books of Choice

The 'books_data.edgelist' file conveniently contains a precomputed similarity between items, stored in the `weight` column. Given this preprocessed data, it's time to employ collaborative filtering to generate recommendations! Generate the top 10 recommendations for each book in the subset you chose. Be sure to print the name of the book you are generating recommendations for, as well as the names of the books being recommended.

In [33]:
rec_dict = {}
# Map each ASIN to its title so recommendations can be printed by name
id_name_dict = dict(zip(meta.ASIN, meta.Title))
for idx in subset.index:
    book_id = subset.ASIN[idx]
    book_name = id_name_dict[book_id]
    # Keep edges that touch this book on either end, highest weights first
    most_similar = df[(df.source == book_id)
                      | (df.target == book_id)
                     ].sort_values(by='weight', ascending=False).head(10).copy()
    most_similar['source_name'] = most_similar['source'].map(id_name_dict)
    most_similar['target_name'] = most_similar['target'].map(id_name_dict)
    recommendations = []
    for edge in most_similar.index:
        # Recommend whichever end of the edge is not the book itself
        if most_similar.source[edge] == book_id:
            recommendations.append((most_similar.target_name[edge], most_similar.weight[edge]))
        else:
            recommendations.append((most_similar.source_name[edge], most_similar.weight[edge]))
    rec_dict[book_name] = recommendations
    print("Recommendations for:", book_name)
    for r in recommendations:
        print(r)
    print('\n\n')

Recommendations for: How to Kill a Monster (Goosebumps)



Recommendations for: Chicken Chicken (Goosebumps)
("Don't Go to Sleep! (Goosebumps)", 1.0)
('Deep Trouble II (Goosebumps)', 0.9)



Recommendations for: Beware of the Purple Peanut Butter (Give Yourself Goosebumps, No 6)
('The Creepy Creations of Professor Shock (Give Yourself Goosebumps, No 14)', 1.0)
("Tick Tock, You're Dead! (Give Yourself Goosebumps)", 1.0)
('Secret Agent Grandma (Give Yourself Goosebumps, No 16)', 1.0)



Recommendations for: Legend of the Lost Legend (Goosebumps)
('Deep Trouble II (Goosebumps)', 0.9)
('The Blob That Ate Everyone (Goosebumps, No 55)', 0.9)



Recommendations for: You Can't Scare Me! (Goosebumps, No 15)
('Be Careful What You Wish For... (Goosebumps, No 12)', 0.9)
('The Headless Ghost (Goosebumps, No 37)', 0.9)
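
The lookup above can also be wrapped in a reusable function. The following is a minimal sketch under stated assumptions, not the lab's reference solution: `top_n_similar`, `toy_edges`, and `toy_titles` are hypothetical names, the data is made up, and falling back to the raw ID when a title is missing is a design choice that differs from the cell above.

```python
import pandas as pd

def top_n_similar(edges, titles, book_id, n=10):
    """Return up to n (title, weight) pairs for the books linked to book_id."""
    # Keep every edge that touches the book, on either end.
    linked = edges[(edges['source'] == book_id) | (edges['target'] == book_id)]
    linked = linked.sort_values('weight', ascending=False).head(n)
    results = []
    for source, target, weight in linked[['source', 'target', 'weight']].itertuples(index=False):
        # The recommended book is whichever end of the edge is not book_id.
        other = target if source == book_id else source
        # Assumption: fall back to the raw ID when no title is known.
        results.append((titles.get(other, other), weight))
    return results

# Toy stand-ins for the edge list and the ASIN-to-title mapping (made-up values).
toy_edges = pd.DataFrame({
    'source': ['A', 'A', 'B', 'C'],
    'target': ['B', 'C', 'C', 'D'],
    'weight': [0.9, 0.5, 0.7, 0.4],
})
toy_titles = {'A': 'Book A', 'B': 'Book B', 'C': 'Book C'}

recs_c = top_n_similar(toy_edges, toy_titles, 'C', n=2)
recs_none = top_n_similar(toy_edges, toy_titles, 'Z')
print(recs_c)
print(recs_none)
```

A book with no edges, like the first Goosebumps title in the output above, simply yields an empty list, so callers can decide how to handle it.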



