## Recommender Model

In this exercise, I build a simple recommender model using <code>NearestNeighbors</code> and <code>cosine_similarity</code> by defining a function that accepts a book name, among other parameters, and returns 5 recommendations for the chosen book, ranked by their cosine similarity to it.

In [1]:
#loading the necessary python libraries
import pandas as pd
import numpy as np

import seaborn as sns
sns.set(font_scale=1.5)

import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'retina'
%matplotlib inline
plt.style.use('fivethirtyeight')

In [2]:
#loading the necessary scikit-learn modules
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity 

In [3]:
#loading the dataset into a pandas dataframe
amazon_books = pd.read_csv('../data-sources/amazon-bestsellers/amazon-bestsellers.csv')

#renaming the columns
amazon_books.columns = ['name', 'author', 'rating', 'reviews', 'price', 'year', 'genre']

#dropping the duplicate entries that appear across multiple years
amazon_books.drop_duplicates(subset='name', inplace=True, ignore_index=True)

#displaying the first 5 rows
amazon_books.head()

| | name | author | rating | reviews | price | year | genre |
|---|---|---|---|---|---|---|---|
| 0 | 10-Day Green Smoothie Cleanse | JJ Smith | 4.7 | 17350 | 8 | 2016 | Non Fiction |
| 1 | 11/22/63: A Novel | Stephen King | 4.6 | 2052 | 22 | 2011 | Fiction |
| 2 | 12 Rules for Life: An Antidote to Chaos | Jordan B. Peterson | 4.7 | 18979 | 15 | 2018 | Non Fiction |
| 3 | 1984 (Signet Classics) | George Orwell | 4.7 | 21424 | 6 | 2017 | Fiction |
| 4 | 5,000 Awesome Facts (About Everything!) (Natio... | National Geographic Kids | 4.8 | 7665 | 12 | 2019 | Non Fiction |


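The genre column contains categorical data, which has to be converted to numbers before any distances can be computed. Because the column is binary, <code>LabelEncoder</code> from scikit-learn is enough: it maps the two genres to 0 and 1.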

In [4]:
#instantiating a LabelEncoder object
le = LabelEncoder()

#converting the genre column to binary
amazon_books['genre'] = le.fit_transform(amazon_books['genre'])
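
As a small illustration of what the encoder does, here is a minimal sketch on a made-up list of genres (the list and the variable names are assumptions for this example, not part of the dataset). <code>LabelEncoder</code> sorts the distinct labels and uses each label's position as its code, and <code>inverse_transform</code> recovers the original strings.

```python
from sklearn.preprocessing import LabelEncoder

# made-up genre values, used only for illustration
genres = ['Non Fiction', 'Fiction', 'Non Fiction', 'Fiction']

encoder = LabelEncoder()
encoded = encoder.fit_transform(genres)

# classes_ holds the distinct labels in sorted order; a label's position is its code
print(list(encoder.classes_))  # ['Fiction', 'Non Fiction']
print(encoded.tolist())        # [1, 0, 1, 0]

# inverse_transform maps the integer codes back to the original labels
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```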

In [6]:
#dummifying the author names
amazon_books = pd.get_dummies(amazon_books, columns=['author'], drop_first=True)

#checking the shape of the resulting dataframe
amazon_books.shape

(351, 253)
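
To make the effect of <code>drop_first=True</code> concrete, the sketch below dummifies a tiny made-up dataframe (the book and author names are hypothetical). With k distinct authors, <code>get_dummies</code> produces k-1 indicator columns, because the first category is dropped and is represented by all zeros.

```python
import pandas as pd

# hypothetical books and authors, used only for illustration
toy = pd.DataFrame({
    'name': ['Book A', 'Book B', 'Book C'],
    'author': ['Ann', 'Bob', 'Ann'],
})

# two distinct authors give a single indicator column once the first is dropped
dummies = pd.get_dummies(toy, columns=['author'], drop_first=True)

print(list(dummies.columns))                       # ['name', 'author_Bob']
print(dummies['author_Bob'].astype(int).tolist())  # [0, 1, 0]
```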

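Because this recommendation system relies on distances between books, the features have to be on comparable scales. Otherwise a wide-ranging column such as reviews would dominate the distance and the other features would barely count. Here I've chosen to use <code>StandardScaler</code> from scikit-learn to achieve this.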

In [7]:
#separating continuous features for scaling
scalable_cols = amazon_books[['rating', 'reviews', 'price', 'year']]

#instantiating a StandardScaler object and transforming all the available continuous features
scaler = StandardScaler()
scalable_cols = pd.DataFrame(scaler.fit_transform(scalable_cols), columns=scalable_cols.columns)

#displaying the first 5 rows of the scaled features
scalable_cols.head()

| | rating | reviews | price | year |
|---|---|---|---|---|
| 0 | 0.402782 | 0.695506 | -0.505844 | 0.757873 |
| 1 | -0.039019 | -0.713687 | 0.88906 | -0.767433 |
| 2 | 0.402782 | 0.845563 | 0.191608 | 1.367996 |
| 3 | 0.402782 | 1.070787 | -0.705116 | 1.062934 |
| 4 | 0.844583 | -0.196639 | -0.1073 | 1.673057 |


In [8]:
#inserting the scaled features back into the original dataframe
amazon_books[['rating', 'reviews', 'price', 'year']] = scalable_cols
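
To see why scaling matters for a distance-based recommender, here is a minimal sketch on four made-up books described only by rating and number of reviews (all values are invented for illustration). On the raw data the review counts dominate the Euclidean distance; after standardizing, the rating difference is taken into account as well, and the nearest neighbor of the first book changes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# made-up rows of [rating, reviews], used only for illustration
books = np.array([
    [4.8, 10000.0],   # book 0: the query
    [3.0, 10050.0],   # book 1: very different rating, similar review count
    [4.8, 10500.0],   # book 2: same rating, somewhat different review count
    [4.0, 30000.0],   # book 3: far away on reviews
])

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(a - b))

# distances on the raw features: the review counts decide everything
raw_to_1 = euclidean(books[0], books[1])
raw_to_2 = euclidean(books[0], books[2])

# distances after standardizing each column to zero mean and unit variance
scaled = StandardScaler().fit_transform(books)
scaled_to_1 = euclidean(scaled[0], scaled[1])
scaled_to_2 = euclidean(scaled[0], scaled[2])

print('raw:    book 1 closer than book 2?', raw_to_1 < raw_to_2)
print('scaled: book 2 closer than book 1?', scaled_to_2 < scaled_to_1)
```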

Implementing this recommender model as a function makes it easier to integrate into a larger project further down the line, and also to use as a stand-alone system.

In [41]:
#defining a recommendation function
def BookRecommender(X, num_recommendations, selected_book, metric='manhattan'):
    
    """This function recommends the books most similar to the one chosen by the user
       and sorts them by degree of cosine similarity. It uses the following parameters:
       
       X: a pandas dataframe with the book names in the 'name' column (the first column),
          followed by the numeric features
       
       num_recommendations: the number of books to recommend
       
       selected_book: a book chosen by the user from the dataframe
       
       metric: the distance metric to use when finding the nearest neighbors, default = manhattan"""
    
    #indexing the features by book name so that rows can be looked up by title
    features = X.set_index('name', drop=True)
    
    #instantiating a NearestNeighbors object and fitting it to the features;
    #one extra neighbor is requested because the selected book is returned as its own closest match
    neighbor_finder = NearestNeighbors(n_neighbors=num_recommendations+1, metric=metric)
    neighbor_finder.fit(features)
    
    #getting the distances and indices from the fitted model and listing the closest neighbors,
    #skipping the first entry, which is the selected book itself
    distances, indices = neighbor_finder.kneighbors(features.loc[[selected_book]])
    neighbor_positions = indices.flatten()[1:]
    recommended_books = features.index[neighbor_positions].tolist()
    
    #computing the cosine similarity between the selected book and each of the closest neighbors
    similarity = cosine_similarity(features.loc[[selected_book]], features.iloc[neighbor_positions]).flatten()
    
    #putting the recommendations and their similarities in a dataframe and sorting by similarity
    recommended_books = pd.DataFrame({'Recommended Books': recommended_books, 'Similarity': similarity})
    recommended_books.sort_values('Similarity', ascending=False, inplace=True, ignore_index=True)
    
    #printing a header and returning this dataframe
    print(f'These books best match "{selected_book}":')
    return recommended_books
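
The function asks for one neighbor more than the number of recommendations and then discards the first result. The sketch below shows why, using a tiny made-up feature table (the names and values are assumptions for illustration): when the query row is part of the fitted data, <code>kneighbors</code> returns it as its own nearest neighbor at distance 0. This holds as long as no other row has exactly the same features.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# made-up items and features, used only for illustration
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'f1': [0.0, 0.1, 1.0, 5.0],
    'f2': [0.0, 0.2, 1.0, 5.0],
})
features = toy.set_index('name')

# asking for 3 neighbors in order to end up with 2 recommendations
finder = NearestNeighbors(n_neighbors=3, metric='manhattan').fit(features)
distances, indices = finder.kneighbors(features.loc[['A']])

# the first neighbor is the query itself, so it is dropped
neighbors = features.index[indices.flatten()].tolist()
recommendations = neighbors[1:]

print(neighbors)        # ['A', 'B', 'C']
print(recommendations)  # ['B', 'C']
```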

In [45]:
#asking for 5 recommendations based on euclidean distance
BookRecommender(amazon_books, 5, amazon_books.iloc[35, 0], 'euclidean')

These books best match "Broke: The Plan to Restore Our Trust, Truth and Treasure":


| | Recommended Books | Similarity |
|---|---|---|
| 0 | Arguing with Idiots: How to Stop Small Minds a... | 0.964471 |
| 1 | Glenn Beck's Common Sense: The Case Against an... | 0.957089 |
| 2 | Game Change: Obama and the Clintons, McCain an... | 0.759736 |
| 3 | Food Rules: An Eater's Manual | 0.757707 |
| 4 | The Daily Show with Jon Stewart Presents Earth... | 0.751448 |


In [46]:
#asking for 5 recommendations based on manhattan distance
BookRecommender(amazon_books, 5, amazon_books.iloc[35, 0], 'manhattan')

These books best match "Broke: The Plan to Restore Our Trust, Truth and Treasure":


| | Recommended Books | Similarity |
|---|---|---|
| 0 | Arguing with Idiots: How to Stop Small Minds a... | 0.964471 |
| 1 | Glenn Beck's Common Sense: The Case Against an... | 0.957089 |
| 2 | Game Change: Obama and the Clintons, McCain an... | 0.759736 |
| 3 | Food Rules: An Eater's Manual | 0.757707 |
| 4 | The Daily Show with Jon Stewart Presents Earth... | 0.751448 |
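
In both calls the neighbors are selected with one metric (Euclidean or Manhattan) and then ranked by a different one (cosine similarity). A possible variation, which is not what the function above does, is to let <code>NearestNeighbors</code> search with <code>metric='cosine'</code> directly, so that selection and ranking use the same measure. The sketch below runs on random made-up features and checks that the cosine distance returned by scikit-learn equals 1 minus the cosine similarity.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity

# random made-up feature matrix (20 items, 4 features), used only for illustration
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 4))

# searching with cosine distance instead of euclidean or manhattan
finder = NearestNeighbors(n_neighbors=4, metric='cosine').fit(features)
distances, indices = finder.kneighbors(features[[0]])

# cosine distance is defined as 1 - cosine similarity
similarity_from_distance = 1 - distances.flatten()
similarity_direct = cosine_similarity(features[[0]], features[indices.flatten()]).flatten()

# the neighbors come back ordered from most to least similar
print(np.allclose(similarity_from_distance, similarity_direct))
```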
