<h1>Case Study 2 - Retail</h1>
BY: MHD.SHADI HASAN

<h2>Problem Background</h2>

BooX is the largest online book rental chain in the country. The company charges a fixed monthly fee plus a rental fee per book, so it earns more when a user rents more books. The company is still unprofitable and is looking for ways to improve both its revenue and its profit. At the moment, most users return their rentals and do not rent new books.

**Requirements**

The task is to build a recommendation engine that suggests books to a user based on the behavior of similar users.

The goal of this case study is to increase both the top line and the bottom line, since more rentals per user should mean more revenue and more profit.

**Data**

There are two data files, Books and Ratings. The Books file holds information about the available books:

|Attribute|Values|
|------|------|
|isbn|Special ID for the book|
|book_title|The title of the book as 'object'|
|book_author|The author of the book as 'object'|
|year_of_publication|Year in which the book was released|
|publisher|The publisher of the book as 'object'|

The Ratings file holds the following information:

|Attribute|Values|
|---|---|
|user_id|Unique ID of the user|
|isbn|International Standard Book Number, a unique commercial book identifier|
|rating|The rating provided by the user|

<h2>Data Preparation</h2>

In [1]:
#import the needed libraries
import pandas as pd
import numpy as np
from math import sqrt

In [2]:
#read the data from CSV files into 2 dataframes representing books and ratings
df_ratings = pd.read_csv('BX-Book-Ratings.csv', encoding = "ISO-8859-1")
df_books = pd.read_csv('BX-Books.csv', encoding = "ISO-8859-1")
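
The cell above reads both files with pandas' default type inference. As a minimal sketch of one alternative, assuming the `isbn` column should be treated as text, passing `dtype={"isbn": str}` to `pd.read_csv` keeps leading zeros and check characters such as `X` intact. The two-row CSV below is a made-up stand-in with invented ISBNs, not content from `BX-Books.csv`.

```python
import io

import pandas as pd

# Hypothetical sample standing in for a books file; the ISBNs and titles are invented.
SAMPLE = "isbn,book_title\n0000000019,Book A\n0000000027,Book B\n"

# Default inference parses an all-digit column as integers, dropping leading zeros.
default_read = pd.read_csv(io.StringIO(SAMPLE))

# Forcing the column to str keeps every ISBN exactly as written in the file.
string_read = pd.read_csv(io.StringIO(SAMPLE), dtype={"isbn": str})

print(default_read["isbn"].tolist())
print(string_read["isbn"].tolist())
```

Reading ISBNs as strings in both dataframes would also keep the join between ratings and books consistent, since both sides would then use the same type.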


<h2>Recommendation Engine</h2>

As the required recommendation engine should recommend books based on the behavior of similar users, Collaborative Filtering will be used for the task, and Pearson Correlation will be used to calculate the similarity between users.
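
For reference, the Pearson correlation can be written in a sum-of-squares form. Here $x_i$ and $y_i$ are assumed to be the ratings two users gave to the $n$ books they have in common, and the symbols $S_{xx}$, $S_{yy}$, $S_{xy}$ mirror the variable names `Sxx`, `Syy`, `Sxy` used in the code of this notebook.

$$
S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad
S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}, \qquad
S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}
$$

$$
r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}
$$

The value of $r$ lies between $-1$ and $1$. It is undefined when $S_{xx}$ or $S_{yy}$ is zero, for example when one user gave the same rating to every shared book; the code treats that case as a correlation of 0.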

<h3>Collaborative Filtering</h3>

The process of creating a Collaborative Filtering recommendation system is as follows:

    1. Input the user ID to find the books this user has rented
    2. Find other users that have rented the same books
    3. Calculate the similarity between the input user and the other users
    4. Recommend books according to the similarity between users 
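
The four steps above can be sketched on a toy ratings table. This is an illustrative sketch, not the notebook's `recommender` function: the data is invented, the names (`target_id`, `paired`, `neighbours`, `scores`) are hypothetical, and it makes two choices of its own, namely keeping only neighbours with positive similarity and scoring only books the target user has not rated.

```python
import pandas as pd

# Invented toy data: three users, books labelled A to E.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "isbn":    ["A", "B", "C", "A", "B", "C", "E", "A", "B", "C", "D"],
    "rating":  [8, 6, 9, 7, 5, 9, 8, 3, 9, 2, 10],
})
target_id = 1

# Step 1: the books the target user has rated.
target_ratings = ratings[ratings["user_id"] == target_id]

# Step 2: other users' ratings of those same books.
shared = ratings[
    ratings["isbn"].isin(target_ratings["isbn"])
    & (ratings["user_id"] != target_id)
]

# Step 3: pair the ratings book by book, then compute Pearson correlation per user.
paired = shared.merge(target_ratings, on="isbn", suffixes=("_other", "_target"))
similarity = pd.Series({
    uid: grp["rating_other"].corr(grp["rating_target"])
    for uid, grp in paired.groupby("user_id_other")
})

# Step 4: similarity-weighted average rating of books the target has not rated,
# using only positively correlated neighbours (an assumption of this sketch).
neighbours = similarity[similarity > 0]
candidates = ratings[
    ratings["user_id"].isin(neighbours.index)
    & ~ratings["isbn"].isin(target_ratings["isbn"])
].copy()
candidates["weight"] = candidates["user_id"].map(neighbours)
candidates["weighted"] = candidates["weight"] * candidates["rating"]
totals = candidates.groupby("isbn")[["weighted", "weight"]].sum()
scores = (totals["weighted"] / totals["weight"]).sort_values(ascending=False)
print(scores)
```

In this toy data, user 2 rates the shared books much like user 1 and user 3 rates them in the opposite direction, so only user 2's extra book (E) is scored.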

In [3]:
#define the function 'recommender', which takes a user ID and recommends books to that user
def recommender(ID):
    #store the ID in a variable
    input_user_id = ID
    #get all the books that this user has read as 'input_user'
    input_user = df_ratings.loc[df_ratings['user_id'] == input_user_id]
    #find other users who have read the same books
    UserSimilars = df_ratings[(df_ratings['isbn'].isin(input_user['isbn'].tolist())) & (df_ratings['user_id'] != input_user_id)]
    #group them by user_id (a plain string key keeps each group name a scalar) and sort by how many books they share with the input_user
    UserSimilars_grouped = UserSimilars.groupby('user_id')
    UserSimilars_grouped = sorted(UserSimilars_grouped,  key=lambda x: len(x[1]), reverse=True)
    
    #define the Pearson correlation dictionary
    pearsonCorrelationDict = {}

    #for every user group in our subset
    for name, group in UserSimilars_grouped:
        #sort the input and current user group
        group = group.sort_values(by='isbn')
        input_user = input_user.sort_values(by='isbn')
        #get the N for the formula
        nRatings = len(group)
        #get the review scores for the books that they both have in common
        temp_df = input_user[input_user['isbn'].isin(group['isbn'].tolist())]
        #convert to list format and store in a temporary buffer variable
        tempRatingList = temp_df['rating'].tolist()
        #convert current user group reviews to list format
        tempGroupList = group['rating'].tolist()
        #calculate pearson correlation between two users: x and y
        Sxx = sum([i**2 for i in tempRatingList]) - pow(sum(tempRatingList),2)/float(nRatings)
        Syy = sum([i**2 for i in tempGroupList]) - pow(sum(tempGroupList),2)/float(nRatings)
        Sxy = sum( i*j for i, j in zip(tempRatingList, tempGroupList)) - sum(tempRatingList)*sum(tempGroupList)/float(nRatings)
    
        #if the denominator is nonzero, divide; otherwise set the correlation to 0
        if Sxx != 0 and Syy != 0:
            pearsonCorrelationDict[name] = Sxy/sqrt(Sxx*Syy)
        else:
            pearsonCorrelationDict[name] = 0
            
    #build pearson dataframe showing the similarity index for each user
    pearsonDF = pd.DataFrame.from_dict(pearsonCorrelationDict, orient='index')
    pearsonDF.columns = ['similarityIndex']
    pearsonDF['user_id'] = pearsonDF.index
    pearsonDF.index = range(len(pearsonDF))
    
    #sort the users based on their similarity to our input_user
    topUsers = pearsonDF.sort_values(by='similarityIndex', ascending=False)
    #get the books read by each user
    topUsersRating = topUsers.merge(df_ratings, left_on='user_id', right_on='user_id', how='inner')
    #calculate weighted ratings by multiplying the similarity index by the rating
    topUsersRating['weightedRating'] = topUsersRating['similarityIndex']*topUsersRating['rating']
    #group by the book ID, then sum the similarity index and the weighted rating
    tempTopUsersRating = topUsersRating.groupby('isbn')[['similarityIndex','weightedRating']].sum()
    tempTopUsersRating.columns = ['sum_similarityIndex','sum_weightedRating']
    
    #create empty dataframe to store recommendations
    recommendation_df = pd.DataFrame()
    #calculate 'weighted average recommendation score' and store in the 'recommendation' dataframe with the books IDs
    recommendation_df['weighted average recommendation score'] = tempTopUsersRating['sum_weightedRating']/tempTopUsersRating['sum_similarityIndex']
    recommendation_df['isbn'] = tempTopUsersRating.index
    #sort the recommendation dataframe by the 'weighted average recommendation score'
    recommendation_df = recommendation_df.sort_values(by='weighted average recommendation score', ascending=False)
    
    #return the book details for the 10 highest-scoring ISBNs (ISBNs not present in df_books are dropped)
    return df_books.loc[df_books['isbn'].isin(recommendation_df.head(10)['isbn'].tolist())]
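
The Pearson step inside `recommender` can also be checked in isolation. Below is a minimal sketch of a standalone helper, here called `pearson` (a hypothetical name, not part of the notebook), that uses the same `Sxx`, `Syy`, `Sxy` sums and is compared against `numpy.corrcoef` on invented ratings. It assumes the two lists are already aligned book by book, which in turn assumes each user rated each shared book exactly once.

```python
from math import sqrt

import numpy as np


def pearson(x, y):
    """Pearson correlation of two aligned rating lists; 0.0 when undefined."""
    if len(x) != len(y):
        raise ValueError("rating lists must have the same length")
    n = len(x)
    if n == 0:
        return 0.0
    # Same sum-of-squares form as the Sxx, Syy, Sxy lines in recommender.
    sxx = sum(i * i for i in x) - sum(x) ** 2 / n
    syy = sum(j * j for j in y) - sum(y) ** 2 / n
    sxy = sum(i * j for i, j in zip(x, y)) - sum(x) * sum(y) / n
    # A constant rating list has zero variance, so the correlation is undefined.
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


# Invented ratings for two users over four shared books.
a = [8, 6, 9, 4]
b = [7, 5, 9, 6]
print(pearson(a, b), np.corrcoef(a, b)[0, 1])
```

Raising an error on mismatched lengths makes a misalignment visible, whereas `zip` would silently truncate the longer list.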

<h3>Testing the Recommendation Engine</h3>

We will test the recommendation engine with three randomly chosen user IDs.

In [4]:
recommender(276925)

|Unnamed: 0|isbn|book_title|book_author|year_of_publication|publisher|
|---|---|---|---|---|---|
|6367|042518689X|The Weedless Widow (Antique Lover's Mysteries ...|Deborah Morgan|2002|Berkley Publishing Group|
|26093|517593335|Lovers|Judith Krantz|1994|Random House Inc|
|29810|345450736|Between Sisters (Hannah, Kristin)|KRISTIN HANNAH|2003|Ballantine Books|
|38085|380726033|Song of the River (Storyteller Trilogy)|Sue Harrison|1998|Avon|
|79653|074324558X|Unwilling Accomplice : A Munch Mancini Crime N...|Barbara Seranella|2004|Scribner|
|96720|671816810|EMMELINE|Judith Rossner|1981|Pocket|
|127968|042519051X|Murder in the Pleasure Gardens (Beau Brummell ...|Rosemary Stevens|2003|Berkley Publishing Group|
|128637|1551669366|Squeeze Play: A Novel|R. J. Kaiser|2002|Mira Books|
|129249|312271239|The Mummy's Ransom (Hunter, Fred. Ransom/Chart...|Fred Hunter|2002|St. Martin's Minotaur|


In [5]:
recommender(78783)

|Unnamed: 0|isbn|book_title|book_author|year_of_publication|publisher|
|---|---|---|---|---|---|
|1772|3423620196|Theos Reise. Roman über die Religionen der ...|Catherine Clement|2000|Dtv|
|35197|3257218699|Tod Im Herbst|Nabb|0|Diogenes Verlag AG|
|57199|373484623|Finding Home (3 Novels in 1)|Linda Howard|2002|Silhouette|
|66353|688089291|Papa, My Father: A Celebration of Dads|Leo F. Buscaglia|1989|Harpercollins|
|134936|590129880|The Fat Cat Sat On The Mat (An I Can Read Book)|Nurit Karlin|1997|Scholastic Inc.|
|157614|671836757|DEATH SHALL OVERCM|Emma Lathen|1981|Pocket|
|165801|435900439|The Beautiful Ones Are Not Yet Born|Ayi Kwei Armah|1980|Heinemann|
|167715|679600841|War and Peace (Modern Library)|Leo Tolstoy|1994|Modern Library|
|208673|30018897|"B" is for burglar (A Kinsey Millhone mystery)|Sue Grafton|1985|Holt, Rinehart, and Winston|


In [6]:
recommender(144555)

|Unnamed: 0|isbn|book_title|book_author|year_of_publication|publisher|
|---|---|---|---|---|---|
|15286|743202236|A Mind at a Time|Mel Levine|2003|Simon & Schuster|
|33760|671657518|HT STOP WORRYING R|Dale Carnegie|1987|Pocket|
|38935|679439374|The First Man|Albert Camus|1995|Alfred A. Knopf|
|87636|679743774|The Mansion on the Hill : Dylan, Young, Geffen...|FRED GOODMAN|1998|Vintage|
|90189|1561480711|Favorite Recipes from Quilters: More Than 900 ...|Louise Stoltzfus|1992|Good Books|
|110904|1564556786|Eternal Echoes: Exploring Our Yearning to Belong|John O'Donohue|1999|Sounds True|
|130429|809122774|The Invisible Partners: How the Male and Femal...|John A. Sanford|1980|Paulist Press|
|222353|802400086|Treasures of the Snow|Patricia St. John|1950|Moody Pr|
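
The rows returned by `recommender` come back in the order of the books dataframe, not in order of recommendation score, because the final lookup filters `df_books` with `isin`. The outputs also contain fewer than ten rows, presumably because some recommended ISBNs have no match in the books file. A minimal sketch of one way to keep the ranking and make missing books visible, using a left merge on invented data (the frames `books` and `ranked` are hypothetical stand-ins):

```python
import pandas as pd

# Invented stand-ins for df_books and for the sorted recommendation scores.
books = pd.DataFrame({
    "isbn": ["A", "B", "C", "D"],
    "book_title": ["Book A", "Book B", "Book C", "Book D"],
})
ranked = pd.DataFrame({
    "isbn": ["C", "A", "Z"],   # "Z" has no entry in the books table
    "score": [9.5, 8.0, 7.0],
})

# A left merge keeps the order of the ranked frame and leaves NaN
# in book_title for ISBNs that are missing from the books table.
top = ranked.head(10).merge(books, on="isbn", how="left")
print(top)
```

Whether unmatched ISBNs should be shown or dropped is a product decision; `top.dropna(subset=["book_title"])` would drop them while keeping the ranking order.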
