# BOOK RECOMMENDER SYSTEM


## Introduction

This project focuses on demonstrating a collaborative-filtering recommendation system.

The dataset is taken from __[Goodreads Datasets](https://sites.google.com/eng.ucsd.edu/ucsdbookgraph/home)__, which contains three group of datasets:
* meta-data of the books
* user-book interactions
* users' detailed book reviews

However, this project only use **meta-data** and **user-book interactions** in order to build a book collaborative-filtering recommender system and we mainly focus on the **Comics & Graphic Genres** because the original dataset is too large (with over 2M books and 228M interactions)

## Implementation

We work on these 2 datasets but the given format is *.gz* 
* goodreads_books_comics_graphic.json.gz (89,411 books)
* goodreads_interactions_comics_graphic.json.gz (7,347,630 interactions)

For simplicity, I've already parse both json file into csv file and only take the important field

For book metadata, I parse it into *titles.csv* with below fields:
* *title*: title of the book
* *book_id*: unique id of a book
* *ratings* : number of ratings 
* *url* : goodreads url of the book
* *cover_image*: image url of the book

For interaction, I parse it into *interactions.csv* with below fields:
* *user_id*: unique id of a user
* *book_id*: unique id of a book
* *rating* : rating that a user give a book

and because *interactions.csv* is quite large (315MB) so you can download it __[here](https://drive.google.com/file/d/1fey5xMQkP4k2bbPVpwn0DeQqx5CZpxZM/view?usp=sharing)__

### Part I

In [110]:
import gzip
import json

In [111]:
import pandas as pd
titles = pd.read_csv('titles.csv')
titles["ratings"] = pd.to_numeric(titles["ratings"])

In [112]:
titles.head(5)

Unnamed: 0,book_id,title,ratings,url,cover_image
0,30128855,Cruelle,16,https://www.goodreads.com/book/show/30128855-c...,https://images.gr-assets.com/books/1462644346m...
1,13571772,Captain America: Winter Soldier (The Ultimate ...,51,https://www.goodreads.com/book/show/13571772-c...,https://images.gr-assets.com/books/1333287305m...
2,707611,"Superman Archives, Vol. 2",51,https://www.goodreads.com/book/show/707611.Sup...,https://images.gr-assets.com/books/1307838888m...
3,2250580,"A.I. Revolution, Vol. 1",46,https://www.goodreads.com/book/show/2250580.A_...,https://s.gr-assets.com/assets/nophoto/book/11...
4,27036536,"War Stories, Volume 3",39,https://www.goodreads.com/book/show/27036536-w...,https://images.gr-assets.com/books/1445402463m...


Process title and create *mod_title* so that we can query a book by its name. In this step, we basically keep only normal characters and strip the unncessary whitespace

In [113]:
import re

def process_title(title):
    mod_title = re.sub("[^a-zA-Z0-9 ]", "", title)
    mod_title = mod_title.lower()
    mod_title = re.sub("\s+", " ", mod_title)
    return mod_title

In [114]:
titles["mod_title"] = titles["title"].apply(process_title)
titles = titles[titles["mod_title"].str.len() > 0]

In [115]:
titles.head(5)

Unnamed: 0,book_id,title,ratings,url,cover_image,mod_title
0,30128855,Cruelle,16,https://www.goodreads.com/book/show/30128855-c...,https://images.gr-assets.com/books/1462644346m...,cruelle
1,13571772,Captain America: Winter Soldier (The Ultimate ...,51,https://www.goodreads.com/book/show/13571772-c...,https://images.gr-assets.com/books/1333287305m...,captain america winter soldier the ultimate gr...
2,707611,"Superman Archives, Vol. 2",51,https://www.goodreads.com/book/show/707611.Sup...,https://images.gr-assets.com/books/1307838888m...,superman archives vol 2
3,2250580,"A.I. Revolution, Vol. 1",46,https://www.goodreads.com/book/show/2250580.A_...,https://s.gr-assets.com/assets/nophoto/book/11...,ai revolution vol 1
4,27036536,"War Stories, Volume 3",39,https://www.goodreads.com/book/show/27036536-w...,https://images.gr-assets.com/books/1445402463m...,war stories volume 3


Vectorize each *mod_title* using TF-IDF and find the most similar title with the given book name. We find the similarity by using *cosine_similarity*

In [116]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer()

tfidf_mat = tfidf_vectorizer.fit_transform(titles["mod_title"])

In [117]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def make_clickable(val):
    return '<a target= "_blank" href="{}">Goodreads</a>'.format(val)

def show_image(val):
    return '<img src="{}" width=50></img>'.format(val)

def search_book(query, tfidf_vectorizer):
    processed_q = process_title(query)
    query_vec = tfidf_vectorizer.transform([processed_q])
    similarity = cosine_similarity(query_vec,tfidf_mat).flatten()

    indices = np.argpartition(similarity, -10)[-10:]
    results = titles.iloc[indices]
    results = results.sort_values("ratings", ascending=False)
    return results.head(5).style.format({'url': make_clickable, 'cover_image':show_image})

In [118]:
search_book('Superman', tfidf_vectorizer)

Unnamed: 0,book_id,title,ratings,url,cover_image,mod_title
53582,19993681,Superman #1,336,Goodreads,,superman 1
20859,19738768,Superman #2,108,Goodreads,,superman 2
32364,19749416,Superman #3,79,Goodreads,,superman 3
16936,20101276,Superman #6,63,Goodreads,,superman 6
16378,19652558,Superman #5,61,Goodreads,,superman 5


Simulate a list of favorite books so we can perform some recommendation based on these books

In [119]:
liked_books = ["1237398", "364950", "364958", "364954", "1532905"]

In [120]:
interactions = pd.read_csv('interactions.csv')
interactions["book_id"] = interactions["book_id"].astype(str)
interactions["rating"] = pd.to_numeric(interactions["rating"])

In [121]:
interactions

Unnamed: 0,user_id,book_id,rating
0,8842281e1d1347389f2ab93d60773d4d,836610,0
1,8842281e1d1347389f2ab93d60773d4d,7648967,0
2,8842281e1d1347389f2ab93d60773d4d,15704307,0
3,8842281e1d1347389f2ab93d60773d4d,6902644,0
4,8842281e1d1347389f2ab93d60773d4d,9844623,0
...,...,...,...
7347625,bd3ac2e547a4f521927056cbd6bb5c2f,1484167,5
7347626,bd3ac2e547a4f521927056cbd6bb5c2f,122451,5
7347627,6384a10d5611945b26b25c971f348fa4,85574,3
7347628,e9aea57d21cdf9d91a65687d59518924,15197,5


Find the list of user who like the same book

In [122]:
#find overlap interaction, this is interaction of user who like book in our favorite book list
overlap_interactions = interactions[interactions["book_id"].isin(liked_books)]
overlap_interactions = overlap_interactions[overlap_interactions["rating"] >= 3]
overlap_interactions

Unnamed: 0,user_id,book_id,rating
3611,6b5ffddfaca8dec2049e0bb0e2d6edf6,364958,5
3613,6b5ffddfaca8dec2049e0bb0e2d6edf6,364954,3
3615,6b5ffddfaca8dec2049e0bb0e2d6edf6,364950,3
3620,6b5ffddfaca8dec2049e0bb0e2d6edf6,1237398,5
3711,3cd7e962765c795dea97babd41215e99,1237398,5
...,...,...,...
7346805,5a2154b4a0df45dcc946dbc6db4fa215,1237398,3
7347086,6fb7a4172710f1d8bcdc658a302450dc,1237398,5
7347377,1b51feff1cb53697b11e97ebf65e5595,1237398,4
7347535,0b7de731a1d5bfe06bc4d8e2939c1b94,364958,5


In [123]:
overlap_users = overlap_interactions["user_id"].unique()
overlap_users

array(['6b5ffddfaca8dec2049e0bb0e2d6edf6',
       '3cd7e962765c795dea97babd41215e99',
       'e6d35f5d6eed3b8981a224d43c24f2b7', ...,
       '6fb7a4172710f1d8bcdc658a302450dc',
       '1b51feff1cb53697b11e97ebf65e5595',
       '0b7de731a1d5bfe06bc4d8e2939c1b94'], dtype=object)

In part I, we gonna use a basic intuitive approach to provide books recommendation:

* Get list of user who also like the same book in the favorite list
* Get all the books that the above list of user rated
* Count number of appearance of each book 
* Calculate the popularity score of each book, make sure that the recommender don't just always recommend the most rated books
* Produce the recommendation based on popularity score


In [124]:
rec_books = interactions[interactions["user_id"].isin(overlap_users)]
# get list of book that common user also rated

In [125]:
import pandas as pd

recs = pd.DataFrame(rec_books, columns = ["user_id", "book_id", "rating"])
recs["book_id"] = recs["book_id"].astype(str)
recs

Unnamed: 0,user_id,book_id,rating
3510,6b5ffddfaca8dec2049e0bb0e2d6edf6,33583817,3
3511,6b5ffddfaca8dec2049e0bb0e2d6edf6,431274,5
3512,6b5ffddfaca8dec2049e0bb0e2d6edf6,32473025,4
3513,6b5ffddfaca8dec2049e0bb0e2d6edf6,17671951,4
3514,6b5ffddfaca8dec2049e0bb0e2d6edf6,31140467,3
...,...,...,...
7347548,0b7de731a1d5bfe06bc4d8e2939c1b94,330744,4
7347549,0b7de731a1d5bfe06bc4d8e2939c1b94,3173558,5
7347550,0b7de731a1d5bfe06bc4d8e2939c1b94,204042,5
7347551,0b7de731a1d5bfe06bc4d8e2939c1b94,2880,5


In [126]:
# recs.to_csv("recommendation.csv", index=False)

In [127]:
top_recs = recs["book_id"].value_counts()
top_books = top_recs.index.values
top_books

array(['1237398', '2880', '204042', ..., '16002104', '598642', '1792672'],
      dtype=object)

In [128]:
books_titles = titles
books_titles["book_id"] = books_titles["book_id"].astype(str)
books_titles

Unnamed: 0,book_id,title,ratings,url,cover_image,mod_title
0,30128855,Cruelle,16,https://www.goodreads.com/book/show/30128855-c...,https://images.gr-assets.com/books/1462644346m...,cruelle
1,13571772,Captain America: Winter Soldier (The Ultimate ...,51,https://www.goodreads.com/book/show/13571772-c...,https://images.gr-assets.com/books/1333287305m...,captain america winter soldier the ultimate gr...
2,707611,"Superman Archives, Vol. 2",51,https://www.goodreads.com/book/show/707611.Sup...,https://images.gr-assets.com/books/1307838888m...,superman archives vol 2
3,2250580,"A.I. Revolution, Vol. 1",46,https://www.goodreads.com/book/show/2250580.A_...,https://s.gr-assets.com/assets/nophoto/book/11...,ai revolution vol 1
4,27036536,"War Stories, Volume 3",39,https://www.goodreads.com/book/show/27036536-w...,https://images.gr-assets.com/books/1445402463m...,war stories volume 3
...,...,...,...,...,...,...
63678,3106983,Persepolis: The Story of a Childhood and The S...,1966,https://www.goodreads.com/book/show/3106983-pe...,https://images.gr-assets.com/books/1466547436m...,persepolis the story of a childhood and the st...
63679,10644600,Fevre Dream,853,https://www.goodreads.com/book/show/10644600-f...,https://images.gr-assets.com/books/1350850473m...,fevre dream
63680,22746413,"Blood Lad, Vol. 10",66,https://www.goodreads.com/book/show/22746413-b...,https://images.gr-assets.com/books/1405832210m...,blood lad vol 10
63681,30848889,Doctor Who: Free Comic Book Day 2016,338,https://www.goodreads.com/book/show/30848889-d...,https://s.gr-assets.com/assets/nophoto/book/11...,doctor who free comic book day 2016


In [129]:
all_recs = recs["book_id"].value_counts()
all_recs = all_recs.to_frame().reset_index()
all_recs.rename(columns={"index":"book_id", "book_id":"book_count"}, inplace= True)
all_recs
#book_count denotes number of apperance 

Unnamed: 0,book_id,book_count
0,1237398,3322
1,2880,1843
2,204042,1825
3,13615,1681
4,870,1577
...,...,...
47512,2020926,1
47513,24612648,1
47514,16002104,1
47515,598642,1


In [130]:
all_recs = all_recs.merge(books_titles, how="inner", on="book_id")
all_recs

Unnamed: 0,book_id,book_count,title,ratings,url,cover_image,mod_title
0,1237398,3322,"One Piece, Volume 01: Romance Dawn (One Piece,...",69279,https://www.goodreads.com/book/show/1237398.On...,https://images.gr-assets.com/books/1318523719m...,one piece volume 01 romance dawn one piece 1
1,2880,1843,"Bleach, Volume 01",123807,https://www.goodreads.com/book/show/2880.Bleac...,https://s.gr-assets.com/assets/nophoto/book/11...,bleach volume 01
2,204042,1825,"Naruto, Vol. 01: The Tests of the Ninja (Narut...",107910,https://www.goodreads.com/book/show/204042.Nar...,https://images.gr-assets.com/books/1435524806m...,naruto vol 01 the tests of the ninja naruto 1
3,13615,1681,"Death Note, Vol. 1: Boredom (Death Note, #1)",142755,https://www.goodreads.com/book/show/13615.Deat...,https://images.gr-assets.com/books/1419952134m...,death note vol 1 boredom death note 1
4,870,1577,"Fullmetal Alchemist, Vol. 1 (Fullmetal Alchemi...",95704,https://www.goodreads.com/book/show/870.Fullme...,https://s.gr-assets.com/assets/nophoto/book/11...,fullmetal alchemist vol 1 fullmetal alchemist 1
...,...,...,...,...,...,...,...
43370,7575742,1,Superman: New Krypton Vol. 1,48,https://www.goodreads.com/book/show/7575742-su...,https://images.gr-assets.com/books/1308167673m...,superman new krypton vol 1
43371,24612648,1,Slappy's Tales of Horror,22,https://www.goodreads.com/book/show/24612648-s...,https://images.gr-assets.com/books/1423543969m...,slappys tales of horror
43372,16002104,1,Civil War Prose Novel,159,https://www.goodreads.com/book/show/16002104-c...,https://images.gr-assets.com/books/1360567098m...,civil war prose novel
43373,598642,1,"The Darkness Compendium, Vol. 1",124,https://www.goodreads.com/book/show/598642.The...,https://images.gr-assets.com/books/1333278478m...,the darkness compendium vol 1


In [131]:
#calculate popularity score for each book, we ensure the one with high rating count will be less popular than usual
all_recs["score"] = all_recs.book_count * (all_recs.book_count / all_recs.ratings)

In [132]:
recommendation = all_recs.sort_values("score", ascending = False)

In [133]:
def make_clickable(val):
    return '<a target= "_blank" href="{}">Goodreads</a>'.format(val)

def show_image(val):
    return '<img src="{}" width=50></img>'.format(val)

recommendation[~recommendation["book_id"].isin(liked_books)].head(5).style.format({'url': make_clickable, 'cover_image':show_image})
#get the top-5 highest score excluding the book already in the favorite list

Unnamed: 0,book_id,book_count,title,ratings,url,cover_image,mod_title,score
10,364956,899,"One Piece, Volume 02: Buggy the Clown (One Piece, #2)",6512,Goodreads,,one piece volume 02 buggy the clown one piece 2,124.10949
14,364957,782,"One Piece, Volume 03: Don't Get Fooled Again (One Piece, #3)",5472,Goodreads,,one piece volume 03 dont get fooled again one piece 3,111.755117
20,364952,727,"One Piece, Volume 04: The Black Cat Pirates (One Piece, #4)",5422,Goodreads,,one piece volume 04 the black cat pirates one piece 4,97.478606
24,364951,679,"One Piece, Volume 05: For Whom the Bell Tolls (One Piece, #5)",4919,Goodreads,,one piece volume 05 for whom the bell tolls one piece 5,93.72657
32,364960,545,"One Piece, Volume 10: OK, Let's Stand Up! (One Piece, #10)",3527,Goodreads,,one piece volume 10 ok lets stand up one piece 10,84.21463


### Part II

In Part I, we use an intuitive approach which may not produce the best recommendations.

In this part, we try to use another approach which find user who has the most similar taste to us using cosine_similarity and  take a look at which books the other users likes, then produce recommendations

In [134]:
# read the list of my rated books
my_books = pd.read_csv("goodreads_library_export.csv")
my_books["book_id"] = my_books["book_id"].astype(str)
my_books["rating"] = (my_books["rating"]).astype(float)
my_books

Unnamed: 0,book_id,title,rating,user_id
0,6131591,Doraemon Buku Ke-2,5.0,-1
1,1315744,"ドラえもん 1 [Doraemon 1] (Doraemon, #1)",5.0,-1
2,6131665,Doraemon Buku Ke-9,5.0,-1
3,6131651,Doraemon Buku Ke-8,5.0,-1
4,6131593,Doraemon Buku Ke-3,5.0,-1
...,...,...,...,...
71,18667307,"Tokyo Ghoul, Tome 1 (Tokyo Ghoul, #1)",5.0,-1
72,13154150,"Attack on Titan, Vol. 1 (Attack on Titan, #1)",5.0,-1
73,870,"Fullmetal Alchemist, Vol. 1 (Fullmetal Alchemi...",5.0,-1
74,969275,"Dragon Ball, Vol. 1: The Monkey King (Dragon B...",5.0,-1


In [135]:
my_books_list = list(my_books["book_id"])

Find the user who has common interest

In [136]:

interactions = pd.read_csv('interactions.csv')
interactions["book_id"] = interactions["book_id"].astype(str)
interactions["rating"] = (interactions["rating"]).astype(float)
interactions

Unnamed: 0,user_id,book_id,rating
0,8842281e1d1347389f2ab93d60773d4d,836610,0.0
1,8842281e1d1347389f2ab93d60773d4d,7648967,0.0
2,8842281e1d1347389f2ab93d60773d4d,15704307,0.0
3,8842281e1d1347389f2ab93d60773d4d,6902644,0.0
4,8842281e1d1347389f2ab93d60773d4d,9844623,0.0
...,...,...,...
7347625,bd3ac2e547a4f521927056cbd6bb5c2f,1484167,5.0
7347626,bd3ac2e547a4f521927056cbd6bb5c2f,122451,5.0
7347627,6384a10d5611945b26b25c971f348fa4,85574,3.0
7347628,e9aea57d21cdf9d91a65687d59518924,15197,5.0


In [137]:
#find common interactions, which rated the same book in our favorite list
overlap_interactions = interactions[interactions["book_id"].isin(my_books_list)]
overlap_interactions

Unnamed: 0,user_id,book_id,rating
138,4035e5f05352217609c1a294410f2d50,13154150,4.0
139,4035e5f05352217609c1a294410f2d50,13531561,4.0
277,4980305f36ab8c2ab831e401a185f28a,204042,5.0
298,4980305f36ab8c2ab831e401a185f28a,13618,5.0
300,4980305f36ab8c2ab831e401a185f28a,13615,5.0
...,...,...,...
7347535,0b7de731a1d5bfe06bc4d8e2939c1b94,364958,5.0
7347543,0b7de731a1d5bfe06bc4d8e2939c1b94,969275,4.0
7347547,0b7de731a1d5bfe06bc4d8e2939c1b94,1237398,5.0
7347550,0b7de731a1d5bfe06bc4d8e2939c1b94,204042,5.0


In [138]:
#find the user who rated the same books as ours and count number of rated books we have in common
overlap_users = overlap_interactions["user_id"].value_counts()
overlap_users = overlap_users.to_dict()

In [139]:
# only consider user who has rated more than 20% books compared to our list
filtered_overlap_users = set([k for k in overlap_users if overlap_users[k] > my_books.shape[0]/5])

In [140]:
interactions = interactions[interactions["user_id"].isin(filtered_overlap_users)]
interactions.reset_index()
interactions

Unnamed: 0,user_id,book_id,rating
3510,6b5ffddfaca8dec2049e0bb0e2d6edf6,33583817,3.0
3511,6b5ffddfaca8dec2049e0bb0e2d6edf6,431274,5.0
3512,6b5ffddfaca8dec2049e0bb0e2d6edf6,32473025,4.0
3513,6b5ffddfaca8dec2049e0bb0e2d6edf6,17671951,4.0
3514,6b5ffddfaca8dec2049e0bb0e2d6edf6,31140467,3.0
...,...,...,...
7246898,a63a061afb8954263f57ebaaa9ac127e,13531561,0.0
7246899,a63a061afb8954263f57ebaaa9ac127e,870,0.0
7246900,a63a061afb8954263f57ebaaa9ac127e,204042,5.0
7246901,a63a061afb8954263f57ebaaa9ac127e,13154150,0.0


In [141]:
interactions = pd.concat([my_books[['user_id','book_id', 'rating']], interactions])
interactions

Unnamed: 0,user_id,book_id,rating
0,-1,6131591,5.0
1,-1,1315744,5.0
2,-1,6131665,5.0
3,-1,6131651,5.0
4,-1,6131593,5.0
...,...,...,...
7246898,a63a061afb8954263f57ebaaa9ac127e,13531561,0.0
7246899,a63a061afb8954263f57ebaaa9ac127e,870,0.0
7246900,a63a061afb8954263f57ebaaa9ac127e,204042,5.0
7246901,a63a061afb8954263f57ebaaa9ac127e,13154150,0.0


In [142]:
#make sure it in the right type
interactions["user_id"] = interactions["user_id"].astype(str)
interactions["book_id"] = interactions["book_id"].astype(str)
interactions["rating"] = (interactions["rating"]).astype(float)

Create Ultility Matrix

In [143]:
#indexing user and book so that we can create a ultility matrix to compare similarity between user
interactions["user_index"] = interactions["user_id"].astype("category").cat.codes
interactions["book_index"] = interactions["book_id"].astype("category").cat.codes

In [144]:
from scipy.sparse import coo_matrix
# create ultility matrix
ratings_mat_coo = coo_matrix((interactions["rating"], (interactions["user_index"], interactions["book_index"])))
ratings_mat = ratings_mat_coo.tocsr()

In [145]:
#normalize rating
def normalize_sparse(csr_matrix):
    nonzero_rows = csr_matrix.nonzero()[0]
    for idx in np.unique(nonzero_rows):
        data_idx = np.where(nonzero_rows==idx)[0]
        data_mean = np.mean(csr_matrix.data[data_idx])
        if data_mean != 0:
            csr_matrix.data[data_idx] = csr_matrix.data[data_idx] -  data_mean

In [146]:
normalize_sparse(ratings_mat)

In [147]:
my_index = 0

In [148]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
#compute similarity between users
similarity = cosine_similarity(ratings_mat[my_index,:], ratings_mat).flatten()

In [149]:
#take top 30 similar 
indices = np.argpartition(similarity, -30)[-30:]
indices

array([465, 248, 528, 602, 169, 477, 259, 243, 436,  73, 556,  14, 102,
        99, 548, 600, 461,  21,  94, 516, 337, 425, 444, 467, 598, 511,
        36, 358, 522,   0], dtype=int64)

After having the top 30 similar users, we find all the books that these users like and recommend them

In [150]:
similar_users = interactions[interactions["user_index"].isin(indices)].copy()
similar_users = similar_users[similar_users["user_id"]  != "-1"]

In [151]:
similar_users

Unnamed: 0,user_id,book_id,rating,user_index,book_index
3510,6b5ffddfaca8dec2049e0bb0e2d6edf6,33583817,3.0,248,29858
3511,6b5ffddfaca8dec2049e0bb0e2d6edf6,431274,5.0,248,31723
3512,6b5ffddfaca8dec2049e0bb0e2d6edf6,32473025,4.0,248,28867
3513,6b5ffddfaca8dec2049e0bb0e2d6edf6,17671951,4.0,248,11191
3514,6b5ffddfaca8dec2049e0bb0e2d6edf6,31140467,3.0,248,27955
...,...,...,...,...,...
7123446,2ce174149ee99dfcea8d4190e1a65b3b,364951,5.0,99,31097
7123447,2ce174149ee99dfcea8d4190e1a65b3b,364952,5.0,99,31098
7123448,2ce174149ee99dfcea8d4190e1a65b3b,364957,5.0,99,31103
7123449,2ce174149ee99dfcea8d4190e1a65b3b,364956,5.0,99,31102


In [152]:
book_recs = similar_users.groupby("book_id").rating.agg(["count", "mean"])

In [153]:
#list of recommendation books together with there number of apperance and avrg rating among top 30 users
book_recs

Unnamed: 0_level_0,count,mean
book_id,Unnamed: 1_level_1,Unnamed: 2_level_1
1000392,4,3.5
1003761,2,3.5
10088114,1,4.0
1011359,2,1.5
1015309,1,5.0
...,...,...
996565,2,4.5
9990279,1,4.0
9994188,2,2.0
9994190,2,4.5


In [154]:
book_recs = book_recs.merge(books_titles, how="inner", on="book_id")

In [155]:
book_recs

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title
0,1000392,4,3.5,"Naruto, Vol. 16: Eulogy (Naruto, #16)",5837,https://www.goodreads.com/book/show/1000392.Na...,https://images.gr-assets.com/books/1435525315m...,naruto vol 16 eulogy naruto 16
1,1003761,2,3.5,"Beauty Pop, Vol. 1 (Beauty Pop, #1)",16509,https://www.goodreads.com/book/show/1003761.Be...,https://s.gr-assets.com/assets/nophoto/book/11...,beauty pop vol 1 beauty pop 1
2,10088114,1,4.0,"Morning Glories, Vol. 1: For a Better Future",8263,https://www.goodreads.com/book/show/10088114-m...,https://images.gr-assets.com/books/1486028570m...,morning glories vol 1 for a better future
3,1011359,2,1.5,"Ouran High School Host Club, Vol. 2 (Ouran Hig...",14557,https://www.goodreads.com/book/show/1011359.Ou...,https://s.gr-assets.com/assets/nophoto/book/11...,ouran high school host club vol 2 ouran high s...
4,1015309,1,5.0,"Negima!: Magister Negi Magi, Volume 6",868,https://www.goodreads.com/book/show/1015309.Ne...,https://images.gr-assets.com/books/1320480021m...,negima magister negi magi volume 6
...,...,...,...,...,...,...,...,...
2218,996565,2,4.5,Slam Dunk #27: El Shohoku tiene problemas,273,https://www.goodreads.com/book/show/996565.Sla...,https://images.gr-assets.com/books/1264347456m...,slam dunk 27 el shohoku tiene problemas
2219,9990279,1,4.0,Deadpool Pulp,621,https://www.goodreads.com/book/show/9990279-de...,https://s.gr-assets.com/assets/nophoto/book/11...,deadpool pulp
2220,9994188,2,2.0,"Bleach―ブリーチ― 48 [Burīchi 48] (Bleach, #48)",2198,https://www.goodreads.com/book/show/9994188-bl...,https://images.gr-assets.com/books/1426103377m...,bleach 48 burchi 48 bleach 48
2221,9994190,2,4.5,NARUTO -ナルト- 54 巻ノ五十四,2030,https://www.goodreads.com/book/show/9994190-na...,https://images.gr-assets.com/books/1333693046m...,naruto 54


In [156]:
# some hyper parameter to tune for filtering recommended books
min_appear = 2
min_rating = 4

book_recs = book_recs[~book_recs["book_id"].isin(my_books["book_id"])]
book_recs = book_recs[book_recs["count"] > min_appear]
book_recs = book_recs[book_recs["mean"] >= min_rating]

# create scoring metrics based on book's rating and number of appearance in recommendation list
book_recs["adjusted_count"] = book_recs["count"] * (book_recs["count"] / book_recs["ratings"])
book_recs["score"] = book_recs["mean"] * book_recs["adjusted_count"] 



In [157]:
top_recs = book_recs.sort_values("score", ascending = False)

In [158]:
top_k = 5 #top_book

def make_clickable(val):
    return '<a target= "_blank" href="{}">Goodreads</a>'.format(val)

def show_image(val):
    return '<img src="{}" width=50></img>'.format(val)

top_recs.head(top_k).style.format({'url': make_clickable, 'cover_image':show_image})

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title,adjusted_count,score
1751,6582724,9,4.555556,"One Piece, Volume 23: Vivi's Adventure (One Piece, #23)",250,Goodreads,,one piece volume 23 vivis adventure one piece 23,0.324,1.476
2028,8230496,22,4.181818,"One Piece, Volume 55: A Ray of Hope (One Piece, #55)",1660,Goodreads,,one piece volume 55 a ray of hope one piece 55,0.291566,1.219277
1813,6801575,8,4.125,"One Piece, Volume 42: Pirates vs. CP9 (One Piece, #42)",218,Goodreads,,one piece volume 42 pirates vs cp9 one piece 42,0.293578,1.211009
1833,6891484,9,4.0,"One Piece, Volume 25: The 100 Million Berry Man (One Piece, #25)",288,Goodreads,,one piece volume 25 the 100 million berry man one piece 25,0.28125,1.125
1805,6797462,8,4.0,"One Piece, Volume 46: Adventure on Ghost Island (One Piece, #46)",230,Goodreads,,one piece volume 46 adventure on ghost island one piece 46,0.278261,1.113043
