# BOOK RECOMMENDER SYSTEM


## Introduction

This project focuses on demonstrating a collaborative-filtering recommendation system.

The dataset is taken from __[Goodreads Datasets](https://sites.google.com/eng.ucsd.edu/ucsdbookgraph/home)__, which contain three groups of data:
* meta-data of the books
* user-book interactions
* users' detailed book reviews

This project uses only the **meta-data** and the **user-book interactions** to build a collaborative-filtering book recommender. Because the full dataset is very large (over 2M books and 228M interactions), we focus on the **Comics & Graphic** genre subset.

## Implementation

We work with the following two files, which are distributed in *.gz* format:
* goodreads_books_comics_graphic.json.gz (89,411 books)
* goodreads_interactions_comics_graphic.json.gz (7,347,630 interactions)

For simplicity, I have already parsed both JSON files into CSV files, keeping only the important fields.

For the book metadata, I parsed it into *titles.csv* with the following fields:
* *title*: title of the book
* *book_id*: unique id of a book
* *ratings* : number of ratings 
* *url* : goodreads url of the book
* *cover_image*: image url of the book

For the interactions, I parsed them into *interactions.csv* with the following fields:
* *user_id*: unique id of a user
* *book_id*: unique id of a book
* *rating* : rating that a user gave a book

Because *interactions.csv* is quite large (315MB), you can download it __[here](https://drive.google.com/file/d/1fey5xMQkP4k2bbPVpwn0DeQqx5CZpxZM/view?usp=sharing)__.
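
The conversion step itself is not shown in this notebook. Below is a minimal sketch of how it might look, assuming each line of the *.json.gz* file is one JSON object. The source field names (`ratings_count`, `image_url`, and so on), the mapping `FIELD_MAP`, and the helper `gz_json_to_csv` are assumptions for illustration, not taken from the project.

```python
import csv
import gzip
import io
import json

# Hypothetical mapping: source JSON field -> column name used in titles.csv
FIELD_MAP = {
    "book_id": "book_id",
    "title": "title",
    "ratings_count": "ratings",
    "url": "url",
    "image_url": "cover_image",
}

def gz_json_to_csv(gz_file, csv_file, field_map=FIELD_MAP):
    """Stream a gzipped JSON-lines file into CSV, keeping only the mapped fields."""
    writer = csv.DictWriter(csv_file, fieldnames=list(field_map.values()))
    writer.writeheader()
    rows = 0
    with gzip.open(gz_file, "rt", encoding="utf-8") as lines:
        for line in lines:
            record = json.loads(line)
            writer.writerow({col: record.get(src, "") for src, col in field_map.items()})
            rows += 1
    return rows

# Tiny in-memory example so the sketch runs without the real dataset
sample = {"book_id": "707611", "title": "Superman Archives, Vol. 2",
          "ratings_count": "51", "url": "u", "image_url": "i", "isbn": "x"}
raw = io.BytesIO(gzip.compress((json.dumps(sample) + "\n").encode("utf-8")))
out = io.StringIO()
n_rows = gz_json_to_csv(raw, out)
```

With the real files, `gz_file` would be the path to the *.json.gz* file and `csv_file` a file opened with `newline=""`. Reading line by line keeps memory use low even for the 7M-row interactions file.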

### Part I

In [365]:
import gzip
import json

In [366]:
import pandas as pd
titles = pd.read_csv('titles.csv')
titles["ratings"] = pd.to_numeric(titles["ratings"])

In [367]:
titles.head(5)

Unnamed: 0,book_id,title,ratings,url,cover_image
0,30128855,Cruelle,16,https://www.goodreads.com/book/show/30128855-c...,https://images.gr-assets.com/books/1462644346m...
1,13571772,Captain America: Winter Soldier (The Ultimate ...,51,https://www.goodreads.com/book/show/13571772-c...,https://images.gr-assets.com/books/1333287305m...
2,707611,"Superman Archives, Vol. 2",51,https://www.goodreads.com/book/show/707611.Sup...,https://images.gr-assets.com/books/1307838888m...
3,2250580,"A.I. Revolution, Vol. 1",46,https://www.goodreads.com/book/show/2250580.A_...,https://s.gr-assets.com/assets/nophoto/book/11...
4,27036536,"War Stories, Volume 3",39,https://www.goodreads.com/book/show/27036536-w...,https://images.gr-assets.com/books/1445402463m...


Process each title into *mod_title* so that we can query a book by its name. In this step, we keep only ASCII letters, digits, and spaces, convert to lowercase, and collapse redundant whitespace.

In [368]:
import re

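def process_title(title):
    # Keep only ASCII letters, digits, and spaces
    mod_title = re.sub("[^a-zA-Z0-9 ]", "", title)
    mod_title = mod_title.lower()
    # Collapse runs of whitespace into one space (raw string avoids an invalid-escape warning)
    mod_title = re.sub(r"\s+", " ", mod_title)
    return mod_title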

In [369]:
titles["mod_title"] = titles["title"].apply(process_title)
titles = titles[titles["mod_title"].str.len() > 0]
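
To see why the empty-title filter above is needed, here is a small check of the normalization on a few inputs. The function is repeated so that the snippet runs on its own; the sample strings are illustrative.

```python
import re

def process_title(title):
    mod_title = re.sub("[^a-zA-Z0-9 ]", "", title)
    mod_title = mod_title.lower()
    mod_title = re.sub(r"\s+", " ", mod_title)
    return mod_title

samples = ["A.I. Revolution, Vol. 1", "Superman  #1", "ドラえもん"]
normalized = [process_title(s) for s in samples]
# A title with no ASCII letters or digits normalizes to an empty string,
# which is exactly what the length filter removes.
```

One consequence of this design is that books whose titles are written entirely in a non-Latin script cannot be found through the title search.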

In [370]:
titles.head(5)

Unnamed: 0,book_id,title,ratings,url,cover_image,mod_title
0,30128855,Cruelle,16,https://www.goodreads.com/book/show/30128855-c...,https://images.gr-assets.com/books/1462644346m...,cruelle
1,13571772,Captain America: Winter Soldier (The Ultimate ...,51,https://www.goodreads.com/book/show/13571772-c...,https://images.gr-assets.com/books/1333287305m...,captain america winter soldier the ultimate gr...
2,707611,"Superman Archives, Vol. 2",51,https://www.goodreads.com/book/show/707611.Sup...,https://images.gr-assets.com/books/1307838888m...,superman archives vol 2
3,2250580,"A.I. Revolution, Vol. 1",46,https://www.goodreads.com/book/show/2250580.A_...,https://s.gr-assets.com/assets/nophoto/book/11...,ai revolution vol 1
4,27036536,"War Stories, Volume 3",39,https://www.goodreads.com/book/show/27036536-w...,https://images.gr-assets.com/books/1445402463m...,war stories volume 3


Vectorize each *mod_title* using TF-IDF and find the titles most similar to a given book name. Similarity is measured with *cosine_similarity*.

In [371]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer()

tfidf_mat = tfidf_vectorizer.fit_transform(titles["mod_title"])

In [372]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def make_clickable(val):
    return '<a target= "_blank" href="{}">Goodreads</a>'.format(val)

def show_image(val):
    return '<img src="{}" width=50></img>'.format(val)

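def search_book(query, tfidf_vectorizer):
    # Normalize the query the same way as the titles, then vectorize it
    processed_q = process_title(query)
    query_vec = tfidf_vectorizer.transform([processed_q])
    # tfidf_mat is the notebook-level matrix fitted on titles["mod_title"]
    similarity = cosine_similarity(query_vec, tfidf_mat).flatten()

    # Take the 10 most similar titles (unordered), then show the 5 with the most ratings
    indices = np.argpartition(similarity, -10)[-10:]
    results = titles.iloc[indices]
    results = results.sort_values("ratings", ascending=False)
    return results.head(5).style.format({'url': make_clickable, 'cover_image':show_image})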

In [373]:
search_book('Superman', tfidf_vectorizer)

Unnamed: 0,book_id,title,ratings,url,cover_image,mod_title
53582,19993681,Superman #1,336,Goodreads,,superman 1
20859,19738768,Superman #2,108,Goodreads,,superman 2
32364,19749416,Superman #3,79,Goodreads,,superman 3
16936,20101276,Superman #6,63,Goodreads,,superman 6
16378,19652558,Superman #5,61,Goodreads,,superman 5
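
The idea behind the title search can be illustrated without scikit-learn. The sketch below uses a simplified TF-IDF weighting (raw term count times `log(N / df) + 1`), which is an assumption made for illustration and not the exact formula `TfidfVectorizer` applies. The three titles are toy data and the helper names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the idf table (simplified TF-IDF)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

titles_toy = ["superman 1", "superman archives vol 2", "war stories volume 3"]
vectors, idf = tfidf_vectors(titles_toy)

# Vectorize the query with the idf learned from the titles; unknown terms get weight 0
query = {t: c * idf.get(t, 0.0) for t, c in Counter("superman".split()).items()}
scores = [cosine(query, v) for v in vectors]
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy example the shortest title containing the query term scores highest, because fewer other terms dilute its vector. This is consistent with the short "Superman #n" titles dominating the result above.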


Simulate a list of favorite books so that we can generate recommendations based on them.

In [374]:
liked_books = ["1237398", "364950", "364958", "364954", "1532905"]

In [375]:
interactions = pd.read_csv('interactions.csv')
interactions["book_id"] = interactions["book_id"].astype(str)
interactions["rating"] = pd.to_numeric(interactions["rating"])

In [376]:
interactions

Unnamed: 0,user_id,book_id,rating
0,8842281e1d1347389f2ab93d60773d4d,836610,0
1,8842281e1d1347389f2ab93d60773d4d,7648967,0
2,8842281e1d1347389f2ab93d60773d4d,15704307,0
3,8842281e1d1347389f2ab93d60773d4d,6902644,0
4,8842281e1d1347389f2ab93d60773d4d,9844623,0
...,...,...,...
7347625,bd3ac2e547a4f521927056cbd6bb5c2f,1484167,5
7347626,bd3ac2e547a4f521927056cbd6bb5c2f,122451,5
7347627,6384a10d5611945b26b25c971f348fa4,85574,3
7347628,e9aea57d21cdf9d91a65687d59518924,15197,5


Find the users who liked the same books.

In [377]:
# find overlapping interactions: ratings of at least 3 given to books in our favorite list
overlap_interactions = interactions[interactions["book_id"].isin(liked_books)]
overlap_interactions = overlap_interactions[overlap_interactions["rating"] >= 3]
overlap_interactions

Unnamed: 0,user_id,book_id,rating
3611,6b5ffddfaca8dec2049e0bb0e2d6edf6,364958,5
3613,6b5ffddfaca8dec2049e0bb0e2d6edf6,364954,3
3615,6b5ffddfaca8dec2049e0bb0e2d6edf6,364950,3
3620,6b5ffddfaca8dec2049e0bb0e2d6edf6,1237398,5
3711,3cd7e962765c795dea97babd41215e99,1237398,5
...,...,...,...
7346805,5a2154b4a0df45dcc946dbc6db4fa215,1237398,3
7347086,6fb7a4172710f1d8bcdc658a302450dc,1237398,5
7347377,1b51feff1cb53697b11e97ebf65e5595,1237398,4
7347535,0b7de731a1d5bfe06bc4d8e2939c1b94,364958,5


In [378]:
overlap_users = overlap_interactions["user_id"].unique()
overlap_users

array(['6b5ffddfaca8dec2049e0bb0e2d6edf6',
       '3cd7e962765c795dea97babd41215e99',
       'e6d35f5d6eed3b8981a224d43c24f2b7', ...,
       '6fb7a4172710f1d8bcdc658a302450dc',
       '1b51feff1cb53697b11e97ebf65e5595',
       '0b7de731a1d5bfe06bc4d8e2939c1b94'], dtype=object)

In Part I, we use a basic, intuitive approach to recommend books:

* Get the list of users who also liked a book in the favorite list
* Get all the books that those users rated
* Count the number of appearances of each book
* Calculate a popularity score for each book, so that the recommender doesn't simply recommend the most-rated books
* Produce the recommendations based on the popularity score
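
The steps above can be sketched end to end on toy data using only the standard library. All ids, ratings, and rating totals below are made up for illustration; the notebook cells that follow do the same thing with pandas on the real data.

```python
from collections import Counter

# Toy data (hypothetical): (user_id, book_id, rating) triples
interactions_toy = [
    ("u1", "A", 5), ("u1", "B", 4), ("u1", "C", 5),
    ("u2", "A", 4), ("u2", "B", 5),
    ("u3", "D", 5), ("u3", "C", 2),
]
total_ratings = {"A": 100, "B": 10, "C": 1000, "D": 50}  # site-wide rating counts
liked = {"A"}

# 1. Users who rated a liked book 3 or higher
overlap_users = {u for u, b, r in interactions_toy if b in liked and r >= 3}
# 2-3. Count how often each book appears among those users' interactions
book_count = Counter(b for u, b, r in interactions_toy if u in overlap_users)
# 4. Popularity-adjusted score: count * (count / total ratings)
score = {b: c * c / total_ratings[b] for b, c in book_count.items()}
# 5. Rank, leaving out books already in the liked list
ranked = sorted((b for b in score if b not in liked), key=score.get, reverse=True)
```

Here book `B` outranks book `C`: it appears more often among the overlapping users and has far fewer ratings overall, so it is more specific to this group of readers.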


In [379]:
# get all interactions of the overlapping users, i.e. every book they also rated
rec_books = interactions[interactions["user_id"].isin(overlap_users)]

In [380]:
import pandas as pd

recs = pd.DataFrame(rec_books, columns = ["user_id", "book_id", "rating"])
recs["book_id"] = recs["book_id"].astype(str)
recs

Unnamed: 0,user_id,book_id,rating
3510,6b5ffddfaca8dec2049e0bb0e2d6edf6,33583817,3
3511,6b5ffddfaca8dec2049e0bb0e2d6edf6,431274,5
3512,6b5ffddfaca8dec2049e0bb0e2d6edf6,32473025,4
3513,6b5ffddfaca8dec2049e0bb0e2d6edf6,17671951,4
3514,6b5ffddfaca8dec2049e0bb0e2d6edf6,31140467,3
...,...,...,...
7347548,0b7de731a1d5bfe06bc4d8e2939c1b94,330744,4
7347549,0b7de731a1d5bfe06bc4d8e2939c1b94,3173558,5
7347550,0b7de731a1d5bfe06bc4d8e2939c1b94,204042,5
7347551,0b7de731a1d5bfe06bc4d8e2939c1b94,2880,5


In [381]:
# recs.to_csv("recommendation.csv", index=False)

In [382]:
top_recs = recs["book_id"].value_counts()
top_books = top_recs.index.values
top_books

array(['1237398', '2880', '204042', ..., '16002104', '598642', '1792672'],
      dtype=object)

In [383]:
books_titles = titles
books_titles["book_id"] = books_titles["book_id"].astype(str)
books_titles

Unnamed: 0,book_id,title,ratings,url,cover_image,mod_title
0,30128855,Cruelle,16,https://www.goodreads.com/book/show/30128855-c...,https://images.gr-assets.com/books/1462644346m...,cruelle
1,13571772,Captain America: Winter Soldier (The Ultimate ...,51,https://www.goodreads.com/book/show/13571772-c...,https://images.gr-assets.com/books/1333287305m...,captain america winter soldier the ultimate gr...
2,707611,"Superman Archives, Vol. 2",51,https://www.goodreads.com/book/show/707611.Sup...,https://images.gr-assets.com/books/1307838888m...,superman archives vol 2
3,2250580,"A.I. Revolution, Vol. 1",46,https://www.goodreads.com/book/show/2250580.A_...,https://s.gr-assets.com/assets/nophoto/book/11...,ai revolution vol 1
4,27036536,"War Stories, Volume 3",39,https://www.goodreads.com/book/show/27036536-w...,https://images.gr-assets.com/books/1445402463m...,war stories volume 3
...,...,...,...,...,...,...
63678,3106983,Persepolis: The Story of a Childhood and The S...,1966,https://www.goodreads.com/book/show/3106983-pe...,https://images.gr-assets.com/books/1466547436m...,persepolis the story of a childhood and the st...
63679,10644600,Fevre Dream,853,https://www.goodreads.com/book/show/10644600-f...,https://images.gr-assets.com/books/1350850473m...,fevre dream
63680,22746413,"Blood Lad, Vol. 10",66,https://www.goodreads.com/book/show/22746413-b...,https://images.gr-assets.com/books/1405832210m...,blood lad vol 10
63681,30848889,Doctor Who: Free Comic Book Day 2016,338,https://www.goodreads.com/book/show/30848889-d...,https://s.gr-assets.com/assets/nophoto/book/11...,doctor who free comic book day 2016


In [384]:
# book_count is the number of times each book appears among the overlapping users
all_recs = recs["book_id"].value_counts()
# rename_axis + reset_index(name=...) yields the same column names regardless of pandas version
all_recs = all_recs.rename_axis("book_id").reset_index(name="book_count")
all_recs

Unnamed: 0,book_id,book_count
0,1237398,3322
1,2880,1843
2,204042,1825
3,13615,1681
4,870,1577
...,...,...
47512,2020926,1
47513,24612648,1
47514,16002104,1
47515,598642,1


In [385]:
all_recs = all_recs.merge(books_titles, how="inner", on="book_id")
all_recs

Unnamed: 0,book_id,book_count,title,ratings,url,cover_image,mod_title
0,1237398,3322,"One Piece, Volume 01: Romance Dawn (One Piece,...",69279,https://www.goodreads.com/book/show/1237398.On...,https://images.gr-assets.com/books/1318523719m...,one piece volume 01 romance dawn one piece 1
1,2880,1843,"Bleach, Volume 01",123807,https://www.goodreads.com/book/show/2880.Bleac...,https://s.gr-assets.com/assets/nophoto/book/11...,bleach volume 01
2,204042,1825,"Naruto, Vol. 01: The Tests of the Ninja (Narut...",107910,https://www.goodreads.com/book/show/204042.Nar...,https://images.gr-assets.com/books/1435524806m...,naruto vol 01 the tests of the ninja naruto 1
3,13615,1681,"Death Note, Vol. 1: Boredom (Death Note, #1)",142755,https://www.goodreads.com/book/show/13615.Deat...,https://images.gr-assets.com/books/1419952134m...,death note vol 1 boredom death note 1
4,870,1577,"Fullmetal Alchemist, Vol. 1 (Fullmetal Alchemi...",95704,https://www.goodreads.com/book/show/870.Fullme...,https://s.gr-assets.com/assets/nophoto/book/11...,fullmetal alchemist vol 1 fullmetal alchemist 1
...,...,...,...,...,...,...,...
43370,7575742,1,Superman: New Krypton Vol. 1,48,https://www.goodreads.com/book/show/7575742-su...,https://images.gr-assets.com/books/1308167673m...,superman new krypton vol 1
43371,24612648,1,Slappy's Tales of Horror,22,https://www.goodreads.com/book/show/24612648-s...,https://images.gr-assets.com/books/1423543969m...,slappys tales of horror
43372,16002104,1,Civil War Prose Novel,159,https://www.goodreads.com/book/show/16002104-c...,https://images.gr-assets.com/books/1360567098m...,civil war prose novel
43373,598642,1,"The Darkness Compendium, Vol. 1",124,https://www.goodreads.com/book/show/598642.The...,https://images.gr-assets.com/books/1333278478m...,the darkness compendium vol 1


In [386]:
# calculate a popularity score for each book; dividing by the total rating count penalizes books that are popular with everyone
all_recs["score"] = all_recs.book_count * (all_recs.book_count / all_recs.ratings)
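
As a quick check of the formula, the score can be recomputed by hand for two rows shown in this notebook: *One Piece, Volume 02* (`book_count` 899, `ratings` 6512) and *Bleach, Volume 01* (`book_count` 1843, `ratings` 123807). The helper name `popularity_score` is introduced here for illustration only.

```python
def popularity_score(book_count, ratings):
    # Same formula as the cell above: book_count * (book_count / ratings)
    return book_count * (book_count / ratings)

one_piece_2 = popularity_score(899, 6512)
bleach_1 = popularity_score(1843, 123807)
# Bleach appears about twice as often among the overlapping users, but its much
# larger site-wide rating count pulls its score below One Piece, Volume 02.
```

The first value agrees with the score of about 124.11 reported for *One Piece, Volume 02* in the final table of Part I.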

In [387]:
recommendation = all_recs.sort_values("score", ascending = False)

In [388]:
def make_clickable(val):
    return '<a target= "_blank" href="{}">Goodreads</a>'.format(val)

def show_image(val):
    return '<img src="{}" width=50></img>'.format(val)

recommendation[~recommendation["book_id"].isin(liked_books)].head(5).style.format({'url': make_clickable, 'cover_image':show_image})
# get the 5 highest-scoring books, excluding those already in the favorite list

Unnamed: 0,book_id,book_count,title,ratings,url,cover_image,mod_title,score
10,364956,899,"One Piece, Volume 02: Buggy the Clown (One Piece, #2)",6512,Goodreads,,one piece volume 02 buggy the clown one piece 2,124.10949
14,364957,782,"One Piece, Volume 03: Don't Get Fooled Again (One Piece, #3)",5472,Goodreads,,one piece volume 03 dont get fooled again one piece 3,111.755117
20,364952,727,"One Piece, Volume 04: The Black Cat Pirates (One Piece, #4)",5422,Goodreads,,one piece volume 04 the black cat pirates one piece 4,97.478606
24,364951,679,"One Piece, Volume 05: For Whom the Bell Tolls (One Piece, #5)",4919,Goodreads,,one piece volume 05 for whom the bell tolls one piece 5,93.72657
32,364960,545,"One Piece, Volume 10: OK, Let's Stand Up! (One Piece, #10)",3527,Goodreads,,one piece volume 10 ok lets stand up one piece 10,84.21463


### Part II

In Part I, we used an intuitive approach, which may not produce the best recommendations.

In this part, we try another approach: find the users whose taste is most similar to ours using cosine_similarity, look at which books those users like, and then produce recommendations.

In [389]:
# read the list of my rated books
my_books = pd.read_csv("goodreads_library_export.csv")
my_books["book_id"] = my_books["book_id"].astype(str)
my_books

Unnamed: 0,book_id,title,rating,user_id
0,6131591,Doraemon Buku Ke-2,5,-1
1,1315744,"ドラえもん 1 [Doraemon 1] (Doraemon, #1)",5,-1
2,6131665,Doraemon Buku Ke-9,5,-1
3,6131651,Doraemon Buku Ke-8,5,-1
4,6131593,Doraemon Buku Ke-3,5,-1
...,...,...,...,...
71,18667307,"Tokyo Ghoul, Tome 1 (Tokyo Ghoul, #1)",5,-1
72,13154150,"Attack on Titan, Vol. 1 (Attack on Titan, #1)",5,-1
73,870,"Fullmetal Alchemist, Vol. 1 (Fullmetal Alchemi...",5,-1
74,969275,"Dragon Ball, Vol. 1: The Monkey King (Dragon B...",5,-1


In [390]:
my_books_list = list(my_books["book_id"])

Find the users who share our interests

In [391]:

interactions = pd.read_csv('interactions.csv')
interactions["book_id"] = interactions["book_id"].astype(str)
interactions["rating"] = pd.to_numeric(interactions["rating"])
interactions

Unnamed: 0,user_id,book_id,rating
0,8842281e1d1347389f2ab93d60773d4d,836610,0
1,8842281e1d1347389f2ab93d60773d4d,7648967,0
2,8842281e1d1347389f2ab93d60773d4d,15704307,0
3,8842281e1d1347389f2ab93d60773d4d,6902644,0
4,8842281e1d1347389f2ab93d60773d4d,9844623,0
...,...,...,...
7347625,bd3ac2e547a4f521927056cbd6bb5c2f,1484167,5
7347626,bd3ac2e547a4f521927056cbd6bb5c2f,122451,5
7347627,6384a10d5611945b26b25c971f348fa4,85574,3
7347628,e9aea57d21cdf9d91a65687d59518924,15197,5


In [392]:
# find overlapping interactions, i.e. ratings of books that are in our own list
overlap_interactions = interactions[interactions["book_id"].isin(my_books_list)]
overlap_interactions

Unnamed: 0,user_id,book_id,rating
138,4035e5f05352217609c1a294410f2d50,13154150,4
139,4035e5f05352217609c1a294410f2d50,13531561,4
277,4980305f36ab8c2ab831e401a185f28a,204042,5
298,4980305f36ab8c2ab831e401a185f28a,13618,5
300,4980305f36ab8c2ab831e401a185f28a,13615,5
...,...,...,...
7347535,0b7de731a1d5bfe06bc4d8e2939c1b94,364958,5
7347543,0b7de731a1d5bfe06bc4d8e2939c1b94,969275,4
7347547,0b7de731a1d5bfe06bc4d8e2939c1b94,1237398,5
7347550,0b7de731a1d5bfe06bc4d8e2939c1b94,204042,5


In [393]:
# find the users who rated the same books as we did and count how many rated books we have in common
overlap_users = overlap_interactions["user_id"].value_counts()
overlap_users = overlap_users.to_dict()

In [394]:
# only consider users who share more than 20% of the books in our list
filtered_overlap_users = set([k for k in overlap_users if overlap_users[k] > my_books.shape[0]/5])
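
The overlap filter can be illustrated on toy data with the standard library. The book and user ids below are hypothetical; `threshold` plays the role of `my_books.shape[0]/5`.

```python
from collections import Counter

my_books_toy = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]  # 10 books
interactions_toy = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),   # 3 books in common
    ("u2", "A"), ("u2", "Z"),                 # 1 book in common
    ("u3", "B"), ("u3", "C"),                 # 2 books in common
]

mine = set(my_books_toy)
# Number of our books that each user has also interacted with
overlap_counts = Counter(u for u, b in interactions_toy if b in mine)
threshold = len(my_books_toy) / 5            # 20% of our list
kept = {u for u, n in overlap_counts.items() if n > threshold}
```

Because the comparison is strict, a user who shares exactly 20% of the list (`u3` here) is dropped.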

In [395]:
interactions = interactions[interactions["user_id"].isin(filtered_overlap_users)]
interactions.reset_index()
interactions

Unnamed: 0,user_id,book_id,rating
3510,6b5ffddfaca8dec2049e0bb0e2d6edf6,33583817,3
3511,6b5ffddfaca8dec2049e0bb0e2d6edf6,431274,5
3512,6b5ffddfaca8dec2049e0bb0e2d6edf6,32473025,4
3513,6b5ffddfaca8dec2049e0bb0e2d6edf6,17671951,4
3514,6b5ffddfaca8dec2049e0bb0e2d6edf6,31140467,3
...,...,...,...
7246898,a63a061afb8954263f57ebaaa9ac127e,13531561,0
7246899,a63a061afb8954263f57ebaaa9ac127e,870,0
7246900,a63a061afb8954263f57ebaaa9ac127e,204042,5
7246901,a63a061afb8954263f57ebaaa9ac127e,13154150,0


In [396]:
interactions = pd.concat([my_books[['user_id','book_id', 'rating']], interactions])
interactions

Unnamed: 0,user_id,book_id,rating
0,-1,6131591,5
1,-1,1315744,5
2,-1,6131665,5
3,-1,6131651,5
4,-1,6131593,5
...,...,...,...
7246898,a63a061afb8954263f57ebaaa9ac127e,13531561,0
7246899,a63a061afb8954263f57ebaaa9ac127e,870,0
7246900,a63a061afb8954263f57ebaaa9ac127e,204042,5
7246901,a63a061afb8954263f57ebaaa9ac127e,13154150,0


In [397]:
# make sure the columns have the right types
interactions["user_id"] = interactions["user_id"].astype(str)
interactions["book_id"] = interactions["book_id"].astype(str)
interactions["rating"] = pd.to_numeric(interactions["rating"])

Create the utility matrix

In [398]:
# index users and books so that we can build a utility matrix and compare users by similarity
interactions["user_index"] = interactions["user_id"].astype("category").cat.codes
interactions["book_index"] = interactions["book_id"].astype("category").cat.codes

In [399]:
from scipy.sparse import coo_matrix

ratings_mat_coo = coo_matrix((interactions["rating"], (interactions["user_index"], interactions["book_index"])))
ratings_mat = ratings_mat_coo.tocsr()

In [400]:
my_index = 0
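
`my_index = 0` relies on how the category codes are assigned: pandas numbers the categories in sorted order, and the string `"-1"` sorts before any hexadecimal user id because `-` precedes the digits and letters in ASCII. The sketch below mimics that ordering in plain Python; `codes` is a hypothetical stand-in for `cat.codes`, and the two hexadecimal ids are copied from the tables in this notebook.

```python
user_ids = [
    "6b5ffddfaca8dec2049e0bb0e2d6edf6",
    "-1",
    "0b7de731a1d5bfe06bc4d8e2939c1b94",
    "-1",
]

# Sorted unique ids -> integer codes, analogous to astype("category").cat.codes
codes = {uid: i for i, uid in enumerate(sorted(set(user_ids)))}
my_index = codes["-1"]
```

A more defensive alternative to hard-coding the value would be to look it up from the data, for example by selecting the `user_index` of the rows where `user_id == "-1"`.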

In [401]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
similarity = cosine_similarity(ratings_mat[my_index,:], ratings_mat).flatten()

In [402]:
# take the 15 most similar users (our own row is included)
indices = np.argpartition(similarity, -15)[-15:]
indices

array([108, 248, 219, 280, 259, 591,  73, 511, 271, 598,  87, 358, 197,
        64,   0], dtype=int64)
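
Note that index `0` (our own row) appears in the result above, since every non-zero vector has similarity 1 with itself; the next cell filters it out by `user_id`. The idea can be sketched with plain dictionaries standing in for the sparse matrix rows. The ratings are toy data, and `heapq.nlargest` is used here in place of `np.argpartition`; unlike `argpartition`, it returns the indices ordered by similarity.

```python
import heapq
import math

# Row i holds user i's ratings as {book_index: rating}; row 0 is "us"
rows = [
    {0: 5, 1: 5, 2: 4},
    {0: 5, 1: 4, 2: 5},
    {0: 1, 3: 5},
    {3: 4, 4: 5},
]

def cosine(a, b):
    """Cosine similarity between two sparse rating rows stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

similarity = [cosine(rows[0], r) for r in rows]
top = heapq.nlargest(3, range(len(rows)), key=similarity.__getitem__)
neighbours = [i for i in top if i != 0]   # drop our own row
```

Users with no book in common with us (row 3 here) get a similarity of exactly 0 and never reach the top of the list.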

In [403]:
similar_users = interactions[interactions["user_index"].isin(indices)].copy()
similar_users = similar_users[similar_users["user_id"]  != "-1"]

In [404]:
similar_users

Unnamed: 0,user_id,book_id,rating,user_index,book_index
3510,6b5ffddfaca8dec2049e0bb0e2d6edf6,33583817,3,248,29858
3511,6b5ffddfaca8dec2049e0bb0e2d6edf6,431274,5,248,31723
3512,6b5ffddfaca8dec2049e0bb0e2d6edf6,32473025,4,248,28867
3513,6b5ffddfaca8dec2049e0bb0e2d6edf6,17671951,4,248,11191
3514,6b5ffddfaca8dec2049e0bb0e2d6edf6,31140467,3,248,27955
...,...,...,...,...,...
6837357,61988e9ef5529df077c92ccf2cc0ce95,20504346,5,219,15631
6837358,61988e9ef5529df077c92ccf2cc0ce95,22447402,3,219,17790
6837359,61988e9ef5529df077c92ccf2cc0ce95,17835727,5,219,11613
6837360,61988e9ef5529df077c92ccf2cc0ce95,17703205,4,219,11300


In [405]:
book_recs = similar_users.groupby("book_id").rating.agg(["count", "mean"])

In [406]:
book_recs

Unnamed: 0_level_0,count,mean
book_id,Unnamed: 1_level_1,Unnamed: 2_level_1
1000392,2,2.0
10088114,1,3.0
1011359,1,0.0
10138607,1,0.0
1015307,1,0.0
...,...,...
986056,1,5.0
986057,1,5.0
986061,1,4.0
9876989,1,5.0


In [407]:
book_recs = book_recs.merge(books_titles, how="inner", on="book_id")

In [408]:
book_recs

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title
0,1000392,2,2.0,"Naruto, Vol. 16: Eulogy (Naruto, #16)",5837,https://www.goodreads.com/book/show/1000392.Na...,https://images.gr-assets.com/books/1435525315m...,naruto vol 16 eulogy naruto 16
1,10088114,1,3.0,"Morning Glories, Vol. 1: For a Better Future",8263,https://www.goodreads.com/book/show/10088114-m...,https://images.gr-assets.com/books/1486028570m...,morning glories vol 1 for a better future
2,1011359,1,0.0,"Ouran High School Host Club, Vol. 2 (Ouran Hig...",14557,https://www.goodreads.com/book/show/1011359.Ou...,https://s.gr-assets.com/assets/nophoto/book/11...,ouran high school host club vol 2 ouran high s...
3,10138607,1,0.0,Habibi,28405,https://www.goodreads.com/book/show/10138607-h...,https://images.gr-assets.com/books/1327899014m...,habibi
4,1015307,1,0.0,"Negima!: Magister Negi Magi, Volume 12",635,https://www.goodreads.com/book/show/1015307.Ne...,https://images.gr-assets.com/books/1333577628m...,negima magister negi magi volume 12
...,...,...,...,...,...,...,...,...
1158,986056,1,5.0,"Video Girl Ai, Vol. 07: Retake",159,https://www.goodreads.com/book/show/986056.Vid...,https://images.gr-assets.com/books/1344400197m...,video girl ai vol 07 retake
1159,986057,1,5.0,"Video Girl Ai, Vol. 02: Mix Down",228,https://www.goodreads.com/book/show/986057.Vid...,https://s.gr-assets.com/assets/nophoto/book/11...,video girl ai vol 02 mix down
1160,986061,1,4.0,"Video Girl Ai, Vol. 13: Fade Out",135,https://www.goodreads.com/book/show/986061.Vid...,https://s.gr-assets.com/assets/nophoto/book/11...,video girl ai vol 13 fade out
1161,9876989,1,5.0,"Blue Exorcist, Vol. 1 (Blue Exorcist, #1)",40889,https://www.goodreads.com/book/show/9876989-bl...,https://images.gr-assets.com/books/1432642113m...,blue exorcist vol 1 blue exorcist 1


In [409]:
book_recs["adjusted_count"] = book_recs["count"] * (book_recs["count"] / book_recs["ratings"])
book_recs["score"] = book_recs["mean"] * book_recs["adjusted_count"] 

book_recs = book_recs[~book_recs["book_id"].isin(my_books["book_id"])]
book_recs = book_recs[book_recs["count"] > 2]
book_recs = book_recs[book_recs["mean"] >= 3.5]
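
The Part II score can be checked by hand against a row from the final table: *One Piece, Volume 31* has `count` 8, `mean` 4.25, and `ratings` 227. The helper names `part2_score` and `passes_filters` are hypothetical and only mirror the cell above.

```python
def part2_score(count, mean, ratings):
    # adjusted_count down-weights books with many site-wide ratings
    adjusted_count = count * (count / ratings)
    return adjusted_count, mean * adjusted_count

def passes_filters(count, mean):
    # Same thresholds as above: rated by more than 2 similar users, mean of at least 3.5
    return count > 2 and mean >= 3.5

adjusted, score = part2_score(count=8, mean=4.25, ratings=227)
keep = passes_filters(count=8, mean=4.25)
```

Compared with Part I, the only change to the score is the multiplication by the mean rating, so books that similar users rated poorly are pushed down even when they appear often.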

In [410]:
top_recs = book_recs.sort_values("score", ascending = False)

In [411]:
def make_clickable(val):
    return '<a target= "_blank" href="{}">Goodreads</a>'.format(val)

def show_image(val):
    return '<img src="{}" width=50></img>'.format(val)

top_recs.head(5).style.format({'url': make_clickable, 'cover_image':show_image})

Unnamed: 0,book_id,count,mean,title,ratings,url,cover_image,mod_title,adjusted_count,score
985,6801644,8,4.25,"One Piece, Volume 31: We'll Be Here (One Piece, #31)",227,Goodreads,,one piece volume 31 well be here one piece 31,0.281938,1.198238
981,6801613,7,4.285714,"One Piece, Volume 37: Tom (One Piece, #37)",194,Goodreads,,one piece volume 37 tom one piece 37,0.252577,1.082474
975,6801575,8,3.625,"One Piece, Volume 42: Pirates vs. CP9 (One Piece, #42)",218,Goodreads,,one piece volume 42 pirates vs cp9 one piece 42,0.293578,1.06422
984,6801643,7,4.428571,"One Piece, Volume 30: Capriccio (One Piece, #30)",208,Goodreads,,one piece volume 30 capriccio one piece 30,0.235577,1.043269
992,6891484,8,4.375,"One Piece, Volume 25: The 100 Million Berry Man (One Piece, #25)",288,Goodreads,,one piece volume 25 the 100 million berry man one piece 25,0.222222,0.972222
