# Book Recommendation System (Content Based)

Recommends books according to the cosine similarity between each pair of books' content features (title, authors, publisher, and average rating).

Based on the following [video](https://www.youtube.com/watch?v=xySjbVUgAwU).

Dataset obtained from the following [source](https://www.kaggle.com/jealousleopard/goodreadsbooks).
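As a quick illustration of the metric, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it by hand with NumPy; the helper `cosine_sim` and the three toy count vectors are made up for illustration and are not part of the original notebook.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: (a . b) / (|a| * |b|)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Define the similarity as 0 when either vector is all zeros
    return float(a @ b / denom) if denom else 0.0

# Hypothetical word counts over the vocabulary ["mario", "puzo", "the", "vollmann"]
book_a = [1, 1, 1, 0]
book_b = [1, 1, 1, 0]
book_c = [0, 0, 0, 1]

same = cosine_sim(book_a, book_b)      # identical word counts
disjoint = cosine_sim(book_a, book_c)  # no words in common
print(same, disjoint)
```

Books with identical word counts score (approximately) 1, and books that share no words score 0, which is the range seen in the similarity matrix later on.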

## 1. Import Libraries

In [1]:
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

## 2. Load and Visualize Data

In [3]:
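# Load the data; on_bad_lines="skip" (pandas >= 1.3) replaces error_bad_lines=False, which was removed in pandas 2.0
# Note: "unicode_escape" decodes the bytes as Latin-1, so accented characters appear garbled in the output (e.g. "GrandPrÃ©")
df = pd.read_csv('data/content_similarity/book_data.csv', encoding="unicode_escape", on_bad_lines="skip")

# Display the data
df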

Unnamed: 0,bookID,title,authors,average_rating,isbn,isbn13,language_code,num_pages,ratings_count,text_reviews_count,publication_date,publisher
0,1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling/Mary GrandPrÃ©,4.57,0439785960,9780439785969,eng,652,2095690,27591,9/16/2006,Scholastic Inc.
1,2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling/Mary GrandPrÃ©,4.49,0439358078,9780439358071,eng,870,2153167,29221,9/1/2004,Scholastic Inc.
2,4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.42,0439554896,9780439554893,eng,352,6333,244,11/1/2003,Scholastic
3,5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling/Mary GrandPrÃ©,4.56,043965548X,9780439655484,eng,435,2339585,36325,5/1/2004,Scholastic Inc.
4,8,Harry Potter Boxed Set Books 1-5 (Harry Potte...,J.K. Rowling/Mary GrandPrÃ©,4.78,0439682584,9780439682589,eng,2690,41428,164,9/13/2004,Scholastic
...,...,...,...,...,...,...,...,...,...,...,...,...
11118,45631,Expelled from Eden: A William T. Vollmann Reader,William T. Vollmann/Larry McCaffery/Michael He...,4.06,1560254416,9781560254416,eng,512,156,20,12/21/2004,Da Capo Press
11119,45633,You Bright and Risen Angels,William T. Vollmann,4.08,0140110879,9780140110876,eng,635,783,56,12/1/1988,Penguin Books
11120,45634,The Ice-Shirt (Seven Dreams #1),William T. Vollmann,3.96,0140131965,9780140131963,eng,415,820,95,8/1/1993,Penguin Books
11121,45639,Poor People,William T. Vollmann,3.72,0060878827,9780060878825,eng,434,769,139,2/27/2007,Ecco


## 3. Select Relevant Features

In [4]:
# Combine the relevant features (title, authors, publisher, average rating) into one string per book
def combine_features(data):
    columns = ["title", "authors", "publisher", "average_rating"]

    # Cast every column to str so that numeric or missing values do not break the concatenation
    return data[columns].astype(str).agg(" ".join, axis=1).tolist()

In [5]:
# Obtain the combined features
df["combined_features"] = combine_features(df)

# Display modified data frame
df

Unnamed: 0,bookID,title,authors,average_rating,isbn,isbn13,language_code,num_pages,ratings_count,text_reviews_count,publication_date,publisher,combined_features
0,1,Harry Potter and the Half-Blood Prince (Harry ...,J.K. Rowling/Mary GrandPrÃ©,4.57,0439785960,9780439785969,eng,652,2095690,27591,9/16/2006,Scholastic Inc.,Harry Potter and the Half-Blood Prince (Harry ...
1,2,Harry Potter and the Order of the Phoenix (Har...,J.K. Rowling/Mary GrandPrÃ©,4.49,0439358078,9780439358071,eng,870,2153167,29221,9/1/2004,Scholastic Inc.,Harry Potter and the Order of the Phoenix (Har...
2,4,Harry Potter and the Chamber of Secrets (Harry...,J.K. Rowling,4.42,0439554896,9780439554893,eng,352,6333,244,11/1/2003,Scholastic,Harry Potter and the Chamber of Secrets (Harry...
3,5,Harry Potter and the Prisoner of Azkaban (Harr...,J.K. Rowling/Mary GrandPrÃ©,4.56,043965548X,9780439655484,eng,435,2339585,36325,5/1/2004,Scholastic Inc.,Harry Potter and the Prisoner of Azkaban (Harr...
4,8,Harry Potter Boxed Set Books 1-5 (Harry Potte...,J.K. Rowling/Mary GrandPrÃ©,4.78,0439682584,9780439682589,eng,2690,41428,164,9/13/2004,Scholastic,Harry Potter Boxed Set Books 1-5 (Harry Potte...
...,...,...,...,...,...,...,...,...,...,...,...,...,...
11118,45631,Expelled from Eden: A William T. Vollmann Reader,William T. Vollmann/Larry McCaffery/Michael He...,4.06,1560254416,9781560254416,eng,512,156,20,12/21/2004,Da Capo Press,Expelled from Eden: A William T. Vollmann Read...
11119,45633,You Bright and Risen Angels,William T. Vollmann,4.08,0140110879,9780140110876,eng,635,783,56,12/1/1988,Penguin Books,You Bright and Risen Angels William T. Vollman...
11120,45634,The Ice-Shirt (Seven Dreams #1),William T. Vollmann,3.96,0140131965,9780140131963,eng,415,820,95,8/1/1993,Penguin Books,The Ice-Shirt (Seven Dreams #1) William T. Vol...
11121,45639,Poor People,William T. Vollmann,3.72,0060878827,9780060878825,eng,434,769,139,2/27/2007,Ecco,Poor People William T. Vollmann Ecco 3.72


## 4. Obtain Similarity Matrix

In [6]:
# Obtain the word count matrix
cm = CountVectorizer().fit_transform(df["combined_features"])

In [7]:
# Obtain the cosine similarity matrix
cs = cosine_similarity(cm)

# Display the similarity matrix
cs

array([[1.        , 0.78258558, 0.6882472 , ..., 0.07254763, 0.        ,
        0.        ],
       [0.78258558, 1.        , 0.74620251, ..., 0.13483997, 0.        ,
        0.        ],
       [0.6882472 , 0.74620251, 1.        , ..., 0.07905694, 0.        ,
        0.        ],
       ...,
       [0.07254763, 0.13483997, 0.07905694, ..., 1.        , 0.25819889,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.25819889, 1.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        1.        ]])
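To make the two steps above concrete, here is a small sketch that runs the same `CountVectorizer` and `cosine_similarity` calls on three `combined_features` strings taken from the data frame above. The variable names (`toy_features`, `toy_cs`, and so on) are illustrative only. It assumes scikit-learn's default tokenizer settings, which lowercase the text and keep only tokens of two or more word characters, so an initial such as "T" is dropped and a rating such as "3.97" contributes only the token "97".

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three combined feature strings copied from the data frame above
toy_features = [
    "The Sicilian Mario Puzo Ballantine Books 3.97",
    "The Dark Arena Mario Puzo Ballantine Books 3.35",
    "Poor People William T. Vollmann Ecco 3.72",
]

vectorizer = CountVectorizer()
toy_counts = vectorizer.fit_transform(toy_features)  # sparse matrix: one row per book, one column per token
toy_cs = cosine_similarity(toy_counts)               # 3 x 3 symmetric matrix

print(sorted(vectorizer.vocabulary_))
print(toy_cs.round(3))
```

The two Mario Puzo books share five tokens (the, mario, puzo, ballantine, books) out of 7 and 8 respectively, so their similarity is 5 / sqrt(7 * 8), roughly 0.668, while neither shares any token with the third book. Because only the decimal digits of the rating survive tokenization, the rating adds little meaningful signal under these default settings.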

## 5. Simulate Book Recommendation

In [8]:
# The reader likes "The Godfather"
liked_book = df[df.title == "The Godfather"]

# Display the book's features
liked_book

Unnamed: 0,bookID,title,authors,average_rating,isbn,isbn13,language_code,num_pages,ratings_count,text_reviews_count,publication_date,publisher,combined_features
5855,22036,The Godfather,Mario Puzo/Peter Bart/Robert Thompson,4.37,451217403,9780451217400,eng,448,180,17,10/4/2005,New American Library,The Godfather Mario Puzo/Peter Bart/Robert Tho...


In [9]:
# Create a list of tuples (book_index, similarity score) representing the similar books
liked_book_index = liked_book.index.values[0]
similar_books = list(enumerate(cs[liked_book_index]))

In [10]:
# Sort all books by similarity (descending), excluding the liked book itself
sorted_books = sorted((book for book in similar_books if book[0] != liked_book_index), key = lambda x : x[1], reverse = True)

In [11]:
# Convert to a data frame and display the top 10 similar books
similar_books_df = pd.DataFrame(data = [df.iloc[book[0], :] for book in sorted_books[0:10]])
similar_books_df["similarity"] = [book[1] for book in sorted_books[0:10]]
similar_books_df

Unnamed: 0,bookID,title,authors,average_rating,isbn,isbn13,language_code,num_pages,ratings_count,text_reviews_count,publication_date,publisher,combined_features,similarity
656,2112,The Art of Nonfiction: A Guide for Writers and...,Ayn Rand/Robert Mayhew/Peter Schwartz,3.96,0452282314,9780452282315,en-US,192,429,39,2/1/2001,New American Library,The Art of Nonfiction: A Guide for Writers and...,0.39736
484,1538,The Complete Plays,Sophocles/Paul Roche,4.27,0451527844,9780451527844,eng,420,2883,40,3/1/2001,New American Library,The Complete Plays Sophocles/Paul Roche New Am...,0.365148
3272,12000,Crime Novels: American Noir of the 1950s,Robert Polito/Jim Thompson/Patricia Highsmith/...,4.37,1883011493,9781883011499,eng,892,420,34,9/1/1997,Library of America,Crime Novels: American Noir of the 1950s Rober...,0.34641
335,1111,The Power Broker: Robert Moses and the Fall of...,Robert A. Caro,4.51,0394720245,9780394720241,eng,1344,11208,1237,7/12/1975,Vintage,The Power Broker: Robert Moses and the Fall of...,0.331133
5851,22026,The Sicilian,Mario Puzo,3.97,0345441702,9780345441706,eng,416,16557,502,5/1/2001,Ballantine Books,The Sicilian Mario Puzo Ballantine Books 3.97,0.327327
5852,22027,The Dark Arena,Mario Puzo,3.35,0345441699,9780345441690,eng,288,1554,54,5/1/2001,Ballantine Books,The Dark Arena Mario Puzo Ballantine Books 3.35,0.306186
5856,22037,The Fortunate Pilgrim,Mario Puzo,3.83,0345476727,9780345476722,eng,304,3232,222,9/28/2004,Ballantine Books,The Fortunate Pilgrim Mario Puzo Ballantine Bo...,0.306186
8532,32778,The Aeneid,Virgil/Robert Fitzgerald,3.84,0679413359,9780679413356,eng,483,184,21,6/30/1992,Everyman's Library,The Aeneid Virgil/Robert Fitzgerald Everyman's...,0.306186
9881,39792,The Man Who Smiled (Kurt Wallander #4),Henning Mankell/Laurie Thompson,3.93,1565849930,9781565849938,eng,325,14322,521,9/19/2006,The New Press,The Man Who Smiled (Kurt Wallander #4) Henning...,0.288675
3158,11597,The Dark Half,Stephen King,3.77,045052468X,9780450524684,eng,469,110131,1282,10/7/1990,New English Library,The Dark Half Stephen King New English Library...,0.288675
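The lookup, sort, and display steps of this section could be wrapped in one reusable function. The following is a minimal sketch, not part of the original notebook: the name `recommend` and its parameters are hypothetical, and it assumes that row `i` of the similarity matrix corresponds to the `i`-th row of the data frame (as is the case for `cs` and `df` above). The toy data at the end is invented purely to exercise the function.

```python
import numpy as np
import pandas as pd

def recommend(books, similarity, title, top_n=10):
    """Return the top_n rows of `books` most similar to the first book titled `title`."""
    matches = np.flatnonzero(books["title"].to_numpy() == title)
    if matches.size == 0:
        raise KeyError(f"No book titled {title!r}")
    pos = matches[0]

    scores = np.asarray(similarity[pos], dtype=float).copy()
    scores[pos] = -np.inf  # exclude the liked book itself

    # Highest scores first; never return more than the number of other books
    top = np.argsort(-scores, kind="stable")[: min(top_n, len(scores) - 1)]
    result = books.iloc[top].copy()
    result["similarity"] = scores[top]
    return result

# Made-up three-book example
toy_books = pd.DataFrame({"title": ["A", "B", "C"]})
toy_sim = np.array([
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.1],
    [0.7, 0.1, 1.0],
])
toy_result = recommend(toy_books, toy_sim, "A", top_n=2)
print(toy_result)
```

With the notebook's own objects, a call such as `recommend(df, cs, "The Godfather")` would be expected to produce a table like the one above. Raising a `KeyError` for an unknown title is one possible design choice; returning an empty data frame would be another.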
