#Recommendation system:
Content-based (user-specific) recommendation system: recommends books based on a user's likes, book genres, and similar signals. Familiar examples are the "Recommended for you" section on YouTube and "You may also like" sections on other sites.

Importing the libraries

In [1]:
import matplotlib.pyplot as plt


In [2]:
import numpy as np
import pandas as pd
import difflib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

This data was collected in an attempt to personally identify more books that one would like, based on ones they may have read in the past. It comprises around 10k of the most recommended books of all time. (link: https://www.kaggle.com/datasets/ishikajohari/best-books-10k-multi-genre-data)

Columns:

Book - Name of the book.

Author - Name of the book's Author

Description - The book's description as mentioned on Goodreads

Genres - Multiple Genres as classified on Goodreads

Average Rating - The average rating (Out of 5) given on Goodreads

Number of Ratings - The number of users who have rated the book (not to be confused with reviews)

URL - The Goodreads URL for the book's details page

Inspiration

For different recommendation systems and ML projects here's what was recommended:
- Cluster books/authors based on Description and Genre
- Content-based recommendation system using Genre, Description and Ratings
- Genre prediction from Description data (Multi-label classification)
- Can be used in conjunction with my IMDb dataset with descriptions for certain use cases

In [4]:
books = pd.read_csv('/content/goodreads_data.csv')

In [5]:
# print each column name on its own line
for column in books.columns:
  print(column)


Unnamed: 0
Book
Author
Description
Genres
Avg_Rating
Num_Ratings
URL


In [6]:
books.head()

,Unnamed: 0,Book,Author,Description,Genres,Avg_Rating,Num_Ratings,URL
0,0,To Kill a Mockingbird,Harper Lee,The unforgettable novel of a childhood in a sl...,"['Classics', 'Fiction', 'Historical Fiction', ...",4.27,5691311,https://www.goodreads.com/book/show/2657.To_Ki...
1,1,Harry Potter and the Philosopher’s Stone (Harr...,J.K. Rowling,Harry Potter thinks he is an ordinary boy - un...,"['Fantasy', 'Fiction', 'Young Adult', 'Magic',...",4.47,9278135,https://www.goodreads.com/book/show/72193.Harr...
2,2,Pride and Prejudice,Jane Austen,"Since its immediate success in 1813, Pride and...","['Classics', 'Fiction', 'Romance', 'Historical...",4.28,3944155,https://www.goodreads.com/book/show/1885.Pride...
3,3,The Diary of a Young Girl,Anne Frank,Discovered in the attic in which she spent the...,"['Classics', 'Nonfiction', 'History', 'Biograp...",4.18,3488438,https://www.goodreads.com/book/show/48855.The_...
4,4,Animal Farm,George Orwell,Librarian's note: There is an Alternate Cover ...,"['Classics', 'Fiction', 'Dystopia', 'Fantasy',...",3.98,3575172,https://www.goodreads.com/book/show/170448.Ani...


In [7]:
books.shape

(10000, 8)

In [31]:
# the first column ('Unnamed: 0') holds the row number; give it a usable name
books.rename(columns={books.columns[0]: 'index'}, inplace=True)


In [32]:
books.head()

,index,Book,Author,Description,Genres,Avg_Rating,Num_Ratings,URL
0,0,To Kill a Mockingbird,Harper Lee,The unforgettable novel of a childhood in a sl...,"['Classics', 'Fiction', 'Historical Fiction', ...",4.27,5691311,https://www.goodreads.com/book/show/2657.To_Ki...
1,1,Harry Potter and the Philosopher’s Stone (Harr...,J.K. Rowling,Harry Potter thinks he is an ordinary boy - un...,"['Fantasy', 'Fiction', 'Young Adult', 'Magic',...",4.47,9278135,https://www.goodreads.com/book/show/72193.Harr...
2,2,Pride and Prejudice,Jane Austen,"Since its immediate success in 1813, Pride and...","['Classics', 'Fiction', 'Romance', 'Historical...",4.28,3944155,https://www.goodreads.com/book/show/1885.Pride...
3,3,The Diary of a Young Girl,Anne Frank,Discovered in the attic in which she spent the...,"['Classics', 'Nonfiction', 'History', 'Biograp...",4.18,3488438,https://www.goodreads.com/book/show/48855.The_...
4,4,Animal Farm,George Orwell,Librarian's note: There is an Alternate Cover ...,"['Classics', 'Fiction', 'Dystopia', 'Fantasy',...",3.98,3575172,https://www.goodreads.com/book/show/170448.Ani...


In [12]:
# for our specific content based recommendation system we will take these columns:
chosen_features = ['Description','Genres','Author']
print(chosen_features)

['Description', 'Genres', 'Author']


In [13]:
for feature in chosen_features:
  books[feature] = books[feature].fillna(' ')

In [14]:
books_data_features = books['Description']+' '+books['Genres']+' '+books['Author']

In [15]:
print(books_data_features)

0       The unforgettable novel of a childhood in a sl...
1       Harry Potter thinks he is an ordinary boy - un...
2       Since its immediate success in 1813, Pride and...
3       Discovered in the attic in which she spent the...
4       Librarian's note: There is an Alternate Cover ...
                              ...                        
9995    How far would you go? If human society was gen...
9996    Jeth Cavanaugh is searching for a new life alo...
9997    This dark fable tells the story of four Englis...
9998    For Adriana Monroe life couldn’t get any bette...
9999    After demands of thousands of fans in various ...
Length: 10000, dtype: object
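To make the fill-then-concatenate step concrete, here is a small sketch on a made-up two-row frame. The `toy` DataFrame and its values are invented for illustration and are not taken from the dataset. It suggests why missing values are filled first: concatenating a string with a missing value yields a missing value, so an unfilled row would contribute no text at all.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; the values are invented for illustration only.
toy = pd.DataFrame({
    'Description': ['A governess falls in love.', np.nan],
    'Genres': ["['Classics', 'Romance']", "['Fantasy']"],
    'Author': ['Charlotte Bronte', 'Unknown Author'],
})

# Without filling, a missing Description makes the whole combined string missing.
unfilled = toy['Description'] + ' ' + toy['Genres'] + ' ' + toy['Author']

# Filling first keeps every row usable as text.
filled_toy = toy.fillna(' ')
combined = filled_toy['Description'] + ' ' + filled_toy['Genres'] + ' ' + filled_toy['Author']

print(unfilled.isna().tolist())   # which rows were lost without filling
print(combined.isna().tolist())   # no rows lost after filling
print(combined[1])
```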


In [16]:
# converting the text data to feature vectors

vector = TfidfVectorizer()

In [17]:
feature_vector = vector.fit_transform(books_data_features)

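#Cosine Similarity

Cosine similarity measures how similar two non-zero vectors are by taking the cosine of the angle between them: a score of 1 means the vectors point in the same direction, and 0 means they share nothing. Because TF-IDF values are never negative, the scores here fall between 0 and 1. It is often used in NLP and ML tasks such as document similarity analysis, recommendation systems, and clustering.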
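The definition can be checked by hand. The sketch below computes the cosine of the angle between small made-up vectors as the dot product divided by the product of the lengths, then compares the result with scikit-learn's `cosine_similarity`. The vectors `a`, `b`, `c` and the helper `cosine` are illustrative names, not part of the notebook.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])   # same direction as a, twice the length
c = np.array([0.0, 0.0, 3.0])   # shares no non-zero component with a

def cosine(u, v):
    """Cosine of the angle between u and v: (u . v) / (|u| * |v|)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(a, b))   # close to 1: same direction, length is ignored
print(cosine(a, c))   # 0: orthogonal vectors

# scikit-learn expects 2-D inputs, one vector per row.
print(cosine_similarity([a], [b])[0, 0])
```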

In [18]:
similarity = cosine_similarity(feature_vector)

In [19]:
print(similarity)

[[1.         0.05790786 0.08424825 ... 0.09687636 0.06025832 0.06421541]
 [0.05790786 1.         0.02222687 ... 0.0354327  0.03531693 0.02540942]
 [0.08424825 0.02222687 1.         ... 0.08843606 0.05021294 0.05618735]
 ...
 [0.09687636 0.0354327  0.08843606 ... 1.         0.08043625 0.09310925]
 [0.06025832 0.03531693 0.05021294 ... 0.08043625 1.         0.04856855]
 [0.06421541 0.02540942 0.05618735 ... 0.09310925 0.04856855 1.        ]]


In [20]:
print(similarity.shape)

(10000, 10000)
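The full 10000 x 10000 matrix holds a score for every pair of books, although a single recommendation only needs one row of it. As a possible alternative, `cosine_similarity` accepts two arguments, so one book's vector can be compared against all the others directly. The sketch below uses a made-up corpus; `docs`, `toy_matrix` and `query_row` are illustrative names.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up mini corpus, for illustration only.
docs = [
    'gothic romance governess classics',
    'gothic romance teacher classics',
    'desert planet science fiction',
]
toy_matrix = TfidfVectorizer().fit_transform(docs)

query_row = 0  # position of the book we want recommendations for

# Compare one row against every row: result has shape (1, n_documents).
row_scores = cosine_similarity(toy_matrix[query_row], toy_matrix).ravel()

# Same numbers as the matching row of the full pairwise matrix.
full = cosine_similarity(toy_matrix)
print(row_scores.shape)
print(np.allclose(row_scores, full[query_row]))
```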


Your favorite book

In [21]:

fav_book = input('Enter the title of your favorite book:' )

Enter the title of your favorite book:jane eyre


In [24]:
titles_list = books['Book'].tolist()
print(titles_list)



In [25]:

new_match = difflib.get_close_matches(fav_book, titles_list)
print(new_match)

['Jane Eyre', 'Cane River']
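`difflib.get_close_matches` compares strings case-sensitively and returns an empty list when nothing is similar enough, in which case taking the first element would raise an `IndexError`. One way to handle both points is sketched below. The helper `find_title` and the short `titles` list are hypothetical, and titles that differ only in case would collapse into one entry.

```python
import difflib

# Made-up short list of titles, for illustration only.
titles = ['Jane Eyre', 'Cane River', 'Animal Farm']

def find_title(query, titles, cutoff=0.6):
    """Return the closest title ignoring case, or None if nothing clears the cutoff."""
    # Map lower-cased titles back to their original spelling.
    lowered = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

print(find_title('jane eyre', titles))
print(find_title('zzzz', titles))
```

`n` limits how many matches come back and `cutoff` (between 0 and 1) sets how similar a candidate must be; both are standard parameters of `get_close_matches`.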


In [26]:
book_match = new_match[0]
print(book_match)

Jane Eyre


In [35]:

book_index = books[books.Book == book_match]['index'].values[0]
print(book_index)

11


In [36]:
similarity_score = list(enumerate(similarity[book_index]))
print(similarity_score)

[(0, 0.053617915673480496), (1, 0.027653565778910472), (2, 0.1812672521142652), (3, 0.06981253324463323), (4, 0.03180916045971246), (5, 0.04723108875764538), (6, 0.07298151164965799), (7, 0.051394479512535646), (8, 0.045411142397633934), (9, 0.030968705254218694), (10, 0.06856440786822243), (11, 1.0000000000000002), (12, 0.04076394764626182), (13, 0.03619790257756378), (14, 0.05422027818725327), (15, 0.04060267813230104), (16, 0.04619744121879345), (17, 0.03601125697433627), (18, 0.046943561530660274), (19, 0.0603974624289623), (20, 0.0629658842580627), (21, 0.07825871521167418), (22, 0.03138165951276873), (23, 0.05168808817664969), (24, 0.04752509566892842), (25, 0.01990679855630899), (26, 0.03736985681658151), (27, 0.055066032473448316), (28, 0.026113534298735704), (29, 0.05569122843612848), (30, 0.04052774280408466), (31, 0.08222771735326728), (32, 0.013066865483971849), (33, 0.03915118117501529), (34, 0.04125647010959534), (35, 0.026468593417720093), (36, 0.03865723215775354), (37,

In [37]:
similar_books = sorted(similarity_score, key = lambda x:x[1], reverse = True)
print(similar_books)

[(11, 1.0000000000000002), (8741, 0.2409745276962088), (8390, 0.21943844559349948), (2931, 0.2098618956304808), (9222, 0.2018409793120141), (919, 0.2011853842638815), (7649, 0.20101023323467862), (7642, 0.19738925391202314), (2, 0.1812672521142652), (3474, 0.1793878218323257), (8357, 0.1753806902023394), (1773, 0.17492663309177314), (5239, 0.17105848604503798), (3173, 0.17096908130267724), (1696, 0.1692038324105956), (1833, 0.1617892997801743), (3707, 0.16064597133033465), (4469, 0.15955168598462244), (895, 0.15947380042491252), (1156, 0.15196396052765218), (3605, 0.1507021212439383), (4950, 0.14650289479895207), (2747, 0.14640246707855825), (5288, 0.1462704781167658), (7959, 0.14603004011606183), (397, 0.14458671574538676), (9184, 0.14192727684999787), (4140, 0.14083318099564424), (8906, 0.1403405573537734), (5475, 0.13986125877477074), (7374, 0.13923882861404197), (424, 0.13722187260950272), (9406, 0.13699990039533547), (6248, 0.13675621834695997), (3591, 0.13634617078309685), (8137,
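The sorted list starts with the chosen book itself, since every book has a similarity of 1 with itself. A possible variant that ranks with NumPy and drops the query book is sketched below. The `scores` array is made up and `query` is an illustrative name for the book's position.

```python
import numpy as np

# Made-up similarity scores for five books; position 1 is the query book itself.
scores = np.array([0.05, 1.0, 0.18, 0.02, 0.24])
query = 1

# argsort sorts ascending, so reverse it to get the highest scores first.
order = np.argsort(scores)[::-1]

# Skip the query book and keep the three best remaining (position, score) pairs.
top = [(int(i), float(scores[i])) for i in order if i != query][:3]
print(top)
```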

In [41]:
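print('Books Recommended for you : \n')

# similar_books is sorted by score, so its first 29 entries are the closest matches;
# the very first entry is the chosen book itself (similarity 1.0)
for i, (index, score) in enumerate(similar_books[:29], start=1):
  # look the title up through the 'index' column (books.index would be the DataFrame's row labels)
  title_from_index = books[books['index'] == index]['Book'].values[0]
  print(i, '.', title_from_index)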

Books Recommended for you : 

1 . Jane Eyre
2 . Nice Girls Don't Date Dead Men (Jane Jameson, #2)
3 . Innocent Traitor
4 . Sundays at Tiffany's
5 . The Promise
6 . The Eyre Affair (Thursday Next, #1)
7 . The Lake of Dead Languages
8 . The Last Tudor (The Plantagenet and Tudor Novels, #14)
9 . Pride and Prejudice
10 . The Silent Corner (Jane Hawk, #1)
11 . The Girl Before
12 . Villette
13 . Austenland (Austenland, #1)
14 . Vanish (Rizzoli & Isles, #5)
15 . The Complete Novels
16 . Big Little Lies
17 . I Am Her... (I Am Her..., #1)
18 . My Dearest Miss Fairfax
19 . The Tenant of Wildfell Hall
20 . Wide Sargasso Sea
21 . Ice Cold (Rizzoli & Isles, #8)
22 . Bone Music (Burning Girl, #1)
23 . The Devil's Arithmetic
24 . What the Wind Knows
25 . The Midwife's Apprentice
26 . Mansfield Park
27 . One Night of Regrets: A Story of Restoration and Grace
28 . Tell Me You Want Me (College Romance, #1)
29 . I Was Amelia Earhart
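The steps above can be gathered into a single function. The sketch below is one possible arrangement, run on a made-up four-book catalogue rather than the Goodreads data. The names `toy_books`, `Text` and `recommend` are hypothetical, and the query book is left out of its own recommendations.

```python
import difflib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up catalogue; 'Text' stands in for the combined Description/Genres/Author string.
toy_books = pd.DataFrame({
    'Book': ['Jane Eyre', 'Villette', 'Dune', 'Foundation'],
    'Text': [
        'gothic romance governess classics',
        'gothic romance teacher classics',
        'desert planet science fiction',
        'galactic empire science fiction',
    ],
})

def recommend(title, frame, n=2):
    """Return up to n titles most similar to the closest match for `title`."""
    all_titles = frame['Book'].tolist()
    matches = difflib.get_close_matches(title, all_titles, n=1)
    if not matches:
        return []  # nothing close enough to the requested title
    position = all_titles.index(matches[0])

    # Vectorize the text and score the chosen book against every book.
    matrix = TfidfVectorizer().fit_transform(frame['Text'])
    scores = cosine_similarity(matrix[position], matrix).ravel()

    # Highest scores first, skipping the chosen book itself.
    ranked = scores.argsort()[::-1]
    return [all_titles[i] for i in ranked if i != position][:n]

print(recommend('Jane Eyre', toy_books, n=1))
```

Refitting the vectorizer on every call keeps the sketch short; with the full dataset it would make more sense to fit once and reuse the matrix.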
