## Book Recommendation System

### Introduction

Book recommendation systems use data-driven techniques to suggest books a user is likely to enjoy, drawing on user preferences, past behavior, and the characteristics of the books themselves. Good recommendations increase engagement and help readers discover books they would not have found on their own. This notebook builds two such systems: a popularity-based recommender and a collaborative-filtering recommender.

#### Import necessary libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import pickle

#### Load dataset files: Books, Ratings, and Users

Data was downloaded from: https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset

In [2]:
books = pd.read_csv('Books.csv')
ratings = pd.read_csv('Ratings.csv')
users = pd.read_csv('Users.csv')
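
ISBNs can begin with zeros, and type inference on a CSV column may turn such values into integers. The sketch below shows one way to guard against that by reading the column as text. The two-row CSV is a hypothetical stand-in for `Books.csv`, and whether the real files need this treatment is an assumption.

```python
import io
import pandas as pd

# Hypothetical two-row CSV standing in for Books.csv
toy_csv = io.StringIO(
    "ISBN,Book-Title\n"
    "0195153448,Classical Mythology\n"
    "0002005018,Clara Callan\n"
)

# dtype=str on ISBN keeps the leading zeros that integer parsing would drop
toy_books = pd.read_csv(toy_csv, dtype={'ISBN': str})
print(toy_books['ISBN'].tolist())
```

Keeping the join key as a string in every table also avoids silent mismatches when the tables are merged on `ISBN` later.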

#### Output the number of records in each dataset

In [3]:
books.shape[0], ratings.shape[0], users.shape[0]

(271360, 1149780, 278858)

#### Check for duplicate ratings

In [4]:
ratings.duplicated().sum()

0
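
`duplicated()` with no arguments compares entire rows, so it would miss a user who rated the same book twice with different scores. As a sketch on hypothetical data, a check restricted to the key columns might look like this:

```python
import pandas as pd

# Hypothetical ratings: user 1 rated the same ISBN twice with different scores
toy_dupes = pd.DataFrame({
    'User-ID':     [1, 1, 2],
    'ISBN':        ['111', '111', '111'],
    'Book-Rating': [5, 8, 7],
})

# Whole-row comparison finds nothing, because the scores differ
full_row_dupes = int(toy_dupes.duplicated().sum())

# Comparing only the key columns flags the repeated (user, book) pair
key_dupes = int(toy_dupes.duplicated(subset=['User-ID', 'ISBN']).sum())
print(full_row_dupes, key_dupes)
```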

### Popularity-Based Recommendation System

Popularity-based recommendation systems draw on the collective judgment of all readers rather than on any one user's preferences. They rank books by popularity, typically measured by the number of ratings and the average rating, and present the highest-ranked titles to everyone. Requiring a minimum number of ratings keeps a book with a handful of perfect scores from outranking widely read titles. The approach suits new users, for whom no rating history exists, as well as readers looking for widely enjoyed books, and it is simple to implement.
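
The idea can be sketched on a toy table before applying it to the full dataset. The data and the `min_ratings` threshold below are hypothetical and chosen only to keep the example small.

```python
import pandas as pd

# Toy ratings table (hypothetical data, for illustration only)
toy_ratings = pd.DataFrame({
    'Book-Title':  ['A', 'A', 'A', 'B', 'B', 'C'],
    'Book-Rating': [8, 9, 10, 10, 10, 3],
})

# Count and average the ratings per title in one pass
toy_stats = (
    toy_ratings.groupby('Book-Title')['Book-Rating']
    .agg(num_ratings='count', avg_ratings='mean')
    .reset_index()
)

# Keep titles with at least `min_ratings` ratings, best average first
min_ratings = 2  # hypothetical threshold for this toy table
toy_popular = (
    toy_stats[toy_stats['num_ratings'] >= min_ratings]
    .sort_values('avg_ratings', ascending=False)
    .reset_index(drop=True)
)
print(toy_popular)
```

Book `C` is excluded because it has a single rating, and `B` ranks above `A` on average rating.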

#### Merge ratings and books dataset to include book information

In [5]:
ratings_w_name = ratings.merge(books, on='ISBN')
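
`merge` performs an inner join by default, so any rating whose ISBN is missing from the books table is dropped without notice. The sketch below, on hypothetical data, shows one way to count such rows using `how='left'` together with `indicator=True`.

```python
import pandas as pd

# Hypothetical tables: ISBN '999' has a rating but no book record
toy_r = pd.DataFrame({'ISBN': ['111', '222', '999'], 'Book-Rating': [7, 9, 4]})
toy_b = pd.DataFrame({'ISBN': ['111', '222'], 'Book-Title': ['A', 'B']})

# Default how='inner': the rating for ISBN '999' has no match and is dropped
toy_inner = toy_r.merge(toy_b, on='ISBN')

# how='left' with indicator=True keeps every rating and labels unmatched rows
toy_left = toy_r.merge(toy_b, on='ISBN', how='left', indicator=True)
unmatched = int((toy_left['_merge'] == 'left_only').sum())
print(len(toy_inner), unmatched)
```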

#### Calculate the number of ratings for each book

In [6]:
num_rating_df = ratings_w_name.groupby('Book-Title').count()['Book-Rating'].reset_index()
num_rating_df.rename(columns={'Book-Rating':'num_ratings'}, inplace=True)

#### Calculate the average rating for each book

In [7]:
# Select the rating column before aggregating so non-numeric columns are never averaged
avg_rating_df = ratings_w_name.groupby('Book-Title')['Book-Rating'].mean().reset_index()
avg_rating_df.rename(columns={'Book-Rating':'avg_ratings'}, inplace=True)

#### Combine the number of ratings and average ratings for each book

In [8]:
popular_df = num_rating_df.merge(avg_rating_df, on='Book-Title')

#### Keep books with at least 250 ratings, sort by average rating, and take the top 50

In [None]:
popular_df = popular_df[popular_df['num_ratings']>=250].sort_values('avg_ratings', ascending=False).head(50)

#### Merge with books dataset to get additional book information

In [9]:
popular_df = popular_df.merge(books, on='Book-Title').drop_duplicates('Book-Title')[['Book-Title','Book-Author','Image-URL-M','num_ratings','avg_ratings']]

### Collaborative-Filtering-Based Recommender System

Collaborative filtering makes personalized recommendations from the collective rating behavior of users. For books, it examines how users have rated titles in the past and recommends books enjoyed by readers with similar tastes. This notebook uses the item-based variant: two books count as similar when the same users rate them similarly, measured by cosine similarity between the books' rating vectors. Because the method needs only ratings, it works well when little explicit information about book content is available, and it can surface books a reader would not otherwise have encountered.
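
As a minimal sketch of item-based collaborative filtering, the toy example below builds a book-by-user matrix and compares book rows by cosine similarity. The ratings are hypothetical, and the similarity is computed directly with NumPy rather than with a library helper.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: users 1 and 2 rate X and Y highly, user 3 prefers Z
toy_cf = pd.DataFrame({
    'User-ID':     [1, 1, 2, 2, 3, 3],
    'Book-Title':  ['X', 'Y', 'X', 'Y', 'Z', 'X'],
    'Book-Rating': [9, 8, 10, 9, 9, 2],
})

# Rows are books, columns are users; unrated cells become 0
toy_pt = toy_cf.pivot_table(index='Book-Title', columns='User-ID',
                            values='Book-Rating').fillna(0)

# Cosine similarity: dot product of the row vectors after scaling each to unit length
vectors = toy_pt.to_numpy(dtype=float)
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
toy_sim = pd.DataFrame(unit @ unit.T, index=toy_pt.index, columns=toy_pt.index)

# Most similar book to 'X', excluding 'X' itself
nearest_to_x = toy_sim['X'].drop('X').idxmax()
print(nearest_to_x)
```

`Y` comes out closest to `X` because the same two users rated both books highly, while `Z` shares only one low rating with `X`.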

#### Identify users with more than 200 ratings

In [10]:
x = ratings_w_name.groupby('User-ID').count()['Book-Rating']>200
users_w_ratings = x[x].index
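
The `x[x].index` idiom above may look cryptic: `x` is a boolean Series indexed by user, and indexing it by itself keeps only the `True` entries. A small sketch on hypothetical data, with a threshold of 2 instead of 200:

```python
import pandas as pd

# Hypothetical ratings: user 1 has three ratings, user 2 one, user 3 two
toy_user_ratings = pd.DataFrame({
    'User-ID':     [1, 1, 1, 2, 3, 3],
    'Book-Rating': [5, 7, 9, 8, 6, 10],
})

# Boolean Series indexed by User-ID: True where the user has more than 2 ratings
toy_mask = toy_user_ratings.groupby('User-ID')['Book-Rating'].count() > 2

# Indexing a boolean Series by itself keeps only the True rows
toy_active = toy_mask[toy_mask].index
print(list(toy_active))
```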

#### Filter ratings dataset to include only users with more than 200 ratings

In [None]:
filtered_rating = ratings_w_name[ratings_w_name['User-ID'].isin(users_w_ratings)]

#### Identify books with at least 50 ratings

In [None]:
y = filtered_rating.groupby('Book-Title').count()['Book-Rating']>=50
famous_books = y[y].index

#### Filter the final ratings dataset to include only famous books

In [None]:
final_ratings = filtered_rating[filtered_rating['Book-Title'].isin(famous_books)]

#### Create a pivot table with Book-Title as index and User-ID as columns, filling missing values with 0

In [None]:
final_pt = final_ratings.pivot_table(index='Book-Title', columns='User-ID', values='Book-Rating')
final_pt.fillna(0,inplace=True)
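
Two properties of this step are worth seeing on a small case. `pivot_table` averages duplicate (title, user) pairs by default, which matters when one title covers several ISBNs. Filling with 0 also makes "not rated" indistinguishable from a rating of 0. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical rows: user 1 rated two editions that share one title
toy_editions = pd.DataFrame({
    'User-ID':     [1, 1, 2],
    'Book-Title':  ['Dune', 'Dune', 'Emma'],
    'Book-Rating': [10, 6, 7],
})

# pivot_table aggregates duplicate (title, user) pairs with the mean by default
toy_matrix = toy_editions.pivot_table(
    index='Book-Title', columns='User-ID', values='Book-Rating'
).fillna(0)
print(toy_matrix)
```

User 1's two ratings of `Dune` collapse to their mean of 8, and the cells for books a user never rated hold 0.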

#### Calculate cosine similarity scores between books

In [11]:
similarity_scores = cosine_similarity(final_pt)

#### Define a function to recommend similar books

In [12]:
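def recommend_book(book_name):
    """Print the five books most similar to book_name."""
    # Row position of the book in the pivot table (raises IndexError if the title is absent)
    index = np.where(final_pt.index == book_name)[0][0]
    # Rank all books by similarity; the top entry is normally the book itself, so skip it
    similar_items = sorted(enumerate(similarity_scores[index]), key=lambda x: x[1], reverse=True)[1:6]

    for i, _score in similar_items:
        print(final_pt.index[i])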

#### Call the recommend_book function with a specific book title

In [13]:
recommend_book('Message in a Bottle')

Nights in Rodanthe
The Mulberry Tree
A Walk to Remember
River's End
Nightmares &amp; Dreamscapes
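
For use in an app, a variant that returns a list and tolerates unknown titles may be more convenient than one that prints. The sketch below is one possible shape. The name `recommend_titles`, its parameters, and the three-book data are all hypothetical and not part of the notebook above.

```python
import numpy as np
import pandas as pd

def recommend_titles(book_name, pivot, scores, top_n=5):
    """Return up to top_n titles most similar to book_name, or [] if unknown."""
    if book_name not in pivot.index:
        return []
    idx = pivot.index.get_loc(book_name)
    # Sort positions by descending similarity, then drop the query book itself
    order = np.argsort(scores[idx])[::-1]
    order = [i for i in order if i != idx][:top_n]
    return [pivot.index[i] for i in order]

# Hypothetical 3-book pivot table and similarity matrix
toy_index = pd.Index(['X', 'Y', 'Z'], name='Book-Title')
toy_pivot = pd.DataFrame([[9, 10, 2], [8, 9, 0], [0, 0, 9]], index=toy_index)
toy_scores = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.0],
    [0.1, 0.0, 1.0],
])
print(recommend_titles('X', toy_pivot, toy_scores, top_n=2))
print(recommend_titles('Unknown', toy_pivot, toy_scores))
```

Excluding the query book by position, rather than slicing off the first result, stays correct even when another book ties with it at a similarity of 1.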


#### Save important data structures using pickle to be used in the app

In [14]:
# Write each object inside a context manager so its file is closed once written
artifacts = {'popular.pkl': popular_df, 'final_pt.pkl': final_pt,
             'books.pkl': books, 'similarity_scores.pkl': similarity_scores}
for filename, obj in artifacts.items():
    with open(filename, 'wb') as f:
        pickle.dump(obj, f)
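
On the app side, the saved objects would be read back with `pickle.load`. The round trip below is a sketch on a hypothetical one-row DataFrame, written to a temporary directory so it leaves no files behind.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-in for popular_df
toy_popular_df = pd.DataFrame({'Book-Title': ['A'], 'num_ratings': [300]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'popular.pkl')

    # Write: the with-block closes the file even if pickling fails
    with open(path, 'wb') as f:
        pickle.dump(toy_popular_df, f)

    # Read back, as an app might do at startup
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored.equals(toy_popular_df))
```

Unpickling can execute arbitrary code, so only load pickle files from a source you trust.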