# Book Recommender System

This project embarks on the fascinating journey of building a Book Recommender System using data sourced from Kaggle. 
With the goal of providing users with book recommendations, it leverages basic unsupervised learning algorithms to propose the user with book based on authors, ratings, and categories. 


## Key Features:

<b>Data Source -</b>  Kaggle:

This project uses the data available on Kaggle for the extensive analysis and allows the users to identify the next read based on the latest book the user read.

### Unsupervised Learning Algorithms:

The core of our recommender system lies in the application of basic unsupervised learning algorithms. This project uses the data and provide the personalized book suggestions without the need for labeled training data.

### Author-based Recommendations:

The recommender system intelligently analyzes authorship patterns to suggest books that align with your reading preferences.

### Rating-driven Suggestions:

Your past ratings play a crucial role in shaping the recommendations. The system examines your rating history to understand your preferences, ensuring that each suggested book resonates with your taste.


### Category-specific Suggestions:

Dive into genres that user is interested in. Whether the user is fan of mystery, romance, or science fiction, our recommender system tailors suggestions based on the categories that pique your interest.


## Import Libraries

In [17]:
import pandas as pd
import re


## Load Data

In [18]:
# https://www.kaggle.com/datasets/ruchi798/bookcrossing-dataset/data
books1 = pd.read_csv('./data/Preprocessed1.csv')
books2 = pd.read_csv('./data/Preprocessed2.csv')
books3 = pd.read_csv('./data/Preprocessed3.csv')
books4 = pd.read_csv('./data/Preprocessed4.csv')
books5 = pd.read_csv('./data/Preprocessed5.csv')
books6 = pd.read_csv('./data/Preprocessed6.csv')
print(f'1st Set of Books  size: {books1.shape}')
print(f'2nd Set of Books  size: {books2.shape}')
print(f'3rd Set of Books  size: {books3.shape}')
print(f'4th Set of Books  size: {books4.shape}')
print(f'5th Set of Books  size: {books5.shape}')
print(f'6th Set of Books  size: {books6.shape}')

books = pd.concat([books1, books2, books3, books4, books5, books6], axis=0)
print(f'Shape of Books: {books.shape}')
books.head()

1st Set of Books  size: (185693, 19)
2nd Set of Books  size: (186463, 19)
3rd Set of Books  size: (189340, 19)
4th Set of Books  size: (200477, 19)
5th Set of Books  size: (215255, 19)
6th Set of Books  size: (53947, 19)
Shape of Books: (1031175, 19)


Unnamed: 0.1,Unnamed: 0,user_id,location,age,isbn,rating,book_title,book_author,year_of_publication,publisher,img_s,img_m,img_l,Summary,Language,Category,city,state,country
0,0,2,"stockton, california, usa",18.0,195153448,0,Classical Mythology,Mark P. O. Morford,2002.0,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,Provides an introduction to classical myths pl...,en,['Social Science'],stockton,california,usa
1,1,8,"timmins, ontario, canada",34.7439,2005018,5,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],timmins,ontario,canada
2,2,11400,"ottawa, ontario, canada",49.0,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],ottawa,ontario,canada
3,3,11676,"n/a, n/a, n/a",34.7439,2005018,8,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],,,
4,4,41385,"sudbury, ontario, canada",34.7439,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],sudbury,ontario,canada


## How it Works:



### Data Preprocessing:

To start with, preprocess the data. This includes handling missing values, cleaning data, and organizing it into a structured format.

### Unsupervised Learning Model:

Leveraging basic unsupervised learning algorithms, it unveil hidden patterns in the dataset. It autonomously identify associations between books, authors, and categories.

### User-centric Recommendations:

The recommender system takes user input, such as favorite authors, preferred categories, and historical ratings, to generate personalized book recommendations.

### Utilize the Scikit for recommendation:

The recommendor system make use of scikit cosine-similarity algorithm to recommend the book to the user based on his last book issued.


In [19]:
dataFrame = books.copy()
dataFrame.dropna(inplace=True)

dataFrame.drop(columns = ['Unnamed: 0','location','isbn',
                   'img_s','img_m','city','age',
                   'state','Language','country',
                   'year_of_publication'],axis=1,inplace = True)


categories = dataFrame['Category'].unique()



# Clean up categories
dataFrame.drop(index=dataFrame[dataFrame['Category'] == '9'].index, inplace=True)

dataFrame['Category'] = dataFrame['Category'].apply(lambda x: re.sub('[\W_]+',' ',x).strip())

categories = dataFrame['Category'].unique()
print(f"Unique Categories:{categories}") 

# Remove rows that constains 0 as rating
ratings = dataFrame['rating'].unique()

dataFrame.drop(index=dataFrame[dataFrame['rating'] == 0].index, inplace=True)

ratings = dataFrame['rating'].unique()
print(f"Unique Ratings:{ratings}")

print(dataFrame.head(10))


Unique Categories:['Actresses' '1940 1949' 'Fiction' ... 'Algonquian Indians' 'Menus'
 'Merchants']
Unique Ratings:[ 4  1  9  8  5 10  7  6  3  2]
       user_id  rating                         book_title  \
2619    209569       4                        Wild Animus   
2763    225727       1                        Wild Animus   
4536    228504       9              To Kill a Mockingbird   
7443    270801       9                         Seabiscuit   
7866     90793       8             The Catcher in the Rye   
12529    56554       9           Diary of a Mad Mom-To-Be   
12915   228321       5                         My Antonia   
19512    41585      10  Sisterhood of the Traveling Pants   
28535     3079       8                 The Hours: A Novel   
29266    63956       8                The Mists of Avalon   

                 book_author                          publisher  \
2619            Rich Shapero                            Too Far   
2763            Rich Shapero                   

# Item Based Collabrative Rating

## 1. Rating-driven Suggestions

In [20]:
book_basic_data = dataFrame.copy()
book_basic_data.drop(columns = ['book_author','publisher','img_l',
                   'Summary'],axis=1,inplace = True)

print(f'Number of Books: {len(book_basic_data)}')

def recommend_by_ratings(book_title):
    if book_title in dataFrame['book_title'].values:
    
        num_ratings = pd.DataFrame(book_basic_data['book_title'].value_counts())
        
        less_rating_books = num_ratings[num_ratings['book_title'] <= 150].index
        common_books = dataFrame[~dataFrame['book_title'].isin(less_rating_books)]
        #print(f'{book_title}: {rating_counts}')
        
        user_book_df = common_books.pivot_table(index=['user_id'],
                                                    columns=['book_title'],
                                                    values='rating')
        #print(f'{user_book_df}')
        book = user_book_df[book_title]
        
        print(f'Book: {book}')
    
        recom_books = pd.DataFrame(user_book_df.corrwith(book). \
                                      sort_values(ascending=False)).reset_index(drop=False)
        
        print(f'Recommended Books: {recom_books.head(10)}')


## Assuming the user read the book 'Timeline'
print(f'Recommending for Timeline') 
recommend_by_ratings('Timeline')

Number of Books: 4422
Recommending for Timeline


## 2. Category-specific Suggestions

In [21]:
book_with_categories = dataFrame.copy()
book_with_categories.drop(columns = ['rating','publisher','img_l',
                   'Summary','book_author'],axis=1,inplace = True)

print(f'Number of Books: {len(book_with_categories)}')
#print(f'Books: {book_with_categories.head(10)}')

def recommend_by_category(book_title):
    if book_title in dataFrame['book_title'].values:
    
        count_by_categories = pd.DataFrame(book_with_categories['Category'].value_counts())
        #print(f'count_by_categories : {count_by_categories}')
        
        books_with_few_categories = count_by_categories[count_by_categories['Category'] <= 1].index
        #print(f'books_with_few_categories: {books_with_few_categories}')
        
        books_with_more_categories = book_with_categories[~book_with_categories['Category'].isin(books_with_few_categories)]
        #print(f'books_with_more_categories: {books_with_more_categories}')
        
        books_with_more_categories['belongs_to'] = True
        
        book_category_df = books_with_more_categories.pivot_table(index=['Category'],
                                                    columns=['book_title'],
                                                    values='belongs_to', aggfunc='any', fill_value=False)
        print(f'book_category_df : {book_category_df}')
        book = book_category_df[book_title]
        
        print(f'Reacommend based on book: {book}') 
    
        recom_books = pd.DataFrame(book_category_df.corrwith(book). \
                                      sort_values(ascending=False)).reset_index(drop=False)
        
        print(f'Recommended Books: {recom_books}')


## Assuming the user read the book 'Dark Justice'
print(f'Recommending for Dark Justice') 
recommend_by_category('Dark Justice')

Number of Books: 4422
Recommending for Dark Justice


## 3. Author-based Recommendations

In [22]:
book_author_data = dataFrame.copy()
book_author_data.drop(columns = ['rating','publisher','img_l',
                   'Summary'],axis=1,inplace = True)

print(f'Number of Books: {len(book_author_data)}')

def recommend_by_author(book_title):
    if book_title in dataFrame['book_title'].values:
    
        count_by_authors = pd.DataFrame(book_author_data['book_author'].value_counts())
        #print(f'count_by_authors : {count_by_authors}')
        authors_with_less_book = count_by_authors[count_by_authors['book_author'] <= 200].index
        #print(f'authors_with_less_book: {authors_with_less_book}')
        
        authors_with_more_books = book_author_data[~book_author_data['book_author'].isin(authors_with_less_book)]
        #print(f'authors_with_more_books: {authors_with_more_books}')
        
        authors_with_more_books['wrote_book'] = True
        
        author_book_df = authors_with_more_books.pivot_table(index=['book_author'],
                                                    columns=['book_title'],
                                                    values='wrote_book', aggfunc='any', fill_value=False)
        #print(f'author_book_df : {author_book_df}')
        book = author_book_df[book_title]
        
        print(f'{book}')
    
        recom_books = pd.DataFrame(author_book_df.corrwith(book). \
                                      sort_values(ascending=False)).reset_index(drop=False)
        
        print(f'Recommended Books: {recom_books}')


## Assuming the user read the book 'Timeline'
recommend_by_author('Timeline')

Number of Books: 4422


## 4. Using Scikit Learn

In [None]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


dataFrame = books.copy()
# Create a user-item matrix
user_item_matrix = dataFrame.pivot(index='user_id', columns='isbn', values='rating').fillna(0)

# Calculate item-item similarity using cosine similarity
item_similarity = cosine_similarity(user_item_matrix.T)

# Convert the similarity matrix into a DataFrame
item_similarity_df = pd.DataFrame(item_similarity, index=user_item_matrix.columns, columns=user_item_matrix.columns)


def get_book_recommendations(book_id, n=5):
    similar_books = item_similarity_df[book_id].sort_values(ascending=False).index[1:n+1]
    return similar_books



  user_item_matrix = dataFrame.pivot(index='user_id', columns='isbn', values='rating').fillna(0)


In [None]:
# Get recommendations for book with ID 0002005018 i.e. Clara Callan
book_id_to_recommend = '0002005018' 
recommendations = get_book_recommendations(book_id_to_recommend)

print(f"Top 5 Recommendations for Book {book_id_to_recommend}:")
print(recommendations)

## Conclusion:
This Book Recommender System uses one attribute at time to recommend the user with the next one. This recommender can be extended to next level where the system utilises all the attributes together to provide the best suggestion. The book recommender can also utilize the summary column to extract the keywords of the book and provide the next recommendation to user.