# Book Recommender System

This project embarks on the fascinating journey of building a Book Recommender System using data sourced from Kaggle. 
With the goal of providing users with book recommendations, it leverages basic unsupervised learning algorithms to propose the user with book based on authors, ratings, and categories. 


## Key Features:

<b>Data Source -</b>  Kaggle:

This project uses the data available on Kaggle for the extensive analysis and allows the users to identify the next read based on the latest book the user read.

### Unsupervised Learning Algorithms:

The core of our recommender system lies in the application of basic unsupervised learning algorithms. This project uses the data and provide the personalized book suggestions without the need for labeled training data.

### Author-based Recommendations:

The recommender system intelligently analyzes authorship patterns to suggest books that align with your reading preferences.

### Rating-driven Suggestions:

Your past ratings play a crucial role in shaping the recommendations. The system examines your rating history to understand your preferences, ensuring that each suggested book resonates with your taste.


### Category-specific Suggestions:

Dive into genres that user is interested in. Whether the user is fan of mystery, romance, or science fiction, our recommender system tailors suggestions based on the categories that pique your interest.


## Import Libraries

In [11]:
import pandas as pd
import re


## Load Data

In [12]:
books = pd.read_csv('./data/booksWithCategory.csv')
books.head(10)

Unnamed: 0.1,Unnamed: 0,user_id,location,age,isbn,rating,book_title,book_author,year_of_publication,publisher,img_s,img_m,img_l,Summary,Language,Category,city,state,country
0,0,2,"stockton, california, usa",18.0,195153448,0,Classical Mythology,Mark P. O. Morford,2002.0,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,Provides an introduction to classical myths pl...,en,['Social Science'],stockton,california,usa
1,1,8,"timmins, ontario, canada",34.7439,2005018,5,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],timmins,ontario,canada
2,2,11400,"ottawa, ontario, canada",49.0,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],ottawa,ontario,canada
3,3,11676,"n/a, n/a, n/a",34.7439,2005018,8,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],,,
4,4,41385,"sudbury, ontario, canada",34.7439,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],sudbury,ontario,canada
5,5,67544,"toronto, ontario, canada",30.0,2005018,8,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],toronto,ontario,canada
6,6,85526,"victoria, british columbia, canada",36.0,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],victoria,british columbia,canada
7,7,96054,"ottawa, ontario, canada",29.0,2005018,0,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],ottawa,ontario,canada
8,8,116866,"ottawa, ,",34.7439,2005018,9,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],ottawa,",",
9,9,123629,"kingston, ontario, canada",34.7439,2005018,9,Clara Callan,Richard Bruce Wright,2001.0,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,"In a small town in Canada, Clara Callan reluct...",en,['Actresses'],kingston,ontario,canada


## How it Works:



### Data Preprocessing:

We meticulously preprocess the dataset, ensuring its readiness for analysis. This includes handling missing values, cleaning data, and organizing it into a structured format.

### Unsupervised Learning Model:

Leveraging basic unsupervised learning algorithms, it unveil hidden patterns in the dataset. It autonomously identify associations between books, authors, and categories.

### User-centric Recommendations:

The recommender system takes user input, such as favorite authors, preferred categories, and historical ratings, to generate personalized book recommendations.



In [13]:
dataFrame = books.copy()
dataFrame.dropna(inplace=True)

dataFrame.drop(columns = ['Unnamed: 0','location','isbn',
                   'img_s','img_m','city','age',
                   'state','Language','country',
                   'year_of_publication'],axis=1,inplace = True)


categories = dataFrame['Category'].unique()



# Clean up categories
dataFrame.drop(index=dataFrame[dataFrame['Category'] == '9'].index, inplace=True)

dataFrame['Category'] = dataFrame['Category'].apply(lambda x: re.sub('[\W_]+',' ',x).strip())

categories = dataFrame['Category'].unique()
print(f"Unique Categories:{categories}") 

# Remove rows that constains 0 as rating
ratings = dataFrame['rating'].unique()

dataFrame.drop(index=dataFrame[dataFrame['rating'] == 0].index, inplace=True)

ratings = dataFrame['rating'].unique()
print(f"Unique Ratings:{ratings}")

print(dataFrame.head(10))


Unique Categories:['Social Science' 'Actresses' '1940 1949' ... 'Microsoft Windows NT'
 'Merchants' 'Alternative histories']
Unique Ratings:[ 5  8  9  7  6 10  4  3  2  1]
    user_id  rating                                         book_title  \
1         8       5                                       Clara Callan   
5     67544       8                                       Clara Callan   
9    123629       9                                       Clara Callan   
11   200273       8                                       Clara Callan   
12   210926       9                                       Clara Callan   
13   219008       7                                       Clara Callan   
14   263325       6                                       Clara Callan   
16     2954       8                               Decision in Normandy   
17   152827       7                               Decision in Normandy   
19    35704       6  Flu: The Story of the Great Influenza Pandemic...   

             

# Item Based Collabrative Rating

## 1. Rating-driven Suggestions

In [14]:
book_basic_data = dataFrame.copy()
book_basic_data.drop(columns = ['book_author','publisher','img_l',
                   'Summary'],axis=1,inplace = True)

print(f'Number of Books: {len(book_basic_data)}')

def recommend_by_ratings(book_title):
    if book_title in dataFrame['book_title'].values:
    
        num_ratings = pd.DataFrame(book_basic_data['book_title'].value_counts())
        
        less_rating_books = num_ratings[num_ratings['book_title'] <= 150].index
        common_books = dataFrame[~dataFrame['book_title'].isin(less_rating_books)]
        #print(f'{book_title}: {rating_counts}')
        
        user_book_df = common_books.pivot_table(index=['user_id'],
                                                    columns=['book_title'],
                                                    values='rating')
        print(f'{user_book_df}')
        book = user_book_df[book_title]
        
        print(f'{book}')
    
        recom_books = pd.DataFrame(user_book_df.corrwith(book). \
                                      sort_values(ascending=False)).reset_index(drop=False)
        
        print(f'Recommended Books: {recom_books.head(10)}')


## Assuming the user read the book 'Timeline'
recommend_by_ratings('Timeline')

Number of Books: 217314
book_title  1st to Die: A Novel  A Painted House  A Prayer for Owen Meany  \
user_id                                                                     
26                          NaN              NaN                      NaN   
51                          NaN              NaN                      NaN   
165                         NaN              NaN                      NaN   
183                         NaN              NaN                      NaN   
242                         NaN              NaN                      NaN   
...                         ...              ...                      ...   
278582                      NaN              NaN                      NaN   
278586                      NaN              NaN                      NaN   
278633                      NaN              NaN                      NaN   
278653                      NaN              NaN                      NaN   
278843                      NaN              NaN    

  c = cov(x, y, rowvar, dtype=dtype)
  c *= np.true_divide(1, fact)


## 2. Category-specific Suggestions

In [17]:
book_with_categories = dataFrame.copy()
book_with_categories.drop(columns = ['rating','publisher','img_l',
                   'Summary','book_author'],axis=1,inplace = True)

print(f'Number of Books: {len(book_with_categories)}')
print(f'Books: {book_with_categories.head(10)}')

def recommend_by_category(book_title):
    if book_title in dataFrame['book_title'].values:
    
        count_by_categories = pd.DataFrame(book_with_categories['Category'].value_counts())
        #print(f'count_by_categories : {count_by_categories}')
        
        books_with_few_categories = count_by_categories[count_by_categories['Category'] <= 1].index
        #print(f'books_with_few_categories: {books_with_few_categories}')
        
        books_with_more_categories = book_with_categories[~book_with_categories['Category'].isin(books_with_few_categories)]
        #print(f'books_with_more_categories: {books_with_more_categories}')
        
        books_with_more_categories['belongs_to'] = True
        
        book_category_df = books_with_more_categories.pivot_table(index=['Category'],
                                                    columns=['book_title'],
                                                    values='belongs_to', aggfunc='any', fill_value=False)
        print(f'book_category_df : {book_category_df}')
        book = book_category_df[book_title]
        
        print(f'BOOK: {book}') 
    
        recom_books = pd.DataFrame(book_category_df.corrwith(book). \
                                      sort_values(ascending=False)).reset_index(drop=False)
        
        print(f'Recommended Books: {recom_books}')


## Assuming the user read the book 'Dark Justice'
recommend_by_category('Dark Justice')

Number of Books: 217314
Books:     user_id                                         book_title   Category
1         8                                       Clara Callan  Actresses
5     67544                                       Clara Callan  Actresses
9    123629                                       Clara Callan  Actresses
11   200273                                       Clara Callan  Actresses
12   210926                                       Clara Callan  Actresses
13   219008                                       Clara Callan  Actresses
14   263325                                       Clara Callan  Actresses
16     2954                               Decision in Normandy  1940 1949
17   152827                               Decision in Normandy  1940 1949
19    35704  Flu: The Story of the Great Influenza Pandemic...    Medical


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  books_with_more_categories['belongs_to'] = True


book_category_df : book_title                      A Light in the Storm: The Civil War Diary of Amelia Martin, Fenwick Island, Delaware, 1861 (Dear America)  \
Category                                                                                                                                    
0                                                                          False                                                            
1940 1949                                                                  False                                                            
364614153                                                                  False                                                            
87th Precinct Imaginary place                                              False                                                            
A Grow and Learn Library                                                   False                                                       

Recommended Books:                                               book_title         0
0                                          Donovan's Bed  1.000000
1                               The Snow Garden: A Novel  1.000000
2                                                 Friday  1.000000
3                                              Rain Line  1.000000
4      Very Special Delivery (Maitland Maternity: Pro...  1.000000
...                                                  ...       ...
71253                                               Love -0.000828
71254                                     The Art of War -0.000957
71255                    All Things Bright and Beautiful -0.000957
71256                                            Matilda -0.000957
71257                                 The Scarlet Letter -0.000957

[71258 rows x 2 columns]


## 3. Author-based Recommendations

In [18]:
book_author_data = dataFrame.copy()
book_author_data.drop(columns = ['rating','publisher','img_l',
                   'Summary'],axis=1,inplace = True)

print(f'Number of Books: {len(book_author_data)}')

def recommend_by_author(book_title):
    if book_title in dataFrame['book_title'].values:
    
        count_by_authors = pd.DataFrame(book_author_data['book_author'].value_counts())
        print(f'count_by_authors : {count_by_authors}')
        authors_with_less_book = count_by_authors[count_by_authors['book_author'] <= 200].index
        print(f'authors_with_less_book: {authors_with_less_book}')
        
        authors_with_more_books = book_author_data[~book_author_data['book_author'].isin(authors_with_less_book)]
        print(f'authors_with_more_books: {authors_with_more_books}')
        
        authors_with_more_books['wrote_book'] = True
        
        author_book_df = authors_with_more_books.pivot_table(index=['book_author'],
                                                    columns=['book_title'],
                                                    values='wrote_book', aggfunc='any', fill_value=False)
        print(f'author_book_df : {author_book_df}')
        book = author_book_df[book_title]
        
        print(f'{book}')
    
        recom_books = pd.DataFrame(author_book_df.corrwith(book). \
                                      sort_values(ascending=False)).reset_index(drop=False)
        
        print(f'Recommended Books: {recom_books}')


## Assuming the user read the book 'Timeline'
recommend_by_author('Timeline')

Number of Books: 217314
count_by_authors :                     book_author
Stephen King               2639
Nora Roberts               1874
John Grisham               1579
J. K. Rowling              1232
Mary Higgins Clark          963
...                         ...
Donna Masini                  1
George Gmelch                 1
Larry L. King                 1
Mark Friedman                 1
Jeremy Lloyd                  1

[37464 rows x 1 columns]
authors_with_less_book: Index(['James A. Michener', 'Linda Howard', 'Nelson DeMille', 'J.A. Jance',
       'Jeffery Deaver', 'David Eddings', 'MITCH ALBOM', 'Terry Goodkind',
       'Amy Tan', 'Michael Connelly',
       ...
       'Alex Berson', 'Lindgren', 'Johanna Angermeyer', 'Otl Aicher',
       'Erich Maria Remarque', 'Donna Masini', 'George Gmelch',
       'Larry L. King', 'Mark Friedman', 'Jeremy Lloyd'],
      dtype='object', length=37335)
authors_with_more_books:          user_id                                         book_title  \

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  authors_with_more_books['wrote_book'] = True


author_book_df : book_title               Little Comic Shop of Horrors (Give Yourself Goosebumps, Book 17)  \
book_author                                                                                 
ANNE RICE                                                           False                   
Agatha Christie                                                     False                   
Alexander McCall Smith                                              False                   
Alice Hoffman                                                       False                   
Alice Walker                                                        False                   
...                                                                   ...                   
Toni Morrison                                                       False                   
Tony Hillerman                                                      False                   
Tracy Chevalier                                      

Recommended Books:                 book_title         0
0                   Sphere  1.000000
1               Rising Sun  1.000000
2     The Andromeda Strain  1.000000
3                 Timeline  1.000000
4            Five Patients  0.704339
...                    ...       ...
3790              Stardust -0.015748
3791          Mirror Image -0.015748
3792              Paradise -0.015748
3793           The Wedding -0.019364
3794              The Gift -0.019364

[3795 rows x 2 columns]


## Conclusion:
This Book Recommender System uses one attribute at time to recommend the user with the next one. This recommender can be extended to next level where the system utilises all the attributes together to provide the best suggestion. The book recommender can also utilize the summary column to extract the keywords of the book and provide the next recommendation to user.