# ZAF202310_Book_Recommendation_Service

## Overview
A book recommendation system uses data analysis and machine learning algorithms to provide personalized book recommendations to users. Online bookstores, libraries, and other organizations that offer book-related services can use such systems.

The goal of a book recommendation system is to provide users with recommendations that are tailored to their interests and preferences. This can be accomplished by analyzing user data such as past purchases, browsing history, and ratings of books. The system can also take into account other factors such as genre, author, and publication date to provide more relevant recommendations.

## Methodology
This project follows the methodology below:

- Data collection: Collect data on book ratings from users. This data can be obtained from a variety of sources such as online bookstores, library systems, or social media platforms.

- Data preprocessing: Clean and preprocess the data to ensure that it is in a usable format. This may involve removing duplicates, handling missing data, and transforming the data into a matrix or other format that can be used by the recommendation algorithm.

- User similarity calculation: Calculate the similarity between users based on their book ratings. There are several methods for calculating similarity, such as Pearson correlation or cosine similarity.

- Neighborhood selection: Identify a set of similar users (known as the "neighborhood") for each user in the dataset. This can be done by setting a threshold similarity score or selecting a fixed number of neighbors.

- Prediction calculation: Predict the rating that a user would give to a particular book by taking a weighted average of the ratings given to that book by the user's neighbors. The weights can be based on the similarity between the user and their neighbors.

- Recommendation generation: Generate a list of recommended books for each user based on their predicted ratings. The number of recommended books can be customized based on user preferences.

- Evaluation: Evaluate the performance of the recommendation system using metrics such as precision, recall, and mean absolute error. These metrics can be used to fine-tune the system and improve its performance.
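
The similarity, neighborhood, and prediction steps above can be sketched on a toy matrix. This is a minimal illustration under stated assumptions, not the notebook's implementation: the rating values, the convention that 0 means "not rated", and the helper names `cosine_user_similarity` and `predict_rating` are all hypothetical. Note that the cells further down compute similarities between books rather than between users.

```python
import numpy as np

# Toy user x book rating matrix; 0 means "not rated" (an assumption for this sketch).
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 3.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 2.0, 4.0, 5.0],
])

def cosine_user_similarity(matrix):
    """Cosine similarity between every pair of rows (users)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no ratings
    unit = matrix / norms
    return unit @ unit.T

def predict_rating(matrix, similarity, user, book, k=2):
    """Similarity-weighted average of the k nearest raters' ratings for `book`."""
    rated = np.flatnonzero(matrix[:, book] > 0)  # users who rated the book
    rated = rated[rated != user]                 # exclude the target user
    if rated.size == 0:
        return 0.0
    # Neighborhood selection: keep the k raters most similar to the target user.
    order = np.argsort(similarity[user, rated])[::-1][:k]
    neighbors = rated[order]
    weights = similarity[user, neighbors]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ matrix[neighbors, book] / weights.sum())

similarity = cosine_user_similarity(ratings)
predicted = predict_rating(ratings, similarity, user=0, book=2, k=2)
print(f"Predicted rating of user 0 for book 2: {predicted:.2f}")
```

A fixed `k` is used here for neighborhood selection; a similarity threshold, as mentioned in the list, would replace the `argsort` slice with a boolean mask.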

## Business Segments
1. Retail
2. E-Commerce


## Data
1. Book Recommendation Dataset - [Link](https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset)

## Papers
- Collaborative Filtering Recommender Systems - [Link](https://www.researchgate.net/publication/200121027_Collaborative_Filtering_Recommender_Systems)

## 1. Load Dataset

In [1]:
# Importing Libraries
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

In [2]:
# Reading the csv files
books_df = pd.read_csv('data/Books.csv')
ratings_df = pd.read_csv('data/Ratings.csv')
user_df = pd.read_csv('data/Users.csv')



## 2. Data Preprocessing

In [3]:
# Merging data into one dataframe
book_ratings_df = ratings_df.merge(books_df, on='ISBN')
user_book_ratings_df = book_ratings_df.merge(user_df, on='User-ID')

In [4]:
# First look at data
user_book_ratings_df.head()

| | User-ID | ISBN | Book-Rating | Book-Title | Book-Author | Year-Of-Publication | Publisher | Image-URL-S | Image-URL-M | Image-URL-L | Location | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 276725 | 034545104X | 0 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books | http://images.amazon.com/images/P/034545104X.0... | http://images.amazon.com/images/P/034545104X.0... | http://images.amazon.com/images/P/034545104X.0... | tyler, texas, usa | |
| 1 | 2313 | 034545104X | 5 | Flesh Tones: A Novel | M. J. Rose | 2002 | Ballantine Books | http://images.amazon.com/images/P/034545104X.0... | http://images.amazon.com/images/P/034545104X.0... | http://images.amazon.com/images/P/034545104X.0... | cincinnati, ohio, usa | 23.0 |
| 2 | 2313 | 0812533550 | 9 | Ender's Game (Ender Wiggins Saga (Paperback)) | Orson Scott Card | 1986 | Tor Books | http://images.amazon.com/images/P/0812533550.0... | http://images.amazon.com/images/P/0812533550.0... | http://images.amazon.com/images/P/0812533550.0... | cincinnati, ohio, usa | 23.0 |
| 3 | 2313 | 0679745580 | 8 | In Cold Blood (Vintage International) | TRUMAN CAPOTE | 1994 | Vintage | http://images.amazon.com/images/P/0679745580.0... | http://images.amazon.com/images/P/0679745580.0... | http://images.amazon.com/images/P/0679745580.0... | cincinnati, ohio, usa | 23.0 |
| 4 | 2313 | 0060173289 | 9 | Divine Secrets of the Ya-Ya Sisterhood : A Novel | Rebecca Wells | 1996 | HarperCollins | http://images.amazon.com/images/P/0060173289.0... | http://images.amazon.com/images/P/0060173289.0... | http://images.amazon.com/images/P/0060173289.0... | cincinnati, ohio, usa | 23.0 |


In [5]:
# Creating a dictionary of unique Book-Titles and creating a new column
book_2_id_dict = {}
for idx, book in enumerate(user_book_ratings_df['Book-Title'].unique()):
    book_2_id_dict[book] = idx
user_book_ratings_df['Book-ID'] = user_book_ratings_df['Book-Title'].map(book_2_id_dict)

In [6]:
# Creating a dictionary of unique User-IDs and creating a new column
user_dict = {}
for idx, user in enumerate(user_book_ratings_df['User-ID'].unique()):
    user_dict[user] = idx
user_book_ratings_df['New-User-ID'] = user_book_ratings_df['User-ID'].map(user_dict)
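
The two dictionary-building loops above assign each unique value an integer in order of first appearance. As a hedged alternative, `pd.factorize` produces the same kind of encoding in one call. The frame `toy_df` below is a hypothetical stand-in for `user_book_ratings_df`, and the third user ID is made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame standing in for user_book_ratings_df.
toy_df = pd.DataFrame({
    "User-ID": [276725, 2313, 2313, 8680],
    "Book-Title": [
        "Flesh Tones: A Novel",
        "Flesh Tones: A Novel",
        "Ender's Game",
        "Flesh Tones: A Novel",
    ],
})

# factorize returns (codes, uniques); codes follow the order of first
# appearance, the same order that enumerate(series.unique()) yields.
toy_df["Book-ID"], book_titles = pd.factorize(toy_df["Book-Title"])
toy_df["New-User-ID"], user_ids = pd.factorize(toy_df["User-ID"])

print(toy_df)
```

The `uniques` return value (`book_titles`, `user_ids`) doubles as the reverse lookup from integer code back to the original label.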

In [7]:
# Creating Final DF to use
final_df = user_book_ratings_df[['New-User-ID', 'Book-Title', 'Book-Rating']]

In [8]:
# Final DF look
final_df = final_df.rename(columns={'New-User-ID': 'User-ID'})
final_df.head()

| | User-ID | Book-Title | Book-Rating |
|---|---|---|---|
| 0 | 0 | Flesh Tones: A Novel | 0 |
| 1 | 1 | Flesh Tones: A Novel | 5 |
| 2 | 1 | Ender's Game (Ender Wiggins Saga (Paperback)) | 9 |
| 3 | 1 | In Cold Blood (Vintage International) | 8 |
| 4 | 1 | Divine Secrets of the Ya-Ya Sisterhood : A Novel | 9 |


## 3. Data Preparation

In [9]:
# Keeping only books with more than 200 ratings
num_ratings = final_df.groupby('Book-Title').count()['Book-Rating'].to_frame()
num_ratings.columns = ['Num-Ratings']
num_ratings.reset_index(inplace=True)
num_ratings = num_ratings[num_ratings['Num-Ratings'] > 200]

In [10]:
# Merging num_ratings with final_df and dropping duplicate values
merged_df = num_ratings.merge(final_df, on='Book-Title')
merged_df.drop_duplicates(['Book-Title', 'User-ID'], inplace=True)
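
The count, filter, and merge steps above can also be written with `groupby(...).transform("count")`, which broadcasts each title's rating count back onto its rows so no merge is needed. This is a sketch on hypothetical toy data; `toy_final_df` and the threshold `min_ratings = 3` are assumptions chosen so the example stays small (the notebook uses 200). Unlike `merged_df`, the result carries no `Num-Ratings` column.

```python
import pandas as pd

# Hypothetical toy frame standing in for final_df.
toy_final_df = pd.DataFrame({
    "User-ID": [0, 1, 1, 2, 2, 3],
    "Book-Title": ["A", "A", "A", "A", "B", "B"],
    "Book-Rating": [0, 5, 7, 8, 9, 6],
})
min_ratings = 3  # hypothetical threshold; the notebook uses 200

# Number of ratings per title, aligned row by row with the original frame.
counts = toy_final_df.groupby("Book-Title")["Book-Rating"].transform("count")

# Keep rows of sufficiently rated titles, then keep one rating per (title, user).
popular_df = toy_final_df[counts > min_ratings]
popular_df = popular_df.drop_duplicates(["Book-Title", "User-ID"])

print(popular_df)
```

Dropping duplicate (title, user) pairs matters for the next step: `DataFrame.pivot` raises an error when an index/column pair occurs more than once.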

In [11]:
# Creating a pivot table and filling na values with 0
pivot_df = merged_df.pivot(index='Book-Title', columns='User-ID',
               values='Book-Rating').fillna(0)
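
The notebook imports `csr_matrix` but does not use it. Because most entries of the pivot table are zero, one possible use is to store the ratings in compressed sparse row form. The sketch below assumes a small hypothetical `toy_pivot` in place of `pivot_df`.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy pivot: rows are books, columns are user IDs, 0 = no rating.
toy_pivot = pd.DataFrame(
    [[0.0, 9.0, 0.0], [5.0, 0.0, 0.0]],
    index=["Book A", "Book B"],
    columns=[1, 2, 3],
)

# Only the non-zero entries are stored explicitly.
sparse_ratings = csr_matrix(toy_pivot.values)

# Fraction of cells that actually hold a rating.
n_rows, n_cols = sparse_ratings.shape
density = sparse_ratings.nnz / (n_rows * n_cols)
print(f"Stored entries: {sparse_ratings.nnz}, density: {density:.2f}")
```

Both `cosine_similarity` and `NearestNeighbors` in scikit-learn accept sparse input, so a matrix like this could be passed to them directly.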


In [12]:
pivot_df

| Book-Title (rows) / User-ID (columns) | 1 | 2 | 3 | 4 | 5 | 7 | 10 | 11 | 13 | 14 | ... | 86771 | 86783 | 87269 | 87551 | 87731 | 87740 | 88303 | 89458 | 89891 | 91827 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1984 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| 1st to Die: A Novel | 0.0 | 9.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2nd Chance | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| A Bend in the Road | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| A Case of Need | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Wild Animus | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | 2.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Wish You Well | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Without Remorse | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Zen and the Art of Motorcycle Maintenance: An Inquiry into Values | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


## 4. Model Building

In [13]:
from sklearn.preprocessing import MinMaxScaler

In [14]:
# Scaling ratings to the [0, 1] range; MinMaxScaler scales each column (user) independently
scalar = MinMaxScaler()
scaled_df = scalar.fit_transform(pivot_df)

In [15]:
# Creating a cosine similarity matrix
similarity_df = cosine_similarity(scaled_df)
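
To make the similarity step concrete, the sketch below checks `cosine_similarity` against the textbook formula, dot(a, b) / (|a| * |b|), on a hypothetical 3 x 3 matrix whose rows play the role of books. The values in `toy_ratings` are made up for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical scaled ratings: rows are books, columns are users.
toy_ratings = np.array([
    [1.0, 0.0, 0.5],
    [0.8, 0.0, 0.4],  # same direction as row 0, smaller magnitude
    [0.0, 1.0, 0.0],  # shares no raters with row 0
])

# Pairwise similarity between all rows; the result is a 3 x 3 array.
similarity = cosine_similarity(toy_ratings)

# The same quantity for rows 0 and 1, computed by hand.
a, b = toy_ratings[0], toy_ratings[1]
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(similarity.round(3))
print(f"Manual similarity of rows 0 and 1: {manual:.3f}")
```

Cosine similarity depends only on direction, so two books rated by the same users in the same proportions score close to 1 even when one book's ratings are uniformly lower.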

In [16]:
# Initializing Nearest Neighbours model (6 neighbours: the query book itself plus 5 suggestions)
model = NearestNeighbors(n_neighbors=6, algorithm='brute')

In [17]:
# Fitting model
model.fit(similarity_df)

NearestNeighbors(algorithm='brute', n_neighbors=6)
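
The model above is fitted on the similarity matrix with the default metric (Minkowski with `p=2`, i.e. Euclidean), so the distances it returns are Euclidean distances between rows of that matrix. One possible variation, shown here only as a sketch, is to fit `NearestNeighbors` on the rating matrix itself with `metric='cosine'`, in which case `1 - distance` is the cosine similarity. The array `toy_ratings` and the name `cosine_model` are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical scaled ratings: rows are books, columns are users.
toy_ratings = np.array([
    [1.0, 0.0, 0.5, 0.0],
    [0.9, 0.1, 0.4, 0.0],
    [0.0, 1.0, 0.0, 0.8],
    [0.1, 0.9, 0.0, 1.0],
])

# Brute-force search using cosine distance (1 - cosine similarity).
cosine_model = NearestNeighbors(n_neighbors=2, algorithm="brute", metric="cosine")
cosine_model.fit(toy_ratings)

# Query with book 0; the nearest hit is book 0 itself at distance ~0.
distances, indices = cosine_model.kneighbors(toy_ratings[0].reshape(1, -1))
similarities = 1.0 - distances[0]

print(indices[0], similarities.round(3))
```

With this setup the separate `cosine_similarity` step is not needed for the lookup, since the metric is applied inside `kneighbors`.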

In [18]:
sample_title = np.random.choice(pivot_df.index.values)
sample_title_index = pivot_df.index.values.tolist().index(sample_title)
print(f"Sample Title: {sample_title}")
print(f"Sample Title Index: {sample_title_index}")

Sample Title: Cradle and All
Sample Title Index: 54


In [19]:
# Retrieving suggestions and distances
distance, suggestions = model.kneighbors(similarity_df[sample_title_index, :].reshape(1, -1))

In [20]:
# Function to recommend
def recommend(book_title):
    list_of_books = pivot_df.index.values.tolist()
    if book_title in list_of_books:
        title_index = list_of_books.index(book_title)
        # kneighbors returns distances (Euclidean by default) between rows of
        # similarity_df, sorted ascending; the first hit is normally the query
        # book itself, so it is dropped below
        distance, suggestions = model.kneighbors(similarity_df[title_index, :].reshape(1, -1))
        distance, suggestions = distance[0][1:].tolist(), suggestions[0][1:].tolist()
        book_names = [list_of_books[i] for i in suggestions]
        recommendation_df = pd.DataFrame({'Book-Title': book_names,
                                         'Similarity-Score': distance})

        # Values above 1 are replaced with 2.0 - x; values up to 1 are kept unchanged
        recommendation_df['Similarity-Score'] = recommendation_df['Similarity-Score'].apply(lambda x: 2.0 - x if x > 1 else x)
        recommendation_df.sort_values('Similarity-Score', ascending=False, inplace=True)
        print("-"*50)
        print(f"Here are top 5 recommendation for the book title : {book_title}")
        print("-"*50)
        return recommendation_df
    else:
        print("ERROR: Couldn't find the Book Title in the database")
        print("-"*50)
        print("You can try with the below titles:")
        print("-"*50)
        # Suggest five randomly chosen titles that exist in the data
        for _ in range(5):
            print(np.random.choice(list_of_books))
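
As a point of comparison, recommendations can also be read straight off a cosine similarity matrix by sorting one row, without a nearest-neighbour model. The sketch below is an assumption-laden alternative, not the notebook's method: `recommend_from_similarity`, the `titles` list, and the similarity values are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical titles and a symmetric book-to-book cosine similarity matrix.
titles = ["Book A", "Book B", "Book C", "Book D"]
similarity = np.array([
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.7],
    [0.3, 0.4, 0.7, 1.0],
])

def recommend_from_similarity(book_title, titles, similarity, top_n=5):
    """Return the top_n most similar titles, highest similarity first."""
    if book_title not in titles:
        raise KeyError(f"Unknown book title: {book_title}")
    idx = titles.index(book_title)
    scores = similarity[idx].copy()
    scores[idx] = -np.inf                    # never recommend the query book
    top_n = min(top_n, len(titles) - 1)      # cannot return more than exist
    top = np.argsort(scores)[::-1][:top_n]   # indices by descending score
    return pd.DataFrame({
        "Book-Title": [titles[i] for i in top],
        "Similarity-Score": scores[top],
    })

result = recommend_from_similarity("Book A", titles, similarity, top_n=2)
print(result)
```

Here the score column is a cosine similarity, so higher values mean closer matches and sorting in descending order puts the best match first.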

In [21]:
recommend(sample_title)

--------------------------------------------------
Here are top 5 recommendation for the book title : Cradle and All
--------------------------------------------------


| | Book-Title | Similarity-Score |
|---|---|---|
| 0 | Pop Goes the Weasel | 0.718247 |
| 1 | Roses Are Red (Alex Cross Novels) | 0.691419 |
| 2 | Four Blind Mice | 0.691049 |
| 3 | Violets Are Blue | 0.683892 |
| 4 | Along Came a Spider (Alex Cross Novels) | 0.681862 |


In [24]:
import joblib
joblib.dump(model, "model.h5")
pivot_df.to_csv("pivot_data.csv")
joblib.dump(scalar, "scalar.h5")

['scalar.h5']
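
Objects saved with `joblib.dump` are restored with `joblib.load`. The file extension does not change the format: joblib writes its own pickle-based format, so the `.h5` files above are not HDF5 files. The sketch below round-trips a hypothetical toy model through a temporary directory; `toy_model` and the `model.joblib` file name are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-in for the fitted model from the notebook.
toy_data = np.eye(3)
toy_model = NearestNeighbors(n_neighbors=2, algorithm="brute").fit(toy_data)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(toy_model, path)     # serialize the fitted estimator
    restored = joblib.load(path)     # load it back into memory

# The restored model answers queries like the original.
distances, indices = restored.kneighbors(toy_data[:1])
print(indices[0], distances[0].round(3))
```

As with any pickle-based format, such files should only be loaded from trusted sources, and ideally with the same library versions used to save them.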