# Introduction
In the digital age, the volume of books available online has grown rapidly, making it increasingly difficult for readers to discover content that matches their interests. Recommendation systems have emerged as powerful tools to personalize user experiences and improve engagement across platforms. An online book recommender system uses data about user preferences and item features to suggest books that users are likely to enjoy, based on their reading history or their similarity to other users.

## Problem Statement
With thousands of books being added to online platforms every day, users often struggle to choose what to read next. This leads to decision fatigue and can reduce user satisfaction and engagement. Without personalized recommendations, users may overlook books that align with their tastes, while content creators and publishers may struggle to reach their target audience. There is therefore a need for an intelligent system that can filter vast book collections and recommend titles tailored to individual user preferences.

## Objectives

### General Objective
To develop a machine learning-based book recommender system that provides personalized book suggestions to users based on past ratings and book attributes.

### Specific Objectives
- To analyze and clean the book and rating datasets for accurate modeling.
- To implement **collaborative filtering** techniques (user-based and item-based) using historical rating data.
- To build a **content-based filtering** model that uses book metadata (e.g., title, author, genre).
- To evaluate the performance of the recommender models using relevant metrics such as RMSE, precision, and recall.
- To visualize trends in user preferences and book popularity.
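
The evaluation metrics named in the objectives can be illustrated with a minimal sketch. The helper names `rmse` and `precision_recall_at_k`, and all toy values below, are assumptions made for illustration; they are not results or code from this project.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations against a set of relevant items."""
    top_k = list(recommended)[:k]
    relevant = set(relevant)
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy values for illustration only (hypothetical, not project results)
actual = [5, 3, 8]
predicted = [4, 3, 10]
print(rmse(actual, predicted))

recommended_isbns = ["A", "B", "C", "D"]  # hypothetical ranked recommendations
liked_isbns = {"B", "D", "E"}             # hypothetical books the user rated highly
print(precision_recall_at_k(recommended_isbns, liked_isbns, k=3))
```

RMSE suits models that predict a rating value, while precision and recall at k suit models that return a ranked list, so which metric applies depends on the model's output.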


### Loading the three datasets (books_df, ratings_df, and user_df)

In [20]:
# Loading the data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the dataset using the correct parameters
books_df = pd.read_csv(
    r'D:\PROJECT\Online-Book-Recommender-System\books_df.csv',
    sep=';',
    quotechar='"',
    encoding='latin1',
    on_bad_lines='skip'  # Correct parameter for pandas 1.3+
)

# Display the first 5 rows
books_df.head(5)




Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


In [21]:
# Ratings_df
ratings_df = pd.read_csv(
    r'D:\PROJECT\Online-Book-Recommender-System\ratings_df.csv',
    sep=';',
    quotechar='"',
    encoding='latin1',
    on_bad_lines='skip'  # Correct parameter for pandas 1.3+
)

# Display the first 5 rows
ratings_df.head(5)

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [22]:
# user_df
user_df = pd.read_csv(
    r'D:\PROJECT\Online-Book-Recommender-System\user_df.csv',
    sep=';',
    quotechar='"',
    encoding='latin1',
    on_bad_lines='skip'  # Correct parameter for pandas 1.3+
)

# Display the first 5 rows
user_df.head(5)

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",
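
The three loading cells above pass identical parser options, so they could be folded into one helper. The sketch below is one possible way to do that. The function name `load_semicolon_csv`, the optional `dtype` argument, and the `low_memory=False` setting are assumptions and not part of the original notebook. The demo reads from an in-memory sample rather than the real files.

```python
import io
import pandas as pd

def load_semicolon_csv(source, dtype=None):
    """Read one semicolon-separated file with the same options as the cells above."""
    return pd.read_csv(
        source,
        sep=';',
        quotechar='"',
        encoding='latin1',
        on_bad_lines='skip',  # requires pandas 1.3+
        dtype=dtype,          # assumption: callers may force column types, e.g. ISBN as text
        low_memory=False,     # assumption: read in one pass so column types are inferred consistently
    )

# In-memory sample standing in for a real file (hypothetical two-line extract)
sample = (
    b'"ISBN";"Book-Title";"Year-Of-Publication"\n'
    b'"0195153448";"Classical Mythology";"2002"\n'
)
demo_df = load_semicolon_csv(io.BytesIO(sample), dtype={'ISBN': str})
print(demo_df.loc[0, 'ISBN'])
```

Reading `ISBN` as text keeps leading zeros and the trailing `X` check character intact, which matters because the later merges join on that column.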


### Data Inspection

Checking the data types in each dataset, along with the number of rows and columns.

In [24]:
def show_dataset_info():
    print("📘 books_df Info:")
    print("-" * 40)
    books_df.info()
    print("\n\n")

    print("⭐ ratings_df Info:")
    print("-" * 40)
    ratings_df.info()
    print("\n\n")

    print("👤 user_df Info:")
    print("-" * 40)
    user_df.info()
    print("\n\n")

# Call the function
show_dataset_info()


📘 books_df Info:
----------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 271360 entries, 0 to 271359
Data columns (total 8 columns):
 #   Column               Non-Null Count   Dtype 
---  ------               --------------   ----- 
 0   ISBN                 271360 non-null  object
 1   Book-Title           271360 non-null  object
 2   Book-Author          271358 non-null  object
 3   Year-Of-Publication  271360 non-null  object
 4   Publisher            271358 non-null  object
 5   Image-URL-S          271360 non-null  object
 6   Image-URL-M          271360 non-null  object
 7   Image-URL-L          271357 non-null  object
dtypes: object(8)
memory usage: 16.6+ MB



⭐ ratings_df Info:
----------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1149780 entries, 0 to 1149779
Data columns (total 3 columns):
 #   Column       Non-Null Count    Dtype 
---  ------       --------------    ----- 
 0   User-ID      1149780 non-

<small>

## 📘 books_df Column Descriptions

| Column Name            | Description |
|------------------------|-------------|
| **ISBN**               | A unique identifier for each book (International Standard Book Number). |
| **Book-Title**         | The title of the book. |
| **Book-Author**        | The name of the book's author. |
| **Year-Of-Publication**| The year the book was published. May contain inconsistent or invalid values. |
| **Publisher**          | The name of the publishing company. |
| **Image-URL-S**        | URL to a small-sized image of the book cover. |
| **Image-URL-M**        | URL to a medium-sized image of the book cover. |
| **Image-URL-L**        | URL to a large-sized image of the book cover. |

## ⭐ ratings_df Column Descriptions

| Column Name    | Description |
|----------------|-------------|
| **User-ID**    | Unique identifier for each user who rated a book. |
| **ISBN**       | ISBN of the book that was rated (links to `books_df`). |
| **Book-Rating**| Rating given by the user to the book, typically on a scale of 0–10. A `0` may indicate no opinion or an implicit rating. |

## 👤 user_df Column Descriptions

| Column Name | Description |
|-------------|-------------|
| **User-ID** | Unique identifier for each user. Can be joined with `ratings_df`. |
| **Location**| The user’s location, often formatted as `City, State, Country`. |
| **Age**     | Age of the user. May include missing or out-of-range values (e.g., extremely young or old). |

</small>
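
The caveats in the tables above (invalid publication years, out-of-range ages, and `0` ratings) suggest a few cleaning steps. The following is a minimal sketch on small stand-in frames. The age bounds of 5 to 100 and the choice to separate out zero ratings are assumptions, not decisions taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frames (hypothetical values); the real datasets are far larger
books = pd.DataFrame({
    'ISBN': ['0195153448', '0000000001'],
    'Year-Of-Publication': ['2002', 'not a year'],
})
ratings = pd.DataFrame({
    'User-ID': [1, 1, 2],
    'ISBN': ['0195153448'] * 3,
    'Book-Rating': [0, 8, 5],
})
users = pd.DataFrame({'User-ID': [1, 2, 3], 'Age': [18.0, 244.0, np.nan]})

# Non-numeric years become NaN instead of raising an error
books['Year-Of-Publication'] = pd.to_numeric(
    books['Year-Of-Publication'], errors='coerce'
)

# Ages outside an assumed plausible range (5 to 100) are treated as missing
users['Age'] = users['Age'].where(users['Age'].between(5, 100))

# Keep only explicit ratings (1-10); zeros may represent implicit feedback
explicit_ratings = ratings[ratings['Book-Rating'] > 0]

print(books)
print(users)
print(explicit_ratings)
```

Whether zero ratings should be dropped, kept as implicit feedback, or modeled separately depends on the recommender approach, so the last step is a choice to revisit rather than a fixed rule.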


### Data merging

Merging gives a full picture of the data: which books have ratings, which users are active, and the relevant columns from each dataset.

In [25]:
# Step 1: Merge ratings with books (on ISBN)
ratings_books = pd.merge(ratings_df, books_df, on='ISBN', how='left')

# Step 2: Merge with users (on User-ID)
final_df = pd.merge(ratings_books, user_df, on='User-ID', how='left')

# Preview merged dataset
final_df.head()


Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L,Location,Age
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,http://images.amazon.com/images/P/034545104X.0...,"tyler, texas, usa",
1,276726,0155061224,5,Rites of Passage,Judith Rae,2001,Heinle,http://images.amazon.com/images/P/0155061224.0...,http://images.amazon.com/images/P/0155061224.0...,http://images.amazon.com/images/P/0155061224.0...,"seattle, washington, usa",
2,276727,0446520802,0,The Notebook,Nicholas Sparks,1996,Warner Books,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,http://images.amazon.com/images/P/0446520802.0...,"h, new south wales, australia",16.0
3,276729,052165615X,3,Help!: Level 1,Philip Prowse,1999,Cambridge University Press,http://images.amazon.com/images/P/052165615X.0...,http://images.amazon.com/images/P/052165615X.0...,http://images.amazon.com/images/P/052165615X.0...,"rijeka, n/a, croatia",16.0
4,276729,0521795028,6,The Amsterdam Connection : Level 4 (Cambridge ...,Sue Leather,2001,Cambridge University Press,http://images.amazon.com/images/P/0521795028.0...,http://images.amazon.com/images/P/0521795028.0...,http://images.amazon.com/images/P/0521795028.0...,"rijeka, n/a, croatia",16.0


<small>

## 🔗 Merging Strategy

### 📘 Ratings + Books Merge (on `ISBN`)
- `ISBN` is the unique identifier for each book.
- This merge enriches the ratings data with book metadata such as title, author, and publisher.
- A **left join** is used to ensure that all rating records are retained, even if some books are missing from the `books_df`.

### 👤 Result + Users Merge (on `User-ID`)
- `User-ID` is the unique identifier for each user.
- This merge adds demographic information such as user location and age to each rating record.
- Again, a **left join** is applied to keep all ratings, even when some user details are incomplete or unavailable.

</small>
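
Because a left join keeps ratings whose `ISBN` has no match in `books_df`, it can be useful to count how many rows were left unmatched. The sketch below uses the `indicator` and `validate` options of `pd.merge` on small stand-in frames with hypothetical values. Note that `validate='many_to_one'` raises an error if the right-hand frame contains duplicate keys, which may or may not hold for the real `books_df`.

```python
import pandas as pd

# Stand-in frames (hypothetical values) with one rating that has no matching book
ratings = pd.DataFrame({
    'User-ID': [10, 11],
    'ISBN': ['0195153448', '9999999999'],
    'Book-Rating': [7, 0],
})
books = pd.DataFrame({
    'ISBN': ['0195153448'],
    'Book-Title': ['Classical Mythology'],
})

merged = pd.merge(
    ratings, books,
    on='ISBN',
    how='left',
    validate='many_to_one',  # each ISBN should appear at most once in books
    indicator=True,          # adds a '_merge' column marking where each row came from
)

# Rows marked 'left_only' are ratings with no book metadata
unmatched = int((merged['_merge'] == 'left_only').sum())
print(f"{unmatched} of {len(merged)} ratings have no matching book")
```

The same check applies to the second merge on `User-ID`, where unmatched rows would indicate ratings from users missing in `user_df`.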