Problem Statement

The objective of this task is to build a book recommendation system using user-based collaborative filtering.
The system recommends books to a user based on the ratings of users with similar rating behavior.



Dataset Description

The **Book-Crossing Dataset** was used for this task.

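* Contains over 1 million book ratings (1,149,780 rows in the ratings file).
* Fields used:

  * User-ID
  * ISBN
  * Book-Rating
  * Book-Title (joined from the books file on ISBN)
* Explicit ratings range from 1 to 10; a rating of 0 marks an implicit interaction.
* Only explicit ratings (rating > 0) were considered.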

Due to the large dataset size, filtering was applied to reduce sparsity and memory usage.


Methodology

  Data Cleaning

* Removed implicit ratings (rating = 0).
* Merged ratings with book titles on ISBN, leaving 383,852 rated rows.
* Selected relevant columns (User-ID, ISBN, Book-Rating, Book-Title).
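
As a minimal sketch of the cleaning steps above, the toy tables below (hypothetical data, with the same column names as the notebook) show how the explicit-rating filter and an inner merge on ISBN each drop rows. An inner merge is assumed here because it is the pandas default.

```python
import pandas as pd

# Toy ratings and books tables (hypothetical data, not from Book-Crossing).
ratings = pd.DataFrame({
    "User-ID": [1, 1, 2, 3],
    "ISBN": ["111", "222", "111", "999"],
    "Book-Rating": [0, 8, 7, 9],
})
books = pd.DataFrame({
    "ISBN": ["111", "222"],
    "Book-Title": ["Book A", "Book B"],
})

# Step 1: keep explicit ratings only (drops the rating of 0).
explicit = ratings[ratings["Book-Rating"] > 0]

# Step 2: inner merge on ISBN; the rating for ISBN "999" is dropped
# because that ISBN has no entry in the books table.
merged = explicit.merge(books[["ISBN", "Book-Title"]], on="ISBN", how="inner")

print(merged)
```

Ratings whose ISBN has no match in the books file disappear silently in an inner merge, so comparing row counts before and after each step is a cheap sanity check.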

  Data Filtering (Memory Optimization)

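To avoid memory errors when building the user-item matrix:

* Selected **popular books** (at least 50 ratings per title).
* Selected **active users** (at least 50 ratings among the remaining books).

The filters were applied in that order. Together they reduced the dataset from 383,852 rows to 3,273 rows, covering 41 users and 641 book titles.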
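
The two-stage filter can be sketched on a toy table as follows. The helper name `filter_by_min_count` and the threshold of 2 are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy ratings table (hypothetical data, same column names as the notebook).
toy = pd.DataFrame({
    "User-ID":     [1, 1, 1, 2, 2, 3],
    "Book-Title":  ["A", "B", "C", "A", "B", "A"],
    "Book-Rating": [8, 7, 9, 6, 8, 10],
})

def filter_by_min_count(df, column, min_count):
    """Keep rows whose value in `column` occurs at least `min_count` times."""
    counts = df[column].value_counts()
    keep = counts[counts >= min_count].index
    return df[df[column].isin(keep)]

# Books first, then users, mirroring the order described above.
filtered = filter_by_min_count(toy, "Book-Title", 2)     # drops book C (1 rating)
filtered = filter_by_min_count(filtered, "User-ID", 2)   # drops user 3 (1 rating left)

print(filtered)
```

Because the user filter runs second, it can push some books back below the book threshold. The counts are not re-checked in this sketch, so the order of the two filters affects the result.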



  User-Item Matrix Creation

Created a pivot table:

* Rows → User-ID
* Columns → Book Title
* Values → Book Rating
* Filled missing values with 0

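The resulting 41 × 641 matrix represents user preferences. A 0 means the user has not rated the book, not that the user gave it a low rating.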
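
A small sketch of the pivot step, on hypothetical toy data, is shown below. Keeping a mask of the cells that were actually rated is an addition for illustration; it makes it possible to tell real ratings from filled zeros later on.

```python
import pandas as pd

# Toy ratings table (hypothetical data).
toy = pd.DataFrame({
    "User-ID":     [1, 1, 2, 2, 3],
    "Book-Title":  ["A", "B", "A", "C", "B"],
    "Book-Rating": [8, 6, 10, 7, 9],
})

# pivot_table averages duplicate (user, title) pairs by default (aggfunc="mean").
ratings_nan = toy.pivot_table(index="User-ID", columns="Book-Title", values="Book-Rating")

# Remember which cells hold real ratings before filling the gaps with 0.
rated_mask = ratings_nan.notna()
matrix = ratings_nan.fillna(0)

# Fraction of user-book pairs without a rating.
sparsity = 1 - rated_mask.values.mean()

print(matrix)
print("Sparsity:", round(sparsity, 3))
```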


  Similarity Computation

Used **Cosine Similarity** to measure similarity between users.

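Cosine similarity is the cosine of the angle between two users' rating vectors. With non-negative ratings it ranges from 0 to 1: values near 1 mean the users rate the same books in similar proportions, and 0 means they have no rated books in common.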
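
To make the measure concrete, here is a minimal NumPy sketch of cosine similarity on three hypothetical rating vectors. The function name `cosine_sim` and the user names are illustrative only; the notebook itself uses scikit-learn's `cosine_similarity`.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Hypothetical rating vectors over three books; 0 means "not rated".
alice = np.array([8.0, 0.0, 6.0])
bob = np.array([4.0, 0.0, 3.0])    # same books as alice, rated half as high
carol = np.array([0.0, 9.0, 0.0])  # no books in common with alice

print(cosine_sim(alice, bob))
print(cosine_sim(alice, carol))
```

Since only the direction of the vectors matters, a user who rates everything half as high as another still counts as a close match. Mean-centering each user's ratings before comparison is one common way to account for such differences in rating scale.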



  Recommendation Generation

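For a given user:

* Found the five most similar users.
* Summed their ratings for each book, weighted by similarity score.
* Recommended the highest-scoring books not already rated by the user.

The scores are similarity-weighted sums rather than normalized averages, so they rank candidates but are not predictions on the 1 to 10 rating scale.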
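
One possible variant of this step, sketched below on hypothetical toy data, divides each weighted sum by the total similarity of the neighbours who actually rated the book, which keeps the scores on the rating scale. The function name `predict_scores` and the parameter `k` are assumptions for illustration; the notebook's `recommend_books` uses plain weighted sums.

```python
import numpy as np
import pandas as pd

# Toy user-item matrix (hypothetical data); 0 means "not rated".
matrix = pd.DataFrame(
    [[8, 0, 0, 6],
     [8, 10, 0, 6],
     [2, 4, 8, 0]],
    index=["u1", "u2", "u3"],
    columns=["A", "B", "C", "D"],
    dtype=float,
)

# Cosine similarity between users, computed with NumPy.
unit = matrix.values / np.linalg.norm(matrix.values, axis=1, keepdims=True)
sim = pd.DataFrame(unit @ unit.T, index=matrix.index, columns=matrix.index)

def predict_scores(matrix, sim, user_id, k=2):
    """Similarity-weighted average rating from the k nearest neighbours."""
    # Drop the user explicitly instead of relying on sort position.
    neighbours = sim[user_id].drop(user_id).nlargest(k)
    neigh_ratings = matrix.loc[neighbours.index]
    rated = (neigh_ratings > 0).astype(float)
    # Numerator: sum of similarity * rating; denominator: sum of the
    # similarities of the neighbours who rated the book.
    numer = neigh_ratings.mul(neighbours, axis=0).sum(axis=0)
    denom = rated.mul(neighbours, axis=0).sum(axis=0)
    scores = (numer / denom.where(denom > 0)).dropna()
    # Keep only books the target user has not rated yet.
    unseen = matrix.loc[user_id] == 0
    return scores[unseen[scores.index]].sort_values(ascending=False)

recs = predict_scores(matrix, sim, "u1")
print(recs)
```

Normalizing by the similarity of raters only means a book rated by a single neighbour is scored by that neighbour's rating alone, so in practice a minimum number of raters per book may be worth requiring.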


  Evaluation

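Used **RMSE (Root Mean Squared Error)** to evaluate prediction accuracy.

RMSE is the square root of the mean squared difference between predicted and actual ratings; lower values indicate better predictions. Here the predictions are the product of the similarity matrix and the user-item matrix, compared against every cell of the zero-filled matrix, so the result is not an error on held-out ratings.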
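
As a hedged illustration of how the choice of cells affects RMSE, the sketch below compares an RMSE restricted to observed ratings with one computed over all cells of a zero-filled matrix. The arrays and the function name `rmse_observed` are hypothetical. A full evaluation would also hold out some ratings before computing similarities, which this sketch does not do.

```python
import numpy as np

def rmse_observed(actual, predicted):
    """RMSE over cells that hold a real rating (actual > 0)."""
    mask = actual > 0
    errors = actual[mask] - predicted[mask]
    return float(np.sqrt(np.mean(errors ** 2)))

# Hypothetical zero-filled ratings and predictions for 2 users x 3 books.
actual = np.array([[8.0, 0.0, 6.0],
                   [0.0, 9.0, 0.0]])
predicted = np.array([[7.0, 3.0, 8.0],
                      [5.0, 9.0, 1.0]])

rmse_obs = rmse_observed(actual, predicted)
rmse_all = float(np.sqrt(np.mean((actual - predicted) ** 2)))

print("RMSE on observed ratings:", rmse_obs)
print("RMSE on all cells:", rmse_all)
```

In this toy case the all-cells value is larger because every prediction for an unrated book is counted as an error against 0.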


Results

* Created a 41 × 41 user similarity matrix.
* Generated personalized book recommendations (top five for a sample user, ID 6575).
* Obtained an RMSE of 6.43.
* After filtering, the data was small enough (3,273 rows) to process in memory.


Conclusion

A user-based collaborative filtering recommendation system was implemented.

Key achievements:

* Handled a large dataset using filtering techniques.
* Created a user-item interaction matrix.
* Applied cosine similarity to find similar users.
* Generated personalized recommendations.
* Evaluated performance using RMSE.





# ==============================
# TASK 4 - RECOMMENDATION SYSTEM
# Book Recommendation (Collaborative Filtering)
# ==============================

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import mean_squared_error
from math import sqrt

# ------------------------------
# 1. LOAD DATA
# ------------------------------

books = pd.read_csv("BX_Books.csv", sep=";", encoding="latin-1", on_bad_lines='skip')
ratings = pd.read_csv("BX-Book-Ratings.csv", sep=";", encoding="latin-1", on_bad_lines='skip')

# Rename columns
books.columns = ['ISBN','Book-Title','Book-Author','Year',
                 'Publisher','Image-URL-S','Image-URL-M','Image-URL-L']

ratings.columns = ['User-ID','ISBN','Book-Rating']

print("Original Ratings Shape:", ratings.shape)

# ------------------------------
# 2. DATA CLEANING
# ------------------------------

# Keep only explicit ratings (>0)
ratings = ratings[ratings['Book-Rating'] > 0]

# Merge books and ratings
df = ratings.merge(books[['ISBN','Book-Title']], on='ISBN')

print("After Merge Shape:", df.shape)

# ------------------------------
# 3. FILTER DATA (IMPORTANT - MEMORY SAFE)
# ------------------------------

# Keep popular books (at least 50 ratings)
book_counts = df['Book-Title'].value_counts()
popular_books = book_counts[book_counts >= 50].index
df = df[df['Book-Title'].isin(popular_books)]

# Keep active users (at least 50 ratings)
user_counts = df['User-ID'].value_counts()
active_users = user_counts[user_counts >= 50].index
df = df[df['User-ID'].isin(active_users)]

print("Filtered Dataset Shape:", df.shape)

# ------------------------------
# 4. CREATE USER-ITEM MATRIX
# ------------------------------

user_item_matrix = df.pivot_table(
    index='User-ID',
    columns='Book-Title',
    values='Book-Rating'
).fillna(0)

print("Matrix Shape:", user_item_matrix.shape)

# ------------------------------
# 5. COMPUTE USER SIMILARITY
# ------------------------------

user_similarity = cosine_similarity(user_item_matrix)

user_similarity_df = pd.DataFrame(
    user_similarity,
    index=user_item_matrix.index,
    columns=user_item_matrix.index
)

print("Similarity Matrix Created Successfully ✅")

# ------------------------------
# 6. RECOMMENDATION FUNCTION
# ------------------------------

def recommend_books(user_id, num_recommendations=5):

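    # Get the 5 most similar users; position 0 is skipped because it is
    # normally the user themself (similarity 1.0). iloc makes the slice
    # explicitly positional, since the index holds integer User-IDs.
    similar_users = user_similarity_df[user_id].sort_values(ascending=False).iloc[1:6]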

    weighted_ratings = pd.Series(dtype=float)

    for similar_user, similarity_score in similar_users.items():
        weighted_ratings = weighted_ratings.add(
            user_item_matrix.loc[similar_user] * similarity_score,
            fill_value=0
        )

    recommendations = weighted_ratings.sort_values(ascending=False)

    # Remove books already rated
    already_rated = user_item_matrix.loc[user_id]
    recommendations = recommendations[already_rated == 0]

    return recommendations.head(num_recommendations)

# ------------------------------
# 7. TEST RECOMMENDATION
# ------------------------------

sample_user = user_item_matrix.index[0]
print("\nRecommendations for User:", sample_user)
print(recommend_books(sample_user))

# ------------------------------
# 8. EVALUATION (RMSE)
# ------------------------------

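# Similarity-weighted sum of all users' ratings (each user's own row included).
# Note: the sums are not normalized by total similarity, and the comparison
# below covers every cell of the zero-filled matrix, not only observed ratings.
predicted_ratings = user_similarity.dot(user_item_matrix)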

rmse = sqrt(mean_squared_error(
    user_item_matrix.values.flatten(),
    predicted_ratings.flatten()
))

print("\nRMSE:", rmse)



Original Ratings Shape: (1149780, 3)
After Merge Shape: (383852, 4)
Filtered Dataset Shape: (3273, 4)
Matrix Shape: (41, 641)
Similarity Matrix Created Successfully ✅

Recommendations for User: 6575
Book-Title
A Prayer for Owen Meany                                                  9.156266
High Five (A Stephanie Plum Novel)                                       9.155796
The Honk and Holler Opening Soon                                         8.448495
Ender's Game (Ender Wiggins Saga (Paperback))                            7.580315
Three To Get Deadly : A Stephanie Plum Novel (A Stephanie Plum Novel)    7.578255
dtype: float64

RMSE: 6.4262144259045435
