# Bookster

## Project Overview
This project develops a book recommendation system. It uses user ratings and stated preferences, together with several recommendation algorithms, to suggest books that users are likely to enjoy.


# Data Preprocessing

## Purpose and Rationale
Moving data to an SQL database offers several advantages:
- **Scalability**: SQL databases efficiently handle large, growing datasets.
- **Efficient Querying**: SQL databases enable complex queries for data manipulation and analysis.

Preprocessing is a critical step in machine learning to ensure data quality and integrity. Properly preprocessed data can significantly improve the performance of machine learning models.

## Preprocessing Steps
1. **Converting Data Types**: Ensuring data consistency, such as converting age to integers.
2. **Handling Missing Values**: Dealing with missing or invalid data to maintain data quality.
3. **Validating Related Entities**: Ensuring referential integrity, like verifying the existence of user IDs in ratings.


```python
import pandas as pd
from sqlalchemy import create_engine

DATABASE_URL = "xxxxxxxxxx"  # placeholder; supply a real connection URL
engine = create_engine(DATABASE_URL)

def preprocess_data(df, table_name):
    """Coerce column types and drop rows with missing or invalid values."""
    if table_name == 'users':
        # Non-numeric ages become NaN and are dropped below
        df['age'] = pd.to_numeric(df['age'], errors='coerce')

    if table_name == 'ratings':
        df['user_id'] = pd.to_numeric(df['user_id'], errors='coerce')
        df['book_rating'] = pd.to_numeric(df['book_rating'], errors='coerce')

    # Drop every row that still has a missing value in any column
    df = df.dropna()

    if table_name == 'users':
        # Safe only after dropna(): NaN cannot be cast to int
        df = df.astype({'age': int})
    return df

def import_csv(csv_file, table_name):
    """Read a CSV file, preprocess it, and append it to the given table."""
    df = pd.read_csv(csv_file)
    df = preprocess_data(df, table_name)
    df.to_sql(table_name, con=engine, if_exists='append', index=False)
    print(f"Imported {csv_file} into {table_name} table.")
```
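
The third preprocessing step, validating related entities, is not shown in the code above. A minimal sketch of one way to check it, using Python's built-in `sqlite3` with an in-memory database; the table layout and the helper name `find_orphan_ratings` are assumptions for illustration, not the project's actual schema:

```python
import sqlite3

def find_orphan_ratings(conn):
    """Return (user_id, book_isbn) pairs whose user_id has no row in users."""
    rows = conn.execute(
        """
        SELECT r.user_id, r.book_isbn
        FROM ratings AS r
        LEFT JOIN users AS u ON u.user_id = r.user_id
        WHERE u.user_id IS NULL
        ORDER BY r.user_id, r.book_isbn
        """
    ).fetchall()
    return rows

# Hypothetical toy data: user 3 rated a book but does not exist in users
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INTEGER)")
conn.execute("CREATE TABLE ratings (user_id INTEGER, book_isbn TEXT, book_rating INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 34), (2, 27)])
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, "0001", 8), (2, "0002", 5), (3, "0001", 9)],
)

orphans = find_orphan_ratings(conn)
print(orphans)
```

Rows returned by such a query could be dropped or logged before the ratings are used for recommendations.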

## Data Import into SQL Database
The process of importing CSV data into an SQL database involves reading the data, applying preprocessing steps, and then loading it into the respective tables. Challenges during this process can include handling large data volumes and resolving format inconsistencies.
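
For large files, one common way to limit memory use is to load the CSV in fixed-size batches rather than all at once. The sketch below illustrates the idea with the standard library only (`csv`, `sqlite3`); the `import_in_batches` helper, the batch size, and the two-column `users` table are illustrative assumptions, not the project's pipeline:

```python
import csv
import io
import sqlite3
from itertools import islice

def import_in_batches(csv_stream, conn, batch_size=2):
    """Insert rows from a users CSV in batches; return the number of rows inserted."""
    reader = csv.DictReader(csv_stream)
    inserted = 0
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        # Skip rows whose age is not a valid non-negative integer
        rows = [(int(r["user_id"]), int(r["age"])) for r in batch if r["age"].isdigit()]
        conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
        conn.commit()
        inserted += len(rows)
    return inserted

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, age INTEGER)")

# In-memory stand-in for a CSV file on disk
sample = io.StringIO("user_id,age\n1,34\n2,unknown\n3,27\n4,51\n5,\n")
count = import_in_batches(sample, conn)
print(count)
```

With pandas, `read_csv` accepts a `chunksize` argument that yields DataFrames in a similar batch-wise fashion, so the same pattern can be applied to the `import_csv` function.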


# Analysis of Recommendation Algorithms

## Overview of the Hybrid System

The hybrid recommendation system combines various algorithms to offer a more personalized and effective recommendation experience. This approach leverages the strengths of different recommendation strategies:

- **Personalization**: Tailoring recommendations to individual user preferences and behavior.
- **General Trends**: Incorporating broader trends and popular choices to provide well-rounded suggestions.

This hybrid system ensures a balance between personalized content and popular choices, enhancing discovery and user satisfaction.
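
How the individual result lists are merged is a design choice. One simple possibility, shown here purely as an illustration, is to interleave a personalized list with a popularity list and drop duplicates; `blend_recommendations` and the sample identifiers are hypothetical:

```python
from itertools import zip_longest

def blend_recommendations(personalized, popular, limit=10):
    """Alternate between two ranked lists, keeping the first occurrence of each item."""
    blended, seen = [], set()
    for pair in zip_longest(personalized, popular):
        for isbn in pair:
            if isbn is not None and isbn not in seen:
                seen.add(isbn)
                blended.append(isbn)
    return blended[:limit]

result = blend_recommendations(["A", "B", "C"], ["B", "D"], limit=4)
print(result)
```

Interleaving keeps the top item of each list near the top of the blended result; a weighted score would be an alternative if the algorithms produced comparable scores.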


## Algorithm 1: User-Based Collaborative Filtering

### How It Works and Its Relevance

User-Based Collaborative Filtering focuses on finding users with similar preferences or rating patterns and recommending items liked by these similar users. It's based on the premise that users with similar tastes in the past will have similar preferences in the future.

### Similarity Calculation Using Cosine Similarity

To identify similar users, we use cosine similarity, a metric that measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. This choice is motivated by:

- **Effectiveness in High-Dimensional Data**: Cosine similarity performs well with high-dimensional data, typical in user-item matrices.
- **Normalization**: It compares the direction of two rating vectors rather than their magnitude, so users who follow the same rating pattern on different scales are still treated as similar.
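
To make the normalization point concrete, here is a small self-contained sketch in plain Python (the `cosine` helper and the sample rating vectors are illustrative). Two users whose ratings differ only by a constant factor get a similarity of 1, while a user with a different pattern scores lower:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

cautious_rater = [1, 2, 3]   # rates on the low end of the scale
generous_rater = [2, 4, 6]   # same pattern, doubled magnitude
contrarian = [3, 2, 1]       # reversed ordering of the same books

same_pattern = cosine(cautious_rater, generous_rater)
different_pattern = cosine(cautious_rater, contrarian)
print(round(same_pattern, 3), round(different_pattern, 3))
```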

### Implementation Snippet

```python
from sklearn.metrics.pairwise import cosine_similarity

def calculate_user_similarity(ratings_matrix):
    # Assumes ratings_matrix is a DataFrame with users as rows and items as columns,
    # with missing ratings already filled in (cosine_similarity rejects NaN values)
    similarity_matrix = cosine_similarity(ratings_matrix)
    return similarity_matrix

# Example usage
# similarity_matrix = calculate_user_similarity(ratings_matrix)
```

## Algorithm 2: Content-Based Filtering for Popular Books

### Recommendation Method

This approach recommends books based on their popularity within the community, adjusted for user-specific preferences. Books are ranked based on:

- **Community Ratings**: Average ratings from the user community.
- **User Preferences**: Aligning popular choices with the user's preferred authors and publishers.

### Implementation Outline

The implementation involves aggregating book ratings and filtering based on user preferences. The algorithm ranks books by their average ratings and considers user preferences for a tailored experience.



```python
from sqlalchemy import func, or_

# Book and Rating are the project's ORM models, defined elsewhere
def fetch_popular_books(self, session, user_preferences, page):
    limit = 10
    offset = (page - 1) * limit

    # Base query: every rated book with its average community rating
    query = session.query(Book, func.avg(Rating.book_rating).label('average_rating')) \
                   .join(Rating, Rating.book_isbn == Book.isbn) \
                   .group_by(Book.isbn)

    # Keep books that match a preferred author or a preferred publisher;
    # with no stated preferences, the query stays unfiltered
    conditions = []
    if user_preferences.get('preferred_authors'):
        conditions.append(Book.author.in_(user_preferences['preferred_authors']))
    if user_preferences.get('preferred_publishers'):
        conditions.append(Book.publisher.in_(user_preferences['preferred_publishers']))
    if conditions:
        query = query.filter(or_(*conditions))

    # Sort the result by average rating in descending order
    query = query.order_by(func.avg(Rating.book_rating).desc())
    popular_books = query.offset(offset).limit(limit).all()

    return [self.serialize_book(result[0]) for result in popular_books]
```

## Algorithm 3: Recommendations Based on Preferred Authors

### Concept and Methodology

This algorithm personalizes recommendations by focusing on a user's preferred authors and publishers. It selects high-rated books by these authors or publishers, ensuring that the books haven't been rated by the user already.

### Selecting High-Rated Books

The process involves:

1. Extracting a list of preferred authors and publishers from user data.
2. Filtering the books database to include only those written by preferred authors or published by preferred publishers.
3. Sorting these books based on ratings, while excluding books already rated by the user.




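```python
from sqlalchemy import func, or_

def fetch_books_by_preferences(self, session, user_preferences, user_rated_books, page):
    limit = 10
    offset = (page - 1) * limit

    # Match books by a preferred author or a preferred publisher
    conditions = []
    if user_preferences.get('preferred_authors'):
        conditions.append(Book.author.in_(user_preferences['preferred_authors']))
    if user_preferences.get('preferred_publishers'):
        conditions.append(Book.publisher.in_(user_preferences['preferred_publishers']))
    if not conditions:
        return []  # no stated preferences, so this algorithm has nothing to suggest

    # Rank matching books by average rating, skipping those the user already rated
    query = session.query(Book) \
                   .join(Rating, Rating.book_isbn == Book.isbn) \
                   .filter(or_(*conditions)) \
                   .filter(Book.isbn.notin_(user_rated_books)) \
                   .group_by(Book.isbn) \
                   .order_by(func.avg(Rating.book_rating).desc())

    books = query.offset(offset).limit(limit).all()
    return [self.serialize_book(book) for book in books]
```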

## Testing and Tuning

### Testing Process

- **Data Splitting**: We divided our dataset into training and testing sets, ensuring a representative distribution of user interactions.
- **Cross-Validation**: We used k-fold cross-validation to assess the effectiveness of our recommendations and to see how their performance varies across different subsets of the data.
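
As a rough illustration of k-fold splitting, the sketch below partitions a list of rating records into `k` folds and yields one train/test pair per fold. The `k_fold_splits` helper, the fixed seed, and the toy records are assumptions for illustration; no evaluation metric is computed:

```python
import random

def k_fold_splits(records, k=5, seed=42):
    """Yield (train, test) lists so that each record lands in exactly one test fold."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

# Toy (user_id, book_isbn, rating) records
records = [(u, f"isbn-{b}", (u + b) % 10) for u in range(4) for b in range(5)]
splits = list(k_fold_splits(records, k=5))
print([(len(train), len(test)) for train, test in splits])
```

Each fold serves once as the test set while the remaining folds form the training set, so every record is evaluated exactly once.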

### Tuning

- **Adjusting Similarity Thresholds**: For collaborative filtering, we experimented with different thresholds for user similarity to optimize recommendations.
- **Number of Similar Users**: We tuned the number of similar users considered when generating recommendations, balancing recommendation quality against computational cost.
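
The two tuning knobs above can be sketched as parameters of a neighbor-selection step. In this illustration, `select_neighbors`, `min_similarity`, and `top_k` are hypothetical names, and the similarity scores are made-up values rather than project results:

```python
def select_neighbors(similarities, min_similarity=0.3, top_k=2):
    """Keep users at or above the threshold, then return the top_k most similar."""
    eligible = [
        (user, score)
        for user, score in similarities.items()
        if score >= min_similarity
    ]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return eligible[:top_k]

# Made-up similarity scores between a target user and four others
scores = {"u1": 0.91, "u2": 0.12, "u3": 0.47, "u4": 0.65}
neighbors = select_neighbors(scores, min_similarity=0.3, top_k=2)
print(neighbors)
```

A higher `min_similarity` or a smaller `top_k` tends to reduce the work per request at the cost of drawing on fewer neighbors.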


# Conclusion

In this project, we developed a hybrid book recommendation system, employing collaborative and content-based filtering methods. 

While challenges like the cold start problem persist, the project lays a foundation for future exploration in recommendation systems, with opportunities to integrate more sophisticated AI techniques for even more personalized recommendations.
