
## Exercise 3: Recommender Systems with KNN

### Project Description

In this exercise, you will develop a simple recommender system using the K-Nearest Neighbors (KNN) algorithm. The project will involve understanding collaborative filtering techniques, implementing a user-based collaborative filtering model, and evaluating its performance.

### Phase 1: Understanding Recommender Systems

1. **Part 1: Collaborative Filtering Overview**:
   - **User-Based Collaborative Filtering**:
     - Explain in simple terms how user-based collaborative filtering works. This method recommends items to a user based on the preferences of similar users.
   - **Item-Based Collaborative Filtering**:
     - Explain how item-based collaborative filtering differs from user-based filtering. This method recommends items similar to those that the user has liked or interacted with before.

2. **Part 2: Similarity Metrics**:
   - **Cosine Similarity**:
     - Explain what cosine similarity is and how it is used to measure the similarity between two vectors (e.g., user preference vectors or item feature vectors).
     - Discuss the significance of cosine similarity in the context of collaborative filtering.
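
To make Part 2 concrete, the cosine similarity of two users u and v over the set I_uv of items both have rated is cos(u, v) = Σ r_ui·r_vi / (√(Σ r_ui²) · √(Σ r_vi²)), with every sum taken over I_uv. The Surprise documentation likewise restricts its `cosine` option to co-rated items. The sketch below is plain Python. The function name and the toy users are hypothetical.

```python
import math

def cosine_similarity(ratings_u: dict, ratings_v: dict) -> float:
    """Cosine similarity between two users, computed over co-rated items only."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0  # no overlap: treat the users as unrelated
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy ratings: item id -> rating
alice = {"m1": 5.0, "m2": 3.0, "m3": 4.0}
bob = {"m1": 4.0, "m2": 3.0, "m4": 1.0}
carol = {"m1": 1.0, "m2": 5.0}

print(round(cosine_similarity(alice, bob), 4))    # 0.9947
print(round(cosine_similarity(alice, carol), 4))  # 0.6727
```

Because ratings are non-negative, raw cosine scores are never negative. Even alice and carol, whose tastes are nearly opposite, still score about 0.67. This is one reason mean-centred variants such as Pearson correlation are often used instead.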

### Phase 2: Data Preprocessing

Before building the recommender system, preprocess the data to ensure it is in the right format.

1. **Part 3: Data Preparation**:
   - Download the `ratings.csv` dataset, which contains user-item interactions (e.g., ratings).
   - Perform Min-Max Scaling on the ratings to normalize the data between 0 and 1.

2. **Part 4: Data Splitting**:
   - Split the dataset into training and testing sets to evaluate the model's performance on unseen data.
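
Parts 3 and 4 come down to a linear rescaling, x' = (x - min) / (max - min), followed by a random hold-out split. The sketch below does both in plain Python on hypothetical toy triples. In the notebook, `MinMaxScaler` and Surprise's `train_test_split` fill these roles, and their exact shuffling will differ from this illustration. The helper names are hypothetical.

```python
import random

def min_max_scale(values):
    """Scale values linearly to [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(x - lo) / (hi - lo) for x in values]

def split_ratings(triples, test_size=0.2, seed=42):
    """Shuffle (user, item, rating) triples and hold out a fraction for testing."""
    rows = list(triples)
    random.Random(seed).shuffle(rows)  # seeded for a reproducible split
    n_test = int(round(len(rows) * test_size))
    return rows[n_test:], rows[:n_test]  # (train, test)

# Hypothetical toy data on a 0.5-5 star scale
raw = [("u1", "m1", 0.5), ("u1", "m2", 3.0), ("u2", "m1", 5.0),
       ("u2", "m3", 4.0), ("u3", "m2", 2.0)]
scaled = min_max_scale([r for _, _, r in raw])
data = [(u, i, s) for (u, i, _), s in zip(raw, scaled)]
train, test = split_ratings(data, test_size=0.2)
print(len(train), len(test))  # 4 1
```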

### Phase 3: Model Development

1. **Part 5: KNN-Based Collaborative Filtering**:
   - Implement a user-based collaborative filtering model using the Surprise library and the KNN algorithm.
   - Use cosine similarity as the metric to find similar users.

2. **Part 6: Model Training**:
   - Train the model on the training data and generate predictions for the test set.
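
For Parts 5 and 6, a user-based KNN model predicts user u's rating of item i as a similarity-weighted mean over the k most similar users who rated i: r̂(u, i) = Σ sim(u, v)·r(v, i) / Σ sim(u, v). This is the formula Surprise documents for `KNNBasic`. The sketch below is a hypothetical plain-Python version. Two details are assumptions modelled on how `KNNBasic` is commonly described, not a copy of the library code: it skips non-positive similarities, and it returns a default when fewer than `min_k` neighbours remain. Similarities are assumed to be precomputed (for example with cosine similarity), and all names and data are made up.

```python
import heapq

def predict_user_based(target, item, ratings, sims, k=40, min_k=1, default=None):
    """Similarity-weighted mean of the ratings given to `item` by the k users
    most similar to `target`.

    ratings: user -> {item: rating}; sims: (target, other_user) -> similarity.
    Returns `default` when fewer than min_k positively similar neighbours exist.
    """
    candidates = [
        (sims.get((target, v), 0.0), r[item])
        for v, r in ratings.items()
        if v != target and item in r
    ]
    top_k = heapq.nlargest(k, candidates, key=lambda t: t[0])
    num = den = 0.0
    used = 0
    for sim, rating in top_k:
        if sim > 0:  # ignore neighbours with non-positive similarity
            num += sim * rating
            den += sim
            used += 1
    if used < min_k:
        return default  # not enough neighbours to make a prediction
    return num / den

# Hypothetical toy data
ratings = {
    "alice": {"m1": 5.0, "m2": 3.0},
    "bob":   {"m1": 4.0, "m2": 3.0, "m3": 4.0},
    "carol": {"m1": 1.0, "m2": 5.0, "m3": 2.0},
}
sims = {("alice", "bob"): 0.99, ("alice", "carol"): 0.67}
print(round(predict_user_based("alice", "m3", ratings, sims, k=2), 3))  # 3.193
```

Item-based filtering uses the same computation with the roles of users and items swapped.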

### Phase 4: Model Evaluation

1. **Part 7: Predictions**:
   - Print the first five predictions made by the model for the test set.

2. **Part 8: Performance Metrics**:
   - Evaluate the model's performance using metrics such as Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE).
   - Compare the performance with a baseline model (e.g., a model that predicts the average rating).
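
Over a test set T, the Part 8 metrics are RMSE = √((1/|T|) Σ (r - r̂)²) and MAE = (1/|T|) Σ |r - r̂|. The plain-Python sketch below computes both for a constant global-mean baseline. It takes the mean from the training ratings only, so test ratings cannot influence the prediction. The toy triples and function names are hypothetical.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true, predicted) pairs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true, predicted) pairs."""
    return sum(abs(t - p) for t, p in pairs) / len(pairs)

# Hypothetical (user, item, rating) triples on the scaled [0, 1] range
train = [("u1", "m1", 1.0), ("u1", "m2", 0.5), ("u2", "m1", 0.75), ("u2", "m3", 0.25)]
test = [("u1", "m3", 0.5), ("u2", "m2", 1.0)]

# Baseline: predict the global mean of the *training* ratings for every test pair
global_mean = sum(r for _, _, r in train) / len(train)
baseline_pairs = [(r, global_mean) for _, _, r in test]
print(global_mean, rmse(baseline_pairs), mae(baseline_pairs))
```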

### Phase 5: Conclusion

1. **Part 9: Insights and Interpretation**:
   - Discuss the strengths and limitations of the KNN-based recommender system.
   - Provide insights into how the model could be improved or extended (e.g., by incorporating item-based filtering or hybrid methods).


In [None]:
!pip install scikit-surprise

In [None]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from surprise import Dataset, Reader, KNNBasic, accuracy
from surprise.model_selection import train_test_split

### EDA

In [None]:
ratings_df = pd.read_csv('/content/ratings.csv')

### Data Preprocessing

In [None]:
# Perform Min-Max Scaling: rating' = (rating - min) / (max - min), giving [0, 1]
scaler = MinMaxScaler()
ratings_df[['rating']] = scaler.fit_transform(ratings_df[['rating']])

# Filter out users and items with fewer than 5 ratings.
# This is a single pass: dropping rare items can push a user back below 5.
user_counts = ratings_df['userId'].value_counts()
item_counts = ratings_df['movieId'].value_counts()
valid_users = user_counts[user_counts >= 5].index
valid_items = item_counts[item_counts >= 5].index

ratings_df = ratings_df[ratings_df['userId'].isin(valid_users) & ratings_df['movieId'].isin(valid_items)]

# Tell Surprise the (scaled) rating range; predictions are clipped to it
reader = Reader(rating_scale=(ratings_df['rating'].min(), ratings_df['rating'].max()))
# Surprise expects the columns in the order user, item, rating
data = Dataset.load_from_df(ratings_df[['userId', 'movieId', 'rating']], reader)
# Hold out 20% of the ratings for testing; random_state makes the split reproducible
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)
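
The user/item filter in the cell above makes a single pass, so removing rare items can leave some users with fewer than five ratings (and vice versa). If a strict minimum matters, the filter can be repeated until nothing changes. A minimal plain-Python sketch follows, with a hypothetical helper name and toy data. The same loop can be written in pandas with `value_counts` and `isin`.

```python
from collections import Counter

def filter_min_ratings(triples, min_user=5, min_item=5):
    """Repeatedly drop users/items with too few ratings until every remaining
    user has >= min_user ratings and every item has >= min_item ratings."""
    rows = list(triples)
    while True:
        user_counts = Counter(u for u, _, _ in rows)
        item_counts = Counter(i for _, i, _ in rows)
        kept = [(u, i, r) for u, i, r in rows
                if user_counts[u] >= min_user and item_counts[i] >= min_item]
        if len(kept) == len(rows):  # nothing removed: the filter has converged
            return kept
        rows = kept

# Hypothetical toy data: one pass would leave u3 with a single rating
toy = [("u1", "m1", 4.0), ("u1", "m2", 3.0), ("u2", "m1", 5.0),
       ("u2", "m2", 2.0), ("u3", "m1", 4.0), ("u3", "m3", 1.0)]
result = filter_min_ratings(toy, min_user=2, min_item=2)
print(sorted({u for u, _, _ in result}))  # ['u1', 'u2']
```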

### Modelling

In [None]:
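# User-Based Collaborative Filtering
# The brief (Part 5) asks for cosine similarity; this run uses 'pearson_baseline',
# a Pearson correlation centred on baseline estimates. Use 'name': 'cosine' to
# follow the brief exactly.
# min_k is an argument of KNNBasic itself, not a key of sim_options (where it
# would be ignored), so the default min_k=1 applies here. Pass
# KNNBasic(min_k=5, sim_options=...) to require at least 5 neighbours.
sim_options_user = {
    'name': 'pearson_baseline',
    'user_based': True,  # compute similarities between users
    'shrinkage': 100     # shrink similarities supported by few common ratings
}
model_user = KNNBasic(sim_options=sim_options_user)
model_user.fit(trainset)
predictions_user = model_user.test(testset)

print("User-Based Collaborative Filtering:")
rmse_user = accuracy.rmse(predictions_user)
mae_user = accuracy.mae(predictions_user)
print(f"User-Based RMSE: {rmse_user}")
print(f"User-Based MAE: {mae_user}")

# Item-Based Collaborative Filtering (same settings, similarities between items)
sim_options_item = {
    'name': 'pearson_baseline',
    'user_based': False,  # compute similarities between items
    'shrinkage': 100
}
model_item = KNNBasic(sim_options=sim_options_item)
model_item.fit(trainset)
predictions_item = model_item.test(testset)

print("Item-Based Collaborative Filtering:")
rmse_item = accuracy.rmse(predictions_item)
mae_item = accuracy.mae(predictions_item)
print(f"Item-Based RMSE: {rmse_item}")
print(f"Item-Based MAE: {mae_item}")

# Baseline Model: predict the same constant (the mean rating) for every test pair.
# Caveat: ratings_df also contains the test ratings; trainset.global_mean would
# use the training ratings only and avoid this small leak.
baseline_prediction = ratings_df['rating'].mean()
# accuracy.rmse/mae unpack each prediction as (uid, iid, true_r, est, details)
baseline_predictions = [(uid, iid, true_r, baseline_prediction, {}) for (uid, iid, true_r) in testset]

baseline_rmse = accuracy.rmse(baseline_predictions, verbose=False)
baseline_mae = accuracy.mae(baseline_predictions, verbose=False)
print(f"Baseline RMSE: {baseline_rmse}")
print(f"Baseline MAE: {baseline_mae}")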

Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
User-Based Collaborative Filtering:
RMSE: 0.2084
MAE:  0.1613
User-Based RMSE: 0.20842543613441006
User-Based MAE: 0.16125310927118833
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Item-Based Collaborative Filtering:
RMSE: 0.1954
MAE:  0.1492
Item-Based RMSE: 0.19539891708550408
Item-Based MAE: 0.1491597772391319
Baseline RMSE: 0.2279486540519928
Baseline MAE: 0.18214491824048465


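Ranking by test error (lower is better for both RMSE and MAE, measured on the min-max scaled ratings in [0, 1]): item-based collaborative filtering (RMSE 0.1954, MAE 0.1492) > user-based collaborative filtering (RMSE 0.2084, MAE 0.1613) > baseline model (RMSE 0.2279, MAE 0.1821).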
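
Because the ratings were min-max scaled, these errors are in scaled units. The transform is linear, so an error on the original scale equals the scaled error multiplied by (max - min). The sketch below assumes MovieLens-style ratings from 0.5 to 5.0. That range is an assumption; in the notebook, `scaler.data_min_[0]` and `scaler.data_max_[0]` give the actual bounds. The helper name is hypothetical, and the error values are copied from the output above.

```python
# Assumption: original ratings span 0.5-5.0 (MovieLens style). In the notebook,
# use scaler.data_min_[0] and scaler.data_max_[0] instead of these constants.
ORIG_MIN, ORIG_MAX = 0.5, 5.0

def to_original_scale(error):
    """Convert an RMSE/MAE computed on min-max scaled ratings back to stars."""
    return error * (ORIG_MAX - ORIG_MIN)

scaled_rmse = {
    "item-based": 0.19539891708550408,
    "user-based": 0.20842543613441006,
    "baseline": 0.2279486540519928,
}
for name, err in scaled_rmse.items():
    print(f"{name} RMSE in stars: {to_original_scale(err):.4f}")
```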