## Implement a collaborative filtering recommender that predicts a user's rating for an item.

- Test different configurations (e.g., different numbers of nearest neighbors and different similarity measures).
- Evaluate them with your own implementations of MAE and RMSE.
- Choose between cross-validation and hold-out validation to perform your evaluation.
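
The two validation schemes mentioned in the task can be sketched without any library. The helper names `hold_out_split` and `k_fold_splits` and the toy ratings are hypothetical and only illustrate the idea; Surprise provides its own splitters for real use.

```python
import random

def hold_out_split(ratings, test_fraction=0.2, seed=42):
    """Shuffle once and cut the data into a single train/test pair."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

def k_fold_splits(ratings, n_splits=5, seed=42):
    """Yield n_splits train/test pairs; every rating is tested exactly once."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    for fold in range(n_splits):
        test = shuffled[fold::n_splits]
        train = [r for i, r in enumerate(shuffled) if i % n_splits != fold]
        yield train, test

# Toy (user, movie, rating) triples, made up for illustration
toy_ratings = [(f"u{i % 4}", f"m{i % 5}", 1 + i % 5) for i in range(20)]

train, test = hold_out_split(toy_ratings)
folds = list(k_fold_splits(toy_ratings))
print(len(train), len(test), len(folds))
```

Hold-out is cheaper because each model is trained once, while k-fold averages the error over several splits and is therefore less sensitive to one lucky or unlucky partition.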

### About the dataset:
This dataset (MovieLens 100k) consists of:
- 100,000 ratings (1-5) from 943 users on 1682 movies.
- Each user has rated at least 20 movies.
- Simple demographic info for the users (age, gender, occupation, zip code).
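
As a quick sanity check on these figures, the density of the user-item rating matrix follows directly from the three counts. The variable names are illustrative only.

```python
# Counts taken from the dataset description
n_ratings, n_users, n_movies = 100_000, 943, 1682

# Fraction of all possible (user, movie) pairs that carry a rating
density = n_ratings / (n_users * n_movies)

print(f"density:  {density:.2%}")
print(f"sparsity: {1 - density:.2%}")
print(f"average ratings per user: {n_ratings / n_users:.1f}")
```

Only a small fraction of the matrix is observed, which is why neighborhood methods compute similarities from co-rated items only.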

In [15]:
# %pip install --upgrade scikit-surprise numpy
import pandas as pd
from surprise import Dataset, Reader, Prediction
from surprise import KNNBasic, BaselineOnly
from surprise.model_selection import cross_validate
from surprise.model_selection import KFold
from surprise import accuracy
import numpy as np

In [8]:
# Load the MovieLens 100k dataset (standard in Surprise)
data = Dataset.load_builtin('ml-100k')

print("Data loaded successfully.")

Data loaded successfully.


In [4]:
data

<surprise.dataset.DatasetAutoFolds at 0x11cc62d50>

In [5]:
# custom performance metrics
def calculate_mae(predictions):
    """Calculates Mean Absolute Error (MAE) for a list of predictions."""
    if not predictions:
        return 0
    # The 'r_ui' is the true rating, and 'est' is the estimated rating.
    errors = [abs(true_rating - est_rating) for (_, _, true_rating, est_rating, _) in predictions]
    return np.mean(errors)

def calculate_rmse(predictions):
    """Calculates Root Mean Squared Error (RMSE) for a list of predictions."""
    if not predictions:
        return 0
    # The 'r_ui' is the true rating, and 'est' is the estimated rating.
    squared_errors = [(true_rating - est_rating)**2 for (_, _, true_rating, est_rating, _) in predictions]
    return np.sqrt(np.mean(squared_errors))
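
A small worked example may help to check the two metrics by hand. The rating pairs are made up for illustration, and only the standard library is used.

```python
import math

# (true rating, estimated rating) pairs; toy values chosen for illustration
pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 5.0), (3.0, 1.0)]

errors = [true - est for true, est in pairs]

# MAE: average magnitude of the errors
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: square root of the average squared error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"MAE = {mae:.4f}, RMSE = {rmse:.4f}")
```

Because the errors are squared before averaging, the single large miss (3.0 versus 1.0) weighs more heavily in RMSE than in MAE, so RMSE is never smaller than MAE on the same predictions.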

### Testing and Evaluation
We will use 5-fold cross-validation and test different combinations of collaborative filtering settings:
- Similarity Metrics: cosine, MSD, and pearson
- Number of Neighbors (k): 20, 40, and 60
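
The three similarity measures can be sketched in plain Python. This is a simplified illustration that assumes each measure is computed over co-rated items only and that MSD is turned into a similarity as 1 / (MSD + 1); the function names and the two toy users are hypothetical, and the library's own implementation may differ in details.

```python
import math

def common_items(u, v):
    """Items rated by both users."""
    return sorted(set(u) & set(v))

def cosine_sim(u, v):
    items = common_items(u, v)
    num = sum(u[i] * v[i] for i in items)
    den = (math.sqrt(sum(u[i] ** 2 for i in items))
           * math.sqrt(sum(v[i] ** 2 for i in items)))
    return num / den if den else 0.0

def msd_sim(u, v):
    items = common_items(u, v)
    if not items:
        return 0.0
    msd = sum((u[i] - v[i]) ** 2 for i in items) / len(items)
    return 1.0 / (msd + 1.0)

def pearson_sim(u, v):
    items = common_items(u, v)
    if not items:
        return 0.0
    mean_u = sum(u[i] for i in items) / len(items)
    mean_v = sum(v[i] for i in items) / len(items)
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in items)
    den = (math.sqrt(sum((u[i] - mean_u) ** 2 for i in items))
           * math.sqrt(sum((v[i] - mean_v) ** 2 for i in items)))
    return num / den if den else 0.0

# Toy users: bob rates every shared movie exactly one star lower than alice
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}

for name, sim in [("cosine", cosine_sim), ("MSD", msd_sim), ("pearson", pearson_sim)]:
    print(f"{name:8s} {sim(alice, bob):.4f}")
```

The same pair of users scores very differently under the three measures: Pearson ignores the constant one-star offset, MSD penalizes it, and cosine on raw positive ratings stays close to 1.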

#### We will use the User-Based Collaborative Filtering approach (user_based=True) with the KNNBasic algorithm.
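
For orientation, a user-based k-nearest-neighbors prediction can be sketched as a similarity-weighted average over the k most similar users who rated the item. The function `predict_knn` and the toy numbers are hypothetical; the sketch assumes that only neighbors with positive similarity contribute.

```python
def predict_knn(target_sims, item_ratings, k):
    """Estimate a rating as the similarity-weighted mean of neighbor ratings.

    target_sims:  {neighbor: similarity to the target user}
    item_ratings: {neighbor: rating that neighbor gave to the item}
    """
    candidates = [
        (target_sims.get(v, 0.0), r)
        for v, r in item_ratings.items()
        if target_sims.get(v, 0.0) > 0
    ]
    # Keep the k neighbors with the highest similarity
    top_k = sorted(candidates, key=lambda pair: pair[0], reverse=True)[:k]
    if not top_k:
        return None  # no usable neighbor; a caller would need a fallback
    return sum(s * r for s, r in top_k) / sum(s for s, _ in top_k)

# Toy similarities to the target user and the neighbors' ratings of one movie
sims = {"bob": 0.9, "carol": 0.5, "dave": 0.1}
ratings_for_item = {"bob": 4, "carol": 2, "dave": 5}

print(predict_knn(sims, ratings_for_item, k=2))
print(predict_knn(sims, ratings_for_item, k=3))
```

Raising k brings in less similar users, which smooths the estimate but can also dilute the signal from the closest neighbors.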

In [6]:
import itertools

# Define cross-validation splitter
kf = KFold(n_splits=5, random_state=42, shuffle=True)

# Define configurations to test
similarity_metrics = ['cosine', 'MSD', 'pearson']
k_values = [20, 40, 60]

# Generate all combinations (Cartesian product)
combinations = list(itertools.product(similarity_metrics, k_values))

# Build the final list of configuration dictionaries using a list comprehension
configurations = [
    {
        'sim_options': {'name': sim, 'user_based': True},
        'k': k
    }
    for sim, k in combinations
]

results = []

print("Starting cross-validation and evaluation for all configurations...")

for config in configurations:
    sim_name = config['sim_options']['name']
    k_val = config['k']

    # Set the algorithm with the current configuration.
    # KNNBasic is deterministic, so it needs no random_state; the fold shuffling is seeded in KFold.
    algo = KNNBasic(k=k_val, sim_options=config['sim_options'])

    # Perform 5-fold cross-validation
    mae_list = []
    rmse_list = []

    for trainset, testset in kf.split(data):
        # Train the algorithm
        algo.fit(trainset)

        # Make predictions on the test set
        predictions = algo.test(testset)

        # Calculate MAE and RMSE using the custom functions
        mae_fold = calculate_mae(predictions)
        rmse_fold = calculate_rmse(predictions)

        mae_list.append(mae_fold)
        rmse_list.append(rmse_fold)

    # Calculate average MAE and RMSE over all folds
    avg_mae = np.mean(mae_list)
    avg_rmse = np.mean(rmse_list)

    # Store results
    results.append({
        'Similarity': sim_name,
        'k (Neighbors)': k_val,
        'Mean MAE': avg_mae,
        'Mean RMSE': avg_rmse
    })

print("Evaluation complete.")

# Convert results to a DataFrame for clean display
results_df = pd.DataFrame(results)

Starting cross-validation and evaluation for all configurations...
Computing the cosine similarity matrix...
Done computing similarity matrix.
[... similar messages repeat for each fold; output truncated ...]

In [7]:
results_df

| | Similarity | k (Neighbors) | Mean MAE | Mean RMSE |
|---|---|---|---|---|
| 0 | cosine | 20 | 0.809675 | 1.02523 |
| 1 | cosine | 40 | 0.80439 | 1.017077 |
| 2 | cosine | 60 | 0.804802 | 1.016486 |
| 3 | MSD | 20 | 0.770803 | 0.976807 |
| 4 | MSD | 40 | 0.773483 | 0.978931 |
| 5 | MSD | 60 | 0.778333 | 0.983958 |
| 6 | pearson | 20 | 0.808883 | 1.019592 |
| 7 | pearson | 40 | 0.803166 | 1.011994 |
| 8 | pearson | 60 | 0.802392 | 1.010806 |


Best-performing combination: MSD similarity with k = 20 neighbors (MAE = 0.770803, RMSE = 0.976807).
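
The selection of the best configuration can be reproduced from the reported numbers. The sketch below copies the nine rows of the results into plain tuples and picks the minimum; the variable names are illustrative only.

```python
# (similarity, k, mean MAE, mean RMSE) rows copied from the results above
rows = [
    ("cosine", 20, 0.809675, 1.025230),
    ("cosine", 40, 0.804390, 1.017077),
    ("cosine", 60, 0.804802, 1.016486),
    ("MSD", 20, 0.770803, 0.976807),
    ("MSD", 40, 0.773483, 0.978931),
    ("MSD", 60, 0.778333, 0.983958),
    ("pearson", 20, 0.808883, 1.019592),
    ("pearson", 40, 0.803166, 1.011994),
    ("pearson", 60, 0.802392, 1.010806),
]

# Lower is better for both error metrics
best_by_mae = min(rows, key=lambda row: row[2])
best_by_rmse = min(rows, key=lambda row: row[3])

print("best by MAE: ", best_by_mae)
print("best by RMSE:", best_by_rmse)
```

With the DataFrame itself, `results_df.loc[results_df['Mean RMSE'].idxmin()]` would return the same row.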

#### Baseline model - Naive predictor

In [13]:
# We use BaselineOnly, which predicts r_ui = global_mean + user_bias + item_bias.
# The item bias captures how far an item's ratings deviate from the global mean.
baseline = []

algo = BaselineOnly()

mae_list = []
rmse_list = []

for trainset, testset in kf.split(data):
    algo.fit(trainset)
    predictions = algo.test(testset)

    # Use the custom-implemented functions
    mae_fold = calculate_mae(predictions)
    rmse_fold = calculate_rmse(predictions)

    mae_list.append(mae_fold)
    rmse_list.append(rmse_fold)

# Calculate average MAE and RMSE over all folds
avg_mae = np.mean(mae_list)
avg_rmse = np.mean(rmse_list)

# Store results
baseline.append({
    'Mean MAE': avg_mae,
    'Mean RMSE': avg_rmse
})

print("Evaluation complete.")

baseline_df = pd.DataFrame(baseline)

Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Estimating biases using als...
Evaluation complete.


In [14]:
baseline_df

| | Mean MAE | Mean RMSE |
|---|---|---|
| 0 | 0.748089 | 0.943637 |
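
To make the bias model concrete, here is a minimal sketch of how the user and item biases could be estimated by alternating regularized averages. It is a simplified illustration, not the library's code: the function names are hypothetical, and the defaults `n_epochs=10`, `reg_u=15` and `reg_i=10` are assumptions chosen to resemble commonly documented ALS settings.

```python
def fit_biases(ratings, n_epochs=10, reg_u=15, reg_i=10):
    """Estimate global mean, user biases and item biases from (user, item, rating) triples."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_u = {u: 0.0 for u, _, _ in ratings}
    b_i = {i: 0.0 for _, i, _ in ratings}
    for _ in range(n_epochs):
        # Update item biases while user biases are held fixed
        for item in b_i:
            resid = [r - mu - b_u[u] for u, i, r in ratings if i == item]
            b_i[item] = sum(resid) / (reg_i + len(resid))
        # Update user biases while item biases are held fixed
        for user in b_u:
            resid = [r - mu - b_i[i] for u, i, r in ratings if u == user]
            b_u[user] = sum(resid) / (reg_u + len(resid))
    return mu, b_u, b_i

def predict_baseline(mu, b_u, b_i, user, item):
    """r_ui = global mean + user bias + item bias; unknown users or items get a bias of 0."""
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)

# Toy ratings, made up for illustration
toy = [("u1", "m1", 5), ("u2", "m1", 5), ("u1", "m2", 1), ("u2", "m2", 2), ("u3", "m1", 4)]
mu, b_u, b_i = fit_biases(toy)

print(round(predict_baseline(mu, b_u, b_i, "u3", "m2"), 3))
print(predict_baseline(mu, b_u, b_i, "new_user", "new_movie"))
```

The regularization terms shrink the biases of users and items with few ratings toward zero, so sparse entries fall back toward the global mean.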


In [18]:
def get_item_average_predictions(trainset, testset):
    """
    Predicts a rating for a user/item pair solely as the average of
    that item's ratings in the training set.
    """
    # Calculate global mean for cases where an item is new (cold start)
    global_mean = trainset.global_mean

    # 1. Calculate item averages from the training set
    item_ratings = {}
    # trainset.all_ratings() returns inner IDs (iid)
    for uid, iid, rating in trainset.all_ratings():
        # iid is the inner item ID here, which we use as the key
        if iid not in item_ratings:
            item_ratings[iid] = []
        item_ratings[iid].append(rating)

    item_averages = {
        iid: np.mean(ratings)
        for iid, ratings in item_ratings.items()
    }

    # 2. Generate predictions for the test set
    predictions = []
    for raw_uid, raw_iid, true_r in testset:
        # The test set holds raw IDs, whereas item_averages is keyed by inner IDs,
        # so translate the raw item ID first. to_inner_iid raises ValueError for
        # items that are absent from the training set; fall back to the global mean then.
        try:
            est = item_averages[trainset.to_inner_iid(raw_iid)]
        except ValueError:
            est = global_mean

        # Create a Prediction object (required format for custom MAE/RMSE).
        # The IDs are already raw, so no conversion is needed.
        pred = Prediction(uid=raw_uid, iid=raw_iid, r_ui=true_r, est=est, details={})
        predictions.append(pred)

    return predictions

In [19]:
baseline_avg = []
mae_list = []
rmse_list = []

for trainset, testset in kf.split(data):
    # Build the item-average predictions for this fold
    predictions = get_item_average_predictions(trainset, testset)

    # Use the custom-implemented functions
    mae_fold = calculate_mae(predictions)
    rmse_fold = calculate_rmse(predictions)

    mae_list.append(mae_fold)
    rmse_list.append(rmse_fold)

# Calculate average MAE and RMSE over all folds
avg_mae = np.mean(mae_list)
avg_rmse = np.mean(rmse_list)

# Store results
baseline_avg.append({
    'Mean MAE': avg_mae,
    'Mean RMSE': avg_rmse
})

print("Evaluation complete.")

baseline_avg_df = pd.DataFrame(baseline_avg)
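
Surprise distinguishes raw IDs, as they appear in the ratings file, from contiguous integer inner IDs that are assigned when a trainset is built. A test set carries raw IDs, so passing them to a function that expects inner IDs fails with a "not a valid inner id" error. The toy class below is hypothetical and only mimics that mapping with plain dictionaries to show the safe lookup pattern.

```python
class ToyTrainset:
    """Minimal stand-in for a trainset: maps raw item IDs to inner IDs."""

    def __init__(self, ratings):
        self._raw2inner = {}
        self.item_ratings = {}
        for _, raw_iid, rating in ratings:
            # Assign the next free integer the first time an item is seen
            inner = self._raw2inner.setdefault(raw_iid, len(self._raw2inner))
            self.item_ratings.setdefault(inner, []).append(rating)
        self.global_mean = sum(r for _, _, r in ratings) / len(ratings)

    def to_inner_iid(self, raw_iid):
        try:
            return self._raw2inner[raw_iid]
        except KeyError:
            raise ValueError(f"Item {raw_iid} is not part of the trainset.")

def item_average(trainset, raw_iid):
    """Average rating of an item, or the global mean for unseen items."""
    try:
        inner = trainset.to_inner_iid(raw_iid)
    except ValueError:
        return trainset.global_mean
    ratings = trainset.item_ratings[inner]
    return sum(ratings) / len(ratings)

# Toy (raw user ID, raw item ID, rating) triples, made up for illustration
train = ToyTrainset([("196", "242", 3.0), ("186", "302", 3.0), ("22", "242", 5.0)])

print(item_average(train, "242"))  # item seen in training
print(item_average(train, "999"))  # unseen item, falls back to the global mean
```

Translating from raw to inner IDs once, at the lookup, keeps every trainset-derived dictionary keyed consistently.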

In [20]:
import pandas as pd
from surprise import Dataset, Prediction
from surprise import KNNBasic
from surprise.model_selection import KFold
import numpy as np
import itertools
import warnings

# Filter out common pandas/surprise warnings during cross-validation runs
warnings.filterwarnings('ignore')

# --- 1. Custom Evaluation Functions ---
def calculate_mae(predictions):
    """Calculates Mean Absolute Error (MAE) for a list of predictions."""
    if not predictions:
        return 0
    # The 'r_ui' is the true rating, and 'est' is the estimated rating.
    errors = [abs(true_rating - est_rating) for (_, _, true_rating, est_rating, _) in predictions]
    return np.mean(errors)

def calculate_rmse(predictions):
    """Calculates Root Mean Squared Error (RMSE) for a list of predictions."""
    if not predictions:
        return 0
    # The 'r_ui' is the true rating, and 'est' is the estimated rating.
    squared_errors = [(true_rating - est_rating)**2 for (_, _, true_rating, est_rating, _) in predictions]
    return np.sqrt(np.mean(squared_errors))

# --- 2. Custom Item Average Predictor Function ---
def get_item_average_predictions(trainset, testset):
    """
    Predicts a rating for a user/item pair solely as the average of
    that item's ratings in the training set.
    """
    # Calculate global mean for cases where an item is new (cold start)
    global_mean = trainset.global_mean

    # 1. Calculate item averages from the training set
    item_ratings = {}
    # trainset.all_ratings() returns inner IDs (iid)
    for uid, iid, rating in trainset.all_ratings():
        # iid is the inner item ID here, which we use as the key
        if iid not in item_ratings:
            item_ratings[iid] = []
        item_ratings[iid].append(rating)

    item_averages = {
        iid: np.mean(ratings)
        for iid, ratings in item_ratings.items()
    }

    # 2. Generate predictions for the test set
    predictions = []
    for raw_uid, raw_iid, true_r in testset:
        # The test set holds raw IDs, whereas item_averages is keyed by inner IDs,
        # so translate the raw item ID first. to_inner_iid raises ValueError for
        # items that are absent from the training set; fall back to the global mean then.
        try:
            est = item_averages[trainset.to_inner_iid(raw_iid)]
        except ValueError:
            est = global_mean

        # Create a Prediction object (required format for custom MAE/RMSE).
        # The IDs are already raw, so no conversion is needed.
        pred = Prediction(uid=raw_uid, iid=raw_iid, r_ui=true_r, est=est, details={})
        predictions.append(pred)

    return predictions


# --- 3. Setup and Data Loading ---
print("Loading MovieLens 100k data...")
data = Dataset.load_builtin('ml-100k')
kf = KFold(n_splits=5, random_state=42, shuffle=True)
results = []
all_folds_predictions = []

# --- 4. Configuration Generation using itertools ---
similarity_metrics = ['cosine', 'MSD', 'pearson']
k_values = [20, 40, 60]
cf_combinations = list(itertools.product(similarity_metrics, k_values))

cf_configurations = [
    {
        'Algorithm': 'KNNBasic',
        'Model Details': f"{sim} (k={k})",
        'sim_options': {'name': sim, 'user_based': True},
        'k': k
    }
    for sim, k in cf_combinations
]

# --- 5. Evaluation Loop ---
print(f"Starting evaluation of {len(cf_configurations)} CF models...")

for config in cf_configurations:
    # --- Collaborative Filtering (KNNBasic) Evaluation ---
    # KNNBasic is deterministic, so it needs no random_state; the fold shuffling is seeded in KFold.
    algo = KNNBasic(k=config['k'], sim_options=config['sim_options'])
    mae_list = []
    rmse_list = []

    for trainset, testset in kf.split(data):
        algo.fit(trainset)
        predictions = algo.test(testset)

        mae_list.append(calculate_mae(predictions))
        rmse_list.append(calculate_rmse(predictions))

    results.append({
        'Algorithm': config['Algorithm'],
        'Model Details': config['Model Details'],
        'Mean MAE': np.mean(mae_list),
        'Mean RMSE': np.mean(rmse_list)
    })

# --- 6. Naive Baseline Evaluation ---
print("Starting evaluation of the Naive Item Average Baseline...")
naive_mae_list = []
naive_rmse_list = []

for trainset, testset in kf.split(data):
    # Get predictions using the custom function
    predictions = get_item_average_predictions(trainset, testset)

    naive_mae_list.append(calculate_mae(predictions))
    naive_rmse_list.append(calculate_rmse(predictions))

results.append({
    'Algorithm': 'Naive Baseline',
    'Model Details': 'Strict Item Average',
    'Mean MAE': np.mean(naive_mae_list),
    'Mean RMSE': np.mean(naive_rmse_list)
})

print("Evaluation complete.")

# --- 7. Final Results Summary ---
results_df = pd.DataFrame(results).sort_values(by='Mean RMSE')
print("\n--- Summary of All Results ---")
# DataFrame.to_markdown requires the optional 'tabulate' package
print(results_df.to_markdown(index=False))

# --- Finding Best CF and Baseline for Reporting ---
# results_df is sorted by RMSE, so the first matching row is the best one
best_cf = results_df[results_df['Algorithm'] == 'KNNBasic'].iloc[0]
naive_baseline = results_df[results_df['Algorithm'] == 'Naive Baseline'].iloc[0]

print(f"\nBest Collaborative Filtering (CF) Model: {best_cf['Model Details']}")
print(f"  RMSE: {best_cf['Mean RMSE']:.4f}, MAE: {best_cf['Mean MAE']:.4f}")
print("Strict Naive Baseline Model:")
print(f"  RMSE: {naive_baseline['Mean RMSE']:.4f}, MAE: {naive_baseline['Mean MAE']:.4f}")

Loading MovieLens 100k data...
Starting evaluation of 9 CF models...
Computing the cosine similarity matrix...
Done computing similarity matrix.
[... similar messages repeat for each fold; output truncated ...]