# The Project

The project focuses on building and evaluating **recommender systems** using two types of models:

1. **Matrix Factorization Models (SVD, SVD++) with an ALS-fitted bias baseline:**  
   - Decompose the user–item rating matrix into latent factors.  
   - Capture underlying patterns in user preferences and item characteristics.  
   - Provide accurate rating predictions and serve as strong baselines in recommender systems research.  

2. **Neural Collaborative Filtering Models (NCF, NeuMF):**  
   - Replace linear factorization with neural networks to learn non-linear interactions.  
   - Combine embeddings of users and items through deep layers (MLP) or hybrid architectures (NeuMF = GMF + MLP).  
   - Aim to outperform traditional matrix factorization by modeling complex relationships.  

The comparison of these two families of models traces the shift in recommender systems from classical factorization to deep learning methods, and checks whether the extra model capacity pays off on a small dataset.


# Dataset: MovieLens 100K

The experiments are based on the **MovieLens 100K dataset**, a widely used benchmark in recommender systems research.  

- **Size:** 100,000 ratings  
- **Users:** 943  
- **Movies:** 1,682  
- **Format:** plain-text files without header rows; `u.data` (ratings) is tab-delimited and `u.item` (movie metadata) is pipe-delimited

## Main Columns
- **userId** – unique identifier of each user (anonymized, 1–943).  
- **movieId** – unique identifier of each movie (1–1682).  
- **rating** – explicit rating from 1 to 5, where higher values indicate stronger preference.  
- **timestamp** – UNIX time indicating when the rating was made.  
- **title** (from `u.item`) – the name of the movie.  
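- **genres** (from `u.item`) – one or more genres assigned to each movie (e.g., Action, Comedy), stored as binary flag columns. The loading code in this notebook reads only the movie ID and title, so genres are not used in the experiments.  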

This dataset is small enough to allow fast experimentation, yet rich enough to demonstrate the strengths and weaknesses of different recommendation algorithms.
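As a quick check on how sparse the rating matrix is, the counts listed for the dataset can be plugged into a few lines of Python. This is only arithmetic on those figures; the variable names are illustrative and not part of the notebook's pipeline.

```python
# Counts taken from the dataset summary (100,000 ratings, 943 users, 1,682 movies).
n_ratings = 100_000
n_users = 943
n_movies = 1_682

# Fraction of the user-item matrix that actually holds a rating.
density = n_ratings / (n_users * n_movies)
sparsity = 1.0 - density

print(f"density:  {density:.4f}")
print(f"sparsity: {sparsity:.4f}")
```

Only about 6% of the possible user–movie pairs carry a rating, which is the gap that the models in this notebook try to fill.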


In [None]:
!pip uninstall -y scikit-surprise
!pip install numpy==1.26.4 --force-reinstall

In [None]:
!pip install scikit-surprise --no-binary scikit-surprise --quiet

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import zipfile
import pandas as pd

# Path to the ZIP file in Google Drive
zip_path = "/content/drive/MyDrive/Portfolio datasets/Recommender engine/ml-100k.zip"
extract_path = "/content/ml-100k"

# Extract the dataset
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(extract_path)

# Load the ratings (u.data)
data_path = f"{extract_path}/ml-100k/u.data"
df = pd.read_csv(
    data_path,
    sep="\t",
    names=["userId", "movieId", "rating", "timestamp"]
)

# Load the movie titles (u.item)
item_path = f"{extract_path}/ml-100k/u.item"
movies = pd.read_csv(
    item_path,
    sep="|",
    encoding="latin-1",
    header=None,
    usecols=[0, 1],
    names=["movieId", "title"]
)

# Merge ratings with movie titles
df_merged = pd.merge(df, movies, on="movieId")

print("Data shape:", df_merged.shape)
print(df_merged.head())


In [None]:
print("Unique users:", df_merged["userId"].nunique())
print("Unique movies:", df_merged["movieId"].nunique())


# Rating Distribution

In [None]:
import matplotlib.pyplot as plt

rating_counts = df_merged["rating"].value_counts().sort_index()

print(rating_counts)

rating_counts.plot(kind="bar")
plt.xlabel("Rating")
plt.ylabel("Count")
plt.title("Distribution of Ratings in MovieLens 100K")
plt.show()


# Recommender Engines

## Matrix Factorization (SVD / SVD++ / ALS)

**How it works:**  
Decomposes the user–item rating matrix into latent factors (vectors for users and items).
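The idea can be sketched in a few lines of NumPy. This is a toy illustration of biased matrix factorization trained with SGD, not the Surprise implementation used later in the notebook; the sizes, the global mean `mu`, the learning rate, and the names `P`, `Q`, `b_user`, `b_item` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, n_factors = 4, 5, 3   # toy sizes, not MovieLens
mu = 3.5                                # assumed global mean rating

# Latent factors and biases (hypothetical names).
P = rng.normal(scale=0.1, size=(n_users, n_factors))   # user factors
Q = rng.normal(scale=0.1, size=(n_items, n_factors))   # item factors
b_user = np.zeros(n_users)
b_item = np.zeros(n_items)

def predict(u, i):
    """Biased MF prediction: mu + b_u + b_i + p_u . q_i"""
    return mu + b_user[u] + b_item[i] + P[u] @ Q[i]

def sgd_step(u, i, r, lr=0.01, reg=0.02):
    """One regularized SGD update on a single observed rating."""
    err = r - predict(u, i)
    p_old = P[u].copy()                  # keep the old user vector for the item update
    b_user[u] += lr * (err - reg * b_user[u])
    b_item[i] += lr * (err - reg * b_item[i])
    P[u] += lr * (err * Q[i] - reg * P[u])
    Q[i] += lr * (err * p_old - reg * Q[i])

# Repeatedly fit one observed rating and watch the error shrink.
u, i, r = 0, 2, 5.0
before = abs(r - predict(u, i))
for _ in range(50):
    sgd_step(u, i, r)
after = abs(r - predict(u, i))
print(f"abs error before: {before:.3f}, after 50 steps: {after:.3f}")
```

A real model loops over all observed ratings for several epochs; the single-rating loop here only shows the direction of the updates.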

**Advantages:**  
- Simple and very powerful  
- Serves as the foundation for many real-world recommender engines  

**Performance:**  
- RMSE ≈ 0.92–0.94 on MovieLens 100K in the 5-fold cross-validation reported in this notebook  
- SVD++ scores slightly better than plain SVD  

**Limitation:**  
- Difficult to incorporate side information (content, genres, metadata)  
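
For reference, the prediction rules commonly given for these two models are written out below. The notation is an assumption introduced here and is not defined elsewhere in this notebook: $\mu$ is the global mean rating, $b_u$ and $b_i$ are user and item biases, $p_u$ and $q_i$ are latent vectors, $I_u$ is the set of items rated by user $u$, and $y_j$ are the implicit-feedback item factors that SVD++ adds.

```latex
% SVD (biased matrix factorization)
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u

% SVD++ (adds an implicit-feedback term built from the items the user rated)
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} \Big( p_u + |I_u|^{-1/2} \sum_{j \in I_u} y_j \Big)
```

The only difference is the extra sum over $y_j$, which shifts the user vector according to which items the user rated, independent of the rating values.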


# Recommender System Evaluation with Cross-Validation

This code evaluates different recommender models (**SVD, SVD++, ALS**) on the **MovieLens 100K** dataset using **5-Fold Cross-Validation**.  
It computes both **rating accuracy (RMSE)** and **Top-K recommendation metrics (Precision, Recall, F1, NDCG, HitRate)** for K = 5, 10, 20.

---


## 1. Helper Function: `metrics_at_k`
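- Groups the test-set predictions per user, so each user's ranking covers only the items that user rated in the held-out fold.  
- For each user:
  - Sorts the items by predicted rating in descending order.  
  - Defines "relevant items" as those with true rating ≥ 4.  
  - For each `k` in {5, 10, 20}:  
    - **Precision@k** = relevant items in the top k / k (the denominator stays k even when the user has fewer than k test items)  
    - **Recall@k** = relevant items in the top k / total relevant items for that user  
    - **F1@k** = harmonic mean of Precision@k and Recall@k  
    - **NDCG@k** = DCG@k / IDCG@k with binary relevance, so relevant items near the top of the list count more  
    - **HitRate@k** = 1 if at least one relevant item appears in the top k, otherwise 0  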

The function returns **average metrics across all users**.
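
The per-user part of these definitions can be traced by hand on a toy example. The sketch below uses the same formulas as the description above but for a single user; `user_metrics_at_k` and the toy ratings are hypothetical and exist only for this illustration.

```python
import numpy as np

def user_metrics_at_k(true_ratings_ranked, k, threshold=4):
    """Top-k metrics for one user.

    `true_ratings_ranked` holds the user's true ratings, already ordered
    by predicted rating (best prediction first).
    """
    relevant = [r >= threshold for r in true_ratings_ranked]
    n_rel = sum(relevant)
    hits = sum(relevant[:k])

    precision = hits / k
    recall = hits / n_rel if n_rel > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits > 0 else 0.0

    # Binary-relevance DCG: each relevant item at position pos contributes 1/log2(pos + 2).
    dcg = sum(1 / np.log2(pos + 2) for pos, rel in enumerate(relevant[:k]) if rel)
    idcg = sum(1 / np.log2(pos + 2) for pos in range(min(n_rel, k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    return {"Precision": precision, "Recall": recall, "F1": f1,
            "NDCG": ndcg, "HitRate": int(hits > 0)}

# Toy user: five test items ranked by predicted rating; values are the true ratings.
toy = [5, 3, 4, 2, 4]
m = user_metrics_at_k(toy, k=3)
print({name: round(float(v), 3) for name, v in m.items()})
```

Two of the top three items are relevant and the user has three relevant items in total, so Precision@3 and Recall@3 are both 2/3. NDCG@3 is about 0.70 because the second slot is taken by an item rated 3.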

---

## 2. Data Preparation
- Extracts only `userId`, `movieId`, `rating` from the merged MovieLens dataset.  
- Converts into Surprise `Dataset` with ratings in the range 1–5.

---

## 3. Models
- **SVD** – basic matrix factorization.  
- **SVD++** – extends SVD with implicit feedback (which items a user rated, regardless of the rating value).  
- **ALS (BaselineOnly)** – bias-only baseline (global mean plus user and item biases) fitted with Alternating Least Squares; it has no latent factors.

---

## 4. Cross-Validation Loop
- Uses **5-Fold CV** (`KFold(n_splits=5, shuffle=True, random_state=42)`).  
- For each fold and model:
  - Train on trainset, predict on testset.  
  - Compute **RMSE** on testset.  
  - Compute **Precision, Recall, F1, NDCG, HitRate** for K=5,10,20.  
  - Collect metrics across folds.

---

## 5. Final Output

| Model          | RMSE (mean) | Precision@5 | Recall@5 | F1@5  | NDCG@5 | HitRate@5 | Precision@10 | Recall@10 | F1@10 | NDCG@10 | HitRate@10 | Precision@20 | Recall@20 | F1@20 | NDCG@20 | HitRate@20 |
|----------------|-------------|-------------|----------|-------|--------|-----------|--------------|-----------|-------|---------|-------------|--------------|-----------|-------|---------|-------------|
| **SVD**        | **0.9349**  | 0.696       | 0.518    | 0.497 | 0.804  | 0.975     | 0.577        | 0.715     | 0.539 | 0.823   | 0.979       | 0.423        | 0.865     | 0.488 | 0.844   | 0.980       |
| **SVD++**      | **0.9195**  | 0.706       | 0.522    | 0.503 | 0.815  | 0.975     | 0.582        | 0.720     | 0.544 | 0.831   | 0.980       | 0.425        | 0.867     | 0.490 | 0.851   | 0.980       |
| **ALS (Base)** | **0.9436**  | 0.691       | 0.517    | 0.495 | 0.799  | 0.975     | 0.572        | 0.713     | 0.536 | 0.818   | 0.979       | 0.420        | 0.863     | 0.486 | 0.841   | 0.980       |

---

## Insights
- **SVD++ performed best overall**: lowest RMSE (0.9195) and strongest Top-K metrics (Precision, Recall, NDCG).  
- **SVD** was slightly weaker but still strong, with performance close to SVD++.  
- **ALS (Baseline)** provided a solid baseline but consistently underperformed compared to SVD and SVD++.  
- Across all models:
  - **HitRate@5,10,20 ≈ 0.98** → almost every user got at least one relevant recommendation in the top list.  
  - **Recall increases with K** (users see more relevant items as K grows).  
  - **Precision decreases with K** (top-5 is more precise than top-20).  
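
To make the ALS (Baseline) row concrete: a bias-only baseline predicts the global mean plus a user bias and an item bias, and alternating least squares solves for one set of biases while the other is held fixed. The sketch below is a simplified NumPy version of that idea on toy data, not Surprise's code. The regularization values (`reg_u=15`, `reg_i=10`) and the epoch count are assumptions chosen to resemble the library's defaults.

```python
import numpy as np

# Toy ratings as (user, item, rating) triples; not MovieLens data.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 2), (2, 1, 4), (2, 2, 1)]
n_users, n_items = 3, 3

def als_baselines(ratings, n_users, n_items, reg_u=15.0, reg_i=10.0, n_epochs=10):
    """Alternately solve for item and user biases with the other set held fixed."""
    mu = np.mean([r for _, _, r in ratings])
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    for _ in range(n_epochs):
        # Item step: regularized average residual per item.
        num, cnt = np.zeros(n_items), np.zeros(n_items)
        for u, i, r in ratings:
            num[i] += r - mu - b_u[u]
            cnt[i] += 1
        b_i = num / (reg_i + cnt)

        # User step: same closed form with the updated item biases.
        num, cnt = np.zeros(n_users), np.zeros(n_users)
        for u, i, r in ratings:
            num[u] += r - mu - b_i[i]
            cnt[u] += 1
        b_u = num / (reg_u + cnt)
    return mu, b_u, b_i

mu, b_u, b_i = als_baselines(ratings, n_users, n_items)
pred = mu + b_u[0] + b_i[2]
print(f"global mean: {mu:.3f}, prediction for user 0 on item 2: {pred:.3f}")
```

Because the model has no latent factors, every user sees the same item ordering up to a constant shift, which is one plausible reason it trails SVD and SVD++ in the table.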



# Conclusions from Results

1. **Overall Performance**
   - The models achieved strong performance on the MovieLens 100K dataset.
   - All models reached a **HitRate@K ≈ 0.98**, meaning almost every user received at least one relevant recommendation in the top list.

2. **Model Comparison**
   - **SVD++** delivered the best performance:
     - Lowest RMSE (≈ 0.920)
     - Highest Precision, Recall, and NDCG across all K values
   - **SVD** performed slightly worse (RMSE ≈ 0.935) but remained competitive, showing it is still a strong baseline.
   - **ALS (Baseline)** was consistently weaker (RMSE ≈ 0.944) but provides a fast and simple benchmark.

3. **Top-K Behavior**
   - **Precision@K decreases** as K increases (e.g., ~0.71 at K=5 vs. ~0.42 at K=20).  
     → More recommendations mean less accuracy per item.
   - **Recall@K increases** with K (e.g., ~0.52 at K=5 vs. ~0.87 at K=20).  
     → Longer recommendation lists capture more relevant items.
   - **NDCG values** confirm that relevant items are ranked near the top, especially for SVD++.

4. **Key Insight**
   - **SVD++ is the best choice** for this dataset, balancing both rating prediction accuracy (RMSE) and recommendation quality (Top-K metrics).
   - Traditional **SVD** is a solid and simpler alternative.
   - **ALS** can serve as a baseline but should not be the final choice for production.
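
The Top-K behavior in item 3 follows directly from the metric definitions, as a small hand-made example shows. The relevance flags below are hypothetical and describe one imaginary user's test items in ranked order.

```python
# Relevance flags for one hypothetical user's test items, in ranked order
# (1 = true rating >= 4). Four of the ten items are relevant.
ranked_relevance = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
n_relevant = sum(ranked_relevance)

curve = {}
for k in (2, 5, 10):
    hits = sum(ranked_relevance[:k])
    curve[k] = (hits / k, hits / n_relevant)   # (precision, recall)
    print(f"K={k:>2}  precision={curve[k][0]:.2f}  recall={curve[k][1]:.2f}")
```

Recall can only stay level or rise as K grows, while precision tends to fall once the strongest predictions are used up. In this notebook each test fold holds about 20,000 ratings, roughly 21 per user on average, so at K=20 the list covers most of a typical user's test items. That would explain why Recall@20 is high and Precision@20 sits close to the share of relevant items among a user's test ratings.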

---


In [None]:
import numpy as np
import pandas as pd
from surprise import Dataset, Reader, SVD, SVDpp, BaselineOnly, accuracy
from surprise.model_selection import KFold
from collections import defaultdict

# === Helper: compute metrics for multiple K values ===
def metrics_at_k(predictions, ks=(5, 10, 20)):
    # Group (item, estimated rating, true rating) triples by user from Surprise predictions
    user_ratings = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        user_ratings[uid].append((iid, est, true_r))

    results = {k: {"Precision": [], "Recall": [], "F1": [], "NDCG": [], "HitRate": []} for k in ks}

    for uid, ratings in user_ratings.items():
        ratings.sort(key=lambda x: x[1], reverse=True)

        # Relevant = rating >= 4
        rel = [r for (_, _, r) in ratings if r >= 4]
        n_rel = len(rel)

        for k in ks:
            top_k = ratings[:k]
            rec = [iid for (iid, _, r) in top_k if r >= 4]
            n_rel_and_rec_k = len(rec)

            precision = n_rel_and_rec_k / k if k > 0 else 0
            recall = n_rel_and_rec_k / n_rel if n_rel > 0 else 0
            f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) > 0 else 0

            dcg = sum([1 / np.log2(idx+2) for idx, (iid, _, r) in enumerate(top_k) if r >= 4])
            idcg = sum([1 / np.log2(idx+2) for idx in range(min(n_rel, k))])
            ndcg = dcg / idcg if idcg > 0 else 0

            hit = 1 if n_rel_and_rec_k > 0 else 0

            results[k]["Precision"].append(precision)
            results[k]["Recall"].append(recall)
            results[k]["F1"].append(f1)
            results[k]["NDCG"].append(ndcg)
            results[k]["HitRate"].append(hit)

    # Average across users
    return {
        k: {m: np.mean(vals) for m, vals in metrics.items()}
        for k, metrics in results.items()
    }

# === Prepare data ===
ratings_df = df_merged[["userId", "movieId", "rating"]]
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings_df, reader)

# === Models ===
models = {
    "SVD": SVD(),
    "SVD++": SVDpp(),
    "ALS (Baseline)": BaselineOnly()
}

# === Cross-validation ===
kf = KFold(n_splits=5, random_state=42, shuffle=True)
results = []

for name, model in models.items():
    print(f"\n=== Training {name} ===")
    rmses = []
    metrics_all = {5: [], 10: [], 20: []}

    for fold, (trainset, testset) in enumerate(kf.split(data), 1):
        print(f" Fold {fold} ...")
        model.fit(trainset)
        predictions = model.test(testset)

        # RMSE
        rmse = accuracy.rmse(predictions, verbose=False)
        print(f"   RMSE: {rmse:.4f}")
        rmses.append(rmse)

        # Top-K metrics
        metrics = metrics_at_k(predictions, ks=[5, 10, 20])
        for k in metrics:
            metrics_all[k].append(metrics[k])

    # Aggregate
    row = {"Model": name, "RMSE (mean)": np.mean(rmses)}
    for k in [5, 10, 20]:
        avg_metrics = {m+f"@{k}": np.mean([fold[m] for fold in metrics_all[k]]) for m in metrics_all[k][0]}
        row.update(avg_metrics)
    results.append(row)

# === Final results ===
results_df = pd.DataFrame(results)
print("\n=== 5-Fold Cross-Validation Results (Aggregated) ===")
print(results_df)


# NeuMF 5-Fold Cross-Validation: Code Overview

## What the code does
This script trains and evaluates a **NeuMF (Neural Matrix Factorization)** recommender on MovieLens 100K using **5-fold cross-validation**, reporting both **RMSE** and **Top-K ranking metrics**.

## Data handling
- Expects a pre-built `df_merged` with `userId`, `movieId`, `rating`.
- Converts `userId`/`movieId` to zero-based indices.
- Wraps samples in a `RatingsDataset` and uses PyTorch `DataLoader` for batching.

## Model architecture (NeuMF)
- **GMF branch:** user/item embeddings with element-wise product to capture linear interactions.
- **MLP branch:** separate user/item embeddings concatenated and passed through fully-connected layers (default hidden sizes: `[64, 32, 16]`) with ReLU.
- **Fusion:** concatenation of GMF output and MLP output, followed by a final linear layer to predict a rating.
- Default embedding sizes: `emb_size_gmf=32`, `emb_size_mlp=32`.
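
The tensor shapes through the two branches can be followed with a NumPy-only sketch of the forward pass. It uses random, untrained weights and omits biases in the output layer, so it illustrates the data flow with the default sizes listed above rather than reproducing the PyTorch model; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items = 943, 1682
emb_gmf, emb_mlp, hidden = 32, 32, (64, 32, 16)
batch = 4

# A small batch of zero-based user and item indices.
users = rng.integers(0, n_users, size=batch)
items = rng.integers(0, n_items, size=batch)

# Separate embedding tables for each branch (random stand-ins for learned weights).
U_gmf = rng.normal(size=(n_users, emb_gmf))
I_gmf = rng.normal(size=(n_items, emb_gmf))
U_mlp = rng.normal(size=(n_users, emb_mlp))
I_mlp = rng.normal(size=(n_items, emb_mlp))

# GMF branch: element-wise product of user and item embeddings.
gmf = U_gmf[users] * I_gmf[items]                              # (batch, 32)

# MLP branch: concatenate embeddings, then Linear + ReLU per hidden layer.
mlp_out = np.concatenate([U_mlp[users], I_mlp[items]], axis=1)  # (batch, 64)
for h in hidden:
    W = rng.normal(scale=0.1, size=(mlp_out.shape[1], h))
    b = np.zeros(h)
    mlp_out = np.maximum(mlp_out @ W + b, 0.0)                  # ends at (batch, 16)

# Fusion: concatenate both branches and map to one predicted rating per pair.
fused = np.concatenate([gmf, mlp_out], axis=1)                  # (batch, 48)
w_out = rng.normal(scale=0.1, size=fused.shape[1])
pred = fused @ w_out                                            # (batch,)

print(gmf.shape, mlp_out.shape, fused.shape, pred.shape)
```

The final layer therefore sees 32 GMF features plus 16 MLP features, matching `emb_size_gmf + hidden[-1]` in the model definition.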

## Training setup
- Optimizer: Adam (`lr=0.001`).
- Loss: Mean Squared Error (predicting explicit ratings 1–5).
- Epochs: `5` per fold.
- Batch size: `512`.
- For each fold, a **fresh NeuMF model is initialized**, trained, and evaluated.
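
To put this training budget in perspective, the number of optimizer updates per fold can be worked out from the settings above. The calculation assumes the 100,000 ratings split evenly into five folds; the variable names are illustrative.

```python
import math

n_ratings = 100_000
n_folds = 5
batch_size = 512
epochs = 5

# Four of the five folds are used for training in each round.
train_size = n_ratings * (n_folds - 1) // n_folds
steps_per_epoch = math.ceil(train_size / batch_size)   # the last batch is partial
total_updates = steps_per_epoch * epochs

print(train_size, steps_per_epoch, total_updates)
```

Each fresh model receives fewer than 800 gradient updates, so the NeuMF results are probably best read as a lightly trained reference point rather than a tuned upper bound.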

## Evaluation
- **RMSE** on the test split of each fold.
- **Top-K metrics** computed by `metrics_at_k` for K ∈ {5, 10, 20}:
  - Precision@K, Recall@K, F1@K
  - NDCG@K
  - HitRate@K
- An item is considered **relevant** if `true_rating ≥ 4`.
- The script prints per-fold results and then an aggregated mean across folds.

---

# NeuMF 5-Fold Cross-Validation: Results and Conclusions

## Results

### Training and Test RMSE per Fold
- **Fold 1:** Final Train RMSE = 0.9789, Test RMSE = 0.9936  
- **Fold 2:** Final Train RMSE = 0.9802, Test RMSE = 0.9905  
- **Fold 3:** Final Train RMSE = 0.9787, Test RMSE = 0.9999  
- **Fold 4:** Final Train RMSE = 0.9801, Test RMSE = 0.9971  
- **Fold 5:** Final Train RMSE = 0.9764, Test RMSE = 0.9987  

### Averaged Results across 5 Folds
- **RMSE:** 0.996  
- **Precision@5:** 0.673  
- **Recall@5:** 0.505  
- **F1@5:** 0.483  
- **NDCG@5:** 0.777  
- **HitRate@5:** 0.974  
- **Precision@10:** 0.559  
- **Recall@10:** 0.703  
- **F1@10:** 0.525  
- **NDCG@10:** 0.798  
- **HitRate@10:** 0.979  
- **Precision@20:** 0.412  
- **Recall@20:** 0.855  
- **F1@20:** 0.478  
- **NDCG@20:** 0.824  
- **HitRate@20:** 0.980  

## Conclusions

1. **Overall Performance**
   - NeuMF achieved stable performance across all folds with consistent RMSE and Top-K metrics.
   - HitRate remained very high (≈0.98) for all K, showing nearly every user received at least one relevant recommendation.

2. **Accuracy**
   - RMSE averaged around 0.996, which is higher (worse) than SVD++ (≈0.920) and SVD (≈0.935).
   - This indicates that NeuMF did not outperform matrix factorization under this training setup.

3. **Top-K Metrics**
   - Precision@K decreases as K increases (from ≈0.67 at K=5 down to ≈0.41 at K=20).
   - Recall@K increases with K (from ≈0.51 at K=5 up to ≈0.86 at K=20).
   - NDCG rises from ≈0.78 at K=5 to ≈0.82 at K=20, indicating that relevant items tend to be ranked near the top of each list.

4. **Key Insight**
   - While NeuMF provides solid recommendations, its RMSE is worse than classical matrix factorization methods in this experiment.
   - More epochs, larger embeddings, and hyperparameter tuning may improve NeuMF, but whether it would then surpass SVD++ was not tested here.


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
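from torch.utils.data import Dataset as TorchDataset, DataLoader  # alias keeps surprise's Dataset usable
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold  # note: rebinds the KFold name imported from surprise earlier
from collections import defaultdict

# === Dataset Wrapper ===
class RatingsDataset(TorchDataset):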
    def __init__(self, df):
        self.users = torch.tensor(df["userId"].values, dtype=torch.long)
        self.items = torch.tensor(df["movieId"].values, dtype=torch.long)
        self.ratings = torch.tensor(df["rating"].values, dtype=torch.float32)

    def __len__(self):
        return len(self.ratings)

    def __getitem__(self, idx):
        return self.users[idx], self.items[idx], self.ratings[idx]

# === NeuMF Model ===
class NeuMF(nn.Module):
    def __init__(self, n_users, n_items, emb_size_gmf=32, emb_size_mlp=32, hidden=(64, 32, 16)):
        super().__init__()

        # GMF embeddings
        self.user_emb_gmf = nn.Embedding(n_users, emb_size_gmf)
        self.item_emb_gmf = nn.Embedding(n_items, emb_size_gmf)

        # MLP embeddings
        self.user_emb_mlp = nn.Embedding(n_users, emb_size_mlp)
        self.item_emb_mlp = nn.Embedding(n_items, emb_size_mlp)

        # MLP layers
        mlp_layers = []
        input_size = emb_size_mlp * 2
        for h in hidden:
            mlp_layers.append(nn.Linear(input_size, h))
            mlp_layers.append(nn.ReLU())
            input_size = h
        self.mlp = nn.Sequential(*mlp_layers)

        # Final prediction layer
        self.output = nn.Linear(emb_size_gmf + hidden[-1], 1)

    def forward(self, users, items):
        gmf_u = self.user_emb_gmf(users)
        gmf_i = self.item_emb_gmf(items)
        gmf = gmf_u * gmf_i

        mlp_u = self.user_emb_mlp(users)
        mlp_i = self.item_emb_mlp(items)
        mlp = self.mlp(torch.cat([mlp_u, mlp_i], dim=1))

        x = torch.cat([gmf, mlp], dim=1)
        return self.output(x).squeeze(-1)  # drop only the last dim so a batch of one stays 1-D

# === Helper: Top-K metrics ===
def metrics_at_k(users, items, ratings, preds, ks=(5, 10, 20)):
    # NeuMF variant of the helper: takes parallel sequences instead of Surprise prediction tuples
    user_ratings = defaultdict(list)
    for u, i, r, p in zip(users, items, ratings, preds):
        user_ratings[int(u)].append((i, p, r))

    results = {k: {"Precision": [], "Recall": [], "F1": [], "NDCG": [], "HitRate": []} for k in ks}

    for uid, ratings in user_ratings.items():
        ratings.sort(key=lambda x: x[1], reverse=True)
        rel = [r for (_, _, r) in ratings if r >= 4]
        n_rel = len(rel)

        for k in ks:
            top_k = ratings[:k]
            rec = [iid for (iid, _, r) in top_k if r >= 4]
            n_rel_and_rec_k = len(rec)

            precision = n_rel_and_rec_k / k if k > 0 else 0
            recall = n_rel_and_rec_k / n_rel if n_rel > 0 else 0
            f1 = (2*precision*recall / (precision+recall)) if (precision+recall)>0 else 0

            dcg = sum([1/np.log2(idx+2) for idx,(iid,_,r) in enumerate(top_k) if r>=4])
            idcg = sum([1/np.log2(idx+2) for idx in range(min(n_rel, k))])
            ndcg = dcg/idcg if idcg>0 else 0

            hit = 1 if n_rel_and_rec_k>0 else 0

            results[k]["Precision"].append(precision)
            results[k]["Recall"].append(recall)
            results[k]["F1"].append(f1)
            results[k]["NDCG"].append(ndcg)
            results[k]["HitRate"].append(hit)

    return {k: {m: np.mean(vals) for m, vals in metrics.items()} for k, metrics in results.items()}

# === Load ratings data (from df_merged) ===
ratings_df = df_merged[["userId", "movieId", "rating"]].copy()
ratings_df["userId"] -= 1
ratings_df["movieId"] -= 1

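# Embedding tables are indexed by ID, so size them by the largest index rather than the count of distinct IDs
n_users = int(ratings_df["userId"].max()) + 1
n_items = int(ratings_df["movieId"].max()) + 1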

# === 5-Fold Cross Validation ===
kf = KFold(n_splits=5, shuffle=True, random_state=42)
results = []

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for fold, (train_idx, test_idx) in enumerate(kf.split(ratings_df), 1):
    print(f"\n=== Fold {fold} ===")
    train_df = ratings_df.iloc[train_idx]
    test_df = ratings_df.iloc[test_idx]

    train_loader = DataLoader(RatingsDataset(train_df), batch_size=512, shuffle=True)
    test_loader = DataLoader(RatingsDataset(test_df), batch_size=512, shuffle=False)

    # Init new NeuMF each fold
    model = NeuMF(n_users, n_items).to(device)
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # Training
    epochs = 5
    for epoch in range(epochs):
        model.train()
        train_loss = 0
        for users, items, ratings in train_loader:
            users, items, ratings = users.to(device), items.to(device), ratings.to(device)
            optimizer.zero_grad()
            preds = model(users, items)
            loss = criterion(preds, ratings)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * len(ratings)
        print(f"  Epoch {epoch+1}/{epochs}, Train RMSE: {np.sqrt(train_loss/len(train_df)):.4f}")

    # Evaluation
    model.eval()
    test_preds, test_truth, test_users, test_items = [], [], [], []
    with torch.no_grad():
        for users, items, ratings in test_loader:
            users, items = users.to(device), items.to(device)
            preds = model(users, items).cpu().numpy()
            test_preds.extend(preds)
            test_truth.extend(ratings.numpy())
            test_users.extend(users.cpu().numpy())
            test_items.extend(items.cpu().numpy())

    rmse = np.sqrt(np.mean((np.array(test_preds) - np.array(test_truth))**2))
    print(f"  Fold {fold} RMSE: {rmse:.4f}")

    # Top-K
    metrics = metrics_at_k(test_users, test_items, test_truth, test_preds, ks=[5,10,20])

    row = {"Fold": fold, "RMSE": rmse}
    for k in [5,10,20]:
        for m,v in metrics[k].items():
            row[m+f"@{k}"] = v
    results.append(row)

# === Aggregate Results ===
results_df = pd.DataFrame(results)
print("\n=== NeuMF 5-Fold CV Results (Per Fold) ===")
print(results_df)

avg_results = results_df.mean(numeric_only=True)
print("\n=== NeuMF 5-Fold CV Results (Averaged) ===")
print(avg_results)
