<a href="https://colab.research.google.com/github/harishkulkarni10/ecommerce-session-recommender/blob/main/3_Baseline_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Notebook 3 — Baseline Models

**Goal:**  
To build and evaluate non-neural baseline recommenders as performance benchmarks for future deep models.

### Why Baselines Matter
Before building GRU or Transformer models, it’s important to know how simple models perform.  
If a deep model can’t beat a simple popularity or co-occurrence baseline, it’s not worth deploying.

### Baselines Implemented
1. **Popularity-Based Recommender** — Recommends globally popular products.  
2. **Item-Based kNN Recommender** — Recommends products similar to those viewed in the session (based on co-occurrence).

### Evaluation Metrics
- **Recall@K:** Whether the true next item appears in the top-K recommendations.  
- **MRR@K:** How highly ranked the true next item is among recommendations.
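
To make the two metrics concrete, here is a minimal sketch that mirrors the `recall_at_k` and `mrr_at_k` helpers used in the evaluation cells of this notebook. The ranked list and targets (`toy_preds`, 13, 99) are made up purely for illustration.

```python
def recall_at_k(preds, target, k):
    """Return 1.0 if the target is among the first k predictions, else 0.0."""
    return 1.0 if target in preds[:k] else 0.0


def mrr_at_k(preds, target, k):
    """Return 1 / rank of the target within the first k predictions, else 0.0."""
    for rank, item in enumerate(preds[:k], start=1):
        if item == target:
            return 1.0 / rank
    return 0.0


# Hypothetical ranked recommendations (not taken from the dataset)
toy_preds = [42, 7, 13, 99, 5]

# Target 13 sits at rank 3, so it is a hit at K=3 with reciprocal rank 1/3
hit_recall = recall_at_k(toy_preds, 13, k=3)
hit_mrr = mrr_at_k(toy_preds, 13, k=3)

# Target 99 sits at rank 4, outside the top 3, so both metrics are 0
miss_recall = recall_at_k(toy_preds, 99, k=3)
miss_mrr = mrr_at_k(toy_preds, 99, k=3)

print(hit_recall, hit_mrr, miss_recall, miss_mrr)
```

The reported Recall@K and MRR@K are the averages of these per-session values over all test examples.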

### Outputs
- Baseline metrics (`baseline_metrics.json`, `knn_metrics.json`)
- Example predictions (`baseline_predictions.csv`, `knn_predictions_sample.csv`)
- Benchmark numbers for the deep models in Notebook 4.


In [9]:
# STEP 3.1 — Mount Drive and load session data

from google.colab import drive
import os, pickle, pandas as pd

# Mount Google Drive
drive.mount('/content/drive', force_remount=True)

# Define project root
PROJECT_ROOT = '/content/drive/MyDrive/Data Science course/Major Projects/Projects/e-commerce recommender/diginetica_recommender_project/'

# Define paths
SESSION_PATH = os.path.join(PROJECT_ROOT, 'data/sessions/')
CLEANED_PATH = os.path.join(PROJECT_ROOT, 'data/cleaned/')

# Load session data
with open(os.path.join(SESSION_PATH, 'sessions.pkl'), 'rb') as f:
    sessions = pickle.load(f)

# Load split info (train/val/test)
with open(os.path.join(SESSION_PATH, 'split.pkl'), 'rb') as f:
    split = pickle.load(f)

print(f"Loaded sessions: {len(sessions)} total")
print("Split info keys:", list(split.keys()))
print(f"Train sessions: {len(split['train'])}, Val: {len(split['val'])}, Test: {len(split['test'])}")

# Quick check
sample_sid = list(sessions.keys())[0]
print(f"\n🛍 Example Session {sample_sid} → {sessions[sample_sid][:10]} (showing first 10 items)")


Mounted at /content/drive
Loaded sessions: 16397 total
Split info keys: ['train', 'val', 'test']
Train sessions: 13117, Val: 1640, Test: 1640

🛍 Example Session 1 → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] (showing first 10 items)


In [10]:
# STEP 3.2 Popularity Based Recommender
from collections import Counter

# Extract train sessions
train_sids = set(split['train'])

# Count item frequencies across all "train" sessions
pop_counter = Counter()
for sid in train_sids:
  items = sessions.get(sid)
  if items:
    pop_counter.update(items)

# Sort items by descending order of popularity
popular_items = [item for item, _ in pop_counter.most_common()]

print(f"Counted {len(popular_items)} unique items from training sessions...")
print(f"Top 10 most popular item indices: {popular_items[:10]}")

# Quick look at item frequencies
top10 = list(pop_counter.most_common(10))
print("\nTop 10 Items (item_idx, count):")
for itm, cnt in top10:
    print(f"Item {itm:6d} → viewed {cnt} times")

Counted 25970 unique items from training sessions...
Top 10 most popular item indices: [18, 557, 1003, 894, 1392, 86, 896, 1309, 1049, 2985]

Top 10 Items (item_idx, count):
Item     18 → viewed 89 times
Item    557 → viewed 73 times
Item   1003 → viewed 69 times
Item    894 → viewed 64 times
Item   1392 → viewed 62 times
Item     86 → viewed 60 times
Item    896 → viewed 55 times
Item   1309 → viewed 55 times
Item   1049 → viewed 54 times
Item   2985 → viewed 54 times


In [11]:
# STEP 3.3 Evaluate
import numpy as np, json, pandas as pd
from tqdm import tqdm
import os

# Evaluation functions
def recall_at_k(preds, target, k):
    return 1.0 if target in preds[:k] else 0.0

def mrr_at_k(preds, target, k):
    for i, p in enumerate(preds[:k], start=1):
        if p == target:
            return 1.0 / i
    return 0.0

# Build test examples: (session, input_sequence, target_item)
test_sids = set(split['test'])
test_examples = []
for sid in test_sids:
    seq = sessions.get(sid)
    if not seq or len(seq) < 2:
        continue
    input_seq = seq[:-1]  # context (history)
    target = seq[-1]      # true next item
    test_examples.append((sid, input_seq, target))

print(f"Prepared {len(test_examples)} test examples.")

# Evaluate on test set
K = 20
recalls, mrrs = [], []
for sid, inp, tgt in tqdm(test_examples, desc="Evaluating Popularity Model"):
    preds = popular_items  # same for all sessions
    recalls.append(recall_at_k(preds, tgt, K))
    mrrs.append(mrr_at_k(preds, tgt, K))

# Compute metrics
recall_k = np.mean(recalls)
mrr_k = np.mean(mrrs)

# Save metrics
results_dir = os.path.join(PROJECT_ROOT, 'results')
os.makedirs(results_dir, exist_ok=True)

pop_metrics = {
    "model": "popularity_baseline",
    "K": K,
    "recall@K": round(float(recall_k), 4),
    "mrr@K": round(float(mrr_k), 4),
    "num_test_sessions": len(test_examples)
}

metrics_fp = os.path.join(results_dir, 'baseline_metrics.json')
with open(metrics_fp, 'w') as f:
    json.dump(pop_metrics, f, indent=2)
print(f"Metrics saved to: {metrics_fp}")

# save sample predictions
sample_preds = pd.DataFrame({
    "session_id": [sid for sid, _, _ in test_examples[:10]],
    "ground_truth": [tgt for _, _, tgt in test_examples[:10]],
    "top5_predictions": [popular_items[:5]] * 10
})
sample_preds_fp = os.path.join(results_dir, 'baseline_predictions.csv')
sample_preds.to_csv(sample_preds_fp, index=False)
print(f"Sample predictions saved to: {sample_preds_fp}")

# results
print("\nPopularity Baseline Results")
for k, v in pop_metrics.items():
    print(f"{k:20}: {v}")

Prepared 1640 test examples.


Evaluating Popularity Model: 100%|██████████| 1640/1640 [00:00<00:00, 333931.67it/s]

Metrics saved to: /content/drive/MyDrive/Data Science course/Major Projects/Projects/e-commerce recommender/diginetica_recommender_project/results/baseline_metrics.json
Sample predictions saved to: /content/drive/MyDrive/Data Science course/Major Projects/Projects/e-commerce recommender/diginetica_recommender_project/results/baseline_predictions.csv

Popularity Baseline Results
model               : popularity_baseline
K                   : 20
recall@K            : 0.0116
mrr@K               : 0.0028
num_test_sessions   : 1640





## Summary — Popularity-Based Recommender

**Objective:**  
Build a simple global popularity model as a baseline for evaluating more advanced recommender systems.

### Key Steps
1. Counted item occurrences across all **training sessions**.  
2. Ranked items by frequency to obtain a **Top-K popularity list**.  
3. Evaluated model on test sessions using:
   - **Recall@20** — 0.0116  
   - **MRR@20** — 0.0028  
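
The three steps above can be sketched end to end on a toy example. The sessions below (`toy_train`, `toy_test`) and the cutoff `k = 2` are hypothetical and chosen only so the arithmetic is easy to follow. The notebook itself uses the real training split and K = 20.

```python
from collections import Counter

# Hypothetical toy data (item indices), for illustration only
toy_train = [[1, 2, 3], [2, 3], [3, 4], [2, 3, 5]]
toy_test = [[1, 3], [4, 2], [2, 5]]  # last item of each session is the target

# Step 1: count item occurrences across training sessions
toy_counts = Counter()
for items in toy_train:
    toy_counts.update(items)

# Step 2: rank items from most to least frequent
toy_ranking = [item for item, _ in toy_counts.most_common()]

# Step 3: every test session receives the same top-k list
k = 2
top_k = toy_ranking[:k]
hits, reciprocal_ranks = [], []
for seq in toy_test:
    target = seq[-1]
    found = target in top_k
    hits.append(1.0 if found else 0.0)
    reciprocal_ranks.append(1.0 / (top_k.index(target) + 1) if found else 0.0)

toy_recall = sum(hits) / len(hits)
toy_mrr = sum(reciprocal_ranks) / len(reciprocal_ranks)
print(top_k, round(toy_recall, 3), round(toy_mrr, 3))
```

Because the list never changes with the session history, any target outside the global top-k counts as a miss, which is why this baseline scores low on a catalogue with many rarely viewed items.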

### Insights
- This baseline reflects non-personalized “Top Trending Items.”  
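- Despite its simplicity, it sets the **benchmark floor**: with a Recall@20 of 0.0116, the global top-20 list contains the true next item in only about 1% of test sessions.  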
- All future models (GRU4Rec, SASRec) should outperform these numbers to justify added complexity.

### Outputs
| File | Description |
|------|--------------|
| `results/baseline_metrics.json` | Stored evaluation metrics for comparison |
| `results/baseline_predictions.csv` | Top-5 popularity predictions for the first 10 test examples |

Next → **Item-Based kNN Recommender**, which introduces **session-level personalization** by leveraging item co-occurrence patterns.


##### **Item-Based kNN Recommender**
| Aspect | Description |
| -------------------- | ----------------------------------------------------------------------- |
| **Goal**             | Recommend items similar to those in the current session                 |
| **Similarity Basis** | Co-occurrence of items within sessions                                  |
| **Algorithm**        | Compute item-item cosine similarity; aggregate scores for session items |
| **Evaluation**       | Recall@K and MRR@K (same as popularity)                                 |
| **Output**           | Personalized Top-K recommendations                                      |


## Step 3.5 — Item-Based kNN Recommender

### Objective
To build a **personalized, session-aware recommender** that suggests products based on their co-occurrence with other products in the same session.

Unlike the **Popularity-Based Recommender**, which recommends the same top items to every user, this model captures **relationships between products**.  
It uses a *k-nearest neighbors (k-NN)* approach, where “neighbors” are **items that are often viewed together** in the same session.

---

### Why Item-Based kNN?
In our dataset, users are mostly anonymous (identified by sessions, not persistent user IDs).  
Hence, **User-Based Collaborative Filtering** isn’t suitable.  
Instead, we apply **Item-Based Collaborative Filtering (CF)** which leverages co-occurrence patterns within sessions.

---

### Core Idea
- If many sessions contain both *Item A* and *Item B*,  
  → then A and B are considered *similar*.  
- For a new session `[A, D]`, the recommender looks up items similar to A and D,  
  aggregates their similarity scores, and recommends the top items not yet viewed.
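
As a rough illustration of this aggregation step, the sketch below scores candidates for the session `[A, D]`. The neighbour table `toy_sim` and its similarity values are invented for the example, and `recommend` is a hypothetical helper name, not the function defined later in the notebook.

```python
from collections import defaultdict

# Hypothetical neighbour table: item -> {similar item: similarity score}
toy_sim = {
    "A": {"B": 0.8, "C": 0.3, "D": 0.2},
    "D": {"C": 0.6, "E": 0.4, "A": 0.2},
}


def recommend(history, sim, top_k=3):
    """Sum neighbour similarities over the session's items, skipping viewed items."""
    seen = set(history)
    scores = defaultdict(float)
    for item in history:
        for neighbour, s in sim.get(item, {}).items():
            if neighbour not in seen:
                scores[neighbour] += s
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:top_k]]


toy_recs = recommend(["A", "D"], toy_sim)
print(toy_recs)
```

Item C ranks first here because it is a neighbour of both A and D, so its two scores add up and overtake B, which is strongly similar to A alone.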

---

### Methodology Overview
1. **Build an Item-Item Co-occurrence Matrix**  
   - Count how many times each pair of items appears together in sessions.  

2. **Compute Item Similarity Scores**  
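   - Divide each pair’s co-occurrence count by the square root of the product of the two items’ session frequencies (a cosine-style normalization).  

3. **Get k Nearest Neighbors**  
   - For each item, store only the top-k most similar items (k = 50 in this notebook).  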

4. **Generate Recommendations for Test Sessions**  
   - For a given session, aggregate similar items of the products already viewed,  
     and rank them by cumulative similarity.  

5. **Evaluate using Recall@K and MRR@K**  
   - Same metrics as in the Popularity Model, for direct comparison.
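
Steps 1 to 3 can be sketched on a handful of made-up sessions. The data (`toy_sessions`) and the helper names (`similarity`, `top_neighbours`) are assumptions for illustration. The normalization follows the cosine-style form used in this notebook: the co-occurrence count divided by the square root of the product of the two items' session frequencies.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt
import heapq

# Hypothetical toy sessions, for illustration only
toy_sessions = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "D"]]

cooc = defaultdict(lambda: defaultdict(int))  # pair co-occurrence counts
session_freq = defaultdict(int)               # number of sessions containing each item

for items in toy_sessions:
    unique = sorted(set(items))
    for i in unique:
        session_freq[i] += 1
    # Step 1: count each unordered pair once per session, stored symmetrically
    for i, j in combinations(unique, 2):
        cooc[i][j] += 1
        cooc[j][i] += 1


def similarity(i, j):
    """Step 2: cosine-style similarity c_ij / sqrt(f_i * f_j)."""
    c_ij = cooc.get(i, {}).get(j, 0)
    return c_ij / sqrt(session_freq[i] * session_freq[j])


def top_neighbours(i, k):
    """Step 3: the k most similar items to i, as (item, score) pairs."""
    scored = ((j, similarity(i, j)) for j in cooc.get(i, {}))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


neighbours_of_a = top_neighbours("A", 2)
print(neighbours_of_a)
```

Note how D outranks C as a neighbour of A even though both co-occur with A once: D appears in only one session, so the normalization penalizes it less than the more frequent C.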

---

### Business Intuition
This model mimics the logic behind real-world product discovery:
> “People who viewed this item also viewed...”

It provides **personalized suggestions** based on the user’s ongoing session, a crucial step towards intelligent, real-time e-commerce recommendations.

---

### Expected Outcome
We expect **Recall@K** and **MRR@K** to be higher than the Popularity Baseline,  
because the recommendations are now *contextualized to the user’s session*.

---

**Next:**  
We’ll implement the item co-occurrence logic and generate similarity-based recommendations.


In [12]:
# STEP 3.5.1 — Build Item Co-occurrence Matrix

from collections import defaultdict
import itertools

# Use only training sessions for building similarity
train_sids = set(split['train'])

cooccur = defaultdict(lambda: defaultdict(int))

for sid in train_sids:
    items = sessions.get(sid)
    if not items or len(items) < 2:
        continue
    # For each unique pair of items in a session, increment co-occurrence count
    for i, j in itertools.combinations(set(items), 2):
        cooccur[i][j] += 1
        cooccur[j][i] += 1  # symmetric matrix

# Count total number of co-occurrence relationships stored
num_pairs = sum(len(v) for v in cooccur.values())
print(f"Built co-occurrence matrix with {len(cooccur)} items and {num_pairs} item-pairs.")

# a few items' neighbors
sample_items = list(cooccur.keys())[:5]
for item in sample_items:
    print(f"\nItem {item} co-occurs with:")
    neighbors = list(cooccur[item].items())[:5]
    for nb, cnt in neighbors:
        print(f"  → Item {nb} ({cnt} times)")


Built co-occurrence matrix with 25970 items and 316726 item-pairs.

Item 3309 co-occurs with:
  → Item 8757 (3 times)
  → Item 3306 (2 times)
  → Item 3307 (1 times)
  → Item 3308 (2 times)
  → Item 3310 (1 times)

Item 8757 co-occurs with:
  → Item 3309 (3 times)
  → Item 5249 (1 times)
  → Item 8751 (1 times)
  → Item 8752 (1 times)
  → Item 8753 (1 times)

Item 27721 co-occurs with:
  → Item 27722 (1 times)
  → Item 27723 (1 times)
  → Item 27724 (1 times)
  → Item 27725 (1 times)
  → Item 26542 (1 times)

Item 27722 co-occurs with:
  → Item 27721 (1 times)
  → Item 27723 (1 times)
  → Item 27724 (1 times)
  → Item 27725 (1 times)
  → Item 26542 (1 times)

Item 27723 co-occurs with:
  → Item 27721 (1 times)
  → Item 27722 (1 times)
  → Item 27724 (1 times)
  → Item 27725 (1 times)
  → Item 26542 (1 times)


In [13]:
# STEP 3.5.2 — Compute normalized item similarity scores
from math import sqrt
import heapq

# 1. Compute frequency of each item (how many sessions contained it)
freq = defaultdict(int)
for sid in split['train']:
    items = set(sessions.get(sid, []))
    for i in items:
        freq[i] += 1

# 2. Convert co-occurrence counts to cosine-style similarity
sim_matrix = defaultdict(dict)
for i, neighbors in cooccur.items():
    for j, cij in neighbors.items():
        if freq[i] > 0 and freq[j] > 0:
            sim = cij / sqrt(freq[i] * freq[j])
            sim_matrix[i][j] = sim

print(f"Built similarity matrix for {len(sim_matrix)} items.")

# 3. Keep only top-N similar items per item (memory optimization)
TOPN = 50
for i, neighbors in sim_matrix.items():
    top_neighbors = heapq.nlargest(TOPN, neighbors.items(), key=lambda x: x[1])
    sim_matrix[i] = dict(top_neighbors)

# Peek at a few examples
for item in list(sim_matrix.keys())[:3]:
    print(f"\nItem {item} top similar items:")
    for nb, score in list(sim_matrix[item].items())[:5]:
        print(f"  → Item {nb} (sim={score:.3f})")


Built similarity matrix for 25970 items.

Item 3309 top similar items:
  → Item 8757 (sim=0.567)
  → Item 3310 (sim=0.500)
  → Item 3311 (sim=0.500)
  → Item 11042 (sim=0.500)
  → Item 11043 (sim=0.500)

Item 8757 top similar items:
  → Item 3309 (sim=0.567)
  → Item 8756 (sim=0.567)
  → Item 3306 (sim=0.463)
  → Item 8755 (sim=0.378)
  → Item 8758 (sim=0.378)

Item 27721 top similar items:
  → Item 27722 (sim=1.000)
  → Item 27723 (sim=1.000)
  → Item 27724 (sim=1.000)
  → Item 27725 (sim=1.000)
  → Item 27726 (sim=1.000)


In [14]:
# STEP 3.5.3 Generate item based kNN recommendations for each test session and evaluate

import os, json
import numpy as np
import pandas as pd
from tqdm import tqdm
from collections import defaultdict

# Settings
K = 20          # evaluation cutoff
AGG_TOPN = 100  # how many candidate neighbors to consider (per session aggregation)
RESULTS_DIR = os.path.join(PROJECT_ROOT, 'results')
os.makedirs(RESULTS_DIR, exist_ok=True)

# Prepare test examples - session_id, history, target
test_sids = set(split['test'])
test_examples = []
for sid in test_sids:
    seq = sessions.get(sid)
    if not seq or len(seq) < 2:
        continue
    history = seq[:-1]  # input history
    target = seq[-1]  # ground truth next item
    test_examples.append((sid, history, target))

print(f"Prepared {len(test_examples)} test examples for kNN evaluation ...")

def recall_at_k(preds, target, k):
    return 1.0 if target in preds[:k] else 0.0

def mrr_at_k(preds, target, k):
    for i, p in enumerate(preds[:k], start=1):
        if p == target:
            return 1.0 / i
    return 0.0

# Recommendation function using sim_matrix
def recommend_for_history(history, sim_matrix, top_k=K, agg_topn=AGG_TOPN):
    score = defaultdict(float)
    seen = set(history)
    # Aggregate neighbors for each item in history
    for item in history:
        neighbors = sim_matrix.get(item, {})
        # accumulate neighbor similarity scores
        for nb, sim in neighbors.items():
            if nb in seen:
                continue  # skip items already in the session; they should not be recommended again
            score[nb] += sim
    # If no candidates (cold case), fallback to popularity (popular_items must exist)
    if not score:
        # popular_items is list from earlier popularity baseline
        return popular_items[:top_k]
    # Sort candidates by aggregated score
    # limit to agg_topn candidates for speed
    candidates = sorted(score.items(), key=lambda x: x[1], reverse=True)[:agg_topn]
    preds = [item for item, _ in candidates][:top_k]
    return preds

# Evaluate on test set
recalls = []
mrrs = []
pred_rows = []
for sid, history, target in tqdm(test_examples, desc="Evaluating k-NN"):
    preds = recommend_for_history(history, sim_matrix, top_k=K)
    recalls.append(recall_at_k(preds, target, K))
    mrrs.append(mrr_at_k(preds, target, K))
    # keep the first 100 examples for the sample predictions file
    if len(pred_rows) < 100:
        pred_rows.append({
            "session_id": sid,
            "history": history,
            "ground_truth": target,
            "top20_predictions": preds
        })

# Compute metrics
recall_k = float(np.mean(recalls)) if recalls else 0.0
mrr_k = float(np.mean(mrrs)) if mrrs else 0.0

knn_metrics = {
    "model": "item_knn",
    "K": K,
    "recall@K": round(recall_k, 4),
    "mrr@K": round(mrr_k, 4),
    "num_test_sessions": len(recalls)
}

# Save metrics and sample predictions
metrics_fp = os.path.join(RESULTS_DIR, 'knn_metrics.json')
with open(metrics_fp, 'w') as f:
    json.dump(knn_metrics, f, indent=2)

preds_fp = os.path.join(RESULTS_DIR, 'knn_predictions_sample.csv')
pd.DataFrame(pred_rows).to_csv(preds_fp, index=False)

print("\nItem-based k-NN Results")
for k, v in knn_metrics.items():
    print(f"{k:20}: {v}")

print(f"\nMetrics saved to: {metrics_fp}")
print(f"Sample predictions saved to: {preds_fp}")


Prepared 1640 test examples for kNN evaluation ...


Evaluating k-NN: 100%|██████████| 1640/1640 [00:00<00:00, 29372.25it/s]


Item-based k-NN Results
model               : item_knn
K                   : 20
recall@K            : 0.1799
mrr@K               : 0.0484
num_test_sessions   : 1640

Metrics saved to: /content/drive/MyDrive/Data Science course/Major Projects/Projects/e-commerce recommender/diginetica_recommender_project/results/knn_metrics.json
Sample predictions saved to: /content/drive/MyDrive/Data Science course/Major Projects/Projects/e-commerce recommender/diginetica_recommender_project/results/knn_predictions_sample.csv





In [15]:
# Display the kNN metrics
print("\nItem-based k-NN Results")
for k, v in knn_metrics.items():
    print(f"{k:20}: {v}")


Item-based k-NN Results
model               : item_knn
K                   : 20
recall@K            : 0.1799
mrr@K               : 0.0484
num_test_sessions   : 1640


## Summary — Baseline Models (Popularity vs. Item-kNN)

| Model | Recall@20 | MRR@20 |
|--------|------------|--------|
| Popularity | 0.0116 | 0.0028 |
| Item-kNN | **0.1799** | **0.0484** |

### Observations
- The **Item-kNN Recommender** improves both Recall@20 and MRR@20 over the popularity baseline by more than an order of magnitude.
- This confirms that **session context** (recently viewed items) is critical for personalization.
- Even without explicit machine learning, the co-occurrence + similarity logic captures strong behavioral patterns.
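
The size of the gap can be checked with a few lines of arithmetic. This is only a sketch: `relative_lift` is a hypothetical helper, and the two dictionaries simply copy the values from the results table above rather than loading the saved JSON files.

```python
def relative_lift(baseline, candidate, keys=("recall@K", "mrr@K")):
    """Return candidate / baseline for each metric key."""
    return {key: candidate[key] / baseline[key] for key in keys}


# Values copied from the results table (K = 20)
pop = {"recall@K": 0.0116, "mrr@K": 0.0028}
knn = {"recall@K": 0.1799, "mrr@K": 0.0484}

lift = relative_lift(pop, knn)
print({key: round(value, 1) for key, value in lift.items()})
```

The same helper could later compare the deep models from Notebook 4 against either baseline, assuming their metrics are stored under the same keys.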

### Key Learnings
1. Popularity models serve as a fast, robust fallback.
2. Item-based kNN introduces data-driven personalization with minimal complexity.
3. These baselines form a solid foundation for deep learning models like **GRU4Rec** and **SASRec**.

### Outputs
| File | Description |
|------|--------------|
| `baseline_metrics.json` | Popularity model metrics |
| `knn_metrics.json` | Item-kNN model metrics |
| `baseline_predictions.csv` | Popularity sample predictions |
| `knn_predictions_sample.csv` | Item-kNN sample predictions |

**Next:** Move to **Notebook 4 — Advanced Sequential Models**  
to train and evaluate a **GRU4Rec** model that learns dynamic, temporal user behavior directly from session sequences.
