# Personalized recommendation

# Project Outline: Item–Item Collaborative Filtering with Purchases (Implicit Feedback)

**Item–item similarity**

The `item_similarity` matrix is built from the user–item interaction matrix.
Each column is an item vector recording which users bought the item and how strongly (for example, with `log1p` weights).
To compute the similarity between two items, you compare their columns.

👉 Two items have high cosine similarity if:

* many of the same users bought both, and
* their purchase patterns (weights) are similar.

## 1) Goal

Recommend items to users using only their **purchase histories** (implicit feedback: which items each user bought and in what quantity, with no explicit ratings).


### 🔹 User similarity (user–user CF)

* Two users are considered **similar** if they **interacted with the same items in similar ways**.
* You compare **rows** of the user–item matrix.
* Example: if Alice and Bob both bought mostly the same products, their vectors look alike → high similarity.

---

### 🔹 Item similarity (item–item CF)

* Two items are considered **similar** if they are **consumed by the same users**.
* You compare **columns** of the user–item matrix.
* Example: if product A and product B are often bought by the same customers, their vectors look alike → high similarity.
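
As a minimal sketch of the rows-versus-columns distinction, the toy matrix below (made-up data, not the notebook's) is compared once by rows and once by columns. The helper `cosine_rows` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

# Toy user-item matrix (assumed data): rows = users, columns = items.
X = np.array([
    [1.0, 1.0, 0.0],   # Alice bought items 0 and 1
    [1.0, 1.0, 0.0],   # Bob bought items 0 and 1
    [0.0, 0.0, 1.0],   # Carol bought item 2
])

def cosine_rows(M):
    """Cosine similarity between every pair of rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty rows
    unit = M / norms
    return unit @ unit.T

user_sim = cosine_rows(X)      # compares rows    -> shape (n_users, n_users)
item_sim = cosine_rows(X.T)    # compares columns -> shape (n_items, n_items)

print(user_sim.round(2))
print(item_sim.round(2))
```

Alice and Bob have identical rows, so their user similarity is maximal. Items 0 and 1 have identical columns, so their item similarity is maximal too, while item 2 shares no buyers with either.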

---

### 🔹 How is it calculated?

* The most common way is **cosine similarity**:

$$
\text{sim}(i,j) = \frac{ \sum_{u} r_{u,i} \, r_{u,j} }{ \sqrt{\sum_{u} r_{u,i}^2} \cdot \sqrt{\sum_{u} r_{u,j}^2} }
$$

* Here, $r_{u,i}$ is the interaction weight of user $u$ with item $i$.
* Intuition: similarity is **high** if the *same users* interacted with both items, especially with similar intensity.
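
The formula can be evaluated term by term for two item vectors. The sketch below uses hypothetical purchase counts for four users and applies `log1p` to obtain the weights $r_{u,i}$ and $r_{u,j}$.

```python
import numpy as np

# Hypothetical interaction weights for four users and two items.
r_i = np.log1p(np.array([1, 2, 0, 0]))   # users 0 and 1 bought item i
r_j = np.log1p(np.array([1, 1, 0, 3]))   # users 0, 1 and 3 bought item j

numerator = np.sum(r_i * r_j)                                   # sum_u r_ui * r_uj
denominator = np.sqrt(np.sum(r_i ** 2)) * np.sqrt(np.sum(r_j ** 2))
sim_ij = numerator / denominator

print(f"sim(i, j) = {sim_ij:.3f}")
```

Only users 0 and 1 contribute to the numerator, because user 3 never bought item i. User 3 still increases the norm of item j, which pulls the similarity below 1.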

---

### 🔹 Concept in words

* **User similarity**: “Find other people who behave like me, then recommend what they bought.”
* **Item similarity**: “Find products that behave alike across customers, then recommend them to me based on what I bought.”




In [2]:
import pandas as pd

cols = ['shopUserId', 'quantity', 'groupId']
tx = pd.read_csv('../data/processed/transactions_clean.csv', usecols=cols + ['status'], low_memory=False)
tx = tx[tx['status'] == 'active'].copy()  # keep only active transactions
tx = tx[cols]  # Drop the 'status' column after filtering
tx['quantity'] = tx['quantity'].astype(int)
tx


Unnamed: 0,shopUserId,quantity,groupId
0,812427,1,261873
1,831360,4,261745
2,209204,1,265298
4,831340,1,260596
5,831340,1,260596
...,...,...,...
250024,78202,1,221416
250026,78181,1,265843
250038,78145,1,261518
250039,78136,1,542087


In [3]:
# Aggregate in case the same user bought the same product multiple times.
# After this step, quantity = total number of units this user has ever bought of this product.
user_item = tx.groupby(["shopUserId", "groupId"], as_index=False)["quantity"].sum() 
user_item

Unnamed: 0,shopUserId,groupId,quantity
0,78135,291294,1
1,78136,542087,1
2,78145,261518,1
3,78162,291278,1
4,78162,404269,1
...,...,...,...
119915,831187,210765,2
119916,831202,250124,1
119917,831331,270610,1
119918,831340,260596,2


A user sends a stronger signal for items they bought more often, so the summed quantity is damped with `log1p` and used as the interaction weight.

In [4]:
import numpy as np
user_item["interaction"] = np.log1p(user_item["quantity"])
user_item = user_item.drop(columns=["quantity"])

In [5]:
user_item

Unnamed: 0,shopUserId,groupId,interaction
0,78135,291294,0.693147
1,78136,542087,0.693147
2,78145,261518,0.693147
3,78162,291278,0.693147
4,78162,404269,0.693147
...,...,...,...
119915,831187,210765,1.098612
119916,831202,250124,0.693147
119917,831331,270610,0.693147
119918,831340,260596,1.098612
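
To see how `log1p` damps large quantities, the short sketch below applies it to three totals. A quantity of 1 maps to the 0.693147 seen in most rows, 2 maps to 1.098612, and 31 maps to 3.465736, which is the largest weight in the sorted output of this notebook.

```python
import numpy as np

quantities = np.array([1, 2, 31])
weights = np.log1p(quantities)   # log(1 + q)

for q, w in zip(quantities, weights):
    print(f"quantity={q:>2}  interaction={w:.6f}")

# Buying 31 units counts about 5x as much as buying 1 unit, not 31x.
ratio = weights[2] / weights[0]
```

The weight grows with quantity, but a single bulk buyer cannot dominate an item vector the way raw counts would.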


In [6]:
from sklearn.preprocessing import LabelEncoder

user_enc = LabelEncoder()
item_enc = LabelEncoder()

user_item["user_idx"] = user_enc.fit_transform(user_item["shopUserId"])
user_item["item_idx"] = item_enc.fit_transform(user_item["groupId"])

user_item

Unnamed: 0,shopUserId,groupId,interaction,user_idx,item_idx
0,78135,291294,0.693147,0,756
1,78136,542087,0.693147,1,1053
2,78145,261518,0.693147,2,314
3,78162,291278,0.693147,3,755
4,78162,404269,0.693147,3,895
...,...,...,...,...,...
119915,831187,210765,1.098612,57989,36
119916,831202,250124,0.693147,57990,91
119917,831331,270610,0.693147,57991,620
119918,831340,260596,1.098612,57992,186


In [7]:
user_item.sort_values(by="interaction", ascending=False)

Unnamed: 0,shopUserId,groupId,interaction,user_idx,item_idx
4089,126151,261637,3.465736,1475,348
47031,395080,503402,3.433987,20405,974
19098,281155,266072,3.367296,7592,503
67902,528903,218982,3.332205,30577,59
6519,174425,261902,3.258097,2439,428
...,...,...,...,...,...
46029,391124,292813,0.693147,19940,789
46028,391109,290104,0.693147,19939,667
46027,391105,260223,0.693147,19938,136
46026,391105,240166,0.693147,19938,71


In [8]:
from scipy.sparse import coo_matrix

sparse_matrix = coo_matrix(
    (user_item["interaction"], (user_item["user_idx"], user_item["item_idx"]))
)

print(sparse_matrix.shape)  # (n_users, n_items)

(57994, 1123)
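
The construction above takes (value, (row, column)) triplets. A minimal sketch with made-up indices shows the layout. Passing `shape` explicitly is an addition here: it is optional, but it guards against the matrix being sized from the largest index that happens to appear in the data.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical triplets: one entry per (user, item) pair.
user_idx = np.array([0, 0, 1, 2])
item_idx = np.array([0, 2, 2, 1])
weights = np.log1p(np.array([1, 2, 1, 4]))

toy = coo_matrix((weights, (user_idx, item_idx)), shape=(3, 3)).tocsr()
dense = toy.toarray()   # fine for a toy matrix; avoid on the full data

print(dense.round(3))
```

Each row is a user and each column an item, matching the `(n_users, n_items)` shape printed above.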


In [9]:
from sklearn.metrics.pairwise import cosine_similarity

# compute item-item cosine similarity
item_similarity = cosine_similarity(sparse_matrix.T)  # transpose = items as rows

print(item_similarity.shape)  # (n_items, n_items)

(1123, 1123)


In [10]:
# convert once, after building the sparse matrix
sparse_matrix = sparse_matrix.tocsr()

def recommend_for_user(user_id, top_n=5):
    uidx = user_enc.transform([user_id])[0]  # map the raw user id to its row index
    user_row = sparse_matrix[uidx].toarray().ravel()  # that user's interaction vector

    # score every item as a weighted sum of its similarities to the user's purchases
    scores = user_row @ item_similarity

    # mask items the user has already bought
    scores[user_row > 0] = -np.inf

    # indices of the top-N scores, best first
    top_items_idx = np.argsort(scores)[-top_n:][::-1]
    return item_enc.inverse_transform(top_items_idx)  # map back to the original groupId values

In [11]:
print(recommend_for_user(395080, top_n=5))

['503380' '445897' '440419' '530335' '350225']
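
The scoring step in `recommend_for_user` can be traced by hand on a small case. The sketch below uses a hypothetical 4-item similarity matrix and a user who bought only item 0. The names `toy_item_sim` and `toy_user_row` are illustrative and not part of the notebook.

```python
import numpy as np

# Hypothetical symmetric item-item similarity matrix.
toy_item_sim = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
])
toy_user_row = np.array([np.log1p(2), 0.0, 0.0, 0.0])  # bought item 0 twice

scores = toy_user_row @ toy_item_sim      # weighted sum of similarities
scores[toy_user_row > 0] = -np.inf        # never recommend what was already bought
top2 = np.argsort(scores)[-2:][::-1]      # two best remaining items, best first

print(top2)
```

Item 1 ranks first because it is the closest neighbor of the purchased item, and item 0 itself is excluded by the mask.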


In [12]:
def similar_items(product_id, top_n=5):
    # Ensure product_id is passed as a string
    pidx = item_enc.transform([str(product_id)])[0]
    sims = item_similarity[pidx].copy()
    # Exclude the query item explicitly; relying on it sorting first breaks on ties at 1.0
    sims[pidx] = -np.inf
    top_idx = np.argsort(sims)[-top_n:][::-1]
    # Return as strings
    return item_enc.inverse_transform(top_idx).astype(str)

print(similar_items("503380", top_n=5))


['503402' '503373' '503407' '503397' '507707']


HitRate@K / Recall@K: did the next purchase appear in top-K?

In [13]:
import pandas as pd

cols = ['shopUserId', 'quantity', 'groupId', 'created']
tx = pd.read_csv('../data/processed/transactions_clean.csv', usecols=cols + ['status'], low_memory=False)
tx = tx[tx['status'] == 'active'].copy()  # keep only active transactions
tx = tx[cols]  # Drop the 'status' column after filtering
tx['quantity'] = tx['quantity'].astype(int)
tx

Unnamed: 0,shopUserId,quantity,groupId,created
0,812427,1,261873,2025-08-05 20:14:28
1,831360,4,261745,2025-08-05 19:55:36
2,209204,1,265298,2025-08-05 19:47:22
4,831340,1,260596,2025-08-05 19:46:09
5,831340,1,260596,2025-08-05 19:46:09
...,...,...,...,...
250024,78202,1,221416,2024-05-22 14:18:16
250026,78181,1,265843,2024-05-22 13:42:39
250038,78145,1,261518,2024-05-22 12:54:51
250039,78136,1,542087,2024-05-22 12:44:01



1. **Time split:** for each user, keep their past purchases as **history** and hold out their **next purchase(s)**.
2. **Ask the model:** “Given this user’s history, **recommend Top-K items**.”
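
A minimal sketch of the time split in step 1, on made-up events. It holds out exactly one row per user, the one with the latest `created` timestamp. If several rows share that timestamp, `idxmax` returns only the first of them, so the others stay in the history.

```python
import pandas as pd

# Hypothetical purchase events for two users.
events = pd.DataFrame({
    "shopUserId": [1, 1, 1, 2, 2],
    "groupId": ["A", "B", "C", "A", "D"],
    "created": pd.to_datetime([
        "2025-01-01", "2025-02-01", "2025-03-01", "2025-01-15", "2025-02-15",
    ]),
})

events = events.sort_values("created")
last_idx = events.groupby("shopUserId")["created"].idxmax()  # index label of each user's last row
test = events.loc[last_idx]       # held-out "next purchase" per user
train = events.drop(last_idx)     # everything earlier becomes the history

print(test[["shopUserId", "groupId"]])
```

User 1's history is A and B with C held out, and user 2's history is A with D held out.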

### Metrics

* **HitRate\@K / Recall\@K:** did any of the user’s held-out purchases show up in the Top-K?

  * One held-out item → HitRate\@K = Recall\@K (hit or miss).
  * Many held-out items → Recall\@K = (# test items covered) / (total test items).
* **Precision\@K:** of the Top-K shown, how many were actually bought in the test window?
* **NDCG\@K / MRR:** reward putting the true next purchase **higher** in the list.
* **Coverage/Diversity:** how much of the catalog ever gets recommended (not just best-sellers).
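
The per-user metrics above can be sketched in a few lines. The function `metrics_at_k` is a hypothetical helper, not part of the notebook. It assumes the recommendation list contains no duplicates and uses binary relevance (an item is either in the held-out set or not).

```python
import numpy as np

def metrics_at_k(recommended, held_out, k):
    """HitRate, Recall, Precision, reciprocal rank and NDCG at K for one user."""
    top_k = list(recommended)[:k]
    relevant = set(held_out)
    hits = [1 if item in relevant else 0 for item in top_k]
    n_hits = sum(hits)

    # Reciprocal rank of the first relevant item (0 if none appears).
    rr = 0.0
    for rank, h in enumerate(hits, start=1):
        if h:
            rr = 1.0 / rank
            break

    # DCG with binary gains, normalized by the best achievable DCG.
    dcg = sum(h / np.log2(rank + 1) for rank, h in enumerate(hits, start=1))
    ideal = sum(1.0 / np.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))

    return {
        "hit_rate": 1.0 if n_hits > 0 else 0.0,
        "recall": n_hits / len(relevant) if relevant else 0.0,
        "precision": n_hits / k,
        "rr": rr,
        "ndcg": dcg / ideal if ideal > 0 else 0.0,
    }

# One of two held-out items ("C") appears, at rank 3 of 5.
m = metrics_at_k(["A", "B", "C", "D", "E"], ["C", "Z"], k=5)
print(m)
```

Averaging `rr` over users gives MRR, and averaging the other values gives the corresponding aggregate metrics.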


At recommendation time, a shopping basket can serve as the seed: for each basket item, take its nearest neighbors from the precomputed item–item graph, sum or average the scores, remove items already in the basket, and show the top-K.
If the user has past purchases once they log in, add those as extra seeds (optionally giving basket items a higher weight for recency and context).
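
A minimal sketch of this basket flow, assuming neighbor arrays laid out as one row per item with `-1` marking an empty slot. The toy arrays, the function name `recommend_for_basket`, and the weights `basket_weight` and `history_weight` are all hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical precomputed neighbor graph: row i lists the neighbors of item i.
toy_neighbors_idx = np.array([[1, 2], [0, 2], [3, 0], [2, 1]])
toy_neighbors_sim = np.array([[0.9, 0.3], [0.9, 0.4], [0.6, 0.3], [0.6, 0.2]])
n_toy_items = toy_neighbors_idx.shape[0]

def recommend_for_basket(basket, history=(), top_k=2,
                         basket_weight=2.0, history_weight=1.0):
    """Sum neighbor similarities over all seeds, then drop basket items."""
    scores = np.zeros(n_toy_items)
    for seeds, weight in ((basket, basket_weight), (history, history_weight)):
        for i in seeds:
            js = toy_neighbors_idx[i]
            valid = js >= 0                      # skip empty neighbor slots
            np.add.at(scores, js[valid], weight * toy_neighbors_sim[i][valid])
    scores[list(basket)] = -np.inf               # never recommend basket items
    ranked = np.argsort(scores)[::-1]
    return [int(j) for j in ranked[:top_k] if scores[j] > 0]

print(recommend_for_basket([0], history=[3]))
```

With item 0 in the basket and item 3 in the purchase history, items 1 and 2 collect score from both seeds, and the basket seed counts double.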


In [67]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from scipy.sparse import coo_matrix, csr_matrix
from sklearn.neighbors import NearestNeighbors

# ================== Config ==================
K_RECS, K_NEIGH = 10, 100
ALPHA = 100
MIN_TRAIN_EVENTS = 1
MIN_ITEM_SUPPORT = 4
MIN_COUSER_OVERLAP = 1
PMI_GAMMA = 0.5  # gamma for lift/PMI reweighting

# ========== Prep data + temporal split ==========
tx = tx.copy()
tx["created"] = pd.to_datetime(tx["created"], errors="coerce")
tx = tx.dropna(subset=["created"]).sort_values("created")
event_key = "orderId" if "orderId" in tx.columns else "created"

last_idx = tx.groupby("shopUserId")["created"].idxmax()
test_df  = tx.loc[last_idx, ["shopUserId","groupId","created"]]
train_df = tx.drop(last_idx)

# ====== Exclude cold users (by distinct events) ======
user_event_counts = train_df.groupby("shopUserId")[event_key].nunique()
warm_users = user_event_counts[user_event_counts >= MIN_TRAIN_EVENTS].index
train_df = train_df[train_df["shopUserId"].isin(warm_users)]
test_df  = test_df [test_df ["shopUserId"].isin(warm_users)]

# ====== Keep only items with sufficient support ======
item_support = train_df.groupby("groupId")["shopUserId"].nunique()
supported_items = item_support[item_support >= MIN_ITEM_SUPPORT].index
train_df = train_df[train_df["groupId"].isin(supported_items)]
test_df  = test_df [test_df ["groupId"].isin(supported_items)]

# ====== Build TRAIN interactions (log1p qty) ======
agg = train_df.groupby(["shopUserId","groupId"], as_index=False)["quantity"].sum()
agg["interaction"] = np.log1p(agg["quantity"]).astype(np.float32)

# ====== Encode ======
user_enc = LabelEncoder().fit(agg["shopUserId"])
item_enc = LabelEncoder().fit(agg["groupId"])
agg["u"] = user_enc.transform(agg["shopUserId"])
agg["i"] = item_enc.transform(agg["groupId"])
n_users, n_items = agg["u"].max() + 1, agg["i"].max() + 1

# ====== User–item matrix (raw) ======
X_raw: csr_matrix = coo_matrix(
    (agg["interaction"], (agg["u"], agg["i"])),
    shape=(n_users, n_items)
).tocsr()

# ====== BM25 weighting ======
def bm25_weight(X, K1=1.2, B=0.75):
    """BM25 weighting for user-item matrix X (users x items, CSR)"""
    X = X.tocsr()
    N = float(X.shape[0])
    # Document frequency: number of users with a stored entry for each item
    df = np.diff(X.tocsc().indptr)
    idf = np.log((N - df + 0.5) / (df + 0.5))
    idf = np.maximum(idf, 0)  # clip negative idf to zero

    # Document length: total interaction weight in each user's profile
    dl = np.asarray(X.sum(axis=1)).ravel()
    avgdl = dl.mean()

    # COO view keeps row, column, and value aligned entry by entry
    coo = X.tocoo()
    rows, cols, tf = coo.row, coo.col, coo.data
    # BM25 formula, vectorized over all stored entries
    norm = (1 - B) + B * (dl[rows] / avgdl)
    data = (idf[cols] * (tf * (K1 + 1)) / (tf + K1 * norm)).astype(X.dtype)
    return csr_matrix((data, (rows, cols)), shape=X.shape)

X_bm25: csr_matrix = bm25_weight(X_raw, K1=1.2, B=0.75)

# Use BM25 matrix to compute neighbors
X_for_knn = X_bm25

# ====== Restrict to supported items for neighbors/outcomes ======
supported_item_ids = set(item_enc.transform(supported_items))
supported_item_mask = np.array([i in supported_item_ids for i in range(n_items)])
supported_item_indices = np.where(supported_item_mask)[0]
X_T_supported = X_for_knn.T[supported_item_indices]

# ====== Item–item KNN (cosine) ======
nn = NearestNeighbors(metric="cosine", algorithm="brute", n_neighbors=K_NEIGH+1)
nn.fit(X_T_supported)
dist, idx = nn.kneighbors(X_T_supported, return_distance=True)
neighbors_idx_supported = supported_item_indices[idx[:, 1:]]
neighbors_sim_supported = (1.0 - dist[:, 1:]).astype(np.float32)

# Place neighbors back into full matrices; mark others invalid
neighbors_idx = np.full((n_items, K_NEIGH), -1, dtype=int)
neighbors_sim = np.zeros((n_items, K_NEIGH), dtype=np.float32)
for pos, i in enumerate(supported_item_indices):
    neighbors_idx[i] = neighbors_idx_supported[pos]
    neighbors_sim[i] = neighbors_sim_supported[pos]

# ====== Shrinkage + co-user overlap filter (computed on BINARIZED raw matrix) ======
X_bin = X_raw.copy(); X_bin.data[:] = 1.0
X_csc = X_bin.tocsc()

# Precompute item degrees (number of unique users per item)
item_degrees = np.array(X_bin.sum(axis=0)).ravel()  # shape: (n_items,)

for i in supported_item_indices:
    js = neighbors_idx[i]
    valid = js >= 0
    if not np.any(valid): 
        continue
    js_valid = js[valid]
    overlaps = (X_csc[:, i].T @ X_csc[:, js_valid]).toarray().ravel()  # #shared buyers
    deg_i = item_degrees[i]
    deg_js = item_degrees[js_valid]
    expected_overlap = (deg_i * deg_js) / n_users
    # Avoid division by zero
    expected_overlap = np.maximum(expected_overlap, 1e-8)
    # Compute lift: observed shared buyers relative to the count expected by chance
    lift = overlaps / expected_overlap
    # Use lift^gamma as the reweighting factor; gamma < 1 damps very large lifts
    lift_factor = np.power(lift, PMI_GAMMA)
    # shrink
    sims = neighbors_sim[i, valid] * (overlaps / (overlaps + ALPHA)) * lift_factor
    # enforce min overlap
    keep = overlaps >= MIN_COUSER_OVERLAP
    sims[~keep] = 0.0
    js_valid[~keep] = -1
    # write back
    neighbors_sim[i, valid] = sims
    neighbors_idx[i, valid] = js_valid

# ====== Recommender (allow repeats) ======
def recommend_for_user(uidx, top_n=K_RECS):
    seen_idx, seen_w = X_for_knn[uidx].indices, X_for_knn[uidx].data  # use BM25-weighted user row
    if seen_idx.size == 0:
        return np.array([], dtype=int)
    scores = np.zeros(n_items, dtype=np.float32)
    for i, w in zip(seen_idx, seen_w):
        if i not in supported_item_ids:
            continue
        js = neighbors_idx[i]
        sim = neighbors_sim[i]
        valid = js >= 0
        if np.any(valid):
            scores[js[valid]] += w * sim[valid]
    scores[~supported_item_mask] = -np.inf
    top = np.argpartition(scores, -top_n)[-top_n:]
    return top[np.argsort(scores[top])[::-1]]


def recommend_with_explanations(uidx, top_n=K_RECS, max_reasons=3):
    """
    Return (top_indices, explanations) where explanations is a list of dicts:
      [
        {
          'rec_i': <int item index>,
          'rec_groupId': <original item id>,
          'score': <float total score>,
          'reasons': [
             {'seen_i': <int>, 'seen_groupId': <orig>, 'user_weight': w_i,
              'similarity': sim_ij, 'contribution': w_i*sim_ij},
             ...
          ]
        },
        ...
      ]
    """
    seen_idx = X_for_knn[uidx].indices
    seen_w   = X_for_knn[uidx].data
    if seen_idx.size == 0:
        return np.array([], dtype=int), []

    scores = np.zeros(n_items, dtype=np.float32)
    contribs = {}  # j -> list of (i, w_i, sim_ij, contrib)

    for i, w in zip(seen_idx, seen_w):
        if i not in supported_item_ids:
            continue
        js = neighbors_idx[i]
        sim = neighbors_sim[i]
        valid = js >= 0
        if not np.any(valid):
            continue
        js  = js[valid]
        sim = sim[valid]
        c   = w * sim  # contribution from seen item i to each candidate j

        scores[js] += c
        for j, s_ij, c_ij in zip(js, sim, c):
            if j not in contribs:
                contribs[j] = []
            contribs[j].append((int(i), float(w), float(s_ij), float(c_ij)))

    # mask unsupported
    scores[~supported_item_mask] = -np.inf

    # top-N
    if np.all(np.isneginf(scores)):
        return np.array([], dtype=int), []
    top = np.argpartition(scores, -top_n)[-top_n:]
    top = top[np.argsort(scores[top])[::-1]]

    # build explanations for the top-N
    explanations = []
    for j in top:
        reasons_raw = contribs.get(int(j), [])
        reasons_raw.sort(key=lambda t: t[3], reverse=True)  # by contribution
        reasons_raw = reasons_raw[:max_reasons]
        explanations.append({
            "rec_i": int(j),
            "rec_groupId": item_enc.inverse_transform([j])[0],
            "score": float(scores[j]),
            "reasons": [{
                "seen_i": ii,
                "seen_groupId": item_enc.inverse_transform([ii])[0],
                "user_weight": w_i,
                "similarity": sim_ij,
                "contribution": contrib
            } for (ii, w_i, sim_ij, contrib) in reasons_raw]
        })
    return top, explanations


# ====== Prepare TEST (encoded) ======
mask = test_df["shopUserId"].isin(user_enc.classes_) & test_df["groupId"].isin(item_enc.classes_)
test_mapped = test_df.loc[mask].copy()
test_mapped["u"] = user_enc.transform(test_mapped["shopUserId"])
test_mapped["i"] = item_enc.transform(test_mapped["groupId"])

# ====== Evaluate HitRate@K ======
def hitrate_at_k():
    hits = sum(r["i"] in recommend_for_user(r["u"], top_n=K_RECS) for _, r in test_mapped.iterrows())
    total = len(test_mapped)
    return hits / max(total, 1), total

hr, n = hitrate_at_k()
print(f"HitRate@{K_RECS} = {hr:.3f}  (users: {n})   [BM25 weighting + lift/PMI reweighting]")

HitRate@10 = 0.217  (users: 34157)   [BM25 weighting + lift/PMI reweighting]
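
The shrinkage and lift steps in the cell above combine into a single adjusted score per neighbor pair. The sketch below reproduces that arithmetic for one pair using hypothetical inputs. The function name `adjusted_similarity` is introduced here for illustration, while `ALPHA` and `PMI_GAMMA` use the same values as the config.

```python
import numpy as np

ALPHA = 100        # shrinkage strength, as in the config above
PMI_GAMMA = 0.5    # exponent applied to lift, as in the config above

def adjusted_similarity(cosine, overlap, deg_i, deg_j, n_users):
    """cosine * overlap/(overlap + ALPHA) * lift**PMI_GAMMA for one item pair."""
    shrink = overlap / (overlap + ALPHA)
    expected = max(deg_i * deg_j / n_users, 1e-8)   # shared buyers expected by chance
    lift = overlap / expected
    return cosine * shrink * lift ** PMI_GAMMA

# Hypothetical pairs with the same cosine but different numbers of shared buyers.
strong = adjusted_similarity(cosine=0.9, overlap=25, deg_i=50, deg_j=50, n_users=10_000)
weak = adjusted_similarity(cosine=0.9, overlap=1, deg_i=50, deg_j=50, n_users=10_000)

print(f"strong pair: {strong:.3f}, weak pair: {weak:.4f}")
```

Because lift is unbounded, the adjusted score is no longer a cosine and can exceed 1, which is why values above 1 can appear in the neighbor similarities. The pair with a single shared buyer is pushed close to zero by the shrinkage term.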


In [68]:
# Neighbor graph density diagnostics
deg = (neighbors_idx >= 0).sum(axis=1)               # neighbors per item after filters
num_items = len(deg)
print(f"Items with 0 neighbors: {(deg==0).sum()} / {num_items} "
      f"({(deg==0).mean():.1%})")
print(f"Median neighbors/item: {np.median(deg)}; 10th pct: {np.percentile(deg,10)}; 90th pct: {np.percentile(deg,90)}")

# How many user requests end up with <K candidates before backfill?
def candidate_count_for_user(uidx):
    seen = X_for_knn[uidx].indices
    cands = set()
    for i in seen:
        if i in supported_item_ids:
            cands.update(neighbors_idx[i][neighbors_idx[i] >= 0].tolist())
    return len(cands)

sample_users = np.random.choice(test_mapped["u"].unique(), size=min(1000, test_mapped["u"].nunique()), replace=False)
cand_counts = [candidate_count_for_user(u) for u in sample_users]
print(f"Users with <{K_RECS} candidate items: {np.mean(np.array(cand_counts) < K_RECS):.1%}")


Items with 0 neighbors: 0 / 733 (0.0%)
Median neighbors/item: 48.0; 10th pct: 10.0; 90th pct: 100.0
Users with <10 candidate items: 0.4%


In [69]:
# Output the most similar item pairs (after shrinkage and lift reweighting), with their co-user overlap

import numpy as np

top_n_pairs = 20
pair_sims = []
seen_pairs = set()

for i in range(n_items):
    if not supported_item_mask[i]:
        continue
    js = neighbors_idx[i]
    sims = neighbors_sim[i]
    for j, s in zip(js, sims):
        if j < 0 or not np.isfinite(s):
            continue
        a, b = (i, j) if i < j else (j, i)
        if a == b or (a, b) in seen_pairs:
            continue
        seen_pairs.add((a, b))
        pair_sims.append((float(s), a, b))

if not pair_sims:
    print("No valid similar pairs found.")
else:
    # Sort by similarity descending and take top N
    top_pairs = sorted(pair_sims, key=lambda x: -x[0])[:top_n_pairs]
    print(f"Top {top_n_pairs} most similar item pairs (after shrinkage):\n")
    for idx, (sim, i1, i2) in enumerate(top_pairs, 1):
        g1, g2 = item_enc.inverse_transform([i1, i2])
        overlap = (X_csc[:, i1].T @ X_csc[:, i2]).toarray()[0, 0]
        print(f"{idx}. {g1} (i={i1}) ↔ {g2} (j={i2}) | Similarity: {sim:.4f} | Co-user overlap: {int(overlap)} buyers")


Top 20 most similar item pairs (after shrinkage):

1. 270307 (i=413) ↔ 270308 (j=414) | Similarity: 1.1675 | Co-user overlap: 28 buyers
2. 260097 (i=83) ↔ 260098 (j=84) | Similarity: 1.0988 | Co-user overlap: 5 buyers
3. 261747 (i=273) ↔ 261749 (j=274) | Similarity: 0.5789 | Co-user overlap: 7 buyers
4. 270561 (i=425) ↔ 270569 (j=426) | Similarity: 0.5750 | Co-user overlap: 3 buyers
5. 270301 (i=408) ↔ 270302 (j=409) | Similarity: 0.4404 | Co-user overlap: 5 buyers
6. 261721 (i=266) ↔ 261756 (j=275) | Similarity: 0.4196 | Co-user overlap: 2 buyers
7. 290000 (i=465) ↔ 290012 (j=470) | Similarity: 0.3993 | Co-user overlap: 7 buyers
8. 409090 (i=623) ↔ 430040 (j=629) | Similarity: 0.3485 | Co-user overlap: 2 buyers
9. 210777 (i=30) ↔ 210784 (j=32) | Similarity: 0.3382 | Co-user overlap: 2 buyers
10. 261405 (i=207) ↔ 261426 (j=209) | Similarity: 0.3342 | Co-user overlap: 31 buyers
11. 261148 (i=180) ↔ 571201 (j=723) | Similarity: 0.3262 | Co-user overlap: 4 buyers
12. 270304 (i=411) ↔ 2703

In [70]:
import pandas as pd

# Load the catalog of items with explicit dtype to avoid DtypeWarning
catalog_df = pd.read_csv(
    "../data/processed/articles_clean.csv",
    dtype={"groupId": str, "article_id": str},  # add other columns as needed
    low_memory=False
)

# Only consider items with status 'active'
active_catalog_df = catalog_df[catalog_df["status"] == "active"]

# Get all recommended item indices for all users in the train set
all_recommended = set()
for uidx in range(n_users):
    recs = recommend_for_user(uidx, top_n=K_RECS)
    all_recommended.update(recs)

# Map recommended indices back to groupId (as string for consistency)
recommended_groupIds = set(map(str, item_enc.inverse_transform(list(all_recommended))))

# How many unique *active* items from the catalog were covered by recommendations?
active_catalog_groupIds = set(map(str, active_catalog_df["groupId"].unique()))
covered_items = recommended_groupIds & active_catalog_groupIds

print(f"Unique active catalog items covered by recommendations: {len(covered_items)} out of {len(active_catalog_groupIds)}")


Unique active catalog items covered by recommendations: 730 out of 1503
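
Catalog coverage is the covered count divided by the catalog size. For the output above that is 730 / 1503, or about 48.6% of active items. A minimal sketch of the same set arithmetic on made-up ids:

```python
# Hypothetical id sets; "X" was recommended but is not an active catalog item.
recommended = {"A", "B", "C", "X"}
active_catalog = {"A", "B", "C", "D", "E"}

covered = recommended & active_catalog          # recommended AND active
coverage = len(covered) / len(active_catalog)

print(f"Coverage: {len(covered)} of {len(active_catalog)} ({coverage:.1%})")
```

Intersecting with the active catalog keeps recommended items that are no longer active from inflating the figure.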


In [71]:
# ---------- Sample a few users and print explanations ----------
import random

# Use Python's random.sample to ensure random users are shown each run
sample_size = min(5, n_users)
sample_user_indices = random.sample(range(n_users), sample_size)

for uidx in sample_user_indices:
    user_id = user_enc.inverse_transform([uidx])[0]
    seen_items_idx = X_for_knn[uidx].indices
    seen_items = item_enc.inverse_transform(seen_items_idx)

    rec_indices, expl = recommend_with_explanations(uidx, top_n=K_RECS, max_reasons=3)
    rec_groupIds = item_enc.inverse_transform(rec_indices)

    print(f"\nUser {user_id} (idx {uidx})")
    print(f"Seen items ({len(seen_items)}): {list(seen_items)}")
    print(f"Recommendations: {list(rec_groupIds)}")

    for e in expl:
        rec_gid = e["rec_groupId"]
        score   = e["score"]
        reasons_txt = ", ".join(
            f"{r['seen_groupId']} [w={r['user_weight']:.2f}, sim={r['similarity']:.3f}, contrib={r['contribution']:.3f}]"
            for r in e["reasons"]
        )
        because = f"because you bought: {reasons_txt}" if reasons_txt else "no contributing items found"
        print(f"  • {rec_gid} (score {score:.3f}) — {because}")



User 374387 (idx 11322)
Seen items (2): ['260232', '260620']
Recommendations: ['261595', '260951', '260182', '265041', '261040', '261924', '260646', '261637', '263988', '260949']
  • 261595 (score 0.101) — because you bought: 260232 [w=4.26, sim=0.022, contrib=0.095], 260620 [w=4.01, sim=0.001, contrib=0.006]
  • 260951 (score 0.066) — because you bought: 260620 [w=4.01, sim=0.017, contrib=0.066]
  • 260182 (score 0.030) — because you bought: 260232 [w=4.26, sim=0.007, contrib=0.029], 260620 [w=4.01, sim=0.000, contrib=0.001]
  • 265041 (score 0.028) — because you bought: 260232 [w=4.26, sim=0.007, contrib=0.028]
  • 261040 (score 0.028) — because you bought: 260232 [w=4.26, sim=0.006, contrib=0.027], 260620 [w=4.01, sim=0.000, contrib=0.000]
  • 261924 (score 0.022) — because you bought: 260232 [w=4.26, sim=0.005, contrib=0.021], 260620 [w=4.01, sim=0.000, contrib=0.001]
  • 260646 (score 0.018) — because you bought: 260620 [w=4.01, sim=0.002, contrib=0.010], 260232 [w=4.26, sim=0.00