# Mitigated Popularity Bias Evaluation

This notebook implements and evaluates a popularity bias mitigation strategy (multiplicative damping applied as a post-processing step), building on the evaluation framework of the original `music.ipynb` and `movie.ipynb` notebooks.

**Mitigation Strategy:**
- Type: Post-processing re-ranking
- Formula: `new_score = original_score / (item_popularity ** alpha)`
- `item_popularity`: Normalized popularity (count / num_users)
- `alpha`: Mitigation strength hyperparameter (defined below)
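
As a rough illustration of the formula above, the sketch below applies the damping to three made-up items. The item names, scores, and popularity values are invented for the example; only the formula itself comes from this notebook.

```python
import numpy as np

# Hypothetical scores and normalized popularities for three items (illustrative values only)
items = np.array(["hit", "mid", "niche"])
original_score = np.array([0.90, 0.80, 0.70])
item_popularity = np.array([0.64, 0.16, 0.04])  # count / num_users
alpha = 0.5

# new_score = original_score / (item_popularity ** alpha)
new_score = original_score / (item_popularity ** alpha)

# Order items by descending score before and after damping
ranking_before = items[np.argsort(-original_score)]
ranking_after = items[np.argsort(-new_score)]

print(ranking_before)  # ['hit' 'mid' 'niche']
print(ranking_after)   # ['niche' 'mid' 'hit']
```

With `alpha = 0` the divisor is 1 and the scores are unchanged; larger values penalize popular items more strongly.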

**Evaluation:**
- Domains: Music (LastFM scaled), Movie (MovieLens1M)
- Algorithms: MostPop, UserKNN, ItemKNN, PMF, NMF, HPF
- Evaluation Strategies: UserTest (`eva_two`), TrainItems (`eva_three`)
- User Grouping: PopularPercentage (`pop_one`), AveragePopularity (`pop_two`), NicheConsumptionRate (`pop_four`)
- Metrics: %ΔGAP, NDCG@10, Welch's t-tests between user groups
- Output: CSV files in `mitigated_results/` directory, formatted for comparison with original results.
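
To make the %ΔGAP metric concrete: GAP is the average normalized popularity of a set of items, and %ΔGAP compares the GAP of a group's recommendations with the GAP of its user profiles. The following is a minimal sketch with invented numbers; the helper name `delta_gap_percent` and the group values are hypothetical, not results from this notebook.

```python
import numpy as np

def delta_gap_percent(rec_gaps, profile_gap):
    """Percent change between a group's mean recommendation GAP and its profile GAP."""
    mean_rec_gap = float(np.mean(rec_gaps))
    return (mean_rec_gap - profile_gap) / profile_gap * 100

# Hypothetical per-user recommendation GAPs for one user group
rec_gaps = [0.30, 0.20, 0.25]
profile_gap = 0.10  # hypothetical average profile GAP of that group

print(round(delta_gap_percent(rec_gaps, profile_gap), 2))  # 150.0
```

A positive value means the recommended items are, on average, more popular than the items in the group's profiles.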

In [1]:
# =============================================================================
# SECTION A: IMPORTS & SETUP
# =============================================================================
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.simplefilter(action='ignore', category=UserWarning) # Ignore Cornac's UserWarnings about unknown users/items

import matplotlib.pyplot as plt
import random as rd
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm
import pickle as pkl
import time
import os
from collections import defaultdict
from scipy import stats
from sklearn.metrics import mean_squared_error, precision_score, recall_score, ndcg_score
from numpy.linalg import norm
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler

# Cornac imports
import cornac
from cornac.eval_methods import RatioSplit
from cornac.data import Reader as CornacReader
from cornac.models import MostPop, MF, PMF, WMF, HPF, ItemKNN, UserKNN
from cornac.models import NMF as CornacNMF
from cornac.metrics import MAE, MSE, RMSE, Precision, Recall, NDCG, AUC, MAP, FMeasure, MRR

# set plot style: grey grid in the background:
sns.set(style="darkgrid")
pd.set_option("display.precision", 8)
print("Libraries imported.")

Libraries imported.


In [2]:
# =============================================================================
# SECTION B: HYPERPARAMETERS & CONFIGURATION
# =============================================================================

# --- Core Settings ---
rating_threshold = 1.0 # For Cornac ranking metrics binarization
my_seed = 0
test_size = 0.2
predict_col = "rating"
user_col = "user"
# item_col will be set per domain
top_fraction = 0.2 # For defining "popular" items and user group splits
rec_k = 10

# --- Mitigation Hyperparameter ---
mitigation_alpha = 0.5 # Strength of the damping. Higher alpha = stronger mitigation.

# --- Output Directory ---
results_location = 'mitigated_results/'
os.makedirs(results_location, exist_ok=True)

# --- Seed ---
rd.seed(my_seed)
np.random.seed(my_seed)

# --- Algorithms to Evaluate ---
# Match the subset used in the original reproduction
algo_map = {
    "MostPop": MostPop(),
    "UserKNN": UserKNN(k=40, similarity='cosine', mean_centered=False, seed=my_seed, verbose=False), # Verbose set to False for cleaner loops
    "ItemKNN": ItemKNN(k=40, similarity='cosine', mean_centered=False, seed=my_seed, verbose=False),
    "PMF": PMF(k=10, max_iter=100, learning_rate=0.001, lambda_reg=0.001, seed=my_seed, verbose=False),
    "NMF": CornacNMF(k=15, max_iter=50, learning_rate=0.005, lambda_u=0.06, lambda_v=0.06, lambda_bu=0.02, lambda_bi=0.02, use_bias=False, seed=my_seed, verbose=False),
    # hierarchical=False makes Cornac fit plain Poisson factorization (PF) rather than hierarchical PF.
    # Note: the original notebooks appear to use MF rather than HPF/PF; swap this entry if an exact match is needed.
    "HPF": HPF(k=50, seed=my_seed, hierarchical=False, name="PF", verbose=False)
}
algo_names = list(algo_map.keys())
models_list = list(algo_map.values())


# --- Evaluation Configurations ---
evaluation_strategies = ['eva_two', 'eva_three'] # Corresponds to UserTest and TrainItems
popularity_notions = ['pop_one', 'pop_two', 'pop_four'] # PopularPercentage, AveragePopularity, NicheConsumptionRate

print(f"Mitigation Alpha: {mitigation_alpha}")
print(f"Algorithms: {algo_names}")
print(f"Evaluation Strategies: {evaluation_strategies}")
print(f"Popularity Notions: {popularity_notions}")
print(f"Results will be saved to: {results_location}")

Mitigation Alpha: 0.5
Algorithms: ['MostPop', 'UserKNN', 'ItemKNN', 'PMF', 'NMF', 'HPF']
Evaluation Strategies: ['eva_two', 'eva_three']
Popularity Notions: ['pop_one', 'pop_two', 'pop_four']
Results will be saved to: mitigated_results/


# =============================================================================
# SECTION C: HELPER FUNCTIONS (Analysis, Grouping, Metrics)
# =============================================================================
# Reusing functions from the original notebooks

In [3]:
# --- Data Analysis Functions ---
def users_and_items(df_events, user_col, item_col):
    print(f"  No. user events: {len(df_events)}")
    print(f"  No. unique {item_col}s: {len(df_events[item_col].unique())}")
    print(f"  No. unique {user_col}s: {len(df_events[user_col].unique())}")
    print("-" * 20)
    return len(df_events[user_col].unique()), len(df_events[item_col].unique())

def user_distribution(df_events, user_col, item_col):
    user_dist = df_events[user_col].value_counts()
    print(f"  Mean {item_col}s per user: {np.round(user_dist.mean(), 1)}")
    print(f"  Min {item_col}s per user: {np.round(user_dist.min(), 1)}")
    print(f"  Max {item_col}s per user: {np.round(user_dist.max(), 1)}")
    print("-" * 20)
    return user_dist

def item_distribution(df_events, user_col, item_col):
    item_dist = df_events[item_col].value_counts()
    print(f"  Mean users per {item_col}: {np.round(item_dist.mean(), 1)}")
    print(f"  Min users per {item_col}: {np.round(item_dist.min(), 1)}")
    print(f"  Max users per {item_col}: {np.round(item_dist.max(), 1)}")
    print("-" * 20)
    return item_dist

# --- Item Popularity Calculation ---
def calculate_item_popularity(item_dist_series, num_users, top_fraction):
    df_item_dist = pd.DataFrame(item_dist_series)
    df_item_dist.columns = ['count']
    df_item_dist['popularity'] = df_item_dist['count'] / num_users # Normalized
    num_items = len(df_item_dist)
    num_top = int(top_fraction * num_items)
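    # value_counts() sorts in descending order, so the first num_top entries are the most popular items
    top_item_ids = set(item_dist_series.iloc[:num_top].index)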
    print(f"  Identified {len(top_item_ids)} popular items (top {top_fraction*100:.0f}%)")
    print("-" * 20)
    return df_item_dist, top_item_ids

# --- Base User Metric Calculation ---
# Function to calculate base popularity metrics for all users
def calculate_popularity_metrics(df_events, top_item_ids, df_item_dist, num_users, user_col, item_col):
    # df_item_dist must be the DataFrame with the 'popularity' column
    user_metrics_data = [] # List to store results per user

    print("  Calculating base popularity metrics (pop_one, pop_two)...")
    grouped = df_events.groupby(user_col)
    for u, df in tqdm(grouped, total=len(grouped), desc="  Metrics (pop_one/two)"):
        user_id_int = int(u)
        user_items = set(df[item_col])
        no_user_items = len(user_items)

        if no_user_items == 0:
            pop_count = 0
            pop_fraq = np.nan
            pop_item_fraq = np.nan
        else:
            no_user_pop_items = len(user_items & top_item_ids)
            pop_count = no_user_pop_items
            pop_fraq = no_user_pop_items / no_user_items
            # Use the normalized 'popularity' column for the profile GAP.
            # Restrict to the user's unique items that appear in df_item_dist, so repeated events
            # for the same item are not counted twice against the unique-item denominator.
            valid_items = [i for i in user_items if i in df_item_dist.index]
            if valid_items:
                pop_item_fraq = df_item_dist.loc[valid_items, 'popularity'].sum() / no_user_items
            else:
                pop_item_fraq = np.nan # Assign NaN if no valid items found for user

        user_metrics_data.append({
            'user_id': user_id_int,
            'pop_count': pop_count,
            'user_hist': no_user_items,
            'pop_fraq': pop_fraq, # pop_one metric
            'pop_item_fraq': pop_item_fraq # pop_two metric
        })

    return pd.DataFrame(user_metrics_data).set_index('user_id')

# Function to calculate niche consumption rate
def calculate_niche_consumption_rates(df_events, df_item_dist, user_dist_counts, user_col, item_col, niche_threshold_percentile=0.3):
    # df_item_dist should have the 'count' column
    print(f"  Defining niche items (bottom {niche_threshold_percentile*100:.0f}% popularity)...")

    # --- Step 1: Identify Niche Items ---
    item_popularity_values = pd.to_numeric(df_item_dist['count'].values, errors='coerce')
    item_popularity_values = item_popularity_values[~np.isnan(item_popularity_values)]

    if len(item_popularity_values) == 0:
         raise ValueError("Item distribution contains no valid numeric popularity values.")

    popularity_threshold = np.percentile(item_popularity_values, niche_threshold_percentile * 100)
    niche_item_ids = df_item_dist[df_item_dist['count'] <= popularity_threshold].index
    niche_item_set = set(niche_item_ids)
    print(f"  Found {len(niche_item_set)} niche items (popularity count <= {popularity_threshold:.2f}).")

    # --- Step 2: Calculate Rate Per User ---
    niche_rates = {}
    user_groups = dict(iter(df_events.groupby(user_col)))

    print("  Calculating niche consumption rate per user (pop_four)...")
    for u in tqdm(user_dist_counts.index, desc="  Metrics (pop_four)"):
        if u not in user_groups:
            # print(f"Warning: User {u} found in user_dist_counts but not in df_events. Assigning rate NaN.")
            niche_rates[u] = np.nan
            continue

        df_user = user_groups[u]
        user_items_set = set(df_user[item_col])
        total_user_items = len(user_items_set)

        if total_user_items == 0:
            niche_rates[u] = np.nan
            continue

        niche_items_consumed = user_items_set.intersection(niche_item_set)
        niche_item_count = len(niche_items_consumed)
        niche_rate = niche_item_count / total_user_items
        niche_rates[u] = niche_rate

    print("  Finished calculating niche consumption rates.")
    print("-" * 20)
    return pd.Series(niche_rates, index=user_dist_counts.index, name='niche_consumption_rate')

# --- User Grouping Functions ---
def sort_users(user_metrics_df, by = "pop_fraq"):
    if by not in user_metrics_df.columns:
        raise ValueError(f"Sorting column '{by}' not found in user metrics DataFrame.")
    # Drop NaNs before sorting to avoid errors and ensure meaningful split
    user_metrics_to_sort = user_metrics_df.dropna(subset=[by])
    if user_metrics_to_sort.empty:
        print(f"Warning: No users left after dropping NaNs for sorting column '{by}'. Returning empty DataFrame.")
        return user_metrics_to_sort
    user_dist_sorted = user_metrics_to_sort.sort_values(by=[by])
    return user_dist_sorted

def split(user_dist_sorted, top_fraction):
    n = len(user_dist_sorted)
    if n < 3: # Cannot make 3 groups
        print(f"Warning: Cannot split {n} users into 3 distinct groups. Returning empty groups.")
        return pd.DataFrame(), pd.DataFrame(), pd.DataFrame()

    idx1 = int(top_fraction * n)
    idx2 = int((1 - top_fraction) * n)

    # Adjust indices to prevent overlap or zero-sized groups if possible
    if idx1 == 0: idx1 = 1 # Ensure low group has at least 1 if possible
    if idx2 >= n: idx2 = n - 1 # Ensure high group has at least 1 if possible
    if idx1 >= idx2: # If still overlapping, force a minimal middle group
        idx1 = max(0, idx2 - 1)


    low = user_dist_sorted.iloc[:idx1]
    med = user_dist_sorted.iloc[idx1:idx2]
    high = user_dist_sorted.iloc[idx2:]

    if low.empty or med.empty or high.empty:
        print(f"Warning: Generated empty user groups after split (Low: {len(low)}, Med: {len(med)}, High: {len(high)}). Check data and top_fraction.")

    return low, med, high

def calculate_group_characteristics(low, med, high, pop_notion_key):
    # Use the calculated 'user_hist' and 'pop_item_fraq' (GAP in profile)
    low_profile_size = low['user_hist'].mean() if not low.empty else np.nan
    med_profile_size = med['user_hist'].mean() if not med.empty else np.nan
    high_profile_size = high['user_hist'].mean() if not high.empty else np.nan

    low_nr_users = len(low)
    med_nr_users = len(med)
    high_nr_users = len(high)

    # Base profile GAP ('pop_item_fraq') is the same regardless of sorting notion
    low_GAP_profile = low['pop_item_fraq'].mean() if not low.empty else np.nan
    med_GAP_profile = med['pop_item_fraq'].mean() if not med.empty else np.nan
    high_GAP_profile = high['pop_item_fraq'].mean() if not high.empty else np.nan

    # Determine group names based on pop_notion
    if pop_notion_key == 'pop_four':
        low_name, high_name = 'Blockbuster', 'Niche' # Low niche rate = Blockbuster
    else:
        low_name, high_name = 'Niche', 'Blockbuster' # Low pop frac/avg = Niche

    print(f"  Group Stats ({pop_notion_key}):")
    print(f"    Low Group ('{low_name}', {low_nr_users} users): Avg Profile Size={low_profile_size:.2f}, Avg Profile GAP={low_GAP_profile:.6f}")
    print(f"    Med Group ('Diverse', {med_nr_users} users): Avg Profile Size={med_profile_size:.2f}, Avg Profile GAP={med_GAP_profile:.6f}")
    print(f"    High Group ('{high_name}', {high_nr_users} users): Avg Profile Size={high_profile_size:.2f}, Avg Profile GAP={high_GAP_profile:.6f}")
    print("-" * 20)

    return low_GAP_profile, med_GAP_profile, high_GAP_profile

# --- Map notion key to column name for sorting ---
popularity_sort_column = {
    'pop_one': "pop_fraq",
    'pop_two': "pop_item_fraq",
    'pop_four': "niche_consumption_rate"
}

# --- Metric Calculation Functions ---
def calculate_NDCG_per_group(algo_name, recommendations, cornac_exp, low_group_users, med_group_users, high_group_users, rec_k=10):
    # Assumes cornac_exp.eval_method holds the relevant RatioSplit object
    rs = cornac_exp.eval_method
    train_set = rs.train_set
    test_set = rs.test_set

    ndcg_low, ndcg_med, ndcg_high = [], [], []
    low_ids = set(low_group_users.index.astype(int))
    med_ids = set(med_group_users.index.astype(int))
    high_ids = set(high_group_users.index.astype(int))

    orig_uid_to_cornac_uid = train_set.uid_map
    orig_iid_to_cornac_iid = train_set.iid_map

    print(f"    Calculating NDCG@{rec_k} for {algo_name}...")
    for user_id_int, user_rec_items_scores in tqdm(recommendations.items(), desc=f"      NDCG {algo_name}", leave=False):
        user_id_str = str(user_id_int)

        # Check if user is in the test set (relevant items available)
        if user_id_str not in orig_uid_to_cornac_uid or orig_uid_to_cornac_uid[user_id_str] not in test_set.user_data:
            continue

        uid_cornac = orig_uid_to_cornac_uid[user_id_str]
        true_item_indices, true_relevance_scores = test_set.user_data[uid_cornac]
        true_relevance_dict = dict(zip(true_item_indices, true_relevance_scores))

        # Align predicted items with true relevance scores
        user_pred_scores = []
        user_true_scores = []

        for item_id_int, pred_score in user_rec_items_scores:
            # Handle potential NaN prediction scores: give them the lowest finite score so they rank last
            # (ndcg_score rejects non-finite inputs such as -np.inf)
            if pd.isna(pred_score):
                user_pred_scores.append(np.finfo(float).min)
                user_true_scores.append(0.0)
                continue

            item_id_str = str(item_id_int)
            user_pred_scores.append(pred_score)

            true_score = 0.0
            if item_id_str in orig_iid_to_cornac_iid:
                iid_cornac = orig_iid_to_cornac_iid[item_id_str]
                if iid_cornac in true_relevance_dict:
                    true_score = true_relevance_dict[iid_cornac] # Use actual score from test set
            user_true_scores.append(true_score)

        # Calculate NDCG for the user if possible
        if len(user_true_scores) > 0: # Need at least one item recommended
            # Ensure scores are numpy arrays for ndcg_score
            true_relevance_arr = np.asarray([user_true_scores])
            pred_scores_arr = np.asarray([user_pred_scores])

            if np.all(true_relevance_arr == 0):
                user_ndcg = 0.0 # No relevant items in top-k anyway
            else:
                try:
                    user_ndcg = ndcg_score(true_relevance_arr, pred_scores_arr, k=rec_k)
                except ValueError:
                    user_ndcg = np.nan # Handle cases where ndcg_score fails

            # Assign to group if valid NDCG calculated
            if not pd.isna(user_ndcg):
                if user_id_int in low_ids:
                    ndcg_low.append(user_ndcg)
                elif user_id_int in med_ids:
                    ndcg_med.append(user_ndcg)
                elif user_id_int in high_ids:
                    ndcg_high.append(user_ndcg)

    # Calculate means and T-tests
    mean_ndcg_low = np.nanmean(ndcg_low) if ndcg_low else np.nan
    mean_ndcg_med = np.nanmean(ndcg_med) if ndcg_med else np.nan
    mean_ndcg_high = np.nanmean(ndcg_high) if ndcg_high else np.nan

    ttests = [np.nan] * 3
    try:
        if len(ndcg_low) > 1 and len(ndcg_med) > 1:
            ttests[0] = stats.ttest_ind(ndcg_low, ndcg_med, equal_var=False, nan_policy='omit')[1]
        if len(ndcg_low) > 1 and len(ndcg_high) > 1:
            ttests[1] = stats.ttest_ind(ndcg_low, ndcg_high, equal_var=False, nan_policy='omit')[1]
        if len(ndcg_med) > 1 and len(ndcg_high) > 1:
            ttests[2] = stats.ttest_ind(ndcg_med, ndcg_high, equal_var=False, nan_policy='omit')[1]
    except Exception as e:
        print(f"      Error during t-test calculation for NDCG ({algo_name}): {e}")

    return mean_ndcg_low, mean_ndcg_med, mean_ndcg_high, ttests


def calculate_delta_GAP_per_group(algo_name, recommendations, df_item_dist_cornac_pop,
                                  low_group_users, med_group_users, high_group_users,
                                  low_GAP_profile, med_GAP_profile, high_GAP_profile):

    low_rec_gap_user, medium_rec_gap_user, high_rec_gap_user = [], [], []
    rel_diff_low, rel_diff_med, rel_diff_high = [], [], []

    print(f"    Calculating %DeltaGAP for {algo_name}...")
    for uid_int, user_rec_items_scores in tqdm(recommendations.items(), desc=f"      GAP {algo_name}", leave=False):
        iid_list_int = [iid for (iid, _) in user_rec_items_scores]
        if not iid_list_int: continue

        iid_list_str = [str(iid) for iid in iid_list_int]
        # Filter to items present in the popularity dataframe
        valid_iids_str = [iid_str for iid_str in iid_list_str if iid_str in df_item_dist_cornac_pop.index]
        if not valid_iids_str: continue

        # Calculate average normalized popularity of recommended items
        try:
            gap_rec = df_item_dist_cornac_pop.loc[valid_iids_str, 'popularity'].mean()
        except KeyError:
            # Defensive fallback in case an item is missing from df_item_dist_cornac_pop
            gap_rec = np.nan

        if pd.isna(gap_rec): continue # Skip user if GAP couldn't be calculated

        # Assign GAP to group and calculate relative difference
        if uid_int in low_group_users.index:
            low_rec_gap_user.append(gap_rec)
            if pd.notna(low_GAP_profile) and low_GAP_profile > 1e-9: # Avoid division by zero
                rel_diff_low.append((gap_rec - low_GAP_profile) / low_GAP_profile)
        elif uid_int in med_group_users.index:
            medium_rec_gap_user.append(gap_rec)
            if pd.notna(med_GAP_profile) and med_GAP_profile > 1e-9:
                rel_diff_med.append((gap_rec - med_GAP_profile) / med_GAP_profile)
        elif uid_int in high_group_users.index:
            high_rec_gap_user.append(gap_rec)
            if pd.notna(high_GAP_profile) and high_GAP_profile > 1e-9:
                rel_diff_high.append((gap_rec - high_GAP_profile) / high_GAP_profile)

    # Calculate mean recommendation GAP for each group
    mean_gap_rec_low = np.nanmean(low_rec_gap_user) if low_rec_gap_user else np.nan
    mean_gap_rec_med = np.nanmean(medium_rec_gap_user) if medium_rec_gap_user else np.nan
    mean_gap_rec_high = np.nanmean(high_rec_gap_user) if high_rec_gap_user else np.nan

    # Calculate %ΔGAP
    delta_gap_low = (mean_gap_rec_low - low_GAP_profile) / low_GAP_profile * 100 if pd.notna(low_GAP_profile) and low_GAP_profile > 1e-9 and pd.notna(mean_gap_rec_low) else np.nan
    delta_gap_med = (mean_gap_rec_med - med_GAP_profile) / med_GAP_profile * 100 if pd.notna(med_GAP_profile) and med_GAP_profile > 1e-9 and pd.notna(mean_gap_rec_med) else np.nan
    delta_gap_high = (mean_gap_rec_high - high_GAP_profile) / high_GAP_profile * 100 if pd.notna(high_GAP_profile) and high_GAP_profile > 1e-9 and pd.notna(mean_gap_rec_high) else np.nan

    # Calculate t-tests on the *relative differences*
    ttests_gap = [np.nan] * 3
    try:
        # Ensure lists are not empty and contain valid numbers for t-test
        valid_rel_diff_low = [x for x in rel_diff_low if pd.notna(x)]
        valid_rel_diff_med = [x for x in rel_diff_med if pd.notna(x)]
        valid_rel_diff_high = [x for x in rel_diff_high if pd.notna(x)]

        if len(valid_rel_diff_low) > 1 and len(valid_rel_diff_med) > 1:
            ttests_gap[0] = stats.ttest_ind(valid_rel_diff_low, valid_rel_diff_med, equal_var=False, nan_policy='omit')[1]
        if len(valid_rel_diff_low) > 1 and len(valid_rel_diff_high) > 1:
            ttests_gap[1] = stats.ttest_ind(valid_rel_diff_low, valid_rel_diff_high, equal_var=False, nan_policy='omit')[1]
        if len(valid_rel_diff_med) > 1 and len(valid_rel_diff_high) > 1:
            ttests_gap[2] = stats.ttest_ind(valid_rel_diff_med, valid_rel_diff_high, equal_var=False, nan_policy='omit')[1]
    except Exception as e:
        print(f"      Error during t-test calculation for GAP ({algo_name}): {e}")

    return delta_gap_low, delta_gap_med, delta_gap_high, ttests_gap


print("Helper functions defined.")

Helper functions defined.
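
The grouping helpers sort users by a popularity notion and cut the sorted list into a low, a middle, and a high group using `top_fraction`. Below is a small sketch of that cut on ten invented users; the `pop_fraq` values are made up for illustration, and the edge-case adjustments in `split` are omitted.

```python
import pandas as pd

top_fraction = 0.2
# Hypothetical users, each with an invented pop_fraq value
users = pd.DataFrame(
    {"pop_fraq": [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0]},
    index=pd.Index(range(1, 11), name="user_id"),
)

sorted_users = users.sort_values(by="pop_fraq")
n = len(sorted_users)
idx1 = int(top_fraction * n)        # end of the low group
idx2 = int((1 - top_fraction) * n)  # start of the high group

low = sorted_users.iloc[:idx1]
med = sorted_users.iloc[idx1:idx2]
high = sorted_users.iloc[idx2:]

print(len(low), len(med), len(high))            # 2 6 2
print(low.index.tolist(), high.index.tolist())  # [2, 6] [1, 10]
```

Under `pop_one` and `pop_two` the low group is labeled Niche and the high group Blockbuster; under `pop_four` the labels are swapped, because a low niche consumption rate indicates a blockbuster-oriented user.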


# =============================================================================
# SECTION D: MITIGATED RECOMMENDATION FUNCTIONS
# =============================================================================
# Implement the post-processing mitigation within new functions

In [4]:
def get_top_n_mitigated_eva_two(model, user_idx_map, test_set, df_item_dist_cornac_pop, alpha, rec_k=10):
    """
    Generates top-N recommendations using UserTest strategy with multiplicative damping.
    Args:
        model: Trained Cornac model.
        user_idx_map: Mapping from Cornac internal UID to original string UID.
        test_set: Cornac test set object.
        df_item_dist_cornac_pop: DataFrame with item popularity ('popularity' column), indexed by the original item ID as a string (the raw ID that Cornac maps to its internal index).
        alpha: Mitigation strength hyperparameter.
        rec_k: Number of recommendations to return.
    Returns:
        defaultdict: Dictionary {original_user_id (int): [(original_item_id (int), mitigated_score)]}
    """
    top_n = defaultdict(list)
    item_idx_map = model.train_set.iid_map # Original string IID -> Cornac internal IID
    idx_item_map = {v: k for k, v in item_idx_map.items()} # Cornac internal IID -> Original string IID

    print(f"    Generating mitigated recs (eva_two, alpha={alpha})...")
    for uid_cornac in tqdm(test_set.uid_map.values(), desc="      Recs eva_two", leave=False):
        if uid_cornac in test_set.user_data:
            original_user_id_str = user_idx_map[uid_cornac]
            # Candidate items are ONLY those the user rated in the test set
            candidate_items_indices = test_set.user_data[uid_cornac][0]
            if not candidate_items_indices: continue

            try:
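                # Score the candidate items. Cornac's rank() returns the ranked item indices together with
                # scores aligned to the *input* item_indices order, so pair the scores with the candidates themselves.
                _, candidate_scores = model.rank(user_idx=uid_cornac, item_indices=candidate_items_indices)

                mitigated_candidates = []
                for iid_cornac, score in zip(candidate_items_indices, candidate_scores):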
                    if pd.isna(score): continue # Skip items with NaN scores from model

                    original_item_id_str = idx_item_map[iid_cornac]

                    # Get normalized popularity
                    item_pop = 0.0
                    if original_item_id_str in df_item_dist_cornac_pop.index:
                        item_pop = df_item_dist_cornac_pop.loc[original_item_id_str, 'popularity']

                    # Apply multiplicative damping
                    mitigated_score = score
                    if item_pop > 1e-9: # Avoid division by zero or near-zero
                         mitigated_score = score / (item_pop ** alpha)
                    # If item_pop is 0 or very small, keep original score

                    mitigated_candidates.append((iid_cornac, mitigated_score))

                # Sort candidates by mitigated score (descending)
                mitigated_candidates.sort(key=lambda x: x[1], reverse=True)

                # Get top-k based on mitigated scores
                for iid_cornac, mitigated_score in mitigated_candidates[:rec_k]:
                    original_item_id_str = idx_item_map[iid_cornac]
                    # Need to convert original IDs back to int for consistency
                    top_n[int(original_user_id_str)].append((int(original_item_id_str), mitigated_score))

            except Exception as e:
                 print(f"      Error during mitigated ranking for user {original_user_id_str} (eva_two): {e}")
                 continue
    return top_n

In [5]:
def get_top_n_mitigated_eva_three(model, user_idx_map, train_set, test_set, all_items_cornac_indices, df_item_dist_cornac_pop, alpha, rec_k=10):
    """
    Generates top-N recommendations using TrainItems strategy with multiplicative damping.
    Args:
        model: Trained Cornac model.
        user_idx_map: Mapping from Cornac internal UID to original string UID.
        train_set: Cornac train set object.
        test_set: Cornac test set object (used to identify users).
        all_items_cornac_indices: Set of all known Cornac internal item indices.
        df_item_dist_cornac_pop: DataFrame with item popularity ('popularity' column), indexed by the original item ID as a string (the raw ID that Cornac maps to its internal index).
        alpha: Mitigation strength hyperparameter.
        rec_k: Number of recommendations to return.
    Returns:
        defaultdict: Dictionary {original_user_id (int): [(original_item_id (int), mitigated_score)]}
    """
    top_n = defaultdict(list)
    item_idx_map = model.train_set.iid_map # Original string IID -> Cornac internal IID
    idx_item_map = {v: k for k, v in item_idx_map.items()} # Cornac internal IID -> Original string IID

    print(f"    Generating mitigated recs (eva_three, alpha={alpha})...")
    for uid_cornac in tqdm(test_set.uid_map.values(), desc="      Recs eva_three", leave=False): # Iterate through users present in the test set
        original_user_id_str = user_idx_map[uid_cornac]

        # Find items user interacted with in the training set
        items_in_train = set()
        if uid_cornac in train_set.user_data:
            items_in_train = set(train_set.user_data[uid_cornac][0])

        # Candidate items are ALL items EXCEPT those in the user's training set
        candidate_items_indices = list(all_items_cornac_indices.difference(items_in_train))
        if not candidate_items_indices: continue

        try:
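            # Score the candidate items. Cornac's rank() returns the ranked item indices together with
            # scores aligned to the *input* item_indices order, so pair the scores with the candidates themselves.
            _, candidate_scores = model.rank(user_idx=uid_cornac, item_indices=candidate_items_indices)

            mitigated_candidates = []
            for iid_cornac, score in zip(candidate_items_indices, candidate_scores):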
                 if pd.isna(score): continue # Skip items with NaN scores

                 original_item_id_str = idx_item_map[iid_cornac]

                 # Get normalized popularity
                 item_pop = 0.0
                 if original_item_id_str in df_item_dist_cornac_pop.index:
                     item_pop = df_item_dist_cornac_pop.loc[original_item_id_str, 'popularity']

                 # Apply multiplicative damping
                 mitigated_score = score
                 if item_pop > 1e-9: # Avoid division by zero or near-zero
                     mitigated_score = score / (item_pop ** alpha)

                 mitigated_candidates.append((iid_cornac, mitigated_score))

            # Sort candidates by mitigated score
            mitigated_candidates.sort(key=lambda x: x[1], reverse=True)

            # Get top-k
            for iid_cornac, mitigated_score in mitigated_candidates[:rec_k]:
                 original_item_id_str = idx_item_map[iid_cornac]
                 # Need to convert original IDs back to int for consistency
                 top_n[int(original_user_id_str)].append((int(original_item_id_str), mitigated_score))

        except Exception as e:
             print(f"      Error during mitigated ranking for user {original_user_id_str} (eva_three): {e}")
             continue
    return top_n
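
The two functions differ only in which items they consider for a user: `eva_two` (UserTest) re-ranks the items the user rated in the test set, while `eva_three` (TrainItems) re-ranks every known item the user did not interact with in training. A toy sketch with invented item indices (all values below are hypothetical):

```python
# Hypothetical internal item indices for a single user
all_items = {0, 1, 2, 3, 4, 5, 6, 7}
train_items = {0, 1, 2}   # items the user interacted with in the training split
test_items = {3, 4}       # items the user rated in the test split

# eva_two (UserTest): candidates are only the user's test items
candidates_eva_two = sorted(test_items)

# eva_three (TrainItems): candidates are all items minus the user's training items
candidates_eva_three = sorted(all_items - train_items)

print(candidates_eva_two)    # [3, 4]
print(candidates_eva_three)  # [3, 4, 5, 6, 7]
```

Because the `eva_three` candidate set is much larger, the damping has far more low-popularity items to promote there than under `eva_two`.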

# =============================================================================
# SECTION E: MAIN EVALUATION LOOP
# =============================================================================

In [6]:
# --- Store results across domains ---
domain_results = {} # Will store metrics DFs, Cornac objects, etc. per domain
domain_base_metrics = {} # Will store base user/item metrics per domain

# --- Define Domains ---
domains = {
    'music': {'file': 'data/relevant_music_data_20.csv', 'item_col': 'artist', 'needs_scaling': True},
    'movie': {'file': 'data/ratings_movies.dat', 'item_col': 'movie', 'needs_scaling': False}
}

# --- Loop 1: Data Loading, Preprocessing, Base Metrics, Training (Per Domain) ---
print("="*30)
print("STARTING: Data Prep & Model Training")
print("="*30 + "\n")
start_prep_train = time.time()

for domain, config in domains.items():
    print(f"\n--- Processing Domain: {domain.upper()} ---")
    item_col = config['item_col']
    domain_results[domain] = {} # Initialize results dict for domain
    domain_base_metrics[domain] = {} # Initialize base metrics dict

    # 1. Load Data
    print(f"  Loading data: {config['file']}")
    if domain == 'movie':
        cols = ['user', 'movie', 'rating', 'timestamp']
        # Using 'latin1' or 'ISO-8859-1' encoding is often needed for ml-1m ratings.dat
        df_events_orig = pd.read_table(config['file'], sep = "::", engine="python", names=cols, encoding='ISO-8859-1')
        df_events_orig = df_events_orig.astype({user_col: "int", item_col: "int", predict_col: "int"}) # Ensure correct types
    else: # music
        df_events_orig = pd.read_csv(config['file'], index_col=0)
        df_events_orig = df_events_orig.astype({user_col: "int", item_col: "int", predict_col: "int"})

    print(f"  Initial interactions: {len(df_events_orig)}")

    # 2. Initial Analysis & Base Item Popularity
    print("  Running initial analysis...")
    num_users, num_items = users_and_items(df_events_orig, user_col, item_col)
    user_dist_counts = user_distribution(df_events_orig, user_col, item_col)
    item_dist_counts = item_distribution(df_events_orig, user_col, item_col)
    df_item_dist, top_item_ids = calculate_item_popularity(item_dist_counts, num_users, top_fraction)
    domain_base_metrics[domain]['df_item_dist'] = df_item_dist # Store original item dist

    # 3. Base User Metrics Calculation
    print("  Calculating base user metrics...")
    base_user_metrics_df = calculate_popularity_metrics(df_events_orig, top_item_ids, df_item_dist, num_users, user_col, item_col)
    niche_consumption_rates_series = calculate_niche_consumption_rates(df_events_orig, df_item_dist, user_dist_counts, user_col, item_col, niche_threshold_percentile=0.3)
    # Ensure index alignment before join
    user_metrics_all = base_user_metrics_df.join(user_dist_counts.rename('total_interactions'), how='left').join(niche_consumption_rates_series, how='left')
    domain_base_metrics[domain]['user_metrics_all'] = user_metrics_all # Store base metrics

    # 4. Domain-Specific Scaling (if needed)
    df_events_processed = df_events_orig.copy()
    if config['needs_scaling']:
        print("  Scaling user ratings (1-1000)...")
        scaled_df_list = []
        grouped = df_events_processed.groupby(user_col)
        for _, group in tqdm(grouped, total=len(grouped), desc="  Scaling"):
            # Ensure there's data to scale and more than one unique value
            if not group.empty and group[predict_col].nunique() > 1:
                scaler = MinMaxScaler(feature_range=(1, 1000))
                try:
                    scaled_ratings = scaler.fit_transform(group[predict_col].values.reshape(-1, 1).astype(float))
                except ValueError:
                    # Handle potential errors during scaling, e.g., if data is not numeric
                    scaled_ratings = group[predict_col].values.reshape(-1, 1).astype(float) # Use original if scaling fails
            else:
                # If only one unique rating or empty group, keep original
                scaled_ratings = group[predict_col].values.reshape(-1, 1).astype(float)

            new_rows = group.copy()
            new_rows[predict_col] = scaled_ratings.flatten()
            scaled_df_list.append(new_rows)

        if scaled_df_list: # Check if list is not empty before concatenating
            df_events_processed = pd.concat(scaled_df_list, ignore_index=True)
        else:
            print("  Warning: Scaled data list is empty, using original data.")
        print("  Scaling complete.")
        print("-" * 20)

    # 5. Prepare Data for Cornac
    print("  Preparing data for Cornac...")
    df_events_cornac = df_events_processed.copy()
    # Ensure item IDs are consistently handled (using original IDs before mapping)
    unique_items_domain = df_events_cornac[item_col].unique()
    mapping_dict = {orig_id: i for i, orig_id in enumerate(unique_items_domain)}
    # Create reverse map using the generated sequential indices
    reverse_mapping_dict = {i: orig_id for orig_id, i in mapping_dict.items()}

    df_events_cornac[item_col] = df_events_cornac[item_col].map(mapping_dict)

    df_events_cornac[user_col] = df_events_cornac[user_col].astype("string")
    df_events_cornac[item_col] = df_events_cornac[item_col].astype("string") # Cornac expects string item IDs

    # Create item dist indexed by *Cornac string IID*
    df_item_dist_cornac = domain_base_metrics[domain]['df_item_dist'].copy()
    # Map index (original item ID) to Cornac *string* item ID
    df_item_dist_cornac.index = df_item_dist_cornac.index.map(lambda x: str(mapping_dict.get(x, -1))) # Use get with default
    df_item_dist_cornac = df_item_dist_cornac[df_item_dist_cornac.index != '-1'] # Remove items not mapped
    domain_base_metrics[domain]['df_item_dist_cornac'] = df_item_dist_cornac # Store this version

    data_cornac = list(df_events_cornac[[user_col, item_col, predict_col]].to_records(index=False))
    print(f"  Prepared {len(data_cornac)} interactions for Cornac.")
    print("-" * 20)

    # 6. Split Data
    print(f"  Splitting data ({1-test_size:.0%}/{test_size:.0%})...")
    rs = RatioSplit(data=data_cornac, test_size=test_size, rating_threshold=rating_threshold, seed=my_seed, exclude_unknowns=True, verbose=True)
    domain_results[domain]['cornac_rs'] = rs
    # Ensure all_items_cornac_indices uses the internal integer IDs from the built train_set
    all_items_cornac_indices = set(rs.train_set.iid_map.values())
    print(f"  Total unique items known to Cornac: {len(all_items_cornac_indices)}")
    domain_results[domain]['all_items_cornac_indices'] = all_items_cornac_indices
    domain_results[domain]['user_idx_map'] = {v: k for k, v in rs.train_set.uid_map.items()} # Cornac internal idx -> original string uid
    print("-" * 20)

    # 7. Train Models
    print("  Training models...")
    # MAE is only a cheap sanity check here; Cornac computes it on the test split after fitting
    metrics_train = [MAE()]
    # Re-initialize models for each domain to avoid state carry-over
    current_models = [MostPop(),
                      UserKNN(k=40, similarity='cosine', mean_centered=False, seed=my_seed, verbose=False),
                      ItemKNN(k=40, similarity='cosine', mean_centered=False, seed=my_seed, verbose=False),
                      PMF(k=10, max_iter=100, learning_rate=0.001, lambda_reg=0.001, seed=my_seed, verbose=False),
                      CornacNMF(k=15, max_iter=50, learning_rate=0.005, lambda_u=0.06, lambda_v=0.06, lambda_bu=0.02, lambda_bi=0.02, use_bias=False, seed=my_seed, verbose=False),
                      # hierarchical=False selects plain Poisson factorization, hence the "PF" name in Cornac's log
                      HPF(k=50, seed=my_seed, hierarchical=False, name="PF", verbose=False)]

    exp = cornac.Experiment(eval_method=rs, models=current_models, metrics=metrics_train, user_based=True, verbose=True)
    exp.run()
    domain_results[domain]['cornac_exp'] = exp # Store the trained experiment object
    print("  Training complete.")
    print("-" * 20)

end_prep_train = time.time()
print(f"\nData Prep & Model Training completed in {round(end_prep_train - start_prep_train)} seconds.")

STARTING: Data Prep & Model Training


--- Processing Domain: MUSIC ---
  Loading data: data/relevant_music_data_20.csv
  Initial interactions: 1008479
  Running initial analysis...
  No. user events: 1008479
  No. unique artists: 12690
  No. unique users: 3000
--------------------
  Mean artists per user: 336.2
  Min artists per user: 4
  Max artists per user: 2057
--------------------
  Mean users per artist: 79.5
  Min users per artist: 21
  Max users per artist: 1389
--------------------
  Identified 2538 popular items (top 20%)
--------------------
  Calculating base user metrics...
  Calculating base popularity metrics (pop_one, pop_two)...
  Defining niche items (bottom 30% popularity)...
  Found 3855 niche items (popularity count <= 29.00).
  Calculating niche consumption rate per user (pop_four)...
  Finished calculating niche consumption rates.
--------------------
  Scaling user ratings (1-1000)...
  Scaling complete.
--------------------
  Preparing data for Cornac...
  Prepared 1008479 interactions for Cornac.
--------------------
  Splitting data (80%/20%)...
rating_threshold = 1.0
exclude_unknowns = True
---
Training data:
Number of users = 3000
Number of items = 12690
Number of ratings = 806783
Max rating = 1000.0
Min rating = 1.0
Global mean = 44.5
---
Test data:
Number of users = 3000
Number of items = 12690
Number of ratings = 201696
Number of unknown users = 0
Number of unknown items = 0
---
Total users = 3000
Total items = 12690
  Total unique items known to Cornac: 12690
--------------------
  Training models...

[MostPop] Training started!

[MostPop] Evaluation started!

[UserKNN] Training started!

[UserKNN] Evaluation started!

[ItemKNN] Training started!

[ItemKNN] Evaluation started!


[PMF] Training started!
Learning...
epoch 0, loss: 155084.795966
epoch 1, loss: 86975.061789
epoch 2, loss: 53418.749656
epoch 3, loss: 38709.684055
epoch 4, loss: 30908.752810
epoch 5, loss: 26151.894519
epoch 6, loss: 23008.108093
epoch 7, loss: 20840.053146
epoch 8, loss: 19296.269270
epoch 9, loss: 18163.060215
epoch 10, loss: 17307.567132
epoch 11, loss: 16645.596483
epoch 12, loss: 16122.517976
epoch 13, loss: 15701.942342
epoch 14, loss: 15358.960617
epoch 15, loss: 15076.048770
epoch 16, loss: 14840.543998
epoch 17, loss: 14643.062929
epoch 18, loss: 14476.492447
epoch 19, loss: 14335.331996
epoch 20, loss: 14215.253792
epoch 21, loss: 14112.801286
epoch 22, loss: 14025.176957
epoch 23, loss: 13950.090024
epoch 24, loss: 13885.645940
epoch 25, loss: 13830.265047
epoch 26, loss: 13782.621197
epoch 27, loss: 13741.594617
epoch 28, loss: 13706.235288
epoch 29, loss: 13675.734138
epoch 30, loss: 13649.400159
epoch 31, loss: 13626.642039
epoch 32, loss: 13606.953356
epoch 33, loss:

[NMF] Training started!

Optimization finished!

[NMF] Evaluation started!

[PF] Training started!
Learning...
Learning completed!

[PF] Evaluation started!


TEST:
...
        |      MAE | Train (s) | Test (s)
------- + -------- + --------- + --------
MostPop | 193.9530 |    0.0411 |   2.4149
UserKNN |  66.4852 |    1.7672 |  22.1585
ItemKNN |  65.6744 |    9.1386 |  27.1424
PMF     |  72.1270 |   42.9307 |   4.3383
NMF     |  57.0491 |    4.7248 |   4.3803
PF      |  54.6507 | 1403.6335 |   3.6372

  Training complete.
--------------------

--- Processing Domain: MOVIE ---
  Loading data: data/ratings_movies.dat
  Initial interactions: 1000209
  Running initial analysis...
  No. user events: 1000209
  No. unique movies: 3706
  No. unique users: 6040
--------------------
  Mean movies per user: 165.6
  Min movies per user: 20
  Max movies per user: 2314
--------------------
  Mean users per movie: 269.9
  Min users per movie: 1
  Max users per movie: 3428
--------------------
  Identified 741 popular items (top 20%)
--------------------
  Calculating base user metrics...
  Calculating base popularity metrics (pop_one, pop_two)...
  Defining niche items (bottom 30% popularity)...
  Found 1122 niche items (popularity count <= 44.00).
  Calculating niche consumption rate per user (pop_four)...
  Finished calculating niche consumption rates.
--------------------
  Preparing data for Cornac...
  Prepared 1000209 interactions for Cornac.
--------------------
  Splitting data (80%/20%)...
rating_threshold = 1.0
exclude_unknowns = True
---
Training data:
Number of users = 6040
Number of items = 3680
Number of ratings = 800167
Max rating = 5.0
Min rating = 1.0
Global mean = 3.6
---
Test data:
Number of users = 6040
Number of items = 3680
Number of ratings = 200012
Number of unknown users = 0
Number of unknown items = 0
---
Total users = 6040
Total items = 3680
  Total unique items known to Cornac: 3680
--------------------
  Training models...

[MostPop] Training started!

[MostPop] Evaluation started!

[UserKNN] Training started!

[UserKNN] Evaluation started!

[ItemKNN] Training started!

[ItemKNN] Evaluation started!


[PMF] Training started!
Learning...
epoch 0, loss: 74094.462488
epoch 1, loss: 58434.859407
epoch 2, loss: 52539.283118
epoch 3, loss: 49925.439458
epoch 4, loss: 48292.154778
epoch 5, loss: 47049.400664
epoch 6, loss: 46020.022051
epoch 7, loss: 45163.574883
epoch 8, loss: 44462.164723
epoch 9, loss: 43889.877628
epoch 10, loss: 43417.859088
epoch 11, loss: 43021.174998
epoch 12, loss: 42680.473614
epoch 13, loss: 42381.097342
epoch 14, loss: 42111.841218
epoch 15, loss: 41864.062088
epoch 16, loss: 41631.189411
epoch 17, loss: 41408.508447
epoch 18, loss: 41193.039955
epoch 19, loss: 40983.339405
epoch 20, loss: 40779.101844
epoch 21, loss: 40580.595165
epoch 22, loss: 40388.080315
epoch 23, loss: 40201.417518
epoch 24, loss: 40019.957164
epoch 25, loss: 39842.670338
epoch 26, loss: 39668.398215
epoch 27, loss: 39496.108862
epoch 28, loss: 39325.093571
epoch 29, loss: 39155.072244
epoch 30, loss: 38986.202647
epoch 31, loss: 38819.003725
epoch 32, loss: 38654.217619
epoch 33, loss: 

[NMF] Training started!

Optimization finished!

[NMF] Evaluation started!

[PF] Training started!
Learning...
Learning completed!

[PF] Evaluation started!


TEST:
...
        |    MAE | Train (s) | Test (s)
------- + ------ + --------- + --------
MostPop | 1.2959 |    0.0521 |   2.6813
UserKNN | 0.7604 |    5.5882 |  25.9634
ItemKNN | 0.8008 |    2.5340 |  22.9914
PMF     | 0.7007 |   38.7273 |   4.2912
NMF     | 0.7453 |    4.1114 |   4.6019
PF      | 2.4665 | 1323.2904 |   3.9961

  Training complete.
--------------------

Data Prep & Model Training completed in 3001 seconds.
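
The per-user scaling in step 4 of the loop above can also be written without an explicit Python loop. The following is a minimal sketch, not the notebook's implementation: `scale_per_user` and the toy column names are hypothetical, and it assumes the same convention as the loop (users with a single distinct rating keep their original values). Unlike the `pd.concat(..., ignore_index=True)` version, it leaves the row order and index unchanged.

```python
import pandas as pd


def scale_per_user(df, user_col, value_col, low=1.0, high=1000.0):
    """Min-max scale value_col to [low, high] separately for each user.

    Users whose values are all identical keep their original values,
    mirroring the fallback used in the explicit loop.
    """
    grouped = df.groupby(user_col)[value_col]
    g_min = grouped.transform("min").astype(float)
    g_max = grouped.transform("max").astype(float)
    span = g_max - g_min

    values = df[value_col].astype(float)
    # Zero spans become NaN so the division never hits zero
    scaled = low + (values - g_min) / span.where(span > 0) * (high - low)

    out = df.copy()
    # Fall back to the unscaled value wherever the user's span is zero
    out[value_col] = scaled.where(span > 0, values)
    return out


# Hypothetical toy data: user 1 has a range of values, users 2 and 3 do not
toy = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 3],
    "artist": [10, 11, 12, 10, 13, 11],
    "count": [5, 10, 15, 7, 7, 3],
})
scaled_toy = scale_per_user(toy, "user", "count")
print(scaled_toy)
```

For user 1 the values 5, 10, 15 map to the ends and midpoint of the 1 to 1000 range, while users 2 and 3 are left as they were.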


In [7]:
# --- Loop 2: Recommendation Generation (Per Domain, Per Strategy) ---
print("\n" + "="*30)
print("STARTING: Mitigated Recommendation Generation")
print("="*30 + "\n")
start_rec_gen = time.time()

all_recommendations_mitigated = {} # Nested dict: {domain: {strategy: {algo_name: recommendations}}}

for domain, config in domains.items():
    print(f"\n--- Generating Recommendations for Domain: {domain.upper()} ---")
    all_recommendations_mitigated[domain] = {}
    exp = domain_results[domain]['cornac_exp']
    rs = domain_results[domain]['cornac_rs']
    df_item_dist_cornac = domain_base_metrics[domain]['df_item_dist_cornac'] # Use the one indexed by Cornac string IID
    all_items_cornac_indices = domain_results[domain]['all_items_cornac_indices']
    user_idx_map = domain_results[domain]['user_idx_map'] # Cornac internal index -> original string uid

    for eva_key in evaluation_strategies:
        print(f"  Strategy: {eva_key}")
        all_recommendations_mitigated[domain][eva_key] = {}
        for i, algo_name in enumerate(algo_names):
            # Relies on algo_names listing the algorithms in the same order as the models passed to cornac.Experiment
            model = exp.models[i]
            if eva_key == 'eva_two':
                top_n = get_top_n_mitigated_eva_two(model, user_idx_map, rs.test_set, df_item_dist_cornac, mitigation_alpha, rec_k=rec_k)
            elif eva_key == 'eva_three':
                top_n = get_top_n_mitigated_eva_three(model, user_idx_map, rs.train_set, rs.test_set, all_items_cornac_indices, df_item_dist_cornac, mitigation_alpha, rec_k=rec_k)
            else:
                print(f"      Warning: Unknown strategy {eva_key}. Skipping.")
                continue
            all_recommendations_mitigated[domain][eva_key][algo_name] = top_n

end_rec_gen = time.time()
print(f"\nMitigated Recommendation Generation completed in {round(end_rec_gen - start_rec_gen)} seconds.")


STARTING: Mitigated Recommendation Generation


--- Generating Recommendations for Domain: MUSIC ---
  Strategy: eva_two
    Generating mitigated recs (eva_two, alpha=0.5)...
    Generating mitigated recs (eva_two, alpha=0.5)...
    Generating mitigated recs (eva_two, alpha=0.5)...
    Generating mitigated recs (eva_two, alpha=0.5)...
    Generating mitigated recs (eva_two, alpha=0.5)...
    Generating mitigated recs (eva_two, alpha=0.5)...

  Strategy: eva_three
    Generating mitigated recs (eva_three, alpha=0.5)...
    Generating mitigated recs (eva_three, alpha=0.5)...
    Generating mitigated recs (eva_three, alpha=0.5)...
    Generating mitigated recs (eva_three, alpha=0.5)...
    Generating mitigated recs (eva_three, alpha=0.5)...
    Generating mitigated recs (eva_three, alpha=0.5)...


--- Generating Recommendations for Domain: MOVIE ---
  Strategy: eva_two
    Generating mitigated recs (eva_two, alpha=0.5)...
    Generating mitigated recs (eva_two, alpha=0.5)...
    Generating mitigated recs (eva_two, alpha=0.5)...
    Generating mitigated recs (eva_two, alpha=0.5)...
    Generating mitigated recs (eva_two, alpha=0.5)...
    Generating mitigated recs (eva_two, alpha=0.5)...

  Strategy: eva_three
    Generating mitigated recs (eva_three, alpha=0.5)...
    Generating mitigated recs (eva_three, alpha=0.5)...
    Generating mitigated recs (eva_three, alpha=0.5)...
    Generating mitigated recs (eva_three, alpha=0.5)...
    Generating mitigated recs (eva_three, alpha=0.5)...
    Generating mitigated recs (eva_three, alpha=0.5)...


Mitigated Recommendation Generation completed in 7569 seconds.
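
The helpers `get_top_n_mitigated_eva_two` and `get_top_n_mitigated_eva_three` called above are defined earlier in the notebook. As a rough illustration of the re-ranking step they apply, the sketch below implements `new_score = original_score / (item_popularity ** alpha)` on a toy candidate list. `rerank_with_damping`, its arguments, and the `eps` floor are assumptions made for this example and not the notebook's API. It also assumes non-negative scores, since dividing a negative score by a small popularity would push the item down rather than up.

```python
import numpy as np


def rerank_with_damping(scores, popularity, alpha=0.5, k=10, eps=1e-12):
    """Return the top-k (item_id, damped_score) pairs.

    scores:     dict item_id -> original predicted score (assumed >= 0)
    popularity: dict item_id -> normalized popularity in (0, 1]
    eps:        hypothetical floor that guards against zero popularity
    """
    item_ids = list(scores)
    raw = np.array([scores[i] for i in item_ids], dtype=float)
    pop = np.array([max(popularity[i], eps) for i in item_ids], dtype=float)
    damped = raw / pop ** alpha
    # Stable sort on negated scores: ties keep their original order
    order = np.argsort(-damped, kind="stable")[:k]
    return [(item_ids[j], float(damped[j])) for j in order]


# Hypothetical candidates: "a" is the most popular item, "c" the least
toy_scores = {"a": 4.0, "b": 3.0, "c": 2.0}
toy_popularity = {"a": 0.64, "b": 0.16, "c": 0.04}

unmitigated = rerank_with_damping(toy_scores, toy_popularity, alpha=0.0, k=3)
mitigated = rerank_with_damping(toy_scores, toy_popularity, alpha=0.5, k=3)
print([item for item, _ in unmitigated])
print([item for item, _ in mitigated])
```

With `alpha=0.0` the original order is preserved. With `alpha=0.5`, the value shown in the log above, the least popular item moves to the top of the toy list, which is the intended effect of the damping.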


In [8]:
# --- Loop 3: Evaluation (Per Domain, Per Strategy, Per Notion) ---
print("\n" + "="*30)
print("STARTING: Evaluation Loop")
print("="*30 + "\n")
start_eval = time.time()

for domain, config in domains.items():
    print(f"\n--- Evaluating Domain: {domain.upper()} ---")
    exp = domain_results[domain]['cornac_exp']
    user_metrics_all = domain_base_metrics[domain]['user_metrics_all']
    df_item_dist_cornac = domain_base_metrics[domain]['df_item_dist_cornac']

    for eva_key in evaluation_strategies:
        print(f"  Evaluation Strategy: {eva_key}")
        recommendations_domain_strategy = all_recommendations_mitigated[domain][eva_key]

        for pop_key in popularity_notions:
            print(f"    Popularity Notion: {pop_key}")
            pop_sort_col = popularity_sort_column[pop_key]

            # 1. Determine user groups for this notion
            print(f"      Sorting users by '{pop_sort_col}' and splitting...")
            try:
                user_dist_sorted = sort_users(user_metrics_all, by=pop_sort_col)
                if user_dist_sorted.empty:
                    print(f"      Skipping {pop_key} due to empty sorted user list (likely all NaNs for metric).")
                    continue
                low_group, med_group, high_group = split(user_dist_sorted, top_fraction)
                if low_group.empty or med_group.empty or high_group.empty:
                    print(f"      Skipping {pop_key} due to empty groups after split.")
                    continue
                low_GAP_profile, med_GAP_profile, high_GAP_profile = calculate_group_characteristics(low_group, med_group, high_group, pop_key)
            except Exception as e:
                print(f"      Error during user sorting/splitting for {pop_key}: {e}. Skipping.")
                continue

            # Initialize results storage for this combination
            # Metric tables start as NaN-filled float frames; cells stay NaN until an algorithm fills them
            NDCGs = pd.DataFrame(index=algo_names, columns=['low', 'med', 'high'], dtype=float)
            GAPs = pd.DataFrame(index=algo_names, columns=['low', 'med', 'high'], dtype=float)
            # t-test results are stored as returned, so these frames keep a generic object dtype
            TTESTs_ndcg = pd.DataFrame(index=algo_names, columns=['low-med', 'low-high', 'med-high'], dtype=object)
            TTESTs_gap = pd.DataFrame(index=algo_names, columns=['low-med', 'low-high', 'med-high'], dtype=object)

            # 2. Calculate Metrics per Algorithm
            for algo_name in algo_names:
                recommendations_for_algo = recommendations_domain_strategy[algo_name]

                # Calculate NDCG
                ndcg_low, ndcg_med, ndcg_high, ttests_n = calculate_NDCG_per_group(
                    algo_name, recommendations_for_algo, exp, low_group, med_group, high_group, rec_k
                )
                NDCGs.loc[algo_name] = [ndcg_low, ndcg_med, ndcg_high]
                TTESTs_ndcg.loc[algo_name] = ttests_n

                # Calculate %ΔGAP
                gap_low, gap_med, gap_high, ttests_g = calculate_delta_GAP_per_group(
                    algo_name, recommendations_for_algo, df_item_dist_cornac,
                    low_group, med_group, high_group,
                    low_GAP_profile, med_GAP_profile, high_GAP_profile
                )
                GAPs.loc[algo_name] = [gap_low, gap_med, gap_high]
                TTESTs_gap.loc[algo_name] = ttests_g

            # 3. Save Results for this combination
            print(f"      Saving results for {domain}, {eva_key}, {pop_key}...")
            ndcg_file = results_location + f'NDCGs_{domain}_{eva_key}_{pop_key}.csv'
            ndcg_ttest_file = results_location + f'NDCG_ttests_{domain}_{eva_key}_{pop_key}.csv'
            gap_file = results_location + f'PercentDeltaGAP_{domain}_{eva_key}_{pop_key}.csv'
            gap_ttest_file = results_location + f'GAP_ttests_{domain}_{eva_key}_{pop_key}.csv'

            NDCGs.to_csv(ndcg_file)
            TTESTs_ndcg.to_csv(ndcg_ttest_file)
            GAPs.to_csv(gap_file)
            TTESTs_gap.to_csv(gap_ttest_file)
            print("      Results saved.")
            print("-" * 20)


end_eval = time.time()
print(f"\nEvaluation Loop completed in {round(end_eval - start_eval)} seconds.")
print("\nProcessing finished.")
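
`calculate_delta_GAP_per_group` is defined elsewhere in the notebook. As a rough guide to what the `PercentDeltaGAP_*.csv` files contain, the sketch below uses a common definition of group average popularity (GAP): average the mean item popularity of each user's list, then average over the users in a group, and report the relative change from profiles to recommendations in percent. The function names and toy data are hypothetical, and the notebook's own implementation may differ in detail.

```python
import numpy as np


def group_average_popularity(item_lists, popularity):
    """Mean over users of the mean popularity of the items in each user's list."""
    per_user = [np.mean([popularity[i] for i in items]) for items in item_lists if items]
    return float(np.mean(per_user))


def percent_delta_gap(profiles, recommendations, popularity):
    """Relative change in GAP from user profiles to recommendation lists, in percent."""
    gap_profile = group_average_popularity(profiles, popularity)
    gap_rec = group_average_popularity(recommendations, popularity)
    return (gap_rec - gap_profile) / gap_profile * 100.0


# Hypothetical group of two users with niche-leaning profiles
toy_popularity = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
toy_profiles = [["c", "d"], ["b", "c", "d"]]
# Recommendation lists skewed toward the popular item "a"
toy_recommendations = [["a", "b"], ["a", "c"]]

toy_delta = percent_delta_gap(toy_profiles, toy_recommendations, toy_popularity)
print(f"%ΔGAP = {toy_delta:.1f}")
```

A positive value means the recommendations are, on average, more popular than what the group's users consumed; a value near zero means the recommender roughly preserves the group's popularity level.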


STARTING: Evaluation Loop


--- Evaluating Domain: MUSIC ---
  Evaluation Strategy: eva_two
    Popularity Notion: pop_one
      Sorting users by 'pop_fraq' and splitting...
  Group Stats (pop_one):
    Low Group ('Niche', 600 users): Avg Profile Size=303.26, Avg Profile GAP=0.051648
    Med Group ('Diverse', 1800 users): Avg Profile Size=380.66, Avg Profile GAP=0.084701
    High Group ('Blockbuster', 600 users): Avg Profile Size=235.56, Avg Profile GAP=0.121155
--------------------
    Calculating NDCG@10 for MostPop...
    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for music, eva_two, pop_one...
      Results saved.
--------------------
    Popularity Notion: pop_two
      Sorting users by 'pop_item_fraq' and splitting...
  Group Stats (pop_two):
    Low Group ('Niche', 600 users): Avg Profile Size=336.87, Avg Profile GAP=0.049812
    Med Group ('Diverse', 1800 users): Avg Profile Size=385.25, Avg Profile GAP=0.083783
    High Group ('Blockbuster', 600 users): Avg Profile Size=188.19, Avg Profile GAP=0.125745
--------------------
    Calculating NDCG@10 for MostPop...
    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for music, eva_two, pop_two...
      Results saved.
--------------------
    Popularity Notion: pop_four
      Sorting users by 'niche_consumption_rate' and splitting...
  Group Stats (pop_four):
    Low Group ('Blockbuster', 600 users): Avg Profile Size=262.23, Avg Profile GAP=0.115610
    Med Group ('Diverse', 1800 users): Avg Profile Size=377.53, Avg Profile GAP=0.085114
    High Group ('Niche', 600 users): Avg Profile Size=285.96, Avg Profile GAP=0.055953
--------------------
    Calculating NDCG@10 for MostPop...
    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for music, eva_two, pop_four...
      Results saved.
--------------------
  Evaluation Strategy: eva_three
    Popularity Notion: pop_one
      Sorting users by 'pop_fraq' and splitting...
  Group Stats (pop_one):
    Low Group ('Niche', 600 users): Avg Profile Size=303.26, Avg Profile GAP=0.051648
    Med Group ('Diverse', 1800 users): Avg Profile Size=380.66, Avg Profile GAP=0.084701
    High Group ('Blockbuster', 600 users): Avg Profile Size=235.56, Avg Profile GAP=0.121155
--------------------
    Calculating NDCG@10 for MostPop...
    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for music, eva_three, pop_one...
      Results saved.
--------------------
    Popularity Notion: pop_two
      Sorting users by 'pop_item_fraq' and splitting...
  Group Stats (pop_two):
    Low Group ('Niche', 600 users): Avg Profile Size=336.87, Avg Profile GAP=0.049812
    Med Group ('Diverse', 1800 users): Avg Profile Size=385.25, Avg Profile GAP=0.083783
    High Group ('Blockbuster', 600 users): Avg Profile Size=188.19, Avg Profile GAP=0.125745
--------------------
    Calculating NDCG@10 for MostPop...
    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for music, eva_three, pop_two...
      Results saved.
--------------------
    Popularity Notion: pop_four
      Sorting users by 'niche_consumption_rate' and splitting...
  Group Stats (pop_four):
    Low Group ('Blockbuster', 600 users): Avg Profile Size=262.23, Avg Profile GAP=0.115610
    Med Group ('Diverse', 1800 users): Avg Profile Size=377.53, Avg Profile GAP=0.085114
    High Group ('Niche', 600 users): Avg Profile Size=285.96, Avg Profile GAP=0.055953
--------------------
    Calculating NDCG@10 for MostPop...
    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for music, eva_three, pop_four...
      Results saved.
--------------------

--- Evaluating Domain: MOVIE ---
  Evaluation Strategy: eva_two
    Popularity Notion: pop_one
      Sorting users by 'pop_fraq' and splitting...
  Group Stats (pop_one):
    Low Group ('Niche', 1208 users): Avg Profile Size=301.74, Avg Profile GAP=0.108016
    Med Group ('Diverse', 3624 users): Avg Profile Size=152.35, Avg Profile GAP=0.156976
    High Group ('Blockbuster', 1208 users): Avg Profile Size=69.19, Avg Profile GAP=0.209073
--------------------
    Calculating NDCG@10 for MostPop...


      NDCG MostPop:   0%|          | 0/6034 [00:00<?, ?it/s]

    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for movie, eva_two, pop_one...
      Results saved.
--------------------
    Popularity Notion: pop_two
      Sorting users by 'pop_item_fraq' and splitting...
  Group Stats (pop_two):
    Low Group ('Niche', 1208 users): Avg Profile Size=336.11, Avg Profile GAP=0.105535
    Med Group ('Diverse', 3624 users): Avg Profile Size=145.64, Avg Profile GAP=0.155064
    High Group ('Blockbuster', 1208 users): Avg Profile Size=54.95, Avg Profile GAP=0.217289
--------------------
    Calculating NDCG@10 for MostPop...
    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for movie, eva_two, pop_two...
      Results saved.
--------------------
    Popularity Notion: pop_four
      Sorting users by 'niche_consumption_rate' and splitting...
  Group Stats (pop_four):
    Low Group ('Blockbuster', 1208 users): Avg Profile Size=78.37, Avg Profile GAP=0.177772
    Med Group ('Diverse', 3624 users): Avg Profile Size=179.59, Avg Profile GAP=0.160309
    High Group ('Niche', 1208 users): Avg Profile Size=210.85, Avg Profile GAP=0.129314
--------------------
    Calculating NDCG@10 for MostPop...
    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for movie, eva_two, pop_four...
      Results saved.
--------------------
  Evaluation Strategy: eva_three
    Popularity Notion: pop_one
      Sorting users by 'pop_fraq' and splitting...
  Group Stats (pop_one):
    Low Group ('Niche', 1208 users): Avg Profile Size=301.74, Avg Profile GAP=0.108016
    Med Group ('Diverse', 3624 users): Avg Profile Size=152.35, Avg Profile GAP=0.156976
    High Group ('Blockbuster', 1208 users): Avg Profile Size=69.19, Avg Profile GAP=0.209073
--------------------
    Calculating NDCG@10 for MostPop...


      NDCG MostPop:   0%|          | 0/6040 [00:00<?, ?it/s]

    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for movie, eva_three, pop_one...
      Results saved.
--------------------
    Popularity Notion: pop_two
      Sorting users by 'pop_item_fraq' and splitting...
  Group Stats (pop_two):
    Low Group ('Niche', 1208 users): Avg Profile Size=336.11, Avg Profile GAP=0.105535
    Med Group ('Diverse', 3624 users): Avg Profile Size=145.64, Avg Profile GAP=0.155064
    High Group ('Blockbuster', 1208 users): Avg Profile Size=54.95, Avg Profile GAP=0.217289
--------------------
    Calculating NDCG@10 for MostPop...
    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for movie, eva_three, pop_two...
      Results saved.
--------------------
    Popularity Notion: pop_four
      Sorting users by 'niche_consumption_rate' and splitting...
  Group Stats (pop_four):
    Low Group ('Blockbuster', 1208 users): Avg Profile Size=78.37, Avg Profile GAP=0.177772
    Med Group ('Diverse', 3624 users): Avg Profile Size=179.59, Avg Profile GAP=0.160309
    High Group ('Niche', 1208 users): Avg Profile Size=210.85, Avg Profile GAP=0.129314
--------------------
    Calculating NDCG@10 for MostPop...
    Calculating %DeltaGAP for MostPop...
    Calculating NDCG@10 for UserKNN...
    Calculating %DeltaGAP for UserKNN...
    Calculating NDCG@10 for ItemKNN...
    Calculating %DeltaGAP for ItemKNN...
    Calculating NDCG@10 for PMF...
    Calculating %DeltaGAP for PMF...
    Calculating NDCG@10 for NMF...
    Calculating %DeltaGAP for NMF...
    Calculating NDCG@10 for HPF...
    Calculating %DeltaGAP for HPF...

      Saving results for movie, eva_three, pop_four...
      Results saved.
--------------------

Evaluation Loop completed in 222 seconds.

Processing finished.
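
The `Group Stats` lines in the log above report 600 / 1800 / 600 users for the music domain and 1208 / 3624 / 1208 for the movie domain, which is consistent with sorting users by the grouping column and cutting them into a bottom 20%, middle 60%, and top 20%. The following is a minimal sketch of such a split, not the notebook's own code: the function name `split_user_groups`, the 20/60/20 fractions (inferred from the group sizes), and the random toy data are all assumptions made for illustration. Only the column name `pop_fraq` is taken from the log.

```python
import numpy as np
import pandas as pd


def split_user_groups(user_stats, sort_col, low_frac=0.2, high_frac=0.2):
    """Sort users by `sort_col` (ascending) and cut them into three groups.

    Hypothetical helper: the fractions are inferred from the logged group
    sizes, not read from the notebook's configuration.
    """
    # Stable sort so that ties keep their original relative order.
    ordered = user_stats.sort_values(sort_col, kind="mergesort")
    n_users = len(ordered)
    n_low = int(round(low_frac * n_users))
    n_high = int(round(high_frac * n_users))

    low = ordered.iloc[:n_low]                   # smallest values of sort_col
    med = ordered.iloc[n_low:n_users - n_high]   # everything in between
    high = ordered.iloc[n_users - n_high:]       # largest values of sort_col
    return low, med, high


# Toy data only: 3000 users with a random 'pop_fraq' value each.
rng = np.random.default_rng(0)
toy_stats = pd.DataFrame({"pop_fraq": rng.random(3000)})

low, med, high = split_user_groups(toy_stats, "pop_fraq")
print(len(low), len(med), len(high))
```

Which end of the ordering is labelled 'Niche' or 'Blockbuster' depends on the grouping column: in the log, the low group is 'Niche' for `pop_fraq` and `pop_item_fraq`, but 'Blockbuster' for `niche_consumption_rate`.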
