# NCF with Focal Loss - ML-100K Improved Experimental Design

**Paper**: "Addressing Class Imbalance in NCF with Focal Loss" (AAMAS 2025)

## Objective

This notebook implements an improved experimental methodology to test three hypotheses:

**H1 (Efficacy)**: Focal Loss improves NeuMF performance over standard BCE loss on implicit feedback recommendation.

**H2 (Robustness)**: The improvement is robust across different negative sampling ratios (1:4, 1:10, 1:50).

**H3 (Mechanism)**: The focusing effect (gamma > 0) is necessary beyond simple class weighting (alpha-balanced BCE).

## Improvements Over Original Design

1. **Primary experiment at 1:10 sampling** (not just 1:4) - more realistic for real systems
2. **Alpha-balanced BCE control** - isolates focusing effect from class weighting
3. **Alpha-sampling interaction analysis** - addresses confound between alpha and sampling ratio
4. **Robustness study** - tests across 3 sampling ratios (1:4, 1:10, 1:50)
5. **Training dynamics tracking** - validates mechanism by analyzing loss contribution by confidence bin
6. **Explicit hypothesis framing** - each experiment maps to one of H1, H2, or H3

## Dataset: MovieLens 100K

- 100,000 ratings from 943 users on 1,682 movies
- Binarization: ratings >= 4 -> positive interaction
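- Leave-one-out evaluation (each user's most recent interaction is held out for testing)
- Full ranking evaluation (the held-out item is ranked against all candidate items, not a sampled subset)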
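
The preprocessing and evaluation protocol listed above can be sketched in plain Python. This is an illustrative stand-in, not the RecBole pipeline the notebook actually runs. The function names, the `(user, item, rating, timestamp)` tuple layout, and the choice to binarize before holding out the most recent positive are all assumptions.

```python
import math
from collections import defaultdict

def binarize_and_split(ratings, threshold=4):
    """Keep ratings >= threshold as positives, then hold out each user's
    most recent positive for testing (leave-one-out).

    `ratings` is an iterable of (user, item, rating, timestamp) tuples.
    Returns (train, test): train maps user -> list of items,
    test maps user -> the single held-out item.
    """
    by_user = defaultdict(list)
    for user, item, rating, ts in ratings:
        if rating >= threshold:
            by_user[user].append((ts, item))

    train, test = {}, {}
    for user, events in by_user.items():
        events.sort()  # chronological order
        if len(events) < 2:
            # Assumption: users with a single positive stay in train only.
            train[user] = [item for _, item in events]
            continue
        train[user] = [item for _, item in events[:-1]]
        test[user] = events[-1][1]
    return train, test

def hit_and_ndcg_at_k(scores, held_out, seen, k=10):
    """Full-ranking HR@k and NDCG@k for one user with one held-out item.

    `scores` maps every item to a model score; items in `seen` (the user's
    training positives) are excluded from the ranking.
    """
    candidates = [i for i in scores if i not in seen]
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    if held_out not in top_k:
        return 0.0, 0.0
    rank = top_k.index(held_out)  # 0-based position in the top-k list
    return 1.0, 1.0 / math.log2(rank + 2)

# Toy data, for illustration only.
toy_ratings = [
    (1, "a", 5, 10), (1, "b", 3, 20), (1, "c", 4, 30), (1, "d", 5, 40),
    (2, "a", 4, 5), (2, "c", 2, 6),
]
train, test = binarize_and_split(toy_ratings)
toy_scores = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.5, "e": 0.7}
hr, ndcg = hit_and_ndcg_at_k(toy_scores, test[1], set(train[1]), k=2)
```

With a single held-out item per user, the ideal DCG is 1, so NDCG@k reduces to `1 / log2(rank + 2)` when the item is in the top k and 0 otherwise.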

## Cell 1: Suppress Warnings

In [4]:
# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings('ignore')

## Cell 2: Install Dependencies

**Instructions:**
1. Run the install cell below
2. **RESTART** the runtime (Runtime -> Restart session)
3. Run the verification cell
4. Continue with remaining cells

In [None]:
# ============================================
# Install Dependencies (Part 1)
# ============================================
# After running this cell, RESTART the runtime, then run the next cell

%pip install -q ray
%pip install -q recbole==1.2.0
%pip install -q kmeans-pytorch

# Force numpy 1.x (required for RecBole compatibility)
%pip uninstall -y numpy
%pip install -q "numpy<2"

print("\n" + "="*60)
print("RESTART REQUIRED")
print("="*60)
print("Go to: Runtime -> Restart session")
print("Then run the NEXT cell to verify installation.")

In [None]:
# ============================================
# Verify Installation (Part 2) - Run AFTER restart
# ============================================
import numpy as np
print(f"NumPy version: {np.__version__}")

if np.__version__.startswith("2."):
    print("\nERROR: NumPy 2.x still detected!")
    print("Try: Runtime -> Restart session -> Run this cell again")
else:
    print("SUCCESS: NumPy 1.x installed. Continue to next cell.")

## Cell 3: Imports

In [None]:
# Standard imports
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import pandas as pd
from collections import defaultdict
import os
import logging
import sys

# ============================================
# Environment Setup (Colab / Local)
# ============================================
if 'google.colab' in sys.modules:
    # Clone repo if not already present
    if not os.path.exists('/content/Project-2026---Team-7'):
        !git clone https://github.com/omereliy/Project-2026---Team-7.git /content/Project-2026---Team-7
    # Add experiments folder to path
    sys.path.insert(0, '/content/Project-2026---Team-7/experiments')
    %cd /content/Project-2026---Team-7/experiments
    print("Running on Google Colab - repo cloned")
else:
    # Local: assume running from experiments folder
    sys.path.insert(0, '.')
    print("Running locally")

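# PyTorch 2.6+ compatibility patch: torch.load now defaults to
# weights_only=True, which rejects checkpoints containing arbitrary pickled
# Python objects. Restore the old default; only load trusted checkpoints.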
if not hasattr(torch, '_load_patched'):
    _original_torch_load = torch.load
    def _patched_torch_load(*args, **kwargs):
        if 'weights_only' not in kwargs:
            kwargs['weights_only'] = False
        return _original_torch_load(*args, **kwargs)
    torch.load = _patched_torch_load
    torch._load_patched = True

from focal_loss_utils import (
    # Loss functions
    FocalLoss, AlphaBalancedBCE,
    # Configuration
    get_base_config, get_neumf_config,
    # Training functions
    train_neumf_focal_loss, train_neumf_alpha_bce,
    # Evaluation
    create_comparison_table, compute_improvement,
    # Validation
    validate_focal_loss_implementation, demonstrate_focal_loss_effect,
    # Alpha-sampling analysis
    analyze_alpha_sampling_interaction, compute_effective_class_ratio,
    get_balanced_alpha,
    # Multi-seed experiments
    run_multi_seed_experiment
)

# RecBole imports
from recbole.quick_start import run_recbole
from recbole.model.general_recommender.neumf import NeuMF
from recbole.config import Config
from recbole.data import create_dataset, data_preparation
from recbole.trainer import Trainer
from recbole.utils import init_seed, init_logger

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("Using CPU")

## Cell 4: Focal Loss Validation Test

Verify that the Focal Loss implementation is correct:
1. FL(gamma=0, alpha=0.5) = 0.5 * BCE
2. Higher gamma reduces loss for well-classified examples
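
Both checks follow from the standard binary Focal Loss definition, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the predicted probability of the true class. The sketch below is a scalar, pure-Python illustration of that formula. It is not the `FocalLoss` class from `focal_loss_utils`, whose implementation is not shown in this notebook, and `bce` and `focal_loss` are hypothetical helper names.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one example; p is the predicted P(y=1)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example.

    alpha weights positives and (1 - alpha) weights negatives;
    (1 - p_t) ** gamma shrinks the loss of confidently correct examples.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Check 1: gamma=0, alpha=0.5 reduces to 0.5 * BCE for both classes.
for p, y in [(0.9, 1), (0.3, 1), (0.2, 0), (0.7, 0)]:
    assert math.isclose(focal_loss(p, y, gamma=0.0, alpha=0.5), 0.5 * bce(p, y))

# Check 2: the modulating factor (1 - p_t) ** gamma at gamma=2.
easy_factor = (1.0 - 0.9) ** 2  # confident, correct example: about 0.01
hard_factor = (1.0 - 0.3) ** 2  # uncertain example: about 0.49
```

At gamma=2, an example with p_t = 0.9 keeps about 1% of its BCE loss, while one with p_t = 0.3 keeps about 49%.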

In [None]:
print("="*70)
print("VALIDATION: Focal Loss Implementation")
print("="*70)

# Test implementation correctness
validate_focal_loss_implementation()

print("\n" + "="*70)
print("DEMONSTRATION: Focal Loss Effect on Easy vs Hard Examples")
print("="*70)

# Demonstrate focusing effect
demo_df = demonstrate_focal_loss_effect()
print("\n" + demo_df.to_string(index=False))

print("\nInterpretation:")
print("- Easy examples (high confidence) are down-weighted by 10-100x")
print("- Hard examples (low confidence) retain most of their loss contribution")
print("- This forces the model to focus on hard-to-classify examples")

## Cell 5: Experiment Configuration

We test three negative sampling ratios:
- **1:4** - Standard ratio used in many NCF papers
- **1:10** - **PRIMARY EXPERIMENT** - more realistic for production systems
- **1:50** - High negative ratio to test robustness under extreme imbalance

The primary experiment uses 1:10 sampling because:
1. It's more representative of real recommendation systems
2. It provides stronger evidence of robustness than just testing at 1:4
3. The class imbalance problem is more pronounced, making Focal Loss improvements more meaningful

In [None]:
# Dataset and sampling ratios
DATASET = 'ml-100k'
SAMPLING_RATIOS = [4, 10, 50]  # negative samples per positive
PRIMARY_RATIO = 10  # Main experiment uses 1:10 sampling

# Focal Loss hyperparameters (defaults from the original Focal Loss paper, Lin et al., 2017)
GAMMA = 2.0  # Focusing parameter
ALPHA = 0.25  # Class balancing weight for positives

print(f"Dataset: {DATASET}")
print(f"Negative sampling ratios: {SAMPLING_RATIOS}")
print(f"Primary experiment: 1:{PRIMARY_RATIO} sampling")
print(f"\nFocal Loss hyperparameters:")
print(f"  gamma = {GAMMA} (focusing parameter)")
print(f"  alpha = {ALPHA} (class balancing weight for positives)")

## Cell 6: Configuration Setup

Create one configuration per sampling ratio.

In [None]:
# Create configurations for each sampling ratio
configs = {}
for ratio in SAMPLING_RATIOS:
    base = get_base_config(DATASET, device, neg_sample_num=ratio)
    configs[ratio] = get_neumf_config(base)
    print(f"Configuration for 1:{ratio} sampling created")

print("\nNeuMF Architecture:")
print(f"  MF Embedding Size: {configs[PRIMARY_RATIO]['mf_embedding_size']}")
print(f"  MLP Embedding Size: {configs[PRIMARY_RATIO]['mlp_embedding_size']}")
print(f"  MLP Hidden Layers: {configs[PRIMARY_RATIO]['mlp_hidden_size']}")
print(f"  Dropout: {configs[PRIMARY_RATIO]['dropout_prob']}")
print(f"\nTraining Settings:")
print(f"  Max Epochs: {configs[PRIMARY_RATIO]['epochs']}")
print(f"  Early Stopping Patience: {configs[PRIMARY_RATIO]['stopping_step']}")
print(f"  Learning Rate: {configs[PRIMARY_RATIO]['learning_rate']}")
print(f"  Batch Size: {configs[PRIMARY_RATIO]['train_batch_size']}")

## Cell 7: Alpha-Sampling Interaction Analysis

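**Important issue**: The alpha parameter and the sampling ratio interact. Alpha weights each positive example and (1 - alpha) weights each negative, but negative sampling already determines how many examples of each class the loss sees.

With 1:4 sampling and alpha=0.5:
- Every positive (weight: 0.5) is paired with 4 sampled negatives (weight: 0.5 each)
- **Effective ratio**: (0.5 × 4) / 0.5 = 4:1 (negatives still carry 4x the total weight of positives)

For balanced weighting with 1:N sampling:
- Setting alpha × 1 = (1 - alpha) × N gives alpha = N / (N + 1)
- Example: 1:4 -> alpha = 0.8, 1:10 -> alpha = 0.909, 1:50 -> alpha = 0.980

With alpha=0.25 (the computer vision default) at 1:10 sampling, the effective ratio is (0.75 × 10) / 0.25 = 30:1, so the class weights amplify the negative-heavy imbalance instead of correcting it. This is why alpha=0.25 may not be optimal for NCF.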
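
The arithmetic above can be reproduced with two small helpers. They are hypothetical stand-ins written for illustration, assuming positives are weighted alpha and negatives 1 - alpha and ignoring the focusing term. The notebook's own `compute_effective_class_ratio` and `get_balanced_alpha` live in `focal_loss_utils` and may differ in detail.

```python
def effective_neg_to_pos_ratio(alpha, neg_per_pos):
    """Total negative weight divided by total positive weight,
    with positives weighted alpha and negatives weighted (1 - alpha)."""
    return (1.0 - alpha) * neg_per_pos / alpha

def balanced_alpha_for(neg_per_pos):
    """Alpha that makes the effective ratio 1:1 under 1:N sampling."""
    return neg_per_pos / (neg_per_pos + 1.0)

ratios = [4, 10, 50]
table = {
    n: (effective_neg_to_pos_ratio(0.25, n), balanced_alpha_for(n))
    for n in ratios
}
for n, (eff, bal) in table.items():
    print(f"1:{n}  alpha=0.25 -> {eff:.0f}:1 effective, balanced alpha = {bal:.3f}")
```

Plugging the balanced alpha back into `effective_neg_to_pos_ratio` returns 1.0 (up to floating-point error) for every N, which is a quick sanity check on the N / (N + 1) formula.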

In [None]:
print("="*70)
print("ALPHA-SAMPLING INTERACTION ANALYSIS")
print("="*70)

# Analyze interaction for various alpha values
interaction_df = analyze_alpha_sampling_interaction(
    neg_ratios=SAMPLING_RATIOS,
    alphas=[0.25, 0.5, 0.75]
)
print("\n" + interaction_df.to_string(index=False))

print("\n" + "-"*70)
print("BALANCED ALPHA VALUES (for effective ratio = 1:1)")
print("-"*70)

for ratio in SAMPLING_RATIOS:
    balanced = get_balanced_alpha(ratio)
    actual_ratio = compute_effective_class_ratio(ALPHA, ratio)
    print(f"1:{ratio} sampling -> balanced alpha = {balanced:.3f}")
    print(f"  Using alpha={ALPHA} -> effective ratio = {actual_ratio:.1f}:1")
    print()

## Experiment 1: NeuMF-BCE Baseline (1:10 Sampling)

Train standard NeuMF with Binary Cross-Entropy loss.

This serves as the baseline for H1 (efficacy hypothesis).

In [None]:
print("="*70)
print(f"EXPERIMENT 1: NeuMF-BCE Baseline (1:{PRIMARY_RATIO} sampling)")
print("="*70)

result_bce = run_recbole(
    model='NeuMF',
    dataset=DATASET,
    config_dict=configs[PRIMARY_RATIO]
)

print("\nNeuMF-BCE Results:")
print(f"  Best Validation NDCG@10: {result_bce['best_valid_score']:.4f}")
print(f"  Test HR@10: {result_bce['test_result'].get('hit@10', 0):.4f}")
print(f"  Test NDCG@10: {result_bce['test_result'].get('ndcg@10', 0):.4f}")

## Experiment 2: NeuMF with Alpha-Balanced BCE (1:10 Sampling)

**NEW CONTROL EXPERIMENT**: Train NeuMF with alpha-balanced BCE (Focal Loss with gamma=0).

This isolates the **focusing effect** (gamma > 0) from simple **class weighting** (alpha).

- If Alpha-BCE ≈ FL: Improvement comes mainly from class weighting (H3 false)
- If Alpha-BCE < FL: Focusing effect is necessary (H3 true)

We use alpha=0.25 (same as Focal Loss) for direct comparison.

In [None]:
print("="*70)
print(f"EXPERIMENT 2: NeuMF-AlphaBCE Control (1:{PRIMARY_RATIO} sampling)")
print("="*70)
print(f"Alpha-Balanced BCE: gamma=0, alpha={ALPHA}")
print("This isolates class weighting from the focusing effect.\n")

result_alpha_bce = train_neumf_alpha_bce(
    config_dict=configs[PRIMARY_RATIO],
    dataset=DATASET,
    alpha=ALPHA,
    seed=42,
    track_dynamics=False
)

print("\nNeuMF-AlphaBCE Results:")
print(f"  Best Validation NDCG@10: {result_alpha_bce['best_valid_score']:.4f}")
print(f"  Test HR@10: {result_alpha_bce['test_result'].get('hit@10', 0):.4f}")
print(f"  Test NDCG@10: {result_alpha_bce['test_result'].get('ndcg@10', 0):.4f}")

## Experiment 3: NeuMF with Focal Loss (1:10 Sampling)

Train NeuMF with Focal Loss (gamma=2.0, alpha=0.25).

This is the main proposed method for H1.

In [None]:
print("="*70)
print(f"EXPERIMENT 3: NeuMF-FocalLoss (1:{PRIMARY_RATIO} sampling)")
print("="*70)
print(f"Focal Loss: gamma={GAMMA}, alpha={ALPHA}\n")

result_focal = train_neumf_focal_loss(
    config_dict=configs[PRIMARY_RATIO],
    dataset=DATASET,
    gamma=GAMMA,
    alpha=ALPHA,
    seed=42,
    track_dynamics=False
)

print("\nNeuMF-FocalLoss Results:")
print(f"  Best Validation NDCG@10: {result_focal['best_valid_score']:.4f}")
print(f"  Test HR@10: {result_focal['test_result'].get('hit@10', 0):.4f}")
print(f"  Test NDCG@10: {result_focal['test_result'].get('ndcg@10', 0):.4f}")

## Hypothesis Testing: Primary Results (1:10 Sampling)

Compare BCE vs Alpha-BCE vs Focal Loss to test H1 and H3.
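
A single run per loss function only gives point estimates, so a small NDCG@10 difference may be seed noise. One way to strengthen the comparison, assuming each method is retrained over several seeds (for instance with the imported `run_multi_seed_experiment`, whose signature is not shown in this notebook), is an exact paired sign-flip permutation test on the per-seed differences. The sketch below uses made-up placeholder numbers, not results from this notebook, and `paired_sign_flip_pvalue` is a hypothetical helper.

```python
from itertools import product
from statistics import mean

def paired_sign_flip_pvalue(baseline, treatment):
    """Exact one-sided paired permutation test.

    H0: within each seed pair, the two labels are exchangeable.
    Returns the fraction of the 2^n sign flips whose mean difference
    is at least as large as the observed one.
    """
    diffs = [t - b for b, t in zip(baseline, treatment)]
    observed = mean(diffs)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = mean(s * d for s, d in zip(signs, diffs))
        if flipped >= observed - 1e-12:  # tolerance for float rounding
            count += 1
    return count / total

# PLACEHOLDER values for illustration only (not measured results).
bce_ndcg = [0.100, 0.102, 0.098, 0.101, 0.099]
focal_ndcg = [0.104, 0.105, 0.101, 0.106, 0.103]
p_value = paired_sign_flip_pvalue(bce_ndcg, focal_ndcg)
print(f"one-sided p = {p_value:.4f}")
```

With n seeds the smallest attainable p-value is 1 / 2^n, so five seeds can reach at most p = 1/32; more seeds are needed for a stricter threshold.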

In [None]:
print("="*70)
print(f"HYPOTHESIS TESTING: Primary Results (1:{PRIMARY_RATIO} sampling)")
print("="*70)

# Create comparison table
comparison_df = create_comparison_table(
    results_list=[result_bce, result_alpha_bce, result_focal],
    model_names=['BCE', 'AlphaBCE', 'FocalLoss']
)
print("\n" + comparison_df.to_string(index=False))

# Compute improvements
print("\n" + "-"*70)
print("IMPROVEMENT ANALYSIS")
print("-"*70)

# H1: Focal Loss vs BCE
print("\nH1: Focal Loss vs BCE (Efficacy Hypothesis)")
improvements_fl_vs_bce = compute_improvement(result_bce, result_focal)
for metric in ['ndcg@10', 'hit@10']:
    imp = improvements_fl_vs_bce[metric]
    print(f"  {metric.upper()}: {imp['baseline']:.4f} -> {imp['comparison']:.4f} ({imp['pct_change']:+.2f}%)")

# H3: Focal Loss vs Alpha-BCE (Mechanism Hypothesis)
print("\nH3: Focal Loss vs Alpha-BCE (Mechanism - Focusing Effect)")
improvements_fl_vs_alpha = compute_improvement(result_alpha_bce, result_focal)
for metric in ['ndcg@10', 'hit@10']:
    imp = improvements_fl_vs_alpha[metric]
    print(f"  {metric.upper()}: {imp['baseline']:.4f} -> {imp['comparison']:.4f} ({imp['pct_change']:+.2f}%)")

# Interpretation
print("\n" + "-"*70)
print("INTERPRETATION")
print("-"*70)

ndcg_fl = result_focal['test_result'].get('ndcg@10', 0)
ndcg_bce = result_bce['test_result'].get('ndcg@10', 0)
ndcg_alpha = result_alpha_bce['test_result'].get('ndcg@10', 0)

if ndcg_fl > ndcg_bce:
    print("✓ H1 SUPPORTED: Focal Loss improves over BCE")
else:
    print("✗ H1 NOT SUPPORTED: Focal Loss does not improve over BCE")

if ndcg_fl > ndcg_alpha:
    print("✓ H3 SUPPORTED: Focusing effect (gamma > 0) is necessary beyond class weighting")
else:
    print("✗ H3 NOT SUPPORTED: Class weighting alone (Alpha-BCE) is sufficient")

if ndcg_alpha > ndcg_bce:
    print("  Note: Alpha-BCE also improves over BCE, suggesting class weighting helps")
else:
    print("  Note: Alpha-BCE does not improve over BCE")

## Robustness Study: Multiple Sampling Ratios (H2)

Test whether Focal Loss improvements are robust across different sampling ratios.

We compare BCE vs Focal Loss at 1:4, 1:10, and 1:50 sampling.

In [None]:
print("="*70)
print("H2: ROBUSTNESS STUDY - Multiple Sampling Ratios")
print("="*70)
print("Testing BCE vs Focal Loss at 1:4, 1:10, and 1:50 sampling\n")

robustness_results = {
    'bce': {},
    'focal': {}
}

for ratio in SAMPLING_RATIOS:
    print(f"\n{'='*70}")
    print(f"Sampling Ratio: 1:{ratio}")
    print('='*70)
    
    # Skip if we already trained at PRIMARY_RATIO
    if ratio == PRIMARY_RATIO:
        print(f"Using existing results from primary experiment")
        robustness_results['bce'][ratio] = result_bce
        robustness_results['focal'][ratio] = result_focal
        continue
    
    # Train BCE
    print(f"\nTraining NeuMF-BCE (1:{ratio})...")
    result_bce_ratio = run_recbole(
        model='NeuMF',
        dataset=DATASET,
        config_dict=configs[ratio]
    )
    robustness_results['bce'][ratio] = result_bce_ratio
    
    # Train Focal Loss
    print(f"Training NeuMF-FocalLoss (1:{ratio})...")
    result_focal_ratio = train_neumf_focal_loss(
        config_dict=configs[ratio],
        dataset=DATASET,
        gamma=GAMMA,
        alpha=ALPHA,
        seed=42,
        track_dynamics=False
    )
    robustness_results['focal'][ratio] = result_focal_ratio
    
    # Show results
    bce_ndcg = result_bce_ratio['test_result'].get('ndcg@10', 0)
    fl_ndcg = result_focal_ratio['test_result'].get('ndcg@10', 0)
    improvement = (fl_ndcg - bce_ndcg) / bce_ndcg * 100 if bce_ndcg > 0 else 0
    
    print(f"\nResults for 1:{ratio} sampling:")
    print(f"  BCE NDCG@10: {bce_ndcg:.4f}")
    print(f"  Focal NDCG@10: {fl_ndcg:.4f}")
    print(f"  Improvement: {improvement:+.2f}%")

## Robustness Study: Summary Table

In [None]:
print("="*70)
print("ROBUSTNESS STUDY SUMMARY")
print("="*70)

# Create summary table
summary_data = []
for ratio in SAMPLING_RATIOS:
    bce_result = robustness_results['bce'][ratio]
    fl_result = robustness_results['focal'][ratio]
    
    bce_ndcg = bce_result['test_result'].get('ndcg@10', 0)
    fl_ndcg = fl_result['test_result'].get('ndcg@10', 0)
    bce_hr = bce_result['test_result'].get('hit@10', 0)
    fl_hr = fl_result['test_result'].get('hit@10', 0)
    
    ndcg_imp = (fl_ndcg - bce_ndcg) / bce_ndcg * 100 if bce_ndcg > 0 else 0
    hr_imp = (fl_hr - bce_hr) / bce_hr * 100 if bce_hr > 0 else 0
    
    summary_data.append({
        'Sampling': f'1:{ratio}',
        'BCE_NDCG@10': f'{bce_ndcg:.4f}',
        'FL_NDCG@10': f'{fl_ndcg:.4f}',
        'NDCG_Improvement': f'{ndcg_imp:+.2f}%',
        'BCE_HR@10': f'{bce_hr:.4f}',
        'FL_HR@10': f'{fl_hr:.4f}',
        'HR_Improvement': f'{hr_imp:+.2f}%'
    })

summary_df = pd.DataFrame(summary_data)
print("\n" + summary_df.to_string(index=False))

# H2 interpretation
print("\n" + "-"*70)
print("H2 INTERPRETATION (Robustness)")
print("-"*70)

improvements = []
for ratio in SAMPLING_RATIOS:
    bce_ndcg = robustness_results['bce'][ratio]['test_result'].get('ndcg@10', 0)
    fl_ndcg = robustness_results['focal'][ratio]['test_result'].get('ndcg@10', 0)
    improvements.append(fl_ndcg > bce_ndcg)

if all(improvements):
    print("✓ H2 STRONGLY SUPPORTED: Focal Loss improves over BCE at ALL sampling ratios")
elif sum(improvements) >= 2:
    print("✓ H2 PARTIALLY SUPPORTED: Focal Loss improves at most sampling ratios")
else:
    print("✗ H2 NOT SUPPORTED: Focal Loss improvements are not robust")

print(f"\nFocal Loss wins at {sum(improvements)}/{len(SAMPLING_RATIOS)} sampling ratios")

## Training Dynamics Analysis

**NEW ANALYSIS**: Track training dynamics to validate the Focal Loss mechanism.

We track loss contribution by confidence bin to verify:
1. Focal Loss down-weights easy examples (high confidence)
2. Focal Loss emphasizes hard examples (low confidence)
3. This leads to better focusing on difficult instances

This provides direct evidence for the claimed mechanism.
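
The idea of "loss contribution by confidence bin" can be illustrated on a toy batch. This is a minimal sketch, not the tracking code inside `train_neumf_focal_loss`. The function name, the five equal-width bins, and the toy p_t values are assumptions, and class weighting (alpha) is left out to isolate the focusing term.

```python
import math

def loss_share_by_bin(p_t_values, gamma, n_bins=5):
    """Fraction of the total loss contributed by each p_t bin.

    p_t is the predicted probability of the true class; the per-example
    loss is (1 - p_t) ** gamma * -log(p_t), so gamma=0 is plain BCE.
    """
    totals = [0.0] * n_bins
    for p_t in p_t_values:
        loss = (1.0 - p_t) ** gamma * -math.log(p_t)
        idx = min(int(p_t * n_bins), n_bins - 1)  # p_t = 1.0 goes in the last bin
        totals[idx] += loss
    grand_total = sum(totals)
    return [t / grand_total for t in totals]

# Toy batch: mostly easy examples (p_t near 1) plus a few hard ones.
toy_p_t = [0.95] * 80 + [0.85] * 10 + [0.5] * 6 + [0.1] * 4
bce_share = loss_share_by_bin(toy_p_t, gamma=0.0)
focal_share = loss_share_by_bin(toy_p_t, gamma=2.0)

for i, (b, f) in enumerate(zip(bce_share, focal_share)):
    lo, hi = i / 5, (i + 1) / 5
    print(f"p_t in [{lo:.1f}, {hi:.1f}): BCE share {b:.3f}, focal share {f:.3f}")
```

In this toy batch the easiest bin carries roughly 30% of the BCE loss but under 1% of the focal loss, which is the pattern the dynamics tables are expected to show.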

In [None]:
print("="*70)
print("TRAINING DYNAMICS ANALYSIS")
print("="*70)
print("Re-training models with dynamics tracking enabled...\n")

# Train BCE with dynamics tracking
print("Training BCE with dynamics tracking...")
# Note: Standard BCE from RecBole doesn't support dynamics tracking.
# We approximate it with Alpha-BCE at alpha=0.5, which weights both classes
# equally and equals 0.5 * BCE (the same loss up to a constant factor).
bce_dynamics = train_neumf_alpha_bce(
    config_dict=configs[PRIMARY_RATIO],
    dataset=DATASET,
    alpha=0.5,  # equal class weights: 0.5 * BCE
    seed=42,
    track_dynamics=True
)

# Train Focal Loss with dynamics tracking
print("\nTraining Focal Loss with dynamics tracking...")
focal_dynamics = train_neumf_focal_loss(
    config_dict=configs[PRIMARY_RATIO],
    dataset=DATASET,
    gamma=GAMMA,
    alpha=ALPHA,
    seed=42,
    track_dynamics=True
)

print("\nTraining complete. Analyzing dynamics...")

## Training Dynamics: Results

In [None]:
print("="*70)
print("TRAINING DYNAMICS: Loss by Confidence Bin")
print("="*70)

if 'dynamics' in bce_dynamics and 'dynamics' in focal_dynamics:
    bce_dyn_df = bce_dynamics['dynamics']
    focal_dyn_df = focal_dynamics['dynamics']
    
    # Show final epoch dynamics
    if len(bce_dyn_df) > 0 and len(focal_dyn_df) > 0:
        print("\nBCE - Final Epoch Loss by Confidence Bin:")
        print(bce_dyn_df.tail(1).to_string(index=False))
        
        print("\nFocal Loss - Final Epoch Loss by Confidence Bin:")
        print(focal_dyn_df.tail(1).to_string(index=False))
        
        print("\n" + "-"*70)
        print("INTERPRETATION")
        print("-"*70)
        print("Expected pattern for Focal Loss:")
        print("  - LOW loss in high confidence bins [0.8,1.0) - easy examples down-weighted")
        print("  - HIGH loss in low confidence bins [0.0,0.4) - hard examples emphasized")
        print("\nIf the tables above show this pattern, they support the focusing mechanism.")
    else:
        print("Dynamics tracking did not record data. Check model implementation.")
else:
    print("Dynamics tracking not available. Models may not support this feature.")
    print("This is optional analysis - main results are still valid.")

## Results Summary: Full Comparison

Complete results across all experiments.

In [None]:
print("="*70)
print("FINAL RESULTS SUMMARY")
print("="*70)

print(f"\nDataset: {DATASET}")
print(f"Primary Experiment: 1:{PRIMARY_RATIO} sampling")
print(f"Focal Loss: gamma={GAMMA}, alpha={ALPHA}")

print("\n" + "-"*70)
print(f"PRIMARY RESULTS (1:{PRIMARY_RATIO} sampling)")
print("-"*70)

primary_comparison = create_comparison_table(
    results_list=[result_bce, result_alpha_bce, result_focal],
    model_names=['BCE', 'AlphaBCE', 'FocalLoss']
)
print("\n" + primary_comparison.to_string(index=False))

print("\n" + "-"*70)
print("ROBUSTNESS ACROSS SAMPLING RATIOS")
print("-"*70)
print("\n" + summary_df.to_string(index=False))

print("\n" + "="*70)
print("HYPOTHESIS TESTING CONCLUSIONS")
print("="*70)

# H1
ndcg_fl = result_focal['test_result'].get('ndcg@10', 0)
ndcg_bce = result_bce['test_result'].get('ndcg@10', 0)
h1_imp = (ndcg_fl - ndcg_bce) / ndcg_bce * 100 if ndcg_bce > 0 else 0

print("\nH1 (Efficacy): Focal Loss improves NeuMF over BCE")
if ndcg_fl > ndcg_bce:
    print(f"  ✓ SUPPORTED: {h1_imp:+.2f}% improvement in NDCG@10")
else:
    print(f"  ✗ NOT SUPPORTED: {h1_imp:+.2f}% change in NDCG@10")

# H2
robust_count = sum([robustness_results['focal'][r]['test_result'].get('ndcg@10', 0) > 
                    robustness_results['bce'][r]['test_result'].get('ndcg@10', 0) 
                    for r in SAMPLING_RATIOS])

print(f"\nH2 (Robustness): Improvements are robust across sampling ratios")
if robust_count == len(SAMPLING_RATIOS):
    print(f"  ✓ STRONGLY SUPPORTED: FL wins at {robust_count}/{len(SAMPLING_RATIOS)} ratios")
elif robust_count >= 2:
    print(f"  ✓ PARTIALLY SUPPORTED: FL wins at {robust_count}/{len(SAMPLING_RATIOS)} ratios")
else:
    print(f"  ✗ NOT SUPPORTED: FL wins at only {robust_count}/{len(SAMPLING_RATIOS)} ratios")

# H3
ndcg_alpha = result_alpha_bce['test_result'].get('ndcg@10', 0)
h3_imp = (ndcg_fl - ndcg_alpha) / ndcg_alpha * 100 if ndcg_alpha > 0 else 0

print(f"\nH3 (Mechanism): Focusing effect (gamma > 0) is necessary beyond class weighting")
if ndcg_fl > ndcg_alpha:
    print(f"  ✓ SUPPORTED: FL gains {h3_imp:+.2f}% over Alpha-BCE")
else:
    print(f"  ✗ NOT SUPPORTED: Alpha-BCE is sufficient ({h3_imp:+.2f}%)")

print("\n" + "="*70)

## Optional: Grid Search for Hyperparameter Tuning

**IMPORTANT**: This is computationally expensive: the full grid is 4 × 3 × 3 = 36 training runs. Only run it if you want to tune gamma and alpha.

The grid search tests:
- Multiple gamma values: [0.5, 1.0, 2.0, 3.0]
- Multiple alpha values: [0.25, 0.5, 0.75]
- Multiple sampling ratios: [4, 10, 50]

**Uncomment to run**

In [None]:
# # ============================================
# # GRID SEARCH (OPTIONAL - COMPUTATIONALLY EXPENSIVE)
# # ============================================
# 
# GAMMA_VALUES = [0.5, 1.0, 2.0, 3.0]
# ALPHA_VALUES = [0.25, 0.5, 0.75]
# 
# print("="*70)
# print("GRID SEARCH: Hyperparameter Tuning")
# print("="*70)
# print(f"Testing {len(GAMMA_VALUES)} gamma × {len(ALPHA_VALUES)} alpha × {len(SAMPLING_RATIOS)} ratios")
# print(f"Total: {len(GAMMA_VALUES) * len(ALPHA_VALUES) * len(SAMPLING_RATIOS)} experiments\n")
# 
# grid_results = []
# 
# for ratio in SAMPLING_RATIOS:
#     print(f"\nSampling Ratio: 1:{ratio}")
#     print("-"*70)
#     
#     for gamma in GAMMA_VALUES:
#         for alpha in ALPHA_VALUES:
#             print(f"Training: gamma={gamma}, alpha={alpha}")
#             
#             result = train_neumf_focal_loss(
#                 config_dict=configs[ratio],
#                 dataset=DATASET,
#                 gamma=gamma,
#                 alpha=alpha,
#                 seed=42,
#                 track_dynamics=False
#             )
#             
#             ndcg10 = result['test_result'].get('ndcg@10', 0)
#             hr10 = result['test_result'].get('hit@10', 0)
#             
#             grid_results.append({
#                 'ratio': ratio,
#                 'gamma': gamma,
#                 'alpha': alpha,
#                 'valid_ndcg@10': result['best_valid_score'],
#                 'ndcg@10': ndcg10,
#                 'hit@10': hr10
#             })
#             
#             print(f"  NDCG@10: {ndcg10:.4f}, HR@10: {hr10:.4f}")
# 
# # Show grid search results
# grid_df = pd.DataFrame(grid_results)
# 
# print("\n" + "="*70)
# print("GRID SEARCH RESULTS")
# print("="*70)
# print("\n" + grid_df.to_string(index=False))
# 
# # Find best configuration for each ratio.
# # Select on validation NDCG@10 so the test metrics are not used for tuning.
# print("\n" + "-"*70)
# print("BEST CONFIGURATIONS PER SAMPLING RATIO")
# print("-"*70)
# 
# for ratio in SAMPLING_RATIOS:
#     ratio_results = grid_df[grid_df['ratio'] == ratio]
#     best_idx = ratio_results['valid_ndcg@10'].idxmax()
#     best_row = ratio_results.loc[best_idx]
#     
#     print(f"\n1:{ratio} sampling:")
#     print(f"  Best: gamma={best_row['gamma']}, alpha={best_row['alpha']}")
#     print(f"  Validation NDCG@10: {best_row['valid_ndcg@10']:.4f}")
#     print(f"  Test NDCG@10: {best_row['ndcg@10']:.4f}")
#     print(f"  Test HR@10: {best_row['hit@10']:.4f}")