# NCF with Focal Loss - ML-1M Improved Experimental Design

**Paper**: "Addressing Class Imbalance in NCF with Focal Loss" (AAMAS 2025)

## Hypotheses

This notebook tests three key hypotheses:

**H1: Focal Loss improves NeuMF performance**  
Focal Loss will outperform standard BCE loss on NeuMF for implicit feedback recommendation.

**H2: Focal Loss is robust to negative sampling ratio**  
Focal Loss will maintain consistent performance across different negative sampling ratios (1:4, 1:10, 1:50), while BCE may degrade with higher imbalance.

**H3: Focusing effect is necessary beyond class weighting**  
The focusing parameter (gamma) provides benefits beyond class balancing (alpha). We test this by comparing Focal Loss to Alpha-Balanced BCE (gamma=0 control).
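
For reference, the hypotheses above use alpha and gamma in the sense of the standard binary focal loss. The definition below is a sketch that assumes the usual convention in which alpha weights the positive class; the exact form implemented in `focal_loss_utils.py` is not shown in this notebook and may differ in details such as reduction or clamping.

$$
p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{if } y = 0 \end{cases}
\qquad
\alpha_t = \begin{cases} \alpha & \text{if } y = 1 \\ 1 - \alpha & \text{if } y = 0 \end{cases}
$$

$$
\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \, \log(p_t)
$$

Setting $\gamma = 0$ removes the modulating factor $(1 - p_t)^{\gamma}$ and leaves $-\alpha_t \log(p_t)$, which is the Alpha-Balanced BCE control used for H3.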

## Experimental Design

- **Dataset**: MovieLens 1M (larger, more realistic than ML-100K)
- **Model**: NeuMF (Neural Collaborative Filtering)
- **Evaluation**: Leave-one-out, full ranking, HR@10 and NDCG@10
- **Primary sampling ratio**: 1:10 (moderate imbalance, commonly used in literature)
- **Robustness test**: 1:4 (low), 1:10 (moderate), 1:50 (high imbalance)
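
With leave-one-out evaluation there is a single held-out item per user, so both metrics reduce to functions of that item's rank in the full ranking. The sketch below assumes a 1-based rank. The function names are illustrative only and are not RecBole's API.

```python
import math


def hit_at_k(rank: int, k: int = 10) -> float:
    """Return 1.0 if the held-out item is ranked in the top k (rank is 1-based)."""
    return 1.0 if rank <= k else 0.0


def ndcg_at_k(rank: int, k: int = 10) -> float:
    """NDCG@k with one relevant item: the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0


def average_metrics(ranks, k: int = 10):
    """Average HR@k and NDCG@k over users, given each user's held-out item rank."""
    n = len(ranks)
    hr = sum(hit_at_k(r, k) for r in ranks) / n
    ndcg = sum(ndcg_at_k(r, k) for r in ranks) / n
    return hr, ndcg


# Three hypothetical users whose held-out items land at ranks 1, 3, and 11.
hr, ndcg = average_metrics([1, 3, 11])
print(f"HR@10={hr:.4f}, NDCG@10={ndcg:.4f}")
```

An item ranked outside the top 10 contributes zero to both metrics, which is why HR@10 is an upper bound on NDCG@10.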

## Methodology Improvements from Review

1. **Alpha-Balanced BCE control** - Isolates focusing effect (gamma) from class weighting (alpha)
2. **Multiple sampling ratios** - Tests robustness hypothesis (H2)
3. **1:10 as primary** - More realistic imbalance than 1:4
4. **Training dynamics tracking** - Validates mechanism (loss contribution by confidence bin)
5. **Alpha-sampling interaction analysis** - Documents effective class weight ratios

## Cell 1: Suppress Warnings

In [None]:
import warnings

# Filter out FutureWarnings from pandas
warnings.filterwarnings('ignore', category=FutureWarning, module='pandas')

# Filter out FutureWarnings from torch (like the GradScaler one)
warnings.filterwarnings('ignore', category=FutureWarning, module='torch')

# Filter out warnings from RecBole
# Raw strings keep the regex escapes (\[ and \]) from being read as string escapes
warnings.filterwarnings('ignore', message=r'^command line args.*will not be used in RecBole', category=UserWarning)
warnings.filterwarnings('ignore', message=r'^All the same value in \[label\].*', category=UserWarning)

print("Warnings suppressed.")

## Cell 2: Install Dependencies

Run this cell, then **RESTART the runtime** before continuing!

In [None]:
# ============================================
# Install Dependencies
# ============================================
# Run this cell, then RESTART the runtime before continuing!

!pip install ray
!pip uninstall numpy -y
!pip install numpy==1.26.4
!pip install recbole==1.2.0
!pip install kmeans-pytorch

# Verify numpy version
import numpy as np
print(f"\nNumPy version: {np.__version__}")
if np.__version__.startswith("2."):
    print("ERROR: NumPy 2.x detected! Please RESTART the runtime now.")
    print("Go to: Runtime -> Restart session")
else:
    print("NumPy version OK. You can continue to the next cell.")

## Cell 3: Imports

Import utilities from `focal_loss_utils.py` and standard libraries.

In [None]:
# ============================================
# Imports
# ============================================
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import logging
import os
from collections import defaultdict

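# PyTorch 2.6+ changed the torch.load default to weights_only=True, which can
# reject checkpoints that contain pickled Python objects. Restore the old
# default unless the caller sets weights_only explicitly. Only load checkpoints
# you trust, since weights_only=False unpickles arbitrary objects.
# The hasattr guard makes sure the patch is applied only once per session.
if not hasattr(torch, '_load_patched'):
    _original_torch_load = torch.load
    def _patched_torch_load(*args, **kwargs):
        kwargs.setdefault('weights_only', False)
        return _original_torch_load(*args, **kwargs)
    torch.load = _patched_torch_load
    torch._load_patched = True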

# RecBole imports
from recbole.quick_start import run_recbole
from recbole.model.general_recommender.neumf import NeuMF
from recbole.config import Config
from recbole.data import create_dataset, data_preparation
from recbole.trainer import Trainer
from recbole.utils import init_seed, init_logger

# Import from focal_loss_utils.py
import sys
sys.path.insert(0, '/content' if 'google.colab' in sys.modules else '.')
from focal_loss_utils import (
    FocalLoss, AlphaBalancedBCE,
    get_base_config, get_neumf_config,
    train_neumf_focal_loss, train_neumf_alpha_bce,
    create_comparison_table, compute_improvement,
    validate_focal_loss_implementation, demonstrate_focal_loss_effect,
    analyze_alpha_sampling_interaction, compute_effective_class_ratio,
    get_balanced_alpha, run_multi_seed_experiment
)

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if torch.cuda.is_available():
    print(f"Using: {torch.cuda.get_device_name(0)}")
else:
    print("Using: CPU")

print("\nImports successful!")

## Cell 4: Focal Loss Validation Test

Validate that our Focal Loss implementation is correct before running experiments.
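
As a reference point for what such a validation checks, here is a minimal per-example sketch of binary focal loss in plain Python. It assumes alpha weights the positive class and that predicted probabilities lie strictly between 0 and 1. The helper names are illustrative and are not the `FocalLoss` class from `focal_loss_utils.py`.

```python
import math


def alpha_weighted_bce(p: float, y: int, alpha: float = 0.25) -> float:
    """Alpha-balanced BCE for one example; p is the predicted P(y = 1), 0 < p < 1."""
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class weight of the true class
    return -alpha_t * math.log(p_t)


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss = modulating factor (1 - p_t)^gamma times alpha-balanced BCE."""
    p_t = p if y == 1 else 1.0 - p
    return (1.0 - p_t) ** gamma * alpha_weighted_bce(p, y, alpha)


# Compare an easy positive (p = 0.9) with a hard positive (p = 0.1).
for name, p in (("easy", 0.9), ("hard", 0.1)):
    bce = alpha_weighted_bce(p, 1)
    fl = binary_focal_loss(p, 1)
    print(f"{name}: alpha-BCE={bce:.4f}, focal={fl:.4f}, focal/BCE={fl / bce:.2f}")
```

With gamma=2, the easy example keeps about 1% of its alpha-weighted BCE loss while the hard example keeps about 81%, and with gamma=0 the two losses coincide.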

In [None]:
# ============================================
# Focal Loss Validation
# ============================================
print("="*70)
print("VALIDATION: Focal Loss Implementation")
print("="*70)

# Test 1: Verify implementation correctness
validate_focal_loss_implementation()

# Test 2: Demonstrate focusing effect
print("\n" + "="*70)
print("DEMONSTRATION: Focal Loss Effect on Easy vs Hard Examples")
print("="*70)

demo_df = demonstrate_focal_loss_effect()
print("\n" + demo_df.to_string(index=False))

print("\n" + "="*70)
print("Key Insight: Focal Loss down-weights easy examples, allowing the model")
print("to focus on hard examples (the focusing effect).")
print("="*70)

## Cell 5: Experiment Configuration

### Negative Sampling Ratios

We test three negative sampling ratios to evaluate robustness (H2):

- **1:4** - Standard ratio (low imbalance)
- **1:10** - Moderate imbalance (PRIMARY experiment)
- **1:50** - High imbalance (realistic for large-scale systems)

### Alpha-Sampling Interaction

The class balancing weight (alpha) interacts with the negative sampling ratio. A fixed alpha=0.5 does NOT provide equal weighting across different sampling ratios.

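Example: with 1:4 sampling and alpha=0.5 (alpha weights each positive, 1-alpha weights each negative):
- Effective ratio (neg:pos) = ((1-alpha) * neg_ratio) / alpha = ((1-0.5) * 4) / 0.5 = 4:1
- In aggregate, negatives carry 4x the loss weight of positives, even though each individual example is weighted equally.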

We address this by computing balanced alpha values for each sampling ratio.
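
The arithmetic above can be sketched as a small function. The name `effective_neg_pos_ratio` is hypothetical; the `compute_effective_class_ratio` helper imported from `focal_loss_utils.py` may have a different signature or return format.

```python
def effective_neg_pos_ratio(alpha: float, neg_ratio: int) -> float:
    """Aggregate loss weight on negatives divided by that on positives.

    Assumes alpha multiplies each positive's loss and (1 - alpha) each
    negative's loss, with neg_ratio sampled negatives per positive.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return (1.0 - alpha) * neg_ratio / alpha


for neg_ratio in (4, 10, 50):
    for alpha in (0.25, 0.5):
        ratio = effective_neg_pos_ratio(alpha, neg_ratio)
        print(f"1:{neg_ratio} sampling, alpha={alpha}: neg:pos = {ratio:g}:1")
```

Under these assumptions, alpha=0.25 with 1:10 sampling gives 30:1, so a small alpha amplifies the imbalance introduced by sampling rather than offsetting it.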

In [None]:
# ============================================
# Configuration Setup
# ============================================
SAMPLING_RATIOS = [4, 10, 50]
DATASET = 'ml-1m'

# Create configs for each sampling ratio
configs = {}
for ratio in SAMPLING_RATIOS:
    base = get_base_config(DATASET, device, neg_sample_num=ratio)
    configs[ratio] = get_neumf_config(base)

print("Configurations created for negative sampling ratios:")
for ratio in SAMPLING_RATIOS:
    print(f"  1:{ratio} (config key: {ratio})")

print(f"\nDataset: {DATASET}")
print(f"Model: NeuMF")
print(f"Primary experiment: 1:10 sampling")

## Cell 6: Alpha-Sampling Interaction Analysis

This analysis reveals a critical issue: **alpha=0.5 does NOT provide balanced weighting** when combined with negative sampling.

The effective class weight ratio is:

$$\text{Effective Ratio} = \frac{(1-\alpha) \times \text{neg\_ratio}}{\alpha}$$

For balanced weighting (ratio = 1:1), we need:

$$\alpha = \frac{\text{neg\_ratio}}{\text{neg\_ratio} + 1}$$
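
The balanced value follows from setting the effective ratio to 1 and solving for $\alpha$, writing $r$ for $\text{neg\_ratio}$:

$$
\frac{(1-\alpha)\, r}{\alpha} = 1
\;\Longrightarrow\; (1-\alpha)\, r = \alpha
\;\Longrightarrow\; r = \alpha\,(1 + r)
\;\Longrightarrow\; \alpha = \frac{r}{r + 1}
$$

For the three ratios used in this notebook: $r = 4$ gives $\alpha = 0.8$, $r = 10$ gives $\alpha = 10/11 \approx 0.9091$, and $r = 50$ gives $\alpha = 50/51 \approx 0.9804$.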

In [None]:
# ============================================
# Alpha-Sampling Interaction Analysis
# ============================================
print("="*70)
print("ANALYSIS: Alpha-Sampling Interaction")
print("="*70)

interaction_df = analyze_alpha_sampling_interaction(
    neg_ratios=SAMPLING_RATIOS,
    alphas=[0.25, 0.5, 0.75]
)

print("\nEffective class weight ratios (neg:pos) for different alpha values:")
print(interaction_df.to_string(index=False))

print("\n" + "="*70)
print("Balanced alpha values (for effective ratio = 1:1):")
print("="*70)
for ratio in SAMPLING_RATIOS:
    balanced = get_balanced_alpha(ratio)
    print(f"  1:{ratio} sampling -> balanced alpha = {balanced:.4f}")

print("\n" + "="*70)
print("Key Insight: Alpha values must be tuned for each sampling ratio.")
print("Using alpha=0.25 (from object detection) may not be optimal for NCF.")
print("="*70)

## Experiment 1: NeuMF-BCE Baseline (1:10 Sampling)

Train standard NeuMF with Binary Cross-Entropy loss.

This serves as our baseline for testing H1 (Focal Loss improves NeuMF).

In [None]:
# ============================================
# Train NeuMF-BCE (1:10 sampling - PRIMARY)
# ============================================
print("="*70)
print("EXPERIMENT 1: NeuMF-BCE Baseline (1:10 Sampling)")
print("="*70)

# Use 1:10 as primary experiment
PRIMARY_RATIO = 10
bce_config = configs[PRIMARY_RATIO].copy()
bce_config['loss_type'] = 'BCE'

result_bce = run_recbole(
    model='NeuMF',
    dataset=DATASET,
    config_dict=bce_config
)

# Store results
bce_results = {
    'model': 'NeuMF-BCE',
    'best_valid_score': result_bce['best_valid_score'],
    'test_result': result_bce['test_result']
}

print("\n" + "="*70)
print("NeuMF-BCE Results (1:10 sampling):")
print("="*70)
print(f"Best Validation NDCG@10: {result_bce['best_valid_score']:.4f}")
print(f"\nTest Results:")
for metric, value in result_bce['test_result'].items():
    print(f"  {metric.upper():12s}: {value:.4f}")

## Experiment 2: NeuMF with Alpha-Balanced BCE (NEW Control)

### Purpose: Isolate Focusing Effect from Class Weighting (H3)

Alpha-Balanced BCE is Focal Loss with gamma=0. This control allows us to determine whether improvements come from:
- **Class weighting (alpha)** - balancing positive/negative loss contributions
- **Focusing effect (gamma)** - down-weighting easy examples

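If Focal Loss outperforms Alpha-BCE at the same alpha, we can attribute the gain to the focusing effect. Note that Experiment 3 uses alpha=0.25 while this control selects alpha from [0.5, 0.7, 0.9], so the comparison as configured reflects the difference in alpha as well as the difference in gamma.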

### Tuned Alpha Values

We tune alpha rather than reuse alpha=0.25 from object detection:
- For 1:10 sampling, we test alpha in [0.5, 0.7, 0.9] and keep the value with the best validation score. The largest candidate, 0.9, is close to the balanced value 10/11 ≈ 0.909 for this ratio.

In [None]:
# ============================================
# Train NeuMF-AlphaBCE (1:10 sampling)
# ============================================
print("="*70)
print("EXPERIMENT 2: NeuMF with Alpha-Balanced BCE (gamma=0 Control)")
print("="*70)

# Test multiple alpha values to find optimal class balancing
ALPHA_VALUES = [0.5, 0.7, 0.9]
alpha_bce_results = {}

for alpha in ALPHA_VALUES:
    print(f"\nTraining with alpha={alpha}...")
    result_alpha = train_neumf_alpha_bce(
        config_dict=configs[PRIMARY_RATIO],
        dataset=DATASET,
        alpha=alpha,
        seed=42,
        track_dynamics=False
    )
    
    alpha_bce_results[alpha] = result_alpha
    print(f"  Validation NDCG@10: {result_alpha['best_valid_score']:.4f}")
    print(f"  Test NDCG@10: {result_alpha['test_result']['ndcg@10']:.4f}")

# Select best alpha
best_alpha = max(ALPHA_VALUES, key=lambda a: alpha_bce_results[a]['best_valid_score'])
best_alpha_bce_result = alpha_bce_results[best_alpha]

print("\n" + "="*70)
print(f"Best Alpha-Balanced BCE: alpha={best_alpha}")
print("="*70)
print(f"Validation NDCG@10: {best_alpha_bce_result['best_valid_score']:.4f}")
print(f"Test NDCG@10: {best_alpha_bce_result['test_result']['ndcg@10']:.4f}")

## Experiment 3: NeuMF with Focal Loss

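Train NeuMF with Focal Loss (gamma=2.0, alpha=0.25). With 1:10 sampling, alpha=0.25 gives an effective class weight ratio of (0.75 * 10) / 0.25 = 30:1 (neg:pos) before the focusing term is applied.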

This tests H1: Focal Loss improves NeuMF performance.

In [None]:
# ============================================
# Train NeuMF-FocalLoss (1:10 sampling)
# ============================================
print("="*70)
print("EXPERIMENT 3: NeuMF with Focal Loss")
print("="*70)

GAMMA = 2.0
ALPHA_FL = 0.25

result_fl = train_neumf_focal_loss(
    config_dict=configs[PRIMARY_RATIO],
    dataset=DATASET,
    gamma=GAMMA,
    alpha=ALPHA_FL,
    seed=42,
    track_dynamics=False
)

fl_results = {
    'model': f'NeuMF-FL(g={GAMMA},a={ALPHA_FL})',
    'best_valid_score': result_fl['best_valid_score'],
    'test_result': result_fl['test_result']
}

print("\n" + "="*70)
print(f"NeuMF-FocalLoss Results (gamma={GAMMA}, alpha={ALPHA_FL}):")
print("="*70)
print(f"Best Validation NDCG@10: {result_fl['best_valid_score']:.4f}")
print(f"\nTest Results:")
for metric, value in result_fl['test_result'].items():
    print(f"  {metric.upper():12s}: {value:.4f}")

## Hypothesis Testing: Primary Comparison (1:10 Sampling)

Compare the three approaches:
1. **NeuMF-BCE** - Standard baseline
2. **NeuMF-AlphaBCE** - Class weighting only (gamma=0)
3. **NeuMF-FocalLoss** - Class weighting + focusing effect (gamma=2)

This comparison tests:
- **H1**: Does Focal Loss improve over BCE?
- **H3**: Does the focusing effect (gamma) provide benefits beyond class weighting (alpha)?

In [None]:
# ============================================
# Comparison Table
# ============================================
print("="*70)
print("HYPOTHESIS TESTING: Primary Comparison (1:10 Sampling)")
print("="*70)

# Create comparison table
comparison_df = create_comparison_table(
    [bce_results, best_alpha_bce_result, fl_results],
    ['BCE', f'Alpha-BCE(a={best_alpha})', f'Focal(g={GAMMA},a={ALPHA_FL})']
)

print("\n" + comparison_df.to_string(index=False))

# Compute improvements
print("\n" + "="*70)
print("IMPROVEMENTS OVER BASELINE (BCE):")
print("="*70)

# Alpha-BCE vs BCE
alpha_improvement = compute_improvement(bce_results, best_alpha_bce_result)
print(f"\nAlpha-BCE (alpha={best_alpha}) vs BCE:")
print(f"  NDCG@10: {alpha_improvement['ndcg@10']['comparison']:.4f} vs {alpha_improvement['ndcg@10']['baseline']:.4f}")
print(f"  Change: {alpha_improvement['ndcg@10']['pct_change']:+.2f}%")
print(f"  HR@10:   {alpha_improvement['hit@10']['comparison']:.4f} vs {alpha_improvement['hit@10']['baseline']:.4f}")
print(f"  Change: {alpha_improvement['hit@10']['pct_change']:+.2f}%")

# Focal Loss vs BCE
fl_improvement = compute_improvement(bce_results, fl_results)
print(f"\nFocal Loss (gamma={GAMMA}, alpha={ALPHA_FL}) vs BCE:")
print(f"  NDCG@10: {fl_improvement['ndcg@10']['comparison']:.4f} vs {fl_improvement['ndcg@10']['baseline']:.4f}")
print(f"  Change: {fl_improvement['ndcg@10']['pct_change']:+.2f}%")
print(f"  HR@10:   {fl_improvement['hit@10']['comparison']:.4f} vs {fl_improvement['hit@10']['baseline']:.4f}")
print(f"  Change: {fl_improvement['hit@10']['pct_change']:+.2f}%")

# Focal Loss vs Alpha-BCE (isolating focusing effect)
fl_vs_alpha_improvement = compute_improvement(best_alpha_bce_result, fl_results)
print(f"\nFocal Loss vs Alpha-BCE (focusing effect only):")
print(f"  NDCG@10: {fl_vs_alpha_improvement['ndcg@10']['comparison']:.4f} vs {fl_vs_alpha_improvement['ndcg@10']['baseline']:.4f}")
print(f"  Change: {fl_vs_alpha_improvement['ndcg@10']['pct_change']:+.2f}%")
print(f"  HR@10:   {fl_vs_alpha_improvement['hit@10']['comparison']:.4f} vs {fl_vs_alpha_improvement['hit@10']['baseline']:.4f}")
print(f"  Change: {fl_vs_alpha_improvement['hit@10']['pct_change']:+.2f}%")

print("\n" + "="*70)
print("HYPOTHESIS TEST RESULTS (1:10 Sampling):")
print("="*70)
print(f"H1 (FL improves NeuMF): {'SUPPORTED' if fl_improvement['ndcg@10']['pct_change'] > 0 else 'NOT SUPPORTED'}")
print(f"H3 (Focusing beyond weighting): {'SUPPORTED' if fl_vs_alpha_improvement['ndcg@10']['pct_change'] > 0 else 'NOT SUPPORTED'}")
print("="*70)

## Robustness Study: Multiple Sampling Ratios (H2)

Test whether Focal Loss maintains consistent performance across different negative sampling ratios:
- **1:4** (low imbalance)
- **1:10** (moderate imbalance) - already tested above
- **1:50** (high imbalance)

**H2 Prediction**: Focal Loss will maintain more consistent performance than BCE as imbalance increases.
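
H2 is stated qualitatively. One way to make "more consistent" concrete is the relative spread of a metric across sampling ratios. The sketch below works under that assumption: `relative_spread` is a hypothetical helper, and the numbers in the usage example are placeholders, not experimental results.

```python
def relative_spread(scores_by_ratio: dict) -> float:
    """(max - min) / mean of a metric across sampling ratios; lower = more consistent."""
    values = list(scores_by_ratio.values())
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


# Placeholder values for illustration only, NOT measured results.
flat_example = {4: 0.30, 10: 0.30, 50: 0.30}
degrading_example = {4: 0.40, 10: 0.30, 50: 0.20}

print(f"flat:      {relative_spread(flat_example):.4f}")
print(f"degrading: {relative_spread(degrading_example):.4f}")
```

After the robustness cell has run, the same function could be applied to `{r: robustness_results['focal'][r]['test_result']['ndcg@10'] for r in SAMPLING_RATIOS}` and to the corresponding `'bce'` entries, with H2 predicting the smaller spread for Focal Loss.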

In [None]:
# ============================================
# Robustness Study - Multiple Sampling Ratios
# ============================================
print("="*70)
print("ROBUSTNESS STUDY: Testing H2 (Multiple Sampling Ratios)")
print("="*70)

# Store results for each ratio
robustness_results = {
    'bce': {},
    'focal': {}
}

# We already have results for the primary ratio (1:10), so test the remaining ones (1:4 and 1:50)
test_ratios = [r for r in SAMPLING_RATIOS if r != PRIMARY_RATIO]

for ratio in test_ratios:
    print(f"\n{'='*70}")
    print(f"Testing with 1:{ratio} sampling")
    print(f"{'='*70}")
    
    # Train BCE
    print(f"\n[1:{ratio}] Training NeuMF-BCE...")
    bce_config_ratio = configs[ratio].copy()
    bce_config_ratio['loss_type'] = 'BCE'
    result_bce_ratio = run_recbole(
        model='NeuMF',
        dataset=DATASET,
        config_dict=bce_config_ratio
    )
    robustness_results['bce'][ratio] = result_bce_ratio
    print(f"  Test NDCG@10: {result_bce_ratio['test_result']['ndcg@10']:.4f}")
    
    # Train Focal Loss
    print(f"\n[1:{ratio}] Training NeuMF-FocalLoss...")
    result_fl_ratio = train_neumf_focal_loss(
        config_dict=configs[ratio],
        dataset=DATASET,
        gamma=GAMMA,
        alpha=ALPHA_FL,
        seed=42,
        track_dynamics=False
    )
    robustness_results['focal'][ratio] = result_fl_ratio
    print(f"  Test NDCG@10: {result_fl_ratio['test_result']['ndcg@10']:.4f}")

# Add 1:10 results (already computed)
robustness_results['bce'][PRIMARY_RATIO] = result_bce
robustness_results['focal'][PRIMARY_RATIO] = result_fl

# Create robustness comparison table
print("\n" + "="*70)
print("ROBUSTNESS COMPARISON: BCE vs Focal Loss Across Sampling Ratios")
print("="*70)

robustness_data = []
for ratio in SAMPLING_RATIOS:
    bce_ndcg = robustness_results['bce'][ratio]['test_result']['ndcg@10']
    bce_hr = robustness_results['bce'][ratio]['test_result']['hit@10']
    fl_ndcg = robustness_results['focal'][ratio]['test_result']['ndcg@10']
    fl_hr = robustness_results['focal'][ratio]['test_result']['hit@10']
    
    improvement_ndcg = (fl_ndcg - bce_ndcg) / bce_ndcg * 100
    improvement_hr = (fl_hr - bce_hr) / bce_hr * 100
    
    robustness_data.append({
        'Sampling Ratio': f'1:{ratio}',
        'BCE NDCG@10': f'{bce_ndcg:.4f}',
        'FL NDCG@10': f'{fl_ndcg:.4f}',
        'NDCG Improve': f'{improvement_ndcg:+.2f}%',
        'BCE HR@10': f'{bce_hr:.4f}',
        'FL HR@10': f'{fl_hr:.4f}',
        'HR Improve': f'{improvement_hr:+.2f}%'
    })

robustness_df = pd.DataFrame(robustness_data)
print("\n" + robustness_df.to_string(index=False))

print("\n" + "="*70)
print("H2 TEST: Robustness to Sampling Ratio")
print("="*70)
print("Focal Loss should maintain more consistent performance than BCE")
print("as class imbalance increases (1:4 -> 1:10 -> 1:50).")
print("="*70)

## Training Dynamics Analysis (NEW)

### Validation of Focal Loss Mechanism

We track training dynamics to validate that Focal Loss actually focuses on hard examples:

1. **Loss by confidence bin** - Show that well-classified examples (high confidence) contribute less to the total loss
2. **Easy vs Hard examples** - Demonstrate that hard examples dominate gradient updates

This provides mechanism validation beyond just performance metrics.
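
To illustrate what "loss by confidence bin" means, the sketch below groups per-example focal losses by the probability assigned to the true class. It assumes probabilities strictly between 0 and 1 and equal-width bins. The function name and bin indexing are hypothetical and need not match the `loss_*` / `count_*` columns produced by `train_neumf_focal_loss`.

```python
import math
from collections import defaultdict


def loss_by_confidence_bin(probs, labels, gamma=2.0, alpha=0.25, n_bins=5):
    """Sum focal loss and count examples per confidence bin.

    Confidence is p_t, the probability assigned to the true class.
    Bin 0 holds the least confident (hardest) examples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        loss = -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
        bin_id = min(int(p_t * n_bins), n_bins - 1)  # clamp p_t == 1.0 into the last bin
        totals[bin_id] += loss
        counts[bin_id] += 1
    return dict(totals), dict(counts)


# Toy batch: two easy positives, one easy negative, one hard positive.
totals, counts = loss_by_confidence_bin([0.95, 0.9, 0.05, 0.1], [1, 1, 0, 1])
for bin_id in sorted(totals):
    print(f"bin {bin_id}: loss={totals[bin_id]:.4f}, count={counts[bin_id]}")
```

In this toy batch the single hard example in bin 0 contributes far more loss than the three easy examples in bin 4 combined, which is the pattern the mechanism check looks for.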

In [None]:
# ============================================
# Training Dynamics Analysis
# ============================================
print("="*70)
print("MECHANISM VALIDATION: Training Dynamics with Focal Loss")
print("="*70)
print("\nRetraining with dynamics tracking enabled...")

# Retrain with dynamics tracking
result_fl_dynamics = train_neumf_focal_loss(
    config_dict=configs[PRIMARY_RATIO],
    dataset=DATASET,
    gamma=GAMMA,
    alpha=ALPHA_FL,
    seed=42,
    track_dynamics=True
)

if 'dynamics' in result_fl_dynamics:
    dynamics_df = result_fl_dynamics['dynamics']
    
    print("\n" + "="*70)
    print("Training Dynamics Summary (First 5 Epochs):")
    print("="*70)
    
    # Show first few epochs
    if len(dynamics_df) > 0:
        print(dynamics_df.head().to_string(index=False))
        
        # Analyze loss distribution across confidence bins
        print("\n" + "="*70)
        print("Loss Contribution by Confidence Bin (Final Epoch):")
        print("="*70)
        
        final_epoch = dynamics_df.iloc[-1]
        print("\nLower confidence = harder examples")
        print("Higher confidence = easier examples")
        print("\nFocal Loss should concentrate loss on low-confidence bins.")
        
        # Extract loss by bin columns
        loss_cols = [col for col in dynamics_df.columns if col.startswith('loss_')]
        if loss_cols:
            print("\nLoss by confidence bin:")
            for col in sorted(loss_cols):
                bin_name = col.replace('loss_', '')
                loss_val = final_epoch.get(col, 0)
                count_col = f'count_{bin_name}'
                count_val = final_epoch.get(count_col, 0)
                print(f"  {bin_name}: loss={loss_val:.4f}, count={int(count_val)}")
else:
    print("\nDynamics tracking not available in this training run.")

print("\n" + "="*70)
print("Key Insight: Focal Loss should show higher loss contribution from")
print("low-confidence bins (hard examples) compared to high-confidence bins.")
print("="*70)

## Results Summary

Comprehensive summary of all experimental results.

In [None]:
# ============================================
# Full Comparison and Validation
# ============================================
print("="*70)
print("EXPERIMENT SUMMARY: NCF with Focal Loss on ML-1M")
print("="*70)

print("\n" + "="*70)
print("PRIMARY RESULTS (1:10 Sampling):")
print("="*70)
print(comparison_df.to_string(index=False))

print("\n" + "="*70)
print("ROBUSTNESS RESULTS (Multiple Sampling Ratios):")
print("="*70)
print(robustness_df.to_string(index=False))

print("\n" + "="*70)
print("HYPOTHESIS TEST RESULTS:")
print("="*70)

print("\nH1: Focal Loss improves NeuMF performance")
print(f"  Status: {'SUPPORTED' if fl_improvement['ndcg@10']['pct_change'] > 0 else 'NOT SUPPORTED'}")
print(f"  Evidence: FL shows {fl_improvement['ndcg@10']['pct_change']:+.2f}% improvement in NDCG@10")

print("\nH2: Focal Loss is robust to negative sampling ratio")
print(f"  Status: Review robustness table above")
print(f"  Evidence: Compare FL performance consistency across 1:4, 1:10, 1:50")

print("\nH3: Focusing effect is necessary beyond class weighting")
print(f"  Status: {'SUPPORTED' if fl_vs_alpha_improvement['ndcg@10']['pct_change'] > 0 else 'NOT SUPPORTED'}")
print(f"  Evidence: FL vs Alpha-BCE shows {fl_vs_alpha_improvement['ndcg@10']['pct_change']:+.2f}% improvement")
print(f"  This isolates the gamma (focusing) effect from alpha (weighting)")

print("\n" + "="*70)
print("METHODOLOGY IMPROVEMENTS IMPLEMENTED:")
print("="*70)
print("1. Alpha-Balanced BCE control (isolates focusing effect)")
print("2. Multiple sampling ratios (tests robustness)")
print("3. 1:10 as primary experiment (more realistic than 1:4)")
print("4. Training dynamics tracking (mechanism validation)")
print("5. Alpha-sampling interaction analysis (documents effective ratios)")
print("="*70)

## Optional: Hyperparameter Grid Search

The cell below is disabled by wrapping the code in a triple-quoted string. Remove the enclosing `"""` lines to run the full hyperparameter exploration.

### Extended Grid Search

This searches over:
- **Gamma**: [0.5, 1.0, 2.0, 3.0]
- **Alpha**: [0.25, 0.5, 0.75, 0.9]
- **Sampling ratios**: [4, 10, 50]

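Each configuration is trained with 10 seeds and summarized by the mean and standard deviation of NDCG@10 and HR@10 (4 x 4 x 3 = 48 configurations, 480 training runs in total). No significance test is computed in this cell.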
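
Per-seed results from two configurations can feed a simple paired comparison. The sketch below computes a paired t statistic with the standard library only. It assumes both score lists come from the same seeds in the same order, and `paired_t_statistic` is a hypothetical helper that is not part of `focal_loss_utils.py`.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b) -> float:
    """t statistic for the mean per-seed difference (a - b), with n - 1 degrees of freedom."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 seeds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    if sd == 0.0:
        raise ValueError("all per-seed differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Placeholder scores for illustration only, NOT measured results.
t_value = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(f"t = {t_value:.4f}")
```

With 10 seeds there are 9 degrees of freedom, so an absolute t value above roughly 2.262 corresponds to a two-sided p-value below 0.05. If SciPy is available, `scipy.stats.ttest_rel` returns the same statistic along with the p-value.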

In [None]:
# ============================================
# Grid Search (Optional - remove the enclosing triple quotes to run)
# ============================================
# Note: running this block reassigns ALPHA_VALUES from Experiment 2.
"""
GAMMA_VALUES = [0.5, 1.0, 2.0, 3.0]
ALPHA_VALUES = [0.25, 0.5, 0.75, 0.9]
SAMPLING_RATIOS_GRID = [4, 10, 50]
SEEDS = list(range(10))  # 10 random seeds for statistical testing

grid_search_results = []

print("="*70)
print("HYPERPARAMETER GRID SEARCH")
print("="*70)
print(f"Configurations to test: {len(GAMMA_VALUES) * len(ALPHA_VALUES) * len(SAMPLING_RATIOS_GRID)}")
print(f"Seeds per configuration: {len(SEEDS)}")
print(f"Total training runs: {len(GAMMA_VALUES) * len(ALPHA_VALUES) * len(SAMPLING_RATIOS_GRID) * len(SEEDS)}")
print("="*70)

for ratio in SAMPLING_RATIOS_GRID:
    print(f"\n{'='*70}")
    print(f"Testing with 1:{ratio} sampling")
    print(f"{'='*70}")
    
    for gamma in GAMMA_VALUES:
        for alpha in ALPHA_VALUES:
            print(f"\n[1:{ratio}] gamma={gamma}, alpha={alpha}")
            
            seed_results = []
            for seed in SEEDS:
                result = train_neumf_focal_loss(
                    config_dict=configs[ratio],
                    dataset=DATASET,
                    gamma=gamma,
                    alpha=alpha,
                    seed=seed,
                    track_dynamics=False
                )
                seed_results.append(result['test_result'])
            
            # Aggregate results
            avg_ndcg10 = np.mean([r.get('ndcg@10', 0) for r in seed_results])
            std_ndcg10 = np.std([r.get('ndcg@10', 0) for r in seed_results])
            avg_hr10 = np.mean([r.get('hit@10', 0) for r in seed_results])
            std_hr10 = np.std([r.get('hit@10', 0) for r in seed_results])
            
            grid_search_results.append({
                'ratio': ratio,
                'gamma': gamma,
                'alpha': alpha,
                'ndcg@10_mean': avg_ndcg10,
                'ndcg@10_std': std_ndcg10,
                'hr@10_mean': avg_hr10,
                'hr@10_std': std_hr10
            })
            
            print(f"  NDCG@10: {avg_ndcg10:.4f} +/- {std_ndcg10:.4f}")
            print(f"  HR@10:   {avg_hr10:.4f} +/- {std_hr10:.4f}")

# Display grid search results
grid_df = pd.DataFrame(grid_search_results)

print("\n" + "="*70)
print("GRID SEARCH RESULTS:")
print("="*70)
print(grid_df.to_string(index=False))

# Find best configuration for each sampling ratio
print("\n" + "="*70)
print("BEST CONFIGURATIONS BY SAMPLING RATIO:")
print("="*70)

for ratio in SAMPLING_RATIOS_GRID:
    ratio_df = grid_df[grid_df['ratio'] == ratio]
    best_idx = ratio_df['ndcg@10_mean'].idxmax()
    best_config = grid_df.loc[best_idx]
    
    print(f"\n1:{ratio} sampling:")
    print(f"  Best gamma: {best_config['gamma']}")
    print(f"  Best alpha: {best_config['alpha']}")
    print(f"  NDCG@10: {best_config['ndcg@10_mean']:.4f} +/- {best_config['ndcg@10_std']:.4f}")
    print(f"  HR@10:   {best_config['hr@10_mean']:.4f} +/- {best_config['hr@10_std']:.4f}")
"""

print("Grid search code is disabled (wrapped in a triple-quoted string).")
print("Remove the enclosing triple quotes above to run the full hyperparameter search.")