# Movie Recommender System - Collaborative Filtering with LightFM

## AI Task: Personalized Top-K Recommendation

Build a **collaborative filtering** recommender using **matrix factorization** to predict personalized movie recommendations. The system learns latent user preferences and movie characteristics from implicit feedback signals (watch duration, completion rate) to rank unwatched movies and generate top-k recommendations.

**Task Type**: Implicit collaborative filtering with top-k ranking optimization

## Problem Context

Users don't provide explicit ratings - we only observe behavioral signals:
- Watch duration (minutes watched per session)
- Completion percentage (how much of the movie was watched)
- Multiple viewing sessions (rewatches indicate strong interest)

**Challenge**: Convert sparse behavioral data into reliable preference signals for recommendation.

## Approach Overview

### 1. Data Understanding
- **Input**: User watch history (105K events), movie metadata (1K movies), user profiles (10K users)
- **Challenge**: Sparse interaction matrix (~1% density), no explicit ratings
- **Preprocessing**: Aggregate sessions, filter cold-start users/items, normalize ranges

### 2. Implicit Feedback Signal Engineering
Construct a **strength score** from behavioral data:
```
strength = log(1 + total_minutes_watched) * (0.3 + 0.7 * completion_rate)
```
**Rationale**:
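- **Log transform**: Diminishing returns for very long watch times (dampens the influence of outliers)
- **Completion rate**: Scaled by 0.7, so the multiplier rises from 0.3 at 0% completion to 1.0 at 100% (finishing indicates genuine interest vs. abandoning)
- **Base term**: The constant 0.3 keeps 30% of the duration signal even for abandoned views (accounts for rewatches and movie length)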
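
To make the shape of the formula concrete, here is a minimal sketch in plain Python. The function name `strength` and the `completion_weight` parameter are illustrative, not part of the notebook; the base term is assumed to be `1 - completion_weight`, which gives 0.3 for the default of 0.7.

```python
import math

def strength(total_minutes: float, completion_rate: float,
             completion_weight: float = 0.7) -> float:
    """Implicit-feedback strength for one user-movie pair (illustrative helper)."""
    base = 1.0 - completion_weight  # 0.3 when completion_weight is 0.7
    return math.log1p(total_minutes) * (base + completion_weight * completion_rate)

# Compare an abandoned view (completion 0.0) with a finished one (completion 1.0)
for minutes in (30, 60, 120, 240):
    abandoned = strength(minutes, 0.0)
    finished = strength(minutes, 1.0)
    print(f"{minutes:>3} min  abandoned={abandoned:.2f}  finished={finished:.2f}")
```

Each doubling of watch time adds a roughly constant increment to the score instead of doubling it, which is the diminishing-returns behavior described above.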

### 3. Algorithm: LightFM with BPR Loss

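**Method**: Matrix factorization trained with stochastic gradient descent. LightFM is a hybrid model, but no user or item features are passed in this notebook, so it reduces to pure collaborative filtering.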

**BPR Loss (Bayesian Personalized Ranking)**:
- Pairwise ranking optimization: learns to rank watched items higher than unwatched
- Samples positive (watched) and negative (unwatched) pairs per user
- Needs only observed positives plus sampled negatives, which suits sparse implicit data
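
As a rough illustration of the pairwise objective, the BPR loss for a single (watched, unwatched) pair is `-log(sigmoid(score_pos - score_neg))`. The sketch below is a plain-Python version for intuition only; `bpr_loss` is a hypothetical helper, not LightFM's internal implementation.

```python
import math

def bpr_loss(score_pos: float, score_neg: float) -> float:
    """Negative log-likelihood that the watched item outranks the unwatched one."""
    margin = score_pos - score_neg
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch to avoid overflow
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

print(bpr_loss(2.0, 0.5))  # watched item already ranked higher: small loss
print(bpr_loss(0.5, 2.0))  # wrong order: larger loss
```

Only the score difference matters, so the loss constrains the relative order of items and not their absolute scores.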

**Model Architecture**:
- User embeddings: [num_users × num_components] learned latent factors
- Item embeddings: [num_items × num_components] learned latent factors  
- Prediction: dot product of user/item embeddings + biases
- Training: Stochastic gradient descent with negative sampling
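
The prediction rule above can be sketched in a few lines. The vectors and biases here are made-up toy values with 3 components, and `predict_score` is an illustrative name.

```python
def predict_score(user_vec, item_vec, user_bias, item_bias):
    """Dot product of the two embeddings plus both bias terms."""
    dot = sum(u * v for u, v in zip(user_vec, item_vec))
    return dot + user_bias + item_bias

user_vec = [1.0, 0.0, 2.0]       # toy user embedding
item_vec = [0.5, 1.0, 0.25]      # toy item embedding
print(predict_score(user_vec, item_vec, user_bias=0.1, item_bias=-0.2))
```

The user bias shifts all of one user's scores by the same amount, so it does not change the order of items for that user; the item bias acts like a learned popularity offset.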

### 4. Evaluation Strategy
- **Split**: 80/20 random holdout (train on 80%, test on 20%)
- **Metrics**: 
  - **Precision@10**: Fraction of top-10 that are relevant (accuracy-based)
  - **Recall@10**: Fraction of user's test items recovered in top-10 (coverage)
  - **nDCG@10**: Ranking quality with position weighting (ranking-based)
  - **AUC**: Overall ranking quality (1.0 = perfect, 0.5 = random)
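
For intuition on the AUC entry, a per-user AUC can be read as the fraction of (relevant, non-relevant) item pairs that the model orders correctly. The following is a small sketch on toy scores; `user_auc` is a hypothetical helper, and counting ties as half-correct is an assumption of this sketch.

```python
def user_auc(scores, relevant):
    """Fraction of (relevant, non-relevant) pairs ranked in the right order."""
    pos = [s for i, s in enumerate(scores) if i in relevant]
    neg = [s for i, s in enumerate(scores) if i not in relevant]
    if not pos or not neg:
        return float("nan")  # undefined without both kinds of item
    # Ties count as half-correct
    correct = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return correct / (len(pos) * len(neg))

scores = [0.9, 0.1, 0.4, 0.8, 0.3]  # toy predicted scores for items 0..4
relevant = {0, 2}                   # items the user actually watched
print(user_auc(scores, relevant))
```

Unlike the @10 metrics, this looks at the whole ranking, so it can be well above 0.5 even when few relevant items reach the top 10.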

### 5. Expected Performance
Targets for implicit feedback with ~1% matrix density:
- **Test Precision@10**: 0.2-2% (baseline: random ~0.1%)
- **Test Recall@10**: 1-5%
- **Test nDCG@10**: 0.01-0.05
- **Test AUC**: 0.50-0.60 (baseline: random = 0.5)

In [None]:
!pip install lightfm

## Import Libraries

Core dependencies for the collaborative filtering recommender system:
- **pandas/numpy**: Data manipulation and numerical operations
- **scipy.sparse**: Memory-efficient sparse matrix storage (CSR format for ~1% density data)
- **LightFM**: Hybrid recommender with BPR loss for implicit feedback
- **matplotlib**: Visualization utilities

In [None]:
# Data processing
import pandas as pd
import numpy as np
import pickle
import warnings
from pathlib import Path

# Sparse matrix operations
from scipy.sparse import csr_matrix

# LightFM recommender system
from lightfm import LightFM
from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import precision_at_k, recall_at_k, auc_score

# Visualization
import matplotlib.pyplot as plt

# Suppress pandas RuntimeWarning for NaN display issues
warnings.filterwarnings('ignore', category=RuntimeWarning)

print("All libraries imported successfully!")

## Configuration

### Data Preprocessing Parameters
- **MIN_USER_INTERACTIONS**: Filter users with fewer than 3 movies watched (cold-start prevention)
- **MIN_ITEM_INTERACTIONS**: Filter movies with fewer than 5 watchers (rare item removal)
- **COMPLETION_WEIGHT**: Emphasis on completion vs. watch time (0.7 = 70% weight on finishing movies)

### Model Hyperparameters
- **NO_COMPONENTS**: Dimensionality of user/movie embeddings (latent factors)
- **LEARNING_RATE**: Initial learning rate for LightFM's default adagrad learning schedule
- **EPOCHS**: Number of training iterations through the data
- **LOSS**: Loss function ('bpr' for Bayesian Personalized Ranking)

### Evaluation Parameters
- **K**: Top-k for Precision@k, Recall@k, and nDCG@k metrics
- **TEST_PERCENTAGE**: Fraction of interactions held out for testing (0.2 = 20%)

In [None]:
# ============================================================================
# CONFIGURATION
# ============================================================================

# Data paths (local raw_data folder)
DATA_DIR = Path('./raw_data')

# Data filtering thresholds
MIN_USER_INTERACTIONS = 3  # Remove users with fewer than 3 movies watched
MIN_ITEM_INTERACTIONS = 5  # Remove movies with fewer than 5 watchers
COMPLETION_WEIGHT = 0.7    # Weight for completion rate in strength score (0-1)

# LightFM model hyperparameters
NO_COMPONENTS = 50     # Latent factor dimensions (appropriate for sparse data)
LEARNING_RATE = 0.01   # Conservative learning rate for stable convergence
EPOCHS = 10            # Training iterations
LOSS = 'bpr'           # BPR loss function (Bayesian Personalized Ranking)

# Evaluation parameters
K = 10                 # Top-k for Precision@k, Recall@k, nDCG@k
TEST_PERCENTAGE = 0.2  # Fraction of data held out for testing

print("Configuration set!")
print(f"  Model: {NO_COMPONENTS} factors, {EPOCHS} epochs, {LOSS.upper()} loss, LR={LEARNING_RATE}")
print(f"  Filter: >={MIN_USER_INTERACTIONS} user interactions, >={MIN_ITEM_INTERACTIONS} item interactions")

## Load & Validate Data

Load raw CSV files and perform data quality checks:
1. Remove rows with NaN or infinite values
2. Validate numeric ranges (watch duration ≥ 0, progress 0-100%)
3. Convert progress_percentage (0-100 scale) to decimal (0-1 scale) for calculations
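
The three checks can be illustrated on a handful of hand-made rows. This is a plain-Python sketch; the rows are invented for the example and `is_valid` is a hypothetical helper.

```python
import math

raw_events = [
    {"user_id": "u1", "movie_id": "m1", "watch_duration_minutes": 42.0, "progress_percentage": 80.0},
    {"user_id": "u2", "movie_id": "m1", "watch_duration_minutes": float("nan"), "progress_percentage": 10.0},
    {"user_id": "u3", "movie_id": "m2", "watch_duration_minutes": -5.0, "progress_percentage": 50.0},
    {"user_id": "u4", "movie_id": "m2", "watch_duration_minutes": 15.0, "progress_percentage": 130.0},
    {"user_id": "u5", "movie_id": "m3", "watch_duration_minutes": float("inf"), "progress_percentage": 100.0},
]

def is_valid(event):
    minutes = event["watch_duration_minutes"]
    progress = event["progress_percentage"]
    # Step 1: reject NaN and infinite values
    if not (math.isfinite(minutes) and math.isfinite(progress)):
        return False
    # Step 2: duration must be non-negative, progress within 0-100%
    return minutes >= 0 and 0 <= progress <= 100

# Step 3: add the 0-1 scale column for the rows that survive
clean_events = [
    {**event, "progress_decimal": event["progress_percentage"] / 100.0}
    for event in raw_events
    if is_valid(event)
]
print(f"{len(raw_events) - len(clean_events)} of {len(raw_events)} rows removed")
```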

**Expected columns**:
- `users.csv`: user_id, subscription_type, demographics
- `movies.csv`: movie_id, title, genre_primary, genre_secondary, duration_minutes
- `watch_history.csv`: user_id, movie_id, watch_duration_minutes, progress_percentage, session_id

In [None]:
# ============================================================================
# DATA LOADING & VALIDATION
# ============================================================================

print('Loading data...')
# Load raw CSV files
users = pd.read_csv(DATA_DIR / 'users.csv')
movies = pd.read_csv(DATA_DIR / 'movies.csv')
watch_history = pd.read_csv(DATA_DIR / 'watch_history.csv')

print(f'Loaded {len(users):,} users, {len(movies):,} movies, {len(watch_history):,} watch events')

# Data validation and cleaning
print('\nValidating data quality...')
initial_count = len(watch_history)

# Replace infinite values with NaN for consistent handling
watch_history = watch_history.replace([np.inf, -np.inf], np.nan)

# Identify which columns exist in the dataset
required_cols = ['user_id', 'movie_id']  # Must have
optional_cols = ['watch_duration_minutes', 'progress_percentage', 'session_id']
existing_cols = [col for col in required_cols + optional_cols if col in watch_history.columns]

# Drop rows with NaN in critical columns
watch_history = watch_history.dropna(subset=existing_cols)

# Validate numeric ranges
if 'watch_duration_minutes' in watch_history.columns:
    # Watch duration must be non-negative
    watch_history = watch_history[watch_history['watch_duration_minutes'] >= 0]

if 'progress_percentage' in watch_history.columns:
    # Filter valid range [0, 100]% first; .copy() avoids SettingWithCopyWarning
    watch_history = watch_history[
        (watch_history['progress_percentage'] >= 0) & 
        (watch_history['progress_percentage'] <= 100)
    ].copy()
    
    # Convert progress from percentage (0-100) to decimal (0-1)
    watch_history['progress_decimal'] = watch_history['progress_percentage'] / 100

# Report cleaning results
removed_count = initial_count - len(watch_history)
if removed_count > 0:
    print(f'Removed {removed_count:,} invalid rows ({removed_count/initial_count*100:.2f}%)')
else:
    print('All rows valid!')

print(f'Final dataset: {len(watch_history):,} valid watch events')

## Aggregate Watch Sessions

**Problem**: Users may watch the same movie multiple times across different sessions.

**Solution**: Aggregate all sessions per user-movie pair into a single **strength score**.

**Strength Formula**:
```
strength = log(1 + total_minutes) * (0.3 + 0.7 * completion_rate)
```

**Rationale**:
- **log(1 + total_minutes)**: Diminishing returns for very long watch times (prevents outliers from dominating)
- **completion_rate**: Mean of progress_percentage across all sessions (0-1 scale)
- **0.7 × completion_rate**: Watching to the end signals strong interest vs. abandoning early; the multiplier reaches 1.0 at full completion
- **0.3 base term**: Keeps 30% of the duration signal at 0% completion, so rewatches and partial views still count

**Example**: User watches "Movie A" twice - 30 min (50% complete) and 60 min (100% complete)
- total_minutes = 90, completion_mean = 0.75
- strength = log(91) * (0.3 + 0.7 * 0.75) = 4.51 * 0.825 = 3.72
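
The worked example can be reproduced with a small aggregation sketch in plain Python. The session tuples are toy data (the `movie_b` row is added only for contrast), and the dictionary layout is illustrative, not the notebook's pandas pipeline.

```python
import math
from collections import defaultdict

sessions = [
    # (user_id, movie_id, watch_duration_minutes, progress_percentage)
    ("u1", "movie_a", 30.0, 50.0),
    ("u1", "movie_a", 60.0, 100.0),
    ("u1", "movie_b", 10.0, 8.0),
]

# Collect total minutes and per-session completion for each user-movie pair
totals = defaultdict(lambda: {"minutes": 0.0, "progress": []})
for user_id, movie_id, minutes, progress_pct in sessions:
    pair = totals[(user_id, movie_id)]
    pair["minutes"] += minutes
    pair["progress"].append(progress_pct / 100.0)

strengths = {}
for key, pair in totals.items():
    completion_mean = sum(pair["progress"]) / len(pair["progress"])
    strengths[key] = math.log1p(pair["minutes"]) * (0.3 + 0.7 * completion_mean)

print(round(strengths[("u1", "movie_a")], 2))  # 3.72, matching the example above
```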

In [None]:
# ============================================================================
# AGGREGATE WATCH SESSIONS
# ============================================================================

print('\nAggregating watch sessions per user-movie pair...')

# Group by (user_id, movie_id) and aggregate metrics
aggregated = watch_history.groupby(['user_id', 'movie_id']).agg({
    'watch_duration_minutes': 'sum',      # Total time spent watching
    'progress_decimal': 'mean',           # Average completion rate (0-1)
    'session_id': 'count'                 # Number of viewing sessions
}).reset_index()

# Rename columns for clarity
aggregated.columns = ['user_id', 'movie_id', 'total_minutes', 'completion_mean', 'num_sessions']

# Calculate implicit feedback strength score
# Formula: log(1 + minutes) * (base_weight + completion_weight * completion_rate)
# - log transform: diminishing returns for very long watch times
# - completion_mean: 0-1 scale (0% to 100% watched)
# - COMPLETION_WEIGHT: emphasis on finishing vs. just starting
# - base weight is derived as 1 - COMPLETION_WEIGHT so the two always sum to 1
base_weight = 1 - COMPLETION_WEIGHT  # 0.3 with COMPLETION_WEIGHT = 0.7
aggregated['strength'] = (
    np.log1p(aggregated['total_minutes']) *                          # log(1 + x), accurate for small x
    (base_weight + COMPLETION_WEIGHT * aggregated['completion_mean'])  # 30% base + 70% completion
)

print(f'Aggregated to {len(aggregated):,} unique user-movie interactions')
print(f'  Strength range: [{aggregated["strength"].min():.2f}, {aggregated["strength"].max():.2f}]')
print(f'  Avg sessions per interaction: {aggregated["num_sessions"].mean():.2f}')

## Filter Sparse Data

**Cold Start Problem**: Users with very few interactions and rarely-watched movies hurt model quality.

**Solution**: 
- Remove users with fewer than 3 movies watched (insufficient data to learn preferences)
- Remove movies with fewer than 5 watchers (too rare to recommend reliably)

This filtering improves matrix density and model generalization at the cost of coverage.
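
One caveat: applying the two thresholds once does not guarantee that every remaining user and movie still meets them, because dropping a rare movie can push a user below the user threshold. A possible variant, sketched here in plain Python with a hypothetical `filter_until_stable` helper and toy thresholds, repeats the filter until nothing changes.

```python
from collections import Counter

def filter_until_stable(pairs, min_user=3, min_item=5):
    """Repeat the user/item count filter until no more pairs are dropped."""
    pairs = set(pairs)
    while True:
        user_counts = Counter(user for user, _ in pairs)
        item_counts = Counter(item for _, item in pairs)
        kept = {
            (user, item) for user, item in pairs
            if user_counts[user] >= min_user and item_counts[item] >= min_item
        }
        if len(kept) == len(pairs):
            return kept
        pairs = kept

toy_pairs = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "c")]
# Movie "c" is dropped first, which leaves u3 with a single movie, so u3 is dropped next
print(sorted(filter_until_stable(toy_pairs, min_user=2, min_item=2)))
```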

In [None]:
# ============================================================================
# FILTER SPARSE INTERACTIONS
# ============================================================================

print('\nFiltering sparse interactions...')

# Count how many interactions each user/movie has
user_counts = aggregated.groupby('user_id').size()   # Movies per user
item_counts = aggregated.groupby('movie_id').size()  # Users per movie

# Keep only users/movies that meet minimum thresholds
valid_users = user_counts[user_counts >= MIN_USER_INTERACTIONS].index
valid_items = item_counts[item_counts >= MIN_ITEM_INTERACTIONS].index

# Filter the interaction data
filtered = aggregated[
    aggregated['user_id'].isin(valid_users) &
    aggregated['movie_id'].isin(valid_items)
].copy()  # .copy() to avoid SettingWithCopyWarning

# Report filtering results
print(f'Filtered: {len(aggregated):,} -> {len(filtered):,} interactions')
print(f'  Users: {len(aggregated["user_id"].unique()):,} -> {len(filtered["user_id"].unique()):,}')
print(f'  Movies: {len(aggregated["movie_id"].unique()):,} -> {len(filtered["movie_id"].unique()):,}')

# Calculate sparsity metrics
total_possible = len(filtered["user_id"].unique()) * len(filtered["movie_id"].unique())
density_pct = (len(filtered) / total_possible) * 100
print(f'  Matrix density: {density_pct:.4f}% (sparsity: {100-density_pct:.4f}%)')

## Build Interaction Matrix

Convert user-movie interactions into a **sparse CSR (Compressed Sparse Row) matrix**.

**Why sparse?** With ~1% density (99% of cells are zero), a dense matrix would spend about 99% of its memory storing zeros.

**Matrix structure**:
- Rows = users (index 0 to num_users-1)
- Columns = movies (index 0 to num_items-1)
- Values = strength scores (implicit feedback signal)

**Mappings**: Create bidirectional dictionaries to convert between:
- Original IDs (e.g., 'user_00123') ↔ Matrix indices (0, 1, 2, ...)
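
To show what the ID mappings and the CSR layout look like, here is a hand-rolled sketch on three toy interactions. It builds the three flat CSR arrays (`data`, `indices`, `indptr`) in plain Python for illustration; the notebook itself relies on `scipy.sparse.csr_matrix`.

```python
interactions = [("user_b", "m2", 1.5), ("user_a", "m1", 3.0), ("user_a", "m3", 0.7)]

# ID <-> index mappings, sorted so the assignment is reproducible
unique_users = sorted({user for user, _, _ in interactions})
unique_items = sorted({movie for _, movie, _ in interactions})
user_to_idx = {user: idx for idx, user in enumerate(unique_users)}
item_to_idx = {movie: idx for idx, movie in enumerate(unique_items)}
idx_to_item = {idx: movie for movie, idx in item_to_idx.items()}

# Group (column, value) pairs by row
rows = [[] for _ in unique_users]
for user, movie, value in interactions:
    rows[user_to_idx[user]].append((item_to_idx[movie], value))

# CSR keeps the values, their column indices, and the row boundaries
data, indices, indptr = [], [], [0]
for row in rows:
    for col, value in sorted(row):
        indices.append(col)
        data.append(value)
    indptr.append(len(indices))

# Row slicing: movies watched by the user at row 0
start, end = indptr[0], indptr[1]
print([idx_to_item[col] for col in indices[start:end]])
```

Only the three non-zero values are stored, and fetching one user's items is a single slice, which is why CSR suits per-user lookups.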

In [None]:
# ============================================================================
# BUILD SPARSE INTERACTION MATRIX
# ============================================================================

print('\nBuilding sparse interaction matrix...')

# Get unique user and movie IDs, sorted for reproducibility
unique_users = sorted(filtered['user_id'].unique())
unique_items = sorted(filtered['movie_id'].unique())

# Create bidirectional mappings: ID <-> index
user_to_idx = {user: idx for idx, user in enumerate(unique_users)}
item_to_idx = {item: idx for idx, item in enumerate(unique_items)}
idx_to_user = {idx: user for user, idx in user_to_idx.items()}  # Reverse lookup
idx_to_item = {idx: item for item, idx in item_to_idx.items()}  # Reverse lookup

# Map string IDs to integer indices for matrix construction
filtered['user_idx'] = filtered['user_id'].map(user_to_idx)
filtered['item_idx'] = filtered['movie_id'].map(item_to_idx)

# Matrix dimensions
num_users = len(unique_users)
num_items = len(unique_items)

# Build sparse CSR matrix
# CSR (Compressed Sparse Row) is efficient for:
# - Row slicing (get all items for a user)
# - Matrix-vector multiplication (used in recommendations)
interaction_matrix = csr_matrix(
    (filtered['strength'].values,                           # Data: strength scores
     (filtered['user_idx'].values, filtered['item_idx'].values)),  # (row, col) indices
    shape=(num_users, num_items)                            # Matrix dimensions
)

# Calculate density metrics
density = (interaction_matrix.nnz / (num_users * num_items)) * 100

print(f'Matrix: {num_users:,} users x {num_items:,} items')
print(f'  Non-zero entries: {interaction_matrix.nnz:,}')
print(f'  Density: {density:.4f}% (sparse)')
print(f'  Zero cells not stored: {(1 - density/100) * 100:.2f}%')

## Train/Test Split

Split interactions into training (80%) and testing (20%) sets for evaluation.

**Method**: Random holdout - interactions are shuffled and 20% of them are assigned to the test set.

**Why random vs. temporal?** 
- **Random**: Evaluates general prediction ability across all users/items
- **Temporal** (not used): Would test predicting future interactions from past behavior

The random split measures how well the model fills in held-out interactions. Because it ignores time order, the result can be optimistic compared with a temporal split.
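
A random holdout can be sketched as shuffle-then-cut. This plain-Python version is only an approximation of the idea for intuition; `random_holdout` is a hypothetical helper and is not LightFM's `random_train_test_split`.

```python
import random

def random_holdout(pairs, test_fraction=0.2, seed=42):
    """Shuffle interactions with a fixed seed, then cut off the last part as test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cutoff = int((1.0 - test_fraction) * len(shuffled))
    return shuffled[:cutoff], shuffled[cutoff:]

pairs = [(user, item) for user in range(5) for item in range(4)]  # 20 toy interactions
train_pairs, test_pairs = random_holdout(pairs)
print(len(train_pairs), len(test_pairs))
```

Every interaction lands in exactly one of the two sets, but nothing forces each user to appear in both, so some users may have no test items at all.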

In [None]:
# ============================================================================
# TRAIN/TEST SPLIT
# ============================================================================

print('\nSplitting train/test sets...')

# Randomly split interactions: 80% train, 20% test
# random_state=42 for reproducibility
train, test = random_train_test_split(
    interaction_matrix,
    test_percentage=TEST_PERCENTAGE,  # 0.2 = 20%
    random_state=42                   # Fixed seed for reproducible results
)

print(f'Train: {train.nnz:,} interactions ({(train.nnz/interaction_matrix.nnz)*100:.1f}%)')
print(f'Test:  {test.nnz:,} interactions ({(test.nnz/interaction_matrix.nnz)*100:.1f}%)')
print(f'  Note: Model trained on train set, evaluated on held-out test set')

## Train LightFM Model

**BPR Loss (Bayesian Personalized Ranking)**:
- Pairwise ranking optimization: learns to rank watched items higher than unwatched items
- Samples positive (watched) and negative (unwatched) pairs per user during training
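- Optimizes the relative order of items rather than absolute scores, which suits implicit data where unwatched does not necessarily mean disliked
- Note: with BPR, LightFM treats every positive entry in the interaction matrix as an equally weighted positive; the strength values only affect training if they are also passed through the `sample_weight` argument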

**Training Process**:
- SGD (Stochastic Gradient Descent) using LightFM's default adagrad schedule, with `LEARNING_RATE` as the initial learning rate
- Each epoch = one complete pass through all training interactions
- Precision@K is reported after epoch 1 and then every 5 epochs to monitor convergence and detect overfitting
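
For intuition about what one training update does, the sketch below applies a single plain-SGD step of the BPR objective to one (user, watched, unwatched) triplet. It is a simplification under stated assumptions: no biases, no regularization, a fixed step size, and hypothetical helper names. LightFM's own training loop differs in these details.

```python
import math

def margin(user_vec, pos_vec, neg_vec):
    """Score difference between the watched and the unwatched item."""
    return sum(u * (p - n) for u, p, n in zip(user_vec, pos_vec, neg_vec))

def bpr_sgd_step(user_vec, pos_vec, neg_vec, lr=0.05):
    """One plain-SGD update on a (user, watched item, unwatched item) triplet."""
    # Gradient scale sigmoid(-margin): large when the pair is ranked wrongly
    g = 1.0 / (1.0 + math.exp(margin(user_vec, pos_vec, neg_vec)))
    new_user = [u + lr * g * (p - n) for u, p, n in zip(user_vec, pos_vec, neg_vec)]
    new_pos = [p + lr * g * u for u, p in zip(user_vec, pos_vec)]
    new_neg = [n - lr * g * u for u, n in zip(user_vec, neg_vec)]
    return new_user, new_pos, new_neg

user, watched, unwatched = [0.1, -0.2], [0.3, 0.1], [0.2, 0.4]
before = margin(user, watched, unwatched)
user, watched, unwatched = bpr_sgd_step(user, watched, unwatched)
after = margin(user, watched, unwatched)
print(before, after)  # the margin grows after the update
```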

In [None]:
# ============================================================================
# TRAIN LIGHTFM MODEL
# ============================================================================

print(f'\nTraining LightFM with {LOSS.upper()} loss...')
print(f'Hyperparameters: {NO_COMPONENTS} factors, {EPOCHS} epochs, lr={LEARNING_RATE}')

# Initialize LightFM model
model = LightFM(
    loss=LOSS,                    # Loss function ('bpr' or 'warp')
    no_components=NO_COMPONENTS,  # Embedding dimensionality (latent factors)
    learning_rate=LEARNING_RATE,  # SGD step size
    random_state=42               # Reproducible initialization
)

# Train with progress monitoring
print('\nTraining progress:')
for epoch in range(EPOCHS):
    # Train for 1 epoch (fit_partial allows incremental training)
    model.fit_partial(train, epochs=1)
    
    # Evaluate after the first epoch and every 5 epochs to monitor convergence
    if (epoch + 1) % 5 == 0 or epoch == 0:
        train_p = precision_at_k(model, train, k=K).mean()  # Train performance
        # train_interactions=train excludes already-seen items from the test ranking
        test_p = precision_at_k(model, test, train_interactions=train, k=K).mean()
        
        gap = train_p - test_p
        print(f'  Epoch {epoch+1:2d}: Train P@{K}={train_p:.4f}, Test P@{K}={test_p:.4f} (gap: {gap:.4f})')

print('\nTraining complete!')

## Evaluate Model

**Metrics Explained**:

1. **Precision@10**: Of the 10 recommended items, what fraction are relevant (appear in test set)?
   - Higher = fewer irrelevant recommendations
   - Measures accuracy of top-k predictions

2. **Recall@10**: Of all relevant items (in test set), what fraction appear in top-10 recommendations?
   - Higher = better coverage of user's interests
   - Measures completeness of top-k predictions

3. **nDCG@10 (Normalized Discounted Cumulative Gain)**: Ranking quality with position weighting
   - Rewards items ranked higher (position 1 > position 10)
   - Range: 0.0 (worst) to 1.0 (perfect ranking)
   - Industry standard metric for top-k recommendation evaluation

4. **AUC (Area Under ROC Curve)**: Overall ranking quality across all items
   - 0.5 = random guessing (no signal)
   - 1.0 = perfect ranking (all relevant items ranked higher than irrelevant)
   - Measures if model consistently ranks relevant items above irrelevant ones
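
The three top-k metrics can be checked by hand on a toy ranking. The sketch below uses plain Python with binary relevance; `precision_recall_ndcg_at_k` is an illustrative helper, and dividing precision by `k` even for short lists is an assumption of this sketch.

```python
import math

def precision_recall_ndcg_at_k(ranked_items, relevant, k=10):
    """Binary-relevance Precision@k, Recall@k and nDCG@k for one user."""
    top_k = ranked_items[:k]
    hits = [1 if item in relevant else 0 for item in top_k]
    precision = sum(hits) / k
    recall = sum(hits) / len(relevant) if relevant else 0.0
    # Position discount 1/log2(position + 1), with positions starting at 1
    dcg = sum(hit / math.log2(pos + 2) for pos, hit in enumerate(hits))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg

ranked = ["m7", "m2", "m9", "m4", "m1"]  # model's top-5, best first
relevant = {"m2", "m1", "m8"}            # held-out items for this user
p, r, n = precision_recall_ndcg_at_k(ranked, relevant, k=5)
print(f"P@5={p:.2f}  R@5={r:.2f}  nDCG@5={n:.2f}")
```

Two of the five recommendations are hits, so precision is 2/5 and recall is 2/3; nDCG is lower than it could be because the hits sit at positions 2 and 5 and not at the top.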

In [None]:
# ============================================================================
# EVALUATE MODEL
# ============================================================================

print('\n' + '='*70)
print('EVALUATION RESULTS')
print('='*70)

# Calculate all metrics on train and test sets
# For test metrics, train_interactions=train excludes items already seen in training
train_precision = precision_at_k(model, train, k=K).mean()
test_precision = precision_at_k(model, test, train_interactions=train, k=K).mean()

train_recall = recall_at_k(model, train, k=K).mean()
test_recall = recall_at_k(model, test, train_interactions=train, k=K).mean()

train_auc = auc_score(model, train).mean()
test_auc = auc_score(model, test, train_interactions=train).mean()

# Calculate nDCG@k (Normalized Discounted Cumulative Gain)
def ndcg_at_k(model, interactions, k=10, train_interactions=None):
    """
    Calculate Normalized Discounted Cumulative Gain at k.
    Rewards items ranked higher in the top-k list.
    
    Args:
        model: Trained LightFM model
        interactions: Sparse interaction matrix (train or test)
        k: Number of recommendations to consider
        train_interactions: Optional sparse matrix of known positives to
            exclude from the ranking (pass the train set when scoring test)
    
    Returns:
        Mean nDCG@k across all users with at least one interaction
    """
    # Convert to CSR format for efficient row indexing
    interactions = interactions.tocsr()
    if train_interactions is not None:
        train_interactions = train_interactions.tocsr()
    n_users, n_items = interactions.shape
    item_indices = np.arange(n_items)
    ndcg_scores = []
    
    for user_id in range(n_users):
        # Get actual relevant items for this user (set for fast membership tests)
        actual_items = set(interactions[user_id].indices)
        
        if not actual_items:
            continue  # Skip users with no interactions
        
        # Get predictions for all items
        user_indices = np.full(n_items, user_id)
        scores = model.predict(user_indices, item_indices)
        
        # Mask items already seen in training so they cannot occupy top-k slots
        if train_interactions is not None:
            scores[train_interactions[user_id].indices] = -np.inf
        
        # Get top-k predicted items
        top_k_items = np.argsort(-scores)[:k]
        
        # Calculate DCG@k (Discounted Cumulative Gain)
        # Relevance = 1 if item is relevant, 0 otherwise
        # Position discount: 1/log2(position + 1); i+2 because i is 0-indexed
        dcg = sum(
            1.0 / np.log2(i + 2)
            for i, item in enumerate(top_k_items)
            if item in actual_items
        )
        
        # Calculate IDCG@k (Ideal DCG - best possible ranking)
        ideal_k = min(len(actual_items), k)
        idcg = sum(1.0 / np.log2(i + 2) for i in range(ideal_k))
        
        # nDCG = DCG / IDCG (normalize to 0-1 range)
        if idcg > 0:
            ndcg_scores.append(dcg / idcg)
    
    return np.mean(ndcg_scores) if ndcg_scores else 0.0

train_ndcg = ndcg_at_k(model, train, k=K)
test_ndcg = ndcg_at_k(model, test, k=K, train_interactions=train)

# Display results
print(f'\nPrecision@{K}: (fraction of top-{K} that are relevant)')
print(f'  Train: {train_precision:.4f} ({train_precision*100:.2f}%)')
print(f'  Test:  {test_precision:.4f} ({test_precision*100:.2f}%)')

print(f'\nRecall@{K}: (fraction of relevant items in top-{K})')
print(f'  Train: {train_recall:.4f} ({train_recall*100:.2f}%)')
print(f'  Test:  {test_recall:.4f} ({test_recall*100:.2f}%)')

print(f'\nnDCG@{K}: (ranking quality with position weighting, 1.0=perfect)')
print(f'  Train: {train_ndcg:.4f}')
print(f'  Test:  {test_ndcg:.4f}')

print(f'\nAUC Score: (ranking quality, 0.5=random, 1.0=perfect)')
print(f'  Train: {train_auc:.4f}')
print(f'  Test:  {test_auc:.4f}')

# Calculate train/test gaps
precision_gap = train_precision - test_precision
recall_gap = train_recall - test_recall
ndcg_gap = train_ndcg - test_ndcg
auc_gap = train_auc - test_auc

print(f'\nTrain/Test Gaps:')
print(f'  Precision@{K} gap: {precision_gap:.4f}')
print(f'  Recall@{K} gap: {recall_gap:.4f}')
print(f'  nDCG@{K} gap: {ndcg_gap:.4f}')
print(f'  AUC gap: {auc_gap:.4f}')

print('\n' + '='*70)

## Generate Sample Recommendations

Generate top-10 movie recommendations for 3 randomly selected users.

**Recommendation Process**:
1. Predict scores for all movies using the trained model
2. Exclude movies the user has already watched (collaborative filtering suggests *new* content)
3. Rank remaining movies by predicted score (descending)
4. Return top-k items with metadata (title, genres, scores)
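
Steps 1 to 3 amount to a masked top-k selection. Here is a minimal plain-Python sketch on toy scores; `top_k_unseen` is a hypothetical helper and skips the metadata lookup of step 4.

```python
import heapq

def top_k_unseen(scores, watched, k=10):
    """Return (item_index, score) pairs for the k best-scoring unwatched items."""
    candidates = ((idx, score) for idx, score in enumerate(scores) if idx not in watched)
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

scores = [0.2, 1.4, -0.3, 0.9, 1.1]  # toy predicted scores for items 0..4
watched = {1}                        # item 1 has the best score but was already watched
print(top_k_unseen(scores, watched, k=3))
```

Item 1 is skipped despite its high score, so the list starts with items 4 and 3.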

**Note**: These are sample outputs for demonstration purposes. Actual recommendation quality is measured by the test metrics above.

In [None]:
# ============================================================================
# GENERATE SAMPLE RECOMMENDATIONS
# ============================================================================

print('\nGenerating sample recommendations...')

def get_recommendations(user_idx, model, interaction_matrix, idx_to_item, movies_df, k=10):
    '''
    Generate top-k movie recommendations for a user.
    
    Args:
        user_idx: Integer index of user (not original user_id)
        model: Trained LightFM model
        interaction_matrix: Sparse matrix of user-item interactions
        idx_to_item: Dictionary mapping matrix indices to movie IDs
        movies_df: DataFrame with movie metadata
        k: Number of recommendations to return
    
    Returns:
        List of dicts with movie_id, title, genres, score
    '''
    n_items = interaction_matrix.shape[1]
    
    # Predict scores for all items
    # LightFM.predict() requires parallel arrays of user indices and item indices
    item_indices = np.arange(n_items)              # [0, 1, 2, ..., n_items-1]
    user_indices = np.full(n_items, user_idx)      # [user_idx, user_idx, ..., user_idx]
    scores = model.predict(user_indices, item_indices)
    
    # Exclude items user has already watched
    known_items = interaction_matrix[user_idx].indices  # Get non-zero columns for this user
    scores[known_items] = -np.inf                       # Mask with -infinity to exclude
    
    # Get top-k items by score
    top_indices = np.argsort(-scores)[:k]  # Sort descending, take first k
    top_scores = scores[top_indices]
    
    # Map back to movie IDs and fetch metadata
    recommendations = []
    for idx, score in zip(top_indices, top_scores):
        movie_id = idx_to_item[idx]
        movie_info = movies_df[movies_df['movie_id'] == movie_id].iloc[0]
        
        # Combine primary and secondary genres
        genres = movie_info['genre_primary']
        if pd.notna(movie_info.get('genre_secondary')):
            genres += f", {movie_info['genre_secondary']}"
        
        recommendations.append({
            'movie_id': movie_id,
            'title': movie_info['title'],
            'genres': genres,
            'score': score
        })
    
    return recommendations

# Select 3 random users for demonstration (seeded so the same users are shown on every run)
rng = np.random.default_rng(42)
sample_user_indices = rng.choice(num_users, size=3, replace=False)

for i, user_idx in enumerate(sample_user_indices, 1):
    user_id = idx_to_user[user_idx]  # Convert index to original ID
    
    print('\n' + '='*70)
    print(f'EXAMPLE {i}: User {user_id}')
    print('='*70)
    
    # Show user's watch history (top 5 by strength)
    user_history = filtered[filtered['user_id'] == user_id].nlargest(5, 'strength')
    
    print('\nWatch History (Top 5):')
    for _, row in user_history.iterrows():
        movie_info = movies[movies['movie_id'] == row['movie_id']].iloc[0]
        genres = movie_info['genre_primary']
        if pd.notna(movie_info.get('genre_secondary')):
            genres += f", {movie_info['genre_secondary']}"
        print(f'  - {movie_info["title"]} ({genres}) - strength: {row["strength"]:.2f}')
    
    # Generate recommendations
    recs = get_recommendations(user_idx, model, interaction_matrix, idx_to_item, movies, k=K)
    
    print(f'\nTop-{K} Recommendations:')
    for j, rec in enumerate(recs, 1):
        print(f'  {j:2d}. {rec["title"]} ({rec["genres"]}) - score: {rec["score"]:.3f}')