# E-commerce Recommendations - Interview Guide

## Overview

This notebook demonstrates key recommendation system concepts for interview preparation.

**Learning Objectives:**
1. Understand collaborative filtering
2. Understand content-based filtering
3. Build a hybrid recommender
4. Evaluate recommendations
5. Discuss trade-offs

**Interview Focus:** Concepts > Production Code

In [None]:
import sys
sys.path.append('../src')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from collaborative_filter import CollaborativeFilter, create_interaction_matrix
from content_recommender import ContentBasedRecommender
from hybrid_recommender import HybridRecommender

## 1. Load and Explore Data

**Interview Point:** Always start by understanding the data!

In [None]:
# Load data
products = pd.read_csv('../data/products.csv')
interactions = pd.read_csv('../data/interactions.csv')

print(f"Products: {len(products)}")
print(f"Interactions: {len(interactions)}")
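n_users = interactions['user_id'].nunique()
# Count each (user, product) pair once so repeat interactions do not understate sparsity
n_pairs = len(interactions[['user_id', 'product_id']].drop_duplicates())
print(f"Users: {n_users}")
print(f"Sparsity: {1 - n_pairs / (n_users * len(products)):.2%}")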

# Display samples
display(products.head())
display(interactions.head())

## 2. Collaborative Filtering

**Key Concept:** Find patterns in user-item interactions

**Interview Questions:**
- How does matrix factorization work?
- What's the cold-start problem?
- How to handle implicit feedback?
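
For the matrix factorization question, a minimal sketch may help. It is written in plain Python so the update rule is visible, and it is not how `CollaborativeFilter` is implemented (that code is not shown here). The names `factorize`, `predict`, and `mse`, the hyperparameters, and the toy confidence scores are all assumptions made for illustration.

```python
import random

def factorize(observed, n_users, n_items, n_factors=2,
              lr=0.02, reg=0.01, epochs=500, seed=0):
    """Learn user factors P and item factors Q with SGD (illustrative only)."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            # Predicted score is the dot product of the user and item factors
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, u, i):
    return sum(p * q for p, q in zip(P[u], Q[i]))

def mse(observed, P, Q):
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed) / len(observed)

# Hypothetical (user, item, confidence) triples; only observed cells are listed
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

P0, Q0 = factorize(observed, n_users=3, n_items=3, epochs=0)  # untrained baseline
P, Q = factorize(observed, n_users=3, n_items=3)
print(f"MSE before training: {mse(observed, P0, Q0):.3f}")
print(f"MSE after training:  {mse(observed, P, Q):.3f}")
print(f"Score for unseen pair (user 0, item 2): {predict(P, Q, 0, 2):.3f}")
```

The point to make in an interview is that the model only trains on observed cells, yet the learned factors yield a score for every user-item pair, including unseen ones. Production systems for implicit feedback often use ALS with confidence weighting rather than plain SGD.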

In [None]:
# Build the user-item matrix, converting raw interactions to confidence scores
interaction_matrix = create_interaction_matrix(interactions)

# Train collaborative filter
cf = CollaborativeFilter(n_factors=20)
cf.fit(interaction_matrix)

# Get recommendations
test_user = interactions['user_id'].iloc[0]
recs = cf.recommend(test_user, n_recommendations=5)
print(f"\nRecommendations for {test_user}:")
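# Look up names once; .get avoids an IndexError if a product_id is missing from products
product_names = products.set_index('product_id')['name']
for product_id, score in recs:
    product_name = product_names.get(product_id, 'unknown')
    print(f"  {product_id}: {product_name} (score: {score:.3f})")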

## 3. Content-Based Filtering

**Key Concept:** Recommend items similar to those the user has already liked

**Interview Questions:**
- When is content-based better than collaborative?
- What features to use?
- Why cosine similarity?
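
One way to answer the cosine similarity question is with a small demonstration. The sketch below uses raw word counts as a stand-in for whatever features `ContentBasedRecommender` actually builds (an assumption, since its internals are not shown), and the product descriptions are made up.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)

def term_counts(text):
    """Bag-of-words vector: term -> number of occurrences."""
    return Counter(text.lower().split())

# Hypothetical descriptions: same wording, but one is three times longer
short = term_counts("wireless bluetooth headphones")
long = term_counts("wireless bluetooth headphones " * 3)
other = term_counts("cotton bath towel")

sim_same = cosine_similarity(short, long)
sim_diff = cosine_similarity(short, other)
print(f"Same wording, different length: {sim_same:.3f}")
print(f"No shared terms:                {sim_diff:.3f}")
```

Because cosine similarity depends only on the direction of the vectors, a long description and a short one with the same vocabulary mix score as near-identical, which Euclidean distance would not do.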

In [None]:
# Train content-based recommender
cbr = ContentBasedRecommender(max_features=50)
cbr.fit(products)

# Find similar items
test_product = products['product_id'].iloc[0]
similar = cbr.get_similar_items(test_product, n_similar=5)

print(f"\nItems similar to {test_product}:")
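# Look up names once; .get avoids an IndexError if a product_id is missing from products
product_names = products.set_index('product_id')['name']
for product_id, similarity in similar:
    product_name = product_names.get(product_id, 'unknown')
    print(f"  {product_id}: {product_name} (similarity: {similarity:.3f})")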

## 4. Hybrid Recommender

**Key Concept:** Combine collaborative + content + popularity

**Interview Point:** Explain WHY a hybrid offsets the weaknesses of the individual approaches
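
A weighted blend is one common way to combine the three signals. The sketch below is an assumption about how such a blend could look, not the logic inside `HybridRecommender`; the function name `blend_scores`, the default weights, and the toy scores are hypothetical.

```python
def blend_scores(cf_scores, content_scores, popularity, weights=(0.5, 0.3, 0.2)):
    """Rank items by a weighted sum of three score dicts (item -> score in [0, 1])."""
    w_cf, w_cb, w_pop = weights
    if not cf_scores:
        # Cold-start user: no collaborative signal, so shift its weight
        # proportionally onto content and popularity
        total = w_cb + w_pop
        w_cf, w_cb, w_pop = 0.0, w_cb / total, w_pop / total
    items = set(cf_scores) | set(content_scores) | set(popularity)
    blended = {
        item: w_cf * cf_scores.get(item, 0.0)
        + w_cb * content_scores.get(item, 0.0)
        + w_pop * popularity.get(item, 0.0)
        for item in items
    }
    # Highest score first; item id breaks ties so the order is deterministic
    return sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical, already-normalized scores
content = {"B": 1.0, "C": 0.5}
popular = {"A": 0.5, "C": 1.0}

known_user = blend_scores({"A": 1.0, "B": 0.2}, content, popular)
new_user = blend_scores({}, content, popular)
print("Known user:", [item for item, _ in known_user])
print("New user:  ", [item for item, _ in new_user])
```

The ranking changes once the collaborative scores disappear, which is the behavior to highlight: the hybrid still returns something sensible for a user with no interaction history.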

In [None]:
# Create hybrid recommender
hybrid = HybridRecommender(cf, cbr)
hybrid.set_popular_items(interactions)

# Get hybrid recommendations
user_history = interactions[interactions['user_id'] == test_user]['product_id'].tolist()

hybrid_recs = hybrid.recommend(
    user_id=test_user,
    user_history=user_history,
    n_recommendations=5,
    context='homepage'
)

print(f"\nHybrid recommendations for {test_user}:")
for rec in hybrid_recs:
    print(f"  {rec['product_id']}: {rec['score']:.3f} - {rec['reason']}")

## 5. Evaluation

**Interview Discussion:**
- Offline vs. online metrics
- Precision@K, Recall@K, NDCG@K
- A/B testing
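
As a companion to Precision@K, here is a minimal sketch of Recall@K and NDCG@K with binary relevance. The function names and the toy lists are assumptions for illustration; graded relevance would need a different gain term.

```python
import math

def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & relevant) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: hits near the top count more than hits lower down."""
    relevant = set(relevant)
    # Position 0 gets discount 1/log2(2) = 1, position 1 gets 1/log2(3), ...
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(recommended[:k]) if item in relevant)
    # Ideal ranking puts every relevant item first
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranking and holdout set
ranked = ["a", "b", "c", "d"]
held_out = ["a", "c", "e"]

print(f"Recall@4: {recall_at_k(ranked, held_out, 4):.3f}")
print(f"NDCG@4:   {ndcg_at_k(ranked, held_out, 4):.3f}")
```

Precision and recall ignore order within the top K, while NDCG rewards placing relevant items earlier, so the two can disagree about which model is better.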

In [None]:
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    recommended_k = recommended[:k]
    return len(set(recommended_k) & set(relevant)) / k

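# Example evaluation (illustrative only)
# Hold out the user's last 5 interactions (assumes rows are in time order)
# and recommend from the remaining history, so held-out items are not fed back in.
# Caveat: cf was fit on all interactions above, so the holdout still leaks into
# training; a proper evaluation splits the data before fitting.
train_history, relevant = user_history[:-5], user_history[-5:]
eval_recs = hybrid.recommend(
    user_id=test_user,
    user_history=train_history,
    n_recommendations=5,
    context='homepage'
)
recommended = [r['product_id'] for r in eval_recs]

p_at_5 = precision_at_k(recommended, relevant, 5)
print(f"Precision@5: {p_at_5:.2%}")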

## Key Takeaways for Interviews

1. **Collaborative Filtering**:
   - Uses user-item interactions
   - Cold-start problem for new users/items
   - Matrix factorization scales well: it compresses the sparse interaction matrix into low-dimensional user and item factors

2. **Content-Based**:
   - Uses item features
   - Works for new items (advantage!)
   - Can create filter bubbles

3. **Hybrid**:
   - Combines collaborative, content, and popularity signals
   - Context-aware weighting
   - Mitigates cold start: content features cover new items, popularity covers new users

4. **Trade-offs**:
   - Accuracy vs. Diversity
   - Latency vs. Freshness
   - Complexity vs. Maintainability

5. **Evaluation**:
   - Offline metrics (precision@K, NDCG)
   - Online metrics (CTR, conversion)
   - A/B testing in production
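
For the A/B testing point, a two-proportion z-test on click-through rate is one simple way to check whether a difference between variants is likely to be real. This is a sketch under stated assumptions: the function name and the click counts are hypothetical, and real experiments also need a pre-planned sample size and guardrail metrics.

```python
import math

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in CTR between B and A."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both variants share one CTR
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: control (A) vs. new recommender (B)
z, p_value = two_proportion_z_test(clicks_a=100, views_a=1000,
                                   clicks_b=130, views_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the CTR lift is unlikely to be noise, but it says nothing about whether the lift is large enough to matter for the business.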

**Remember:** No single "right" answer - explain trade-offs!