# Assignment 2: Production Inference for Recipe Recommendation System

## Project Overview
This notebook demonstrates the inference pipeline for the PantryPal recipe recommendation system. With the Learning-to-Rank model trained, we now show how to load it and serve real-time recommendations in production.

## Business Application
PantryPal users expect personalized recipe recommendations based on their interaction history. This notebook shows how we:
- Load a trained model from disk
- Fetch a user's interaction history from our app analytics
- Generate personalized top-N recipe recommendations
- Validate the recommendations for quality and relevance

## Technical Implementation
This notebook covers the complete inference workflow:
- Environment setup (Colab-compatible)
- Model and metadata loading from training artifacts
- User interaction retrieval with `UserInteractionFetcher`
- Real-time recipe scoring and ranking with `RecipeScorer`
- System validation and performance checks


In [1]:
# Environment Setup
# This cell configures the environment for both local development and Google Colab
import sys, subprocess, os, pathlib

IN_COLAB = "google.colab" in sys.modules
repo_root = pathlib.Path.cwd()

# Install required packages for Colab environments
if IN_COLAB:
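    # check=False means a failed install does not raise, so inspect the return code
    pip_result = subprocess.run([sys.executable, "-m", "pip", "install", "-q",
                                 "lightgbm", "pandas", "numpy", "scikit-learn"],
                                check=False)
    if pip_result.returncode != 0:
        print(f"pip install warning: exit code {pip_result.returncode}")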

    # Clone the PantryPal ML repository if not already present
    if not (repo_root / "recipe_recommender").exists():
        subprocess.run(["git", "-c", "advice.detachedHead=false", "clone", "-q", "https://github.com/marcel-qayoom-taylor/PantryPalML.git"], check=True)
        os.chdir("PantryPalML")
        repo_root = pathlib.Path.cwd()

print(f"Environment ready. Project root: {repo_root}")


Environment ready. Project root: /Users/marcelqayoomtaylor/Documents/GitHub/PantryPalML/notebooks


In [2]:
# Import Required Components for Inference Pipeline

# Configuration management - same config used during training
from recipe_recommender.config import get_ml_config

# User interaction fetcher - retrieves user behavior from analytics events
from recipe_recommender.etl.fetch_user_interactions import UserInteractionFetcher

# Recipe scorer - loads trained model and generates recommendations
from recipe_recommender.inference.recipe_scorer import RecipeScorer

# Load configuration (ensures consistency with training pipeline)
config = get_ml_config()
print("Inference Pipeline Configuration:")
print("Model directory:", config.model_dir)
print("Data output directory:", config.output_dir)


Inference Pipeline Configuration:
Model directory: /Users/marcelqayoomtaylor/Documents/GitHub/PantryPalML/recipe_recommender/output/hybrid_models
Data output directory: /Users/marcelqayoomtaylor/Documents/GitHub/PantryPalML/recipe_recommender/output


## Real-Time Recommendation Generation

### Objective
Demonstrate how the trained model generates personalized recommendations in a production environment. This involves:
1. **User Context Retrieval**: Fetching historical interactions from app analytics
2. **Model Loading**: Loading the pre-trained LightGBM model and metadata
3. **Feature Engineering**: Building user profile features on the fly
4. **Recipe Scoring**: Generating relevance scores for all recipes
5. **Ranking & Selection**: Returning top-N recommendations
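
Steps 4 and 5 above can be illustrated with a minimal sketch. It assumes scores are already available as a plain dictionary keyed by recipe ID. The helper `top_n` and the example scores are hypothetical and are not part of `RecipeScorer`. The sketch breaks ties by ascending recipe ID so that repeated calls return the same order.

```python
import heapq

def top_n(scores, n):
    """Return the n highest-scoring (recipe_id, score) pairs.

    Ties are broken by ascending recipe_id so the output is deterministic.
    """
    # Sorting key: highest score first, then lowest recipe_id among equal scores
    return heapq.nsmallest(n, scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical scores keyed by recipe_id (illustrative values only)
example_scores = {1915: 2.37, 7: 2.37, 1333: 2.03, 1276: -2.24}
top = top_n(example_scores, 3)
print(top)
```

A deterministic tie-break matters when many recipes receive the same score, because otherwise the displayed order depends on the input order.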

### Production Workflow
The `RecipeScorer` class orchestrates this entire pipeline: it loads the model artifacts saved during training and uses them to score recipes for any user in real time.
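
Because the workflow targets real-time use, it may be worth measuring how long a scoring call takes. The following is a minimal sketch of a timing wrapper. `timed` and `fake_scoring` are hypothetical names introduced here for illustration, and the stand-in function only mimics a scoring call.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn with the given arguments and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in for the real scoring call (hypothetical)
def fake_scoring(n):
    return list(range(n))

demo_result, seconds = timed(fake_scoring, 5)
print(f"scoring took {seconds * 1000:.2f} ms")
```

In this notebook the same wrapper could be applied to the real call, for example `timed(scorer.get_user_recipe_recommendations, user_id=sample_user_id, interaction_history=interactions, n_recommendations=100)`.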


In [3]:
# Step 1: Initialize the user interaction fetcher
# This component reads user behavior data from our app analytics
fetcher = UserInteractionFetcher(config)

# Step 2: Select a test user with interaction history
# In production, this would be the current user's ID from the API request
sample_user_id = 'afcacbe1-eaba-415f-b03e-14ed682af65e'

# Alternative: Auto-discover an active user from the event data
# This code demonstrates how to find users with substantial interaction history

# import pandas as pd
# try:
#     events_path = config.output_dir / "combined_events.csv"
#     if events_path.exists():
#         df_head = pd.read_csv(events_path, nrows=10000)  # Sample for efficiency
#         # Find users with most recipe interactions
#         candidates = (
#             df_head[df_head["event"].isin(fetcher.recipe_events)]["distinct_id"]
#             .value_counts().head(10)
#         )
#         if len(candidates) > 0:
#             sample_user_id = candidates.index[0]
#             print(f"Auto-selected user with {candidates.iloc[0]} interactions")
# except Exception as e:
#     print("Could not auto-select user:", e)


print("Generating recommendations for user:", sample_user_id)

# Step 3: Retrieve user's interaction history
# This includes all recipe-related events: views, favorites, cooking attempts, etc.
interactions = fetcher.fetch_user_interactions(sample_user_id)

# Step 4: Initialize the recipe scorer and load the trained model
# This loads the LightGBM booster, feature metadata, and recipe catalog
scorer = RecipeScorer(config)

# Step 5: Generate personalized recommendations
# The scorer builds user features, scores all recipes, and returns top-N ranked results
result = scorer.get_user_recipe_recommendations(
    user_id=sample_user_id,
    interaction_history=interactions, 
    n_recommendations=100
)

# Step 6: Display the recommendations
import pandas as pd
recommendations = result.get("recommendations", [])
recs_df = pd.DataFrame(recommendations)

if recs_df.empty:
    print("No recommendations generated. Check user interactions and model artifacts.")
else:
    print(f"Generated {len(recommendations)} personalized recommendations:")
    display(recs_df[["recipe_id", "recipe_name", "score"]].head(100))


2025-09-30 12:36:02,978 - recipe_recommender.etl.fetch_user_interactions - INFO - Initialized UserInteractionFetcher
2025-09-30 12:36:02,980 - recipe_recommender.etl.fetch_user_interactions - INFO -    Events file: /Users/marcelqayoomtaylor/Documents/GitHub/PantryPalML/recipe_recommender/output/combined_events.csv
2025-09-30 12:36:02,980 - recipe_recommender.etl.fetch_user_interactions - INFO -    Tracking 8 event types
2025-09-30 12:36:02,981 - recipe_recommender.etl.fetch_user_interactions - INFO - Fetching interactions for user: afcacbe1-eaba-415f-b03e-14ed682af65e
2025-09-30 12:36:02,981 - recipe_recommender.etl.fetch_user_interactions - INFO - Reading events file in chunks
2025-09-30 12:36:03,081 - recipe_recommender.etl.fetch_user_interactions - INFO -    Processed 50,000 rows, found 0 matches...
2025-09-30 12:36:03,126 - recipe_recommender.etl.fetch_user_interactions - INFO - Found 74 recipe interactions for user afcacbe1-eaba-415f-b03e-14ed682af65e
2025-09-30 12:36:03,127 - rec

Generating recommendations for user: afcacbe1-eaba-415f-b03e-14ed682af65e


2025-09-30 12:36:09,752 - recipe_recommender.inference.recipe_scorer - INFO - Generated scores for 1967 recipes
2025-09-30 12:36:09,752 - recipe_recommender.inference.recipe_scorer - INFO -    Score range: -2.2355 - 2.3711
2025-09-30 12:36:09,753 - recipe_recommender.inference.recipe_scorer - INFO -    No threshold; selecting by top-N
2025-09-30 12:36:09,753 - recipe_recommender.inference.recipe_scorer - INFO -    Returning top 100 recommendations


Generated 100 personalized recommendations:


Unnamed: 0,recipe_id,recipe_name,score
0,1915,Chicken Katsu,2.371124
1,7,Apple Cinnamon French Toast,2.371124
2,275,Chocolate Self Saucing Pudding,2.371124
3,1099,Tiramisu,2.371124
4,1333,Cauliflower Hash Browns,2.028807
...,...,...,...
95,1276,Snickerdoodle Bread,-2.235481
96,1277,Paleo Pumpkin Bread,-2.235481
97,1278,Pumpkin Granola,-2.235481
98,1279,Healthy Blueberry Muffins,-2.235481
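
The fetcher log above reports that the events file is read in chunks and filtered for a single user. The sketch below shows a related, memory-bounded approach using only the standard library: it streams the file row by row rather than in chunks, so it is not the fetcher's actual implementation. The column names `distinct_id` and `event` come from the commented-out code in the cell above. The helper `iter_user_events`, the `recipe_id` column, and the event names in the sample data are assumptions made for illustration.

```python
import csv
import io

def iter_user_events(lines, user_id, tracked_events):
    """Yield rows for one user, streaming row by row instead of loading the file."""
    reader = csv.DictReader(lines)
    for row in reader:
        if row["distinct_id"] == user_id and row["event"] in tracked_events:
            yield row

# Tiny in-memory stand-in for the events file; event names are invented
sample_csv = io.StringIO(
    "distinct_id,event,recipe_id\n"
    "u1,Recipe Viewed,7\n"
    "u2,Recipe Viewed,275\n"
    "u1,App Opened,\n"
    "u1,Recipe Favourited,1099\n"
)
tracked = {"Recipe Viewed", "Recipe Favourited"}
matches = list(iter_user_events(sample_csv, "u1", tracked))
print(len(matches))
```

With a real file, the first argument would be an open file handle, for example one returned by `open(path, newline="")`.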


In [4]:
# System Validation and Quality Checks
print("=== Production Inference Pipeline Validation ===\n")

validation_errors = []
validation_warnings = []

# Check 1: Model Artifacts Integrity
print("1. Validating Model Artifacts...")
model_file = config.model_dir / "hybrid_lightgbm_model.txt"
metadata_file = config.model_dir / "hybrid_lightgbm_metadata.json"

if not model_file.exists():
    validation_errors.append(f"Critical: Missing trained model file at {model_file}")
else:
    print("   ✓ Trained model file found")

if not metadata_file.exists():
    validation_errors.append(f"Critical: Missing model metadata file at {metadata_file}")
else:
    print("   ✓ Model metadata file found")

# Check 2: Recipe Database Integrity  
print("\n2. Validating Recipe Database...")
recipe_features_file = config.output_dir / "enhanced_recipe_features_from_db.csv"
if not recipe_features_file.exists():
    validation_errors.append(f"Critical: Missing recipe features file at {recipe_features_file}")
else:
    print("   ✓ Recipe features database found")

# Check 3: Recommendation Quality
print("\n3. Validating Recommendation Output...")
recommendations = result.get("recommendations", [])

if len(recommendations) == 0:
    validation_errors.append("Critical: No recommendations generated")
elif len(recommendations) < 5:
    validation_warnings.append(f"Warning: Only {len(recommendations)} recommendations generated (expected at least 5)")
else:
    print(f"   ✓ Generated {len(recommendations)} recommendations")

# Check 4: Score Distribution Analysis
if recommendations:
    scores = [rec.get('score', 0) for rec in recommendations]
    # Spread is measured as max - min (the range), not statistical variance
    score_range = max(scores) - min(scores)
    
    if score_range < 0.001:
        validation_warnings.append("Warning: Very low score variance - model may not be discriminating")
    else:
        print(f"   ✓ Score range: {min(scores):.6f} to {max(scores):.6f} (variance: {score_range:.6f})")

# Final Validation Report
print("\n" + "="*50)
if validation_errors:
    print("❌ VALIDATION FAILED")
    print("Critical Issues:")
    for error in validation_errors:
        print(f"   - {error}")
elif validation_warnings:
    print("⚠️  VALIDATION PASSED WITH WARNINGS")
    print("Warnings:")
    for warning in validation_warnings:
        print(f"   - {warning}")
else:
    print("✅ VALIDATION PASSED")
    print("System Status: Ready for Production")
    
    # Display top recommendation as success example
    if recommendations:
        top_rec = recommendations[0]
        print(f"\nExample Output:")
        print(f"Top Recommendation: '{top_rec.get('recipe_name', 'Unknown')}' (score: {top_rec.get('score', 0):.6f})")


=== Production Inference Pipeline Validation ===

1. Validating Model Artifacts...
   ✓ Trained model file found
   ✓ Model metadata file found

2. Validating Recipe Database...
   ✓ Recipe features database found

3. Validating Recommendation Output...
   ✓ Generated 100 recommendations
   ✓ Score range: -2.235481 to 2.371124 (variance: 4.606605)

✅ VALIDATION PASSED
System Status: Ready for Production

Example Output:
Top Recommendation: 'Chicken Katsu' (score: 2.371124)
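
The score check above looks only at the overall range. The recommendation table shown earlier contains several recipes with exactly the same score (four at 2.371124), so a tie check may be a useful extra signal. The sketch below is one possible way to summarize ties. `tie_report` is a hypothetical helper, and the sample rows are copied from the first five rows of that table.

```python
from collections import Counter

def tie_report(recommendations, decimals=6):
    """Summarize how many recommendations share an identical (rounded) score."""
    counts = Counter(round(rec["score"], decimals) for rec in recommendations)
    largest_tie = max(counts.values()) if counts else 0
    return {
        "distinct_scores": len(counts),
        "largest_tie": largest_tie,
    }

# First five rows of the recommendation table shown earlier
sample_recs = [
    {"recipe_id": 1915, "score": 2.371124},
    {"recipe_id": 7, "score": 2.371124},
    {"recipe_id": 275, "score": 2.371124},
    {"recipe_id": 1099, "score": 2.371124},
    {"recipe_id": 1333, "score": 2.028807},
]
report = tie_report(sample_recs)
print(report)
```

In the notebook, `tie_report(recommendations)` could feed an additional warning when `largest_tie` is large relative to the number of recommendations returned.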
