# 129: Advanced MLOps - Feature Stores & Real-Time Monitoring

## 🎯 Learning Objectives

By the end of this notebook, you will:
- **Understand** feature store architecture and the offline/online serving dichotomy
- **Implement** production-grade feature stores with versioning and lineage tracking
- **Build** real-time feature serving pipelines with <10ms latency requirements
- **Apply** data quality monitoring with schema validation and distribution drift detection
- **Master** model performance monitoring to detect accuracy degradation and concept drift
- **Deploy** comprehensive observability systems for production ML pipelines

## 📚 What is a Feature Store?

A **feature store** is a centralized repository for storing, managing, and serving ML features for both training and inference. It solves the **training-serving skew** problem by ensuring features computed during training are identical to features served during inference.

**The Training-Serving Skew Problem:**
- **Training time:** Features computed in batch (Spark/pandas), aggregated over historical data
- **Inference time:** Features must be computed in real-time (<10ms latency) with live data
- **Skew:** Different feature computation logic leads to accuracy degradation in production

**Feature Store Solution:**
- **Single source of truth:** Same feature definitions for training and serving
- **Offline store:** Batch features for training (Parquet, Delta Lake, S3)
- **Online store:** Low-latency features for inference (Redis, DynamoDB, Cassandra)
- **Feature versioning:** Track feature changes, enable reproducibility
- **Point-in-time correctness:** No data leakage from future into past
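
Point-in-time correctness can be illustrated with a small join. The sketch below is not part of the notebook's `FeatureStore` class; it assumes hypothetical `labels` and `features` frames and uses `pandas.merge_asof` to attach, for each label row, the most recent feature value recorded at or before the label's event time.

```python
import pandas as pd

# Hypothetical label events: one row per training example, observed at event_time.
labels = pd.DataFrame({
    "wafer_id": ["W0001", "W0001", "W0002"],
    "event_time": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-01-10"]),
    "label": [1, 0, 1],
})

# Hypothetical feature snapshots: each value is valid from feature_time onward.
features = pd.DataFrame({
    "wafer_id": ["W0001", "W0001", "W0002", "W0002"],
    "feature_time": pd.to_datetime(
        ["2024-01-01", "2024-01-15", "2024-01-08", "2024-01-12"]
    ),
    "avg_vdd": [1.19, 1.21, 1.20, 1.25],
})

# merge_asof requires both frames to be sorted by their time keys.
labels = labels.sort_values("event_time")
features = features.sort_values("feature_time")

# direction="backward" selects, per label row, the latest feature row with
# feature_time <= event_time, so no future value leaks into training data.
training_set = pd.merge_asof(
    labels,
    features,
    left_on="event_time",
    right_on="feature_time",
    by="wafer_id",
    direction="backward",
)
print(training_set[["wafer_id", "event_time", "feature_time", "avg_vdd"]])
```

Note that the W0002 label at 2024-01-10 receives the 2024-01-08 snapshot, not the later 2024-01-12 one.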

**Why Feature Stores Matter:**
- ✅ **Eliminate training-serving skew** - Same code for batch and real-time features
- ✅ **Feature reuse** - Share features across teams (customer_lifetime_value used by 5 models)
- ✅ **Faster experimentation** - Pre-computed features ready for model training
- ✅ **Reproducibility** - Feature versioning enables exact training reproduction
- ✅ **Governance** - Track feature lineage, ownership, SLA compliance

## 🏭 Post-Silicon Validation Use Cases

**Feature Store for Wafer Test Data:**
- **Input:** Raw STDF files (10,000 parameters per device), test_time, die coordinates
- **Features:** Aggregated statistics (mean/std/quantiles per wafer), spatial correlations (neighbor yield)
- **Offline:** Compute 30-day rolling statistics for training yield models
- **Online:** Serve real-time features for binning decisions (<5ms latency)
- **Value:** Consistent features between training (batch) and production (real-time binning)
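
As a rough sketch of the wafer-level aggregation described above, the snippet below rolls device-level rows up to one row per wafer. The `devices` frame is synthetic and its column names (`vdd`, `passed`) are assumptions standing in for parsed STDF records.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic device-level rows (stand-in for parsed STDF records).
devices = pd.DataFrame({
    "wafer_id": np.repeat(["W0001", "W0002"], 50),
    "vdd": rng.normal(1.2, 0.02, size=100),
    "passed": rng.random(100) < 0.9,
})

# One row per wafer. Keeping the aggregation in a single definition is what
# lets the offline and online stores serve identical feature values.
wafer_features = devices.groupby("wafer_id").agg(
    avg_vdd=("vdd", "mean"),
    std_vdd=("vdd", "std"),
    vdd_p95=("vdd", lambda s: s.quantile(0.95)),
    yield_pct=("passed", lambda s: 100.0 * s.mean()),
    device_count=("vdd", "size"),
).reset_index()

print(wafer_features)
```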

**Real-Time Monitoring for Yield Prediction:**
- **Input:** Live yield predictions (1,000 devices/minute)
- **Monitoring:** Track prediction distribution, accuracy vs actual yield, latency p99
- **Drift detection:** Alert when input features shift (Vdd distribution changes)
- **Concept drift:** Alert when accuracy drops (model degrading, process change)
- **Value:** Catch model degradation within minutes (not weeks)

**Data Quality for Parametric Tests:**
- **Input:** STDF parametric test results (voltage, current, frequency, power)
- **Schema validation:** Ensure test_name, test_value, test_limits present
- **Distribution checks:** Flag when Vdd values outside expected range (1.0-1.4V)
- **Null detection:** Alert on missing critical parameters (would break model)
- **Value:** Prevent bad data from reaching models (garbage in → garbage out)
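
The three checks above (schema, range, nulls) might be combined along these lines. This is a minimal sketch: the function name `check_parametric_batch` and the long-format layout (one row per test result) are assumptions, while the required columns and the 1.0-1.4 V range come from the bullets above.

```python
import pandas as pd

REQUIRED_COLUMNS = ["test_name", "test_value", "test_limits"]
VDD_RANGE = (1.0, 1.4)  # volts

def check_parametric_batch(df):
    """Return a list of data quality issues (empty list if the batch is clean)."""
    issues = []

    # Schema validation: stop early, since later checks need these columns.
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        issues.append(f"missing columns: {missing}")
        return issues

    # Null detection on the measured value.
    null_count = int(df["test_value"].isna().sum())
    if null_count:
        issues.append(f"{null_count} null test_value entries")

    # Range check, applied to Vdd rows only.
    vdd = df.loc[df["test_name"] == "Vdd", "test_value"].dropna()
    out_of_range = int(((vdd < VDD_RANGE[0]) | (vdd > VDD_RANGE[1])).sum())
    if out_of_range:
        issues.append(f"{out_of_range} Vdd values outside {VDD_RANGE}")

    return issues

# Hypothetical batch with one null and one out-of-range Vdd reading.
batch = pd.DataFrame({
    "test_name": ["Vdd", "Vdd", "Idd", "Vdd"],
    "test_value": [1.21, 1.55, None, 1.18],
    "test_limits": ["1.0-1.4", "1.0-1.4", "0-50", "1.0-1.4"],
})
issues = check_parametric_batch(batch)
print(issues)
```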

**Feature Lineage for Compliance:**
- **Requirement:** Regulated domains (FDA, automotive) require traceability of all features used in models
- **Solution:** Track which raw STDF fields → derived features → model predictions
- **Example:** final_yield ← neighbor_yield_avg ← (die_x, die_y, pass_fail) from STDF
- **Value:** Audit trail for regulatory compliance, root cause analysis
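
One way to answer "which raw fields does this prediction depend on?" is to walk the lineage graph upstream. The sketch below encodes the example chain as a plain dictionary; the `LINEAGE` structure and `trace_to_raw_fields` helper are hypothetical, and the recursion assumes the graph has no cycles.

```python
# Hypothetical lineage graph: each derived name maps to its direct inputs.
LINEAGE = {
    "final_yield": ["neighbor_yield_avg"],
    "neighbor_yield_avg": ["die_x", "die_y", "pass_fail"],
}

def trace_to_raw_fields(name, lineage):
    """Walk upstream until reaching names with no recorded inputs (raw fields)."""
    parents = lineage.get(name)
    if not parents:
        return {name}  # no recorded inputs, so treat it as a raw field
    raw_fields = set()
    for parent in parents:
        raw_fields |= trace_to_raw_fields(parent, lineage)
    return raw_fields

raw = trace_to_raw_fields("final_yield", LINEAGE)
print(sorted(raw))  # ['die_x', 'die_y', 'pass_fail']
```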

## 🔄 Feature Store Architecture

```mermaid
graph TB
    subgraph "Data Sources"
        A1[STDF Files] --> B[Feature Engineering]
        A2[Test Logs] --> B
        A3[Manufacturing DB] --> B
    end
    
    subgraph "Feature Store"
        B --> C1[Offline Store<br/>Parquet/Delta Lake]
        B --> C2[Online Store<br/>Redis/DynamoDB]
        C1 --> D1[Training Pipeline]
        C2 --> D2[Real-Time Inference]
    end
    
    subgraph "ML Lifecycle"
        D1 --> E[Model Training]
        E --> F[Model Registry]
        F --> D2
        D2 --> G[Predictions]
    end
    
    subgraph "Monitoring"
        G --> H1[Performance Monitor]
        G --> H2[Drift Detector]
        B --> H3[Data Quality Check]
        H1 --> I[Alerts]
        H2 --> I
        H3 --> I
    end
    
    style C1 fill:#e1f5ff
    style C2 fill:#ffe1e1
    style H1 fill:#e1ffe1
    style H2 fill:#ffe1f5
    style H3 fill:#fff5e1
```

## 📊 Learning Path Context

**Prerequisites:**
- **124: Feature Store Implementation** - Basic feature store concepts
- **127: Model Governance & Compliance** - Lineage tracking, audit trails
- **128: Shadow Mode Deployment** - Safe deployment strategies

**Next Steps:**
- **130: ML Observability & Debugging** - Distributed tracing, model debugging
- **131: Container Orchestration** - Kubernetes for ML, horizontal scaling

---

Let's build production-grade feature stores and monitoring systems! 🚀

In [None]:
# Setup and imports
import numpy as np
import pandas as pd
import time
from datetime import datetime, timedelta
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import json
import hashlib
from collections import defaultdict
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('default')
sns.set_palette("husl")

print("✅ Setup complete - Ready for feature stores and monitoring!")

## 2. Production-Grade Feature Store Implementation

### 📝 What's Happening in This Code?

**Purpose:** Implement an enterprise feature store with offline/online serving, versioning, point-in-time correctness, and lineage tracking.

**Key Points:**
- **Offline store:** Parquet-based storage for training (batch feature computation)
- **Online store:** In-memory cache for inference (<5ms latency)
- **Point-in-time joins:** Prevent data leakage (no future features in past training)
- **Feature versioning:** Track feature definition changes over time
- **Lineage tracking:** Record which raw data → features → models

**Why This Matters:** Training-serving skew can cause a 10-30% accuracy drop in production. A feature store guards against it by using the same feature definitions for training and inference.
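
Feature versioning is easier to enforce when a version identifier is derived from the definition itself. The sketch below is one possible approach, not what the `FeatureStore` class in this notebook does: a hypothetical `definition_fingerprint` helper hashes the entity key, feature list, and transformation text, so any change to the definition produces a new identifier.

```python
import hashlib
import json

def definition_fingerprint(entity_key, features, transformation):
    """Hash a feature group definition so that any change yields a new ID."""
    payload = json.dumps(
        {
            "entity_key": entity_key,
            "features": sorted(features),  # feature order should not matter
            "transformation": transformation,
        },
        sort_keys=True,
    )
    # A truncated SHA-256 hex digest is short enough to log alongside a model.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

v1 = definition_fingerprint("wafer_id", ["avg_vdd", "std_vdd"], "mean/std per wafer")
v1_reordered = definition_fingerprint("wafer_id", ["std_vdd", "avg_vdd"], "mean/std per wafer")
v2 = definition_fingerprint("wafer_id", ["avg_vdd", "std_vdd", "yield_pct"], "mean/std per wafer")

print(v1 == v1_reordered)  # True: same definition, different list order
print(v1 == v2)            # False: a feature was added
```

Storing this fingerprint with each trained model makes it possible to check later whether the features served at inference still match the ones used in training.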

In [None]:
class FeatureStore:
    """
    Production-grade feature store with offline/online serving.
    
    Supports:
    - Offline store: Batch features for training (Parquet/Delta Lake simulation)
    - Online store: Real-time features for inference (in-memory cache)
    - Point-in-time correctness: No data leakage from future
    - Feature versioning: Track definition changes
    - Lineage tracking: Raw data → features → models
    """
    
    def __init__(self, store_name):
        self.store_name = store_name
        self.offline_store = {}  # {feature_group: [DataFrame, ...]} (one DataFrame per write batch)
        self.online_store = {}   # {entity_id: {feature_name: value}}
        self.feature_definitions = {}  # {feature_name: definition_metadata}
        self.feature_lineage = defaultdict(list)  # {feature_name: [source_tables]}
        self.feature_versions = defaultdict(list)  # {feature_name: [versions]}
        
    def register_feature_group(self, name, entity_key, features, description, version="v1.0"):
        """
        Register feature group (collection of related features).
        
        Args:
            name: Feature group name (e.g., "wafer_aggregates")
            entity_key: Join key (e.g., "wafer_id", "device_id")
            features: List of feature names
            description: Human-readable description
            version: Feature version
        """
        self.feature_definitions[name] = {
            'entity_key': entity_key,
            'features': features,
            'description': description,
            'version': version,
            'registered_at': datetime.now()
        }
        
        self.feature_versions[name].append({
            'version': version,
            'registered_at': datetime.now(),
            'features': features
        })
        
        print(f"✅ Registered feature group: {name} ({version})")
        print(f"   Entity key: {entity_key}")
        print(f"   Features: {', '.join(features)}")
        
        return name
    
    def write_to_offline_store(self, feature_group_name, df, timestamp_col=None):
        """
        Write features to offline store (training data).
        
        Offline store simulates Parquet/Delta Lake storage.
        Includes timestamp for point-in-time correctness.
        """
        if timestamp_col and timestamp_col not in df.columns:
            # Add timestamp if not present (on a copy, so the caller's DataFrame is not mutated)
            df = df.copy()
            df[timestamp_col] = datetime.now()
        
        # Store in offline store (simulating partitioned storage)
        if feature_group_name not in self.offline_store:
            self.offline_store[feature_group_name] = []
        
        self.offline_store[feature_group_name].append(df.copy())
        
        print(f"📦 Wrote {len(df)} rows to offline store: {feature_group_name}")
        
        return len(df)
    
    def write_to_online_store(self, feature_group_name, df, entity_key):
        """
        Write features to online store (real-time inference).
        
        Online store is in-memory cache (simulates Redis/DynamoDB).
        Only latest values stored per entity.
        """
        definition = self.feature_definitions.get(feature_group_name)
        if not definition:
            raise ValueError(f"Feature group {feature_group_name} not registered")
        
        features = definition['features']
        
        # Write to online store (entity_id → feature dict)
        for _, row in df.iterrows():
            entity_id = row[entity_key]
            
            if entity_id not in self.online_store:
                self.online_store[entity_id] = {}
            
            for feature in features:
                if feature in row:
                    self.online_store[entity_id][feature] = row[feature]
        
        print(f"⚡ Wrote {len(df)} entities to online store: {feature_group_name}")
        print(f"   Latency target: <5ms per entity lookup")
        
        return len(df)
    
    def get_offline_features(self, feature_group_name, entity_ids=None, start_date=None, end_date=None):
        """
        Retrieve features from offline store (for training).
        
        Supports:
        - Entity filtering (specific wafer IDs, device IDs)
        - Time range filtering (point-in-time correctness)
        - Batch retrieval (1000s of entities)
        """
        if feature_group_name not in self.offline_store:
            return pd.DataFrame()
        
        # Combine all batches
        dfs = self.offline_store[feature_group_name]
        df = pd.concat(dfs, ignore_index=True)
        
        # Filter by entity IDs
        if entity_ids is not None:
            definition = self.feature_definitions[feature_group_name]
            entity_key = definition['entity_key']
            df = df[df[entity_key].isin(entity_ids)]
        
        # Filter by date range (point-in-time correctness)
        if 'timestamp' in df.columns:
            if start_date:
                df = df[df['timestamp'] >= start_date]
            if end_date:
                df = df[df['timestamp'] <= end_date]
        
        return df
    
    def get_online_features(self, entity_ids, feature_names):
        """
        Retrieve features from online store (for inference).
        
        Low-latency lookup (<5ms) for real-time predictions.
        Returns only requested features for specified entities.
        """
        start_time = time.time()
        
        results = []
        for entity_id in entity_ids:
            if entity_id in self.online_store:
                feature_dict = {'entity_id': entity_id}
                for feature_name in feature_names:
                    feature_dict[feature_name] = self.online_store[entity_id].get(feature_name)
                results.append(feature_dict)
        
        latency_ms = (time.time() - start_time) * 1000
        
        result_df = pd.DataFrame(results)
        
        print(f"⚡ Retrieved {len(results)} entities from online store")
        # max(..., 1) avoids ZeroDivisionError when entity_ids is empty
        print(f"   Latency: {latency_ms:.2f}ms ({latency_ms / max(len(entity_ids), 1):.2f}ms per entity)")
        
        return result_df
    
    def track_lineage(self, feature_name, source_tables, transformation_logic):
        """
        Track feature lineage for governance and debugging.
        
        Records:
        - Source tables/features used
        - Transformation logic applied
        - Created timestamp
        """
        self.feature_lineage[feature_name].append({
            'source_tables': source_tables,
            'transformation': transformation_logic,
            'created_at': datetime.now()
        })
        
        return feature_name
    
    def get_feature_lineage(self, feature_name):
        """Get full lineage for a feature (for audit, debugging)."""
        return self.feature_lineage.get(feature_name, [])
    
    def get_feature_metadata(self, feature_group_name):
        """Get feature group metadata (definition, version, registered time)."""
        return self.feature_definitions.get(feature_group_name)

# Example: Feature store for wafer test data
print("🏪 Feature Store: Wafer Test Features\n")
print("="*80)

# Initialize feature store
fs = FeatureStore("wafer_test_feature_store")

# Register feature group: wafer-level aggregates
wafer_features = [
    'yield_pct',
    'avg_vdd',
    'std_vdd',
    'avg_test_time_ms',
    'device_count',
    'neighbor_yield_avg'
]

fs.register_feature_group(
    name='wafer_aggregates',
    entity_key='wafer_id',
    features=wafer_features,
    description='Wafer-level aggregate features from STDF test data',
    version='v1.0'
)

print()

# Generate synthetic wafer test data
n_wafers = 100
n_devices_per_wafer = 500

wafer_data = []

for wafer_id in range(1, n_wafers + 1):
    # Simulate wafer-level aggregates
    yield_pct = np.random.uniform(85, 99)
    avg_vdd = np.random.normal(1.2, 0.02)
    std_vdd = np.random.uniform(0.01, 0.05)
    avg_test_time = np.random.normal(100, 10)
    device_count = n_devices_per_wafer
    
    # Simulate spatial correlation (neighbor yield)
    neighbor_yield = yield_pct + np.random.normal(0, 2)
    
    wafer_data.append({
        'wafer_id': f'W{wafer_id:04d}',
        'yield_pct': yield_pct,
        'avg_vdd': avg_vdd,
        'std_vdd': std_vdd,
        'avg_test_time_ms': avg_test_time,
        'device_count': device_count,
        'neighbor_yield_avg': neighbor_yield,
        'timestamp': datetime.now() - timedelta(days=np.random.randint(0, 30))
    })

wafer_df = pd.DataFrame(wafer_data)

print(f"📊 Generated {len(wafer_df)} wafer records")
print("\nSample data:")
print(wafer_df.head(3))
print()

# Write to offline store (for training)
print("="*80)
print("OFFLINE STORE (Training Data)")
print("="*80)

fs.write_to_offline_store(
    feature_group_name='wafer_aggregates',
    df=wafer_df,
    timestamp_col='timestamp'
)

print()

# Write to online store (for real-time inference)
print("="*80)
print("ONLINE STORE (Real-Time Inference)")
print("="*80)

# Latest 20 wafers to online store
latest_wafers = wafer_df.nlargest(20, 'timestamp')

fs.write_to_online_store(
    feature_group_name='wafer_aggregates',
    df=latest_wafers,
    entity_key='wafer_id'
)

print()

# Track feature lineage
fs.track_lineage(
    feature_name='neighbor_yield_avg',
    source_tables=['stdf_wafer_test.parametric_results'],
    transformation_logic='AVG(yield_pct) WHERE die_x IN (x-1, x, x+1) AND die_y IN (y-1, y, y+1)'
)

print("="*80)
print("FEATURE RETRIEVAL - OFFLINE (Training)")
print("="*80)

# Retrieve offline features for training (past 15 days)
training_features = fs.get_offline_features(
    feature_group_name='wafer_aggregates',
    start_date=datetime.now() - timedelta(days=15),
    end_date=datetime.now()
)

print(f"Retrieved {len(training_features)} wafer records for training")
print(f"Date range: {training_features['timestamp'].min()} to {training_features['timestamp'].max()}")
print()

# Train simple model on offline features
X_train = training_features[['avg_vdd', 'std_vdd', 'avg_test_time_ms', 'neighbor_yield_avg']]
y_train = training_features['yield_pct']

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

print(f"✅ Model trained on {len(X_train)} offline samples")
print()

# Retrieve online features for inference
print("="*80)
print("FEATURE RETRIEVAL - ONLINE (Real-Time Inference)")
print("="*80)

# Simulate real-time inference request (5 wafers)
inference_wafer_ids = latest_wafers['wafer_id'].head(5).tolist()

online_features = fs.get_online_features(
    entity_ids=inference_wafer_ids,
    feature_names=['avg_vdd', 'std_vdd', 'avg_test_time_ms', 'neighbor_yield_avg']
)

print("\nOnline features:")
print(online_features)
print()

# Make predictions with online features
X_inference = online_features[['avg_vdd', 'std_vdd', 'avg_test_time_ms', 'neighbor_yield_avg']]
predictions = model.predict(X_inference)

print(f"✅ Predictions for {len(predictions)} wafers:")
# Iterate over the IDs actually returned, so labels stay aligned with predictions
for wafer_id, pred in zip(online_features['entity_id'], predictions):
    print(f"   {wafer_id}: Predicted yield = {pred:.2f}%")

print()

# Get feature lineage
print("="*80)
print("FEATURE LINEAGE (Governance & Audit)")
print("="*80)

lineage = fs.get_feature_lineage('neighbor_yield_avg')
print("\nFeature: neighbor_yield_avg")
print(f"Source tables: {lineage[0]['source_tables']}")
print(f"Transformation: {lineage[0]['transformation']}")
print(f"Created at: {lineage[0]['created_at']}")
print()

# Get feature metadata
metadata = fs.get_feature_metadata('wafer_aggregates')
print(f"Feature Group Metadata:")
print(f"  Name: wafer_aggregates")
print(f"  Version: {metadata['version']}")
print(f"  Entity key: {metadata['entity_key']}")
print(f"  Features: {', '.join(metadata['features'])}")
print(f"  Registered: {metadata['registered_at']}")

print()
print("="*80)
print("✅ FEATURE STORE DEMONSTRATION COMPLETE")
print(f"   Offline store: {len(wafer_df)} records (training)")
print(f"   Online store: {len(fs.online_store)} entities (inference)")
print("   Latency target: <5ms per entity")
print("="*80)

## 3. Real-Time Feature Serving Pipeline

### 📝 What's Happening in This Code?

**Purpose:** Build a low-latency feature serving pipeline for production inference with <10ms p99 latency.

**Key Points:**
- **Feature caching:** Pre-compute expensive aggregations, cache in Redis/memory
- **Batch retrieval:** Fetch multiple entities in single call (reduce network overhead)
- **Feature transformation:** Real-time computations (ratios, differences) from cached base features
- **Monitoring:** Track latency p50/p95/p99, cache hit rate, error rate

**Why This Matters:** Real-time predictions require <10ms feature retrieval. Slow feature lookups block production inference and violate the SLA.
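
Because the SLA is stated as a p99, it helps to see how tail percentiles are measured. The following is a minimal sketch with a plain dictionary standing in for the online store (an assumption for illustration); it uses `time.perf_counter`, a monotonic high-resolution clock suited to timing sub-millisecond operations.

```python
import time
import numpy as np

# Stand-in for an online store: entity_id -> feature dict.
store = {f"W{i:04d}": {"avg_vdd": 1.2, "std_vdd": 0.02} for i in range(1000)}

def timed_lookup(entity_id):
    """Return (features, latency_ms) for a single lookup."""
    start = time.perf_counter()
    features = store.get(entity_id)
    latency_ms = (time.perf_counter() - start) * 1000
    return features, latency_ms

# Collect per-request latencies, then summarise the tail.
latencies = np.array([timed_lookup(f"W{i % 1000:04d}")[1] for i in range(5000)])
p50, p95, p99 = np.percentile(latencies, [50, 95, 99])

print(f"p50={p50:.4f}ms  p95={p95:.4f}ms  p99={p99:.4f}ms")
print(f"SLA met (p99 < 10ms): {p99 < 10.0}")
```

The mean hides outliers, which is why the SLA targets p99: one slow lookup in a hundred is enough to stall a real-time binning decision.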

In [None]:
class RealTimeFeatureServer:
    """
    Low-latency feature serving for production inference.
    
    Optimizations:
    - In-memory caching (Redis simulation)
    - Batch retrieval (fetch multiple entities at once)
    - Feature transformation pipeline (derived features from base features)
    - Latency monitoring (p50, p95, p99)
    - Cache hit rate tracking
    """
    
    def __init__(self, feature_store):
        self.feature_store = feature_store
        self.cache = {}  # Simulates Redis in-memory cache
        self.latency_log = []
        self.cache_hits = 0
        self.cache_misses = 0
        
    def get_features_batch(self, entity_ids, feature_names, use_cache=True):
        """
        Batch feature retrieval with caching.
        
        Fetches multiple entities in single call (reduces network overhead).
        Uses cache for frequently accessed features.
        """
        start_time = time.time()
        
        results = []
        
        for entity_id in entity_ids:
            cache_key = f"{entity_id}::{','.join(sorted(feature_names))}"
            
            if use_cache and cache_key in self.cache:
                # Cache hit
                self.cache_hits += 1
                features = self.cache[cache_key]
            else:
                # Cache miss - retrieve from feature store
                self.cache_misses += 1
                
                feature_dict = {'entity_id': entity_id}
                
                # Get from online store
                if entity_id in self.feature_store.online_store:
                    for feature_name in feature_names:
                        feature_dict[feature_name] = self.feature_store.online_store[entity_id].get(feature_name)
                
                features = feature_dict
                
                # Update cache
                if use_cache:
                    self.cache[cache_key] = features
            
            results.append(features)
        
        latency_ms = (time.time() - start_time) * 1000
        self.latency_log.append(latency_ms)
        
        result_df = pd.DataFrame(results)
        
        return result_df, latency_ms
    
    def compute_derived_features(self, base_features_df):
        """
        Compute derived features from base features.
        
        Examples:
        - Ratios: vdd_idd_ratio = vdd / idd
        - Differences: vdd_delta = vdd - vdd_nominal
        - Aggregations: already cached in base features
        """
        df = base_features_df.copy()
        
        # Example derived features for wafer test
        if 'avg_vdd' in df.columns and 'std_vdd' in df.columns:
            df['vdd_coefficient_of_variation'] = df['std_vdd'] / df['avg_vdd']
        
        if 'yield_pct' in df.columns and 'neighbor_yield_avg' in df.columns:
            df['yield_vs_neighbor_delta'] = df['yield_pct'] - df['neighbor_yield_avg']
        
        return df
    
    def get_features_with_transformations(self, entity_ids, base_features, derived_features=None):
        """
        Retrieve base features and compute derived features.
        
        Workflow:
        1. Fetch base features from cache/store (fast)
        2. Compute derived features (simple math, <1ms)
        3. Return combined feature set
        """
        start_time = time.time()
        
        # Get base features
        base_df, base_latency = self.get_features_batch(entity_ids, base_features)
        
        # Compute derived features
        if derived_features:
            base_df = self.compute_derived_features(base_df)
        
        total_latency_ms = (time.time() - start_time) * 1000
        
        return base_df, total_latency_ms
    
    def get_latency_stats(self):
        """Calculate latency percentiles."""
        if not self.latency_log:
            return None
        
        return {
            'p50_ms': np.percentile(self.latency_log, 50),
            'p95_ms': np.percentile(self.latency_log, 95),
            'p99_ms': np.percentile(self.latency_log, 99),
            'mean_ms': np.mean(self.latency_log),
            'max_ms': np.max(self.latency_log),
            'total_requests': len(self.latency_log)
        }
    
    def get_cache_stats(self):
        """Calculate cache hit rate."""
        total = self.cache_hits + self.cache_misses
        
        if total == 0:
            return {'hit_rate': 0, 'cache_size': 0}
        
        return {
            'hit_rate': self.cache_hits / total,
            'cache_hits': self.cache_hits,
            'cache_misses': self.cache_misses,
            'cache_size': len(self.cache)
        }
    
    def clear_cache(self):
        """Clear cache (for testing, debugging)."""
        self.cache = {}
        print("🗑️  Cache cleared")

# Example: Real-time feature serving for binning model
print("⚡ Real-Time Feature Serving: Device Binning Model\n")
print("="*80)

# Initialize feature server
feature_server = RealTimeFeatureServer(fs)

# Simulate production inference workload
print("SIMULATION: Production inference workload (1000 requests)")
print("="*80)

# Generate 1000 inference requests (random wafer IDs)
inference_requests = []
available_wafer_ids = list(fs.online_store.keys())

for i in range(1000):
    # Randomly select 1-5 wafers per request
    batch_size = np.random.randint(1, 6)
    entity_ids = np.random.choice(available_wafer_ids, batch_size, replace=False).tolist()
    inference_requests.append(entity_ids)

print(f"Generated {len(inference_requests)} inference requests")
print(f"Batch sizes: 1-5 wafers per request")
print()

# Process requests with caching
base_features = ['avg_vdd', 'std_vdd', 'avg_test_time_ms', 'neighbor_yield_avg']

print("Processing requests...")
for i, entity_ids in enumerate(inference_requests):
    features_df, latency = feature_server.get_features_with_transformations(
        entity_ids=entity_ids,
        base_features=base_features,
        derived_features=True
    )
    
    if (i + 1) % 200 == 0:
        print(f"  Processed {i + 1}/1000 requests...")

print()
print("="*80)
print("LATENCY ANALYSIS")
print("="*80)

latency_stats = feature_server.get_latency_stats()

print("\nLatency Statistics (1000 requests):")
print(f"  Mean latency:   {latency_stats['mean_ms']:.2f}ms")
print(f"  P50 latency:    {latency_stats['p50_ms']:.2f}ms")
print(f"  P95 latency:    {latency_stats['p95_ms']:.2f}ms")
print(f"  P99 latency:    {latency_stats['p99_ms']:.2f}ms")
print(f"  Max latency:    {latency_stats['max_ms']:.2f}ms")

# Check SLA compliance (p99 < 10ms)
sla_target_p99 = 10.0
sla_met = latency_stats['p99_ms'] < sla_target_p99

print("\nSLA Compliance:")
print(f"  Target p99: <{sla_target_p99}ms")
print(f"  Actual p99: {latency_stats['p99_ms']:.2f}ms")
print(f"  Status: {'✅ MET' if sla_met else '❌ VIOLATED'}")

print()
print("="*80)
print("CACHE PERFORMANCE")
print("="*80)

cache_stats = feature_server.get_cache_stats()

print("\nCache Statistics:")
print(f"  Cache hit rate: {cache_stats['hit_rate']:.1%}")
print(f"  Cache hits:     {cache_stats['cache_hits']}")
print(f"  Cache misses:   {cache_stats['cache_misses']}")
print(f"  Cache size:     {cache_stats['cache_size']} entries")

print("\nCache Effectiveness:")
if cache_stats['hit_rate'] > 0.8:
    print("  ✅ Excellent (>80% hit rate) - Cache is highly effective")
elif cache_stats['hit_rate'] > 0.5:
    print(f"  ⚠️  Good ({cache_stats['hit_rate']:.0%} hit rate) - Consider cache warming")
else:
    print("  ❌ Poor (<=50% hit rate) - Increase cache size or TTL")

print()

# Demonstrate cache impact
print("="*80)
print("CACHE IMPACT DEMONSTRATION")
print("="*80)

# Clear cache and measure latency without caching
feature_server.clear_cache()
feature_server.cache_hits = 0
feature_server.cache_misses = 0
feature_server.latency_log = []

print("\nRun 1: Without cache (cold start)")
test_entity_ids = available_wafer_ids[:10]

for _ in range(100):
    _, _ = feature_server.get_features_batch(test_entity_ids, base_features, use_cache=False)

no_cache_stats = feature_server.get_latency_stats()
print(f"  Mean latency (no cache): {no_cache_stats['mean_ms']:.2f}ms")

# Now with cache
feature_server.cache_hits = 0
feature_server.cache_misses = 0
feature_server.latency_log = []

print("\nRun 2: With cache (first pass populates it, later passes hit)")

for _ in range(100):
    _, _ = feature_server.get_features_batch(test_entity_ids, base_features, use_cache=True)

cache_stats_run2 = feature_server.get_latency_stats()
cache_hit_stats = feature_server.get_cache_stats()

print(f"  Mean latency (with cache): {cache_stats_run2['mean_ms']:.2f}ms")
print(f"  Cache hit rate: {cache_hit_stats['hit_rate']:.1%}")
print(f"  Speedup: {no_cache_stats['mean_ms'] / cache_stats_run2['mean_ms']:.1f}x faster")

print()
print("="*80)
print("✅ REAL-TIME FEATURE SERVING COMPLETE")
print(f"   P99 latency: {latency_stats['p99_ms']:.2f}ms (target: <10ms)")
print(f"   Cache hit rate: {cache_stats['hit_rate']:.1%}")
print(f"   Production-ready: {'✅ YES' if sla_met and cache_stats['hit_rate'] > 0.5 else '❌ NO'}")
print("="*80)

## 4. Model Performance Monitoring & Drift Detection

### 📝 What's Happening in This Code?

**Purpose:** Monitor production model performance in real time and detect accuracy degradation and concept drift before they cause business impact.

**Key Points:**
- **Accuracy tracking:** Monitor prediction accuracy over time (sliding window)
- **Concept drift detection:** Statistical tests for distribution changes (KS test, PSI)
- **Performance alerts:** Trigger when accuracy drops >5% or drift detected
- **Root cause analysis:** Identify which features shifted (feature-level drift)

**Why This Matters:** Models degrade in production (data drift, concept drift, seasonality). Early detection limits business losses from bad predictions that would otherwise go unnoticed until someone finds them manually.
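
The Population Stability Index (PSI) mentioned above can be computed in a few lines. The sketch below is one common formulation, with bins taken from baseline quantiles; the function name, the bin count, and the `eps` smoothing value are assumptions rather than fixed conventions.

```python
import numpy as np

def population_stability_index(baseline, current, n_bins=10, eps=1e-6):
    """PSI = sum((cur - base) * ln(cur / base)) over bins defined on the baseline."""
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior bin edges from baseline quantiles; the first and last bins are
    # open-ended, so production values outside the training range still count.
    interior = np.quantile(baseline, np.linspace(0, 1, n_bins + 1))[1:-1]
    base_counts = np.bincount(
        np.searchsorted(interior, baseline, side="right"), minlength=n_bins
    )
    cur_counts = np.bincount(
        np.searchsorted(interior, current, side="right"), minlength=n_bins
    )

    # eps keeps empty bins from producing log(0) or division by zero.
    base_frac = np.clip(base_counts / len(baseline), eps, None)
    cur_frac = np.clip(cur_counts / len(current), eps, None)
    return float(np.sum((cur_frac - base_frac) * np.log(cur_frac / base_frac)))

rng = np.random.default_rng(42)
train_vdd = rng.normal(1.20, 0.02, 5000)    # baseline (training) distribution
same_vdd = rng.normal(1.20, 0.02, 5000)     # production, no shift
shifted_vdd = rng.normal(1.23, 0.02, 5000)  # production, mean shifted by 1.5 sigma

psi_same = population_stability_index(train_vdd, same_vdd)
psi_shifted = population_stability_index(train_vdd, shifted_vdd)
print(f"PSI without shift: {psi_same:.4f}")
print(f"PSI with shift:    {psi_shifted:.4f}")
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but these thresholds are conventions and should be tuned per feature.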

In [None]:
class ModelPerformanceMonitor:
    """
    Real-time model performance monitoring and drift detection.
    
    Tracks:
    - Prediction accuracy over time (sliding window)
    - Concept drift (approximated by shifts in the prediction distribution)
    - Data drift (feature distribution changes)
    - Performance degradation alerts
    """
    
    def __init__(self, model_name, window_size=1000):
        self.model_name = model_name
        self.window_size = window_size
        self.predictions = []
        self.actuals = []
        self.timestamps = []
        self.feature_distributions = defaultdict(list)
        self.baseline_distribution = {}
        
    def log_prediction(self, prediction, actual, features, timestamp=None):
        """
        Log prediction for monitoring.
        
        Args:
            prediction: Model prediction
            actual: Ground truth (if available)
            features: Input features dict
            timestamp: Prediction time
        """
        self.predictions.append(prediction)
        self.actuals.append(actual)
        self.timestamps.append(timestamp or datetime.now())
        
        # Log feature values for drift detection
        for feature_name, feature_value in features.items():
            # np.number also covers NumPy scalars such as np.int64, which are not Python ints
            if isinstance(feature_value, (int, float, np.number)):
                self.feature_distributions[feature_name].append(feature_value)
    
    def set_baseline_distribution(self, baseline_features):
        """
        Set baseline feature distribution (training data).
        
        Used for drift detection - compare production features vs training.
        """
        for feature_name, values in baseline_features.items():
            self.baseline_distribution[feature_name] = np.array(values)
        
        print(f"✅ Baseline distribution set for {len(baseline_features)} features")
    
    def calculate_accuracy(self, window='all'):
        """
        Calculate accuracy over specified window.
        
        Args:
            window: 'all', 'recent' (last N samples), or integer (exact window size)
        """
        if not self.predictions or not self.actuals:
            return None
        
        preds = np.array(self.predictions)
        acts = np.array(self.actuals)
        
        if window == 'recent':
            preds = preds[-self.window_size:]
            acts = acts[-self.window_size:]
        elif isinstance(window, int):
            preds = preds[-window:]
            acts = acts[-window:]
        
        # Calculate accuracy (for classification) or MAE (for regression)
        if preds.dtype == acts.dtype and preds.dtype in [np.int32, np.int64, object]:
            # Classification
            accuracy = np.mean(preds == acts)
            return {'type': 'classification', 'accuracy': accuracy}
        else:
            # Regression
            mae = np.mean(np.abs(preds - acts))
            rmse = np.sqrt(np.mean((preds - acts) ** 2))
            return {'type': 'regression', 'mae': mae, 'rmse': rmse}
    
    def detect_concept_drift(self, reference_window=1000, current_window=500):
        """
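        Detect possible concept drift using a Kolmogorov-Smirnov test.
        
        Compares the prediction distribution between reference and current
        windows. This is a proxy signal: it does not use ground truth, so a
        change in the feature-to-target relationship that leaves the
        predictions unchanged will not be flagged.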
        
        Returns:
            drift_detected: Boolean
            p_value: Statistical significance
            ks_statistic: K-S test statistic
        """
        if len(self.predictions) < reference_window + current_window:
            return None
        
        # Reference window (older predictions)
        ref_preds = np.array(self.predictions[-reference_window - current_window:-current_window])
        
        # Current window (recent predictions)
        cur_preds = np.array(self.predictions[-current_window:])
        
        # Kolmogorov-Smirnov test
        ks_statistic, p_value = stats.ks_2samp(ref_preds, cur_preds)
        
        # Drift detected if p < 0.05 (distributions are different)
        drift_detected = p_value < 0.05
        
        return {
            'drift_detected': drift_detected,
            'p_value': p_value,
            'ks_statistic': ks_statistic,
            'reference_window': reference_window,
            'current_window': current_window
        }
    
    def detect_feature_drift(self, feature_name, method='ks_test'):
        """
        Detect drift for specific feature.
        
        Compares production feature distribution vs baseline (training).
        
        Methods:
        - ks_test: Kolmogorov-Smirnov test (distribution comparison)
        - psi: Population Stability Index (binned comparison)
        """
        if feature_name not in self.baseline_distribution:
            return None
        
        if feature_name not in self.feature_distributions:
            return None
        
        baseline = self.baseline_distribution[feature_name]
        current = np.array(self.feature_distributions[feature_name][-self.window_size:])
        
        if method == 'ks_test':
            ks_statistic, p_value = stats.ks_2samp(baseline, current)
            drift_detected = p_value < 0.05
            
            return {
                'feature': feature_name,
                'method': 'ks_test',
                'drift_detected': drift_detected,
                'p_value': p_value,
                'ks_statistic': ks_statistic
            }
        
        elif method == 'psi':
            # Population Stability Index
            psi = self._calculate_psi(baseline, current)
            
            # PSI thresholds:
            # < 0.1: No significant change
            # 0.1 - 0.25: Moderate change
            # > 0.25: Significant change
            drift_detected = psi > 0.25
            
            return {
                'feature': feature_name,
                'method': 'psi',
                'drift_detected': drift_detected,
                'psi': psi,
                'threshold': 0.25
            }
    
    def _calculate_psi(self, baseline, current, bins=10):
        """
        Calculate Population Stability Index.
        
        PSI = Σ (current_pct - baseline_pct) * ln(current_pct / baseline_pct)
        """
        # Create bins based on baseline distribution
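        bin_edges = np.percentile(baseline, np.linspace(0, 100, bins + 1))
        
        # Clip current values into the baseline range so that out-of-range
        # samples land in the outermost bins instead of being dropped by np.histogram
        current = np.clip(current, bin_edges[0], bin_edges[-1])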
        
        # Histogram for baseline and current
        baseline_hist, _ = np.histogram(baseline, bins=bin_edges)
        current_hist, _ = np.histogram(current, bins=bin_edges)
        
        # Normalize to percentages
        baseline_pct = baseline_hist / len(baseline)
        current_pct = current_hist / len(current)
        
        # Avoid log(0)
        baseline_pct = np.where(baseline_pct == 0, 0.0001, baseline_pct)
        current_pct = np.where(current_pct == 0, 0.0001, current_pct)
        
        # Calculate PSI
        psi = np.sum((current_pct - baseline_pct) * np.log(current_pct / baseline_pct))
        
        return psi
    
    def check_performance_degradation(self, baseline_accuracy, threshold=0.05):
        """
        Check if performance degraded compared to baseline.
        
        Args:
            baseline_accuracy: Expected accuracy (from validation set)
            threshold: Alert if accuracy drops by more than this amount (e.g., 0.05 = 5 percentage points)
        """
        current_metrics = self.calculate_accuracy(window='recent')
        
        if current_metrics is None:
            return None
        
        if current_metrics['type'] == 'classification':
            current_acc = current_metrics['accuracy']
            degradation = baseline_accuracy - current_acc
            degraded = degradation > threshold
            
            return {
                'degraded': degraded,
                'baseline_accuracy': baseline_accuracy,
                'current_accuracy': current_acc,
                'degradation': degradation,
                'threshold': threshold
            }
        else:
            # For regression, use relative MAE increase
            # (more complex, baseline MAE needed)
            return {
                'degraded': False,
                'note': 'Regression degradation requires baseline MAE'
            }
    
    def generate_monitoring_report(self, baseline_accuracy=None):
        """Generate comprehensive monitoring report."""
        report = {
            'model_name': self.model_name,
            'total_predictions': len(self.predictions),
            'monitoring_period': {
                'start': min(self.timestamps) if self.timestamps else None,
                'end': max(self.timestamps) if self.timestamps else None
            }
        }
        
        # Accuracy metrics
        overall_metrics = self.calculate_accuracy(window='all')
        recent_metrics = self.calculate_accuracy(window='recent')
        
        report['accuracy'] = {
            'overall': overall_metrics,
            'recent': recent_metrics
        }
        
        # Concept drift
        concept_drift = self.detect_concept_drift()
        report['concept_drift'] = concept_drift
        
        # Feature drift (for all features)
        feature_drifts = []
        for feature_name in self.baseline_distribution.keys():
            drift = self.detect_feature_drift(feature_name, method='psi')
            if drift:
                feature_drifts.append(drift)
        
        report['feature_drift'] = feature_drifts
        
        # Performance degradation
        if baseline_accuracy is not None:
            degradation = self.check_performance_degradation(baseline_accuracy)
            report['degradation'] = degradation
        
        return report

# Example: Model performance monitoring for yield prediction
print("📊 Model Performance Monitoring: Yield Prediction\n")
print("="*80)

# Train baseline model
np.random.seed(42)

n_train = 1000
X_train_monitor = pd.DataFrame({
    'avg_vdd': np.random.normal(1.2, 0.02, n_train),
    'std_vdd': np.random.uniform(0.01, 0.05, n_train),
    'avg_test_time_ms': np.random.normal(100, 10, n_train),
    'neighbor_yield_avg': np.random.uniform(85, 99, n_train)
})
y_train_monitor = 0.9 * X_train_monitor['neighbor_yield_avg'] + np.random.normal(0, 2, n_train)

model_monitor = RandomForestRegressor(n_estimators=50, random_state=42)
model_monitor.fit(X_train_monitor, y_train_monitor)

# Calculate baseline error (on the training set, so this is an optimistic estimate)
y_pred_baseline = model_monitor.predict(X_train_monitor)
baseline_mae = mean_absolute_error(y_train_monitor, y_pred_baseline)

print("✅ Model trained")
print(f"   Baseline MAE: {baseline_mae:.2f}%")
print()

# Initialize performance monitor
monitor = ModelPerformanceMonitor(model_name='yield_prediction_v2.0', window_size=500)

# Set baseline distribution (training data)
baseline_features = {
    'avg_vdd': X_train_monitor['avg_vdd'].values,
    'std_vdd': X_train_monitor['std_vdd'].values,
    'avg_test_time_ms': X_train_monitor['avg_test_time_ms'].values,
    'neighbor_yield_avg': X_train_monitor['neighbor_yield_avg'].values
}

monitor.set_baseline_distribution(baseline_features)
print()

# Simulate production predictions (1500 samples)
print("="*80)
print("SIMULATING PRODUCTION PREDICTIONS")
print("="*80)

# Normal predictions (first 1000 samples - similar to training)
print("\nPhase 1: Normal operation (1000 predictions)...")

for i in range(1000):
    features = {
        'avg_vdd': np.random.normal(1.2, 0.02),
        'std_vdd': np.random.uniform(0.01, 0.05),
        'avg_test_time_ms': np.random.normal(100, 10),
        'neighbor_yield_avg': np.random.uniform(85, 99)
    }
    
    X = pd.DataFrame([features])
    prediction = model_monitor.predict(X)[0]
    
    # Simulate ground truth
    actual = 0.9 * features['neighbor_yield_avg'] + np.random.normal(0, 2)
    
    monitor.log_prediction(
        prediction=prediction,
        actual=actual,
        features=features,
        timestamp=datetime.now() - timedelta(hours=1500-i)  # Phase 1 precedes Phase 2 in time
    )

print("  ✅ Logged 1000 predictions")

# Drifted predictions (next 500 samples - distribution shift)
print("\nPhase 2: Data drift (500 predictions)...")
print("  ⚠️  avg_vdd shifted from 1.2V to 1.25V (process change)")

for i in range(500):
    features = {
        'avg_vdd': np.random.normal(1.25, 0.02),  # SHIFTED!
        'std_vdd': np.random.uniform(0.01, 0.05),
        'avg_test_time_ms': np.random.normal(100, 10),
        'neighbor_yield_avg': np.random.uniform(85, 99)
    }
    
    X = pd.DataFrame([features])
    prediction = model_monitor.predict(X)[0]
    
    # Ground truth also shifts (concept drift)
    actual = 0.85 * features['neighbor_yield_avg'] + np.random.normal(0, 3)
    
    monitor.log_prediction(
        prediction=prediction,
        actual=actual,
        features=features,
        timestamp=datetime.now() - timedelta(hours=500-i)
    )

print("  ✅ Logged 500 predictions (with drift)")
print()

# Generate monitoring report
print("="*80)
print("MONITORING REPORT")
print("="*80)

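# baseline_accuracy only applies to classifiers; for this regression model the
# degradation check returns a placeholder (see check_performance_degradation)
report = monitor.generate_monitoring_report(baseline_accuracy=0.95)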

print(f"\nModel: {report['model_name']}")
print(f"Total predictions: {report['total_predictions']}")
print(f"Monitoring period: {report['monitoring_period']['start']} to {report['monitoring_period']['end']}")

print(f"\nACCURACY METRICS:")
print(f"  Recent window MAE: {report['accuracy']['recent']['mae']:.2f}%")
print(f"  Recent window RMSE: {report['accuracy']['recent']['rmse']:.2f}%")

print(f"\nCONCEPT DRIFT DETECTION:")
if report['concept_drift']:
    cd = report['concept_drift']
    print(f"  Drift detected: {'⚠️  YES' if cd['drift_detected'] else '✅ NO'}")
    print(f"  KS statistic: {cd['ks_statistic']:.4f}")
    print(f"  P-value: {cd['p_value']:.4f}")
    print(f"  Reference window: {cd['reference_window']} predictions")
    print(f"  Current window: {cd['current_window']} predictions")

print(f"\nFEATURE DRIFT DETECTION (PSI):")
drifted_features = [fd for fd in report['feature_drift'] if fd['drift_detected']]

for fd in report['feature_drift']:
    status = "⚠️  DRIFT" if fd['drift_detected'] else "✅ OK"
    print(f"  {fd['feature']}: {status} (PSI={fd['psi']:.3f})")

if drifted_features:
    print(f"\n⚠️  WARNING: {len(drifted_features)} feature(s) showing drift!")
    for fd in drifted_features:
        print(f"     - {fd['feature']}: PSI={fd['psi']:.3f} (threshold={fd['threshold']})")

print()
print("="*80)
print("RECOMMENDATIONS")
print("="*80)

if report['concept_drift'] and report['concept_drift']['drift_detected']:
    print("\n🚨 CONCEPT DRIFT DETECTED")
    print("   Action: Retrain model with recent data")
    print("   Timeline: Immediate (accuracy may be degrading)")

if drifted_features:
    print("\n⚠️  FEATURE DRIFT DETECTED")
    print(f"   Affected features: {', '.join([fd['feature'] for fd in drifted_features])}")
    print("   Action: Investigate root cause (process change, sensor drift, data pipeline issue)")
    print("   Timeline: Within 24 hours")

if not (report['concept_drift'] and report['concept_drift']['drift_detected']) and not drifted_features:
    print("\n✅ NO DRIFT DETECTED")
    print("   Model performance stable")
    print("   Continue monitoring")

print()
print("="*80)
print("✅ MODEL PERFORMANCE MONITORING COMPLETE")
print(f"   Total predictions monitored: {report['total_predictions']}")
print(f"   Drift detection: {'⚠️  ACTIVE' if drifted_features or (report['concept_drift'] and report['concept_drift']['drift_detected']) else '✅ NORMAL'}")
print("="*80)
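
The regression branch of `check_performance_degradation` returns a placeholder because it needs a baseline MAE. One way to fill that gap, sketched here as a standalone helper rather than as the notebook's own method, is to alert on the relative increase in MAE. The name `check_mae_degradation` and the 20% default threshold are assumptions chosen for illustration.

```python
import numpy as np

def check_mae_degradation(predictions, actuals, baseline_mae, max_relative_increase=0.20):
    """Hypothetical helper: flag degradation when the current MAE exceeds
    baseline_mae by more than max_relative_increase (0.20 = 20%)."""
    preds = np.asarray(predictions, dtype=float)
    acts = np.asarray(actuals, dtype=float)  # None becomes NaN
    # Ignore rows whose ground truth has not arrived yet
    mask = ~np.isnan(acts)
    if baseline_mae <= 0 or not mask.any():
        return None
    current_mae = float(np.mean(np.abs(preds[mask] - acts[mask])))
    relative_increase = (current_mae - baseline_mae) / baseline_mae
    return {
        'degraded': relative_increase > max_relative_increase,
        'baseline_mae': baseline_mae,
        'current_mae': current_mae,
        'relative_increase': relative_increase,
        'threshold': max_relative_increase,
    }

# Toy data: absolute errors of 1.0 and 3.0 give a current MAE of 2.0
result = check_mae_degradation([90.0, 88.0], [91.0, 85.0], baseline_mae=1.6)
print(result['degraded'], round(result['relative_increase'], 2))
```

With the objects defined above, a call such as `check_mae_degradation(monitor.predictions[-500:], monitor.actuals[-500:], baseline_mae)` would compare the drifted window against the training-set MAE. Because that baseline is measured on training data, a held-out validation MAE would be a fairer reference.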

## 5. Data Quality Monitoring for ML Pipelines

### 📝 What's Happening in This Code?

**Purpose:** Implement comprehensive data quality checks to prevent bad data from reaching models (garbage in → garbage out).

**Key Points:**
- **Schema validation:** Ensure required fields present, correct data types
- **Range checks:** Flag values outside expected bounds (Vdd: 1.0-1.4V)
- **Null detection:** Alert on missing critical parameters
- **Distribution monitoring:** Compare current vs historical data distributions

**Why This Matters:** Bad data causes model failures (exceptions, incorrect predictions, degraded accuracy). Data quality checks catch issues upstream before they impact production.
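
Before the full class, the first three checks can be seen in miniature on a three-row batch. This is an illustrative sketch, not the notebook's implementation: the data is made up, and the column names and the 1.0-1.4 V bounds simply mirror the ones used in the cell that follows.

```python
import pandas as pd

# Tiny batch: one out-of-range Vdd, one missing Vdd, and no 'idd' column at all
batch = pd.DataFrame({
    'device_id': ['D1', 'D2', 'D3'],
    'vdd': [1.20, 1.55, None],
})

# Schema check: which required columns are absent?
required = ['device_id', 'vdd', 'idd']
missing_fields = [c for c in required if c not in batch.columns]

# Null check: how many Vdd readings are missing?
null_count = int(batch['vdd'].isnull().sum())

# Range check: Series.between is inclusive on both ends by default.
# NaN is excluded explicitly so a missing value is not double-counted.
in_range = batch['vdd'].between(1.0, 1.4)
out_of_range = int((~in_range & batch['vdd'].notnull()).sum())

print(missing_fields, null_count, out_of_range)
```

Each check yields a count rather than raising, which is the same reporting style the class below builds on: issues are collected, graded by severity, and only then turned into a pass/fail decision.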

class DataQualityMonitor:
    """
    Data quality monitoring for ML pipelines.
    
    Checks:
    - Schema validation (required fields, data types)
    - Range validation (min/max bounds)
    - Null detection (missing values)
    - Distribution validation (statistical comparison)
    - Anomaly detection (outliers)
    """
    
    def __init__(self, dataset_name):
        self.dataset_name = dataset_name
        self.schema = {}
        self.validation_rules = {}
        self.baseline_stats = {}
        self.quality_issues = []
        
    def define_schema(self, schema):
        """
        Define expected schema.
        
        Schema format:
        {
            'field_name': {
                'type': 'float'/'int'/'string',
                'required': True/False,
                'nullable': True/False
            }
        }
        """
        self.schema = schema
        print(f"✅ Schema defined for {self.dataset_name}")
        print(f"   Fields: {len(schema)}")
        
        return schema
    
    def add_validation_rule(self, field, rule_type, **kwargs):
        """
        Add validation rule for field.
        
        Rule types:
        - range: min, max
        - categorical: allowed_values
        - regex: pattern
        - custom: function
        """
        if field not in self.validation_rules:
            self.validation_rules[field] = []
        
        rule = {'type': rule_type, **kwargs}
        self.validation_rules[field].append(rule)
        
        return rule
    
    def validate_schema(self, df):
        """
        Validate DataFrame against schema.
        
        Checks:
        - Required fields present
        - Correct data types
        - Nullable constraints
        """
        issues = []
        
        # Check required fields
        for field, spec in self.schema.items():
            if spec.get('required', False):
                if field not in df.columns:
                    issues.append({
                        'type': 'missing_field',
                        'field': field,
                        'severity': 'critical',
                        'message': f"Required field '{field}' missing"
                    })
        
        # Check data types
        for field in df.columns:
            if field in self.schema:
                expected_type = self.schema[field]['type']
                actual_dtype = str(df[field].dtype)
                
                # Type validation (simplified)
                type_mismatch = False
                if expected_type == 'float' and not ('float' in actual_dtype or 'int' in actual_dtype):
                    type_mismatch = True
                elif expected_type == 'int' and 'int' not in actual_dtype:
                    type_mismatch = True
                # Accept both the legacy 'object' dtype and pandas string dtypes
                elif expected_type == 'string' and not ('object' in actual_dtype or 'str' in actual_dtype):
                    type_mismatch = True
                
                if type_mismatch:
                    issues.append({
                        'type': 'type_mismatch',
                        'field': field,
                        'severity': 'high',
                        'expected': expected_type,
                        'actual': actual_dtype,
                        'message': f"Field '{field}' type mismatch: expected {expected_type}, got {actual_dtype}"
                    })
        
        # Check nullability
        for field, spec in self.schema.items():
            if field in df.columns and not spec.get('nullable', True):
                null_count = df[field].isnull().sum()
                if null_count > 0:
                    issues.append({
                        'type': 'null_constraint_violation',
                        'field': field,
                        'severity': 'high',
                        'null_count': null_count,
                        'message': f"Field '{field}' has {null_count} nulls (not nullable)"
                    })
        
        return issues
    
    def validate_ranges(self, df):
        """Validate field values against range rules."""
        issues = []
        
        for field, rules in self.validation_rules.items():
            if field not in df.columns:
                continue
            
            for rule in rules:
                if rule['type'] == 'range':
                    min_val = rule.get('min')
                    max_val = rule.get('max')
                    
                    if min_val is not None:
                        violations = df[df[field] < min_val]
                        if len(violations) > 0:
                            issues.append({
                                'type': 'range_violation',
                                'field': field,
                                'severity': 'medium',
                                'rule': f'min={min_val}',
                                'violation_count': len(violations),
                                'message': f"Field '{field}' has {len(violations)} values < {min_val}"
                            })
                    
                    if max_val is not None:
                        violations = df[df[field] > max_val]
                        if len(violations) > 0:
                            issues.append({
                                'type': 'range_violation',
                                'field': field,
                                'severity': 'medium',
                                'rule': f'max={max_val}',
                                'violation_count': len(violations),
                                'message': f"Field '{field}' has {len(violations)} values > {max_val}"
                            })
                
                elif rule['type'] == 'categorical':
                    allowed_values = rule.get('allowed_values', [])
                    violations = df[~df[field].isin(allowed_values)]
                    
                    if len(violations) > 0:
                        issues.append({
                            'type': 'categorical_violation',
                            'field': field,
                            'severity': 'medium',
                            'violation_count': len(violations),
                            'message': f"Field '{field}' has {len(violations)} values not in allowed set"
                        })
        
        return issues
    
    def set_baseline_statistics(self, df):
        """
        Calculate baseline statistics for distribution monitoring.
        
        Stores mean, std, min, max, percentiles for numerical fields.
        """
        for col in df.select_dtypes(include=[np.number]).columns:
            self.baseline_stats[col] = {
                'mean': df[col].mean(),
                'std': df[col].std(),
                'min': df[col].min(),
                'max': df[col].max(),
                'p25': df[col].quantile(0.25),
                'p50': df[col].quantile(0.50),
                'p75': df[col].quantile(0.75),
                'p95': df[col].quantile(0.95)
            }
        
        print(f"✅ Baseline statistics set for {len(self.baseline_stats)} numerical fields")
        
        return self.baseline_stats
    
    def validate_distribution(self, df, threshold_std_shift=2.0):
        """
        Validate current data distribution vs baseline.
        
        Flags if mean shifts by >2 standard deviations.
        """
        issues = []
        
        for col, baseline in self.baseline_stats.items():
            if col not in df.columns:
                continue
            
            current_mean = df[col].mean()
            baseline_mean = baseline['mean']
            baseline_std = baseline['std']
            
            # Check if mean shifted significantly
            shift_in_stds = abs(current_mean - baseline_mean) / baseline_std if baseline_std > 0 else 0
            
            if shift_in_stds > threshold_std_shift:
                issues.append({
                    'type': 'distribution_shift',
                    'field': col,
                    'severity': 'high',
                    'baseline_mean': baseline_mean,
                    'current_mean': current_mean,
                    'shift_stds': shift_in_stds,
                    'message': f"Field '{col}' mean shifted by {shift_in_stds:.1f} std devs"
                })
        
        return issues
    
    def detect_anomalies(self, df, method='iqr', threshold=3.0):
        """
        Detect anomalies/outliers in numerical fields.
        
        Methods:
        - iqr: Interquartile range (values outside Q1 - 1.5*IQR or Q3 + 1.5*IQR)
        - zscore: Z-score (values with |z| > threshold)
        """
        issues = []
        
        for col in df.select_dtypes(include=[np.number]).columns:
            if method == 'iqr':
                Q1 = df[col].quantile(0.25)
                Q3 = df[col].quantile(0.75)
                IQR = Q3 - Q1
                
                lower_bound = Q1 - 1.5 * IQR
                upper_bound = Q3 + 1.5 * IQR
                
                anomalies = df[(df[col] < lower_bound) | (df[col] > upper_bound)]
                
                if len(anomalies) > 0:
                    issues.append({
                        'type': 'anomaly',
                        'field': col,
                        'severity': 'low',
                        'method': 'iqr',
                        'anomaly_count': len(anomalies),
                        'anomaly_pct': len(anomalies) / len(df) * 100,
                        'message': f"Field '{col}' has {len(anomalies)} anomalies ({len(anomalies)/len(df)*100:.1f}%)"
                    })
            
            elif method == 'zscore':
                mean = df[col].mean()
                std = df[col].std()
                
                if std > 0:
                    z_scores = np.abs((df[col] - mean) / std)
                    anomalies = df[z_scores > threshold]
                    
                    if len(anomalies) > 0:
                        issues.append({
                            'type': 'anomaly',
                            'field': col,
                            'severity': 'low',
                            'method': 'zscore',
                            'anomaly_count': len(anomalies),
                            'anomaly_pct': len(anomalies) / len(df) * 100,
                            'message': f"Field '{col}' has {len(anomalies)} anomalies (|z| > {threshold})"
                        })
        
        return issues
    
    def run_quality_checks(self, df):
        """
        Run all data quality checks.
        
        Returns comprehensive quality report.
        """
        all_issues = []
        
        # Schema validation
        schema_issues = self.validate_schema(df)
        all_issues.extend(schema_issues)
        
        # Range validation
        range_issues = self.validate_ranges(df)
        all_issues.extend(range_issues)
        
        # Distribution validation
        if self.baseline_stats:
            dist_issues = self.validate_distribution(df)
            all_issues.extend(dist_issues)
        
        # Anomaly detection
        anomaly_issues = self.detect_anomalies(df)
        all_issues.extend(anomaly_issues)
        
        # Store issues
        self.quality_issues.extend(all_issues)
        
        # Categorize by severity
        critical = [i for i in all_issues if i['severity'] == 'critical']
        high = [i for i in all_issues if i['severity'] == 'high']
        medium = [i for i in all_issues if i['severity'] == 'medium']
        low = [i for i in all_issues if i['severity'] == 'low']
        
        report = {
            'dataset': self.dataset_name,
            'row_count': len(df),
            'column_count': len(df.columns),
            'total_issues': len(all_issues),
            'issues_by_severity': {
                'critical': len(critical),
                'high': len(high),
                'medium': len(medium),
                'low': len(low)
            },
            'issues': all_issues,
            'passed': len(critical) == 0 and len(high) == 0
        }
        
        return report

# Example: Data quality monitoring for STDF wafer test data
print("🔍 Data Quality Monitoring: STDF Wafer Test Data\n")
print("="*80)

# Initialize data quality monitor
dq = DataQualityMonitor("stdf_wafer_test")

# Define schema for wafer test data
schema = {
    'wafer_id': {'type': 'string', 'required': True, 'nullable': False},
    'device_id': {'type': 'string', 'required': True, 'nullable': False},
    'vdd': {'type': 'float', 'required': True, 'nullable': False},
    'idd': {'type': 'float', 'required': True, 'nullable': False},
    'frequency': {'type': 'float', 'required': True, 'nullable': False},
    'temperature': {'type': 'float', 'required': False, 'nullable': True},
    'test_time_ms': {'type': 'float', 'required': True, 'nullable': False},
    'pass_fail': {'type': 'int', 'required': True, 'nullable': False}
}

dq.define_schema(schema)
print()

# Add validation rules
dq.add_validation_rule('vdd', 'range', min=1.0, max=1.4)
dq.add_validation_rule('idd', 'range', min=0, max=200)
dq.add_validation_rule('frequency', 'range', min=1000, max=3000)
dq.add_validation_rule('temperature', 'range', min=-40, max=125)
dq.add_validation_rule('test_time_ms', 'range', min=10, max=500)
dq.add_validation_rule('pass_fail', 'categorical', allowed_values=[0, 1])

print("✅ Validation rules added")
print("   vdd: 1.0-1.4V")
print("   idd: 0-200mA")
print("   frequency: 1000-3000 MHz")
print("   temperature: -40 to 125°C")
print("   test_time_ms: 10-500ms")
print("   pass_fail: 0 or 1")
print()

# Generate baseline dataset (good quality)
n_baseline = 1000

baseline_data = pd.DataFrame({
    'wafer_id': [f'W{i//100:04d}' for i in range(n_baseline)],
    'device_id': [f'D{i:06d}' for i in range(n_baseline)],
    'vdd': np.random.normal(1.2, 0.02, n_baseline),
    'idd': np.random.normal(50, 5, n_baseline),
    'frequency': np.random.normal(2400, 100, n_baseline),
    'temperature': np.random.normal(25, 10, n_baseline),
    'test_time_ms': np.random.normal(100, 10, n_baseline),
    'pass_fail': np.random.choice([0, 1], n_baseline, p=[0.05, 0.95])
})

# Set baseline statistics
dq.set_baseline_statistics(baseline_data)
print()

# Test dataset with quality issues
print("="*80)
print("TESTING DATA QUALITY CHECKS")
print("="*80)

# Generate test dataset with intentional issues
n_test = 200

test_data = pd.DataFrame({
    'wafer_id': [f'W{i//100:04d}' for i in range(n_test)],
    'device_id': [f'D{i:06d}' for i in range(n_test)],
    'vdd': np.concatenate([
        np.random.normal(1.2, 0.02, 180),
        np.random.normal(1.5, 0.05, 20)  # 20 out-of-range values
    ]),
    'idd': np.random.normal(50, 5, n_test),
    'frequency': np.random.normal(2700, 100, n_test),  # Mean shifted from 2400 (about 3 baseline std devs)
    'temperature': np.random.normal(25, 10, n_test),
    'test_time_ms': np.random.normal(100, 10, n_test),
    'pass_fail': np.random.choice([0, 1], n_test, p=[0.05, 0.95])
})

# Introduce some nulls (.loc slicing is inclusive, so rows 0-5 are affected)
test_data.loc[0:5, 'temperature'] = None

print(f"\nTest dataset: {len(test_data)} rows")
print(f"Intentional issues:")
print(f"  - about 20 vdd values > 1.4V (range violation)")
print(f"  - 6 null temperature values (field is nullable, so no issue is raised)")
print(f"  - frequency mean shifted from 2400 to 2700 (distribution shift)")
print()

# Run quality checks
report = dq.run_quality_checks(test_data)

print("="*80)
print("DATA QUALITY REPORT")
print("="*80)

print(f"\nDataset: {report['dataset']}")
print(f"Rows: {report['row_count']}")
print(f"Columns: {report['column_count']}")

print(f"\nISSUES BY SEVERITY:")
print(f"  Critical: {report['issues_by_severity']['critical']}")
print(f"  High: {report['issues_by_severity']['high']}")
print(f"  Medium: {report['issues_by_severity']['medium']}")
print(f"  Low: {report['issues_by_severity']['low']}")
print(f"  Total: {report['total_issues']}")

print(f"\nQUALITY CHECK: {'✅ PASSED' if report['passed'] else '❌ FAILED'}")

print(f"\nDETAILED ISSUES:")
severity_icon = {'critical': '🔴', 'high': '🟠', 'medium': '🟡', 'low': '🔵'}
for issue in report['issues']:
    icon = severity_icon.get(issue['severity'], '⚪')
    print(f"  {icon} [{issue['severity'].upper()}] {issue['message']}")

print()

# Action recommendations
print("="*80)
print("RECOMMENDATIONS")
print("="*80)

if report['issues_by_severity']['critical'] > 0:
    print("\n🔴 CRITICAL ISSUES - BLOCK DATA PIPELINE")
    print("   Action: Do not process this data batch")
    print("   Investigate root cause immediately")

if report['issues_by_severity']['high'] > 0:
    print("\n🟠 HIGH SEVERITY ISSUES - ALERT ON-CALL")
    print("   Action: Review issues, may indicate data pipeline problem")
    print("   Examples: Type mismatches, null-constraint violations, distribution shifts")

if report['issues_by_severity']['medium'] > 0:
    print("\n🟡 MEDIUM SEVERITY ISSUES - LOG FOR REVIEW")
    print("   Action: Process data but flag for investigation")
    print("   Examples: Range violations, unexpected categorical values")

if report['issues_by_severity']['low'] > 0:
    print("\n🔵 LOW SEVERITY ISSUES - INFORMATIONAL")
    print("   Action: Monitor trends over time")
    print("   Examples: Anomalies/outliers (may be valid extreme cases)")

if report['passed']:
    print("\n✅ ALL CRITICAL & HIGH CHECKS PASSED")
    print("   Data quality acceptable for production")

print()
print("="*80)
print("✅ DATA QUALITY MONITORING COMPLETE")
print(f"   Total issues found: {report['total_issues']}")
print(f"   Data quality: {'✅ ACCEPTABLE' if report['passed'] else '❌ UNACCEPTABLE'}")
print("="*80)
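
`validate_distribution` compares means only, so a change in spread that leaves the mean in place passes unnoticed. As a possible complement, the two-sample Kolmogorov-Smirnov test already used in the model monitor can be applied per column. The helper name `ks_distribution_check` and the 0.05 significance level are assumptions for this sketch. It also needs the raw baseline values, which `set_baseline_statistics` does not keep.

```python
import numpy as np
from scipy import stats

def ks_distribution_check(baseline_values, current_values, alpha=0.05):
    """Hypothetical helper: two-sample KS test on one numeric column."""
    baseline_values = np.asarray(baseline_values, dtype=float)
    current_values = np.asarray(current_values, dtype=float)
    # Drop NaNs before testing so missing values do not distort the comparison
    baseline_values = baseline_values[~np.isnan(baseline_values)]
    current_values = current_values[~np.isnan(current_values)]
    ks_statistic, p_value = stats.ks_2samp(baseline_values, current_values)
    return {
        'ks_statistic': float(ks_statistic),
        'p_value': float(p_value),
        'shifted': bool(p_value < alpha),
    }

rng = np.random.default_rng(0)
baseline = rng.normal(2400, 100, 1000)
wider = rng.normal(2400, 300, 1000)  # same mean, three times the spread

# The mean-shift statistic used by validate_distribution stays small...
mean_shift_in_stds = abs(wider.mean() - baseline.mean()) / baseline.std()
# ...while the KS test compares the whole distribution shape
result = ks_distribution_check(baseline, wider)
print(f"mean shift: {mean_shift_in_stds:.2f} std devs, KS shifted: {result['shifted']}")
```

With large batches the KS test becomes sensitive to very small differences, so in practice the KS statistic itself (an effect size) is often thresholded alongside the p-value.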

## 6. 🚀 Real-World Project Templates

### Project 1: Enterprise Feature Store for Wafer Test Data

**Objective:** Build production feature store serving 10,000+ wafer features with <5ms p99 latency for real-time binning decisions.

**Business Value:** Consistent features between training and inference eliminate 10-30% accuracy drop from training-serving skew. Feature reuse across 5 models saves 200 engineering hours/year.

**Features to Implement:**
- Offline store: Parquet-partitioned storage on S3 (30-day rolling window, 1TB data)
- Online store: Redis cluster (10K QPS, <5ms p99 latency)
- Feature groups: wafer_aggregates, device_parametrics, spatial_correlations, temporal_trends
- Point-in-time joins: Prevent data leakage (no future features in past training)
- Feature versioning: Track definition changes (v1.0 → v1.1 migration)
- Lineage tracking: STDF fields → derived features → model predictions

**Success Criteria:**
- Online store p99 latency <5ms for 10K concurrent requests
- Offline batch feature generation processes 1TB data in <30 minutes
- Zero training-serving skew (offline/online feature agreement >99.9%)
- Feature versioning supports rollback within 15 minutes
- Comprehensive lineage for audit compliance (FDA, automotive)

**STDF Data Application:**
- Raw data: 10K parametric tests per device (voltage, current, frequency, power)
- Feature groups: Wafer-level aggregates (mean/std Vdd, yield%, spatial patterns)
- Real-time serving: Binning model fetches features for 1,000 devices/sec
- Training: 30-day historical features for yield prediction model retraining

---

### Project 2: Real-Time Feature Serving for Fraud Detection

**Objective:** Build low-latency feature pipeline serving user transaction features with <10ms p99 latency for real-time fraud scoring.

**Business Value:** Real-time fraud detection requires <100ms total inference time (feature retrieval + model prediction). Slow features block transactions, violate SLA, lose revenue.

**Features to Implement:**
- Feature caching: Pre-compute expensive aggregations (user 30-day stats cached in Redis)
- Batch retrieval: Fetch 100 users in single call (reduce network overhead from 100x 1ms → 1x 5ms)
- Feature transformation: Real-time ratios (transaction_amt / user_avg_30day) computed on-the-fly
- Cache warming: Pre-fetch features for high-value users during low-traffic hours
- Latency monitoring: Track p50/p95/p99, alert if >10ms

**Success Criteria:**
- P99 latency <10ms for feature retrieval (100 users batch)
- Cache hit rate >85% (warm cache reduces latency 5-10x)
- Support 10,000 QPS (peak traffic, Black Friday)
- Graceful degradation: Fall back to default values on a cache miss (don't block transactions)
- Monitoring dashboard: Real-time latency, cache hit rate, error rate

**Data Application:**
- Base features: user_id, transaction_amount, merchant_id, location, time
- Cached aggregates: user_30day_avg, user_30day_std, merchant_fraud_rate
- Derived features: amount_vs_avg_ratio, time_since_last_txn, location_distance_km
- Model: XGBoost fraud classifier (requires 20 features, <100ms end-to-end)

---

### Project 3: Model Performance Monitoring for Yield Prediction

**Objective:** Monitor production yield prediction model 24/7, detect accuracy degradation within 1 hour (not weeks).

**Business Value:** Yield prediction drives fab decisions ($M impact). Degraded model causes incorrect estimates, bad capacity planning. Early detection saves $500K+/incident.

**Features to Implement:**
- Accuracy tracking: Sliding window (1000 predictions), compare actual vs predicted yield
- Concept drift: KS test comparing prediction distributions (1-week ago vs today)
- Feature drift: PSI (Population Stability Index) for all input features
- Automated alerts: Slack/PagerDuty if accuracy drops >5% or drift p-value <0.05
- Root cause analysis: Identify which feature drifted (guides investigation)

**Success Criteria:**
- Detect accuracy degradation within 1 hour (a 1000-prediction window fills in 1 hour at 1000 predictions/hour)
- False positive rate <5% (alerts only for real issues, not noise)
- Root cause identification: Flag specific drifted features (e.g., "avg_vdd shifted 1.2V → 1.25V")
- Automated retraining trigger: If drift sustained for 24 hours
- Dashboard: Real-time accuracy, drift status, alert history

**STDF Data Application:**
- Model: Yield prediction Random Forest (accuracy baseline: 95%)
- Monitor inputs: avg_vdd, std_vdd, test_time_ms, neighbor_yield_avg
- Drift scenarios: Process change (Vdd shift), equipment drift, seasonal patterns
- Action: Alert FAB engineer if drift detected, investigate root cause (sensor, process)

---

### Project 4: Data Quality Monitoring for STDF Ingestion Pipeline

**Objective:** Validate STDF data quality before reaching ML models, prevent garbage-in-garbage-out (99.9% clean data target).

**Business Value:** Bad STDF data causes model failures (exceptions), incorrect predictions, degraded accuracy. Data quality gates catch issues upstream, save debugging time (8 hours → 15 minutes).

**Features to Implement:**
- Schema validation: Required fields (wafer_id, test_name, test_value, test_limits, pass_fail)
- Range checks: Vdd 1.0-1.4V, Idd 0-200mA, frequency 1000-3000MHz, temperature -40 to 125°C
- Null detection: Flag missing critical parameters (would crash model)
- Distribution monitoring: Alert if mean shifts >2 std devs (indicates sensor drift)
- Anomaly detection: IQR method for outliers (may be valid extreme cases or bad data)

**Success Criteria:**
- Block data batch if critical issues (missing required fields, 100% nulls)
- Alert if high severity issues (>5% range violations, distribution shift)
- Log medium/low issues for review (anomalies, minor violations)
- Process 10,000 STDF files/day with <1 minute validation latency per file
- Comprehensive report: Issue counts, severity, affected fields, recommendations

**STDF Data Application:**
- Input: 10,000 devices per wafer, 100 parametric tests per device
- Validation: Ensure test_name, test_value, test_limits present and within bounds
- Distribution: Compare current wafer vs 30-day historical (mean/std Vdd, Idd)
- Action: Reject wafer if >10% devices have out-of-range parameters

---

### Project 5: Feature Store for Customer Lifetime Value (CLV) Model

**Objective:** Build feature store serving customer behavioral features for CLV prediction with daily retraining and real-time inference.

**Business Value:** CLV model drives marketing spend decisions ($10M annual budget). Feature consistency ensures training accuracy translates to production (no 20% degradation from skew).

**Features to Implement:**
- Offline store: Daily batch features (customer_30day_purchases, avg_order_value, category_preferences)
- Online store: Real-time features (days_since_last_purchase, current_cart_value)
- Point-in-time correctness: Training features from date T use only data available at T
- Feature versioning: Track changes (added recency features in v2.0)
- Scheduled updates: Daily offline feature refresh (2AM ET), online cache warming (6AM ET)

**Success Criteria:**
- Offline features updated daily (process 10M customer records in <1 hour)
- Online features served with <10ms p99 latency
- Point-in-time correctness: Zero data leakage (validated with historical backtests)
- Feature version tracking: Can reproduce training data from 6 months ago
- A/B test validation: New features increase CLV prediction accuracy by >3%

**Data Application:**
- Historical features: 30-day purchase count, avg order value, category counts
- Real-time features: cart_value, time_on_site, page_views
- Derived features: recency (days since last), frequency (purchases/30days), monetary (total spend)
- Model: Gradient boosting regressor (predicts CLV over next 12 months)

---

### Project 6: Drift Detection for Recommendation Engine

**Objective:** Monitor recommendation model performance and detect user behavior shifts (concept drift) within 24 hours.

**Business Value:** Recommendation CTR drives revenue ($500K/day). Concept drift (e.g., holiday season, trending products) degrades CTR 10-20%. Early detection enables rapid retraining, saves $50K/day.

**Features to Implement:**
- CTR monitoring: Track click-through rate hourly (baseline: 3.5%, alert if <3.2%)
- Distribution monitoring: Item popularity distribution (detect trending products)
- Cohort analysis: CTR by user segment (new users, power users, inactive)
- Seasonality detection: Compare current vs same week last year (expected patterns)
- Automated retraining: Trigger if CTR drops >5% for 24 hours

**Success Criteria:**
- Detect CTR degradation within 24 hours (hourly monitoring)
- Distinguish concept drift from temporary noise (24-hour sustained drop triggers alert)
- Cohort-level analysis: Identify which user segments affected
- Automated retraining: New model trained and deployed within 12 hours
- ROI tracking: CTR recovery after retraining (3.2% → 3.6%)

**Data Application:**
- Metrics: CTR, conversion rate, revenue per impression
- Drift scenarios: Holiday season, viral products, competitor promotions
- Action: Retrain model with recent 7 days (capture current trends)
- Validation: A/B test new model (ensure CTR improvement >5%)

---

### Project 7: Real-Time Feature Validation for Autonomous Driving

**Objective:** Validate sensor features in real-time (<1ms latency) before feeding to perception models (safety-critical).

**Business Value:** Invalid features cause perception failures (miss pedestrians, wrong lane detection). Real-time validation prevents catastrophic failures, ensures safety.

**Features to Implement:**
- Schema validation: Required fields (lidar_points, camera_rgb, radar_detections)
- Range checks: Lidar distance 0-100m, camera resolution 1920x1080, radar velocity -50 to 50 m/s
- Sensor fusion validation: Timestamp alignment (<10ms skew between sensors)
- Null detection: Missing sensor data triggers safe mode (slow down, alert driver)
- Anomaly detection: Sudden sensor spikes (likely sensor failure, not real objects)

**Success Criteria:**
- Validation latency <1ms (perception pipeline requires <50ms end-to-end)
- Zero false negatives: Never allow invalid data to reach model (safety-critical)
- False positive rate <0.1%: Minimize safe mode triggers (user experience)
- Sensor fusion validation: Ensure multi-sensor timestamps aligned within 10ms
- Comprehensive logging: All validation failures logged for post-incident analysis

**Data Application:**
- Sensors: Lidar (100K points/frame), camera (1920x1080 RGB), radar (detections with velocity)
- Validation: Check timestamps aligned, ranges valid, no missing data
- Action: If validation fails, trigger safe mode (reduce speed, alert driver)
- Logging: Record failures for debugging (sensor malfunction, software bug)

---

### Project 8: Feature Lineage Tracking for Regulatory Compliance

**Objective:** Build complete feature lineage system for ML models in regulated industries (FDA medical devices, financial services).

**Business Value:** Regulatory audits require traceability: raw data → features → predictions. Manual documentation costs 40 hours/model. Automated lineage saves 95% effort.

**Features to Implement:**
- Source tracking: Map each feature to source tables/columns (e.g., neighbor_yield_avg ← stdf.wafer_test.die_yield)
- Transformation logging: Record transformation logic (SQL, Python code)
- Versioning: Track feature definition changes over time (v1.0 used simple avg, v2.0 added spatial weighting)
- Model linkage: Which features used by which models
- Audit report generation: One-click report for regulators (complete lineage graph)

**Success Criteria:**
- 100% feature coverage: Every feature has documented lineage
- Automated tracking: No manual documentation (lineage captured during feature engineering)
- Audit report generation: <5 minutes for complete model lineage
- Historical reconstruction: Can reproduce feature values from 2 years ago
- Compliance validation: Passes FDA/financial regulatory audits

**STDF Data Application:**
- Raw data: stdf.parametric_tests table (test_name, test_value, wafer_id, die_x, die_y)
- Feature: neighbor_yield_avg ← AVG(yield) WHERE die within 3mm radius
- Transformation: Spatial join + aggregation (documented in lineage)
- Model: Uses neighbor_yield_avg for binning decisions
- Audit trail: Raw STDF → derived feature → model prediction → binning decision
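
The audit trail above can be represented as a small directed graph that is walked from a decision back to its raw inputs. The following is a minimal sketch, assuming a hypothetical in-memory `LINEAGE` mapping; the node names (for example `binning_model_prediction` and the `stdf.parametric_tests.*` columns) are illustrative rather than a real schema, and a production system would persist these edges in a metadata store.

```python
# Hypothetical lineage edges: each node maps to the upstream nodes it is derived from.
LINEAGE = {
    "binning_decision": ["binning_model_prediction"],
    "binning_model_prediction": ["neighbor_yield_avg"],
    "neighbor_yield_avg": [
        "stdf.parametric_tests.die_x",
        "stdf.parametric_tests.die_y",
        "stdf.parametric_tests.test_value",
    ],
}


def trace_upstream(node, lineage):
    """Return every upstream ancestor of `node`, depth-first, without duplicates."""
    seen = []
    stack = list(lineage.get(node, []))
    while stack:
        current = stack.pop(0)
        if current in seen:
            continue
        seen.append(current)
        # Prepend parents so the walk stays depth-first.
        stack = list(lineage.get(current, [])) + stack
    return seen


def raw_sources(node, lineage):
    """Ancestors with no recorded parents are treated as raw source columns."""
    return [n for n in trace_upstream(node, lineage) if n not in lineage]


ancestors = trace_upstream("binning_decision", LINEAGE)
sources = raw_sources("binning_decision", LINEAGE)
print(ancestors)
print(sources)
```

An audit report could then be generated by calling `trace_upstream` for each model output and attaching the recorded transformation logic to every edge.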

## 7. 🎯 Comprehensive Takeaways: Mastering Feature Stores & Monitoring

---

### 1. **The Training-Serving Skew Problem**

**What is Training-Serving Skew:**
- **Training time:** Features computed in batch (Spark, pandas) using historical data
- **Serving time:** Features must be computed in real-time (<10ms) with live data
- **Skew:** Different code paths → different feature values → accuracy degradation

**Real-World Impact:**
```
Training accuracy: 95%
Production accuracy: 75% (20% degradation!)
Root cause: Training used 30-day rolling average (batch),
           Production used 7-day average (different logic)
```

**Solution: Feature Store:**
- **Single source of truth:** Same feature definitions for training and serving
- **Offline store:** Batch features for training (Parquet, Delta Lake)
- **Online store:** Real-time features for inference (Redis, DynamoDB)
- **Consistent computation:** Same code generates offline and online features

**Example:**
```python
import numpy as np

# Feature definition (used for both offline and online).
# `data_source` is any object exposing get_neighbors(); both backends share this logic.
def compute_neighbor_yield_avg(wafer_id, die_x, die_y, data_source):
    neighbors = data_source.get_neighbors(die_x, die_y, radius_mm=3)
    return np.mean([d.yield_pct for d in neighbors])

# Offline: data_source wraps a Spark DataFrame (historical STDF)
# Online: data_source wraps a Redis cache (current wafer)
# Result: Identical features, zero skew
```
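
One way to verify that a shared definition really produces matching values is to measure offline/online agreement on a sample of entities. This is a minimal sketch under stated assumptions: the function name `feature_agreement`, the tolerance, and the sample values are all illustrative, and the two arrays are assumed to be aligned by entity.

```python
import numpy as np


def feature_agreement(offline_values, online_values, rel_tol=1e-6):
    """Fraction of entities whose offline and online values match within tolerance."""
    offline = np.asarray(offline_values, dtype=float)
    online = np.asarray(online_values, dtype=float)
    if offline.shape != online.shape:
        raise ValueError("offline and online arrays must align one-to-one")
    matches = np.isclose(offline, online, rtol=rel_tol, atol=0.0, equal_nan=True)
    return float(matches.mean())


# Illustrative values: the third entity was computed with different logic online.
offline_avg = [92.1, 88.4, 95.0, 90.2]
online_avg = [92.1, 88.4, 93.7, 90.2]

agreement = feature_agreement(offline_avg, online_avg)
print(f"Offline/online agreement: {agreement:.1%}")  # Offline/online agreement: 75.0%
```

Running a check like this on a schedule turns "zero skew" from an assumption into a monitored metric.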

---

### 2. **Feature Store Architecture Patterns**

**Offline Store (Training):**
- **Purpose:** Batch feature generation for model training
- **Storage:** Parquet (S3), Delta Lake, Hive, BigQuery
- **Latency:** Minutes to hours (not time-critical)
- **Volume:** Terabytes (years of historical data)
- **Access pattern:** Large scans (millions of rows for training)

**Online Store (Inference):**
- **Purpose:** Low-latency feature retrieval for predictions
- **Storage:** Redis, DynamoDB, Cassandra, Aerospike
- **Latency:** <5-10ms p99 (strict SLA)
- **Volume:** Gigabytes (recent data only, cache-friendly)
- **Access pattern:** Point lookups (single entity per request)

**Hybrid Pattern:**
```
Training pipeline:
1. Read offline store (30 days of historical features)
2. Train model on batch features
3. Validate model accuracy

Inference pipeline:
1. Read online store (latest features for entity_id)
2. Model.predict(online_features)
3. Serve prediction in <100ms
```

**Key Decision: When to use which store:**
- **Offline only:** Batch predictions (daily reports, email campaigns)
- **Online only:** Real-time predictions (fraud detection, recommendations)
- **Hybrid (most common):** Train on offline, serve from online

---

### 3. **Point-in-Time Correctness**

**The Data Leakage Problem:**
```python
# WRONG: Data leakage from future into past
training_data = []
for date in training_dates:
    features = compute_features(data_up_to=today)  # BUG: Using future data!
    label = get_label(date)
    training_data.append((features, label))

# Result: Model sees future information during training,
#         overestimates accuracy, fails in production
```

**Correct Point-in-Time Join:**
```python
# CORRECT: Only use data available at prediction time
training_data = []
for date in training_dates:
    features = compute_features(data_up_to=date)  # Only past data
    label = get_label(date)
    training_data.append((features, label))

# Result: Model trained on realistic features,
#         accuracy matches production
```

**Implementation:**
- **Timestamp columns:** Every feature row has `feature_timestamp`
- **Join logic:** For each entity, take the most recent row satisfying `feature_timestamp <= prediction_time`
- **Validation:** Historical backtests (predict 2023-01-01 using only 2022 data)
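
A point-in-time join can be sketched with `pandas.merge_asof`, which matches each event to the latest feature row at or before the event time. The tables, column values, and the `training_rows` name below are hypothetical sample data, not part of any particular feature store.

```python
import pandas as pd

# Hypothetical feature history: one row per (wafer_id, feature_timestamp).
features = pd.DataFrame({
    "wafer_id": ["W1", "W1", "W1", "W2"],
    "feature_timestamp": pd.to_datetime(
        ["2024-01-01", "2024-01-05", "2024-01-10", "2024-01-03"]
    ),
    "avg_vdd": [1.20, 1.21, 1.25, 1.19],
})

# Hypothetical prediction events that need features as of `prediction_time`.
events = pd.DataFrame({
    "wafer_id": ["W1", "W2", "W1"],
    "prediction_time": pd.to_datetime(["2024-01-07", "2024-01-02", "2024-01-10"]),
})

# merge_asof requires both frames to be sorted by their time key.
training_rows = pd.merge_asof(
    events.sort_values("prediction_time"),
    features.sort_values("feature_timestamp"),
    left_on="prediction_time",
    right_on="feature_timestamp",
    by="wafer_id",
    direction="backward",  # latest feature row with timestamp <= prediction_time
)
print(training_rows)
```

The W2 event on 2024-01-02 gets a missing `avg_vdd` because its only feature row is dated 2024-01-03; filling it from that row would be exactly the leakage this join is meant to prevent.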

**Post-Silicon Example:**
```python
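import pandas as pd

# Train yield model for date 2024-01-01
# Feature: neighbor_yield_avg (spatial correlation)
# Assumes `devices` is a pandas DataFrame with yield_pct and test_timestamp columns.
cutoff = pd.Timestamp("2024-01-01")

# WRONG: averages over every device, including ones tested after the cutoff
neighbor_yield = devices["yield_pct"].mean()

# CORRECT: restrict to devices tested before the cutoff
neighbor_yield = devices.loc[devices["test_timestamp"] < cutoff, "yield_pct"].mean()

# Validation:
# Backtest on 2023 data, verify predictions match actual (no lookahead)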
```

---

### 4. **Feature Versioning and Evolution**

**Why Feature Versioning:**
- Features evolve over time (new sources, improved logic, bug fixes)
- Models depend on specific feature versions
- Need to reproduce training from 6 months ago (regulatory, debugging)

**Versioning Strategy:**
```
# Feature group version history
wafer_aggregates:
  v1.0 (2024-01-01): Initial version (simple mean/std)
  v1.1 (2024-03-15): Added spatial correlation features
  v2.0 (2024-06-01): Breaking change (neighbor radius 3mm → 5mm)
  v2.1 (2024-08-01): Bug fix (null handling)
```

**Model-Feature Compatibility:**
```
Model: yield_prediction_v3.0
Required features: wafer_aggregates >= v1.1, device_parametrics >= v2.0

# Feature store validates compatibility
if wafer_aggregates.version < v1.1:
    raise IncompatibleFeatureVersion("Need wafer_aggregates v1.1+ for spatial features")
```
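
The compatibility check above can be made concrete with a small version parser. This is a sketch only: `parse_version` and `check_compatibility` are hypothetical helpers, and the rule that any version at or above the minimum is acceptable is taken at face value from the `>=` requirements shown; a real store would probably also reject breaking major-version bumps.

```python
def parse_version(version):
    """Turn 'v1.1' into (1, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


class IncompatibleFeatureVersion(Exception):
    """Raised when a feature group is missing or older than a model requires."""


def check_compatibility(required, available):
    """Raise if any required feature group is missing or below its minimum version."""
    for group, min_version in required.items():
        if group not in available:
            raise IncompatibleFeatureVersion(f"{group} is not available")
        if parse_version(available[group]) < parse_version(min_version):
            raise IncompatibleFeatureVersion(
                f"Need {group} {min_version}+, found {available[group]}"
            )


# Requirements from the yield_prediction_v3.0 example.
required = {"wafer_aggregates": "v1.1", "device_parametrics": "v2.0"}

# Passes silently: both groups meet their minimum versions.
check_compatibility(required, {"wafer_aggregates": "v2.1", "device_parametrics": "v2.0"})

try:
    check_compatibility(required, {"wafer_aggregates": "v1.0", "device_parametrics": "v2.0"})
    compatible = True
except IncompatibleFeatureVersion as exc:
    compatible = False
    print(exc)
```

Parsing into integer tuples matters because a plain string comparison would rank `v1.10` below `v1.9`.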

**Migration Strategy:**
- **Backward compatible (v1.0 → v1.1):** Add new features, keep old ones
- **Breaking change (v1.1 → v2.0):** Retrain all models, coordinated deployment
- **Rollback:** Keep v1.1 online for 30 days (models can rollback if v2.0 fails)

---

### 5. **Real-Time Feature Serving Optimizations**

**Latency Budget:**
```
Total inference SLA: <100ms
- Feature retrieval: 10ms (online store)
- Model prediction: 50ms (forward pass)
- Post-processing: 10ms (formatting, logging)
- Network overhead: 30ms (API latency)
```

**Optimization Techniques:**

**1. Feature Caching:**
```python
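# `db` and `cache` are placeholder clients for a database and an in-memory cache.
# Without cache: Query database for every request (50ms)
features = db.query("SELECT * FROM features WHERE user_id = ?", user_id)

# With cache: In-memory lookup (1ms)
cache_key = f"features:user:{user_id}"
features = cache.get(cache_key)
if features is None:  # cache miss; an empty-but-valid result should not trigger a refetch
    features = db.query("SELECT * FROM features WHERE user_id = ?", user_id)
    cache.set(cache_key, features, ttl=3600)

# Speedup: 50ms → 1ms (50x faster)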
```

**2. Batch Retrieval:**
```python
# Without batching: N separate round trips (N * 5ms)
for user_id in user_ids:
    features = redis.get(f"user:{user_id}")  # 100 calls × 5ms = 500ms

# With batching: one MGET round trip for all keys (a multi-key command, not a pipeline)
features = redis.mget([f"user:{uid}" for uid in user_ids])  # 1 call × 10ms = 10ms

# Speedup: 500ms → 10ms (50x faster)
```

**3. Derived Feature Computation:**
```python
# Base features (cached): user_30day_avg, user_30day_std
# Derived features (computed on-the-fly): 
#   - coefficient_of_variation = std / mean
#   - z_score = (current_value - mean) / std

# Tradeoff: Store 10 base features (small cache), compute 30 derived (fast math)
# vs Store 40 total features (large cache, slower lookups)
```
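
The derived features listed above are cheap arithmetic on the cached base statistics. A minimal sketch, assuming the base features arrive as a dictionary keyed by the names used in the comments; `derive_features` and the sample values are illustrative.

```python
import math


def derive_features(base, current_value):
    """Compute derived features from cached base statistics for one entity."""
    mean = base["user_30day_avg"]
    std = base["user_30day_std"]
    return {
        # Guard against division by zero for constant or empty histories.
        "coefficient_of_variation": std / mean if mean != 0 else math.nan,
        "z_score": (current_value - mean) / std if std != 0 else math.nan,
    }


# Hypothetical cached base features for one user.
cached = {"user_30day_avg": 50.0, "user_30day_std": 10.0}
derived = derive_features(cached, current_value=80.0)
print(derived)  # {'coefficient_of_variation': 0.2, 'z_score': 3.0}
```

Returning NaN for the degenerate cases keeps the serving path from raising; whether NaN or a default value is the right fallback depends on how the model handles missing inputs.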

**4. Cache Warming:**
```
# Pre-fetch features for high-probability users (machine learning!)
# Example: Users likely to visit site in next hour

During low-traffic (2AM-6AM):
  for user_id in predicted_active_users:
      cache.set(f"user:{user_id}", compute_features(user_id))

During peak traffic (12PM-6PM):
  Cache hit rate: 90% (most users pre-cached)
  P99 latency: 5ms (cache hits) instead of 50ms (database queries)
```

---

### 6. **Drift Detection Methods**

**Types of Drift:**

**1. Data Drift (Input Distribution Change):**
- **Definition:** Feature distributions shift over time
- **Example:** avg_vdd changes from 1.2V to 1.25V (process change)
- **Impact:** Model still predicts, but accuracy degrades
- **Detection:** KS test, PSI, distribution comparison

**2. Concept Drift (Input-Output Relationship Change):**
- **Definition:** Relationship between features and target changes
- **Example:** User behavior changes (holiday season, pandemic)
- **Impact:** Same features → different predictions needed
- **Detection:** Accuracy degradation, prediction distribution shift

**3. Label Drift (Output Distribution Change):**
- **Definition:** Target variable distribution changes
- **Example:** Fraud rate increases from 0.1% to 0.5%
- **Impact:** Class imbalance worsens, model predictions biased
- **Detection:** Target distribution comparison

**Statistical Tests:**

**Kolmogorov-Smirnov (KS) Test:**
```python
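# Compare two distributions (reference vs current)
import numpy as np
from scipy.stats import ks_2samp

# Synthetic samples for illustration; in practice load training and production values
rng = np.random.default_rng(seed=42)
reference_vdd = rng.normal(loc=1.20, scale=0.02, size=500)  # Training data
current_vdd = rng.normal(loc=1.25, scale=0.02, size=500)    # Production data

ks_statistic, p_value = ks_2samp(reference_vdd, current_vdd)

# Interpretation:
# p_value < 0.05: Distributions differ significantly (drift detected)
# p_value >= 0.05: No evidence of a difference at this sample size
# Note: with very large samples even tiny shifts become significant,
# so also check the effect size (ks_statistic) before alerting.

if p_value < 0.05:
    print(f"Data drift detected in avg_vdd (KS={ks_statistic:.3f}, p={p_value:.2e})")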
```

**Population Stability Index (PSI):**
```python
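# Binned comparison: bins a numeric feature by baseline quantiles.
# For categorical features, compare category frequencies directly instead.
import numpy as np

def calculate_psi(baseline, current, bins=10, eps=1e-6):
    # Create bins from baseline quantiles; open the outer edges so
    # current values outside the baseline range are still counted
    bin_edges = np.percentile(baseline, np.linspace(0, 100, bins + 1))
    bin_edges[0], bin_edges[-1] = -np.inf, np.inf
    
    # Histogram
    baseline_hist, _ = np.histogram(baseline, bins=bin_edges)
    current_hist, _ = np.histogram(current, bins=bin_edges)
    
    # Normalize, clipping empty bins to eps to avoid log(0) and division by zero
    baseline_pct = np.clip(baseline_hist / len(baseline), eps, None)
    current_pct = np.clip(current_hist / len(current), eps, None)
    
    # PSI formula
    psi = np.sum((current_pct - baseline_pct) * np.log(current_pct / baseline_pct))
    
    return psi

# Interpretation (common rule of thumb):
# PSI < 0.1: No significant change
# PSI 0.1-0.25: Moderate change (investigate)
# PSI > 0.25: Significant change (retrain model)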
```

**When to Use Each:**
- **KS Test:** Continuous features (voltage, current, price)
- **PSI:** Any feature (handles categorical), easier to interpret
- **Chi-Square:** Categorical features (binning category, user segment)
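
For the categorical case, a chi-square test of homogeneity on category counts might look like the sketch below. It uses `scipy.stats.chi2_contingency`; the helper name `categorical_drift`, the 0.05 threshold, and the bin labels and counts are illustrative assumptions.

```python
from collections import Counter

from scipy.stats import chi2_contingency


def categorical_drift(reference, current, alpha=0.05):
    """Chi-square test of homogeneity on category counts from two samples."""
    ref_counts = Counter(reference)
    cur_counts = Counter(current)
    categories = sorted(set(ref_counts) | set(cur_counts))
    # 2 x K contingency table: one row per sample, one column per category.
    table = [
        [ref_counts.get(c, 0) for c in categories],
        [cur_counts.get(c, 0) for c in categories],
    ]
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return {
        "chi2": float(chi2),
        "p_value": float(p_value),
        "dof": int(dof),
        "drift": bool(p_value < alpha),
    }


# Illustrative bin assignments: the share of bin_3 rises sharply in production.
reference_bins = ["bin_1"] * 800 + ["bin_2"] * 150 + ["bin_3"] * 50
current_bins = ["bin_1"] * 650 + ["bin_2"] * 150 + ["bin_3"] * 200

result = categorical_drift(reference_bins, current_bins)
same = categorical_drift(reference_bins, reference_bins)
print(result["drift"], same["drift"])
```

As with the KS test, large samples make small frequency changes significant, so the chi-square statistic is best read alongside the actual change in category shares.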

---

### 7. **Data Quality Validation Rules**

**Schema Validation:**
```python
# Define expected schema
schema = {
    'wafer_id': {'type': 'string', 'required': True, 'nullable': False},
    'vdd': {'type': 'float', 'required': True, 'nullable': False},
    'test_time_ms': {'type': 'float', 'required': True, 'nullable': False}
}

# Validate data
def validate_schema(df, schema):
    issues = []
    
    for field, spec in schema.items():
        # Check if field is present; skip further checks on absent columns
        if field not in df.columns:
            if spec['required']:
                issues.append(f"Missing required field: {field}")
            continue
        
        # Check nullability
        if not spec['nullable'] and df[field].isnull().any():
            issues.append(f"Field {field} has nulls (not allowed)")
    
    return issues
```

**Range Validation:**
```python
# Define ranges
ranges = {
    'vdd': {'min': 1.0, 'max': 1.4},
    'idd': {'min': 0, 'max': 200},
    'temperature': {'min': -40, 'max': 125}
}

# Validate ranges
def validate_ranges(df, ranges):
    issues = []
    
    for field, bounds in ranges.items():
        if field in df.columns:
            violations = df[(df[field] < bounds['min']) | (df[field] > bounds['max'])]
            
            if len(violations) > 0:
                issues.append(f"{field}: {len(violations)} values out of range [{bounds['min']}, {bounds['max']}]")
    
    return issues
```

**Distribution Validation:**
```python
# Compare current vs historical
def validate_distribution(current_df, baseline_stats, threshold_std=2.0):
    issues = []
    
    for col, baseline in baseline_stats.items():
        # Skip columns that are absent or have zero baseline spread (shift is undefined)
        if col not in current_df.columns or baseline['std'] == 0:
            continue
        current_mean = current_df[col].mean()
        shift_stds = abs(current_mean - baseline['mean']) / baseline['std']
        
        if shift_stds > threshold_std:
            issues.append(f"{col}: Mean shifted {shift_stds:.1f} std devs")
    
    return issues
```

**Severity Levels:**
- **Critical:** Block data pipeline (missing required fields, 100% nulls)
- **High:** Alert on-call engineer (type mismatches, excessive nulls)
- **Medium:** Log for review (range violations, unexpected values)
- **Low:** Informational (anomalies, minor issues)
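
The severity levels above can drive an automated gate in the ingestion pipeline. A minimal sketch, assuming each issue is a dictionary with a `severity` key as in the report printed earlier in this notebook; the decision labels (`block`, `alert`, `log`, `pass`) and the `gate_batch` name are hypothetical.

```python
def gate_batch(issues):
    """Map validation issues to a pipeline decision using the four severity levels."""
    counts = {"critical": 0, "high": 0, "medium": 0, "low": 0}
    for issue in issues:
        counts[issue["severity"]] += 1

    if counts["critical"] > 0:
        decision = "block"  # do not process this batch
    elif counts["high"] > 0:
        decision = "alert"  # notify the on-call engineer
    elif counts["medium"] > 0 or counts["low"] > 0:
        decision = "log"  # record for later review
    else:
        decision = "pass"
    return decision, counts


# Hypothetical issues as a validator might emit them.
issues = [
    {"severity": "medium", "message": "vdd: 3 values out of range [1.0, 1.4]"},
    {"severity": "high", "message": "test_time_ms: 12% nulls"},
]
decision, counts = gate_batch(issues)
print(decision, counts)
```

The highest severity present wins, so a batch with one critical issue is blocked regardless of how many lower-severity issues accompany it.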

---

### 8. **Monitoring Metrics and Dashboards**

**Feature Store Metrics:**
```
Offline Store:
- Batch job duration (target: <30 min for 1TB)
- Feature freshness (age of latest features)
- Storage size (GB, growth rate)
- Failed batch jobs (count, error types)

Online Store:
- Latency (p50, p95, p99) - target: <5ms p99
- Throughput (QPS) - target: 10K QPS
- Cache hit rate - target: >85%
- Error rate - target: <0.1%
- Storage size (GB, eviction rate)
```
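
The online-store latency percentiles can be computed from raw request timings with `numpy.percentile`. The sketch below uses synthetic latencies and a hypothetical `latency_summary` helper; the 5ms default mirrors the p99 target listed above.

```python
import numpy as np


def latency_summary(latencies_ms, p99_target_ms=5.0):
    """Summarize request latencies and flag a breach of the p99 target."""
    p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
    return {
        "p50_ms": float(p50),
        "p95_ms": float(p95),
        "p99_ms": float(p99),
        "p99_breach": bool(p99 > p99_target_ms),
    }


# Synthetic latencies: mostly fast cache hits plus a slow tail (illustrative only).
rng = np.random.default_rng(seed=0)
hits = rng.uniform(0.5, 2.0, size=950)
misses = rng.uniform(20.0, 50.0, size=50)
summary = latency_summary(np.concatenate([hits, misses]))
print(summary)
```

In this synthetic sample the median looks healthy while p99 lands in the slow tail, which is why the target is stated on p99 rather than on the average.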

**Model Performance Metrics:**
```
Accuracy:
- Overall accuracy (classification) or MAE/RMSE (regression)
- Accuracy by cohort (new users, power users, etc.)
- Accuracy trend (7-day rolling average)

Drift:
- Feature drift count (how many features drifted)
- Concept drift p-value (KS test)
- Distribution shift magnitude (PSI, std devs)

Alerts:
- Total alerts fired (count per day)
- False positive rate (alerts without real issues)
- Time to resolution (alert → fix deployed)
```

**Data Quality Metrics:**
```
Validation:
- Total records validated (count per day)
- Issues by severity (critical, high, medium, low)
- Pass rate (% batches with no critical issues)

Specific Checks:
- Schema validation pass rate
- Range violations (count, percentage)
- Null rate (% records with nulls)
- Distribution shift count (fields drifted)
- Anomaly rate (% outliers)
```

**Dashboard Layout:**
```
Real-Time Monitoring Dashboard:

Row 1: KPIs
- P99 latency (gauge, target: <10ms)
- Cache hit rate (gauge, target: >85%)
- Error rate (gauge, target: <0.1%)
- Throughput (line chart, 5-minute window)

Row 2: Drift Detection
- Feature drift count (bar chart, by feature)
- Concept drift p-value (line chart, 24-hour window)
- Accuracy trend (line chart, 7-day rolling)

Row 3: Data Quality
- Issues by severity (stacked bar chart)
- Pass rate (line chart, daily)
- Top violated fields (table)

Row 4: Alerts
- Recent alerts (table, last 24 hours)
- Alert history (timeline)
- On-call status (who's responding)
```

---

### 9. **Feature Store Best Practices**

**1. Feature Naming Conventions:**
```python
# Good: Descriptive, unambiguous
user_30day_purchase_count
device_avg_vdd_7day
wafer_neighbor_yield_avg_3mm

# Bad: Ambiguous, unclear time window
user_purchases  # 30 days? All time?
device_voltage  # Average? Max? Current?
wafer_yield  # Individual die? Wafer average?
```

**2. Feature Documentation:**
```python
# Feature metadata
{
    'name': 'wafer_neighbor_yield_avg_3mm',
    'description': 'Average yield of neighboring dies within 3mm radius',
    'type': 'float',
    'range': [0, 100],
    'source': 'stdf.wafer_test.parametric_results',
    'transformation': 'AVG(yield_pct) WHERE DISTANCE(die_x, die_y, x, y) < 3mm',
    'owner': 'fab_analytics_team',
    'created': '2024-01-15',
    'version': 'v1.1',
    'dependencies': ['die_x', 'die_y', 'yield_pct'],
    'sla': 'Updated daily 2AM ET'
}
```

**3. Feature Granularity:**
```python
# Coarse granularity: User-level
user_lifetime_purchases = 450

# Fine granularity: User-product-level
user_product_purchases = {
    'product_A': 100,
    'product_B': 250,
    'product_C': 100
}

# Tradeoff:
# Coarse: Smaller cache, faster lookups, less expressive
# Fine: Larger cache, slower lookups, more expressive

# Rule: Start coarse, add fine-grained when models need it
```

**4. Feature Store as Platform:**
```
Team A (Binning Model):
- Uses: wafer_aggregates, device_parametrics
- Contributes: binning_predictions (for downstream models)

Team B (Yield Model):
- Uses: wafer_aggregates, binning_predictions
- Contributes: yield_forecasts

Team C (Test Optimization):
- Uses: wafer_aggregates, device_parametrics
- Contributes: test_coverage_metrics

# Result: Feature reuse, faster experimentation, consistent features
```

---

### 10. **Monitoring Alert Strategy**

**Alert Levels:**

**P0 (Critical):**
- Production model serving errors >1% (blocking predictions)
- Feature store online store down (inference impossible)
- Data quality: Critical issues (missing required fields)
- **Response:** Page on-call immediately, all hands on deck

**P1 (High):**
- Accuracy drop >10% (sustained for 1 hour)
- Feature drift detected (multiple features)
- Latency p99 >50ms (violating SLA)
- **Response:** Alert team within 15 minutes, investigate immediately

**P2 (Medium):**
- Accuracy drop 5-10% (monitor for 4 hours)
- Single feature drift (investigate root cause)
- Cache hit rate <70% (degraded performance)
- **Response:** Alert team within 1 hour, investigate during business hours

**P3 (Low):**
- Data quality: Low severity issues (anomalies)
- Feature store batch job delayed (not blocking production)
- Minor range violations (<1% of records)
- **Response:** Log for review, no immediate action

**Alert Fatigue Prevention:**
```python
# Bad: Alert on every single issue
if accuracy < baseline_accuracy:
    alert("Accuracy drop!")  # Fires 100 times/day (noise)

# Good: Alert only on sustained issues
accuracy_window = last_1000_predictions()
if accuracy_window.mean() < baseline_accuracy - 0.05:
    if sustained_for_hours >= 4:
        alert("Sustained accuracy drop >5% for 4 hours")
```
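
The "sustained" condition needs state that persists between checks. One possible sketch keeps a sliding window of prediction outcomes and a streak counter; the class name `SustainedDropAlert`, its parameters, and the simulated 80% accuracy are assumptions for illustration.

```python
from collections import deque


class SustainedDropAlert:
    """Fire only when windowed accuracy stays below threshold for N consecutive checks."""

    def __init__(self, baseline, max_drop=0.05, window=1000, required_checks=4):
        self.threshold = baseline - max_drop
        self.outcomes = deque(maxlen=window)  # 1 = correct prediction, 0 = incorrect
        self.required_checks = required_checks
        self.consecutive_breaches = 0

    def record(self, correct):
        self.outcomes.append(1 if correct else 0)

    def check(self):
        """Call on a fixed schedule (e.g. hourly); True means an alert should fire."""
        if not self.outcomes:
            return False
        accuracy = sum(self.outcomes) / len(self.outcomes)
        if accuracy < self.threshold:
            self.consecutive_breaches += 1
        else:
            self.consecutive_breaches = 0  # a healthy check resets the streak
        return self.consecutive_breaches >= self.required_checks


monitor = SustainedDropAlert(baseline=0.95, window=100, required_checks=4)
fired = []
for hour in range(5):
    for i in range(100):
        monitor.record(correct=(i % 5 != 0))  # simulated 80% accuracy each hour
    fired.append(monitor.check())
print(fired)  # [False, False, False, True, True]
```

With hourly checks, `required_checks=4` corresponds to the four-hour rule in the example above; a single healthy check resets the streak, which filters out short dips.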

**Alert Context:**
```python
# Bad: Vague alert
"Drift detected"

# Good: Actionable alert
Alert: Feature Drift Detected
  Feature: avg_vdd
  Baseline: 1.20V ± 0.02V
  Current: 1.25V ± 0.03V
  Shift: +2.5 std devs
  Possible causes:
    - Process change (new fab settings?)
    - Equipment drift (sensor calibration?)
    - Data pipeline issue (unit conversion?)
  Recommended action:
    - Investigate with FAB team
    - Check equipment calibration logs
    - Validate data pipeline
  Runbook: https://wiki.company.com/drift-investigation
```

---

### 11. **Feature Store Scaling Strategies**

**Offline Store Scaling:**

**1. Partitioning:**
```python
# Partition by date (enables incremental updates)
/features/wafer_aggregates/date=2024-01-01/part-00000.parquet
/features/wafer_aggregates/date=2024-01-02/part-00000.parquet

# Partition by entity (enables parallel processing)
/features/user_features/user_id_range=0-999999/part-00000.parquet
/features/user_features/user_id_range=1000000-1999999/part-00000.parquet

# Query optimization:
SELECT * FROM wafer_aggregates WHERE date >= '2024-01-01'
# Only scans relevant partitions (10GB instead of 1TB)
```

**2. Incremental Computation:**
```python
# Naive: Recompute all features daily (expensive)
user_30day_purchases = df.groupby('user_id').agg({'purchase': 'count'})

# Incremental: Only compute for new data
yesterday_purchases = new_data.groupby('user_id').agg({'purchase': 'count'})
user_30day_purchases = (
    old_features.join(yesterday_purchases, 'user_id')
    .select('user_id', (old_count - oldest_day_count + yesterday_count).alias('count'))
)

# Speedup: 10 hours → 30 minutes (20x faster)
```
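
The incremental update can be written out with pandas Series indexed by user: subtract the day leaving the window and add the day entering it. The function name `update_rolling_count` and the sample counts are hypothetical, and the sketch assumes per-day counts are retained so the expiring day can be looked up.

```python
import pandas as pd


def update_rolling_count(rolling, expiring_day, new_day):
    """Slide a per-user rolling count forward by one day.

    rolling:      Series of window counts indexed by user_id
    expiring_day: Series of counts for the day leaving the window
    new_day:      Series of counts for the day entering the window
    """
    updated = (
        rolling.sub(expiring_day, fill_value=0)
        .add(new_day, fill_value=0)
        .astype(int)
    )
    return updated[updated > 0]  # drop users with no purchases left in the window


# Illustrative per-user counts (hypothetical data).
rolling_30d = pd.Series({"u1": 12, "u2": 3, "u3": 1})
day_leaving = pd.Series({"u1": 2, "u3": 1})
day_entering = pd.Series({"u2": 1, "u4": 5})

rolling_30d = update_rolling_count(rolling_30d, day_leaving, day_entering)
print(rolling_30d.to_dict())  # {'u1': 10, 'u2': 4, 'u4': 5}
```

Only two days of data are touched per run instead of thirty, which is where the speedup comes from; the cost is that one bad daily count propagates until the window is recomputed from scratch.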

**Online Store Scaling:**

**1. Sharding:**
```python
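# Single Redis instance: 10K QPS limit
# Sharded Redis cluster: 100K QPS (10 shards)
import zlib

# Use a stable hash: Python's built-in hash() is randomized per process for strings,
# so different workers would route the same user to different shards.
shard = zlib.crc32(str(user_id).encode("utf-8")) % num_shards
redis_shard = redis_cluster[shard]
features = redis_shard.get(f"user:{user_id}")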
```

**2. Read Replicas:**
```python
# Primary: Writes only (feature updates)
# Replicas: Reads only (inference requests)

if operation == 'write':
    redis_primary.set(key, value)
else:
    redis_replica = random.choice(redis_replicas)  # Load balance
    redis_replica.get(key)

# Read capacity: 10K QPS × 5 replicas = 50K QPS
```

**3. TTL-based Eviction:**
```python
# Problem: Cache grows unbounded (memory exhaustion)

# Solution: TTL-based expiration (redis-py signature: setex(name, time, value))
redis.setex(
    name=f"user:{user_id}",
    time=3600,  # Expire after 1 hour
    value=features
)

# Result: Only active users in cache (10M users → 1M cached)
```

---

### 12. **Real-Time vs Batch Feature Pipelines**

**Batch Features (Offline Store):**
```python
# Scheduled nightly job (2AM ET)
from pyspark.sql.functions import avg, count, stddev

# Parentheses allow the method chain to span multiple lines
(
    spark.read.parquet("s3://raw-data/stdf/")
    .groupBy("wafer_id")
    .agg(
        avg("vdd").alias("avg_vdd"),
        stddev("vdd").alias("std_vdd"),
        count("*").alias("device_count")
    )
    .write.parquet("s3://features/wafer_aggregates/date=2024-01-01/")
)

# Characteristics:
# - Latency: 30 minutes (not time-critical)
# - Volume: 1TB (millions of wafers)
# - Schedule: Daily (updated once per day)
# - Use case: Training data generation
```

**Real-Time Features (Online Store):**
```java
// Streaming pipeline (Kafka → Flink → Redis), written against Flink's Java DataStream API
// WaferAggregator and RedisSink are user-defined classes (not shown)
wafer_stream
    .keyBy("wafer_id")
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .aggregate(new WaferAggregator())
    .addSink(new RedisSink());

// Characteristics:
// - Latency: <1 second (window close to feature-available)
// - Volume: 1K wafers/sec (streaming rate)
// - Schedule: Continuous (a tumbling window emits one result per key when each window closes)
// - Use case: Real-time inference
```
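The tumbling-window aggregation itself is simple to sketch in plain Python. This is an offline illustration of the windowing logic only, not a streaming engine: `tumbling_window_avg` is a hypothetical function, timestamps are assumed to be integer seconds, and late or out-of-order events are not handled.

```python
from collections import defaultdict

def tumbling_window_avg(events, window_seconds=3600):
    """Average vdd per (wafer_id, window_start) from (timestamp, wafer_id, vdd) events."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, event count]
    for ts, wafer_id, vdd in events:
        window_start = ts - ts % window_seconds  # align to the window boundary
        acc = sums[(wafer_id, window_start)]
        acc[0] += vdd
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}

events = [
    (10, "W0001", 1.20),
    (1800, "W0001", 1.22),
    (3700, "W0001", 1.30),   # falls into the next one-hour window
    (50, "W0002", 1.18),
]
aggregates = tumbling_window_avg(events)
```

Each event belongs to exactly one window, which is what distinguishes tumbling windows from sliding ones.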

**Hybrid Pattern:**
```python
# Base features: Batch (expensive aggregations)
user_1year_purchases = batch_compute()  # Runs daily

# Delta features: Real-time (incremental updates)
user_today_purchases = stream_compute()  # Runs continuously

# Combined feature:
user_total_purchases = user_1year_purchases + user_today_purchases

# Tradeoff:
# - Batch: Accurate but stale (updated daily)
# - Real-time: Fresh but approximate (only recent data)
# - Hybrid: Best of both (accurate base + fresh delta)
```
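One detail the hybrid pattern depends on is a clear boundary between the two sources; otherwise events near the batch run are counted twice. A minimal sketch, assuming the batch job records the cutoff timestamp of its snapshot (`hybrid_count` and the sample values are hypothetical):

```python
from datetime import datetime

def hybrid_count(batch_count, batch_cutoff, stream_events):
    """Combine a batch aggregate with streamed events newer than the batch cutoff."""
    delta = sum(1 for ts in stream_events if ts > batch_cutoff)
    return batch_count + delta

batch_cutoff = datetime(2024, 1, 1, 2, 0)   # snapshot time of the nightly job
batch_count = 120                            # purchases up to the cutoff
stream_events = [
    datetime(2024, 1, 1, 1, 30),  # already included in the batch count
    datetime(2024, 1, 1, 9, 15),
    datetime(2024, 1, 1, 14, 5),
]
total = hybrid_count(batch_count, batch_cutoff, stream_events)
```

Only the two events after the cutoff are added, so the combined value stays consistent with what a full recomputation would give.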

---

### 13. **Feature Store Integration with ML Lifecycle**

**Training Pipeline:**
```python
import mlflow

# 1. Fetch offline features
features_df = feature_store.get_offline_features(
    feature_groups=['wafer_aggregates', 'device_parametrics'],
    start_date='2023-01-01',
    end_date='2023-12-31'
)

# 2. Train model
model = train_model(features_df)

# 3. Log feature metadata
mlflow.log_param("feature_groups", ['wafer_aggregates', 'device_parametrics'])
mlflow.log_param("feature_versions", {'wafer_aggregates': 'v2.0', 'device_parametrics': 'v1.5'})

# 4. Register model with feature requirements
model_registry.register(
    model=model,
    required_features=['avg_vdd', 'std_vdd', 'neighbor_yield_avg'],
    feature_versions={'wafer_aggregates': 'v2.0'}
)
```

**Inference Pipeline:**
```python
# 1. Fetch online features
features = feature_store.get_online_features(
    entity_ids=['W0001'],
    feature_names=['avg_vdd', 'std_vdd', 'neighbor_yield_avg']
)

# 2. Validate feature compatibility
# (versions must be comparable objects; as raw strings, "v10.0" < "v2.0")
model_metadata = model_registry.get_metadata(model_id)
if features.version < model_metadata.required_version:
    raise IncompatibleFeatureVersion()

# 3. Make prediction
prediction = model.predict(features)

# 4. Log prediction for monitoring
monitor.log_prediction(
    prediction=prediction,
    features=features,
    model_version=model_metadata.version
)
```
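The compatibility check in step 2 needs versions that compare numerically. A small sketch, assuming version labels follow the `vMAJOR.MINOR` form used in the training example and always have the same number of components (`parse_version` and `is_compatible` are hypothetical helpers):

```python
def parse_version(version: str) -> tuple:
    """Turn 'v2.10' into (2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))

def is_compatible(served: str, required: str) -> bool:
    """True when the served feature version is at least the required one."""
    return parse_version(served) >= parse_version(required)

naive = "v10.0" < "v2.0"                  # string comparison orders these incorrectly
ok = is_compatible("v10.0", "v2.0")       # numeric comparison handles it
too_old = is_compatible("v1.5", "v2.0")
```

Whether "newer than required" really means "compatible" depends on the versioning policy; a stricter check might require the same major version.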

---

### 14. **Cost Optimization Strategies**

**Offline Store:**
```python
# Cost: $100/TB/month (illustrative round number; actual S3 pricing varies by region and storage class)

# Optimization 1: Compression
# Uncompressed: 1TB
# Snappy compression: ~333GB (3x smaller, ~$33/month)
df.write.option("compression", "snappy").parquet("s3://features/")

# Optimization 2: Columnar storage
# Row-based (CSV): 1TB
# Columnar (Parquet): 200GB (5x smaller, $20/month)

# Optimization 3: Retention policy
# Keep 30 days: 30TB × $100 = $3,000/month
# Keep 7 days: 7TB × $100 = $700/month (4.3x cheaper)
```
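The storage arithmetic above can be reproduced in a few lines, which makes it easy to plug in real prices and volumes. The rate below is the illustrative figure from the comments, not a quoted price, and `monthly_cost` is a hypothetical helper.

```python
PRICE_PER_TB_MONTH = 100.0  # illustrative rate from the text, not a quoted price

def monthly_cost(size_tb: float, price_per_tb: float = PRICE_PER_TB_MONTH) -> float:
    """Monthly storage cost in dollars for a given size in terabytes."""
    return size_tb * price_per_tb

raw = monthly_cost(1.0)
snappy = monthly_cost(1.0 / 3)        # assuming roughly 3x compression
retain_30 = monthly_cost(30.0)        # 1TB/day kept for 30 days
retain_7 = monthly_cost(7.0)          # 1TB/day kept for 7 days
savings_ratio = retain_30 / retain_7
```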

**Online Store:**
```python
# Cost: $1,000/month (Redis cluster, 100GB memory; illustrative)

# Optimization 1: TTL-based eviction
# All users: 100GB (10M users × 10KB)
# Active users only: 20GB (2M active × 10KB, 5x cheaper)
redis.setex(key, 3600, features)  # redis-py argument order: name, time (seconds), value

# Optimization 2: Feature compression
# Full feature set: 10KB per user
# Compressed: 2KB per user (5x smaller, 5x cheaper)
features = compress(features)  # hypothetical helper, e.g. zlib.compress on serialized bytes

# Optimization 3: Derived features on-the-fly
# Store 10 base features: 1KB
# Compute 20 derived features at read time: 0KB of storage (costs CPU per request)
# Instead of storing 30 features: 3KB
```

---

### 15. **Testing and Validation**

**Feature Store Testing:**
```python
from datetime import datetime

# Test 1: Offline-Online Consistency
def test_offline_online_consistency():
    entity_id = "W0001"
    
    # Get offline features (historical)
    offline_features = feature_store.get_offline_features(
        entity_ids=[entity_id],
        date='2024-01-01'
    )
    
    # Get online features (current)
    online_features = feature_store.get_online_features(
        entity_ids=[entity_id]
    )
    
    # Compare with a tolerance (values should match for the same timestamp;
    # exact float equality is brittle across storage formats)
    assert abs(offline_features['avg_vdd'] - online_features['avg_vdd']) < 1e-6

# Test 2: Point-in-Time Correctness
def test_point_in_time_correctness():
    # Features for 2024-01-01 should not include data from 2024-01-02
    features_jan1 = feature_store.get_offline_features(
        date='2024-01-01'
    )
    
    # All feature timestamps must fall before 2024-01-02 00:00
    # (datetime(2024, 1, 1) is midnight, which would reject same-day data)
    assert features_jan1['feature_timestamp'].max() < datetime(2024, 1, 2)

# Test 3: Feature Lineage Validation
def test_feature_lineage():
    lineage = feature_store.get_feature_lineage('neighbor_yield_avg')
    
    # Should trace back to source table
    assert 'stdf.wafer_test' in lineage['source_tables']
```
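The point-in-time rule that Test 2 checks comes down to an "as-of" lookup: for each training row, take the latest feature value recorded at or before the row's timestamp. A minimal sketch using the standard library (`as_of` and the sample history are hypothetical, not the feature store's API):

```python
from bisect import bisect_right
from datetime import datetime

def as_of(history, when):
    """Return the latest feature value recorded at or before `when`, else None.

    `history` is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, when)
    return history[idx - 1][1] if idx else None

avg_vdd_history = [
    (datetime(2023, 12, 30), 1.19),
    (datetime(2024, 1, 1), 1.21),
    (datetime(2024, 1, 3), 1.25),   # must not leak into a Jan 2 training row
]
value_jan2 = as_of(avg_vdd_history, datetime(2024, 1, 2))
value_before = as_of(avg_vdd_history, datetime(2023, 1, 1))
```

A training row dated January 2 sees the January 1 value, never the January 3 one, and a row older than all history gets no value rather than a future one.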

**Monitoring Testing:**
```python
import numpy as np

# Test 1: Drift Detection Sensitivity
def test_drift_detection():
    # Generate drifted data (seeded so the test is reproducible)
    rng = np.random.default_rng(42)
    baseline = rng.normal(1.2, 0.02, 1000)
    drifted = rng.normal(1.25, 0.02, 1000)  # Mean shifted by 2.5 std
    
    # Should detect drift
    drift_detected = monitor.detect_feature_drift(baseline, drifted)
    assert drift_detected

# Test 2: Alert Thresholds
def test_alert_thresholds():
    # Accuracy drop within threshold (no alert)
    monitor.log_accuracy(baseline=0.95, current=0.93)  # 2-point drop
    assert monitor.alerts == []
    
    # Accuracy drop exceeds threshold (alert)
    monitor.log_accuracy(baseline=0.95, current=0.89)  # 6-point drop
    assert len(monitor.alerts) == 1
```
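For reference, here is one way a drift score such as PSI could be computed, so the test above has something concrete behind it. This is a sketch under stated assumptions, not the notebook's `monitor` implementation: it uses equal-width bins over the baseline range, clamps out-of-range values into the edge bins, and floors empty bins to avoid `log(0)`.

```python
import math
import random

def psi(baseline, current, bins=10):
    """Population Stability Index using equal-width bins over the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Small floor avoids log(0) for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

rng = random.Random(0)  # fixed seed keeps the example deterministic
baseline = [rng.gauss(1.20, 0.02) for _ in range(5000)]
same = [rng.gauss(1.20, 0.02) for _ in range(5000)]
shifted = [rng.gauss(1.25, 0.02) for _ in range(5000)]

psi_same = psi(baseline, same)        # same distribution: close to zero
psi_shifted = psi(baseline, shifted)  # mean shifted by 2.5 std: large
```

A commonly quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though the right cutoffs depend on the feature and the bin count.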

---

### 16. **Debugging and Troubleshooting**

**Common Issues:**

**1. Training-Serving Skew:**
```
Symptom: Training accuracy 95%, production accuracy 75%
Root cause: Different feature logic for offline vs online

Debugging:
1. Check feature definitions (offline vs online)
2. Compare feature values for same entity (should match)
3. Verify point-in-time correctness (no data leakage)

Fix:
- Unify feature computation logic
- Add integration test (offline-online consistency)
```

**2. Feature Store Latency Spikes:**
```
Symptom: P99 latency 50ms (normally 5ms)
Root cause: Cache eviction, database query fallback

Debugging:
1. Check cache hit rate (should be >85%)
2. Monitor cache size (evictions?)
3. Check database load (slow queries?)

Fix:
- Increase cache TTL (reduce evictions)
- Pre-warm cache (scheduled job)
- Optimize database queries (indexes)
```

**3. Drift False Positives:**
```
Symptom: Drift alerts every day (not real drift)
Root cause: Normal variance, not distribution shift

Debugging:
1. Check sample size (too small?)
2. Review threshold (too sensitive?)
3. Visualize distributions (real shift or noise?)

Fix:
- Increase sample size (1000 → 5000)
- Adjust threshold (2 std → 3 std)
- Use longer windows (1 day → 7 days)
```
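Another way to cut false positives is to alert only on sustained drift rather than on a single noisy window. A minimal sketch, assuming one drift score per monitoring window (`sustained_drift`, the threshold, and the sample scores are hypothetical):

```python
def sustained_drift(scores, threshold, min_consecutive=3):
    """Alert only when the score exceeds the threshold for N windows in a row."""
    streak = 0
    for score in scores:
        streak = streak + 1 if score > threshold else 0
        if streak >= min_consecutive:
            return True
    return False

noisy = [0.05, 0.30, 0.04, 0.28, 0.06, 0.03]   # isolated spikes
real = [0.05, 0.27, 0.31, 0.35, 0.40]          # persistent shift
alert_noisy = sustained_drift(noisy, threshold=0.25)
alert_real = sustained_drift(real, threshold=0.25)
```

The trade-off is detection delay: requiring three consecutive windows means a real shift is reported at least three windows after it starts.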

**4. Data Quality Issues:**
```
Symptom: Model exceptions, NaN predictions
Root cause: Missing data, out-of-range values

Debugging:
1. Check data quality report (validation issues?)
2. Identify problematic fields (which nulls/violations?)
3. Trace to source (pipeline bug? sensor failure?)

Fix:
- Add null handling (imputation, defaults)
- Add range validation (clip values)
- Fix upstream pipeline (data ingestion bug)
```
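The null and range checks from the fix list can be sketched as a single validation function that runs before features reach the model. The feature names come from earlier examples; the limits and `validate_record` itself are hypothetical.

```python
import math

def validate_record(record, ranges):
    """Return a list of human-readable issues for one feature record."""
    issues = []
    for name, (low, high) in ranges.items():
        value = record.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            issues.append(f"{name}: missing")
        elif not low <= value <= high:
            issues.append(f"{name}: {value} outside [{low}, {high}]")
    return issues

# Hypothetical limits for illustration only
ranges = {"avg_vdd": (1.0, 1.4), "std_vdd": (0.0, 0.1)}
good = validate_record({"avg_vdd": 1.2, "std_vdd": 0.02}, ranges)
bad = validate_record({"avg_vdd": 1.9, "std_vdd": float("nan")}, ranges)
```

Returning the issues instead of raising lets the caller decide whether to impute, clip, or reject the record, and gives the data quality report something specific to log.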

---

### 17. **Advanced Topics**

**1. Feature Stores at Scale:**
- **Uber's Michelangelo:** 10,000+ features, 100+ models, petabyte scale
- **Airbnb's Zipline:** Real-time + batch features, Kafka + Spark
- **LinkedIn's Feathr:** Declarative feature definitions, automated backfill

**2. Automated Feature Engineering:**
- **Featuretools:** Automated deep feature synthesis (DFS)
- **AutoFeat:** Automatic feature generation and selection
- **tsfresh:** Time-series feature extraction

**3. Feature Store as a Service:**
- **Tecton:** Cloud-native feature store (AWS, GCP, Azure)
- **Feast:** Open-source feature store (Linux Foundation)
- **SageMaker Feature Store:** AWS managed service

**4. Privacy-Preserving Feature Stores:**
- **Differential privacy:** Add noise to features (protect individual privacy)
- **Federated features:** Compute features across devices without centralizing data
- **Homomorphic encryption:** Compute on encrypted features

---

### 18. **Key Takeaways Summary**

✅ **Feature stores eliminate training-serving skew** by ensuring identical feature computation for training (offline) and inference (online)

✅ **Point-in-time correctness prevents data leakage** - only use data available at prediction time, not future information

✅ **Real-time feature serving typically targets a p99 latency of 5-10ms or less** - use caching, batching, and derived features to stay within that budget

✅ **Drift detection catches model degradation early** - monitor feature distributions (KS test, PSI) and accuracy trends

✅ **Data quality validation prevents garbage-in-garbage-out** - schema validation, range checks, distribution monitoring

✅ **Feature versioning enables reproducibility** - track feature definition changes, ensure model-feature compatibility

✅ **Monitoring requires multi-layer approach** - feature store health, model performance, data quality

✅ **Alert strategy prevents fatigue** - P0-P3 severity levels, sustained thresholds, actionable context

✅ **Post-silicon applications are critical** - STDF feature stores, wafer-level aggregations, real-time binning

✅ **Production checklist:** Offline-online consistency, point-in-time correctness, lineage tracking, monitoring dashboards

---

### 19. **Production Readiness Checklist**

**Feature Store:**
- [ ] Offline store implemented (Parquet/Delta Lake on S3)
- [ ] Online store implemented (Redis/DynamoDB with <5ms p99)
- [ ] Point-in-time correctness validated (historical backtests pass)
- [ ] Feature versioning enabled (can reproduce 6 months ago)
- [ ] Lineage tracking (raw data → features → models)
- [ ] Offline-online consistency tests (integration tests pass)

**Monitoring:**
- [ ] Model performance monitoring (accuracy, drift, latency)
- [ ] Feature drift detection (KS test, PSI for all features)
- [ ] Data quality validation (schema, ranges, distributions)
- [ ] Dashboards (real-time metrics, alerts, trends)
- [ ] Alert integration (Slack, PagerDuty, email)
- [ ] Runbooks (troubleshooting guides for common issues)

**Operations:**
- [ ] Scheduled batch jobs (offline features updated daily)
- [ ] Real-time pipelines (streaming features, <1s latency)
- [ ] Cache warming (pre-fetch features for high-probability entities)
- [ ] Backup and recovery (feature store backups, disaster recovery)
- [ ] Cost optimization (compression, retention, TTL eviction)
- [ ] Documentation (feature catalog, API docs, runbooks)

---

### 20. **Next Steps in Learning**

**Notebook 130: ML Observability & Debugging**
- Distributed tracing for ML pipelines (trace feature → model → prediction)
- Model debugging with SHAP/LIME (explain predictions)
- Performance profiling (identify latency bottlenecks)

**Notebook 131: Container Orchestration for ML**
- Kubernetes for model serving (horizontal scaling, health checks)
- Docker multi-stage builds (optimize image size)
- Service mesh (Istio for traffic management, observability)

**Beyond MLOps:**
- **Real-Time ML:** Stream processing (Kafka, Flink), online learning
- **Edge Deployment:** TensorFlow Lite, ONNX, model quantization
- **Federated Learning:** Train across devices without centralizing data

---

**Congratulations! You've mastered production feature stores and real-time monitoring systems.** 🎉

You now understand:
- ✅ Feature store architecture (offline vs online serving)
- ✅ Training-serving skew prevention (point-in-time correctness)
- ✅ Real-time feature serving (<5ms p99 latency)
- ✅ Drift detection (KS test, PSI, concept drift)
- ✅ Data quality monitoring (schema, ranges, distributions)
- ✅ Production-grade monitoring (dashboards, alerts, runbooks)

**You're now equipped to build enterprise ML infrastructure that scales to billions of predictions.** 🚀

## 🎯 Key Takeaways

### When to Use Feature Stores
- **Feature reuse across teams**: Multiple models need same engineered features (device temperature trends, parametric moving averages)
- **Training-serving consistency**: Features computed identically in both batch training and real-time inference
- **Feature lineage & governance**: Track feature versions, dependencies, and transformations for compliance
- **Low-latency serving**: Single-digit-millisecond feature retrieval for online prediction (test floor decisions)
- **Temporal consistency**: Point-in-time lookups prevent data leakage in time-series predictions

### Limitations
- **Infrastructure overhead**: Requires deployment of feature store (Feast, Tecton), storage backends (Redis, DynamoDB), and compute for transformations
- **Learning curve**: Teams must adopt new workflows (feature registration, versioning, retrieval APIs)
- **Latency trade-offs**: Online stores are fast but expensive; offline stores are cheap but slower
- **Not for simple projects**: Overkill for single-model prototypes or teams of fewer than 5 people

### Alternatives
- **Manual pipelines**: Direct feature engineering in training/serving code (simpler but error-prone)
- **Data warehouses**: Store features in BigQuery/Snowflake (lacks low-latency serving)
- **Model-specific preprocessing**: Embed feature logic in model (duplicates code, training-serving skew risk)

### Best Practices
- **Start simple**: Begin with offline store for batch training; add online store when latency matters
- **Feature validation**: Use Great Expectations or custom checks to validate feature distributions
- **Monitoring**: Track feature staleness, null rates, and distribution drift in production
- **Materialization schedules**: Balance freshness vs. compute cost (hourly for critical features, daily for stable ones)
- **Access control**: Implement RBAC for sensitive features (device performance data, proprietary test parameters)

## 📊 Diagnostic Checks Summary

### Implementation Checklist
✅ **Offline Feature Store**
- PostgreSQL/Parquet storage configured for historical features
- Feature registration with schema validation
- Point-in-time correct joins for training data generation

✅ **Online Feature Store**
- Redis/DynamoDB deployed for low-latency serving (<10ms p99)
- Materialization pipeline syncs offline → online stores
- Feature retrieval API integrated into inference service

✅ **Feature Transformations**
- On-demand transformations: Real-time computations (temperature normalization, voltage deltas)
- Batch transformations: Pre-computed aggregations (24hr moving avg current, weekly yield trends)
- Streaming transformations: Kafka-based real-time feature updates

✅ **Monitoring & Observability**
- Feature freshness alerts (data age >1hr for critical features)
- Null rate tracking (>5% nulls trigger investigation)
- Distribution drift detection (KS test p-value <0.05)
- Query latency monitoring (p95 <5ms for online features)
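
The freshness and null-rate checks in this list are simple to express in code. The sketch below uses the thresholds stated above (1 hour, 5%); the function names and sample values are hypothetical.

```python
from datetime import datetime, timedelta

def freshness_alert(last_updated, now, max_age=timedelta(hours=1)):
    """True when the feature is older than the allowed age."""
    return now - last_updated > max_age

def null_rate(values):
    """Fraction of values that are None."""
    return sum(v is None for v in values) / len(values)

now = datetime(2024, 1, 1, 12, 0)
stale = freshness_alert(datetime(2024, 1, 1, 10, 30), now)   # 90 minutes old
fresh = freshness_alert(datetime(2024, 1, 1, 11, 45), now)   # 15 minutes old

rate = null_rate([1.2, None, 1.21, 1.19, None, 1.2, 1.22, 1.18, 1.2, 1.21])
needs_investigation = rate > 0.05
```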

### Quality Metrics
- **Training-serving consistency**: Feature values match within 0.1% between offline training and online serving
- **Latency SLA**: 99th percentile online feature retrieval <10ms (target: 3ms)
- **Data freshness**: Critical features <1hr old, stable features <24hr old
- **Coverage**: >95% of feature requests served from cache (online store hit rate)

### Post-Silicon Validation Applications
**1. Parametric Feature Store for ATE Testing**
- Features: Vdd_rolling_mean_24hr, Idd_percentile_95, freq_deviation_z_score
- Use case: Predict test failures before running expensive long-duration tests
- Business value: 30-40% reduction in test time by skipping predicted-fail devices

**2. Spatial Features for Wafer Map Analysis**
- Features: neighbor_fail_rate_5mm, radial_position_normalized, quadrant_yield_delta
- Use case: Real-time die binning decisions during wafer sort
- Business value: Improved bin accuracy reduces overkill (good dies marked bad) by 15-25%

**3. Temporal Features for Yield Trending**
- Features: lot_yield_ema_7day, fab_defect_density_trend, equipment_uptime_ratio
- Use case: Predict lot-level yield before final test completion
- Business value: Early yield excursions detected 2-3 days sooner, faster root cause analysis

### Business ROI Estimation

**Scenario 1: Medium-Volume Semiconductor Fab (100K wafers/year)**
- Feature reuse across 15 models: $2.5M/year engineering time savings
- Training-serving consistency eliminates skew: $4M/year reduced overkill/underkill
- Low-latency serving enables real-time binning: $6M/year improved device mix
- **Total benefit: $12.5M/year** (cost: $500K infrastructure + $300K team training = $800K; net: $11.7M/year)

**Scenario 2: High-Volume Automotive Semiconductor (500K wafers/year)**
- Enterprise feature platform (50+ models): $12M/year engineering productivity
- Point-in-time correct features prevent data leakage: $8M/year improved model accuracy
- Sub-5ms serving latency for inline decisions: $18M/year test time reduction
- Feature governance for ISO 26262 compliance: $5M/year audit cost savings
- **Total benefit: $43M/year** (cost: $2.5M infrastructure + $1.2M team/ops = $3.7M; net: $39.3M/year)

**Scenario 3: Advanced Node R&D Fab (<10K wafers/year, high complexity)**
- Feature lineage for experiment reproducibility: $1.5M/year research velocity
- Cross-team feature sharing (design, test, yield teams): $3M/year collaboration efficiency
- Feature versioning for A/B testing: $2.5M/year faster model iteration
- **Total benefit: $7M/year** (cost: $400K infrastructure + $200K training = $600K; net: $6.4M/year)
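
The scenario arithmetic can be checked, or rerun with your own estimates, using a short script. The figures are the ones listed above in $M/year; the dictionary keys and `summarize` are hypothetical names.

```python
# Benefits and costs in $M/year, taken from the three scenarios above
scenarios = {
    "medium_volume": {"benefits": [2.5, 4.0, 6.0], "costs": [0.5, 0.3]},
    "high_volume": {"benefits": [12.0, 8.0, 18.0, 5.0], "costs": [2.5, 1.2]},
    "rd_fab": {"benefits": [1.5, 3.0, 2.5], "costs": [0.4, 0.2]},
}

def summarize(scenario):
    """Total benefit, total cost, net benefit, and benefit-to-cost ratio."""
    benefit, cost = sum(scenario["benefits"]), sum(scenario["costs"])
    return {
        "benefit": benefit,
        "cost": cost,
        "net": benefit - cost,
        "benefit_cost_ratio": benefit / cost,
    }

summary = {name: summarize(s) for name, s in scenarios.items()}
```

Reporting the benefit-to-cost ratio alongside the net figure makes scenarios of different sizes easier to compare.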

---

## 🎓 Mastery Achievement

**You now have production-grade expertise in:**
- ✅ Designing offline + online feature stores with Redis/PostgreSQL for <10ms serving latency
- ✅ Implementing point-in-time correct feature joins to prevent data leakage in time-series models
- ✅ Building materialization pipelines that sync batch features to real-time stores
- ✅ Monitoring feature freshness, null rates, and distribution drift in production
- ✅ Applying feature stores to semiconductor yield prediction, parametric testing, and wafer map analysis

**Next Steps:**
- **Advanced Feature Engineering**: Time-series features, spatial aggregations, graph-based features
- **Feature Selection**: SHAP-based feature importance, correlation analysis, recursive elimination
- **Feature Store at Scale**: Multi-region replication, disaster recovery, horizontal scaling strategies