# AWS Insurance Demo - Latency Testing

This notebook demonstrates how to benchmark the Feast feature server for insurance use cases.

## Use Case Latency Requirements

| Use Case | Transaction Type | Target P99 |
|----------|-----------------|------------|
| Auto Underwriting | Real-Time (PCM) | < 50ms |
| Quick Quote | Real-Time (PCM) | < 20ms |
| Claims Assessment | Batch | < 200ms |
| Fraud Detection | Streaming (DSS) | < 30ms |
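
The targets in the table can be kept in code so that measured results are checked against them the same way each time. This is a minimal sketch: `P99_TARGETS_MS` and `meets_target` are hypothetical names introduced here, not part of Feast or the benchmark script, and the table's "< N ms" is assumed to mean a strict inequality.

```python
# Hypothetical helper: P99 targets in milliseconds, copied from the table above.
P99_TARGETS_MS = {
    "Auto Underwriting": 50,
    "Quick Quote": 20,
    "Claims Assessment": 200,
    "Fraud Detection": 30,
}


def meets_target(use_case: str, measured_p99_ms: float) -> bool:
    """Return True if the measured P99 is strictly below the use case's target."""
    return measured_p99_ms < P99_TARGETS_MS[use_case]
```

An unknown use case raises `KeyError`, which is probably preferable to silently passing a check against a missing target.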

In [None]:
import asyncio
import time
import statistics
import pandas as pd
import matplotlib.pyplot as plt

# Feature server URL (update if running on different host/port)
FEATURE_SERVER_URL = "http://localhost:6566"

print(f"Feature Server URL: {FEATURE_SERVER_URL}")

## Quick Latency Test

Simple synchronous test to verify the feature server is responding.
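
The test reports P50 and P99 by indexing into a sorted list of latencies. With only 10 to 20 requests, the P99 value is simply the slowest request, so it should be read as a rough smoke-test figure rather than a stable tail estimate. The sketch below shows the nearest-rank definition of a percentile to make that explicit; `nearest_rank_percentile` is a hypothetical helper, not a function used elsewhere in this notebook.

```python
import math


def nearest_rank_percentile(samples, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

For 20 samples, the P99 rank is `ceil(19.8) = 20`, which is the maximum; roughly 100 or more samples are needed before P99 differs from the slowest request.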

In [None]:
import requests

def quick_latency_test(feature_service: str, entity_key: str, entity_values: list, num_requests: int = 10):
    """Quick synchronous latency test."""
    url = f"{FEATURE_SERVER_URL}/get-online-features"
    payload = {
        "feature_service": feature_service,
        "entities": {entity_key: entity_values},
        "full_feature_names": False,
    }
    
    latencies = []
    errors = 0
    
    for i in range(num_requests):
        start = time.perf_counter()
        try:
            response = requests.post(url, json=payload, timeout=30)
            latency_ms = (time.perf_counter() - start) * 1000
            if response.status_code == 200:
                latencies.append(latency_ms)
            else:
                errors += 1
                if i == 0:
                    print(f"Error: {response.text[:200]}")
        except requests.RequestException as e:
            errors += 1
            if i == 0:
                print(f"Error: {e}")
    
    if latencies:
        sorted_l = sorted(latencies)
        print(f"Feature Service: {feature_service}")
        print(f"Batch Size: {len(entity_values)}")
        print(f"Requests: {num_requests} ({errors} errors)")
        print(f"Latency (ms): mean={statistics.mean(latencies):.2f}, "
              f"p50={sorted_l[len(sorted_l)//2]:.2f}, "
              f"p99={sorted_l[int(len(sorted_l)*0.99)]:.2f}")
        return latencies
    else:
        print(f"All {num_requests} requests failed!")
        return []

# Test with a simple feature service
print("Quick Latency Test")
print("=" * 60)
latencies = quick_latency_test(
    feature_service="benchmark_small",
    entity_key="customer_id",
    entity_values=["CUST00000001"],
    num_requests=20
)

## Comprehensive Benchmark

Run the full benchmark suite using the benchmark script.

In [None]:
# Run the benchmark script with the standard suite
!cd ../scripts && python benchmark_online_server.py \
    --server-url {FEATURE_SERVER_URL} \
    --suite standard \
    --batch-sizes 1,10,50 \
    --num-requests 50 \
    --concurrency 5

## Batch Size Impact Analysis

Test how latency scales with batch size for the underwriting use case.
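
One way to summarize how latency scales is to fit a straight line, `latency_ms ≈ intercept + slope * batch_size`, where the intercept approximates fixed per-request overhead and the slope approximates the marginal cost per entity. The sketch below is an ordinary least-squares fit in plain Python. `fit_latency_line` is a hypothetical helper, and the linear model is an assumption that may not hold for every online store.

```python
def fit_latency_line(batch_sizes, latencies_ms):
    """Ordinary least-squares fit of latency_ms ~ intercept + slope * batch_size.

    Returns (intercept, slope).
    """
    n = len(batch_sizes)
    if n < 2 or n != len(latencies_ms):
        raise ValueError("need at least two (batch_size, latency) pairs")
    mean_x = sum(batch_sizes) / n
    mean_y = sum(latencies_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in batch_sizes)
    if sxx == 0:
        raise ValueError("batch sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(batch_sizes, latencies_ms))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic placeholder numbers for illustration only, not measurements.
intercept, slope = fit_latency_line([1, 10, 50], [5.0, 9.5, 29.5])
```

Once the analysis has produced `results_df`, the same helper could be called with `results_df['batch_size'].tolist()` and `results_df['p99'].tolist()`.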

In [None]:
import random

def generate_customer_ids(n):
    return [f"CUST{random.randint(1, 10000):08d}" for _ in range(n)]

# Test different batch sizes
batch_sizes = [1, 5, 10, 25, 50, 100]
results = []

print("Batch Size Impact Analysis")
print("Feature Service: underwriting_v1")
print("=" * 60)

for batch_size in batch_sizes:
    entity_values = generate_customer_ids(batch_size)
    latencies = quick_latency_test(
        feature_service="underwriting_v1",
        entity_key="customer_id",
        entity_values=entity_values,
        num_requests=20
    )
    if latencies:
        sorted_l = sorted(latencies)
        results.append({
            'batch_size': batch_size,
            'mean': statistics.mean(latencies),
            'p50': sorted_l[len(sorted_l)//2],
            'p99': sorted_l[int(len(sorted_l)*0.99)],
        })
    print()

# Display results
results_df = pd.DataFrame(results)
display(results_df)

In [None]:
# Visualize results
if results_df is not None and len(results_df) > 0:
    fig, ax = plt.subplots(figsize=(10, 6))
    
    ax.plot(results_df['batch_size'], results_df['mean'], 'b-o', label='Mean')
    ax.plot(results_df['batch_size'], results_df['p50'], 'g--o', label='P50')
    ax.plot(results_df['batch_size'], results_df['p99'], 'r-o', label='P99')
    
    # Add target line
    ax.axhline(y=50, color='orange', linestyle='--', label='Target P99 (50ms)')
    
    ax.set_xlabel('Batch Size')
    ax.set_ylabel('Latency (ms)')
    ax.set_title('Latency vs Batch Size - Underwriting Feature Service')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("No results to visualize. Ensure the feature server is running.")

## Summary and Recommendations

Based on the benchmark results, consider the following optimizations:

1. **For Real-Time PCM**: Keep batch size ≤ 10 for sub-50ms latency
2. **For Batch Claims**: Larger batch sizes (50-100) are acceptable given the < 200ms P99 target
3. **For Streaming DSS**: Single entity lookups recommended for lowest latency
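
If a caller holds more entities than the recommended batch size, one option is to split the lookup into smaller requests. The sketch below assumes a hypothetical helper, `chunk_entities`, with a default cap of 10 taken from the first recommendation. Whether splitting is a net win depends on per-request overhead, so it is worth measuring before adopting.

```python
def chunk_entities(entity_values, max_batch_size=10):
    """Yield consecutive slices of entity_values, each at most max_batch_size long."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(entity_values), max_batch_size):
        yield entity_values[start:start + max_batch_size]
```

Each chunk can then be sent as the `entity_values` of its own request, for example through `quick_latency_test`.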

### Scaling Recommendations

- **DynamoDB**: Enable auto-scaling and consider DAX for caching
- **Feature Server**: Deploy multiple instances behind a load balancer
- **Redshift**: Use appropriate cluster size for offline materialization