# AWS Insurance Demo - Setup and Data Preparation

This notebook walks through the setup process for the AWS Insurance Demo feature store.

## Prerequisites
- AWS credentials configured
- Redshift cluster or serverless workgroup available
- S3 bucket created for registry and staging
- DynamoDB access for online store

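## Steps
1. Generate sample data
2. Configure feature store
3. Apply Feast definitions
4. Materialize features to online store
5. Verify setup
6. Demo: real-time underwriting
7. Demo: batch claims assessment
8. Demo: Customer 360 view
9. Start the feature server (REST API)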

In [None]:
# Install required packages if needed (the extra is quoted so shells such as zsh do not expand the brackets)
# !pip install "feast[aws]" pandas numpy faker pyarrow

import os
import sys
from datetime import datetime, timedelta
import pandas as pd
import numpy as np

# Add scripts directory to path
sys.path.insert(0, '../scripts')

print("=" * 70)
print("Environment Setup")
print("=" * 70)
print(f"Python version: {sys.version}")
print(f"Pandas version: {pd.__version__}")
print(f"Current time: {datetime.now()}")
print("Setup complete!")

## 1. Generate Sample Data

Generate sample data for local testing. For production, you would load this data into Redshift.
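
The internals of `setup_redshift_data.py` are not shown in this notebook, so the following is only a minimal sketch of how seeded, reproducible sample data could be produced. The helper names and the 300 to 850 credit score range are assumptions made for illustration; the ID format follows the `CUST00000001` style used by the test customers later in this notebook.

```python
import random


def make_customer_ids(num_customers: int) -> list[str]:
    """Return IDs such as 'CUST00000001' (prefix plus 8 zero-padded digits)."""
    return [f"CUST{i:08d}" for i in range(1, num_customers + 1)]


def sample_credit_scores(num_customers: int, seed: int = 42) -> list[int]:
    """Draw illustrative credit scores from a private, seeded generator."""
    rng = random.Random(seed)  # private generator: global random state is untouched
    return [rng.randint(300, 850) for _ in range(num_customers)]


ids = make_customer_ids(3)
print(ids)  # ['CUST00000001', 'CUST00000002', 'CUST00000003']

# The same seed yields the same values, which is what --seed 42 is for.
print(sample_credit_scores(3, seed=42) == sample_credit_scores(3, seed=42))  # True
```

Keeping the seed fixed means a rerun of the notebook previews the same rows, which makes later cells easier to compare across runs.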

In [None]:
# Configuration - Choose your data setup mode
NUM_CUSTOMERS = 10000  # Adjust as needed (1000 for quick test, 10000+ for realistic)
DATA_OUTPUT_DIR = '../data/sample'

# ============================================================================
# OPTION 1: LOCAL MODE (No AWS required - for testing)
# ============================================================================
# This generates parquet files locally for testing without Redshift

print("=" * 70)
print("Data Setup - LOCAL MODE")
print("=" * 70)
print(f"Generating {NUM_CUSTOMERS:,} customers...")
print()

# Run the consolidated setup script in local mode
!cd ../scripts && python setup_redshift_data.py \
    --local-only \
    --output-dir {DATA_OUTPUT_DIR} \
    --num-customers {NUM_CUSTOMERS} \
    --seed 42

In [None]:
# Load the generated data for preview
import os

data = {
    'profiles': pd.read_parquet(os.path.join(DATA_OUTPUT_DIR, 'customer_profiles.parquet')),
    'credit': pd.read_parquet(os.path.join(DATA_OUTPUT_DIR, 'customer_credit.parquet')),
    'risk': pd.read_parquet(os.path.join(DATA_OUTPUT_DIR, 'customer_risk.parquet')),
    'claims': pd.read_parquet(os.path.join(DATA_OUTPUT_DIR, 'claims_history.parquet')),
    'transactions': pd.read_parquet(os.path.join(DATA_OUTPUT_DIR, 'transactions.parquet')),
}

# Preview the generated data
print("=" * 70)
print("CUSTOMER PROFILES - Demographics & Account History")
print("=" * 70)
print(f"Shape: {data['profiles'].shape}")
display(data['profiles'].head())

print("\n" + "=" * 70)
print("CUSTOMER CREDIT - Credit Scores & Financial Indicators")
print("=" * 70)
print(f"Shape: {data['credit'].shape}")
display(data['credit'].head())

In [None]:
print("=" * 70)
print("CUSTOMER RISK - Risk Metrics & Behavioral Patterns")
print("=" * 70)
print(f"Shape: {data['risk'].shape}")
display(data['risk'].head())

print("\n" + "=" * 70)
print("CLAIMS HISTORY - Historical Claims Data")
print("=" * 70)
print(f"Shape: {data['claims'].shape}")
display(data['claims'].head())

In [None]:
print("=" * 70)
print("TRANSACTIONS - Transaction Data for Fraud Detection")
print("=" * 70)
print(f"Shape: {data['transactions'].shape}")
display(data['transactions'].head())

### OPTION 2: Redshift Mode (Production)

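To load data directly into Redshift (recommended for production), run the following in a terminal from the repository root instead of the local-mode cell above. Replace the placeholder host, credentials, bucket, and IAM role with your own values: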

```bash
# Set environment variables
export REDSHIFT_HOST=my-cluster.xxxxx.us-west-2.redshift.amazonaws.com
export REDSHIFT_DATABASE=feast_db
export REDSHIFT_USER=feast_user
export REDSHIFT_PASSWORD=your_password

# Run the setup script (generates and loads directly to Redshift)
cd scripts
python setup_redshift_data.py --num-customers 10000

# For large datasets, use S3 COPY for faster loading:
python setup_redshift_data.py \
    --num-customers 100000 \
    --use-s3 \
    --s3-bucket my-feast-bucket \
    --iam-role arn:aws:iam::123456789012:role/RedshiftS3Role
```
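
Before launching the script in Redshift mode, it can help to confirm that the four variables exported above are actually set in the current process. The helper below is a hypothetical convenience, not part of the demo scripts, and it assumes the script reads exactly these four names.

```python
import os

# Names taken from the export commands above.
REQUIRED_VARS = (
    "REDSHIFT_HOST",
    "REDSHIFT_DATABASE",
    "REDSHIFT_USER",
    "REDSHIFT_PASSWORD",
)


def missing_env_vars(required=REQUIRED_VARS, env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All Redshift environment variables are set.")
```

Note that variables exported in a separate terminal are not visible to an already running notebook kernel, so this check is most useful in the same shell session that runs the script.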

## 2. Configure Feature Store

Before running `feast apply`, update the `feature_store.yaml` with your AWS settings.
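
As a rough checklist for that update, the sketch below scans an already parsed configuration for the settings an AWS deployment with Redshift and DynamoDB typically needs. The key names are assumptions based on Feast's AWS provider settings and may differ between Feast versions; the project name is hypothetical, and the other example values reuse placeholders from this notebook.

```python
# Keys below are assumed from Feast's AWS provider settings and may differ
# between Feast versions; treat this as a checklist, not a schema.
TOP_LEVEL_KEYS = ("project", "registry", "provider")
SECTION_KEYS = {
    "online_store": ("type", "region"),
    "offline_store": ("type", "region", "database", "s3_staging_location", "iam_role"),
}


def find_missing_keys(config: dict) -> list[str]:
    """Return dotted names of settings that are absent from a parsed config."""
    missing = [key for key in TOP_LEVEL_KEYS if key not in config]
    for section, keys in SECTION_KEYS.items():
        block = config.get(section) or {}
        missing.extend(f"{section}.{key}" for key in keys if key not in block)
    # A provisioned cluster and a serverless workgroup are alternatives.
    offline = config.get("offline_store") or {}
    if "cluster_id" not in offline and "workgroup" not in offline:
        missing.append("offline_store.cluster_id or offline_store.workgroup")
    return missing


example_config = {
    "project": "aws_insurance_demo",  # hypothetical project name
    "registry": "s3://my-feast-bucket/registry.db",
    "provider": "aws",
    "online_store": {"type": "dynamodb", "region": "us-west-2"},
    "offline_store": {"type": "redshift", "region": "us-west-2", "database": "feast_db"},
}
print(find_missing_keys(example_config))
```

In the notebook, the dictionary could come from parsing `feature_store.yaml` with `yaml.safe_load`, assuming PyYAML is installed in the environment.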

In [None]:
# View current configuration
print("=" * 70)
print("Current feature_store.yaml configuration:")
print("=" * 70)
with open('../feature_repo/feature_store.yaml', 'r') as f:
    print(f.read())

## 3. Apply Feast Definitions

This registers all entities, feature views, and feature services with the feature store.

In [None]:
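# Change to feature repo directory (skipped when already there, so the cell can be re-run)
if os.path.basename(os.getcwd()) != 'feature_repo':
    os.chdir('../feature_repo')
print(f"Current directory: {os.getcwd()}")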

# Apply Feast definitions
print("\n" + "=" * 70)
print("Applying Feast definitions...")
print("=" * 70)
!feast apply

In [None]:
# List registered components
print("=" * 70)
print("REGISTERED ENTITIES:")
print("=" * 70)
!feast entities list

print("\n" + "=" * 70)
print("REGISTERED FEATURE VIEWS:")
print("=" * 70)
!feast feature-views list

In [None]:
print("=" * 70)
print("REGISTERED ON-DEMAND FEATURE VIEWS:")
print("=" * 70)
!feast on-demand-feature-views list

print("\n" + "=" * 70)
print("REGISTERED FEATURE SERVICES:")
print("=" * 70)
!feast feature-services list

## 4. Materialize Features to Online Store (DynamoDB)

This loads features from the offline store (Redshift) into the online store (DynamoDB) for fast retrieval.
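
Materialization works on a time window, and timestamps without a zone can be ambiguous when the notebook and the data live in different time zones. The sketch below builds the window from UTC. The function name and the 30-day default are illustrative assumptions, not part of Feast.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def materialization_window(
    days_back: int = 30, now: Optional[datetime] = None
) -> Tuple[str, str]:
    """Return (start, end) timestamps in UTC, formatted for the Feast CLI."""
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days_back)
    fmt = "%Y-%m-%dT%H:%M:%S"
    return start.strftime(fmt), end.strftime(fmt)


# A fixed "now" makes the result easy to verify by hand.
fixed_now = datetime(2024, 3, 31, 12, 0, 0, tzinfo=timezone.utc)
print(materialization_window(30, now=fixed_now))
# ('2024-03-01T12:00:00', '2024-03-31T12:00:00')
```

The two strings can be passed to `feast materialize` as the start and end arguments, or the second one alone to `feast materialize-incremental`.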

In [None]:
# Set date range for materialization
# Materialize from 30 days ago to now
start_date = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%dT%H:%M:%S')
end_date = datetime.now().strftime('%Y-%m-%dT%H:%M:%S')

print("=" * 70)
print("Materializing Features")
print("=" * 70)
print(f"Date range: {start_date} to {end_date}")
print()

# Materialize features to online store (DynamoDB) for the date range above
!feast materialize {start_date} {end_date}

# On subsequent runs, load only the data added since the last materialization:
# !feast materialize-incremental {end_date}

## 5. Verify Setup

Test online feature retrieval to ensure everything is working.

In [None]:
from feast import FeatureStore

# Initialize feature store
store = FeatureStore(repo_path='.')

print("=" * 70)
print("FEAST FEATURE STORE INITIALIZED")
print("=" * 70)
print(f"Project: {store.project}")

# List all feature views with details
print("\n" + "=" * 70)
print("FEATURE VIEWS:")
print("=" * 70)
for fv in store.list_feature_views():
    entities = [e.name if hasattr(e, 'name') else str(e) for e in fv.entities]
    print(f"  {fv.name}")
    print(f"    - Entities: {entities}")
    print(f"    - Features: {len(fv.features)}")
    print(f"    - TTL: {fv.ttl}")
    print()

In [None]:
# List on-demand feature views
print("=" * 70)
print("ON-DEMAND FEATURE VIEWS (Real-Time Transformations):")
print("=" * 70)
for odfv in store.list_on_demand_feature_views():
    print(f"  {odfv.name}")
    print(f"    - Features: {[f.name for f in odfv.features]}")
    print(f"    - Mode: {odfv.mode}")
    print()

In [None]:
# List feature services
print("=" * 70)
print("FEATURE SERVICES:")
print("=" * 70)
for fs in store.list_feature_services():
    print(f"  {fs.name}")
    if fs.tags:
        print(f"    - Tags: {fs.tags}")
    if fs.description:
        desc = fs.description[:60] + "..." if len(fs.description) > 60 else fs.description
        print(f"    - Description: {desc}")
    print()

---
## 6. Demo - Real-Time Underwriting (PCM)

Demonstrate the real-time auto underwriting use case.

### Scenario
A customer is requesting an auto insurance quote. The system needs to:
1. Retrieve customer profile, credit, and risk features
2. Calculate real-time risk scores
3. Generate premium estimate
4. Make underwriting decision
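
The cells in this section call `.to_dict()` on the retrieval result, which yields one list per feature rather than one record per customer. For steps 2 to 4 it is often easier to work row by row, and to notice customers whose features came back empty. The helpers below are a sketch with hypothetical names; the sample values are invented for illustration.

```python
def to_rows(feature_dict: dict) -> list:
    """Pivot {'feature': [v1, v2, ...]} into one dict per entity."""
    keys = list(feature_dict)
    return [dict(zip(keys, values)) for values in zip(*feature_dict.values())]


def missing_features(rows: list, entity_key: str = "customer_id") -> dict:
    """Map each entity to the features that came back as None."""
    report = {}
    for row in rows:
        empty = [name for name, value in row.items() if value is None]
        if empty:
            report[row[entity_key]] = empty
    return report


# Illustrative values only; real output depends on the materialized data.
sample = {
    "customer_id": ["CUST00000001", "CUST00000002"],
    "credit_score": [712, None],
    "risk_segment": ["low", None],
}
rows = to_rows(sample)
print(rows[0])
print(missing_features(rows))  # {'CUST00000002': ['credit_score', 'risk_segment']}
```

A customer with every feature missing usually points to an ID that was never materialized rather than to a genuinely empty profile.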

In [None]:
# Test customer IDs from generated data
test_customer_ids = ['CUST00000001', 'CUST00000002', 'CUST00000003', 'CUST00000005', 'CUST00000010']

print("=" * 70)
print("DEMO: Real-Time Underwriting Feature Retrieval")
print("=" * 70)
print(f"Testing with customer IDs: {test_customer_ids}")
print()

# Method 1: Using Feature Service (Recommended)
print("Method 1: Using Feature Service 'underwriting_v1'")
print("-" * 70)

try:
    underwriting_features = store.get_online_features(
        features=store.get_feature_service("underwriting_v1"),
        entity_rows=[{'customer_id': cid} for cid in test_customer_ids]
    ).to_dict()
    
    print("✓ Successfully retrieved underwriting features!")
    underwriting_df = pd.DataFrame(underwriting_features)
    display(underwriting_df)
    
except Exception as e:
    print(f"✗ Error: {e}")
    print("\nNote: This is expected if features haven't been materialized to the online store.")

In [None]:
# Method 2: Using individual feature references
print("Method 2: Using Individual Feature References")
print("-" * 70)

try:
    individual_features = store.get_online_features(
        features=[
            # Customer profile features
            'customer_profile_features:age',
            'customer_profile_features:gender',
            'customer_profile_features:state',
            'customer_profile_features:region_risk_zone',
            'customer_profile_features:customer_tenure_months',
            'customer_profile_features:loyalty_tier',
            # Credit features
            'customer_credit_features:credit_score',
            'customer_credit_features:credit_score_tier',
            'customer_credit_features:insurance_score',
            'customer_credit_features:bankruptcy_flag',
            # Risk features
            'customer_risk_features:overall_risk_score',
            'customer_risk_features:claims_risk_score',
            'customer_risk_features:num_claims_3y',
            'customer_risk_features:risk_segment',
            'customer_risk_features:underwriting_tier',
        ],
        entity_rows=[{'customer_id': cid} for cid in test_customer_ids]
    ).to_dict()
    
    print("✓ Successfully retrieved individual features!")
    individual_df = pd.DataFrame(individual_features)
    display(individual_df)
    
except Exception as e:
    print(f"✗ Error: {e}")

In [None]:
# Quick Quote Feature Service (minimal latency)
print("=" * 70)
print("DEMO: Quick Quote (Minimal Latency)")
print("=" * 70)

try:
    quick_quote_features = store.get_online_features(
        features=store.get_feature_service("underwriting_quick_quote"),
        entity_rows=[{'customer_id': cid} for cid in test_customer_ids]
    ).to_dict()
    
    print("✓ Quick Quote features retrieved!")
    quick_df = pd.DataFrame(quick_quote_features)
    display(quick_df)
    
except Exception as e:
    print(f"✗ Error: {e}")

---
## 7. Demo - Batch Claims Assessment

Demonstrate the batch claims optimization use case.

### Scenario
Process a batch of claims for:
1. Fraud risk assessment
2. Reserve calculation
3. Priority assignment
4. SIU referral recommendation
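
This demo uses only five claims, but a real batch may contain many more customers than is comfortable to send in a single online request. One option is to split the entity rows into fixed-size chunks, as sketched below. The helper names are hypothetical, and a suitable chunk size depends on your online store limits.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_entity_rows(customer_ids: Iterable[str]) -> List[dict]:
    """Build entity rows in the same shape the retrieval cells use."""
    return [{"customer_id": cid} for cid in customer_ids]


customer_ids = [f"CUST{i:08d}" for i in range(1, 6)]
for batch in chunked(build_entity_rows(customer_ids), size=2):
    print(len(batch), batch[0]["customer_id"])
```

Each batch can then be passed as `entity_rows` to a separate `get_online_features` call, and the results concatenated afterwards.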

In [None]:
# Get sample claim IDs from generated data
sample_claims = data['claims'].head(5)
test_claim_ids = sample_claims['claim_id'].tolist()
test_customer_ids_claims = sample_claims['customer_id'].tolist()

print("=" * 70)
print("DEMO: Batch Claims Assessment")
print("=" * 70)
print(f"Testing with claim IDs: {test_claim_ids}")
print(f"Associated customers: {test_customer_ids_claims}")
print()

# Display sample claims info
print("Sample Claims Data:")
display(sample_claims[['claim_id', 'customer_id', 'claim_type', 'claim_amount_requested', 'claim_status', 'fraud_score']])

In [None]:
# Retrieve claims-related features for customers
print("Retrieving Claims Aggregation Features for Customers:")
print("-" * 70)

try:
    claims_features = store.get_online_features(
        features=[
            'claims_aggregation_features:total_claims_lifetime',
            'claims_aggregation_features:claims_count_1y',
            'claims_aggregation_features:claims_count_3y',
            'claims_aggregation_features:total_claims_amount_1y',
            'claims_aggregation_features:avg_claim_amount',
            'claims_aggregation_features:fraud_claims_count',
            'claims_aggregation_features:suspicious_claim_ratio',
            'claims_aggregation_features:avg_settlement_days',
        ],
        entity_rows=[{'customer_id': cid} for cid in test_customer_ids_claims]
    ).to_dict()
    
    print("✓ Claims aggregation features retrieved!")
    claims_df = pd.DataFrame(claims_features)
    display(claims_df)
    
except Exception as e:
    print(f"✗ Error: {e}")

---
## 8. Demo - Customer 360 View

Retrieve a comprehensive view of a customer using the `customer_360_v1` feature service.

In [None]:
print("=" * 70)
print("DEMO: Customer 360 View")
print("=" * 70)

# Select a single customer for detailed view
demo_customer_id = 'CUST00000001'
print(f"Customer ID: {demo_customer_id}")
print()

try:
    customer_360 = store.get_online_features(
        features=store.get_feature_service("customer_360_v1"),
        entity_rows=[{'customer_id': demo_customer_id, 'policy_id': 'POL00000001'}]
    ).to_dict()
    
    print("✓ Customer 360 features retrieved!")
    
    # Display features in a formatted way
    print("\n" + "-" * 70)
    print("CUSTOMER 360 PROFILE")
    print("-" * 70)
    
    for key, values in customer_360.items():
        value = values[0] if values else 'N/A'
        print(f"  {key}: {value}")
    
except Exception as e:
    print(f"✗ Error: {e}")

---
## 9. Start Feature Server (REST API)

Start the Feast feature server to enable REST API access for online feature retrieval.

### Starting the Server

Run this command in a terminal:
```bash
cd feature_repo
feast serve -h 0.0.0.0 -p 6566
```

### REST API Usage Examples

```bash
# Get features using feature service
curl -X POST "http://localhost:6566/get-online-features" \
  -H "Content-Type: application/json" \
  -d '{
    "feature_service": "underwriting_v1",
    "entities": {"customer_id": ["CUST00000001"]}
  }'

# Get individual features
curl -X POST "http://localhost:6566/get-online-features" \
  -H "Content-Type: application/json" \
  -d '{
    "features": [
      "customer_profile_features:age",
      "customer_credit_features:credit_score"
    ],
    "entities": {"customer_id": ["CUST00000001", "CUST00000002"]}
  }'
```
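
The JSON returned by these requests is organized per feature rather than per customer. The sketch below reshapes it into one record per entity. It assumes the response has a `metadata.feature_names` list and a `results` list whose entries each carry a `values` list; this shape may vary between Feast versions, so inspect a real response before relying on it. The sample payload is invented for illustration.

```python
def response_to_rows(response: dict) -> list:
    """Turn a per-feature response into one dict per entity.

    Assumed shape (verify against your Feast version):
      {"metadata": {"feature_names": [...]},
       "results": [{"values": [...]}, ...]}  # one result per feature
    """
    names = response["metadata"]["feature_names"]
    columns = [result["values"] for result in response["results"]]
    return [dict(zip(names, row)) for row in zip(*columns)]


# Illustrative payload, not captured from a live server.
sample_response = {
    "metadata": {"feature_names": ["customer_id", "age", "credit_score"]},
    "results": [
        {"values": ["CUST00000001", "CUST00000002"]},
        {"values": [34, 51]},
        {"values": [712, 655]},
    ],
}
for row in response_to_rows(sample_response):
    print(row)
```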

In [None]:
# Test REST API (if server is running)
import requests
import json

FEATURE_SERVER_URL = "http://localhost:6566"

print("=" * 70)
print("DEMO: REST API Feature Retrieval")
print("=" * 70)
print(f"Feature Server URL: {FEATURE_SERVER_URL}")
print()

try:
    # Test with underwriting_v1 feature service
    payload = {
        "feature_service": "underwriting_v1",
        "entities": {
            "customer_id": ["CUST00000001", "CUST00000002"]
        }
    }
    
    response = requests.post(
        f"{FEATURE_SERVER_URL}/get-online-features",
        json=payload,
        timeout=10
    )
    
    if response.status_code == 200:
        print("✓ Feature server is running!")
        print("\nResponse (truncated):")
        result = response.json()
        result_str = json.dumps(result, indent=2)
        print(result_str[:1000] + "..." if len(result_str) > 1000 else result_str)
    else:
        print(f"✗ Server returned status {response.status_code}")
        print(response.text[:500])
        
except requests.exceptions.ConnectionError:
    print("✗ Feature server is not running.")
    print("\nTo start the server, run:")
    print("  cd feature_repo && feast serve -h 0.0.0.0 -p 6566")
except Exception as e:
    print(f"✗ Error: {e}")

## Next Steps

1. **Start the Feature Server**: 
   ```bash
   feast serve -h 0.0.0.0 -p 6566
   ```

2. **Run Latency Benchmarks**:
   ```bash
   python ../scripts/benchmark_online_server.py --server-url http://localhost:6566 --suite quick
   ```

3. **Proceed to Feature Engineering**: Open `02_feature_engineering.ipynb` for model training examples.
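
For a quick latency check from inside the notebook, independent of the benchmark script in step 2 (whose internals are not shown here), the sketch below times repeated calls and reports percentiles. The function name, the iteration count, and the stand-in workload are illustrative assumptions.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(call: Callable[[], object], iterations: int = 50) -> Dict[str, float]:
    """Time `call` repeatedly and summarize latency in milliseconds."""
    if iterations < 2:
        raise ValueError("need at least 2 iterations to compute percentiles")
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    # n=100 gives 99 cut points; index 94 is the 95th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "p50": statistics.median(samples),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples),
    }


# Stand-in workload; replace the lambda with a real retrieval call.
stats = measure_latency_ms(lambda: sum(range(1000)), iterations=20)
print({name: round(value, 3) for name, value in stats.items()})
```

In this notebook, the callable could wrap `store.get_online_features(...)` or the `requests.post(...)` call from the REST cell. The first few calls are often slower because connections are still being established, so discarding a short warm-up run gives steadier numbers.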

In [None]:
print("=" * 70)
print("AWS Insurance Demo - Setup Complete!")
print("=" * 70)
print(f"\n✓ Timestamp: {datetime.now()}")
print(f"✓ Data generated: {NUM_CUSTOMERS} customers")
print(f"✓ Feature repo path: {os.getcwd()}")
print("\nNext steps:")
print("  1. Start feature server: feast serve -h 0.0.0.0 -p 6566")
print("  2. Run benchmarks: python ../scripts/benchmark_online_server.py --suite quick")
print("  3. Open 04_latency_testing.ipynb for detailed latency analysis")