# AWS Insurance Demo - Setup and Data Preparation

This notebook walks through the setup process for the AWS Insurance Demo feature store.

## Prerequisites
- AWS credentials configured
- Redshift cluster or Redshift Serverless workgroup available
- S3 bucket created for the registry and staging data
- DynamoDB access for the online store

## Steps
1. Generate sample data
2. Configure feature store
3. Apply Feast definitions
4. Materialize features to online store
5. Verify setup

In [None]:
# Install required packages if needed (%pip installs into the running kernel's environment)
# %pip install "feast[aws]" pandas numpy faker

import os
import sys
from datetime import datetime, timedelta

import pandas as pd
from IPython.display import display

# Add scripts directory to path
sys.path.insert(0, '../scripts')

print("Setup complete! Python version:", sys.version)

## 1. Generate Sample Data

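Generate sample data for local testing. The files are written to `../data/sample`. Materialization (step 4) reads from the Redshift offline store, so load these tables into Redshift before running that step.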
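
The fixed seed is what makes the generated data reproducible across runs. The internals of `InsuranceDataGenerator` are not shown in this notebook, so the sketch below is hypothetical: `generate_profiles`, its columns, and its value ranges are illustrative assumptions. Only the `CUST00000001` ID format is taken from the retrieval test in step 5. It shows the general pattern of seeding a private random generator and emitting rows with an event timestamp, which Feast needs for point-in-time lookups.

```python
import random
from datetime import datetime, timedelta


def make_customer_ids(n):
    """Zero-padded IDs in the CUST00000001 format used later in this notebook."""
    return [f"CUST{i:08d}" for i in range(1, n + 1)]


def generate_profiles(num_customers, end_date, seed=42):
    """Hypothetical profile generator: same seed and arguments -> identical rows."""
    rng = random.Random(seed)  # private RNG, so other code can't disturb the sequence
    states = ["CA", "NY", "TX", "FL", "WA"]  # illustrative subset of states
    rows = []
    for customer_id in make_customer_ids(num_customers):
        rows.append({
            "customer_id": customer_id,
            "age": rng.randint(18, 85),
            "state": rng.choice(states),
            # Spread event timestamps over the year before end_date
            "event_timestamp": end_date - timedelta(days=rng.randint(0, 364)),
        })
    return rows
```

Using `random.Random(seed)` rather than the module-level `random.seed()` keeps the generator's sequence independent of any other code that draws random numbers in the same kernel.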

In [None]:
from generate_sample_data import InsuranceDataGenerator

# Create data generator with fixed seed for reproducibility
generator = InsuranceDataGenerator(seed=42)

# Number of customers to generate: use a smaller value for quick tests
# and a larger one for realistic benchmarks
NUM_CUSTOMERS = 10000

data = generator.generate_all_data(
    num_customers=NUM_CUSTOMERS,
    output_dir='../data/sample',
    end_date=datetime.now()
)

print(f"\nGenerated data for {NUM_CUSTOMERS} customers!")

In [None]:
# Preview the first rows of each generated table
for key, title in [('profiles', 'Customer Profiles'),
                   ('credit', 'Customer Credit'),
                   ('risk', 'Customer Risk')]:
    print("\n" + "=" * 60)
    print(f"{title} Sample:")
    print("=" * 60)
    display(data[key].head())

## 2. Configure Feature Store

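Before running `feast apply`, update `feature_store.yaml` with your AWS settings: the S3 registry location, the Redshift connection details and S3 staging location for the offline store, and the DynamoDB region for the online store.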
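
A quick way to catch settings you forgot to fill in is to scan the parsed config for empty values or leftover placeholders. The sketch below assumes the file uses `<...>` placeholders (a hypothetical convention) and mirrors a typical AWS layout as a Python dict; the key names follow Feast's Redshift offline store and DynamoDB online store options, while `find_unfilled`, the project name, and all values are illustrative. PyYAML is installed as a Feast dependency, so `yaml.safe_load(f)` on the real file yields the same kind of nested dict.

```python
# Hypothetical example values; replace the <...> placeholders with your own settings.
EXAMPLE_CONFIG = {
    "project": "aws_insurance_demo",
    "registry": "s3://<your-bucket>/registry.pb",
    "provider": "aws",
    "online_store": {"type": "dynamodb", "region": "us-east-1"},
    "offline_store": {
        "type": "redshift",
        "cluster_id": "<your-cluster-id>",
        "region": "us-east-1",
        "database": "dev",
        "user": "admin",
        "s3_staging_location": "s3://<your-bucket>/staging",
        "iam_role": "arn:aws:iam::<account-id>:role/<redshift-s3-role>",
    },
}


def find_unfilled(config, prefix=""):
    """Return dotted key paths whose values are empty or still contain a <...> placeholder."""
    unfilled = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as offline_store / online_store
            unfilled.extend(find_unfilled(value, prefix=f"{path}."))
        elif value is None or value == "":
            unfilled.append(path)
        elif isinstance(value, str) and "<" in value and ">" in value:
            unfilled.append(path)
    return unfilled
```

For a serverless setup, recent Feast versions accept a `workgroup` key in place of `cluster_id`; check the Redshift offline store reference for your version.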

In [None]:
# View current configuration
print("Current feature_store.yaml configuration:")
print("=" * 60)
with open('../feature_repo/feature_store.yaml', 'r') as f:
    print(f.read())

## 3. Apply Feast Definitions

This registers all entities, feature views, on-demand feature views, and feature services defined in the feature repo with the Feast registry.
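
`os.chdir` changes the working directory for every later cell, which makes re-running cells out of order fragile. An alternative is to run the Feast CLI with an explicit working directory. This is a minimal sketch using only the standard library; `feast_invocation` and `run_feast` are hypothetical helpers, not part of Feast, and the `../feature_repo` default assumes the notebook's original directory layout.

```python
import subprocess
from pathlib import Path


def feast_invocation(*args, repo_path="../feature_repo"):
    """Build the argv list and absolute working directory for a Feast CLI call."""
    return ["feast", *args], Path(repo_path).resolve()


def run_feast(*args, repo_path="../feature_repo"):
    """Run a Feast CLI subcommand inside the feature repo without changing os.getcwd()."""
    cmd, cwd = feast_invocation(*args, repo_path=repo_path)
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=False)
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr)
        raise RuntimeError(f"{' '.join(cmd)} failed with exit code {result.returncode}")
    return result
```

Usage: `run_feast("apply")` or `run_feast("feature-views", "list")`.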

In [None]:
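# Change to the feature repo directory (guarded so re-running this cell doesn't fail)
if os.path.basename(os.getcwd()) != 'feature_repo':
    os.chdir('../feature_repo')
print(f"Current directory: {os.getcwd()}")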

# Apply Feast definitions
!feast apply

In [None]:
# List registered feature views
print("Registered Feature Views:")
!feast feature-views list

print("\nRegistered Feature Services:")
!feast feature-services list

## 4. Materialize Features to Online Store (DynamoDB)

This loads features from the offline store (Redshift) into the online store (DynamoDB) for fast retrieval.
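
`feast materialize-incremental` takes only an end timestamp; the start of each window is chosen per feature view. The function below is a simplified model of that choice, based on our reading of `FeatureStore.materialize_incremental` in recent Feast versions. The exact fallbacks may differ in yours, and `incremental_window` is a hypothetical name used only for illustration.

```python
from datetime import datetime, timedelta, timezone


def incremental_window(last_end, ttl, end, now=None):
    """Simplified model of the (start, end) window loaded for one feature view.

    last_end: end time of the previous materialization run, or None if never run.
    ttl: the feature view's TTL as a timedelta, or None if unset.
    """
    now = now or datetime.now(timezone.utc)
    if last_end is not None:
        start = last_end                   # resume where the previous run stopped
    elif ttl is None:
        raise ValueError("No previous run and no TTL: run a full `feast materialize` first")
    elif ttl > timedelta(0):
        start = now - ttl                  # first run: look back one TTL
    else:
        start = now - timedelta(weeks=52)  # TTL of 0: fall back to a one-year lookback
    return start, end
```

The practical consequence: the first incremental run only loads rows within each feature view's TTL, and later runs only load rows newer than the previous end timestamp.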

In [None]:
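# materialize-incremental needs only an end timestamp; Feast tracks where the
# previous run ended for each feature view and starts from there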
end_date = datetime.now().strftime('%Y-%m-%dT%H:%M:%S')
print(f"Materializing features up to: {end_date}")

# Materialize features to DynamoDB
!feast materialize-incremental {end_date}

## 5. Verify Setup

Confirm that the definitions are registered, then test online feature retrieval.
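
Online retrieval takes `feature_view:feature` references, and `.to_dict()` returns one list per column, aligned with `entity_rows`; with the default `full_feature_names=False`, the keys are the bare feature names plus the entity key. Entity keys with no online values come back as `None` rather than raising an error, so an empty result can look like a success. The helpers below are a hypothetical sketch (not Feast APIs) for grouping references by feature view and spotting entities that returned nothing.

```python
from collections import defaultdict


def group_feature_refs(refs):
    """Group 'feature_view:feature' references by feature view name."""
    grouped = defaultdict(list)
    for ref in refs:
        view, sep, feature = ref.partition(":")
        if not sep or not view or not feature:
            raise ValueError(f"Expected 'feature_view:feature', got {ref!r}")
        grouped[view].append(feature)
    return dict(grouped)


def entities_without_values(result, entity_key="customer_id"):
    """Return entity IDs whose feature columns are all None in a to_dict()-style result."""
    feature_cols = [col for col in result if col != entity_key]
    return [
        entity_id
        for i, entity_id in enumerate(result[entity_key])
        if all(result[col][i] is None for col in feature_cols)
    ]
```

After retrieval, `entities_without_values(features)` listing every test customer usually means materialization has not loaded those rows yet.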

In [None]:
from feast import FeatureStore

# Initialize feature store
store = FeatureStore(repo_path='.')

# List all feature views
print("Registered Feature Views:")
print("=" * 60)
for fv in store.list_feature_views():
    print(f"  - {fv.name}: {len(fv.features)} features")

print("\nRegistered On-Demand Feature Views:")
print("=" * 60)
for odfv in store.list_on_demand_feature_views():
    print(f"  - {odfv.name}: {len(odfv.features)} features")

print("\nRegistered Feature Services:")
print("=" * 60)
for fs in store.list_feature_services():
    print(f"  - {fs.name}")

In [None]:
# Test online feature retrieval
try:
    # Use sample customer IDs from generated data
    test_customer_ids = ['CUST00000001', 'CUST00000002', 'CUST00000003']
    
    features = store.get_online_features(
        features=[
            'customer_profile_features:age',
            'customer_profile_features:state',
            'customer_profile_features:region_risk_zone',
            'customer_credit_features:credit_score',
            'customer_credit_features:insurance_score',
            'customer_risk_features:overall_risk_score',
        ],
        entity_rows=[{'customer_id': cid} for cid in test_customer_ids]
    ).to_dict()
    
    print("✓ Online features retrieved successfully!")
    print("=" * 60)
    display(pd.DataFrame(features))
    
except Exception as e:
    print(f"✗ Error retrieving features: {e}")
    print("\nThis is expected if:")
    print("  - Data hasn't been loaded into Redshift")
    print("  - Materialization hasn't completed")
    print("  - AWS credentials are not configured")

## Next Steps

1. **Start the Feature Server**: 
   ```bash
   feast serve -h 0.0.0.0 -p 6566
   ```

2. **Run Latency Benchmarks**:
   ```bash
   python ../scripts/benchmark_online_server.py --server-url http://localhost:6566 --suite quick
   ```

3. **Proceed to Feature Engineering**: Open `02_feature_engineering.ipynb` for model training examples.
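
With the feature server from step 1 running, clients can query it over HTTP instead of using the Python SDK. This is a minimal standard-library sketch, assuming the server's `POST /get-online-features` endpoint on the default port 6566 and a JSON body with `features` and `entities`; `build_online_request` and `fetch_online_features` are hypothetical helper names.

```python
import json
import urllib.request


def build_online_request(server_url, features, customer_ids):
    """Build a POST request for the feature server's /get-online-features endpoint."""
    payload = {
        "features": list(features),                    # 'feature_view:feature' refs
        "entities": {"customer_id": list(customer_ids)},
    }
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/get-online-features",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_online_features(server_url, features, customer_ids, timeout=5.0):
    """Send the request and return the decoded JSON response."""
    request = build_online_request(server_url, features, customer_ids)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `fetch_online_features("http://localhost:6566", ["customer_credit_features:credit_score"], ["CUST00000001"])`. Check the Feast feature server reference for the exact layout of the response in your version.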