# Day 5 Lab 3: Model Deployment & Real-time Inference
## SecureBank Customer Churn Prediction - Deployment

**Objective:** Deploy the trained churn model to a real-time SageMaker endpoint

**What You'll Learn:**
- Deploy models to SageMaker endpoints
- Choose the endpoint instance type and count, and see where auto-scaling fits
- Invoke endpoints for real-time predictions
- Monitor endpoint performance
- Clean up resources

## Step 1: Initialize and Load Model Path

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer
import json
import time

# Initialize
sess = sagemaker.Session()
role = get_execution_role()
region = boto3.Session().region_name

# Load model data path from Lab 2
try:
    with open('model_data_path.txt', 'r') as f:
        model_data = f.read().strip()
    print(f"✅ Model loaded from: {model_data}")
except FileNotFoundError:
    print("⚠️  Model path file not found. Please run Lab 2 first.")
    model_data = None
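
If the path file is missing or malformed, the deployment in Step 2 fails only after the endpoint request has been sent. A minimal sketch of a fail-fast check is shown below. The helper name `validate_model_uri`, the example bucket, and the assumption that Lab 2 wrote a standard `.tar.gz` training artifact are all illustrative, not part of the original lab.

```python
from urllib.parse import urlparse


def validate_model_uri(uri):
    """Return (bucket, key) for an S3 model artifact URI, or raise ValueError."""
    if not uri:
        raise ValueError("No model path found; run Lab 2 first.")
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Expected s3://bucket/key, got: {uri!r}")
    # Assumption: Lab 2 saved the usual SageMaker training output archive
    if not key.endswith(".tar.gz"):
        raise ValueError(f"Expected a .tar.gz model archive, got: {key!r}")
    return parsed.netloc, key


# Hypothetical URI, for illustration only
bucket, key = validate_model_uri("s3://example-bucket/churn/output/model.tar.gz")
print(f"bucket={bucket} key={key}")
```

In the notebook, calling `validate_model_uri(model_data)` right after the `try`/`except` block would stop the run before any billable resource is created.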

## Step 2: Deploy Model to Endpoint

In [None]:
from sagemaker import image_uris
from sagemaker.model import Model

# Get XGBoost container
container = image_uris.retrieve('xgboost', region, '1.5-1')

# Create model
xgb_model = Model(
    model_data=model_data,
    image_uri=container,
    role=role,
    sagemaker_session=sess
)

# Deploy to endpoint
endpoint_name = f'securebank-churn-{int(time.time())}'

print(f"🚀 Deploying model to endpoint: {endpoint_name}")
print("   Instance type: ml.t2.medium")
print("   Instance count: 1")
print("   This will take approximately 5-8 minutes...\n")

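from sagemaker.deserializers import StringDeserializer

# Deploy the model; deploy() blocks until the endpoint is in service.
# A generic Model has no predictor_cls, so deploy() can return None.
# Build the Predictor explicitly rather than relying on the return value.
xgb_model.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',
    endpoint_name=endpoint_name
)

# Send CSV and request text/csv back, so each prediction is plain text that float() can parse
predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sess,
    serializer=CSVSerializer(),
    deserializer=StringDeserializer(accept='text/csv')
)

print(f"\n✅ Endpoint deployed successfully!")
print(f"   Endpoint name: {endpoint_name}")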

## Step 3: Test Endpoint with Sample Predictions

In [None]:
# Sample customer data for prediction
# Features: Account Length, VMail Message, Day Mins, Eve Mins, Night Mins, Intl Mins, etc.

# High-risk customer (likely to churn)
high_risk_customer = "128,25,265.1,197.4,244.7,10.0,3,4,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1"

# Low-risk customer (unlikely to churn)
low_risk_customer = "107,26,161.6,195.5,254.4,13.7,3,3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0"

print("Testing endpoint with sample customers...\n")

# Predict high-risk customer
result1 = predictor.predict(high_risk_customer)
churn_prob1 = float(result1)
print(f"High-Risk Customer:")
print(f"  Churn Probability: {churn_prob1:.2%}")
print(f"  Risk Level: {'HIGH' if churn_prob1 > 0.7 else 'MEDIUM' if churn_prob1 > 0.4 else 'LOW'}")
print(f"  Recommendation: {'Immediate retention campaign' if churn_prob1 > 0.7 else 'Monitor closely'}\n")

# Predict low-risk customer
result2 = predictor.predict(low_risk_customer)
churn_prob2 = float(result2)
print(f"Low-Risk Customer:")
print(f"  Churn Probability: {churn_prob2:.2%}")
print(f"  Risk Level: {'HIGH' if churn_prob2 > 0.7 else 'MEDIUM' if churn_prob2 > 0.4 else 'LOW'}")
print(f"  Recommendation: {'Continue standard engagement' if churn_prob2 < 0.4 else 'Monitor'}")
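
The threshold logic above is repeated for each customer, and the two recommendation expressions differ slightly. One way to keep them consistent is a single helper, sketched here. The function name `classify_risk` and the decision to give the MEDIUM band the "Monitor closely" message are assumptions made for illustration; the 0.7 and 0.4 cut-offs come from the cell above.

```python
def classify_risk(churn_prob, high=0.7, medium=0.4):
    """Map a churn probability to a (risk level, recommendation) pair."""
    if not 0.0 <= churn_prob <= 1.0:
        raise ValueError(f"Probability out of range: {churn_prob}")
    if churn_prob > high:
        return "HIGH", "Immediate retention campaign"
    if churn_prob > medium:
        return "MEDIUM", "Monitor closely"
    return "LOW", "Continue standard engagement"


for prob in (0.85, 0.55, 0.10):
    level, action = classify_risk(prob)
    print(f"{prob:.2%} -> {level}: {action}")
```

The boundaries are exclusive, matching the `>` comparisons in the lab: a probability of exactly 0.7 is MEDIUM and exactly 0.4 is LOW.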

## Step 4: Invoke Endpoint via boto3 (Production Pattern)

In [None]:
# Using boto3 runtime client (how applications would call the endpoint)
runtime = boto3.client('sagemaker-runtime')

# Time the call on the client side; the response itself carries no latency field
start = time.perf_counter()
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='text/csv',
    Body=high_risk_customer
)
elapsed_ms = (time.perf_counter() - start) * 1000

result = json.loads(response['Body'].read().decode())
print(f"\n📊 Production API Response:")
print(f"   Endpoint: {endpoint_name}")
print(f"   Prediction: {result}")
print(f"   Production Variant: {response.get('InvokedProductionVariant', 'N/A')}")
print(f"   Round-trip Time: {elapsed_ms:.0f} ms")
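
An application calling the endpoint usually has to turn the raw response body into numbers, and the body format depends on the `Accept` header it sent. The sketch below handles plain text/CSV and JSON. The helper name `parse_scores` is hypothetical, and the JSON shape `{"predictions": [{"score": ...}]}` is an assumption about the container's output that should be checked against a real response.

```python
import json


def parse_scores(body, content_type="text/csv"):
    """Extract a list of float scores from an endpoint response body."""
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    if content_type.startswith("application/json"):
        payload = json.loads(text)
        # Assumed JSON shape: {"predictions": [{"score": 0.12}, ...]}
        if isinstance(payload, dict):
            return [float(p["score"]) for p in payload["predictions"]]
        # Otherwise accept a bare number or a list of numbers
        values = payload if isinstance(payload, list) else [payload]
        return [float(v) for v in values]
    # CSV or plain text: scores separated by commas and/or newlines
    tokens = text.replace("\n", ",").split(",")
    return [float(t) for t in tokens if t.strip()]


print(parse_scores(b"0.93"))
print(parse_scores('{"predictions": [{"score": 0.5}]}', "application/json"))
```

With the boto3 response from the cell above, the call would look like `parse_scores(response['Body'].read(), response['ContentType'])`.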

## Step 5: Monitor Endpoint Performance

In [None]:
# Get endpoint description
sm_client = boto3.client('sagemaker')
endpoint_desc = sm_client.describe_endpoint(EndpointName=endpoint_name)

print("📈 Endpoint Status:")
print(f"   Status: {endpoint_desc['EndpointStatus']}")
print(f"   Instance Type: {endpoint_desc['ProductionVariants'][0]['InstanceType']}")
print(f"   Instance Count: {endpoint_desc['ProductionVariants'][0]['CurrentInstanceCount']}")
print(f"   Creation Time: {endpoint_desc['CreationTime']}")

print("\n💡 Monitoring Tips:")
print("   - View metrics in CloudWatch (AWS/SageMaker namespace): Invocations, ModelLatency")
print("   - Set up alarms for high latency or errors")
print("   - Enable data capture for model monitoring")
print("   - Configure auto-scaling for production workloads")
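
The tip about latency alarms can be made concrete with CloudWatch's `put_metric_alarm`. The sketch below only builds the request parameters so it can be read and checked without AWS access. The helper name, the alarm name, and the 500 ms threshold are illustrative assumptions; `AllTraffic` is the variant name the SageMaker SDK uses by default.

```python
def latency_alarm_params(endpoint_name, threshold_ms=500.0, variant_name="AllTraffic"):
    """Build keyword arguments for CloudWatch put_metric_alarm on ModelLatency."""
    return {
        "AlarmName": f"{endpoint_name}-high-model-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 60,            # seconds per evaluation window
        "EvaluationPeriods": 3,  # alarm after three consecutive breaches
        # ModelLatency is reported in microseconds
        "Threshold": threshold_ms * 1000.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = latency_alarm_params("securebank-churn-demo")
print(params["AlarmName"], params["Threshold"])
```

With credentials available, the alarm could then be created with `boto3.client('cloudwatch').put_metric_alarm(**params)`.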

## Step 6: Banking Use Case - Batch Predictions

In [None]:
# Score several customers by calling the real-time endpoint once per customer
# (for large offline jobs, SageMaker Batch Transform is the better fit)
import pandas as pd

customers = [
    {"id": "CUST001", "data": high_risk_customer},
    {"id": "CUST002", "data": low_risk_customer},
]

results = []
for customer in customers:
    # Convert the prediction once and reuse it
    churn_prob = float(predictor.predict(customer["data"]))
    results.append({
        "Customer ID": customer["id"],
        "Churn Probability": f"{churn_prob:.2%}",
        "Risk Level": "HIGH" if churn_prob > 0.7 else "MEDIUM" if churn_prob > 0.4 else "LOW"
    })

results_df = pd.DataFrame(results)
print("\n🏦 SecureBank Churn Predictions:")
print(results_df.to_string(index=False))
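
The loop above makes one network call per customer. For CSV input, several rows can typically be sent in one request, one row per line, which reduces round trips. The sketch below groups rows into such payloads. The helper name `build_csv_payloads` and the `max_rows` default are assumptions, and how the built-in XGBoost container delimits multiple scores in its reply should be confirmed against a real response.

```python
def build_csv_payloads(rows, max_rows=100):
    """Group CSV feature rows into newline-delimited payloads of at most max_rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    cleaned = [row.strip() for row in rows if row.strip()]
    # Every row must have the same number of features for the model to accept it
    widths = {len(row.split(",")) for row in cleaned}
    if len(widths) > 1:
        raise ValueError(f"Rows have inconsistent feature counts: {sorted(widths)}")
    return ["\n".join(cleaned[i:i + max_rows]) for i in range(0, len(cleaned), max_rows)]


# Toy three-feature rows, for illustration only
payloads = build_csv_payloads(["1,2,3", "4,5,6", "7,8,9"], max_rows=2)
print(payloads)
```

Each payload could then be passed as `Body` to `invoke_endpoint` with `ContentType='text/csv'`. The feature-count check also catches a malformed sample row before it reaches the endpoint.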

## Step 7: Cleanup (IMPORTANT - Avoid Charges!)

In [None]:
# Delete endpoint to stop charges
print("⚠️  Deleting endpoint to avoid ongoing charges...")
predictor.delete_endpoint()
print(f"✅ Endpoint {endpoint_name} deleted successfully")

print("\n💰 Cost Optimization:")
print("   - Endpoints incur charges while running")
print("   - Always delete endpoints when not in use")
print("   - Use batch transform for non-real-time predictions")
print("   - Consider serverless inference for variable traffic")
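
Because each run creates a timestamped endpoint name, endpoints from earlier runs are easy to forget. A short sketch of a leftover check using the SageMaker `list_endpoints` API follows. The helper name `find_leftover_endpoints` is hypothetical, and `FakeSageMakerClient` is only an offline stand-in so the example runs without AWS credentials.

```python
def find_leftover_endpoints(sm_client, name_contains="securebank-churn"):
    """Return the names of all endpoints whose name contains name_contains."""
    names, kwargs = [], {"NameContains": name_contains}
    while True:
        page = sm_client.list_endpoints(**kwargs)
        names.extend(ep["EndpointName"] for ep in page.get("Endpoints", []))
        token = page.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token  # fetch the next page of results


class FakeSageMakerClient:
    """Offline stand-in for boto3.client('sagemaker'); for illustration only."""

    def __init__(self, pages):
        self._pages = pages

    def list_endpoints(self, **kwargs):
        index = int(kwargs.get("NextToken", 0))
        page = {"Endpoints": [{"EndpointName": n} for n in self._pages[index]]}
        if index + 1 < len(self._pages):
            page["NextToken"] = str(index + 1)
        return page


fake = FakeSageMakerClient([["securebank-churn-1"], ["securebank-churn-2"]])
print(find_leftover_endpoints(fake))
```

In the notebook, `find_leftover_endpoints(sm_client)` would use the real client from Step 5. Deletion is asynchronous, so an endpoint that was just deleted may still be listed briefly with status `Deleting`.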

## Summary

**What We Accomplished:**
- ✅ Deployed XGBoost model to real-time SageMaker endpoint
- ✅ Configured endpoint with ml.t2.medium instance
- ✅ Tested predictions with sample banking customers
- ✅ Demonstrated production API invocation pattern
- ✅ Checked endpoint status and reviewed monitoring options
- ✅ Cleaned up resources to avoid charges

**Production Considerations:**
- **Auto-scaling:** Configure based on invocation rate
- **Multi-AZ:** Run at least two instances so SageMaker can spread them across Availability Zones for HA
- **Model Monitor:** Set up drift detection
- **A/B Testing:** Deploy multiple model variants
- **Security:** Use VPC endpoints and encryption
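
As a rough illustration of the auto-scaling item, endpoint variants are scaled through Application Auto Scaling with a target-tracking policy on invocations per instance. The sketch below only builds the two request payloads. The helper name, policy name, capacity limits, and target value are assumptions chosen for illustration. To our understanding, AWS does not support auto-scaling on burstable instance types such as `ml.t2.medium`, so a production endpoint would need a different instance type.

```python
def autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                         min_instances=1, max_instances=4,
                         target_invocations_per_instance=100.0):
    """Build kwargs for register_scalable_target and put_scaling_policy."""
    if not 1 <= min_instances <= max_instances:
        raise ValueError("Need 1 <= min_instances <= max_instances")
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Average invocations per instance per minute to aim for
            "TargetValue": target_invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,  # seconds to wait before removing capacity
            "ScaleOutCooldown": 60,  # seconds to wait before adding more capacity
        },
    }
    return target, policy


target, policy = autoscaling_requests("securebank-churn-demo")
print(target["ResourceId"])
```

With credentials available, these would be applied with `aas = boto3.client('application-autoscaling')`, then `aas.register_scalable_target(**target)` and `aas.put_scaling_policy(**policy)`.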

**Next Steps:**
- Implement Model Monitor for drift detection
- Set up automated retraining pipeline
- Integrate with banking applications via API Gateway
- Configure CloudWatch alarms and dashboards