# Day 6 Lab 1: Instance Selection & Cost Optimization

## 🎯 Learning Objectives
- Understand T-series vs M-series instances
- Compare costs and performance
- Monitor CloudWatch metrics
- Make informed instance selection decisions

## 🏦 Banking Use Case
Deploy a **credit risk scoring model** on two instance types to find the best balance of cost and performance.

## ⏱️ Duration: 30 minutes
## 💰 Cost: ~$0.07

## Setup

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn import SKLearnModel
import pandas as pd
import numpy as np
import time
import json

# Initialize
session = sagemaker.Session()
role = get_execution_role()
region = session.boto_region_name
bucket = session.default_bucket()

print(f"Region: {region}")
print(f"Role: {role}")
print(f"Bucket: {bucket}")

## Part 1: Deploy Model on ml.t3.medium (Endpoint Only)

**Key Learning:** T-series instances can host endpoints (and notebooks), but they are NOT available for training jobs.

In [None]:
# Use pre-trained model from Day 5
model_data = f"s3://{bucket}/sagemaker/credit-risk-model/model.tar.gz"

# Create model
sklearn_model = SKLearnModel(
    model_data=model_data,
    role=role,
    entry_point='inference.py',
    framework_version='1.0-1',
    py_version='py3'
)

# Deploy to T3 endpoint
t3_endpoint_name = f"credit-risk-t3-{int(time.time())}"
print(f"Deploying to {t3_endpoint_name}...")

t3_predictor = sklearn_model.deploy(
    instance_type='ml.t3.medium',
    initial_instance_count=1,
    endpoint_name=t3_endpoint_name
)

print(f"✅ T3 endpoint deployed: {t3_endpoint_name}")
print(f"💰 Cost: $0.05/hour")

## Part 2: Deploy Model on ml.m5.large

**Key Learning:** M-series instances work for BOTH training and endpoints.

In [None]:
# Deploy to M5 endpoint
m5_endpoint_name = f"credit-risk-m5-{int(time.time())}"
print(f"Deploying to {m5_endpoint_name}...")

m5_predictor = sklearn_model.deploy(
    instance_type='ml.m5.large',
    initial_instance_count=1,
    endpoint_name=m5_endpoint_name
)

print(f"✅ M5 endpoint deployed: {m5_endpoint_name}")
print(f"💰 Cost: $0.115/hour (2.3x more expensive)")

## Part 3: Performance Comparison

In [None]:
# Test data
test_data = {
    'credit_score': 720,
    'income': 75000,
    'debt_ratio': 0.35,
    'employment_years': 5
}

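# Latency is measured client-side (round trip), so it includes network
# overhead between this notebook and the endpoint, not just inference time.

# Test T3 endpoint
print("Testing T3 endpoint...")
t3_latencies = []
for _ in range(100):
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    result = t3_predictor.predict(test_data)
    t3_latencies.append((time.perf_counter() - start) * 1000)  # milliseconds

# Test M5 endpoint
print("Testing M5 endpoint...")
m5_latencies = []
for _ in range(100):
    start = time.perf_counter()
    result = m5_predictor.predict(test_data)
    m5_latencies.append((time.perf_counter() - start) * 1000)  # milliseconds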

# Compare results
print("\n📊 Performance Comparison:")
print(f"T3 - P50: {np.percentile(t3_latencies, 50):.2f}ms, P95: {np.percentile(t3_latencies, 95):.2f}ms")
print(f"M5 - P50: {np.percentile(m5_latencies, 50):.2f}ms, P95: {np.percentile(m5_latencies, 95):.2f}ms")

# Cost per 1M requests: (price per second) x (seconds per request) x 1M.
# This assumes requests are served one at a time, back to back, with the
# instance fully busy. An always-on endpoint is billed per hour even when
# idle, so real cost per request at low traffic is higher.
t3_cost_per_1m = (0.05 / 3600) * (np.mean(t3_latencies) / 1000) * 1000000
m5_cost_per_1m = (0.115 / 3600) * (np.mean(m5_latencies) / 1000) * 1000000

print(f"\n💰 Cost per 1M requests:")
print(f"T3: ${t3_cost_per_1m:.2f}")
print(f"M5: ${m5_cost_per_1m:.2f}")
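
The cost-per-1M-requests figure above depends on an assumption about utilization. The sketch below makes it explicit in a small helper. The function name `cost_per_million_requests` and the `concurrency` parameter are illustrative additions, and the latencies in the example are made-up inputs, not measured results.

```python
def cost_per_million_requests(hourly_price_usd, mean_latency_ms, concurrency=1):
    """Instance-time cost of serving 1M requests on a fully busy instance.

    Assumes the instance handles `concurrency` requests at a time with no
    idle gaps. Idle time on an always-on endpoint is not accounted for.
    """
    if mean_latency_ms <= 0:
        raise ValueError("mean_latency_ms must be positive")
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    price_per_second = hourly_price_usd / 3600
    busy_seconds = (mean_latency_ms / 1000) * 1_000_000 / concurrency
    return price_per_second * busy_seconds


# Hypothetical latencies, for illustration only
example_t3 = cost_per_million_requests(0.05, mean_latency_ms=20.0)
example_m5 = cost_per_million_requests(0.115, mean_latency_ms=10.0)
print(f"T3: ${example_t3:.4f} per 1M requests")
print(f"M5: ${example_m5:.4f} per 1M requests")
```

With these inputs the pricier instance would need to be more than 2.3x faster to win on cost per request, which is the same ratio as the hourly prices.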

## Part 4: CloudWatch Metrics Analysis

In [None]:
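from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client('cloudwatch', region_name=region)

def get_endpoint_metrics(endpoint_name, metric_name, variant_name='AllTraffic'):
    """Return the most recent 5-minute average of an endpoint instance metric."""
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        # Instance metrics (CPU, memory, disk) are published under this namespace;
        # 'AWS/SageMaker' carries invocation metrics such as ModelLatency.
        Namespace='/aws/sagemaker/Endpoints',
        MetricName=metric_name,
        Dimensions=[
            {'Name': 'EndpointName', 'Value': endpoint_name},
            # 'AllTraffic' is the default variant name used by the SageMaker SDK
            {'Name': 'VariantName', 'Value': variant_name},
        ],
        StartTime=now - timedelta(minutes=10),
        EndTime=now,
        Period=300,
        Statistics=['Average']
    )
    datapoints = response['Datapoints']
    if datapoints:
        # Datapoints are not guaranteed to be time-ordered, so pick the latest
        return max(datapoints, key=lambda d: d['Timestamp'])['Average']
    return 0  # No data yet; metrics can take a few minutes to appear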

# Get metrics
print("📈 CloudWatch Metrics:")
print(f"\nT3 Endpoint ({t3_endpoint_name}):")
print(f"  CPU Utilization: {get_endpoint_metrics(t3_endpoint_name, 'CPUUtilization'):.2f}%")
print(f"  Memory Utilization: {get_endpoint_metrics(t3_endpoint_name, 'MemoryUtilization'):.2f}%")

print(f"\nM5 Endpoint ({m5_endpoint_name}):")
print(f"  CPU Utilization: {get_endpoint_metrics(m5_endpoint_name, 'CPUUtilization'):.2f}%")
print(f"  Memory Utilization: {get_endpoint_metrics(m5_endpoint_name, 'MemoryUtilization'):.2f}%")

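print("\n💡 Right-sizing Recommendation:")
# Note: SageMaker reports endpoint CPUUtilization summed across cores,
# so a 2-vCPU instance ranges from 0% to 200%.
print("Target: 50-70% CPU utilization for optimal cost/performance")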
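
To compare a CloudWatch reading against the 50-70% target, it helps to normalize by the number of vCPUs first, because the endpoint `CPUUtilization` metric is the sum over all cores. The following is a minimal sketch of that check. The helper names and the `VCPUS` lookup table are illustrative, and the thresholds are simply the rule of thumb from this lab.

```python
# vCPU counts for the two instance types used in this lab
VCPUS = {"ml.t3.medium": 2, "ml.m5.large": 2}


def normalized_cpu(cpu_utilization_sum, instance_type):
    """Convert the summed-over-cores metric into a 0-100% per-instance figure."""
    return cpu_utilization_sum / VCPUS[instance_type]


def right_sizing_hint(cpu_pct, low=50.0, high=70.0):
    """Classify a normalized CPU percentage against the target band."""
    if cpu_pct < low:
        return "over-provisioned: consider a smaller or cheaper instance"
    if cpu_pct > high:
        return "under-provisioned: consider a larger instance or scaling out"
    return "right-sized"


# Hypothetical reading of 30% (summed over 2 cores) on the M5 endpoint
reading = normalized_cpu(30.0, "ml.m5.large")
print(f"{reading:.1f}% -> {right_sizing_hint(reading)}")
```

A single 5-minute average taken right after a short load test is a weak signal; in practice the same check would be run over a longer window of representative traffic.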

## Part 5: Cost Analysis & Decision

In [None]:
# Scenario: 10,000 requests/day
daily_requests = 10000
avg_latency_t3 = np.mean(t3_latencies) / 1000  # seconds
avg_latency_m5 = np.mean(m5_latencies) / 1000  # seconds

# Calculate daily costs
t3_daily_cost = 0.05 * 24  # Always running
m5_daily_cost = 0.115 * 24

# Monthly costs
t3_monthly = t3_daily_cost * 30
m5_monthly = m5_daily_cost * 30

print("💰 Cost Analysis (10,000 requests/day):")
print(f"\nT3 Medium:")
print(f"  Daily: ${t3_daily_cost:.2f}")
print(f"  Monthly: ${t3_monthly:.2f}")
print(f"  Latency: {avg_latency_t3*1000:.2f}ms")

print(f"\nM5 Large:")
print(f"  Daily: ${m5_daily_cost:.2f}")
print(f"  Monthly: ${m5_monthly:.2f}")
print(f"  Latency: {avg_latency_m5*1000:.2f}ms")

print(f"\n📊 Decision Framework:")
print(f"  Savings with T3: ${m5_monthly - t3_monthly:.2f}/month ({((m5_monthly - t3_monthly)/m5_monthly*100):.1f}%)")
print(f"  ✅ Use T3 if: Dev/test, low traffic, cost-sensitive")
print(f"  ✅ Use M5 if: Production, high traffic, latency-sensitive")
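
Because an always-on endpoint is billed per instance-hour, the request volume does not change the monthly bill; it only changes the effective cost per request. A small sketch of that arithmetic follows, using the hourly prices quoted in this lab. The function names are illustrative, and a 30-day month is assumed.

```python
def monthly_cost(hourly_price_usd, hours_per_day=24, days=30):
    """Cost of keeping one instance running for the given period."""
    return hourly_price_usd * hours_per_day * days


def cost_per_request(hourly_price_usd, daily_requests, days=30):
    """Effective cost of one request on an always-on endpoint."""
    if daily_requests <= 0:
        raise ValueError("daily_requests must be positive")
    return monthly_cost(hourly_price_usd, days=days) / (daily_requests * days)


def savings_pct(cheaper_cost, pricier_cost):
    """Percentage saved by choosing the cheaper option."""
    return (pricier_cost - cheaper_cost) / pricier_cost * 100


t3 = monthly_cost(0.05)
m5 = monthly_cost(0.115)
print(f"T3: ${t3:.2f}/month, ${cost_per_request(0.05, 10_000):.6f}/request")
print(f"M5: ${m5:.2f}/month, ${cost_per_request(0.115, 10_000):.6f}/request")
print(f"Savings with T3: {savings_pct(t3, m5):.1f}%")
```

Raising `daily_requests` lowers the per-request figure for both instance types equally, so the percentage saved stays the same until traffic is high enough to require more instances.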

## Cleanup

In [None]:
# Delete endpoints
print("Cleaning up endpoints...")
t3_predictor.delete_endpoint()
m5_predictor.delete_endpoint()
print("✅ Cleanup complete")

## 🎓 Key Takeaways

1. **T-series instances:**
   - ✅ Available for endpoints (and notebooks), NOT for training jobs
   - ✅ Lowest cost ($0.05/hour)
   - ✅ Good for dev/test and low-traffic
   - ⚠️ Burstable performance

2. **M-series instances:**
   - ✅ For BOTH training and endpoints
   - ✅ Consistent performance
   - ✅ Good for production
   - ⚠️ 2-3x more expensive

3. **Right-sizing:**
   - Monitor CloudWatch metrics
   - Target 50-70% CPU utilization (per vCPU; the endpoint metric sums all cores)
   - Balance cost vs performance
   - Use auto-scaling for variable traffic

4. **Cost optimization:**
   - Start with smallest instance
   - Scale up only if needed
   - Use Managed Spot Training for training jobs (up to 90% savings)
   - Delete unused endpoints immediately
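
The auto-scaling takeaway can be sketched with Application Auto Scaling, which is the service that scales SageMaker endpoint variants. The example below only builds the two request payloads so they can be inspected without AWS access; the helper names, the capacity limits, the cooldowns, and the target of 100 invocations per instance are all illustrative assumptions rather than recommendations. It assumes an M-series endpoint; check the AWS documentation before relying on auto scaling with burstable T-series instances.

```python
def build_autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                               min_capacity=1, max_capacity=3,
                               target_invocations_per_instance=100.0):
    """Build kwargs for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,   # seconds to wait before removing instances
            "ScaleOutCooldown": 60,   # seconds to wait before adding more
        },
    }
    return target, policy


def apply_autoscaling(target, policy):
    """Send both requests to AWS (requires credentials and a live endpoint)."""
    import boto3  # imported lazily so the builder works without AWS access
    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)


# Inspect the payloads for a hypothetical endpoint name; nothing is sent to AWS
target, policy = build_autoscaling_requests("credit-risk-m5-demo")
print(target["ResourceId"])
print(policy["PolicyName"])
```

Calling `apply_autoscaling(target, policy)` would register the policy; remember to delete the endpoint afterwards, since a scaled-out endpoint costs more per hour.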