# Day 6 Lab 1: Instance Selection & Cost Optimization

## 🎯 Learning Objectives
- Understand T-series vs M-series instances
- Compare costs and performance
- Monitor CloudWatch metrics
- Make informed instance selection decisions

## 🏦 Banking Use Case
Compare instance types for deploying a **credit risk scoring model** and find the best cost/performance balance.

## ⏱️ Duration: 30 minutes
## 💰 Cost: ~$0.07

## Setup

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn import SKLearnModel
import pandas as pd
import numpy as np
import time
import json

# Initialize
session = sagemaker.Session()
role = get_execution_role()
region = session.boto_region_name
bucket = session.default_bucket()

print(f"Region: {region}")
print(f"Role: {role}")
print(f"Bucket: {bucket}")

## Part 1: Understanding Instance Types

**Key Learning:** T-series instances can ONLY be used for endpoints, NOT training.

In [None]:
# Instance type comparison
instance_comparison = {
    'Instance Type': ['ml.t3.medium', 'ml.t3.large', 'ml.m5.large', 'ml.m5.xlarge', 'ml.c5.xlarge'],
    'vCPU': [2, 2, 2, 4, 4],
    'Memory (GB)': [4, 8, 8, 16, 8],
    'Cost/Hour': ['$0.05', '$0.10', '$0.115', '$0.23', '$0.204'],
    'Training': ['❌', '❌', '✅', '✅', '✅'],
    'Endpoint': ['✅', '✅', '✅', '✅', '✅'],
    'Best For': ['Dev/Test', 'Low Traffic', 'Production', 'High Traffic', 'CPU Intensive']
}

# pandas is already imported as pd in the Setup cell
df = pd.DataFrame(instance_comparison)
print("\n📊 SageMaker Instance Type Comparison:\n")
print(df.to_string(index=False))

print("\n💡 Key Insights:")
print("  1. T-series: ONLY for endpoints (NOT training)")
print("  2. M-series: For BOTH training and endpoints")
print("  3. C-series: CPU-optimized for inference")
print("  4. P-series: GPU instances for deep learning (not shown)")

## Part 2: Cost Analysis Scenarios

**Key Learning:** Calculate costs for different traffic patterns.

In [None]:
# Cost calculation for different scenarios
def calculate_monthly_cost(instance_type, cost_per_hour, hours_per_day=24):
    daily_cost = cost_per_hour * hours_per_day
    monthly_cost = daily_cost * 30
    return monthly_cost

scenarios = {
    'Scenario': ['Dev/Test (8hrs/day)', 'Low Traffic (24/7)', 'Production (24/7)', 'High Traffic (24/7)'],
    'Instance': ['ml.t3.medium', 'ml.t3.large', 'ml.m5.large', 'ml.m5.xlarge'],
    'Hours/Day': [8, 24, 24, 24],
    'Cost/Hour': [0.05, 0.10, 0.115, 0.23],
    'Monthly Cost': [
        f"${calculate_monthly_cost('t3.medium', 0.05, 8):.2f}",
        f"${calculate_monthly_cost('t3.large', 0.10, 24):.2f}",
        f"${calculate_monthly_cost('m5.large', 0.115, 24):.2f}",
        f"${calculate_monthly_cost('m5.xlarge', 0.23, 24):.2f}"
    ]
}

df_scenarios = pd.DataFrame(scenarios)
print("\n💰 Monthly Cost Scenarios:\n")
print(df_scenarios.to_string(index=False))

print("\n📊 Cost Optimization Tips:")
print("  1. Use T3 for dev/test: Save 50-60% vs M5")
print("  2. Auto-scaling: Scale down during off-hours")
print("  3. Spot instances: 70% savings for training (not endpoints)")
print("  4. Right-sizing: Monitor CloudWatch, adjust as needed")
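The first tip can be checked with one line of arithmetic. This is a minimal sketch using the hourly prices from the comparison table in Part 1; the helper name `savings_pct` is illustrative and not part of any SDK, and the prices are the lab's example figures rather than live AWS pricing.

```python
def savings_pct(cheaper_per_hour: float, baseline_per_hour: float) -> float:
    """Percentage saved by running the cheaper instance for the same hours."""
    return (baseline_per_hour - cheaper_per_hour) / baseline_per_hour * 100


# Example prices from the lab's table (assumed, not fetched from AWS).
t3_medium, m5_large = 0.05, 0.115
pct = savings_pct(t3_medium, m5_large)
print(f"ml.t3.medium vs ml.m5.large: {pct:.1f}% cheaper per hour")
```

Because both instances are billed per hour, the percentage is the same whether you compare an hour, a day, or a month of equal running time.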

## Part 3: Performance vs Cost Trade-offs

In [None]:
# Simulated performance comparison (based on AWS benchmarks)
performance_data = {
    'Instance': ['ml.t3.medium', 'ml.t3.large', 'ml.m5.large', 'ml.m5.xlarge', 'ml.c5.xlarge'],
    'Avg Latency (ms)': [45, 40, 35, 30, 25],
    'P95 Latency (ms)': [80, 70, 60, 50, 40],
    'Max TPS': [50, 100, 150, 300, 400],
    'Cost/Hour': [0.05, 0.10, 0.115, 0.23, 0.204],
    'Cost per 1M requests': ['$2.50', '$2.78', '$2.11', '$2.11', '$1.41']
}

df_perf = pd.DataFrame(performance_data)
print("\n⚡ Performance vs Cost Analysis:\n")
print(df_perf.to_string(index=False))

print("\n🎯 Decision Framework:")
print("\n  Use T3 when:")
print("    - Dev/test environments")
print("    - < 100 requests/second")
print("    - Latency < 100ms acceptable")
print("    - Cost is primary concern")
print("\n  Use M5 when:")
print("    - Production workloads")
print("    - 100-300 requests/second")
print("    - Latency < 50ms required")
print("    - Consistent performance needed")
print("\n  Use C5 when:")
print("    - CPU-intensive models")
print("    - > 300 requests/second")
print("    - Latency < 30ms required")
print("    - Best cost per request")
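The "Cost per 1M requests" column depends on how busy the instance is, so it helps to see the formula. The sketch below divides the hourly price by the requests served per hour; the `utilization` parameter is an assumption added for illustration. At 100% of Max TPS the results come out at roughly a tenth of the table's figures, which suggests the table assumes about 10% average utilization. That is an inference, not something the lab states.

```python
def cost_per_million(cost_per_hour: float, max_tps: float,
                     utilization: float = 1.0) -> float:
    """Dollars per 1M requests at a sustained fraction of max throughput."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    requests_per_hour = max_tps * utilization * 3600
    return cost_per_hour / requests_per_hour * 1_000_000


# (hourly price, max TPS) taken from the simulated table above.
instances = {
    "ml.t3.medium": (0.05, 50),
    "ml.t3.large": (0.10, 100),
    "ml.m5.large": (0.115, 150),
    "ml.m5.xlarge": (0.23, 300),
    "ml.c5.xlarge": (0.204, 400),
}

for name, (price, tps) in instances.items():
    full = cost_per_million(price, tps)
    light = cost_per_million(price, tps, utilization=0.10)
    print(f"{name}: ${full:.2f} at 100%, ${light:.2f} at 10% utilization")
```

The takeaway is that cost per request falls as utilization rises, so an oversized instance that sits mostly idle is expensive per request even when its hourly price looks modest.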

## Part 4: CloudWatch Metrics for Right-sizing

In [None]:
# Key CloudWatch metrics to monitor
metrics_guide = {
    'Metric': [
        'CPUUtilization',
        'MemoryUtilization',
        'ModelLatency',
        'Invocations',
        'Invocation4XXErrors',
        'Invocation5XXErrors'
    ],
    'Target Range': [
        '50-70%',
        '< 85%',
        '< 100ms',
        'Monitor trend',
        '< 1%',
        '< 0.1%'
    ],
    'Action if Outside Range': [
        'Scale up/down instance',
        'Increase instance size',
        'Optimize model or scale up',
        'Add auto-scaling',
        'Check input validation',
        'Check model health'
    ]
}

df_metrics = pd.DataFrame(metrics_guide)
print("\n📈 CloudWatch Metrics Guide:\n")
print(df_metrics.to_string(index=False))

print("\n💡 Right-sizing Process:")
print("  1. Deploy on smallest instance (T3)")
print("  2. Monitor for 24-48 hours")
print("  3. Check CPU/Memory utilization")
print("  4. If CPU > 70%: Scale up to M5")
print("  5. If CPU < 30%: Scale down or use auto-scaling")
print("  6. Monitor latency and error rates")
print("  7. Adjust based on business requirements")

print("\n🔔 Set CloudWatch Alarms for:")
print("  - CPU > 80% for 5 minutes")
print("  - Memory > 85% for 5 minutes")
print("  - Latency > 100ms for 5 minutes")
print("  - Error rate > 1% for 5 minutes")
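As one way to turn the latency alarm above into configuration, the sketch below builds the arguments for CloudWatch's `put_metric_alarm`. The endpoint name `credit-risk-endpoint` and the helper `latency_alarm_kwargs` are hypothetical. Two details are easy to miss: `ModelLatency` is reported in microseconds, so a 100 ms threshold becomes 100,000, and `CPUUtilization` and `MemoryUtilization` live in the `/aws/sagemaker/Endpoints` namespace rather than `AWS/SageMaker`.

```python
def latency_alarm_kwargs(endpoint_name: str,
                         variant_name: str = "AllTraffic",
                         threshold_ms: float = 100.0) -> dict:
    """Build put_metric_alarm arguments for a ModelLatency alarm."""
    return {
        "AlarmName": f"{endpoint_name}-model-latency-high",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 60,             # seconds per datapoint
        "EvaluationPeriods": 5,   # 5 x 60 s = 5 minutes
        "Threshold": threshold_ms * 1000,  # ModelLatency is in microseconds
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Hypothetical endpoint name, used only for illustration.
kwargs = latency_alarm_kwargs("credit-risk-endpoint")
print(kwargs["AlarmName"], kwargs["Threshold"])

# To create the alarm (requires AWS credentials and a live endpoint):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**kwargs)
```

Building the arguments as a plain dictionary keeps the sketch runnable offline and makes the thresholds easy to review before anything is created in the account.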

## Part 5: Real-world Banking Scenario

In [None]:
# SecureBank credit risk scoring scenario
print("🏦 SecureBank Credit Risk Scoring System\n")
print("="*60)

# Scenario parameters
daily_loan_applications = 500
peak_hours = 8  # 9am-5pm
peak_multiplier = 3  # 3x traffic during peak

# Calculate requirements
avg_requests_per_hour = daily_loan_applications / 24
peak_requests_per_hour = avg_requests_per_hour * peak_multiplier
peak_requests_per_second = peak_requests_per_hour / 3600

print(f"\n📊 Traffic Pattern:")
print(f"  Daily applications: {daily_loan_applications}")
print(f"  Average: {avg_requests_per_hour:.1f} requests/hour")
print(f"  Peak (9am-5pm): {peak_requests_per_hour:.1f} requests/hour")
print(f"  Peak: {peak_requests_per_second:.2f} requests/second")

# Instance recommendations
print(f"\n💡 Instance Recommendation:")
if peak_requests_per_second < 1:
    recommended = "ml.t3.medium"
    cost = 0.05
    reason = "Low traffic, T3 sufficient"
elif peak_requests_per_second < 3:
    recommended = "ml.m5.large"
    cost = 0.115
    reason = "Moderate traffic, M5 for consistency"
else:
    recommended = "ml.m5.xlarge"
    cost = 0.23
    reason = "High traffic, need more capacity"

monthly_cost = cost * 24 * 30

print(f"  Recommended: {recommended}")
print(f"  Reason: {reason}")
print(f"  Monthly cost: ${monthly_cost:.2f}")

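# Auto-scaling option
# Note: SageMaker auto-scaling changes instance count, not instance type.
# Switching between T3 and M5 means updating the endpoint on a schedule.
print(f"\n🔄 Auto-scaling Alternative:")
off_peak_cost = 0.05 * 16 * 30  # T3 for 16 hours/day
peak_cost = 0.115 * 8 * 30  # M5 for 8 hours/day
autoscaling_cost = off_peak_cost + peak_cost
baseline_cost = 0.115 * 24 * 30  # ml.m5.large running 24/7

print(f"  Off-peak (16hrs): ml.t3.medium @ ${off_peak_cost:.2f}/month")
print(f"  Peak (8hrs): ml.m5.large @ ${peak_cost:.2f}/month")
print(f"  Total: ${autoscaling_cost:.2f}/month")
# Compare against always-on ml.m5.large; comparing against monthly_cost
# gives a negative figure whenever ml.t3.medium is the recommendation.
savings = baseline_cost - autoscaling_cost
print(f"  Savings vs ml.m5.large 24/7: ${savings:.2f}/month ({savings / baseline_cost * 100:.1f}%)")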
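SageMaker endpoint auto-scaling works through Application Auto Scaling and adjusts the number of instances behind a variant; it does not swap instance types, so a T3/M5 split by time of day would need a scheduled endpoint update instead. The sketch below builds the two request payloads for a target-tracking policy. The endpoint name, capacity limits, and target value are placeholders chosen for illustration, and `autoscaling_config` is a hypothetical helper.

```python
def autoscaling_config(endpoint_name: str,
                       variant_name: str = "AllTraffic",
                       min_instances: int = 1,
                       max_instances: int = 2,
                       invocations_per_instance: float = 100.0):
    """Return (scalable_target, scaling_policy) request payloads."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Target is invocations per instance per minute.
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # seconds; placeholder value
            "ScaleOutCooldown": 60,   # seconds; placeholder value
        },
    }
    return target, policy


# Hypothetical endpoint name, used only for illustration.
target, policy = autoscaling_config("credit-risk-endpoint")
print(target["ResourceId"])

# To apply (requires AWS credentials and a live endpoint):
# import boto3
# aas = boto3.client("application-autoscaling")
# aas.register_scalable_target(**target)
# aas.put_scaling_policy(**policy)
```

With traffic as low as in this scenario, a single small instance already covers the peak, so auto-scaling mainly matters once volumes grow well beyond the figures used here.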

## Summary: Instance Selection Checklist

In [None]:
print("\n✅ Instance Selection Checklist:\n")
print("1. Determine workload type:")
print("   □ Training → Use M5/C5/P3 (NOT T3)")
print("   □ Endpoint → Can use T3/M5/C5")
print("\n2. Estimate traffic:")
print("   □ < 100 req/sec → T3")
print("   □ 100-300 req/sec → M5")
print("   □ > 300 req/sec → C5 or multiple instances")
print("\n3. Check latency requirements:")
print("   □ < 30ms → C5 or GPU")
print("   □ < 50ms → M5")
print("   □ < 100ms → T3 acceptable")
print("\n4. Consider cost:")
print("   □ Dev/test → Start with T3")
print("   □ Production → M5 for consistency")
print("   □ High traffic → C5 for best cost/performance")
print("\n5. Plan for scaling:")
print("   □ Variable traffic → Enable auto-scaling")
print("   □ Predictable peaks → Schedule scaling")
print("   □ Steady traffic → Fixed instance count")
print("\n6. Monitor and optimize:")
print("   □ Set CloudWatch alarms")
print("   □ Review metrics weekly")
print("   □ Right-size based on data")
print("   □ Test before scaling down")

print("\n💡 Remember: Start small, monitor, and scale up as needed!")

## 🎓 Key Takeaways

1. **T-series instances:**
   - ✅ ONLY for endpoints (NOT training)
   - ✅ Lowest cost ($0.05/hour)
   - ✅ Good for dev/test and low-traffic
   - ⚠️ Burstable performance

2. **M-series instances:**
   - ✅ For BOTH training and endpoints
   - ✅ Consistent performance
   - ✅ Good for production
   - ⚠️ 2-3x more expensive

3. **Right-sizing:**
   - Monitor CloudWatch metrics
   - Target 50-70% CPU utilization
   - Balance cost vs performance
   - Use auto-scaling for variable traffic

4. **Cost optimization:**
   - Start with smallest instance
   - Scale up only if needed
   - Use Spot instances for training (70% savings)
   - Delete unused endpoints immediately
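
The last point is the one most often forgotten, since an idle endpoint is billed for every hour it stays up. A minimal cleanup sketch follows; `delete_endpoints` is a hypothetical helper, the endpoint name is made up, and the fake client exists only so the example runs without AWS credentials. With real resources you would pass `boto3.client("sagemaker")`, whose `delete_endpoint` call takes the same `EndpointName` argument.

```python
def delete_endpoints(sm_client, endpoint_names):
    """Delete each named endpoint and return the names that were deleted."""
    deleted = []
    for name in endpoint_names:
        sm_client.delete_endpoint(EndpointName=name)
        deleted.append(name)
    return deleted


class FakeSageMakerClient:
    """Stand-in for boto3.client('sagemaker') so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def delete_endpoint(self, EndpointName):
        self.calls.append(EndpointName)


fake = FakeSageMakerClient()
removed = delete_endpoints(fake, ["credit-risk-t3-test"])
print(f"Deleted: {removed}")
```

Deleting an endpoint leaves its endpoint configuration and model objects in place. Those do not incur instance charges, but removing them as well keeps the account tidy.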