# 📘 Day 3: Cloud Deployment & Production ML

**🎯 Goal:** Deploy ML models to the cloud and build production applications

**⏱️ Time:** 120-150 minutes

**🌟 Why This Matters for AI (2024-2025):**
- Cloud deployment is THE standard - most production ML models run in the cloud
- AWS, GCP, and Azure are the platforms most companies use
- Serverless ML (Lambda, Cloud Functions) enables pay-per-use scaling
- Hugging Face Spaces lets you deploy for FREE on a platform with millions of users
- Streamlit is one of the fastest ways to build ML demos that impress employers
- Model serving at scale separates hobbyists from production engineers
- Most AI job postings mention cloud deployment experience

**What You'll Build Today:**
1. **Cloud deployment concepts** for AWS, GCP, and Azure
2. **Serverless ML** with AWS Lambda and Cloud Functions
3. **Hugging Face Spaces** deployment (real, live deployment!)
4. **Streamlit ML app** with interactive UI
5. **Production-ready model serving** at scale

---

## ☁️ Cloud ML Deployment Landscape (2024-2025)

**The cloud is where ML lives in production**

### 🎯 Major Cloud Providers:

#### 1️⃣ **AWS (Amazon Web Services)**
**Market leader: 32% of cloud market**

**Key Services:**
- **SageMaker**: Full ML platform (training + deployment)
- **Lambda**: Serverless compute (pay per request)
- **EC2**: Virtual machines (full control)
- **ECS/EKS**: Container orchestration
- **S3**: Model and data storage

**Best for:** Enterprise, mature ML teams  
**Pros:** Most features, largest ecosystem  
**Cons:** Complex, expensive, steep learning curve  

#### 2️⃣ **GCP (Google Cloud Platform)**
**Google's cloud: 11% market share, best for ML**

**Key Services:**
- **Vertex AI**: Unified ML platform (TensorFlow native)
- **Cloud Functions**: Serverless (like Lambda)
- **Cloud Run**: Serverless containers
- **GKE**: Kubernetes (easiest K8s)
- **BigQuery ML**: SQL-based ML

**Best for:** ML/AI workloads, startups  
**Pros:** Best ML tools, cheaper, easier  
**Cons:** Smaller than AWS  

#### 3️⃣ **Azure (Microsoft)**
**Microsoft's cloud: 23% market share**

**Key Services:**
- **Azure ML**: ML platform (like SageMaker)
- **Azure Functions**: Serverless
- **Azure Container Instances**: Easy containers
- **AKS**: Kubernetes
- **Cognitive Services**: Pre-built AI APIs

**Best for:** Enterprises (especially .NET/Microsoft shops)  
**Pros:** Great integration with Microsoft tools  
**Cons:** Less ML-focused than GCP  

### 📊 Comparison:

| Feature | AWS | GCP | Azure |
|---------|-----|-----|-------|
| **Market Share** | 32% | 11% | 23% |
| **ML Platform** | SageMaker | Vertex AI | Azure ML |
| **Serverless** | Lambda | Cloud Functions | Functions |
| **Best For** | Enterprise | ML/AI | Microsoft shops |
| **Free Tier** | ✅ 12 months | ✅ Always free | ✅ 12 months |
| **Ease of Use** | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| **ML Features** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |

### 🌟 Alternative Platforms:

**Specialized ML Platforms:**
- **Hugging Face Spaces**: FREE ML app hosting (we'll use this!)
- **Replicate**: Easy model deployment
- **Modal**: Serverless for ML
- **Gradio + Hugging Face**: Interactive ML demos

**General Platforms:**
- **Heroku**: Easy deployment (acquired by Salesforce)
- **Railway**: Modern Heroku alternative
- **Fly.io**: Global edge deployment
- **Render**: Simple cloud platform

### 🏗️ Deployment Architectures:

#### **Simple:**
```
User → FastAPI (Railway/Render) → Model
```

#### **Serverless:**
```
User → API Gateway → Lambda → Model (S3)
```

#### **Production:**
```
User → Load Balancer → [Container 1, Container 2, ...] → Model
      ↓
   Auto-scaling, monitoring, logging
```

Let's deploy to the cloud!

---

## 🛠️ Setup & Installation

In [None]:
# Install required libraries
import sys

# Core ML libraries
!{sys.executable} -m pip install scikit-learn numpy pandas joblib --quiet

# Web frameworks
!{sys.executable} -m pip install streamlit gradio --quiet

# Hugging Face
!{sys.executable} -m pip install transformers torch huggingface-hub --quiet

# Cloud SDKs (optional - install only if you have accounts)
# !{sys.executable} -m pip install boto3 google-cloud-aiplatform azure-ai-ml --quiet

# Deployment helpers and plotting (matplotlib and seaborn are imported in the next cell)
!{sys.executable} -m pip install requests plotly matplotlib seaborn --quiet

print("✅ Libraries installed successfully!")
print("\n📦 Installed:")
print("   - Streamlit (interactive apps)")
print("   - Gradio (ML interfaces)")
print("   - Hugging Face Hub")
print("\n🚀 Ready for cloud deployment!")

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
import json
import warnings
warnings.filterwarnings('ignore')

# ML libraries
from sklearn.datasets import load_iris, make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Set random seed
np.random.seed(42)

print("📦 Libraries imported successfully!")
print("☁️  Ready to deploy to the cloud!\n")

## ⚡ Step 1: Serverless ML Deployment

**Serverless = Pay only for what you use, with automatic scaling**

### 🎯 What is Serverless?

**Traditional:**
```
You manage: Servers, scaling, patching, monitoring
Cost: $50-500/month (even with 0 users)
```

**Serverless:**
```
Cloud manages: Everything!
You write: Just the code
Cost: $0 with 0 users, scales automatically
```
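To put rough numbers on the cost comparison above, here is a minimal sketch of a monthly estimate for one function. It assumes AWS Lambda style pricing: $0.20 per 1M requests, an assumed compute price of $0.0000166667 per GB-second, and an assumed monthly free tier of 1M requests and 400,000 GB-seconds. `lambda_monthly_cost` is a hypothetical helper written for this notebook, and the default prices are assumptions to check against the provider's current pricing page.

```python
def lambda_monthly_cost(
    requests: int,
    avg_duration_s: float,
    memory_gb: float,
    price_per_million_requests: float = 0.20,   # assumed list price
    price_per_gb_second: float = 0.0000166667,  # assumed list price
    free_requests: int = 1_000_000,             # assumed monthly free tier
    free_gb_seconds: float = 400_000.0,         # assumed monthly free tier
) -> float:
    """Estimate a monthly bill in USD for one function (requests + compute)."""
    billable_requests = max(0, requests - free_requests)
    gb_seconds = requests * avg_duration_s * memory_gb
    billable_gb_seconds = max(0.0, gb_seconds - free_gb_seconds)
    request_cost = billable_requests / 1_000_000 * price_per_million_requests
    compute_cost = billable_gb_seconds * price_per_gb_second
    return request_cost + compute_cost


if __name__ == "__main__":
    for monthly_requests in (0, 100_000, 5_000_000, 50_000_000):
        cost = lambda_monthly_cost(monthly_requests, avg_duration_s=0.2, memory_gb=1.0)
        print(f"{monthly_requests:>12,} requests/month -> ${cost:,.2f}")
```

The request charge is usually the smaller part of the bill. Compute time (duration × memory) grows with traffic, which is why constant high traffic can end up costing more than a fixed server.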

### 🏗️ Serverless Architecture:

```
User Request
    ↓
API Gateway (routes requests)
    ↓
Lambda Function (your code)
    ↓
Load model from S3/Cloud Storage
    ↓
Make prediction
    ↓
Return JSON response
```
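The request flow above can be tried locally before touching any cloud account. This is a minimal sketch, assuming a proxy-style event whose `body` field is a JSON string. `StubModel` and `handler` are hypothetical stand-ins for a real model and a real Lambda handler, and the toy rule inside `StubModel` is not a trained classifier.

```python
import json


class StubModel:
    """Hypothetical stand-in for a trained classifier (no real ML here)."""

    def predict(self, features):
        # Toy rule: class 0 if the third feature is small, else class 1
        return 0 if features[2] < 2.5 else 1


MODEL = StubModel()


def handler(event, context=None):
    """Parse the request body, predict, and return a proxy-style response."""
    try:
        body = json.loads(event["body"])
        features = [float(x) for x in body["features"]]
        if len(features) != 4:
            raise ValueError("expected exactly 4 features")
    except (KeyError, TypeError, ValueError) as exc:
        # Client mistakes get a 400 rather than a generic 500
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}

    prediction = MODEL.predict(features)
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}


if __name__ == "__main__":
    event = {"body": json.dumps({"features": [5.1, 3.5, 1.4, 0.2]})}
    print(handler(event))
    print(handler({"body": "not json"}))
```

Separating bad input (400) from server failures (500) makes client bugs much easier to spot in logs.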

### ⚡ Serverless Platforms:

| Platform | Provider | Free Tier | Best For |
|----------|----------|-----------|----------|
| **AWS Lambda** | Amazon | 1M requests/mo | Enterprise, full control |
| **Cloud Functions** | Google | 2M requests/mo | Easier, better for ML |
| **Azure Functions** | Microsoft | 1M requests/mo | Microsoft ecosystem |
| **Modal** | Modal Labs | Generous | ML-specific |
| **Replicate** | Replicate | Pay-as-go | Pre-built models |

### 📊 Serverless Pros & Cons:

**Pros:**
- ✅ Zero cost at zero usage
- ✅ Automatic scaling (up to the account's concurrency limits)
- ✅ No server management
- ✅ Pay per request
- ✅ Built-in redundancy

**Cons:**
- ❌ Cold starts (first request slow)
- ❌ Execution time limits (15 min max on AWS Lambda)
- ❌ Memory limits (10GB max on AWS Lambda)
- ❌ Vendor lock-in

### 🎯 When to Use Serverless:

✅ **Good for:**
- Sporadic traffic
- Small to medium models
- API endpoints
- Event-driven predictions
- Cost optimization

❌ **Not ideal for:**
- Constant high traffic
- Large models (>1GB)
- Long-running tasks
- Real-time streaming

Let's create a serverless function!

In [None]:
# AWS Lambda function for ML prediction
# This is the code you would deploy to Lambda

lambda_function_code = '''
import json
import boto3
import joblib
import numpy as np

# Initialize S3 client
s3 = boto3.client('s3')

# Global variable for model (loaded once, reused)
model = None

def load_model():
    """Load model from S3 (lazy loading)"""
    global model
    if model is None:
        # Download model from S3
        s3.download_file(
            Bucket='my-ml-models',
            Key='models/classifier.joblib',
            Filename='/tmp/model.joblib'
        )
        model = joblib.load('/tmp/model.joblib')
    return model

def lambda_handler(event, context):
    """
    AWS Lambda handler function
    
    Input:
        event: {
            "body": "{\"features\": [5.1, 3.5, 1.4, 0.2]}"
        }
    
    Output:
        {
            "statusCode": 200,
            "body": "{\"prediction\": 0, \"probability\": 0.95}"
        }
    """
    try:
        # Parse input
        body = json.loads(event['body'])
        features = np.array(body['features']).reshape(1, -1)
        
        # Load model
        model = load_model()
        
        # Make prediction
        prediction = int(model.predict(features)[0])
        probability = float(model.predict_proba(features).max())
        
        # Return response
        return {
            'statusCode': 200,
            'headers': {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*'
            },
            'body': json.dumps({
                'prediction': prediction,
                'probability': probability
            })
        }
    
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps({
                'error': str(e)
            })
        }
'''

# Save Lambda function
with open('lambda_function.py', 'w') as f:
    f.write(lambda_function_code)

print("⚡ AWS Lambda Function Created!\n")
print("="*70)
print("\n✅ Saved to: lambda_function.py")
print("\n🚀 Deployment Steps:\n")
print("1️⃣ Package dependencies (build on Linux so the wheels match the Lambda runtime):")
print("   pip install -t package/ scikit-learn joblib numpy")
print("   cd package && zip -r ../deployment.zip .")
print("   cd .. && zip -g deployment.zip lambda_function.py")

print("\n2️⃣ Upload model to S3 (the key must match the one in lambda_function.py):")
print("   aws s3 cp model.joblib s3://my-ml-models/models/classifier.joblib")

print("\n3️⃣ Create Lambda function (replace the placeholder with your execution role ARN):")
print("   aws lambda create-function \\")
print("     --function-name ml-predictor \\")
print("     --runtime python3.9 \\")
print("     --role arn:aws:iam::<account-id>:role/<lambda-execution-role> \\")
print("     --handler lambda_function.lambda_handler \\")
print("     --zip-file fileb://deployment.zip")

print("\n4️⃣ Create API Gateway:")
print("   - Create REST API")
print("   - Add POST method")
print("   - Link to Lambda function")
print("   - Deploy API")

print("\n" + "="*70)
print("\n💡 Result: https://your-api-id.execute-api.region.amazonaws.com/<stage>/predict")
print("\n💰 Cost: ~$0.20 per 1M requests plus compute time (small projects often stay in the free tier)")

In [None]:
# Google Cloud Function for ML prediction
# Similar to Lambda but often easier for ML

gcp_function_code = '''
import json
import joblib
import numpy as np
from google.cloud import storage

# Global model variable
model = None

def load_model():
    """Load model from Cloud Storage"""
    global model
    if model is None:
        client = storage.Client()
        bucket = client.bucket('my-ml-models')
        blob = bucket.blob('models/classifier.joblib')
        blob.download_to_filename('/tmp/model.joblib')
        model = joblib.load('/tmp/model.joblib')
    return model

def predict(request):
    """
    Google Cloud Function handler
    
    HTTP trigger endpoint
    """
    # Set CORS headers
    headers = {
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Methods': 'POST',
        'Access-Control-Allow-Headers': 'Content-Type',
    }
    
    # Handle preflight
    if request.method == 'OPTIONS':
        return ('', 204, headers)
    
    try:
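        # Parse request (silent=True returns None instead of raising on bad JSON)
        request_json = request.get_json(silent=True)
        if not request_json or 'features' not in request_json:
            return (json.dumps({'error': 'Missing "features" in JSON body'}), 400, headers)
        features = np.array(request_json['features']).reshape(1, -1)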
        
        # Load model and predict
        model = load_model()
        prediction = int(model.predict(features)[0])
        probability = float(model.predict_proba(features).max())
        
        # Return response
        return (json.dumps({
            'prediction': prediction,
            'probability': probability
        }), 200, headers)
    
    except Exception as e:
        return (json.dumps({'error': str(e)}), 500, headers)
'''

# Save Cloud Function
with open('main.py', 'w') as f:
    f.write(gcp_function_code)

# Create requirements.txt
requirements = '''
scikit-learn==1.3.2
joblib==1.3.2
numpy==1.24.3
google-cloud-storage==2.10.0
'''

with open('requirements.txt', 'w') as f:
    f.write(requirements.strip())

print("☁️  Google Cloud Function Created!\n")
print("="*70)
print("\n✅ Files created:")
print("   📄 main.py (function code)")
print("   📄 requirements.txt (dependencies)")

print("\n🚀 Deployment Steps:\n")
print("1️⃣ Upload model to Cloud Storage (the object name must match the blob path in main.py):")
print("   gsutil cp model.joblib gs://my-ml-models/models/classifier.joblib")

print("\n2️⃣ Deploy function:")
print("   gcloud functions deploy ml-predictor \\")
print("     --runtime python39 \\")
print("     --trigger-http \\")
print("     --allow-unauthenticated \\")
print("     --entry-point predict")

print("\n" + "="*70)
print("\n💡 Result: https://region-project-id.cloudfunctions.net/ml-predictor")
print("\n💰 Cost: Free tier includes 2M requests/month!")
print("\n🌟 GCP is often easier than AWS for ML workloads")

## 🤗 Step 2: Hugging Face Spaces Deployment

**Deploy ML apps for FREE with Hugging Face Spaces!**

### 🎯 What are Hugging Face Spaces?

**FREE hosting for ML applications:**
- ✅ Free basic CPU tier (no credit card needed)
- ✅ Unlimited apps
- ✅ Custom domains (paid plans)
- ✅ Gradio or Streamlit apps
- ✅ GPU support (paid tier)
- ✅ A public URL on a platform with millions of users

### 🏗️ How Spaces Work:

```
Your Code (app.py)
    ↓
Push to Hugging Face
    ↓
Auto-build & Deploy
    ↓
Live at: huggingface.co/spaces/username/app-name
    ↓
Share with the world! 🌍
```
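The auto-build step is driven by a YAML block at the top of the Space's `README.md`, like the one in the README written later in this notebook. Here is a minimal sketch of generating that block from a dictionary. `space_front_matter` is a hypothetical helper, and it only handles flat string and boolean values, so it is not a general YAML writer.

```python
def space_front_matter(config: dict) -> str:
    """Render flat key/value settings as the YAML block at the top of README.md."""
    lines = ["---"]
    for key, value in config.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML booleans are lowercase
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)


if __name__ == "__main__":
    config = {
        "title": "Sentiment Analysis",
        "sdk": "gradio",
        "sdk_version": "4.7.1",
        "app_file": "app.py",
        "pinned": False,
    }
    print(space_front_matter(config))
```

Generating the block from one place makes it easier to keep `sdk_version` in step with the version pinned in `requirements.txt`.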

### 📊 Spaces Options:

| Framework | Best For | Pros |
|-----------|----------|------|
| **Gradio** | ML demos | Fastest setup, auto UI |
| **Streamlit** | Data apps | More control, beautiful |
| **Docker** | Custom | Full flexibility |
| **Static** | Simple HTML | No backend needed |

### 🌟 Why Use Spaces?

**For Portfolio:**
- ✅ Impress employers with live demos
- ✅ Share projects easily (just a URL)
- ✅ No DevOps knowledge needed
- ✅ Professional-looking apps

**For Learning:**
- ✅ Deploy in minutes
- ✅ Free hosting on the basic CPU tier
- ✅ Great documentation
- ✅ Active community

**For Projects:**
- ✅ Rapid prototyping
- ✅ User testing
- ✅ Demo to stakeholders
- ✅ MVP deployment

Let's deploy to Spaces!

In [None]:
# Create a Gradio app for Hugging Face Spaces
# This creates an interactive ML interface

gradio_app = '''
import gradio as gr
import numpy as np
from transformers import pipeline

# Load sentiment analysis model
print("Loading model...")
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english"
)
print("Model loaded!")

def analyze_sentiment(text):
    """
    Analyze sentiment of input text
    
    Args:
        text: Input text to analyze
    
    Returns:
        Dictionary with label and score
    """
    if not text or not text.strip():
        # gr.Label expects {label: confidence}, so report bad input via gr.Error
        raise gr.Error("Please enter some text")
    
    # Get prediction
    result = classifier(text)[0]
    
    # Format output
    label = result['label']
    score = result['score']
    
    # Create confidence breakdown
    if label == "POSITIVE":
        return {
            "Positive": score,
            "Negative": 1 - score
        }
    else:
        return {
            "Positive": 1 - score,
            "Negative": score
        }

# Create Gradio interface
demo = gr.Interface(
    fn=analyze_sentiment,
    inputs=gr.Textbox(
        lines=5,
        placeholder="Enter text to analyze sentiment...",
        label="Input Text"
    ),
    outputs=gr.Label(
        num_top_classes=2,
        label="Sentiment Analysis Results"
    ),
    title="🎭 Sentiment Analysis with DistilBERT",
    description="Analyze the sentiment of any text using a fine-tuned DistilBERT model. Try it with movie reviews, product feedback, or tweets!",
    examples=[
        ["I absolutely love this product! It exceeded all my expectations."],
        ["This is the worst experience I've ever had. Very disappointed."],
        ["It's okay, nothing special but does the job."],
        ["Amazing quality! Highly recommend to everyone!"],
        ["Terrible customer service and poor quality product."]
    ],
    theme=gr.themes.Soft(),
    analytics_enabled=True
)

if __name__ == "__main__":
    demo.launch()
'''

# Save Gradio app
with open('app.py', 'w') as f:
    f.write(gradio_app)

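# Create requirements for Spaces
# Note: this overwrites the requirements.txt written for the Cloud Function above;
# keep each deployment in its own folder if you need both.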
spaces_requirements = '''
gradio==4.7.1
transformers==4.35.2
torch==2.1.1
'''

with open('requirements.txt', 'w') as f:
    f.write(spaces_requirements.strip())

# Create README for Spaces
readme = '''
---
title: Sentiment Analysis
emoji: 🎭
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.7.1
app_file: app.py
pinned: false
---

# Sentiment Analysis with DistilBERT

This app uses a fine-tuned DistilBERT model to analyze the sentiment of text.

## Features
- Real-time sentiment analysis
- Confidence scores for positive/negative
- Pre-loaded examples
- Clean, intuitive interface

## Model
- **Model**: DistilBERT fine-tuned on SST-2
- **Task**: Binary sentiment classification
- **Accuracy**: ~91% on the SST-2 validation set

## Usage
1. Enter your text in the input box
2. Click "Submit" or press Enter
3. View sentiment scores

Try the example texts or input your own!
'''

with open('README.md', 'w') as f:
    f.write(readme.strip())

print("🤗 Hugging Face Space Created!\n")
print("="*70)
print("\n✅ Files created:")
print("   📄 app.py (Gradio application)")
print("   📄 requirements.txt (dependencies)")
print("   📄 README.md (space description)")

print("\n🚀 Deployment Steps:\n")
print("1️⃣ Create account at huggingface.co")
print("\n2️⃣ Create new Space:")
print("   - Go to huggingface.co/new-space")
print("   - Choose 'Gradio' as SDK")
print("   - Choose 'Public' visibility")

print("\n3️⃣ Upload files:")
print("   - Upload app.py, requirements.txt, README.md")
print("   - Or use Git: git push to the Space repository")

print("\n4️⃣ Wait for build (2-3 minutes)")

print("\n5️⃣ Share your app!")
print("   URL: https://huggingface.co/spaces/USERNAME/APP-NAME")

print("\n" + "="*70)
print("\n💡 Your app is now LIVE at a public URL anyone can open!")
print("💰 Cost: $0 on the free CPU tier")
print("🌟 Perfect for portfolio projects!")
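Before pushing the app, the score-formatting logic can be checked without downloading the model. This is a minimal sketch, assuming the classifier returns a list of `{'label': ..., 'score': ...}` dictionaries as in `app.py` above. `fake_classifier`, `to_label_scores`, and `analyze` are hypothetical names used only for this illustration.

```python
def to_label_scores(result: dict) -> dict:
    """Convert {'label': 'POSITIVE'|'NEGATIVE', 'score': p} into both class scores."""
    score = float(result["score"])
    if result["label"] == "POSITIVE":
        return {"Positive": score, "Negative": 1.0 - score}
    return {"Positive": 1.0 - score, "Negative": score}


def analyze(text: str, classifier) -> dict:
    """Run any callable classifier and format its first result."""
    if not text or not text.strip():
        raise ValueError("Please enter some text")
    return to_label_scores(classifier(text)[0])


def fake_classifier(text):
    """Hypothetical stub that mimics the pipeline's list-of-dicts output."""
    label = "POSITIVE" if "love" in text.lower() else "NEGATIVE"
    return [{"label": label, "score": 0.9}]


if __name__ == "__main__":
    print(analyze("I love this product", fake_classifier))
    print(analyze("Terrible service", fake_classifier))
```

Passing the classifier in as an argument keeps the formatting code testable in seconds, while the real pipeline is only loaded inside the deployed app.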

## 🎨 Step 3: Streamlit ML Applications

**Build beautiful ML apps in minutes with Streamlit**

### 🎯 What is Streamlit?

**The fastest way to build data/ML apps:**
- ✅ Pure Python (no HTML/CSS/JS)
- ✅ Reactive updates (the script reruns on every interaction)
- ✅ Beautiful UI out of the box
- ✅ 100+ components
- ✅ FREE deployment (Streamlit Community Cloud)

### 🏗️ Streamlit Architecture:

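```python
import streamlit as st

def predict(text: str) -> str:
    # Placeholder for a real model call, e.g. model.predict([text])
    return "positive" if "good" in text.lower() else "negative"

# That's it! No routing, no templates
st.title("My ML App")
text = st.text_input("Enter text")
if st.button("Predict"):
    result = predict(text)
    st.write(f"Result: {result}")
```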

### 📊 Streamlit vs Gradio:

| Feature | Streamlit | Gradio |
|---------|-----------|--------|
| **Setup Speed** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Customization** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| **ML Focus** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| **Best For** | Data apps, dashboards | ML demos |
| **Learning Curve** | Easy | Easiest |

### 🌟 Streamlit Features:

**Inputs:**
- Text input, number input, sliders
- File uploads, camera input
- Select boxes, multi-select
- Date/time pickers

**Outputs:**
- Text, markdown, code
- Charts (matplotlib, plotly, altair)
- Tables, dataframes
- Images, audio, video
- Maps, 3D plots

**Layout:**
- Columns, tabs, expandable sections
- Sidebars, containers
- Progress bars, spinners
- Custom themes

Let's build a Streamlit app!

In [None]:
# Create a comprehensive Streamlit ML app

streamlit_app = '''
import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import joblib
import os

# Page config
st.set_page_config(
    page_title="ML Model Predictor",
    page_icon="🤖",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Custom CSS
st.markdown("""
    <style>
    .main-header {
        font-size: 3rem;
        font-weight: bold;
        color: #1f77b4;
        text-align: center;
        margin-bottom: 2rem;
    }
    .metric-card {
        background-color: #f0f2f6;
        padding: 1rem;
        border-radius: 0.5rem;
        margin: 0.5rem 0;
    }
    </style>
""", unsafe_allow_html=True)

# Title
st.markdown('<h1 class="main-header">🤖 ML Model Predictor</h1>', unsafe_allow_html=True)
st.markdown("### Interactive Machine Learning Application")

# Sidebar
with st.sidebar:
    st.header("⚙️ Configuration")
    
    # Model selection
    st.subheader("Model Settings")
    n_estimators = st.slider(
        "Number of trees",
        min_value=10,
        max_value=200,
        value=100,
        step=10
    )
    
    max_depth = st.slider(
        "Max depth",
        min_value=1,
        max_value=20,
        value=10
    )
    
    # Train button
    train_button = st.button("🚀 Train Model", type="primary")
    
    st.markdown("---")
    st.markdown("### 📊 About")
    st.info(
        "This app demonstrates ML model training and prediction "
        "using the Iris dataset and Random Forest classifier."
    )

# Main content
tab1, tab2, tab3 = st.tabs(["📈 Training", "🎯 Prediction", "📊 Visualization"])

# Load data
@st.cache_data
def load_data():
    iris = load_iris()
    X = pd.DataFrame(iris.data, columns=iris.feature_names)
    y = iris.target
    return X, y, iris.target_names

X, y, target_names = load_data()

# Training tab
with tab1:
    st.header("Model Training")
    
    col1, col2 = st.columns(2)
    
    with col1:
        st.subheader("📊 Dataset Overview")
        st.write(f"**Total Samples:** {len(X)}")
        st.write(f"**Features:** {len(X.columns)}")
        st.write(f"**Classes:** {len(target_names)}")
        
        # Show data
        if st.checkbox("Show dataset"):
            st.dataframe(X.head(10))
    
    with col2:
        st.subheader("⚙️ Model Configuration")
        st.write("**Algorithm:** Random Forest")
        st.write(f"**Trees:** {n_estimators}")
        st.write(f"**Max Depth:** {max_depth}")
    
    # Train model
    if train_button or 'model' not in st.session_state:
        with st.spinner("Training model..."):
            # Split data
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.2, random_state=42
            )
            
            # Train
            model = RandomForestClassifier(
                n_estimators=n_estimators,
                max_depth=max_depth,
                random_state=42
            )
            model.fit(X_train, y_train)
            
            # Evaluate
            y_pred = model.predict(X_test)
            accuracy = accuracy_score(y_test, y_pred)
            cm = confusion_matrix(y_test, y_pred)
            
            # Store in session
            st.session_state.model = model
            st.session_state.accuracy = accuracy
            st.session_state.cm = cm
            st.session_state.feature_importance = model.feature_importances_
        
        st.success("✅ Model trained successfully!")
    
    # Show results
    if 'model' in st.session_state:
        st.markdown("---")
        st.subheader("📊 Training Results")
        
        # Metrics
        col1, col2, col3 = st.columns(3)
        col1.metric("Accuracy", f"{st.session_state.accuracy:.2%}")
        col2.metric("Train Samples", f"{int(len(X) * 0.8)}")
        col3.metric("Test Samples", f"{int(len(X) * 0.2)}")

# Prediction tab
with tab2:
    st.header("Make Predictions")
    
    if 'model' not in st.session_state:
        st.warning("⚠️ Please train the model first (Training tab)")
    else:
        st.subheader("Enter Feature Values")
        
        col1, col2 = st.columns(2)
        
        with col1:
            sepal_length = st.number_input(
                "Sepal Length (cm)",
                min_value=0.0,
                max_value=10.0,
                value=5.1,
                step=0.1
            )
            sepal_width = st.number_input(
                "Sepal Width (cm)",
                min_value=0.0,
                max_value=10.0,
                value=3.5,
                step=0.1
            )
        
        with col2:
            petal_length = st.number_input(
                "Petal Length (cm)",
                min_value=0.0,
                max_value=10.0,
                value=1.4,
                step=0.1
            )
            petal_width = st.number_input(
                "Petal Width (cm)",
                min_value=0.0,
                max_value=10.0,
                value=0.2,
                step=0.1
            )
        
        if st.button("🎯 Predict", type="primary"):
            # Make prediction (a DataFrame keeps the feature names used in training)
            features = pd.DataFrame(
                [[sepal_length, sepal_width, petal_length, petal_width]],
                columns=X.columns
            )
            prediction = st.session_state.model.predict(features)[0]
            probabilities = st.session_state.model.predict_proba(features)[0]
            
            # Display results
            st.markdown("---")
            st.subheader("🎯 Prediction Results")
            
            col1, col2 = st.columns(2)
            
            with col1:
                st.success(f"**Predicted Class:** {target_names[prediction]}")
                st.info(f"**Confidence:** {probabilities[prediction]:.2%}")
            
            with col2:
                # Probability chart
                prob_df = pd.DataFrame({
                    'Class': target_names,
                    'Probability': probabilities
                })
                fig = px.bar(
                    prob_df,
                    x='Class',
                    y='Probability',
                    title='Class Probabilities',
                    color='Probability',
                    color_continuous_scale='viridis'
                )
                st.plotly_chart(fig, use_container_width=True)

# Visualization tab
with tab3:
    st.header("Data Visualization")
    
    if 'model' not in st.session_state:
        st.warning("⚠️ Please train the model first (Training tab)")
    else:
        col1, col2 = st.columns(2)
        
        with col1:
            # Feature importance
            st.subheader("📊 Feature Importance")
            importance_df = pd.DataFrame({
                'Feature': X.columns,
                'Importance': st.session_state.feature_importance
            }).sort_values('Importance', ascending=False)
            
            fig = px.bar(
                importance_df,
                x='Importance',
                y='Feature',
                orientation='h',
                title='Feature Importance',
                color='Importance',
                color_continuous_scale='blues'
            )
            st.plotly_chart(fig, use_container_width=True)
        
        with col2:
            # Confusion matrix
            st.subheader("🎯 Confusion Matrix")
            fig = px.imshow(
                st.session_state.cm,
                labels=dict(x="Predicted", y="Actual", color="Count"),
                x=target_names,
                y=target_names,
                title='Confusion Matrix',
                color_continuous_scale='blues',
                text_auto=True
            )
            st.plotly_chart(fig, use_container_width=True)
        
        # Feature distributions
        st.subheader("📈 Feature Distributions")
        feature = st.selectbox("Select feature", X.columns)
        
        fig = px.histogram(
            X,
            x=feature,
            nbins=30,
            title=f'Distribution of {feature}',
            color_discrete_sequence=['#636EFA']
        )
        st.plotly_chart(fig, use_container_width=True)

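# Footer
# (no "\n" escapes here: inside the enclosing ''' string they would become real
# newlines and break these string literals in the generated file)
st.markdown("---")
st.markdown(
    "<div style='text-align: center; color: #666;'>"
    "Built with Streamlit 🎈 | Powered by scikit-learn 🤖"
    "</div>",
    unsafe_allow_html=True
)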
'''

# Save Streamlit app
with open('streamlit_app.py', 'w') as f:
    f.write(streamlit_app)

print("🎨 Streamlit App Created!\n")
print("="*70)
print("\n✅ Saved to: streamlit_app.py")

print("\n🚀 Run locally:")
print("   streamlit run streamlit_app.py")

print("\n☁️  Deploy to Streamlit Community Cloud (FREE):")
print("\n1️⃣ Push code to GitHub")
print("2️⃣ Go to share.streamlit.io")
print("3️⃣ Connect GitHub repo")
print("4️⃣ Select streamlit_app.py")
print("5️⃣ Click Deploy!")

print("\n" + "="*70)
print("\n💡 Your app will be live at: yourapp.streamlit.app")
print("💰 Cost: $0 on the free Community Cloud tier")
print("🌟 Perfect for impressive portfolio projects!")

## 🏭 Step 4: Production Model Serving

**Scale ML models to handle millions of requests**

### 🎯 Model Serving Solutions:

#### 1️⃣ **TensorFlow Serving**
**Google's production ML serving system**

**Features:**
- ✅ Optimized for TensorFlow/Keras
- ✅ REST and gRPC APIs
- ✅ Model versioning
- ✅ A/B testing support
- ✅ Batching for efficiency

**Best for:** TensorFlow models at scale

#### 2️⃣ **TorchServe**
**PyTorch's official serving tool**

**Features:**
- ✅ PyTorch native
- ✅ Multi-model serving
- ✅ Worker scaling via the management API
- ✅ Metrics and logging

**Best for:** PyTorch models

#### 3️⃣ **KServe (formerly KFServing)**
**Kubernetes-native ML serving**

**Features:**
- ✅ Framework-agnostic
- ✅ Auto-scaling on K8s
- ✅ Canary deployments
- ✅ GPU support

**Best for:** Large-scale, multi-model deployments

#### 4️⃣ **BentoML**
**Modern ML serving framework**

**Features:**
- ✅ Package models + dependencies
- ✅ Deploy anywhere (Docker, K8s, cloud)
- ✅ Built-in monitoring
- ✅ Adaptive batching

**Best for:** Production deployments, any framework

#### 5️⃣ **FastAPI + Uvicorn**
**Custom solution (what we've been using!)**

**Features:**
- ✅ Full control
- ✅ Fast and modern
- ✅ Easy to customize
- ✅ Great documentation

**Best for:** Custom requirements, smaller scale

### 📊 Comparison:

| Solution | Complexity | Performance | Flexibility | Best For |
|----------|------------|-------------|-------------|-----------|
| **TF Serving** | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | TensorFlow models |
| **TorchServe** | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐ | PyTorch models |
| **KServe** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Enterprise K8s |
| **BentoML** | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Production |
| **FastAPI** | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Custom needs |

### 🏗️ Production Architecture:

```
Internet
    ↓
Load Balancer (AWS ALB / GCP LB)
    ↓
API Gateway (rate limiting, auth)
    ↓
Model Servers [Auto-scaling]
├─ Server 1 (GPU)
├─ Server 2 (GPU)
└─ Server 3 (GPU)
    ↓
Model Cache (Redis)
    ↓
Model Storage (S3 / GCS)
```

### 🌟 Production Best Practices:

**Performance:**
- ✅ Use batching for throughput
- ✅ Cache predictions when possible
- ✅ Use GPUs for deep learning
- ✅ Optimize model size (quantization, pruning)
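As an illustration of caching predictions, here is a minimal in-process sketch built on `functools.lru_cache`. It assumes the model is deterministic and that rounding inputs to three decimals is acceptable for the application. `expensive_predict` is a hypothetical stand-in for a real model call, and a shared cache such as Redis would be needed once several servers are involved.

```python
from functools import lru_cache

CALLS = {"model": 0}  # counts real model invocations, for illustration only


def expensive_predict(features: tuple) -> int:
    """Hypothetical stand-in for model.predict on one row."""
    CALLS["model"] += 1
    return int(features[2] >= 2.5)


@lru_cache(maxsize=1024)
def _cached(features: tuple) -> int:
    return expensive_predict(features)


def predict(features, ndigits: int = 3) -> int:
    # Lists are unhashable, and tiny float differences would defeat the cache,
    # so round and convert to a tuple before the lookup.
    key = tuple(round(float(x), ndigits) for x in features)
    return _cached(key)


if __name__ == "__main__":
    for _ in range(3):
        predict([5.1, 3.5, 1.4, 0.2])
    print(CALLS, _cached.cache_info())
```

The cache must be cleared whenever a new model version is deployed, for example with `_cached.cache_clear()`, otherwise old predictions keep being served.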

**Reliability:**
- ✅ Health checks and auto-restart
- ✅ Graceful degradation
- ✅ Circuit breakers
- ✅ Retry logic
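Retry logic is commonly implemented with exponential backoff plus jitter. The following is a minimal sketch, not a full resilience library. `retry` is a hypothetical helper, and the choice of retryable exceptions and the delay values are assumptions to tune for a real service.

```python
import random
import time


def retry(func, attempts: int = 4, base_delay: float = 0.1, max_delay: float = 2.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(); on a retryable error wait base_delay * 2**n (plus jitter) and retry."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many clients from retrying at the same instant
            sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_model_call():
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("model server unavailable")
        return {"prediction": 1}

    print(retry(flaky_model_call), "after", state["calls"], "calls")
```

Only errors that are likely to be temporary should be retried. Retrying a validation error just repeats the same failure and adds load.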

**Monitoring:**
- ✅ Latency metrics (p50, p95, p99)
- ✅ Throughput tracking
- ✅ Error rate monitoring
- ✅ Model performance tracking
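To make p50, p95, and p99 concrete, here is a minimal sketch that computes them from a list of latencies using the standard library. `latency_summary` is a hypothetical helper and the sample data is simulated. In production these numbers usually come from a metrics system rather than from raw lists.

```python
import random
import statistics


def latency_summary(latencies_ms):
    """Return p50/p95/p99 using statistics.quantiles with 100 cut points."""
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    # cuts[k - 1] is the k-th percentile
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    random.seed(0)
    # Simulated traffic: mostly fast requests plus a small slow tail
    samples = [random.gauss(20, 3) for _ in range(990)]
    samples += [random.uniform(200, 400) for _ in range(10)]
    print({name: round(value, 1) for name, value in latency_summary(samples).items()})
    print("mean:", round(statistics.mean(samples), 1))
```

Percentiles matter because a mean hides the slow tail: a handful of very slow requests barely moves the average but is exactly what p99 is meant to expose.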

**Security:**
- ✅ API authentication (API keys, OAuth)
- ✅ Rate limiting
- ✅ Input validation
- ✅ HTTPS only
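Rate limiting is often implemented as a token bucket. The sketch below is a minimal single-process version, assuming one bucket per API key would be kept in a dictionary. `TokenBucket` is a hypothetical class for illustration, and a real deployment would normally rely on the API gateway or a shared store instead.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would typically respond with HTTP 429


if __name__ == "__main__":
    bucket = TokenBucket(rate=5, capacity=3)
    print([bucket.allow() for _ in range(5)])
    time.sleep(0.5)
    print(bucket.allow())
```

The injectable `clock` argument is a design choice that lets the limiter be tested with a fake clock instead of real sleeps.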

Let's see production deployment code!

In [None]:
# Production-ready model serving with monitoring

production_code = '''
from fastapi import FastAPI, HTTPException, Depends, status
from fastapi.security import APIKeyHeader
from pydantic import BaseModel, Field, validator
from typing import List, Optional
import joblib
import numpy as np
import time
import logging
from datetime import datetime
from prometheus_client import Counter, Histogram, generate_latest
from starlette.responses import Response

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Initialize FastAPI
app = FastAPI(
    title="Production ML API",
    description="Production-ready model serving with monitoring",
    version="1.0.0"
)

# Metrics
REQUEST_COUNT = Counter(
    'prediction_requests_total',
    'Total prediction requests',
    ['status']
)
REQUEST_LATENCY = Histogram(
    'prediction_latency_seconds',
    'Prediction latency in seconds'
)

# API Key authentication
API_KEY_HEADER = APIKeyHeader(name="X-API-Key")

def verify_api_key(api_key: str = Depends(API_KEY_HEADER)):
    """Verify API key"""
    # In production, check against a database or secret manager
    valid_keys = ["your-secret-api-key-here"]
    if api_key not in valid_keys:
        # Never write key material (even a prefix) to the logs
        logger.warning("Invalid API key attempt")
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid API key"
        )
    return api_key

# Load model at startup
logger.info("Loading ML model...")
model = joblib.load('model.joblib')
logger.info("Model loaded successfully")

# Request/Response models
class PredictionRequest(BaseModel):
    features: List[float] = Field(..., min_items=4, max_items=4)
    
    @validator('features')
    def validate_features(cls, v):
        if any(x < 0 for x in v):
            raise ValueError('Features must be non-negative')
        return v
    
    class Config:
        schema_extra = {
            "example": {
                "features": [5.1, 3.5, 1.4, 0.2]
            }
        }

class PredictionResponse(BaseModel):
    prediction: int
    probability: float
    latency_ms: float
    timestamp: str

# Health check
@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {
        "status": "healthy",
        "model_loaded": model is not None,
        "timestamp": datetime.now().isoformat()
    }

# Readiness check
@app.get("/ready")
async def readiness_check():
    """Readiness check for load balancer"""
    if model is None:
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail="Model not loaded"
        )
    return {"status": "ready"}

# Metrics endpoint
@app.get("/metrics")
async def metrics():
    """Prometheus metrics endpoint"""
    return Response(
        content=generate_latest(),
        media_type="text/plain"
    )

# Prediction endpoint
@app.post("/predict", response_model=PredictionResponse)
async def predict(
    request: PredictionRequest,
    api_key: str = Depends(verify_api_key)
):
    """Make prediction with authentication and monitoring"""
    start_time = time.time()
    
    try:
        # Convert to numpy array
        features = np.array(request.features).reshape(1, -1)
        
        # Make prediction
        prediction = int(model.predict(features)[0])
        probability = float(model.predict_proba(features).max())
        
        # Calculate latency
        latency_ms = (time.time() - start_time) * 1000
        
        # Log prediction
        logger.info(
            f"Prediction: {prediction}, "
            f"Probability: {probability:.4f}, "
            f"Latency: {latency_ms:.2f}ms"
        )
        
        # Update metrics
        REQUEST_COUNT.labels(status='success').inc()
        REQUEST_LATENCY.observe(time.time() - start_time)
        
        return PredictionResponse(
            prediction=prediction,
            probability=probability,
            latency_ms=latency_ms,
            timestamp=datetime.now().isoformat()
        )
    
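    except Exception:
        # Log the full traceback server-side; return a generic message
        # so internal details are not leaked to API clients
        logger.exception("Prediction error")
        REQUEST_COUNT.labels(status='error').inc()
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Prediction failed"
        )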

# Batch prediction
@app.post("/predict_batch")
async def predict_batch(
    requests: List[PredictionRequest],
    api_key: str = Depends(verify_api_key)
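):
    """Batch predictions for efficiency"""
    # Reject empty batches up front; otherwise the model call fails with a 500
    if not requests:
        raise HTTPException(
            status_code=status.HTTP_400_BAD_REQUEST,
            detail="Batch must contain at least one item"
        )
    start_time = time.time()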
    
    try:
        # Collect all features
        features_batch = np.array([req.features for req in requests])
        
        # Batch prediction
        predictions = model.predict(features_batch)
        probabilities = model.predict_proba(features_batch)
        
        # Format results
        results = [
            {
                'prediction': int(pred),
                'probability': float(prob.max())
            }
            for pred, prob in zip(predictions, probabilities)
        ]
        
        latency_ms = (time.time() - start_time) * 1000
        
        logger.info(
            f"Batch prediction: {len(requests)} samples, "
            f"Latency: {latency_ms:.2f}ms"
        )
        
        REQUEST_COUNT.labels(status='success').inc()
        REQUEST_LATENCY.observe(time.time() - start_time)
        
        return {
            'results': results,
            'count': len(results),
            'latency_ms': latency_ms
        }
    
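    except Exception:
        # Log the full traceback server-side; return a generic message
        # so internal details are not leaked to API clients
        logger.exception("Batch prediction error")
        REQUEST_COUNT.labels(status='error').inc()
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="Batch prediction failed"
        )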

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=8000,
        log_level="info",
        access_log=True
    )
'''

with open('production_api.py', 'w') as f:
    f.write(production_code)

print("🏭 Production API Created!\n")
print("="*70)
print("\n✅ Saved to: production_api.py")

print("\n🌟 Production Features:")
print("   ✅ API key authentication")
print("   ✅ Request validation (Pydantic)")
print("   ✅ Comprehensive logging")
print("   ✅ Prometheus metrics")
print("   ✅ Health & readiness checks")
print("   ✅ Batch prediction support")
print("   ✅ Error handling")
print("   ✅ Performance tracking")

print("\n🚀 Deploy to production:")
print("   1. Docker: docker build -t ml-api .")
print("   2. K8s: kubectl apply -f deployment.yaml")
print("   3. Cloud: gcloud run deploy ml-api")

print("\n" + "="*70)
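
Once `production_api.py` is running, a client has to send the `X-API-Key` header and a JSON body with exactly four features. The sketch below builds such a request with the standard library without sending it. The helper name `build_predict_request` and the `http://localhost:8000` address are assumptions for illustration, and the key is the placeholder from the code above.

```python
import json
import urllib.request


def build_predict_request(base_url, api_key, features):
    """Build (but do not send) a POST request for the /predict endpoint."""
    body = json.dumps({"features": list(features)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


req = build_predict_request(
    "http://localhost:8000",           # assumed local address of the server
    "your-secret-api-key-here",        # placeholder key from the API code
    [5.1, 3.5, 1.4, 0.2],
)

# To send it once the server is up:
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(json.load(resp))
```

A request without the header, or with the wrong number of features, should be rejected before the model is ever called.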

## 🎯 Interactive Exercises

**Practice your cloud deployment skills!**

### Exercise 1: Deploy to Hugging Face Spaces

**Task:** Create and deploy a real ML app to Hugging Face Spaces

**Requirements:**
1. Choose a task (sentiment analysis, image classification, etc.)
2. Create a Gradio or Streamlit app
3. Add example inputs
4. Deploy to Spaces
5. Share the live URL!

**Bonus:** Add your own trained model instead of using pre-trained!

In [None]:
# YOUR SOLUTION HERE

# TODO: Create your Gradio/Streamlit app
# import gradio as gr

# TODO: Define your prediction function
# def predict(input):
#     ...

# TODO: Create interface
# demo = gr.Interface(...)

# TODO: Deploy to Spaces
# 1. Create account at huggingface.co
# 2. Create new Space
# 3. Upload app.py and requirements.txt

print("Complete the exercise above!")
print("\nYour app URL will be:")
print("https://huggingface.co/spaces/YOUR-USERNAME/YOUR-APP-NAME")

### Exercise 2: Build Production-Ready API

**Task:** Enhance the basic FastAPI app with production features

**Add these features:**
1. Rate limiting (max 100 requests/minute)
2. Caching (cache predictions for 5 minutes)
3. Comprehensive error handling
4. API documentation with examples
5. Request/response logging

**Bonus:** Add Prometheus metrics and deploy to Railway or Render!

In [None]:
# YOUR SOLUTION HERE

# TODO: Add rate limiting
# from slowapi import Limiter

# TODO: Add caching
# from cachetools import TTLCache

# TODO: Enhance error handling
# @app.exception_handler(...)

# TODO: Add comprehensive logging
# import structlog

# TODO: Add metrics
# from prometheus_client import ...

print("Complete the exercise above!")
print("\nLibraries to explore:")
print("- slowapi (rate limiting)")
print("- cachetools (caching)")
print("- structlog (structured logging)")
print("- prometheus_client (metrics)")

## 🎉 Key Takeaways

**Congratulations! You've mastered cloud ML deployment!**

### 1️⃣ **Cloud Platforms**
   - ✅ AWS, GCP, Azure - know the major providers
   - ✅ Each has ML-specific services
   - ✅ Choose based on your needs and budget
   - **Use when:** Building production systems

### 2️⃣ **Serverless ML**
   - ✅ Pay only for what you use
   - ✅ Auto-scaling built-in
   - ✅ No server management
   - **Use when:** Variable traffic, cost optimization

### 3️⃣ **Hugging Face Spaces**
   - ✅ FREE ML app hosting
   - ✅ Deploy in minutes
   - ✅ Perfect for portfolio
   - **Use when:** Demos, MVPs, learning (always!)

### 4️⃣ **Streamlit/Gradio**
   - ✅ Build ML apps without frontend skills
   - ✅ Beautiful UIs out of the box
   - ✅ Rapid prototyping
   - **Use when:** Need quick, impressive demos

### 5️⃣ **Production Serving**
   - ✅ Monitoring, logging, metrics essential
   - ✅ Authentication and security
   - ✅ Auto-scaling and redundancy
   - **Use when:** Serving real users at scale

---

## 🌟 Deployment Decision Tree

**Choose your deployment strategy:**

```
Need deployment?
    |
    ├─ Quick demo/portfolio? → Hugging Face Spaces (FREE!)
    |
    ├─ Internal tool? → Streamlit Cloud (FREE!)
    |
    ├─ Low traffic API? → Railway/Render ($5-20/mo)
    |
    ├─ Variable traffic? → Serverless (AWS Lambda/Cloud Functions)
    |
    └─ Production scale? → Cloud (AWS/GCP/Azure) with auto-scaling
```

---

## 📊 Cost Comparison

**Approximate monthly costs (USD) for different strategies:**

| Solution | Traffic | Cost | Best For |
|----------|---------|------|----------|
| **HF Spaces** | Low (free CPU tier) | $0 | Demos, portfolio |
| **Streamlit Cloud** | Low | $0 | Internal tools |
| **Railway** | Medium | $5-20 | Startups, MVPs |
| **Serverless** | Variable | $0-100 | Sporadic traffic |
| **Cloud VMs** | Constant | $50-500 | Production |
| **K8s Cluster** | High | $500+ | Enterprise |

---

## ✅ Deployment Checklist

**Before deploying to production:**

**Code:**
- [ ] Input validation
- [ ] Error handling
- [ ] Logging configured
- [ ] API documentation
- [ ] Tests written

**Infrastructure:**
- [ ] Health checks
- [ ] Auto-scaling configured
- [ ] Load balancer set up
- [ ] SSL/HTTPS enabled
- [ ] Monitoring alerts

**Security:**
- [ ] API authentication
- [ ] Rate limiting
- [ ] Input sanitization
- [ ] Secrets in environment variables
- [ ] CORS configured

**Operations:**
- [ ] CI/CD pipeline
- [ ] Rollback procedure
- [ ] Monitoring dashboard
- [ ] Documentation complete
- [ ] On-call rotation

---

## 🚀 Next Steps

**Continue your deployment journey:**

1. **Deploy Real Projects:**
   - Create 3 Spaces apps for portfolio
   - Build Streamlit dashboard
   - Deploy to cloud (use free tiers)

2. **Learn Advanced Topics:**
   - Kubernetes for ML (KServe, Seldon)
   - Multi-model serving
   - Edge deployment (TensorFlow Lite)
   - Model optimization (ONNX, TensorRT)

3. **Build Portfolio:**
   - 3-5 deployed ML apps
   - GitHub repos with deployment docs
   - Blog posts about deployment
   - Contribute to open-source

---

**💬 Final Thoughts:**

*"You now have the complete skill set to deploy ML models from local prototypes to cloud-scale production systems. Hugging Face Spaces gives you FREE hosting to build an impressive portfolio. Streamlit lets you create beautiful apps without frontend skills. And production deployment knowledge makes you job-ready for ML engineering roles. The gap between learning ML and deploying ML is now closed - you can do both!"*

**🎉 Week 19 Complete! You've mastered MLOps & Deployment! 🚀**

**What you've learned:**
- Day 1: Model deployment (Flask, FastAPI, Docker)
- Day 2: MLOps best practices (MLflow, monitoring, A/B testing)
- Day 3: Cloud deployment (AWS, GCP, Spaces, Streamlit)

**You can now:**
- Deploy models as REST APIs
- Build MLOps pipelines
- Deploy to cloud platforms
- Create production-ready ML systems

**🌟 You're ready for ML engineering roles! 🌟**

---

**📚 Additional Resources:**
- Hugging Face Spaces: https://huggingface.co/spaces
- Streamlit Docs: https://docs.streamlit.io
- AWS SageMaker: https://aws.amazon.com/sagemaker
- GCP Vertex AI: https://cloud.google.com/vertex-ai
- Full Stack Deep Learning: https://fullstackdeeplearning.com
- MLOps Community: https://mlops.community

**Keep deploying! 🚀**