# Lecture 83 – Hands-On Lab: End-to-End Model Deployment

## Lab Objectives
In this lab, you will:
1. Train a CNN model on Fashion-MNIST
2. Serialize the model and artifacts
3. Deploy via FastAPI
4. Test the API locally
5. Containerize with Docker
6. (Optional) Deploy to a cloud platform

## Expected Time
~15-20 minutes

## Grading Rubric (100 points)
- Model Training & Serialization (20 pts)
- API Implementation (30 pts)
- Testing & Documentation (20 pts)
- Docker Containerization (20 pts)
- Code Quality & Best Practices (10 pts)

---

## Step 1: Environment Setup

In [None]:
import sys
import tensorflow as tf
import numpy as np
import joblib
import json
from datetime import datetime
from pathlib import Path

print(f"Python: {sys.version}")
print(f"TensorFlow: {tf.__version__}")
print(f"GPU Available: {len(tf.config.list_physical_devices('GPU')) > 0}")

In [None]:
# Create directories
models_dir = Path('../models')
models_dir.mkdir(exist_ok=True)

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
print(f"Deployment timestamp: {timestamp}")

## Step 2: Load and Prepare Data

**Task**: Load Fashion-MNIST and create train/test splits

In [None]:
# Load dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Use subset for faster training
TRAIN_SIZE = 10000
TEST_SIZE = 2000

x_train = x_train[:TRAIN_SIZE]
y_train = y_train[:TRAIN_SIZE]
x_test = x_test[:TEST_SIZE]
y_test = y_test[:TEST_SIZE]

# Normalize
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Reshape
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

print(f"Training samples: {x_train.shape[0]}")
print(f"Test samples: {x_test.shape[0]}")
print(f"Input shape: {x_train.shape[1:]}")
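
The same scaling and reshape have to be applied to incoming requests at serving time, otherwise the model sees inputs on a different scale than it was trained on. The following is a minimal sketch of that idea, assuming the API receives raw 28×28 grayscale images with pixel values in 0-255. The helper name `preprocess_images` is hypothetical and not part of the lab code.

```python
import numpy as np


def preprocess_images(raw_images):
    """Mirror the training-time preprocessing for a batch of raw images.

    Assumes `raw_images` is array-like with shape (n, 28, 28) and
    pixel values in [0, 255]. Hypothetical helper for illustration.
    """
    batch = np.asarray(raw_images, dtype="float32")
    if batch.ndim != 3 or batch.shape[1:] != (28, 28):
        raise ValueError(f"expected shape (n, 28, 28), got {batch.shape}")
    # Same two steps as in training: scale to [0, 1], then add the channel axis
    return (batch / 255.0).reshape(-1, 28, 28, 1)
```

Keeping this logic in one function that both the notebook and the server import is one way to avoid the two code paths drifting apart.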

## Step 3: Build and Train Model

**Task**: Create a CNN architecture and train it

In [None]:
# Define model architecture
def create_model():
    model = tf.keras.Sequential([
        # TODO: Adjust or replace this reference architecture with your own layers
        # Hint: Conv2D, MaxPooling2D, Flatten, Dense, Dropout
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    return model

model = create_model()
model.summary()

In [None]:
# Compile model
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train
print("Training model...")
history = model.fit(
    x_train, y_train,
    epochs=3,  # Increase for better accuracy
    batch_size=128,
    validation_split=0.2,
    verbose=1
)

In [None]:
# Evaluate
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"\n{'='*50}")
print(f"Test Accuracy: {test_acc:.4f}")
print(f"Test Loss: {test_loss:.4f}")
print(f"{'='*50}")

## Step 4: Serialize Model

**Task**: Save model in multiple formats

In [None]:
# Save as HDF5
h5_path = models_dir / f"fashion_cnn_lab_{timestamp}.h5"
model.save(h5_path)
print(f"✓ Saved HDF5: {h5_path.name}")

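# Save as SavedModel
# Note: save_format='tf' is accepted by Keras 2 (TensorFlow 2.15 and earlier).
# With Keras 3 (the default from TensorFlow 2.16), use model.export(savedmodel_path) instead.
savedmodel_path = models_dir / f"fashion_cnn_lab_savedmodel_{timestamp}"
model.save(savedmodel_path, save_format='tf')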
print(f"✓ Saved SavedModel: {savedmodel_path.name}")

In [None]:
# Save preprocessing config
class_names = [
    'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
    'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'
]

config = {
    'normalization': 'divide_by_255',
    'input_shape': [28, 28, 1],
    'num_classes': 10,
    'class_names': class_names
}

config_path = models_dir / f"preprocessing_config_{timestamp}.pkl"
joblib.dump(config, config_path)
print(f"✓ Saved config: {config_path.name}")
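
Because the config is a plain dictionary of strings, numbers, and lists, it could also be stored as JSON, which stays human-readable and does not require unpickling when the server loads it. The sketch below shows that alternative. The lab itself uses joblib, and the function names here are illustrative only.

```python
import json
from pathlib import Path


def save_config_json(config, path):
    """Write a JSON-serializable config dict to `path` and return the path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return path


def load_config_json(path):
    """Read the config back as a plain dict."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)
```

This only works while every value in the config is JSON-serializable. A fitted scaler or other Python object would still need joblib or pickle.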

In [None]:
# Save metadata
metadata = {
    'model_name': 'fashion_cnn_lab',
    'version': timestamp,
    'framework': 'tensorflow',
    'framework_version': tf.__version__,
    'test_accuracy': float(test_acc),
    'training_samples': TRAIN_SIZE,
    'epochs': 3,
    'created_at': datetime.now().isoformat()
}

metadata_path = models_dir / f"model_metadata_{timestamp}.json"
with open(metadata_path, 'w') as f:
    json.dump(metadata, f, indent=2)

print(f"✓ Saved metadata: {metadata_path.name}")
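
Every artifact name ends in a zero-padded `%Y%m%d_%H%M%S` timestamp, so sorting the file names as strings also sorts them chronologically. A serving process could use that property to pick up the most recent metadata file. This is a sketch under that assumption, and `latest_artifact` is a hypothetical helper.

```python
from pathlib import Path


def latest_artifact(models_dir, pattern="model_metadata_*.json"):
    """Return the newest file matching `pattern`, or None if there is none.

    Relies on the zero-padded timestamp in the file name, so plain
    string ordering matches chronological ordering.
    """
    matches = sorted(Path(models_dir).glob(pattern), key=lambda p: p.name)
    return matches[-1] if matches else None
```

For example, `latest_artifact(models_dir)` would return the metadata file written by the most recent run of this notebook.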

## Step 5: Test Model Loading

**Task**: Verify saved model works correctly

In [None]:
# Load model (Keras 3 cannot load a SavedModel directory with load_model;
# load h5_path instead in that case)
loaded_model = tf.keras.models.load_model(savedmodel_path)
loaded_config = joblib.load(config_path)

print("✓ Model and config loaded successfully")

# Test prediction
test_image = x_test[0:1]
prediction = loaded_model.predict(test_image, verbose=0)
predicted_class = prediction.argmax()

print(f"\nSample Prediction:")
print(f"  Predicted: {loaded_config['class_names'][predicted_class]}")
print(f"  Actual: {loaded_config['class_names'][y_test[0]]}")
print(f"  Confidence: {prediction[0][predicted_class]:.4f}")
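
`argmax` reports only the single most likely class. When a prediction looks borderline, it can help to inspect the runner-up classes as well. The sketch below ranks one row of softmax probabilities. `top_k` is a hypothetical helper, not part of the lab code.

```python
def top_k(probabilities, class_names, k=3):
    """Return the k most likely (class_name, probability) pairs, best first."""
    if len(probabilities) != len(class_names):
        raise ValueError("probabilities and class_names must have equal length")
    ranked = sorted(
        zip(class_names, (float(p) for p in probabilities)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]
```

With the variables from the cell above, a call would look like `top_k(prediction[0], loaded_config['class_names'])`.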

## Step 6: Start FastAPI Server

**Task**: Launch the API server

In a terminal, run:
```bash
cd ../apps/fastapi_app
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
```

In [None]:
# Check if server is running
import requests
import time

def check_server(url="http://localhost:8000", retries=3):
    for i in range(retries):
        try:
            response = requests.get(f"{url}/ping", timeout=2)
            if response.status_code == 200:
                print(f"✓ Server is running at {url}")
                return True
        except requests.exceptions.RequestException:
            if i < retries - 1:
                print(f"Waiting for server... (attempt {i+1}/{retries})")
                time.sleep(2)
    
    print("✗ Server not running")
    print("\nStart server with:")
    print("  cd ../apps/fastapi_app")
    print("  uvicorn app:app --reload")
    return False

server_running = check_server()
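
A server that loads a model at startup can take longer than a few fixed two-second waits, especially inside a container. One common alternative is to poll with an increasing delay. The helper below is a generic sketch of that pattern. `wait_until` and its parameters are hypothetical, and `check` stands for any zero-argument callable that returns `True` once the server is ready.

```python
import time


def wait_until(check, retries=5, initial_delay=0.5, factor=2.0, sleep=time.sleep):
    """Call `check()` until it returns True or `retries` attempts are used up.

    The delay between attempts grows by `factor` each time. `sleep` is
    injectable so the helper can be exercised without real waiting.
    """
    delay = initial_delay
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(delay)
            delay *= factor
    return False
```

A `check` that wraps the `/ping` request should catch `requests.exceptions.RequestException` itself and return `False`, since this helper does not swallow exceptions.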

## Step 7: Test API Endpoints

**Task**: Verify all endpoints work correctly

In [None]:
API_URL = "http://localhost:8000"

if server_running:
    passed = []

    # Test 1: Health check
    print("Test 1: Health Check")
    response = requests.get(f"{API_URL}/ping", timeout=10)
    passed.append(response.status_code == 200)
    print(f"  Status: {response.status_code}")
    print(f"  Response: {response.json()}\n")

    # Test 2: Metadata
    print("Test 2: Metadata")
    response = requests.get(f"{API_URL}/metadata", timeout=10)
    passed.append(response.status_code == 200)
    print(f"  Status: {response.status_code}")
    if response.status_code == 200:
        print(f"  Model: {response.json()['model_name']}")
    print()

    # Test 3: Prediction
    print("Test 3: Prediction")
    # Reload the raw (unnormalized) test images; the API is sent 0-255 pixel values
    (_, _), (x_test_raw, y_test_raw) = tf.keras.datasets.fashion_mnist.load_data()
    test_images = x_test_raw[:3].tolist()

    response = requests.post(
        f"{API_URL}/predict",
        json={"instances": test_images},
        timeout=30
    )
    passed.append(response.status_code == 200)

    print(f"  Status: {response.status_code}")
    if response.status_code == 200:
        results = response.json()
        for i, pred in enumerate(results['predictions']):
            print(f"  Image {i+1}: {pred['class_name']} ({pred['confidence']:.4f})")

    # Report success only if every endpoint returned 200
    if all(passed):
        print("\n✓ All API tests passed!")
    else:
        print("\n✗ One or more API tests failed")
else:
    print("Server not running. Skipping API tests.")
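
The `/predict` request body used above is a JSON object with an `instances` list of 28×28 pixel grids. Checking that shape on the client before sending gives a clearer error than an HTTP 4xx response. The following sketch assumes raw pixel values in 0-255. `validate_instances` is a hypothetical helper, and the server's actual contract may differ.

```python
def validate_instances(instances, height=28, width=28):
    """Return a request body for `instances`, or raise ValueError.

    `instances` must be a non-empty list of height x width grids
    whose values all lie in [0, 255].
    """
    if not isinstance(instances, list) or not instances:
        raise ValueError("instances must be a non-empty list")
    for n, image in enumerate(instances):
        if len(image) != height or any(len(row) != width for row in image):
            raise ValueError(f"instance {n} is not {height}x{width}")
        if any(not 0 <= px <= 255 for row in image for px in row):
            raise ValueError(f"instance {n} has pixel values outside 0-255")
    return {"instances": instances}
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.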

## Step 8: Performance Benchmarking

**Task**: Measure API latency and throughput

In [None]:
if server_running:
    # Benchmark API (sequential requests, one at a time)
    latencies = []
    num_requests = 50
    
    print(f"Running {num_requests} requests...")
    
    for i in range(num_requests):
        test_image = x_test_raw[i:i+1].tolist()
        
        # perf_counter is a monotonic clock intended for timing short intervals
        start = time.perf_counter()
        response = requests.post(
            f"{API_URL}/predict",
            json={"instances": test_image}
        )
        latency = (time.perf_counter() - start) * 1000
        
        if response.status_code == 200:
            latencies.append(latency)
    
    # Statistics (only successful requests are counted)
    print(f"\n{'='*50}")
    print("Performance Metrics:")
    print(f"  Requests: {len(latencies)}")
    if latencies:
        print(f"  Mean latency: {np.mean(latencies):.2f} ms")
        print(f"  Median latency: {np.median(latencies):.2f} ms")
        print(f"  P95 latency: {np.percentile(latencies, 95):.2f} ms")
        print(f"  P99 latency: {np.percentile(latencies, 99):.2f} ms")
        print(f"  Throughput (sequential): {1000 / np.mean(latencies):.2f} req/sec")
    else:
        print("  No successful requests; nothing to report.")
    print(f"{'='*50}")
else:
    print("Server not running. Skipping benchmark.")
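
The throughput figure above is derived from the mean latency. Requests are sent one at a time, so roughly `1000 / mean_ms` of them complete per second. That makes it a sequential estimate, not a measure of what the server can handle under concurrent load. The sketch below computes the same summary with the standard library. `summarize_latencies` is a hypothetical helper.

```python
import statistics


def summarize_latencies(latencies_ms):
    """Summarize per-request latencies (milliseconds) from a sequential run."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two successful requests")
    # 99 cut points: index 94 is the 95th percentile, index 98 the 99th
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    mean_ms = statistics.fmean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        # One request in flight at a time: 1000 ms divided by mean latency
        "sequential_rps": 1000.0 / mean_ms,
    }
```

With only 50 samples the P99 value is interpolated between the two slowest requests, so treat it as a rough indicator.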

## Step 9: Docker Containerization

**Task**: Build and test Docker container

In [None]:
# Print the Docker commands to run (this cell does not build the image itself)
print("Run these commands in a terminal:")
print()
print("cd ../apps/fastapi_app")
print("docker build -t fashion-cnn-api:v1.0 .")
print()
print("# Run container")
print("docker run -d \\")
print("  --name fashion-api \\")
print("  -p 8000:8000 \\")
print("  -v $(pwd)/../../models:/app/models \\")
print("  fashion-cnn-api:v1.0")
print()
print("# Check logs")
print("docker logs -f fashion-api")

## Step 10: Documentation

**Task**: Create deployment documentation

In [None]:
deployment_doc = f"""
# Fashion-MNIST CNN Deployment

## Model Information
- **Model**: Fashion-MNIST CNN Classifier
- **Version**: {timestamp}
- **Accuracy**: {test_acc:.4f}
- **Framework**: TensorFlow {tf.__version__}

## Files
- Model (H5): {h5_path.name}
- Model (SavedModel): {savedmodel_path.name}
- Config: {config_path.name}
- Metadata: {metadata_path.name}

## API Endpoints
- `GET /ping` - Health check
- `GET /metadata` - Model information
- `POST /predict` - Make predictions
- `GET /docs` - Interactive documentation

## Deployment Steps

### Local Deployment
```bash
# Start server
cd apps/fastapi_app
uvicorn app:app --host 0.0.0.0 --port 8000
```

### Docker Deployment
```bash
# Build
docker build -t fashion-cnn-api .

# Run
docker run -d -p 8000:8000 fashion-cnn-api
```

### Cloud Deployment
See README.md for AWS, GCP, and Azure instructions.

## Testing
```bash
# Health check
curl http://localhost:8000/ping

# Make prediction
curl -X POST http://localhost:8000/predict \\
  -H "Content-Type: application/json" \\
  -d '{{
    "instances": [[[0, 0, ...]]]
  }}'
```

## Performance
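- Mean Latency: {np.mean(latencies) if server_running and latencies else float('nan'):.2f} ms (measured locally)
- Throughput: {1000 / np.mean(latencies) if server_running and latencies else float('nan'):.2f} req/sec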

## Monitoring
- Metrics endpoint: `/metrics`
- Logs: Check Docker logs or application logs

## Maintenance
- Model retraining: Run `scripts/01_model_serialization.py`
- Update deployment: Rebuild Docker image with new model
"""

# Save documentation
doc_path = models_dir / f"DEPLOYMENT_{timestamp}.md"
with open(doc_path, 'w') as f:
    f.write(deployment_doc)

print(f"✓ Documentation saved: {doc_path.name}")
print("\nPreview:")
print(deployment_doc[:500] + "...")

## Lab Checklist

Complete the following tasks:

### Required (90 points)
- [ ] Train CNN model with >70% accuracy (20 pts)
- [ ] Save model in both H5 and SavedModel formats (10 pts)
- [ ] Create preprocessing config and metadata (10 pts)
- [ ] Start FastAPI server successfully (15 pts)
- [ ] All API endpoints working (15 pts)
- [ ] Performance benchmark completed (10 pts)
- [ ] Create deployment documentation (10 pts)

### Bonus (10 points)
- [ ] Docker container built and running (5 pts)
- [ ] Deploy to a cloud platform (5 pts)

## Submission

Submit the following:
1. This completed notebook
2. Model files (H5, SavedModel, config, metadata)
3. Deployment documentation
4. Screenshot of API docs page (`/docs`)
5. (Optional) Docker image or cloud deployment URL

---

## Extension Ideas

1. **A/B Testing**: Deploy two model versions and compare
2. **Monitoring Dashboard**: Create Grafana dashboard for metrics
3. **CI/CD Pipeline**: Automate deployment with GitHub Actions
4. **Load Testing**: Use Locust or k6 for stress testing
5. **Model Optimization**: Quantize model to TFLite
6. **Multi-model Serving**: Serve multiple models in one API

---

**Congratulations!** You've completed an end-to-end ML deployment pipeline.

This lab covered:
- ✓ Model training and serialization
- ✓ REST API development
- ✓ Testing and benchmarking
- ✓ Containerization
- ✓ Documentation

You're now ready to deploy production ML systems!