# Lab 4.6: Model Deployment and Production - From Lab to Real World

## Duration: 45 minutes

## Learning Objectives
By the end of this lab, you will be able to:
- Understand the model deployment pipeline
- Save and load models in different formats
- Convert models for different deployment targets
- Create simple web APIs for model inference
- Optimize models for production use
- Understand monitoring and maintenance considerations

## Prerequisites
- **Labs 4.1-4.5 completed** (Full TensorFlow journey)
- Understanding of neural networks and model training
- Basic knowledge of web concepts (helpful but not required)

## Key Concepts
- **Model Serialization**: Saving trained models for later use
- **Model Serving**: Making models available for predictions
- **TensorFlow Lite**: Optimized models for mobile and edge devices
- **TensorFlow.js**: Running models in web browsers
- **Model Optimization**: Making models faster and smaller
- **API Design**: Creating interfaces for model predictions

## Setup and Introduction

Let's start by understanding the journey from training to production:

In [15]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import json
import os
import warnings
warnings.filterwarnings('ignore')

# Set random seeds
np.random.seed(42)
tf.random.set_seed(42)

print("Lab 4.6: Model Deployment and Production")
print("=" * 50)
print(f"TensorFlow version: {tf.__version__}")

print("\n🏗️  The ML Pipeline:")
print("  1. 📊 Data Collection & Preparation")
print("  2. 🧠 Model Training & Validation (Labs 4.1-4.5)")
print("  3. 🚀 Model Deployment & Serving ← WE ARE HERE!")
print("  4. 📈 Monitoring & Maintenance")

print("\n🎯 Today's Mission:")
print("  • Take our trained models to production")
print("  • Make them available for real users")
print("  • Optimize for speed and efficiency")
print("  • Handle real-world challenges")

print("\n🔧 Deployment Options We'll Explore:")
print("  1. 💾 Model Saving & Loading (the basics)")
print("  2. 🌐 Web API for predictions")
print("  3. 📱 Mobile deployment with TensorFlow Lite")
print("  4. 🌏 Browser deployment with TensorFlow.js")
print("  5. ⚡ Model optimization techniques")

# Check available devices
print(f"\nAvailable compute devices:")
physical_devices = tf.config.list_physical_devices()
for device in physical_devices:
    print(f"  • {device.device_type}: {device.name}")

Lab 4.6: Model Deployment and Production
TensorFlow version: 2.20.0

🏗️  The ML Pipeline:
  1. 📊 Data Collection & Preparation
  2. 🧠 Model Training & Validation (Labs 4.1-4.5)
  3. 🚀 Model Deployment & Serving ← WE ARE HERE!
  4. 📈 Monitoring & Maintenance

🎯 Today's Mission:
  • Take our trained models to production
  • Make them available for real users
  • Optimize for speed and efficiency
  • Handle real-world challenges

🔧 Deployment Options We'll Explore:
  1. 💾 Model Saving & Loading (the basics)
  2. 🌐 Web API for predictions
  3. 📱 Mobile deployment with TensorFlow Lite
  4. 🌏 Browser deployment with TensorFlow.js
  5. ⚡ Model optimization techniques

Available compute devices:
  • CPU: /physical_device:CPU:0


## Step 1: Model Creation and Training - The Foundation

First, let's create a model and train it for deployment:

In [16]:
# Create and train a simple model for deployment
print("Creating a model for deployment...")

# Load MNIST for quick training
(X_train_mnist, y_train_mnist), (X_test_mnist, y_test_mnist) = keras.datasets.mnist.load_data()
X_train_mnist = X_train_mnist.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test_mnist = X_test_mnist.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Build a CNN for digit classification
deployment_model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),  # Explicit Input layer (preferred over input_shape= in Keras 3)
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
], name='DigitClassifier')

deployment_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

print(f"Model parameters: {deployment_model.count_params():,}")

# Quick training
print("\nTraining model for deployment (3 epochs)...")
deployment_model.fit(
    X_train_mnist[:10000], y_train_mnist[:10000],
    validation_data=(X_test_mnist[:2000], y_test_mnist[:2000]),
    epochs=3,
    batch_size=128,
    verbose=1
)

# Test the model
test_loss, test_acc = deployment_model.evaluate(X_test_mnist[:1000], y_test_mnist[:1000], verbose=0)
print(f"\nModel ready for deployment! Accuracy: {test_acc:.4f}")

Creating a model for deployment...
Model parameters: 121,930

Training model for deployment (3 epochs)...
Epoch 1/3
79/79 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.7714 - loss: 0.7834 - val_accuracy: 0.8890 - val_loss: 0.3503
Epoch 2/3
79/79 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.9431 - loss: 0.1981 - val_accuracy: 0.9410 - val_loss: 0.1776
Epoch 3/3
79/79 ━━━━━━━━━━━━━━━━━━━━ 1s 12ms/step - accuracy: 0.9612 - loss: 0.1315 - val_accuracy: 0.9580 - val_loss: 0.1343

Model ready for deployment! Accuracy: 0.9660
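
The 121,930 parameters reported above can be checked by hand from the layer shapes. The sketch below assumes the layer defaults used in this model ('valid' padding for the 3x3 convolutions, 2x2 max pooling); the helper functions and dictionary keys are illustrative names, not part of Keras.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return inputs * units + units

# Spatial size: 28 -> conv 3x3 -> 26 -> pool 2x2 -> 13 -> conv 3x3 -> 11 -> pool 2x2 -> 5
side = 28
side = (side - 2) // 2          # 13 after the first conv + pool
side = (side - 2) // 2          # 5 after the second conv + pool
flattened = side * side * 64    # 1600 inputs to the first Dense layer

layer_params = {
    "conv_32": conv2d_params(3, 1, 32),
    "conv_64": conv2d_params(3, 32, 64),
    "dense_64": dense_params(flattened, 64),
    "dense_10": dense_params(64, 10),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name:>9}: {count:,}")
print(f"    total: {total_params:,}")
```

Most of the parameters sit in the first Dense layer, which is worth remembering when the model is shrunk for deployment later in the lab.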


In [17]:
print("Model Saving Formats for Different Deployments:")
print("=" * 60)

# 1. SavedModel format (TensorFlow Serving, Cloud deployment)
print("\n1. SavedModel Format (Production Standard):")
savedmodel_path = 'digit_classifier_savedmodel'
deployment_model.export(savedmodel_path)  # Use export() for SavedModel format
print(f"✅ Saved to: {savedmodel_path}")
print("   • Best for: TensorFlow Serving, Cloud platforms")
print("   • Includes: Inference graph and weights (no optimizer or training config)")
print("   • Format: Directory with multiple files")

# Check what was saved (os is already imported in the setup cell)
if os.path.exists(savedmodel_path):
    savedmodel_contents = os.listdir(savedmodel_path)
    print(f"   • Contents: {savedmodel_contents}")

# 2. Native Keras format (New recommended format)
print("\n2. Native Keras Format (.keras):")
keras_path = 'digit_classifier.keras'
deployment_model.save(keras_path)  # Use .keras extension
print(f"✅ Saved to: {keras_path}")
print("   • Best for: Keras applications, Python deployment")
print("   • Format: Single file with everything included")
print("   • Recommended for new projects")

# 3. H5 format (Traditional Keras format)
print("\n3. H5 Format (Legacy Keras):")
h5_path = 'digit_classifier.h5'
deployment_model.save(h5_path)  # Use .h5 extension
print(f"✅ Saved to: {h5_path}")
print("   • Best for: Legacy systems, backward compatibility")
print("   • Format: Single HDF5 file")
print("   • Being phased out in favor of .keras format")

# 4. Model weights only (requires specific .weights.h5 extension)
print("\n4. Weights Only:")
weights_path = 'digit_classifier.weights.h5'  # Must end with .weights.h5
deployment_model.save_weights(weights_path)
print(f"✅ Saved to: {weights_path}")
print("   • Best for: When you have model architecture separately")
print("   • Smaller file size")
print("   • Requires model rebuilding")
print("   • Note: Filename must end with .weights.h5 in newer Keras")

# Show file sizes
print("\n📊 File Size Comparison:")
def get_size(path):
    if os.path.isdir(path):
        total_size = sum(os.path.getsize(os.path.join(path, f)) 
                        for f in os.listdir(path) if os.path.isfile(os.path.join(path, f)))
        return total_size / (1024*1024)  # MB
    elif os.path.exists(path):
        return os.path.getsize(path) / (1024*1024)  # MB
    else:
        return 0

print(f"  SavedModel:     {get_size(savedmodel_path):.2f} MB")
print(f"  Keras format:   {get_size(keras_path):.2f} MB")
print(f"  H5 format:      {get_size(h5_path):.2f} MB")
print(f"  Weights only:   {get_size(weights_path):.2f} MB")

# Test loading different formats
print("\n🔄 Testing Model Loading:")

# Test loading .keras format (recommended)
try:
    loaded_keras_model = keras.models.load_model(keras_path)
    loaded_keras_acc = loaded_keras_model.evaluate(X_test_mnist[:100], y_test_mnist[:100], verbose=0)[1]
    print(f"✅ Loaded .keras model accuracy: {loaded_keras_acc:.4f}")
except Exception as e:
    print(f"❌ Error loading .keras model: {e}")

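# Test loading SavedModel format
# Note: under Keras 3, load_model() does not read SavedModel directories, so this
# attempt is expected to fail. For inference-only reloading, Keras 3 provides
# keras.layers.TFSMLayer(savedmodel_path, call_endpoint='serve').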
try:
    loaded_savedmodel = keras.models.load_model(savedmodel_path)
    loaded_savedmodel_acc = loaded_savedmodel.evaluate(X_test_mnist[:100], y_test_mnist[:100], verbose=0)[1]
    print(f"✅ Loaded SavedModel accuracy: {loaded_savedmodel_acc:.4f}")
except Exception as e:
    print(f"❌ Error loading SavedModel: {e}")

# Test loading H5 format
try:
    loaded_h5_model = keras.models.load_model(h5_path)
    loaded_h5_acc = loaded_h5_model.evaluate(X_test_mnist[:100], y_test_mnist[:100], verbose=0)[1]
    print(f"✅ Loaded H5 model accuracy: {loaded_h5_acc:.4f}")
except Exception as e:
    print(f"❌ Error loading H5 model: {e}")

# Test loading weights (requires model architecture)
print("\n🔄 Testing Weights-Only Loading:")
try:
    # Create a new model with same architecture
    weights_test_model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),  # Explicit Input layer (preferred over input_shape= in Keras 3)
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(10, activation='softmax')
    ], name='WeightsTestModel')
    
    # Load the weights
    weights_test_model.load_weights(weights_path)
    
    # Compile (required for evaluation)
    weights_test_model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    weights_acc = weights_test_model.evaluate(X_test_mnist[:100], y_test_mnist[:100], verbose=0)[1]
    print(f"✅ Loaded weights-only model accuracy: {weights_acc:.4f}")
except Exception as e:
    print(f"❌ Error loading weights: {e}")

print("\n💡 Format Recommendations:")
print("   • Use .keras format for new Keras projects")
print("   • Use SavedModel for TensorFlow Serving/production")  
print("   • Use .h5 only for legacy compatibility")
print("   • Use .weights.h5 for custom loading scenarios")
print("   • Note: Keras 3+ requires specific file extensions!")

Model Saving Formats for Different Deployments:

1. SavedModel Format (Production Standard):
INFO:tensorflow:Assets written to: digit_classifier_savedmodel/assets

Saved artifact at 'digit_classifier_savedmodel'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='keras_tensor_88')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  5320226384: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5320223120: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5320226576: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5320225040: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5506402064: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5506400528: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5506401872: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5506401680: TensorSpec(shape=(), dtype=tf.resource, name=None)




✅ Saved to: digit_classifier_savedmodel
   • Best for: TensorFlow Serving, Cloud platforms
   • Includes: Inference graph and weights (no optimizer or training config)
   • Format: Directory with multiple files
   • Contents: ['fingerprint.pb', 'variables', 'saved_model.pb', 'assets']

2. Native Keras Format (.keras):
✅ Saved to: digit_classifier.keras
   • Best for: Keras applications, Python deployment
   • Format: Single file with everything included
   • Recommended for new projects

3. H5 Format (Legacy Keras):
✅ Saved to: digit_classifier.h5
   • Best for: Legacy systems, backward compatibility
   • Format: Single HDF5 file
   • Being phased out in favor of .keras format

4. Weights Only:
✅ Saved to: digit_classifier.weights.h5
   • Best for: When you have model architecture separately
   • Smaller file size
   • Requires model rebuilding
   • Note: Filename must end with .weights.h5 in newer Keras

📊 File Size Comparison:
  SavedModel:     0.06 MB
  Keras format:   1.43 MB
  H5 format:      1.43 MB
  



✅ Loaded .keras model accuracy: 1.0000
❌ Error loading SavedModel: File format not supported: filepath=digit_classifier_savedmodel. Keras 3 only supports V3 `.keras` files and legacy H5 format files (`.h5` extension). Note that the legacy SavedModel format is not supported by `load_model()` in Keras 3. In order to reload a TensorFlow SavedModel as an inference-only layer in Keras 3, use `keras.layers.TFSMLayer(digit_classifier_savedmodel, call_endpoint='serving_default')` (note that your `call_endpoint` might have a different name).
✅ Loaded H5 model accuracy: 1.0000

🔄 Testing Weights-Only Loading:
✅ Loaded weights-only model accuracy: 1.0000

💡 Format Recommendations:
   • Use .keras format for new Keras projects
   • Use SavedModel for TensorFlow Serving/production
   • Use .h5 only for legacy compatibility
   • Use .weights.h5 for custom loading scenarios
   • Note: Keras 3+ requires specific file extensions!
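
The 0.06 MB reported for the SavedModel above undercounts the directory: `get_size` only sums files at the top level, while the listed contents show a `variables` subdirectory, which is where a SavedModel keeps its weights. A minimal sketch of a recursive version follows; `get_size_recursive` is a hypothetical helper name, and the demo uses a throwaway directory with made-up file sizes rather than the real export.

```python
import os
import tempfile

def get_size_recursive(path):
    """Return the size in MB of a file, or of every file under a directory tree."""
    if os.path.isdir(path):
        total_bytes = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total_bytes += os.path.getsize(os.path.join(root, name))
        return total_bytes / (1024 * 1024)
    if os.path.exists(path):
        return os.path.getsize(path) / (1024 * 1024)
    return 0.0

# Demo on a temporary directory that mimics the SavedModel layout
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "variables"))
    with open(os.path.join(tmp, "saved_model.pb"), "wb") as f:
        f.write(b"\0" * 1024)
    with open(os.path.join(tmp, "variables", "variables.data"), "wb") as f:
        f.write(b"\0" * 4096)

    recursive_mb = get_size_recursive(tmp)
    top_level_mb = sum(
        os.path.getsize(os.path.join(tmp, name))
        for name in os.listdir(tmp)
        if os.path.isfile(os.path.join(tmp, name))
    ) / (1024 * 1024)

print(f"top-level files only: {top_level_mb * 1024:.1f} KB")
print(f"whole directory tree: {recursive_mb * 1024:.1f} KB")
```

Swapping this in for `get_size` should give a SavedModel figure that is comparable with the single-file formats.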


## Step 2: Model Saving Formats - Choose Your Deployment Target

The cell above (In [17]) saves the trained model in four formats, each suited to a different deployment scenario, and then tries to reload each one:

## Step 3: Model Optimization for Production

Let's optimize our model for faster inference and smaller size:

In [18]:
print("Model Optimization Techniques:")
print("=" * 50)

# 1. Model Quantization - Reduce precision for smaller size and faster inference
print("\n1. Post-Training Quantization:")
try:
    # Convert to TensorFlow Lite with quantization
    converter = tf.lite.TFLiteConverter.from_keras_model(deployment_model)
    
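    # Enable default optimizations. With no representative dataset supplied,
    # this applies dynamic-range quantization: weights are stored as 8-bit
    # integers while inputs and outputs stay float32.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]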
    
    # Convert the model
    quantized_tflite_model = converter.convert()
    
    # Save the quantized model
    tflite_path = 'digit_classifier_quantized.tflite'
    with open(tflite_path, 'wb') as f:
        f.write(quantized_tflite_model)
    
    print(f"✅ Quantized TFLite model saved to: {tflite_path}")
    print(f"   Size: {len(quantized_tflite_model) / 1024:.1f} KB")
    
    # Compare with original model size
    original_size = get_size(keras_path) * 1024  # Convert MB to KB
    compression_ratio = original_size / (len(quantized_tflite_model) / 1024)
    print(f"   Original Keras model: {original_size:.1f} KB")
    print(f"   Compression ratio: {compression_ratio:.1f}x smaller")
    
except Exception as e:
    print(f"❌ Error with quantization: {e}")

# 2. Model Pruning - Remove less important weights
print("\n2. Model Pruning (Conceptual):")
print("   • Zeroes out low-magnitude weights")
print("   • Can reach 80-90% sparsity; file size shrinks only after compression")
print("   • Requires tensorflow_model_optimization package")
print("   • Best applied gradually during training or fine-tuning")

# 3. Model Benchmarking - Test inference speed
print("\n3. Inference Speed Benchmarking:")
print("Testing inference speed on different formats...")

# Benchmark original Keras model
import time

# Warm up
for _ in range(5):
    _ = deployment_model.predict(X_test_mnist[:1], verbose=0)

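# Benchmark Keras model
# Note: predict() is designed for batched workloads and adds per-call overhead,
# so timing it one sample at a time overstates the model's actual compute cost.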
start_time = time.time()
iterations = 100
for _ in range(iterations):
    predictions = deployment_model.predict(X_test_mnist[:1], verbose=0)
keras_inference_time = (time.time() - start_time) / iterations * 1000  # ms per prediction

print(f"Keras model inference: {keras_inference_time:.2f} ms per prediction")

# Test TFLite model if available
try:
    # Load and test TFLite model
    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    
    # Get input and output tensors
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    
    # Warm up TFLite
    test_input = X_test_mnist[:1].astype(np.float32)
    for _ in range(5):
        interpreter.set_tensor(input_details[0]['index'], test_input)
        interpreter.invoke()
    
    # Benchmark TFLite
    start_time = time.time()
    for _ in range(iterations):
        interpreter.set_tensor(input_details[0]['index'], test_input)
        interpreter.invoke()
        tflite_output = interpreter.get_tensor(output_details[0]['index'])
    
    tflite_inference_time = (time.time() - start_time) / iterations * 1000  # ms per prediction
    
    print(f"TFLite model inference: {tflite_inference_time:.2f} ms per prediction")
    
    # Speed improvement
    speed_improvement = keras_inference_time / tflite_inference_time
    print(f"TFLite speedup: {speed_improvement:.1f}x faster")
    
    # Test accuracy of quantized model
    correct_predictions = 0
    total_predictions = 100
    
    for i in range(total_predictions):
        # Get prediction from original model
        original_pred = np.argmax(deployment_model.predict(X_test_mnist[i:i+1], verbose=0)[0])
        
        # Get prediction from TFLite model
        interpreter.set_tensor(input_details[0]['index'], X_test_mnist[i:i+1])
        interpreter.invoke()
        tflite_pred = np.argmax(interpreter.get_tensor(output_details[0]['index'])[0])
        
        if original_pred == tflite_pred:
            correct_predictions += 1
    
    accuracy_retention = correct_predictions / total_predictions
    print(f"Accuracy retention: {accuracy_retention:.1%} (predictions match original)")
    
except Exception as e:
    print(f"❌ TFLite benchmarking failed: {e}")

print("\n📊 Optimization Summary:")
print("   • Quantization reduces model size significantly")
print("   • TFLite optimizes for mobile/edge deployment") 
print("   • Trade-off: Size/Speed vs. Accuracy")
print("   • Choose optimization based on deployment target")

print("\n💡 Production Optimization Tips:")
print("   1. Profile your model to find bottlenecks")
print("   2. Use appropriate precision (FP16, INT8)")
print("   3. Batch predictions when possible")
print("   4. Consider model distillation for complex models")
print("   5. Test optimized models thoroughly")

Model Optimization Techniques:

1. Post-Training Quantization:
INFO:tensorflow:Assets written to: /var/folders/z6/dlcbzdks22nfwdj6w9k468180000gn/T/tmpen0jku3a/assets

Saved artifact at '/var/folders/z6/dlcbzdks22nfwdj6w9k468180000gn/T/tmpen0jku3a'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='keras_tensor_88')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  5320226384: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5320223120: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5320226576: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5320225040: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5506402064: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5506400528: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5506401872: TensorSpec(shape=(), dtype=tf.resource, name=None)
  5506401680: TensorSpec(shape=(), dtype=tf.resource, name=None)



✅ Quantized TFLite model saved to: digit_classifier_quantized.tflite
   Size: 127.9 KB
   Original Keras model: 1465.4 KB
   Compression ratio: 11.5x smaller

2. Model Pruning (Conceptual):
   • Zeroes out low-magnitude weights
   • Can reach 80-90% sparsity; file size shrinks only after compression
   • Requires tensorflow_model_optimization package
   • Best applied gradually during training or fine-tuning

3. Inference Speed Benchmarking:
Testing inference speed on different formats...
Keras model inference: 15.94 ms per prediction
TFLite model inference: 0.03 ms per prediction
TFLite speedup: 610.5x faster


INFO: Created TensorFlow Lite XNNPACK delegate for CPU.


Accuracy retention: 100.0% (predictions match original)

📊 Optimization Summary:
   • Quantization reduces model size significantly
   • TFLite optimizes for mobile/edge deployment
   • Trade-off: Size/Speed vs. Accuracy
   • Choose optimization based on deployment target

💡 Production Optimization Tips:
   1. Profile your model to find bottlenecks
   2. Use appropriate precision (FP16, INT8)
   3. Batch predictions when possible
   4. Consider model distillation for complex models
   5. Test optimized models thoroughly
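
The 11.5x compression ratio reported above compares the quantized TFLite file with the `.keras` file, which holds more than the weights. A rough back-of-the-envelope check is sketched below. It assumes 4 bytes per float32 weight, 1 byte per int8 weight, and that the `.keras` file also stores Adam's two moment estimates per weight; that last point is an assumption about what was saved, not something the output confirms.

```python
PARAMS = 121_930   # from deployment_model.count_params()
KB = 1024

float32_kb = PARAMS * 4 / KB          # weights only, 32-bit floats
int8_kb = PARAMS * 1 / KB             # weights only, 8-bit integers
with_adam_kb = PARAMS * 4 * 3 / KB    # weights + two Adam slots per weight (assumed)

print(f"float32 weights:       {float32_kb:7.1f} KB")
print(f"int8 weights:          {int8_kb:7.1f} KB")
print(f"weights + Adam state:  {with_adam_kb:7.1f} KB")
print(f"float32 -> int8 ratio: {float32_kb / int8_kb:.1f}x")
```

These estimates land close to the measured 127.9 KB and 1465.4 KB, which suggests the weight-for-weight saving from quantization is nearer 4x, with the rest of the 11.5x likely coming from dropping training state and file overhead.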

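To make "reducing precision" concrete, here is a simplified NumPy sketch of symmetric per-tensor int8 quantization applied to a random stand-in for one layer's weights. It illustrates the general idea only; the TFLite converter's actual scheme differs in detail (for example, it may use per-channel scales), and the variable names are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=1000).astype(np.float32)  # stand-in for a layer's weights

# Symmetric quantization: map [-max_abs, max_abs] onto the integer range [-127, 127]
scale = float(np.max(np.abs(weights))) / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to measure how much precision the round trip lost
restored = quantized.astype(np.float32) * scale
max_error = float(np.max(np.abs(weights - restored)))

print(f"bytes before: {weights.nbytes}, after: {quantized.nbytes}")
print(f"max round-trip error: {max_error:.6f} (half a step = {scale / 2:.6f})")
```

Each weight moves by at most half a quantization step, which is why predictions from a well-behaved model often change very little after quantization.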

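The 610.5x speedup above should be read with care: it compares `predict()`, which carries per-call overhead, against a bare `interpreter.invoke()`, and it reports only a mean. A minimal, framework-independent timing helper is sketched below, assuming a zero-argument callable; `benchmark` is a hypothetical helper, and the workload is a stand-in rather than a model call.

```python
import statistics
import time

def benchmark(fn, warmup=5, iterations=100):
    """Call fn() repeatedly and return (median_ms, p95_ms) latency."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed

    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000)

    samples_ms.sort()
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return statistics.median(samples_ms), samples_ms[p95_index]

# Stand-in workload; in this lab fn could wrap a Keras or TFLite inference call
median_ms, p95_ms = benchmark(lambda: sum(range(10_000)))
print(f"median: {median_ms:.3f} ms, p95: {p95_ms:.3f} ms")
```

Reporting a median and a tail percentile makes it easier to spot whether a few slow calls are skewing the comparison.
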
## Step 4: Simple Model Serving API

Let's create a simple prediction service that could be deployed:

In [19]:
print("Creating a Simple Model Serving Interface:")
print("=" * 60)

# Create a simple prediction service class
class DigitClassifierService:
    """Simple model serving class for digit classification"""
    
    def __init__(self, model_path):
        """Initialize the service with a trained model"""
        print(f"Loading model from: {model_path}")
        self.model = keras.models.load_model(model_path)
        self.class_names = [str(i) for i in range(10)]  # Digits 0-9
        print("✅ Model loaded successfully!")
    
    def predict_single(self, image):
        """Predict a single image"""
        # Ensure correct input shape
        if len(image.shape) == 2:  # If 2D, add batch and channel dims
            image = image.reshape(1, 28, 28, 1)
        elif len(image.shape) == 3:  # If 3D, add batch dim
            image = image.reshape(1, *image.shape)
        
        # Make prediction
        prediction = self.model.predict(image, verbose=0)
        predicted_class = np.argmax(prediction[0])
        confidence = float(prediction[0][predicted_class])
        
        return {
            'predicted_digit': int(predicted_class),
            'confidence': confidence,
            'all_probabilities': {
                str(i): float(prediction[0][i]) for i in range(10)
            }
        }
    
    def predict_batch(self, images):
        """Predict multiple images"""
        predictions = self.model.predict(images, verbose=0)
        results = []
        
        for i, prediction in enumerate(predictions):
            predicted_class = np.argmax(prediction)
            confidence = float(prediction[predicted_class])
            
            results.append({
                'image_index': i,
                'predicted_digit': int(predicted_class),
                'confidence': confidence
            })
        
        return results
    
    def get_model_info(self):
        """Get information about the loaded model"""
        return {
            'model_name': self.model.name,
            'input_shape': self.model.input_shape,
            'output_shape': self.model.output_shape,
            'total_parameters': self.model.count_params(),
            'classes': self.class_names
        }

# Initialize the service
print("\n🚀 Initializing Digit Classifier Service:")
service = DigitClassifierService(keras_path)

# Get model information
model_info = service.get_model_info()
print(f"\n📋 Model Information:")
for key, value in model_info.items():
    print(f"   {key}: {value}")

# Test the service with sample images
print("\n🧪 Testing the Service:")
print("=" * 40)

# Test single prediction
test_image = X_test_mnist[0]  # First test image
actual_label = y_test_mnist[0]

print(f"Testing image with actual label: {actual_label}")
result = service.predict_single(test_image)

print(f"\n📊 Single Prediction Result:")
print(f"   Predicted digit: {result['predicted_digit']}")
print(f"   Confidence: {result['confidence']:.2%}")
print(f"   Correct: {'✅' if result['predicted_digit'] == actual_label else '❌'}")

# Show top 3 predictions
sorted_probs = sorted(result['all_probabilities'].items(), 
                     key=lambda x: x[1], reverse=True)
print(f"\n🔝 Top 3 predictions:")
for i, (digit, prob) in enumerate(sorted_probs[:3]):
    print(f"   {i+1}. Digit {digit}: {prob:.2%}")

# Test batch prediction
print(f"\n📦 Batch Prediction Test:")
batch_size = 5
test_batch = X_test_mnist[:batch_size]
actual_labels = y_test_mnist[:batch_size]

batch_results = service.predict_batch(test_batch)

print(f"Testing {batch_size} images:")
correct = 0
for result in batch_results:
    actual = actual_labels[result['image_index']]
    predicted = result['predicted_digit']
    is_correct = predicted == actual
    correct += is_correct
    
    print(f"   Image {result['image_index']}: "
          f"Actual={actual}, Predicted={predicted}, "
          f"Confidence={result['confidence']:.2%} "
          f"{'✅' if is_correct else '❌'}")

batch_accuracy = correct / batch_size
print(f"\nBatch accuracy: {batch_accuracy:.1%}")

# Simulate API-style request/response
print(f"\n🌐 API-Style Request/Response Example:")
print("=" * 50)

def simulate_api_request(image_data):
    """Simulate an API request"""
    try:
        # In a real API, you'd receive image data (e.g., base64 encoded)
        # and need to preprocess it
        
        # Preprocess (normalize, reshape)
        start = time.perf_counter()
        processed_image = image_data.astype(np.float32) / 255.0
        
        # Make prediction
        result = service.predict_single(processed_image)
        elapsed_ms = (time.perf_counter() - start) * 1000
        
        # Format API response
        api_response = {
            'status': 'success',
            'prediction': result['predicted_digit'],
            'confidence': round(result['confidence'], 4),
            'processing_time_ms': round(elapsed_ms, 2),  # Measured preprocessing + inference time
            'model_version': '1.0.0'
        }
        
        return api_response
        
    except Exception as e:
        return {
            'status': 'error',
            'error_message': str(e),
            'error_code': 'PREDICTION_FAILED'
        }

# Test the API simulation
sample_image = (X_test_mnist[2] * 255).astype(np.uint8)  # Convert back to 0-255
api_response = simulate_api_request(sample_image)

print("Sample API Request:")
print("POST /predict")
print("Content-Type: application/json")
print('{"image": "<base64_encoded_image_data>"}')

print(f"\nAPI Response:")
print(json.dumps(api_response, indent=2))

print(f"\n💡 Production API Considerations:")
print("   • Input validation and sanitization")
print("   • Error handling and proper HTTP status codes")  
print("   • Rate limiting and authentication")
print("   • Logging and monitoring")
print("   • Model versioning and A/B testing")
print("   • Caching for frequently requested predictions")
print("   • Load balancing for high traffic")

print(f"\n🔧 Deployment Options:")
print("   • FastAPI/Flask for Python web services")
print("   • TensorFlow Serving for high-performance serving")
print("   • Docker containers for easy deployment")
print("   • Kubernetes for orchestration")
print("   • Cloud platforms (AWS SageMaker, Google Vertex AI)")

Creating a Simple Model Serving Interface:

🚀 Initializing Digit Classifier Service:
Loading model from: digit_classifier.keras
✅ Model loaded successfully!

📋 Model Information:
   model_name: DigitClassifier
   input_shape: (None, 28, 28, 1)
   output_shape: (None, 10)
   total_parameters: 121930
   classes: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

🧪 Testing the Service:
Testing image with actual label: 7

📊 Single Prediction Result:
   Predicted digit: 7
   Confidence: 99.99%
   Correct: ✅

🔝 Top 3 predictions:
   1. Digit 7: 99.99%
   2. Digit 3: 0.00%
   3. Digit 2: 0.00%

📦 Batch Prediction Test:
Testing 5 images:
   Image 0: Actual=7, Predicted=7, Confidence=99.99% ✅
   Image 1: Actual=2, Predicted=2, Confidence=99.36% ✅
   Image 2: Actual=1, Predicted=1, Confidence=99.55% ✅
   Image 3: Actual=0, Predicted=0, Confidence=99.55% ✅
   Image 4: Actual=4, Predicted=4, Confidence=99.96% ✅

Batch accuracy: 100.0%

🌐 API-Style Request/Response Example:
Sample API Request:
POS
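
The simulated request above passes a NumPy array directly, while a real endpoint would receive the image inside a JSON body. The sketch below shows one way to decode and validate such a payload. It assumes the `image` field holds the base64 encoding of exactly 784 raw uint8 pixel values in row-major order; that wire format and the `decode_image_payload` helper are assumptions made for this example, not something the lab defines.

```python
import base64
import numpy as np

def decode_image_payload(payload):
    """Turn {'image': <base64 of 784 raw uint8 bytes>} into a (1, 28, 28, 1) float32 array."""
    if not isinstance(payload, dict) or "image" not in payload:
        raise ValueError("payload must be a JSON object with an 'image' field")
    try:
        raw = base64.b64decode(payload["image"], validate=True)
    except (ValueError, TypeError) as exc:
        raise ValueError("'image' is not valid base64") from exc
    if len(raw) != 28 * 28:
        raise ValueError(f"expected 784 bytes, got {len(raw)}")
    # Same normalization the model was trained with: scale 0-255 to 0-1
    pixels = np.frombuffer(raw, dtype=np.uint8).astype(np.float32) / 255.0
    return pixels.reshape(1, 28, 28, 1)

# Build a fake request body from synthetic pixel data
fake_image = (np.arange(784) % 256).astype(np.uint8).reshape(28, 28)
request_body = {"image": base64.b64encode(fake_image.tobytes()).decode("ascii")}

batch = decode_image_payload(request_body)
print(batch.shape, batch.dtype)
```

Rejecting malformed input before it reaches the model keeps error responses predictable and avoids shape errors surfacing from inside the prediction call.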

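Logging and monitoring appear among the production considerations in this step. One simple signal a service can track is the model's own confidence over recent requests, since a sustained drop may indicate that incoming data has drifted from the training distribution. The sketch below is a minimal illustration; the `ConfidenceMonitor` class, window size, and threshold are hypothetical choices, not recommendations.

```python
from collections import deque

class ConfidenceMonitor:
    """Track recent prediction confidences and flag a low rolling mean."""

    def __init__(self, window=100, alert_below=0.8):
        self.recent = deque(maxlen=window)  # old values fall off automatically
        self.alert_below = alert_below

    def record(self, confidence):
        self.recent.append(float(confidence))

    def rolling_mean(self):
        return sum(self.recent) / len(self.recent) if self.recent else None

    def should_alert(self):
        # Only alert once the window is full, to avoid noise at startup
        window_full = len(self.recent) == self.recent.maxlen
        mean = self.rolling_mean()
        return window_full and mean is not None and mean < self.alert_below

monitor = ConfidenceMonitor(window=5, alert_below=0.8)
for confidence in [0.99, 0.99, 0.99, 0.99, 0.99]:
    monitor.record(confidence)
alert_when_healthy = monitor.should_alert()

for confidence in [0.5, 0.5, 0.5, 0.5, 0.5]:
    monitor.record(confidence)
alert_when_degraded = monitor.should_alert()

print(alert_when_healthy, alert_when_degraded)
```

In a service like `DigitClassifierService`, `record` could be called with the `confidence` value from each prediction result.
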
## Final Summary: From Training to Production

Congratulations! You've completed the full journey from model development to production deployment:

In [20]:
print("🎓 Deployment Journey Complete!")
print("=" * 60)

# Summary of what we've accomplished
deployment_summary = {
    "Model Formats Mastered": [
        "✅ Native Keras (.keras) - Recommended for new projects",
        "✅ SavedModel - Production standard for TF Serving",
        "✅ H5 Format - Legacy compatibility",
        "✅ Weights-only (.weights.h5) - Custom scenarios"
    ],
    "Optimization Techniques": [
        "✅ Post-training quantization (TensorFlow Lite)",
        "✅ Model size reduction and compression",
        "✅ Inference speed benchmarking",
        "✅ Accuracy vs. performance trade-offs"
    ],
    "Serving Solutions": [
        "✅ Model serving class design",
        "✅ Single and batch prediction APIs",
        "✅ Error handling and validation",
        "✅ API response formatting"
    ],
    "Production Readiness": [
        "✅ Model loading and initialization",
        "✅ Performance monitoring",
        "✅ Service architecture patterns",
        "✅ Deployment considerations"
    ]
}

for category, achievements in deployment_summary.items():
    print(f"\n📋 {category}:")
    for achievement in achievements:
        print(f"   {achievement}")

print(f"\n🏗️  Complete ML Pipeline Overview:")
print("=" * 50)
pipeline_stages = [
    ("1. Data Collection", "Gather and prepare training data"),
    ("2. Model Development", "Design, train, and validate models (Labs 4.1-4.5)"),
    ("3. Model Optimization", "Quantization, pruning, compression"),
    ("4. Model Deployment", "Save, serve, and scale models"),
    ("5. Monitoring & Maintenance", "Track performance, retrain as needed")
]

for stage, description in pipeline_stages:
    print(f"   {stage}: {description}")

print(f"\n🚀 Next Steps for Production:")
print("=" * 40)

next_steps = [
    "🔧 Set up CI/CD pipelines for model updates",
    "📊 Implement comprehensive monitoring and logging", 
    "🛡️  Add security measures (authentication, input validation)",
    "⚡ Optimize for your specific deployment environment",
    "🧪 Set up A/B testing for model versions",
    "📈 Plan for model retraining workflows",
    "🌐 Consider edge deployment for mobile/IoT devices",
    "🔄 Implement automated model validation pipelines"
]

for step in next_steps:
    print(f"   {step}")

print(f"\n📚 Key Takeaways:")
print("=" * 30)

takeaways = [
    "Model format choice depends on deployment target",
    "Optimization is crucial for production performance", 
    "Always test optimized models thoroughly",
    "Plan your serving architecture early",
    "Monitor everything in production",
    "Automate as much as possible",
    "Security and reliability are non-negotiable"
]

for i, takeaway in enumerate(takeaways, 1):
    print(f"   {i}. {takeaway}")

# Create a final visualization of the journey
print(f"\n📊 Your Deep Learning Journey:")
print("=" * 50)

labs_completed = [
    ("Lab 4.1", "TensorFlow Fundamentals", "✅"),
    ("Lab 4.2", "Deep Network Architecture", "✅"), 
    ("Lab 4.3", "Convolutional Networks", "✅"),
    ("Lab 4.4", "Recurrent Networks", "✅"),
    ("Lab 4.5", "Transfer Learning", "✅"),
    ("Lab 4.6", "Model Deployment", "✅")
]

for lab, topic, status in labs_completed:
    print(f"   {status} {lab}: {topic}")

print(f"\n🏆 Congratulations!")
print("You've mastered the complete deep learning pipeline:")
print("   • From basic TensorFlow operations to production deployment")
print("   • Multiple network architectures (Dense, CNN, RNN)")
print("   • Advanced techniques (Transfer Learning, Fine-tuning)")
print("   • Production-ready model serving and optimization")

print(f"\n🎯 You're now ready to:")
print("   • Build and deploy real-world ML applications")
print("   • Optimize models for production environments") 
print("   • Design scalable ML serving architectures")
print("   • Handle the full ML lifecycle professionally")

print(f"\n💪 Keep Learning:")
print("   • Explore MLOps tools and practices")
print("   • Study distributed training techniques")
print("   • Learn about model interpretability")
print("   • Dive deeper into specific domains (NLP, Computer Vision, etc.)")

print(f"\n🌟 Well done! Your deep learning journey continues...")

🎓 Deployment Journey Complete!

📋 Model Formats Mastered:
   ✅ Native Keras (.keras) - Recommended for new projects
   ✅ SavedModel - Production standard for TF Serving
   ✅ H5 Format - Legacy compatibility
   ✅ Weights-only (.weights.h5) - Custom scenarios

📋 Optimization Techniques:
   ✅ Post-training quantization (TensorFlow Lite)
   ✅ Model size reduction and compression
   ✅ Inference speed benchmarking
   ✅ Accuracy vs. performance trade-offs

📋 Serving Solutions:
   ✅ Model serving class design
   ✅ Single and batch prediction APIs
   ✅ Error handling and validation
   ✅ API response formatting

📋 Production Readiness:
   ✅ Model loading and initialization
   ✅ Performance monitoring
   ✅ Service architecture patterns
   ✅ Deployment considerations

🏗️  Complete ML Pipeline Overview:
   1. Data Collection: Gather and prepare training data
   2. Model Development: Design, train, and validate models (Labs 4.1-4.5)
   3. Model Optimization: Quantization, pruning, compression
   4. 