# Lecture 83 – Model Serialization for Deployment

## Learning Objectives
- Understand different model serialization formats (`.h5`, `SavedModel`, `.pkl`)
- Learn to save and load TensorFlow/Keras models
- Version models with timestamps for production tracking
- Serialize preprocessing artifacts (tokenizers, scalers)
- Test loaded models to ensure consistency

## Expected Runtime
~5 minutes (includes small CNN training on Fashion-MNIST subset)

## Prerequisites
- Python 3.9+
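- TensorFlow 2.x with Keras 2 (e.g. 2.15, as pinned below). TensorFlow 2.16+ bundles Keras 3, which rejects `save_format='tf'` and cannot reload a SavedModel directory with `load_model()`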
- NumPy, joblib

---

## Setup and Environment Check

In [None]:
# Install required packages (uncomment if needed)
# !pip install tensorflow==2.15.0 numpy joblib scikit-learn

In [None]:
import sys
import tensorflow as tf
import numpy as np
import joblib
import pickle
import os
from datetime import datetime
from pathlib import Path

print(f"Python version: {sys.version}")
print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")

## 1. Train a Simple CNN on Fashion-MNIST (Subset)

We'll train a lightweight CNN for image classification on a 5,000-image subset. In production you'd train on the full dataset (60,000 training images), but the subset is enough to demonstrate the serialization workflow.

In [None]:
# Load Fashion-MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Use a small subset for fast training
SUBSET_SIZE = 5000
x_train_subset = x_train[:SUBSET_SIZE]
y_train_subset = y_train[:SUBSET_SIZE]
x_test_subset = x_test[:1000]
y_test_subset = y_test[:1000]

# Normalize and reshape
x_train_subset = x_train_subset.astype('float32') / 255.0
x_test_subset = x_test_subset.astype('float32') / 255.0
x_train_subset = x_train_subset.reshape(-1, 28, 28, 1)
x_test_subset = x_test_subset.reshape(-1, 28, 28, 1)

print(f"Training data shape: {x_train_subset.shape}")
print(f"Test data shape: {x_test_subset.shape}")

In [None]:
# Define a simple CNN model
def create_cnn_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    return model

model = create_cnn_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

In [None]:
# Train the model (just 2 epochs for demonstration)
history = model.fit(
    x_train_subset, y_train_subset,
    epochs=2,
    batch_size=128,
    validation_split=0.2,
    verbose=1
)

In [None]:
# Evaluate the model
test_loss, test_acc = model.evaluate(x_test_subset, y_test_subset, verbose=0)
print(f"\nTest accuracy: {test_acc:.4f}")

## 2. Model Serialization – Multiple Formats

### 2.1 Save as HDF5 (.h5) – Legacy Format
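The `.h5` (HDF5) format stores the architecture, weights, and (optionally) optimizer state in a single file. It is simple and portable, but Keras now treats it as legacy: it stores only layer configurations (custom layers must be re-supplied at load time via `custom_objects`) and no traced TensorFlow graph, and recent Keras versions recommend the native `.keras` format instead.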

In [None]:
# Create models directory
models_dir = Path('../models')
models_dir.mkdir(parents=True, exist_ok=True)  # parents=True also creates missing parent directories

# Save with timestamp for versioning
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
h5_path = models_dir / f"fashion_mnist_cnn_{timestamp}.h5"

model.save(h5_path)
print(f"Model saved as HDF5: {h5_path}")
print(f"File size: {h5_path.stat().st_size / 1024:.2f} KB")
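Because an `.h5` file is an ordinary HDF5 container, you can look inside it with `h5py` (usually installed alongside TensorFlow; treat that as an assumption for your environment). The sketch below uses a tiny throwaway model rather than the CNN above and prints the architecture JSON attribute and the weight group that Keras writes:

```python
import json
import tempfile
from pathlib import Path

import h5py
import tensorflow as tf

# Tiny throwaway model, only used to produce an .h5 file to inspect
tiny = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
tiny.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

h5_file = Path(tempfile.mkdtemp()) / "tiny.h5"
tiny.save(str(h5_file))

with h5py.File(h5_file, "r") as f:
    print("Top-level attributes:", sorted(f.attrs.keys()))
    print("Top-level groups:", sorted(f.keys()))

    # The architecture is stored as a JSON string attribute
    raw_config = f.attrs["model_config"]
    if isinstance(raw_config, bytes):  # h5py may return bytes or str
        raw_config = raw_config.decode("utf-8")
    tiny_config = json.loads(raw_config)
    print("Model class:", tiny_config["class_name"])
```

The weights live in the `model_weights` group, while the architecture is just JSON; this is why custom layers need their Python definitions at load time.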

### 2.2 Save as SavedModel Format – Recommended for Production
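The `SavedModel` format is TensorFlow's native format: a directory holding the serialized computation graph (`saved_model.pb`), the weights (`variables/`), and extra files (`assets/`), among others. Because the graph itself is stored, the model can be served by TensorFlow Serving or converted with TensorFlow Lite without the Python code that built it.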

In [None]:
# Save as SavedModel (directory format)
savedmodel_path = models_dir / f"fashion_mnist_cnn_savedmodel_{timestamp}"
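# save_format='tf' requires Keras 2 (TF <= 2.15); under Keras 3, model.export(savedmodel_path) writes an inference-only SavedModel
model.save(savedmodel_path, save_format='tf')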
print(f"Model saved as SavedModel: {savedmodel_path}")

# Check directory structure
for item in savedmodel_path.rglob('*'):
    if item.is_file():
        print(f"  {item.relative_to(savedmodel_path)} - {item.stat().st_size / 1024:.2f} KB")
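TensorFlow Serving calls a SavedModel through named *signatures*, by default `serving_default`. To show what a signature looks like without depending on Keras 2 vs. Keras 3 differences, here is a minimal sketch using a plain `tf.Module`; the `PixelScaler` class is hypothetical and only rescales pixels:

```python
import tempfile

import numpy as np
import tensorflow as tf


class PixelScaler(tf.Module):
    """Hypothetical toy module: rescales raw 0-255 pixels to [0, 1]."""

    def __init__(self):
        super().__init__()
        self.scale = tf.Variable(1.0 / 255.0, trainable=False)

    # The input_signature fixes dtype and shape; None allows any batch size
    @tf.function(input_signature=[tf.TensorSpec(shape=[None, 28, 28], dtype=tf.float32)])
    def __call__(self, x):
        return x * self.scale


export_dir = tempfile.mkdtemp()
scaler_module = PixelScaler()
tf.saved_model.save(
    scaler_module, export_dir,
    signatures=scaler_module.__call__.get_concrete_function(),  # exported as "serving_default"
)

# Reload with the low-level API: the Python class definition is not needed
reloaded = tf.saved_model.load(export_dir)
serving_fn = reloaded.signatures["serving_default"]
print(serving_fn.structured_input_signature)  # inputs are keyword-only, keyed by argument name
print(serving_fn.structured_outputs)          # a single unnamed output is keyed "output_0"

result = serving_fn(x=tf.fill([2, 28, 28], 255.0))
print(result["output_0"].shape)
```

`saved_model_cli show --all` on an export directory prints the same signature information from the command line.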

### 2.3 Save Preprocessing Artifacts
At inference time you must apply exactly the preprocessing used during training, so serialize it alongside the model: normalization rules, fitted scalers, tokenizers, and label mappings.

In [None]:
# Create a simple preprocessing configuration
preprocessing_config = {
    'normalization': 'divide_by_255',
    'input_shape': (28, 28, 1),
    'num_classes': 10,
    'class_names': [
        'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
        'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'
    ]
}

# Save using joblib (efficient for objects that hold large NumPy arrays, e.g. scikit-learn estimators)
config_path = models_dir / f"preprocessing_config_{timestamp}.pkl"
joblib.dump(preprocessing_config, config_path)
print(f"Preprocessing config saved: {config_path}")

# Also save using the standard-library pickle module (joblib uses pickle internally; only load either from trusted sources)
config_path_pickle = models_dir / f"preprocessing_config_{timestamp}_pickle.pkl"
with open(config_path_pickle, 'wb') as f:
    pickle.dump(preprocessing_config, f)
print(f"Preprocessing config saved (pickle): {config_path_pickle}")
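The config above records a fixed rule (`divide_by_255`). Fitted preprocessing objects, such as scikit-learn's `StandardScaler`, also carry learned state (`mean_`, `scale_`) that must be saved with the model. A minimal sketch, assuming a feature-based model and using synthetic data:

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=0)
train_features = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic training data

# Fit on training data only, then persist the fitted state
scaler = StandardScaler().fit(train_features)
scaler_path = Path(tempfile.mkdtemp()) / "scaler.joblib"
joblib.dump(scaler, scaler_path)

# At serving time: load and apply the *same* transformation
restored_scaler = joblib.load(scaler_path)
new_features = rng.normal(loc=5.0, scale=2.0, size=(5, 3))
np.testing.assert_allclose(scaler.transform(new_features),
                           restored_scaler.transform(new_features))
print("Restored mean_:", restored_scaler.mean_)
```

Refitting the scaler on serving data would silently shift every input, which is why the fitted object, not just its class name, is the artifact to version.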

## 3. Model Loading and Validation

### 3.1 Load HDF5 Model

In [None]:
# Load the .h5 model
loaded_h5_model = tf.keras.models.load_model(h5_path)
print("Model loaded from HDF5 format")

# Verify it works
h5_predictions = loaded_h5_model.predict(x_test_subset[:5], verbose=0)
print(f"\nPrediction shape: {h5_predictions.shape}")
print(f"Predicted class index (first image): {h5_predictions[0].argmax()}")

### 3.2 Load SavedModel Format

In [None]:
# Load SavedModel
loaded_savedmodel = tf.keras.models.load_model(savedmodel_path)
print("Model loaded from SavedModel format")

# Verify predictions match
savedmodel_predictions = loaded_savedmodel.predict(x_test_subset[:5], verbose=0)
print(f"\nPrediction shape: {savedmodel_predictions.shape}")
print(f"Predicted class index (first image): {savedmodel_predictions[0].argmax()}")

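# Both reloaded models should match the original in-memory model (small float tolerance)
original_predictions = model.predict(x_test_subset[:5], verbose=0)
np.testing.assert_allclose(original_predictions, h5_predictions, rtol=1e-5, atol=1e-6)
np.testing.assert_allclose(original_predictions, savedmodel_predictions, rtol=1e-5, atol=1e-6)
print("✓ Both reloaded models match the original model (within tolerance)")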

### 3.3 Load Preprocessing Config

In [None]:
# Load preprocessing config
loaded_config = joblib.load(config_path)
print("Preprocessing config loaded:")
print(f"  Normalization: {loaded_config['normalization']}")
print(f"  Input shape: {loaded_config['input_shape']}")
print(f"  Classes: {loaded_config['class_names']}")
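Since this config contains only strings, numbers, and sequences, JSON is a human-readable alternative that cannot execute code when loaded. One pitfall: JSON has no tuple type, so `input_shape` comes back as a list. A minimal sketch (`class_names` omitted for brevity):

```python
import json
import tempfile
from pathlib import Path

demo_config = {
    "normalization": "divide_by_255",
    "input_shape": (28, 28, 1),
    "num_classes": 10,
}

json_path = Path(tempfile.mkdtemp()) / "preprocessing_config.json"
json_path.write_text(json.dumps(demo_config, indent=2))

restored = json.loads(json_path.read_text())
print(restored["input_shape"])  # prints [28, 28, 1]: JSON arrays load as lists

# Restore the tuple so comparisons with the original config still hold
restored["input_shape"] = tuple(restored["input_shape"])
assert restored == demo_config
```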

## 4. Production-Ready Prediction Function

Combine model + preprocessing for a complete inference pipeline.

In [None]:
def preprocess_image(image_array, config):
    """Apply the preprocessing recorded in the saved config.

    Expects raw uint8 pixels shaped (28, 28), (N, 28, 28), or (N, 28, 28, 1).
    """
    image_array = np.asarray(image_array)
    if config['normalization'] == 'divide_by_255':
        image_array = image_array.astype('float32') / 255.0
    else:
        raise ValueError(f"Unknown normalization: {config['normalization']}")

    # Reshape to (batch, height, width, channels) using the saved input shape
    return image_array.reshape(-1, *config['input_shape'])

def predict_with_labels(model, image_array, config):
    """Complete inference pipeline with class labels."""
    # Preprocess
    processed = preprocess_image(image_array, config)
    
    # Predict
    predictions = model.predict(processed, verbose=0)
    
    # Get class labels
    predicted_classes = predictions.argmax(axis=1)
    predicted_labels = [config['class_names'][i] for i in predicted_classes]
    
    return predicted_labels, predictions

# Test the pipeline
test_image = x_test[0]  # Raw image (not normalized)
labels, probs = predict_with_labels(loaded_savedmodel, test_image, loaded_config)

print(f"Predicted class: {labels[0]}")
print(f"Confidence: {probs[0].max():.4f}")
print(f"Actual class: {loaded_config['class_names'][y_test[0]]}")
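Another design is to move preprocessing *into* the model as Keras layers, so the saved file carries it and serving code cannot drift from training. A minimal sketch, assuming a recent TF 2.x where `tf.keras.layers.Rescaling` is available; the small architecture here is hypothetical and untrained, not the CNN above:

```python
import numpy as np
import tensorflow as tf


def build_model_with_preprocessing():
    """Hypothetical variant: preprocessing layers live inside the model graph."""
    inputs = tf.keras.Input(shape=(28, 28), name="raw_pixels")  # raw 0-255 values
    x = tf.keras.layers.Rescaling(1.0 / 255)(inputs)            # replaces divide_by_255
    x = tf.keras.layers.Reshape((28, 28, 1))(x)                 # adds the channel axis
    x = tf.keras.layers.Conv2D(8, (3, 3), activation="relu")(x)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


model_with_prep = build_model_with_preprocessing()

# Callers send raw pixels; no separate preprocessing function is needed
raw_batch = np.random.randint(0, 256, size=(4, 28, 28)).astype("float32")
probs = model_with_prep.predict(raw_batch, verbose=0)
print(probs.shape)
```

The trade-off: preprocessing baked into the graph is versioned with the weights automatically, while a separate config (as in this lecture) keeps the model reusable with different input pipelines.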

## 5. Model Versioning Best Practices

In [None]:
# Create a model registry metadata file
model_metadata = {
    'model_name': 'fashion_mnist_cnn',
    'version': timestamp,
    'framework': 'tensorflow',
    'framework_version': tf.__version__,
    'model_type': 'cnn',
    'dataset': 'fashion_mnist',
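    'training_samples': int(SUBSET_SIZE * 0.8),  # validation_split=0.2 held out 20% of the subset
    'validation_split': 0.2,
    'test_accuracy': float(test_acc),
    'test_samples': len(x_test_subset),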
    'input_shape': [28, 28, 1],
    'output_classes': 10,
    'saved_formats': ['h5', 'savedmodel'],
    'created_at': datetime.now().isoformat(),
    'files': {
        'h5_model': str(h5_path.name),
        'savedmodel': str(savedmodel_path.name),
        'preprocessing': str(config_path.name)
    }
}

import json
metadata_path = models_dir / f"model_metadata_{timestamp}.json"
with open(metadata_path, 'w') as f:
    json.dump(model_metadata, f, indent=2)

print(f"Model metadata saved: {metadata_path}")
print("\nMetadata content:")
print(json.dumps(model_metadata, indent=2))
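Because the `%Y%m%d_%H%M%S` version string is zero-padded and ordered from most to least significant field, plain string comparison sorts versions chronologically. A minimal sketch of picking the newest version from metadata files; the helper name and the two demo versions are hypothetical:

```python
import json
import tempfile
from pathlib import Path


def latest_model_metadata(models_dir, model_name):
    """Return the metadata dict for the newest version of `model_name`.

    Assumes files are named model_metadata_<YYYYmmdd_HHMMSS>.json and contain
    'model_name' and 'version' keys, as in the metadata cell above.
    """
    candidates = []
    for path in Path(models_dir).glob("model_metadata_*.json"):
        meta = json.loads(path.read_text())
        if meta.get("model_name") == model_name:
            candidates.append(meta)
    if not candidates:
        raise FileNotFoundError(f"No metadata found for {model_name!r} in {models_dir}")
    # Zero-padded YYYYmmdd_HHMMSS strings sort chronologically as plain strings
    return max(candidates, key=lambda m: m["version"])


# Usage with two demo versions written to a temporary directory
demo_dir = Path(tempfile.mkdtemp())
for version in ["20240105_093000", "20240312_141500"]:
    (demo_dir / f"model_metadata_{version}.json").write_text(
        json.dumps({"model_name": "fashion_mnist_cnn", "version": version})
    )

latest = latest_model_metadata(demo_dir, "fashion_mnist_cnn")
print(latest["version"])  # 20240312_141500
```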

## 6. Testing and Validation

In [None]:
def test_model_serialization(model_path, test_data, test_labels, reference_model,
                             min_accuracy=0.5):
    """Check a reloaded model's output shape, accuracy, and agreement with the original."""
    # Load model
    loaded_model = tf.keras.models.load_model(model_path)

    # Test shape: one probability per class for each input
    predictions = loaded_model.predict(test_data[:10], verbose=0)
    assert predictions.shape == (10, 10), f"Expected shape (10, 10), got {predictions.shape}"

    # Test accuracy (a loose floor for a 2-epoch model)
    _, acc = loaded_model.evaluate(test_data, test_labels, verbose=0)
    assert acc > min_accuracy, f"Model accuracy too low: {acc}"

    # Test serialization fidelity: the reloaded model must match the original
    reference = reference_model.predict(test_data[:10], verbose=0)
    np.testing.assert_allclose(predictions, reference, rtol=1e-5, atol=1e-6)

    print(f"✓ All tests passed for {model_path.name}")
    return True

# Run tests against the in-memory model trained above
test_model_serialization(h5_path, x_test_subset, y_test_subset, reference_model=model)
test_model_serialization(savedmodel_path, x_test_subset, y_test_subset, reference_model=model)

## 7. Production Deployment Checklist

### Before deploying your model:

- [ ] **Model format**: Use SavedModel for TensorFlow Serving or ONNX for cross-platform
- [ ] **Versioning**: Include timestamp or semantic version in filenames
- [ ] **Metadata**: Save training config, metrics, and dependencies
- [ ] **Preprocessing**: Serialize all preprocessing steps (scalers, tokenizers)
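- [ ] **Testing**: Validate that the loaded model reproduces the original model's predictions (within a small numerical tolerance)
- [ ] **Size optimization**: Consider model quantization or pruning for edge deployment
- [ ] **Security**: Load pickle/joblib files only from trusted sources (unpickling can execute arbitrary code); prefer JSON for plain configuration data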
- [ ] **Storage**: Use object storage (S3, GCS) with versioning enabled
- [ ] **Documentation**: Document input/output specs and example usage
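The security item is worth seeing concretely: when unpickling, `pickle` calls whatever callable an object's `__reduce__` returns. This harmless demonstration uses `print`; a malicious file could call anything. `joblib.load` relies on pickle as well. The `NotAModel` class is purely illustrative:

```python
import pickle


class NotAModel:
    """Hypothetical object whose unpickling triggers a function call."""

    def __reduce__(self):
        # pickle will call print("...") when this object is loaded
        return (print, ("Arbitrary code ran during pickle.loads()",))


blob = pickle.dumps(NotAModel())

# Merely loading the bytes triggers the call; no method is invoked explicitly
loaded_obj = pickle.loads(blob)
print(type(loaded_obj))  # <class 'NoneType'>: the "object" is print's return value
```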

### Shell commands for model management:

```bash
# List saved models
ls -lh ../models/

# Inspect SavedModel tag-sets and signatures (substitute your run's timestamp)
saved_model_cli show --dir ../models/fashion_mnist_cnn_savedmodel_YYYYmmdd_HHMMSS --all

# Upload to S3 (AWS)
aws s3 cp ../models/ s3://my-models-bucket/fashion-mnist/ --recursive

# Upload to GCS (Google Cloud)
gsutil -m cp -r ../models/ gs://my-models-bucket/fashion-mnist/
```

---

## Extension Ideas

1. **Model Quantization**: Convert to TensorFlow Lite for mobile deployment
2. **ONNX Export**: Convert model to ONNX format for cross-framework compatibility
3. **Model Registry**: Integrate with MLflow or Weights & Biases for tracking
4. **A/B Testing**: Save multiple model versions and compare in production
5. **Automated Testing**: Create pytest suite that validates model loading
6. **Model Signing**: Add cryptographic signatures to verify model integrity
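For model signing, production setups usually use asymmetric signatures so verifiers need no secret. As a simplified, standard-library-only stand-in, an HMAC-SHA256 tag over the model bytes detects tampering by anyone without the key. A minimal sketch; the key handling and file contents are placeholders:

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sign_file(path, key):
    """Return a hex HMAC-SHA256 tag over the file's bytes."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
            mac.update(chunk)
    return mac.hexdigest()


def verify_file(path, key, expected_tag):
    """Compare tags in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign_file(path, key), expected_tag)


# Placeholder key: in practice load it from a secrets manager, never hard-code it
SIGNING_KEY = b"replace-with-a-real-secret"

artifact = Path(tempfile.mkdtemp()) / "model.h5"
artifact.write_bytes(b"pretend these are model weights")

tag = sign_file(artifact, SIGNING_KEY)
print("Untouched file verifies:", verify_file(artifact, SIGNING_KEY, tag))  # True

artifact.write_bytes(b"tampered weights")
print("Tampered file verifies:", verify_file(artifact, SIGNING_KEY, tag))   # False
```

The tag could be stored in the model metadata JSON and checked before calling `load_model`.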

---

**Next**: `02_serving_fastapi.ipynb` - Learn to serve this model via REST API