<a href="https://colab.research.google.com/github/prof-sd1/Data-Science/blob/main/AI_Module_7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 7: AI Model Deployment

## **7.1: Saving & Loading AI Models**

### **7.1.1 Why Save and Load Models?**

  * **Concept:**
    Saving a trained model means serializing its learned parameters (weights and biases), its architecture (the layers and how they connect), and sometimes even its optimizer state, so it can be stored on disk. Loading a model means deserializing this information back into memory.

      * **Benefits:**
        1.  **Persistence:** Avoid retraining the model from scratch every time you want to use it. This saves significant time and computational resources.
        2.  **Deployment:** Deploying a trained model into production environments (e.g., web applications, mobile apps, edge devices) requires it to be saved and then loaded by the serving infrastructure.
        3.  **Reproducibility:** Share your exact trained model with others, ensuring they can reproduce your results without needing your original training code or data.
        4.  **Transfer Learning:** As seen in Module 6.4, pre-trained models are loaded and then adapted for new tasks.
        5.  **Checkpointing:** Save model states during long training runs. If training is interrupted (e.g., power outage, Colab disconnect), you can resume from the last saved checkpoint instead of starting over.
        6.  **Experimentation:** Easily compare different model architectures or training strategies by saving and loading their best performing versions.

  * **What's saved?**

      * **Model Architecture:** The definition of the layers and their connections.
      * **Model Weights:** The learned numerical parameters of each layer.
      * **Training Configuration:** The optimizer, loss function, and metrics used during `model.compile()`.
      * **Optimizer State:** The internal state of the optimizer (e.g., momentum in Adam), allowing training to resume exactly where it left off.

  * **Colab Explanation:**

In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
import numpy as np
import os

print("--- 7.1.1 Why Save and Load Models? ---")

print("Training AI models can take hours, days, or even weeks.")
print("Saving and loading models allows us to store this 'learned knowledge' permanently.")

# Simulate a simple model training
# 1. Load and prepare data (MNIST for simplicity)
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28)).astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28)).astype('float32') / 255

# 2. Define a simple model
def create_simple_model():
    model = models.Sequential([
        layers.Dense(128, activation='relu', input_shape=(28 * 28,)),
        layers.Dropout(0.2),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = create_simple_model()
print("\nSimple model created. Summary:")
model.summary()

# Simulate training for a few epochs
print("\nSimulating model training for 3 epochs...")
history = model.fit(train_images, train_labels, epochs=3, batch_size=64, validation_split=0.1, verbose=0)
print("Training finished.")

# Evaluate the trained model
loss, acc = model.evaluate(test_images, test_labels, verbose=0)
print(f"Trained model accuracy: {acc:.4f}")

print("\nWithout saving, if this Colab session ends, all this training progress is lost!")
print("Saving allows us to resume training, share, or deploy this trained model.")

--- 7.1.1 Why Save and Load Models? ---
Training AI models can take hours, days, or even weeks.
Saving and loading models allows us to store this 'learned knowledge' permanently.
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
[1m11490434/11490434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step

Simple model created. Summary:


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)



Simulating model training for 3 epochs...
Training finished.
Trained model accuracy: 0.9694

Without saving, if this Colab session ends, all this training progress is lost!
Saving allows us to resume training, share, or deploy this trained model.


* **Exercise:**

    1.  Imagine you are training a very large image classification model on a dataset that takes 24 hours to train fully. Your Colab session has a maximum runtime of 12 hours. How would saving and loading models help you complete the training?
    2.  You've developed a new, custom activation function for your neural network. When you save and then try to load the *entire* model, what potential issue might arise if you don't handle this custom object correctly?

-----

### **7.1.2 Saving Entire Models (Recommended)**

  * **Concept:**
    Saving the entire model is the most comprehensive way to persist your model. It saves:

      * The model's architecture.
      * The model's weights.
      * The model's training configuration (optimizer, loss, metrics).
      * The state of the optimizer (allowing you to resume training).

    TensorFlow/Keras supports a few formats for saving entire models:

    1.  **Native Keras Format (`.keras`):**
          * **Recommended for TensorFlow 2.x and Keras 3.** This is a single `.keras` file (actually a zip archive) that contains everything. It's designed to be portable and easy to use.
          * `model.save('my_model.keras')`
    2.  **TensorFlow SavedModel Format:**
          * This is TensorFlow's universal serialization format. It saves the model as a directory containing a `saved_model.pb` file (the computation graph) and `variables/` directory (weights).
          * Ideal for deployment with TensorFlow Serving, TensorFlow.js, TensorFlow Lite, or for use with `tf.saved_model.load()`.
          * `model.save('my_model_savedmodel')` (saves to a directory)
    3.  **HDF5 Format (`.h5`):**
          * A legacy format that was popular in older Keras versions. It saves the model as a single `.h5` file.
          * Still supported but generally superseded by `.keras` and SavedModel for new projects due to limitations with custom objects and more complex TensorFlow features.
          * `model.save('my_model.h5')`

  * **Colab Example:**

In [5]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
import numpy as np
import os

print("\n--- 7.1.2 Saving Entire Models ---")

# --- 1. Define the model ---
def create_simple_model():
    model = keras.Sequential([
        layers.Dense(512, activation='relu', input_shape=(28 * 28,)),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# --- 2. Load and preprocess MNIST data ---
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28)).astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28)).astype('float32') / 255

# --- 3. Train the model ---
model = create_simple_model()
model.fit(train_images, train_labels, epochs=3, batch_size=64, validation_split=0.1, verbose=0)
loss, acc = model.evaluate(test_images, test_labels, verbose=0)
print(f"Model accuracy before saving: {acc:.4f}")

# --- 4. Saving in Native Keras Format (.keras) ---
keras_model_path = 'my_model.keras'
model.save(keras_model_path)
print(f"\nModel saved to: {keras_model_path}")
print(f"File size: {os.path.getsize(keras_model_path) / (1024*1024):.2f} MB")

# --- 5. Saving in TensorFlow SavedModel Format ---
savedmodel_path = 'my_model_savedmodel'
model.export(savedmodel_path)  # For TF Serving / TFLite use
print(f"\nModel exported to directory: {savedmodel_path}")
print(f"Contents of {savedmodel_path}: {os.listdir(savedmodel_path)}")
print(f"Contents of {savedmodel_path}/variables: {os.listdir(os.path.join(savedmodel_path, 'variables'))}")

# --- 6. Saving in HDF5 Format (.h5) ---
h5_model_path = 'my_model.h5'
model.save(h5_model_path)
print(f"\nModel saved to: {h5_model_path}")
print(f"File size: {os.path.getsize(h5_model_path) / (1024*1024):.2f} MB")

print("\n✅ All three formats demonstrated for saving the entire model.")



--- 7.1.2 Saving Entire Models ---
Model accuracy before saving: 0.9768

Model saved to: my_model.keras
File size: 4.68 MB
Saved artifact at 'my_model_savedmodel'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 784), dtype=tf.float32, name='keras_tensor_7')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  136735065083152: TensorSpec(shape=(), dtype=tf.resource, name=None)
  136735065081232: TensorSpec(shape=(), dtype=tf.resource, name=None)
  136735065085072: TensorSpec(shape=(), dtype=tf.resource, name=None)
  136735065086800: TensorSpec(shape=(), dtype=tf.resource, name=None)





Model exported to directory: my_model_savedmodel
Contents of my_model_savedmodel: ['variables', 'saved_model.pb', 'assets', 'fingerprint.pb']
Contents of my_model_savedmodel/variables: ['variables.index', 'variables.data-00000-of-00001']

Model saved to: my_model.h5
File size: 4.68 MB

✅ All three formats demonstrated for saving the entire model.


* **Exercise:**

    1.  After running the code, inspect the contents of the `my_model_savedmodel` directory. What are the main files/folders you see, and what do you think each contains?
    2.  If you had a very complex model with many custom layers and a custom training loop, which saving format (among `.keras`, SavedModel, `.h5`) would generally be the most robust for ensuring it loads correctly without needing the original Python code for custom objects? Why?

-----

### **7.1.3 Loading Entire Models**

  * **Concept:**
    Loading an entire model is straightforward using `tf.keras.models.load_model()`. This function automatically detects the format (whether `.keras`, SavedModel directory, or `.h5`) and reconstructs the model, including its architecture, weights, and compilation information.

      * **Key point:** When loading a model saved with its optimizer state, you can resume training from where you left off.

  * **Colab Example:**

In [12]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
import numpy as np
import os

print("\n--- 7.1.3 Loading Entire Models ---")

# --- Load and preprocess MNIST test data ---
(_, _), (test_images, test_labels) = mnist.load_data()
test_images = test_images.reshape((10000, 28 * 28)).astype('float32') / 255

# --- 1. Load from Native Keras Format (.keras) ---
print("\n--- Loading from .keras format ---")
loaded_model_keras = keras.models.load_model('my_model.keras')
loaded_model_keras.summary()
loss, acc = loaded_model_keras.evaluate(test_images, test_labels, verbose=0)
print(f"✅ Accuracy of model loaded from .keras: {acc:.4f}")

# --- 2. Load from TensorFlow SavedModel using TFSMLayer (inference only) ---
print("\n--- Loading from SavedModel format (Keras 3.x, inference only) ---")
saved_model_layer = keras.layers.TFSMLayer('my_model_savedmodel', call_endpoint='serving_default')
loaded_model_savedmodel = keras.Sequential([
    keras.Input(shape=(784,)),
    saved_model_layer
])
loaded_model_savedmodel.summary()

# Predict and handle dict output if present
pred_probs = loaded_model_savedmodel.predict(test_images, verbose=0)

if isinstance(pred_probs, dict):
    print("Prediction output is a dict with keys:", pred_probs.keys())
    first_key = list(pred_probs.keys())[0]
    pred_array = pred_probs[first_key]
    if hasattr(pred_array, 'numpy'):
        pred_array = pred_array.numpy()
else:
    pred_array = pred_probs

# Convert to labels
if pred_array.ndim == 2:
    pred_labels = np.argmax(pred_array, axis=1)
else:
    pred_labels = pred_array.squeeze()

pred_labels = pred_labels.astype(int)

acc = np.mean(pred_labels == test_labels)
print(f"✅ Accuracy of model loaded from SavedModel (manual eval): {acc:.4f}")

# --- 3. Load from HDF5 Format (.h5) ---
print("\n--- Loading from .h5 format ---")
loaded_model_h5 = keras.models.load_model('my_model.h5')
loaded_model_h5.summary()
loss, acc = loaded_model_h5.evaluate(test_images, test_labels, verbose=0)
print(f"✅ Accuracy of model loaded from .h5: {acc:.4f}")

print("\n🎉 All models loaded and evaluated successfully!")



--- 7.1.3 Loading Entire Models ---

--- Loading from .keras format ---


✅ Accuracy of model loaded from .keras: 0.9768

--- Loading from SavedModel format (Keras 3.x, inference only) ---




Prediction output is a dict with keys: dict_keys(['output_0'])
✅ Accuracy of model loaded from SavedModel (manual eval): 0.9768

--- Loading from .h5 format ---


✅ Accuracy of model loaded from .h5: 0.9768

🎉 All models loaded and evaluated successfully!


* **Exercise:**

    1.  What would happen if you tried to load a model using `tf.keras.models.load_model()` but the file path pointed to a directory that was *not* a SavedModel, or a file that was not a `.keras` or `.h5` file?
    2.  If you save a model and then load it, and then try to train it for more epochs, why is it important that the optimizer's state was also saved and loaded?

-----

### **7.1.4 Saving and Loading Only Weights**

  * **Concept:**
    Sometimes, you only want to save or load the learned weights of a model, not its entire architecture or optimizer state. This is useful in scenarios like:

      * **Transfer Learning (Feature Extraction):** You define a new model architecture (e.g., a new classification head) and then load pre-trained weights into its base.

      * **Model Checkpointing:** During training, you might only save weights at regular intervals to save disk space, and then load the best weights into a freshly instantiated model.

      * **A/B Testing:** Quickly swap different sets of weights into the same model architecture for testing.

      * **`model.save_weights()`:** Saves only the weights of the model.

          * By default, it saves in TensorFlow Checkpoint format (multiple files, often with `.ckpt` suffix).
          * You can specify `save_format='h5'` to save in HDF5 format (`.h5` file).

      * **`model.load_weights()`:** Loads weights into a model.

          * The model's architecture must be identical to the one from which the weights were saved. If there are mismatches, you might need `skip_mismatch=True` or `by_name=True` (for HDF5).

  * **Colab Example:**

In [14]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
import numpy as np
import os

print("\n--- 7.1.4 Saving and Loading Only Weights ---")

# Load MNIST data and preprocess
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.astype("float32") / 255.0
test_images = test_images.astype("float32") / 255.0

# Add channel dimension for Conv2D
train_images = np.expand_dims(train_images, -1)  # shape (60000, 28, 28, 1)
test_images = np.expand_dims(test_images, -1)

# Simple CNN model creator function
def create_simple_model():
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, kernel_size=3, activation='relu'),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(100, activation='relu'),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Create and train a model
model_weights_only = create_simple_model()
model_weights_only.fit(train_images, train_labels, epochs=3, batch_size=64, validation_split=0.1, verbose=0)
loss_before_save, acc_before_save = model_weights_only.evaluate(test_images, test_labels, verbose=0)
print(f"Original model accuracy before saving weights: {acc_before_save:.4f}")

# --- 1. Save Weights (TensorFlow Checkpoint format - requires .ckpt suffix) ---
weights_tf_path = 'my_model_weights_tf_checkpoint.ckpt'  # Note the .ckpt extension
model_weights_only.save_weights(weights_tf_path)
print(f"\nWeights saved to TensorFlow Checkpoint format: {weights_tf_path}")
print(f"Contents of current directory: {os.listdir('.')}")

# --- 2. Save Weights (HDF5 format - requires .weights.h5 suffix) ---
weights_h5_path = 'my_model_weights.weights.h5'  # Must end with .weights.h5
model_weights_only.save_weights(weights_h5_path)
print(f"\nWeights saved to HDF5 format: {weights_h5_path}")
print(f"File size: {os.path.getsize(weights_h5_path) / (1024*1024):.2f} MB")

# --- Loading Weights ---

# New untrained model for loading weights
new_model_for_loading = create_simple_model()
loss_untrained, acc_untrained = new_model_for_loading.evaluate(test_images, test_labels, verbose=0)
print(f"\nUntrained model accuracy: {acc_untrained:.4f}")

# Load weights from TensorFlow Checkpoint
new_model_for_loading.load_weights(weights_tf_path)
loss_after_load_tf, acc_after_load_tf = new_model_for_loading.evaluate(test_images, test_labels, verbose=0)
print(f"Accuracy after loading TF Checkpoint weights: {acc_after_load_tf:.4f}")

# Load weights from HDF5 (new model instance)
new_model_for_loading_h5 = create_simple_model()
new_model_for_loading_h5.load_weights(weights_h5_path)
loss_after_load_h5, acc_after_load_h5 = new_model_for_loading_h5.evaluate(test_images, test_labels, verbose=0)
print(f"Accuracy after loading HDF5 weights: {acc_after_load_h5:.4f}")

print("\nAccuracy after loading weights matches the original trained model, confirming successful saving and loading.")



--- 7.1.4 Saving and Loading Only Weights ---
Original model accuracy before saving weights: 0.9833


ValueError: The filename must end in `.weights.h5`. Received: filepath=my_model_weights_tf_checkpoint.ckpt

* **Exercise:**

    1.  You have a pre-trained `MobileNetV2` model (without its top classification layer) and want to load its ImageNet weights into a new model you've defined. Which `model.load_weights()` parameter would be particularly useful if your new model's layer names don't exactly match the original `MobileNetV2` layer names, but their shapes are compatible?
    2.  When would saving *only* the weights be more advantageous than saving the entire model? Provide two scenarios.

-----

### **7.1.5 Best Practices and Deployment Considerations**

  * **Concept:**
    Effective model saving and loading are crucial for robust deployment.

      * **Which format to use?**

          * **`.keras` (Recommended):** For most Keras workflows in TensorFlow 2.x. It's self-contained and handles custom objects well.
          * **SavedModel:** If you plan to deploy with TensorFlow Serving, TensorFlow Lite, or TensorFlow.js, or need to integrate with other TensorFlow ecosystem tools. It's the most comprehensive and portable format within the TensorFlow ecosystem.
          * **`.h5`:** Primarily for backward compatibility or when you need a single file and don't have complex custom objects. Less recommended for new projects.

      * **Versioning:**

          * As models evolve, it's critical to manage different versions. Store models in versioned directories (e.g., `models/v1/`, `models/v2/`).
          * Include metadata (training date, dataset version, hyperparameters) with each saved model.
          * Use tools like MLflow or DVC for more sophisticated model versioning and tracking.

      * **Custom Objects:** If your model uses custom layers, loss functions, metrics, or activation functions, you need to inform `load_model()` about them.

          * **Recommended:** Use `tf.keras.utils.register_keras_serializable()` decorator on your custom classes/functions.
          * **Alternative:** Pass them via the `custom_objects` argument in `load_model()`.

      * **Deployment Environment:**

          * Ensure the environment where the model is loaded has the same TensorFlow/Keras version (or compatible versions) as the environment where it was saved.
          * Consider the target platform: TensorFlow Lite for mobile/edge, TensorFlow.js for web browsers, TensorFlow Serving for scalable production APIs.

  * **Colab Explanation (Conceptual & Custom Object Example):**

In [15]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
import numpy as np
import os

print("\n--- 7.1.5 Best Practices and Deployment Considerations ---")

print("\n**Choosing the Right Format:**")
print("- `.keras` (Native Keras): Best for general Keras use, single file, portable.")
print("- `SavedModel`: Best for TensorFlow ecosystem deployment (TF Serving, TF Lite, TF.js).")
print("- `.h5` (HDF5): Legacy, single file, but can have issues with custom objects.")

print("\n**Handling Custom Objects (Layers, Losses, etc.):**")

# Define a custom layer
class CustomDenseLayer(layers.Layer):
    def __init__(self, units=32, **kwargs):
        super(CustomDenseLayer, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="zeros", trainable=True
        )
        super(CustomDenseLayer, self).build(input_shape) # Must call at the end

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

    def get_config(self):
        config = super(CustomDenseLayer, self).get_config()
        config.update({"units": self.units})
        return config

# Recommended way to register custom objects for saving/loading
@tf.keras.utils.register_keras_serializable()
class CustomActivation(layers.Layer):
    def __init__(self, **kwargs):
        super(CustomActivation, self).__init__(**kwargs)
        self.threshold = tf.Variable(0.0, trainable=True, name='threshold') # Example trainable parameter

    def call(self, inputs):
        return tf.where(inputs > self.threshold, inputs, 0.0)

    def get_config(self):
        config = super(CustomActivation, self).get_config()
        # No need to manually add trainable variables if they are tf.Variable and part of the layer
        return config

print("\nCustom layers defined. `CustomActivation` is registered for serialization.")

# Create a model with custom layers
custom_model = models.Sequential([
    layers.Input(shape=(10,)),
    CustomDenseLayer(units=5),
    CustomActivation(), # Using the registered custom activation
    layers.Dense(1, activation='sigmoid')
])
custom_model.compile(optimizer='adam', loss='binary_crossentropy')
print("\nModel with custom layers created:")
custom_model.summary()

# Save the model
custom_model_path_keras = 'custom_model.keras'
custom_model.save(custom_model_path_keras)
print(f"\nCustom model saved to: {custom_model_path_keras}")

# Load the model (should work seamlessly because CustomActivation is registered)
print("\nAttempting to load custom model...")
loaded_custom_model = keras.models.load_model(custom_model_path_keras)
loaded_custom_model.summary()
print("Custom model loaded successfully using `.keras` format (due to registration).")

# If CustomActivation was NOT registered, you'd need:
# loaded_custom_model = keras.models.load_model(custom_model_path_keras,
#                                               custom_objects={'CustomActivation': CustomActivation})

print("\n**Versioning:**")
print("Organize your saved models in versioned directories (e.g., 'model_artifacts/v1/', 'model_artifacts/v2/').")
print("Include metadata (e.g., training date, dataset version, performance metrics) with each version.")

print("\n**Deployment Environment:**")
print("- Match TensorFlow/Keras versions between training and deployment.")
print("- Consider target platforms: TF Lite (mobile/edge), TF.js (web), TF Serving (API).")


--- 7.1.5 Best Practices and Deployment Considerations ---

**Choosing the Right Format:**
- `.keras` (Native Keras): Best for general Keras use, single file, portable.
- `SavedModel`: Best for TensorFlow ecosystem deployment (TF Serving, TF Lite, TF.js).
- `.h5` (HDF5): Legacy, single file, but can have issues with custom objects.

**Handling Custom Objects (Layers, Losses, etc.):**

Custom layers defined. `CustomActivation` is registered for serialization.

Model with custom layers created:



Custom model saved to: custom_model.keras

Attempting to load custom model...


TypeError: <class 'keras.src.models.sequential.Sequential'> could not be deserialized properly. Please ensure that components that are Python object instances (layers, models, etc.) returned by `get_config()` are explicitly deserialized in the model's `from_config()` method.

config={'module': 'keras', 'class_name': 'Sequential', 'config': {'name': 'sequential_11', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None, 'shared_object_id': 136735103768976}, 'layers': [{'module': 'keras.layers', 'class_name': 'InputLayer', 'config': {'batch_shape': [None, 10], 'dtype': 'float32', 'sparse': False, 'name': 'input_layer_11'}, 'registered_name': None}, {'module': None, 'class_name': 'CustomDenseLayer', 'config': {'name': 'custom_dense_layer', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 5}, 'registered_name': 'CustomDenseLayer', 'build_config': {'input_shape': [None, 10]}}, {'module': None, 'class_name': 'CustomActivation', 'config': {'name': 'custom_activation', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None, 'shared_object_id': 136735103768976}}, 'registered_name': 'Custom>CustomActivation', 'build_config': {'input_shape': [None, 5]}}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_10', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None, 'shared_object_id': 136735103768976}, 'units': 1, 'activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': [None, 5]}}], 'build_input_shape': [None, 10]}, 'registered_name': None, 'build_config': {'input_shape': [None, 10]}, 'compile_config': {'optimizer': {'module': 'keras.optimizers', 'class_name': 'Adam', 'config': {'name': 'adam', 'learning_rate': 0.0010000000474974513, 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'loss_scale_factor': None, 'gradient_accumulation_steps': None, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}, 'registered_name': None}, 'loss': 'binary_crossentropy', 'loss_weights': None, 'metrics': None, 'weighted_metrics': None, 'run_eagerly': False, 'steps_per_execution': 1, 'jit_compile': False}}.

Exception encountered: Could not locate class 'CustomDenseLayer'. Make sure custom classes are decorated with `@keras.saving.register_keras_serializable()`. Full object config: {'module': None, 'class_name': 'CustomDenseLayer', 'config': {'name': 'custom_dense_layer', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 5}, 'registered_name': 'CustomDenseLayer', 'build_config': {'input_shape': [None, 10]}}

* **Exercise:**

    1.  Modify the `CustomDenseLayer` by adding the `@tf.keras.utils.register_keras_serializable()` decorator to it. Save the `custom_model` to a new `.keras` file and then try to load it. Does it still load correctly?
    2.  You have a model saved as `my_model_savedmodel`. You want to deploy it as a web service that can handle many concurrent prediction requests. Which TensorFlow deployment tool would be most suitable for this scenario, and why?


## **7.2: Streamlit & Flask for Web Apps**


### **7.2.1 Introduction to Web Deployment for AI Models**

  * **Concept:**
    Deploying an AI model means making it available for use by others, often in a production environment. For many applications, this involves creating a web interface or an API (Application Programming Interface) that allows users or other software systems to interact with the model.

      * **Why Web Deployment?**

          * **Accessibility:** Users can access the model's functionality through a web browser without needing to install complex software or understand the underlying code.
          * **Scalability:** Web applications can be designed to handle multiple users and requests concurrently.
          * **Integration:** APIs allow other applications (mobile apps, backend services) to programmatically send data to the model and receive predictions.
          * **Demonstration & Prototyping:** Quickly showcase your model's capabilities to stakeholders.

      * **Common Deployment Scenarios:**

          * **Interactive Web App:** A user-friendly interface where users input data (text, images, numbers) and see the model's predictions directly on a webpage (e.g., a sentiment analyzer, an image classifier demo).
          * **API Endpoint:** A programmatic interface where other software sends HTTP requests (e.g., JSON data) to your model and receives HTTP responses (e.g., JSON predictions). This is common for integrating AI into larger systems.

  * **Colab Explanation:**

In [16]:
print("--- 7.2.1 Introduction to Web Deployment for AI Models ---")

print("Imagine you've trained a fantastic AI model that can detect diseases from X-rays.")
print("How do you let doctors use it without them needing to run your Python code?")

print("\nWeb deployment is the answer!")
print("It allows us to wrap our AI model in a web interface or API, making it accessible.")

print("\nTwo common ways to deploy Python-based AI models as web applications are:")
print("1. Streamlit: For quick, interactive dashboards and demos (less web development knowledge needed).")
print("2. Flask: For building more custom web APIs or full-fledged web applications (more control, requires basic web dev knowledge).")

--- 7.2.1 Introduction to Web Deployment for AI Models ---
Imagine you've trained a fantastic AI model that can detect diseases from X-rays.
How do you let doctors use it without them needing to run your Python code?

Web deployment is the answer!
It allows us to wrap our AI model in a web interface or API, making it accessible.

Two common ways to deploy Python-based AI models as web applications are:
1. Streamlit: For quick, interactive dashboards and demos (less web development knowledge needed).
2. Flask: For building more custom web APIs or full-fledged web applications (more control, requires basic web dev knowledge).


* **Exercise:**

    1.  You've trained an object detection model that identifies different types of waste in an image for recycling purposes. Describe two different ways you could deploy this model for practical use, one as an interactive web app and one as an API.
    2.  What is the primary advantage of deploying an AI model as an API endpoint compared to a standalone desktop application?

-----

### **7.2.2 Streamlit for Interactive AI Web Apps**

  * **Concept:**
    **Streamlit** is an open-source Python library that makes it incredibly easy to create beautiful, custom web applications for machine learning and data science. It's designed for data scientists who want to build interactive dashboards and demos without needing extensive web development (HTML, CSS, JavaScript) knowledge.

      * **Key Features:**

          * **Python-Native:** Write your entire web app in pure Python.
          * **Rapid Prototyping:** Turn data scripts into interactive apps in minutes.
          * **Simple API:** Provides intuitive functions (`st.write()`, `st.slider()`, `st.button()`, `st.image()`) to display content and create interactive widgets.
          * **Automatic Reruns:** The app automatically reruns from top to bottom whenever a user interacts with a widget, making it reactive.
          * **Caching (`@st.cache_data`, `@st.cache_resource`):** Optimizes performance by caching expensive computations (like model loading) to avoid rerunning them unnecessarily.

      * **Pros:**

          * Extremely fast development cycle for demos and internal tools.
          * No front-end knowledge required.
          * Great for data visualization and interactive exploration.
          * Active community and growing ecosystem of components.

      * **Cons:**

          * Limited UI customization compared to traditional web frameworks.
          * Can be less suitable for complex, multi-page applications or high-traffic production systems.
          * The "rerun from top" execution model can sometimes be tricky to manage state.

  * **Colab Example:**
    To run Streamlit apps in Colab, we typically need to use `ngrok` or `localtunnel` to expose the local server to the internet. We'll demonstrate a simple Streamlit app that you can run locally or in Colab.

In [18]:
# Full Streamlit app code saved as a string
streamlit_app_code = """
import streamlit as st
import numpy as np
import time

@st.cache_resource
def load_my_ai_model():
    time.sleep(2)
    class DummyModel:
        def predict(self, input_data):
            return np.array([0])
    return DummyModel()

my_model = load_my_ai_model()

st.title("Simple AI Model Demo with Streamlit")
st.write("This is a basic Streamlit application to demonstrate how to build interactive web apps for AI models.")
st.markdown("---")

st.header("Input for Prediction")

input_feature_1 = st.slider("Feature 1 (e.g., Temperature)", 0.0, 100.0, 50.0)
input_feature_2 = st.number_input("Feature 2 (e.g., Humidity)", 0.0, 100.0, 25.0)
input_text = st.text_area("Enter some text (for NLP demo)", "This is a sample text.")

if st.button("Get Prediction"):
    input_data = np.array([[input_feature_1, input_feature_2]])
    prediction = my_model.predict(input_data)
    st.subheader("Prediction Result:")
    st.success(f"The model predicts: Class {prediction[0]}")
    st.info(f"Input text received: '{input_text}' (not used in dummy model prediction)")

st.markdown("---")
st.write("This application runs in your browser.")
"""

# Save the app to app.py
with open("app.py", "w") as f:
    f.write(streamlit_app_code)

print("✅ Streamlit app saved as 'app.py'")


✅ Streamlit app saved as 'app.py'


import streamlit as st
import numpy as np
import pandas as pd
import time

@st.cache\_resource
def load\_my\_ai\_model():
time.sleep(1) \# Simulate model loading time
class DummyModel:
def predict(self, input\_data):
\# A very simple dummy prediction logic
if input\_data[0, 0] \> 70: \# If Feature 1 is high
return np.array([1]) \# Predict Class 1
else:
return np.array([0]) \# Predict Class 0
return DummyModel()

my\_model = load\_my\_ai\_model()

st.set\_page\_config(layout="centered", page\_title="AI Model Demo")

st.title("Simple AI Model Demo with Streamlit")

st.write("This is a basic Streamlit application to demonstrate how to build interactive web apps for AI models.")
st.markdown("---")

st.header("Input for Prediction")

# Create interactive widgets for user input

input\_feature\_1 = st.slider("Feature 1 (e.g., Temperature)", 0.0, 100.0, 50.0, help="Slide to adjust the value for Feature 1.")
input\_feature\_2 = st.number\_input("Feature 2 (e.g., Humidity)", 0.0, 100.0, 25.0, help="Enter a numerical value for Feature 2.")
input\_text = st.text\_area("Enter some text (for NLP demo)", "This is a sample text.", help="Type any text here. This input is for demonstration and not used by the dummy model.")

# A button to trigger prediction

if st.button("Get Prediction", help="Click to get the AI model's prediction based on your inputs."):
\# Prepare input data for the dummy model
input\_data = np.array([[input\_feature\_1, input\_feature\_2]])

```
# Make prediction using the loaded dummy model
prediction = my_model.predict(input_data)

st.subheader("Prediction Result:")
if prediction[0] == 1:
    st.success(f"The model predicts: **Class 1 (High)**")
else:
    st.info(f"The model predicts: **Class 0 (Low)**")
st.write(f"Input text received: '{input_text}' (not used in dummy model prediction)")
```

st.markdown("---")
st.write("This application runs in your browser. Interact with the widgets and click 'Get Prediction'.")
st.caption("Model loading is cached for faster subsequent runs.")
"""

````
with open("streamlit_app.py", "w") as f:
    f.write(streamlit_app_code)

print("\nStreamlit app code saved to `streamlit_app.py`.")
print("To run this app in Colab, execute the following in a new cell:")
print("!streamlit run streamlit_app.py &>/dev/null&")
print("!nohup ngrok http 8501 &")
print("!sleep 2") # Give ngrok a moment to start
print("!curl -s http://localhost:4040/api/tunnels | grep -o 'https://[^/]*\\.ngrok\\.io'")
print("\nClick the ngrok URL generated to view your Streamlit app.")
```
````

  * **Exercise:**
    1.  What is the purpose of `@st.cache_resource` in the Streamlit example, and why is it particularly useful when deploying AI models?
    2.  Modify the `streamlit_app.py` code to include a `st.checkbox("Show raw input data")`. When checked, display the `input_data` array.

-----

### **7.2.3 Flask for AI Web APIs**

  * **Concept:**
    **Flask** is a lightweight Python web framework, often called a "microframework" because it provides the bare essentials for web development without imposing a rigid structure. It's highly flexible and commonly used for building web APIs (RESTful APIs) that serve predictions from AI models.

      * **Key Features:**

          * **Routes:** Define URL paths and the Python functions that handle requests to those paths.
          * **Request/Response:** Easily access incoming request data (e.g., JSON payload) and send back structured responses (e.g., JSON predictions).
          * **Templating (Jinja2):** (Optional for APIs, but useful for full web apps) Render HTML templates.
          * **Extensibility:** A rich ecosystem of extensions for databases, authentication, etc.

      * **Pros:**

          * High degree of flexibility and control over the application's structure.
          * Excellent for building RESTful APIs.
          * Lightweight and minimal overhead.
          * Good for integrating with other front-end technologies (React, Vue, etc.).

      * **Cons:**

          * Requires more boilerplate code compared to Streamlit for interactive UIs.
          * Requires basic understanding of HTTP methods (GET, POST), JSON, and web concepts.
          * Not designed for rapid interactive data exploration out-of-the-box.

  * **Colab Example:**
    We'll create a simple Flask API that loads our dummy AI model and provides an endpoint for predictions. You can then test this API using `curl` or by visiting the URL in a browser.

In [20]:
!pip install Flask pyngrok -q
# Save Flask app code to file
flask_app_code = """
from flask import Flask, request, jsonify
import numpy as np
import time

app = Flask(__name__)

def load_my_ai_model_flask():
    time.sleep(1)  # Simulate model loading
    class DummyModel:
        def predict(self, input_data):
            # Predict 1 if sum > 100, else 0
            return np.array([1]) if np.sum(input_data) > 100 else np.array([0])
    return DummyModel()

FLASK_AI_MODEL = load_my_ai_model_flask()

@app.route('/')
def home():
    return "Welcome to the Flask AI Model API! Use /predict endpoint."

@app.route('/predict', methods=['POST'])
def predict():
    if not request.is_json:
        return jsonify({"error": "Request must be JSON"}), 400

    data = request.get_json()

    if 'features' not in data or not isinstance(data['features'], list):
        return jsonify({"error": "Missing 'features' key or 'features' is not a list"}), 400

    try:
        input_features = np.array([data['features']], dtype=np.float32)
        if input_features.shape[1] != 2:
            return jsonify({"error": "Expected 2 features in the input list"}), 400

        prediction = FLASK_AI_MODEL.predict(input_features)
        return jsonify({"prediction": int(prediction[0])})

    except Exception as e:
        return jsonify({"error": f"Error processing request: {str(e)}"}), 500

if __name__ == '__main__':
    app.run()
"""

# Write to file
with open("flask_app.py", "w") as f:
    f.write(flask_app_code)

print("✅ Flask app saved as 'flask_app.py'")


✅ Flask app saved as 'flask_app.py'


from flask import Flask, request, jsonify
import numpy as np
import time
import os

app = Flask(**name**)

# Simulate loading a pre-trained model

# Use @lru\_cache or similar for actual model loading in production

# For simplicity, we'll load it once globally for this demo

def load\_my\_ai\_model\_flask\_internal():
time.sleep(1) \# Simulate loading time
class DummyModel:
def predict(self, input\_data):
\# Simple dummy logic: if sum of features \> 100, predict 1, else 0
if np.sum(input\_data) \> 100:
return np.array([1])
else:
return np.array([0])
return DummyModel()

FLASK\_AI\_MODEL = load\_my\_ai\_model\_flask\_internal()

@app.route('/')
def home():
return "Welcome to the Flask AI Model API\! Use /predict endpoint with POST request."

@app.route('/predict', methods=['POST'])
def predict():
if not request.is\_json:
return jsonify({"error": "Request must be JSON"}), 400

```
data = request.get_json()

if 'features' not in data or not isinstance(data['features'], list):
    return jsonify({"error": "Missing 'features' key or 'features' is not a list"}), 400

try:
    input_features = np.array([data['features']], dtype=np.float32)
    if input_features.shape[1] != 2:
        return jsonify({"error": "Expected 2 features in the input list"}), 400

    prediction = FLASK_AI_MODEL.predict(input_features)

    return jsonify({"prediction": int(prediction[0])})

except Exception as e:
    return jsonify({"error": f"Error processing request: {str(e)}"}), 500
```

if **name** == '**main**':
\# Run the Flask app
app.run(host='0.0.0.0', port=5000)
"""

````
with open("flask_app.py", "w") as f:
    f.write(flask_app_code)

print("\nFlask app code saved to `flask_app.py`.")
print("To run this API in Colab, execute the following in a new cell:")
print("!nohup flask run --host=0.0.0.0 --port=5000 &")
print("!nohup ngrok http 5000 &")
print("!sleep 2") # Give ngrok a moment to start
print("!curl -s http://localhost:4040/api/tunnels | grep -o 'https://[^/]*\\.ngrok\\.io'")
print("\nOnce the ngrok URL is generated, you can test the API using `curl` or Postman:")
print("Example `curl` command (replace URL with your ngrok URL):")
print("!curl -X POST -H 'Content-Type: application/json' -d '{\"features\": [60.5, 30.2]}' YOUR_NGROK_URL/predict")
```
````

  * **Exercise:**
    1.  In the Flask example, why is `FLASK_AI_MODEL = load_my_ai_model_flask()` placed outside the `predict()` function? What would be the performance implication if it were inside?
    2.  Modify the Flask `predict` endpoint to also accept a `user_id` in the JSON payload and include it in the response (e.g., `{"prediction": 0, "user_id": "abc"}`).

-----

### **7.2.4 Streamlit vs. Flask: When to Use Which**

  * **Concept:**
    The choice between Streamlit and Flask (or other frameworks like FastAPI, Django) largely depends on your project's requirements, your team's expertise, and the stage of development.

    | Feature / Aspect      | Streamlit                                       | Flask                                                         |
    | :-------------------- | :---------------------------------------------- | :------------------------------------------------------------ |
    | **Primary Use Case** | Interactive dashboards, demos, rapid prototyping, internal tools, data exploration | REST APIs, microservices, custom web applications, backend for front-ends |
    | **Ease of Use** | Very high (Python-only, minimal web dev knowledge) | Moderate (requires basic understanding of web concepts, HTML/CSS for UI) |
    | **Development Speed** | Extremely fast for simple UIs                   | Fast for APIs, moderate for full web apps                     |
    | **UI Customization** | Limited (uses built-in widgets and layout)      | High (full control over HTML, CSS, JavaScript)                |
    | **Backend Control** | Less direct control over HTTP requests/responses, single-threaded by default | Full control over routes, requests, responses, can be multi-threaded/asynchronous |
    | **Scalability** | Better for lower traffic, single-instance demos | Highly scalable for production APIs (with Gunicorn/Nginx/Docker) |
    | **State Management** | Reactive model (reruns script), `st.session_state`, `@st.cache` | Explicitly managed (e.g., global variables, databases, session objects) |
    | **Ideal for Data Scientists** | Yes, especially for quick sharing and visualization | Yes, for building API endpoints for models                    |
    | **Ideal for Web Developers** | Less common for traditional web dev           | Yes, good for building robust web services                    |

  * **Colab Explanation:**
  

In [21]:
print("\n--- 7.2.4 Streamlit vs. Flask: When to Use Which ---")

print("\n**Choose Streamlit if:**")
print("- You are a data scientist who wants to quickly build an interactive demo or dashboard.")
print("- You need to showcase your AI model's capabilities to non-technical stakeholders.")
print("- You don't want to write HTML, CSS, or JavaScript.")
print("- Your application is primarily for internal use, a proof-of-concept, or has low traffic.")
print("  *Example:* A quick app to visualize a classification model's decision boundary.")

print("\n**Choose Flask if:**")
print("- You need to build a robust API endpoint for your AI model that other applications will consume.")
print("- You require full control over the web application's structure, routing, and HTTP interactions.")
print("- You are comfortable with basic web development concepts (HTTP methods, JSON, potentially HTML/CSS/JS).")
print("- Your application needs to scale to handle high traffic or complex backend logic.")
print("  *Example:* An API for a mobile app to send an image and get an object detection result.")

print("\nBoth are powerful tools, but they cater to different needs in the AI deployment lifecycle.")
print("Often, data scientists use Streamlit for initial exploration and demos, and then transition to Flask (or FastAPI) for production-grade APIs.")


--- 7.2.4 Streamlit vs. Flask: When to Use Which ---

**Choose Streamlit if:**
- You are a data scientist who wants to quickly build an interactive demo or dashboard.
- You need to showcase your AI model's capabilities to non-technical stakeholders.
- You don't want to write HTML, CSS, or JavaScript.
- Your application is primarily for internal use, a proof-of-concept, or has low traffic.
  *Example:* A quick app to visualize a classification model's decision boundary.

**Choose Flask if:**
- You need to build a robust API endpoint for your AI model that other applications will consume.
- You require full control over the web application's structure, routing, and HTTP interactions.
- You are comfortable with basic web development concepts (HTTP methods, JSON, potentially HTML/CSS/JS).
- Your application needs to scale to handle high traffic or complex backend logic.
  *Example:* An API for a mobile app to send an image and get an object detection result.

Both are powerful tools, but 

* **Exercise:**

    1.  You are a data scientist who has developed a new algorithm for predicting stock prices. You want to quickly build a web interface where you can input stock features (e.g., historical prices, trading volume) and see the predicted price, along with some interactive charts. Which framework would you choose, and why?
    2.  Your company needs to integrate a sentiment analysis model into its customer support system. The support system is written in Java and needs to send customer chat messages to your model and receive a sentiment score. Which framework would you choose to deploy your Python-based sentiment model, and why?


## **7.3: Hosting on Hugging Face, Render, or Vercel**


### **7.3.1 Cloud Hosting for AI Models & Web Apps**

  * **Concept:**
    Cloud hosting refers to running your applications (including AI models wrapped in web apps or APIs) on remote servers managed by a cloud provider, rather than on your local machine.

      * **Why Cloud Hosting?**

          * **Global Accessibility:** Your application is available 24/7 from anywhere with an internet connection.
          * **Scalability:** Cloud platforms can automatically (or manually) adjust resources (CPU, RAM, GPU) to handle varying user loads, from a few users to millions.
          * **Reliability & Uptime:** Providers offer high availability, backups, and disaster recovery, ensuring your application is almost always online.
          * **Maintenance & Security:** The cloud provider handles infrastructure maintenance, security updates, and underlying hardware, freeing you to focus on your AI model and application logic.
          * **Cost-Effectiveness:** Pay-as-you-go models mean you only pay for the resources you consume, often more efficient than managing your own servers.

      * **Types of Cloud Services (Simplified):**

          * **Platform as a Service (PaaS):** Provides a complete environment (operating system, web server, database, etc.) for you to deploy your code without managing the underlying infrastructure. Examples: Render, Heroku.
          * **Serverless Functions (Function as a Service - FaaS):** You deploy individual functions, and the cloud provider runs them on demand, scaling automatically. You only pay when your function is executed. Examples: AWS Lambda, Google Cloud Functions, Vercel Serverless Functions.

  * **Colab Explanation:**

In [22]:
print("--- 7.3.1 Cloud Hosting for AI Models & Web Apps ---")

print("Cloud hosting is essential for making your AI models available to a wider audience or integrating them into other systems.")
print("It moves your application from your local machine to powerful, managed servers in the cloud.")

print("\nBenefits of Cloud Hosting:")
print("- **Always On:** Your app is available 24/7.")
print("- **Scales Automatically:** Handles more users without manual intervention.")
print("- **Less Maintenance:** The cloud provider manages the servers.")
print("- **Global Reach:** Accessible from anywhere in the world.")

print("\nWe'll look at three popular platforms for deploying Python AI applications:")
print("1. Hugging Face Spaces: Great for quick ML demos (especially Streamlit/Gradio).")
print("2. Render: A versatile platform for web services and APIs (good for Flask/FastAPI).")
print("3. Vercel: Excellent for static sites and serverless functions (can host Python APIs).")

--- 7.3.1 Cloud Hosting for AI Models & Web Apps ---
Cloud hosting is essential for making your AI models available to a wider audience or integrating them into other systems.
It moves your application from your local machine to powerful, managed servers in the cloud.

Benefits of Cloud Hosting:
- **Always On:** Your app is available 24/7.
- **Scales Automatically:** Handles more users without manual intervention.
- **Less Maintenance:** The cloud provider manages the servers.
- **Global Reach:** Accessible from anywhere in the world.

We'll look at three popular platforms for deploying Python AI applications:
1. Hugging Face Spaces: Great for quick ML demos (especially Streamlit/Gradio).
2. Render: A versatile platform for web services and APIs (good for Flask/FastAPI).
3. Vercel: Excellent for static sites and serverless functions (can host Python APIs).


* **Exercise:**

    1.  You have developed a machine learning model that predicts housing prices. Your goal is to allow real estate agents to use this model by entering property details into a web form. Why would hosting this model in the cloud be preferable to running it on your personal computer?
    2.  What is the main difference between a "Platform as a Service (PaaS)" and "Serverless Functions" in the context of deploying a simple AI prediction API?

-----

### **7.3.2 Hugging Face Spaces (for ML Demos & Models)**

  * **Concept:**
    Hugging Face Spaces is a platform specifically designed for hosting machine learning demos, models, and datasets. It integrates seamlessly with popular ML UI frameworks like Streamlit and Gradio, allowing data scientists to quickly deploy interactive applications directly from their code.

      * **Key Features:**

          * **Easy Deployment:** Connects directly to a Git repository (e.g., GitHub) and automatically builds/deploys your app.
          * **Framework Agnostic (but favors Streamlit/Gradio):** While you can use custom Dockerfiles, it has native support for Streamlit and Gradio.
          * **Integrated Ecosystem:** Part of the larger Hugging Face ecosystem, making it easy to use models from the Hugging Face Hub.
          * **Free Tier:** Offers a generous free tier for personal projects and demos.
          * **Community & Sharing:** Encourages sharing and collaboration within the ML community.

      * **Pros:**

          * Extremely fast for deploying interactive ML demos.
          * No need for complex web development setup.
          * Ideal for showcasing models and research.
          * Version control integrated via Git.

      * **Cons:**

          * Not a general-purpose web hosting platform (e.g., not for complex e-commerce sites).
          * Resource limits on the free tier (CPU, RAM, disk space).
          * Less control over the underlying infrastructure compared to IaaS (Infrastructure as a Service).

  * **Colab Example (Conceptual Setup):**
    We can't directly deploy to Hugging Face Spaces from Colab, but we can prepare the necessary files and understand the folder structure.

In [23]:
print("\n--- 7.3.2 Hugging Face Spaces (for ML Demos & Models) ---")

print("Hugging Face Spaces is like a playground for ML demos. It's super easy to get your Streamlit or Gradio app online.")

# --- Conceptual File Structure for a Hugging Face Space ---
# Your Hugging Face Space will typically be a Git repository.
# Here's what the root of that repository would look like for a Streamlit app:

# 1. `app.py` (Your Streamlit or Gradio application code)
streamlit_app_code_hf = """

SyntaxError: incomplete input (ipython-input-1757159545.py, line 10)

import streamlit as st
import numpy as np
import time

# Simulate loading a small AI model

@st.cache\_resource
def load\_simple\_classifier():
time.sleep(1) \# Simulate loading time
class SimpleClassifier:
def predict(self, input\_value):
if input\_value \> 0.5:
return "Positive"
else:
return "Negative"
return SimpleClassifier()

classifier = load\_simple\_classifier()

st.set\_page\_config(page\_title="HF Spaces Demo", layout="centered")
st.title("Hugging Face Spaces: Simple Classifier Demo")
st.write("Enter a value between 0 and 1 to get a classification.")

value = st.slider("Input Value", 0.0, 1.0, 0.5)

if st.button("Classify"):
prediction = classifier.predict(value)
st.success(f"The model predicts: **{prediction}**")

st.markdown("---")
st.write("This app is hosted on Hugging Face Spaces\!")
"""
\# Save this to a dummy file to show structure
with open("hf\_space\_app.py", "w") as f:
f.write(streamlit\_app\_code\_hf)
print("\\n`hf_space_app.py` (your Streamlit app) created.")

```
# 2. `requirements.txt` (List of Python dependencies)
requirements_hf = """
```

streamlit
numpy
tensorflow \# if you use a TF model

# Add any other libraries your app needs (e.g., scikit-learn, pandas)

```
"""
with open("hf_space_requirements.txt", "w") as f:
    f.write(requirements_hf)
print("`hf_space_requirements.txt` created.")


# 3. `README.md` (Optional, but good for description)
readme_hf = """
```

-----

## title: My Simple Classifier emoji: 🚀 colorFrom: blue colorTo: purple sdk: streamlit sdk\_version: 1.20.0 app\_file: hf\_space\_app.py pinned: false

# My Simple Classifier Demo

This is a basic Streamlit application deployed on Hugging Face Spaces.
It demonstrates a simple classification model.
"""
with open("hf\_space\_README.md", "w") as f:
f.write(readme\_hf)
print("`hf_space_README.md` created (for Space configuration).")

````
print("\nTo deploy to Hugging Face Spaces:")
print("1. Create a new Space on Hugging Face (https://huggingface.co/spaces/new).")
print("2. Choose 'Streamlit' or 'Gradio' SDK.")
print("3. Connect your Git repository (e.g., GitHub) or upload these files directly.")
print("4. Push `hf_space_app.py`, `hf_space_requirements.txt`, and `hf_space_README.md` to your Space's Git repo.")
print("Hugging Face will automatically build and deploy your app!")
```
````

  * **Exercise:**
    1.  You have a Python script that uses `matplotlib` to generate a complex data visualization. You want to share this interactive visualization with your colleagues using Hugging Face Spaces. What two main files would you need in your Git repository for the Space?
    2.  What is the primary benefit of using Hugging Face Spaces for a data scientist who wants to quickly showcase a new AI model, compared to setting up a traditional web server?

-----

### **7.3.3 Render (for Web Services & APIs)**

  * **Concept:**
    **Render** is a unified cloud platform that allows you to host web applications, APIs, static sites, databases, and more, directly from your Git repository. It's a Platform-as-a-Service (PaaS) that abstracts away much of the infrastructure complexity, making it a good choice for deploying Flask or FastAPI-based AI APIs.

      * **Key Features:**

          * **Git-based Deployment:** Automatically deploys new versions whenever you push changes to your Git repository.
          * **Multiple Service Types:** Supports web services (for Flask/FastAPI), static sites, background workers, databases, etc.
          * **Auto-scaling:** Can automatically scale your web services up or down based on traffic.
          * **Custom Domains & SSL:** Easy to connect your own domain and get free SSL certificates.
          * **Environment Variables:** Securely manage API keys and other sensitive information.
          * **Build & Runtime Logs:** Provides detailed logs for debugging.

      * **Pros:**

          * Versatile for various backend services, not just ML demos.
          * Good for production-ready APIs with more control than Streamlit Cloud.
          * Simplified deployment compared to raw cloud VMs.
          * Generous free tier for static sites, and a free tier for web services (with some limitations like spin-down).

      * **Cons:**

          * Can be more complex to configure than Hugging Face Spaces for simple demos.
          * Free tier web services spin down after inactivity, leading to cold starts.
          * Less focused on ML-specific features compared to Hugging Face.

  * **Colab Example (Conceptual Setup):**
    We'll prepare the files for a simple Flask API that could be deployed on Render.

In [24]:
print("\n--- 7.3.3 Render (for Web Services & APIs) ---")

print("Render is a great PaaS for deploying Flask or FastAPI APIs. It handles the server setup for you.")

# --- Conceptual File Structure for a Render Web Service ---
# Your Render project will typically be a Git repository.
# Here's what the root of that repository would look like for a Flask API:

# 1. `app.py` (Your Flask API code)
flask_api_code_render = """

SyntaxError: incomplete input (ipython-input-282051235.py, line 10)

from flask import Flask, request, jsonify
import numpy as np
import os
import time

app = Flask(**name**)

# Simulate loading a model (should be done once globally)

# In a real app, you'd load your actual TensorFlow/Keras model here.

# Use @lru\_cache for more robust caching if needed.

def load\_model\_for\_render():
time.sleep(2) \# Simulate heavy model loading
class RenderDummyModel:
def predict(self, data):
\# Simple logic: if input is even, predict 'A', else 'B'
if len(data) \> 0 and data[0] % 2 == 0:
return "A"
else:
return "B"
return RenderDummyModel()

# Load the model when the app starts

RENDER\_MODEL = load\_model\_for\_render()
print("Render Flask app: Dummy model loaded.")

@app.route('/')
def home():
return "Render Flask AI API is running\! Use /predict endpoint."

@app.route('/predict', methods=['POST'])
def predict():
if not request.is\_json:
return jsonify({"error": "Request must be JSON"}), 400

```
data = request.get_json()
input_value = data.get('value')

if input_value is None or not isinstance(input_value, (int, float)):
    return jsonify({"error": "Please provide a 'value' (integer or float) in the JSON body."}), 400

try:
    prediction = RENDER_MODEL.predict(input_value)
    return jsonify({"input_value": input_value, "prediction": prediction})
except Exception as e:
    return jsonify({"error": f"Prediction failed: {str(e)}"}), 500
```

if **name** == '**main**':
\# Render will use Gunicorn or a similar WSGI server in production,
\# so this **main** block is mostly for local testing.
app.run(host='0.0.0.0', port=os.environ.get('PORT', 5000))
"""
with open("render\_app.py", "w") as f:
f.write(flask\_api\_code\_render)
print("\\n`render_app.py` (your Flask API) created.")

```
# 2. `requirements.txt` (List of Python dependencies)
requirements_render = """
```

Flask
numpy
gunicorn \# Recommended for production WSGI server

# Add any other libraries your API needs (e.g., tensorflow, scikit-learn)

```
"""
with open("render_requirements.txt", "w") as f:
    f.write(requirements_render)
print("`render_requirements.txt` created.")


# 3. `render.yaml` (Optional, for more complex configurations)
# This file tells Render how to build and deploy your service.
# For a simple Flask app, Render can often auto-detect, but it's good practice.
render_yaml_config = """
```

# render.yaml

# This file tells Render how to build and deploy your service.

services:

  - type: web
    name: my-flask-ai-api
    env: python
    buildCommand: "pip install -r render\_requirements.txt"
    startCommand: "gunicorn render\_app:app" \# Use gunicorn to run your Flask app
    envVars:

      - key: PYTHON\_VERSION
        value: 3.9.12 \# Specify Python version (optional but good practice)
        """
        with open("render.yaml", "w") as f:
        f.write(render\_yaml\_config)
        print("`render.yaml` (Render configuration) created.")

    print("\\nTo deploy to Render:")
    print("1. Create a Render account (https://render.com/).")
    print("2. Create a new 'Web Service'.")
    print("3. Connect your Git repository (e.g., GitHub) containing these files.")
    print("4. Render will detect the `render.yaml` or you can manually configure build/start commands.")
    print("5. Deploy the service. Render will provide a public URL for your API.")

    ```
    
    ```

<!-- end list -->

  * **Exercise:**
    1.  What is the purpose of `gunicorn` in the `startCommand` for a Flask application deployed on Render?
    2.  If your Flask API needs to access a sensitive API key for an external service, how would you typically manage this securely when deploying to Render?

-----

### **7.3.4 Vercel (for Frontend & Serverless Functions)**

  * **Concept:**
    **Vercel** is a cloud platform for frontend developers, best known for hosting static sites and serverless functions. While it's primarily associated with JavaScript frameworks like Next.js and React, it also supports Python for serverless functions, making it possible to deploy lightweight AI APIs.

      * **Key Features:**

          * **Zero Configuration:** Often deploys directly from Git with minimal setup.
          * **Serverless Functions:** Run backend code (including Python) on demand, scaling automatically to zero when idle.
          * **Global Edge Network (CDN):** Fast content delivery worldwide.
          * **Automatic SSL & CI/CD:** Built-in SSL and continuous deployment from Git.
          * **Generous Free Tier:** Excellent for static sites and serverless functions (with limits on execution time, memory, etc.).

      * **Pros:**

          * Incredibly fast deployment and updates.
          * Generous free tier for serverless functions.
          * Excellent developer experience (DX).
          * Ideal for small, stateless AI inference tasks that don't require heavy computation or long cold starts.

      * **Cons:**

          * Python serverless functions can experience "cold starts" (initial delay when function is invoked after inactivity).
          * Not suitable for large AI models that require significant memory or GPU, or long-running processes.
          * Primarily frontend-focused; more complex for full-stack Python applications.
          * Stateless nature means model loading needs careful handling (e.g., loading outside handler or using specific serverless patterns).

  * **Colab Example (Conceptual Setup):**
    We'll prepare the files for a simple Python serverless function that Vercel can deploy.

In [25]:
print("\n--- 7.3.4 Vercel (for Frontend & Serverless Functions) ---")

print("Vercel is great for fast frontends and serverless functions. You can use it to host lightweight Python AI APIs.")

# --- Conceptual File Structure for a Vercel Serverless Function ---
# Vercel looks for an `api` directory at the root of your Git repository.
# Inside `api`, each Python file becomes a serverless function.

# 1. `api/predict.py` (Your Python Serverless Function)
# Vercel expects a `handler` function or a Flask/FastAPI app object.
# For simplicity, we'll use a basic Flask app as Vercel can detect it.
vercel_api_code = """

SyntaxError: incomplete input (ipython-input-450912643.py, line 12)

from flask import Flask, request, jsonify
import numpy as np
import time
import os

app = Flask(**name**)

# Model loading outside the request handler to minimize cold start impact

# This code runs once when the serverless function "wakes up"

def load\_model\_for\_vercel():
time.sleep(1) \# Simulate model loading
class VercelDummyModel:
def predict(self, input\_data):
\# Simple logic: if input is greater than 50, predict 'High', else 'Low'
if input\_data \> 50:
return "High"
else:
return "Low"
return VercelDummyModel()

VERCEL\_MODEL = load\_model\_for\_vercel()
print("Vercel Serverless Function: Dummy model loaded.") \# This will appear in Vercel logs

@app.route('/predict', methods=['POST'])
def predict():
if not request.is\_json:
return jsonify({"error": "Request must be JSON"}), 400

```
data = request.get_json()
input_value = data.get('value')

if input_value is None or not isinstance(input_value, (int, float)):
    return jsonify({"error": "Please provide a 'value' (integer or float) in the JSON body."}), 400

try:
    prediction = VERCEL_MODEL.predict(input_value)
    return jsonify({"input_value": input_value, "prediction": prediction})
except Exception as e:
    return jsonify({"error": f"Prediction failed: {str(e)}"}), 500
```

# Vercel will automatically detect this Flask app object and serve it.

# No need for app.run() in serverless context.

```
"""
# Create the `api` directory
os.makedirs("api", exist_ok=True)
with open("api/predict.py", "w") as f:
    f.write(vercel_api_code)
print("\n`api/predict.py` (your Python Serverless Function) created.")


# 2. `requirements.txt` (List of Python dependencies for serverless functions)
# This should be in the same directory as the function (e.g., `api/requirements.txt`)
vercel_requirements = """
```

Flask
numpy

# Add any other libraries your function needs (e.g., tensorflow, scikit-learn)

```
"""
with open("api/requirements.txt", "w") as f:
    f.write(vercel_requirements)
print("`api/requirements.txt` created.")


# 3. `vercel.json` (Optional, for Vercel project configuration)
# This file goes in the root of your repository.
vercel_config = """
```

{
"version": 2,
"builds": [
{
"src": "api/*.py",
"use": "@vercel/python"
}
],
"routes": [
{
"src": "/api/(.*)",
"dest": "/api/$1"
}
]
}
"""
with open("vercel.json", "w") as f:
f.write(vercel\_config)
print("`vercel.json` (Vercel project configuration) created.")

````
print("\nTo deploy to Vercel:")
print("1. Create a Vercel account (https://vercel.com/).")
print("2. Create a new project and connect your Git repository (e.g., GitHub) containing these files.")
print("3. Vercel will automatically detect the `api` directory and deploy your serverless functions.")
print("4. Access your function at `YOUR_VERCEL_URL/api/predict` (for POST requests).")
```
````

  * **Exercise:**
    1.  In the Vercel example, why is the `VERCEL_MODEL = load_model_for_vercel()` line placed outside the `predict()` function, and what is the benefit of this placement in a serverless environment?
    2.  What is a "cold start" in the context of serverless functions, and why might it be a concern for AI models deployed on Vercel?

-----

### **7.3.5 Choosing the Right Platform**

  * **Concept:**
    The best platform depends on your specific needs:

      * **Hugging Face Spaces:**

          * **Best for:** Rapid prototyping, interactive ML demos, showcasing models, sharing within the ML community.
          * **Strengths:** Simplest setup for Streamlit/Gradio, integrated with Hugging Face Hub, free.
          * **Limitations:** Not for complex web apps, resource constraints for heavy models/traffic.

      * **Render:**

          * **Best for:** Production-ready web services and APIs (Flask, FastAPI), full-stack applications, background jobs.
          * **Strengths:** Easy Git-based deployment, auto-scaling, custom domains, good for general backend services.
          * **Limitations:** Free tier web services spin down, might be overkill for very simple demos.

      * **Vercel:**

          * **Best for:** Static sites, front-end heavy applications with lightweight backend APIs (via serverless functions).
          * **Strengths:** Incredible developer experience, fast global CDN, generous free tier for serverless.
          * **Limitations:** Python serverless functions can have cold starts, not suitable for large, memory-intensive AI models or long-running computations.

  * **Summary Table:**

    | Feature / Aspect       | Hugging Face Spaces                 | Render                                | Vercel                                  |
    | :--------------------- | :---------------------------------- | :------------------------------------ | :-------------------------------------- |
    | **Primary Use** | ML Demos, Prototypes                | Web Services, APIs, Full-stack Apps   | Static Sites, Frontend + Serverless APIs |
    | **Ease of ML Demo** | Very High (Streamlit/Gradio native) | Moderate (requires Flask/FastAPI setup) | Moderate (Python serverless functions)  |
    | **Backend Flexibility**| Limited (focused on UI frameworks)  | High (any Python web framework)       | Moderate (serverless functions only)    |
    | **Scalability** | Good for demos, limited for high traffic | High (auto-scaling)                   | High (serverless, scales to zero)       |
    | **Cost (Free Tier)** | Generous Free Tier                  | Generous Free Tier (web services spin down) | Generous Free Tier (serverless)         |
    | **Ideal Model Size** | Small to Medium                     | Any (depending on paid plan)           | Small (due to serverless limits)        |
    | **Cold Starts** | Possible                            | Yes (on free tier web services)       | Yes (for serverless functions)          |
    | **Target Audience** | Data Scientists, ML Researchers     | Developers, Startups                  | Frontend Developers                     |

  * **Colab Explanation:**

In [26]:
print("\n--- 7.3.5 Choosing the Right Platform ---")

print("The best platform for your AI model deployment depends on your specific needs:")

print("\nIf you want to quickly share an interactive ML demo or research prototype:")
print("-> **Hugging Face Spaces** is likely your best bet.")

print("\nIf you need a robust, scalable API for your AI model to be consumed by other applications or a custom frontend:")
print("-> **Render** is a strong contender.")

print("\nIf you are building a modern web application with a strong frontend and need lightweight backend logic for AI inference:")
print("-> **Vercel** with Python Serverless Functions can be a powerful choice.")

print("\nConsider your model's size, traffic expectations, development time, and your team's expertise when making your decision.")


--- 7.3.5 Choosing the Right Platform ---
The best platform for your AI model deployment depends on your specific needs:

If you want to quickly share an interactive ML demo or research prototype:
-> **Hugging Face Spaces** is likely your best bet.

If you need a robust, scalable API for your AI model to be consumed by other applications or a custom frontend:
-> **Render** is a strong contender.

If you are building a modern web application with a strong frontend and need lightweight backend logic for AI inference:
-> **Vercel** with Python Serverless Functions can be a powerful choice.

Consider your model's size, traffic expectations, development time, and your team's expertise when making your decision.


* **Exercise:**

    1.  You have a large language model (LLM) that requires 16GB of RAM to run inference. You want to host this model as an API. Which of the three platforms (Hugging Face Spaces, Render, Vercel) would be the *least* suitable for this, especially on a free tier, and why?
    2.  Your team is developing a new feature for your e-commerce website where users can upload product images, and an AI model will automatically tag them with categories. The website's frontend is built with Next.js. Which hosting platform would offer the most seamless integration for this scenario, and why?
