# 🚀 Chapter 19: Training & Deploying TensorFlow Models at Scale

This notebook provides hands-on, practical examples for serving, deploying, and scaling TensorFlow models. Follow along and adapt the code for your projects!

## I. Serving a TensorFlow Model

### A. Using **TensorFlow Serving**

First, save your trained model in the SavedModel format:

In [None]:
# Save the model in SavedModel format
import tensorflow as tf

# Example: create a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')

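# Save the model as a SavedModel directory.
# Note: with Keras 3 (TensorFlow 2.16+), use model.export("my_model") instead.
model.save("my_model")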

Next, run TensorFlow Serving using Docker:

```bash
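# Launch the TensorFlow Serving container (REST API on port 8501).
# TF Serving expects a numeric version subdirectory, so mount the SavedModel as version 1.
docker run -p 8501:8501 \
  --mount type=bind,source="$(pwd)/my_model",target=/models/my_model/1 \
  -e MODEL_NAME=my_model -t tensorflow/serving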
```
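TensorFlow Serving looks for numbered version subdirectories (for example `1/`, `2/`) under a model's base path and, by default, serves the highest number. As a rough sketch of how you might pick the next version directory before exporting a new model, the helper below uses only the standard library. The name `next_version_dir` is hypothetical and not part of TensorFlow.

```python
import os
import tempfile

def next_version_dir(base_dir):
    """Return base_dir/<N+1>, where N is the highest numeric subdirectory (0 if none)."""
    versions = []
    if os.path.isdir(base_dir):
        for name in os.listdir(base_dir):
            if name.isdigit() and os.path.isdir(os.path.join(base_dir, name)):
                versions.append(int(name))
    return os.path.join(base_dir, str(max(versions, default=0) + 1))

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "my_model")
    first = next_version_dir(base)    # no versions yet, so this ends in "1"
    os.makedirs(first)
    second = next_version_dir(base)   # version 1 exists, so this ends in "2"
    print(os.path.basename(first), os.path.basename(second))
```

Exporting each retrained model into the directory returned by such a helper keeps older versions on disk, which makes rolling back a matter of removing the newest directory.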

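Once the server is running, send a prediction request with `curl`. Replace the `...` placeholder with the remaining values so that each instance has 20 features, matching the model's `input_shape=(20,)`: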

In [None]:
curl -d '{"instances": [[1.0, 2.0, 3.0, ..., 20.0]]}' \
     -X POST http://localhost:8501/v1/models/my_model:predict
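The same request can be sent from Python. The sketch below uses only the standard library and assumes the server from the previous step is listening on `localhost:8501`. The helper names `build_predict_request` and `predict` are illustrative, not part of any TensorFlow API. TensorFlow Serving's REST API replies with a JSON object whose `predictions` key holds one output per instance.

```python
import json
import urllib.request

# Assumed endpoint: matches the Docker command above (port 8501, model "my_model")
SERVER_URL = "http://localhost:8501/v1/models/my_model:predict"

def build_predict_request(instances, url=SERVER_URL):
    """Build a POST request in TF Serving's row ("instances") format."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def predict(instances, url=SERVER_URL, timeout=10):
    """Send the request and return the list stored under the "predictions" key."""
    request = build_predict_request(instances, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]

# One instance with 20 features, matching input_shape=(20,)
example_instance = [float(i) for i in range(1, 21)]
request = build_predict_request([example_instance])
print(request.get_method(), request.full_url)
# predict([example_instance])  # uncomment once the server is running
```

Building the request separately from sending it makes the payload easy to inspect before the server is up.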

### B. Using **GCP AI Platform Prediction** (Optional)

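You can also deploy models to Google Cloud AI Platform Prediction for managed, scalable serving. After uploading the SavedModel directory to a Cloud Storage bucket, create a model resource and a version that points at it: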

```bash
# Create model on GCP
gcloud ai-platform models create my_model

# Create a version with your model files in Google Cloud Storage
gcloud ai-platform versions create v1 \
  --model=my_model \
  --origin=gs://my_bucket/my_model/ \
  --runtime-version=2.8 \
  --python-version=3.7
```

Then send prediction requests through the REST API or with `gcloud ai-platform predict`.

## II. Deploying to Mobile & Embedded Devices

Use **TensorFlow Lite** to run models on mobile or embedded hardware.

In [None]:
# Convert SavedModel to TFLite
import tensorflow as tf

# Load your saved model
saved_model_dir = "my_model"
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
tflite_model = converter.convert()

# Save the TFLite model
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
print("TFLite model saved as 'model.tflite'")

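You can now load the TFLite model and run inference in your mobile or embedded app using the TFLite Interpreter. In Python, the same API is available as `tf.lite.Interpreter`, which is handy for testing the converted model before shipping it.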

## III. Using GPUs to Speed Up Computations

### A. Ensuring GPU Availability

Check if TensorFlow detects GPUs:

In [None]:
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print("GPU Available: True")
else:
    print("No GPU detected")

GPU Available: True

### B. Managing GPU Memory Growth

Prevent TensorFlow from allocating all GPU memory upfront:

In [None]:
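gpus = tf.config.list_physical_devices('GPU')
# Must be called before any GPU is initialized, and set identically on every GPU
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    print(f"Memory growth set for {len(gpus)} GPU(s)")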

### C. Placing Ops on GPU

Place operations on a specific GPU with `tf.device`, then confirm where they ran by inspecting the result's `device` attribute:

In [None]:
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    with tf.device('/GPU:0'):
        a = tf.random.uniform((1000, 1000))
        b = tf.matmul(a, a)
        print(f"Running on device: {b.device}")

Running on device: /job:localhost/replica:0/task:0/device:GPU:0


### D. Automatic Device Placement & Parallelism

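By default, TensorFlow places each operation on the first GPU (`/GPU:0`) when a GPU kernel exists for it and falls back to the CPU otherwise; additional GPUs stay idle unless you place operations on them explicitly. To use several GPUs at once, create the model under a distribution strategy such as `tf.distribute.MirroredStrategy`, which replicates the model on each GPU and splits every batch among the replicas.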

## IV. Training Models Across Multiple Devices

### A. Model vs Data Parallelism

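- **Model Parallelism**: split a single model across devices, for example when it is too large to fit in one device's memory.
- **Data Parallelism**: replicate the whole model on every device, give each replica a different slice of each batch, and combine the resulting gradients so that all replicas stay in sync.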
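To make the data-parallel idea concrete, here is a toy sketch in plain Python, with no TensorFlow required. It assumes a one-weight linear model with a mean-squared-error loss and two equally sized shards; the helper `mse_gradient` is made up for illustration. It shows that averaging the per-replica gradients gives the same update as computing the gradient on the full batch, which is the principle behind synchronous strategies.

```python
def mse_gradient(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# Toy batch of 8 examples generated by y = 3x
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]
w = 0.5            # every replica starts from the same weight
num_replicas = 2   # stand-in for two GPUs

# Split the batch into equal shards, one per replica
shard_size = len(xs) // num_replicas
shards = [
    (xs[i * shard_size:(i + 1) * shard_size], ys[i * shard_size:(i + 1) * shard_size])
    for i in range(num_replicas)
]

# Each replica computes a gradient on its own shard
replica_grads = [mse_gradient(w, sx, sy) for sx, sy in shards]

# Averaging the replica gradients (the all-reduce step) recovers the full-batch gradient
averaged_grad = sum(replica_grads) / num_replicas
full_grad = mse_gradient(w, xs, ys)
print(abs(averaged_grad - full_grad) < 1e-9)
```

The equality holds exactly only when the shards have equal size, which is one reason to choose a global batch size divisible by the number of replicas.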

### B. Using `tf.distribute.Strategy` for Data Parallelism

Here's an example using `MirroredStrategy`:

In [None]:
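import numpy as np
import tensorflow as tf

# Replicates the model on every visible GPU (a single replica is used if there are none)
strategy = tf.distribute.MirroredStrategy()
print(f"Number of replicas: {strategy.num_replicas_in_sync}")

def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
        tf.keras.layers.Dense(1)
    ])

# Variables (model and optimizer) must be created inside the strategy scope
with strategy.scope():
    model = build_model()
    model.compile(optimizer='adam', loss='mse')

# Prepare dummy data
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.rand(1000, 1).astype("float32")

# batch_size is the global batch; each replica receives an equal share of it
global_batch_size = 32 * strategy.num_replicas_in_sync
model.fit(x_train, y_train, epochs=5, batch_size=global_batch_size)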

### C. Distributed Training on Cloud (GCP AI Platform)

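For large-scale training, run the same training script on several machines and submit it as a job to GCP AI Platform. Inside the script, create and compile the model under `tf.distribute.MultiWorkerMirroredStrategy()`. Each worker learns the cluster layout and its own role from the `TF_CONFIG` environment variable.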
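Multi-worker strategies read the cluster description from the `TF_CONFIG` environment variable, a JSON string with a `cluster` entry (the addresses of all workers) and a `task` entry (the role of the current process). Managed training services generally set this variable for you; the sketch below shows roughly what it contains when you set it by hand. The host names, port, and the helper `make_tf_config` are hypothetical.

```python
import json
import os

# Hypothetical worker addresses; replace with the real hosts in your cluster
workers = ["worker0.example.com:12345", "worker1.example.com:12345"]

def make_tf_config(worker_hosts, task_index):
    """Return the TF_CONFIG JSON string for one worker in the cluster."""
    return json.dumps({
        "cluster": {"worker": worker_hosts},
        "task": {"type": "worker", "index": task_index},
    })

# Every worker gets the same cluster spec but its own task index
configs = [make_tf_config(workers, i) for i in range(len(workers))]

# On worker 0, set the variable before creating the strategy
os.environ["TF_CONFIG"] = configs[0]
print(os.environ["TF_CONFIG"])
```

Because the cluster spec is identical on every machine, only the task index differs between workers, so one script can serve the whole cluster.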

## V. 🛠️ Exercises

1. Containerize and deploy a TensorFlow model using Docker + TF Serving.
2. Convert a CNN model to TFLite and run inference in Python.
3. Write a TensorFlow script that places ops on a GPU and verifies where they ran.
4. Adapt an existing `model.fit()` script to use `tf.distribute.MirroredStrategy()`.
5. Submit and monitor a distributed training job on GCP AI Platform, visualizing logs in TensorBoard.

## VI. 🙏 Thank You!

This notebook covered the core tools for serving and scaling deep learning models, from edge devices to cloud deployments.