# TensorFlow in Production

| Date | User | Change Type | Remarks |  
| ---- | ---- | ----------- | ------- |
| 14/03/2025 | Martin | Created | Created notebook for TensorFlow in Production |

# Content

* [Introduction](#introduction)
* [Visualising Graphs in TensorBoard](#visualising-graphs-in-tensorboard)
* [Tuning Hyperparameters with TensorBoard HParams](#tuning-hyperparameters-with-tensorboard-hparams)


# Introduction

TensorBoard is a tool for visualising summary metrics, graphs and images while a model is training.

This notebook covers how to write TensorFlow code for production, focusing on unit tests, distributing training across multiple processing units, and saving and loading models efficiently.

It also covers how to serve models via REST endpoints.

__Content__

* Visualizing graphs in TensorBoard
* Managing hyperparameter tuning with TensorBoard's HParams
* Implementing unit tests using `tf.test`
* Using multiple executors
* Parallelizing TensorFlow using `tf.distribute.strategy`
* Saving and restoring a TensorFlow model
* Using TensorFlow Serving

---

# Visualising Graphs in TensorBoard

Learn how to use TensorBoard callbacks to monitor numerical values, histograms of sets of values, and images in TensorBoard.

This section reuses the CNN model from Chapter 8.

In [19]:
%load_ext tensorboard
%load_ext watermark
import tensorflow as tf
import numpy as np
import datetime

## Build the MNIST model

In [2]:
# Load data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshape here separates each individual pixel by adding 1 dimension
# This ensures that each pixel has 1 channel: 0-255 greyscale
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Padding the images by 2 pixels since the paper input images were 32x32
x_train = np.pad(x_train, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')
x_test = np.pad(x_test, ((0, 0), (2, 2), (2, 2), (0, 0)), 'constant')

# Normalise all the channel values
x_train = x_train / 255
x_test = x_test / 255

# Set model parameters
image_width = x_train[0].shape[0]
image_height = x_train[0].shape[1]
num_channels = 1 # indicates greyscale

# Hyperparameters
BATCH_SIZE = 100
EVAL_SIZE = 500
EPOCHS = 300
EVAL_EVERY = 5

# Set seed
seed = 98
np.random.seed(seed)
tf.random.set_seed(seed)

input_data = tf.keras.Input(
  dtype=tf.float32,
  shape=(image_width, image_height, num_channels),
  name="INPUT"
)

# First Conv-ReLU-MaxPool Layer
conv1 = tf.keras.layers.Conv2D(
  filters=6,
  kernel_size=5,
  padding='valid',
  activation='relu',
  name='C1'
)(input_data)

max_pool1 = tf.keras.layers.MaxPool2D(
  pool_size=2,
  strides=2,
  padding='same',
  name="S1"
)(conv1)

# Second Conv-ReLU-MaxPool Layer
conv2 = tf.keras.layers.Conv2D(
  filters=16,
  kernel_size=5,
  padding='valid',
  strides=1,
  activation='relu',
  name='C2'
)(max_pool1)

max_pool2 = tf.keras.layers.MaxPool2D(
  pool_size=2,
  strides=2,
  padding='same',
  name="S2"
)(conv2)

# Flatten Layer
flatten = tf.keras.layers.Flatten(name="Flatten")(max_pool2)

# DNN for classification
fully_connected1 = tf.keras.layers.Dense(
  units=120,
  activation='relu',
  name='FC1'
)(flatten)

fully_connected2 = tf.keras.layers.Dense(
  units=84,
  activation='relu',
  name='FC2'
)(fully_connected1)

output = tf.keras.layers.Dense(
  units=10,
  activation='softmax',
  name='OUTPUT'
)(fully_connected2)

model = tf.keras.Model(inputs=input_data, outputs=output)
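The layer sizes above follow from simple shape arithmetic. As a rough sketch (plain Python, no TensorFlow needed), the helper below applies the Keras conventions for `'valid'` padding, `floor((n - k) / s) + 1`, and `'same'` padding, `ceil(n / s)`, to trace the spatial size of a 32x32 input through C1, S1, C2 and S2. The name `conv_out` is a hypothetical helper introduced only for this illustration.

```python
import math

def conv_out(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a conv/pool layer along one axis (Keras conventions)."""
    if padding == "valid":
        return (size - kernel) // stride + 1
    if padding == "same":
        return math.ceil(size / stride)
    raise ValueError(f"unknown padding: {padding}")

size = 32                                                  # padded MNIST image side
size = conv_out(size, kernel=5)                            # C1 -> 28
size = conv_out(size, kernel=2, stride=2, padding="same")  # S1 -> 14
size = conv_out(size, kernel=5)                            # C2 -> 10
size = conv_out(size, kernel=2, stride=2, padding="same")  # S2 -> 5
flatten_units = size * size * 16                           # 16 filters in C2
print(size, flatten_units)                                 # 5 400
```

The 400 units printed here should match the output shape of the `Flatten` layer reported by `model.summary()`.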



In [3]:
model.compile(
  optimizer='adam',
  loss='sparse_categorical_crossentropy',
  metrics=['accuracy']
)

In [4]:
model.summary()

## Add TensorBoard callbacks

Create a log directory that holds one timestamped subdirectory per run.

Then instantiate the TensorBoard callback and pass it to the model training. All logs for the training phase are stored under the `logs` directory.
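The timestamped layout described above can be sketched with the standard library alone. `make_run_dir` is a hypothetical helper, and the `-` separator between the prefix and the timestamp is an assumption made here for readability; the notebook itself concatenates the two directly.

```python
import datetime
from pathlib import Path

def make_run_dir(base="logs", prefix="experiment", now=None):
    """Return a per-run log directory path such as logs/experiment-20250314-164401."""
    now = now or datetime.datetime.now()
    return Path(base) / f"{prefix}-{now.strftime('%Y%m%d-%H%M%S')}"

# A fixed timestamp is used here only to make the printed result reproducible.
run_dir = make_run_dir(now=datetime.datetime(2025, 3, 14, 16, 44, 1))
print(run_dir.as_posix())  # logs/experiment-20250314-164401
```

Because every run gets its own subdirectory under the same base, pointing TensorBoard at `logs` shows all runs side by side.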

In [26]:
log_dir = "logs/experiment" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

In [None]:
tensorboard_callback = tf.keras.callbacks.TensorBoard(
  log_dir=log_dir,    # directory where experiments will be stored
  write_images=True,  # write model weights to visualise as images in TensorBoard
  histogram_freq=1    # frequency (in epochs) at which to compute weight histograms for the layers
  # update_freq='batch' # changes how often TensorBoard tracks the metrics
)

model.fit(
  x=x_train,
  y=y_train,
  epochs=5,
  validation_data=(x_test, y_test),
  callbacks=[tensorboard_callback],
  verbose=1
)

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.9970 - loss: 0.0091 - val_accuracy: 0.9886 - val_loss: 0.0629
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.9976 - loss: 0.0083 - val_accuracy: 0.9903 - val_loss: 0.0523
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9975 - loss: 0.0078 - val_accuracy: 0.9893 - val_loss: 0.0530
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.9974 - loss: 0.0073 - val_accuracy: 0.9881 - val_loss: 0.0571
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.9983 - loss: 0.0062 - val_accuracy: 0.9886 - val_loss: 0.0755


<keras.src.callbacks.history.History at 0x7fe8547895a0>
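The `1875` in the progress bar above is the number of batches per epoch. `model.fit` is called without a `batch_size` argument, so Keras falls back to its default of 32, and the `BATCH_SIZE = 100` defined earlier is not used. A small sketch of the arithmetic, assuming the standard 60,000-image MNIST training split (`steps_per_epoch` is a hypothetical helper name):

```python
import math

def steps_per_epoch(num_samples, batch_size=32):
    """Number of batches per epoch; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(60_000))       # 1875, matches the progress bar
print(steps_per_epoch(60_000, 100))  # 600
```

Passing `batch_size=BATCH_SIZE` to `model.fit` would be one way to make the hyperparameter take effect.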

## Start TensorBoard

There are two ways to start TensorBoard:

1. Run `tensorboard --logdir="logs" --port 6007` from a terminal, then navigate to the localhost link it prints
2. Run `%tensorboard --logdir="logs"` in the notebook
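When the terminal command has to be issued from a script instead, one option is to build the argument list in Python. This is only a sketch: `tensorboard_command` is a hypothetical helper, and it uses just the `--logdir` and `--port` flags shown above.

```python
import shlex

def tensorboard_command(logdir="logs", port=6006):
    """Build the argument list for the `tensorboard` CLI."""
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]

cmd = tensorboard_command(port=6007)
print(shlex.join(cmd))  # tensorboard --logdir logs --port 6007
# To launch it from Python: subprocess.Popen(cmd), then open http://localhost:6007
```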

In [32]:
%tensorboard --logdir="logs"

Reusing TensorBoard on port 6006 (pid 80606), started 0:03:47 ago. (Use '!kill 80606' to kill it.)

__Sections__

* __Time Series__ - A summary of all the metrics that were tracked
* __Scalars__ - Shows how the loss and metrics change with each epoch. Also holds other scalar values, such as hyperparameter details
* __Graphs__ - Visualises the model (like `model.summary()`). Use it to check that the graph is built correctly
* __Histograms & Distributions__ - Show the distribution of a tensor over time. Visualise weights and biases to verify their training process

It is also possible to log custom types of data (e.g. images, embeddings).

## Visualising the TensorFlow model

The Graphs views allow quick examination of the intended design and of how TensorFlow understands the model structure.

Two views are available: the op-level graph, and the conceptual graph, which displays only the Keras model without extra edges to other computation nodes.

In [30]:
from tensorboard import notebook
notebook.list() # View open TensorBoard instances

Known TensorBoard instances:
  - port 6006: logdir logs (started 0:00:08 ago; pid 80606)


In [None]:
# Control TensorBoard display. If no port is provided, 
# the most recently launched TensorBoard is used
notebook.display(port=6006, height=1000)

In [None]:
# To kill a session, replace <pid> with the pid reported by notebook.list()
!kill <pid>

## Using file writer

Create a file writer for the timestamped log directory and write the first 10 training images.

`tf.summary` can be used to write summary data to be visualised in TensorBoard.

⚠️ __ALERT:__ Be careful about writing image summaries to TensorBoard too often. They tend to eat up disk space very quickly.

In [31]:
file_writer = tf.summary.create_file_writer(log_dir)
with file_writer.as_default():
  # Reshape the images and write the image summary
  images = np.reshape(x_train[0:10], (-1, 32, 32, 1))
  tf.summary.image("10 training data examples", images, max_outputs=10, step=0)
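One way to heed the warning about disk space is to write image summaries only every N steps. The sketch below shows just the gating logic in plain Python; `should_log_images` and the interval of 100 are assumptions made for illustration, and the actual write would be a `tf.summary.image(...)` call inside the `if`.

```python
def should_log_images(step, every=100):
    """Return True on the steps where an image summary should be written."""
    return step % every == 0

logged_steps = []
for step in range(1000):
    if should_log_images(step):
        # tf.summary.image("examples", images, step=step) would go here
        logged_steps.append(step)

print(len(logged_steps))  # 10 writes instead of 1000
```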

---

# Tuning Hyperparameters with TensorBoard HParams

HParams is a TensorBoard plugin for testing hyperparameter combinations and finding the best one.

Use a sequential model on MNIST and configure HParams to compare several hyperparameter combinations.

In [1]:
import tensorflow as tf
import numpy as np
import datetime

from tensorboard.plugins.hparams import api as hp


In [2]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize
x_train = x_train / 255
x_test = x_test / 255

## Define hyperparameters to test

In [4]:
HP_ARCHITECTURE_NN = hp.HParam('archi_nn', hp.Discrete(['128,64', '256,128']))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.0, 0.1))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))
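Once the dropout interval is reduced to its two endpoints, the three domains above span a small grid of 2 × 2 × 2 = 8 combinations. A minimal sketch with plain-Python stand-ins for the `hp` domains follows. The `run-{index}` naming is a hypothetical scheme, shown because an index keeps run names unique even when two runs start within the same second.

```python
import itertools

# Plain-Python stand-ins for the HParam domains defined above.
architectures = ["128,64", "256,128"]
optimizers = ["adam", "sgd"]
dropouts = [0.0, 0.1]  # the two endpoints of the RealInterval

runs = []
for index, (archi_nn, optimizer, dropout) in enumerate(
    itertools.product(architectures, optimizers, dropouts)
):
    units = [int(n) for n in archi_nn.split(",")]  # "128,64" -> [128, 64]
    runs.append({
        "run_name": f"run-{index}",  # hypothetical naming scheme
        "units": units,
        "optimizer": optimizer,
        "dropout": dropout,
    })

print(len(runs))  # 8
print(runs[0])    # {'run_name': 'run-0', 'units': [128, 64], 'optimizer': 'adam', 'dropout': 0.0}
```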

## Build the model

In [17]:
def train_model(hparams, experiment_run_log_dir):
  nb_units = list(map(int, hparams[HP_ARCHITECTURE_NN].split(',')))

  # Define the model
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Flatten(name="FLATTEN"))
  model.add(tf.keras.layers.Dense(units=nb_units[0], activation='relu', name='D1'))
  model.add(tf.keras.layers.Dropout(hparams[HP_DROPOUT], name='DROPOUT'))
  model.add(tf.keras.layers.Dense(units=nb_units[1], activation='relu', name='D2'))
  model.add(tf.keras.layers.Dense(units=10, activation='softmax', name='OUTPUT'))

  model.compile(
    optimizer=hparams[HP_OPTIMIZER],
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
  )

  # Define callbacks both for tensorboard and hparams
  tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=experiment_run_log_dir)
  hparams_callback = hp.KerasCallback(experiment_run_log_dir, hparams)

  # Train the model
  model.fit(
    x=x_train,
    y=y_train,
    epochs=5,
    validation_data=(x_test, y_test),
    callbacks=[tensorboard_callback, hparams_callback]
  )

In [18]:
for archi_nn in HP_ARCHITECTURE_NN.domain.values:
  for optimizer in HP_OPTIMIZER.domain.values:
    for dropout in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
      hparams = {
        HP_ARCHITECTURE_NN: archi_nn,
        HP_OPTIMIZER: optimizer,
        HP_DROPOUT: dropout
      }
      
      experiment_run_log_dir = "hparam_logs/experiment" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

      train_model(
        hparams,
        experiment_run_log_dir
      )

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 8s 3ms/step - accuracy: 0.8776 - loss: 0.4164 - val_accuracy: 0.9612 - val_loss: 0.1245
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9673 - loss: 0.1044 - val_accuracy: 0.9722 - val_loss: 0.0895
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 1ms/step - accuracy: 0.9790 - loss: 0.0665 - val_accuracy: 0.9753 - val_loss: 0.0792
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9840 - loss: 0.0533 - val_accuracy: 0.9772 - val_loss: 0.0778
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9883 - loss: 0.0376 - val_accuracy: 0.9762 - val_loss: 0.0790
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.8613 - loss: 0.4671 - val_accuracy: 0.9617 - val_loss: 0.1199
Epoch 2/5
1875/1875 ...

1875/1875 ━━━━━━━━━━━━━━━━━━━━ 6s 2ms/step - accuracy: 0.7443 - loss: 1.0026 - val_accuracy: 0.9106 - val_loss: 0.3093
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9123 - loss: 0.3060 - val_accuracy: 0.9296 - val_loss: 0.2499
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9298 - loss: 0.2446 - val_accuracy: 0.9400 - val_loss: 0.2089
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - accuracy: 0.9410 - loss: 0.2089 - val_accuracy: 0.9484 - val_loss: 0.1799
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - accuracy: 0.9476 - loss: 0.1815 - val_accuracy: 0.9541 - val_loss: 0.1630
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.6543 - loss: 1.1748 - val_accuracy: 0.9044 - val_loss: 0.3305
Epoch 2/5
1875/1875 ...

1875/1875 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - accuracy: 0.8920 - loss: 0.3663 - val_accuracy: 0.9694 - val_loss: 0.1000
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9723 - loss: 0.0869 - val_accuracy: 0.9732 - val_loss: 0.0845
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9826 - loss: 0.0537 - val_accuracy: 0.9756 - val_loss: 0.0720
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9880 - loss: 0.0383 - val_accuracy: 0.9757 - val_loss: 0.0848
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9904 - loss: 0.0293 - val_accuracy: 0.9777 - val_loss: 0.0817
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - accuracy: 0.8892 - loss: 0.3747 - val_accuracy: 0.9680 - val_loss: 0.1040
Epoch 2/5
1875/1875 ...

1875/1875 ━━━━━━━━━━━━━━━━━━━━ 6s 2ms/step - accuracy: 0.7443 - loss: 1.0023 - val_accuracy: 0.9148 - val_loss: 0.3052
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 2ms/step - accuracy: 0.9178 - loss: 0.2924 - val_accuracy: 0.9324 - val_loss: 0.2413
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.9323 - loss: 0.2385 - val_accuracy: 0.9417 - val_loss: 0.2081
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 5s 2ms/step - accuracy: 0.9424 - loss: 0.2050 - val_accuracy: 0.9482 - val_loss: 0.1803
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9503 - loss: 0.1755 - val_accuracy: 0.9530 - val_loss: 0.1643
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - accuracy: 0.7260 - loss: 1.0155 - val_accuracy: 0.9179 - val_loss: 0.3010
Epoch 2/5
1875/1875 ...

In [23]:
%tensorboard --logdir="hparam_logs"

Reusing TensorBoard on port 6007 (pid 18098), started 0:00:08 ago. (Use '!kill 18098' to kill it.)

## HParams section

Visualise the results of each run in the table view. The panel on the left shows the different filters for the various metrics.

__Parallel Coordinates__ - A more detailed view of each hyperparameter choice made across the runs. Selecting a run provides more details about the run itself.

In [3]:
%watermark

Last updated: 2025-03-13T18:14:08.783191+08:00

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 8.33.0

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 5.15.167.4-microsoft-standard-WSL2
Machine     : x86_64
Processor   : x86_64
CPU cores   : 20
Architecture: 64bit

