# TensorBoard Profiler

Source: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras

In this notebook, we'll see how we can use TensorBoard to profile a training (or inference) run and optimize it for performance.

Let's start by loading the TensorBoard extension, clearing the log directory, and importing the required modules.

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

# Clear any logs from previous runs
!rm -rf ./tb_log/ 

import tensorflow as tf
from tensorflow import keras

##### Download the Dataset

Download the MNIST dataset. This time we'll use TensorFlow Datasets (`tensorflow_datasets`) rather than the Keras dataset loader, because it returns `tf.data` pipelines, which give us more to look at in the TensorBoard profiler.

In [None]:
!pip install tensorflow_datasets

In [None]:
# Equivalent loading code using the Keras dataset loader
# mnist = keras.datasets.mnist
# (x_train, y_train),(x_test, y_test) = mnist.load_data()
# x_train, x_test = x_train / 255.0, x_test / 255.0

import tensorflow_datasets as tfds

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label


ds_train = ds_train.map(normalize_img)
ds_train = ds_train.batch(128)
ds_test = ds_test.map(normalize_img)
ds_test = ds_test.batch(128)

##### Build the Model

Create a simple fully connected network with two dense layers: a hidden layer of 128 ReLU units and a 10-class softmax output.

In [None]:
model = keras.models.Sequential([
  keras.layers.Flatten(input_shape=(28, 28, 1)),
  keras.layers.Dense(128, activation='relu'),
  keras.layers.Dense(10, activation='softmax')
])

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy']
)
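
As a quick sanity check on the model size, the parameter count can be worked out by hand. The sketch below assumes the standard dense-layer formula (inputs × units weights plus one bias per unit). The helper `dense_params` is a hypothetical name introduced for illustration, not part of Keras.

```python
# Parameter count for the Flatten -> Dense(128) -> Dense(10) model above.
# A dense layer with n_in inputs and n_out units has n_in * n_out weights
# plus n_out biases.

def dense_params(n_in: int, n_out: int) -> int:
    """Number of trainable parameters in a fully connected layer."""
    return n_in * n_out + n_out

flat_inputs = 28 * 28 * 1                 # Flatten adds no parameters of its own
hidden = dense_params(flat_inputs, 128)   # hidden layer
output = dense_params(128, 10)            # softmax output layer
total_params = hidden + output

print(hidden, output, total_params)       # 100480 1290 101770
```

The total should match the "Total params" figure reported by `model.summary()`.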


##### Train the Model

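Create a TensorBoard callback and set its `profile_batch` option to `'500,520'` so that batches 500 through 520 are profiled. Starting the window well after the first batches keeps one-time initialization overhead out of the profile.

Then train the model.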

In [None]:
logs = "./tb_log"

tb_callback = tf.keras.callbacks.TensorBoard(
    log_dir=logs,
    histogram_freq=1,
    profile_batch='500,520',
)

# Use the test data for validation, just for simplicity
model.fit(ds_train, epochs=5, validation_data=ds_test, callbacks=[tb_callback])
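
To see where the profiled window lands, it helps to work out the number of batches per epoch. The sketch below assumes the 60,000-image MNIST training split and that the callback counts batches cumulatively across epochs, starting from 1. This is an assumption about the callback's behavior and is worth checking against your TensorFlow version. The helpers `steps_per_epoch` and `locate_batch` are hypothetical names used only for this illustration.

```python
import math

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Batches needed to cover the dataset once (the last one may be partial)."""
    return math.ceil(num_examples / batch_size)

def locate_batch(global_batch: int, steps: int) -> tuple:
    """Map a 1-based cumulative batch number to (epoch, batch in epoch), both 1-based."""
    epoch = (global_batch - 1) // steps + 1
    local = (global_batch - 1) % steps + 1
    return epoch, local

steps = steps_per_epoch(60_000, 128)   # MNIST training split, batch size 128
start = locate_batch(500, steps)       # first profiled batch
stop = locate_batch(520, steps)        # last profiled batch

print(steps, start, stop)              # 469 (2, 31) (2, 51)
```

Under these assumptions the profile covers batches 31 to 51 of the second epoch, so training needs at least two epochs for the window to be reached.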


##### Examine Profiling Results

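Open TensorBoard (in the notebook or from the command line) and select `PROFILE` from the dropdown menu to examine the results. If the entry is missing, the profiler plugin (`tensorboard_plugin_profile`) may need to be installed.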

In [None]:
%tensorboard --logdir="./tb_log"

##### Optimize for Performance

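Optimize the input pipeline to speed up processing. In particular, cache and prefetch the data to avoid computation stalls (see the dataset API lecture): caching keeps the normalized batches in memory after the first pass, and prefetching prepares upcoming batches while the current one is being processed.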

In [None]:
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

ds_train = ds_train.map(normalize_img)
ds_train = ds_train.batch(128)
ds_train = ds_train.cache()  # keep normalized batches in memory after the first epoch
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)  # prepare upcoming batches during training

ds_test = ds_test.map(normalize_img)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)
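
To build intuition for what `cache()` and `prefetch()` buy us, here is a minimal pure-Python sketch of the two ideas. It is not how `tf.data` implements them. `CachedSource`, `prefetch`, and `normalized_batches` are hypothetical names introduced for illustration, and the sketch leaves out error handling for brevity.

```python
import queue
import threading

class CachedSource:
    """Replays elements from memory after the first complete pass."""

    def __init__(self, make_iterator):
        self._make_iterator = make_iterator
        self._cache = None

    def __iter__(self):
        if self._cache is not None:
            yield from self._cache           # later epochs: no recomputation
            return
        items = []
        for item in self._make_iterator():   # first epoch: compute and remember
            items.append(item)
            yield item
        self._cache = items                  # only set after a full pass

def prefetch(iterable, buffer_size=1):
    """Yield items in order while a background thread produces them ahead of time."""
    q = queue.Queue(maxsize=buffer_size)
    done = object()                          # sentinel marking the end of the data

    def producer():
        for item in iterable:
            q.put(item)                      # blocks while the buffer is full
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item

calls = {"count": 0}                         # how often a batch had to be computed

def normalized_batches():
    """Two toy batches of four 'pixels', scaled to [0, 1]."""
    for first in range(0, 8, 4):
        calls["count"] += 1
        yield [pixel / 255.0 for pixel in range(first, first + 4)]

source = CachedSource(normalized_batches)
epoch1 = list(prefetch(source))
epoch2 = list(prefetch(source))

print(calls["count"], epoch1 == epoch2)      # 2 True
```

The second epoch yields the same batches without recomputing them, and in both epochs the consumer only waits on the queue while the producer works ahead. In the profiler, this is the kind of overlap that should shrink the time spent waiting for input.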


##### Train the Model (v2)

Train the model again, this time on the optimized input pipeline, reusing the same model and callback.

In [None]:
model.fit(ds_train, epochs=5, validation_data=ds_test, callbacks=[tb_callback])

Check TensorBoard again and compare the two runs!