# Hands-on: Profiling TensorFlow with TensorBoard


![tb](https://github.com/tensorflow/tensorboard/blob/master/docs/images/quickstart_model_fit.png?raw=1)


1. Download the notebook `PRACE_hands_on_GPU_profiling.ipynb` to your local system.

2. This hands-on uses Google Colaboratory ("Colab"), a hosted Jupyter notebook environment that offers free GPU access. Navigate to the [colab link](https://colab.research.google.com/notebooks/intro.ipynb#recent=true).

3. Navigate to `Upload` in the upper-left corner.

4. Upload `PRACE_hands_on_GPU_profiling.ipynb` by browsing to it.

First, run the following commands to install the packages needed for TensorBoard:

In [None]:
!pip install tensorflow-datasets
!pip install ipywidgets 
!jupyter nbextension enable --py widgetsnbextension
!pip install -U tensorboard-plugin-profile 

In [None]:
from datetime import datetime
import os
import tensorflow as tf
import tensorflow_datasets as tfds

print("TensorFlow version: ", tf.__version__)
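
Since the rest of this hands-on profiles GPU kernels, it can be worth confirming that the runtime actually exposes a GPU (in Colab, the hardware accelerator is chosen under `Runtime` > `Change runtime type`). A minimal sketch, assuming TensorFlow 2.x:

```python
import tensorflow as tf

# List the GPUs that TensorFlow can see; an empty list means CPU-only.
gpus = tf.config.list_physical_devices('GPU')
print("Number of GPUs available:", len(gpus))
if not gpus:
    print("No GPU found: the profile will not contain any GPU kernels.")
```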


Next, we download the Fashion-MNIST dataset (via `tf.keras.datasets`) and train a small dummy neural network, which we will then profile and inspect using TensorBoard.

In [None]:
# Download the data. The data is already divided into train and test.
# The labels are integers representing classes.
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = \
    fashion_mnist.load_data()

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label


# First, let's construct the input pipeline through the tf.data API (https://www.tensorflow.org/api_docs/python/tf/data)
ds_train = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
ds_train = ds_train.map(normalize_img)
ds_train = ds_train.batch(512)

ds_test = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
ds_test = ds_test.map(normalize_img)
ds_test = ds_test.batch(512)

# The mixed-precision policy is left disabled here: a later exercise asks you
# to add it above the model definition and compare against this float32 baseline.
# from tensorflow.keras import mixed_precision
# mixed_precision.set_global_policy('mixed_float16')

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),  # load_data() yields 28x28 images without a channel axis
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(0.001),
    metrics=['accuracy']
)
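
To get a feel for how small this dummy network is, its parameter count can be worked out by hand and compared against `model.summary()`. The sketch below is plain Python; the layer sizes are taken from the model above, and the 4 bytes per parameter assume `float32` weights.

```python
# Layer sizes taken from the Sequential model above.
n_inputs = 28 * 28   # Flatten turns each 28x28 image into 784 values
n_hidden = 128       # first Dense layer
n_classes = 10       # output Dense layer

# A Dense layer has (inputs * units) weights plus one bias per unit.
hidden_params = n_inputs * n_hidden + n_hidden
output_params = n_hidden * n_classes + n_classes
total_params = hidden_params + output_params

# Assumption: weights stored as float32, i.e. 4 bytes each.
weight_bytes = total_params * 4

print("Trainable parameters:", total_params)
print("Approximate weight memory: %.2f MiB" % (weight_bytes / 2**20))
```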

- Now we are going to train the model for 5 epochs, using the TensorBoard callback to gather information on model training and performance, which is stored in the `logs/` folder.

In [None]:
logs = "logs/" + datetime.now().strftime("%Y%m%d-%H%M%S")
tboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=logs, histogram_freq=1, write_graph=True, write_images=True,
    update_freq='epoch', profile_batch='2,3')


%time model.fit(ds_train, epochs=5, validation_data=ds_test, callbacks=[tboard_callback])
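
The `profile_batch='2,3'` argument restricts profiling to a short range of training batches. To put that range in context, the number of batches per epoch follows from the dataset and batch sizes. A small sketch in plain Python, assuming the standard Fashion-MNIST split of 60,000 training and 10,000 test images and treating the range as inclusive:

```python
import math

batch_size = 512
n_train, n_test = 60_000, 10_000   # standard Fashion-MNIST split (assumption)

# The last batch may be smaller than batch_size, so round up.
train_batches = math.ceil(n_train / batch_size)
test_batches = math.ceil(n_test / batch_size)

# profile_batch='2,3' is read here as a start,stop range of batch indices.
profile_start, profile_stop = (int(b) for b in '2,3'.split(','))
profiled = profile_stop - profile_start + 1  # inclusive range (assumption)

print(f"{train_batches} training batches per epoch, {test_batches} test batches")
print(f"{profiled} training batches are profiled")
```

Only a small slice of the run ends up in the profile, which keeps the trace files small.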


- Now start a TensorBoard instance by running the commands below; this might take a while. (If you start it for the **second** time, you may have to `!kill <pid>` a running process first.)


In [None]:
#!kill 2037  # example PID; replace with the PID of the running TensorBoard process
%reload_ext tensorboard
%tensorboard --logdir=logs
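
If TensorBoard has already been started in this session, the PID needed for `!kill <pid>` can be looked up from Python. A minimal sketch, assuming the `tensorboard` package that ships with TensorFlow is installed:

```python
from tensorboard import notebook

# Prints one line per known running TensorBoard instance (including its
# port and PID), or a message saying that none are known.
notebook.list()
```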

- What are you seeing? Do you think we trained for enough epochs?


In [None]:
"""
TYPE YOUR ANSWER HERE
"""

### To see the profile of the model, select "Profile" from the dropdown menu in the upper-right corner, where it says "inactive"
- What is taking the longest time to compute? What percentage of compute time did it take? What was the duration of the GPU kernel of that operation?

In [None]:
"""
TYPE YOUR ANSWER HERE
"""

- Now take a look at the memory profile of the GPU. What do you think you can do to reduce the amount of unused (free) GPU memory during training and make optimal use of the GPU?

In [None]:
"""
TYPE YOUR ANSWER HERE
"""

- What is the max duration of a GPU-kernel computation? What was the average wall duration of the compute stream that occurred the most?

In [None]:
"""
TYPE YOUR ANSWER HERE
"""

- What do you think is the bottleneck of the model in this case? Why?

In [None]:
"""
TYPE YOUR ANSWER HERE
"""

- Use the `.cache()` and `.prefetch(tf.data.experimental.AUTOTUNE)` methods in your train and test data pipelines to speed up the training experiment.
- Run the model again with the optimization in place.
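
One possible arrangement of these calls is sketched below. It uses a small synthetic array as a stand-in for Fashion-MNIST (an assumption made only so the snippet runs without a download), and the ordering map, cache, batch, prefetch is one common choice rather than the only valid one.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for Fashion-MNIST (assumption: same shape and dtype).
images = np.random.randint(0, 256, size=(1024, 28, 28), dtype=np.uint8)
labels = np.random.randint(0, 10, size=(1024,), dtype=np.uint8)

def normalize_img(image, label):
    """Normalizes images: `uint8` -> `float32`."""
    return tf.cast(image, tf.float32) / 255., label

ds = tf.data.Dataset.from_tensor_slices((images, labels))
ds = ds.map(normalize_img)
ds = ds.cache()        # keep the normalized elements in memory after the first epoch
ds = ds.batch(512)
ds = ds.prefetch(tf.data.experimental.AUTOTUNE)  # prepare upcoming batches while the model trains

first_images, first_labels = next(iter(ds))
print(first_images.shape, first_images.dtype)
```

The same calls can be applied to `ds_train` and `ds_test` in the notebook.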

- Why is the training now much faster? What has changed? How does this show up in the TensorBoard profile?

In [None]:
"""
TYPE YOUR ANSWER HERE
"""

### Besides the data pipeline, we can also optimize speed and memory utilization through the floating-point precision in which the model trains

- In what floating-point precision is your model training right now, according to TensorBoard?

In [None]:
"""
TYPE YOUR ANSWER HERE
"""

Now add the following commands above your model definition:
```
# Non-experimental API (TensorFlow >= 2.4); the older
# `mixed_precision.experimental` module is deprecated.
from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')
```

> This will set the `dtype` policy for building the model
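
Outside TensorBoard, the effect of the policy can also be checked directly on a layer. A minimal sketch, assuming TensorFlow 2.4 or newer (where the policy is set through `tf.keras.mixed_precision.set_global_policy`); the standalone `Dense` layer is only an illustration and is not part of the notebook's model:

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

mixed_precision.set_global_policy('mixed_float16')

# Layers pick up the global policy when they are constructed.
layer = tf.keras.layers.Dense(128, activation='relu')
compute_dtype = layer.compute_dtype    # dtype used for the layer's computations
variable_dtype = layer.variable_dtype  # dtype in which the weights are stored

print("Compute dtype: ", compute_dtype)
print("Variable dtype:", variable_dtype)

# Restore the default so that later cells are unaffected.
mixed_precision.set_global_policy('float32')
```

Because the policy is read at construction time, layers created before the policy was set keep their original dtypes.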

- Now rebuild, re-compile, and train your model (the policy only applies to layers created after it is set). Did it lose any validation accuracy?

In [None]:
"""
TYPE YOUR ANSWER HERE
"""

- In what ways can you see the precision of the computations in TensorBoard? What do you notice? What is the max duration of a GPU-kernel computation now? Why didn't the model train faster?

In [None]:
"""
TYPE YOUR ANSWER HERE
"""

- BONUS: If you're interested, try to improve this model with a [convolutional neural network](https://medium.com/tensorflow/hello-deep-learning-fashion-mnist-with-keras-50fcff8cd74a) (CNN).