# Enable Auto-Mixed Precision for Transfer Learning with TensorFlow

This notebook performs the following steps:

- Enable auto-mixed precision with a single-line change.
- Perform transfer learning for image classification using [TensorFlow Hub's](https://www.tensorflow.org/hub) ResNet50v1.5 pretrained model.
- Export the fine-tuned model in the [SavedModel](https://www.tensorflow.org/guide/saved_model) format.
- Optimize the SavedModel for faster inference.
- Serve the SavedModel using [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving).

In [None]:
# Importing libraries
import os
import numpy as np
import time
import PIL.Image as Image
import tensorflow as tf
import tensorflow_hub as hub
from datetime import datetime
import requests
from copy import deepcopy
print("We are using TensorFlow version: ", tf.__version__)

### Identifying supported ISA

We identify the instruction set architecture (ISA) supported by the underlying hardware to decide whether to enable auto-mixed precision. On the 4th Gen Intel® Xeon® Scalable processor (codenamed Sapphire Rapids), auto-mixed precision accelerates both training and inference.

In [None]:
import sys
sys.path.append('../../')

import version_check

arch = version_check.arch_checker().arch
print("Arch: ", arch)
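
The `version_check` helper lives outside this notebook, so its implementation is not shown here. As a rough illustration only, the sketch below shows one way such a check could be done on Linux: scanning the `flags` line of `/proc/cpuinfo` for the `amx_bf16` and `avx512_bf16` feature flags. The function name `bf16_flags` and the sample text are hypothetical, and this is not necessarily how `version_check` decides that `arch` is `'SPR'`.

```python
def bf16_flags(cpuinfo_text):
    """Return the bfloat16-related CPU flags found in /proc/cpuinfo-style text."""
    wanted = {"amx_bf16", "avx512_bf16"}
    found = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has a line such as "flags : fpu vme ... amx_bf16 ...".
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found |= wanted & set(value.split())
    return found


# Hypothetical excerpt; on a real Linux machine, read the text from /proc/cpuinfo.
sample = "processor : 0\nflags : fpu sse2 avx512f avx512_bf16 amx_tile amx_bf16\n"
print(sorted(bf16_flags(sample)))
```

An empty result would suggest that bfloat16 acceleration is unavailable, in which case the notebook's float32 path is the one to follow.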

### Transfer Learning for Image Classification with TensorFlow

In this section, we use the [ResNet50v1.5 pretrained model](https://tfhub.dev/google/imagenet/resnet_v1_50/feature_vector/5) from [TensorFlow Hub](https://www.tensorflow.org/hub), originally trained on the ImageNet dataset, and perform transfer learning to fine-tune it for our own image classes.

Source: https://www.tensorflow.org/tutorials/images/transfer_learning_with_hub

In this example, we use the **TensorFlow Flower dataset**.

We load the data as a *tf.data.Dataset*.<br />
We use a batch size of 512 images, each of shape 224 x 224 x 3.

In [None]:
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
data_root = tf.keras.utils.get_file(
  'flower_photos',
  'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
   untar=True)

batch_size = 512
img_height = 224
img_width = 224

train_ds = tf.keras.utils.image_dataset_from_directory(
  str(data_root),
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size
)

val_ds = tf.keras.utils.image_dataset_from_directory(
  str(data_root),
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size
)

class_names = np.array(train_ds.class_names)
print("The flower dataset has " + str(len(class_names)) + " classes: ", class_names)

We preprocess the images by normalizing pixel values to the range 0 to 1, and use buffered prefetching to avoid I/O blocking.

Reference: https://www.tensorflow.org/guide/data_performance#prefetching

In [None]:
normalization_layer = tf.keras.layers.Rescaling(1./255)
train_ds = train_ds.map(lambda x, y: (normalization_layer(x), y)) # x: images, y: labels
val_ds = val_ds.map(lambda x, y: (normalization_layer(x), y)) # x: images, y: labels

AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
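
To make the rescaling step concrete, the NumPy sketch below applies the same arithmetic as `Rescaling(1./255)` to a few made-up pixel values. It is an illustration only and not part of the input pipeline.

```python
import numpy as np

# Made-up pixel intensities spanning the 8-bit range.
pixels = np.array([0, 64, 128, 255], dtype=np.float32)

# Rescaling(1./255) multiplies every input value by 1/255.
scaled = pixels * (1.0 / 255)

print(scaled.min(), scaled.max())
```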

In [None]:
for image_batch, labels_batch in train_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break

**Simple Transfer Learning:**

1. *Select a pre-trained model from TensorFlow Hub*.
2. *Retrain the top (last) layer to recognize the classes from your custom dataset*.

We use a **headless ResNet50v1.5 pretrained model** (without the classification layer). Any compatible image feature vector model from TF-Hub (https://tfhub.dev/s?module-type=image-feature-vector&q=tf2) can be used here.

In [None]:
resnet_feature_vector = "https://tfhub.dev/google/imagenet/resnet_v1_50/feature_vector/5"

feature_extractor_model = resnet_feature_vector

Create the feature extractor by wrapping the pre-trained model as a Keras layer with **hub.KerasLayer**. Use the ***trainable=False*** argument to freeze the variables, so that the training only modifies the new classifier layer:

In [None]:
feature_extractor_layer = hub.KerasLayer(
    feature_extractor_model,
    input_shape=(224, 224, 3),
    trainable=False)

feature_batch = feature_extractor_layer(image_batch)

Attach the last fully connected classification layer in a **tf.keras.Sequential** model.

In [None]:
num_classes = len(class_names)

fp32_model = tf.keras.Sequential([
  feature_extractor_layer,
  tf.keras.layers.Dense(num_classes)
])

if arch == 'SPR':
    # Clone the model (new layers, not shared weights) to train the bf16 model separately and compare accuracy
    bf16_model = tf.keras.models.clone_model(fp32_model)

fp32_model.summary()

In order to measure the training throughput, we define the following custom callback. For more information on callbacks, refer to https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback.

In [None]:
class TimeHistory(tf.keras.callbacks.Callback):
    """Record the duration and throughput (images/sec) of every training batch."""

    def on_train_begin(self, logs=None):
        self.times = []
        self.throughput = []

    def on_batch_begin(self, batch, logs=None):
        self.batch_time_start = time.time()

    def on_batch_end(self, batch, logs=None):
        total_time = time.time() - self.batch_time_start
        self.times.append(total_time)
        # Uses the global batch_size, so a smaller final batch is counted as full.
        self.throughput.append(batch_size/total_time)
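
One caveat about this callback: it divides the fixed `batch_size` by each batch's duration, so a smaller final batch in an epoch is counted as if it were full. The sketch below uses made-up timings to show the difference between that estimate and one based on the actual number of images in each batch. The batch sizes assume 2,936 training images split into batches of 512 (so the last batch holds 376); both that count and the timings are illustrative assumptions.

```python
def mean_batch_throughput(batch_sizes, batch_times):
    """Average of per-batch images/sec for the given sizes and durations."""
    rates = [n / t for n, t in zip(batch_sizes, batch_times)]
    return sum(rates) / len(rates)


nominal = 512
sizes = [512, 512, 512, 512, 512, 376]  # assumed: 2,936 images in batches of 512
times = [2.0, 2.0, 2.0, 2.0, 2.0, 1.5]  # made-up seconds per batch

# What the callback reports: every batch is treated as a full batch.
callback_estimate = mean_batch_throughput([nominal] * len(times), times)
# The same average computed with the real number of images in each batch.
actual = mean_batch_throughput(sizes, times)

print(round(callback_estimate, 1), round(actual, 1))
```

The gap shrinks as the number of batches per epoch grows, so for a relative comparison between float32 and bfloat16 runs on the same data the callback's estimate is still useful.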

#### Compile and train the model

In [None]:
fp32_model.compile(
  optimizer=tf.keras.optimizers.SGD(),
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
  metrics=['acc'])

In [None]:
train_throughput_list = []

#### Train without auto-mixed precision (float32)

In [None]:
NUM_EPOCHS = 10
time_callback = TimeHistory()
history = fp32_model.fit(train_ds, validation_data=val_ds, epochs=NUM_EPOCHS, callbacks=[time_callback])
avg_throughput = sum(time_callback.throughput)/len(time_callback.throughput)
print("Avg Throughput: " + str(avg_throughput) + " imgs/sec")
train_throughput_list.append(avg_throughput)

### Enabling auto-mixed precision with `tf.config` API

In this section, we show how to enable auto-mixed precision using the `tf.config` API. Enabling this option automatically converts the pre-trained model to use the bfloat16 datatype for computation, which increases training throughput on the latest Intel® Xeon® Scalable processor.

You can also print the experimental options to check whether auto-mixed precision has been enabled.

_Note: We only enable auto-mixed precision if the underlying system is the 4th Gen Intel® Xeon® Scalable processor (codenamed Sapphire Rapids)._

In [None]:
if arch == 'SPR':
    tf.config.optimizer.set_experimental_options({'auto_mixed_precision_onednn_bfloat16':True})
    print(tf.config.optimizer.get_experimental_options())
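
For intuition about what the bfloat16 datatype trades away, the sketch below emulates it in plain Python: a bfloat16 value has the same sign bit and 8-bit exponent as a float32 but keeps only the top 7 mantissa bits. The helper `to_bfloat16` is hypothetical and simply truncates the low 16 bits; hardware and libraries typically round to nearest instead, so treat this as an approximation.

```python
import struct


def to_bfloat16(x):
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    truncated = bits & 0xFFFF0000  # drop the 16 low-order mantissa bits
    (value,) = struct.unpack("<f", struct.pack("<I", truncated))
    return value


for x in (1.0, 3.14159, 1e30):
    print(x, "->", to_bfloat16(x))
```

The range of representable magnitudes is preserved while precision drops to roughly two or three significant decimal digits, which is why comparing the accuracy of the bfloat16 model against the float32 model is worthwhile.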

#### Compile and train the model with auto-mixed precision (bfloat16)

In [None]:
if arch == 'SPR':
    # Compile
    bf16_model.compile(
      optimizer=tf.keras.optimizers.SGD(),
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=['acc'])
    
    # Train
    NUM_EPOCHS = 10
    time_callback = TimeHistory()
    history = bf16_model.fit(train_ds, validation_data=val_ds, epochs=NUM_EPOCHS, callbacks=[time_callback])
    avg_throughput = sum(time_callback.throughput)/len(time_callback.throughput)
    print("Avg Throughput: " + str(avg_throughput) + " imgs/sec")
    train_throughput_list.append(avg_throughput)
    
    model = bf16_model
else:
    model = fp32_model

Now, let's compare the throughput achieved with and without auto-mixed precision enabled.

In [None]:
if arch == 'SPR':
    import pandas as pd
    print(train_throughput_list)
    speedup = float(train_throughput_list[1])/float(train_throughput_list[0])
    print("Speedup : ", speedup)
    df = pd.DataFrame({'training_type':['orig', 'auto_mixed_precision'], 'Training Speedup':[1, speedup]})
    ax = df.plot.bar( x='training_type', y='Training Speedup', rot=0)

### Export the model in the SavedModel format

Now that you've trained the model, export it as a SavedModel so that it can be reused later.

In [None]:
export_path = "models/my_saved_model"
model.save(export_path)

export_path

Let's measure the performance of the model we just saved using the `tf_benchmark.py` script that runs inference on dummy data.

In [None]:
run scripts/tf_benchmark.py --model_path models/my_saved_model --num_warmup 5 --num_iter 50 --precision float32 --batch_size 32 --disable_optimize

### Optimize the SavedModel for faster inference

To get good inference performance from your (re)trained model, some inference optimizations are required.
In this section, we show you how to optimize a pre-trained model for better inference performance using the `freeze_optimize_v2.py` script, which we put together from standard TensorFlow routines.
These optimizations include:

- Converting variables to constants
- Removing training-only operations like checkpoint saving
- Stripping out parts of the graph that are never reached
- Removing debug operations like CheckNumerics
- Folding batch normalization ops into the pre-calculated weights
- Fusing common operations into unified versions

The input to this script is the directory of the original SavedModel, and the output is the directory of the optimized model. You don't need to change the command below for this tutorial, but for other pre-trained models you need to supply the relevant directories after "--input_saved_model_dir" and "--output_saved_model_dir".

In [None]:
run scripts/freeze_optimize_v2.py --input_saved_model_dir=models/my_saved_model --output_saved_model_dir=models/my_optimized_model
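
As a worked example of one item in the list above, folding batch normalization into pre-calculated weights, the NumPy sketch below folds a batch-norm layer into the dense layer that precedes it and checks that the outputs match. All values are randomly generated, and this illustrates the arithmetic only; it is not the code inside `freeze_optimize_v2.py`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))    # batch of 4 inputs with 3 features
W = rng.normal(size=(3, 2))    # dense kernel
b = rng.normal(size=2)         # dense bias
gamma = rng.normal(size=2)     # batch-norm scale
beta = rng.normal(size=2)      # batch-norm offset
mean = rng.normal(size=2)      # batch-norm moving mean
var = rng.uniform(0.5, 2.0, size=2)  # batch-norm moving variance
eps = 1e-3

# Unfused inference: dense layer followed by batch normalization.
unfused = gamma * ((x @ W + b) - mean) / np.sqrt(var + eps) + beta

# Fold the batch-norm scale and shift into a new kernel and bias.
scale = gamma / np.sqrt(var + eps)
W_folded = W * scale
b_folded = (b - mean) * scale + beta
fused = x @ W_folded + b_folded

print(np.allclose(unfused, fused))
```

After folding, inference needs a single matrix multiply and bias add per layer instead of a multiply followed by a separate normalization.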

Now that we have saved the optimized model, let's measure its performance using our benchmarking script.

In [None]:
run scripts/tf_benchmark.py --model_path models/my_optimized_model --num_warmup 5 --num_iter 50 --precision float32 --batch_size 32

**Let's compare the speedup obtained with the optimized model.**

`plot.py` is a Python script that creates a plot of the throughput values for inference with the original and the optimized model.

In [None]:
run scripts/plot.py

### TensorFlow Serving

In this section, we will initialize and run TensorFlow Serving natively to serve our retrained model.

In [None]:
!mkdir -p serving
!cp -r models/my_optimized_model serving/1

In [None]:
os.environ["MODEL_DIR"] = os.path.join(os.getcwd(), "serving")

This is where we start running TensorFlow Serving and load our model. After it loads, we can start making inference requests using REST. There are some important parameters:

- **rest_api_port**: The port that you'll use for REST requests.
- **model_name**: You'll use this in the URL of REST requests. It can be anything.
- **model_base_path**: This is the path to the directory where you've saved your model.

In [None]:
%%bash --bg
nohup tensorflow_model_server --rest_api_port=8501 --model_name=rn50 --model_base_path=${MODEL_DIR} > server.log 2>&1

In [None]:
!tail server.log
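
Before sending predictions, it can help to confirm that the model has loaded. TensorFlow Serving's REST API exposes a model status endpoint at `/v1/models/<model_name>` and a predict endpoint at `/v1/models/<model_name>:predict`. The sketch below builds both URLs from the parameters described above; the helper names are hypothetical, and the commented-out request assumes the server started in the previous cell is running.

```python
def status_url(model_name, port=8501, host="localhost"):
    """URL that reports the state of the model's loaded versions."""
    return f"http://{host}:{port}/v1/models/{model_name}"


def predict_url(model_name, port=8501, host="localhost"):
    """URL that accepts POSTed prediction requests."""
    return f"{status_url(model_name, port, host)}:predict"


print(status_url("rn50"))
print(predict_url("rn50"))

# With the server running, the status could be fetched like this:
# import requests
# print(requests.get(status_url("rn50")).json())
```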

**Prepare the testing data for prediction**

In [None]:
for image_batch, labels_batch in val_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break
test_data, test_labels = image_batch.numpy(), labels_batch.numpy()

First, let's take a look at a random example from our test data.

In [None]:
import matplotlib.pyplot as plt

def show(idx, title):
    plt.figure()
    plt.imshow(test_data[idx])
    plt.axis('off')
    plt.title('\n\n{}'.format(title), fontdict={'size': 16})

import random
rando = random.randint(0,test_data.shape[0]-1)
show(rando, 'An Example Image:')

#### Make a request to your model in TensorFlow Serving

Now let's create the JSON object for a batch of three inference requests:

In [None]:
import json
data = json.dumps({"signature_name": "serving_default", "instances": test_data[0:3].tolist()})
print('Data: {} ... {}'.format(data[:50], data[-52:]))

#### Make REST requests

We'll send a predict request as a POST to our server's REST endpoint, and pass it three examples.

In [None]:
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/rn50:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)['predictions']

for i in range(0,3):
    show(i, 'The model thought this was a {} (class {}), and it was actually a {} (class {})'.format(
        class_names[np.argmax(predictions[i])], np.argmax(predictions[i]), class_names[test_labels[i]], test_labels[i]))
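
Because the classification layer was defined without an activation, the `predictions` returned by the server are raw logits rather than probabilities. The sketch below shows one way to convert them. The logits are made up, and the five class names are an assumption about what `class_names` contains for the flower dataset.

```python
import numpy as np


def softmax(logits):
    """Convert a vector of logits into probabilities that sum to 1."""
    values = np.asarray(logits, dtype=np.float64)
    shifted = values - values.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


# Assumed class names and made-up logits for a single image.
names = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]
logits = [0.3, 2.1, -1.0, 0.5, 1.2]

probs = softmax(logits)
best = int(np.argmax(probs))
print(names[best], round(float(probs[best]), 3))
```

Softmax preserves the ordering of the logits, so the predicted class is the same either way; the probabilities are mainly useful for reporting confidence.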