# Model Conversion for Keras models

This tutorial explains how to convert a Keras model stored in TensorFlow's 'h5' format to the standardized ONNX format, which enables you to run your model on a GPU-enabled AI Inference Server.  

In this tutorial you learn how to:
- load a model in 'h5' format
- convert and save the loaded model into 'onnx' format
- verify the input and output shape of the model


## Load the model

For this model conversion tutorial we use the same model that we created and trained in our [Image Classification](../../use-cases/image-classification/Readme.md) example.  

In [None]:
from pathlib import Path
import tensorflow as tf
import sys

PYTHON_VERSION = sys.version_info

if PYTHON_VERSION.minor == 11:
    model_path = Path('./models/classification_mobilnet-py311.h5')
else:
    model_path = Path('./models/classification_mobilnet-py310.h5')

model = tf.keras.models.load_model(model_path)
model

## Convert and Save

To convert your Keras model to ONNX, you need to know the input shape and type of your model.  
These are stored in the properties of the original model as:

- `model.input.shape`, and
- `model.input.dtype`

Using these two, we can create an `input_signature` as a list of `tf.TensorSpec` objects.  
In the constructor of `TensorSpec` you can define the `name` of the input tensor. In this tutorial, the input tensor is named `input`; in the Python 3.11 branch of the code below, where `model.input` is iterated as a list, it is named `input_0`.  
The `opset` parameter defines the ONNX operator set version. Opset version `13` corresponds to ONNX release `1.8.0`, which is supported by AI Inference Server at the time of writing.

In [None]:
import tf2onnx
import onnx
import tensorflow as tf
    
if PYTHON_VERSION.minor == 11:
    input_signature = [tf.TensorSpec(inp.shape, inp.dtype, name=f'input_{i}') for i, inp in enumerate(model.input)]
else:
    input_signature = [tf.TensorSpec(model.input.shape, model.input.dtype, name='input')]

onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature, opset=13)
onnx.save(onnx_model, "./models/model.onnx")

## Input and Output shape

Let's inspect how the model's inputs and outputs are shaped.  
`graph.input` displays the input shape of the converted ONNX model.


In [None]:
onnx_model.graph.input

In this case, the shape of the input is `[? x 224 x 224 x 3]`, which matches the original `[None, 224, 224, 3]`, except that the unknown first (batch) dimension has received an auto-generated name, `unk_334` here.  
We can rename it by setting `dim_param` on the relevant dimension:

In [None]:
onnx_model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = 'batch_size'
onnx_model.graph.input

And we can do the same with the output.

In [None]:
print("Before:\n", onnx_model.graph.output)
onnx_model.graph.output[0].type.tensor_type.shape.dim[0].dim_param = 'batch_size'
print("After:\n", onnx_model.graph.output)

Now we can save the model again and try it out.  
The model also works with the original dimension names, but after renaming, the batch dimensions of the input and output share one name, which links them semantically and makes the model more readable.

In [None]:
onnx.save(onnx_model, "./models/model_renamed.onnx")

## Executing the model

Before packaging the model, it is recommended to try it out with the `onnxruntime` Python package.  
To do so we need to provide:
- an `onnxruntime.InferenceSession` with the preloaded model
- the dictionary of the `input` tensors with the expected shape and type;  
  to test the model we generate a numpy array of random float values
- the list of the `output` tensors;  
  here it is the last layer of our TensorFlow model, named `dense`

The batch dimension of the input can be changed through the variable `batch_size`.

In [None]:
import numpy
from onnxruntime import InferenceSession

batch_size = 10
images = numpy.random.random((batch_size, 224, 224, 3)).astype('float32')
session = InferenceSession("./models/model.onnx")

if PYTHON_VERSION.minor == 11:
    result = session.run(["dense"], {"input_0": images})
else:
    result = session.run(["dense"], {"input": images})
    
result

The `result` contains the output tensors in a list; in this case it holds only the `dense` tensor.  
To get the most likely class for each image, we take the first element of the list, `result[0]`, and iterate over its rows, picking the class with the highest probability for each image.  
The shape of the `result[0]` tensor is `batch_size x 5`, where `5` is the number of our labels.

In [None]:
labels = ['Label 101', 'Label 102', 'Label 103', 'Label 104', 'Label 105']
for probabilities in result[0]:
    class_index = numpy.argmax(probabilities)
    print(f"predicted class: {labels[class_index]}\n  probabilities {probabilities}")
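
The same lookup can be done for the whole batch at once, without a Python loop. The sketch below uses a small hypothetical `scores` array in place of `result[0]` (two images, five classes) and additionally shows how the two most likely classes per image could be extracted.

```python
import numpy

labels = ['Label 101', 'Label 102', 'Label 103', 'Label 104', 'Label 105']

# Hypothetical scores standing in for result[0] with batch_size = 2
scores = numpy.array(
    [[0.05, 0.60, 0.20, 0.10, 0.05],
     [0.05, 0.05, 0.20, 0.30, 0.40]],
    dtype="float32",
)

# Index and value of the highest score in every row (one row per image)
class_indices = numpy.argmax(scores, axis=1)
confidences = numpy.max(scores, axis=1)
predicted = [labels[i] for i in class_indices]
print(predicted)  # ['Label 102', 'Label 105']

# Indices of the two highest scores per image, best first
top2 = numpy.argsort(scores, axis=1)[:, ::-1][:, :2]
print(top2.tolist())  # [[1, 2], [4, 3]]
```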

## Usage of the ONNX model

The AI Inference Server with GPU support accepts ONNX models for execution.  
For this purpose the model must be packaged into a `GPURuntimeComponent` step using the AI Software Development Kit.  
For details on how to create a `GPURuntimeComponent` and build pipelines that run on a GPU-enabled AI Inference Server, you can study the [Object Detection](../../use-cases/object-detection/Readme.md) example.


## Using custom layers in your model

AI Inference Server supports the execution of models containing custom layers.  
If TensorRT optimization is enabled in the model configuration, AI Inference Server attempts to run the model on the TensorRT backend. Operations that TensorRT does not recognize (such as custom layers) are executed on the ONNX Runtime backend instead.  
This could result in performance degradation.