## Lab03_01: Style Transfer with ONNXRuntime and Microsoft DirectML 

### Prerequisites: 
- onnx
- onnxruntime
- numpy
- opencv-python (cv2)
- matplotlib
- tabulate
- pytorch (torch)

### Objective:
- Learn to run inference with ONNXRuntime + DirectML using the Style Transfer model from Lab01.


We'll be using a Mosaic model that comes from the ONNX Model Repository. It's based on the research paper *Perceptual Losses for Real-Time Style Transfer and Super-Resolution* by Justin Johnson, Alexandre Alahi, and Li Fei-Fei.

In [None]:
!pip install onnxruntime
!pip install onnxruntime-directml

import cv2
import numpy as np
from onnx import numpy_helper
import onnx
import onnxruntime as rt
from onnx.tools import update_model_dims
import os
import time
import matplotlib.pyplot as plt
import tabulate

Let's begin by loading the model from Lab01_03. 

Remember, we want to use Float16 models on GPU for optimal performance.

In [None]:
#Load and verify model
model_name = '../../models/style-transfer-fp16.onnx'    #Model generated from Lab01_03 
onnx_model = onnx.load(model_name)
onnx.checker.check_model(onnx_model)
print("Model loaded and verified!")

> Find DirectML (DML) among the available providers and select it for our session

In [None]:
#Find available providers and create the Inference session targeted for GPU
print("Available providers")
'''
    TODO: Print out all of the available execution providers on the system

'''

start_session = time.time() # Exercise - Create the inference session on the DirectML device
'''
    TODO: Set the execution provider as DirectML
'''
elapsed_time_session_creation = ((time.time() - start_session)*1000)

#Print the session creation time
print(f"Session Creation time (msec) : {elapsed_time_session_creation:0.3f}\n")
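
One way to think about provider selection is sketched below. It assumes that ONNX Runtime reports DirectML as `DmlExecutionProvider` and the CPU as `CPUExecutionProvider`. The helper `order_providers` is hypothetical (it is not part of onnxruntime) and works on a plain list so it can be tried without a GPU.

```python
def order_providers(available, preferred="DmlExecutionProvider",
                    fallback="CPUExecutionProvider"):
    """Return a provider list with the preferred provider first, if present.

    `order_providers` is a hypothetical helper, not part of onnxruntime.
    """
    ordered = [p for p in (preferred, fallback) if p in available]
    if not ordered:
        raise RuntimeError(f"Neither {preferred!r} nor {fallback!r} is available")
    return ordered


# Hard-coded list for illustration; in the lab this would come from
# rt.get_available_providers().
example = ["CPUExecutionProvider", "DmlExecutionProvider"]
print(order_providers(example))

# With onnxruntime installed, the result could be passed to the session:
# import onnxruntime as rt
# providers = order_providers(rt.get_available_providers())
# session = rt.InferenceSession(model_name, providers=providers)
```

Keeping the CPU provider as a fallback means the notebook can still run, more slowly, on a machine where DirectML is not available.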

Let's take a look at the model to understand the inputs and outputs

In [None]:
input_name = session.get_inputs()[0].name
input_shape = session.get_inputs()[0].shape
input_type = session.get_inputs()[0].type

print("input name", input_name)
print("input shape", input_shape)
print("input type", input_type)

outname = [output.name for output in session.get_outputs()]
print("outputs", outname)
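
When reading the printed input shape, it helps to spot which dimensions are not fixed. The sketch below assumes that fixed dimensions appear as integers and free (symbolic) dimensions appear as a string label or `None`. The helper name `dynamic_dims` and the example shapes are made up for illustration.

```python
def dynamic_dims(shape):
    """Return (index, label) pairs for dimensions that are not fixed integers."""
    return [(i, d) for i, d in enumerate(shape) if not isinstance(d, int)]


# Hypothetical shapes for illustration, not read from the lab model
print(dynamic_dims(["batch", 3, 224, 224]))  # [(0, 'batch')]
print(dynamic_dims([1, 3, 224, 224]))        # []
```

In the lab, `dynamic_dims(input_shape)` would show whether the model still has free dimensions that could be fixed later to speed up session creation.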

Load in a sample image and go through the pre-processing steps. 
>Read in the sample image 

In [None]:
# Read the image
img = cv2.imread("../../resources/lab03_image.jpg")
# Convert to RGB
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# Resize the image to 224x224 (the array shape becomes 224x224x3)
img = cv2.resize(img,(224,224),interpolation = cv2.INTER_LANCZOS4)
image = np.array(img, dtype=np.float16)
image = image.transpose((2,0,1))[np.newaxis, ...]
plt.imshow(np.asarray(img))

Run inference on the sample image

In [None]:
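# Run inference; the NumPy array is passed directly in the input feed.
# [0][0] selects the first output and the single image in the batch: shape (3, 224, 224)
data_output = session.run(outname, {input_name: image})[0][0]

# Clip to the valid pixel range and convert from CHW back to HWC
result = np.clip(data_output, 0, 255)
result = result.transpose(1,2,0).astype("uint8")
# Show and save the output
plt.imshow(np.asarray(result))
# cv2.imwrite expects BGR channel order, so convert from RGB before saving
cv2.imwrite("../../resources/Lab03_01_output.png", cv2.cvtColor(result, cv2.COLOR_RGB2BGR))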
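
The layout changes in the pre-processing and post-processing cells are inverses of each other: HWC to NCHW on the way in, CHW back to HWC on the way out. The sketch below checks this round trip on a small synthetic image. The helper names `to_nchw` and `to_hwc_uint8` are hypothetical, and no model is involved.

```python
import numpy as np


def to_nchw(img_hwc, dtype=np.float16):
    """HxWxC image -> 1xCxHxW tensor of the given dtype."""
    return img_hwc.astype(dtype).transpose((2, 0, 1))[np.newaxis, ...]


def to_hwc_uint8(chw):
    """CxHxW tensor -> HxWxC uint8 image, clipped to the valid pixel range."""
    return np.clip(chw, 0, 255).transpose((1, 2, 0)).astype(np.uint8)


# A synthetic 4x5 RGB image stands in for the lab image
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

tensor = to_nchw(img)
restored = to_hwc_uint8(tensor[0])
print(tensor.shape, tensor.dtype)
print(np.array_equal(img, restored))
```

Integer pixel values from 0 to 255 are exactly representable in float16, which is why the round trip is lossless here.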

Let's now evaluate our baseline performance by running inference 100 times

> Complete the run call with the style transfer model

In [None]:
print("Evaluating for Performance\n")
start_perf_base = time.time()
for i in range(100):
    data_output = session.run(outname, {input_name: image})[0][0] # Run inference here by completing the run() call
# Print the average time
elapsed_time_baseline_perf = ((time.time() - start_perf_base)*1000)/100
print(f"Average inference time (msec) : {elapsed_time_baseline_perf:0.3f}\n")
del session
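
The timing loop above can be wrapped in a small reusable helper. This is a minimal sketch, not the lab's required solution: `average_ms` is a hypothetical name, and the warm-up calls are an added assumption that the first few runs may include one-time setup cost. It uses `time.perf_counter`, which is intended for measuring short intervals.

```python
import time


def average_ms(fn, runs=100, warmup=5):
    """Average wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000 / runs


# Stand-in workload; in the lab this would be something like
# lambda: session.run(outname, {input_name: image})
print(f"{average_ms(lambda: sum(range(1000)), runs=50):0.3f} ms")
```

Using the same helper for the baseline and the optimized session keeps the two measurements comparable.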

In [None]:
#Load and verify model
model_name = '../../models/style-transfer-fp16-fixed.onnx' # Model generated from Lab01_03 
onnx_model = onnx.load(model_name)
onnx.checker.check_model(onnx_model)
print("Model loaded and verified!")

Let's reduce the session creation time
> Disable memory patterns, disable graph optimizations, and fix the free input dimensions using the free-dimension override (denotation) API

Then observe the difference in session creation time

In [None]:
#Set the session options by creating a SessionOptions() object below
'''
    TODO: Disable memory patterns, disable graph optimizations, 
        and override the "free" dimensions by denotation
        (add_free_dimension_override_by_denotation) to set the batch size to 1
'''

#Create the inference session
start = time.time() 
'''
    TODO: Create the inference session with DirectML
'''
elapsed_time = ((time.time() - start)*1000)
print(f"Session Creation time (msec) : {elapsed_time:0.3f}\n")
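
The three settings named in the exercise can be sketched as a function that configures a `SessionOptions`-like object. Some assumptions: `configure_for_fast_startup` is a hypothetical helper, the batch dimension is assumed to carry the standard ONNX denotation `DATA_BATCH`, and the optimization level is passed in so the function itself does not need onnxruntime imported. If the model's dimensions are named rather than denoted, `add_free_dimension_override_by_name` would be the method to try instead.

```python
def configure_for_fast_startup(options, disable_all_level, batch_size=1):
    """Apply the exercise's three settings to a SessionOptions-like object.

    Hypothetical helper; assumes the batch dimension uses the ONNX
    denotation "DATA_BATCH".
    """
    # Skip memory-pattern planning
    options.enable_mem_pattern = False
    # Skip graph optimization passes during session creation
    options.graph_optimization_level = disable_all_level
    # Pin the free batch dimension to a fixed size
    options.add_free_dimension_override_by_denotation("DATA_BATCH", batch_size)
    return options


# Usage with onnxruntime (not executed here):
# import onnxruntime as rt
# so = configure_for_fast_startup(
#     rt.SessionOptions(), rt.GraphOptimizationLevel.ORT_DISABLE_ALL)
# session = rt.InferenceSession(
#     model_name, sess_options=so,
#     providers=["DmlExecutionProvider", "CPUExecutionProvider"])
```

Disabling graph optimizations trades some per-inference speed for faster startup, so it is worth comparing both timings rather than only the session creation time.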

In [None]:
input_name = session.get_inputs()[0].name
input_shape = session.get_inputs()[0].shape
input_type = session.get_inputs()[0].type

print("input name", input_name)
print("input shape", input_shape)
print("input type", input_type)

outname = [output.name for output in session.get_outputs()]
print("outputs", outname)

Let's also evaluate the performance over the same 100 inference runs

In [None]:
print("Evaluating for Performance\n")
t_start = time.time()
for i in range(100):
    data_output = session.run(outname, {input_name: image})[0][0] # Run inference here by completing the run() call
# Print the average time
elapsed_time_optimized_perf = ((time.time() - t_start)*1000)/100
print(f"Average inference time (msec) : {elapsed_time_optimized_perf:0.3f}\n")
del session

In [None]:
timing = [["Baseline Session Creation",elapsed_time_session_creation],
          ["Optimized Session Creation",elapsed_time],
          ["Avg Inference Time Baseline",elapsed_time_baseline_perf],
          ["Avg Inference Time Optimized",elapsed_time_optimized_perf]]
# Add column headers so the units are clear in the rendered table
table = tabulate.tabulate(timing, headers=["Measurement", "Time (msec)"], tablefmt='html')
table
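
To make the table easier to interpret, the relative change between each baseline and optimized timing can be computed as well. The sketch below uses a hypothetical helper `percent_change`; the numbers are made up purely for illustration and are not measured results.

```python
def percent_change(baseline_ms, optimized_ms):
    """Relative change from baseline to optimized, in percent (negative = faster)."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (optimized_ms - baseline_ms) / baseline_ms * 100


# Made-up numbers purely for illustration, not measured results
print(f"{percent_change(200.0, 150.0):+.1f}%")  # -25.0%
```

In the lab, this could be called with `elapsed_time_session_creation` and `elapsed_time`, and again with the two average inference times.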