# Automatic Device Selection

The Auto Device plugin (AUTO for short) selects the most suitable device from the available compute devices by considering the network precision, power efficiency, and processing capability. Network precision (whether or not the network is quantized) is checked first, to filter out devices that cannot run the network efficiently.

Next, dedicated accelerator devices are preferred, e.g., a discrete GPU, an integrated GPU, or a VPU, and the CPU is used as the default fallback device. Note that AUTO makes this selection only once, when the network is loaded.
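The two-step selection described above (filter by precision, then prefer accelerators over the CPU) can be pictured with the toy function below. It is a simplified, hypothetical sketch, not OpenVINO's implementation: the device labels (`dGPU`, `iGPU`, `VPU`, `CPU`), `DEVICE_PRIORITY`, `select_device`, and the capability table are all made up for illustration.

```python
# Hypothetical priority order, highest first: accelerators before the CPU fallback.
# These labels are illustrative and are NOT real OpenVINO device names.
DEVICE_PRIORITY = ["dGPU", "iGPU", "VPU", "CPU"]


def select_device(available, supported_precisions, model_precision):
    """Return the highest-priority available device that supports model_precision.

    available: device labels present on the machine.
    supported_precisions: dict mapping a device label to the precisions it runs efficiently.
    """
    # Step 1: drop devices that cannot run the model's precision efficiently.
    candidates = {d for d in available if model_precision in supported_precisions.get(d, ())}
    # Step 2: prefer dedicated accelerators; the CPU comes last as the fallback.
    for device in DEVICE_PRIORITY:
        if device in candidates:
            return device
    raise RuntimeError(f"No available device supports {model_precision}")


# Made-up capability table, for illustration only.
capabilities = {
    "dGPU": {"FP32", "FP16"},
    "VPU": {"FP16"},
    "CPU": {"FP32", "FP16", "INT8"},
}
print(select_device(["CPU", "VPU"], capabilities, "FP16"))  # VPU (accelerator preferred)
print(select_device(["CPU", "VPU"], capabilities, "INT8"))  # CPU (VPU filtered out by precision)
```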

Loading a network to an accelerator such as a GPU can take a long time. For applications that require a fast initial inference response, AUTO therefore starts inference immediately on the CPU and then transparently shifts it to the GPU once the GPU is ready, dramatically reducing the time to first inference.
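The CPU-first start-up can be modeled with a background thread: inference is served by a fast-to-prepare path while a slow "accelerator compile" runs, and requests switch over once it finishes. This is a toy sketch using only the Python standard library; `CpuFirstRunner`, `slow_gpu_compile`, and the 0.5 s delay are hypothetical stand-ins, not how the AUTO plugin is implemented internally.

```python
import threading
import time


class CpuFirstRunner:
    """Toy model of AUTO's CPU-first start-up (illustration, not OpenVINO code)."""

    def __init__(self, cpu_infer, load_accelerator):
        self._cpu_infer = cpu_infer            # fast to prepare, used immediately
        self._accel_infer = None               # filled in by the background loader
        self._ready = threading.Event()
        # Compile for the accelerator in the background so inference can start at once.
        loader = threading.Thread(target=self._load, args=(load_accelerator,), daemon=True)
        loader.start()

    def _load(self, load_accelerator):
        self._accel_infer = load_accelerator()
        self._ready.set()                      # signal only after the accelerator is usable

    def wait_for_accelerator(self, timeout=None):
        """Block until the accelerator is ready; return True if it became ready in time."""
        return self._ready.wait(timeout)

    def infer(self, data):
        # Transparent switch: accelerator once ready, CPU until then.
        if self._ready.is_set():
            return self._accel_infer(data)
        return self._cpu_infer(data)


def slow_gpu_compile():
    time.sleep(0.5)  # stands in for a multi-second GPU kernel compilation
    return lambda x: f"GPU result for {x}"


runner = CpuFirstRunner(lambda x: f"CPU result for {x}", slow_gpu_compile)
print(runner.infer(1))        # answered by the CPU while the "GPU" is still compiling
runner.wait_for_accelerator()
print(runner.infer(2))        # answered by the "GPU"
```

The `threading.Event` is set only after the accelerator callable has been stored, so a caller never sees the "ready" flag before the accelerator path is usable.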

![Auto Device Selection logic](data/auto_device_selection.png "Auto Device Selection")

The following demonstrations use the [googlenet-v1](https://docs.openvino.ai/latest/omz_models_model_googlenet_v1.html) model from the [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/). The googlenet-v1 model is the first of the Inception family of models designed to perform image classification. Like the other Inception models, googlenet-v1 has been pre-trained on the ImageNet image database. For details about this family of models, see the original paper.

Follow the [104-model-tools](../104-model-tools/README.md) notebook to download googlenet-v1 and convert it to IR with `--precisions FP16`, then copy the results (`googlenet-v1.bin` and `googlenet-v1.xml`) to the `notebooks\106-auto-device\model\` folder.

## Imports

```python
import sys
import time

import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.runtime import AsyncInferQueue, CompiledModel, Core, InferRequest
```

## Load the model with AUTO device
### Default behavior of compile_model without device_name
By default, `compile_model` selects AUTO as the `device_name` if none is specified.

```python
ie = Core()
# Read the model into an OpenVINO Model object (ngraph representation)
model = ie.read_model(model="model/googlenet-v1.xml")
# Compile the model without specifying device_name; AUTO is used by default
compiled_model = ie.compile_model(model=model, config={"LOG_LEVEL": "LOG_INFO"})

if isinstance(compiled_model, CompiledModel):
    print("Successfully compiled the model without device_name.")
```

### compile_model with AUTO as device name

```python
compiled_model = ie.compile_model(model=model, device_name="AUTO")

if isinstance(compiled_model, CompiledModel):
    print("Successfully compiled the model with AUTO.")
```

## First inference latency benefit with AUTO
One of the key performance benefits of AUTO is lower first inference latency (FIL), defined as the model compile time plus the first inference execution time. Using the CPU device directly gives the shortest FIL, because the OpenVINO graph representation can be JIT-compiled for the CPU very quickly. The GPU is the challenge: OpenCL compilation of the graph into GPU-optimized kernels takes a few seconds on this platform. If AUTO selects the GPU, this initialization time may be intolerable for some applications, which is why AUTO transparently uses the CPU as the first inference device until the GPU is ready.
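Because FIL is the sum of two stages, it can help to time them separately. The helper below is a small sketch; `first_inference_latency` is a hypothetical name, and the commented OpenVINO usage assumes the notebook's `ie`, `model`, and `input_image` variables. The stand-alone demo uses fake, sleep-based stages so it runs without a model.

```python
import time


def first_inference_latency(compile_fn, infer_fn):
    """Time compile_fn() and one infer_fn(compiled) call.

    Returns (compile_s, first_infer_s, fil_s), where fil_s = compile_s + first_infer_s.
    """
    start = time.perf_counter()
    compiled = compile_fn()              # time to compile/load the model
    compiled_at = time.perf_counter()
    infer_fn(compiled)                   # first inference on the freshly compiled model
    done = time.perf_counter()
    return compiled_at - start, done - compiled_at, done - start


# With the notebook's variables (assumed: ie, model, input_image), usage would look like:
# compile_s, infer_s, fil_s = first_inference_latency(
#     lambda: ie.compile_model(model=model, device_name="GPU"),
#     lambda cm: cm([input_image]),
# )

# Stand-alone demo with fake, sleep-based stages:
compile_s, infer_s, fil_s = first_inference_latency(
    lambda: time.sleep(0.2) or "fake-model",
    lambda m: time.sleep(0.01),
)
print(f"compile {compile_s:.2f} s + first inference {infer_s:.2f} s = FIL {fil_s:.2f} s")
```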
### Load an Image

```python
input_layer_ir = next(iter(compiled_model.inputs))

# OpenCV loads images in BGR format, which is the channel order googlenet-v1 expects
image = cv2.imread("data/intel_rnb.jpg")

# N, C, H, W = batch size, number of channels, height, width
N, C, H, W = input_layer_ir.shape

# Resize the image to the network's expected input size
resized_image = cv2.resize(image, (W, H))

# Convert HWC to CHW and add a batch dimension to get NCHW
input_image = np.expand_dims(resized_image.transpose(2, 0, 1), 0)

plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB));
```

### Load the model to GPU and run the first inference

```python
# Start compiling the model (time point 1)
gpu_load_start_time = time.perf_counter()
compiled_model = ie.compile_model(model=model, device_name="GPU")  # load to GPU

# Get input and output nodes
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

# Run the first inference and get its result
results = compiled_model([input_image])[output_layer]

# First inference finished (time point 2)
gpu_fil_end_time = time.perf_counter()
gpu_fil_span = gpu_fil_end_time - gpu_load_start_time
print(f"Loaded model to GPU and got first inference in {gpu_fil_span:.2f} seconds.")
```

### compile_model with AUTO and run the first inference

```python
# Start compiling the model (time point 1)
auto_load_start_time = time.perf_counter()
compiled_model = ie.compile_model(model=model)  # device_name is AUTO by default

# Get input and output nodes
input_layer = compiled_model.input(0)
output_layer = compiled_model.output(0)

# Run the first inference and get its result
results = compiled_model([input_image])[output_layer]

# First inference finished (time point 2)
auto_fil_end_time = time.perf_counter()
auto_fil_span = auto_fil_end_time - auto_load_start_time
print(f"Loaded model to AUTO and got first inference in {auto_fil_span:.2f} seconds.")
```

### First inference latency benefit 

```python
# Compare first inference latency (compile time + first inference) for GPU and AUTO
device_list = ["GPU", "AUTO"]
load_and_fil_list = [gpu_fil_span, auto_fil_span]
plt.barh(range(len(load_and_fil_list)), load_and_fil_list, tick_label=device_list)
plt.xlabel("First inference latency (seconds)")
plt.show()
```

## Performance hint
The next highlight is how AUTO handles performance hints. By specifying the LATENCY or THROUGHPUT hint, AUTO optimizes for the desired metric: the THROUGHPUT hint delivers much higher frames per second (FPS) than the LATENCY hint, while the LATENCY hint delivers much lower latency than the THROUGHPUT hint. The hints do not require low-level, device-specific settings and are fully portable between devices, so AUTO simply passes the hint value through to the selected device.
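Why can one hint raise FPS while the other lowers latency? One rough way to reason about it is Little's law: for a saturated pipeline, throughput is approximately the number of requests in flight divided by the per-request latency. The sketch below is an idealized approximation with made-up numbers, not a model of how OpenVINO configures devices; it only shows that keeping more requests in flight (which a throughput-oriented setup typically does) can raise FPS even when each individual request takes longer.

```python
def ideal_throughput_fps(in_flight_requests, latency_ms):
    """Little's law for a saturated pipeline: throughput = concurrency / latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return in_flight_requests * 1000.0 / latency_ms


# Illustrative numbers only (not measured):
# latency-oriented setup: 1 request in flight, 5 ms each
print(ideal_throughput_fps(1, 5.0))   # 200.0
# throughput-oriented setup: 4 requests in flight, each slower (8 ms)
print(ideal_throughput_fps(4, 8.0))   # 500.0
```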

### compile_model with THROUGHPUT hint

Run inference in a loop and print the FPS and average latency every `period_seconds` seconds.

```python
# Reporting period (seconds) and number of periods to run
period_seconds = 10
end_after_periods = 6  # total run time = period_seconds x end_after_periods

print("Compiling model for AUTO device with THROUGHPUT hint")
sys.stdout.flush()

# device_name defaults to AUTO
compiled_model = ie.compile_model(model=model, config={"PERFORMANCE_HINT": "THROUGHPUT", "LOG_LEVEL": "LOG_INFO"})


def completion_callback(infer_request: InferRequest, job_id) -> None:
    """Collect per-request latency (ms) and print FPS/average latency once per period."""
    global period_fps
    global period_latency
    global period_start_time
    global period_count
    global latency_list
    global overall_latency_list
    global feed_inference

    latency_list.append(infer_request.latency)
    overall_latency_list.append(infer_request.latency)
    period_exec_time = time.perf_counter() - period_start_time
    if period_exec_time >= period_seconds:
        period_start_time = time.perf_counter()
        period_fps = len(latency_list) / period_exec_time
        period_latency = sum(latency_list) / len(latency_list)
        print(f"fps: {period_fps:.2f}, latency: {period_latency:.2f} ms, period time: {period_exec_time:.2f} s")
        sys.stdout.flush()
        latency_list = []
        period_count += 1
        if period_count >= end_after_periods:  # stop feeding new inference requests
            feed_inference = False


infer_queue = AsyncInferQueue(compiled_model, 0)  # 0 = use the optimal number of infer requests
infer_queue.set_callback(completion_callback)

print(f"Start inference: reporting FPS/latency {end_after_periods} times at {period_seconds} s intervals")
sys.stdout.flush()

# Initialization for inference with THROUGHPUT hint
period_fps = 0
period_latency = 0
period_start_time = time.perf_counter()
period_count = 0
latency_list = []
overall_latency_list = []
feed_inference = True

job_id = 0
while feed_inference:
    # start_async waits for an idle infer request before submitting
    infer_queue.start_async({input_layer_ir.any_name: input_image}, job_id)
    job_id += 1

infer_queue.wait_all()

# Keep the FPS and latency of the last period
THROUGHPUT_fps = period_fps
THROUGHPUT_latency = period_latency

print("Done")
sys.stdout.flush()
```
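The cell above shares state between the submission loop and the completion callback through globals, and the callback may run on a different thread than the loop. A minimal alternative sketch, assuming you want the same per-period FPS/latency reporting, is a small lock-protected accumulator. `PeriodStats` is a hypothetical helper, not part of OpenVINO; its injectable `clock` parameter exists only to make it testable. The commented usage assumes the notebook's `infer_queue`, `input_layer_ir`, and `input_image`.

```python
import threading
import time


class PeriodStats:
    """Thread-safe FPS/latency accumulator (a sketch replacing the notebook's globals)."""

    def __init__(self, period_seconds, end_after_periods, clock=time.perf_counter):
        self.period_seconds = period_seconds
        self.end_after_periods = end_after_periods
        self._clock = clock
        self._lock = threading.Lock()
        self._latencies = []
        self._period_start = clock()
        self.reports = []                # (fps, mean_latency_ms) for each completed period
        self.done = threading.Event()    # set after end_after_periods reports

    def add(self, latency_ms):
        """Record one finished request; close the period once period_seconds have elapsed."""
        with self._lock:
            self._latencies.append(latency_ms)
            elapsed = self._clock() - self._period_start
            if elapsed < self.period_seconds:
                return
            fps = len(self._latencies) / elapsed
            mean_latency = sum(self._latencies) / len(self._latencies)
            self.reports.append((fps, mean_latency))
            self._latencies = []
            self._period_start = self._clock()
            if len(self.reports) >= self.end_after_periods:
                self.done.set()


# Assumed usage with the notebook's objects:
# stats = PeriodStats(period_seconds=10, end_after_periods=6)
# infer_queue.set_callback(lambda request, _: stats.add(request.latency))
# while not stats.done.is_set():
#     infer_queue.start_async({input_layer_ir.any_name: input_image})
# infer_queue.wait_all()
```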

### compile_model with LATENCY hint

Run inference in a loop and print the FPS and average latency every `period_seconds` seconds.

```python
print("Compiling model for AUTO device with LATENCY hint")
sys.stdout.flush()

compiled_model = ie.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})

# Reuse completion_callback defined in the THROUGHPUT cell
infer_queue = AsyncInferQueue(compiled_model, 0)  # 0 = use the optimal number of infer requests
infer_queue.set_callback(completion_callback)

print(f"Start inference: reporting FPS/latency {end_after_periods} times at {period_seconds} s intervals")
sys.stdout.flush()

# Initialization for inference with LATENCY hint
period_fps = 0
period_latency = 0
period_start_time = time.perf_counter()
period_count = 0
latency_list = []
overall_latency_list = []
feed_inference = True

job_id = 0
while feed_inference:
    infer_queue.start_async({input_layer_ir.any_name: input_image}, job_id)
    job_id += 1

infer_queue.wait_all()

# Keep the FPS and latency of the last period
LATENCY_fps = period_fps
LATENCY_latency = period_latency

print("Done")
sys.stdout.flush()
```

### FPS and Latency difference


```python
# Compare FPS and average latency between the two hints
width = 0.4
fig, ax = plt.subplots(1, 2)

ax[0].bar([0], THROUGHPUT_fps, width, label="THROUGHPUT", color="#557f2d")
ax[0].bar([width], LATENCY_fps, width, label="LATENCY")
ax[0].set_ylabel("frames per second")
ax[0].set_xticks([width / 2])
ax[0].set_xticklabels(["fps"])

ax[1].bar([0], THROUGHPUT_latency, width, label="THROUGHPUT", color="#557f2d")
ax[1].bar([width], LATENCY_latency, width, label="LATENCY")
ax[1].set_ylabel("milliseconds")
ax[1].set_xticks([width / 2])
ax[1].set_xticklabels(["latency (ms)"])

fig.suptitle("Performance Hints")
ax[1].legend()
fig.tight_layout()

plt.show()
```