# PyTorch ResNet-50 Benchmarking Sample

This sample runs a simple inference loop to estimate the inference time of ResNet-50 in PyTorch with Intel Extension for PyTorch (IPEX).

For comparison, it also times native PyTorch and OpenVINO inference on the same input.

Note:
BF16 requires hardware support. Please check that your CPU supports BF16 before running the bfloat16 path:
https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html
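On Linux, one quick way to check is to look for BF16-related instruction flags in `/proc/cpuinfo`. The sketch below is an illustration, not part of the original sample: it assumes a Linux host and uses the kernel flag names `avx512_bf16` and `amx_bf16`; the helper name `bf16_flags` is hypothetical. BF16 runs are only expected to be fast on CPUs that report at least one of these flags.

```python
# Sketch: look for BF16-related CPU flags in Linux /proc/cpuinfo text.
# Assumes a Linux host; "avx512_bf16" and "amx_bf16" are the kernel's flag names.
from pathlib import Path

BF16_FLAGS = {"avx512_bf16", "amx_bf16"}


def bf16_flags(cpuinfo_text: str) -> set:
    """Return the BF16-related flags found on the first 'flags' line (hypothetical helper)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return flags & BF16_FLAGS
    return set()


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        found = bf16_flags(cpuinfo.read_text())
        print("BF16 flags:", sorted(found) or "none")
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```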

DevCloud:
https://www.intel.com/content/www/us/en/developer/tools/devcloud/services.html

@author: Raymond Lo, PhD (AI Software Evangelist Global Lead)

@disclaimer: This is an initial internal test and all benchmark results below need to be validated. 

In [43]:
import torch
import torchvision.models as models
from intel_extension_for_pytorch.quantization import prepare, convert
import intel_extension_for_pytorch as ipex
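Benchmark numbers are only comparable when the software versions are known, and APIs such as the IPEX quantization helpers used below can change between releases. A minimal sketch for recording versions next to the results, assuming the usual pip distribution names (`torch`, `torchvision`, `intel-extension-for-pytorch`, `openvino`); `installed_versions` is a hypothetical helper that reports `None` for anything not installed.

```python
# Sketch: record package versions next to benchmark numbers.
# Distribution names are assumptions based on the usual pip package names.
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions(["torch", "torchvision", "intel-extension-for-pytorch", "openvino"]))
```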

In [44]:
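import time

def inference(model, data, iterations):
    """Warm up, then time `iterations` forward passes and print the average."""
    with torch.no_grad():
        # warm up: the first runs include one-time costs such as memory
        # allocation and JIT/kernel optimization
        for _ in range(100):
            model(data)

        # measure with a monotonic, high-resolution clock
        start = time.perf_counter()
        for _ in range(iterations):
            model(data)
        end = time.perf_counter()
    print('Inference took {:.2f} ms on average'.format((end - start) / iterations * 1000))
    print('Total time: {:.2f} s for {} runs'.format(end - start, iterations))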
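`inference()` reports only the mean over all iterations. If you also want medians or percentiles, the loop can record each call separately. This is a stdlib-only sketch, not part of the original sample; `time_callable` is a hypothetical helper, and with PyTorch you would call it as `time_callable(lambda: model(data))` inside `torch.no_grad()`.

```python
# Sketch of a stdlib-only timing harness (hypothetical helper, not part of the sample).
# Unlike inference() above, it records each iteration so percentiles can be computed.
import statistics
import time


def time_callable(fn, iterations=1000, warmup=100):
    """Call fn() `warmup` times untimed, then return per-call latencies in ms."""
    for _ in range(warmup):
        fn()
    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms


if __name__ == "__main__":
    samples = time_callable(lambda: sum(range(10_000)), iterations=200, warmup=20)
    print(f"mean {statistics.mean(samples):.3f} ms, median {statistics.median(samples):.3f} ms")
```

Timing each call adds a small clock overhead per iteration, so for very fast models the single start/stop measurement in `inference()` is slightly more accurate for the mean.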

In [45]:
model = models.resnet50(weights='ResNet50_Weights.DEFAULT')
model.eval()

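dtype = 'float32'    # 'float32', 'bfloat16', or any other value for int8
torchscript = True   # must be a bool: a string such as 'False' is truthy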

data = torch.rand(1, 3, 224, 224)

model = model.to(memory_format=torch.channels_last)
data = data.to(memory_format=torch.channels_last)

if dtype == 'float32':
    model = ipex.optimize(model, dtype=torch.float32)
elif dtype == 'bfloat16':
    model = ipex.optimize(model, dtype=torch.bfloat16)
else:  # int8 static quantization (prepare/convert are imported at the top)
    qconfig = ipex.quantization.default_static_qconfig
    model = prepare(model, qconfig, example_inputs=data, inplace=False)
    # calibration: run representative inputs so the observers record activation ranges
    with torch.no_grad():
        for _ in range(100):
            model(data)

    model = convert(model)

with torch.cpu.amp.autocast(enabled=dtype=='bfloat16'):
    if torchscript: 
        with torch.no_grad():
            model = torch.jit.trace(model, data)
            model = torch.jit.freeze(model)
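In the int8 branch, the forward passes between `prepare()` and `convert()` are calibration: observers record the range of values seen, and `convert()` uses those ranges to choose a scale and zero point for each quantized tensor. The sketch below illustrates the idea with a simple min/max observer and asymmetric uint8 affine quantization in plain Python. It is not IPEX's implementation, and `default_static_qconfig` may use different observers; all function names are hypothetical.

```python
# Sketch: what static-quantization calibration computes, using a simple
# min/max observer and asymmetric uint8 affine quantization.
# IPEX's default_static_qconfig may use different observers; this is illustrative only.


def calibrate_minmax(batches):
    """Track the running min/max over all observed activation values."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def qparams(lo, hi, qmin=0, qmax=255):
    """Derive scale and zero point so that [lo, hi] maps onto [qmin, qmax]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must include 0
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an integer code, clamped to [qmin, qmax]."""
    return max(qmin, min(qmax, int(round(x / scale)) + zero_point))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    calibration_batches = [[-1.0, 0.5, 2.0], [0.0, 3.0, -0.5]]
    scale, zp = qparams(*calibrate_minmax(calibration_batches))
    print(scale, zp, dequantize(quantize(1.0, scale, zp), scale, zp))
```

This is why calibration data should resemble real inputs: ranges observed on random data, as in this sample, may not match real images, which can affect int8 accuracy.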

In [46]:
n_samples = 1000

## Run the inference with IPEX

In [47]:
inference(model, data, n_samples)

Inference took 4.37 ms on average
Total time: 4.37 s for 1000 runs


## Run inference with native PyTorch

In [48]:
model_native = models.resnet50(weights='ResNet50_Weights.DEFAULT')
model_native.eval()

model_native = model_native.to(memory_format=torch.channels_last)
data = data.to(memory_format=torch.channels_last)

# warm-up (no quantization here; inference() also runs its own 100 warm-up iterations)
with torch.no_grad():
    for _ in range(100):
        model_native(data)
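Both the IPEX and native cells convert the model and input with `.to(memory_format=torch.channels_last)`. The logical shape stays (N, C, H, W); only the order in memory changes to NHWC, which CPU convolution kernels typically prefer. The sketch below computes the element strides of both layouts by hand for the 1x3x224x224 input used here; the function names are hypothetical.

```python
# Sketch: element strides for an NCHW tensor in the default (contiguous) layout
# versus channels_last (NHWC order in memory), computed by hand.


def contiguous_strides(n, c, h, w):
    # moving one step along W is 1 element; along C skips a whole H*W plane
    return (c * h * w, h * w, w, 1)


def channels_last_strides(n, c, h, w):
    # channel values of one pixel are adjacent; along W skips C elements
    return (h * w * c, 1, w * c, c)


if __name__ == "__main__":
    shape = (1, 3, 224, 224)
    print("contiguous:   ", contiguous_strides(*shape))
    print("channels_last:", channels_last_strides(*shape))
```

In PyTorch, `data.stride()` should report `(150528, 50176, 224, 1)` before the conversion and `(150528, 1, 672, 3)` after it.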

In [49]:
inference(model_native, data, n_samples)

Inference took 12.19 ms on average
Total time: 12.19 s for 1000 runs
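With batch size 1 and one request at a time, throughput in inferences per second is simply 1000 divided by the average latency in ms, and the speedup is the ratio of the two averages. The numbers below are copied from the outputs above and come from a single run on one machine, so treat them as indicative only.

```python
# Worked arithmetic on the averages reported above (single run, one machine).
ipex_ms = 4.37
native_ms = 12.19

speedup = native_ms / ipex_ms           # how many times faster IPEX ran
ipex_fps = 1000 / ipex_ms               # inferences per second at batch size 1
native_fps = 1000 / native_ms

print(f"IPEX speedup: {speedup:.2f}x")  # 2.79x
print(f"IPEX: {ipex_fps:.0f} inf/s, native: {native_fps:.0f} inf/s")
```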


## OpenVINO Inference

In [50]:
from openvino.runtime import Core, AsyncInferQueue
import os

In [51]:
def inference_openvino(ov_model, data, output_layer_onnx, iterations):
    """Synchronous loop: one request at a time, so the average approximates latency."""
    import time
    # warm up
    for _ in range(100):
        ov_model([data])

    # measure; output_layer_onnx is unused here (kept for a uniform signature),
    # but ov_model([data])[output_layer_onnx] would select that output tensor
    start = time.perf_counter()
    for _ in range(iterations):
        ov_model([data])
    end = time.perf_counter()
    print('Inference took {:.2f} ms on average'.format((end - start) / iterations * 1000))
    print('Total time: {:.2f} s for {} runs'.format(end - start, iterations))
    
# Called when each request completes: fetch the output, no post-processing
def callback(infer_request, info) -> None:
    res = infer_request.get_output_tensor(0).data[0]
    
def inference_openvino_async(ov_model, data, output_layer_onnx, iterations):
    import time
    # jobs=0 lets OpenVINO pick the optimal number of parallel infer requests
    # for the compiled model (e.g. one compiled with the THROUGHPUT hint)
    infer_queue = AsyncInferQueue(ov_model, 0)
    infer_queue.set_callback(callback)

    # measure the total time to process all requests; with several requests in
    # flight, the per-inference average is amortized time (1 / throughput), not latency
    start = time.perf_counter()
    for _ in range(iterations):
        infer_queue.start_async({'input.1': data})
    infer_queue.wait_all()
    end = time.perf_counter()
    print('Inference took {:.2f} ms on average'.format((end - start) / iterations * 1000))
    print('Total time: {:.2f} s for {} runs'.format(end - start, iterations))
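`inference_openvino_async` divides total wall time by the number of requests. Because several requests run in parallel, that figure is amortized time per inference, not how long any single request takes. The stdlib sketch below imitates the pattern with a thread pool (pool size like `jobs`, `submit` like `start_async`, a done-callback like `set_callback`, leaving the `with` block like `wait_all`). It is a hypothetical analogue, not OpenVINO's implementation, and a 10 ms sleep stands in for one inference.

```python
# Sketch: a thread-pool analogue of AsyncInferQueue (hypothetical, stdlib only).
# Each "request" sleeps for a fixed time to stand in for one inference.
import time
from concurrent.futures import ThreadPoolExecutor

REQUEST_S = 0.01  # pretend each inference takes 10 ms


def fake_infer(_):
    start = time.perf_counter()
    time.sleep(REQUEST_S)
    return time.perf_counter() - start  # per-request latency


def run(iterations=40, jobs=4):
    """Return (amortized seconds per request, mean per-request latency in seconds)."""
    latencies = []
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:          # like AsyncInferQueue(model, jobs)
        for i in range(iterations):
            future = pool.submit(fake_infer, i)                 # like start_async
            future.add_done_callback(lambda f: latencies.append(f.result()))  # like set_callback
    total = time.perf_counter() - start                         # leaving the pool waits, like wait_all
    return total / iterations, sum(latencies) / len(latencies)


if __name__ == "__main__":
    amortized, latency = run()
    print(f"amortized {amortized * 1000:.1f} ms/request, mean latency {latency * 1000:.1f} ms")
```

With 4 workers the amortized figure comes out near a quarter of the per-request latency, which mirrors how the async result can be far below the sync result without any single request getting faster.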

In [52]:
onnx_path = "resnet50.onnx"
ir_path = "."
ir_file = "resnet50.xml"

torch.onnx.export(
            model_native,
            data,
            onnx_path,
        )

# Construct the command for Model Optimizer.
mo_command = f"""mo
                 --input_model "{onnx_path}"
                 --output_dir "{ir_path}"
                 --compress_to_fp16
                 """
mo_command = " ".join(mo_command.split())
print("Model Optimizer command to convert the ONNX model to OpenVINO:")
print(mo_command)

print("Exporting ONNX model to IR...")
mo_result = %sx $mo_command
print("\n".join(mo_result))

verbose: False, log level: Level.ERROR

Model Optimizer command to convert the ONNX model to OpenVINO:
mo --input_model "resnet50.onnx" --output_dir "." --compress_to_fp16
Exporting ONNX model to IR...
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /home/raymondlo84/pytorch_openvino/intel-extension-for-pytorch/examples/notebooks/resnet50.xml
[ SUCCESS ] BIN file: /home/raymondlo84/pytorch_openvino/intel-extension-for-pytorch/examples/notebooks/resnet50.bin
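The conversion cell joins a multi-line f-string into one shell string and runs it with IPython's `%sx` magic, which only works inside a notebook. A minimal alternative sketch for plain Python scripts builds the same command as an argument list and runs it with `subprocess.run`, which avoids quoting problems with paths containing spaces. The flags are the ones used above; `mo_args` and `run_mo` are hypothetical helpers, and running `mo` assumes the OpenVINO development tools are on `PATH`.

```python
# Sketch: build the Model Optimizer command as an argument list instead of a
# whitespace-joined string. Flags are the ones used in the conversion cell.
import shlex
import subprocess


def mo_args(onnx_path, output_dir, compress_to_fp16=True):
    """Return the mo command line as a list of arguments (hypothetical helper)."""
    args = ["mo", "--input_model", onnx_path, "--output_dir", output_dir]
    if compress_to_fp16:
        args.append("--compress_to_fp16")
    return args


def run_mo(onnx_path, output_dir):
    """Run mo without a shell; raises CalledProcessError if conversion fails."""
    result = subprocess.run(mo_args(onnx_path, output_dir),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # print the equivalent shell command for logging
    print(shlex.join(mo_args("resnet50.onnx", ".")))
```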


In [53]:
# Load the network to OpenVINO Runtime.
ie = Core()
devices = ie.available_devices
ie.set_property({'CACHE_DIR': 'cache'})

for device in devices:
    device_name = ie.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")

num_cores = os.cpu_count() 

model_onnx = ie.read_model(model=onnx_path)

#CPU only tests
compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="CPU")
compiled_model_onnx_throughput = ie.compile_model(model=model_onnx, device_name="CPU", config={"PERFORMANCE_HINT":"THROUGHPUT"})

#GPU tests
#compiled_model_onnx_throughput_gpu = ie.compile_model(model=model_onnx, device_name="GPU", config={"PERFORMANCE_HINT":"THROUGHPUT"})

output_layer_onnx = compiled_model_onnx.output(0)

CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz
GPU: Intel(R) Data Center GPU Flex Series 170 [0x56c0] (dGPU)


In [54]:
# Run synchronous inference on the random input tensor (one request at a time, which measures latency)
inference_openvino(compiled_model_onnx, data, output_layer_onnx, n_samples)

Inference took 8.66 ms on average
Total time: 8.66 s for 1000 runs


In [55]:
# Run asynchronous inference with the THROUGHPUT-hinted model
inference_openvino_async(compiled_model_onnx_throughput, data, output_layer_onnx, n_samples)

Inference took 1.38 ms on average
Total time: 1.38 s for 1000 runs


In [65]:
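# Run inference on the GPU - uncomment to try (compiled_model_onnx_throughput_gpu above must be uncommented too).
# The output below was recorded with 10000 iterations rather than n_samples.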
#inference_openvino_async(compiled_model_onnx_throughput_gpu, data, output_layer_onnx, n_samples)

Inference took 1.24 ms on average
Total time: 12.41 s for 10000 runs


In [57]:
# Async mode is needed to keep all cores busy and get the best throughput

In [58]:
!benchmark_app -m resnet50.xml -t 30 -api sync

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 51.75 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     input.1 (node: input.1) : f32 / [...] / [1,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     495 (node: 495) : f32 / [...] / [1,1000]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     input.1 (node: input.1) : u8 / [N,C,H,W] / [1,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     49

In [59]:
# benchmark_app reports the median, average, min, and max latency over all inferences
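Reading those statistics together is useful because a few slow iterations pull the average up while barely moving the median. The sketch below computes the same four figures with Python's `statistics` module on made-up latency values that include one outlier; `summarize` is a hypothetical helper and the sample values are not measurements from this notebook.

```python
# Sketch: median/average/min/max latency summary with the statistics module.
# The latency values are made up for illustration; one slow outlier is included.
import statistics


def summarize(latencies_ms):
    """Return the latency summary as a dict (hypothetical helper)."""
    return {
        "median": statistics.median(latencies_ms),
        "average": statistics.mean(latencies_ms),
        "min": min(latencies_ms),
        "max": max(latencies_ms),
    }


sample = [4.30, 4.40, 4.35, 4.50, 20.0]  # hypothetical latencies in ms
print(summarize(sample))
```

Here the single 20 ms outlier raises the average to about 7.5 ms while the median stays at 4.4 ms, so the median is the better "typical" latency when runs are noisy.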

In [60]:
!benchmark_app -m resnet50.xml -t 30 -api async

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 60.84 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     input.1 (node: input.1) : f32 / [...] / [1,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     495 (node: 495) : f32 / [...] / [1,1000]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     input.1 (node: input.1) : u8 / [N,C,H,W] / [1,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     49

In [61]:
# Same statistics in async mode: throughput is higher, but per-request latency is usually higher too
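Little's law explains why async latency goes up even as throughput improves: the number of requests in flight equals throughput times latency, so per-request latency is roughly the number of in-flight requests times the amortized time per inference. The worked example below uses the 1.38 ms async average from above; the count of 8 parallel requests is a hypothetical value, since the number that `AsyncInferQueue(..., 0)` actually creates is chosen by OpenVINO and not printed in this notebook.

```python
# Worked example of Little's law: in-flight requests = throughput x latency.
# Rearranged, per-request latency ~= in_flight * amortized time per inference.
def estimated_latency_ms(amortized_ms, in_flight):
    """Approximate per-request latency when `in_flight` requests are always queued."""
    return amortized_ms * in_flight


amortized_ms = 1.38   # async result reported above
in_flight = 8         # hypothetical number of parallel infer requests
print(f"{estimated_latency_ms(amortized_ms, in_flight):.2f} ms")  # 11.04 ms
```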

In [62]:
#GPU benchmark - uncomment to try
#!benchmark_app -m resnet50.xml -t 30 -api async -d GPU

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] GPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 32.36 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     input.1 (node: input.1) : f32 / [...] / [1,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     495 (node: 495) : f32 / [...] / [1,1000]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     input.1 (node: input.1) : u8 / [N,C,H,W] / [1,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     49