# PyTorch Resnet50 Benchmarking Sample.

This sample code will run a simple inference loop to estimate the inference time for running Resnet50 on PyTorch. 

To compare the results, we added the native and also OpenVINO inference for quick time comparison with IPEX.

Note:
Please check for BF16 supports.
https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html

DevCloud:
https://www.intel.com/content/www/us/en/developer/tools/devcloud/services.html

@author: Raymond Lo, PhD (AI Software Evangelist Global Lead)

@disclaimer: This is an initial internal test and all benchmark results below need to be validated. 

# How to Run (Linux)

### Setup Virtual Environment
```
python3 -m venv ipex_openvino_env
source ipex_openvino_env/bin/activate
```
### Install OpenVINO

```
pip install --upgrade pip
pip install wheel setuptools
pip install openvino-dev 
```

### Install IPEX

```
pip install intel_extension_for_pytorch
pip install intel_extension_for_pytorch -f https://developer.intel.com/ipex-whl-stable-cpu

```

### Install Jupyter Lab

```
pip install jupyterlab

```

### Run Jupyter Lab
```
jupyter lab examples/notebooks 
```

Then, open benchmark.ipynb and run the notebook :)


In [28]:
import torch
import torchvision.models as models
from intel_extension_for_pytorch.quantization import prepare, convert
import intel_extension_for_pytorch as ipex

In [29]:
def inference(model, data, iterations):
  with torch.no_grad():
    # warm up
    for _ in range(100):
      model(data)

    # measure
    import time
    start = time.time()
    for _ in range(iterations):
      output = model(data)
    end = time.time()
    print('Inference took {:.2f} ms in average'.format((end-start)/iterations*1000))
    print('Total time: {:.2f} s'.format(end-start)+" for "+str(iterations)+" runs")

In [30]:
model = models.resnet50(weights='ResNet50_Weights.DEFAULT')
model.eval()

dtype = 'float32'
torchscript = 'False'

data = torch.rand(1, 3, 224, 224)

model = model.to(memory_format=torch.channels_last)
data = data.to(memory_format=torch.channels_last)

if dtype == 'float32':
    model = ipex.optimize(model, dtype=torch.float32)
elif dtype == 'bfloat16':
    model = ipex.optimize(model, dtype=torch.bfloat16)
else: # int8
    from intel_extension_for_pytorch.quantization import prepare, convert

qconfig = ipex.quantization.default_static_qconfig
model = prepare(model, qconfig, example_inputs=data, inplace=False)

# calibration
n_iter = 100
for i in range(n_iter):
    model(data)

model = convert(model)

with torch.cpu.amp.autocast(enabled=dtype=='bfloat16'):
    if torchscript: 
        with torch.no_grad():
            model = torch.jit.trace(model, data)
            model = torch.jit.freeze(model)

In [31]:
n_samples = 3000

## Run the inference with IPEX

In [32]:
inference(model, data, n_samples)

Inference took 4.16 ms in average
Total time: 12.47 s for 3000 runs


## Run the inference with the native PyTorch

In [33]:
model_native = models.resnet50(weights='ResNet50_Weights.DEFAULT')
model_native.eval()

model_native = model_native.to(memory_format=torch.channels_last)
data = data.to(memory_format=torch.channels_last)

# calibration
n_iter = 100
for i in range(n_iter):
    model_native(data)

In [34]:
inference(model_native, data, n_samples)

Inference took 12.49 ms in average
Total time: 37.48 s for 3000 runs


## OpenVINO Inference

In [35]:
from openvino.runtime import Core, AsyncInferQueue
import os

In [36]:
def inference_openvino(ov_model, data, output_layer_onnx, iterations):
    # warm up
    for _ in range(100):
        ov_model([data])[output_layer_onnx]

    # measure
    import time
    start = time.time()
    for _ in range(iterations):
        output = ov_model([data])[output_layer_onnx]
    end = time.time()
    print('Inference took {:.2f} ms in average'.format((end-start)/iterations*1000))
    print('Total time: {:.2f} s'.format(end-start)+" for "+str(iterations)+" runs")
    
#just callback and take data back (no post-processing)
def callback(infer_request, info) -> None:
    res = infer_request.get_output_tensor(0).data[0]
    
def inference_openvino_async(ov_model, data, output_layer_onnx, iterations):
    import time
    infer_queue = AsyncInferQueue(ov_model, 2)
    infer_queue.set_callback(callback)
    
    #measure the total time it takes to process all requests.
    start = time.time()
    for _ in range(iterations):
        infer_queue.start_async({0: data}, output_layer_onnx)
    infer_queue.wait_all()
    end = time.time()
    print('Inference took {:.2f} ms in average'.format((end-start)/iterations*1000))
    print('Total time: {:.2f} s'.format(end-start)+" for "+str(iterations)+" runs")

In [37]:
onnx_path = "resnet50.onnx"
ir_path = "."
ir_file = "resnet50.xml"

torch.onnx.export(
            model_native,
            data,
            onnx_path,
        )

# Construct the command for Model Optimizer.
mo_command = f"""mo
                 --input_model "{onnx_path}"
                 --output_dir "{ir_path}"
                 """
mo_command = " ".join(mo_command.split())
print("Model Optimizer command to convert the ONNX model to OpenVINO:")
print(mo_command)

print("Exporting ONNX model to IR...")
mo_result = %sx $mo_command
print("\n".join(mo_result))

Model Optimizer command to convert the ONNX model to OpenVINO:
mo --input_model "resnet50.onnx" --output_dir "."
Exporting ONNX model to IR...
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /home/raymondlo84/pytorch_ext/intel-extension-for-pytorch/examples/cpu/inference/resnet50.xml
[ SUCCESS ] BIN file: /home/raymondlo84/pytorch_ext/intel-extension-for-pytorch/examples/cpu/inference/resnet50.bin


In [38]:
# Load the network to OpenVINO Runtime.
ie = Core()
devices = ie.available_devices

for device in devices:
    device_name = ie.get_property(device, "FULL_DEVICE_NAME")
    print(f"{device}: {device_name}")

num_cores = os.cpu_count() 

model_onnx = ie.read_model(model=onnx_path)
compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="CPU")
#compiled_model_onnx = ie.compile_model(model=model_onnx, device_name="AUTO", config={"PERFORMANCE_HINT":"THROUGHPUT"})

output_layer_onnx = compiled_model_onnx.output(0)


CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz


In [39]:
# Run inference on the input image. (sync mode)
inference_openvino(compiled_model_onnx, data, output_layer_onnx, n_samples)

Inference took 7.44 ms in average
Total time: 22.32 s for 3000 runs


In [40]:
# Run inference on the input image. (async mode)
print(output_layer_onnx)
inference_openvino_async(compiled_model_onnx, data, output_layer_onnx, n_samples)

<ConstOutput: names[495] shape[1,1000] type: f32>
Inference took 2.96 ms in average
Total time: 8.87 s for 3000 runs


In [41]:
# async is needed to get best run performance out from all cores

In [42]:
!benchmark_app -m resnet50.xml -t 30 -api sync

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 72.22 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     input.1 (node: input.1) : f32 / [...] / [1,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     495 (node: 495) : f32 / [...] / [1,1000]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     input.1 (node: input.1) : u8 / [N,C,H,W] / [1,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     49

In [None]:
# we can see the median average and min for all inferences

In [43]:
!benchmark_app -m resnet50.xml -t 30 -api async

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 83.09 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     input.1 (node: input.1) : f32 / [...] / [1,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     495 (node: 495) : f32 / [...] / [1,1000]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     input.1 (node: input.1) : u8 / [N,C,H,W] / [1,3,224,224]
[ INFO ] Model outputs:
[ INFO ]     49

In [None]:
# we can see the median average and min for all inferences-- and async usually have a slightly higher latency