We recommend using the test tool "pplnn" to benchmark the x86 architecture.
This chapter only covers benchmarking the x86 architecture with pplnn. To benchmark the cuda architecture, please refer to: Cuda Benchmark Tool.
For instructions on compiling pplnn, please refer to: building-from-source.md.
The x86 engine uses OpenMP as its thread pool, so if you need to test multi-threaded performance, compile with the -DHPCC_USE_OPENMP=ON option as below:
./build.sh -DHPCC_USE_OPENMP=ON
The pplnn binary will be generated at: ./pplnn-build/tools/pplnn
pplnn can either generate random data or read external data as network input:
- For networks such as classification/semantic segmentation, the execution speed of the network is independent of the input values, so you can benchmark with randomly generated data.
- For networks such as detection/instance segmentation, the execution speed of the network may depend on the input values, so it is recommended to feed real data.
When pplnn reads external data, it is recommended to use the --reshaped-inputs option to specify the test data files. Under this option, pplnn requires the test data to be in binary format (you can use numpy's tofile function to dump an array as raw binary). Each input tensor of the model needs a separate test data file. The test data file naming convention is:
<tensor_name>-<input_shape>-<data_type>.dat
- <tensor_name>: corresponds to the name of the input tensor in the onnx model, such as: "input"
- <input_shape>: shape of the model input tensor, with '_' as the separator, such as: 1_3_224_224
- <data_type>: data type of the test file; currently supports fp64|fp32|fp16|int32|int64|bool
For example, if the input tensor in the onnx model is named "input", its shape is (1, 3, 224, 224), and its data type is float32, then the test data file should be named:
input-1_3_224_224-fp32.dat
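For instance, a real preprocessed input can be dumped in this format with numpy's tofile. This is a minimal sketch; preprocessed_input.npy is a hypothetical array saved earlier by your own preprocessing step, not something pplnn provides:

# load a preprocessed array (hypothetical file) and write it as raw binary
python3 -c "import numpy as np; np.load('preprocessed_input.npy').astype(np.float32).tofile('input-1_3_224_224-fp32.dat')"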
pplnn's run options related to the x86 architecture benchmark are:
- --onnx-model: specify the tested onnx model file
- --in-shapes: specify the input tensor shapes
- --mm-policy: memory management policy; "mem" means less memory usage, and "perf" means more aggressive memory optimization. Default is "mem"
- --enable-profiling: enable profiling. Default is false
- --min-profiling-time: specify the minimum duration of the benchmark in seconds. Default is 1s
- --warmuptimes: specify the number of warm-up runs. Default is 0
- --disable-avx512: disable the avx512 instruction set. Default is false
- --core-binding: enable core binding. Default is false
When the compilation specifies -DHPCC_USE_OPENMP=ON, the environment variable OMP_NUM_THREADS can be used to specify the number of threads:
export OMP_NUM_THREADS=8 # use 8 threads
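The variable can also be set inline for a single run (a sketch; combine it with the benchmark options described above):

OMP_NUM_THREADS=8 ./pplnn --onnx-model <onnx_model> --enable-profiling  # run one benchmark with 8 threads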
Here is an example of using random test data for benchmark:
./pplnn --onnx-model <onnx_model> \ # specify onnx model
--mm-policy mem \ # use "mem" memory management policy
--enable-profiling \ # enable profiling
--min-profiling-time 10 \ # benchmark lasts at least 10s
--warmuptimes 5 \ # warm up 5 times
--core-binding \ # enable core binding
--disable-avx512 # disable avx512 instruction set
pplnn will automatically generate random test data according to the input tensor shapes of the model.
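If the onnx model has dynamic input shapes, the --in-shapes option listed above can be used to fix the shapes for random data generation. A sketch, assuming the model's input is dynamic and that dims are separated by '_' as in the file naming convention:

./pplnn --onnx-model <onnx_model> \   # specify onnx model
        --in-shapes 1_3_224_224 \     # fix the (assumed dynamic) input shape
        --enable-profiling \          # enable profiling
        --min-profiling-time 10       # benchmark lasts at least 10s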
The external test data format requirements are described in section 2.1.
You can use the following command for benchmark:
./pplnn --onnx-model <onnx_model> \ # specify onnx model
--reshaped-inputs input-1_3_224_224-fp32.dat \ # specify input test data file
--mm-policy mem \ # use "mem" memory management policy
--enable-profiling \ # enable profiling
--min-profiling-time 10 \ # benchmark lasts at least 10s
--warmuptimes 5 # warm up 5 times
When there are multiple inputs, the files passed to --reshaped-inputs are separated by commas ',', as shown below.
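For example, for a hypothetical model with two inputs (the tensor names, shapes, and data types here are assumptions for illustration):

./pplnn --onnx-model <onnx_model> \                                           # specify onnx model
        --reshaped-inputs input0-1_3_224_224-fp32.dat,input1-1_64-int64.dat \ # comma-separated test data files
        --enable-profiling \                                                  # enable profiling
        --min-profiling-time 10                                               # benchmark lasts at least 10s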