Our group wants to profile BNNs on some ARM64 processors. We tried running ./lce_benchmark_model_aarch64 with --enable_op_profiling set to true. In the resulting report (for QuickNet), many floating-point operators are grouped together as TfLiteXNNPackDelegate. Is there a way to get the inference time for more specific operators like ADD, MUL, CONV_2D, MAX_POOL_2D, etc.? Thank you!
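For completeness, the command we ran looked roughly like this (the model path is a placeholder, and --graph is the standard TFLite benchmark-tool flag for the model file, which lce_benchmark_model inherits):

```sh
# Profile QuickNet with per-op timing enabled.
# quicknet.tflite is a placeholder path.
./lce_benchmark_model_aarch64 \
  --graph=quicknet.tflite \
  --enable_op_profiling=true
```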
At the moment I do not think this is possible; it is simply how XNNPACK behaves: the operators it takes over are replaced by a single delegate node, so the profiler only sees TfLiteXNNPackDelegate rather than the individual ops. However, I just saw this blog post, which indicates that detailed XNNPACK profiling will be available in the next release of TensorFlow. Since Larq Compute Engine is still based on TF 2.9.0, this is not yet supported. We plan to update LCE to TF 2.10.0 within the coming weeks, so this should then be included as well.
The alternative is to disable XNNPACK (see the command-line options for lce_benchmark_model_aarch64), but of course that changes the execution path, so the timings will differ from an XNNPACK-enabled run. It should still give you a rough indication of how much time is spent in each layer.
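A minimal sketch of such a run, assuming the standard TFLite benchmark flag --use_xnnpack to toggle the delegate (the model path is again a placeholder):

```sh
# Run the same benchmark with the XNNPACK delegate disabled, so each
# built-in op (ADD, MUL, CONV_2D, MAX_POOL_2D, ...) appears as its
# own entry in the op-profiling report.
./lce_benchmark_model_aarch64 \
  --graph=quicknet.tflite \
  --use_xnnpack=false \
  --enable_op_profiling=true
```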