rfc: prototype integration with ArmCL #795

diaena · 2020-07-31T15:41:06Z

The goal for this RFC is to demonstrate an approach for providing optimized implementations of machine learning operators in oneDNN, on AArch64, based on the open-source Arm Compute Library.

Rendered document

Others involved: @nSircombe @cfRod @alenik01

vpirogov

Thank you for the detailed RFC, @diaena. I added a few questions regarding documentation, validation, and third-party components.

vpirogov · 2020-08-03T16:45:49Z

rfcs/20200731-Using-Arm-Compute-Library-kernels-on-AArch64/README.md

+- Step 1: introduce CMake changes to allow inclusion of ArmCL as a dependency;
+- Step 2: changes to `src/cpu/platform.hpp` to support new AARCH64 macros;
+- Step 3: addition of ArmCL based implementation into `src/cpu/aarch64` and
+integration into `src/cpu/cpu_convolution_list.cpp`.


Please add changes to documentation (README.md and doc/*) to cover new dependencies and build options.

Yes, no problem. We will include a docs update in the implementation plan.
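For readers following the three steps quoted in this thread, here is a minimal sketch of how a compile-time switch can gate an ACL-backed entry in an implementation list, with the reference kernel as the fallback. The `DEMO_*` macros, the entry names, and the `supports` checks are all hypothetical and chosen for illustration; they are not the macros or list format used in `src/cpu/platform.hpp` or `src/cpu/cpu_convolution_list.cpp`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for platform macros (not oneDNN's real names).
#if defined(__aarch64__)
#define DEMO_AARCH64 1
#else
#define DEMO_AARCH64 0
#endif

// Assumption: the build system would define this to 1 when ACL is found.
#ifndef DEMO_USE_ACL
#define DEMO_USE_ACL 0
#endif

// Hypothetical description of what the caller asks for.
struct conv_request_t {
    bool is_f32;
    bool forward;
};

// One entry in the implementation list: a name plus an applicability check.
struct impl_entry_t {
    std::string name;
    bool (*supports)(const conv_request_t &);
};

inline bool acl_supports(const conv_request_t &r) {
    return r.is_f32 && r.forward; // illustrative restriction only
}

inline bool ref_supports(const conv_request_t &) {
    return true; // the reference kernel accepts everything
}

// Entries are ordered by priority; the ACL entry is only compiled in
// when both the architecture and the build option allow it.
inline std::vector<impl_entry_t> make_impl_list() {
    std::vector<impl_entry_t> list;
#if DEMO_AARCH64 && DEMO_USE_ACL
    list.push_back({"acl_gemm_conv", acl_supports});
#endif
    list.push_back({"ref_conv", ref_supports});
    return list;
}

// Returns the name of the first implementation that accepts the request.
inline std::string select_impl(const conv_request_t &r) {
    for (const auto &e : make_impl_list()) {
        if (e.supports(r)) return e.name;
    }
    assert(false && "the reference entry should always match");
    return "";
}
```

With `DEMO_USE_ACL` left at its default of 0, `select_impl` always falls through to `ref_conv`, which mirrors the idea that a build without the optional dependency behaves exactly as before.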

rfcs/20200731-Using-Arm-Compute-Library-kernels-on-AArch64/README.md

emfomenk · 2020-08-03T17:32:06Z

Hijacked @kawakami-k's comment from a neighbor thread:

Fujitsu plans to upstream JIT kernels for aarch64 to oneDNN v1.6.
In the next few days, I will be putting out an RFC regarding the implementation of AArch64.

I expect there will be some intersection between the RFCs (e.g. the directory structure, maybe some compiler knobs, etc.). So, @kawakami-k, please review this RFC too.

igorsafo

Thank you for the very detailed RFC. Taking into account @vpirogov's remarks, the RFC LGTM.

kawakami-k · 2020-08-06T06:33:21Z

I don't see any part of this RFC #795 that would cause inconsistency with Fujitsu's RFC, which will be posted soon.
Thank you.

emfomenk

Thank you for the very informative RFC and reference to the PoC implementation!

The proposal looks good to me. There is a minor issue with the implementation you described below, specifically keeping a modifiable object (`acl_data_`) in the immutable primitive descriptor or primitive objects (which we have already discussed offline). But since this is an implementation detail, I think this doesn't affect the RFC per se.
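To illustrate the concern about mutable state inside an immutable primitive, the sketch below keeps the configuration `const` inside the primitive and moves everything that changes during execution into a separate state object owned by the caller. All type and member names here are hypothetical; this is one possible way to separate the two, not the approach the PoC or oneDNN actually takes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mutable library-side objects
// (for example tensors and a configured function).
struct acl_like_state_t {
    std::vector<float> workspace;
    std::size_t runs = 0;
};

// Hypothetical immutable configuration fixed at creation time.
struct demo_config_t {
    std::size_t workspace_elems;
    float scale;
};

class demo_primitive_t {
public:
    explicit demo_primitive_t(demo_config_t cfg) : cfg_(cfg) {}

    // Each caller (for example each thread or stream) gets its own state.
    acl_like_state_t create_state() const {
        acl_like_state_t s;
        s.workspace.resize(cfg_.workspace_elems);
        return s;
    }

    // execute() is const: it reads the configuration and writes only to
    // the caller-provided state and destination buffer.
    void execute(acl_like_state_t &state, const std::vector<float> &src,
            std::vector<float> &dst) const {
        assert(src.size() <= state.workspace.size());
        dst.resize(src.size());
        for (std::size_t i = 0; i < src.size(); ++i) {
            state.workspace[i] = src[i] * cfg_.scale;
            dst[i] = state.workspace[i];
        }
        ++state.runs;
    }

private:
    const demo_config_t cfg_;
};
```

Because the primitive itself never changes after construction, one instance could be shared between callers, while each `acl_like_state_t` stays private to whoever created it.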


Summary: Since version 1.6, oneDNN has provided limited support for AArch64 builds. This minor change is to detect an AArch64 CPU and permit the use of `USE_MKLDNN` in that case. Build flags for oneDNN are also modified accordingly.

Note: oneDNN on AArch64, by default, will use oneDNN's reference C++ kernels. These are not optimised for AArch64, but oneDNN v1.7 onwards provides support for a limited set of primitives based on Arm Compute Library. See: oneapi-src/oneDNN#795 and: oneapi-src/oneDNN#820 for more details. Support for ACL-based oneDNN primitives in PyTorch will require some further modification,

Fixes #{issue number}

Pull Request resolved: #50400
Reviewed By: izdeby
Differential Revision: D25886589
Pulled By: malfet
fbshipit-source-id: 2c81277a28ad4528c2d2211381e7c6692d952bc1

This PR adds a PReLU primitive which makes use of Compute Library for the Arm® architecture (ACL), optimised for AArch64 targets. Only the f32 data type is supported, and the primitive can only be used in forward mode. The implementation follows a similar approach to oneapi-src#1281, including measures to avoid the parallelisation of small workloads. It builds upon the approach originally introduced in PR oneapi-src#820 and RFC oneapi-src#795. The primitive offers a speedup of at least ~10x for large tensors (~1 million f32 values) and performs comparably for very small tensors.

Co-authored-by: Louis Kaplan <louis.kaplan@arm.com>

- [X] Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit?
- [X] Have you formatted the code using clang-format?
- [X] Have you submitted performance data that demonstrates performance improvements?
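As a rough illustration of the two ideas in that description, the sketch below shows a plain f32 PReLU forward pass together with a simple heuristic that avoids splitting small workloads across threads. The cut-off `kMinElemsPerThread`, the function names, and the per-element weights layout are assumptions made for this example; the actual primitive delegates to ACL and may use a different policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cut-off, chosen for illustration only.
constexpr std::size_t kMinElemsPerThread = 4096;

// Use one thread unless every thread would get a reasonably large chunk.
inline std::size_t pick_num_threads(
        std::size_t n_elems, std::size_t max_threads) {
    const std::size_t useful = n_elems / kMinElemsPerThread;
    return std::max<std::size_t>(1, std::min(useful, max_threads));
}

// Forward PReLU: dst = src if src > 0, otherwise src * weight.
// Assumes one weight per element to keep the sketch short.
inline void prelu_forward_f32(const std::vector<float> &src,
        const std::vector<float> &weights, std::vector<float> &dst,
        std::size_t max_threads) {
    assert(src.size() == weights.size());
    const std::size_t n = src.size();
    dst.resize(n);

    const std::size_t nthr = pick_num_threads(n, max_threads);
    const std::size_t chunk = (n + nthr - 1) / nthr;

    // A real build would hand each chunk to a threading runtime. Here the
    // chunks are processed in a loop to keep the sketch dependency-free.
    for (std::size_t t = 0; t < nthr; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        for (std::size_t i = begin; i < end; ++i) {
            dst[i] = src[i] > 0.0f ? src[i] : src[i] * weights[i];
        }
    }
}
```

With these assumed numbers, a tensor of 100 elements stays on a single thread, while one million elements would be split across all available threads.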


emfomenk added the RFC A design document label Jul 31, 2020

rfc: prototype integration with ArmCL

1aded51

vpirogov reviewed Aug 3, 2020


igorsafo approved these changes Aug 4, 2020


emfomenk mentioned this pull request Aug 12, 2020

JIT code generation for AArch64 #804

Closed

emfomenk approved these changes Aug 17, 2020


emfomenk merged commit a1d0f81 into oneapi-src:rfcs Aug 17, 2020

diaena mentioned this pull request Aug 24, 2020

Aarch64 GEMM convolution by integration with Arm Compute Library #820

Merged

4 tasks

alenik01 mentioned this pull request Nov 12, 2020

Adding Arm Compute Library-based Winograd convolution #886

Merged

5 tasks

diaena deleted the arm-rfc branch November 13, 2020 16:19

diaena mentioned this pull request Nov 13, 2020

src: cpu: aarch64: Add s8s8s8 support for ACL-based GEMM Convolution #889

Merged

2 tasks

nSircombe mentioned this pull request Jan 11, 2021

Enables build with oneDNN (MKL-DNN) on AArch64 pytorch/pytorch#50400

Closed

joeramsay mentioned this pull request Feb 4, 2021

cpu: aarch64: Add support for ACL-based indirect convolution #973

Merged

3 tasks

alenik01 mentioned this pull request Feb 12, 2021

src: cpu: aarch64: add sum+relu post-ops support on AArch64 #979

Merged

3 tasks

alenik01 mentioned this pull request Jun 24, 2021

src: cpu: aarch64: add support for ACL-based inner product #1103

Merged

4 tasks

This was referenced Jul 29, 2021

src: cpu: aarch64: add support for 'sum+act' post-ops #1127

Merged

src: cpu: aarch64: add support for standalone activations #1131

Merged

This was referenced Sep 20, 2021

src: cpu: aarch64: matmul: add support for matmul on aarch64 #1158

Merged

[rls-v2.4] src: cpu: aarch64: matmul: add support for matmul on aarch64 #1161

Merged

jondea mentioned this pull request Oct 15, 2021

Softmax acl #1175

Merged

6 tasks

jondea mentioned this pull request Feb 17, 2022

cpu: aarch64: add ACL binary primitive #1281

Merged

3 tasks

lb991 mentioned this pull request Apr 26, 2022

cpu: aarch64: add ACL PReLU primitive #1357

Merged

3 tasks

nSircombe mentioned this pull request Mar 5, 2023

Error when build oneDNN with Arm Compute Library 19.08 #1575

Closed
