[ARM CPU] Add ACL FC executor for FP32/FP16 precision #24123
base: master
Conversation
Force-pushed: db355f7 to 0482f4a; f946770 to 9004e85; 7d3ba52 to 4f4e832.
@EgorDuplensky Could you please start the review? Thanks!
Resolved review threads:
- src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/matmul.cpp
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (5 threads, all outdated)
- src/plugins/intel_cpu/src/nodes/executors/fullyconnected_implementations.cpp (3 threads, 2 outdated)
```cpp
        aclMemoryInfoMap[ARG_WEI]->set_tensor_shape(temp_weights_shape);
    }

    tensorsInfoValidateStatus = arm_compute::NEFullyConnectedLayer::validate(
```
Does oneDNN use the weights packing feature for its ACL integration?
https://arm-software.github.io/ComputeLibrary/v23.02.1/classarm__compute_1_1_n_e_fully_connected_layer.xhtml#a19aa329510cbef84acc16335c2099908
Just asking, because if not, we had better try to use it ourselves later.
Discussed: oneDNN does use the has_opt_impl feature (essentially weights packing).
So the oneDNN logic needs to be replicated in the ACLFCExecutor to avoid a performance drop.
We can merge the PR without weights packing support once all the tests pass, but the ACLFCExecutor should stay completely disabled for now.
Force-pushed: 18b2f19 to 3a13983; 0ba1be0 to e0b96ec.
Resolved review threads:
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_common_executor.hpp (outdated)
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (outdated)
```cpp
        OperationType::FullyConnected,
        ShapeTolerance::Agnostic,
        // supports
        [](const FCConfig& config) -> bool {
```
Let's make sure the tests pass and disable the executor for now.
There is no rush to enable it and replace the oneDNN one; we need to make sure we don't introduce degradations first.
@EgorDuplensky I'll disable it once the review is finished.
```cpp
const std::vector<ShapeRelatedParams> IS = {
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, false}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, false}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, true}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, true}},
```
We need to actually complete the test refactoring for the FullyConnected node (add common tests, etc.). Let's do it in the scope of a follow-up PR.
@EgorDuplensky created issue CVS-145273
Resolved review threads:
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (outdated)
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_common_executor.cpp (outdated)
```cpp
    return true;
}

void ACLCommonExecutor::execute(const MemoryArgs &memory) {
```
I propose leaving a TODO noting that importing memory just once, in the scope of the update() method, should actually be enough; it is not working for some reason and should be investigated.
@EgorDuplensky added
The main leftovers are:
- Disable the executor for now and merge it disabled.
- Enable the has_opt_impl logic (to enable weights packing) and run broad performance validation.
- Complete the test refactoring for the FC layer (matmul.cpp).
@maxnick ready for future steps
Force-pushed: 2d07742 to 687468d.
Related tickets: CVS-138509, CVS-137575.