Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARM CPU] Add ACL FC executor for FP32/FP16 precision #24123

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

allnes
Copy link
Contributor

@allnes allnes commented Apr 18, 2024

@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Apr 18, 2024
@allnes allnes added platform: arm OpenVINO on ARM / ARM64 no_stale Do not mark as stale labels Apr 23, 2024
@allnes allnes marked this pull request as ready for review May 13, 2024 13:33
@allnes allnes requested review from a team as code owners May 13, 2024 13:33
@allnes allnes force-pushed the an/fc_acl_executor branch 2 times, most recently from f946770 to 9004e85 Compare May 14, 2024 14:38
@dmitry-gorokhov dmitry-gorokhov added this to the 2024.3 milestone May 22, 2024
@dmitry-gorokhov
Copy link
Contributor

@EgorDuplensky Could you please start the review? Thanks!

@allnes allnes requested a review from eshoguli June 17, 2024 17:58
@allnes allnes requested a review from alvoron June 19, 2024 08:40
aclMemoryInfoMap[ARG_WEI]->set_tensor_shape(temp_weights_shape);
}

tensorsInfoValidateStatus = arm_compute::NEFullyConnectedLayer::validate(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not oneDNN use weights packing feature for ACL integration?
https://arm-software.github.io/ComputeLibrary/v23.02.1/classarm__compute_1_1_n_e_fully_connected_layer.xhtml#a19aa329510cbef84acc16335c2099908
Just asking.
Because, if not, later we better to try to use it by ourselves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed. oneDNN does use has_opt_impl feature (basically weights packing).
So, the oneDNN logic needs to be replicated for ACLFCExecutor to ensure no performance drop.
We can merge the PR with no weights packing support, as soon as all the tests are passed, but completely disable the ACLFCExecutor for now.

@allnes allnes force-pushed the an/fc_acl_executor branch 2 times, most recently from 18b2f19 to 3a13983 Compare June 21, 2024 19:41
@allnes allnes force-pushed the an/fc_acl_executor branch 2 times, most recently from 0ba1be0 to e0b96ec Compare June 25, 2024 14:13
OperationType::FullyConnected,
ShapeTolerance::Agnostic,
// supports
[](const FCConfig& config) -> bool {
Copy link
Contributor

@EgorDuplensky EgorDuplensky Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's ensure the tests are passed and disable the executor for now.
There is no rush to enable it and replace the oneDNN one.
We need to make sure we don't have degradations first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorDuplensky I'll disable it when review will be ended

Comment on lines +95 to +99
const std::vector<ShapeRelatedParams> IS = {
{static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, false}},
{static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, false}},
{static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, true}},
{static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, true}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to actually complete the tests refactoring for FullyConnected node (to add common tests, etc). Let's do it in scope of a follow up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorDuplensky created issue CVS-145273

@allnes allnes requested a review from maxnick June 26, 2024 20:07
@allnes allnes requested a review from EgorDuplensky July 1, 2024 14:48
return true;
}

void ACLCommonExecutor::execute(const MemoryArgs &memory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to leave a todo regarding the fact, that actually it should be enough to import_memory just once in scope of "update()" method, but it is not working for some reason and should be investigated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@EgorDuplensky EgorDuplensky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main leftovers are:

  1. Disable the executor for now and merge it disabled
  2. Enable has_opt_impl logic (to enable weights packing), run broad performance validation.
  3. Complete the tests refactoring for FC layer (matmul.cpp)

@maxnick ready for future steps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: CPU OpenVINO CPU plugin no_stale Do not mark as stale platform: arm OpenVINO on ARM / ARM64
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants