[ARM CPU] Add ACL FC executor for FP32/FP16 precision #24123
base: master
Conversation
Force-pushed: db355f7 to 0482f4a; f946770 to 9004e85; 7d3ba52 to 4f4e832.
@EgorDuplensky Could you please start the review? Thanks!
Resolved review threads:
- src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/matmul.cpp
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (5 threads, all outdated)
- src/plugins/intel_cpu/src/nodes/executors/fullyconnected_implementations.cpp (3 threads, 2 outdated)
```cpp
        aclMemoryInfoMap[ARG_WEI]->set_tensor_shape(temp_weights_shape);
    }

    tensorsInfoValidateStatus = arm_compute::NEFullyConnectedLayer::validate(
```
Does oneDNN use the weights packing feature for its ACL integration?
https://arm-software.github.io/ComputeLibrary/v23.02.1/classarm__compute_1_1_n_e_fully_connected_layer.xhtml#a19aa329510cbef84acc16335c2099908
Just asking, because if not, we had better try to use it ourselves later.
Discussed: oneDNN does use the has_opt_impl feature (essentially weights packing).
So the oneDNN logic needs to be replicated in the ACLFCExecutor to avoid a performance drop.
We can merge the PR without weights packing support once all the tests pass, but the ACLFCExecutor should stay completely disabled for now.
Force-pushed: 18b2f19 to 3a13983; 0ba1be0 to e0b96ec.
Resolved review threads:
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_common_executor.hpp (outdated)
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (outdated)
```cpp
        OperationType::FullyConnected,
        ShapeTolerance::Agnostic,
        // supports
        [](const FCConfig& config) -> bool {
```
Let's make sure the tests pass and disable the executor for now.
There is no rush to enable it and replace the oneDNN one; we need to make sure we don't introduce degradations first.
@EgorDuplensky I'll disable it once the review is finished.
```cpp
const std::vector<ShapeRelatedParams> IS = {
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, false}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, false}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, true}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, true}},
```
We need to actually complete the test refactoring for the FullyConnected node (add common tests, etc.). Let's do it in the scope of a follow-up PR.
@EgorDuplensky created issue CVS-145273
Resolved review threads:
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (outdated)
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_common_executor.cpp (outdated)
```cpp
    return true;
}

void ACLCommonExecutor::execute(const MemoryArgs &memory) {
```
I propose leaving a TODO noting that importing memory just once, in the scope of the update() method, should actually be enough; it is not working for some reason and should be investigated.
@EgorDuplensky added
The main leftovers are:
- Disable the executor for now and merge it disabled.
- Enable the has_opt_impl logic (to enable weights packing) and run broad performance validation.
- Complete the test refactoring for the FC layer (matmul.cpp).
@maxnick ready for future steps
Force-pushed: 2d07742 to 687468d.
Related tickets: CVS-138509, CVS-137575.