
AArch64: Add fixed format support for ACL primitives #1590

Merged
4 commits merged into oneapi-src:master on Apr 11, 2023

Conversation

@jondea (Contributor) commented on Mar 13, 2023

Description

This PR makes the Compute Library for Arm® (ACL) primitives which use GEMM (inner product, matmul, convolution) use fixed format kernels. By this we mean that the weights are reordered to be blocked/interleaved in oneDNN rather than in ACL, so the kernel used needs to have a known and fixed format.
This has multiple positives:

  • It fits better with the oneDNN philosophy of separating the primitive and reorder
  • Weights can be variable, because the reorder is performed at the oneDNN/framework level
  • And therefore, most importantly: primitives can be properly cached in TensorFlow without checking the content of the weights (which is very hacky)

This has a negligible effect on performance overall, because the kernels are similar and the reorders are just happening in a different place.

The broad idea is (a user-level sketch in code follows this list):

  • Ask ACL what WeightsFormat it wants the weights in, via the has_opt_impl function
  • Convert the resulting WeightsFormat into a memory descriptor
  • Get oneDNN to reorder by modifying the weights memory descriptor
  • Set the strides on the weights TensorInfo to reflect the reordered layout
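
From the framework's point of view, this surfaces through oneDNN's standard "any format" flow. The sketch below is illustrative only (an assumed user-level usage, not code from this patch): a primitive descriptor built with format_tag::any reports the blocked layout the fixed format kernel expects, and the plain weights are reordered into it once, up front.

    #include <oneapi/dnnl/dnnl.hpp>
    using namespace dnnl;

    int main() {
        engine eng(engine::kind::cpu, 0);
        stream s(eng);

        memory::dims src_dims = {2, 16}, wei_dims = {8, 16}, dst_dims = {2, 8};

        // Plain layouts that the framework holds its data in.
        auto src_md = memory::desc(src_dims, memory::data_type::f32, memory::format_tag::nc);
        auto dst_md = memory::desc(dst_dims, memory::data_type::f32, memory::format_tag::nc);
        // format_tag::any lets the implementation pick the blocked/interleaved
        // layout of the fixed format kernel.
        auto wei_any_md = memory::desc(wei_dims, memory::data_type::f32, memory::format_tag::any);

        auto ip_pd = inner_product_forward::primitive_desc(eng,
                prop_kind::forward_inference, src_md, wei_any_md, dst_md);

        // Framework-held plain weights, and a buffer in the layout the
        // implementation asked for.
        auto wei_plain = memory({wei_dims, memory::data_type::f32, memory::format_tag::oi}, eng);
        auto wei_opt = memory(ip_pd.weights_desc(), eng);

        // One-off reorder into the kernel's fixed format: this is the reorder
        // that previously happened inside ACL.
        if (ip_pd.weights_desc() != wei_plain.get_desc())
            reorder(wei_plain, wei_opt).execute(s, wei_plain, wei_opt);
        s.wait();
        return 0;
    }

Because the reorder is now a separate, cacheable step, the framework can hoist it out of the execution loop, which is what enables the TensorFlow caching mentioned above.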

Things to note:

  • This patch causes the examples cpu-cnn-inference-f32-cpp and cpu-primitives-inner-product-cpp to fail because inner product now implicitly collapses dimensions (a sketch of the collapse follows this list). This behaviour is supported by benchdnn, but not by these two examples. Would it make sense to modify reorder_primitive_desc_create and consistent_with to allow collapsing dimensions?
  • We have also removed wino because it is not yet compatible with this approach, and it is also failing for some unrelated reasons. Due to the existing heuristics, we do not see it used in any models currently, but we may return to this later with a working implementation and more effective heuristics.
  • This patch causes some NCHW fastmath convolutions to no longer be supported; we are working on a fix, but these are rarely used.
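
For concreteness, here is a minimal sketch (assumed shapes, not code from the patch) of the implicit dimension collapse in question: a dense 4D weights descriptor viewed as 2D via memory::desc::reshape, which only succeeds when the collapsed dimensions are dense and contiguous. That is the same consistency question raised above for reorder_primitive_desc_create and consistent_with.

    #include <oneapi/dnnl/dnnl.hpp>
    using namespace dnnl;

    int main() {
        // Dense, plain 4D weights {OC, IC, H, W}.
        auto wei_4d = memory::desc({8, 3, 2, 2}, memory::data_type::f32,
                memory::format_tag::oihw);
        // The collapsed 2D view {OC, IC*H*W} that inner product now works
        // with internally. reshape() throws if the collapsed dimensions
        // are not dense and contiguous.
        auto wei_2d = wei_4d.reshape({8, 3 * 2 * 2});
        (void)wei_2d;
        return 0;
    }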

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit? (see comment in body)
  • Have you formatted the code using clang-format?

@jondea (Contributor, Author) commented on Mar 15, 2023

Any feedback would be appreciated on this. Also, would it still be possible for a backport to v3.1?

src/cpu/aarch64/acl_inner_product.hpp (review thread):

    if (!s_mdw.consistent_with(d_mdw)) return invalid_arguments;
    // If src and dst are not already consistent, try to reshape them
    memory_desc_t reshaped_src_md = *src_md;
    auto reshaped_s_mdw = memory_desc_wrapper(&reshaped_src_md);
@igorsafo (Contributor) commented on Mar 16, 2023

This change applies to the whole library and changes API behavior. If you want to have such a feature, we will need an RFC to discuss available options.

As an alternative option you could uncollapse weights md internally in the implementation before returning to a user (or keep both collapsed and uncollapsed versions for internal use and API calls). This way reorders will work as expected and the change will be isolated to this specific inner product implementation.

@jondea (Contributor, Author) replied:
That's a fair point, I'll have a think and create an RFC if it still makes sense.

Would you be able to expand on what you mean by this:

> As an alternative option you could uncollapse weights md internally in the implementation before returning to a user

I think we need to pass the collapsed memory descriptor back to the user so that they can do the right reorder. The collapsed reorder is different from the non-collapsed one. It is also beneficial: for example, if we have a tensor with H=2, W=2 and IC=3 and we need to pad IC to a multiple of 4, then (see the arithmetic sketch after this list)

  • 2x2x3 -pad-> 2x2x4 -collapse-> 16
  • 2x2x3 -collapse-> 12 -pad-> 12 (12 is already a multiple of 4, so nothing is padded)
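
A tiny arithmetic sketch of why the two orders disagree (illustrative only; rnd_up here is a local helper, not the oneDNN utility):

    #include <cstdio>

    // Round n up to the nearest multiple of `multiple`.
    static int rnd_up(int n, int multiple) {
        return ((n + multiple - 1) / multiple) * multiple;
    }

    int main() {
        const int H = 2, W = 2, IC = 3, mult = 4;
        // Pad IC first, then collapse: 2*2*4 = 16 elements.
        const int pad_then_collapse = H * W * rnd_up(IC, mult);
        // Collapse first, then pad: 2*2*3 = 12 is already a multiple of 4,
        // so padding changes nothing and the two layouts disagree.
        const int collapse_then_pad = rnd_up(H * W * IC, mult);
        std::printf("%d vs %d\n", pad_then_collapse, collapse_then_pad); // 16 vs 12
        return 0;
    }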

src/cpu/aarch64/acl_convolution_utils.cpp (resolved review thread)
@igorsafo (Contributor) replied:

> Any feedback would be appreciated on this. Also, would it still be possible for a backport to v3.1?

There is not much time to get it into v3.1 (production release March 23, 2023 (ww12.4'23)); we are at the end of a validation cycle.
For this to happen we need to make sure:

  • all tests pass on your side
  • the changes are isolated to the AArch64 implementation

@jondea (Contributor, Author) commented on Mar 20, 2023

Understood, thank you for the feedback @igorsafo!

@jondea (Contributor, Author) commented on Mar 29, 2023

I should say that all the tests now pass on AArch64.

jondea and others added 3 commits March 30, 2023 10:22
Remove ACL winograd convolution because it is not compatible with
upcoming fixed format kernel changes, is broken, and does not have a
clear use case
- Use the fixed format interface for ACL, where we first ask ACL what
  format it wants the weights in, and then reorder the weights in
  oneDNN. This has the benefit of us being able to hoist the reorders
  out to the framework level. It also has the added benefit of allowing
  the weights to be modified between executions, and is a necessary step
  towards making the oneDNN-ACL interface stateless.
- Note that this patch causes some fastmath inner products and fastmath
  NCHW convolutions to fall back to non-fastmath kernels

Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com>
@mgouicem merged commit f62a91b into oneapi-src:master on Apr 11, 2023
@mgouicem (Contributor) commented:
Thanks for the contribution!
