AArch64: Add fixed format support for ACL primitives #1590
Conversation
Any feedback would be appreciated on this. Also, would it still be possible for a backport to v3.1?
src/common/reorder.cpp
if (!s_mdw.consistent_with(d_mdw)) return invalid_arguments;
// If src and dst are not already consistent, try to reshape them
memory_desc_t reshaped_src_md = *src_md;
auto reshaped_s_mdw = memory_desc_wrapper(&reshaped_src_md);
This change applies to the whole library and changes API behavior.
If you want to have such a feature, we will need an RFC to discuss the available options.
As an alternative option you could uncollapse weights md internally in the implementation before returning to a user (or keep both collapsed and uncollapsed versions for internal use and API calls). This way reorders will work as expected and the change will be isolated to this specific inner product implementation.
That's a fair point; I'll have a think and create an RFC if it still makes sense.
Would you be able to expand on what you mean by this:
As an alternative option you could uncollapse weights md internally in the implementation before returning to a user
I think we need to pass the collapsed memory descriptor back to the user so that they can do the right reorder. The collapsed reorder is different from the non-collapsed one. It's also beneficial: for example, if we have a tensor with H=2, W=2 and IC=3 and we need to pad to 4, then
- 2x2x3 -pad-> 2x2x4 -collapse-> 16
- 2x2x3 -collapse-> 12 -pad-> 12
There is not much time to get it into v3.1 (production release March 23, 2023 (ww12.4'23)); we are at the end of a validation cycle.
Understood, thank you for the feedback @igorsafo!
I should say that all the tests pass now on AArch64.
Remove ACL winograd convolution because it is not compatible with upcoming fixed format kernel changes, is broken, and does not have a clear use case
- Use the fixed format interface for ACL, where we first ask ACL what format it wants the weights in, and then reorder the weights in oneDNN. This has the benefit of us being able to hoist the reorders out at the framework level. It also has the added benefit of allowing the weights to be modified between executions, and is a necessary step towards making the oneDNN-ACL interface stateless.
- Note that this patch causes some fastmath inner products and fastmath NCHW convolutions to fall back to non-fastmath kernels

Co-authored-by: Milos Puzovic <Milos.Puzovic@arm.com>
Thanks for the contribution!
Description
This PR makes the Compute Library for Arm® (ACL) primitives which use GEMM (inner product, matmul, conv) use fixed format kernels. By this we mean that the weights are reordered to be blocked/interleaved in oneDNN rather than ACL, so the kernel needs to have a known and fixed format.
This has multiple positives:
This has a negligible effect on performance overall, because the kernels are similar and the reorders are just happening in a different place.
The broad idea is:
- Ask ACL what `WeightsFormat` it wants the weights in, via the `has_opt_impl` function
- Convert the `WeightsFormat` into a memory descriptor

Things to note:
- This causes `cpu-cnn-inference-f32-cpp` and `cpu-primitives-inner-product-cpp` to fail because inner product now implicitly collapses dimensions. This behaviour is supported by benchdnn, but not these two examples. Would it make sense to modify `reorder_primitive_desc_create` and `consistent_with` to allow collapsing dimensions?

Checklist
General
- Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit? (see comment in body)