Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] New subtarget features to control ldp and stp formation, fo… #66098

Merged
merged 1 commit into from
Sep 14, 2023

Conversation

manosanaggh
Copy link
Contributor

@manosanaggh manosanaggh commented Sep 12, 2023

…cused on ampere1 and ampere1a.

On some AArch64 cores, including Ampere's ampere1 and ampere1a architectures, load and store pair instructions are faster compared to simple loads/stores only when the alignment of the pair is at least twice that of the individual element being loaded.

Based on that, this patch introduces four new subtarget features, two for controlling ldp and two for controlling stp, to cover the ampere1 and ampere1a alignment needs and to enable optional fine-grained control over ldp and stp generation in general. The latter can be utilized by another cpu, if there are possible benefits
with a different policy than the default provided by the compiler.

More specifically, for each of the ldp and stp respectively we have:

  • disable-ldp/disable-stp: Do not emit ldp/stp.
  • ldp-aligned-only/stp-aligned-only: Emit ldp/stp only if the source pointer is aligned to at least double the alignment of the type.

Therefore, for -mcpu=ampere1 and -mcpu=ampere1a
ldp-aligned-only/stp-aligned-only become the defaults, because of the benefit from the alignment, whereas for the rest of the cpus the default behaviour of the compiler is maintained.

@manosanaggh
Copy link
Contributor Author

This is a new version, which was requested here: https://reviews.llvm.org/D159480

@manosanaggh
Copy link
Contributor Author

manosanaggh commented Sep 12, 2023

@ptomsich @davemgreen

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for moving this over. It looks sensible to me.

llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp Outdated Show resolved Hide resolved
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp Outdated Show resolved Hide resolved
llvm/test/CodeGen/AArch64/disable-ldp-ampere1.ll Outdated Show resolved Hide resolved
@manosanaggh
Copy link
Contributor Author

Thanks for moving this over. It looks sensible to me.

Thank you too for the tips! I am wondering if these x64 regressions in the checks are a thing or false positives, but I am going to have a real quick test tomorrow in a x64 machine.

@manosanaggh manosanaggh force-pushed the ldp-stp-aligned-only-ampere1-series branch from b08b40f to df192f7 Compare September 13, 2023 07:07
@manosanaggh
Copy link
Contributor Author

manosanaggh commented Sep 13, 2023

Thanks for moving this over. It looks sensible to me.

Thank you too for the tips! I am wondering if these x64 regressions in the checks are a thing or false positives, but I am going to have a real quick test tomorrow in a x64 machine.

After looking at that log generated by the ci, I realize I have to switch to debug build on x64 for that. Should be a test-issue.

@manosanaggh manosanaggh force-pushed the ldp-stp-aligned-only-ampere1-series branch from df192f7 to 487a01d Compare September 13, 2023 19:10
@manosanaggh
Copy link
Contributor Author

manosanaggh commented Sep 13, 2023

The regression on my test case, caught by the ci, also happens with the upstream llvm (main branch) and has nothing to do with my changes. I see it is trigerred only for -mcpu=ampere1/ampere1a with the right test-case. Machine instruction Scheduler pass causes it on an assertion enabled with debug build. I should probably file a report on it.

@manosanaggh manosanaggh force-pushed the ldp-stp-aligned-only-ampere1-series branch from 487a01d to f48c624 Compare September 13, 2023 20:12
@manosanaggh
Copy link
Contributor Author

Filed a report for the crash on: #66328

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed that the check lines are still a bit off. If you fix that then this LGTM. Thanks

llvm/test/CodeGen/AArch64/ldp-stp-control-features.ll Outdated Show resolved Hide resolved
…cused on ampere1 and ampere1a.

On some AArch64 cores, including Ampere's ampere1 and ampere1a
architectures, load and store pair instructions are faster compared
to simple loads/stores only when the alignment of the pair is at least
twice that of the individual element being loaded.

Based on that, this patch introduces four new subtarget features,
two for controlling ldp and two for controlling stp, to cover
the ampere1 and ampere1a alignment needs and to enable optional
fine-grained control over ldp and stp generation in general.
The latter can be utilized by another cpu, if there are possible
benefits
with a different policy than the default provided by the compiler.

More specifically, for each of the ldp and stp respectively we have:

- disable-ldp/disable-stp: Do not emit ldp/stp.
- ldp-aligned-only/stp-aligned-only: Emit ldp/stp only if the source
pointer is aligned to at least double the alignment of the type.

Therefore, for -mcpu=ampere1 and -mcpu=ampere1a
ldp-aligned-only/stp-aligned-only become the defaults because,
of the benefit from the alignment, whereas for the rest
of the cpus the default behaviour of the compiler is maintained.
@manosanaggh manosanaggh force-pushed the ldp-stp-aligned-only-ampere1-series branch from 651ce7c to 2cf4f1d Compare September 14, 2023 12:24
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM

@manosanaggh
Copy link
Contributor Author

Thanks LGTM

Thank you too!

@davemgreen
Copy link
Collaborator

There is hopefully a fix for the issues with the ampere1 model in #66384

@manosanaggh I didn't seem to be able to add you as a reviewer, but it would be good if you could take a look. Thanks

@ptomsich ptomsich merged commit 008f26b into llvm:main Sep 14, 2023
1 of 2 checks passed
@manosanaggh
Copy link
Contributor Author

There is hopefully a fix for the issues with the ampere1 model in #66384

@manosanaggh I didn't seem to be able to add you as a reviewer, but it would be good if you could take a look. Thanks

Thanks, for addressing this quickly. I am going to have a look soon.

Yes, you can't add me as a reviewer because of me not being a reviewer (not having rights in the project).

kstoimenov pushed a commit to kstoimenov/llvm-project that referenced this pull request Sep 14, 2023
…vm#66098)

On some AArch64 cores, including Ampere's ampere1 and ampere1a
architectures, load and store pair instructions are faster compared to
simple loads/stores only when the alignment of the pair is at least
twice that of the individual element being loaded.

Based on that, this patch introduces four new subtarget features, two
for controlling ldp and two for controlling stp, to cover the ampere1
and ampere1a alignment needs and to enable optional fine-grained control
over ldp and stp generation in general. The latter can be utilized by
another cpu, if there are possible benefits
with a different policy than the default provided by the compiler.

More specifically, for each of the ldp and stp respectively we have:

- disable-ldp/disable-stp: Do not emit ldp/stp.
- ldp-aligned-only/stp-aligned-only: Emit ldp/stp only if the source
pointer is aligned to at least double the alignment of the type.

Therefore, for -mcpu=ampere1 and -mcpu=ampere1a
ldp-aligned-only/stp-aligned-only become the defaults, because of the
benefit from the alignment, whereas for the rest of the cpus the default
behaviour of the compiler is maintained.
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
…vm#66098)

On some AArch64 cores, including Ampere's ampere1 and ampere1a
architectures, load and store pair instructions are faster compared to
simple loads/stores only when the alignment of the pair is at least
twice that of the individual element being loaded.

Based on that, this patch introduces four new subtarget features, two
for controlling ldp and two for controlling stp, to cover the ampere1
and ampere1a alignment needs and to enable optional fine-grained control
over ldp and stp generation in general. The latter can be utilized by
another cpu, if there are possible benefits
with a different policy than the default provided by the compiler.

More specifically, for each of the ldp and stp respectively we have:

- disable-ldp/disable-stp: Do not emit ldp/stp.
- ldp-aligned-only/stp-aligned-only: Emit ldp/stp only if the source
pointer is aligned to at least double the alignment of the type.

Therefore, for -mcpu=ampere1 and -mcpu=ampere1a
ldp-aligned-only/stp-aligned-only become the defaults, because of the
benefit from the alignment, whereas for the rest of the cpus the default
behaviour of the compiler is maintained.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants