forked from llvm/llvm-project
Cherrypick aarch64 zcm zcz optimizations #11793
Merged
tomershafir
merged 12 commits into
stable/21.x
from
cherrypick-aarch64-zcm-zcz-optimizations
Nov 13, 2025
+893
−644
`fmov dX, dY` is not a preferred instruction. Previously introduced by: llvm#144152 (cherry-pick ffddf33)
`MI->getOperand(1).getImm()` has already been verified to be 0 entering the block. (cherry-pick 769d5c2)
[AArch64] Lower FPR register moves to zero cycle NEON
Lower FPR64, FPR32, FPR16, and FPR8 register moves into NEON moves if the target supports zero cycle moves for NEON registers but not for the narrower classes. Adds a subtarget feature called FeatureZCRegMoveFPR128 that makes it possible to query whether the target supports zero cycle register moves for FPR128 NEON registers, and adds it to the appropriate processors. Includes lowering test cases, and specializes check prefixes. (cherry-pick 7f9d72a)
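The selection rule described in this commit can be sketched as a small standalone model. This is not the code from the patch: `MoveModel`, its field names, and `lowerFPR64Copy` are hypothetical, and the exact NEON spelling (`mov vD.16b, vS.16b`) is an assumption made for illustration. Only the FPR64 case is shown.

```cpp
#include <string>

// Hypothetical stand-in for the subtarget queries; these field names are
// illustrative and are not LLVM's real API.
struct MoveModel {
  bool zeroCycleMoveFPR64;   // assumed: the narrow `fmov dX, dY` is zero cycle
  bool zeroCycleMoveFPR128;  // assumed: full-width NEON moves are zero cycle
};

// Picks the assembly text for copying d<src> into d<dst>.
std::string lowerFPR64Copy(const MoveModel &st, unsigned dst, unsigned src) {
  const std::string d = std::to_string(dst);
  const std::string s = std::to_string(src);
  if (!st.zeroCycleMoveFPR64 && st.zeroCycleMoveFPR128) {
    // Widen the copy to the enclosing 128-bit register (assumed spelling).
    return "mov v" + d + ".16b, v" + s + ".16b";
  }
  // Otherwise keep the narrow scalar move.
  return "fmov d" + d + ", d" + s;
}
```

The widening only triggers when the narrow class lacks zero cycle moves and the 128-bit class has them, which matches the "NEON but not the narrower classes" condition in the commit message.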
Fix incorrect super-register lookup when copying from $wzr on subtargets
that lack zero-cycle zeroing but support 64-bit zero-cycle moves.
When copying from $wzr, we used the wrong register class to look up the
super-register, causing
$w0 = COPY $wzr
to get expanded as
$x0 = ORRXrr $xzr, undef $noreg, implicit $wzr,
rather than the correct
$x0 = ORRXrr $xzr, undef $xzr, implicit $wzr.
(cherry-pick b51486f88a7f90719a1e744c828738e717ba5ffc)
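To illustrate how a super-register lookup can come back empty, here is a toy model. It assumes, purely for illustration, that the wrong class was one that does not contain the zero register. `RegClass`, `matchingSuperReg`, and both class constants are hypothetical and much smaller than real register classes; the commit message does not say which classes were involved.

```cpp
#include <optional>
#include <set>
#include <string>

// A register class modeled as a set of register names (illustrative only).
using RegClass = std::set<std::string>;

// Hypothetical class contents; real classes hold many more registers.
const RegClass kClassWithZeroReg = {"x0", "x1", "xzr"};
const RegClass kClassWithoutZeroReg = {"x0", "x1", "sp"};

// Returns the 64-bit register enclosing the 32-bit register `wReg`, but only
// if that register is a member of `cls`. An empty result plays the role of
// $noreg in the expansion quoted above.
std::optional<std::string> matchingSuperReg(const std::string &wReg,
                                            const RegClass &cls) {
  if (wReg.size() < 2 || wReg[0] != 'w')
    return std::nullopt;
  const std::string xReg = "x" + wReg.substr(1);  // "wzr" -> "xzr"
  if (cls.count(xReg) != 0)
    return xReg;
  return std::nullopt;
}
```

In this model, looking up `wzr` in the class without the zero register yields no result, while the same lookup in the class that contains `xzr` succeeds.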
This change improves LLVM's model accuracy by splitting the AArch64 zero cycle zeroing subtarget features per register class. This aligns with how the uarch is designed (each register bank has unique capabilities), and mirrors how ZCM modeling was improved. It splits `HasZeroCycleZeroingGP` into `HasZeroCycleZeroingGPR32` and `HasZeroCycleZeroingGPR64`, removes the opaque `FeatureZCZeroing`, and infers `FeatureNoZCZeroingFP` to be `FeatureNoZCZeroingFPR64` based on its single usage in `AArch64AsmPrinter.cpp`. It also splits `arm64-zero-cycle-zeroing.ll` into two tests, one `-gpr` and one `-fpr`, similarly to ZCM, to make the tests more focused and manageable in correspondence with the new modeling. The test cases are updated as well, exploiting the fact that this is a refactor patch:
- remove redundant functions that just mix isolated ones (t1-4)
- specialize check prefixes
- replace `apple-a10` with `apple-m1`
- add a `-mtriple=arm64-apple-macosx -mcpu=generic` test case for GPR
- isolate the `mtriple=arm64-apple-ios -mcpu=cyclone` FP workaround test case and move `-fullfp16` to another non-workaround test case
(cherry-pick c3c24be)
Lower FPR64, FPR32, and FPR16 `fmov` zeroing into NEON zeroing if the target supports zero cycle zeroing of NEON registers but not of the narrower classes. It handles two cases: first, in `AsmPrinter`, where an FP zeroing from an immediate has been captured by pattern matching during instruction selection; and second, post-RA in `AArch64InstrInfo::copyPhysReg`, for uncaptured or later-generated WZR/XZR fmovs. Adds a subtarget feature called FeatureZCZeroingFPR128 that makes it possible to query whether the target supports zero cycle zeroing for FPR128 NEON registers, and updates the appropriate processors. (cherry-pick f059d2b)
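A rough sketch of the zeroing choice, under stated assumptions: `ZeroingModel`, `FPWidth`, and `lowerFPZeroing` are hypothetical names, the NEON spelling `movi vN.2d, #0` is an assumed form of "NEON zeroing", and the `fmov hN, wzr` fallback assumes half-precision instructions are available. The model follows the wording of the commit message only and does not reproduce either of the two code paths in the patch.

```cpp
#include <string>

enum class FPWidth { H16, S32, D64 };

// Hypothetical stand-in for the subtarget queries; not LLVM's real API.
struct ZeroingModel {
  bool zeroCycleZeroingNarrowFP;  // assumed: narrow FP zeroing is zero cycle
  bool zeroCycleZeroingFPR128;    // assumed: 128-bit NEON zeroing is zero cycle
};

// Picks the assembly text for zeroing FP register number `reg`.
std::string lowerFPZeroing(const ZeroingModel &st, FPWidth width,
                           unsigned reg) {
  const std::string n = std::to_string(reg);
  if (!st.zeroCycleZeroingNarrowFP && st.zeroCycleZeroingFPR128) {
    // Zero the whole vector register (assumed spelling).
    return "movi v" + n + ".2d, #0";
  }
  // Otherwise keep the fmov from the zero register, sized by width.
  switch (width) {
  case FPWidth::H16:
    return "fmov h" + n + ", wzr";  // assumes half-precision support
  case FPWidth::S32:
    return "fmov s" + n + ", wzr";
  case FPWidth::D64:
    return "fmov d" + n + ", xzr";
  }
  return "";  // not reached for valid enum values
}
```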
…PhysReg (llvm#162826) This patch uses the RI member variable directly in the member function AArch64InstrInfo::copyPhysReg, instead of redundant calls to the public API. (cherry-pick 6345222)
This patch pivots GPR32 and GPR64 zeroing into distinct branches to simplify the code and improve the lowering. Zeroing GPR moves are now handled differently from non-zeroing ones. The zero source registers WZR and XZR do not require the undef, implicit, and kill register annotations. The non-zeroing path no longer has to process WZR, which removes the ternary expression. This patch also moves the GPR64 logic right after GPR32 for better organization. (cherry-pick 5ac616f)
Given a GPR32 zeroing instruction, if the target supports zero cycle zeroing for GPR64 but not for GPR32, widen the instruction to 64 bit `$xn = MOVZXi 0, 0` instead of writing to `$wn` to exploit zero cycle zeroing. It also aligns naming in the generic zeroing test. (cherry-pick f7585ad)
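The widening decision might look roughly like the following standalone model. Only the `$xn = MOVZXi 0, 0` form comes from the commit message; `GPRZeroingModel`, `lowerGPR32Zeroing`, and the `MOVZWi` and `ORRWrr` fallbacks are assumptions made for illustration. Widening is safe architecturally because a write to `Wn` already zeroes the upper 32 bits of `Xn`.

```cpp
#include <string>

// Hypothetical stand-in for the per-class zeroing features; not LLVM's API.
struct GPRZeroingModel {
  bool zeroCycleZeroingGPR32;
  bool zeroCycleZeroingGPR64;
};

// Returns MIR-like text for zeroing the 32-bit register $w<reg>.
std::string lowerGPR32Zeroing(const GPRZeroingModel &st, unsigned reg) {
  const std::string n = std::to_string(reg);
  if (st.zeroCycleZeroingGPR32) {
    // Assumed: the 32-bit form is already zero cycle, keep it.
    return "$w" + n + " = MOVZWi 0, 0";
  }
  if (st.zeroCycleZeroingGPR64) {
    // Widen to the 64-bit register to exploit zero cycle zeroing.
    return "$x" + n + " = MOVZXi 0, 0";
  }
  // Assumed fallback: an ordinary move from the zero register.
  return "$w" + n + " = ORRWrr $wzr, $wzr";
}
```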
Author
|
Closes rdar://164394265 |
jcohen-apple
approved these changes
Nov 11, 2025
Author
|
@swift-ci please test |
recommend putting this in the main comment for the PR so it ends up in the commit message of the merge commit. Also, I'd leave off the "Closes " part so |
Author
|
Done |
jroelofs
approved these changes
Nov 11, 2025
|
@swift-ci please test |
|
@swift-ci please test llvm |
Author
|
@swift-ci please test llvm |
Author
|
@swift-ci please test |
Author
|
@swift-ci please test |
rdar://164394265