@tomershafir tomershafir commented Nov 11, 2025

rdar://164394265

tomershafir and others added 11 commits November 11, 2025 10:11
`fmov dX, dY` is not a preferred instruction.

Previously introduced by:
llvm#144152

(cherry-pick ffddf33)
`MI->getOperand(1).getImm()` has already been verified to be 0 entering
the block.

(cherry-pick 769d5c2)
[AArch64] Lower FPR register moves to zero cycle NEON

Lower FPR64, FPR32, FPR16, FPR8 register moves into NEON moves if the
target supports zero cycle move for NEON but not for the narrower
classes.

Adds a subtarget feature, FeatureZCRegMoveFPR128, for querying whether
the target supports zero cycle register moves for FPR128 NEON registers,
and adds it to the appropriate processors.

Includes lowering test cases, and specializes check prefixes.
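
As an illustration of the decision described above, here is a minimal sketch in standalone C++. It is a toy model, not the code in `AArch64InstrInfo::copyPhysReg`: `ToySubtarget`, its fields, and `lowerFPRCopy` are hypothetical names, and the assembly strings are an assumption about how the widened and unwidened moves would print.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the subtarget queries (not LLVM's real API).
struct ToySubtarget {
  bool zeroCycleMoveFPR64;   // assumed flag: 64-bit FP moves are zero cycle
  bool zeroCycleMoveFPR128;  // modeled on FeatureZCRegMoveFPR128
};

enum class FPRWidth { B8, H16, S32, D64 };

// Returns the assembly that a copy from FP register `src` to FP register
// `dst` might print as under this toy model.
std::string lowerFPRCopy(const ToySubtarget &st, FPRWidth width,
                         unsigned dst, unsigned src) {
  const std::string d = std::to_string(dst);
  const std::string s = std::to_string(src);

  // Only the 64-bit class has its own zero cycle flag in this sketch.
  const bool narrowIsZeroCycle =
      width == FPRWidth::D64 && st.zeroCycleMoveFPR64;

  // Widen to a full NEON register move when that is zero cycle and the
  // narrow move is not.
  if (st.zeroCycleMoveFPR128 && !narrowIsZeroCycle)
    return "mov v" + d + ".16b, v" + s + ".16b";

  // Otherwise keep a scalar fmov. Simplification: all classes narrower
  // than 64 bits are shown as an S-register fmov.
  if (width == FPRWidth::D64)
    return "fmov d" + d + ", d" + s;
  return "fmov s" + d + ", s" + s;
}

// Small self-check showing the intended behavior.
inline void checkLowerFPRCopy() {
  assert(lowerFPRCopy({false, true}, FPRWidth::D64, 0, 1) ==
         "mov v0.16b, v1.16b");
  assert(lowerFPRCopy({false, false}, FPRWidth::S32, 2, 3) == "fmov s2, s3");
}
```

The sketch keeps the feature check separate from the instruction choice, which mirrors the idea of gating the widening on a per-register-class feature.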

(cherry-pick 7f9d72a)
Fix incorrect super-register lookup when copying from $wzr on subtargets
that lack zero-cycle zeroing but support 64-bit zero-cycle moves.

When copying from $wzr, we used the wrong register class to look up the
super-register, causing

    $w0 = COPY $wzr

to get expanded as

    $x0 = ORRXrr $xzr, undef $noreg, implicit $wzr,

rather than the correct

    $x0 = ORRXrr $xzr, undef $xzr, implicit $wzr.
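
To make the failure mode concrete, the following standalone C++ sketch models a super-register lookup that is constrained by a register class. It does not use LLVM's API: `RegClass`, `matchingSuperReg`, and the two toy classes are hypothetical. Which classes the original code mixed up is also an assumption here; the sketch relies only on the fact that, in LLVM's AArch64 backend, `GPR64` contains `XZR` but not `SP`, while `GPR64sp` contains `SP` but not `XZR`.

```cpp
#include <cassert>
#include <set>
#include <string>

using RegClass = std::set<std::string>;

// Toy register classes with a handful of members each.
inline RegClass toyGPR64() { return {"$x0", "$x1", "$xzr"}; }
inline RegClass toyGPR64sp() { return {"$x0", "$x1", "$sp"}; }

// Toy lookup: maps a 32-bit register name to its 64-bit super-register
// ("$w0" -> "$x0", "$wzr" -> "$xzr") and returns "$noreg" when the
// result is not a member of the requested class.
std::string matchingSuperReg(const std::string &wReg, const RegClass &rc) {
  if (wReg.size() < 3 || wReg[0] != '$' || wReg[1] != 'w')
    return "$noreg";
  std::string xReg = wReg;
  xReg[1] = 'x';
  return rc.count(xReg) ? xReg : "$noreg";
}

inline void checkMatchingSuperReg() {
  // A class without the zero register cannot yield $xzr.
  assert(matchingSuperReg("$wzr", toyGPR64sp()) == "$noreg");
  // A class that contains the zero register does.
  assert(matchingSuperReg("$wzr", toyGPR64()) == "$xzr");
}
```

In this model, querying with a class that lacks the zero register silently produces `$noreg`, which is the kind of operand seen in the incorrect expansion.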

(cherry-pick b51486f88a7f90719a1e744c828738e717ba5ffc)
This change improves the accuracy of LLVM's model by splitting the
AArch64 subtarget features for zero cycle zeroing per register class.
This matches how the microarchitecture is designed (each register bank
has its own capabilities) and mirrors the earlier improvement to ZCM
modeling.

It splits `HasZeroCycleZeroingGP` to `HasZeroCycleZeroingGPR32` and
`HasZeroCycleZeroingGPR64`, removes opaque `FeatureZCZeroing`, and
infers `FeatureNoZCZeroingFP` to be `FeatureNoZCZeroingFPR64` based on
the single usage in `AArch64AsmPrinter.cpp`.

It also splits `arm64-zero-cycle-zeroing.ll` into two tests, one `-gpr`
and one `-fpr`, as was done for ZCM, to make the tests more focused and
manageable in line with the new modeling.

The test cases are updated as well, exploiting the fact that this is a
refactor patch:

- remove redundant functions that just mix isolated ones (t1-4)
- specialize check prefixes
- replace `apple-a10` with `apple-m1`
- add a `-mtriple=arm64-apple-macosx -mcpu=generic` test case for GPR
- isolate the `-mtriple=arm64-apple-ios -mcpu=cyclone` FP workaround test
case and move `-fullfp16` to another non-workaround test case

(cherry-pick c3c24be)
Lower FPR64, FPR32, FPR16 from `fmov` zeroing into NEON zeroing if the
target supports zero cycle zeroing of NEON registers but not for the
narrower classes.

It handles two cases: one in `AsmPrinter`, where an FP zeroing from an
immediate has been captured by pattern matching during instruction
selection, and a second post-RA in `AArch64InstrInfo::copyPhysReg`, for
uncaptured or later-generated WZR/XZR fmovs.

Adds a subtarget feature, FeatureZCZeroingFPR128, for querying whether
the target supports zero cycle zeroing for FPR128 NEON registers, and
updates the appropriate processors.

(cherry-pick f059d2b)
…m#161138)

Simplifies the code and improves readability.

(cherry-pick 732a366)
…PhysReg (llvm#162826)

This patch uses the RI member variable directly in the member function
AArch64InstrInfo::copyPhysReg, instead of redundant calls to the public
API.

(cherry-pick 6345222)
This patch splits GPR32 and GPR64 zeroing into distinct branches to
simplify the code and improve the lowering.

Zeroing GPR moves are now handled separately from non-zeroing ones.
The zero source registers WZR and XZR do not require the undef,
implicit, and kill register annotations. The non-zeroing path no longer
has to handle WZR, which removes the ternary expression. This patch also
moves the GPR64 logic right after GPR32 for better organization.

(cherry-pick 5ac616f)
Given a GPR32 zeroing instruction, if the target supports zero cycle
zeroing for GPR64 but not for GPR32, widen the instruction to 64 bit
`$xn = MOVZXi 0, 0` instead of writing to `$wn` to exploit zero cycle
zeroing.

It also aligns naming in the generic zeroing test.
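
The widening rule for GPR32 zeroing can be sketched as a small standalone C++ function. This is a toy model under stated assumptions, not the `copyPhysReg` code: `ToyZeroing`, its fields, and `lowerGPR32Zero` are hypothetical, and the fallback shown for targets without any zero cycle zeroing (an `ORRWrr` from `$wzr`) is an assumption. Widening is safe because a write to `Wn` already clears the upper 32 bits of `Xn`, so zeroing `Xn` leaves the same register state.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the subtarget queries (not LLVM's real API).
struct ToyZeroing {
  bool zeroCycleZeroingGPR32;  // modeled on HasZeroCycleZeroingGPR32
  bool zeroCycleZeroingGPR64;  // modeled on HasZeroCycleZeroingGPR64
};

// Returns a MIR-like string for zeroing 32-bit register number `n`.
std::string lowerGPR32Zero(const ToyZeroing &st, unsigned n) {
  const std::string r = std::to_string(n);

  // The 32-bit zeroing is already zero cycle: keep it as is.
  if (st.zeroCycleZeroingGPR32)
    return "$w" + r + " = MOVZWi 0, 0";

  // Only the 64-bit zeroing is zero cycle: widen to the X register.
  if (st.zeroCycleZeroingGPR64)
    return "$x" + r + " = MOVZXi 0, 0";

  // Assumed fallback when neither form is zero cycle.
  return "$w" + r + " = ORRWrr $wzr, $wzr";
}

inline void checkLowerGPR32Zero() {
  assert(lowerGPR32Zero({false, true}, 0) == "$x0 = MOVZXi 0, 0");
  assert(lowerGPR32Zero({true, true}, 0) == "$w0 = MOVZWi 0, 0");
}
```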

(cherry-pick f7585ad)
@tomershafir

Closes rdar://164394265

@tomershafir

@swift-ci please test

@jroelofs

Closes rdar://164394265

recommend putting this in the main comment for the PR so it ends up in the commit message of the merge commit. Also, I'd leave off the "Closes " part so rdar://... starts the line.

@tomershafir

Done

@azharudd

@swift-ci please test

@azharudd

@swift-ci please test llvm

@tomershafir

@swift-ci please test llvm

@tomershafir

@swift-ci please test

@tomershafir tomershafir merged commit c4bf428 into stable/21.x Nov 13, 2025
5 checks passed
@tomershafir tomershafir deleted the cherrypick-aarch64-zcm-zcz-optimizations branch November 13, 2025 07:48