Add FeatureFuseLiterals as SubTargetFeature for Grace and Olympus #160257
@@ -12564,6 +12564,17 @@ bool AArch64TargetLowering::isOffsetFoldingLegal(

bool AArch64TargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,
                                         bool OptForSize) const {
  // If the constant to be materialized is scalar, it may be more efficient to
  // use a sequence of 'mov + fmov' rather than 'adrp + ldr' on specified CPUs.
  // However, when materializing a vector of constants, there are two things to
  // note:
  // 1. Throughput of the fmov instruction is very low.
  // 2. The ldr instruction can load multiple constants in one go. Also, its
  //    throughput is higher compared to fmov.
  if (!VT.isVector() && (Subtarget->getCPU() == "neoverse-v2" ||
                         Subtarget->getCPU() == "olympus"))
Comment on lines +12574 to +12575
We don't like to add checks like this on the CPU name. It is better to add a subtarget feature for it.
ok sure
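The reviewer's point can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `MockSubtarget`, `makeSubtarget`, and the two `preferMovFmov*` helpers are stand-ins invented for illustration, not the real `AArch64Subtarget` class or any LLVM API. The mapping from CPU name to feature bit simply mirrors the two names that appear in the diff.

```cpp
#include <string>

// Hypothetical stand-in for a subtarget; NOT the real AArch64Subtarget.
struct MockSubtarget {
  std::string CPU;
  bool FuseLiterals = false; // models a feature bit such as FeatureFuseLiterals
  bool hasFuseLiterals() const { return FuseLiterals; }
};

// Hypothetical factory: the feature bit is derived once, at the place where
// the CPU is defined, instead of at every use site.
inline MockSubtarget makeSubtarget(const std::string &CPU) {
  MockSubtarget ST;
  ST.CPU = CPU;
  ST.FuseLiterals = (CPU == "neoverse-v2" || CPU == "olympus");
  return ST;
}

// Pattern the reviewer objects to: string comparison on the CPU name inside
// the lowering logic. Every new CPU needs another comparison here.
inline bool preferMovFmovByName(const MockSubtarget &ST, bool IsVector) {
  return !IsVector && (ST.CPU == "neoverse-v2" || ST.CPU == "olympus");
}

// Pattern the reviewer asks for: the lowering logic queries a feature and
// stays unchanged when further CPUs opt in to it.
inline bool preferMovFmovByFeature(const MockSubtarget &ST, bool IsVector) {
  return !IsVector && ST.hasFuseLiterals();
}
```

Both helpers give the same answer for the CPUs in the diff; the difference is that the feature-based version keeps the list of CPUs out of the lowering code.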
    return true;
We would probably want to handle minsize/optsize like below. It would be the Subtarget->hasFuseLiterals that should probably change, being replaced with a new subtarget feature.
ok sure
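One way to read this suggestion is sketched below: rather than returning early, the scalar case goes through a single instruction-count limit that tightens when optimizing for size and relaxes when the feature is present. This is an assumption about the shape of the check, not a copy of the LLVM code. The helper names and the limit values 1, 2, and 5 are illustrative placeholders.

```cpp
// Hypothetical sketch: a single limit on the length of the mov(+movk)
// sequence used to build a scalar FP immediate before the final fmov.
// The numeric limits are illustrative placeholders, not values from LLVM.
inline unsigned movSequenceLimit(bool OptForSize, bool HasFeature) {
  if (OptForSize)
    return 1;                // size-optimized code: allow only a short sequence
  return HasFeature ? 5 : 2; // feature present: tolerate a longer sequence
}

// Hypothetical legality check: the immediate is treated as "legal" (built
// with mov + fmov instead of adrp + ldr) only if the sequence fits the limit.
inline bool isScalarImmLegalSketch(unsigned NumMovInsns, bool OptForSize,
                                   bool HasFeature) {
  return NumMovInsns <= movSequenceLimit(OptForSize, HasFeature);
}
```

With this shape, minsize/optsize functions keep the compact behavior even on CPUs that have the feature, which an unconditional early `return true` would bypass.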

  bool IsLegal = false;
  // We can materialize #0.0 as fmov $Rd, XZR for 64-bit, 32-bit cases, and
  // 16-bit case when target has full fp16 support.

Does this say "fmovs limit throughput, loads are great", but then go on to use the fmov version for these CPUs?
We want to be cautious when we are materializing a vector of constants. So, I have used "maybe more efficient" to describe that we are pessimistic here.