11 changes: 11 additions & 0 deletions llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -12564,6 +12564,17 @@ bool AArch64TargetLowering::isOffsetFoldingLegal(

bool AArch64TargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,
bool OptForSize) const {
// If the constant to be materialized is scalar, it may be more efficient to use
// a 'mov + fmov' sequence rather than 'adrp + ldr' on specific CPUs.
// However, when materializing a vector of constants, there are two things to
// note:
// 1. The throughput of the fmov instruction is very low.
// 2. The ldr instruction can load multiple constants in one go. Also, its
// throughput is higher than that of fmov.
Comment on lines +12567 to +12573
Collaborator

Does this say "fmovs limit throughput, loads are great", but then go on to use the fmov version for these CPUs?

Contributor Author

"fmovs limit throughput, loads are great".

We want to be cautious when materializing a vector of constants, so I used "maybe more efficient" to signal that we are being pessimistic here.

if (!VT.isVector() && (Subtarget->getCPU() == "neoverse-v2" ||
Subtarget->getCPU() == "olympus"))
Comment on lines +12574 to +12575
Collaborator

We don't like to add checks like this on the cpu name. It is better to add a subtarget feature for it.

Contributor Author

ok sure

return true;
Collaborator

We would probably want to handle minsize/optsize like below. It would be the Subtarget->hasFuseLiterals that should probably change, being replaced with a new subtarget feature.

Contributor Author

ok sure
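
For illustration, the two review suggestions (gate on a subtarget feature instead of the CPU name, and respect minsize/optsize) could be combined roughly as in the standalone sketch below. It is not the patch's code: `SubtargetSketch`, `HasCheapGPRToFPRImm`, `countMovChunks`, and `preferMovFmov` are hypothetical names, the chunk count is a simplified stand-in for the real immediate expansion, and the limits are placeholder values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the subtarget. HasCheapGPRToFPRImm is an
// invented feature name standing in for the CPU-name check; it is not an
// existing LLVM feature.
struct SubtargetSketch {
  bool HasFuseLiterals = false;
  bool HasCheapGPRToFPRImm = false;
};

// Simplified cost model: one mov/movk per non-zero 16-bit chunk.
// The real lowering uses a smarter immediate expansion.
unsigned countMovChunks(uint64_t Bits) {
  unsigned N = 0;
  for (int Shift = 0; Shift < 64; Shift += 16)
    if ((Bits >> Shift) & 0xFFFF)
      ++N;
  assert(N <= 4 && "a 64-bit value has at most four 16-bit chunks");
  return N == 0 ? 1 : N; // an all-zero pattern still needs one mov
}

// Decide whether a scalar FP immediate should be built with mov/movk + fmov
// rather than loaded from the constant pool. Limits are illustrative only.
bool preferMovFmov(const SubtargetSketch &ST, uint64_t Bits, bool IsVector,
                   bool OptForSize) {
  if (IsVector)
    return false; // vectors keep using the constant pool
  unsigned Limit = 2;
  if (OptForSize)
    Limit = 1; // size-sensitive code gets the tightest budget
  else if (ST.HasCheapGPRToFPRImm || ST.HasFuseLiterals)
    Limit = 4; // tuned cores tolerate the full mov + 3x movk sequence
  return countMovChunks(Bits) <= Limit;
}
```

With the feature flag set, the bit pattern `0x400921FB54442D18` (four non-zero chunks) is accepted when optimizing for speed and rejected under optsize; a pattern such as `0x3FF0000000000000` (one non-zero chunk) is accepted in both cases.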


bool IsLegal = false;
// We can materialize #0.0 as fmov $Rd, XZR for 64-bit, 32-bit cases, and
// 16-bit case when target has full fp16 support.
9 changes: 8 additions & 1 deletion llvm/test/CodeGen/AArch64/misched-fusion-addadrp.ll
@@ -12,7 +12,8 @@
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-n1 | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-v1 | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-n2 | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-v2 | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=neoverse-v2 | FileCheck %s --check-prefix NO-CONST-POOL
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=olympus | FileCheck %s --check-prefix NO-CONST-POOL
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=apple-a16 -mattr=-fuse-literals | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=apple-a17 -mattr=-fuse-literals | FileCheck %s
; RUN: llc %s -o - -mtriple=aarch64-unknown -mcpu=ampere1 -mattr=-fuse-literals | FileCheck %s
@@ -38,6 +39,12 @@ define double @litf() {
; CHECK-LABEL: litf:
; CHECK: adrp [[ADDR:x[0-9]+]], [[CSTLABEL:.LCP.*]]
; CHECK-NEXT: ldr {{d[0-9]+}}, {{[[]}}[[ADDR]], :lo12:[[CSTLABEL]]{{[]]}}
;
; NO-CONST-POOL: mov [[R:x[0-9]+]], #11544
; NO-CONST-POOL: movk [[R]], #21572, lsl #16
; NO-CONST-POOL: movk [[R]], #8699, lsl #32
; NO-CONST-POOL: movk [[R]], #16393, lsl #48
; NO-CONST-POOL: fmov {{d[0-9]+}}, [[R]]
entry:
ret double 0x400921FB54442D18
}
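
The decimal immediates in the NO-CONST-POOL checks are the four 16-bit chunks of the returned double's bit pattern, lowest chunk first. The standalone sketch below reproduces that arithmetic; the helper names `bitsOf`, `splitChunks`, and `checkLitfImmediates` are made up for this illustration and are not part of LLVM.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit IEEE-754 bit pattern.
uint64_t bitsOf(double D) {
  static_assert(sizeof(uint64_t) == sizeof(double), "expects a 64-bit double");
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// Split a 64-bit pattern into 16-bit chunks, lowest chunk first. Chunk i
// corresponds to the immediate used with "lsl #(16 * i)".
std::array<uint16_t, 4> splitChunks(uint64_t Bits) {
  std::array<uint16_t, 4> Chunks{};
  for (unsigned I = 0; I < 4; ++I)
    Chunks[I] = static_cast<uint16_t>(Bits >> (16 * I));
  return Chunks;
}

// Usage: confirm the immediates that appear in the NO-CONST-POOL checks.
void checkLitfImmediates() {
  const auto C = splitChunks(0x400921FB54442D18ULL);
  assert(C[0] == 11544); // 0x2D18, mov
  assert(C[1] == 21572); // 0x5444, movk lsl #16
  assert(C[2] == 8699);  // 0x21FB, movk lsl #32
  assert(C[3] == 16393); // 0x4009, movk lsl #48
}
```

The constant itself is the double closest to pi, so `bitsOf(3.141592653589793)` yields the same `0x400921FB54442D18` pattern that the test returns.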