Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64][TTI] Improve LegalVF when computing gather-loads cost #69617

Conversation

antoniofrighetto
Copy link
Contributor

After determining the cost of loads that could not be coalesced into VectorizedLoads, computing the cost of a gather-vectorized load is carried out. Take into account the case where the type of a group of loads, whose type is a vector of size dependent upon VF, may be legalized into a scalar value.

Current number of loads is held in LT.first.

Fixes: #68953.

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 19, 2023

@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Antonio Frighetto (antoniofrighetto)

Changes

After determining the cost of loads that could not be coalesced into VectorizedLoads, computing the cost of a gather-vectorized load is carried out. Take into account the case where the type of a group of loads, whose type is a vector of size dependent upon VF, may be legalized into a scalar value.

Current number of loads is held in LT.first.

Fixes: #68953.


Full diff: https://github.com/llvm/llvm-project/pull/69617.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+8-1)
  • (added) llvm/test/Transforms/SLPVectorizer/AArch64/gather-load-fp128.ll (+37)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index d8a0e68d7123759..40ed3e6d327d882 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2989,7 +2989,14 @@ InstructionCost AArch64TTIImpl::getGatherScatterOpCost(
       ElementCount::getScalable(1))
     return InstructionCost::getInvalid();
 
-  ElementCount LegalVF = LT.second.getVectorElementCount();
+  ElementCount LegalVF;
+  if (LT.second.isVector()) {
+    LegalVF = LT.second.getVectorElementCount();
+  } else {
+    // If the legalized type is a simple type, treat it as a 1-element vector.
+    LegalVF = ElementCount::getFixed(1);
+  }
+
   InstructionCost MemOpCost =
       getMemoryOpCost(Opcode, VT->getElementType(), Alignment, 0, CostKind,
                       {TTI::OK_AnyValue, TTI::OP_None}, I);
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/gather-load-fp128.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/gather-load-fp128.ll
new file mode 100644
index 000000000000000..84de45bde06a62f
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/gather-load-fp128.ll
@@ -0,0 +1,37 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -passes=slp-vectorizer -mtriple=aarch64-unknown-linux-gnu -mcpu=neoverse-512tvb -pass-remarks-output=%t.yaml < %s | FileCheck %s
+; RUN: cat %t.yaml | FileCheck -check-prefix=REMARK %s
+
+; REMARK-LABEL: Function:        gather_load_fp128
+; REMARK:       Args:
+; REMARK-NEXT:    - String:          'List vectorization was possible but not beneficial with cost '
+; REMARK-NEXT:    - Cost:            '0'
+; REMARK-NEXT:    - String:          ' >= '
+; REMARK-NEXT:    - Treshold:        '0'
+
+define void @gather_load_fp128(ptr %arg) #0 {
+; CHECK-LABEL: @gather_load_fp128(
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr i8, ptr [[ARG:%.*]], i64 16
+; CHECK-NEXT:    [[LOAD0:%.*]] = load fp128, ptr [[ARG]], align 1
+; CHECK-NEXT:    [[LOAD1:%.*]] = load fp128, ptr [[GEP]], align 1
+; CHECK-NEXT:    [[LOAD2:%.*]] = load fp128, ptr null, align 1
+; CHECK-NEXT:    [[LOAD3:%.*]] = load fp128, ptr null, align 1
+; CHECK-NEXT:    [[FCMP0:%.*]] = fcmp oeq fp128 [[LOAD0]], 0xL00000000000000000000000000000000
+; CHECK-NEXT:    [[FCMP1:%.*]] = fcmp oeq fp128 [[LOAD1]], 0xL00000000000000000000000000000000
+; CHECK-NEXT:    [[FCMP2:%.*]] = fcmp oeq fp128 [[LOAD2]], 0xL00000000000000000000000000000000
+; CHECK-NEXT:    [[FCMP3:%.*]] = fcmp oeq fp128 [[LOAD3]], 0xL00000000000000000000000000000000
+; CHECK-NEXT:    ret void
+;
+  %gep = getelementptr i8, ptr %arg, i64 16
+  %load0 = load fp128, ptr %arg, align 1
+  %load1 = load fp128, ptr %gep, align 1
+  %load2 = load fp128, ptr null, align 1
+  %load3 = load fp128, ptr null, align 1
+  %fcmp0 = fcmp oeq fp128 %load0, 0xL0
+  %fcmp1 = fcmp oeq fp128 %load1, 0xL0
+  %fcmp2 = fcmp oeq fp128 %load2, 0xL0
+  %fcmp3 = fcmp oeq fp128 %load3, 0xL0
+  ret void
+}
+
+attributes #0 = { vscale_range(2,2) }

Comment on lines 2992 to 2999
ElementCount LegalVF;
if (LT.second.isVector()) {
LegalVF = LT.second.getVectorElementCount();
} else {
// If the legalized type is a simple type, treat it as a 1-element vector.
LegalVF = ElementCount::getFixed(1);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall isLegalMaskedGather() just return false for such types instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May that be the case though? Currently, few lines above, if useNeonVector is available, the cost is computed via BaseT::getGatherScatterOpCost before type-legalization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the vector type gets scalarized (like in your check), isLegalMaskedGather() shall return false for such types, even though useNeonVector() returns true. SLP shall not assume emitting masked gather here at all, the scalarization increases the cost significantly, we should end up with the simple gather here

@@ -0,0 +1,37 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -S -passes=slp-vectorizer -mtriple=aarch64-unknown-linux-gnu -mcpu=neoverse-512tvb -pass-remarks-output=%t.yaml < %s | FileCheck %s
; RUN: cat %t.yaml | FileCheck -check-prefix=REMARK %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is testing the remarks necessary, along with the output?

; REMARK-NEXT: - String: ' >= '
; REMARK-NEXT: - Treshold: '0'

define void @gather_load_fp128(ptr %arg) #0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a variant with i128 too?

@@ -2989,7 +2989,14 @@ InstructionCost AArch64TTIImpl::getGatherScatterOpCost(
ElementCount::getScalable(1))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be worth treating !LT.second.isVector() the same as 1 x scalable vectors. Either that or something should be checking that the element type isElementTypeLegalForScalableVector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we bail out when isLegalMaskedGather returns false, which indeed would happen with such types, – as per @alexey-bataev suggestion above – I don't think we would ever be reaching this point.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't look like that function was called when I tried it, and it is best if these functions work correctly in isolation. It's possible to call it throught the cost model: https://godbolt.org/z/z4635vodq.

I think that because the gather will not become a scalable SVE gather in this case, the line above with useNeonVector should be returning true so we use the BaseT::getGatherScatterOpCost for the cost.

@antoniofrighetto
Copy link
Contributor Author

Addressed reviews, thanks!

@@ -2996,6 +2996,9 @@ static unsigned getSVEGatherScatterOverhead(unsigned Opcode) {
InstructionCost AArch64TTIImpl::getGatherScatterOpCost(
unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask,
Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I) {
if (!isLegalMaskedGatherScatter(DataTy))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't sound right - to return an invalid cost. If the gather/scatter can be scalarized then it should have some cost, even if it's quite high. There are some scalable vectors that can't be scalarized, but in general for fixed-width it should be OK.

Is the isElementTypeLegalForScalableVector check below enough to fix the issues you were seeing? We should probably have some fixed-length cost checks too.

If not, I think doing this would make sense, so that it returned the base cost:

if (useNeonVector(DataTy) || !isLegalMaskedGatherScatter(DataTy))
  return BaseT::getGatherScatterOpCost(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought we wanted to favour an invalid cost in those cases, but makes sense. I confirm isElementTypeLegalForScalableVector is fixing the issues previously seen.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this LGTM. Thanks for the fix!

After determining the cost of loads that could not be coalesced into
`VectorizedLoads` in SLP, computing the cost of a gather-vectorized
load is carried out. Favour a potentially high valid cost when the
type of a group of loads, whose type is a vector of size dependent
upon `VF`, may be legalized into a scalar value.

Fixes: llvm#68953.
@antoniofrighetto antoniofrighetto force-pushed the feature/slp_aarch64_tti_improve_vf branch from c828e0d to 138e6c1 Compare October 27, 2023 18:25
@antoniofrighetto antoniofrighetto merged commit 138e6c1 into llvm:main Oct 27, 2023
2 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SLPVectorizer][AArch64][SVE] Opt crashed when optimizing
4 participants