
[InterleavedAccessPass] Get round the unsupported large scalarize vectors #88643

Merged
merged 1 commit into llvm:main from vfdff:PR88247 on Apr 16, 2024

Conversation

vfdff (Contributor) commented on Apr 14, 2024

When building with -msve-vector-bits=512, Subtarget->getMinSVEVectorSizeInBits() returns 512, while MinElts is still 4 for <vscale x 4 x double> in getNumInterleavedAccesses. This produces an invalid llvm.aarch64.sve.ld2.sret.nxv4f64 intrinsic that would need to be split. Unfortunately, the required custom splitting is not supported yet.

Fix #88247
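
For illustration, a minimal standalone model of the arithmetic described above (the helper name, function shape, and rounding here are assumptions for the sketch, not the actual LLVM source):

#include <algorithm>
#include <cstdio>

// Hypothetical model of the NumLoads computation performed by
// AArch64TargetLowering::getNumInterleavedAccesses.
static unsigned numInterleavedAccesses(unsigned MinElts, unsigned ElSizeBits,
                                       unsigned MinVecSizeBits) {
  // Ceiling division: how many registers of MinVecSizeBits are needed to
  // cover the MinElts * ElSizeBits known-minimum bits of the vector type.
  return std::max(1u, (MinElts * ElSizeBits + MinVecSizeBits - 1) /
                          MinVecSizeBits);
}

int main() {
  // <vscale x 4 x double>: 4 elements x 64 bits = 256 known-minimum bits.
  // Taking -msve-vector-bits=512 into account yields a single access,
  // i.e. one invalid ld2 of nxv4f64.
  std::printf("512-bit minimum -> %u load(s)\n",
              numInterleavedAccesses(4, 64, 512));
  // The default 128-bit granule yields two accesses, i.e. two legal
  // ld2s of nxv2f64, which is what the fix restores for scalable types.
  std::printf("128-bit minimum -> %u load(s)\n",
              numInterleavedAccesses(4, 64, 128));
}

Running this prints 1 load for the 512-bit minimum and 2 loads for the 128-bit granule, which is exactly the difference between the invalid nxv4f64 intrinsic and the legal split form.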

llvmbot (Collaborator) commented on Apr 14, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-transforms

Author: Allen (vfdff)

Changes

When building with -msve-vector-bits=512, Subtarget->getMinSVEVectorSizeInBits() returns 512, while MinElts is still 4 for <vscale x 4 x double> in getNumInterleavedAccesses. This produces an invalid llvm.aarch64.sve.ld2.sret.nxv4f64 intrinsic that would need to be split. Unfortunately, the required custom splitting is not supported yet.

Fix #88247


Full diff: https://github.com/llvm/llvm-project/pull/88643.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+3-1)
  • (modified) llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll (+25)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index f552f91929201c..24ce3a8121500d 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -16433,7 +16433,9 @@ bool AArch64TargetLowering::lowerDeinterleaveIntrinsicToLoad(
   if (UseScalable && !VTy->isScalableTy())
     return false;
 
-  unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, UseScalable);
+  // Work around the missing splitting of wide scalable vectors.
+  // TODO: Support custom splitting of wide scalable vectors.
+  unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, false);
 
   VectorType *LdTy =
       VectorType::get(VTy->getElementType(),
diff --git a/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll b/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
index feb22aa1a37635..d0f773a6174d24 100644
--- a/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
+++ b/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
@@ -491,6 +491,31 @@ define void @store_bfloat_factor2(ptr %ptr, <16 x bfloat> %v0, <16 x bfloat> %v1
   ret void
 }
 
+; Use 4 valid llvm.vector.insert.nxv4f64.nxv2f64
+define { <vscale x 4 x double>, <vscale x 4 x double> } @deinterleave_nxptr_factor2(ptr nocapture readonly %ptr) #2 {
+; CHECK-LABEL: define { <vscale x 4 x double>, <vscale x 4 x double> } @deinterleave_nxptr_factor2(
+; CHECK-SAME: ptr nocapture readonly [[PTR:%.*]]) #[[ATTR2]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr <vscale x 2 x double>, ptr [[PTR]], i64 0
+; CHECK-NEXT:    [[LDN1:%.*]] = call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.aarch64.sve.ld2.sret.nxv2f64(<vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), ptr [[TMP1]])
+; CHECK-NEXT:    [[TMP2:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN1]], 0
+; CHECK-NEXT:    [[TMP3:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> poison, <vscale x 2 x double> [[TMP2]], i64 0)
+; CHECK-NEXT:    [[TMP4:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN1]], 1
+; CHECK-NEXT:    [[TMP5:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> poison, <vscale x 2 x double> [[TMP4]], i64 0)
+; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr <vscale x 2 x double>, ptr [[PTR]], i64 2
+; CHECK-NEXT:    [[LDN2:%.*]] = call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.aarch64.sve.ld2.sret.nxv2f64(<vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), ptr [[TMP6]])
+; CHECK-NEXT:    [[TMP7:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN2]], 0
+; CHECK-NEXT:    [[TMP8:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> [[TMP3]], <vscale x 2 x double> [[TMP7]], i64 2)
+; CHECK-NEXT:    [[TMP9:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN2]], 1
+; CHECK-NEXT:    [[TMP10:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> [[TMP5]], <vscale x 2 x double> [[TMP9]], i64 2)
+; CHECK-NEXT:    [[TMP11:%.*]] = insertvalue { <vscale x 4 x double>, <vscale x 4 x double> } poison, <vscale x 4 x double> [[TMP8]], 0
+; CHECK-NEXT:    [[TMP12:%.*]] = insertvalue { <vscale x 4 x double>, <vscale x 4 x double> } [[TMP11]], <vscale x 4 x double> [[TMP10]], 1
+; CHECK-NEXT:    ret { <vscale x 4 x double>, <vscale x 4 x double> } [[TMP12]]
+;
+  %wide.vec = load <vscale x 8 x double>, ptr %ptr, align 8
+  %ldN = tail call { <vscale x 4 x double>, <vscale x 4 x double> } @llvm.experimental.vector.deinterleave2.nxv8f64(<vscale x 8 x double> %wide.vec)
+  ret { <vscale x 4 x double>, <vscale x 4 x double> } %ldN
+}
+
 attributes #0 = { vscale_range(2,2) "target-features"="+sve" }
 attributes #1 = { vscale_range(2,4) "target-features"="+sve" }
 attributes #2 = { vscale_range(4,4) "target-features"="+sve" }

vfdff force-pushed the PR88247 branch 2 times, most recently from 86f56f5 to 5fd76f1, on April 15, 2024 09:34
unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, UseScalable);
// Work around the missing splitting of wide scalable vectors.
// TODO: Support custom splitting of wide scalable vectors.
unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, false);
paulwalker-arm (Collaborator) commented:
This looks like it would break the SVE fixed-length support. Looking at the code, I think changing:
if (UseScalable)
to
if (UseScalable && isa<FixedVectorType>(VecTy))
should do the trick, given getMinSVEVectorSizeInBits() is not relevant for scalable vector types and the default 128 value should be maintained.
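
Concretely, a sketch of how that suggestion could land in getNumInterleavedAccesses() (the surrounding function body is assumed from context rather than quoted from the tree; only the isa<FixedVectorType> guard comes from the comment above):

unsigned AArch64TargetLowering::getNumInterleavedAccesses(
    VectorType *VecTy, const DataLayout &DL, bool UseScalable) const {
  unsigned VecSize = 128;
  unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
  unsigned MinElts = VecTy->getElementCount().getKnownMinValue();
  // getMinSVEVectorSizeInBits() (set via -msve-vector-bits) only constrains
  // fixed-length vectors; scalable types keep the default 128-bit granule so
  // wide types such as <vscale x 4 x double> are still split into legal parts.
  if (UseScalable && isa<FixedVectorType>(VecTy))
    VecSize = std::max(Subtarget->getMinSVEVectorSizeInBits(), 128u);
  return std::max<unsigned>(1, (MinElts * ElSize + 127) / VecSize);
}

With this guard, the <vscale x 4 x double> case keeps VecSize at 128 and NumLoads comes out as 2, matching the two nxv2f64 ld2 calls in the test.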

vfdff (Contributor, Author) replied:

Applied your comment, thanks @paulwalker-arm.

paulwalker-arm (Collaborator) commented:

Sorry, I should have been clearer. I meant the code within getNumInterleavedAccesses() needs to be changed because that is where the bug exists.

vfdff (Contributor, Author) replied:

Done, thanks

@@ -491,6 +491,30 @@ define void @store_bfloat_factor2(ptr %ptr, <16 x bfloat> %v0, <16 x bfloat> %v1
ret void
}

; Use 4 valid llvm.vector.insert.nxv4f64.nxv2f64
Collaborator commented:

This is effectively a negative test, so perhaps something like "Ensure vscale_range property does not affect scalable vector types."?

vfdff (Contributor, Author) replied:

Updated the comment, thanks.

[InterleavedAccessPass] Get round the unsupported large scalarize vectors

When building with -msve-vector-bits=512, Subtarget->getMinSVEVectorSizeInBits() returns 512, while MinElts is still 4 for <vscale x 4 x double> in getNumInterleavedAccesses. This produces an invalid llvm.aarch64.sve.ld2.sret.nxv4f64 intrinsic that would need to be split. Unfortunately, the required custom splitting is not supported yet.
vfdff merged commit 8aa7e37 into llvm:main on Apr 16, 2024
4 checks passed

Successfully merging this pull request may close these issues:

[AArch64] error in backend: Do not know how to split the result of this operator (#88247)