
[InterleavedAccessPass] Get round the unsupported large scalarize vectors #88643

Merged
merged 1 commit into llvm:main from vfdff:PR88247 on Apr 16, 2024

Conversation

vfdff (Contributor) commented on Apr 14, 2024

When building with -msve-vector-bits=512, Subtarget->getMinSVEVectorSizeInBits() returns 512, while MinElts is still 4 for <vscale x 4 x double> in getNumInterleavedAccesses. This produces an invalid llvm.aarch64.sve.ld2.sret.nxv4f64 intrinsic that would need to be split. Unfortunately, the required custom splitting is not supported yet.

Fix #88247
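
For illustration, a minimal standalone model of the arithmetic described above (the helper name, function shape, and rounding here are assumptions for the sketch, not the actual LLVM source):

#include <algorithm>
#include <cstdio>

// Hypothetical model of the NumLoads computation performed by
// AArch64TargetLowering::getNumInterleavedAccesses.
static unsigned numInterleavedAccesses(unsigned MinElts, unsigned ElSizeBits,
                                       unsigned MinVecSizeBits) {
  // Ceiling division: how many registers of MinVecSizeBits are needed to
  // cover the MinElts * ElSizeBits known-minimum bits of the vector type.
  return std::max(1u, (MinElts * ElSizeBits + MinVecSizeBits - 1) /
                          MinVecSizeBits);
}

int main() {
  // <vscale x 4 x double>: 4 elements x 64 bits = 256 known-minimum bits.
  // Taking -msve-vector-bits=512 into account yields a single access,
  // i.e. one invalid ld2 of nxv4f64.
  std::printf("512-bit minimum -> %u load(s)\n",
              numInterleavedAccesses(4, 64, 512));
  // The default 128-bit granule yields two accesses, i.e. two legal
  // ld2s of nxv2f64, which is what the fix restores for scalable types.
  std::printf("128-bit minimum -> %u load(s)\n",
              numInterleavedAccesses(4, 64, 128));
}

Running this prints 1 load for the 512-bit minimum and 2 loads for the 128-bit granule, which is exactly the difference between the invalid nxv4f64 intrinsic and the legal split form.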

llvmbot (Collaborator) commented on Apr 14, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-transforms

Author: Allen (vfdff)

Changes

When building with -msve-vector-bits=512, Subtarget->getMinSVEVectorSizeInBits() returns 512, while MinElts is still 4 for <vscale x 4 x double> in getNumInterleavedAccesses. This produces an invalid llvm.aarch64.sve.ld2.sret.nxv4f64 intrinsic that would need to be split. Unfortunately, the required custom splitting is not supported yet.

Fix #88247


Full diff: https://github.com/llvm/llvm-project/pull/88643.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+3-1)
  • (modified) llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll (+25)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index f552f91929201c..24ce3a8121500d 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -16433,7 +16433,9 @@ bool AArch64TargetLowering::lowerDeinterleaveIntrinsicToLoad(
   if (UseScalable && !VTy->isScalableTy())
     return false;
 
-  unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, UseScalable);
+  // Work around the missing splitting of wide scalable vectors.
+  // TODO: Support custom splitting of wide scalable vectors.
+  unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, false);
 
   VectorType *LdTy =
       VectorType::get(VTy->getElementType(),
diff --git a/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll b/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
index feb22aa1a37635..d0f773a6174d24 100644
--- a/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
+++ b/llvm/test/Transforms/InterleavedAccess/AArch64/sve-interleaved-accesses.ll
@@ -491,6 +491,31 @@ define void @store_bfloat_factor2(ptr %ptr, <16 x bfloat> %v0, <16 x bfloat> %v1
   ret void
 }
 
+; Use 4 valid llvm.vector.insert.nxv4f64.nxv2f64
+define { <vscale x 4 x double>, <vscale x 4 x double> } @deinterleave_nxptr_factor2(ptr nocapture readonly %ptr) #2 {
+; CHECK-LABEL: define { <vscale x 4 x double>, <vscale x 4 x double> } @deinterleave_nxptr_factor2(
+; CHECK-SAME: ptr nocapture readonly [[PTR:%.*]]) #[[ATTR2]] {
+; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr <vscale x 2 x double>, ptr [[PTR]], i64 0
+; CHECK-NEXT:    [[LDN1:%.*]] = call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.aarch64.sve.ld2.sret.nxv2f64(<vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), ptr [[TMP1]])
+; CHECK-NEXT:    [[TMP2:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN1]], 0
+; CHECK-NEXT:    [[TMP3:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> poison, <vscale x 2 x double> [[TMP2]], i64 0)
+; CHECK-NEXT:    [[TMP4:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN1]], 1
+; CHECK-NEXT:    [[TMP5:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> poison, <vscale x 2 x double> [[TMP4]], i64 0)
+; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr <vscale x 2 x double>, ptr [[PTR]], i64 2
+; CHECK-NEXT:    [[LDN2:%.*]] = call { <vscale x 2 x double>, <vscale x 2 x double> } @llvm.aarch64.sve.ld2.sret.nxv2f64(<vscale x 2 x i1> shufflevector (<vscale x 2 x i1> insertelement (<vscale x 2 x i1> poison, i1 true, i64 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer), ptr [[TMP6]])
+; CHECK-NEXT:    [[TMP7:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN2]], 0
+; CHECK-NEXT:    [[TMP8:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> [[TMP3]], <vscale x 2 x double> [[TMP7]], i64 2)
+; CHECK-NEXT:    [[TMP9:%.*]] = extractvalue { <vscale x 2 x double>, <vscale x 2 x double> } [[LDN2]], 1
+; CHECK-NEXT:    [[TMP10:%.*]] = call <vscale x 4 x double> @llvm.vector.insert.nxv4f64.nxv2f64(<vscale x 4 x double> [[TMP5]], <vscale x 2 x double> [[TMP9]], i64 2)
+; CHECK-NEXT:    [[TMP11:%.*]] = insertvalue { <vscale x 4 x double>, <vscale x 4 x double> } poison, <vscale x 4 x double> [[TMP8]], 0
+; CHECK-NEXT:    [[TMP12:%.*]] = insertvalue { <vscale x 4 x double>, <vscale x 4 x double> } [[TMP11]], <vscale x 4 x double> [[TMP10]], 1
+; CHECK-NEXT:    ret { <vscale x 4 x double>, <vscale x 4 x double> } [[TMP12]]
+;
+  %wide.vec = load <vscale x 8 x double>, ptr %ptr, align 8
+  %ldN = tail call { <vscale x 4 x double>, <vscale x 4 x double> } @llvm.experimental.vector.deinterleave2.nxv8f64(<vscale x 8 x double> %wide.vec)
+  ret { <vscale x 4 x double>, <vscale x 4 x double> } %ldN
+}
+
 attributes #0 = { vscale_range(2,2) "target-features"="+sve" }
 attributes #1 = { vscale_range(2,4) "target-features"="+sve" }
 attributes #2 = { vscale_range(4,4) "target-features"="+sve" }

vfdff force-pushed the PR88247 branch 2 times, most recently from 86f56f5 to 5fd76f1, on April 15, 2024 09:34
unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, UseScalable);
// Work around the missing splitting of wide scalable vectors.
// TODO: Support custom splitting of wide scalable vectors.
unsigned NumLoads = getNumInterleavedAccesses(VTy, DL, false);
paulwalker-arm (Collaborator) commented:
This looks like it would break the SVE fixed-length support. Looking at the code, I think changing:
if (UseScalable)
to
if (UseScalable && isa<FixedVectorType>(VecTy))
should do the trick, given getMinSVEVectorSizeInBits() is not relevant for scalable vector types and the default 128 value should be maintained.
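
Concretely, a sketch of how that suggestion could land in getNumInterleavedAccesses() (the surrounding function body is assumed from context rather than quoted from the tree; only the isa<FixedVectorType> guard comes from the comment above):

unsigned AArch64TargetLowering::getNumInterleavedAccesses(
    VectorType *VecTy, const DataLayout &DL, bool UseScalable) const {
  unsigned VecSize = 128;
  unsigned ElSize = DL.getTypeSizeInBits(VecTy->getElementType());
  unsigned MinElts = VecTy->getElementCount().getKnownMinValue();
  // getMinSVEVectorSizeInBits() (set via -msve-vector-bits) only constrains
  // fixed-length vectors; scalable types keep the default 128-bit granule so
  // wide types such as <vscale x 4 x double> are still split into legal parts.
  if (UseScalable && isa<FixedVectorType>(VecTy))
    VecSize = std::max(Subtarget->getMinSVEVectorSizeInBits(), 128u);
  return std::max<unsigned>(1, (MinElts * ElSize + 127) / VecSize);
}

With this guard, the <vscale x 4 x double> case keeps VecSize at 128 and NumLoads comes out as 2, matching the two nxv2f64 ld2 calls in the test.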

vfdff (Contributor, Author) replied:

Applied your comment, thanks @paulwalker-arm.

paulwalker-arm (Collaborator) commented:

Sorry, I should have been clearer. I meant the code within getNumInterleavedAccesses() needs to be changed because that is where the bug exists.

vfdff (Contributor, Author) replied:

Done, thanks

@@ -491,6 +491,30 @@ define void @store_bfloat_factor2(ptr %ptr, <16 x bfloat> %v0, <16 x bfloat> %v1
ret void
}

; Use 4 valid llvm.vector.insert.nxv4f64.nxv2f64
Collaborator commented:

This is effectively a negative test, so perhaps something like "Ensure vscale_range property does not affect scalable vector types."?

vfdff (Contributor, Author) replied:

Updated the comment, thanks.

[InterleavedAccessPass] Get round the unsupported large scalarize vectors

When building with -msve-vector-bits=512, Subtarget->getMinSVEVectorSizeInBits() returns 512, while MinElts is still 4 for <vscale x 4 x double> in getNumInterleavedAccesses. This produces an invalid llvm.aarch64.sve.ld2.sret.nxv4f64 intrinsic that would need to be split. Unfortunately, the required custom splitting is not supported yet.
vfdff merged commit 8aa7e37 into llvm:main on Apr 16, 2024
4 checks passed

Successfully merging this pull request may close these issues:

[AArch64] error in backend: Do not know how to split the result of this operator (#88247)