[VPlan] Allow truncation for lanes in VPScalarIVStepsRecipe #175268

Merged
googlewalt merged 1 commit into llvm:main from boomanaiden154:vplan-wrapping-1-9-26 on Jan 10, 2026
Conversation

@boomanaiden154
Contributor

@boomanaiden154 boomanaiden154 commented Jan 10, 2026

VPScalarIVStepsRecipe relies on APInt truncation to vectorize blocks whose vector width exceeds the maximum value that some of their (changing) operand types can hold (e.g., an i1 input with a vector width of 4). Reenable implicit truncation in ConstantInt::get() to cover this case.

Remove the helper function: it was only called in one place, and removing it prevents it from accidentally being used elsewhere, where implicit truncation is probably not wanted.

This fixes another case, with the same stack trace, that we saw after acb78bd did not fully fix the issue. We still want to keep lane constants as unsigned.

Somewhat similar to 6d1e7d4.

This test case comes from a tensorflow/XLA compilation of a test case in https://github.com/google-research/spherical-cnn.

@llvmbot
Member

llvmbot commented Jan 10, 2026

@llvm/pr-subscribers-llvm-transforms

Author: Aiden Grossman (boomanaiden154)

Changes

(Same summary as the PR description above.)


Full diff: https://github.com/llvm/llvm-project/pull/175268.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+8-8)
  • (added) llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var-width-4.ll (+68)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index bfa704589a6dd..2c0772320c3cf 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -2344,12 +2344,6 @@ InstructionCost VPHeaderPHIRecipe::computeCost(ElementCount VF,
   return Ctx.TTI.getCFInstrCost(Instruction::PHI, Ctx.CostKind);
 }
 
-/// A helper function that returns an integer or floating-point constant with
-/// value C.
-static Constant *getUnsignedIntOrFpConstant(Type *Ty, uint64_t C) {
-  return Ty->isIntegerTy() ? ConstantInt::get(Ty, C) : ConstantFP::get(Ty, C);
-}
-
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 void VPWidenIntOrFpInductionRecipe::printRecipe(
     raw_ostream &O, const Twine &Indent, VPSlotTracker &SlotTracker) const {
@@ -2451,8 +2445,14 @@ void VPScalarIVStepsRecipe::execute(VPTransformState &State) {
     StartIdx0 = Builder.CreateSIToFP(StartIdx0, BaseIVTy);
 
   for (unsigned Lane = StartLane; Lane < EndLane; ++Lane) {
-    Value *StartIdx = Builder.CreateBinOp(
-        AddOp, StartIdx0, getUnsignedIntOrFpConstant(BaseIVTy, Lane));
+    // It is okay if the induction variable type cannot hold the lane number,
+    // we expect truncation in this case.
+    Constant *LaneValue =
+        BaseIVTy->isIntegerTy()
+            ? ConstantInt::get(BaseIVTy, Lane, /*IsSigned=*/false,
+                               /*ImplicitTrunc=*/true)
+            : ConstantFP::get(BaseIVTy, Lane);
+    Value *StartIdx = Builder.CreateBinOp(AddOp, StartIdx0, LaneValue);
     // The step returned by `createStepForVF` is a runtime-evaluated value
     // when VF is scalable. Otherwise, it should be folded into a Constant.
     assert((State.VF.isScalable() || isa<Constant>(StartIdx)) &&
diff --git a/llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var-width-4.ll b/llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var-width-4.ll
new file mode 100644
index 0000000000000..cdfe9c30d10af
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var-width-4.ll
@@ -0,0 +1,68 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt -passes=loop-vectorize -force-vector-width=4 -S %s 2>&1 | FileCheck %s
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-grtev4-linux-gnu"
+
+define void @copy_bitcast_fusion(ptr noalias %foo, ptr noalias %bar) {
+; CHECK-LABEL: define void @copy_bitcast_fusion(
+; CHECK-SAME: ptr noalias [[FOO:%.*]], ptr noalias [[BAR:%.*]]) {
+; CHECK-NEXT:    br label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[TMP1:%.*]] = select i1 false, i64 1, i64 0
+; CHECK-NEXT:    [[TMP2:%.*]] = select i1 true, i64 1, i64 0
+; CHECK-NEXT:    [[TMP3:%.*]] = select i1 false, i64 1, i64 0
+; CHECK-NEXT:    [[TMP4:%.*]] = select i1 true, i64 1, i64 0
+; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr { float, float }, ptr [[FOO]], i64 [[TMP1]]
+; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr { float, float }, ptr [[FOO]], i64 [[TMP2]]
+; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr { float, float }, ptr [[FOO]], i64 [[TMP3]]
+; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr { float, float }, ptr [[FOO]], i64 [[TMP4]]
+; CHECK-NEXT:    [[TMP9:%.*]] = load float, ptr [[TMP5]], align 4
+; CHECK-NEXT:    [[TMP10:%.*]] = load float, ptr [[TMP6]], align 4
+; CHECK-NEXT:    [[TMP11:%.*]] = load float, ptr [[TMP7]], align 4
+; CHECK-NEXT:    [[TMP12:%.*]] = load float, ptr [[TMP8]], align 4
+; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <4 x float> poison, float [[TMP9]], i32 0
+; CHECK-NEXT:    [[TMP14:%.*]] = insertelement <4 x float> [[TMP13]], float [[TMP10]], i32 1
+; CHECK-NEXT:    [[TMP15:%.*]] = insertelement <4 x float> [[TMP14]], float [[TMP11]], i32 2
+; CHECK-NEXT:    [[TMP16:%.*]] = insertelement <4 x float> [[TMP15]], float [[TMP12]], i32 3
+; CHECK-NEXT:    [[TMP17:%.*]] = shufflevector <4 x float> [[TMP16]], <4 x float> zeroinitializer, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
+; CHECK-NEXT:    [[TMP18:%.*]] = shufflevector <8 x float> [[TMP17]], <8 x float> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    [[TMP19:%.*]] = shufflevector <16 x float> [[TMP18]], <16 x float> <float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float 0.000000e+00, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef>, <24 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
+; CHECK-NEXT:    [[INTERLEAVED_VEC:%.*]] = shufflevector <24 x float> [[TMP19]], <24 x float> poison, <24 x i32> <i32 0, i32 4, i32 8, i32 12, i32 16, i32 20, i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 2, i32 6, i32 10, i32 14, i32 18, i32 22, i32 3, i32 7, i32 11, i32 15, i32 19, i32 23>
+; CHECK-NEXT:    store <24 x float> [[INTERLEAVED_VEC]], ptr [[BAR]], align 4
+; CHECK-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret void
+;
+  br label %body
+
+body:
+  %iv = phi i64 [ 0, %0 ], [ %ptr3, %body ]
+  %iv.trunc = trunc i64 %iv to i1
+  %iv.trunc2 = select i1 %iv.trunc, i64 1, i64 0
+  %unpack.ptr = getelementptr { float, float }, ptr %foo, i64 %iv.trunc2
+  %unpack = load float, ptr %unpack.ptr, align 4
+  %idx3 = mul i64 %iv, 24
+  %bar.ptr = getelementptr i8, ptr %bar, i64 %idx3
+  store float %unpack, ptr %bar.ptr, align 4
+  %repack4 = getelementptr i8, ptr %bar.ptr, i64 4
+  store float 0.000000e+00, ptr %repack4, align 4
+  %ptr1 = getelementptr i8, ptr %bar.ptr, i64 8
+  store float 0.000000e+00, ptr %ptr1, align 4
+  %repack4.1 = getelementptr i8, ptr %bar.ptr, i64 12
+  store float 0.000000e+00, ptr %repack4.1, align 4
+  %ptr2 = getelementptr i8, ptr %bar.ptr, i64 16
+  store float 0.000000e+00, ptr %ptr2, align 4
+  %repack4.2 = getelementptr i8, ptr %bar.ptr, i64 20
+  store float 0.000000e+00, ptr %repack4.2, align 4
+  %ptr3 = add i64 %iv, 1
+  %exitcond.not = icmp eq i64 %ptr3, 4
+  br i1 %exitcond.not, label %exit, label %body
+
+exit:
+  ret void
+}

@llvmbot
Member

llvmbot commented Jan 10, 2026

@llvm/pr-subscribers-vectorizers

(Identical changes summary and diff as the @llvm/pr-subscribers-llvm-transforms comment above.)
@github-actions

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef([^a-zA-Z0-9_-]|$)|UndefValue::get)' 'HEAD~1' HEAD llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var-width-4.ll llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

The following files introduce new uses of undef:

  • llvm/test/Transforms/LoopVectorize/X86/vplan-single-bit-ind-var-width-4.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

Contributor

@nikic nikic left a comment

LGTM

Contributor

@fhahn fhahn left a comment

LGTM thanks

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
; RUN: opt -passes=loop-vectorize -force-vector-width=4 -S %s 2>&1 | FileCheck %s

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
Contributor

You probably can drop the target triple here

Contributor Author

This didn't reproduce for me without the triple.

@googlewalt googlewalt merged commit 7450a75 into llvm:main Jan 10, 2026
12 of 13 checks passed
@boomanaiden154 boomanaiden154 deleted the vplan-wrapping-1-9-26 branch January 10, 2026 17:53
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Jan 18, 2026