[SPIR-V] Decompose bitcasts involving bool vectors #187960
Conversation
OpTypeBool has no defined bitwidth in SPIR-V, so OpBitcast is invalid for boolean vector types. Decompose <N x i1> <-> iN bitcasts into element-wise extract/shift/OR and AND/icmp/insert sequences during IR preprocessing.
@llvm/pr-subscribers-backend-spir-v

Author: Dmitry Sidorov (MrSidims)

Changes: OpTypeBool has no defined bitwidth in SPIR-V, so OpBitcast is invalid for boolean vector types. Decompose `<N x i1> <-> iN` bitcasts into element-wise extract/shift/OR and AND/icmp/insert sequences during IR preprocessing.

Full diff: https://github.com/llvm/llvm-project/pull/187960.diff (3 files affected)
diff --git a/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp b/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
index 131b56e92b8be..57c01b8e3c08b 100644
--- a/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVEmitIntrinsics.cpp
@@ -302,6 +302,7 @@ class SPIRVEmitIntrinsics
void useRoundingMode(ConstrainedFPIntrinsic *FPI, IRBuilder<> &B);
bool processMaskedMemIntrinsic(IntrinsicInst &I);
bool convertMaskedMemIntrinsics(Module &M);
+ void preprocessBoolVectorBitcasts(Function &F);
void emitUnstructuredLoopControls(Function &F, IRBuilder<> &B);
@@ -3157,6 +3158,7 @@ bool SPIRVEmitIntrinsics::runOnFunction(Function &Func) {
preprocessUndefs(B);
preprocessCompositeConstants(B);
+ preprocessBoolVectorBitcasts(Func);
SmallVector<Instruction *> Worklist(
llvm::make_pointer_range(instructions(Func)));
@@ -3389,6 +3391,84 @@ bool SPIRVEmitIntrinsics::processMaskedMemIntrinsic(IntrinsicInst &I) {
return false;
}
+// SPIR-V doesn't support bitcasts involving vector boolean types. Decompose
+// such bitcasts into element-wise operations before building the instruction
+// worklist, so the new instructions are properly visited and converted to
+// SPIR-V intrinsics.
+void SPIRVEmitIntrinsics::preprocessBoolVectorBitcasts(Function &F) {
+ struct BoolVecBitcast {
+ BitCastInst *BC;
+ FixedVectorType *BoolVecTy;
+ bool SrcIsBoolVec;
+ };
+
+ auto getAsBoolVec = [](Type *Ty) -> FixedVectorType * {
+ auto *VTy = dyn_cast<FixedVectorType>(Ty);
+ return (VTy && VTy->getElementType()->isIntegerTy(1)) ? VTy : nullptr;
+ };
+
+ SmallVector<BoolVecBitcast, 4> ToReplace;
+ for (auto &I : instructions(F)) {
+ auto *BC = dyn_cast<BitCastInst>(&I);
+ if (!BC)
+ continue;
+ if (auto *BVTy = getAsBoolVec(BC->getSrcTy()))
+ ToReplace.push_back({BC, BVTy, true});
+ else if (auto *BVTy = getAsBoolVec(BC->getDestTy()))
+ ToReplace.push_back({BC, BVTy, false});
+ }
+
+ for (auto &[BC, BoolVecTy, SrcIsBoolVec] : ToReplace) {
+ IRBuilder<> B(BC);
+ Value *Src = BC->getOperand(0);
+ unsigned BoolVecN = BoolVecTy->getNumElements();
+ // Use iN as the scalar intermediate type for the bool vector side.
+ Type *IntTy = B.getIntNTy(BoolVecN);
+
+ // Convert source to scalar integer.
+ Value *IntVal;
+ if (SrcIsBoolVec) {
+ // Extract each bool, zext, shift, and OR.
+ IntVal = ConstantInt::get(IntTy, 0);
+ for (unsigned I = 0; I < BoolVecN; ++I) {
+ Value *Elem = B.CreateExtractElement(Src, B.getInt32(I));
+ Value *Ext = B.CreateZExt(Elem, IntTy);
+ if (I > 0)
+ Ext = B.CreateShl(Ext, ConstantInt::get(IntTy, I));
+ IntVal = B.CreateOr(IntVal, Ext);
+ }
+ } else {
+ // Source is a non-bool type. If it's already a scalar integer, use it
+ // directly, otherwise bitcast to iN first.
+ IntVal = Src;
+ if (!Src->getType()->isIntegerTy())
+ IntVal = B.CreateBitCast(Src, IntTy);
+ }
+
+ // Convert scalar integer to destination type.
+ Value *Result;
+ if (!SrcIsBoolVec) {
+ // Test each bit with AND + icmp.
+ Result = PoisonValue::get(BoolVecTy);
+ for (unsigned I = 0; I < BoolVecN; ++I) {
+ Value *Mask = ConstantInt::get(IntTy, APInt::getOneBitSet(BoolVecN, I));
+ Value *And = B.CreateAnd(IntVal, Mask);
+ Value *Cmp = B.CreateICmpNE(And, ConstantInt::get(IntTy, 0));
+ Result = B.CreateInsertElement(Result, Cmp, B.getInt32(I));
+ }
+ } else {
+      // Destination is a non-bool type. If it's a scalar integer, use IntVal
+      // directly, otherwise bitcast from iN.
+ Result = IntVal;
+ if (!BC->getDestTy()->isIntegerTy())
+ Result = B.CreateBitCast(IntVal, BC->getDestTy());
+ }
+
+ BC->replaceAllUsesWith(Result);
+ BC->eraseFromParent();
+ }
+}
+
bool SPIRVEmitIntrinsics::convertMaskedMemIntrinsics(Module &M) {
bool Changed = false;
diff --git a/llvm/test/CodeGen/SPIRV/bool-vector-bitcast.ll b/llvm/test/CodeGen/SPIRV/bool-vector-bitcast.ll
new file mode 100644
index 0000000000000..a7f171c0e0b44
--- /dev/null
+++ b/llvm/test/CodeGen/SPIRV/bool-vector-bitcast.ll
@@ -0,0 +1,111 @@
+; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o - | FileCheck %s
+; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o - -filetype=obj | spirv-val %}
+;
+; Verify that bitcasts between bool vectors and other types are decomposed
+; into element-wise operations instead of generating OpBitcast, which is
+; invalid for OpTypeBool.
+
+; CHECK-DAG: %[[#BOOL:]] = OpTypeBool
+; CHECK-DAG: %[[#BVEC8:]] = OpTypeVector %[[#BOOL]] 8
+; CHECK-DAG: %[[#I8:]] = OpTypeInt 8 0
+
+; CHECK-DAG: %[[#ZERO:]] = OpConstantNull %[[#I8]]
+; CHECK-DAG: %[[#ONE:]] = OpConstant %[[#I8]] 1
+; CHECK-DAG: %[[#TWO:]] = OpConstant %[[#I8]] 2
+; CHECK-DAG: %[[#FOUR:]] = OpConstant %[[#I8]] 4
+; CHECK-DAG: %[[#EIGHT:]] = OpConstant %[[#I8]] 8
+; CHECK-DAG: %[[#C16:]] = OpConstant %[[#I8]] 16
+; CHECK-DAG: %[[#C32:]] = OpConstant %[[#I8]] 32
+; CHECK-DAG: %[[#C64:]] = OpConstant %[[#I8]] 64
+; CHECK-DAG: %[[#C128:]] = OpConstant %[[#I8]] 128
+; CHECK-DAG: %[[#C3:]] = OpConstant %[[#I8]] 3
+; CHECK-DAG: %[[#C5:]] = OpConstant %[[#I8]] 5
+; CHECK-DAG: %[[#C6:]] = OpConstant %[[#I8]] 6
+; CHECK-DAG: %[[#C7:]] = OpConstant %[[#I8]] 7
+
+; bitcast <8 x i1> to i8
+; Extracts each bool, zero-extends via OpSelect, shifts into position, and ORs.
+;
+; CHECK: %[[#B2S:]] = OpFunction %[[#I8]]
+; CHECK: %[[#B2S_ARG:]] = OpFunctionParameter %[[#BVEC8]]
+; CHECK: OpLabel
+;
+; CHECK: %[[#E0:]] = OpCompositeExtract %[[#BOOL]] %[[#B2S_ARG]] 0
+; CHECK: %[[#S0:]] = OpSelect %[[#I8]] %[[#E0]] %[[#ONE]] %[[#ZERO]]
+; CHECK: %[[#OR0:]] = OpBitwiseOr %[[#I8]] %[[#ZERO]] %[[#S0]]
+;
+; CHECK: %[[#E1:]] = OpCompositeExtract %[[#BOOL]] %[[#B2S_ARG]] 1
+; CHECK: %[[#S1:]] = OpSelect %[[#I8]] %[[#E1]] %[[#ONE]] %[[#ZERO]]
+; CHECK: %[[#SHL1:]] = OpShiftLeftLogical %[[#I8]] %[[#S1]] %[[#ONE]]
+; CHECK: %[[#OR1:]] = OpBitwiseOr %[[#I8]] %[[#OR0]] %[[#SHL1]]
+;
+; CHECK: %[[#E2:]] = OpCompositeExtract %[[#BOOL]] %[[#B2S_ARG]] 2
+; CHECK: %[[#S2:]] = OpSelect %[[#I8]] %[[#E2]] %[[#ONE]] %[[#ZERO]]
+; CHECK: %[[#SHL2:]] = OpShiftLeftLogical %[[#I8]] %[[#S2]] %[[#TWO]]
+; CHECK: %[[#OR2:]] = OpBitwiseOr %[[#I8]] %[[#OR1]] %[[#SHL2]]
+;
+; CHECK: OpCompositeExtract %[[#BOOL]] %[[#B2S_ARG]] 3
+; CHECK: OpShiftLeftLogical %[[#I8]] %{{.*}} %[[#C3]]
+; CHECK: %[[#OR3:]] = OpBitwiseOr %[[#I8]]
+; CHECK: OpCompositeExtract %[[#BOOL]] %[[#B2S_ARG]] 4
+; CHECK: OpShiftLeftLogical %[[#I8]] %{{.*}} %[[#FOUR]]
+; CHECK: %[[#OR4:]] = OpBitwiseOr %[[#I8]]
+; CHECK: OpCompositeExtract %[[#BOOL]] %[[#B2S_ARG]] 5
+; CHECK: OpShiftLeftLogical %[[#I8]] %{{.*}} %[[#C5]]
+; CHECK: %[[#OR5:]] = OpBitwiseOr %[[#I8]]
+; CHECK: OpCompositeExtract %[[#BOOL]] %[[#B2S_ARG]] 6
+; CHECK: OpShiftLeftLogical %[[#I8]] %{{.*}} %[[#C6]]
+; CHECK: %[[#OR6:]] = OpBitwiseOr %[[#I8]]
+;
+; CHECK: %[[#E7:]] = OpCompositeExtract %[[#BOOL]] %[[#B2S_ARG]] 7
+; CHECK: %[[#S7:]] = OpSelect %[[#I8]] %[[#E7]] %[[#ONE]] %[[#ZERO]]
+; CHECK: %[[#SHL7:]] = OpShiftLeftLogical %[[#I8]] %[[#S7]] %[[#C7]]
+; CHECK: %[[#OR7:]] = OpBitwiseOr %[[#I8]] %[[#OR6]] %[[#SHL7]]
+;
+; CHECK: OpReturnValue %[[#OR7]]
+define i8 @boolvec_to_scalar(<8 x i1> %v) {
+ %r = bitcast <8 x i1> %v to i8
+ ret i8 %r
+}
+
+; bitcast i8 to <8 x i1>
+; Tests each bit with AND + INotEqual, inserts each bool into the result vector.
+;
+; CHECK: %[[#S2B:]] = OpFunction %[[#BVEC8]]
+; CHECK: %[[#S2B_ARG:]] = OpFunctionParameter %[[#I8]]
+; CHECK: OpLabel
+;
+; CHECK: %[[#A0:]] = OpBitwiseAnd %[[#I8]] %[[#S2B_ARG]] %[[#ONE]]
+; CHECK: %[[#C0:]] = OpINotEqual %[[#BOOL]] %[[#A0]] %[[#ZERO]]
+; CHECK: %[[#I0:]] = OpCompositeInsert %[[#BVEC8]] %[[#C0]] %{{.*}} 0
+;
+; CHECK: %[[#A1:]] = OpBitwiseAnd %[[#I8]] %[[#S2B_ARG]] %[[#TWO]]
+; CHECK: %[[#C1:]] = OpINotEqual %[[#BOOL]] %[[#A1]] %[[#ZERO]]
+; CHECK: %[[#I1:]] = OpCompositeInsert %[[#BVEC8]] %[[#C1]] %[[#I0]] 1
+;
+; CHECK: %[[#A2:]] = OpBitwiseAnd %[[#I8]] %[[#S2B_ARG]] %[[#FOUR]]
+; CHECK: %[[#C2:]] = OpINotEqual %[[#BOOL]] %[[#A2]] %[[#ZERO]]
+; CHECK: %[[#I2:]] = OpCompositeInsert %[[#BVEC8]] %[[#C2]] %[[#I1]] 2
+;
+; CHECK: OpBitwiseAnd %[[#I8]] %[[#S2B_ARG]] %[[#EIGHT]]
+; CHECK: OpINotEqual %[[#BOOL]]
+; CHECK: %[[#I3:]] = OpCompositeInsert %[[#BVEC8]] %{{.*}} %[[#I2]] 3
+; CHECK: OpBitwiseAnd %[[#I8]] %[[#S2B_ARG]] %[[#C16]]
+; CHECK: OpINotEqual %[[#BOOL]]
+; CHECK: %[[#I4:]] = OpCompositeInsert %[[#BVEC8]] %{{.*}} %[[#I3]] 4
+; CHECK: OpBitwiseAnd %[[#I8]] %[[#S2B_ARG]] %[[#C32]]
+; CHECK: OpINotEqual %[[#BOOL]]
+; CHECK: %[[#I5:]] = OpCompositeInsert %[[#BVEC8]] %{{.*}} %[[#I4]] 5
+; CHECK: OpBitwiseAnd %[[#I8]] %[[#S2B_ARG]] %[[#C64]]
+; CHECK: OpINotEqual %[[#BOOL]]
+; CHECK: %[[#I6:]] = OpCompositeInsert %[[#BVEC8]] %{{.*}} %[[#I5]] 6
+;
+; CHECK: %[[#A7:]] = OpBitwiseAnd %[[#I8]] %[[#S2B_ARG]] %[[#C128]]
+; CHECK: %[[#C7B:]] = OpINotEqual %[[#BOOL]] %[[#A7]] %[[#ZERO]]
+; CHECK: %[[#I7:]] = OpCompositeInsert %[[#BVEC8]] %[[#C7B]] %[[#I6]] 7
+;
+; CHECK: OpReturnValue %[[#I7]]
+define <8 x i1> @scalar_to_boolvec(i8 %v) {
+ %r = bitcast i8 %v to <8 x i1>
+ ret <8 x i1> %r
+}
diff --git a/llvm/test/CodeGen/SPIRV/llvm-intrinsics/masked-load-store.ll b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/masked-load-store.ll
new file mode 100644
index 0000000000000..67a59ce884824
--- /dev/null
+++ b/llvm/test/CodeGen/SPIRV/llvm-intrinsics/masked-load-store.ll
@@ -0,0 +1,63 @@
+; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o - | FileCheck %s
+; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o - -filetype=obj | spirv-val %}
+;
+; Verify that llvm.masked.load and llvm.masked.store lower correctly.
+; ScalarizeMaskedMemIntrin expands these into scalar conditional loads/stores
+; with <N x i1> -> iN bitcasts for the mask, which must be decomposed by the
+; backend, as SPIR-V doesn't allow bitcasts to/from boolean vectors.
+
+; CHECK-DAG: %[[#BOOL:]] = OpTypeBool
+; CHECK-DAG: %[[#BVEC8:]] = OpTypeVector %[[#BOOL]] 8
+; CHECK-DAG: %[[#I8:]] = OpTypeInt 8 0
+; CHECK-DAG: %[[#VOID:]] = OpTypeVoid
+
+; CHECK-DAG: %[[#ZERO:]] = OpConstantNull %[[#I8]]
+; CHECK-DAG: %[[#ONE:]] = OpConstant %[[#I8]] 1
+
+; The scalarization of llvm.masked.load/store produces <8 x i1> -> i8 bitcast.
+; Verify the mask decomposition and scalarized conditional loads compile.
+;
+; CHECK: OpFunction
+; CHECK: %[[#ML_MASK:]] = OpFunctionParameter %[[#BVEC8]]
+; CHECK: %[[#ML_E0:]] = OpCompositeExtract %[[#BOOL]] %[[#ML_MASK]] 0
+; CHECK: %[[#ML_S0:]] = OpSelect %[[#I8]] %[[#ML_E0]] %[[#ONE]] %[[#ZERO]]
+; CHECK: %[[#ML_OR0:]] = OpBitwiseOr %[[#I8]] %[[#ZERO]] %[[#ML_S0]]
+; CHECK: %[[#ML_E1:]] = OpCompositeExtract %[[#BOOL]] %[[#ML_MASK]] 1
+; CHECK: %[[#ML_S1:]] = OpSelect %[[#I8]] %[[#ML_E1]] %[[#ONE]] %[[#ZERO]]
+; CHECK: %[[#ML_SHL1:]] = OpShiftLeftLogical %[[#I8]] %[[#ML_S1]] %[[#ONE]]
+; CHECK: %[[#ML_OR1:]] = OpBitwiseOr %[[#I8]] %[[#ML_OR0]] %[[#ML_SHL1]]
+; CHECK: %[[#ML_A0:]] = OpBitwiseAnd %[[#I8]] %{{.*}} %[[#ONE]]
+; CHECK: %[[#ML_C0:]] = OpINotEqual %[[#BOOL]] %[[#ML_A0]] %[[#ZERO]]
+; CHECK: OpBranchConditional %[[#ML_C0]]
+; CHECK: %[[#ML_LD:]] = OpLoad %[[#I8]]
+; CHECK: %[[#ML_INS:]] = OpCompositeInsert %{{.*}} %[[#ML_LD]] %{{.*}} 0
+; CHECK: OpPhi %{{.*}} %[[#ML_INS]]
+; CHECK: OpStore %{{.*}} %{{.*}}
+define void @masked_load_v8i8(<8 x i1> %mask) {
+ %v = call <8 x i8> @llvm.masked.load.v8i8.p1(ptr addrspace(1) null, <8 x i1> %mask, <8 x i8> zeroinitializer)
+ store <8 x i8> %v, ptr addrspace(3) null, align 1
+ ret void
+}
+
+; CHECK: OpFunction
+; CHECK: %[[#MS_VAL:]] = OpFunctionParameter
+; CHECK: %[[#MS_MASK:]] = OpFunctionParameter %[[#BVEC8]]
+; CHECK: %[[#MS_E0:]] = OpCompositeExtract %[[#BOOL]] %[[#MS_MASK]] 0
+; CHECK: %[[#MS_S0:]] = OpSelect %[[#I8]] %[[#MS_E0]] %[[#ONE]] %[[#ZERO]]
+; CHECK: %[[#MS_OR0:]] = OpBitwiseOr %[[#I8]] %[[#ZERO]] %[[#MS_S0]]
+; CHECK: %[[#MS_E1:]] = OpCompositeExtract %[[#BOOL]] %[[#MS_MASK]] 1
+; CHECK: %[[#MS_S1:]] = OpSelect %[[#I8]] %[[#MS_E1]] %[[#ONE]] %[[#ZERO]]
+; CHECK: %[[#MS_SHL1:]] = OpShiftLeftLogical %[[#I8]] %[[#MS_S1]] %[[#ONE]]
+; CHECK: %[[#MS_OR1:]] = OpBitwiseOr %[[#I8]] %[[#MS_OR0]] %[[#MS_SHL1]]
+; CHECK: %[[#MS_A0:]] = OpBitwiseAnd %[[#I8]] %{{.*}} %[[#ONE]]
+; CHECK: %[[#MS_C0:]] = OpINotEqual %[[#BOOL]] %[[#MS_A0]] %[[#ZERO]]
+; CHECK: OpBranchConditional %[[#MS_C0]]
+; CHECK: %[[#MS_ELEM:]] = OpCompositeExtract %[[#I8]] %[[#MS_VAL]] 0
+; CHECK: OpStore %{{.*}} %[[#MS_ELEM]]
+define void @masked_store_v8i8(<8 x i8> %val, <8 x i1> %mask) {
+ call void @llvm.masked.store.v8i8.p1(<8 x i8> %val, ptr addrspace(1) null, <8 x i1> %mask)
+ ret void
+}
+
+declare <8 x i8> @llvm.masked.load.v8i8.p1(ptr addrspace(1), <8 x i1>, <8 x i8>)
+declare void @llvm.masked.store.v8i8.p1(<8 x i8>, ptr addrspace(1), <8 x i1>)
// bitcasts into element-wise operations before building instructions
// worklist, so new instructions are properly visited and converted to
// SPIR-V intrinsics.
void SPIRVEmitIntrinsics::preprocessBoolVectorBitcasts(Function &F) {
Notably, there are other approaches to fix the issue:
- The most proper way is to catch G_BITCAST in the GlobalISel legalizer and check its src/dst types. While it's "more proper", it feels like it fights the SPIR-V backend's design, which relies on IR-level preprocessing in SPIRVEmitIntrinsics.
- Override isLegalMaskedLoad/isLegalMaskedStore in SPIRVTargetTransformInfo to prevent ScalarizeMaskedMemIntrin from scalarizing. This would only help with masked.load/store (the original issue), but won't help in general if an `iN <-> <N x i1>` bitcast is somehow generated.
@@ -0,0 +1,111 @@
; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o - | FileCheck %s
; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o - -filetype=obj | spirv-val %}
Is it relevant for shader env?
Apparently, it's crashing for the shader env. I'll investigate this before merging and either address it or open an issue.
So, the issue is that Vulkan sets a limit of 4-element vectors. "Big" vectors should have been split, but this is work in progress with various known issues already reported (for example #186864). I suggest skipping test coverage for the Vulkan env in this patch.
Sounds good to me. Can we leave it as a todo/runx/similar?
s-perron left a comment

This seems like the right strategy. I did not look at the details, but I'm fine with merging if others approve.
OpTypeBool has no defined bitwidth in SPIR-V, so OpBitcast is invalid for boolean vector types. Decompose `<N x i1> <-> iN` bitcasts into element-wise extract/shift/OR and AND/icmp/insert sequences during IR preprocessing. Fixes: https://github.com/kuhar/iree/blob/amdgcn-spirv/spirv-repros/bitcast_crash.ll and llvm#185815