diff --git a/llvm/lib/CodeGen/ExpandVectorPredication.cpp b/llvm/lib/CodeGen/ExpandVectorPredication.cpp
index 2d4da33a566dc..6c873a9aee27f 100644
--- a/llvm/lib/CodeGen/ExpandVectorPredication.cpp
+++ b/llvm/lib/CodeGen/ExpandVectorPredication.cpp
@@ -123,9 +123,12 @@ static bool maySpeculateLanes(VPIntrinsic &VPI) {
   if (isa<VPReductionIntrinsic>(VPI))
     return false;
   // Fallback to whether the intrinsic is speculatable.
-  std::optional<unsigned> OpcOpt = VPI.getFunctionalOpcode();
-  unsigned FunctionalOpc = OpcOpt.value_or((unsigned)Instruction::Call);
-  return isSafeToSpeculativelyExecuteWithOpcode(FunctionalOpc, &VPI);
+  if (auto IntrID = VPI.getFunctionalIntrinsicID())
+    return Intrinsic::getAttributes(VPI.getContext(), *IntrID)
+        .hasFnAttr(Attribute::AttrKind::Speculatable);
+  if (auto Opc = VPI.getFunctionalOpcode())
+    return isSafeToSpeculativelyExecuteWithOpcode(*Opc, &VPI);
+  return false;
 }
 
 //// } Helpers
diff --git a/llvm/test/CodeGen/Generic/expand-vp.ll b/llvm/test/CodeGen/Generic/expand-vp.ll
index 509f86a64d9ce..40d183273b86d 100644
--- a/llvm/test/CodeGen/Generic/expand-vp.ll
+++ b/llvm/test/CodeGen/Generic/expand-vp.ll
@@ -18,6 +18,10 @@ declare <8 x i32> @llvm.vp.sdiv.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
 declare <8 x i32> @llvm.vp.srem.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
 declare <8 x i32> @llvm.vp.udiv.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
 declare <8 x i32> @llvm.vp.urem.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
+declare <8 x i32> @llvm.vp.smax.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
+declare <8 x i32> @llvm.vp.smin.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
+declare <8 x i32> @llvm.vp.umax.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
+declare <8 x i32> @llvm.vp.umin.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
 ; Bit arith
 declare <8 x i32> @llvm.vp.and.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
 declare <8 x i32> @llvm.vp.xor.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
@@ -52,12 +56,16 @@ define void @test_vp_int_v8(<8 x i32> %i0, <8 x i32> %i1, <8 x i32> %i2, <8 x i3
   %r4 = call <8 x i32> @llvm.vp.srem.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
   %r5 = call <8 x i32> @llvm.vp.udiv.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
   %r6 = call <8 x i32> @llvm.vp.urem.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-  %r7 = call <8 x i32> @llvm.vp.and.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-  %r8 = call <8 x i32> @llvm.vp.or.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-  %r9 = call <8 x i32> @llvm.vp.xor.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-  %rA = call <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-  %rB = call <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-  %rC = call <8 x i32> @llvm.vp.shl.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+  %r7 = call <8 x i32> @llvm.vp.smax.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+  %r8 = call <8 x i32> @llvm.vp.smin.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+  %r9 = call <8 x i32> @llvm.vp.umax.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+  %rA = call <8 x i32> @llvm.vp.umin.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+  %rB = call <8 x i32> @llvm.vp.and.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+  %rC = call <8 x i32> @llvm.vp.or.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+  %rD = call <8 x i32> @llvm.vp.xor.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+  %rE = call <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+  %rF = call <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+  %r10 = call <8 x i32> @llvm.vp.shl.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
   ret void
 }
 
@@ -70,6 +78,10 @@ declare <vscale x 4 x i32> @llvm.vp.sdiv.nxv4i32(<vscale x 4 x i32>, <vscale x
 declare <vscale x 4 x i32> @llvm.vp.srem.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)
 declare <vscale x 4 x i32> @llvm.vp.udiv.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)
 declare <vscale x 4 x i32> @llvm.vp.urem.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)
+declare <vscale x 4 x i32> @llvm.vp.smax.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)
+declare <vscale x 4 x i32> @llvm.vp.smin.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)
+declare <vscale x 4 x i32> @llvm.vp.umax.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)
+declare <vscale x 4 x i32> @llvm.vp.umin.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)
 ; Bit arith
 declare <vscale x 4 x i32> @llvm.vp.and.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)
 declare <vscale x 4 x i32> @llvm.vp.xor.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i1>, i32)
@@ -87,12 +99,16 @@ define void @test_vp_int_vscale(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1,
   %r4 = call <vscale x 4 x i32> @llvm.vp.srem.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
   %r5 = call <vscale x 4 x i32> @llvm.vp.udiv.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
   %r6 = call <vscale x 4 x i32> @llvm.vp.urem.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-  %r7 = call <vscale x 4 x i32> @llvm.vp.and.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-  %r8 = call <vscale x 4 x i32> @llvm.vp.or.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-  %r9 = call <vscale x 4 x i32> @llvm.vp.xor.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-  %rA = call <vscale x 4 x i32> @llvm.vp.ashr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-  %rB = call <vscale x 4 x i32> @llvm.vp.lshr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-  %rC = call <vscale x 4 x i32> @llvm.vp.shl.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+  %r7 = call <vscale x 4 x i32> @llvm.vp.smax.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+  %r8 = call <vscale x 4 x i32> @llvm.vp.smin.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+  %r9 = call <vscale x 4 x i32> @llvm.vp.umax.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+  %rA = call <vscale x 4 x i32> @llvm.vp.umin.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+  %rB = call <vscale x 4 x i32> @llvm.vp.and.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+  %rC = call <vscale x 4 x i32> @llvm.vp.or.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+  %rD = call <vscale x 4 x i32> @llvm.vp.xor.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+  %rE = call <vscale x 4 x i32> @llvm.vp.ashr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+  %rF = call <vscale x 4 x i32> @llvm.vp.lshr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+  %r10 = call <vscale x 4 x i32> @llvm.vp.shl.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
   ret void
 }
 
@@ -166,6 +182,10 @@ define void @test_vp_cmp_v8(<8 x i32> %i0, <8 x i32> %i1, <8 x float> %f0, <8 x
 ; ALL-CONVERT: %{{.+}} = udiv <8 x i32> %i0, %{{.+}}
 ; ALL-CONVERT-NOT: %{{.+}} = urem <8 x i32> %i0, %i1
 ; ALL-CONVERT: %{{.+}} = urem <8 x i32> %i0, %{{.+}}
+; ALL-CONVERT-NEXT: %{{.+}} = call <8 x i32> @llvm.smax.v8i32(<8 x i32> %i0, <8 x i32> %i1)
+; ALL-CONVERT-NEXT: %{{.+}} = call <8 x i32> @llvm.smin.v8i32(<8 x i32> %i0, <8 x i32> %i1)
+; ALL-CONVERT-NEXT: %{{.+}} = call <8 x i32> @llvm.umax.v8i32(<8 x i32> %i0, <8 x i32> %i1)
+; ALL-CONVERT-NEXT: %{{.+}} = call <8 x i32> @llvm.umin.v8i32(<8 x i32> %i0, <8 x i32> %i1)
 ; ALL-CONVERT-NEXT: %{{.+}} = and <8 x i32> %i0, %i1
 ; ALL-CONVERT-NEXT: %{{.+}} = or <8 x i32> %i0, %i1
 ; ALL-CONVERT-NEXT: %{{.+}} = xor <8 x i32> %i0, %i1
@@ -263,12 +283,16 @@ define void @test_vp_cmp_v8(<8 x i32> %i0, <8 x i32> %i1, <8 x float> %f0, <8 x
 ; LEGAL_LEGAL-NEXT: %r4 = call <8 x i32> @llvm.vp.srem.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
 ; LEGAL_LEGAL-NEXT: %r5 = call <8 x i32> @llvm.vp.udiv.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
 ; LEGAL_LEGAL-NEXT: %r6 = call <8 x i32> @llvm.vp.urem.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %r7 = call <8 x i32> @llvm.vp.and.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %r8 = call <8 x i32> @llvm.vp.or.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %r9 = call <8 x i32> @llvm.vp.xor.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %rA = call <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %rB = call <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %rC = call <8 x i32> @llvm.vp.shl.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %r7 = call <8 x i32> @llvm.vp.smax.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %r8 = call <8 x i32> @llvm.vp.smin.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %r9 = call <8 x i32> @llvm.vp.umax.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rA = call <8 x i32> @llvm.vp.umin.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rB = call <8 x i32> @llvm.vp.and.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rC = call <8 x i32> @llvm.vp.or.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rD = call <8 x i32> @llvm.vp.xor.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rE = call <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rF = call <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %r10 = call <8 x i32> @llvm.vp.shl.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
 ; LEGAL_LEGAL-NEXT: ret void
 
 ; LEGAL_LEGAL:define void @test_vp_int_vscale(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i32> %i2, <vscale x 4 x i32> %f3, <vscale x 4 x i1> %m, i32 %n) {
@@ -279,12 +303,16 @@ define void @test_vp_cmp_v8(<8 x i32> %i0, <8 x i32> %i1, <8 x float> %f0, <8 x
 ; LEGAL_LEGAL-NEXT: %r4 = call <vscale x 4 x i32> @llvm.vp.srem.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
 ; LEGAL_LEGAL-NEXT: %r5 = call <vscale x 4 x i32> @llvm.vp.udiv.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
 ; LEGAL_LEGAL-NEXT: %r6 = call <vscale x 4 x i32> @llvm.vp.urem.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %r7 = call <vscale x 4 x i32> @llvm.vp.and.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %r8 = call <vscale x 4 x i32> @llvm.vp.or.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %r9 = call <vscale x 4 x i32> @llvm.vp.xor.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %rA = call <vscale x 4 x i32> @llvm.vp.ashr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %rB = call <vscale x 4 x i32> @llvm.vp.lshr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
-; LEGAL_LEGAL-NEXT: %rC = call <vscale x 4 x i32> @llvm.vp.shl.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %r7 = call <vscale x 4 x i32> @llvm.vp.smax.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %r8 = call <vscale x 4 x i32> @llvm.vp.smin.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %r9 = call <vscale x 4 x i32> @llvm.vp.umax.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rA = call <vscale x 4 x i32> @llvm.vp.umin.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rB = call <vscale x 4 x i32> @llvm.vp.and.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rC = call <vscale x 4 x i32> @llvm.vp.or.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rD = call <vscale x 4 x i32> @llvm.vp.xor.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rE = call <vscale x 4 x i32> @llvm.vp.ashr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %rF = call <vscale x 4 x i32> @llvm.vp.lshr.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
+; LEGAL_LEGAL-NEXT: %r10 = call <vscale x 4 x i32> @llvm.vp.shl.nxv4i32(<vscale x 4 x i32> %i0, <vscale x 4 x i32> %i1, <vscale x 4 x i1> %m, i32 %n)
 ; LEGAL_LEGAL-NEXT: ret void
 
 ; LEGAL_LEGAL: define void @test_vp_reduce_int_v4(i32 %start, <4 x i32> %vi, <4 x i1> %m, i32 %n) {
@@ -342,12 +370,16 @@ define void @test_vp_cmp_v8(<8 x i32> %i0, <8 x i32> %i1, <8 x float> %f0, <8 x
 ; DISCARD_LEGAL-NOT: %r4 = call <8 x i32> @llvm.vp.srem.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
 ; DISCARD_LEGAL-NOT: %r5 = call <8 x i32> @llvm.vp.udiv.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
 ; DISCARD_LEGAL-NOT: %r6 = call <8 x i32> @llvm.vp.urem.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
-; DISCARD_LEGAL: %r7 = call <8 x i32> @llvm.vp.and.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
-; DISCARD_LEGAL-NEXT: %r8 = call <8 x i32> @llvm.vp.or.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
-; DISCARD_LEGAL-NEXT: %r9 = call <8 x i32> @llvm.vp.xor.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
-; DISCARD_LEGAL-NEXT: %rA = call <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
-; DISCARD_LEGAL-NEXT: %rB = call <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
-; DISCARD_LEGAL-NEXT: %rC = call <8 x i32> @llvm.vp.shl.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
+; DISCARD_LEGAL: %r7 = call <8 x i32> @llvm.vp.smax.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
+; DISCARD_LEGAL: %r8 = call <8 x i32> @llvm.vp.smin.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
+; DISCARD_LEGAL: %r9 = call <8 x i32> @llvm.vp.umax.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
+; DISCARD_LEGAL: %rA = call <8 x i32> @llvm.vp.umin.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
+; DISCARD_LEGAL-NEXT: %rB = call <8 x i32> @llvm.vp.and.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
+; DISCARD_LEGAL-NEXT: %rC = call <8 x i32> @llvm.vp.or.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
+; DISCARD_LEGAL-NEXT: %rD = call <8 x i32> @llvm.vp.xor.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
+; DISCARD_LEGAL-NEXT: %rE = call <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
+; DISCARD_LEGAL-NEXT: %rF = call <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
+; DISCARD_LEGAL-NEXT: %r10 = call <8 x i32> @llvm.vp.shl.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 8)
 ; DISCARD_LEGAL-NEXT: ret void
 
 ; TODO compute vscale only once and use caching.
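
Why this matters: getFunctionalOpcode() only covers VP intrinsics whose unpredicated counterpart is an instruction (add, and, shl, ...). vp.smax/vp.smin/vp.umax/vp.umin map to the llvm.smax/llvm.smin/llvm.umax/llvm.umin intrinsics instead, so the old fallback lumped them under Instruction::Call and always judged them non-speculatable. The patched maySpeculateLanes first asks for a functional intrinsic ID and reads that intrinsic's speculatable attribute, falling back to the opcode-based check and finally to a conservative false. A minimal standalone sketch of the resulting decision chain (the hypothetical name maySpeculateLanesSketch and the include list are illustrative, not part of the patch):

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"

using namespace llvm;

static bool maySpeculateLanesSketch(VPIntrinsic &VPI) {
  // The result of a VP reduction depends on the mask and EVL, so its
  // lanes can never be speculated.
  if (isa<VPReductionIntrinsic>(VPI))
    return false;
  // vp.smax -> llvm.smax etc.: consult the functional intrinsic's
  // attribute set for the `speculatable` attribute.
  if (auto IntrID = VPI.getFunctionalIntrinsicID())
    return Intrinsic::getAttributes(VPI.getContext(), *IntrID)
        .hasFnAttr(Attribute::AttrKind::Speculatable);
  // vp.add -> Instruction::Add etc.: reuse the generic opcode-based check.
  if (auto Opc = VPI.getFunctionalOpcode())
    return isSafeToSpeculativelyExecuteWithOpcode(*Opc, &VPI);
  // No functional counterpart at all: stay conservative.
  return false;
}

With the intrinsic path in place, the ALL-CONVERT expansion can lower vp.smax to an unpredicated call to @llvm.smax.v8i32, as the new test checks above show.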