
[DAG] visitEXTRACT_VECTOR_ELT - constant fold legal fp imm values #74304

Merged
merged 1 commit into llvm:main from RKSimon:perf/extract-fpimm
Dec 7, 2023

Conversation

RKSimon
Collaborator

@RKSimon RKSimon commented Dec 4, 2023

If we're extracting a constant floating-point value, and the constant is a legal fp immediate for the target, then replace the extraction with an fp constant.

@llvmbot
Collaborator

llvmbot commented Dec 4, 2023

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-aarch64

Author: Simon Pilgrim (RKSimon)

Changes

If we're extracting a constant floating-point value, and the constant is a legal fp immediate for the target, then replace the extraction with an fp constant.


Patch is 32.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74304.diff

11 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+13)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll (+1-2)
  • (modified) llvm/test/CodeGen/AArch64/arm64-promote-const-complex-initializers.ll (+5-15)
  • (modified) llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll (+2-4)
  • (modified) llvm/test/CodeGen/X86/2012-07-10-extload64.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/fold-load-vec.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/half.ll (+6-8)
  • (modified) llvm/test/CodeGen/X86/nontemporal-3.ll (+104-542)
  • (modified) llvm/test/CodeGen/X86/pr41619.ll (+1-2)
  • (modified) llvm/test/CodeGen/X86/vec_zero_cse.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/vector-shuffle-combining.ll (+1-1)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 2a3425a42607e..e64a380965d4a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -22243,6 +22243,19 @@ SDValue DAGCombiner::visitEXTRACT_VECTOR_ELT(SDNode *N) {
   unsigned NumElts = VecVT.getVectorNumElements();
   unsigned VecEltBitWidth = VecVT.getScalarSizeInBits();
 
+  // See if the extracted element is constant, in which case fold it if its
+  // a legal fp immediate.
+  if (IndexC && ScalarVT.isFloatingPoint()) {
+    APInt EltMask = APInt::getOneBitSet(NumElts, IndexC->getZExtValue());
+    KnownBits KnownElt = DAG.computeKnownBits(VecOp, EltMask);
+    if (KnownElt.isConstant()) {
+      APFloat CstFP =
+          APFloat(DAG.EVTToAPFloatSemantics(ScalarVT), KnownElt.getConstant());
+      if (TLI.isFPImmLegal(CstFP, ScalarVT))
+        return DAG.getConstantFP(CstFP, DL, ScalarVT);
+    }
+  }
+
   // TODO: These transforms should not require the 'hasOneUse' restriction, but
   // there are regressions on multiple targets without it. We can end up with a
   // mess of scalar and vector code if we reduce only part of the DAG to scalar.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll b/llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll
index 55750ab34e17a..3c8aca5145261 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll
@@ -69,11 +69,10 @@ define void @insert_vec_v23i32_uaddlv_from_v8i16(ptr %0) {
 ; CHECK:       ; %bb.0: ; %entry
 ; CHECK-NEXT:    movi.2d v0, #0000000000000000
 ; CHECK-NEXT:    movi.2d v2, #0000000000000000
-; CHECK-NEXT:    add x8, x0, #88
+; CHECK-NEXT:    str wzr, [x0, #88]
 ; CHECK-NEXT:    uaddlv.8h s1, v0
 ; CHECK-NEXT:    stp q0, q0, [x0, #16]
 ; CHECK-NEXT:    stp q0, q0, [x0, #48]
-; CHECK-NEXT:    st1.s { v0 }[2], [x8]
 ; CHECK-NEXT:    str d0, [x0, #80]
 ; CHECK-NEXT:    mov.s v2[0], v1[0]
 ; CHECK-NEXT:    ucvtf.4s v1, v2
diff --git a/llvm/test/CodeGen/AArch64/arm64-promote-const-complex-initializers.ll b/llvm/test/CodeGen/AArch64/arm64-promote-const-complex-initializers.ll
index 86ebf803c5783..bbd2acbab4246 100644
--- a/llvm/test/CodeGen/AArch64/arm64-promote-const-complex-initializers.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-promote-const-complex-initializers.ll
@@ -30,10 +30,10 @@ define [1 x <4 x float>] @test1() {
 define [1 x <4 x float>] @test2() {
 ; CHECK-LABEL:    .p2align    4, 0x0              ; -- Begin function test2
 ; CHECK-NEXT: lCPI1_0:
-; CHECK-NEXT:     .long   0x00000000              ; float 0
-; CHECK-NEXT:     .long   0x00000000              ; float 0
-; CHECK-NEXT:     .long   0x00000000              ; float 0
-; CHECK-NEXT:     .long   0x3f800000              ; float 1
+; CHECK-NEXT:     .long   0x80000000              ; float -0
+; CHECK-NEXT:     .long   0x80000000              ; float -0
+; CHECK-NEXT:     .long   0x80000000              ; float -0
+; CHECK-NEXT:     .long   0xbf800000              ; float -1
 ; CHECK-NEXT:     .section    __TEXT,__text,regular,pure_instructions
 ; CHECK-NEXT:     .globl  _test2
 ; CHECK-NEXT:     .p2align    2
@@ -43,17 +43,7 @@ define [1 x <4 x float>] @test2() {
 ; CHECK-NEXT: Lloh2:
 ; CHECK-NEXT:     adrp    x8, lCPI1_0@PAGE
 ; CHECK-NEXT: Lloh3:
-; CHECK-NEXT:     ldr q1, [x8, lCPI1_0@PAGEOFF]
-; CHECK-NEXT:     mov s2, v1[1]
-; CHECK-NEXT:     fneg    s0, s1
-; CHECK-NEXT:     mov s3, v1[2]
-; CHECK-NEXT:     mov s1, v1[3]
-; CHECK-NEXT:     fneg    s2, s2
-; CHECK-NEXT:     fneg    s3, s3
-; CHECK-NEXT:     fneg    s1, s1
-; CHECK-NEXT:     mov.s   v0[1], v2[0]
-; CHECK-NEXT:     mov.s   v0[2], v3[0]
-; CHECK-NEXT:     mov.s   v0[3], v1[0]
+; CHECK-NEXT:     ldr q0, [x8, lCPI1_0@PAGEOFF]
 ; CHECK-NEXT:     ret
 ;
   %constexpr = fneg float extractelement (<4 x float> bitcast (<1 x i128> <i128 84405977732342157929391748327801880576> to <4 x float>), i32 0)
diff --git a/llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll b/llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll
index e7f62b9dfc221..d3f410d37567c 100644
--- a/llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll
+++ b/llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll
@@ -49,14 +49,12 @@ entry:
 define void @zero_test() {
 ; X86-LABEL: zero_test:
 ; X86:       # %bb.0: # %entry
-; X86-NEXT:    xorps %xmm0, %xmm0
-; X86-NEXT:    movlps %xmm0, (%eax)
+; X86-NEXT:    movl $0, (%eax)
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: zero_test:
 ; X64:       # %bb.0: # %entry
-; X64-NEXT:    xorps %xmm0, %xmm0
-; X64-NEXT:    movlps %xmm0, (%rax)
+; X64-NEXT:    movq $0, (%rax)
 ; X64-NEXT:    retq
 entry:
   %0 = select <2 x i1> undef, <2 x float> undef, <2 x float> zeroinitializer
diff --git a/llvm/test/CodeGen/X86/2012-07-10-extload64.ll b/llvm/test/CodeGen/X86/2012-07-10-extload64.ll
index b6ec3b34eb107..f4ec8bde3700a 100644
--- a/llvm/test/CodeGen/X86/2012-07-10-extload64.ll
+++ b/llvm/test/CodeGen/X86/2012-07-10-extload64.ll
@@ -29,8 +29,8 @@ define void @store_64(ptr %ptr) {
 ; X86-LABEL: store_64:
 ; X86:       # %bb.0: # %BB
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    xorps %xmm0, %xmm0
-; X86-NEXT:    movlps %xmm0, (%eax)
+; X86-NEXT:    movl $0, 4(%eax)
+; X86-NEXT:    movl $0, (%eax)
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: store_64:
diff --git a/llvm/test/CodeGen/X86/fold-load-vec.ll b/llvm/test/CodeGen/X86/fold-load-vec.ll
index 348929cdf9f79..0bf846a0930bb 100644
--- a/llvm/test/CodeGen/X86/fold-load-vec.ll
+++ b/llvm/test/CodeGen/X86/fold-load-vec.ll
@@ -10,8 +10,8 @@ define void @sample_test(ptr %source, ptr %dest) nounwind {
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    movq %rdi, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    movq %rsi, {{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movq $0, (%rsp)
 ; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    movlps %xmm0, (%rsp)
 ; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movlps %xmm0, (%rsp)
 ; CHECK-NEXT:    movlps %xmm0, (%rsi)
diff --git a/llvm/test/CodeGen/X86/half.ll b/llvm/test/CodeGen/X86/half.ll
index 596e465ee8cac..7225257203161 100644
--- a/llvm/test/CodeGen/X86/half.ll
+++ b/llvm/test/CodeGen/X86/half.ll
@@ -1082,12 +1082,11 @@ define void @main.158() #0 {
 ; BWON-F16C-LABEL: main.158:
 ; BWON-F16C:       # %bb.0: # %entry
 ; BWON-F16C-NEXT:    vxorps %xmm0, %xmm0, %xmm0
-; BWON-F16C-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
-; BWON-F16C-NEXT:    vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
-; BWON-F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
-; BWON-F16C-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; BWON-F16C-NEXT:    vucomiss %xmm0, %xmm1
-; BWON-F16C-NEXT:    vxorps %xmm0, %xmm0, %xmm0
+; BWON-F16C-NEXT:    vcvtps2ph $4, %xmm0, %xmm1
+; BWON-F16C-NEXT:    vpmovzxwq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero
+; BWON-F16C-NEXT:    vcvtph2ps %xmm1, %xmm1
+; BWON-F16C-NEXT:    vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
+; BWON-F16C-NEXT:    vucomiss %xmm1, %xmm2
 ; BWON-F16C-NEXT:    jae .LBB20_2
 ; BWON-F16C-NEXT:  # %bb.1: # %entry
 ; BWON-F16C-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
@@ -1100,8 +1099,7 @@ define void @main.158() #0 {
 ; CHECK-I686-LABEL: main.158:
 ; CHECK-I686:       # %bb.0: # %entry
 ; CHECK-I686-NEXT:    subl $12, %esp
-; CHECK-I686-NEXT:    pxor %xmm0, %xmm0
-; CHECK-I686-NEXT:    movd %xmm0, (%esp)
+; CHECK-I686-NEXT:    movl $0, (%esp)
 ; CHECK-I686-NEXT:    calll __truncsfhf2
 ; CHECK-I686-NEXT:    pextrw $0, %xmm0, %eax
 ; CHECK-I686-NEXT:    movw %ax, (%esp)
diff --git a/llvm/test/CodeGen/X86/nontemporal-3.ll b/llvm/test/CodeGen/X86/nontemporal-3.ll
index a2d2c5ca43011..f9872b10097a1 100644
--- a/llvm/test/CodeGen/X86/nontemporal-3.ll
+++ b/llvm/test/CodeGen/X86/nontemporal-3.ll
@@ -93,247 +93,66 @@ define void @test_zero_v4f64_align1(ptr %dst) nounwind {
 }
 
 define void @test_zero_v8f32_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v8f32_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v8f32_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorl %eax, %eax
-; SSE4A-NEXT:    movntiq %rax, 8(%rdi)
-; SSE4A-NEXT:    movntiq %rax, 24(%rdi)
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v8f32_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v8f32_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v8f32_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v8f32_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    retq
   store <8 x float> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
 
 define void @test_zero_v4i64_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v4i64_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v4i64_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, 8(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 24(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v4i64_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v4i64_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v4i64_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v4i64_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    retq
   store <4 x i64> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
 
 define void @test_zero_v8i32_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v8i32_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v8i32_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, 8(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 24(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v8i32_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v8i32_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v8i32_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v8i32_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    retq
   store <8 x i32> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
 
 define void @test_zero_v16i16_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v16i16_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v16i16_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, 8(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 24(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v16i16_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v16i16_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v16i16_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v16i16_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    retq
   store <16 x i16> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
 
 define void @test_zero_v32i8_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v32i8_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v32i8_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, 8(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 24(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v32i8_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v32i8_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v32i8_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v32i8_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    retq
   store <32 x i8> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
@@ -508,347 +327,86 @@ define void @test_zero_v8f64_align1(ptr %dst) nounwind {
 }
 
 define void @test_zero_v16f32_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v16f32_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    movntiq %rax, 40(%rdi)
-; SSE2-NEXT:    movntiq %rax, 32(%rdi)
-; SSE2-NEXT:    movntiq %rax, 56(%rdi)
-; SSE2-NEXT:    movntiq %rax, 48(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v16f32_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorl %eax, %eax
-; SSE4A-NEXT:    movntiq %rax, 8(%rdi)
-; SSE4A-NEXT:    movntiq %rax, 24(%rdi)
-; SSE4A-NEXT:    movntiq %rax, 40(%rdi)
-; SSE4A-NEXT:    movntiq %rax, 56(%rdi)
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 32(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 48(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v16f32_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    movntiq %rax, 40(%rdi)
-; SSE41-NEXT:    movntiq %rax, 32(%rdi)
-; SSE41-NEXT:    movntiq %rax, 56(%rdi)
-; SSE41-NEXT:    movntiq %rax, 48(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v16f32_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    movntiq %rax, 40(%rdi)
-; AVX-NEXT:    movntiq %rax, 32(%rdi)
-; AVX-NEXT:    movntiq %rax, 56(%rdi)
-; AVX-NEXT:    movntiq %rax, 48(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v16f32_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    movntiq %rax, 40(%rdi)
-; AVX512-NEXT:    movntiq %rax, 32(%rdi)
-; AVX512-NEXT:    movntiq %rax, 56(%rdi)
-; AVX512-NEXT:    movntiq %rax, 48(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v16f32_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    movntiq %rax, 40(%rdi)
+; CHECK-NEXT:    movntiq %rax, 32(%rdi)
+; CHECK-NEXT:    movntiq %rax, 56(%rdi)
+; CHECK-NEXT:    movntiq %rax, 48(%rdi)
+; CHECK-NEXT:    retq
   store <16 x float> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
 
 define void @test_zero_v8i64_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v8i64_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %...
[truncated]

@RKSimon RKSimon marked this pull request as draft December 6, 2023 13:54
RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Dec 6, 2023
No need to generate/spill/restore to cpu stack

Cleanup work to allow us to properly use isFPImmLegal and fix some regressions encountered while looking at llvm#74304
RKSimon added a commit that referenced this pull request Dec 6, 2023
No need to generate/spill/restore to cpu stack

Cleanup work to allow us to properly use isFPImmLegal and fix some regressions encountered while looking at #74304
RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Dec 7, 2023
If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores

Another cleanup step for llvm#74304
RKSimon added a commit that referenced this pull request Dec 7, 2023
If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores

Another cleanup step for #74304
@RKSimon RKSimon marked this pull request as ready for review December 7, 2023 11:00
if (IndexC && ScalarVT.isFloatingPoint()) {
APInt EltMask = APInt::getOneBitSet(NumElts, IndexC->getZExtValue());
KnownBits KnownElt = DAG.computeKnownBits(VecOp, EltMask);
if (KnownElt.isConstant()) {
Contributor

@arsenm arsenm Dec 7, 2023

Why is this fancy instead of just using dyn_cast<ConstantSDNode>?

Collaborator Author

It's to handle bitcasts of vector data (e.g. <2 x i32>), which don't get folded away if we have multiple uses.

Contributor

@phoebewang phoebewang left a comment

LGTM.

If we're extracting a constant floating point value, and the constant is a legal fp imm value, then replace the extraction with a fp constant.
@RKSimon RKSimon merged commit f1200ca into llvm:main Dec 7, 2023
3 of 4 checks passed
@RKSimon RKSimon deleted the perf/extract-fpimm branch December 7, 2023 14:56
4 participants