
[DAG] visitEXTRACT_VECTOR_ELT - constant fold legal fp imm values #74304

Merged
merged 1 commit into llvm:main from RKSimon:perf/extract-fpimm
Dec 7, 2023

Conversation

RKSimon
Collaborator

@RKSimon RKSimon commented Dec 4, 2023

If we're extracting a constant floating-point value, and the constant is a legal fp immediate for the target, then replace the extraction with an fp constant.

@llvmbot
Collaborator

llvmbot commented Dec 4, 2023

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-aarch64

Author: Simon Pilgrim (RKSimon)

Changes

If we're extracting a constant floating-point value, and the constant is a legal fp immediate for the target, then replace the extraction with an fp constant.


Patch is 32.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74304.diff

11 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+13)
  • (modified) llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll (+1-2)
  • (modified) llvm/test/CodeGen/AArch64/arm64-promote-const-complex-initializers.ll (+5-15)
  • (modified) llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll (+2-4)
  • (modified) llvm/test/CodeGen/X86/2012-07-10-extload64.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/fold-load-vec.ll (+1-1)
  • (modified) llvm/test/CodeGen/X86/half.ll (+6-8)
  • (modified) llvm/test/CodeGen/X86/nontemporal-3.ll (+104-542)
  • (modified) llvm/test/CodeGen/X86/pr41619.ll (+1-2)
  • (modified) llvm/test/CodeGen/X86/vec_zero_cse.ll (+2-2)
  • (modified) llvm/test/CodeGen/X86/vector-shuffle-combining.ll (+1-1)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 2a3425a42607e..e64a380965d4a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -22243,6 +22243,19 @@ SDValue DAGCombiner::visitEXTRACT_VECTOR_ELT(SDNode *N) {
   unsigned NumElts = VecVT.getVectorNumElements();
   unsigned VecEltBitWidth = VecVT.getScalarSizeInBits();
 
+  // See if the extracted element is constant, in which case fold it if its
+  // a legal fp immediate.
+  if (IndexC && ScalarVT.isFloatingPoint()) {
+    APInt EltMask = APInt::getOneBitSet(NumElts, IndexC->getZExtValue());
+    KnownBits KnownElt = DAG.computeKnownBits(VecOp, EltMask);
+    if (KnownElt.isConstant()) {
+      APFloat CstFP =
+          APFloat(DAG.EVTToAPFloatSemantics(ScalarVT), KnownElt.getConstant());
+      if (TLI.isFPImmLegal(CstFP, ScalarVT))
+        return DAG.getConstantFP(CstFP, DL, ScalarVT);
+    }
+  }
+
   // TODO: These transforms should not require the 'hasOneUse' restriction, but
   // there are regressions on multiple targets without it. We can end up with a
   // mess of scalar and vector code if we reduce only part of the DAG to scalar.
diff --git a/llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll b/llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll
index 55750ab34e17a..3c8aca5145261 100644
--- a/llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll
+++ b/llvm/test/CodeGen/AArch64/aarch64-neon-vector-insert-uaddlv.ll
@@ -69,11 +69,10 @@ define void @insert_vec_v23i32_uaddlv_from_v8i16(ptr %0) {
 ; CHECK:       ; %bb.0: ; %entry
 ; CHECK-NEXT:    movi.2d v0, #0000000000000000
 ; CHECK-NEXT:    movi.2d v2, #0000000000000000
-; CHECK-NEXT:    add x8, x0, #88
+; CHECK-NEXT:    str wzr, [x0, #88]
 ; CHECK-NEXT:    uaddlv.8h s1, v0
 ; CHECK-NEXT:    stp q0, q0, [x0, #16]
 ; CHECK-NEXT:    stp q0, q0, [x0, #48]
-; CHECK-NEXT:    st1.s { v0 }[2], [x8]
 ; CHECK-NEXT:    str d0, [x0, #80]
 ; CHECK-NEXT:    mov.s v2[0], v1[0]
 ; CHECK-NEXT:    ucvtf.4s v1, v2
diff --git a/llvm/test/CodeGen/AArch64/arm64-promote-const-complex-initializers.ll b/llvm/test/CodeGen/AArch64/arm64-promote-const-complex-initializers.ll
index 86ebf803c5783..bbd2acbab4246 100644
--- a/llvm/test/CodeGen/AArch64/arm64-promote-const-complex-initializers.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-promote-const-complex-initializers.ll
@@ -30,10 +30,10 @@ define [1 x <4 x float>] @test1() {
 define [1 x <4 x float>] @test2() {
 ; CHECK-LABEL:    .p2align    4, 0x0              ; -- Begin function test2
 ; CHECK-NEXT: lCPI1_0:
-; CHECK-NEXT:     .long   0x00000000              ; float 0
-; CHECK-NEXT:     .long   0x00000000              ; float 0
-; CHECK-NEXT:     .long   0x00000000              ; float 0
-; CHECK-NEXT:     .long   0x3f800000              ; float 1
+; CHECK-NEXT:     .long   0x80000000              ; float -0
+; CHECK-NEXT:     .long   0x80000000              ; float -0
+; CHECK-NEXT:     .long   0x80000000              ; float -0
+; CHECK-NEXT:     .long   0xbf800000              ; float -1
 ; CHECK-NEXT:     .section    __TEXT,__text,regular,pure_instructions
 ; CHECK-NEXT:     .globl  _test2
 ; CHECK-NEXT:     .p2align    2
@@ -43,17 +43,7 @@ define [1 x <4 x float>] @test2() {
 ; CHECK-NEXT: Lloh2:
 ; CHECK-NEXT:     adrp    x8, lCPI1_0@PAGE
 ; CHECK-NEXT: Lloh3:
-; CHECK-NEXT:     ldr q1, [x8, lCPI1_0@PAGEOFF]
-; CHECK-NEXT:     mov s2, v1[1]
-; CHECK-NEXT:     fneg    s0, s1
-; CHECK-NEXT:     mov s3, v1[2]
-; CHECK-NEXT:     mov s1, v1[3]
-; CHECK-NEXT:     fneg    s2, s2
-; CHECK-NEXT:     fneg    s3, s3
-; CHECK-NEXT:     fneg    s1, s1
-; CHECK-NEXT:     mov.s   v0[1], v2[0]
-; CHECK-NEXT:     mov.s   v0[2], v3[0]
-; CHECK-NEXT:     mov.s   v0[3], v1[0]
+; CHECK-NEXT:     ldr q0, [x8, lCPI1_0@PAGEOFF]
 ; CHECK-NEXT:     ret
 ;
   %constexpr = fneg float extractelement (<4 x float> bitcast (<1 x i128> <i128 84405977732342157929391748327801880576> to <4 x float>), i32 0)
diff --git a/llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll b/llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll
index e7f62b9dfc221..d3f410d37567c 100644
--- a/llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll
+++ b/llvm/test/CodeGen/X86/2011-10-19-widen_vselect.ll
@@ -49,14 +49,12 @@ entry:
 define void @zero_test() {
 ; X86-LABEL: zero_test:
 ; X86:       # %bb.0: # %entry
-; X86-NEXT:    xorps %xmm0, %xmm0
-; X86-NEXT:    movlps %xmm0, (%eax)
+; X86-NEXT:    movl $0, (%eax)
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: zero_test:
 ; X64:       # %bb.0: # %entry
-; X64-NEXT:    xorps %xmm0, %xmm0
-; X64-NEXT:    movlps %xmm0, (%rax)
+; X64-NEXT:    movq $0, (%rax)
 ; X64-NEXT:    retq
 entry:
   %0 = select <2 x i1> undef, <2 x float> undef, <2 x float> zeroinitializer
diff --git a/llvm/test/CodeGen/X86/2012-07-10-extload64.ll b/llvm/test/CodeGen/X86/2012-07-10-extload64.ll
index b6ec3b34eb107..f4ec8bde3700a 100644
--- a/llvm/test/CodeGen/X86/2012-07-10-extload64.ll
+++ b/llvm/test/CodeGen/X86/2012-07-10-extload64.ll
@@ -29,8 +29,8 @@ define void @store_64(ptr %ptr) {
 ; X86-LABEL: store_64:
 ; X86:       # %bb.0: # %BB
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    xorps %xmm0, %xmm0
-; X86-NEXT:    movlps %xmm0, (%eax)
+; X86-NEXT:    movl $0, 4(%eax)
+; X86-NEXT:    movl $0, (%eax)
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: store_64:
diff --git a/llvm/test/CodeGen/X86/fold-load-vec.ll b/llvm/test/CodeGen/X86/fold-load-vec.ll
index 348929cdf9f79..0bf846a0930bb 100644
--- a/llvm/test/CodeGen/X86/fold-load-vec.ll
+++ b/llvm/test/CodeGen/X86/fold-load-vec.ll
@@ -10,8 +10,8 @@ define void @sample_test(ptr %source, ptr %dest) nounwind {
 ; CHECK-NEXT:    subq $24, %rsp
 ; CHECK-NEXT:    movq %rdi, {{[0-9]+}}(%rsp)
 ; CHECK-NEXT:    movq %rsi, {{[0-9]+}}(%rsp)
+; CHECK-NEXT:    movq $0, (%rsp)
 ; CHECK-NEXT:    xorps %xmm0, %xmm0
-; CHECK-NEXT:    movlps %xmm0, (%rsp)
 ; CHECK-NEXT:    unpcklps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
 ; CHECK-NEXT:    movlps %xmm0, (%rsp)
 ; CHECK-NEXT:    movlps %xmm0, (%rsi)
diff --git a/llvm/test/CodeGen/X86/half.ll b/llvm/test/CodeGen/X86/half.ll
index 596e465ee8cac..7225257203161 100644
--- a/llvm/test/CodeGen/X86/half.ll
+++ b/llvm/test/CodeGen/X86/half.ll
@@ -1082,12 +1082,11 @@ define void @main.158() #0 {
 ; BWON-F16C-LABEL: main.158:
 ; BWON-F16C:       # %bb.0: # %entry
 ; BWON-F16C-NEXT:    vxorps %xmm0, %xmm0, %xmm0
-; BWON-F16C-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
-; BWON-F16C-NEXT:    vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
-; BWON-F16C-NEXT:    vcvtph2ps %xmm0, %xmm0
-; BWON-F16C-NEXT:    vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; BWON-F16C-NEXT:    vucomiss %xmm0, %xmm1
-; BWON-F16C-NEXT:    vxorps %xmm0, %xmm0, %xmm0
+; BWON-F16C-NEXT:    vcvtps2ph $4, %xmm0, %xmm1
+; BWON-F16C-NEXT:    vpmovzxwq {{.*#+}} xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero
+; BWON-F16C-NEXT:    vcvtph2ps %xmm1, %xmm1
+; BWON-F16C-NEXT:    vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero
+; BWON-F16C-NEXT:    vucomiss %xmm1, %xmm2
 ; BWON-F16C-NEXT:    jae .LBB20_2
 ; BWON-F16C-NEXT:  # %bb.1: # %entry
 ; BWON-F16C-NEXT:    vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
@@ -1100,8 +1099,7 @@ define void @main.158() #0 {
 ; CHECK-I686-LABEL: main.158:
 ; CHECK-I686:       # %bb.0: # %entry
 ; CHECK-I686-NEXT:    subl $12, %esp
-; CHECK-I686-NEXT:    pxor %xmm0, %xmm0
-; CHECK-I686-NEXT:    movd %xmm0, (%esp)
+; CHECK-I686-NEXT:    movl $0, (%esp)
 ; CHECK-I686-NEXT:    calll __truncsfhf2
 ; CHECK-I686-NEXT:    pextrw $0, %xmm0, %eax
 ; CHECK-I686-NEXT:    movw %ax, (%esp)
diff --git a/llvm/test/CodeGen/X86/nontemporal-3.ll b/llvm/test/CodeGen/X86/nontemporal-3.ll
index a2d2c5ca43011..f9872b10097a1 100644
--- a/llvm/test/CodeGen/X86/nontemporal-3.ll
+++ b/llvm/test/CodeGen/X86/nontemporal-3.ll
@@ -93,247 +93,66 @@ define void @test_zero_v4f64_align1(ptr %dst) nounwind {
 }
 
 define void @test_zero_v8f32_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v8f32_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v8f32_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorl %eax, %eax
-; SSE4A-NEXT:    movntiq %rax, 8(%rdi)
-; SSE4A-NEXT:    movntiq %rax, 24(%rdi)
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v8f32_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v8f32_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v8f32_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v8f32_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    retq
   store <8 x float> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
 
 define void @test_zero_v4i64_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v4i64_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v4i64_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, 8(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 24(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v4i64_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v4i64_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v4i64_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v4i64_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    retq
   store <4 x i64> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
 
 define void @test_zero_v8i32_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v8i32_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v8i32_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, 8(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 24(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v8i32_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v8i32_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v8i32_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v8i32_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    retq
   store <8 x i32> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
 
 define void @test_zero_v16i16_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v16i16_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v16i16_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, 8(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 24(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v16i16_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v16i16_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v16i16_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v16i16_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    retq
   store <16 x i16> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
 
 define void @test_zero_v32i8_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v32i8_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v32i8_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, 8(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 24(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v32i8_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v32i8_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v32i8_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v32i8_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    retq
   store <32 x i8> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
@@ -508,347 +327,86 @@ define void @test_zero_v8f64_align1(ptr %dst) nounwind {
 }
 
 define void @test_zero_v16f32_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v16f32_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %rax, 16(%rdi)
-; SSE2-NEXT:    movntiq %rax, 40(%rdi)
-; SSE2-NEXT:    movntiq %rax, 32(%rdi)
-; SSE2-NEXT:    movntiq %rax, 56(%rdi)
-; SSE2-NEXT:    movntiq %rax, 48(%rdi)
-; SSE2-NEXT:    retq
-;
-; SSE4A-LABEL: test_zero_v16f32_align1:
-; SSE4A:       # %bb.0:
-; SSE4A-NEXT:    xorl %eax, %eax
-; SSE4A-NEXT:    movntiq %rax, 8(%rdi)
-; SSE4A-NEXT:    movntiq %rax, 24(%rdi)
-; SSE4A-NEXT:    movntiq %rax, 40(%rdi)
-; SSE4A-NEXT:    movntiq %rax, 56(%rdi)
-; SSE4A-NEXT:    xorps %xmm0, %xmm0
-; SSE4A-NEXT:    movntsd %xmm0, (%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 16(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 32(%rdi)
-; SSE4A-NEXT:    movntsd %xmm0, 48(%rdi)
-; SSE4A-NEXT:    retq
-;
-; SSE41-LABEL: test_zero_v16f32_align1:
-; SSE41:       # %bb.0:
-; SSE41-NEXT:    xorl %eax, %eax
-; SSE41-NEXT:    movntiq %rax, 8(%rdi)
-; SSE41-NEXT:    movntiq %rax, (%rdi)
-; SSE41-NEXT:    movntiq %rax, 24(%rdi)
-; SSE41-NEXT:    movntiq %rax, 16(%rdi)
-; SSE41-NEXT:    movntiq %rax, 40(%rdi)
-; SSE41-NEXT:    movntiq %rax, 32(%rdi)
-; SSE41-NEXT:    movntiq %rax, 56(%rdi)
-; SSE41-NEXT:    movntiq %rax, 48(%rdi)
-; SSE41-NEXT:    retq
-;
-; AVX-LABEL: test_zero_v16f32_align1:
-; AVX:       # %bb.0:
-; AVX-NEXT:    xorl %eax, %eax
-; AVX-NEXT:    movntiq %rax, 8(%rdi)
-; AVX-NEXT:    movntiq %rax, (%rdi)
-; AVX-NEXT:    movntiq %rax, 24(%rdi)
-; AVX-NEXT:    movntiq %rax, 16(%rdi)
-; AVX-NEXT:    movntiq %rax, 40(%rdi)
-; AVX-NEXT:    movntiq %rax, 32(%rdi)
-; AVX-NEXT:    movntiq %rax, 56(%rdi)
-; AVX-NEXT:    movntiq %rax, 48(%rdi)
-; AVX-NEXT:    retq
-;
-; AVX512-LABEL: test_zero_v16f32_align1:
-; AVX512:       # %bb.0:
-; AVX512-NEXT:    xorl %eax, %eax
-; AVX512-NEXT:    movntiq %rax, 8(%rdi)
-; AVX512-NEXT:    movntiq %rax, (%rdi)
-; AVX512-NEXT:    movntiq %rax, 24(%rdi)
-; AVX512-NEXT:    movntiq %rax, 16(%rdi)
-; AVX512-NEXT:    movntiq %rax, 40(%rdi)
-; AVX512-NEXT:    movntiq %rax, 32(%rdi)
-; AVX512-NEXT:    movntiq %rax, 56(%rdi)
-; AVX512-NEXT:    movntiq %rax, 48(%rdi)
-; AVX512-NEXT:    retq
+; CHECK-LABEL: test_zero_v16f32_align1:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    movntiq %rax, 8(%rdi)
+; CHECK-NEXT:    movntiq %rax, (%rdi)
+; CHECK-NEXT:    movntiq %rax, 24(%rdi)
+; CHECK-NEXT:    movntiq %rax, 16(%rdi)
+; CHECK-NEXT:    movntiq %rax, 40(%rdi)
+; CHECK-NEXT:    movntiq %rax, 32(%rdi)
+; CHECK-NEXT:    movntiq %rax, 56(%rdi)
+; CHECK-NEXT:    movntiq %rax, 48(%rdi)
+; CHECK-NEXT:    retq
   store <16 x float> zeroinitializer, ptr %dst, align 1, !nontemporal !1
   ret void
 }
 
 define void @test_zero_v8i64_align1(ptr %dst) nounwind {
-; SSE2-LABEL: test_zero_v8i64_align1:
-; SSE2:       # %bb.0:
-; SSE2-NEXT:    xorl %eax, %eax
-; SSE2-NEXT:    movntiq %rax, 8(%rdi)
-; SSE2-NEXT:    movntiq %rax, (%rdi)
-; SSE2-NEXT:    movntiq %rax, 24(%rdi)
-; SSE2-NEXT:    movntiq %...
[truncated]

@RKSimon RKSimon marked this pull request as draft December 6, 2023 13:54
RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Dec 6, 2023
No need to generate/spill/restore to cpu stack

Cleanup work to allow us to properly use isFPImmLegal and fix some regressions encountered while looking at llvm#74304
RKSimon added a commit that referenced this pull request Dec 6, 2023
No need to generate/spill/restore to cpu stack

Cleanup work to allow us to properly use isFPImmLegal and fix some regressions encountered while looking at #74304
RKSimon added a commit to RKSimon/llvm-project that referenced this pull request Dec 7, 2023
If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores

Another cleanup step for llvm#74304
RKSimon added a commit that referenced this pull request Dec 7, 2023
If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores

Another cleanup step for #74304
@RKSimon RKSimon marked this pull request as ready for review December 7, 2023 11:00
if (IndexC && ScalarVT.isFloatingPoint()) {
APInt EltMask = APInt::getOneBitSet(NumElts, IndexC->getZExtValue());
KnownBits KnownElt = DAG.computeKnownBits(VecOp, EltMask);
if (KnownElt.isConstant()) {
Contributor

@arsenm arsenm Dec 7, 2023

Why is this fancy instead of just using dyn_cast<ConstantSDNode>?

Collaborator Author

It's to handle bitcasts of vector data (e.g. <2 x i32>), which don't get folded away if we have multiple uses.

Contributor

@phoebewang phoebewang left a comment

LGTM.

If we're extracting a constant floating point value, and the constant is a legal fp imm value, then replace the extraction with a fp constant.
@RKSimon RKSimon merged commit f1200ca into llvm:main Dec 7, 2023
3 of 4 checks passed
@RKSimon RKSimon deleted the perf/extract-fpimm branch December 7, 2023 14:56
4 participants