Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 #91125

Merged
merged 1 commit into from
May 6, 2024

Conversation

phoebewang
Copy link
Contributor

AVX doesn't provide 16-bit BROADCAST instruction.

Fixes #91005

AVX doesn't provide 16-bit BROADCAST instruction.

Fixes llvm#91005
@llvmbot
Copy link
Collaborator

llvmbot commented May 5, 2024

@llvm/pr-subscribers-backend-x86

Author: Phoebe Wang (phoebewang)

Changes

AVX doesn't provide 16-bit BROADCAST instruction.

Fixes #91005


Full diff: https://github.com/llvm/llvm-project/pull/91125.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+1-1)
  • (added) llvm/test/CodeGen/X86/pr91005.ll (+39)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 2cd7e09232601c..5b62c00d0e731c 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -7414,7 +7414,7 @@ static SDValue lowerBuildVectorAsBroadcast(BuildVectorSDNode *BVOp,
     // With pattern matching, the VBROADCAST node may become a VMOVDDUP.
     if (ScalarSize == 32 ||
         (ScalarSize == 64 && (IsGE256 || Subtarget.hasVLX())) ||
-        CVT == MVT::f16 ||
+        (CVT == MVT::f16 && Subtarget.hasAVX2()) ||
         (OptForSize && (ScalarSize == 64 || Subtarget.hasAVX2()))) {
       const Constant *C = nullptr;
       if (ConstantSDNode *CI = dyn_cast<ConstantSDNode>(Ld))
diff --git a/llvm/test/CodeGen/X86/pr91005.ll b/llvm/test/CodeGen/X86/pr91005.ll
new file mode 100644
index 00000000000000..97fd1ce4568826
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr91005.ll
@@ -0,0 +1,39 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+f16c < %s | FileCheck %s
+
+define void @PR91005(ptr %0) minsize {
+; CHECK-LABEL: PR91005:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl %eax, %eax
+; CHECK-NEXT:    testb %al, %al
+; CHECK-NEXT:    je .LBB0_2
+; CHECK-NEXT:  # %bb.1:
+; CHECK-NEXT:    vbroadcastss {{.*#+}} xmm0 = [31744,31744,31744,31744]
+; CHECK-NEXT:    vpcmpeqw %xmm0, %xmm0, %xmm0
+; CHECK-NEXT:    vpinsrw $0, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; CHECK-NEXT:    vpand %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:    vcvtph2ps %xmm0, %xmm0
+; CHECK-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    vmulss %xmm1, %xmm0, %xmm0
+; CHECK-NEXT:    vcvtps2ph $4, %xmm0, %xmm0
+; CHECK-NEXT:    vmovd %xmm0, %eax
+; CHECK-NEXT:    movw %ax, (%rdi)
+; CHECK-NEXT:  .LBB0_2: # %common.ret
+; CHECK-NEXT:    retq
+  %2 = bitcast <2 x half> poison to <2 x i16>
+  %3 = icmp eq <2 x i16> %2, <i16 31744, i16 31744>
+  br i1 poison, label %4, label %common.ret
+
+common.ret:                                       ; preds = %4, %1
+  ret void
+
+4:                                                ; preds = %1
+  %5 = select <2 x i1> %3, <2 x half> <half 0xH3C00, half 0xH3C00>, <2 x half> zeroinitializer
+  %6 = fmul <2 x half> %5, zeroinitializer
+  %7 = fsub <2 x half> %6, zeroinitializer
+  %8 = extractelement <2 x half> %7, i64 0
+  store half %8, ptr %0, align 2
+  br label %common.ret
+}
+
+declare <2 x half> @llvm.fabs.v2f16(<2 x half>)

Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@phoebewang phoebewang merged commit f7bfb07 into llvm:main May 6, 2024
6 checks passed
@phoebewang phoebewang deleted the fp16 branch May 6, 2024 02:59
@phoebewang phoebewang added this to the LLVM 18.X Release milestone May 6, 2024
@phoebewang
Copy link
Contributor Author

/cherry-pick f7bfb07

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request May 6, 2024
…91125)

AVX doesn't provide 16-bit BROADCAST instruction.

Fixes llvm#91005

(cherry picked from commit f7bfb07)
@llvmbot
Copy link
Collaborator

llvmbot commented May 6, 2024

/pull-request #91161

phoebewang added a commit to phoebewang/llvm-project that referenced this pull request May 8, 2024
tstellar pushed a commit to phoebewang/llvm-project that referenced this pull request May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

LLVM 18 regression: -Os causes Cannot select: t98: v8f16,ch = X86ISD::VBROADCAST_LOAD
3 participants