RKSimon
Collaborator

@RKSimon RKSimon commented Oct 1, 2025

PACKSS intrinsic calls are only expanded to X86ISD::PACKSS nodes during legalisation. By that point ASHR sign splats have already been lowered to X86ISD::VSRAI, and we fail to handle the cases where those sign splats are unnecessary.

Also add an example of FREEZE(PACKSS()), as that's an issue as well.
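
As a rough illustration of why the sign splat in these tests is redundant, here is a scalar Python model of the pattern. It is only a sketch: the helper names (`packssdw`, `ashr_i16`, `sext_icmp_sgt`) are made up for this example and are not LLVM APIs, and it models lane values rather than anything the DAG combiner actually does. The point is that a sign-extended compare yields lanes that are 0 or -1, signed saturation to i16 preserves those values, and an arithmetic shift right by 15 then changes nothing.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def packssdw(a, b):
    """Scalar model of PACKSSDW: signed-saturate i32 lanes to i16, a's lanes first."""
    return [max(INT16_MIN, min(INT16_MAX, x)) for x in a + b]


def ashr_i16(v, amount):
    """Arithmetic shift right per lane (Python's >> on ints is arithmetic)."""
    return [x >> amount for x in v]


def sext_icmp_sgt(a, b):
    """Model of sext(icmp sgt a, b): each lane becomes -1 (true) or 0 (false)."""
    return [-1 if x > y else 0 for x, y in zip(a, b)]


# Arbitrary sample inputs standing in for %a0 and %a1.
a0 = [5, -3, 7, 0]
a1 = [1, 2, 7, -9]

ext = sext_icmp_sgt(a0, a1)
pack = packssdw(ext, [-1] * 4)   # second operand is splat(i32 -1), as in the test
signsplat = ashr_i16(pack, 15)

# Every lane of pack is already 0 or -1, so the shift is a no-op here.
assert signsplat == pack
```

For arbitrary inputs the shift is not a no-op (a lane holding 1 would become 0), so the fold only applies when the PACKSS operands are known to be all sign bits.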

…insics

@RKSimon RKSimon enabled auto-merge (squash) October 1, 2025 13:24
@llvmbot
Member

llvmbot commented Oct 1, 2025

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

PACKSS intrinsic calls are only expanded to X86ISD::PACKSS nodes during legalisation. By that point ASHR sign splats have already been lowered to X86ISD::VSRAI, and we fail to handle the cases where those sign splats are unnecessary.

Also add an example of FREEZE(PACKSS()), as that's an issue as well.


Full diff: https://github.com/llvm/llvm-project/pull/161518.diff

1 Files Affected:

  • (added) llvm/test/CodeGen/X86/combine-pack.ll (+55)
diff --git a/llvm/test/CodeGen/X86/combine-pack.ll b/llvm/test/CodeGen/X86/combine-pack.ll
new file mode 100644
index 0000000000000..9e740b04073e0
--- /dev/null
+++ b/llvm/test/CodeGen/X86/combine-pack.ll
@@ -0,0 +1,55 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefixes=SSE
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefixes=AVX
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=AVX
+
+declare <8 x i16> @llvm.x86.sse2.packssdw.128(<4 x i32>, <4 x i32>)
+
+; TODO: Failure to remove unnecessary signsplat
+define <8 x i16> @combine_packss_v4i32_signsplat(<4 x i32> %a0, <4 x i32> %a1) {
+; SSE-LABEL: combine_packss_v4i32_signsplat:
+; SSE:       # %bb.0:
+; SSE-NEXT:    pcmpgtd %xmm1, %xmm0
+; SSE-NEXT:    pcmpeqd %xmm1, %xmm1
+; SSE-NEXT:    packssdw %xmm1, %xmm0
+; SSE-NEXT:    psraw $15, %xmm0
+; SSE-NEXT:    retq
+;
+; AVX-LABEL: combine_packss_v4i32_signsplat:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vpcmpgtd %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    vpcmpeqd %xmm1, %xmm1, %xmm1
+; AVX-NEXT:    vpackssdw %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    vpsraw $15, %xmm0, %xmm0
+; AVX-NEXT:    retq
+  %cmp = icmp sgt <4 x i32> %a0, %a1
+  %ext = sext <4 x i1> %cmp to <4 x i32>
+  %pack = call <8 x i16> @llvm.x86.sse2.packssdw.128(<4 x i32> %ext, <4 x i32> splat (i32 -1))
+  %signsplat = ashr <8 x i16> %pack, splat (i16 15)
+  ret <8 x i16> %signsplat
+}
+
+; TODO: Failure to remove unnecessary signsplat through freeze
+define <8 x i16> @combine_packss_v4i32_freeze_signsplat(<4 x i32> %a0, <4 x i32> %a1) {
+; SSE-LABEL: combine_packss_v4i32_freeze_signsplat:
+; SSE:       # %bb.0:
+; SSE-NEXT:    pcmpgtd %xmm1, %xmm0
+; SSE-NEXT:    pcmpeqd %xmm1, %xmm1
+; SSE-NEXT:    packssdw %xmm1, %xmm0
+; SSE-NEXT:    psraw $15, %xmm0
+; SSE-NEXT:    retq
+;
+; AVX-LABEL: combine_packss_v4i32_freeze_signsplat:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vpcmpgtd %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    vpcmpeqd %xmm1, %xmm1, %xmm1
+; AVX-NEXT:    vpackssdw %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    vpsraw $15, %xmm0, %xmm0
+; AVX-NEXT:    retq
+  %cmp = icmp sgt <4 x i32> %a0, %a1
+  %ext = sext <4 x i1> %cmp to <4 x i32>
+  %pack = call <8 x i16> @llvm.x86.sse2.packssdw.128(<4 x i32> %ext, <4 x i32> splat (i32 -1))
+  %freeze = freeze <8 x i16> %pack
+  %signsplat = ashr <8 x i16> %freeze, splat (i16 15)
+  ret <8 x i16> %signsplat
+}

@RKSimon RKSimon merged commit 2e5a5fd into llvm:main Oct 1, 2025
9 of 11 checks passed
@RKSimon RKSimon deleted the x86-pack-signsplat branch October 1, 2025 13:59
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…insics (llvm#161518)

PACKSS intrinsic calls are only expanded to X86ISD::PACKSS nodes during
legalisation, after which time we fail to handle cases where ASHR sign
splats (now lowered to X86ISD::VSRAI) are unnecessary.

Add additional example of FREEZE(PACKSS()) as that's an issue as well.