Skip to content

Conversation

RKSimon
Copy link
Collaborator

@RKSimon RKSimon commented Sep 29, 2025

X86ISD::INSERTPS shuffles can't create undef/poison itself, allowing us to fold freeze(insertps(x,y,i)) -> insertps(freeze(x),freeze(y),i)

…oisonForTargetNode - add X86ISD::INSERTPS handling

X86ISD::INSERTPS shuffles can't create undef/poison itself, allowing us to fold freeze(insertps(x,y,i)) -> insertps(freeze(x),freeze(y),i)
@RKSimon RKSimon enabled auto-merge (squash) September 29, 2025 17:08
@llvmbot
Copy link
Member

llvmbot commented Sep 29, 2025

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

X86ISD::INSERTPS shuffles can't create undef/poison itself, allowing us to fold freeze(insertps(x,y,i)) -> insertps(freeze(x),freeze(y),i)


Full diff: https://github.com/llvm/llvm-project/pull/161234.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+2)
  • (modified) llvm/test/CodeGen/X86/vector-shuffle-combining-sse41.ll (+1-4)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 292eab77e2002..cd04ff5bc7ef4 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -45169,6 +45169,7 @@ bool X86TargetLowering::isGuaranteedNotToBeUndefOrPoisonForTargetNode(
   case X86ISD::Wrapper:
   case X86ISD::WrapperRIP:
     return true;
+  case X86ISD::INSERTPS:
   case X86ISD::BLENDI:
   case X86ISD::PSHUFB:
   case X86ISD::PSHUFD:
@@ -45239,6 +45240,7 @@ bool X86TargetLowering::canCreateUndefOrPoisonForTargetNode(
   case X86ISD::BLENDV:
     return false;
   // SSE target shuffles.
+  case X86ISD::INSERTPS:
   case X86ISD::PSHUFB:
   case X86ISD::PSHUFD:
   case X86ISD::UNPCKL:
diff --git a/llvm/test/CodeGen/X86/vector-shuffle-combining-sse41.ll b/llvm/test/CodeGen/X86/vector-shuffle-combining-sse41.ll
index bec33492bbf1e..3590c4d027be7 100644
--- a/llvm/test/CodeGen/X86/vector-shuffle-combining-sse41.ll
+++ b/llvm/test/CodeGen/X86/vector-shuffle-combining-sse41.ll
@@ -62,15 +62,12 @@ define <4 x i32> @combine_blend_of_permutes_v4i32(<2 x i64> %a0, <2 x i64> %a1)
 define <4 x float> @freeze_insertps(<4 x float> %a0, <4 x float> %a1) {
 ; SSE-LABEL: freeze_insertps:
 ; SSE:       # %bb.0:
-; SSE-NEXT:    insertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
-; SSE-NEXT:    insertps {{.*#+}} xmm1 = xmm0[1],xmm1[1,2,3]
 ; SSE-NEXT:    movaps %xmm1, %xmm0
 ; SSE-NEXT:    retq
 ;
 ; AVX-LABEL: freeze_insertps:
 ; AVX:       # %bb.0:
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
-; AVX-NEXT:    vinsertps {{.*#+}} xmm0 = xmm0[1],xmm1[1,2,3]
+; AVX-NEXT:    vmovaps %xmm1, %xmm0
 ; AVX-NEXT:    retq
   %s0 = call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %a0, <4 x float> %a1, i8 16)
   %f0 = freeze <4 x float> %s0

@RKSimon RKSimon merged commit 8d48d69 into llvm:main Sep 29, 2025
11 checks passed
@RKSimon RKSimon deleted the x86-insertps-poison branch September 30, 2025 08:40
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…oisonForTargetNode - add X86ISD::INSERTPS handling (llvm#161234)

X86ISD::INSERTPS shuffles can't create undef/poison itself, allowing us to fold freeze(insertps(x,y,i)) -> insertps(freeze(x),freeze(y),i)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants