Skip to content

Conversation

@kuroyukiasuna
Copy link

Saturating truncation operations are well-defined for all inputs and cannot create poison or undef values. This allows the optimizer to eliminate unnecessary freeze instructions after these operations.

Fixes #152143

…ison (llvm#152143)

Saturating truncation operations are well-defined for all inputs and cannot create
poison or undef values. This allows the optimizer to eliminate unnecessary freeze
instructions after these operations.

Fixes llvm#152143
@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added backend:X86 llvm:SelectionDAG SelectionDAGISel as well labels Nov 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 20, 2025

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-backend-x86

Author: Jerry Dang (kuroyukiasuna)

Changes

Saturating truncation operations are well-defined for all inputs and cannot create poison or undef values. This allows the optimizer to eliminate unnecessary freeze instructions after these operations.

Fixes #152143


Full diff: https://github.com/llvm/llvm-project/pull/168809.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+3)
  • (added) llvm/test/CodeGen/X86/truncate-sat-freeze.ll (+64)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 1b15a207a2d37..0f0174c8aea35 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5664,6 +5664,9 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
   case ISD::FP_EXTEND:
   case ISD::FP_TO_SINT_SAT:
   case ISD::FP_TO_UINT_SAT:
+  case ISD::TRUNCATE_SSAT_U:
+  case ISD::TRUNCATE_SSAT_S:
+  case ISD::TRUNCATE_USAT_U:
     // No poison except from flags (which is handled above)
     return false;
 
diff --git a/llvm/test/CodeGen/X86/truncate-sat-freeze.ll b/llvm/test/CodeGen/X86/truncate-sat-freeze.ll
new file mode 100644
index 0000000000000..78aebe05ec1de
--- /dev/null
+++ b/llvm/test/CodeGen/X86/truncate-sat-freeze.ll
@@ -0,0 +1,64 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s
+
+; Test that freeze is eliminated for saturation truncate patterns.
+; The freeze elimination happens at the IR level due to the IntrNoCreateUndefOrPoison
+; attribute on the llvm.smax/smin/umin intrinsics. At the SelectionDAG level,
+; TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U operations are also marked in
+; canCreateUndefOrPoison() to ensure consistency and enable potential future
+; optimizations. This test validates the end-to-end behavior that no freeze
+; instruction appears in the output.
+
+define <2 x i32> @trunc_ssat_s_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_ssat_s_freeze:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpbroadcastq {{.*#+}} xmm1 = [18446744071562067968,18446744071562067968]
+; CHECK-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm2
+; CHECK-NEXT:    vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vpbroadcastq {{.*#+}} xmm1 = [2147483647,2147483647]
+; CHECK-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm2
+; CHECK-NEXT:    vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT:    retq
+  %1 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %a0, <2 x i64> <i64 -2147483648, i64 -2147483648>)
+  %2 = call <2 x i64> @llvm.smin.v2i64(<2 x i64> %1, <2 x i64> <i64 2147483647, i64 2147483647>)
+  %3 = trunc <2 x i64> %2 to <2 x i32>
+  %4 = freeze <2 x i32> %3
+  ret <2 x i32> %4
+}
+
+define <2 x i32> @trunc_ssat_u_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_ssat_u_freeze:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm1
+; CHECK-NEXT:    vpand %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vpmovsxbd {{.*#+}} xmm1 = [4294967295,0,4294967295,0]
+; CHECK-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm2
+; CHECK-NEXT:    vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT:    retq
+  %1 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %a0, <2 x i64> zeroinitializer)
+  %2 = call <2 x i64> @llvm.smin.v2i64(<2 x i64> %1, <2 x i64> <i64 4294967295, i64 4294967295>)
+  %3 = trunc <2 x i64> %2 to <2 x i32>
+  %4 = freeze <2 x i32> %3
+  ret <2 x i32> %4
+}
+
+define <2 x i32> @trunc_usat_u_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_usat_u_freeze:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; CHECK-NEXT:    vpcmpgtq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; CHECK-NEXT:    vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT:    retq
+  %1 = call <2 x i64> @llvm.umin.v2i64(<2 x i64> %a0, <2 x i64> <i64 4294967295, i64 4294967295>)
+  %2 = trunc <2 x i64> %1 to <2 x i32>
+  %3 = freeze <2 x i32> %2
+  ret <2 x i32> %3
+}
+
+declare <2 x i64> @llvm.smax.v2i64(<2 x i64>, <2 x i64>)
+declare <2 x i64> @llvm.smin.v2i64(<2 x i64>, <2 x i64>)
+declare <2 x i64> @llvm.umin.v2i64(<2 x i64>, <2 x i64>)

@llvmbot
Copy link
Member

llvmbot commented Nov 20, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: Jerry Dang (kuroyukiasuna)

Changes

Saturating truncation operations are well-defined for all inputs and cannot create poison or undef values. This allows the optimizer to eliminate unnecessary freeze instructions after these operations.

Fixes #152143


Full diff: https://github.com/llvm/llvm-project/pull/168809.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+3)
  • (added) llvm/test/CodeGen/X86/truncate-sat-freeze.ll (+64)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 1b15a207a2d37..0f0174c8aea35 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5664,6 +5664,9 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
   case ISD::FP_EXTEND:
   case ISD::FP_TO_SINT_SAT:
   case ISD::FP_TO_UINT_SAT:
+  case ISD::TRUNCATE_SSAT_U:
+  case ISD::TRUNCATE_SSAT_S:
+  case ISD::TRUNCATE_USAT_U:
     // No poison except from flags (which is handled above)
     return false;
 
diff --git a/llvm/test/CodeGen/X86/truncate-sat-freeze.ll b/llvm/test/CodeGen/X86/truncate-sat-freeze.ll
new file mode 100644
index 0000000000000..78aebe05ec1de
--- /dev/null
+++ b/llvm/test/CodeGen/X86/truncate-sat-freeze.ll
@@ -0,0 +1,64 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s
+
+; Test that freeze is eliminated for saturation truncate patterns.
+; The freeze elimination happens at the IR level due to the IntrNoCreateUndefOrPoison
+; attribute on the llvm.smax/smin/umin intrinsics. At the SelectionDAG level,
+; TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U operations are also marked in
+; canCreateUndefOrPoison() to ensure consistency and enable potential future
+; optimizations. This test validates the end-to-end behavior that no freeze
+; instruction appears in the output.
+
+define <2 x i32> @trunc_ssat_s_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_ssat_s_freeze:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpbroadcastq {{.*#+}} xmm1 = [18446744071562067968,18446744071562067968]
+; CHECK-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm2
+; CHECK-NEXT:    vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vpbroadcastq {{.*#+}} xmm1 = [2147483647,2147483647]
+; CHECK-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm2
+; CHECK-NEXT:    vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT:    retq
+  %1 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %a0, <2 x i64> <i64 -2147483648, i64 -2147483648>)
+  %2 = call <2 x i64> @llvm.smin.v2i64(<2 x i64> %1, <2 x i64> <i64 2147483647, i64 2147483647>)
+  %3 = trunc <2 x i64> %2 to <2 x i32>
+  %4 = freeze <2 x i32> %3
+  ret <2 x i32> %4
+}
+
+define <2 x i32> @trunc_ssat_u_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_ssat_u_freeze:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm1
+; CHECK-NEXT:    vpand %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vpmovsxbd {{.*#+}} xmm1 = [4294967295,0,4294967295,0]
+; CHECK-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm2
+; CHECK-NEXT:    vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT:    retq
+  %1 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %a0, <2 x i64> zeroinitializer)
+  %2 = call <2 x i64> @llvm.smin.v2i64(<2 x i64> %1, <2 x i64> <i64 4294967295, i64 4294967295>)
+  %3 = trunc <2 x i64> %2 to <2 x i32>
+  %4 = freeze <2 x i32> %3
+  ret <2 x i32> %4
+}
+
+define <2 x i32> @trunc_usat_u_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_usat_u_freeze:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; CHECK-NEXT:    vpcmpgtq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; CHECK-NEXT:    vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT:    retq
+  %1 = call <2 x i64> @llvm.umin.v2i64(<2 x i64> %a0, <2 x i64> <i64 4294967295, i64 4294967295>)
+  %2 = trunc <2 x i64> %1 to <2 x i32>
+  %3 = freeze <2 x i32> %2
+  ret <2 x i32> %3
+}
+
+declare <2 x i64> @llvm.smax.v2i64(<2 x i64>, <2 x i64>)
+declare <2 x i64> @llvm.smin.v2i64(<2 x i64>, <2 x i64>)
+declare <2 x i64> @llvm.umin.v2i64(<2 x i64>, <2 x i64>)

Comment on lines 4 to 10
; Test that freeze is eliminated for saturation truncate patterns.
; The freeze elimination happens at the IR level due to the IntrNoCreateUndefOrPoison
; attribute on the llvm.smax/smin/umin intrinsics. At the SelectionDAG level,
; TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U operations are also marked in
; canCreateUndefOrPoison() to ensure consistency and enable potential future
; optimizations. This test validates the end-to-end behavior that no freeze
; instruction appears in the output.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like commented, this test doesn't really exercise the code path I added in SelectionDAG::canCreateUndefOrPoison. The IR level optimization is too aggressive. I'm fairly new to SelectionDAG, cannot think of better way to write a test, apologies. Any suggestions/ideas on better testing are appreciated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget about x86 - you need to move this to aarch64 which uses these and preferably use their intrinsics which expand directly to ISD::TRUNCATE_SAT nodes (neon.sqxtn etc.).

Also to actually test this - you've got to demonstrate that the freeze has been removed / moved up through the truncate node. I typically use KnownBits/SignBits value tracking folds that can only occur once the freeze is moved out of the way.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RKSimon Something I tried:

define <4 x i32> @sqxtun_freeze_zext_and(<4 x i32> %a) {
; CHECK-LABEL: sqxtun_freeze_zext_and:
; CHECK:       // %bb.0:
; CHECK-NEXT:    sqxtun v0.4h, v0.4s
; CHECK-NEXT:    ushll v0.4s, v0.4h, #0
; CHECK-NEXT:    ret
  %trunc = tail call <4 x i16> @llvm.aarch64.neon.sqxtun.v4i16(<4 x i32> %a)
  %freeze = freeze <4 x i16> %trunc
  %zext = zext <4 x i16> %freeze to <4 x i32>
  %and = and <4 x i32> %zext, <i32 65535, i32 65535, i32 65535, i32 65535>
  ret <4 x i32> %and
}

declare <4 x i16> @llvm.aarch64.neon.sqxtun.v4i16(<4 x i32>)

I can verified the freeze elimination is working:

Legalized selection DAG (9 nodes):

t13: v4i16 = truncate_ssat_u t2
t5: v4i16 = freeze t13
t6: v4i32 = zero_extend t5

Optimized legalized selection DAG (8 nodes):

t13: v4i16 = truncate_ssat_u t2
t6: v4i32 = zero_extend t13

However, the final assembly is the same with or without my change, as a result the test would always pass. Seems the freeze is eliminated during instruction selection? Need some suggestion on a test pattern where this freeze elimination produces visibly different assembly - or is that the correct path to pursue? Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried several test patterns to show visible codegen differences:

  • Comparison with out-of-range values (should fold to constant)
  • Redundant min/max operations (should be eliminated)
  • KnownBits patterns with AND/zext

None produced different assembly with/without the change, though the DAG optimization does occur.

@github-actions
Copy link

🐧 Linux x64 Test Results

  • 186415 tests passed
  • 4864 tests skipped

@RKSimon RKSimon self-requested a review November 20, 2025 10:39
Comment on lines 4 to 10
; Test that freeze is eliminated for saturation truncate patterns.
; The freeze elimination happens at the IR level due to the IntrNoCreateUndefOrPoison
; attribute on the llvm.smax/smin/umin intrinsics. At the SelectionDAG level,
; TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U operations are also marked in
; canCreateUndefOrPoison() to ensure consistency and enable potential future
; optimizations. This test validates the end-to-end behavior that no freeze
; instruction appears in the output.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget about x86 - you need to move this to aarch64 which uses these and preferably use their intrinsics which expand directly to ISD::TRUNCATE_SAT nodes (neon.sqxtn etc.).

Also to actually test this - you've got to demonstrate that the freeze has been removed / moved up through the truncate node. I typically use KnownBits/SignBits value tracking folds that can only occur once the freeze is moved out of the way.

@@ -0,0 +1,80 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Progress so far: The select pattern tests showed different codegen in older LLVM(18.1.3). But the current (22.0.0git) assembly output is identical with/without this change.
I've verified that freeze elimination occurs at the DAG level (node count: 9→8 in optimized legalized DAG via -debug-only=isel).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[DAG] SelectionDAG::canCreateUndefOrPoison - add TRUNCATE_SAT handling + tests

3 participants