[DAG] Add TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U to canCreateUndefOrPoison (#152143) #168809

kuroyukiasuna · 2025-11-20T02:11:52Z

Saturating truncation operations are well-defined for all inputs and cannot create poison or undef values. This allows the optimizer to eliminate unnecessary freeze instructions after these operations.

Fixes #152143
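For reference, here is a scalar C++ model of what the three nodes compute. It assumes the i64 to i32 case and the clamp-then-truncate semantics (smax/smin/umin followed by trunc) used by the tests in this PR. The function names are hypothetical. Every input reaches a branch that returns an in-range, fully defined value, which is why none of these nodes can introduce undef or poison on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar models of the three nodes for an i64 -> i32 truncation.
// Each one clamps first and truncates second, so every input reaches a
// branch that returns an in-range, fully defined value.

// TRUNCATE_SSAT_S: signed input, saturate to the signed i32 range,
// i.e. trunc(smin(smax(x, INT32_MIN), INT32_MAX)).
int32_t truncSSatS(int64_t x) {
  constexpr int64_t lo = std::numeric_limits<int32_t>::min();
  constexpr int64_t hi = std::numeric_limits<int32_t>::max();
  if (x < lo) return static_cast<int32_t>(lo);
  if (x > hi) return static_cast<int32_t>(hi);
  return static_cast<int32_t>(x);
}

// TRUNCATE_SSAT_U: signed input, saturate to the unsigned u32 range,
// i.e. trunc(smin(smax(x, 0), UINT32_MAX)).
uint32_t truncSSatU(int64_t x) {
  constexpr int64_t hi = std::numeric_limits<uint32_t>::max();
  if (x < 0) return 0;
  if (x > hi) return static_cast<uint32_t>(hi);
  return static_cast<uint32_t>(x);
}

// TRUNCATE_USAT_U: unsigned input, saturate to the unsigned u32 range,
// i.e. trunc(umin(x, UINT32_MAX)).
uint32_t truncUSatU(uint64_t x) {
  constexpr uint64_t hi = std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(x > hi ? hi : x);
}
```

The clamp bounds mirror the smax/smin/umin constants used in the IR tests of this PR.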

github-actions · 2025-11-20T02:12:10Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-11-20T02:12:38Z

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-backend-x86

Author: Jerry Dang (kuroyukiasuna)

Changes

Saturating truncation operations are well-defined for all inputs and cannot create poison or undef values. This allows the optimizer to eliminate unnecessary freeze instructions after these operations.

Fixes #152143

Full diff: https://github.com/llvm/llvm-project/pull/168809.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+3)
(added) llvm/test/CodeGen/X86/truncate-sat-freeze.ll (+64)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 1b15a207a2d37..0f0174c8aea35 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5664,6 +5664,9 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
   case ISD::FP_EXTEND:
   case ISD::FP_TO_SINT_SAT:
   case ISD::FP_TO_UINT_SAT:
+  case ISD::TRUNCATE_SSAT_U:
+  case ISD::TRUNCATE_SSAT_S:
+  case ISD::TRUNCATE_USAT_U:
     // No poison except from flags (which is handled above)
     return false;
 
diff --git a/llvm/test/CodeGen/X86/truncate-sat-freeze.ll b/llvm/test/CodeGen/X86/truncate-sat-freeze.ll
new file mode 100644
index 0000000000000..78aebe05ec1de
--- /dev/null
+++ b/llvm/test/CodeGen/X86/truncate-sat-freeze.ll
@@ -0,0 +1,64 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s
+
+; Test that freeze is eliminated for saturation truncate patterns.
+; The freeze elimination happens at the IR level due to the IntrNoCreateUndefOrPoison
+; attribute on the llvm.smax/smin/umin intrinsics. At the SelectionDAG level,
+; TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U operations are also marked in
+; canCreateUndefOrPoison() to ensure consistency and enable potential future
+; optimizations. This test validates the end-to-end behavior that no freeze
+; instruction appears in the output.
+
+define <2 x i32> @trunc_ssat_s_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_ssat_s_freeze:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpbroadcastq {{.*#+}} xmm1 = [18446744071562067968,18446744071562067968]
+; CHECK-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm2
+; CHECK-NEXT:    vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vpbroadcastq {{.*#+}} xmm1 = [2147483647,2147483647]
+; CHECK-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm2
+; CHECK-NEXT:    vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT:    retq
+  %1 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %a0, <2 x i64> <i64 -2147483648, i64 -2147483648>)
+  %2 = call <2 x i64> @llvm.smin.v2i64(<2 x i64> %1, <2 x i64> <i64 2147483647, i64 2147483647>)
+  %3 = trunc <2 x i64> %2 to <2 x i32>
+  %4 = freeze <2 x i32> %3
+  ret <2 x i32> %4
+}
+
+define <2 x i32> @trunc_ssat_u_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_ssat_u_freeze:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpxor %xmm1, %xmm1, %xmm1
+; CHECK-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm1
+; CHECK-NEXT:    vpand %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vpmovsxbd {{.*#+}} xmm1 = [4294967295,0,4294967295,0]
+; CHECK-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm2
+; CHECK-NEXT:    vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT:    retq
+  %1 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %a0, <2 x i64> zeroinitializer)
+  %2 = call <2 x i64> @llvm.smin.v2i64(<2 x i64> %1, <2 x i64> <i64 4294967295, i64 4294967295>)
+  %3 = trunc <2 x i64> %2 to <2 x i32>
+  %4 = freeze <2 x i32> %3
+  ret <2 x i32> %4
+}
+
+define <2 x i32> @trunc_usat_u_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_usat_u_freeze:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; CHECK-NEXT:    vpcmpgtq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; CHECK-NEXT:    vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT:    vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT:    retq
+  %1 = call <2 x i64> @llvm.umin.v2i64(<2 x i64> %a0, <2 x i64> <i64 4294967295, i64 4294967295>)
+  %2 = trunc <2 x i64> %1 to <2 x i32>
+  %3 = freeze <2 x i32> %2
+  ret <2 x i32> %3
+}
+
+declare <2 x i64> @llvm.smax.v2i64(<2 x i64>, <2 x i64>)
+declare <2 x i64> @llvm.smin.v2i64(<2 x i64>, <2 x i64>)
+declare <2 x i64> @llvm.umin.v2i64(<2 x i64>, <2 x i64>)

llvmbot · 2025-11-20T02:12:38Z

@llvm/pr-subscribers-llvm-selectiondag

kuroyukiasuna · 2025-11-20T02:14:17Z

llvm/test/CodeGen/X86/truncate-sat-freeze.ll

+; Test that freeze is eliminated for saturation truncate patterns.
+; The freeze elimination happens at the IR level due to the IntrNoCreateUndefOrPoison
+; attribute on the llvm.smax/smin/umin intrinsics. At the SelectionDAG level,
+; TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U operations are also marked in
+; canCreateUndefOrPoison() to ensure consistency and enable potential future
+; optimizations. This test validates the end-to-end behavior that no freeze
+; instruction appears in the output.


As noted in the comment, this test doesn't actually exercise the code path I added in SelectionDAG::canCreateUndefOrPoison, because the IR-level optimization removes the freeze first. I'm fairly new to SelectionDAG and can't think of a better way to write a test; apologies. Any suggestions or ideas for better testing are appreciated.

RKSimon: Forget about x86. You need to move this to AArch64, which uses these nodes, and preferably use its intrinsics that expand directly to the ISD::TRUNCATE_*SAT_* nodes (neon.sqxtn etc.).

Also, to actually test this, you've got to demonstrate that the freeze has been removed or moved up through the truncate node. I typically use KnownBits/SignBits value-tracking folds that can only occur once the freeze is out of the way.
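To make the "moved up through the truncate node" step concrete, here is a toy C++ model of that rewrite: freeze(op(x)) becomes op(freeze(x)) when op is reported as unable to create undef or poison. The Node type, the string opcodes, toyCanCreateUndefOrPoison and pushFreezeThrough are all hypothetical and far simpler than SelectionDAG's SDNode and DAGCombiner; only the three saturating truncates are treated as safe here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy single-operand expression node (hypothetical; not SelectionDAG's SDNode).
struct Node {
  std::string op;                // e.g. "truncate_ssat_u", "freeze", or a leaf name
  std::shared_ptr<Node> operand; // null for leaves
};
using NodePtr = std::shared_ptr<Node>;

NodePtr mk(std::string op, NodePtr operand = nullptr) {
  return std::make_shared<Node>(Node{std::move(op), std::move(operand)});
}

// Stand-in for canCreateUndefOrPoison: in this toy model only the three
// saturating truncates are reported as unable to create undef/poison.
bool toyCanCreateUndefOrPoison(const std::string &op) {
  return !(op == "truncate_ssat_s" || op == "truncate_ssat_u" ||
           op == "truncate_usat_u");
}

// freeze(op(x)) -> op(freeze(x)) when op cannot introduce undef/poison itself:
// the result is then non-poison whenever x is, so freezing x is enough.
NodePtr pushFreezeThrough(const NodePtr &n) {
  if (n->op != "freeze" || !n->operand || !n->operand->operand)
    return n;
  const NodePtr &inner = n->operand;
  if (toyCanCreateUndefOrPoison(inner->op))
    return n; // op itself might produce poison: keep the freeze on top
  return mk(inner->op, mk("freeze", inner->operand));
}

// Render a node chain as nested calls, e.g. "truncate_ssat_u(freeze(t2))".
std::string print(const NodePtr &n) {
  if (!n->operand)
    return n->op;
  return n->op + "(" + print(n->operand) + ")";
}
```

Once the freeze sits on the operand, value-tracking queries on the truncate's users see the truncate directly, which is the kind of fold a KnownBits/SignBits-based test would try to observe.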

@RKSimon Something I tried:

define <4 x i32> @sqxtun_freeze_zext_and(<4 x i32> %a) {
; CHECK-LABEL: sqxtun_freeze_zext_and:
; CHECK:       // %bb.0:
; CHECK-NEXT:    sqxtun v0.4h, v0.4s
; CHECK-NEXT:    ushll v0.4s, v0.4h, #0
; CHECK-NEXT:    ret
  %trunc = tail call <4 x i16> @llvm.aarch64.neon.sqxtun.v4i16(<4 x i32> %a)
  %freeze = freeze <4 x i16> %trunc
  %zext = zext <4 x i16> %freeze to <4 x i32>
  %and = and <4 x i32> %zext, <i32 65535, i32 65535, i32 65535, i32 65535>
  ret <4 x i32> %and
}

declare <4 x i16> @llvm.aarch64.neon.sqxtun.v4i16(<4 x i32>)

I verified that the freeze elimination is working:

Legalized selection DAG (9 nodes):

  t13: v4i16 = truncate_ssat_u t2
  t5: v4i16 = freeze t13
  t6: v4i32 = zero_extend t5

Optimized legalized selection DAG (8 nodes):

  t13: v4i16 = truncate_ssat_u t2
  t6: v4i32 = zero_extend t13

However, the final assembly is identical with or without my change, so the test would pass either way. Is the freeze perhaps being eliminated later, during instruction selection? I'd appreciate a suggestion for a test pattern where this freeze elimination produces visibly different assembly, or advice on whether that is even the right approach. Thanks!

I tried several test patterns to show visible codegen differences:

- Comparison with out-of-range values (should fold to a constant)
- Redundant min/max operations (should be eliminated)
- KnownBits patterns with AND/zext

None produced different assembly with or without the change, even though the DAG optimization does occur.

github-actions · 2025-11-20T09:01:13Z

🐧 Linux x64 Test Results

186415 tests passed
4864 tests skipped

RKSimon · 2025-11-20T10:47:36Z

Requested changes on llvm/test/CodeGen/X86/truncate-sat-freeze.ll; the review comment is quoted in the thread above.

kuroyukiasuna · 2025-11-21T03:54:16Z

llvm/test/CodeGen/AArch64/truncate-sat-freeze.ll

@@ -0,0 +1,80 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py


Progress so far: the select-pattern tests showed different codegen with an older LLVM (18.1.3), but with current LLVM (22.0.0git) the assembly output is identical with or without this change.
I've verified that the freeze elimination occurs at the DAG level (the node count drops from 9 to 8 in the optimized legalized DAG, as seen with -debug-only=isel).

llvmbot added the backend:X86 and llvm:SelectionDAG labels Nov 20, 2025

kuroyukiasuna commented Nov 20, 2025

RKSimon self-requested a review November 20, 2025 10:39

RKSimon requested changes Nov 20, 2025

Add tests in AArch64; Remove previous tests in X86 (1aa7f89)

llvmbot added the backend:AArch64 label Nov 21, 2025

kuroyukiasuna commented Nov 21, 2025
