-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[DAG] Add TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U to canCreateUndefOrPoison (#152143) #168809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[DAG] Add TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U to canCreateUndefOrPoison (#152143) #168809
Conversation
…ison (llvm#152143) Saturating truncation operations are well-defined for all inputs and cannot create poison or undef values. This allows the optimizer to eliminate unnecessary freeze instructions after these operations. Fixes llvm#152143
|
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
|
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-backend-x86 Author: Jerry Dang (kuroyukiasuna) ChangesSaturating truncation operations are well-defined for all inputs and cannot create poison or undef values. This allows the optimizer to eliminate unnecessary freeze instructions after these operations. Fixes #152143 Full diff: https://github.com/llvm/llvm-project/pull/168809.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 1b15a207a2d37..0f0174c8aea35 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5664,6 +5664,9 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
case ISD::FP_EXTEND:
case ISD::FP_TO_SINT_SAT:
case ISD::FP_TO_UINT_SAT:
+ case ISD::TRUNCATE_SSAT_U:
+ case ISD::TRUNCATE_SSAT_S:
+ case ISD::TRUNCATE_USAT_U:
// No poison except from flags (which is handled above)
return false;
diff --git a/llvm/test/CodeGen/X86/truncate-sat-freeze.ll b/llvm/test/CodeGen/X86/truncate-sat-freeze.ll
new file mode 100644
index 0000000000000..78aebe05ec1de
--- /dev/null
+++ b/llvm/test/CodeGen/X86/truncate-sat-freeze.ll
@@ -0,0 +1,64 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s
+
+; Test that freeze is eliminated for saturation truncate patterns.
+; The freeze elimination happens at the IR level due to the IntrNoCreateUndefOrPoison
+; attribute on the llvm.smax/smin/umin intrinsics. At the SelectionDAG level,
+; TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U operations are also marked in
+; canCreateUndefOrPoison() to ensure consistency and enable potential future
+; optimizations. This test validates the end-to-end behavior that no freeze
+; instruction appears in the output.
+
+define <2 x i32> @trunc_ssat_s_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_ssat_s_freeze:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vpbroadcastq {{.*#+}} xmm1 = [18446744071562067968,18446744071562067968]
+; CHECK-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm2
+; CHECK-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT: vpbroadcastq {{.*#+}} xmm1 = [2147483647,2147483647]
+; CHECK-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm2
+; CHECK-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT: retq
+ %1 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %a0, <2 x i64> <i64 -2147483648, i64 -2147483648>)
+ %2 = call <2 x i64> @llvm.smin.v2i64(<2 x i64> %1, <2 x i64> <i64 2147483647, i64 2147483647>)
+ %3 = trunc <2 x i64> %2 to <2 x i32>
+ %4 = freeze <2 x i32> %3
+ ret <2 x i32> %4
+}
+
+define <2 x i32> @trunc_ssat_u_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_ssat_u_freeze:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vpxor %xmm1, %xmm1, %xmm1
+; CHECK-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm1
+; CHECK-NEXT: vpand %xmm0, %xmm1, %xmm0
+; CHECK-NEXT: vpmovsxbd {{.*#+}} xmm1 = [4294967295,0,4294967295,0]
+; CHECK-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm2
+; CHECK-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT: retq
+ %1 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %a0, <2 x i64> zeroinitializer)
+ %2 = call <2 x i64> @llvm.smin.v2i64(<2 x i64> %1, <2 x i64> <i64 4294967295, i64 4294967295>)
+ %3 = trunc <2 x i64> %2 to <2 x i32>
+ %4 = freeze <2 x i32> %3
+ ret <2 x i32> %4
+}
+
+define <2 x i32> @trunc_usat_u_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_usat_u_freeze:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; CHECK-NEXT: vpcmpgtq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; CHECK-NEXT: vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT: retq
+ %1 = call <2 x i64> @llvm.umin.v2i64(<2 x i64> %a0, <2 x i64> <i64 4294967295, i64 4294967295>)
+ %2 = trunc <2 x i64> %1 to <2 x i32>
+ %3 = freeze <2 x i32> %2
+ ret <2 x i32> %3
+}
+
+declare <2 x i64> @llvm.smax.v2i64(<2 x i64>, <2 x i64>)
+declare <2 x i64> @llvm.smin.v2i64(<2 x i64>, <2 x i64>)
+declare <2 x i64> @llvm.umin.v2i64(<2 x i64>, <2 x i64>)
|
|
@llvm/pr-subscribers-llvm-selectiondag Author: Jerry Dang (kuroyukiasuna) ChangesSaturating truncation operations are well-defined for all inputs and cannot create poison or undef values. This allows the optimizer to eliminate unnecessary freeze instructions after these operations. Fixes #152143 Full diff: https://github.com/llvm/llvm-project/pull/168809.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 1b15a207a2d37..0f0174c8aea35 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5664,6 +5664,9 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
case ISD::FP_EXTEND:
case ISD::FP_TO_SINT_SAT:
case ISD::FP_TO_UINT_SAT:
+ case ISD::TRUNCATE_SSAT_U:
+ case ISD::TRUNCATE_SSAT_S:
+ case ISD::TRUNCATE_USAT_U:
// No poison except from flags (which is handled above)
return false;
diff --git a/llvm/test/CodeGen/X86/truncate-sat-freeze.ll b/llvm/test/CodeGen/X86/truncate-sat-freeze.ll
new file mode 100644
index 0000000000000..78aebe05ec1de
--- /dev/null
+++ b/llvm/test/CodeGen/X86/truncate-sat-freeze.ll
@@ -0,0 +1,64 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s
+
+; Test that freeze is eliminated for saturation truncate patterns.
+; The freeze elimination happens at the IR level due to the IntrNoCreateUndefOrPoison
+; attribute on the llvm.smax/smin/umin intrinsics. At the SelectionDAG level,
+; TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U operations are also marked in
+; canCreateUndefOrPoison() to ensure consistency and enable potential future
+; optimizations. This test validates the end-to-end behavior that no freeze
+; instruction appears in the output.
+
+define <2 x i32> @trunc_ssat_s_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_ssat_s_freeze:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vpbroadcastq {{.*#+}} xmm1 = [18446744071562067968,18446744071562067968]
+; CHECK-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm2
+; CHECK-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT: vpbroadcastq {{.*#+}} xmm1 = [2147483647,2147483647]
+; CHECK-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm2
+; CHECK-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT: retq
+ %1 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %a0, <2 x i64> <i64 -2147483648, i64 -2147483648>)
+ %2 = call <2 x i64> @llvm.smin.v2i64(<2 x i64> %1, <2 x i64> <i64 2147483647, i64 2147483647>)
+ %3 = trunc <2 x i64> %2 to <2 x i32>
+ %4 = freeze <2 x i32> %3
+ ret <2 x i32> %4
+}
+
+define <2 x i32> @trunc_ssat_u_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_ssat_u_freeze:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vpxor %xmm1, %xmm1, %xmm1
+; CHECK-NEXT: vpcmpgtq %xmm1, %xmm0, %xmm1
+; CHECK-NEXT: vpand %xmm0, %xmm1, %xmm0
+; CHECK-NEXT: vpmovsxbd {{.*#+}} xmm1 = [4294967295,0,4294967295,0]
+; CHECK-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm2
+; CHECK-NEXT: vblendvpd %xmm2, %xmm0, %xmm1, %xmm0
+; CHECK-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT: retq
+ %1 = call <2 x i64> @llvm.smax.v2i64(<2 x i64> %a0, <2 x i64> zeroinitializer)
+ %2 = call <2 x i64> @llvm.smin.v2i64(<2 x i64> %1, <2 x i64> <i64 4294967295, i64 4294967295>)
+ %3 = trunc <2 x i64> %2 to <2 x i32>
+ %4 = freeze <2 x i32> %3
+ ret <2 x i32> %4
+}
+
+define <2 x i32> @trunc_usat_u_freeze(<2 x i64> %a0) {
+; CHECK-LABEL: trunc_usat_u_freeze:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vpxor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm1
+; CHECK-NEXT: vpcmpgtq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
+; CHECK-NEXT: vblendvpd %xmm1, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
+; CHECK-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
+; CHECK-NEXT: retq
+ %1 = call <2 x i64> @llvm.umin.v2i64(<2 x i64> %a0, <2 x i64> <i64 4294967295, i64 4294967295>)
+ %2 = trunc <2 x i64> %1 to <2 x i32>
+ %3 = freeze <2 x i32> %2
+ ret <2 x i32> %3
+}
+
+declare <2 x i64> @llvm.smax.v2i64(<2 x i64>, <2 x i64>)
+declare <2 x i64> @llvm.smin.v2i64(<2 x i64>, <2 x i64>)
+declare <2 x i64> @llvm.umin.v2i64(<2 x i64>, <2 x i64>)
|
| ; Test that freeze is eliminated for saturation truncate patterns. | ||
| ; The freeze elimination happens at the IR level due to the IntrNoCreateUndefOrPoison | ||
| ; attribute on the llvm.smax/smin/umin intrinsics. At the SelectionDAG level, | ||
| ; TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U operations are also marked in | ||
| ; canCreateUndefOrPoison() to ensure consistency and enable potential future | ||
| ; optimizations. This test validates the end-to-end behavior that no freeze | ||
| ; instruction appears in the output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like commented, this test doesn't really exercise the code path I added in SelectionDAG::canCreateUndefOrPoison. The IR level optimization is too aggressive. I'm fairly new to SelectionDAG, cannot think of better way to write a test, apologies. Any suggestions/ideas on better testing are appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget about x86 - you need to move this to aarch64 which uses these and preferably use their intrinsics which expand directly to ISD::TRUNCATE_SAT nodes (neon.sqxtn etc.).
Also to actually test this - you've got to demonstrate that the freeze has been removed / moved up through the truncate node. I typically use KnownBits/SignBits value tracking folds that can only occur once the freeze is moved out of the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RKSimon Something I tried:
define <4 x i32> @sqxtun_freeze_zext_and(<4 x i32> %a) {
; CHECK-LABEL: sqxtun_freeze_zext_and:
; CHECK: // %bb.0:
; CHECK-NEXT: sqxtun v0.4h, v0.4s
; CHECK-NEXT: ushll v0.4s, v0.4h, #0
; CHECK-NEXT: ret
%trunc = tail call <4 x i16> @llvm.aarch64.neon.sqxtun.v4i16(<4 x i32> %a)
%freeze = freeze <4 x i16> %trunc
%zext = zext <4 x i16> %freeze to <4 x i32>
%and = and <4 x i32> %zext, <i32 65535, i32 65535, i32 65535, i32 65535>
ret <4 x i32> %and
}
declare <4 x i16> @llvm.aarch64.neon.sqxtun.v4i16(<4 x i32>)
I can verified the freeze elimination is working:
Legalized selection DAG (9 nodes):
t13: v4i16 = truncate_ssat_u t2
t5: v4i16 = freeze t13
t6: v4i32 = zero_extend t5
Optimized legalized selection DAG (8 nodes):
t13: v4i16 = truncate_ssat_u t2
t6: v4i32 = zero_extend t13
However, the final assembly is the same with or without my change, as a result the test would always pass. Seems the freeze is eliminated during instruction selection? Need some suggestion on a test pattern where this freeze elimination produces visibly different assembly - or is that the correct path to pursue? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried several test patterns to show visible codegen differences:
- Comparison with out-of-range values (should fold to constant)
- Redundant min/max operations (should be eliminated)
- KnownBits patterns with AND/zext
None produced different assembly with/without the change, though the DAG optimization does occur.
🐧 Linux x64 Test Results
|
| ; Test that freeze is eliminated for saturation truncate patterns. | ||
| ; The freeze elimination happens at the IR level due to the IntrNoCreateUndefOrPoison | ||
| ; attribute on the llvm.smax/smin/umin intrinsics. At the SelectionDAG level, | ||
| ; TRUNCATE_SSAT_S/U and TRUNCATE_USAT_U operations are also marked in | ||
| ; canCreateUndefOrPoison() to ensure consistency and enable potential future | ||
| ; optimizations. This test validates the end-to-end behavior that no freeze | ||
| ; instruction appears in the output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forget about x86 - you need to move this to aarch64 which uses these and preferably use their intrinsics which expand directly to ISD::TRUNCATE_SAT nodes (neon.sqxtn etc.).
Also to actually test this - you've got to demonstrate that the freeze has been removed / moved up through the truncate node. I typically use KnownBits/SignBits value tracking folds that can only occur once the freeze is moved out of the way.
| @@ -0,0 +1,80 @@ | |||
| ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Progress so far: The select pattern tests showed different codegen in older LLVM(18.1.3). But the current (22.0.0git) assembly output is identical with/without this change.
I've verified that freeze elimination occurs at the DAG level (node count: 9→8 in optimized legalized DAG via -debug-only=isel).
Saturating truncation operations are well-defined for all inputs and cannot create poison or undef values. This allows the optimizer to eliminate unnecessary freeze instructions after these operations.
Fixes #152143