-
Notifications
You must be signed in to change notification settings - Fork 15.1k
[Hexagon] Optimize sfclass/dfclass compares #165735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fclass intrinsics generate a sub-optimal code by doing a predicate transfer and compare. This patch optimizes out and directly uses the predicate.
|
@llvm/pr-subscribers-backend-hexagon Author: Sumanth Gundapaneni (sgundapa) Changesfclass intrinsics generate a sub-optimal code by doing a predicate transfer and compare. This patch optimizes out and directly uses the predicate. Full diff: https://github.com/llvm/llvm-project/pull/165735.diff 2 Files Affected:
diff --git a/llvm/lib/Target/Hexagon/HexagonPatterns.td b/llvm/lib/Target/Hexagon/HexagonPatterns.td
index 85ce9447c2028..44e9d2402d0a3 100644
--- a/llvm/lib/Target/Hexagon/HexagonPatterns.td
+++ b/llvm/lib/Target/Hexagon/HexagonPatterns.td
@@ -3434,6 +3434,17 @@ let AddedComplexity = 100 in {
(C2_not (S4_stored_locked I32:$Rs, I64:$Rt))>;
}
+multiclass FloatClass<SDPatternOperator IntOp, InstHexagon MI,
+ PatFrag RegPred> {
+ def: Pat<(i1 (seteq (IntOp RegPred:$Rs, u5_0ImmPred_timm:$u5), 0)),
+ (C2_not (MI RegPred:$Rs, u5_0ImmPred_timm:$u5))>;
+ def: Pat<(i1 (setne (IntOp RegPred:$Rs, u5_0ImmPred_timm:$u5), 0)),
+ (MI RegPred:$Rs, u5_0ImmPred_timm:$u5)>;
+}
+
+defm : FloatClass<int_hexagon_F2_sfclass, F2_sfclass, F32>;
+defm : FloatClass<int_hexagon_F2_dfclass, F2_dfclass, F64>;
+
def: Pat<(int_hexagon_instrprof_custom (HexagonAtPcrel tglobaladdr:$addr), u32_0ImmPred:$I),
(PS_call_instrprof_custom tglobaladdr:$addr, imm:$I)>;
diff --git a/llvm/test/CodeGen/Hexagon/isel-fclass.ll b/llvm/test/CodeGen/Hexagon/isel-fclass.ll
new file mode 100644
index 0000000000000..96b02106fa807
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/isel-fclass.ll
@@ -0,0 +1,86 @@
+; Tests lowering of sfclass/dfclass compares.
+; Sub-optimal code
+; {
+; p0 = sfclass(r0,#16)
+; r0 = sfadd(r0,r0)
+; }
+; {
+; r2 = p0
+; }
+; {
+; if (p0.new) r0 = ##1065353216
+; p0 = cmp.eq(r2,#0)
+; jumpr r31
+; }
+; With the patterns added, we should be generating
+; {
+; p0 = sfclass(r0,#16)
+; r0 = sfadd(r0,r0)
+; }
+; {
+; if (!p0) r0 = ##1065353216
+; jumpr r31
+; }
+
+; RUN: llc -march=hexagon -stop-after=hexagon-isel %s -o - | FileCheck %s
+
+; CHECK: bb.0.entry1
+; CHECK: F2_sfclass
+; CHECK-NOT: C2_cmp
+; CHECK: C2_not
+; CHECK: F2_sfadd
+; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+define float @test1(float noundef %x) {
+entry1:
+ %0 = tail call i32 @llvm.hexagon.F2.sfclass(float %x, i32 16)
+ %tobool.not = icmp eq i32 %0, 0
+ %add = fadd float %x, %x
+ %spec.select = select i1 %tobool.not, float 1.000000e+00, float %add
+ ret float %spec.select
+}
+
+; CHECK: bb.0.entry2
+; CHECK: F2_sfclass
+; CHECK-NOT: C2_cmp
+; CHECK: F2_sfadd
+define float @test2(float noundef %x) {
+entry2:
+ %0 = tail call i32 @llvm.hexagon.F2.sfclass(float %x, i32 16)
+ %tobool.not = icmp eq i32 %0, 0
+ %add = fadd float %x, %x
+ %spec.select = select i1 %tobool.not, float %add, float 1.000000e+00
+ ret float %spec.select
+}
+
+; CHECK: bb.0.entry3
+; CHECK: F2_dfclass
+; CHECK-NOT: C2_cmp
+; CHECK: C2_not
+; CHECK: F2_dfadd
+define double @test3(double noundef %x) {
+entry3:
+ %0 = tail call i32 @llvm.hexagon.F2.dfclass(double %x, i32 16)
+ %tobool.not = icmp eq i32 %0, 0
+ %add = fadd double %x, %x
+ %spec.select = select i1 %tobool.not, double 1.000000e+00, double %add
+ ret double %spec.select
+}
+
+; CHECK: bb.0.entry4
+; CHECK: F2_dfclass
+; CHECK-NOT: C2_cmp
+; CHECK: F2_dfadd
+define double @test4(double noundef %x) {
+entry4:
+ %0 = tail call i32 @llvm.hexagon.F2.dfclass(double %x, i32 16)
+ %tobool.not = icmp eq i32 %0, 0
+ %add = fadd double %x, %x
+ %spec.select = select i1 %tobool.not, double %add, double 1.000000e+00
+ ret double %spec.select
+}
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare i32 @llvm.hexagon.F2.dfclass(double, i32 immarg)
+
+; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn memory(none)
+declare i32 @llvm.hexagon.F2.sfclass(float, i32 immarg)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request optimizes the code generation for Hexagon floating-point class comparison operations (sfclass/dfclass). The optimization eliminates unnecessary comparison and register transfer instructions when the result of F2_sfclass/F2_dfclass intrinsics is compared to zero.
Key changes:
- Added pattern matching rules to fold comparisons of
sfclass/dfclassresults with zero directly into predicate operations - Eliminates unnecessary
C2_cmpinstructions and register moves between integer and predicate registers
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| llvm/lib/Target/Hexagon/HexagonPatterns.td | Adds FloatClass multiclass with patterns to fold F2_sfclass/F2_dfclass comparisons with zero into predicate operations |
| llvm/test/CodeGen/Hexagon/isel-fclass.ll | Adds regression tests verifying the optimization generates expected instruction patterns |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| def: Pat<(i1 (seteq (IntOp RegPred:$Rs, u5_0ImmPred_timm:$u5), 0)), | ||
| (C2_not (MI RegPred:$Rs, u5_0ImmPred_timm:$u5))>; | ||
| def: Pat<(i1 (setne (IntOp RegPred:$Rs, u5_0ImmPred_timm:$u5), 0)), | ||
| (MI RegPred:$Rs, u5_0ImmPred_timm:$u5)>; |
Copilot
AI
Oct 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent indentation between lines 3441 and 3443. Line 3441 uses 14 spaces while line 3443 uses 13 spaces. Both should use the same indentation to align with the opening parenthesis of the pattern's first argument, consistent with the style used in lines 3428 and 3430 above.
| (MI RegPred:$Rs, u5_0ImmPred_timm:$u5)>; | |
| (MI RegPred:$Rs, u5_0ImmPred_timm:$u5)>; |
iajbar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
fclass intrinsics generate a sub-optimal code by doing a predicate transfer and compare. This patch optimizes out and directly uses the predicate.
fclass intrinsics generate a sub-optimal code by doing a predicate transfer and compare. This patch optimizes out and directly uses the predicate.
fclass intrinsics generate a sub-optimal code by doing a predicate transfer and compare. This patch optimizes out and directly uses the predicate.