[SPARC] Mark branches as being expensive in early Niagara CPUs #166489

koachan · 2025-11-05T02:06:21Z

Early Niagara processors (T1-T3) lack any branch predictor, yet their pipeline is long enough that the delay slot cannot cover all of the branch latency.
This means that branch instructions stall the processor for a couple of cycles, which makes them expensive. Additionally, the high cost of branching means that conditional moves remain profitable even when the condition is predictable, so let LLVM know about both things.

On SPARC T2, a pgbench test seems to show a modest but fairly consistent speedup (up to around 3%).
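
As a rough illustration of how the two tunables relate, here is a minimal standalone sketch. It is not the LLVM code: `CpuTuning`, `LoweringHints`, and `deriveHints` are hypothetical names invented for this example. Only the relationship they model comes from the patch: branches are marked expensive when the CPU has no predictor, and predictable selects are then no longer treated as expensive.

```cpp
// Hypothetical model of the tuning logic; none of these types exist in LLVM.
struct CpuTuning {
  bool hasNoPredictor; // stands in for the "no-predictor" tune feature
};

struct LoweringHints {
  bool jumpIsExpensive;              // branches stall, so avoid them
  bool predictableSelectIsExpensive; // if true, predictable selects may become branches
};

// Derive both hints from the single tuning flag, mirroring the two
// assignments this patch adds to the SparcTargetLowering constructor.
constexpr LoweringHints deriveHints(CpuTuning cpu) {
  LoweringHints hints{};
  hints.jumpIsExpensive = cpu.hasNoPredictor;
  hints.predictableSelectIsExpensive = !hints.jumpIsExpensive;
  return hints;
}
```

With `hasNoPredictor` set, the sketch yields `jumpIsExpensive == true` and `predictableSelectIsExpensive == false`, which is the combination the description asks for on T1-T3.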


llvmbot · 2025-11-05T02:06:58Z

@llvm/pr-subscribers-backend-sparc

Author: Koakuma (koachan)

Changes

Early Niagara processors (T1-T3) lack any branch predictor, yet their pipeline is long enough that the delay slot cannot cover all of the branch latency.
This means that branch instructions stall the processor for a couple of cycles, which makes them expensive. Additionally, the high cost of branching means that conditional moves remain profitable even when the condition is predictable, so let LLVM know about both things.

Full diff: https://github.com/llvm/llvm-project/pull/166489.diff

3 Files Affected:

(modified) llvm/lib/Target/Sparc/Sparc.td (+9-3)
(modified) llvm/lib/Target/Sparc/SparcISelLowering.cpp (+8)
(added) llvm/test/CodeGen/SPARC/select-earlyniagara.ll (+43)

diff --git a/llvm/lib/Target/Sparc/Sparc.td b/llvm/lib/Target/Sparc/Sparc.td
index 7137e5fbff4ff..38b0508885069 100644
--- a/llvm/lib/Target/Sparc/Sparc.td
+++ b/llvm/lib/Target/Sparc/Sparc.td
@@ -95,6 +95,9 @@ def FeatureSoftFloat : SubtargetFeature<"soft-float", "UseSoftFloat", "true",
 def TuneSlowRDPC : SubtargetFeature<"slow-rdpc", "HasSlowRDPC", "true",
                                     "rd %pc, %XX is slow", [FeatureV9]>;
 
+def TuneNoPredictor : SubtargetFeature<"no-predictor", "HasNoPredictor", "true",
+                                    "Processor has no branch predictor, branches stall execution", []>;
+
 //==== Features added predmoninantly for LEON subtarget support
 include "LeonFeatures.td"
 
@@ -174,12 +177,15 @@ def : Proc<"ultrasparc3",     [FeatureV9, FeatureV8Deprecated, FeatureVIS,
                                FeatureVIS2],
                               [TuneSlowRDPC]>;
 def : Proc<"niagara",         [FeatureV9, FeatureV8Deprecated, FeatureVIS,
-                               FeatureVIS2, FeatureUA2005]>;
+                               FeatureVIS2, FeatureUA2005],
+                              [TuneNoPredictor]>;
 def : Proc<"niagara2",        [FeatureV9, FeatureV8Deprecated, UsePopc,
-                               FeatureVIS, FeatureVIS2, FeatureUA2005]>;
+                               FeatureVIS, FeatureVIS2, FeatureUA2005],
+                              [TuneNoPredictor]>;
 def : Proc<"niagara3",        [FeatureV9, FeatureV8Deprecated, UsePopc,
                                FeatureVIS, FeatureVIS2, FeatureVIS3,
-                               FeatureUA2005, FeatureUA2007]>;
+                               FeatureUA2005, FeatureUA2007],
+                              [TuneNoPredictor]>;
 def : Proc<"niagara4",        [FeatureV9, FeatureV8Deprecated, UsePopc,
                                FeatureVIS, FeatureVIS2, FeatureVIS3,
                                FeatureUA2005, FeatureUA2007, FeatureOSA2011,
diff --git a/llvm/lib/Target/Sparc/SparcISelLowering.cpp b/llvm/lib/Target/Sparc/SparcISelLowering.cpp
index cbb7db68f7e7c..ae3c32687c207 100644
--- a/llvm/lib/Target/Sparc/SparcISelLowering.cpp
+++ b/llvm/lib/Target/Sparc/SparcISelLowering.cpp
@@ -2000,6 +2000,14 @@ SparcTargetLowering::SparcTargetLowering(const TargetMachine &TM,
 
   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
 
+  // Some processors have no branch predictor and have pipelines longer than
+  // what can be covered by the delay slot. This results in a stall, so mark
+  // branches to be expensive on those processors.
+  setJumpIsExpensive(Subtarget->hasNoPredictor());
+  // The high cost of branching means that using conditional moves will
+  // still be profitable even if the condition is predictable.
+  PredictableSelectIsExpensive = !isJumpExpensive();
+
   setMinFunctionAlignment(Align(4));
 
   computeRegisterProperties(Subtarget->getRegisterInfo());
diff --git a/llvm/test/CodeGen/SPARC/select-earlyniagara.ll b/llvm/test/CodeGen/SPARC/select-earlyniagara.ll
new file mode 100644
index 0000000000000..2cec10455d205
--- /dev/null
+++ b/llvm/test/CodeGen/SPARC/select-earlyniagara.ll
@@ -0,0 +1,43 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -O3 < %s -relocation-model=pic -mtriple=sparc -mcpu=v9 | FileCheck --check-prefix=SPARC %s
+; RUN: llc -O3 < %s -relocation-model=pic -mtriple=sparcv9 -mcpu=v9 | FileCheck --check-prefix=SPARC64 %s
+
+;; Early Niagara processors should prefer conditional moves over branches
+;; even when it's predictable.
+
+define i32 @cinc(i32 %cond, i32 %num) #0 {
+; SPARC-LABEL: cinc:
+; SPARC:       ! %bb.0: ! %entry
+; SPARC-NEXT:    cmp %o0, 0
+; SPARC-NEXT:    bne %icc, .LBB0_2
+; SPARC-NEXT:    mov %o1, %o0
+; SPARC-NEXT:  ! %bb.1: ! %inc
+; SPARC-NEXT:    add %o0, 1, %o0
+; SPARC-NEXT:  .LBB0_2: ! %cont
+; SPARC-NEXT:    retl
+; SPARC-NEXT:    nop
+;
+; SPARC64-LABEL: cinc:
+; SPARC64:       ! %bb.0: ! %entry
+; SPARC64-NEXT:    cmp %o0, 0
+; SPARC64-NEXT:    bne %icc, .LBB0_2
+; SPARC64-NEXT:    mov %o1, %o0
+; SPARC64-NEXT:  ! %bb.1: ! %inc
+; SPARC64-NEXT:    add %o0, 1, %o0
+; SPARC64-NEXT:  .LBB0_2: ! %cont
+; SPARC64-NEXT:    retl
+; SPARC64-NEXT:    nop
+entry:
+  %cmp = icmp eq i32 %cond, 0
+  %exp = call i1 @llvm.expect.i1(i1 %cmp, i1 0)
+  br i1 %exp, label %inc, label %cont
+inc:
+  %add = add nsw i32 %num, 1
+  br label %cont
+cont:
+  %phi = phi i32 [ %add, %inc ], [ %num, %entry ]
+  ret i32 %phi
+}
+declare i1 @llvm.expect.i1(i1, i1)
+
+attributes #0 = { nounwind "tune-cpu"="niagara" }

koachan · 2025-11-05T02:08:35Z

llvm/test/CodeGen/SPARC/select-earlyniagara.ll

+; SPARC64-LABEL: cinc:
+; SPARC64:       ! %bb.0: ! %entry
+; SPARC64-NEXT:    cmp %o0, 0
+; SPARC64-NEXT:    bne %icc, .LBB0_2


I'm tuning this for niagara, and from the debugging dumps I can see that branches are properly marked as expensive and PredictableSelectIsExpensive is false, yet the codegen still chooses branches over conditional moves.
How do I convince the codegen to emit conditional moves here?

s-barannikov

I guess this is because the input IR contains explicit branches; llc doesn't run the CFG optimizer.
Consider rewriting this test to use a select instruction.

s-barannikov

Setting these flags doesn't necessarily result in better codegen (in my experience).
The impact should be assessed using some benchmarks.

koachan · 2025-11-05T02:56:12Z

> Setting these flags doesn't necessarily result in better codegen (in my experience). The impact should be assessed using some benchmarks.

Ya, that's why it's still marked as WIP.
But before doing benchmarks I at least want to confirm that the codegen changes are actually happening, hence the test case.

koachan · 2025-11-05T14:19:03Z

So I ran some pgbench tests on a 48-thread SPARC T2, and the patch does seem to increase performance (the only exception being the 48-thread SELECT workload):

threads   SELECT                  INS/UPD
          main        +patch      main       +patch
1         2318.04     2413.27     351.07     372.09
8         17273.82    17568.66    2513.91    2619.37
16        30384.08    31282.25    4322.87    4611.78
24        41794.27    43145.46    5719.44    5987.82
32        48902.99    50403.88    6498.09    6579.41
40        58342.80    60795.85    6935.46    7009.69
48        61119.86    59821.99    7363.66    7424.10

The speedup is quite modest (at around 2-3%), but given that it's only from setting two codegen tunables I'd say that this is a good result.
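
For anyone who wants per-row figures, the relative change can be recomputed from the table above. The following is a small sketch only: `Row`, `kRows`, and `percentChange` are names made up for this example, and the numbers are copied verbatim from the table.

```cpp
#include <array>

// One row of the pgbench table above (values copied as posted).
struct Row {
  int threads;
  double selectMain, selectPatch;
  double insUpdMain, insUpdPatch;
};

constexpr std::array<Row, 7> kRows{{
    {1, 2318.04, 2413.27, 351.07, 372.09},
    {8, 17273.82, 17568.66, 2513.91, 2619.37},
    {16, 30384.08, 31282.25, 4322.87, 4611.78},
    {24, 41794.27, 43145.46, 5719.44, 5987.82},
    {32, 48902.99, 50403.88, 6498.09, 6579.41},
    {40, 58342.80, 60795.85, 6935.46, 7009.69},
    {48, 61119.86, 59821.99, 7363.66, 7424.10},
}};

// Relative change of `patched` over `base`, in percent.
// Positive means the patched build scored higher.
constexpr double percentChange(double base, double patched) {
  return (patched - base) / base * 100.0;
}
```

For example, `percentChange(kRows[0].selectMain, kRows[0].selectPatch)` comes out at about +4.1, while the 48-thread SELECT row comes out at about -2.1, matching the one regression noted above.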

s-barannikov · 2025-11-05T14:39:26Z

2-3% is actually a huge difference (one may say too good to be true)

s-barannikov · 2025-11-05T14:41:31Z

llvm/test/CodeGen/SPARC/select-earlyniagara.ll

@@ -0,0 +1,33 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py


The test should show the difference in codegen between subtargets with and without a branch predictor.

s-barannikov

LGTM

s-barannikov · 2025-11-05T16:29:30Z

llvm/test/CodeGen/SPARC/predictable-select-earlyniagara.ll

+  ret i32 %ret
+}
+
+attributes #0 = { nounwind "tune-cpu"="niagara" }


(nit) Can the two tests be combined into one with two RUN lines, one of them passing -mattr=+no-predictor?

koachan requested review from brad0, rorth and s-barannikov November 5, 2025 02:06

llvmbot added the backend:Sparc label Nov 5, 2025

s-barannikov reviewed Nov 5, 2025

Update tests

57f8b6c

koachan changed the title ~~[WIP][SPARC] Mark branches as being expensive in early Niagara CPUs~~ [SPARC] Mark branches as being expensive in early Niagara CPUs Nov 5, 2025

s-barannikov reviewed Nov 5, 2025

Update tests

cb3e05c

s-barannikov approved these changes Nov 5, 2025

Update tests

ab573a1

koachan merged commit 0c0b0ea into llvm:main Nov 5, 2025
8 of 9 checks passed
