@koachan koachan commented Nov 5, 2025

Early Niagara processors (T1-T3) lack any branch predictor, yet their pipelines are long enough that the delay slot cannot cover all of the branch latency.
As a result, every branch instruction stalls the processor for a couple of cycles, which makes branches expensive. The high cost of branching also means that conditional moves remain profitable even when the condition is predictable, so let LLVM know about both things.

On SPARC T2, a pgbench test seems to show a modest but fairly consistent speedup (up to around 3%).
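
To make the trade-off concrete, here is a minimal source-level sketch of the two forms, modeled on the `cinc` test function in this PR. The names `cinc_branch` and `cinc_select` are hypothetical, and whether a given compiler actually emits a branch or a conditional move for either form depends on the optimization pipeline and the target tuning; this only illustrates the two shapes the backend chooses between.

```cpp
#include <cstdint>

// Branchy form: increments `num` only when `cond` is zero, using explicit
// control flow (on an early Niagara, a taken branch would stall the pipeline).
constexpr std::int32_t cinc_branch(std::int32_t cond, std::int32_t num) {
  if (cond == 0)
    return num + 1;
  return num;
}

// Select form: the same result expressed as a data dependency, which is the
// shape a backend can lower to a conditional move instead of a branch.
constexpr std::int32_t cinc_select(std::int32_t cond, std::int32_t num) {
  return num + (cond == 0 ? 1 : 0);
}
```

Both functions return the same value for every input; they differ only in whether the choice is expressed as control flow or as a selected value.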

llvmbot commented Nov 5, 2025

@llvm/pr-subscribers-backend-sparc

Author: Koakuma (koachan)

Full diff: https://github.com/llvm/llvm-project/pull/166489.diff

3 Files Affected:

  • (modified) llvm/lib/Target/Sparc/Sparc.td (+9-3)
  • (modified) llvm/lib/Target/Sparc/SparcISelLowering.cpp (+8)
  • (added) llvm/test/CodeGen/SPARC/select-earlyniagara.ll (+43)
diff --git a/llvm/lib/Target/Sparc/Sparc.td b/llvm/lib/Target/Sparc/Sparc.td
index 7137e5fbff4ff..38b0508885069 100644
--- a/llvm/lib/Target/Sparc/Sparc.td
+++ b/llvm/lib/Target/Sparc/Sparc.td
@@ -95,6 +95,9 @@ def FeatureSoftFloat : SubtargetFeature<"soft-float", "UseSoftFloat", "true",
 def TuneSlowRDPC : SubtargetFeature<"slow-rdpc", "HasSlowRDPC", "true",
                                     "rd %pc, %XX is slow", [FeatureV9]>;
 
+def TuneNoPredictor : SubtargetFeature<"no-predictor", "HasNoPredictor", "true",
+                                    "Processor has no branch predictor, branches stall execution", []>;
+
 //==== Features added predmoninantly for LEON subtarget support
 include "LeonFeatures.td"
 
@@ -174,12 +177,15 @@ def : Proc<"ultrasparc3",     [FeatureV9, FeatureV8Deprecated, FeatureVIS,
                                FeatureVIS2],
                               [TuneSlowRDPC]>;
 def : Proc<"niagara",         [FeatureV9, FeatureV8Deprecated, FeatureVIS,
-                               FeatureVIS2, FeatureUA2005]>;
+                               FeatureVIS2, FeatureUA2005],
+                              [TuneNoPredictor]>;
 def : Proc<"niagara2",        [FeatureV9, FeatureV8Deprecated, UsePopc,
-                               FeatureVIS, FeatureVIS2, FeatureUA2005]>;
+                               FeatureVIS, FeatureVIS2, FeatureUA2005],
+                              [TuneNoPredictor]>;
 def : Proc<"niagara3",        [FeatureV9, FeatureV8Deprecated, UsePopc,
                                FeatureVIS, FeatureVIS2, FeatureVIS3,
-                               FeatureUA2005, FeatureUA2007]>;
+                               FeatureUA2005, FeatureUA2007],
+                              [TuneNoPredictor]>;
 def : Proc<"niagara4",        [FeatureV9, FeatureV8Deprecated, UsePopc,
                                FeatureVIS, FeatureVIS2, FeatureVIS3,
                                FeatureUA2005, FeatureUA2007, FeatureOSA2011,
diff --git a/llvm/lib/Target/Sparc/SparcISelLowering.cpp b/llvm/lib/Target/Sparc/SparcISelLowering.cpp
index cbb7db68f7e7c..ae3c32687c207 100644
--- a/llvm/lib/Target/Sparc/SparcISelLowering.cpp
+++ b/llvm/lib/Target/Sparc/SparcISelLowering.cpp
@@ -2000,6 +2000,14 @@ SparcTargetLowering::SparcTargetLowering(const TargetMachine &TM,
 
   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
 
+  // Some processors have no branch predictor and have pipelines longer than
+  // what can be covered by the delay slot. This results in a stall, so mark
+  // branches to be expensive on those processors.
+  setJumpIsExpensive(Subtarget->hasNoPredictor());
+  // The high cost of branching means that using conditional moves will
+  // still be profitable even if the condition is predictable.
+  PredictableSelectIsExpensive = !isJumpExpensive();
+
   setMinFunctionAlignment(Align(4));
 
   computeRegisterProperties(Subtarget->getRegisterInfo());
diff --git a/llvm/test/CodeGen/SPARC/select-earlyniagara.ll b/llvm/test/CodeGen/SPARC/select-earlyniagara.ll
new file mode 100644
index 0000000000000..2cec10455d205
--- /dev/null
+++ b/llvm/test/CodeGen/SPARC/select-earlyniagara.ll
@@ -0,0 +1,43 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -O3 < %s -relocation-model=pic -mtriple=sparc -mcpu=v9 | FileCheck --check-prefix=SPARC %s
+; RUN: llc -O3 < %s -relocation-model=pic -mtriple=sparcv9 -mcpu=v9 | FileCheck --check-prefix=SPARC64 %s
+
+;; Early Niagara processors should prefer conditional moves over branches
+;; even when it's predictable.
+
+define i32 @cinc(i32 %cond, i32 %num) #0 {
+; SPARC-LABEL: cinc:
+; SPARC:       ! %bb.0: ! %entry
+; SPARC-NEXT:    cmp %o0, 0
+; SPARC-NEXT:    bne %icc, .LBB0_2
+; SPARC-NEXT:    mov %o1, %o0
+; SPARC-NEXT:  ! %bb.1: ! %inc
+; SPARC-NEXT:    add %o0, 1, %o0
+; SPARC-NEXT:  .LBB0_2: ! %cont
+; SPARC-NEXT:    retl
+; SPARC-NEXT:    nop
+;
+; SPARC64-LABEL: cinc:
+; SPARC64:       ! %bb.0: ! %entry
+; SPARC64-NEXT:    cmp %o0, 0
+; SPARC64-NEXT:    bne %icc, .LBB0_2
+; SPARC64-NEXT:    mov %o1, %o0
+; SPARC64-NEXT:  ! %bb.1: ! %inc
+; SPARC64-NEXT:    add %o0, 1, %o0
+; SPARC64-NEXT:  .LBB0_2: ! %cont
+; SPARC64-NEXT:    retl
+; SPARC64-NEXT:    nop
+entry:
+  %cmp = icmp eq i32 %cond, 0
+  %exp = call i1 @llvm.expect.i1(i1 %cmp, i1 0)
+  br i1 %exp, label %inc, label %cont
+inc:
+  %add = add nsw i32 %num, 1
+  br label %cont
+cont:
+  %phi = phi i32 [ %add, %inc ], [ %num, %entry ]
+  ret i32 %phi
+}
+declare i1 @llvm.expect.i1(i1, i1)
+
+attributes #0 = { nounwind "tune-cpu"="niagara" }

; SPARC64-LABEL: cinc:
; SPARC64: ! %bb.0: ! %entry
; SPARC64-NEXT: cmp %o0, 0
; SPARC64-NEXT: bne %icc, .LBB0_2
Contributor Author

I'm tuning this for niagara and from debugging dumps, I see that branches have been properly marked as expensive and PredictableSelectIsExpensive is false, yet the codegen still chooses branches over conditional moves.
How do I convince the codegen to emit conditional moves here?
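
The flag values reported in this comment follow directly from the two lines the patch adds to the `SparcTargetLowering` constructor. A minimal stand-alone model of that relationship, assuming nothing else overrides the settings, might look like this; `BranchTuning` and `tuningFor` are hypothetical names used only for illustration and are not part of LLVM.

```cpp
// Hypothetical stand-in for the two tuning knobs touched by this patch.
struct BranchTuning {
  bool JumpIsExpensive;
  bool PredictableSelectIsExpensive;
};

// Mirrors the two added lines:
//   setJumpIsExpensive(Subtarget->hasNoPredictor());
//   PredictableSelectIsExpensive = !isJumpExpensive();
constexpr BranchTuning tuningFor(bool HasNoPredictor) {
  return BranchTuning{HasNoPredictor, !HasNoPredictor};
}
```

Under this model, a subtarget with `no-predictor` gets expensive jumps and cheap predictable selects, while every other subtarget gets the opposite pair.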

Contributor

I guess this is because the input IR contains explicit branches, and llc doesn't run the CFG optimizer.
Consider rewriting this test to use a select instruction.

@s-barannikov s-barannikov left a comment

Setting these flags doesn't necessarily result in better codegen (in my experience).
The impact should be assessed using some benchmarks.

koachan commented Nov 5, 2025

> Setting these flags doesn't necessarily result in better codegen (in my experience). The impact should be assessed using some benchmarks.

Ya, that's why it's still marked as WIP.
But before doing benchmarks I at least want to confirm that the codegen changes are actually happening, hence the test case.

koachan commented Nov 5, 2025

So I ran pgbench on a 48-thread SPARC T2, and the patch does seem to increase performance (the only exception is the 48-thread SELECT workload):

threads   SELECT                  INS/UPD
          main        +patch      main       +patch
1          2318.04     2413.27     351.07     372.09
8         17273.82    17568.66    2513.91    2619.37
16        30384.08    31282.25    4322.87    4611.78
24        41794.27    43145.46    5719.44    5987.82
32        48902.99    50403.88    6498.09    6579.41
40        58342.80    60795.85    6935.46    7009.69
48        61119.86    59821.99    7363.66    7424.10

The speedup is quite modest (at around 2-3%), but given that it's only from setting two codegen tunables I'd say that this is a good result.
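
For readers who want to check the per-row change themselves, the following sketch computes the relative difference between the `main` and `+patch` columns. The figures are copied from the table in this comment; the names `Row`, `kRows`, `percentChange`, and `printChanges` are hypothetical helpers, not part of pgbench or LLVM.

```cpp
#include <iomanip>
#include <ostream>

// One row of the pgbench table quoted in this comment.
struct Row {
  int threads;
  double selectMain, selectPatch;
  double insUpdMain, insUpdPatch;
};

constexpr Row kRows[] = {
    {1, 2318.04, 2413.27, 351.07, 372.09},
    {8, 17273.82, 17568.66, 2513.91, 2619.37},
    {16, 30384.08, 31282.25, 4322.87, 4611.78},
    {24, 41794.27, 43145.46, 5719.44, 5987.82},
    {32, 48902.99, 50403.88, 6498.09, 6579.41},
    {40, 58342.80, 60795.85, 6935.46, 7009.69},
    {48, 61119.86, 59821.99, 7363.66, 7424.10},
};

// Relative change of `patched` over `base`, in percent.
constexpr double percentChange(double base, double patched) {
  return (patched - base) / base * 100.0;
}

// Prints one line per thread count with the change for both workloads.
inline void printChanges(std::ostream &os) {
  os << std::fixed << std::setprecision(2);
  for (const Row &r : kRows)
    os << r.threads << " threads: SELECT "
       << percentChange(r.selectMain, r.selectPatch) << "%, INS/UPD "
       << percentChange(r.insUpdMain, r.insUpdPatch) << "%\n";
}
```

Calling `printChanges(std::cout)` (with `<iostream>` included) lists the change for each row; the 48-thread SELECT row comes out negative, which matches the exception noted in the comment.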

@koachan koachan changed the title [WIP][SPARC] Mark branches as being expensive in early Niagara CPUs [SPARC] Mark branches as being expensive in early Niagara CPUs Nov 5, 2025
@s-barannikov

2-3% is actually a huge difference (one may say too good to be true)

@@ -0,0 +1,33 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
Contributor

The test should show the difference in codegen between subtargets with and without branch predictor.

Contributor Author

Done~

@s-barannikov s-barannikov left a comment

LGTM

ret i32 %ret
}

attributes #0 = { nounwind "tune-cpu"="niagara" }
Contributor

(nit) Can the two tests be combined into one with two RUN lines, one of which passes -mattr=+no-predictor?

@koachan koachan merged commit 0c0b0ea into llvm:main Nov 5, 2025
8 of 9 checks passed