[RISCV] Use C.ADD when OR is not compressible due to register restriction #156044

preames · 2025-08-29T15:04:17Z

c.or requires that all the operands be the gprc register class, but c.add does not. As a result, we can use c.add for disjoint or to allow additional compression.

This patch does the transform extremely late (when converting to MCInst) so that we only emit an OR as an ADD if the difference actually reduces code size.

I haven't touched the register allocator hint mechanism (yet), so this is only catching cases which naturally end up reusing one of the source registers.

This is a (likely much better) alternative to #155669. When I first wrote that, I hadn't realized that we already propagate disjoint onto the RISCV::OR MachineInst Node.

Note there is a small correctness risk with this change - if we forgot to drop the disjoint flag somewhere this could cause miscompiles, and I don't think we have another use of the flag this late in the backend.
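The selection rule described above can be modelled outside of LLVM as a small standalone sketch. Everything here is illustrative: `Op`, `isGPRC`, and `selectOpcode` are hypothetical names rather than LLVM APIs, registers are plain `x` numbers, and GPRC is taken to be x8..x15 as in the RISC-V compressed extension. It assumes C++17.

```cpp
// Hypothetical standalone model of the opcode choice; not LLVM code.
enum class Op { Or, Add };

// GPRC is the register class usable by c.or: x8..x15 (s0, s1, a0..a5).
constexpr bool isGPRC(unsigned Reg) { return Reg >= 8 && Reg <= 15; }

// Mirrors the condition in the patch: switch a disjoint OR to ADD only when
// the destination reuses a source register and at least one operand falls
// outside GPRC, i.e. when c.or cannot be used for register-class reasons.
constexpr Op selectOpcode(bool HasZca, bool Disjoint, unsigned Rd,
                          unsigned Rs1, unsigned Rs2) {
  if (!HasZca || !Disjoint)
    return Op::Or;
  bool Tied = Rd == Rs1 || Rd == Rs2;
  bool AllGPRC = isGPRC(Rd) && isGPRC(Rs1) && isGPRC(Rs2);
  return (Tied && !AllGPRC) ? Op::Add : Op::Or;
}

// or a6, a6, a2 (x16, x16, x12): a6 is outside GPRC, so ADD is chosen.
static_assert(selectOpcode(true, true, 16, 16, 12) == Op::Add);
// or a5, a5, t0 (x15, x15, x5): t0 is outside GPRC, so ADD is chosen.
static_assert(selectOpcode(true, true, 15, 15, 5) == Op::Add);
// or a0, a0, a1 (x10, x10, x11): all GPRC, c.or already applies, keep OR.
static_assert(selectOpcode(true, true, 10, 10, 11) == Op::Or);
// or t2, a4, a3 (x7, x14, x13): destination not tied to a source, keep OR.
static_assert(selectOpcode(true, true, 7, 14, 13) == Op::Or);
// Without the disjoint flag the rewrite is never legal.
static_assert(selectOpcode(true, false, 16, 16, 12) == Op::Or);
```

The register triples in the checks are taken from the test diff in this PR; the model only reproduces the register-class test and does not attempt to model the rest of the compression patterns.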

llvmbot · 2025-08-29T15:04:50Z

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

c.or requires that all the operands be the gprc register class, but c.add does not. As a result, we can use c.add for disjoint or to allow additional compression.

This patch does the transform extremely late (when converting to MCInst) so that we only emit an OR as an ADD if the difference actually reduces code size.

I haven't touched the register allocator hint mechanism (yet), so this is only catching cases which naturally end up reusing one of the source registers.

This is a (likely much better) alternative to #155669. When I first wrote that, I hadn't realized that we already propagate disjoint onto the RISCV::OR MachineInst Node.

Note there is a small correctness risk with this change - if we forgot to drop the disjoint flag somewhere this could cause miscompiles, and I don't think we have another use of the flag this late in the backend.

Full diff: https://github.com/llvm/llvm-project/pull/156044.diff

2 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp (+15-1)
(modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+5-5)

diff --git a/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp b/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp
index 83566b1c57782..f10d2a1ccba58 100644
--- a/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp
+++ b/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp
@@ -1178,7 +1178,21 @@ bool RISCVAsmPrinter::lowerToMCInst(const MachineInstr *MI, MCInst &OutMI) {
   if (lowerRISCVVMachineInstrToMCInst(MI, OutMI, STI))
     return false;
 
-  OutMI.setOpcode(MI->getOpcode());
+  unsigned Opcode = MI->getOpcode();
+  // If we have a disjoint OR which isn't compressible as an c.or, we can
+  // convert it to a c.add which doesn't have the gprc register restriction.
+  if (STI->hasStdExtZca() && Opcode == RISCV::OR &&
+      MI->getFlag(MachineInstr::Disjoint)) {
+    Register Rd = MI->getOperand(0).getReg();
+    Register Rs1 = MI->getOperand(1).getReg();
+    Register Rs2 = MI->getOperand(2).getReg();
+    if ((Rd == Rs1 || Rd == Rs2) &&
+        !(RISCV::GPRCRegClass.contains(Rd) &&
+          RISCV::GPRCRegClass.contains(Rs1) &&
+          RISCV::GPRCRegClass.contains(Rs2)))
+      Opcode = RISCV::ADD;
+  }
+  OutMI.setOpcode(Opcode);
 
   for (const MachineOperand &MO : MI->operands()) {
     MCOperand MCOp;
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
index d9bb007a10f71..157ab0bd30d70 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll
@@ -1532,7 +1532,7 @@ define <16 x i8> @buildvec_v16i8_loads_contigous(ptr %p) {
 ; RVA22U64-NEXT:    slli a4, a4, 24
 ; RVA22U64-NEXT:    slli a5, a5, 32
 ; RVA22U64-NEXT:    slli a1, a1, 40
-; RVA22U64-NEXT:    or a6, a6, a2
+; RVA22U64-NEXT:    add a6, a6, a2
 ; RVA22U64-NEXT:    or t2, a4, a3
 ; RVA22U64-NEXT:    or t1, a1, a5
 ; RVA22U64-NEXT:    lbu a4, 8(a0)
@@ -1907,7 +1907,7 @@ define <16 x i8> @buildvec_v16i8_loads_gather(ptr %p) {
 ; RVA22U64-NEXT:    slli a4, a4, 24
 ; RVA22U64-NEXT:    slli a5, a5, 32
 ; RVA22U64-NEXT:    slli a1, a1, 40
-; RVA22U64-NEXT:    or a7, a7, a2
+; RVA22U64-NEXT:    add a7, a7, a2
 ; RVA22U64-NEXT:    or t3, a4, a3
 ; RVA22U64-NEXT:    or t2, a1, a5
 ; RVA22U64-NEXT:    lbu a4, 93(a0)
@@ -1918,13 +1918,13 @@ define <16 x i8> @buildvec_v16i8_loads_gather(ptr %p) {
 ; RVA22U64-NEXT:    slli t0, t0, 56
 ; RVA22U64-NEXT:    slli a4, a4, 8
 ; RVA22U64-NEXT:    or a3, t0, a6
-; RVA22U64-NEXT:    or a4, t1, a4
+; RVA22U64-NEXT:    add a4, a4, t1
 ; RVA22U64-NEXT:    lbu a5, 161(a0)
 ; RVA22U64-NEXT:    lbu a1, 154(a0)
 ; RVA22U64-NEXT:    lbu a0, 163(a0)
 ; RVA22U64-NEXT:    slli t4, t4, 16
 ; RVA22U64-NEXT:    slli a5, a5, 24
-; RVA22U64-NEXT:    or a5, a5, t4
+; RVA22U64-NEXT:    add a5, a5, t4
 ; RVA22U64-NEXT:    slli a2, a2, 32
 ; RVA22U64-NEXT:    slli a0, a0, 40
 ; RVA22U64-NEXT:    or a0, a0, a2
@@ -3083,7 +3083,7 @@ define <8 x i8> @buildvec_v8i8_pack(i8 %e1, i8 %e2, i8 %e3, i8 %e4, i8 %e5, i8 %
 ; RVA22U64-NEXT:    slli a2, a2, 16
 ; RVA22U64-NEXT:    slli a3, a3, 24
 ; RVA22U64-NEXT:    slli a1, a1, 8
-; RVA22U64-NEXT:    or a5, a5, t0
+; RVA22U64-NEXT:    add a5, a5, t0
 ; RVA22U64-NEXT:    or a4, a7, a4
 ; RVA22U64-NEXT:    or a2, a2, a3
 ; RVA22U64-NEXT:    or a0, a0, a1

github-actions · 2025-08-29T15:07:52Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.

diff --git a/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp b/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp
index f10d2a1cc..f0e4631e5 100644
--- a/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp
+++ b/llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp
@@ -1186,10 +1186,9 @@ bool RISCVAsmPrinter::lowerToMCInst(const MachineInstr *MI, MCInst &OutMI) {
     Register Rd = MI->getOperand(0).getReg();
     Register Rs1 = MI->getOperand(1).getReg();
     Register Rs2 = MI->getOperand(2).getReg();
-    if ((Rd == Rs1 || Rd == Rs2) &&
-        !(RISCV::GPRCRegClass.contains(Rd) &&
-          RISCV::GPRCRegClass.contains(Rs1) &&
-          RISCV::GPRCRegClass.contains(Rs2)))
+    if ((Rd == Rs1 || Rd == Rs2) && !(RISCV::GPRCRegClass.contains(Rd) &&
+                                      RISCV::GPRCRegClass.contains(Rs1) &&
+                                      RISCV::GPRCRegClass.contains(Rs2)))
       Opcode = RISCV::ADD;
   }
   OutMI.setOpcode(Opcode);

lenary · 2025-08-31T02:31:12Z

llvm/lib/Target/RISCV/RISCVAsmPrinter.cpp

+  unsigned Opcode = MI->getOpcode();
+  // If we have a disjoint OR which isn't compressible as an c.or, we can
+  // convert it to a c.add which doesn't have the gprc register restriction.
+  if (STI->hasStdExtZca() && Opcode == RISCV::OR &&


I'm not super happy with the choice to do this here, honestly. This is mostly a rote function, and I don't think users would expect changes to happen in it.

Given you're not changing register allocation, can we do it in one of two places:

* A CompressPat, with `isCompressOnly=true`. I realise we might not be able to control the priority of the two patterns, but I think right now the system uses file order?
* RISCVMakeCompressible - where we're sort-of expecting optimisations like this.

topperc · 2025-08-31

> A CompressPat, with `isCompressOnly=true`. I realise we might not be able to control the priority of the two patterns, but I think right now the system uses file order?

I don't think we can check the flag in a CompressPat, can we?

lenary · 2025-08-31

Sorry, no, indeed we cannot.

preames · 2025-09-02

> I'm not super happy with the choice to do this here, honestly. This is mostly a rote function, and I don't think users would expect changes to happen in it.

Understood, happy to rework if we see a better option.

> Given you're not changing register allocation

Hold on. Not changing register allocation for now. I'm expecting to return to the register allocation in an upcoming patch; I was just staging work to make the diff more understandable. Does this change your take at all?

> A CompressPat, with `isCompressOnly=true`. I realise we might not be able to control the priority of the two patterns, but I think right now the system uses file order?

Craig addressed this idea.

> RISCVMakeCompressible - where we're sort-of expecting optimisations like this.

This only performs size optimizations, and while this change does improve size, it's not gated on any of the Os/Oz flags.

lenary · 2025-09-02

> Not changing register allocation for now. I'm expecting to return to the register allocation in an upcoming patch; I was just staging work to make the diff more understandable. Does this change your take at all?

Not fundamentally - any changes to the register allocation would be in a completely different part of the codebase anyway, and would still need code in a later pass to turn or+disjoint into add.

> [RISCVMakeCompressible] only performs size optimizations, and while this change does improve size, it's not gated on any of the Os/Oz flags.

Sure, but this patch is still an optimisation about trying to make a specific pattern more amenable to future compression - your code is introducing add with the knowledge that the compress patterns for add cover more instances of the instruction than the compress patterns for or, which to my mind is essentially what RISCVMakeCompressible is doing.

I don't entirely understand why this optimisation warrants more special treatment than those in RISCVMakeCompressible.

preames · 2025-09-02

While looking at this code, I noticed a cleanup of the existing code: #156482

Honestly, I don't really get your objection to this change at all, so I suspect we're talking past each other. You'd previously said this routine was "mostly a rote function, and I don't think users would expect changes to happen in it". This routine already lowers all the V pseudos and had special casing for the patchable function entry stuff (admittedly, that I'm now removing in the patch linked above). The sole caller has custom lowering for a bunch of MIs; this logic could easily be moved there. Is your objection that I'm adding an optimization into something otherwise purely functional?

> I don't entirely understand why this optimisation warrants a more special treatment than those in RISCVMakeCompressible.

RISCVMakeCompressible is currently gated by MinSize. That's my entire objection. If you'd rather that pass had a section which ran unconditionally, and a second which ran only if MinSize, I'm happy to do that.

lenary · 2025-09-02

> Is your objection that I'm adding an optimization into something otherwise purely functional?

Broadly, yes. All the MIs with special handling in that function are Pseudos, not well-formed executable/encodable instructions. I don't think someone debugging codegen would expect that function to modify instructions in this way, but maybe it would be obvious with a -print-after-all or similar invocation.

> RISCVMakeCompressible is currently gated by MinSize. That's my entire objection.

Understood on the MinSize vs OptSize vs no-gating on MakeCompressible. We want to run that pass at OptSize as well as MinSize (we have an internal patch for this that should go up in the next day or so). A quick look at https://reviews.llvm.org/D92105 shows that there was some debate about Oz vs Os and you suggested the pass would be good even at O2. I'm not sure these decisions have been revisited since then with the community.

I'm about to be out for a week, so:

* I'm not going to block this patch.
* I'm in favour of revisiting when RISCVMakeCompressible runs.
* After that we can decide whether to move this optimisation in there, or whether it's still better in this location.

I hope this is reasonable for a fix-forward approach, which should unblock any register allocation improvements you might want to do.

asb · 2025-09-06T09:49:10Z

> Note there is a small correctness risk with this change - if we forgot to drop the disjoint flag somewhere this could cause miscompiles, and I don't think we have another use of the flag this late in the backend.

Just as a litmus test, I applied a modified version of this patch that unconditionally selects add when possible (i.e. without gating it just on the case where the extra compressibility is taken advantage of). For an rva22u64 build of the LLVM test suite including SPEC 2017 it results in 1907 conversions of or=>add and all tests still pass.

Obviously this isn't a proof of the absence of any lurking issues, but at least we know that the obvious litmus test is fine.

topperc · 2025-09-08T19:29:59Z

I spoke with Philip earlier. We believe the transforms done by RISCVOptWInstrs invalidate the Disjoint flag, but don't clear the flag. For example, we can no longer guarantee the upper bits are disjoint if we optimize based on demanded bits.

I'm not sure the intended transform in this patch is violated by RISCVOptWInstrs; I just know the flag itself does not accurately describe the upper bits as it should. This makes me uncomfortable using the flag.
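The arithmetic behind this concern can be shown with a short worked example. This is only a sketch of the bit-level identity, with hypothetical constants chosen for illustration; it makes no claim about what RISCVOptWInstrs actually produces. `or` and `add` agree on every bit when the operands share no set bits. If the operands are disjoint only in their low 32 bits, the two operations still agree on the low 32 bits (carries only propagate upward) but can differ in the upper bits. It assumes C++17.

```cpp
#include <cstdint>

// Two values are disjoint when they have no set bit in common.
constexpr bool isDisjoint(uint64_t A, uint64_t B) { return (A & B) == 0; }
constexpr uint32_t low32(uint64_t V) { return static_cast<uint32_t>(V); }

// Fully disjoint operands: OR and ADD produce the same 64-bit result.
constexpr uint64_t A = 0x00FF;
constexpr uint64_t B = 0xFF00;
static_assert(isDisjoint(A, B));
static_assert((A | B) == (A + B));

// Hypothetical operands that are disjoint in the low 32 bits only;
// bit 32 is set in both.
constexpr uint64_t C = 0x0000000100000001ULL;
constexpr uint64_t D = 0x0000000100000002ULL;
static_assert(!isDisjoint(C, D));
static_assert(isDisjoint(low32(C), low32(D)));

// The full 64-bit results differ ...
static_assert((C | D) == 0x0000000100000003ULL);
static_assert((C + D) == 0x0000000200000003ULL);
// ... but the low 32 bits still match.
static_assert(low32(C | D) == low32(C + D));
```

Whether the differing upper bits matter depends on whether any later instruction reads them, which is the open question raised in the comment above.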

preames requested review from topperc and wangpc-pp August 29, 2025 15:04

llvmbot added the backend:RISC-V label Aug 29, 2025

preames mentioned this pull request Aug 29, 2025

[RISCV][POC] Should we be using ADD for disjoint or? #155669

Closed

topperc reviewed Aug 29, 2025

lenary reviewed Aug 31, 2025
