[MachineLICM] Rematerialize instructions that may be hoisted before LICM #158479

dianqk · 2025-09-14T13:26:25Z

The LICM documentation (https://llvm.org/docs/Passes.html#licm-loop-invariant-code-motion) says:

Hoisting operations out of loops is a canonicalization transform. It enables and simplifies subsequent optimizations in the middle-end. Rematerialization of hoisted instructions to reduce register pressure is the responsibility of the back-end, which has more accurate information about register pressure and also handles other optimizations than LICM that increase live-ranges.

I agree with this, but I cannot find any back-end pass that does it. MachineSink is what I'm looking for, but it is not enabled by default. After #117247, I don't think MachineSink is suitable for rematerializing these instructions. I also don't fully understand how that code works.

The first commit rematerializes all instructions before Machine LICM. This approach is easy for me to understand; we would only need to improve Machine LICM if we find a problem. The compile-time impact is at https://llvm-compile-time-tracker.com/compare.php?from=a4993a27fb005c2c65e065e9d7703533f4d26bd2&to=8cb8bb00cce43634330626e5224cefb46696919c&stat=instructions:u.

The second commit is just my attempt to put it into MachineSink; its compile-time impact is at https://llvm-compile-time-tracker.com/compare.php?from=a4993a27fb005c2c65e065e9d7703533f4d26bd2&to=fa91ca83e6fee687eae647d55a30190667a45954&stat=instructions%3Au.

I haven't added any test cases yet because I'd like to hear feedback on the approach first.

With the first commit, the reduced example that I added in #115862 (comment) now gives:

+ clang -dumpversion
21.1.0
+ gcc -dumpversion
14.3.0
+ clang -O3 main.c
+ perf stat -e instructions:u ./a.out

 Performance counter stats for './a.out':

     2,550,317,381      instructions:u

       0.186548993 seconds time elapsed

       0.185919000 seconds user
       0.000000000 seconds sys


+ clang-dev -O3 main.c
+ perf stat -e instructions:u ./a.out

 Performance counter stats for './a.out':

       528,662,772      instructions:u

       0.019543261 seconds time elapsed

       0.018484000 seconds user
       0.001030000 seconds sys


+ gcc -O3 main.c
+ perf stat -e instructions:u ./a.out

 Performance counter stats for './a.out':

       453,161,286      instructions:u

       0.014953744 seconds time elapsed

       0.013878000 seconds user
       0.001064000 seconds sys

nikic

Thanks for looking into this.

The terminology used here is a bit confusing. "Rematerialization" in this context usually implies that a copy of the instruction is generated in the loop. This is something regalloc can do to avoid spills. As far as I can tell, this is not what you are doing here -- this is plain sinking, not rematerialization.

I'm not particularly familiar with these transforms, so I don't have much to say here. I'm not sure whether this approach of first sinking everything and then trying to hoist again makes sense -- it seems like this would likely end up overshooting in the other direction and end up moving too many calculations into the loop. It's hard to say without seeing how this affects codegen in practice.

sharkautarch · 2025-09-14T17:05:25Z

@dianqk it looks like the remaining hoisted instructions in the test C file from #115862, which MachineLICM currently fails to sink with -mllvm -sink-insts-to-avoid-spills=1, are left in place because of this line:

llvm-project/llvm/lib/CodeGen/MachineSink.cpp

Line 1748 in 8007022

if (I.getNumDefs() > 1)

I inferred this from the debug output for some of the hoisted instructions:

CycleSink: Analysing candidate: %239:gr32 = nsw ADD32ri %234:gr32(tied-def 0), 2, implicit-def dead $eflags
CycleSink: Instruction added as candidate.
CycleSink: Analysing candidate: %240:gr32 = nsw ADD32ri %235:gr32(tied-def 0), 2, implicit-def dead $eflags
CycleSink: Instruction added as candidate.
...
CycleSink: Analysing candidate: %0:gr64_nosp = MOVSX64rr32 %239:gr32
CycleSink: Instruction added as candidate.
CycleSink: Analysing candidate: %1:gr64_nosp = MOVSX64rr32 %240:gr32
CycleSink: Instruction added as candidate.
...
AggressiveCycleSink: Finding sink block for: %1:gr64_nosp = MOVSX64rr32 %240:gr32
AggressiveCycleSink:   Analysing use: 0x55d4e30cee10AggressiveCycleSink: Sinking instruction to block: %bb.8
AggressiveCycleSink: Finding sink block for: %0:gr64_nosp = MOVSX64rr32 %239:gr32
AggressiveCycleSink:   Analysing use: 0x55d4e30cecc8AggressiveCycleSink: Sinking instruction to block: %bb.8

Portion of the Machine IR showing said hoisted instructions (printed by compiling w/ -mllvm --print-changed=diff-quiet -mllvm --filter-print-funcs=loop) in function loop:

*** IR Dump After Machine code sinking (machine-sink) on loop ***
 # Machine code for function loop: IsSSA, TracksLiveness
 Function Live Ins: $edi in %232, $rsi in %233, $edx in %234, $ecx in %235, $r8d in %236, $r9d in %237
 
 bb.0.entry:
   successors: %bb.1(0x80000000); %bb.1(100.00%)
   liveins: $edi, $rsi, $edx, $ecx, $r8d, $r9d
   %237:gr32 = COPY $r9d
   %236:gr32 = COPY $r8d
   %235:gr32 = COPY $ecx
   %234:gr32 = COPY $edx
   %233:gr64 = COPY $rsi
   %232:gr32 = COPY $edi
   %239:gr32 = nsw ADD32ri %234:gr32(tied-def 0), 2, implicit-def dead $eflags
   %240:gr32 = nsw ADD32ri %235:gr32(tied-def 0), 2, implicit-def dead $eflags
-  %0:gr64_nosp = MOVSX64rr32 %239:gr32
-  %1:gr64_nosp = MOVSX64rr32 %240:gr32

This indicates that the hoisted add instructions are not sunk by AggressiveCycleSink because they have more than one def, whereas the sign extensions of their results are sunk because they have only one def.

dianqk · 2025-09-15T00:53:58Z

Thanks for your explanation. So rematerialization addresses register spills, not hoisted instructions.
I also just realized I had overlooked that the hoisted instructions do not always execute, even inside the loop body. That is probably an issue that needs to be addressed. I haven't checked the performance impact of these instructions yet.

arsenm · 2025-09-15T02:38:54Z

llvm/lib/CodeGen/MachineCycleAnalysis.cpp

  return true;
 }
+
+bool llvm::mayLoadFromGOTOrConstantPool(MachineInstr &MI) {


Why is this check so specific? Should it really be checking for invariant loads?

Perhaps it's unnecessary. I'll check it later.

arsenm · 2025-09-15T02:39:09Z

llvm/lib/CodeGen/MachineCycleAnalysis.cpp

  return true;
 }
+
+bool llvm::mayLoadFromGOTOrConstantPool(MachineInstr &MI) {


Suggested change

bool llvm::mayLoadFromGOTOrConstantPool(MachineInstr &MI) {

bool llvm::mayLoadFromGOTOrConstantPool(const MachineInstr &MI) {

The parameter should be const. Also, could this just be a member function of MachineInstr directly?

arsenm · 2025-09-15T02:44:02Z

llvm/lib/CodeGen/MachineCycleAnalysis.cpp

+
+  for (MachineMemOperand *MemOp : MI.memoperands())
+    if (const PseudoSourceValue *PSV = MemOp->getPseudoValue())
+      if (PSV->isGOT() || PSV->isConstantPool())


PSV->isConstant? Or MemOp->isInvariant? Not sure if we verify those are consistent

sharkautarch · 2025-09-15T13:56:07Z

@dianqk
following up on my previous comment (#158479 (comment))
In your specific testcase from the aforementioned issue, it seems that those hoisted adds have two defs:

* one explicit def: the dst register
* one implicit def: the eflags register

In this case, the eflags def of the add insn isn't actually used by anything.
That makes me wonder whether it would be better to open a new PR that patches the
if (I.getNumDefs() > 1) check in the AggressiveCycleSink code so that, when I.getNumDefs() > 1, it would:

* check whether there is only one explicit def and, if so, whether all implicit defs are on $eflags (early exit if either condition is false)
* treat instruction I as having only one def (that is, proceed to sink instruction I) if the implicit ~~$eflags def isn't actually used by any instruction following instruction I~~ def(s) is/are marked dead

That should allow -mllvm -sink-insts-to-avoid-spills=1 to handle most instructions hoisted by the middle-end LICM
(and it should still handle most selects, since most selects should be lowered to an x86 test or cmp insn followed by cmovCC, and the test and cmp insns should only have an eflags def).

I would imagine that the only otherwise sinkable insns that would still not be handled by AggressiveCycleSink are arithmetic insns whose eflags def is not used by any JCC but is used by one or more cmovCC/setCC/sbb/etc., likely in the same BB as the arithmetic insn.
I reckon the easiest case to handle would be arithmetic insn + setCC, whereas cmovCC and sbb would additionally require considering the src def(s) of the cmovCC/sbb insn(s).

Either way, I think it would be best to first create a PR that handles only the simpler case: arithmetic insns with one explicit def and one $eflags def whose $eflags def isn't used by any following insn.
EDIT: I just realized that the implicit $eflags defs of the aforementioned hoisted add insns are already marked as implicit-def dead $eflags,
and according to the MIR register flags documentation (https://llvm.org/docs/MIRLangRef.html#register-flags),
implicit-def dead means that the implicit def is marked as unused,
so improving the AggressiveCycleSink heuristic should be simpler than I originally thought.
EDIT 2: a simple patch that should hopefully achieve the same result as @dianqk's draft PR:
main...sharkautarch:llvm-project:MIRAggrSink_ignoreDeadDefs
EDIT 3: it seems that my patch above leads to an assertion failure in one specific case:
ld.lld: /tmp/makepkg/llvm-git/src/llvm-project/llvm/lib/CodeGen/InlineSpiller.cpp:1012: bool (anonymous namespace)::InlineSpiller::foldMemoryOperand(ArrayRef<std::pair<MachineInstr *, unsigned int>>, MachineInstr *): Assertion 'MO->isDead() && "Cannot fold physreg def"' failed.

This seems to be caused by my patch making MachineSink sink a MOV32r0.
@RKSimon any idea how to check for MOV32r0 without explicitly checking for that specific opcode?
Hmm… my one idea: since MOV32r0 is printed as %gr:<dst> = MOV32r0 implicit-def dead $eflags, it seems that MOV32r0 doesn't hold a reg use internally,
so maybe I can avoid sinking MOV32r0 by not sinking any instruction that has more than one def and no reg uses…

EDIT 4: Not sinking multi-def insns that don't have any reg uses does prevent the assert from firing.
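
For readers following along, here is a minimal, self-contained sketch of the relaxed check described in this comment. It is not the code from the linked branch and it does not use LLVM's classes: `RegOperand`, `Insn`, and `passesRelaxedDefCheck` are hypothetical names introduced only for illustration, and only register operands are modelled.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a register operand; this is not LLVM's MachineOperand.
struct RegOperand {
  std::string Reg; // e.g. "%239" or "$eflags"
  bool IsDef;      // true for a def, false for a use
  bool IsImplicit; // true for implicit operands such as $eflags on an x86 add
  bool IsDead;     // true when a def is marked dead
};

// Hypothetical stand-in for a machine instruction; immediates are not modelled.
struct Insn {
  std::string Opcode;
  std::vector<RegOperand> Regs;
};

// Returns true if the instruction passes the relaxed def check sketched above.
bool passesRelaxedDefCheck(const Insn &I) {
  unsigned TotalDefs = 0;
  unsigned ExplicitDefs = 0;
  bool ImplicitDefsAllDead = true;
  bool HasRegUse = false;
  for (const RegOperand &Op : I.Regs) {
    if (!Op.IsDef) {
      HasRegUse = true;
      continue;
    }
    ++TotalDefs;
    if (!Op.IsImplicit)
      ++ExplicitDefs;
    else if (!Op.IsDead)
      ImplicitDefsAllDead = false;
  }
  // At most one def: a "more than one def" rejection does not apply.
  if (TotalDefs <= 1)
    return true;
  // Several defs: require exactly one explicit def and only dead implicit defs.
  if (ExplicitDefs != 1 || !ImplicitDefsAllDead)
    return false;
  // Per EDIT 4: skip multi-def instructions without any register use,
  // which is the shape MOV32r0 has in the printed MIR.
  return HasRegUse;
}
```

Under this model, the hoisted `ADD32ri` (one explicit def, one register use, a dead implicit `$eflags` def) passes, while a `MOV32r0`-shaped instruction (two defs, no register use) is rejected.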

DragonDisciple · 2025-09-19T19:26:24Z

llvm/lib/CodeGen/MachineCycleAnalysis.cpp

+  if (!MI.isSafeToMove(DontMoveAcrossStore))
+    return false;
+  // Dont sink GOT or constant pool loads.
+  if (MI.mayLoad() && !mayLoadFromGOTOrConstantPool(MI))


Is this conditional reversed?

If it's a load, and it doesn't load from GOT or constant pool, return false.
Looking at the definition, this predicate definitely returns true for GOT/CP loads, so I think this is a logical error.

This was likely copy-pasted from MachineSink, because that's where I saw this first.

Well, it does seem that way. I'll recheck this.
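
To make the question above easier to check, here is a small truth-table sketch of the quoted condition next to what the code comment literally says. `LoadInfo` and both function names are hypothetical, and the sketch assumes (as the review comment does) that the guarded statement is `return false`. It does not settle which behaviour is intended.

```cpp
// Hypothetical summary of the two properties the quoted condition inspects.
struct LoadInfo {
  bool MayLoad;            // models MI.mayLoad()
  bool FromGOTOrConstPool; // models mayLoadFromGOTOrConstantPool(MI)
};

// The condition as written in the quoted diff; true means "do not sink".
bool rejectedAsWritten(const LoadInfo &MI) {
  return MI.MayLoad && !MI.FromGOTOrConstPool;
}

// What the code comment literally says: do not sink GOT or constant pool loads.
bool rejectedPerComment(const LoadInfo &MI) {
  return MI.MayLoad && MI.FromGOTOrConstPool;
}
```

For a load the two predicates give opposite answers, and for a non-load both return false, which is the mismatch between comment and code that the review comment points out.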

DragonDisciple · 2025-09-19T19:33:59Z

llvm/lib/CodeGen/MachineCycleAnalysis.cpp

+    return false;
+  // Instruction not safe to move.
+  bool DontMoveAcrossStore = true;
+  if (!MI.isSafeToMove(DontMoveAcrossStore))


Is there any way to make this less restrictive than always assuming an aliasing store? Some way to do analysis such that we can determine that moving across a store isn't an issue? In my personal case, which is a matrix multiplication loading from/storing to global arrays, this condition blocks sinking.

Pre-ISel, PRE/LICM is happy to hoist everything out and cause problems, but once we get here, we would be unable to sink things back in.

Is there any way to make this less restrictive than always assuming an aliasing store? Some way to do analysis such that we can determine that moving across a store isn't an issue?

I think technically there is preexisting code that could be reused to do load/store aliasing checks...
See:

llvm-project/llvm/lib/CodeGen/MachineSink.cpp

Line 1650 in dfad983

bool MachineSinking::hasStoreBetween(MachineBasicBlock *From,

Though IMO that'd probably be out of scope of the draft PR...

dianqk · 2025-09-23T14:36:21Z

That sounds fine, but SinkInstsIntoCycle is false by default. We need an approach that works with the default options.

preames · 2025-09-23T15:20:48Z

Just to follow up on a point of confusion upthread. Rematerialization during register allocation (i.e. splitting and spilling) may duplicate instructions to their uses. This was said above, but there seemed to be a misunderstanding that the original instruction would survive. If all uses of the original instruction are rematerialized (not guaranteed if e.g. we split a live interval), then the original instruction would become dead, and should be deleted.

I'll note that the rematerialization heuristics are very delicate, and have lots of subtle interactions w/ flags such as CheapAsAMove.
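
As a source-level analogy for the terminology discussed in this thread (not MIR, and not what any pass literally emits), the sketch below contrasts a hoisted computation, plain sinking, and rematerialization where the original survives because one use was not rematerialized. All function names are hypothetical.

```cpp
#include <vector>

// Hoisted: X + 2 is computed once before the loop and stays live across it.
long hoisted(const std::vector<int> &A, int X) {
  long T = X + 2L;
  long Sum = 0;
  for (int V : A)
    Sum += V * T;
  return Sum;
}

// Plain sinking: the single computation is moved into the loop body.
// Nothing is left outside the loop.
long sunk(const std::vector<int> &A, int X) {
  long Sum = 0;
  for (int V : A) {
    long T = X + 2L;
    Sum += V * T;
  }
  return Sum;
}

// Hoisted, with one additional use of T after the loop.
long hoistedWithLateUse(const std::vector<int> &A, int X) {
  long T = X + 2L;
  long Sum = 0;
  for (int V : A)
    Sum += V * T;
  return Sum + T;
}

// Rematerialization: a copy is recomputed at the use inside the loop.
// The original survives here only because the use after the loop still
// needs it; if every use were rematerialized, the original would be dead.
long rematerializedInLoop(const std::vector<int> &A, int X) {
  long T = X + 2L;
  long Sum = 0;
  for (int V : A) {
    long TCopy = X + 2L;
    Sum += V * TCopy;
  }
  return Sum + T;
}
```

Each pair computes the same value; what differs is where the computation lives and how long its result has to stay in a register.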

sharkautarch · 2025-09-23T15:22:29Z

llvm/lib/CodeGen/MachineLICM.cpp

+  MachineBasicBlock *Preheader = Cycle->getCyclePreheader();
+  assert(Preheader && "Cycle sink needs a preheader block");
+  MachineBasicBlock *SinkBlock = nullptr;
+  const MachineOperand &MO = I.getOperand(0);


Because this only checks the first def of MachineInstr I and, unlike aggressivelySinkIntoCycle(), does not reject candidates (for rematerializeIntoCycle()) that have more than one non-dead def, there is probably a correctness issue here.

You should change this either to account for all non-dead defs of I, or to simply return early if I has more than one non-dead def. For the latter approach, you can copy my code here: main...sharkautarch:llvm-project:MIRAggrSink_ignoreDeadDefs
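
A minimal sketch of the early-return option, assuming a simplified model rather than LLVM's classes: `Def`, `countLiveDefs`, and `hasAtMostOneLiveDef` are hypothetical names, and this is not the code from the linked branch. It only covers the "more than one non-dead def" guard, not the alternative of handling every live def.

```cpp
#include <vector>

// Hypothetical stand-in for one register def of an instruction.
struct Def {
  bool IsDead; // true when the def is marked dead
};

// Counts the defs that are not marked dead.
unsigned countLiveDefs(const std::vector<Def> &Defs) {
  unsigned Live = 0;
  for (const Def &D : Defs)
    if (!D.IsDead)
      ++Live;
  return Live;
}

// Guard for a routine that only reasons about a single def:
// bail out early when more than one def is still live.
bool hasAtMostOneLiveDef(const std::vector<Def> &Defs) {
  return countLiveDefs(Defs) <= 1;
}
```

A caller would return early when `hasAtMostOneLiveDef` is false, so that the uses of a second live def are never ignored.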

[MachineLICM] Rematerialize instructions that may be hoisted before LICM

8cb8bb0

dianqk requested review from arsenm, phoebewang, s-barannikov and nikic September 14, 2025 13:26

[MachineSink][Experiment] Rematerialize instructions that may be hoisted in LICM

9ffa46a

dianqk force-pushed the licm-rematerialization branch from fa91ca8 to 9ffa46a Compare September 14, 2025 13:37

nikic reviewed Sep 14, 2025

nikic requested a review from preames September 14, 2025 16:13

arsenm added the llvm:codegen label Sep 15, 2025

arsenm reviewed Sep 15, 2025

jayfoad self-requested a review September 19, 2025 10:20

DragonDisciple reviewed Sep 19, 2025

sharkautarch reviewed Sep 23, 2025
