-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[MachineLICM] Rematerialize instructions that may be hoisted before LICM #158479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
fa91ca8
to
9ffa46a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this.
The terminology used here is a bit confusing. "Rematerialization" in this context usually implies that a copy of the instruction is generated in the loop. This is something regalloc can do to avoid spills. As far as I can tell, this is not what you are doing here -- this is plain sinking, not rematerialization.
I'm not particularly familiar with these transforms, so I don't have much to say here. I'm not sure whether this approach of first sinking everything and then trying to hoist again makes sense -- it seems like this would likely end up overshooting in the other direction and end up moving too many calculations into the loop. It's hard to say without seeing how this affects codegen in practice.
@dianqk it looks like the remainder of the hoisted instructions in the test c file in #115862 that MachineLICM currently fails to sink w/ llvm-project/llvm/lib/CodeGen/MachineSink.cpp Line 1748 in 8007022
this I inferred from the debug output regarding some of the hoisted instructions:
Portion of the Machine IR showing said hoisted instructions (printed by compiling w/ *** IR Dump After Machine code sinking (machine-sink) on loop ***
# Machine code for function loop: IsSSA, TracksLiveness
Function Live Ins: $edi in %232, $rsi in %233, $edx in %234, $ecx in %235, $r8d in %236, $r9d in %237
bb.0.entry:
successors: %bb.1(0x80000000); %bb.1(100.00%)
liveins: $edi, $rsi, $edx, $ecx, $r8d, $r9d
%237:gr32 = COPY $r9d
%236:gr32 = COPY $r8d
%235:gr32 = COPY $ecx
%234:gr32 = COPY $edx
%233:gr64 = COPY $rsi
%232:gr32 = COPY $edi
%239:gr32 = nsw ADD32ri %234:gr32(tied-def 0), 2, implicit-def dead $eflags
%240:gr32 = nsw ADD32ri %235:gr32(tied-def 0), 2, implicit-def dead $eflags
- %0:gr64_nosp = MOVSX64rr32 %239:gr32
- %1:gr64_nosp = MOVSX64rr32 %240:gr32 Which indicates that the hoisted add instructions aren't sinked by AggressiveCycleSink, because they have more than one def, but the sign extensions on the output of the hoisted add instructions are sinked by AggressiveCycleSink, because they don't have more than one def |
Thanks for your explanation. Rematerialization handles the register spills, not the hoisted instructions. |
return true; | ||
} | ||
|
||
bool llvm::mayLoadFromGOTOrConstantPool(MachineInstr &MI) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this check so specific? Should it really be checking for invariant loads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it's unnecessary. I'll check it later.
return true; | ||
} | ||
|
||
bool llvm::mayLoadFromGOTOrConstantPool(MachineInstr &MI) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool llvm::mayLoadFromGOTOrConstantPool(MachineInstr &MI) { | |
bool llvm::mayLoadFromGOTOrConstantPool(MachineInstr &MI) { |
const. Also could just be directly a function of MachineInstr?
|
||
for (MachineMemOperand *MemOp : MI.memoperands()) | ||
if (const PseudoSourceValue *PSV = MemOp->getPseudoValue()) | ||
if (PSV->isGOT() || PSV->isConstantPool()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PSV->isConstant? Or MemOp->isInvariant? Not sure if we verify those are consistent
@dianqk
It seems that in this case, the eflags dst def of said add insn isn't actually used by anything
That should allow I would imagine that the only otherwise sinkable insns that would still otherwise be able to be handled by Either way, I definitely think it'd be best to first create a PR that just handles the simpler case of arithmetic insns w/ one explicit def & one Seems to be caused by my patch making MachineSink sink a MOV32r0 EDIT 4: Not sinking multi-def insns that don't have any reg uses works to prevent the assert from happening |
if (!MI.isSafeToMove(DontMoveAcrossStore)) | ||
return false; | ||
// Dont sink GOT or constant pool loads. | ||
if (MI.mayLoad() && !mayLoadFromGOTOrConstantPool(MI)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this conditional reversed?
If it's a load, and it doesn't load from GOT or constant pool, return false.
Looking at the definition, this predicate definitely returns true for GOT/CP loads, so I think this is a logical error.
This was likely copy-pasted from MachineSink, because that's where I saw this first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it does seem that way. I'll recheck this.
return false; | ||
// Instruction not safe to move. | ||
bool DontMoveAcrossStore = true; | ||
if (!MI.isSafeToMove(DontMoveAcrossStore)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to make this less restrictive than always assuming an aliasing store? Some way to do analysis such that we can determine that moving across a store isn't an issue? In my personal case, which is a matrix multiplication loading from/storing to global arrays, this condition blocks sinking.
Pre-ISel, PRE/LICM is happy to hoist everything out and cause problems, but once we get here, we would be unable to sink things back in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to make this less restrictive than always assuming an aliasing store? Some way to do analysis such that we can determine that moving across a store isn't an issue?
I think technically there is preexisting code that could be reused to do load/store aliasing checks...
See:
llvm-project/llvm/lib/CodeGen/MachineSink.cpp
Line 1650 in dfad983
bool MachineSinking::hasStoreBetween(MachineBasicBlock *From, |
Though IMO that'd probably be out of scope of the draft PR...
...
It sounds fine, but |
Just to follow up on a point of confusion upthread. Rematerialization during register allocation (i.e. splitting and spilling) may duplicate instructions to their uses. This was said above, but there seemed to be a misunderstanding that the original instruction would survive. If all uses of the original instruction are rematerialized (not guaranteed if e.g. we split a live interval), then the original instruction would become dead, and should be deleted. I'll note that the rematerialization heuristics are very delicate, and have lots of subtle interactions w/ flags such as CheapAsAMove. |
MachineBasicBlock *Preheader = Cycle->getCyclePreheader(); | ||
assert(Preheader && "Cycle sink needs a preheader block"); | ||
MachineBasicBlock *SinkBlock = nullptr; | ||
const MachineOperand &MO = I.getOperand(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is only checking the first def of MachineInstr I
, and, unlike in aggressivelySinkIntoCycle()
, instruction candidates (for rematerializeIntoCycle()
) with more than one non-dead defs aren't rejected, there's probably a correctness issue here.
You should change this to either account for all non-dead defs of I
, or simply do an early-return if I
has more than one non-dead defs. For the latter approach, you can simply copy my code here: main...sharkautarch:llvm-project:MIRAggrSink_ignoreDeadDefs
Fixes #115862.
https://llvm.org/docs/Passes.html#licm-loop-invariant-code-motion has said:
I do agree with this, but I cannot find any passes of the back-end doing this. MachineSink is what I'm looking for, but it's not enabled by default. After #117247, I don't think MachineSink is suitable for rematerializing these instructions. Hmm, and I don't really understand the code magic.
The first commit rematerializes all instructions before Machine LICM. It's easy to understand for me; we only need to improve Machine LICM if we find something. This compile time is https://llvm-compile-time-tracker.com/compare.php?from=a4993a27fb005c2c65e065e9d7703533f4d26bd2&to=8cb8bb00cce43634330626e5224cefb46696919c&stat=instructions:u.
The second commit is just my attempt to put it into MachineSink, and the compile time is https://llvm-compile-time-tracker.com/compare.php?from=a4993a27fb005c2c65e065e9d7703533f4d26bd2&to=fa91ca83e6fee687eae647d55a30190667a45954&stat=instructions%3Au.
I haven't added any test cases because I want to hear some ideas for this.
With the first commit, the new result of the reduced example that I added in #115862 (comment) is