-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SystemZ] Improve shouldCoalesce() for i128. #74942
Conversation
You can test this locally with the following command:git-clang-format --diff 898db1136e67939e075c502eebc0fbb5ab168e50 52bb695262fefdd7849206462f48b77d75b8c9b8 -- llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp View the diff from clang-format here.diff --git a/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp b/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp
index d5313acd87..42e23c7711 100644
--- a/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp
@@ -376,13 +376,10 @@ SystemZRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
return false;
}
-bool SystemZRegisterInfo::shouldCoalesce(MachineInstr *MI,
- const TargetRegisterClass *SrcRC,
- unsigned SubReg,
- const TargetRegisterClass *DstRC,
- unsigned DstSubReg,
- const TargetRegisterClass *NewRC,
- LiveIntervals &LIS) const {
+bool SystemZRegisterInfo::shouldCoalesce(
+ MachineInstr *MI, const TargetRegisterClass *SrcRC, unsigned SubReg,
+ const TargetRegisterClass *DstRC, unsigned DstSubReg,
+ const TargetRegisterClass *NewRC, LiveIntervals &LIS) const {
assert (MI->isCopy() && "Only expecting COPY instructions");
// Coalesce anything which is not a COPY involving a subreg to/from GR128.
@@ -400,8 +397,8 @@ bool SystemZRegisterInfo::shouldCoalesce(MachineInstr *MI,
MachineBasicBlock *MBB = MI->getParent();
MachineInstr *FirstMI = LIS.getInstructionFromIndex(LI.beginIndex());
MachineInstr *LastMI = LIS.getInstructionFromIndex(LI.endIndex());
- if (!FirstMI || FirstMI->getParent() != MBB ||
- !LastMI || LastMI->getParent() != MBB)
+ if (!FirstMI || FirstMI->getParent() != MBB || !LastMI ||
+ LastMI->getParent() != MBB)
return false;
// Check if coalescing seems safe by finding the set of clobbered physreg
|
I think it might be conceptually simpler and also more general to -instead of adding this special-case heuristic- modify the base heuristics to always allow the GR128 live range to span multiple blocks, and only restrict the to-be-coalesced GR64 to only live within a single block. (And still do the three-free-GR128 check in that latter block.) |
Sorry, but I am a little worried about that because if we only check the three-free for the coalesced GPR64, those three registers may be different ones than those free over the GPR128 interval. If we on the other hand decided to also check the GPR128 interval spanning many blocks that could in huge functions become a real compile time slowdown, I think. Once i128:s live in 32 vector registers, I think this is much less needed. If GPR128:s are then only used one at a time in order to use a scalar instruction, there should not have to bee any intervening phys reg clobbers to make this problematic. I can only think of inline assembly instructions then to be potentially problematic and if we made sure they are not scheduled inside such a local GPR128 use, we would be safe. Another idea might be to emit the VLGV/VLVGP instructions directly without any COPYs so that we would never need any coalescing for GPR128 in the first place. We could then instead add
Perhaps in the future if support for earlier machines would be dropped this could even all be removed... |
I don't really think this is a concern - if this happens, then regalloc may have to insert a move at some CFG edge, but that should work fine. The underlying problem requiring this special handling really was related to local problems with register availability around a call site and/or inline asm, I think. |
29faa54
to
fa8c8bf
Compare
OK - if we trust regalloc to do that kind of splitting, this seems like a good improvement to me. I guess in any case we were never truly on safe ground as the scheduler is run after coalescing and could potentially move some inlineasms so as to make more physreg clobbers overlap a GPR128 interval. Patch updated to allow coalescing even with a global GR128 as long as the subreg is local. I thought checking the local part of the GPR128 interval as well as the subreg region for clobbering physregs would be desired, but it was not as trivial as I had hoped for to find out if the GPR128 interval is live in/out or if not what is the first/last point of liveness of it in MBB. I think my current version is ok, but perhaps there is a better way? (If you think it would be enough to only check the subreg range, ofc that would be much simpler...) A more simple way for me at least would be to use MRI and iterate over uses (/defs) and by comparing slot indexes between them finding the latest (/earliest) instruction. liveAt() could be used to check for liveness around MBB boundaries. One thing I am aware of is that there could with this be a hole in the liveness, if a GPR128 (defined by a COPY) is live for a while, then killed, and then the same vreg is redefined later in MBB and becomes live. This patch would then only scan to the first kill. My hope is then that regalloc would if needed split the interval also at that point, but ofc I am not sure about that... I guess this might be unnecessarily complicated. Maybe it would be better to make sure we don't emit these COPYs (with i128 in VRs) in the first place and then leave this method as it was for older machines..? If that could be done with a pseudo or so, that is. |
It seems to me we don't really need to care about the live range of the 128-bit register at all. Regalloc will have to handle that no matter what, whether we coalesce or not. Maybe we actually only need to check the live range of the 64-bit register itself - if there's 128-bit registers available during that range, then coalescing shouldn't make the situation any worse, and should be allowed. |
fa8c8bf
to
52bb695
Compare
OK - patch simplified to just check the GR64 range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like nice simplification as well. LGTM, thanks!
The SystemZ implementation of shouldCoalesce() is merely a workaround for the fact that regalloc can run out of registers when building 128-bit registers out of subregs. Too many long GPR128 live ranges resulting from coalescing must be avoided. The heuristic used is to only allow this when the resulting live range is relatively small inside a single MBB.
This patch adds a bit more freedom to the coalescer by allowing coalescing when COPY:ing a subreg into a 128-bit reg where the subreg was defined close above and only has one use. This is an obvious case that should not cause any problems.
This helps @atomicrmw_xchg() in atomicrmw-ops-i128.ll both on main and on "[SystemZ] Support i128 as legal type in VRs
#74625".