Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SystemZ] Improve shouldCoalesce() for i128. #74942

Merged
merged 2 commits into from
Dec 14, 2023

Conversation

JonPsson1
Copy link
Contributor

The SystemZ implementation of shouldCoalesce() is merely a workaround for the fact that regalloc can run out of registers when building 128-bit registers out of subregs. Too many long GPR128 live ranges resulting from coalescing must be avoided. The heuristic used is to only allow this when the resulting live range is relatively small inside a single MBB.

This patch adds a bit more freedom to the coalescer by allowing coalescing when COPY:ing a subreg into a 128-bit reg where the subreg was defined close above and only has one use. This is an obvious case that should not cause any problems.

This helps @atomicrmw_xchg() in atomicrmw-ops-i128.ll both on main and on "[SystemZ] Support i128 as legal type in VRs
#74625".

@JonPsson1 JonPsson1 self-assigned this Dec 9, 2023
Copy link

github-actions bot commented Dec 9, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 898db1136e67939e075c502eebc0fbb5ab168e50 52bb695262fefdd7849206462f48b77d75b8c9b8 -- llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp b/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp
index d5313acd87..42e23c7711 100644
--- a/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZRegisterInfo.cpp
@@ -376,13 +376,10 @@ SystemZRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator MI,
   return false;
 }
 
-bool SystemZRegisterInfo::shouldCoalesce(MachineInstr *MI,
-                                         const TargetRegisterClass *SrcRC,
-                                         unsigned SubReg,
-                                         const TargetRegisterClass *DstRC,
-                                         unsigned DstSubReg,
-                                         const TargetRegisterClass *NewRC,
-                                         LiveIntervals &LIS) const {
+bool SystemZRegisterInfo::shouldCoalesce(
+    MachineInstr *MI, const TargetRegisterClass *SrcRC, unsigned SubReg,
+    const TargetRegisterClass *DstRC, unsigned DstSubReg,
+    const TargetRegisterClass *NewRC, LiveIntervals &LIS) const {
   assert (MI->isCopy() && "Only expecting COPY instructions");
 
   // Coalesce anything which is not a COPY involving a subreg to/from GR128.
@@ -400,8 +397,8 @@ bool SystemZRegisterInfo::shouldCoalesce(MachineInstr *MI,
   MachineBasicBlock *MBB = MI->getParent();
   MachineInstr *FirstMI = LIS.getInstructionFromIndex(LI.beginIndex());
   MachineInstr *LastMI = LIS.getInstructionFromIndex(LI.endIndex());
-  if (!FirstMI || FirstMI->getParent() != MBB ||
-      !LastMI || LastMI->getParent() != MBB)
+  if (!FirstMI || FirstMI->getParent() != MBB || !LastMI ||
+      LastMI->getParent() != MBB)
     return false;
 
   // Check if coalescing seems safe by finding the set of clobbered physreg

@uweigand
Copy link
Member

I think it might be conceptually simpler and also more general to -instead of adding this special-case heuristic- modify the base heuristics to always allow the GR128 live range to span multiple blocks, and only restrict the to-be-coalesced GR64 to only live within a single block. (And still do the three-free-GR128 check in that latter block.)

@JonPsson1
Copy link
Contributor Author

Sorry, but I am a little worried about that because if we only check the three-free for the coalesced GPR64, those three registers may be different ones than those free over the GPR128 interval. If we on the other hand decided to also check the GPR128 interval spanning many blocks that could in huge functions become a real compile time slowdown, I think.

Once i128:s live in 32 vector registers, I think this is much less needed. If GPR128:s are then only used one at a time in order to use a scalar instruction, there should not have to bee any intervening phys reg clobbers to make this problematic. I can only think of inline assembly instructions then to be potentially problematic and if we made sure they are not scheduled inside such a local GPR128 use, we would be safe. Another idea might be to emit the VLGV/VLVGP instructions directly without any COPYs so that we would never need any coalescing for GPR128 in the first place.

We could then instead add

// The workaround below for regalloc running out of GPR128 registers is not
// needed when vector registers are used for i128 values.
 if (MI->getMF()->getSubtarget<SystemZSubtarget>().hasVector())
   return true;

Perhaps in the future if support for earlier machines would be dropped this could even all be removed...

@uweigand
Copy link
Member

Sorry, but I am a little worried about that because if we only check the three-free for the coalesced GPR64, those three registers may be different ones than those free over the GPR128 interval.

I don't really think this is a concern - if this happens, then regalloc may have to insert a move at some CFG edge, but that should work fine. The underlying problem requiring this special handling really was related to local problems with register availability around a call site and/or inline asm, I think.

@JonPsson1
Copy link
Contributor Author

I don't really think this is a concern - if this happens, then regalloc may have to insert a move at some CFG edge, but that should work fine. The underlying problem requiring this special handling really was related to local problems with register availability around a call site and/or inline asm, I think.

OK - if we trust regalloc to do that kind of splitting, this seems like a good improvement to me. I guess in any case we were never truly on safe ground as the scheduler is run after coalescing and could potentially move some inlineasms so as to make more physreg clobbers overlap a GPR128 interval.

Patch updated to allow coalescing even with a global GR128 as long as the subreg is local. I thought checking the local part of the GPR128 interval as well as the subreg region for clobbering physregs would be desired, but it was not as trivial as I had hoped for to find out if the GPR128 interval is live in/out or if not what is the first/last point of liveness of it in MBB. I think my current version is ok, but perhaps there is a better way? (If you think it would be enough to only check the subreg range, ofc that would be much simpler...)

A more simple way for me at least would be to use MRI and iterate over uses (/defs) and by comparing slot indexes between them finding the latest (/earliest) instruction. liveAt() could be used to check for liveness around MBB boundaries.

One thing I am aware of is that there could with this be a hole in the liveness, if a GPR128 (defined by a COPY) is live for a while, then killed, and then the same vreg is redefined later in MBB and becomes live. This patch would then only scan to the first kill. My hope is then that regalloc would if needed split the interval also at that point, but ofc I am not sure about that...

I guess this might be unnecessarily complicated. Maybe it would be better to make sure we don't emit these COPYs (with i128 in VRs) in the first place and then leave this method as it was for older machines..? If that could be done with a pseudo or so, that is.

@uweigand
Copy link
Member

It seems to me we don't really need to care about the live range of the 128-bit register at all. Regalloc will have to handle that no matter what, whether we coalesce or not. Maybe we actually only need to check the live range of the 64-bit register itself - if there's 128-bit registers available during that range, then coalescing shouldn't make the situation any worse, and should be allowed.

@JonPsson1
Copy link
Contributor Author

OK - patch simplified to just check the GR64 range.

Copy link
Member

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like nice simplification as well. LGTM, thanks!

@JonPsson1 JonPsson1 merged commit 01061ed into llvm:main Dec 14, 2023
3 of 4 checks passed
@JonPsson1 JonPsson1 deleted the ShouldCoalesce branch December 14, 2023 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants