[PowerPC][CodeGen] Exploit STMW and LMW in 32-bit big-endian mode. #74415
✅ With the latest revision this PR passed the C/C++ code formatter.
// Record the first reg that STMW/LMW are going to merge since STMW/LMW save
// from rN to r31.
MergeFrom = CSI[BeginI].getReg();
This is unnecessarily complicated. LMW/STMW only applies to 32-bit AIX. For AIX, we just need to find the first GPR (assuming the CSI is sorted in ascending order); that would be the MergeFrom. On AIX, the CSRs always run from the lowest saved GPR through R31.
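A minimal standalone sketch of that suggestion, not the patch's code. It assumes the saved GPRs are given as plain register numbers sorted in ascending order; `findMergeFrom` and the integer encoding are hypothetical stand-ins for the real `CalleeSavedInfo` and `Register` types. The contiguity check is only a sanity check here, since the comment above argues it always holds on AIX.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: returns the first GPR number (the "MergeFrom"
// candidate) when the saved GPRs form one run ending at r31, which is the
// shape stmw/lmw require. SavedGPRs is assumed sorted in ascending order.
std::optional<unsigned> findMergeFrom(const std::vector<unsigned> &SavedGPRs) {
  if (SavedGPRs.empty())
    return std::nullopt;

  unsigned First = SavedGPRs.front();
  // stmw/lmw always run through r31, so the list must end there.
  if (First > 31 || SavedGPRs.back() != 31)
    return std::nullopt;

  // Sanity check: every register between First and r31 must be present.
  for (std::size_t I = 1; I < SavedGPRs.size(); ++I)
    if (SavedGPRs[I] != SavedGPRs[I - 1] + 1)
      return std::nullopt;

  return First;
}
```

For example, `findMergeFrom({29, 30, 31})` yields 29, while a list with a gap such as `{29, 31}` yields no candidate.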
; CHECK-NEXT: mtlr 0
; CHECK-NEXT: bl

@a = external local_unnamed_addr global i32, align 4
This test case is too complicated. Please use the one below:
define dso_local void @test_simple() #0 {
entry:
call void asm sideeffect "nop", "~{r16}"()
ret void
}
static cl::opt<bool>
    EnableLoadStoreMultiple("ppc-enable-load-store-multiple",
                            cl::desc("Enable load/store multiple (only "
                                     "support in 32-bit big-endian mode)."),
Instead of 32-bit big-endian, it may be better to limit this to 32-bit AIX.
deb9ba4 to 6243b1b
89e7085 to c46a2f6
@@ -2399,6 +2405,43 @@ bool PPCFrameLowering::assignCalleeSavedSpillSlots(
  return AllSpilledToReg;
}

static void findContinuousLoadStore(const MachineFunction *MF,
- static void findContinuousLoadStore(const MachineFunction *MF,
+ static bool findConsecutiveLoadStore(const MachineFunction *MF,
Return true if consecutive ld/std are found.
                                    ArrayRef<CalleeSavedInfo> CSI,
                                    Register &MergeFrom) {
  const MachineFrameInfo &MFI = MF->getFrameInfo();
  int64_t Offset = MFI.estimateStackSize(*MF);
Could we use PPCFrameLowering::determineFrameLayout() to determine the stack size on PPC?
This patch exploits the stmw and lmw instructions. A run of consecutive store/load instructions covering at most r13 through r31 can be merged into a single stmw/lmw, which saves code size.
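As a rough illustration of the code-size claim, the sketch below counts the bytes saved when a run of per-register saves is replaced by one stmw. It assumes each merged register previously needed exactly one 4-byte store; `bytesSavedByMerge` is a hypothetical name, not something from the patch.

```cpp
// Size in bytes of one (non-prefixed) PowerPC instruction.
constexpr unsigned InstrBytes = 4;

// Hypothetical helper: bytes saved in a prologue when the individual stores
// for rFirst..r31 are replaced by a single stmw starting at rFirst.
// Assumes FirstGPR <= 31 and one store instruction per register beforehand.
constexpr unsigned bytesSavedByMerge(unsigned FirstGPR) {
  unsigned NumRegs = 31 - FirstGPR + 1; // registers covered by the stmw
  return (NumRegs - 1) * InstrBytes;    // N stores become 1 instruction
}
```

Under these assumptions the best case, merging r13 through r31, replaces 19 stores with one stmw and saves 72 bytes in the prologue; the same arithmetic applies to the loads replaced by lmw in the epilogue.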