Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Thumb1] Prefer bic instead of lsl/lsr to align stack where possible #82123

Closed
wants to merge 1 commit into from

Conversation

AtariDreams
Copy link
Contributor

We can use bic to align the stack pointer if we can encode an immediate in there.

Otherwise, fallback to the shifts.

@AtariDreams AtariDreams changed the title Prefer bic insread of lsl/lsr to align stack where possible [Thumb1] Prefer bic insread of lsl/lsr to align stack where possible Feb 17, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 17, 2024

@llvm/pr-subscribers-backend-arm

Author: AtariDreams (AtariDreams)

Changes

We can use bic to align the stack pointer if we can encode an immediate in there.

Otherwise, fallback to the shifts.


Full diff: https://github.com/llvm/llvm-project/pull/82123.diff

1 Files Affected:

  • (modified) llvm/lib/Target/ARM/Thumb1FrameLowering.cpp (+49-26)
diff --git a/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp b/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
index 0f4ece64bff532..1e17d9928526e5 100644
--- a/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
+++ b/llvm/lib/Target/ARM/Thumb1FrameLowering.cpp
@@ -448,32 +448,55 @@ void Thumb1FrameLowering::emitPrologue(MachineFunction &MF,
   AFI->setDPRCalleeSavedAreaSize(DPRCSSize);
 
   if (RegInfo->hasStackRealignment(MF)) {
-    const unsigned NrBitsToZero = Log2(MFI.getMaxAlign());
-    // Emit the following sequence, using R4 as a temporary, since we cannot use
-    // SP as a source or destination register for the shifts:
-    // mov  r4, sp
-    // lsrs r4, r4, #NrBitsToZero
-    // lsls r4, r4, #NrBitsToZero
-    // mov  sp, r4
-    BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::R4)
-      .addReg(ARM::SP, RegState::Kill)
-      .add(predOps(ARMCC::AL));
-
-    BuildMI(MBB, MBBI, dl, TII.get(ARM::tLSRri), ARM::R4)
-      .addDef(ARM::CPSR)
-      .addReg(ARM::R4, RegState::Kill)
-      .addImm(NrBitsToZero)
-      .add(predOps(ARMCC::AL));
-
-    BuildMI(MBB, MBBI, dl, TII.get(ARM::tLSLri), ARM::R4)
-      .addDef(ARM::CPSR)
-      .addReg(ARM::R4, RegState::Kill)
-      .addImm(NrBitsToZero)
-      .add(predOps(ARMCC::AL));
-
-    BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP)
-      .addReg(ARM::R4, RegState::Kill)
-      .add(predOps(ARMCC::AL));
+    const Align Alignment = MFI.getMaxAlign();
+    const unsigned NrBitsToZero = Log2(Alignment);
+    const unsigned AlignMask = Alignment.value() - 1U;
+
+    // Emit the following sequence, using R4 as a temporary, since we cannot
+    // use SP as a source or destination register:
+    //   mov r4, sp
+    //
+    // Then, if the mask to zero the required number of bits
+    // can be encoded in the bic immediate field:
+    //   bic r4, r4, Alignment-1
+    // otherwise, emit:
+    //   lsr r4, r4, log2(Alignment)
+    //   lsl r4, r4, log2(Alignment)
+    //
+    // Finally, save back to sp:
+    //   mov sp, r4
+    if (AlignMask <= 255) {
+      BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::R4)
+          .addReg(ARM::SP, RegState::Kill)
+          .add(predOps(ARMCC::AL));
+      BuildMI(MBB, MBBI, dl, TII.get(ARM::tBIC), ARM::R4)
+          .addReg(ARM::R4, RegState::Kill)
+          .addImm(AlignMask)
+          .add(predOps(ARMCC::AL));
+      BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP)
+          .addReg(ARM::R4, RegState::Kill)
+          .add(predOps(ARMCC::AL));
+    } else {
+      BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::R4)
+          .addReg(ARM::SP, RegState::Kill)
+          .add(predOps(ARMCC::AL));
+
+      BuildMI(MBB, MBBI, dl, TII.get(ARM::tLSRri), ARM::R4)
+          .addDef(ARM::CPSR)
+          .addReg(ARM::R4, RegState::Kill)
+          .addImm(NrBitsToZero)
+          .add(predOps(ARMCC::AL));
+
+      BuildMI(MBB, MBBI, dl, TII.get(ARM::tLSLri), ARM::R4)
+          .addDef(ARM::CPSR)
+          .addReg(ARM::R4, RegState::Kill)
+          .addImm(NrBitsToZero)
+          .add(predOps(ARMCC::AL));
+
+      BuildMI(MBB, MBBI, dl, TII.get(ARM::tMOVr), ARM::SP)
+          .addReg(ARM::R4, RegState::Kill)
+          .add(predOps(ARMCC::AL));
+    }
 
     AFI->setShouldRestoreSPFromFP(true);
   }

@AtariDreams AtariDreams changed the title [Thumb1] Prefer bic insread of lsl/lsr to align stack where possible [Thumb1] Prefer bic instead of lsl/lsr to align stack where possible Feb 17, 2024
Copy link

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Discourse for more information.

@AtariDreams AtariDreams force-pushed the temporary branch 3 times, most recently from 9be8189 to b666807 Compare February 17, 2024 20:06
We can use bic to align the stack pointer if we can encode an immediate in there.

Otherwise, fallback to the shifts.
@efriedma-quic
Copy link
Collaborator

Not sure if you intentionally closed this... but for future reference, t2BICri is Thumb2-only, so there's no way to make this work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants