[X86][CodeGen] Teach frame lowering to spill/reload registers w/ PUSHP/POPP, PUSH2[P]/POP2[P] #73292

Merged: 9 commits merged on Nov 27, 2023

KanRobert (Contributor) commented Nov 24, 2023

#73092 added encoding/decoding support for PUSHP/POPP.
#73233 added encoding/decoding support for PUSH2[P]/POP2[P].

In this patch, we teach frame lowering to spill/reload registers with these instructions:

  1. Use the PPX forms (PUSHP/POPP, PUSH2P/POP2P) for balanced spills/reloads
  2. Use PUSH2/POP2 for consecutive spills/reloads
  3. PUSH2/POP2 require a 16-byte-aligned stack, so pad the stack when necessary
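The padding and pairing rules above can be modeled in a few lines. This is a minimal C++ sketch, not LLVM code: it mirrors the candidate-selection logic the patch adds to `X86FrameLowering::assignCalleeSavedSpillSlots`, under the assumptions that the target is x86-64 (8-byte slots), every spilled register is a 64-bit GPR, and registers are visited in push order. `Push2Plan` and `planPush2` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model (not LLVM code) of push2/pop2 candidate selection.
// Assumes x86-64 (8-byte slots), only 64-bit GPR spills, push order, and a
// negative SpillSlotOffset as in the patch.
struct Push2Plan {
  bool Pad = false;              // one extra 8-byte slot before the pushes
  std::vector<bool> IsCandidate; // per push: true = half of a push2/pop2 pair
};

Push2Plan planPush2(unsigned NumCSGPR, int SpillSlotOffset) {
  const int SlotSize = 8;
  Push2Plan Plan;
  // Pairing needs at least two pushes.
  bool UsePush2Pop2 = NumCSGPR > 1;
  // An even count on a misaligned stack is padded by one slot.
  Plan.Pad = UsePush2Pop2 && NumCSGPR % 2 == 0 && SpillSlotOffset % 16 != 0;
  // Equivalent to alignDown(NumCSGPR, 2): only an even number gets paired.
  unsigned NumRegsForPush2 = UsePush2Pop2 ? NumCSGPR / 2 * 2 : 0;
  if (Plan.Pad)
    SpillSlotOffset -= SlotSize;

  unsigned NumCandidates = 0;
  for (unsigned I = 0; I < NumCSGPR; ++I) {
    // Start a new pair only on a 16-byte-aligned slot; otherwise only
    // complete a pair that is already open (odd candidate count).
    bool Take = NumCandidates < NumRegsForPush2 &&
                (SpillSlotOffset % 16 == 0 || NumCandidates % 2 != 0);
    if (Take)
      ++NumCandidates;
    Plan.IsCandidate.push_back(Take);
    SpillSlotOffset -= SlotSize;
  }
  assert(NumCandidates % 2 == 0 && "Expect even candidates for push2/pop2");
  return Plan;
}
```

With six GPRs and an initial offset of -8 (assumed here to mean only the return address sits below the spill area), the sketch sets `Pad` and pairs all six registers, which agrees with the `pushq %rax` followed by three `push2` instructions in the `csr6_alloc16` LIN output. With five GPRs, the first push stays unpaired when the initial offset is -8 and the last one stays unpaired when it is -16, matching cases 2b and 2a of the strategy comment in `assignCalleeSavedSpillSlots`.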

llvmbot (Collaborator) commented Nov 24, 2023

@llvm/pr-subscribers-backend-x86

Author: Shengchen Kan (KanRobert)

Changes

Patch is 39.56 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73292.diff

7 Files Affected:

  • (modified) llvm/lib/Target/X86/X86.td (+4)
  • (modified) llvm/lib/Target/X86/X86FrameLowering.cpp (+132-22)
  • (modified) llvm/lib/Target/X86/X86MachineFunctionInfo.h (+21-1)
  • (added) llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll (+217)
  • (added) llvm/test/CodeGen/X86/apx/push2-pop2-vector-register.ll (+59)
  • (added) llvm/test/CodeGen/X86/apx/push2-pop2.ll (+432)
  • (added) llvm/test/CodeGen/X86/apx/pushp-popp.ll (+29)
diff --git a/llvm/lib/Target/X86/X86.td b/llvm/lib/Target/X86/X86.td
index ade175d99c89a8d..522d8513c9aff52 100644
--- a/llvm/lib/Target/X86/X86.td
+++ b/llvm/lib/Target/X86/X86.td
@@ -343,6 +343,10 @@ def FeatureAVX10_1_512 : SubtargetFeature<"avx10.1-512", "HasAVX10_1_512", "true
                                           [FeatureAVX10_1, FeatureEVEX512]>;
 def FeatureEGPR : SubtargetFeature<"egpr", "HasEGPR", "true",
                                    "Support extended general purpose register">;
+def FeaturePush2Pop2 : SubtargetFeature<"push2pop2", "HasPush2Pop2", "true",
+                                        "Support PUSH2/POP2 instructions">;
+def FeaturePPX : SubtargetFeature<"ppx", "HasPPX", "true",
+                                  "Support Push-Pop Acceleration">;
 
 // Ivy Bridge and newer processors have enhanced REP MOVSB and STOSB (aka
 // "string operations"). See "REP String Enhancement" in the Intel Software
diff --git a/llvm/lib/Target/X86/X86FrameLowering.cpp b/llvm/lib/Target/X86/X86FrameLowering.cpp
index b042f6865f40d01..14208f851317d78 100644
--- a/llvm/lib/Target/X86/X86FrameLowering.cpp
+++ b/llvm/lib/Target/X86/X86FrameLowering.cpp
@@ -41,6 +41,7 @@
 STATISTIC(NumFrameLoopProbe, "Number of loop stack probes used in prologue");
 STATISTIC(NumFrameExtraProbe,
           "Number of extra stack probes generated in prologue");
+STATISTIC(NumFunctionUsingPush2Pop2, "Number of functions using push2/pop2");
 
 using namespace llvm;
 
@@ -139,6 +140,21 @@ static unsigned getMOVriOpcode(bool Use64BitReg, int64_t Imm) {
   return X86::MOV32ri;
 }
 
+static unsigned getPUSHOpcode(const X86Subtarget &ST) {
+  return ST.is64Bit() ? (ST.hasPPX() ? X86::PUSHP64r : X86::PUSH64r)
+                      : X86::PUSH32r;
+}
+static unsigned getPOPOpcode(const X86Subtarget &ST) {
+  return ST.is64Bit() ? (ST.hasPPX() ? X86::POPP64r : X86::POP64r)
+                      : X86::POP32r;
+}
+static unsigned getPUSH2Opcode(const X86Subtarget &ST) {
+  return ST.hasPPX() ? X86::PUSH2P : X86::PUSH2;
+}
+static unsigned getPOP2Opcode(const X86Subtarget &ST) {
+  return ST.hasPPX() ? X86::POP2P : X86::POP2;
+}
+
 static bool isEAXLiveIn(MachineBasicBlock &MBB) {
   for (MachineBasicBlock::RegisterMaskPair RegMask : MBB.liveins()) {
     unsigned Reg = RegMask.PhysReg;
@@ -1679,7 +1695,8 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
       NumBytes = alignTo(NumBytes, MaxAlign);
 
     // Save EBP/RBP into the appropriate stack slot.
-    BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::PUSH64r : X86::PUSH32r))
+    BuildMI(MBB, MBBI, DL,
+            TII.get(getPUSHOpcode(MF.getSubtarget<X86Subtarget>())))
         .addReg(MachineFramePtr, RegState::Kill)
         .setMIFlag(MachineInstr::FrameSetup);
 
@@ -1818,18 +1835,30 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
   // Skip the callee-saved push instructions.
   bool PushedRegs = false;
   int StackOffset = 2 * stackGrowth;
-
-  while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup) &&
-         (MBBI->getOpcode() == X86::PUSH32r ||
-          MBBI->getOpcode() == X86::PUSH64r)) {
+  MachineBasicBlock::iterator LastCSPush = MBBI;
+  auto IsCSPush = [&](const MachineBasicBlock::iterator &MBBI) {
+    return MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup) &&
+           (MBBI->getOpcode() == X86::PUSH32r ||
+            MBBI->getOpcode() == X86::PUSH64r ||
+            MBBI->getOpcode() == X86::PUSHP64r ||
+            MBBI->getOpcode() == X86::PUSH2 ||
+            MBBI->getOpcode() == X86::PUSH2P);
+  };
+
+  while (IsCSPush(MBBI)) {
     PushedRegs = true;
     Register Reg = MBBI->getOperand(0).getReg();
+    LastCSPush = MBBI;
     ++MBBI;
 
     if (!HasFP && NeedsDwarfCFI) {
       // Mark callee-saved push instruction.
       // Define the current CFA rule to use the provided offset.
       assert(StackSize);
+      // Compared to push, push2 introduces more stack offset (one more register).
+      if (LastCSPush->getOpcode() == X86::PUSH2 ||
+          LastCSPush->getOpcode() == X86::PUSH2P)
+        StackOffset += stackGrowth;
       BuildCFI(MBB, MBBI, DL,
                MCCFIInstruction::cfiDefCfaOffset(nullptr, -StackOffset),
                MachineInstr::FrameSetup);
@@ -1841,6 +1870,11 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF,
       BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
           .addImm(Reg)
           .setMIFlag(MachineInstr::FrameSetup);
+      if (LastCSPush->getOpcode() == X86::PUSH2 ||
+          LastCSPush->getOpcode() == X86::PUSH2P)
+        BuildMI(MBB, MBBI, DL, TII.get(X86::SEH_PushReg))
+            .addImm(LastCSPush->getOperand(1).getReg())
+            .setMIFlag(MachineInstr::FrameSetup);
     }
   }
 
@@ -2317,7 +2351,8 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
       emitSPUpdate(MBB, MBBI, DL, Offset, /*InEpilogue*/ true);
     }
     // Pop EBP.
-    BuildMI(MBB, MBBI, DL, TII.get(Is64Bit ? X86::POP64r : X86::POP32r),
+    BuildMI(MBB, MBBI, DL,
+            TII.get(getPOPOpcode(MF.getSubtarget<X86Subtarget>())),
             MachineFramePtr)
         .setMIFlag(MachineInstr::FrameDestroy);
 
@@ -2360,7 +2395,13 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
       if ((Opc != X86::POP32r || !PI->getFlag(MachineInstr::FrameDestroy)) &&
           (Opc != X86::POP64r || !PI->getFlag(MachineInstr::FrameDestroy)) &&
           (Opc != X86::BTR64ri8 || !PI->getFlag(MachineInstr::FrameDestroy)) &&
-          (Opc != X86::ADD64ri32 || !PI->getFlag(MachineInstr::FrameDestroy)))
+          (Opc != X86::ADD64ri32 || !PI->getFlag(MachineInstr::FrameDestroy)) &&
+          (Opc != X86::POPP64r || !PI->getFlag(MachineInstr::FrameDestroy)) &&
+          (Opc != X86::POP2 || !PI->getFlag(MachineInstr::FrameDestroy)) &&
+          (Opc != X86::POP2P || !PI->getFlag(MachineInstr::FrameDestroy)) &&
+          (Opc != X86::LEA64r || !PI->getFlag(MachineInstr::FrameDestroy))
+
+      )
         break;
       FirstCSPop = PI;
     }
@@ -2451,8 +2492,12 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
       MachineBasicBlock::iterator PI = MBBI;
       unsigned Opc = PI->getOpcode();
       ++MBBI;
-      if (Opc == X86::POP32r || Opc == X86::POP64r) {
+      if (Opc == X86::POP32r || Opc == X86::POP64r || Opc == X86::POPP64r ||
+          Opc == X86::POP2 || Opc == X86::POP2P) {
         Offset += SlotSize;
+        // Compared to pop, pop2 introduces more stack offset (one more register).
+        if (Opc == X86::POP2 || Opc == X86::POP2P)
+          Offset += SlotSize;
         BuildCFI(MBB, MBBI, DL,
                  MCCFIInstruction::cfiDefCfaOffset(nullptr, -Offset),
                  MachineInstr::FrameDestroy);
@@ -2735,6 +2780,28 @@ bool X86FrameLowering::assignCalleeSavedSpillSlots(
     }
   }
 
+  // Strategy:
+  // 1. Do not use push2 when there are fewer than 2 pushes.
+  // 2. When the number of CSR pushes is odd
+  //    a. Start to use push2 from the 1st push if stack is 16B aligned.
+  //    b. Start to use push2 from the 2nd push if stack is not 16B aligned.
+  // 3. When the number of CSR pushes is even, start to use push2 from the 1st
+  //    push and make the stack 16B aligned before the push
+  unsigned NumRegsForPush2 = 0;
+  if (STI.hasPush2Pop2()) {
+    unsigned NumCSGPR = llvm::count_if(CSI, [](const CalleeSavedInfo &I) {
+      return X86::GR64RegClass.contains(I.getReg());
+    });
+    bool UsePush2Pop2 = NumCSGPR > 1;
+    X86FI->setPadForPush2Pop2(UsePush2Pop2 && NumCSGPR % 2 == 0 &&
+                              SpillSlotOffset % 16 != 0);
+    NumRegsForPush2 = UsePush2Pop2 ? alignDown(NumCSGPR, 2) : 0;
+    if (X86FI->padForPush2Pop2()) {
+      SpillSlotOffset -= SlotSize;
+      MFI.CreateFixedSpillStackObject(SlotSize, SpillSlotOffset);
+    }
+  }
+
   // Assign slots for GPRs. It increases frame size.
   for (CalleeSavedInfo &I : llvm::reverse(CSI)) {
     Register Reg = I.getReg();
@@ -2742,6 +2809,13 @@ bool X86FrameLowering::assignCalleeSavedSpillSlots(
     if (!X86::GR64RegClass.contains(Reg) && !X86::GR32RegClass.contains(Reg))
       continue;
 
+    // A CSR is a candidate for push2/pop2 when its slot offset is 16B aligned
+    // or when the number of candidates collected so far is odd.
+    if (X86FI->getNumCandidatesForPush2Pop2() < NumRegsForPush2 &&
+        (SpillSlotOffset % 16 == 0 ||
+         X86FI->getNumCandidatesForPush2Pop2() % 2))
+      X86FI->addCandidateForPush2Pop2(Reg);
+
     SpillSlotOffset -= SlotSize;
     CalleeSavedFrameSize += SlotSize;
 
@@ -2759,6 +2833,10 @@ bool X86FrameLowering::assignCalleeSavedSpillSlots(
     // TODO: saving the slot index is better?
     X86FI->setRestoreBasePointer(CalleeSavedFrameSize);
   }
+  assert(X86FI->getNumCandidatesForPush2Pop2() % 2 == 0 &&
+         "Expect even candidates for push2/pop2");
+  if (X86FI->getNumCandidatesForPush2Pop2())
+    ++NumFunctionUsingPush2Pop2;
   X86FI->setCalleeSavedFrameSize(CalleeSavedFrameSize);
   MFI.setCVBytesOfCalleeSavedRegisters(CalleeSavedFrameSize);
 
@@ -2808,7 +2886,13 @@ bool X86FrameLowering::spillCalleeSavedRegisters(
 
   // Push GPRs. It increases frame size.
   const MachineFunction &MF = *MBB.getParent();
-  unsigned Opc = STI.is64Bit() ? X86::PUSH64r : X86::PUSH32r;
+  unsigned Opc = getPUSHOpcode(STI);
+  const X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
+  MachineInstrBuilder MIB;
+  bool IncompletePush2 = false;
+  if (X86FI->padForPush2Pop2())
+    emitSPUpdate(MBB, MI, DL, -(int64_t)SlotSize, /*InEpilogue=*/false);
+
   for (const CalleeSavedInfo &I : llvm::reverse(CSI)) {
     Register Reg = I.getReg();
 
@@ -2832,17 +2916,28 @@ bool X86FrameLowering::spillCalleeSavedRegisters(
       }
     }
 
-    // Do not set a kill flag on values that are also marked as live-in. This
-    // happens with the @llvm-returnaddress intrinsic and with arguments
-    // passed in callee saved registers.
-    // Omitting the kill flags is conservatively correct even if the live-in
-    // is not used after all.
-    BuildMI(MBB, MI, DL, TII.get(Opc))
-        .addReg(Reg, getKillRegState(CanKill))
-        .setMIFlag(MachineInstr::FrameSetup);
+    if (X86FI->isCandidateForPush2Pop2(Reg)) {
+      if (MIB.getInstr() && IncompletePush2) {
+        MIB.addReg(Reg, getKillRegState(CanKill));
+        IncompletePush2 = false;
+      } else {
+        MIB = BuildMI(MBB, MI, DL, TII.get(getPUSH2Opcode(STI)))
+                  .addReg(Reg, getKillRegState(CanKill))
+                  .setMIFlag(MachineInstr::FrameSetup);
+        IncompletePush2 = true;
+      }
+    } else {
+      // Do not set a kill flag on values that are also marked as live-in. This
+      // happens with the @llvm-returnaddress intrinsic and with arguments
+      // passed in callee saved registers.
+      // Omitting the kill flags is conservatively correct even if the live-in
+      // is not used after all.
+      BuildMI(MBB, MI, DL, TII.get(Opc))
+          .addReg(Reg, getKillRegState(CanKill))
+          .setMIFlag(MachineInstr::FrameSetup);
+    }
   }
 
-  const X86MachineFunctionInfo *X86FI = MF.getInfo<X86MachineFunctionInfo>();
   if (X86FI->getRestoreBasePointer()) {
     unsigned Opc = STI.is64Bit() ? X86::PUSH64r : X86::PUSH32r;
     Register BaseReg = this->TRI->getBaseRegister();
@@ -2958,15 +3053,30 @@ bool X86FrameLowering::restoreCalleeSavedRegisters(
   }
 
   // POP GPRs.
-  unsigned Opc = STI.is64Bit() ? X86::POP64r : X86::POP32r;
+  MachineInstrBuilder MIB;
+  bool IncompletePop2 = false;
+  unsigned Opc = getPOPOpcode(STI);
   for (const CalleeSavedInfo &I : CSI) {
     Register Reg = I.getReg();
     if (!X86::GR64RegClass.contains(Reg) && !X86::GR32RegClass.contains(Reg))
       continue;
-
-    BuildMI(MBB, MI, DL, TII.get(Opc), Reg)
-        .setMIFlag(MachineInstr::FrameDestroy);
+    if (X86FI->isCandidateForPush2Pop2(Reg)) {
+      if (MIB.getInstr() && IncompletePop2) {
+        MIB.addReg(Reg, RegState::Define);
+        IncompletePop2 = false;
+      } else {
+        MIB = BuildMI(MBB, MI, DL, TII.get(getPOP2Opcode(STI)), Reg)
+                  .setMIFlag(MachineInstr::FrameDestroy);
+        IncompletePop2 = true;
+      }
+    } else {
+      BuildMI(MBB, MI, DL, TII.get(Opc), Reg)
+          .setMIFlag(MachineInstr::FrameDestroy);
+    }
   }
+  if (X86FI->padForPush2Pop2())
+    emitSPUpdate(MBB, MI, DL, SlotSize, /*InEpilogue=*/true);
+
   return true;
 }
 
diff --git a/llvm/lib/Target/X86/X86MachineFunctionInfo.h b/llvm/lib/Target/X86/X86MachineFunctionInfo.h
index 9b2cc35c57e00ec..5fcb691b095a616 100644
--- a/llvm/lib/Target/X86/X86MachineFunctionInfo.h
+++ b/llvm/lib/Target/X86/X86MachineFunctionInfo.h
@@ -17,6 +17,7 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/CodeGen/CallingConvLower.h"
 #include "llvm/CodeGen/MachineFunction.h"
+#include <set>
 
 namespace llvm {
 
@@ -117,6 +118,12 @@ class X86MachineFunctionInfo : public MachineFunctionInfo {
   /// determine if we should insert tilerelease in frame lowering.
   bool HasVirtualTileReg = false;
 
+  /// Adjust stack for push2/pop2
+  bool PadForPush2Pop2 = false;
+
+  /// Candidate registers for push2/pop2
+  std::set<Register> CandidatesForPush2Pop2;
+
   /// True if this function has CFI directives that adjust the CFA.
   /// This is used to determine if we should direct the debugger to use
   /// the CFA instead of the stack pointer.
@@ -165,7 +172,7 @@ class X86MachineFunctionInfo : public MachineFunctionInfo {
   const DenseMap<int, unsigned>& getWinEHXMMSlotInfo() const {
     return WinEHXMMSlotInfo; }
 
-  unsigned getCalleeSavedFrameSize() const { return CalleeSavedFrameSize; }
+  unsigned getCalleeSavedFrameSize() const { return CalleeSavedFrameSize + 8 * padForPush2Pop2(); }
   void setCalleeSavedFrameSize(unsigned bytes) { CalleeSavedFrameSize = bytes; }
 
   unsigned getBytesToPopOnReturn() const { return BytesToPopOnReturn; }
@@ -232,6 +239,19 @@ class X86MachineFunctionInfo : public MachineFunctionInfo {
   bool hasVirtualTileReg() const { return HasVirtualTileReg; }
   void setHasVirtualTileReg(bool v) { HasVirtualTileReg = v; }
 
+  bool padForPush2Pop2() const { return PadForPush2Pop2; }
+  void setPadForPush2Pop2(bool V) { PadForPush2Pop2 = V; }
+
+  bool isCandidateForPush2Pop2(Register Reg) const {
+    return CandidatesForPush2Pop2.find(Reg) != CandidatesForPush2Pop2.end();
+  }
+  void addCandidateForPush2Pop2(Register Reg) {
+    CandidatesForPush2Pop2.insert(Reg);
+  }
+  size_t getNumCandidatesForPush2Pop2() const {
+    return CandidatesForPush2Pop2.size();
+  }
+
   bool hasCFIAdjustCfa() const { return HasCFIAdjustCfa; }
   void setHasCFIAdjustCfa(bool v) { HasCFIAdjustCfa = v; }
 
diff --git a/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll b/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
new file mode 100644
index 000000000000000..4aa6fdc3b77cb14
--- /dev/null
+++ b/llvm/test/CodeGen/X86/apx/push2-pop2-cfi-seh.ll
@@ -0,0 +1,217 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu | FileCheck %s --check-prefix=LIN-REF
+; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+push2pop2 | FileCheck %s --check-prefix=LIN
+; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+push2pop2,+ppx | FileCheck %s --check-prefix=LIN-PPX
+; RUN: llc < %s -mtriple=x86_64-windows-msvc | FileCheck %s --check-prefix=WIN-REF
+; RUN: llc < %s -mtriple=x86_64-windows-msvc -mattr=+push2pop2 | FileCheck %s --check-prefix=WIN
+; RUN: llc < %s -mtriple=x86_64-windows-msvc -mattr=+push2pop2,+ppx | FileCheck %s --check-prefix=WIN-PPX
+
+define i32 @csr6_alloc16(ptr %argv) {
+; LIN-REF-LABEL: csr6_alloc16:
+; LIN-REF:       # %bb.0: # %entry
+; LIN-REF-NEXT:    pushq %rbp
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 16
+; LIN-REF-NEXT:    pushq %r15
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 24
+; LIN-REF-NEXT:    pushq %r14
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 32
+; LIN-REF-NEXT:    pushq %r13
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 40
+; LIN-REF-NEXT:    pushq %r12
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 48
+; LIN-REF-NEXT:    pushq %rbx
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 56
+; LIN-REF-NEXT:    subq $24, %rsp
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 80
+; LIN-REF-NEXT:    .cfi_offset %rbx, -56
+; LIN-REF-NEXT:    .cfi_offset %r12, -48
+; LIN-REF-NEXT:    .cfi_offset %r13, -40
+; LIN-REF-NEXT:    .cfi_offset %r14, -32
+; LIN-REF-NEXT:    .cfi_offset %r15, -24
+; LIN-REF-NEXT:    .cfi_offset %rbp, -16
+; LIN-REF-NEXT:    #APP
+; LIN-REF-NEXT:    #NO_APP
+; LIN-REF-NEXT:    xorl %ecx, %ecx
+; LIN-REF-NEXT:    xorl %eax, %eax
+; LIN-REF-NEXT:    callq *%rcx
+; LIN-REF-NEXT:    addq $24, %rsp
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 56
+; LIN-REF-NEXT:    popq %rbx
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 48
+; LIN-REF-NEXT:    popq %r12
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 40
+; LIN-REF-NEXT:    popq %r13
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 32
+; LIN-REF-NEXT:    popq %r14
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 24
+; LIN-REF-NEXT:    popq %r15
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 16
+; LIN-REF-NEXT:    popq %rbp
+; LIN-REF-NEXT:    .cfi_def_cfa_offset 8
+; LIN-REF-NEXT:    retq
+;
+; LIN-LABEL: csr6_alloc16:
+; LIN:       # %bb.0: # %entry
+; LIN-NEXT:    pushq %rax
+; LIN-NEXT:    .cfi_def_cfa_offset 16
+; LIN-NEXT:    push2 %r15, %rbp
+; LIN-NEXT:    .cfi_def_cfa_offset 32
+; LIN-NEXT:    push2 %r13, %r14
+; LIN-NEXT:    .cfi_def_cfa_offset 48
+; LIN-NEXT:    push2 %rbx, %r12
+; LIN-NEXT:    .cfi_def_cfa_offset 64
+; LIN-NEXT:    subq $16, %rsp
+; LIN-NEXT:    .cfi_def_cfa_offset 80
+; LIN-NEXT:    .cfi_offset %rbx, -64
+; LIN-NEXT:    .cfi_offset %r12, -56
+; LIN-NEXT:    .cfi_offset %r13, -48
+; LIN-NEXT:    .cfi_offset %r14, -40
+; LIN-NEXT:    .cfi_offset %r15, -32
+; LIN-NEXT:    .cfi_offset %rbp, -24
+; LIN-NEXT:    #APP
+; LIN-NEXT:    #NO_APP
+; LIN-NEXT:    xorl %ecx, %ecx
+; LIN-NEXT:    xorl %eax, %eax
+; LIN-NEXT:    callq *%rcx
+; LIN-NEXT:    addq $16, %rsp
+; LIN-NEXT:    .cfi_def_cfa_offset 64
+; LIN-NEXT:    pop2 %r12, %rbx
+; LIN-NEXT:    .cfi_def_cfa_offset 48
+; LIN-NEXT:    pop2 %r14, %r13
+; LIN-NEXT:    .cfi_def_cfa_offset 32
+; LIN-NEXT:    pop2 %rbp, %r15
+; LIN-NEXT:    .cfi_def_cfa_offset 16
+; LIN-NEXT:    popq %rcx
+; LIN-NEXT:    .cfi_def_cfa_offset 8
+; LIN-NEXT:    retq
+;
+; LIN-PPX-LABEL: csr6_alloc16:
+; LIN-PPX:       # %bb.0: # %entry
+; LIN-PPX-NEXT:    pushq %rax
+; LIN-PPX-NEXT:    .cfi_def_cfa_offset 16
+; LIN-PPX-NEXT:    push2p %r15, %rbp
+; LIN-PPX-NEXT:    .cfi_def_cfa_offset 32
+; LIN-PPX-NEXT:    push2p %r13, %r14
+; LIN-PPX-NEXT:    .cfi_def_cfa_offset 48
+; LIN-PPX-NEXT:    push2p %rbx, %r12
+; LIN-PPX-NEXT:    .cfi_def_cfa_offset 64
+; LIN-PPX-NEXT:    subq $16, %rsp
+; LIN-PPX-NEXT:    .cfi_def_cfa_offset 80
+; LIN-PPX-NEXT:    .cfi_offset %rbx, -64
+; LIN-PPX-NEXT:    .cfi_offset %r12, -56
+; LIN-PPX-NEXT:    .cfi_offset %r13, -48
+; LIN-PPX-NEXT:    .cfi_offset %r14, -40
+; LIN-PPX-NEXT:    .cfi_offset %r15, -32
+; LIN-PPX-NEXT:    .cfi_offset %rbp, -24
+; LIN-PPX-NEXT:    #APP
+; LIN-PPX-NEXT:    #NO_APP
+; LIN-PPX-NEXT:    xorl %ecx, %ecx
+; LIN-PPX-NEXT:    xorl %eax, %eax
+; LIN-PPX-NEXT:    callq *%rcx
+; LIN-PPX-NEXT:    addq $16, %rsp
+; LIN-PPX-NEXT:    .cfi_def_cfa_offset 64
+; LIN-PPX-NEXT:    pop2p %r12, %rbx
+; LIN-PPX-NEXT:    .cfi_def_cfa_offset 48
+; LIN-PPX-NEXT:    pop2p %r14, %r13
+; LIN-PPX-NEXT:    .cfi_def_cfa_offset 32
+; LIN-PPX-NEXT:    pop2p %rbp, %r15
+; LIN-PPX-NEXT:    .cfi_def_cfa_offset 16
+; LIN-PPX-NEXT:    popq %rcx
+; LIN-PPX-NEXT:    .cfi_def_cfa_offset 8
+; LIN-PPX-NEXT:    retq
+;
+; WIN-REF-LABEL: csr6_alloc16:
+; WIN-REF:       # %bb.0: # %entry
+; WIN-REF-NEXT:    pushq %r15
+; WIN-REF-NEXT:    .seh_pushreg %r15
+; WIN-REF-NEXT:    pushq %r14
+; WIN-REF-NEXT:    .seh_pushreg %r14
+; WIN-REF-NEXT:    pushq %r13
+; WIN-REF-NEXT:    .seh_pushreg %r13
+; WIN-REF-NEXT:    pushq %r12
+; WIN-REF-NEXT:    .seh_pushreg %r12
+; WIN-REF-NEXT:    pushq %rbp
+; WIN-REF-NEXT:    .seh_pushreg %rbp
+; WIN-REF-NEXT:    pushq %rbx
+; WIN-REF-NEXT:    .seh_pushreg %rbx
+; WIN-REF-NEXT:    subq $56, %rsp
+; WIN-REF-NEXT:    .seh_stackalloc 56
+; WIN-REF-NEXT:    .seh_endprologue
+; WIN-REF-NEXT:    #APP
+; WIN-REF-NEXT:    #NO_APP
+; WIN-REF-NEXT:    xorl %eax, %eax
+; WIN-REF-NEXT:    callq *%rax
+; WIN-REF-NEXT:    nop
+; WIN-REF-NEXT:    addq $56, %rsp
+; WIN-REF-NEXT:    popq %rbx
+;...
[truncated]
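The `.cfi_def_cfa_offset` values in the `csr6_alloc16` expectations follow from the rule in the prologue change that a push2 introduces one more slot of stack offset than a push. The C++ sketch below reproduces that arithmetic; it is a hypothetical model, not LLVM code, and assumes x86-64 (8-byte slots) with no frame pointer. `PushKind` and `prologueCfaOffsets` are illustrative names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model (not LLVM code) of the CFA offsets an x86-64 prologue
// without a frame pointer reports via .cfi_def_cfa_offset.
enum class PushKind { Single, Pair }; // pushq/pushp vs. push2/push2p

std::vector<int> prologueCfaOffsets(const std::vector<PushKind> &Pushes,
                                    int StackAlloc) {
  assert(StackAlloc >= 0 && "stack allocation is a non-negative byte count");
  const int SlotSize = 8;
  int Offset = SlotSize; // the call has already pushed the return address
  std::vector<int> Offsets;
  for (PushKind K : Pushes) {
    // push2 stores one more register than push, so the CFA moves one more slot.
    Offset += (K == PushKind::Pair ? 2 : 1) * SlotSize;
    Offsets.push_back(Offset);
  }
  if (StackAlloc != 0) {
    Offset += StackAlloc; // the trailing `subq $N, %rsp`
    Offsets.push_back(Offset);
  }
  return Offsets;
}
```

Feeding it the LIN sequence (the padding `pushq %rax`, three `push2`, then `subq $16`) yields 16, 32, 48, 64, 80, and six single pushes followed by `subq $24` yields the LIN-REF sequence 16, 24, 32, 40, 48, 56, 80.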

github-actions bot commented Nov 24, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

KanRobert (Contributor, Author) commented Nov 27, 2023

Ping @phoebewang @XinWang10

phoebewang (Contributor) left a comment

LGTM.

KanRobert (Contributor, Author) commented

Thanks for the quick review, @phoebewang!

KanRobert merged commit cb112eb into llvm:main on Nov 27, 2023
2 of 3 checks passed

3 participants