
[RISCV] Allocate the varargs GPR save area as a single object. #74354

Merged: 3 commits into llvm:main on Dec 5, 2023

Conversation

topperc (Collaborator) commented Dec 4, 2023

Previously we allocated one object for each GPR. We also allocated the same offset twice, once to save for VASTART and then again for the first register in the save loop.

This patch uses a single object for all the registers and shares this with VASTART. This is more consistent with other targets like AArch64 and ARM.

I've also removed the setValue(nullptr) from the memory operand: having a single object makes me much more comfortable that alias analysis can see what is going on. This led to the scheduling changes in push-pop-popret.ll and vararg.ll.
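
For reference, here is a small, self-contained model of the new layout arithmetic (plain standalone C++, not LLVM code; the register count, Idx, and XLEN values are made-up example values): every spilled GPR lives at a fixed offset inside one object that starts at -VarArgsSaveSize, and that same object is what VASTART points at.

#include <cstdio>

int main() {
  // Hypothetical RV64 function with two named integer arguments, so a2-a7
  // are the remaining argument GPRs that must be spilled for varargs.
  const int XLenInBytes = 8;     // XLEN / 8 on RV64
  const unsigned NumArgGPRs = 8; // a0-a7
  const unsigned Idx = 2;        // first unallocated argument GPR

  // A single fixed object covers the whole save area and doubles as the
  // VASTART frame index (the odd-Idx padding slot is ignored here).
  int VarArgsSaveSize = XLenInBytes * static_cast<int>(NumArgGPRs - Idx);
  int VaArgOffset = -VarArgsSaveSize;
  std::printf("save area: offset %d from the incoming SP, size %d\n",
              VaArgOffset, VarArgsSaveSize);

  // Each register is stored at (I - Idx) * XLenInBytes within that object.
  for (unsigned I = Idx; I < NumArgGPRs; ++I)
    std::printf("  a%u -> +%d\n", I, static_cast<int>(I - Idx) * XLenInBytes);
  return 0;
}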

llvmbot (Collaborator) commented Dec 4, 2023

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-globalisel

Author: Craig Topper (topperc)

Patch is 45.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74354.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp (+43-40)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/lower-args-vararg.ll (+140-112)
  • (modified) llvm/test/CodeGen/RISCV/push-pop-popret.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/vararg.ll (+16-16)
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp b/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
index 9e96fba069c4e..d3357b70e5143 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
@@ -436,60 +436,63 @@ void RISCVCallLowering::saveVarArgRegisters(
   const RISCVSubtarget &Subtarget = MF.getSubtarget<RISCVSubtarget>();
   unsigned XLenInBytes = Subtarget.getXLen() / 8;
   ArrayRef<MCPhysReg> ArgRegs(ArgGPRs);
+  MachineRegisterInfo &MRI = MF.getRegInfo();
   unsigned Idx = CCInfo.getFirstUnallocated(ArgRegs);
+  MachineFrameInfo &MFI = MF.getFrameInfo();
+  RISCVMachineFunctionInfo *RVFI = MF.getInfo<RISCVMachineFunctionInfo>();
 
-  // Offset of the first variable argument from stack pointer, and size of
-  // the vararg save area. For now, the varargs save area is either zero or
-  // large enough to hold a0-a7.
-  int VaArgOffset;
+  // Size of the vararg save area. For now, the varargs save area is either
+  // zero or large enough to hold a0-a7.
   int VarArgsSaveSize = XLenInBytes * (ArgRegs.size() - Idx);
+  int FI;
 
   // If all registers are allocated, then all varargs must be passed on the
   // stack and we don't need to save any argregs.
   if (VarArgsSaveSize == 0) {
-    VaArgOffset = Assigner.StackSize;
+    int VaArgOffset = Assigner.StackSize;
+    FI = MFI.CreateFixedObject(XLenInBytes, VaArgOffset, true);
   } else {
-    VaArgOffset = -VarArgsSaveSize;
+    int VaArgOffset = -VarArgsSaveSize;
+    FI = MFI.CreateFixedObject(VarArgsSaveSize, VaArgOffset, true);
+
+    // If saving an odd number of registers then create an extra stack slot to
+    // ensure that the frame pointer is 2*XLEN-aligned, which in turn ensures
+    // offsets to even-numbered registered remain 2*XLEN-aligned.
+    if (Idx % 2) {
+      MFI.CreateFixedObject(XLenInBytes, VaArgOffset - (int)XLenInBytes, true);
+      VarArgsSaveSize += XLenInBytes;
+    }
+
+    const LLT p0 = LLT::pointer(MF.getDataLayout().getAllocaAddrSpace(),
+                                Subtarget.getXLen());
+    const LLT sXLen = LLT::scalar(Subtarget.getXLen());
+
+    auto FIN = MIRBuilder.buildFrameIndex(p0, FI);
+    auto Offset = MIRBuilder.buildConstant(
+        MRI.createGenericVirtualRegister(sXLen), XLenInBytes);
+
+    // Copy the integer registers that may have been used for passing varargs
+    // to the vararg save area.
+    const MVT XLenVT = Subtarget.getXLenVT();
+    for (unsigned I = Idx; I < ArgRegs.size(); ++I) {
+      const Register VReg = MRI.createGenericVirtualRegister(sXLen);
+      Handler.assignValueToReg(
+          VReg, ArgRegs[I],
+          CCValAssign::getReg(I + MF.getFunction().getNumOperands(), XLenVT,
+                              ArgRegs[I], XLenVT, CCValAssign::Full));
+      auto MPO =
+          MachinePointerInfo::getFixedStack(MF, FI, (I - Idx) * XLenInBytes);
+      auto Store =
+          MIRBuilder.buildStore(VReg, FIN, MPO, inferAlignFromPtrInfo(MF, MPO));
+      FIN = MIRBuilder.buildPtrAdd(MRI.createGenericVirtualRegister(p0),
+                                   FIN.getReg(0), Offset);
+    }
   }
 
   // Record the frame index of the first variable argument which is a value
   // necessary to G_VASTART.
-  MachineFrameInfo &MFI = MF.getFrameInfo();
-  int FI = MFI.CreateFixedObject(XLenInBytes, VaArgOffset, true);
-  RISCVMachineFunctionInfo *RVFI = MF.getInfo<RISCVMachineFunctionInfo>();
   RVFI->setVarArgsFrameIndex(FI);
-
-  // If saving an odd number of registers then create an extra stack slot to
-  // ensure that the frame pointer is 2*XLEN-aligned, which in turn ensures
-  // offsets to even-numbered registered remain 2*XLEN-aligned.
-  if (Idx % 2) {
-    MFI.CreateFixedObject(XLenInBytes, VaArgOffset - (int)XLenInBytes, true);
-    VarArgsSaveSize += XLenInBytes;
-  }
   RVFI->setVarArgsSaveSize(VarArgsSaveSize);
-
-  // Copy the integer registers that may have been used for passing varargs
-  // to the vararg save area.
-  const LLT p0 = LLT::pointer(MF.getDataLayout().getAllocaAddrSpace(),
-                              Subtarget.getXLen());
-  const LLT sXLen = LLT::scalar(Subtarget.getXLen());
-  const MVT XLenVT = Subtarget.getXLenVT();
-  MachineRegisterInfo &MRI = MF.getRegInfo();
-  for (unsigned I = Idx; I < ArgRegs.size(); ++I, VaArgOffset += XLenInBytes) {
-    const Register VReg = MRI.createGenericVirtualRegister(sXLen);
-    Handler.assignValueToReg(
-        VReg, ArgRegs[I],
-        CCValAssign::getReg(I + MF.getFunction().getNumOperands(), XLenVT,
-                            ArgRegs[I], XLenVT, CCValAssign::Full));
-    FI = MFI.CreateFixedObject(XLenInBytes, VaArgOffset, true);
-    auto FIN = MIRBuilder.buildFrameIndex(p0, FI);
-    auto MPO = MachinePointerInfo::getFixedStack(MF, FI);
-    auto Store =
-        MIRBuilder.buildStore(VReg, FIN, MPO, inferAlignFromPtrInfo(MF, MPO));
-    // This was taken from  SelectionDAG, but we are not sure why it exists.
-    // It is being investigated in github.com/llvm/llvm-project/issues/73735.
-    Store->memoperands()[0]->setValue((Value *)nullptr);
-  }
 }
 
 bool RISCVCallLowering::lowerFormalArguments(MachineIRBuilder &MIRBuilder,
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index cf1b11c14b6d0..97cc92bcb125f 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -17691,49 +17691,49 @@ SDValue RISCVTargetLowering::LowerFormalArguments(
     MachineRegisterInfo &RegInfo = MF.getRegInfo();
     RISCVMachineFunctionInfo *RVFI = MF.getInfo<RISCVMachineFunctionInfo>();
 
-    // Offset of the first variable argument from stack pointer, and size of
-    // the vararg save area. For now, the varargs save area is either zero or
-    // large enough to hold a0-a7.
-    int VaArgOffset;
+    // Size of the vararg save area. For now, the varargs save area is either
+    // zero or large enough to hold a0-a7.
     int VarArgsSaveSize = XLenInBytes * (ArgRegs.size() - Idx);
+    int FI;
 
     // If all registers are allocated, then all varargs must be passed on the
     // stack and we don't need to save any argregs.
     if (VarArgsSaveSize == 0) {
-      VaArgOffset = CCInfo.getStackSize();
+      int VaArgOffset = CCInfo.getStackSize();
+      FI = MFI.CreateFixedObject(XLenInBytes, VaArgOffset, true);
     } else {
-      VaArgOffset = -VarArgsSaveSize;
+      int VaArgOffset = -VarArgsSaveSize;
+      FI = MFI.CreateFixedObject(VarArgsSaveSize, VaArgOffset, true);
+
+      // If saving an odd number of registers then create an extra stack slot to
+      // ensure that the frame pointer is 2*XLEN-aligned, which in turn ensures
+      // offsets to even-numbered registered remain 2*XLEN-aligned.
+      if (Idx % 2) {
+        MFI.CreateFixedObject(XLenInBytes, VaArgOffset - (int)XLenInBytes,
+                              true);
+        VarArgsSaveSize += XLenInBytes;
+      }
+
+      SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
+
+      // Copy the integer registers that may have been used for passing varargs
+      // to the vararg save area.
+      for (unsigned I = Idx; I < ArgRegs.size(); ++I) {
+        const Register Reg = RegInfo.createVirtualRegister(RC);
+        RegInfo.addLiveIn(ArgRegs[I], Reg);
+        SDValue ArgValue = DAG.getCopyFromReg(Chain, DL, Reg, XLenVT);
+        SDValue Store = DAG.getStore(
+            Chain, DL, ArgValue, FIN,
+            MachinePointerInfo::getFixedStack(MF, FI, (I - Idx) * XLenInBytes));
+        OutChains.push_back(Store);
+        FIN =
+            DAG.getMemBasePlusOffset(FIN, TypeSize::getFixed(XLenInBytes), DL);
+      }
     }
 
     // Record the frame index of the first variable argument
     // which is a value necessary to VASTART.
-    int FI = MFI.CreateFixedObject(XLenInBytes, VaArgOffset, true);
     RVFI->setVarArgsFrameIndex(FI);
-
-    // If saving an odd number of registers then create an extra stack slot to
-    // ensure that the frame pointer is 2*XLEN-aligned, which in turn ensures
-    // offsets to even-numbered registered remain 2*XLEN-aligned.
-    if (Idx % 2) {
-      MFI.CreateFixedObject(XLenInBytes, VaArgOffset - (int)XLenInBytes, true);
-      VarArgsSaveSize += XLenInBytes;
-    }
-
-    // Copy the integer registers that may have been used for passing varargs
-    // to the vararg save area.
-    for (unsigned I = Idx; I < ArgRegs.size();
-         ++I, VaArgOffset += XLenInBytes) {
-      const Register Reg = RegInfo.createVirtualRegister(RC);
-      RegInfo.addLiveIn(ArgRegs[I], Reg);
-      SDValue ArgValue = DAG.getCopyFromReg(Chain, DL, Reg, XLenVT);
-      FI = MFI.CreateFixedObject(XLenInBytes, VaArgOffset, true);
-      SDValue PtrOff = DAG.getFrameIndex(FI, getPointerTy(DAG.getDataLayout()));
-      SDValue Store = DAG.getStore(Chain, DL, ArgValue, PtrOff,
-                                   MachinePointerInfo::getFixedStack(MF, FI));
-      cast<StoreSDNode>(Store.getNode())
-          ->getMemOperand()
-          ->setValue((Value *)nullptr);
-      OutChains.push_back(Store);
-    }
     RVFI->setVarArgsSaveSize(VarArgsSaveSize);
   }
 
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/lower-args-vararg.ll b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/lower-args-vararg.ll
index ecfccc48bb34f..020f171076995 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/lower-args-vararg.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/lower-args-vararg.ll
@@ -10,27 +10,29 @@ define void @va1arg(ptr %a, ...) {
   ; RV32-NEXT:   liveins: $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17
   ; RV32-NEXT: {{  $}}
   ; RV32-NEXT:   [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+  ; RV32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.1
+  ; RV32-NEXT:   [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 4
   ; RV32-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
-  ; RV32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.6
-  ; RV32-NEXT:   G_STORE [[COPY1]](s32), [[FRAME_INDEX]](p0) :: (store (s32))
+  ; RV32-NEXT:   G_STORE [[COPY1]](s32), [[FRAME_INDEX]](p0) :: (store (s32) into %fixed-stack.1)
+  ; RV32-NEXT:   [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[FRAME_INDEX]], [[C]](s32)
   ; RV32-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $x12
-  ; RV32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.5
-  ; RV32-NEXT:   G_STORE [[COPY2]](s32), [[FRAME_INDEX1]](p0) :: (store (s32), align 8)
+  ; RV32-NEXT:   G_STORE [[COPY2]](s32), [[PTR_ADD]](p0) :: (store (s32) into %fixed-stack.1 + 4)
+  ; RV32-NEXT:   [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD]], [[C]](s32)
   ; RV32-NEXT:   [[COPY3:%[0-9]+]]:_(s32) = COPY $x13
-  ; RV32-NEXT:   [[FRAME_INDEX2:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.4
-  ; RV32-NEXT:   G_STORE [[COPY3]](s32), [[FRAME_INDEX2]](p0) :: (store (s32))
+  ; RV32-NEXT:   G_STORE [[COPY3]](s32), [[PTR_ADD1]](p0) :: (store (s32) into %fixed-stack.1 + 8)
+  ; RV32-NEXT:   [[PTR_ADD2:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD1]], [[C]](s32)
   ; RV32-NEXT:   [[COPY4:%[0-9]+]]:_(s32) = COPY $x14
-  ; RV32-NEXT:   [[FRAME_INDEX3:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.3
-  ; RV32-NEXT:   G_STORE [[COPY4]](s32), [[FRAME_INDEX3]](p0) :: (store (s32), align 16)
+  ; RV32-NEXT:   G_STORE [[COPY4]](s32), [[PTR_ADD2]](p0) :: (store (s32) into %fixed-stack.1 + 12)
+  ; RV32-NEXT:   [[PTR_ADD3:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD2]], [[C]](s32)
   ; RV32-NEXT:   [[COPY5:%[0-9]+]]:_(s32) = COPY $x15
-  ; RV32-NEXT:   [[FRAME_INDEX4:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.2
-  ; RV32-NEXT:   G_STORE [[COPY5]](s32), [[FRAME_INDEX4]](p0) :: (store (s32))
+  ; RV32-NEXT:   G_STORE [[COPY5]](s32), [[PTR_ADD3]](p0) :: (store (s32) into %fixed-stack.1 + 16)
+  ; RV32-NEXT:   [[PTR_ADD4:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD3]], [[C]](s32)
   ; RV32-NEXT:   [[COPY6:%[0-9]+]]:_(s32) = COPY $x16
-  ; RV32-NEXT:   [[FRAME_INDEX5:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.1
-  ; RV32-NEXT:   G_STORE [[COPY6]](s32), [[FRAME_INDEX5]](p0) :: (store (s32), align 8)
+  ; RV32-NEXT:   G_STORE [[COPY6]](s32), [[PTR_ADD4]](p0) :: (store (s32) into %fixed-stack.1 + 20)
+  ; RV32-NEXT:   [[PTR_ADD5:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD4]], [[C]](s32)
   ; RV32-NEXT:   [[COPY7:%[0-9]+]]:_(s32) = COPY $x17
-  ; RV32-NEXT:   [[FRAME_INDEX6:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.0
-  ; RV32-NEXT:   G_STORE [[COPY7]](s32), [[FRAME_INDEX6]](p0) :: (store (s32))
+  ; RV32-NEXT:   G_STORE [[COPY7]](s32), [[PTR_ADD5]](p0) :: (store (s32) into %fixed-stack.1 + 24)
+  ; RV32-NEXT:   [[PTR_ADD6:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD5]], [[C]](s32)
   ; RV32-NEXT:   PseudoRET
   ;
   ; RV64-LABEL: name: va1arg
@@ -38,27 +40,29 @@ define void @va1arg(ptr %a, ...) {
   ; RV64-NEXT:   liveins: $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17
   ; RV64-NEXT: {{  $}}
   ; RV64-NEXT:   [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+  ; RV64-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.1
+  ; RV64-NEXT:   [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 8
   ; RV64-NEXT:   [[COPY1:%[0-9]+]]:_(s64) = COPY $x11
-  ; RV64-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.6
-  ; RV64-NEXT:   G_STORE [[COPY1]](s64), [[FRAME_INDEX]](p0) :: (store (s64))
+  ; RV64-NEXT:   G_STORE [[COPY1]](s64), [[FRAME_INDEX]](p0) :: (store (s64) into %fixed-stack.1)
+  ; RV64-NEXT:   [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[FRAME_INDEX]], [[C]](s64)
   ; RV64-NEXT:   [[COPY2:%[0-9]+]]:_(s64) = COPY $x12
-  ; RV64-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.5
-  ; RV64-NEXT:   G_STORE [[COPY2]](s64), [[FRAME_INDEX1]](p0) :: (store (s64), align 16)
+  ; RV64-NEXT:   G_STORE [[COPY2]](s64), [[PTR_ADD]](p0) :: (store (s64) into %fixed-stack.1 + 8)
+  ; RV64-NEXT:   [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD]], [[C]](s64)
   ; RV64-NEXT:   [[COPY3:%[0-9]+]]:_(s64) = COPY $x13
-  ; RV64-NEXT:   [[FRAME_INDEX2:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.4
-  ; RV64-NEXT:   G_STORE [[COPY3]](s64), [[FRAME_INDEX2]](p0) :: (store (s64))
+  ; RV64-NEXT:   G_STORE [[COPY3]](s64), [[PTR_ADD1]](p0) :: (store (s64) into %fixed-stack.1 + 16)
+  ; RV64-NEXT:   [[PTR_ADD2:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD1]], [[C]](s64)
   ; RV64-NEXT:   [[COPY4:%[0-9]+]]:_(s64) = COPY $x14
-  ; RV64-NEXT:   [[FRAME_INDEX3:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.3
-  ; RV64-NEXT:   G_STORE [[COPY4]](s64), [[FRAME_INDEX3]](p0) :: (store (s64), align 16)
+  ; RV64-NEXT:   G_STORE [[COPY4]](s64), [[PTR_ADD2]](p0) :: (store (s64) into %fixed-stack.1 + 24)
+  ; RV64-NEXT:   [[PTR_ADD3:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD2]], [[C]](s64)
   ; RV64-NEXT:   [[COPY5:%[0-9]+]]:_(s64) = COPY $x15
-  ; RV64-NEXT:   [[FRAME_INDEX4:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.2
-  ; RV64-NEXT:   G_STORE [[COPY5]](s64), [[FRAME_INDEX4]](p0) :: (store (s64))
+  ; RV64-NEXT:   G_STORE [[COPY5]](s64), [[PTR_ADD3]](p0) :: (store (s64) into %fixed-stack.1 + 32)
+  ; RV64-NEXT:   [[PTR_ADD4:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD3]], [[C]](s64)
   ; RV64-NEXT:   [[COPY6:%[0-9]+]]:_(s64) = COPY $x16
-  ; RV64-NEXT:   [[FRAME_INDEX5:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.1
-  ; RV64-NEXT:   G_STORE [[COPY6]](s64), [[FRAME_INDEX5]](p0) :: (store (s64), align 16)
+  ; RV64-NEXT:   G_STORE [[COPY6]](s64), [[PTR_ADD4]](p0) :: (store (s64) into %fixed-stack.1 + 40)
+  ; RV64-NEXT:   [[PTR_ADD5:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD4]], [[C]](s64)
   ; RV64-NEXT:   [[COPY7:%[0-9]+]]:_(s64) = COPY $x17
-  ; RV64-NEXT:   [[FRAME_INDEX6:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.0
-  ; RV64-NEXT:   G_STORE [[COPY7]](s64), [[FRAME_INDEX6]](p0) :: (store (s64))
+  ; RV64-NEXT:   G_STORE [[COPY7]](s64), [[PTR_ADD5]](p0) :: (store (s64) into %fixed-stack.1 + 48)
+  ; RV64-NEXT:   [[PTR_ADD6:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD5]], [[C]](s64)
   ; RV64-NEXT:   PseudoRET
   ret void
 }
@@ -70,24 +74,26 @@ define void @va2arg(ptr %a, ptr %b, ...) {
   ; RV32-NEXT: {{  $}}
   ; RV32-NEXT:   [[COPY:%[0-9]+]]:_(p0) = COPY $x10
   ; RV32-NEXT:   [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+  ; RV32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.0
+  ; RV32-NEXT:   [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 4
   ; RV32-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $x12
-  ; RV32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.5
-  ; RV32-NEXT:   G_STORE [[COPY2]](s32), [[FRAME_INDEX]](p0) :: (store (s32), align 8)
+  ; RV32-NEXT:   G_STORE [[COPY2]](s32), [[FRAME_INDEX]](p0) :: (store (s32) into %fixed-stack.0, align 8)
+  ; RV32-NEXT:   [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[FRAME_INDEX]], [[C]](s32)
   ; RV32-NEXT:   [[COPY3:%[0-9]+]]:_(s32) = COPY $x13
-  ; RV32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.4
-  ; RV32-NEXT:   G_STORE [[COPY3]](s32), [[FRAME_INDEX1]](p0) :: (store (s32))
+  ; RV32-NEXT:   G_STORE [[COPY3]](s32), [[PTR_ADD]](p0) :: (store (s32) into %fixed-stack.0 + 4)
+  ; RV32-NEXT:   [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD]], [[C]](s32)
   ; RV32-NEXT:   [[COPY4:%[0-9]+]]:_(s32) = COPY $x14
-  ; RV32-NEXT:   [[FRAME_INDEX2:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.3
-  ; RV32-NEXT:   G_STORE [[COPY4]](s32), [[FRAME_INDEX2]](p0) :: (store (s32), align 16)
+  ; RV32-NEXT:   G_STORE [[COPY4]](s32), [[PTR_ADD1]](p0) :: (store (s32) into %fixed-stack.0 + 8, align 8)
+  ; RV32-NEXT:   [[PTR_ADD2:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD1]], [[C]](s32)
   ; RV32-NEXT:   [[COPY5:%[0-9]+]]:_(s32) = COPY $x15
-  ; RV32-NEXT:   [[FRAME_INDEX3:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.2
-  ; RV32-NEXT:   G_STORE [[COPY5]](s32), [[FRAME_INDEX3]](p0) :: (store (s32))
+  ; RV32-NEXT:   G_STORE [[COPY5]](s32), [[PTR_ADD2]](p0) :: (store (s32) into %fixed-stack.0 + 12)
+  ; RV32-NEXT:   [[PTR_ADD3:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD2]], [[C]](s32)
   ; RV32-NEXT:   [[COPY6:%[0-9]+]]:_(s32) = COPY $x16
-  ; RV32-NEXT:   [[FRAME_INDEX4:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.1
-  ; RV32-NEXT:   G_STORE [[COPY6]](s32), [[FRAME_INDEX4]](p0) :: (store (s32), align 8)
+  ; RV32-NEXT:   G_STORE [[COPY6]](s32), [[PTR_ADD3]](p0) :: (store (s32) into %fixed-stack.0 + 16, align 8)
+  ; RV32-NEXT:   [[PTR_ADD4:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD3]], [[C]](s32)
   ; RV32-NEXT:   [[COPY7:%[0-9]+]]:_(s32) = COPY $x17
-  ; RV32-NEXT:   [[FRAME_INDEX5:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.0
-  ; RV32-NEXT:   G_STORE [[COPY7]](s32), [[FRAME_INDEX5]](p0) :: (store (s32))
+  ; RV32-NEXT:   G_STORE [[COPY7]](s32), [[PTR_ADD4]](p0) :: (store (s32) into %fixed-stack.0 + 20)
+  ; RV32-NEXT:   [[PTR_ADD5:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD4]], [[C]](s32)
   ; RV32-NEXT:   PseudoRET
   ;
   ; RV64-LABEL: name: va2arg
@@ -96,24 +102,26 @@ define void @va2arg(ptr %a, ptr %b, ...) {
   ; RV64-NEXT: {{  $}}
   ; RV64-NEXT:   [[COPY:%[0-9]+]]:_(p0) = COPY $x10
   ; RV64-NEXT:   [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+  ; RV64-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.0
+  ; RV64-NEXT:   [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 8
   ; RV64-NEXT:   [[COPY2:%[0-9]+]]:_(s64) = COPY $x12
-  ; RV64-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.5
-  ; RV64-NEXT:   G_STORE [[COPY2]](s64), [[FRAME_INDEX]](p0) :: (store (s64), align 16)
+  ; RV64-NEXT:   G_STORE [[COPY2]](s64), [[FRAME_INDEX]](p0) :: (store (s64) into %fixed-stack.0, align 16)
+  ; RV64-NEXT:   [[PTR_ADD:%[0-9]+]]:_(p0) = G_PTR_ADD [[FRAME_INDEX]], [[C]](s64)
   ; RV64-NEXT:   [[COPY3:%[0-9]+]]:_(s64) = COPY $x13
-  ; RV64-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.4
-  ; RV64-NEXT:   G_STORE [[COPY3]](s64), [[FRAME_INDEX1]](p0) :: (store (s64))
+  ; RV64-NEXT:   G_STORE [[COPY3]](s64), [[PTR_ADD]](p0) :: (store (s64) into %fixed-stack.0 + 8)
+  ; RV64-NEXT:   [[PTR_ADD1:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD]], [[C]](s64)
   ; RV64-NEXT:   [[COPY4:%[0-9]+]]:_(s64) = COPY $x14
-  ; RV64-NEXT:   [[FRAME_INDEX2:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.3
-  ; RV64-NEXT:   G_STORE [[COPY4]](s64), [[FRAME_INDEX2]](p0) :: (store (s64), align 16)
+  ; RV64-NEXT:   G_STORE [[COPY4]](s64), [[PTR_ADD1]](p0) :: (store (s64) into %fixed-stack.0 + 16, align 16)
+  ; RV64-NEXT:   [[PTR_ADD2:%[0-9]+]]:_(p0) = G_PTR_ADD [[PTR_ADD1]], [[C]](s64)
   ; RV64-NEXT:   [[COPY5:%[0-9]+]]:_(s64) = COPY $x15
-  ; RV64-NEXT:   [[FRAME_INDEX3:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.2
-  ;...
[truncated]

; RV64IZCMP-SR-NEXT: addi a0, sp, 24
; RV64IZCMP-SR-NEXT: sd a0, 8(sp)
; RV64IZCMP-SR-NEXT: lwu a0, 12(sp)
; RV64IZCMP-SR-NEXT: lwu a1, 8(sp)
; RV64IZCMP-SR-NEXT: lwu a3, 8(sp)
topperc (Collaborator, Author):

This test is kind of weird: all the alignments are 4 instead of XLen/8. This leads to memory access splitting on RV64 that isn't seen on RV32.

Contributor:

Any plan to rework these tests?

; LP64-LP64F-LP64D-FPELIM-NEXT: addi a0, sp, 24
; LP64-LP64F-LP64D-FPELIM-NEXT: sd a0, 8(sp)
; LP64-LP64F-LP64D-FPELIM-NEXT: lw a0, 8(sp)
; LP64-LP64F-LP64D-FPELIM-NEXT: sd a3, 40(sp)
topperc (Collaborator, Author):

This test is also weird: vastart stores a value that is supposed to be a pointer, but the test loads it as an i32 instead of iXLen.

// ensure that the frame pointer is 2*XLEN-aligned, which in turn ensures
// offsets to even-numbered registered remain 2*XLEN-aligned.
if (Idx % 2) {
MFI.CreateFixedObject(XLenInBytes, VaArgOffset - (int)XLenInBytes,
Contributor:

Ditto.

// ensure that the frame pointer is 2*XLEN-aligned, which in turn ensures
// offsets to even-numbered registered remain 2*XLEN-aligned.
if (Idx % 2) {
MFI.CreateFixedObject(XLenInBytes, VaArgOffset - (int)XLenInBytes, true);
Contributor:

Don't use C-style casting.
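
For example, the quoted line could use static_cast instead (a suggestion only, not necessarily the wording the patch ended up with):

MFI.CreateFixedObject(XLenInBytes, VaArgOffset - static_cast<int>(XLenInBytes), true);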

@@ -10,55 +10,59 @@ define void @va1arg(ptr %a, ...) {
; RV32-NEXT: liveins: $x10, $x11, $x12, $x13, $x14, $x15, $x16, $x17
; RV32-NEXT: {{ $}}
; RV32-NEXT: [[COPY:%[0-9]+]]:_(p0) = COPY $x10
; RV32-NEXT: [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %fixed-stack.1
Contributor:

Is this a regression, or can it be optimized in later passes?

topperc (Collaborator, Author):

I assume it would get optimized by this generic combiner.

def ptr_add_immed_matchdata : GIDefMatchData<"PtrAddChain">;                     
def ptr_add_immed_chain : GICombineRule<                                         
  (defs root:$d, ptr_add_immed_matchdata:$matchinfo),                            
  (match (wip_match_opcode G_PTR_ADD):$d,                                        
         [{ return Helper.matchPtrAddImmedChain(*${d}, ${matchinfo}); }]),       
  (apply [{ Helper.applyPtrAddImmedChain(*${d}, ${matchinfo}); }])>

wangpc-pp (Contributor) left a review comment:

LGTM.

VarArgsSaveSize += XLenInBytes;
}

const LLT p0 = LLT::pointer(MF.getDataLayout().getAllocaAddrSpace(),
Contributor:

These variables should start with a capital letter.

topperc (Collaborator, Author):

The names are consistent with the naming of LLTs in GISel code in RISC-V and other targets. I would prefer not to adopt a new convention for this one piece of code.

Contributor:

Hmmm, but this is against LLVM coding style...whatever... 😵‍💫

@topperc topperc merged commit 3c5b42a into llvm:main Dec 5, 2023
4 checks passed
@topperc topperc deleted the pr/vararg-fi branch December 5, 2023 18:30