Skip to content

Conversation

KanRobert
Copy link
Contributor

@KanRobert KanRobert commented Nov 22, 2023

A PUSH and its corresponding POP may be marked with a 1-bit Push-Pop Acceleration (PPX)
hint to indicate that the POP reads the value written by the PUSH from the stack. The PPX hint
is encoded by setting REX2.W = 1 and is applicable only to PUSH with opcode 0x50+rd and POP
with opcode 0x58+rd in the legacy space. It is not applicable to any other variants of PUSH and POP.

@llvmbot llvmbot added backend:X86 llvm:mc Machine (object) code labels Nov 22, 2023
@llvmbot
Copy link
Member

llvmbot commented Nov 22, 2023

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-mc

Author: Shengchen Kan (KanRobert)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/73092.diff

5 Files Affected:

  • (modified) llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp (+2-2)
  • (modified) llvm/lib/Target/X86/X86InstrMisc.td (+4)
  • (added) llvm/test/MC/Disassembler/X86/apx/pushp-popp.txt (+34)
  • (added) llvm/test/MC/X86/apx/pushp-popp-att.s (+31)
  • (added) llvm/test/MC/X86/apx/pushp-popp-intel.s (+27)
diff --git a/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp b/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
index 3e42499fe6be340..d5218d356a5dec8 100644
--- a/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
+++ b/llvm/lib/Target/X86/Disassembler/X86Disassembler.cpp
@@ -1257,9 +1257,9 @@ static int getInstructionID(struct InternalInstruction *insn,
     attrMask &= ~ATTR_ADSIZE;
   }
 
-  // Absolute jump need special handling
+  // Absolute jump and pushp/popp need special handling
   if (insn->rex2ExtensionPrefix[0] == 0xd5 && insn->opcodeType == ONEBYTE &&
-      insn->opcode == 0xA1)
+      (insn->opcode == 0xA1 || (insn->opcode & 0xf0) == 0x50))
     attrMask |= ATTR_REX2;
 
   if (insn->mode == MODE_16BIT) {
diff --git a/llvm/lib/Target/X86/X86InstrMisc.td b/llvm/lib/Target/X86/X86InstrMisc.td
index 88e7a388713f80e..9812386f594a29b 100644
--- a/llvm/lib/Target/X86/X86InstrMisc.td
+++ b/llvm/lib/Target/X86/X86InstrMisc.td
@@ -161,6 +161,8 @@ let isCodeGenOnly = 1, ForceDisassemble = 1 in {
 def POP64rmr: I<0x8F, MRM0r, (outs GR64:$reg), (ins), "pop{q}\t$reg", []>,
                 OpSize32, Requires<[In64BitMode]>;
 } // isCodeGenOnly = 1, ForceDisassemble = 1
+def POPP64r  : I<0x58, AddRegFrm, (outs GR64:$reg), (ins), "popp\t$reg", []>,
+                 REX_W, ExplicitREX2Prefix, Requires<[In64BitMode]>;
 } // mayLoad, SchedRW
 let mayLoad = 1, mayStore = 1, SchedRW = [WriteCopy] in
 def POP64rmm: I<0x8F, MRM0m, (outs), (ins i64mem:$dst), "pop{q}\t$dst", []>,
@@ -173,6 +175,8 @@ let isCodeGenOnly = 1, ForceDisassemble = 1 in {
 def PUSH64rmr: I<0xFF, MRM6r, (outs), (ins GR64:$reg), "push{q}\t$reg", []>,
                  OpSize32, Requires<[In64BitMode]>;
 } // isCodeGenOnly = 1, ForceDisassemble = 1
+def PUSHP64r  : I<0x50, AddRegFrm, (outs), (ins GR64:$reg), "pushp\t$reg", []>,
+                  REX_W, ExplicitREX2Prefix, Requires<[In64BitMode]>;
 } // mayStore, SchedRW
 let mayLoad = 1, mayStore = 1, SchedRW = [WriteCopy] in {
 def PUSH64rmm: I<0xFF, MRM6m, (outs), (ins i64mem:$src), "push{q}\t$src", []>,
diff --git a/llvm/test/MC/Disassembler/X86/apx/pushp-popp.txt b/llvm/test/MC/Disassembler/X86/apx/pushp-popp.txt
new file mode 100644
index 000000000000000..4ec534fc9dcf6ec
--- /dev/null
+++ b/llvm/test/MC/Disassembler/X86/apx/pushp-popp.txt
@@ -0,0 +1,34 @@
+# RUN: llvm-mc -triple x86_64 -disassemble %s | FileCheck %s --check-prefix=ATT
+# RUN: llvm-mc -triple x86_64 -disassemble -output-asm-variant=1 %s | FileCheck %s --check-prefix=INTEL
+
+# ATT:   pushp	%rax
+# INTEL: pushp	rax
+0xd5,0x08,0x50
+
+# ATT:   pushp	%rbx
+# INTEL: pushp	rbx
+0xd5,0x08,0x53
+
+# ATT:   pushp	%r15
+# INTEL: pushp	r15
+0xd5,0x09,0x57
+
+# ATT:   pushp	%r16
+# INTEL: pushp	r16
+0xd5,0x18,0x50
+
+# ATT:   popp	%rax
+# INTEL: popp	rax
+0xd5,0x08,0x58
+
+# ATT:   popp	%rbx
+# INTEL: popp	rbx
+0xd5,0x08,0x5b
+
+# ATT:   popp	%r15
+# INTEL: popp	r15
+0xd5,0x09,0x5f
+
+# ATT:   popp	%r16
+# INTEL: popp	r16
+0xd5,0x18,0x58
diff --git a/llvm/test/MC/X86/apx/pushp-popp-att.s b/llvm/test/MC/X86/apx/pushp-popp-att.s
new file mode 100644
index 000000000000000..a8107448bb088f8
--- /dev/null
+++ b/llvm/test/MC/X86/apx/pushp-popp-att.s
@@ -0,0 +1,31 @@
+# RUN: llvm-mc -triple x86_64 -show-encoding %s | FileCheck %s
+# RUN: not llvm-mc -triple i386 -show-encoding %s 2>&1 | FileCheck %s --check-prefix=ERROR
+
+# ERROR-COUNT-8: error:
+# ERROR-NOT: error:
+
+# CHECK: pushp	%rax
+# CHECK: encoding: [0xd5,0x08,0x50]
+         pushp	%rax
+# CHECK: pushp	%rbx
+# CHECK: encoding: [0xd5,0x08,0x53]
+         pushp	%rbx
+# CHECK: pushp	%r15
+# CHECK: encoding: [0xd5,0x09,0x57]
+         pushp	%r15
+# CHECK: pushp	%r16
+# CHECK: encoding: [0xd5,0x18,0x50]
+         pushp	%r16
+
+# CHECK: popp	%rax
+# CHECK: encoding: [0xd5,0x08,0x58]
+         popp	%rax
+# CHECK: popp	%rbx
+# CHECK: encoding: [0xd5,0x08,0x5b]
+         popp	%rbx
+# CHECK: popp	%r15
+# CHECK: encoding: [0xd5,0x09,0x5f]
+         popp	%r15
+# CHECK: popp	%r16
+# CHECK: encoding: [0xd5,0x18,0x58]
+         popp	%r16
diff --git a/llvm/test/MC/X86/apx/pushp-popp-intel.s b/llvm/test/MC/X86/apx/pushp-popp-intel.s
new file mode 100644
index 000000000000000..97b72a2cf726bb8
--- /dev/null
+++ b/llvm/test/MC/X86/apx/pushp-popp-intel.s
@@ -0,0 +1,27 @@
+# RUN: llvm-mc -triple x86_64 -show-encoding -x86-asm-syntax=intel -output-asm-variant=1 %s | FileCheck %s
+
+# CHECK: pushp	rax
+# CHECK: encoding: [0xd5,0x08,0x50]
+         pushp	rax
+# CHECK: pushp	rbx
+# CHECK: encoding: [0xd5,0x08,0x53]
+         pushp	rbx
+# CHECK: pushp	r15
+# CHECK: encoding: [0xd5,0x09,0x57]
+         pushp	r15
+# CHECK: pushp	r16
+# CHECK: encoding: [0xd5,0x18,0x50]
+         pushp	r16
+
+# CHECK: popp	rax
+# CHECK: encoding: [0xd5,0x08,0x58]
+         popp	rax
+# CHECK: popp	rbx
+# CHECK: encoding: [0xd5,0x08,0x5b]
+         popp	rbx
+# CHECK: popp	r15
+# CHECK: encoding: [0xd5,0x09,0x5f]
+         popp	r15
+# CHECK: popp	r16
+# CHECK: encoding: [0xd5,0x18,0x58]
+         popp	r16

@e-kud
Copy link
Contributor

e-kud commented Nov 23, 2023

Just curious, all combinations are from 0x50 to 0x5F, can we test them all, or it is really redundant?

Copy link
Contributor

@e-kud e-kud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@KanRobert
Copy link
Contributor Author

Just curious, all combinations are from 0x50 to 0x5F, can we test them all, or it is really redundant?

The logic for 0x50~0x5F is about the format AddRegFrm and already tested with legacy PUSH/POP. So I think it's redundant.

@KanRobert KanRobert merged commit a3cab1f into llvm:main Nov 23, 2023
KanRobert added a commit that referenced this pull request Nov 24, 2023
PUSH2 and POP2 are two new instructions for (respectively)
pushing/popping 2 GPRs at a time to/from
the stack. The opcodes of PUSH2 and POP2 are those of “PUSH r/m” and
“POP r/m” from legacy map 0, but we
require ModRM.Mod = 3 in order to disallow memory operand. 

The 1-bit Push-Pop Acceleration hint described in #73092 applies to
PUSH2/POP2 too, then we have PUSH2P/POP2P.

For AT&T syntax, PUSH2[P] pushes the registers from right to left onto
the stack. POP2[P] pops the stack to registers from right to left. Intel
syntax has the opposite order - from left to right.

The assembly syntax is aligned with GCC & binutils
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637718.html
KanRobert added a commit that referenced this pull request Nov 27, 2023
…P/POPP, PUSH2[P]/POP2[P] (#73292)

#73092 supported the encoding/decoding for PUSHP/POPP
#73233 supported the encoding/decoding for PUSH2[P]/POP2[P]

In this patch, we teach frame lowering to spill/reload registers w/
these instructions.

1. Use PPX for balanced spill/reload
2. Use PUSH2/POP2 for continuous spills/reloads
3. PUSH2/POP2 must be 16B-aligned on the stack, so pad when necessary
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 llvm:mc Machine (object) code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants