[AArch64] Fallback to PRFUM for PRFM with negative or unaligned offset #166756
@llvm/pr-subscribers-backend-aarch64
Author: Cullen Rhodes (c-rhodes)
Changes
Section C3.2.2 (quoted below) in the ARMARM makes this a requirement of assemblers for load/stores with unscaled offset. It makes no mention of PRFM so I don't consider this to be a bug, although I can see why we would want to extend this behaviour to the unscaled variants of these instructions as well, as GCC does. This patch adds an alias for this.
C3.2.2 Load/store register (unscaled offset)
The load/store register instructions with an unscaled offset support only one addressing mode:
Base plus an unscaled 9-bit signed immediate offset.
See Load/store addressing modes.
The load/store register (unscaled offset) instructions are required to disambiguate this instruction class from the load/store register instruction forms that support an addressing mode of base plus a scaled, unsigned 12-bit immediate offset, because that can represent some offset values in the same range.
The ambiguous immediate offsets are byte offsets that are both:
In the range 0-255, inclusive.
Naturally aligned to the access size.
Other byte offsets in the range -256 to 255 inclusive are unambiguous.
Fixes #83226.
Full diff: https://github.com/llvm/llvm-project/pull/166756.diff
2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 2871a20e28b65..5a608ef80230f 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -4444,6 +4444,11 @@ defm PRFUM : PrefetchUnscaled<0b11, 0, 0b10, "prfum",
[(AArch64Prefetch timm:$Rt,
(am_unscaled64 GPR64sp:$Rn, simm9:$offset))]>;
+// PRFM falls back to PRFUM for negative or unaligned offsets (not a multiple
+// of 8).
+def : InstAlias<"prfm $Rt, [$Rn, $offset]",
+ (PRFUMi prfop:$Rt, GPR64sp:$Rn, simm9_offset_fb64:$offset), 0>;
+
//---
// (unscaled immediate, unprivileged)
defm LDTRX : LoadUnprivileged<0b11, 0, 0b01, GPR64, "ldtr">;
diff --git a/llvm/test/MC/AArch64/prfum.s b/llvm/test/MC/AArch64/prfum.s
new file mode 100644
index 0000000000000..81a864a694325
--- /dev/null
+++ b/llvm/test/MC/AArch64/prfum.s
@@ -0,0 +1,44 @@
+// RUN: llvm-mc -triple=aarch64 -show-encoding --print-imm-hex=false < %s \
+// RUN: | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+// RUN: llvm-mc -triple=aarch64 -filetype=obj < %s \
+// RUN: | llvm-objdump -d --print-imm-hex=false - | FileCheck %s --check-prefix=CHECK-INST
+// Disassemble encoding and check the re-encoding (-show-encoding) matches.
+// RUN: llvm-mc -triple=aarch64 -show-encoding < %s \
+// RUN: | sed '/.text/d' | sed 's/.*encoding: //g' \
+// RUN: | llvm-mc -triple=aarch64 -disassemble -show-encoding --print-imm-hex=false \
+// RUN: | FileCheck %s --check-prefixes=CHECK-ENCODING,CHECK-INST
+
+// PRFM falls back to PRFUM for negative or unaligned offsets (not a multiple
+// of 8).
+
+prfm pldl1keep, [x0, #-256]
+// CHECK-INST: prfum pldl1keep, [x0, #-256]
+// CHECK-ENCODING: [0x00,0x00,0x90,0xf8]
+
+prfm pldl1keep, [x0, #-8]
+// CHECK-INST: prfum pldl1keep, [x0, #-8]
+// CHECK-ENCODING: [0x00,0x80,0x9f,0xf8]
+
+prfm pldl1keep, [x0, #-1]
+// CHECK-INST: prfum pldl1keep, [x0, #-1]
+// CHECK-ENCODING: [0x00,0xf0,0x9f,0xf8]
+
+prfm pldl1keep, [x0, #0]
+// CHECK-INST: prfm pldl1keep, [x0]
+// CHECK-ENCODING: [0x00,0x00,0x80,0xf9]
+
+prfm pldl1keep, [x0, #1]
+// CHECK-INST: prfum pldl1keep, [x0, #1]
+// CHECK-ENCODING: [0x00,0x10,0x80,0xf8]
+
+prfm pldl1keep, [x0, #8]
+// CHECK-INST: prfm pldl1keep, [x0, #8]
+// CHECK-ENCODING: [0x00,0x04,0x80,0xf9]
+
+prfm pldl1keep, [x0, #255]
+// CHECK-INST: prfum pldl1keep, [x0, #255]
+// CHECK-ENCODING: [0x00,0xf0,0x8f,0xf8]
+
+prfm pldl1keep, [x0, #256]
+// CHECK-INST: prfm pldl1keep, [x0, #256]
+// CHECK-ENCODING: [0x00,0x80,0x80,0xf9]
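As a cross-check on the expected encodings in the test above, here is a minimal Python sketch (not part of the patch) of how an assembler could pick between the two forms and compute the instruction bytes. The name `encode_prefetch` and the two base constants are hypothetical. The field positions (a scaled imm12 at bit 10 for PRFM, a signed imm9 at bit 12 for PRFUM) are assumptions chosen to agree with the CHECK-ENCODING lines, not taken from the LLVM sources.

```python
import struct

# Assumed base opcodes with all operand fields zeroed (illustrative constants).
PRFM_BASE = 0xF9800000   # PRFM, scaled unsigned 12-bit offset form
PRFUM_BASE = 0xF8800000  # PRFUM, unscaled signed 9-bit offset form


def encode_prefetch(offset, rn=0, prfop=0):
    """Return (mnemonic, little-endian bytes) for `prfm <prfop>, [x<rn>, #offset]`.

    Offsets that are non-negative multiples of 8 and fit in imm12 use PRFM;
    other offsets in [-256, 255] fall back to PRFUM.
    """
    if offset >= 0 and offset % 8 == 0 and offset // 8 <= 0xFFF:
        word = PRFM_BASE | ((offset // 8) << 10) | (rn << 5) | prfop
        return "prfm", struct.pack("<I", word)
    if -256 <= offset <= 255:
        # Two's-complement truncation to 9 bits handles negative offsets.
        word = PRFUM_BASE | ((offset & 0x1FF) << 12) | (rn << 5) | prfop
        return "prfum", struct.pack("<I", word)
    raise ValueError(f"offset {offset} is not encodable in either form")


for off in (-256, -8, -1, 0, 1, 8, 255, 256):
    mnemonic, enc = encode_prefetch(off)
    print(f"#{off:>4} -> {mnemonic:5} [{','.join(f'0x{b:02x}' for b in enc)}]")
```

With `prfop=0` (pldl1keep) and `rn=0` (x0), the printed bytes can be compared line by line against the CHECK-ENCODING expectations in `prfum.s`.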
jthackray
left a comment
LGTM
Section C3.2.2 (quoted below) in the ARMARM makes this a requirement of
assemblers for load/stores with unscaled offset. It makes no mention of
PRFM so I don't consider this to be a bug, although I can see why we
would want to extend this behaviour to the unscaled variants of these
instructions as well, as GCC does. This patch adds an alias for this.
C3.2.2 Load/store register (unscaled offset)
The load/store register instructions with an unscaled offset support
only one addressing mode:
Base plus an unscaled 9-bit signed immediate offset.
See Load/store addressing modes.
The load/store register (unscaled offset) instructions are required to
disambiguate this instruction class from the load/store register
instruction forms that support an addressing mode of base plus a scaled,
unsigned 12-bit immediate offset, because that can represent some offset
values in the same range.
The ambiguous immediate offsets are byte offsets that are both:
In the range 0-255, inclusive.
Naturally aligned to the access size.
Other byte offsets in the range -256 to 255 inclusive are unambiguous.
An assembler program translating a load/store instruction, for example
LDR, is required to encode an unambiguous offset using the unscaled
9-bit offset form, and to encode an ambiguous offset using the scaled
12-bit offset form. A programmer might force the generation of the
unscaled 9-bit form by using one of the mnemonics in Table C.3.21. Arm
recommends that a disassembler outputs all unscaled 9-bit offset forms
using one of these mnemonics, but unambiguous offsets can be output
using a load/store single register mnemonic, for example, LDR.
Fixes llvm#83226.
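
The selection rule quoted from C3.2.2 can be sketched for a general access size as follows. This is an illustrative Python sketch only: `choose_form` is a hypothetical helper, and the upper bound of 4095 times the access size for the scaled form is an assumption derived from the "scaled, unsigned 12-bit immediate offset" wording in the quote.

```python
def choose_form(offset, access_size):
    """Pick the encoding form for a base-plus-immediate load/store.

    Returns "scaled", "unscaled", or None when neither form can
    represent the byte offset.
    """
    aligned = offset % access_size == 0
    # Ambiguous offsets (0..255 and naturally aligned) must use the scaled form.
    if 0 <= offset <= 255 and aligned:
        return "scaled"
    # Every other offset in -256..255 is unambiguous and uses the unscaled form.
    if -256 <= offset <= 255:
        return "unscaled"
    # Larger aligned offsets are only reachable through the scaled form.
    if aligned and 0 <= offset <= 4095 * access_size:
        return "scaled"
    return None
```

For PRFM the relevant size is 8 (the patch comment says "not a multiple of 8"), so `choose_form(-8, 8)` and `choose_form(1, 8)` both select the unscaled form, which corresponds to PRFUM.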
5b44736 to 734df0d