[AArch64] Add CodeGen support for FEAT_CPA #79569

rgwott · 2024-01-26T10:16:31Z

CPA stands for Checked Pointer Arithmetic and is part of the 2023 MTE architecture extensions for A-profile.
The new CPA instructions perform regular pointer arithmetic (such as base register + offset) but check for overflow in the most significant bits of the result.

In this patch we intend to capture the semantics of pointer arithmetic when it is not folded into loads/stores, then generate the appropriate CPA instructions.

Mode details about the extension can be found at:

CPA stands for Checked Pointer Arithmetic and is part of the 2023 MTE architecture extensions for A-profile. The new CPA instructions perform regular pointer arithmetic (such as base register + offset) but check for overflow in the most significant bits of the result. In this patch we intend to capture the semantics of pointer arithmetic when it is not folded into loads/stores, then generate the appropriate CPA instructions. Mode details about the extension can be found at: * https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-developments-2023 * https://developer.arm.com/documentation/ddi0602/2023-09/

llvmbot · 2024-01-26T10:17:02Z

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-aarch64

Author: Rodolfo Wottrich (rgwott)

Changes

CPA stands for Checked Pointer Arithmetic and is part of the 2023 MTE architecture extensions for A-profile.
The new CPA instructions perform regular pointer arithmetic (such as base register + offset) but check for overflow in the most significant bits of the result.

In this patch we intend to capture the semantics of pointer arithmetic when it is not folded into loads/stores, then generate the appropriate CPA instructions.

Mode details about the extension can be found at:

Patch is 22.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79569.diff

8 Files Affected:

(modified) llvm/include/llvm/Target/TargetMachine.h (+5)
(modified) llvm/include/llvm/Target/TargetSelectionDAG.td (+3-2)
(modified) llvm/lib/Target/AArch64/AArch64InstrInfo.td (+20)
(modified) llvm/lib/Target/AArch64/AArch64TargetMachine.cpp (+4)
(modified) llvm/lib/Target/AArch64/AArch64TargetMachine.h (+4)
(modified) llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp (+4)
(added) llvm/test/CodeGen/AArch64/cpa-globalisel.ll (+171)
(added) llvm/test/CodeGen/AArch64/cpa-selectiondag.ll (+169)

diff --git a/llvm/include/llvm/Target/TargetMachine.h b/llvm/include/llvm/Target/TargetMachine.h
index a522a12299bb029..42b2d8ef536e827 100644
--- a/llvm/include/llvm/Target/TargetMachine.h
+++ b/llvm/include/llvm/Target/TargetMachine.h
@@ -420,6 +420,11 @@ class TargetMachine {
   virtual unsigned getAddressSpaceForPseudoSourceKind(unsigned Kind) const {
     return 0;
   }
+
+  /// True if target has some form of pointer arithmetic checking.
+  /// Helps identify whether pointer arithmetic semantics should be preserved
+  /// for passes such as instruction selection.
+  virtual bool isPtrArithmeticChecked(const Function &F) const { return false; }
 };
 
 /// This class describes a target machine that is implemented with the LLVM
diff --git a/llvm/include/llvm/Target/TargetSelectionDAG.td b/llvm/include/llvm/Target/TargetSelectionDAG.td
index 22360353790dbce..071e801a4785e4e 100644
--- a/llvm/include/llvm/Target/TargetSelectionDAG.td
+++ b/llvm/include/llvm/Target/TargetSelectionDAG.td
@@ -109,7 +109,7 @@ def SDTOther  : SDTypeProfile<1, 0, [SDTCisVT<0, OtherVT>]>; // for 'vt'.
 def SDTUNDEF  : SDTypeProfile<1, 0, []>;                     // for 'undef'.
 def SDTUnaryOp  : SDTypeProfile<1, 1, []>;                   // for bitconvert.
 
-def SDTPtrAddOp : SDTypeProfile<1, 2, [     // ptradd
+def SDTPtrAddSubOp : SDTypeProfile<1, 2, [  // ptradd, ptrsub
   SDTCisSameAs<0, 1>, SDTCisInt<2>, SDTCisPtrTy<1>
 ]>;
 def SDTIntBinOp : SDTypeProfile<1, 2, [     // add, and, or, xor, udiv, etc.
@@ -384,8 +384,9 @@ def tblockaddress: SDNode<"ISD::TargetBlockAddress",  SDTPtrLeaf, [],
 
 def add        : SDNode<"ISD::ADD"       , SDTIntBinOp   ,
                         [SDNPCommutative, SDNPAssociative]>;
-def ptradd     : SDNode<"ISD::ADD"       , SDTPtrAddOp, []>;
+def ptradd     : SDNode<"ISD::ADD"       , SDTPtrAddSubOp, []>;
 def sub        : SDNode<"ISD::SUB"       , SDTIntBinOp>;
+def ptrsub     : SDNode<"ISD::SUB"       , SDTPtrAddSubOp, []>;
 def mul        : SDNode<"ISD::MUL"       , SDTIntBinOp,
                         [SDNPCommutative, SDNPAssociative]>;
 def mulhs      : SDNode<"ISD::MULHS"     , SDTIntBinOp, [SDNPCommutative]>;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.td b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
index 03baa7497615e3d..ccc7817cab18985 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.td
@@ -9587,6 +9587,26 @@ let Predicates = [HasCPA] in {
   // Scalar multiply-add/subtract
   def MADDPT : MulAccumCPA<0, "maddpt">;
   def MSUBPT : MulAccumCPA<1, "msubpt">;
+
+  // Rules to use CPA instructions in pointer arithmetic patterns which are not
+  // folded into loads/stores. The AddedComplexity serves to help supersede
+  // other simpler (non-CPA) patterns and make sure CPA is used instead.
+  let AddedComplexity = 20 in {
+    def : Pat<(ptradd GPR64sp:$Rn, GPR64sp:$Rm),
+              (ADDPT_shift GPR64sp:$Rn, GPR64sp:$Rm, (i32 0))>;
+    def : Pat<(ptradd GPR64sp:$Rn, (shl GPR64sp:$Rm, (i64 imm0_7:$imm))),
+              (ADDPT_shift GPR64sp:$Rn, GPR64sp:$Rm,
+                           (i32 (trunc_imm imm0_7:$imm)))>;
+    def : Pat<(ptrsub GPR64sp:$Rn, GPR64sp:$Rm),
+              (SUBPT_shift GPR64sp:$Rn, GPR64sp:$Rm, (i32 0))>;
+    def : Pat<(ptrsub GPR64sp:$Rn, (shl GPR64sp:$Rm, (i64 imm0_7:$imm))),
+              (SUBPT_shift GPR64sp:$Rn, GPR64sp:$Rm,
+                           (i32 (trunc_imm imm0_7:$imm)))>;
+    def : Pat<(ptradd GPR64:$Ra, (mul GPR64:$Rn, GPR64:$Rm)),
+              (MADDPT GPR64:$Rn, GPR64:$Rm, GPR64:$Ra)>;
+    def : Pat<(ptradd GPR64:$Ra, (mul GPR64:$Rn, (sub (i64 0), GPR64:$Rm))),
+              (MSUBPT GPR64:$Rn, GPR64:$Rm, GPR64:$Ra)>;
+  }
 }
 
 include "AArch64InstrAtomics.td"
diff --git a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
index 6fbc13d8904f2e2..e67df1f417c55cd 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -892,3 +892,7 @@ bool AArch64TargetMachine::parseMachineFunctionInfo(
   MF.getInfo<AArch64FunctionInfo>()->initializeBaseYamlFields(YamlMFI);
   return false;
 }
+
+bool AArch64TargetMachine::isPtrArithmeticChecked(const Function &F) const {
+  return getSubtargetImpl(F)->hasCPA();
+}
\ No newline at end of file
diff --git a/llvm/lib/Target/AArch64/AArch64TargetMachine.h b/llvm/lib/Target/AArch64/AArch64TargetMachine.h
index 8fb68b06f137803..e21d35f87290661 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetMachine.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetMachine.h
@@ -71,6 +71,10 @@ class AArch64TargetMachine : public LLVMTargetMachine {
     return true;
   }
 
+  /// In AArch64, true if FEAT_CPA is present. Helps preserve pointer arithmetic
+  /// semantics for instruction selection.
+  bool isPtrArithmeticChecked(const Function &F) const override;
+
 private:
   bool isLittle;
 };
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
index 8344e79f78e1eb6..7480328f4b5dec3 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp
@@ -2075,6 +2075,10 @@ bool AArch64InstructionSelector::preISelLower(MachineInstr &I) {
     return Changed;
   }
   case TargetOpcode::G_PTR_ADD:
+    // If Checked Pointer Arithmetic (FEAT_CPA) is present, preserve the pointer
+    // arithmetic semantics instead of falling back to regular arithmetic.
+    if (TM.isPtrArithmeticChecked(MF.getFunction()))
+      return false;
     return convertPtrAddToAdd(I, MRI);
   case TargetOpcode::G_LOAD: {
     // For scalar loads of pointers, we try to convert the dest type from p0
diff --git a/llvm/test/CodeGen/AArch64/cpa-globalisel.ll b/llvm/test/CodeGen/AArch64/cpa-globalisel.ll
new file mode 100644
index 000000000000000..d7880454146b2c1
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/cpa-globalisel.ll
@@ -0,0 +1,171 @@
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs --mattr=+cpa -O0 -global-isel=1 -global-isel-abort=1 %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK-CPA-O0
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs --mattr=+cpa -O3 -global-isel=1 -global-isel-abort=1 %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK-CPA-O3
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs --mattr=-cpa -O0 -global-isel=1 -global-isel-abort=1 %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK-NOCPA-O0
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs --mattr=-cpa -O3 -global-isel=1 -global-isel-abort=1 %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK-NOCPA-O3
+
+%struct.my_type = type { i64, i64 }
+%struct.my_type2 = type { i64, i64, i64, i64, i64, i64 }
+
+@array = external dso_local global [10 x %struct.my_type], align 8
+@array2 = external dso_local global [10 x %struct.my_type2], align 8
+
+define void @addpt1(i64 %index, i64 %arg) {
+; CHECK-CPA-O0-LABEL:    addpt1:
+; CHECK-CPA-O0:          addpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-CPA-O0:          str	x{{[0-9]+}}, [[[REG1]], #8]
+;
+; CHECK-CPA-O3-LABEL:    addpt1:
+; CHECK-CPA-O3:          addpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-CPA-O3:          str	x{{[0-9]+}}, [[[REG1]], #8]
+;
+; CHECK-NOCPA-O0-LABEL:  addpt1:
+; CHECK-NOCPA-O0:        add	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-NOCPA-O0:        str	x{{[0-9]+}}, [[[REG1]], #8]
+;
+; CHECK-NOCPA-O3-LABEL:  addpt1:
+; CHECK-NOCPA-O3:        add	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-NOCPA-O3:        str	x{{[0-9]+}}, [[[REG1]], #8]
+entry:
+  %e2 = getelementptr inbounds %struct.my_type, ptr @array, i64 %index, i32 1
+  store i64 %arg, ptr %e2, align 8
+  ret void
+}
+
+define void @maddpt1(i32 %pos, ptr %val) {
+; CHECK-CPA-O0-LABEL:    maddpt1:
+; CHECK-CPA-O0:          maddpt	x0, x{{[0-9]+}}, x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O0:          b	memcpy
+;
+; CHECK-CPA-O3-LABEL:    maddpt1:
+; CHECK-CPA-O3:          maddpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]]]
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]], #16]
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]], #32]
+;
+; CHECK-NOCPA-O0-LABEL:  maddpt1:
+; CHECK-NOCPA-O0:        smaddl	x0, w{{[0-9]+}}, w{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-NOCPA-O0:        b	memcpy
+;
+; CHECK-NOCPA-O3-LABEL:  maddpt1:
+; CHECK-NOCPA-O3:        smaddl	[[REG1:x[0-9]+]], w{{[0-9]+}}, w{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG1]]]
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG1]], #16]
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG1]], #32]
+entry:
+  %idxprom = sext i32 %pos to i64
+  %arrayidx = getelementptr inbounds [10 x %struct.my_type2], ptr @array2, i64 0, i64 %idxprom
+  tail call void @llvm.memcpy.p0.p0.i64(ptr align 8 dereferenceable(48) %arrayidx, ptr align 8 dereferenceable(48) %val, i64 48, i1 false)
+  ret void
+}
+
+define void @msubpt1(i32 %index, i32 %elem) {
+; CHECK-CPA-O0-LABEL:    msubpt1:
+; CHECK-CPA-O0:          addpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O0:          msubpt	x0, x{{[0-9]+}}, x{{[0-9]+}}, [[REG1]]
+; CHECK-CPA-O0:          addpt	x1, x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O0:          b	memcpy
+;
+; CHECK-CPA-O3-LABEL:    msubpt1:
+; CHECK-CPA-O3:          msubpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]], #192]
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]], #208]
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]], #224]
+;
+; CHECK-NOCPA-O0-LABEL:  msubpt1:
+; CHECK-NOCPA-O0:        mneg	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-NOCPA-O0:        add	x0, x{{[0-9]+}}, [[REG1]]
+; CHECK-NOCPA-O0:        b	memcpy
+;
+; CHECK-NOCPA-O3-LABEL:  msubpt1:
+; CHECK-NOCPA-O3:        mneg	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-NOCPA-O3:        add	[[REG2:x[0-9]+]], x{{[0-9]+}}, [[REG1]]
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG1]], #192]
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG1]], #208]
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG1]], #224]
+entry:
+  %idx.ext = sext i32 %index to i64
+  %idx.neg = sub nsw i64 0, %idx.ext
+  %add.ptr = getelementptr inbounds %struct.my_type2, ptr getelementptr inbounds ([10 x %struct.my_type2], ptr @array2, i64 0, i64 6), i64 %idx.neg
+  tail call void @llvm.memcpy.p0.p0.i64(ptr align 8 dereferenceable(48) %add.ptr, ptr align 8 dereferenceable(48) getelementptr inbounds ([10 x %struct.my_type2], ptr @array2, i64 0, i64 2), i64 48, i1 false), !tbaa.struct !6
+  ret void
+}
+
+define void @subpt1(i32 %index, i32 %elem) {
+; CHECK-CPA-O0-LABEL:    subpt1:
+; CHECK-CPA-O0:          addpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O0:          str	q{{[0-9]+}}, [[[REG1]], x{{[0-9]+}}, lsl #4]
+;
+; CHECK-CPA-O3-LABEL:    subpt1:
+; CHECK-CPA-O3:          addpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]], #64]
+;
+; CHECK-NOCPA-O0-LABEL:  subpt1:
+; CHECK-NOCPA-O0:        add	[[REG1:x[0-9]+]], x{{[0-9]+}}, #96
+; CHECK-NOCPA-O0:        str	q{{[0-9]+}}, [[[REG1]], x{{[0-9]+}}, lsl #4]
+;
+; CHECK-NOCPA-O3-LABEL:  subpt1:
+; CHECK-NOCPA-O3:        add	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG1]], #64]
+entry:
+  %conv = sext i32 %index to i64
+  %mul.neg = mul nsw i64 %conv, -16
+  %add.ptr = getelementptr inbounds %struct.my_type, ptr getelementptr inbounds ([10 x %struct.my_type], ptr @array, i64 0, i64 6), i64 %mul.neg
+  tail call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %add.ptr, ptr noundef nonnull align 8 dereferenceable(16) getelementptr inbounds ([10 x %struct.my_type], ptr @array, i64 0, i64 2), i64 16, i1 false), !tbaa.struct !6
+  ret void
+}
+
+define void @subpt2(i32 %index, i32 %elem) {
+; CHECK-CPA-O0-LABEL:    subpt2:
+; CHECK-CPA-O0:          addpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O0:          str	q{{[0-9]+}}, [[[REG1]], x{{[0-9]+}}, lsl #4]
+;
+; CHECK-CPA-O3-LABEL:    subpt2:
+; CHECK-CPA-O3:          addpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]], #64]
+;
+; CHECK-NOCPA-O0-LABEL:  subpt2:
+; CHECK-NOCPA-O0:        add	[[REG1:x[0-9]+]], x{{[0-9]+}}, #96
+; CHECK-NOCPA-O0:        str	q{{[0-9]+}}, [[[REG1]], x{{[0-9]+}}, lsl #4]
+;
+; CHECK-NOCPA-O3-LABEL:  subpt2:
+; CHECK-NOCPA-O3:        add	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG1]], #64]
+entry:
+  %idx.ext = sext i32 %index to i64
+  %idx.neg = sub nsw i64 0, %idx.ext
+  %add.ptr = getelementptr inbounds %struct.my_type, ptr getelementptr inbounds ([10 x %struct.my_type], ptr @array, i64 0, i64 6), i64 %idx.neg
+  tail call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %add.ptr, ptr noundef nonnull align 8 dereferenceable(16) getelementptr inbounds ([10 x %struct.my_type], ptr @array, i64 0, i64 2), i64 16, i1 false), !tbaa.struct !11
+  ret void
+}
+
+define ptr @subpt3(ptr %ptr, i32 %index) {
+; CHECK-CPA-O0-LABEL:    subpt3:
+; CHECK-CPA-O0:          mov	[[REG1:x[0-9]+]], #-8
+; CHECK-CPA-O0:          addpt	x0, x{{[0-9]+}}, [[REG1]]
+; CHECK-CPA-O0:          ret
+;
+; CHECK-CPA-O3-LABEL:    subpt3:
+; CHECK-CPA-O3:          mov	[[REG1:x[0-9]+]], #-8
+; CHECK-CPA-O3:          addpt	x0, x{{[0-9]+}}, [[REG1]]
+; CHECK-CPA-O3:          ret
+;
+; CHECK-NOCPA-O0-LABEL:  subpt3:
+; CHECK-NOCPA-O0:        subs	x0, x{{[0-9]+}}, #8
+; CHECK-NOCPA-O0:        ret
+;
+; CHECK-NOCPA-O3-LABEL:  subpt3:
+; CHECK-NOCPA-O3:        sub	x0, x{{[0-9]+}}, #8
+; CHECK-NOCPA-O3:        ret
+entry:
+  %incdec.ptr.i.i.i = getelementptr inbounds i64, ptr %ptr, i64 -1
+  ret ptr %incdec.ptr.i.i.i
+}
+
+declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
+
+!6 = !{i64 0, i64 8, !7, i64 8, i64 8, !7, i64 16, i64 8, !7, i64 24, i64 8, !7, i64 32, i64 8, !7, i64 40, i64 8, !7}
+!7 = !{!8, !8, i64 0}
+!8 = !{!"long", !9, i64 0}
+!9 = !{!"omnipotent char", !10, i64 0}
+!10 = !{!"Simple C++ TBAA"}
+!11 = !{i64 0, i64 8, !7, i64 8, i64 8, !7}
diff --git a/llvm/test/CodeGen/AArch64/cpa-selectiondag.ll b/llvm/test/CodeGen/AArch64/cpa-selectiondag.ll
new file mode 100644
index 000000000000000..49798e70ea873a3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/cpa-selectiondag.ll
@@ -0,0 +1,169 @@
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs --mattr=+cpa -O0 -global-isel=0 -fast-isel=0 %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK-CPA-O0
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs --mattr=+cpa -O3 -global-isel=0 -fast-isel=0 %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK-CPA-O3
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs --mattr=-cpa -O0 -global-isel=0 -fast-isel=0 %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK-NOCPA-O0
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs --mattr=-cpa -O3 -global-isel=0 -fast-isel=0 %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK-NOCPA-O3
+
+%struct.my_type = type { i64, i64 }
+%struct.my_type2 = type { i64, i64, i64, i64, i64, i64 }
+
+@array = external dso_local global [10 x %struct.my_type], align 8
+@array2 = external dso_local global [10 x %struct.my_type2], align 8
+
+define void @addpt1(i64 %index, i64 %arg) {
+; CHECK-CPA-O0-LABEL:    addpt1:
+; CHECK-CPA-O0:          addpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-CPA-O0:          str	x{{[0-9]+}}, [[[REG1]], #8]
+;
+; CHECK-CPA-O3-LABEL:    addpt1:
+; CHECK-CPA-O3:          addpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-CPA-O3:          str	x{{[0-9]+}}, [[[REG1]], #8]
+;
+; CHECK-NOCPA-O0-LABEL:  addpt1:
+; CHECK-NOCPA-O0:        add	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-NOCPA-O0:        str	x{{[0-9]+}}, [[[REG1]], #8]
+
+; CHECK-NOCPA-O3-LABEL:  addpt1:
+; CHECK-NOCPA-O3:        add	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #4
+; CHECK-NOCPA-O3:        str	x{{[0-9]+}}, [[[REG1]], #8]
+entry:
+  %e2 = getelementptr inbounds %struct.my_type, ptr @array, i64 %index, i32 1
+  store i64 %arg, ptr %e2, align 8
+  ret void
+}
+
+define void @maddpt1(i32 %pos, ptr %val) {
+; CHECK-CPA-O0-LABEL:    maddpt1:
+; CHECK-CPA-O0:          maddpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O0:          str	q{{[0-9]+}}, [[[REG1]], #16]
+;
+; CHECK-CPA-O3-LABEL:    maddpt1:
+; CHECK-CPA-O3:          maddpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O3:          stp	q{{[0-9]+}}, q{{[0-9]+}}, [[[REG1]], #16]
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]]]
+;
+; CHECK-NOCPA-O0-LABEL:  maddpt1:
+; CHECK-NOCPA-O0:        smaddl	[[REG1:x[0-9]+]], w{{[0-9]+}}, w{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-NOCPA-O0:        str	q{{[0-9]+}}, [[[REG1]], #32]
+; CHECK-NOCPA-O0:        str	q{{[0-9]+}}, [[[REG1]], #16]
+; CHECK-NOCPA-O0:        str	q{{[0-9]+}}, [[[REG1]]]
+;
+; CHECK-NOCPA-O3-LABEL:  maddpt1:
+; CHECK-NOCPA-O3:        smaddl	[[REG1:x[0-9]+]], w{{[0-9]+}}, w{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-NOCPA-O3:        stp	q{{[0-9]+}}, q{{[0-9]+}}, [[[REG1]], #16]
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG1]]]
+entry:
+  %idxprom = sext i32 %pos to i64
+  %arrayidx = getelementptr inbounds [10 x %struct.my_type2], ptr @array2, i64 0, i64 %idxprom
+  tail call void @llvm.memcpy.p0.p0.i64(ptr align 8 dereferenceable(48) %arrayidx, ptr align 8 dereferenceable(48) %val, i64 48, i1 false)
+  ret void
+}
+
+define void @msubpt1(i32 %index, i32 %elem) {
+; CHECK-CPA-O0-LABEL:    msubpt1:
+; CHECK-CPA-O0:          msubpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O0:          str	q{{[0-9]+}}, [[[REG1]], #288]
+;
+; CHECK-CPA-O3-LABEL:    msubpt1:
+; CHECK-CPA-O3:          msubpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O3:          stp	q{{[0-9]+}}, q{{[0-9]+}}, [[[REG1]], #304]
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]], #288]
+;
+; CHECK-NOCPA-O0-LABEL:  msubpt1:
+; CHECK-NOCPA-O0:        mneg	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-NOCPA-O0:        add	[[REG2:x[0-9]+]], x{{[0-9]+}}, [[REG1]]
+; CHECK-NOCPA-O0:        str	q{{[0-9]+}}, [[[REG2]], #320]
+; CHECK-NOCPA-O0:        str	q{{[0-9]+}}, [[[REG2]], #304]
+; CHECK-NOCPA-O0:        str	q{{[0-9]+}}, [[[REG2]], #288]
+;
+; CHECK-NOCPA-O3-LABEL:  msubpt1:
+; CHECK-NOCPA-O3:        mneg	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-NOCPA-O3:        add	[[REG2:x[0-9]+]], x{{[0-9]+}}, [[REG1]]
+; CHECK-NOCPA-O3:        stp	q{{[0-9]+}}, q{{[0-9]+}}, [[[REG2]], #304]
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG2]], #288]
+entry:
+  %idx.ext = sext i32 %index to i64
+  %idx.neg = sub nsw i64 0, %idx.ext
+  %add.ptr = getelementptr inbounds %struct.my_type2, ptr getelementptr inbounds ([10 x %struct.my_type2], ptr @array2, i64 0, i64 6), i64 %idx.neg
+  tail call void @llvm.memcpy.p0.p0.i64(ptr align 8 dereferenceable(48) %add.ptr, ptr align 8 dereferenceable(48) getelementptr inbounds ([10 x %struct.my_type2], ptr @array2, i64 0, i64 2), i64 48, i1 false), !tbaa.struct !6
+  ret void
+}
+
+define void @subpt1(i32 %index, i32 %elem) {
+; CHECK-CPA-O0-LABEL:    subpt1:
+; CHECK-CPA-O0:          subpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O0:          str	q{{[0-9]+}}, [[[REG1]], #96]
+;
+; CHECK-CPA-O3-LABEL:    subpt1:
+; CHECK-CPA-O3:          subpt	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}
+; CHECK-CPA-O3:          str	q{{[0-9]+}}, [[[REG1]], #96]
+;
+; CHECK-NOCPA-O0-LABEL:  subpt1:
+; CHECK-NOCPA-O0:        subs	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #8
+; CHECK-NOCPA-O0:        str	q{{[0-9]+}}, [[[REG1]], #96]
+;
+; CHECK-NOCPA-O3-LABEL:  subpt1:
+; CHECK-NOCPA-O3:        sub	[[REG1:x[0-9]+]], x{{[0-9]+}}, x{{[0-9]+}}, lsl #8
+; CHECK-NOCPA-O3:        str	q{{[0-9]+}}, [[[REG1]], #96]
+entry:
+  %conv = sext i32 %index to i64
+  %mul.neg = mul nsw i64 %conv, -16
+  %add.ptr = getelementptr inbounds %struct.my_type, ptr getelementptr inbounds ([10 x %struct.my_type], ptr @array, i64 0, i64 6), i64 %mul.neg
+  tail call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %add.ptr, ptr noundef nonnull align 8 dereferenceable(16) getelementptr inbounds ([10 x %struct.my_type], ptr @array, i64 ...
[truncated]

arsenm · 2024-01-26T10:25:40Z

llvm/include/llvm/Target/TargetMachine.h

+  /// True if target has some form of pointer arithmetic checking.
+  /// Helps identify whether pointer arithmetic semantics should be preserved
+  /// for passes such as instruction selection.
+  virtual bool isPtrArithmeticChecked(const Function &F) const { return false; }


Don't see why this needs to be in TargetMachine. It seems to only depend on the subtarget, so can go directly there?

Plus the only use is in AArch64

arsenm · 2024-01-26T10:27:30Z

llvm/include/llvm/Target/TargetMachine.h

+  /// True if target has some form of pointer arithmetic checking.
+  /// Helps identify whether pointer arithmetic semantics should be preserved
+  /// for passes such as instruction selection.
+  virtual bool isPtrArithmeticChecked(const Function &F) const { return false; }


Plus the only use is in AArch64

arsenm · 2024-01-26T10:27:47Z

llvm/lib/Target/AArch64/GISel/AArch64InstructionSelector.cpp

+    // If Checked Pointer Arithmetic (FEAT_CPA) is present, preserve the pointer
+    // arithmetic semantics instead of falling back to regular arithmetic.
+    if (TM.isPtrArithmeticChecked(MF.getFunction()))
+      return false;


How is this supposed to work for SDAG?

davemgreen

I don't think the patterns will work for SDAG, it doesn't know the difference between a i64 and a ptr. Please add a test like this and make sure it doesn't miscompile:

define i64 @subi64(i64 %ptr, i32 %index) {
; CHECK-CPA-O0-LABEL: subi64:
; CHECK-CPA-O0:       // %bb.0: // %entry
; CHECK-CPA-O0-NEXT:    mov x8, #-1 // =0xffffffffffffffff
; CHECK-CPA-O0-NEXT:    addpt x0, x0, x8
; CHECK-CPA-O0-NEXT:    ret
;
; CHECK-CPA-O3-LABEL: subi64:
; CHECK-CPA-O3:       // %bb.0: // %entry
; CHECK-CPA-O3-NEXT:    mov x8, #-1 // =0xffffffffffffffff
; CHECK-CPA-O3-NEXT:    addpt x0, x0, x8
; CHECK-CPA-O3-NEXT:    ret
;
; CHECK-NOCPA-O0-LABEL: subi64:
; CHECK-NOCPA-O0:       // %bb.0: // %entry
; CHECK-NOCPA-O0-NEXT:    subs x0, x0, #1
; CHECK-NOCPA-O0-NEXT:    ret
;
; CHECK-NOCPA-O3-LABEL: subi64:
; CHECK-NOCPA-O3:       // %bb.0: // %entry
; CHECK-NOCPA-O3-NEXT:    sub x0, x0, #1
; CHECK-NOCPA-O3-NEXT:    ret
entry:
  %incdec.ptr.i.i.i = add i64 %ptr, -1
  ret i64 %incdec.ptr.i.i.i
}

It might be better too to have a GlobalISel patch and a SelectionDAG version.

rgwott requested review from jthackray, momchil-velikov, pratlucas and tmatheson-arm January 26, 2024 10:16

llvmbot added backend:AArch64 llvm:SelectionDAG SelectionDAGISel as well labels Jan 26, 2024

arsenm reviewed Jan 26, 2024

View reviewed changes

davemgreen reviewed Jan 26, 2024

View reviewed changes

rgwott closed this Jan 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AArch64] Add CodeGen support for FEAT_CPA #79569

[AArch64] Add CodeGen support for FEAT_CPA #79569

rgwott commented Jan 26, 2024

llvmbot commented Jan 26, 2024 •

edited

arsenm Jan 26, 2024

arsenm Jan 26, 2024

arsenm Jan 26, 2024

arsenm Jan 26, 2024

davemgreen left a comment

[AArch64] Add CodeGen support for FEAT_CPA #79569

[AArch64] Add CodeGen support for FEAT_CPA #79569

Conversation

rgwott commented Jan 26, 2024

llvmbot commented Jan 26, 2024 • edited

arsenm Jan 26, 2024

Choose a reason for hiding this comment

arsenm Jan 26, 2024

Choose a reason for hiding this comment

arsenm Jan 26, 2024

Choose a reason for hiding this comment

arsenm Jan 26, 2024

Choose a reason for hiding this comment

davemgreen left a comment

Choose a reason for hiding this comment

llvmbot commented Jan 26, 2024 •

edited