
[IR] Require that ptrmask mask matches pointer index size #69343

Merged
merged 3 commits on Oct 24, 2023

Conversation

@nikic (Contributor) commented Oct 17, 2023

Currently, we specify that the ptrmask intrinsic allows the mask to have any size, which will be zero-extended or truncated to the pointer size.

However, what the semantics of the specified GEP expansion actually imply is that the mask is only meaningful up to the pointer type index size -- any higher bits of the pointer are always preserved. In other words, the mask is 1-extended from the index size to the pointer size. This is also the behavior we want for CHERI architectures.

This PR makes two changes:

  • It spells out the interaction with the pointer type index size more explicitly.
  • It requires that the mask matches the pointer type index size. The intention here is to make handling of this intrinsic more robust and to avoid accidental mix-ups of pointer size and index size in code generating this intrinsic. If a zero-extend or truncate of the mask is desired, it should be done explicitly in IR. This also cuts down on the amount of testing we have to do and the number of cases transforms need to check for.

As far as I can tell, we don't actually support pointers with a different index type size at the SDAG level, so I'm just asserting that the sizes match there for now. Out-of-tree targets using different index sizes may need to adjust that code.
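For illustration, the semantics described above can be modeled in Python (a hypothetical sketch, not part of the patch; the `ptr_bits`/`idx_bits` values are arbitrary examples):

```python
# Hypothetical model of the proposed ptrmask semantics.
# The mask has the width of the pointer *index* type; bits of the
# pointer above the index size are always preserved.
def ptrmask(ptr: int, mask: int, ptr_bits: int = 64, idx_bits: int = 32) -> int:
    idx_mod = 1 << idx_bits
    intptr = ptr % idx_mod              # ptrtoint to iPtrIdx (may truncate)
    masked = intptr & mask              # and iPtrIdx
    diff = (masked - intptr) % idx_mod  # sub iPtrIdx (modular)
    low = (ptr + diff) % idx_mod        # GEP adds diff to the index bits only
    return (ptr - ptr % idx_mod) + low  # high bits preserved

# Equivalent view on the integer representation: AND with the mask
# 1-extended from idx_bits to ptr_bits.
def ptrmask_ext(ptr: int, mask: int, ptr_bits: int = 64, idx_bits: int = 32) -> int:
    ones_above = ((1 << ptr_bits) - 1) ^ ((1 << idx_bits) - 1)
    return ptr & (mask | ones_above)
```

Both functions agree for any pointer value that fits in `ptr_bits`, which is the equivalence the new LangRef wording spells out.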

github-actions bot commented Oct 17, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@llvmbot (Collaborator) commented Oct 18, 2023

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-amdgpu

Author: Nikita Popov (nikic)

Changes: same as the PR description above.


Patch is 30.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/69343.diff

13 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+16-5)
  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+2-2)
  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+4-3)
  • (modified) llvm/lib/IR/Verifier.cpp (+4)
  • (modified) llvm/test/CodeGen/AArch64/lower-ptrmask.ll (-15)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-ptrmask.ll (-108)
  • (modified) llvm/test/CodeGen/AMDGPU/ptrmask.ll (+3-160)
  • (modified) llvm/test/CodeGen/X86/lower-ptrmask.ll (-16)
  • (modified) llvm/test/Transforms/InferAddressSpaces/AMDGPU/ptrmask.ll (-16)
  • (modified) llvm/test/Transforms/InferAlignment/ptrmask.ll (+28)
  • (modified) llvm/test/Transforms/InstCombine/align-addr.ll (-28)
  • (modified) llvm/test/Transforms/InstCombine/consecutive-ptrmask.ll (+15-37)
  • (modified) llvm/test/Verifier/ptrmask.ll (+18)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 2035091be5a6840..53dc00a41e01e96 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -26952,7 +26952,8 @@ Arguments:
 """"""""""
 
 The first argument is a pointer or vector of pointers. The second argument is
-an integer or vector of integers.
+an integer or vector of integers with the same bit width as the index type
+size of the first argument.
 
 Overview:
 """"""""""
@@ -26965,10 +26966,20 @@ to facilitate alias analysis and underlying-object detection.
 Semantics:
 """"""""""
 
-The result of ``ptrmask(ptr, mask)`` is equivalent to
-``getelementptr ptr, (ptrtoint(ptr) & mask) - ptrtoint(ptr)``. Both the returned
-pointer(s) and the first argument are based on the same underlying object (for more
-information on the *based on* terminology see
+The result of ``ptrmask(%ptr, %mask)`` is equivalent to the following expansion,
+where ``iPtrIdx`` is the index type size of the pointer::
+
+    %intptr = ptrtoint ptr %ptr to iPtrIdx ; this may truncate
+    %masked = and iPtrIdx %intptr, %mask
+    %diff = sub iPtrIdx %masked, %intptr
+    %result = getelementptr i8, ptr %ptr, iPtrIdx %diff
+
+Considering this as an operation on the integer representation of the pointer,
+if the pointer index type size is smaller than the pointer type size, this
+implies that the mask is extended with 1 bits to the pointer type size.
+
+Both the returned pointer(s) and the first argument are based on the same
+underlying object (for more information on the *based on* terminology see
 :ref:`the pointer aliasing rules <pointeraliasing>`). If the bitwidth of the
 mask argument does not match the pointer size of the target, the mask is
 zero-extended or truncated accordingly.
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 1e0281b3f1bd79e..9925fdabb5aa993 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -1637,8 +1637,8 @@ static void computeKnownBitsFromOperator(const Operator *I,
         const Value *Mask = I->getOperand(1);
         Known2 = KnownBits(Mask->getType()->getScalarSizeInBits());
         computeKnownBits(Mask, Known2, Depth + 1, Q);
-        // This is basically a pointer typed and.
-        Known &= Known2.zextOrTrunc(Known.getBitWidth());
+        // TODO: 1-extend would be more precise.
+        Known &= Known2.anyextOrTrunc(BitWidth);
         break;
       }
       case Intrinsic::x86_sse42_crc32_64_64:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 4bb0ba6f083109b..4858e17e2649439 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -7420,11 +7420,12 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
   }
   case Intrinsic::ptrmask: {
     SDValue Ptr = getValue(I.getOperand(0));
-    SDValue Const = getValue(I.getOperand(1));
+    SDValue Mask = getValue(I.getOperand(1));
 
     EVT PtrVT = Ptr.getValueType();
-    setValue(&I, DAG.getNode(ISD::AND, sdl, PtrVT, Ptr,
-                             DAG.getZExtOrTrunc(Const, sdl, PtrVT)));
+    assert(PtrVT == Mask.getValueType() &&
+           "Pointers with different index type are not supported by SDAG");
+    setValue(&I, DAG.getNode(ISD::AND, sdl, PtrVT, Ptr, Mask));
     return;
   }
   case Intrinsic::threadlocal_address: {
diff --git a/llvm/lib/IR/Verifier.cpp b/llvm/lib/IR/Verifier.cpp
index 5a3328416db3eb0..6059baf75832863 100644
--- a/llvm/lib/IR/Verifier.cpp
+++ b/llvm/lib/IR/Verifier.cpp
@@ -5964,6 +5964,10 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
             "llvm.ptrmask intrinsic arguments must have the same number of "
             "elements",
             &Call);
+    Check(DL.getIndexTypeSizeInBits(Ty0) == Ty1->getScalarSizeInBits(),
+          "llvm.ptrmask intrinsic second argument bitwidth must match "
+          "pointer index type size of first argument",
+          &Call);
     break;
   }
   };
diff --git a/llvm/test/CodeGen/AArch64/lower-ptrmask.ll b/llvm/test/CodeGen/AArch64/lower-ptrmask.ll
index aceabf27d083f94..30dacea4286af73 100644
--- a/llvm/test/CodeGen/AArch64/lower-ptrmask.ll
+++ b/llvm/test/CodeGen/AArch64/lower-ptrmask.ll
@@ -12,18 +12,3 @@ define ptr @test1(ptr %src) {
   %ptr = call ptr @llvm.ptrmask.p0.i64(ptr %src, i64 72057594037927928)
   ret ptr %ptr
 }
-
-declare ptr @llvm.ptrmask.p0.i32(ptr, i32)
-
-; CHECK-LABEL: name: test2
-; CHECK:         %0:gpr64 = COPY $x0
-; CHECK-NEXT:    %1:gpr32 = MOVi32imm 10000
-; CHECK-NEXT:    %2:gpr64 = SUBREG_TO_REG 0, killed %1, %subreg.sub_32
-; CHECK-NEXT:    %3:gpr64 = ANDXrr %0, killed %2
-; CHECK-NEXT:    $x0 = COPY %3
-; CHECK-NEXT:    RET_ReallyLR implicit $x0
-
-define ptr @test2(ptr %src) {
-  %ptr = call ptr @llvm.ptrmask.p0.i32(ptr %src, i32 10000)
-  ret ptr %ptr
-}
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-ptrmask.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-ptrmask.ll
index 8eb0658f8023b1d..7a8e521817a37f4 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-ptrmask.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-ptrmask.ll
@@ -21,78 +21,6 @@ define ptr @ptrmask_flat_i64(ptr %ptr, i64 %mask) {
   ret ptr %masked
 }
 
-define ptr @ptrmask_flat_i32(ptr %ptr, i32 %mask) {
-  ; CHECK-LABEL: name: ptrmask_flat_i32
-  ; CHECK: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   liveins: $vgpr0, $vgpr1, $vgpr2
-  ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-  ; CHECK-NEXT:   [[MV:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
-  ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
-  ; CHECK-NEXT:   [[PTRMASK:%[0-9]+]]:_(p0) = G_PTRMASK [[MV]], [[COPY2]](s32)
-  ; CHECK-NEXT:   [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[PTRMASK]](p0)
-  ; CHECK-NEXT:   $vgpr0 = COPY [[UV]](s32)
-  ; CHECK-NEXT:   $vgpr1 = COPY [[UV1]](s32)
-  ; CHECK-NEXT:   SI_RETURN implicit $vgpr0, implicit $vgpr1
-  %masked = call ptr @llvm.ptrmask.p0.i32(ptr %ptr, i32 %mask)
-  ret ptr %masked
-}
-
-define ptr @ptrmask_flat_i16(ptr %ptr, i16 %mask) {
-  ; CHECK-LABEL: name: ptrmask_flat_i16
-  ; CHECK: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   liveins: $vgpr0, $vgpr1, $vgpr2
-  ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-  ; CHECK-NEXT:   [[MV:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
-  ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
-  ; CHECK-NEXT:   [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY2]](s32)
-  ; CHECK-NEXT:   [[PTRMASK:%[0-9]+]]:_(p0) = G_PTRMASK [[MV]], [[TRUNC]](s16)
-  ; CHECK-NEXT:   [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[PTRMASK]](p0)
-  ; CHECK-NEXT:   $vgpr0 = COPY [[UV]](s32)
-  ; CHECK-NEXT:   $vgpr1 = COPY [[UV1]](s32)
-  ; CHECK-NEXT:   SI_RETURN implicit $vgpr0, implicit $vgpr1
-  %masked = call ptr @llvm.ptrmask.p0.i16(ptr %ptr, i16 %mask)
-  ret ptr %masked
-}
-
-define ptr @ptrmask_flat_i1(ptr %ptr, i1 %mask) {
-  ; CHECK-LABEL: name: ptrmask_flat_i1
-  ; CHECK: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   liveins: $vgpr0, $vgpr1, $vgpr2
-  ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-  ; CHECK-NEXT:   [[MV:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
-  ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
-  ; CHECK-NEXT:   [[TRUNC:%[0-9]+]]:_(s1) = G_TRUNC [[COPY2]](s32)
-  ; CHECK-NEXT:   [[PTRMASK:%[0-9]+]]:_(p0) = G_PTRMASK [[MV]], [[TRUNC]](s1)
-  ; CHECK-NEXT:   [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[PTRMASK]](p0)
-  ; CHECK-NEXT:   $vgpr0 = COPY [[UV]](s32)
-  ; CHECK-NEXT:   $vgpr1 = COPY [[UV1]](s32)
-  ; CHECK-NEXT:   SI_RETURN implicit $vgpr0, implicit $vgpr1
-  %masked = call ptr @llvm.ptrmask.p0.i1(ptr %ptr, i1 %mask)
-  ret ptr %masked
-}
-
-define ptr addrspace(3) @ptrmask_local_i64(ptr addrspace(3) %ptr, i64 %mask) {
-  ; CHECK-LABEL: name: ptrmask_local_i64
-  ; CHECK: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   liveins: $vgpr0, $vgpr1, $vgpr2
-  ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(p3) = COPY $vgpr0
-  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-  ; CHECK-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
-  ; CHECK-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY1]](s32), [[COPY2]](s32)
-  ; CHECK-NEXT:   [[PTRMASK:%[0-9]+]]:_(p3) = G_PTRMASK [[COPY]], [[MV]](s64)
-  ; CHECK-NEXT:   $vgpr0 = COPY [[PTRMASK]](p3)
-  ; CHECK-NEXT:   SI_RETURN implicit $vgpr0
-  %masked = call ptr addrspace(3) @llvm.ptrmask.p3.i64(ptr addrspace(3) %ptr, i64 %mask)
-  ret ptr addrspace(3) %masked
-}
-
 define ptr addrspace(3) @ptrmask_local_i32(ptr addrspace(3) %ptr, i32 %mask) {
   ; CHECK-LABEL: name: ptrmask_local_i32
   ; CHECK: bb.1 (%ir-block.0):
@@ -107,36 +35,6 @@ define ptr addrspace(3) @ptrmask_local_i32(ptr addrspace(3) %ptr, i32 %mask) {
   ret ptr addrspace(3) %masked
 }
 
-define ptr addrspace(3) @ptrmask_local_i16(ptr addrspace(3) %ptr, i16 %mask) {
-  ; CHECK-LABEL: name: ptrmask_local_i16
-  ; CHECK: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   liveins: $vgpr0, $vgpr1
-  ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(p3) = COPY $vgpr0
-  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-  ; CHECK-NEXT:   [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32)
-  ; CHECK-NEXT:   [[PTRMASK:%[0-9]+]]:_(p3) = G_PTRMASK [[COPY]], [[TRUNC]](s16)
-  ; CHECK-NEXT:   $vgpr0 = COPY [[PTRMASK]](p3)
-  ; CHECK-NEXT:   SI_RETURN implicit $vgpr0
-  %masked = call ptr addrspace(3) @llvm.ptrmask.p3.i16(ptr addrspace(3) %ptr, i16 %mask)
-  ret ptr addrspace(3) %masked
-}
-
-define ptr addrspace(3) @ptrmask_local_i1(ptr addrspace(3) %ptr, i1 %mask) {
-  ; CHECK-LABEL: name: ptrmask_local_i1
-  ; CHECK: bb.1 (%ir-block.0):
-  ; CHECK-NEXT:   liveins: $vgpr0, $vgpr1
-  ; CHECK-NEXT: {{  $}}
-  ; CHECK-NEXT:   [[COPY:%[0-9]+]]:_(p3) = COPY $vgpr0
-  ; CHECK-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
-  ; CHECK-NEXT:   [[TRUNC:%[0-9]+]]:_(s1) = G_TRUNC [[COPY1]](s32)
-  ; CHECK-NEXT:   [[PTRMASK:%[0-9]+]]:_(p3) = G_PTRMASK [[COPY]], [[TRUNC]](s1)
-  ; CHECK-NEXT:   $vgpr0 = COPY [[PTRMASK]](p3)
-  ; CHECK-NEXT:   SI_RETURN implicit $vgpr0
-  %masked = call ptr addrspace(3) @llvm.ptrmask.p3.i1(ptr addrspace(3) %ptr, i1 %mask)
-  ret ptr addrspace(3) %masked
-}
-
 ; Seems to not work
 ; define <2 x ptr> @ptrmask_flat_i64_v2(<2 x ptr> %ptr, <2 x i64> %mask) {
 ;   %masked = call <2 x ptr> @llvm.ptrmask.v2p0.v2i64(<2 x ptr> %ptr, <2 x i64> %mask)
@@ -144,10 +42,4 @@ define ptr addrspace(3) @ptrmask_local_i1(ptr addrspace(3) %ptr, i1 %mask) {
 ; }
 
 declare ptr @llvm.ptrmask.p0.i64(ptr, i64)
-declare ptr @llvm.ptrmask.p0.i32(ptr, i32)
-declare ptr @llvm.ptrmask.p0.i16(ptr, i16)
-declare ptr @llvm.ptrmask.p0.i1(ptr, i1)
-declare ptr addrspace(3) @llvm.ptrmask.p3.i64(ptr addrspace(3), i64)
 declare ptr addrspace(3) @llvm.ptrmask.p3.i32(ptr addrspace(3), i32)
-declare ptr addrspace(3) @llvm.ptrmask.p3.i16(ptr addrspace(3), i16)
-declare ptr addrspace(3) @llvm.ptrmask.p3.i1(ptr addrspace(3), i1)
diff --git a/llvm/test/CodeGen/AMDGPU/ptrmask.ll b/llvm/test/CodeGen/AMDGPU/ptrmask.ll
index c69c55338f84d08..70622706789331f 100644
--- a/llvm/test/CodeGen/AMDGPU/ptrmask.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptrmask.ll
@@ -21,71 +21,6 @@ define ptr addrspace(1) @v_ptrmask_global_variable_i64(ptr addrspace(1) %ptr, i6
   ret ptr addrspace(1) %masked
 }
 
-define ptr addrspace(1) @v_ptrmask_global_variable_i32(ptr addrspace(1) %ptr, i32 %mask) {
-; GCN-LABEL: v_ptrmask_global_variable_i32:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT:    v_and_b32_e32 v0, v0, v2
-; GCN-NEXT:    v_mov_b32_e32 v1, 0
-; GCN-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-LABEL: v_ptrmask_global_variable_i32:
-; GFX10:       ; %bb.0:
-; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    v_and_b32_e32 v0, v0, v2
-; GFX10-NEXT:    v_mov_b32_e32 v1, 0
-; GFX10-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-LABEL: v_ptrmask_global_variable_i32:
-; GFX11:       ; %bb.0:
-; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_and_b32 v0, v0, v2
-; GFX11-NEXT:    s_setpc_b64 s[30:31]
-  %masked = call ptr addrspace(1) @llvm.ptrmask.p1.i32(ptr addrspace(1) %ptr, i32 %mask)
-  ret ptr addrspace(1) %masked
-}
-
-define ptr addrspace(1) @v_ptrmask_global_variable_i16(ptr addrspace(1) %ptr, i16 %mask) {
-; GCN-LABEL: v_ptrmask_global_variable_i16:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT:    v_and_b32_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0
-; GCN-NEXT:    v_mov_b32_e32 v1, 0
-; GCN-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-LABEL: v_ptrmask_global_variable_i16:
-; GFX10:       ; %bb.0:
-; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    v_and_b32_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0
-; GFX10-NEXT:    v_mov_b32_e32 v1, 0
-; GFX10-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-LABEL: v_ptrmask_global_variable_i16:
-; GFX11:       ; %bb.0:
-; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_and_b32_e32 v1, 0xffff, v2
-; GFX11-NEXT:    v_dual_mov_b32 v1, 0 :: v_dual_and_b32 v0, v0, v1
-; GFX11-NEXT:    s_setpc_b64 s[30:31]
-  %masked = call ptr addrspace(1) @llvm.ptrmask.p1.i16(ptr addrspace(1) %ptr, i16 %mask)
-  ret ptr addrspace(1) %masked
-}
-
-define ptr addrspace(3) @v_ptrmask_local_variable_i64(ptr addrspace(3) %ptr, i64 %mask) {
-; GCN-LABEL: v_ptrmask_local_variable_i64:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT:    v_and_b32_e32 v0, v0, v1
-; GCN-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10PLUS-LABEL: v_ptrmask_local_variable_i64:
-; GFX10PLUS:       ; %bb.0:
-; GFX10PLUS-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10PLUS-NEXT:    v_and_b32_e32 v0, v0, v1
-; GFX10PLUS-NEXT:    s_setpc_b64 s[30:31]
-  %masked = call ptr addrspace(3) @llvm.ptrmask.p3.i64(ptr addrspace(3) %ptr, i64 %mask)
-  ret ptr addrspace(3) %masked
-}
-
 define ptr addrspace(3) @v_ptrmask_local_variable_i32(ptr addrspace(3) %ptr, i32 %mask) {
 ; GCN-LABEL: v_ptrmask_local_variable_i32:
 ; GCN:       ; %bb.0:
@@ -102,29 +37,6 @@ define ptr addrspace(3) @v_ptrmask_local_variable_i32(ptr addrspace(3) %ptr, i32
   ret ptr addrspace(3) %masked
 }
 
-define ptr addrspace(3) @v_ptrmask_local_variable_i16(ptr addrspace(3) %ptr, i16 %mask) {
-; GCN-LABEL: v_ptrmask_local_variable_i16:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT:    v_and_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0
-; GCN-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX10-LABEL: v_ptrmask_local_variable_i16:
-; GFX10:       ; %bb.0:
-; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10-NEXT:    v_and_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0
-; GFX10-NEXT:    s_setpc_b64 s[30:31]
-;
-; GFX11-LABEL: v_ptrmask_local_variable_i16:
-; GFX11:       ; %bb.0:
-; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_and_b32_e32 v1, 0xffff, v1
-; GFX11-NEXT:    v_and_b32_e32 v0, v0, v1
-; GFX11-NEXT:    s_setpc_b64 s[30:31]
-  %masked = call ptr addrspace(3) @llvm.ptrmask.p3.i16(ptr addrspace(3) %ptr, i16 %mask)
-  ret ptr addrspace(3) %masked
-}
-
 define amdgpu_ps ptr addrspace(1) @s_ptrmask_global_variable_i64(ptr addrspace(1) inreg %ptr, i64 inreg %mask) {
 ; GCN-LABEL: s_ptrmask_global_variable_i64:
 ; GCN:       ; %bb.0:
@@ -139,58 +51,6 @@ define amdgpu_ps ptr addrspace(1) @s_ptrmask_global_variable_i64(ptr addrspace(1
   ret ptr addrspace(1) %masked
 }
 
-define amdgpu_ps ptr addrspace(1) @s_ptrmask_global_variable_i32(ptr addrspace(1) inreg %ptr, i32 inreg %mask) {
-; GCN-LABEL: s_ptrmask_global_variable_i32:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_mov_b32 s5, 0
-; GCN-NEXT:    s_and_b64 s[0:1], s[2:3], s[4:5]
-; GCN-NEXT:    s_mov_b32 s1, 0
-; GCN-NEXT:    ; return to shader part epilog
-;
-; GFX10PLUS-LABEL: s_ptrmask_global_variable_i32:
-; GFX10PLUS:       ; %bb.0:
-; GFX10PLUS-NEXT:    s_mov_b32 s5, 0
-; GFX10PLUS-NEXT:    s_and_b64 s[0:1], s[2:3], s[4:5]
-; GFX10PLUS-NEXT:    s_mov_b32 s1, 0
-; GFX10PLUS-NEXT:    ; return to shader part epilog
-  %masked = call ptr addrspace(1) @llvm.ptrmask.p1.i32(ptr addrspace(1) %ptr, i32 %mask)
-  ret ptr addrspace(1) %masked
-}
-
-define amdgpu_ps ptr addrspace(1) @s_ptrmask_global_variable_i16(ptr addrspace(1) inreg %ptr, i16 inreg %mask) {
-; GCN-LABEL: s_ptrmask_global_variable_i16:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_and_b32 s0, s4, 0xffff
-; GCN-NEXT:    s_mov_b32 s1, 0
-; GCN-NEXT:    s_and_b64 s[0:1], s[2:3], s[0:1]
-; GCN-NEXT:    s_mov_b32 s1, 0
-; GCN-NEXT:    ; return to shader part epilog
-;
-; GFX10PLUS-LABEL: s_ptrmask_global_variable_i16:
-; GFX10PLUS:       ; %bb.0:
-; GFX10PLUS-NEXT:    s_mov_b32 s1, 0
-; GFX10PLUS-NEXT:    s_and_b32 s0, s4, 0xffff
-; GFX10PLUS-NEXT:    s_and_b64 s[0:1], s[2:3], s[0:1]
-; GFX10PLUS-NEXT:    s_mov_b32 s1, 0
-; GFX10PLUS-NEXT:    ; return to shader part epilog
-  %masked = call ptr addrspace(1) @llvm.ptrmask.p1.i16(ptr addrspace(1) %ptr, i16 %mask)
-  ret ptr addrspace(1) %masked
-}
-
-define amdgpu_ps ptr addrspace(3) @s_ptrmask_local_variable_i64(ptr addrspace(3) inreg %ptr, i64 inreg %mask) {
-; GCN-LABEL: s_ptrmask_local_variable_i64:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_and_b32 s0, s2, s3
-; GCN-NEXT:    ; return to shader part epilog
-;
-; GFX10PLUS-LABEL: s_ptrmask_local_variable_i64:
-; GFX10PLUS:       ; %bb.0:
-; GFX10PLUS-NEXT:    s_and_b32 s0, s2, s3
-; GFX10PLUS-NEXT:    ; return to shader part epilog
-  %masked = call ptr addrspace(3) @llvm.ptrmask.p3.i64(ptr addrspace(3) %ptr, i64 %mask)
-  ret ptr addrspace(3) %masked
-}
-
 define amdgpu_ps ptr addrspace(3) @s_ptrmask_local_variable_i32(ptr addrspace(3) inreg %ptr, i32 inreg %mask) {
 ; GCN-LABEL: s_ptrmask_local_variable_i32:
 ; GCN:       ; %bb.0:
@@ -205,27 +65,10 @@ define amdgpu_ps ptr addrspace(3) @s_ptrmask_local_variable_i32(ptr addrspace(3)
   ret ptr addrspace(3) %masked
 }
 
-define amdgpu_ps ptr addrspace(3) @s_ptrmask_local_variable_i16(ptr addrspace(3) inreg %ptr, i16 inreg %mask) {
-; GCN-LABEL: s_ptrmask_local_variable_i16:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_and_b32 s0, 0xffff, s3
-; GCN-NEXT:    s_and_b32 s0, s2, s0
-; GCN-NEXT:    ; return to shader part epilog
-;
-; GFX10PLUS-LABEL: s_ptrmask_local_variable_i16:
-; GFX10PLUS:       ; %bb.0:
-; GFX10PLUS-NEXT:    s_and_b32 s0, 0xffff, s3
-; GFX10PLUS-NEXT:    s_and_b32 s0, s2, s0
-; GFX10PLUS-NEXT:    ; return to shader part epilog
-  %masked = call ptr addrspace(3) @llvm.ptrmask.p3.i16(ptr addrspace(3) %ptr, i16 %mask)
-  ret ptr addrspace(3) %masked
-}
-
-declare ptr addrspace(3) @llvm.ptrmask.p3.i64(ptr addrspace(3), i64) #0
 declare ptr addrspace(3) @llvm.ptrmask.p3.i32(ptr addrspace(3), i32) #0
-declare ptr addrspace(3) @llvm.ptrmask.p3.i16(ptr addrspace(3), i16) #0
 declare ptr addrspace(1) @llvm.ptrmask.p1.i64(ptr addrspace(1), i64) #0
-declare ptr addrspace(1) @llvm.ptrmask.p1.i32(ptr addrspace(1), i32) #0
-declare ptr addrspace(1) @llvm.ptrmask.p1.i16(ptr addrspace(1), i16) #0
 
 attributes #0 = { nounwind readnone speculatable willreturn }
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; GFX10: {{.*}}
+; GFX11: {{.*}}
diff --git a/llvm/test/CodeGen/X86/lower-ptrmask.ll b/llvm/test/CodeGen/X86/lower-ptrmask.ll
index 185564e5a07ae5c..406241ecfff0271 100644
--- a/llvm/test/CodeGen/X86/lower-ptrmask.ll
+++ b/llvm/test/CodeGen/X86/lower-ptrmask.ll
@@ -14,22 +14,6 @@ define ptr @test1(ptr %src) {
   ret ptr %ptr
 }
 
-declare ptr @llvm.ptrmask.p0.i32(ptr, i32)
-
-; CHECK-LABEL: name: test2
-; CHECK:         %0:gr64 = COPY $rdi
-; CHECK-NEXT:    %1:gr32 = COPY %0.sub_32bit
-; CHECK-NEXT:    %2:gr32 = AND32ri %1, 10000, implicit-def dead $eflags
-; CHECK-NEXT:    %3:gr64 = SUBREG_TO_REG 0, killed %2, %subreg.sub_32bit
-; CHECK-N...
[truncated]

Currently, the ptrmask intrinsic allows the mask to have any
size and will zero-extend or truncate it to the pointer size.
However, per the specified semantics, what we would actually
need to do is to first zero-extend or truncate it to the pointer
index size and then 1-extend it to the pointer size. This seems
to leave a lot of room for error, so this patch proposes to
make the intrinsic stricter:

It now requires that the mask type matches the pointer index type
-- a zext or trunc can be done explicitly in IR and should not
be part of the intrinsic. Also spell out that the mask is 1-extended
to the pointer size if we're talking about the integer representation
(this is implied by the GEP expansion).

%intptr = ptrtoint ptr %ptr to iPtrIdx ; this may truncate
%masked = and iPtrIdx %intptr, %mask
%diff = sub iPtrIdx %masked, %intptr
Contributor

I don't think this expansion works if the result of the sub is positive, does it?

Member

But %diff will always be negative right? %masked must be less than %intptr since it's the result of an and.

Contributor

I mean the case where the subtraction overflows so the sign bit of %diff is 0.

Contributor Author

Do you maybe have an example of what could go wrong here? I don't get it.

The sub here gets cancelled out by the add the GEP does, I don't think overflow is relevant.

Collaborator

Indeed, it works specifically because everything uses iPtrIdx arithmetic; in two's complement it does not matter whether things are positive or negative -- it's just modular arithmetic.
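The modular-arithmetic argument can be checked exhaustively for a small index width (a hypothetical Python sketch using an 8-bit iPtrIdx):

```python
# The sub in the expansion and the add done by the GEP cancel
# modulo 2**8, so the sign/overflow of %diff never matters.
IDX = 1 << 8

def expand(intptr: int, mask: int) -> int:
    masked = intptr & mask
    diff = (masked - intptr) % IDX   # may wrap around; that's fine
    return (intptr + diff) % IDX     # the GEP's modular add

# For every 8-bit pointer value and mask, the expansion is a plain AND.
assert all(expand(p, m) == (p & m) for p in range(IDX) for m in range(IDX))
```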

Contributor

I was thinking of an example like this, with 16-bit pointers and 8-bit indexes:

  %ptr = ptr 0x6789
  %mask = i8 0x00
  // ptrmask(%ptr, %mask) expands to:
  %intptr = i8 0x89
  %masked = i8 0x00
  %diff = i8 0x77
  %result = getelementptr i8, ptr 0x6789, i8 0x77
          = ptr 0x6800 // not 0x6700!

Have I misunderstood how getelementptr works?

Contributor Author

getelementptr only works on the index bits of the pointer. So the result would be 0x6700, not 0x6800.
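The 16-bit-pointer / 8-bit-index example from the earlier comment, reworked with the GEP add applied only to the index bits (a hypothetical Python sketch):

```python
# With GEP arithmetic confined to the index (low) bits, the high byte
# of the pointer survives, giving 0x6700 rather than 0x6800.
IDX_BITS = 8
IDX = 1 << IDX_BITS

ptr, mask = 0x6789, 0x00
intptr = ptr % IDX                 # 0x89
masked = intptr & mask             # 0x00
diff = (masked - intptr) % IDX     # 0x77
low = (intptr + diff) % IDX        # (0x89 + 0x77) % 0x100 = 0x00
result = (ptr & ~(IDX - 1)) | low  # high byte preserved
assert result == 0x6700            # not 0x6800
```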

Contributor

OK but I don't see that documented in the semantics: https://llvm.org/docs/LangRef.html#id234

Contributor

Can you please update the documentation of the semantics of getelementptr?

Contributor Author

Done as part of #70015.

@arichardson (Member) left a comment

Thanks, this looks good to me.

@jrtc27 do you agree this makes sense for CHERI?

@@ -1,6 +1,8 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
; RUN: opt < %s -passes=infer-alignment -S | FileCheck %s

target datalayout = "p1:64:64:64:32-p2:64:64:64:128"
Member

I'm not sure an index size > pointer bitwidth makes much sense, but testing that this works sounds good to me.

Contributor Author

I did wonder whether we should make index size > pointer size an invalid data layout...

Member

I'd be happy with that change - I believe I had that as part of https://reviews.llvm.org/D135158 but probably best done in a separate PR.

@goldsteinn (Contributor)

LGTM from my end.

@jrtc27 (Collaborator) commented Oct 18, 2023

This is also the behavior we want for CHERI architectures.

That's not how we think of it; we think of it as "the mask is extended to the index size and applies only to the index" -- there are never any 128-bit integers involved, even abstractly. Note that we do not support ptrtoint/inttoptr of capabilities in a preserving way; they give you only the address portion (or a capability with all-zero metadata, respectively).

``getelementptr ptr, (ptrtoint(ptr) & mask) - ptrtoint(ptr)``. Both the returned
pointer(s) and the first argument are based on the same underlying object (for more
information on the *based on* terminology see
The result of ``ptrmask(%ptr, %mask)`` is equivalent to the following expansion,
Collaborator

Is this actually true (both before and after)? I would hope that ptrmask does not have the address-exposed semantics that ptrtoint does?

@nikic (Contributor Author) commented Oct 19, 2023

I've added a sentence to clarify that ptrmask only captures via the return value.

Comment on lines 26978 to 26979
if the pointer index type size is smaller than the pointer type size, this
implies that the mask is extended with 1 bits to the pointer type size.
Member

Suggested change
if the pointer index type size is smaller than the pointer type size, this
implies that the mask is extended with 1 bits to the pointer type size.
if the pointer index type size is smaller than the pointer type size, this
implies pointer bits beyond the index size are not affected by this intrinsic
(as if the mask was extended with 1 bits to the pointer type size for integral pointers).

Maybe something like this addresses @jrtc27's comment that non-index bits are not touched at all for CHERI?

Collaborator

Something like that does indeed sound more appropriate for pointers that aren't just plain integers (though technically our capabilities still report as integral, since non-integral pointers are overly restrictive in their semantics: they imply that ptrtoint gives an unstable value, but ours are as stable as any normal integral pointer).

llvm/docs/LangRef.rst (review thread resolved)

@nikic (Contributor Author) commented Oct 23, 2023

@jrtc27 Just to double check, are you happy with the current LangRef wording?

@jrtc27 (Collaborator) commented Oct 23, 2023

Yes, thanks

@nikic nikic merged commit eb86de6 into llvm:main Oct 24, 2023
4 checks passed
goldsteinn added a commit to goldsteinn/llvm-project that referenced this pull request Nov 4, 2023
Since PRs llvm#69343 and llvm#67166 we probably have enough support for
`llvm.ptrmask` to make it preferable to the GEP strategy.
goldsteinn added a commit that referenced this pull request Nov 4, 2023
Since PRs #69343 and #67166 we probably have enough support for
`llvm.ptrmask` to make it preferable to the GEP strategy.

Closes #71238
7 participants