[AArch64][SVE] Implement demanded bits for @llvm.aarch64.sve.cntp #168714

MacDue · 2025-11-19T14:37:42Z

This allows DemandedBits to see that the SVE CNTP intrinsic will only ever produce small positive integers. The maximum value you could get here is 256, which is CNTP on a nxv16i1 on a machine with a 2048bit vector size (the maximum for SVE).

Using this various redundant operations (zexts, sexts, ands, ors, etc) can be eliminated.

This allows DemandedBits to see that the SVE CNTP intrinsic will only ever produce small positive integers. The maximum value you could get here is 256, which is CNTP on a nxv16i1 on a machine with a 2048bit vector size (the maximum for SVE). Using this various redundant operations (zexts, sexts, ands, ors, etc) can be eliminated.

llvmbot · 2025-11-19T14:38:16Z

@llvm/pr-subscribers-backend-aarch64

Author: Benjamin Maxwell (MacDue)

Changes

This allows DemandedBits to see that the SVE CNTP intrinsic will only ever produce small positive integers. The maximum value you could get here is 256, which is CNTP on a nxv16i1 on a machine with a 2048bit vector size (the maximum for SVE).

Using this various redundant operations (zexts, sexts, ands, ors, etc) can be eliminated.

Full diff: https://github.com/llvm/llvm-project/pull/168714.diff

3 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+27-1)
(modified) llvm/test/CodeGen/AArch64/sve-vector-compress.ll (+5-6)
(modified) llvm/test/CodeGen/AArch64/vscale-and-sve-cnt-demandedbits.ll (+56)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 8f41f230b5521..809c2af499958 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -19459,6 +19459,32 @@ static std::optional<unsigned> IsSVECntIntrinsic(SDValue S) {
   return {};
 }
 
+// Returns the element size associated with an SVE cnt[bhwdp] intrinsic. For
+// cntp (predicate), the element size corresponds to the legal (packed) SVE
+// vector type associated with the predicate. E.g. nxv4i1 returns 32.
+static std::optional<unsigned> GetSVECntElementSize(SDValue Op) {
+  if (auto ElementSize = IsSVECntIntrinsic(Op))
+    return ElementSize;
+  Intrinsic::ID IID = getIntrinsicID(Op.getNode());
+  if (IID != Intrinsic::aarch64_sve_cntp)
+    return {};
+  EVT PredVT = Op.getOperand(Op.getNumOperands() - 1).getValueType();
+  switch (PredVT.getSimpleVT().SimpleTy) {
+  case MVT::nxv1i1:
+    return 128;
+  case MVT::nxv2i1:
+    return 64;
+  case MVT::nxv4i1:
+    return 32;
+  case MVT::nxv8i1:
+    return 16;
+  case MVT::nxv16i1:
+    return 8;
+  default:
+    llvm_unreachable("unexpected predicate type");
+  }
+}
+
 /// Calculates what the pre-extend type is, based on the extension
 /// operation node provided by \p Extend.
 ///
@@ -31666,7 +31692,7 @@ bool AArch64TargetLowering::SimplifyDemandedBitsForTargetNode(
     return false;
   }
   case ISD::INTRINSIC_WO_CHAIN: {
-    if (auto ElementSize = IsSVECntIntrinsic(Op)) {
+    if (auto ElementSize = GetSVECntElementSize(Op)) {
       unsigned MaxSVEVectorSizeInBits = Subtarget->getMaxSVEVectorSizeInBits();
       if (!MaxSVEVectorSizeInBits)
         MaxSVEVectorSizeInBits = AArch64::SVEMaxBitsPerVector;
diff --git a/llvm/test/CodeGen/AArch64/sve-vector-compress.ll b/llvm/test/CodeGen/AArch64/sve-vector-compress.ll
index cc3a3734a9721..f700dee0fb2e4 100644
--- a/llvm/test/CodeGen/AArch64/sve-vector-compress.ll
+++ b/llvm/test/CodeGen/AArch64/sve-vector-compress.ll
@@ -143,20 +143,19 @@ define <vscale x 8 x i32> @test_compress_large(<vscale x 8 x i32> %vec, <vscale
 ; CHECK-NEXT:    addvl sp, sp, #-2
 ; CHECK-NEXT:    .cfi_escape 0x0f, 0x08, 0x8f, 0x10, 0x92, 0x2e, 0x00, 0x40, 0x1e, 0x22 // sp + 16 + 16 * VG
 ; CHECK-NEXT:    .cfi_offset w29, -16
-; CHECK-NEXT:    punpklo p2.h, p0.b
+; CHECK-NEXT:    punpklo p1.h, p0.b
 ; CHECK-NEXT:    cnth x9
-; CHECK-NEXT:    ptrue p1.s
+; CHECK-NEXT:    ptrue p2.s
 ; CHECK-NEXT:    sub x9, x9, #1
 ; CHECK-NEXT:    punpkhi p0.h, p0.b
-; CHECK-NEXT:    compact z0.s, p2, z0.s
-; CHECK-NEXT:    cntp x8, p1, p2.s
+; CHECK-NEXT:    compact z0.s, p1, z0.s
+; CHECK-NEXT:    cntp x8, p2, p1.s
 ; CHECK-NEXT:    compact z1.s, p0, z1.s
 ; CHECK-NEXT:    str z0, [sp]
-; CHECK-NEXT:    mov w8, w8
 ; CHECK-NEXT:    cmp x8, x9
 ; CHECK-NEXT:    csel x8, x8, x9, lo
 ; CHECK-NEXT:    mov x9, sp
-; CHECK-NEXT:    st1w { z1.s }, p1, [x9, x8, lsl #2]
+; CHECK-NEXT:    st1w { z1.s }, p2, [x9, x8, lsl #2]
 ; CHECK-NEXT:    ldr z0, [sp]
 ; CHECK-NEXT:    ldr z1, [sp, #1, mul vl]
 ; CHECK-NEXT:    addvl sp, sp, #2
diff --git a/llvm/test/CodeGen/AArch64/vscale-and-sve-cnt-demandedbits.ll b/llvm/test/CodeGen/AArch64/vscale-and-sve-cnt-demandedbits.ll
index 9572778484f8d..568abe718ad9b 100644
--- a/llvm/test/CodeGen/AArch64/vscale-and-sve-cnt-demandedbits.ll
+++ b/llvm/test/CodeGen/AArch64/vscale-and-sve-cnt-demandedbits.ll
@@ -80,6 +80,62 @@ define i64 @cntd_and_elimination() {
   ret i64 %result
 }
 
+define i64 @cntp_nxv16i1_and_elimination(<vscale x 16 x i1> %p) {
+; CHECK-LABEL: cntp_nxv16i1_and_elimination:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cntp x8, p0, p0.b
+; CHECK-NEXT:    and x9, x8, #0x1fc
+; CHECK-NEXT:    add x0, x8, x9
+; CHECK-NEXT:    ret
+  %cntp = tail call i64 @llvm.aarch64.sve.cntp.nxv16i1(<vscale x 16 x i1> %p, <vscale x 16 x i1> %p)
+  %and_redundant = and i64 %cntp, 511
+  %and_required = and i64 %cntp, 17179869180
+  %result = add i64 %and_redundant, %and_required
+  ret i64 %result
+}
+
+define i64 @cntp_nxv8i1_and_elimination(<vscale x 8 x i1> %p) {
+; CHECK-LABEL: cntp_nxv8i1_and_elimination:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cntp x8, p0, p0.h
+; CHECK-NEXT:    and x9, x8, #0xfc
+; CHECK-NEXT:    add x0, x8, x9
+; CHECK-NEXT:    ret
+  %cntp = tail call i64 @llvm.aarch64.sve.cntp.nxv8i1(<vscale x 8 x i1> %p, <vscale x 8 x i1> %p)
+  %and_redundant = and i64 %cntp, 1023
+  %and_required = and i64 %cntp, 17179869180
+  %result = add i64 %and_redundant, %and_required
+  ret i64 %result
+}
+
+define i64 @cntp_nxv4i1_and_elimination(<vscale x 4 x i1> %p) {
+; CHECK-LABEL: cntp_nxv4i1_and_elimination:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cntp x8, p0, p0.s
+; CHECK-NEXT:    and x9, x8, #0x7c
+; CHECK-NEXT:    add x0, x8, x9
+; CHECK-NEXT:    ret
+  %cntp = tail call i64 @llvm.aarch64.sve.cntp.nxv4i1(<vscale x 4 x i1> %p, <vscale x 4 x i1> %p)
+  %and_redundant = and i64 %cntp, 127
+  %and_required = and i64 %cntp, 17179869180
+  %result = add i64 %and_redundant, %and_required
+  ret i64 %result
+}
+
+define i64 @cntp_nxv2i1_and_elimination(<vscale x 2 x i1> %p) {
+; CHECK-LABEL: cntp_nxv2i1_and_elimination:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    cntp x8, p0, p0.d
+; CHECK-NEXT:    and x9, x8, #0x3c
+; CHECK-NEXT:    add x0, x8, x9
+; CHECK-NEXT:    ret
+  %cntp = tail call i64 @llvm.aarch64.sve.cntp.nxv2i1(<vscale x 2 x i1> %p, <vscale x 2 x i1> %p)
+  %and_redundant = and i64 %cntp, 63
+  %and_required = and i64 %cntp, 17179869180
+  %result = add i64 %and_redundant, %and_required
+  ret i64 %result
+}
+
 define i64 @vscale_trunc_zext() vscale_range(1,16) {
 ; CHECK-LABEL: vscale_trunc_zext:
 ; CHECK:       // %bb.0:

paulwalker-arm

Perhaps the original code is backward in asking for the element size?

I wonder if it should be more literal and ask for the maximum result as an element count, or maximum unscaled result as an unsigned (not really sure which is better). Then have the caller multiply the result by the maximum value of vscale (i.e. (MaxSVEVectorSizeInBits or SVEMaxBitsPerVector) / 128)).

What do you think?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

github-actions · 2025-11-19T15:22:56Z

🐧 Linux x64 Test Results

186403 tests passed
4864 tests skipped

MacDue · 2025-11-19T16:27:57Z

Perhaps the original code is backward in asking for the element size?

I wonder if it should be more literal and ask for the maximum result as an element count, or maximum unscaled result as an unsigned (not really sure which is better). Then have the caller multiply the result by the maximum value of vscale (i.e. (MaxSVEVectorSizeInBits or SVEMaxBitsPerVector) / 128)).

What do you think?

I think that's a little more intuitive 👍

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

github-actions · 2025-11-19T22:36:22Z

✅ With the latest revision this PR passed the C/C++ code formatter.

MacDue · 2025-11-20T08:07:24Z

Note: I slightly tweaked the patch to not add cntp to IsSVECntIntrinsic, as it has another use and cntp does not really make sense there (as it's not like the other sve_cnt* intrinsics).

paulwalker-arm

The new changes look good to me.

MacDue added 2 commits November 19, 2025 14:29

Precommit tests

5e42dbb

MacDue requested review from davemgreen, huntergr-arm and paulwalker-arm November 19, 2025 14:37

llvmbot added the backend:AArch64 label Nov 19, 2025

paulwalker-arm reviewed Nov 19, 2025

View reviewed changes

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated Show resolved Hide resolved

Rework changes

62c06b3

paulwalker-arm reviewed Nov 19, 2025

View reviewed changes

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated Show resolved Hide resolved

Make less silly

d95927c

MDevereau reviewed Nov 19, 2025

View reviewed changes

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated Show resolved Hide resolved

paulwalker-arm approved these changes Nov 19, 2025

View reviewed changes

MacDue added 3 commits November 19, 2025 19:55

Merge branch 'main' into demanded_bits_cntp

9113903

Fix typo in llvm_unreachable message

db44ff8

Refactor SVE count intrinsic handling and return types

2450673

MacDue added 2 commits November 19, 2025 22:38

Fix comment formatting in AArch64ISelLowering.cpp

8fd21d2

Fix typo

6397699

paulwalker-arm approved these changes Nov 20, 2025

View reviewed changes

MacDue merged commit 02db2de into llvm:main Nov 20, 2025
10 checks passed

MacDue deleted the demanded_bits_cntp branch November 20, 2025 11:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AArch64][SVE] Implement demanded bits for @llvm.aarch64.sve.cntp #168714

[AArch64][SVE] Implement demanded bits for @llvm.aarch64.sve.cntp #168714

MacDue commented Nov 19, 2025

Uh oh!

llvmbot commented Nov 19, 2025

Uh oh!

paulwalker-arm left a comment

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Nov 19, 2025 •

edited

Loading

Uh oh!

MacDue commented Nov 19, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Nov 19, 2025 •

edited

Loading

Uh oh!

MacDue commented Nov 20, 2025

Uh oh!

paulwalker-arm left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[AArch64][SVE] Implement demanded bits for @llvm.aarch64.sve.cntp #168714

[AArch64][SVE] Implement demanded bits for @llvm.aarch64.sve.cntp #168714

Conversation

MacDue commented Nov 19, 2025

Uh oh!

llvmbot commented Nov 19, 2025

Uh oh!

paulwalker-arm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Nov 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🐧 Linux x64 Test Results

Uh oh!

MacDue commented Nov 19, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Nov 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MacDue commented Nov 20, 2025

Uh oh!

paulwalker-arm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions bot commented Nov 19, 2025 •

edited

Loading

github-actions bot commented Nov 19, 2025 •

edited

Loading