
Conversation

moorabbit (Contributor)

The following AVX512 intrinsics were removed:

- `__builtin_ia32_vplzcntd_128`
- `__builtin_ia32_vplzcntd_256`
- `__builtin_ia32_vplzcntd_512`
- `__builtin_ia32_vplzcntq_128`
- `__builtin_ia32_vplzcntq_256`
- `__builtin_ia32_vplzcntq_512`

Users of the removed intrinsics (e.g. `_mm512_lzcnt_epi64`, `_mm_lzcnt_epi32`) now rely on `__builtin_elementwise_ctlz` and are marked as `constexpr`.

Fixes: #154279
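With the `constexpr` markings, the lzcnt intrinsics can be evaluated in constant expressions. A minimal usage sketch (illustrative, not code from this PR; it assumes C++11 or later, `-mavx512cd`, and Clang's support for subscripting vector types in constant expressions):

```c++
#include <immintrin.h>

// lzcnt of 2^n in a 64-bit lane is 63 - n; the whole call folds at compile time.
constexpr __m512i v =
    _mm512_lzcnt_epi64((__m512i)(__v8di){1, 2, 4, 8, 16, 32, 64, 128});
static_assert(v[0] == 63 && v[7] == 56, "leading-zero counts of powers of two");
```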

@llvmbot added labels on Aug 23, 2025: clang (Clang issues not falling into any other category), backend:X86, clang:frontend (Language frontend issues, e.g. anything involving "Sema"), clang:headers (Headers provided by Clang, e.g. for intrinsics), clang:codegen (IR generation bugs: mangling, exceptions, etc.)
llvmbot (Member) commented on Aug 23, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-backend-x86

Author: None (moorabbit)


Full diff: https://github.com/llvm/llvm-project/pull/155089.diff

6 Files Affected:

  • (modified) clang/include/clang/Basic/BuiltinsX86.td (-18)
  • (modified) clang/lib/CodeGen/TargetBuiltins/X86.cpp (-9)
  • (modified) clang/lib/Headers/avx512cdintrin.h (+12-8)
  • (modified) clang/lib/Headers/avx512vlcdintrin.h (+20-16)
  • (modified) clang/test/CodeGen/X86/avx512cd-builtins.c (+13-6)
  • (modified) clang/test/CodeGen/X86/avx512vlcd-builtins.c (+21-12)
diff --git a/clang/include/clang/Basic/BuiltinsX86.td b/clang/include/clang/Basic/BuiltinsX86.td
index 527acd9ef086e..0bc93d75c9724 100644
--- a/clang/include/clang/Basic/BuiltinsX86.td
+++ b/clang/include/clang/Basic/BuiltinsX86.td
@@ -1383,8 +1383,6 @@ let Features = "avx512cd,avx512vl", Attributes = [NoThrow, Const, RequiredVector
 let Features = "avx512cd,evex512", Attributes = [NoThrow, Const, RequiredVectorWidth<512>] in {
   def vpconflictdi_512 : X86Builtin<"_Vector<8, long long int>(_Vector<8, long long int>)">;
   def vpconflictsi_512 : X86Builtin<"_Vector<16, int>(_Vector<16, int>)">;
-  def vplzcntd_512 : X86Builtin<"_Vector<16, int>(_Vector<16, int>)">;
-  def vplzcntq_512 : X86Builtin<"_Vector<8, long long int>(_Vector<8, long long int>)">;
 }
 
 let Features = "avx512vl,avx512bitalg", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
@@ -2438,22 +2436,6 @@ let Features = "avx512vl", Attributes = [NoThrow, Const, RequiredVectorWidth<256
   def rcp14ps256_mask : X86Builtin<"_Vector<8, float>(_Vector<8, float>, _Vector<8, float>, unsigned char)">;
 }
 
-let Features = "avx512cd,avx512vl", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
-  def vplzcntd_128 : X86Builtin<"_Vector<4, int>(_Vector<4, int>)">;
-}
-
-let Features = "avx512cd,avx512vl", Attributes = [NoThrow, Const, RequiredVectorWidth<256>] in {
-  def vplzcntd_256 : X86Builtin<"_Vector<8, int>(_Vector<8, int>)">;
-}
-
-let Features = "avx512cd,avx512vl", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
-  def vplzcntq_128 : X86Builtin<"_Vector<2, long long int>(_Vector<2, long long int>)">;
-}
-
-let Features = "avx512cd,avx512vl", Attributes = [NoThrow, Const, RequiredVectorWidth<256>] in {
-  def vplzcntq_256 : X86Builtin<"_Vector<4, long long int>(_Vector<4, long long int>)">;
-}
-
 let Features = "avx512f", Attributes = [NoThrow, Const, RequiredVectorWidth<128>] in {
   def vcvtsd2si32 : X86Builtin<"int(_Vector<2, double>, _Constant int)">;
   def vcvtsd2usi32 : X86Builtin<"unsigned int(_Vector<2, double>, _Constant int)">;
diff --git a/clang/lib/CodeGen/TargetBuiltins/X86.cpp b/clang/lib/CodeGen/TargetBuiltins/X86.cpp
index b9248a7d43f85..a4974e45caf10 100644
--- a/clang/lib/CodeGen/TargetBuiltins/X86.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/X86.cpp
@@ -2183,15 +2183,6 @@ Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
     return Builder.CreateBitCast(Res, Ops[0]->getType());
   }
 
-  case X86::BI__builtin_ia32_vplzcntd_128:
-  case X86::BI__builtin_ia32_vplzcntd_256:
-  case X86::BI__builtin_ia32_vplzcntd_512:
-  case X86::BI__builtin_ia32_vplzcntq_128:
-  case X86::BI__builtin_ia32_vplzcntq_256:
-  case X86::BI__builtin_ia32_vplzcntq_512: {
-    Function *F = CGM.getIntrinsic(Intrinsic::ctlz, Ops[0]->getType());
-    return Builder.CreateCall(F, {Ops[0],Builder.getInt1(false)});
-  }
   case X86::BI__builtin_ia32_sqrtss:
   case X86::BI__builtin_ia32_sqrtsd: {
     Value *A = Builder.CreateExtractElement(Ops[0], (uint64_t)0);
diff --git a/clang/lib/Headers/avx512cdintrin.h b/clang/lib/Headers/avx512cdintrin.h
index 33b552f6fe6ad..b39467c138054 100644
--- a/clang/lib/Headers/avx512cdintrin.h
+++ b/clang/lib/Headers/avx512cdintrin.h
@@ -19,6 +19,12 @@
   __attribute__((__always_inline__, __nodebug__,                               \
                  __target__("avx512cd,evex512"), __min_vector_width__(512)))
 
+#if defined(__cplusplus) && (__cplusplus >= 201103L)
+#define __DEFAULT_FN_ATTRS_CONSTEXPR __DEFAULT_FN_ATTRS constexpr
+#else
+#define __DEFAULT_FN_ATTRS_CONSTEXPR __DEFAULT_FN_ATTRS
+#endif
+
 static __inline__ __m512i __DEFAULT_FN_ATTRS
 _mm512_conflict_epi64 (__m512i __A)
 {
@@ -63,10 +69,9 @@ _mm512_maskz_conflict_epi32 (__mmask16 __U, __m512i __A)
                                             (__v16si)_mm512_setzero_si512());
 }
 
-static __inline__ __m512i __DEFAULT_FN_ATTRS
-_mm512_lzcnt_epi32 (__m512i __A)
-{
-  return (__m512i) __builtin_ia32_vplzcntd_512 ((__v16si) __A);
+static __inline__ __m512i __DEFAULT_FN_ATTRS_CONSTEXPR
+_mm512_lzcnt_epi32(__m512i __A) {
+  return (__m512i)__builtin_elementwise_ctlz((__v16si)__A);
 }
 
 static __inline__ __m512i __DEFAULT_FN_ATTRS
@@ -85,10 +90,9 @@ _mm512_maskz_lzcnt_epi32 (__mmask16 __U, __m512i __A)
                                              (__v16si)_mm512_setzero_si512());
 }
 
-static __inline__ __m512i __DEFAULT_FN_ATTRS
-_mm512_lzcnt_epi64 (__m512i __A)
-{
-  return (__m512i) __builtin_ia32_vplzcntq_512 ((__v8di) __A);
+static __inline__ __m512i __DEFAULT_FN_ATTRS_CONSTEXPR
+_mm512_lzcnt_epi64(__m512i __A) {
+  return (__m512i)__builtin_elementwise_ctlz((__v8di)__A);
 }
 
 static __inline__ __m512i __DEFAULT_FN_ATTRS
diff --git a/clang/lib/Headers/avx512vlcdintrin.h b/clang/lib/Headers/avx512vlcdintrin.h
index 923e2c551a97a..f980ed2da0e80 100644
--- a/clang/lib/Headers/avx512vlcdintrin.h
+++ b/clang/lib/Headers/avx512vlcdintrin.h
@@ -23,6 +23,14 @@
                  __target__("avx512vl,avx512cd,no-evex512"),                   \
                  __min_vector_width__(256)))
 
+#if defined(__cplusplus) && (__cplusplus >= 201103L)
+#define __DEFAULT_FN_ATTRS256_CONSTEXPR __DEFAULT_FN_ATTRS256 constexpr
+#define __DEFAULT_FN_ATTRS128_CONSTEXPR __DEFAULT_FN_ATTRS128 constexpr
+#else
+#define __DEFAULT_FN_ATTRS256_CONSTEXPR __DEFAULT_FN_ATTRS256
+#define __DEFAULT_FN_ATTRS128_CONSTEXPR __DEFAULT_FN_ATTRS128
+#endif
+
 static __inline__ __m128i __DEFAULT_FN_ATTRS128
 _mm_broadcastmb_epi64 (__mmask8 __A)
 {
@@ -136,10 +144,9 @@ _mm256_maskz_conflict_epi32 (__mmask8 __U, __m256i __A)
                                              (__v8si)_mm256_setzero_si256());
 }
 
-static __inline__ __m128i __DEFAULT_FN_ATTRS128
-_mm_lzcnt_epi32 (__m128i __A)
-{
-  return (__m128i) __builtin_ia32_vplzcntd_128 ((__v4si) __A);
+static __inline__ __m128i __DEFAULT_FN_ATTRS128_CONSTEXPR
+_mm_lzcnt_epi32(__m128i __A) {
+  return (__m128i)__builtin_elementwise_ctlz((__v4si)__A);
 }
 
 static __inline__ __m128i __DEFAULT_FN_ATTRS128
@@ -158,10 +165,9 @@ _mm_maskz_lzcnt_epi32 (__mmask8 __U, __m128i __A)
                                              (__v4si)_mm_setzero_si128());
 }
 
-static __inline__ __m256i __DEFAULT_FN_ATTRS256
-_mm256_lzcnt_epi32 (__m256i __A)
-{
-  return (__m256i) __builtin_ia32_vplzcntd_256 ((__v8si) __A);
+static __inline__ __m256i __DEFAULT_FN_ATTRS256_CONSTEXPR
+_mm256_lzcnt_epi32(__m256i __A) {
+  return (__m256i)__builtin_elementwise_ctlz((__v8si)__A);
 }
 
 static __inline__ __m256i __DEFAULT_FN_ATTRS256
@@ -180,10 +186,9 @@ _mm256_maskz_lzcnt_epi32 (__mmask8 __U, __m256i __A)
                                              (__v8si)_mm256_setzero_si256());
 }
 
-static __inline__ __m128i __DEFAULT_FN_ATTRS128
-_mm_lzcnt_epi64 (__m128i __A)
-{
-  return (__m128i) __builtin_ia32_vplzcntq_128 ((__v2di) __A);
+static __inline__ __m128i __DEFAULT_FN_ATTRS128_CONSTEXPR
+_mm_lzcnt_epi64(__m128i __A) {
+  return (__m128i)__builtin_elementwise_ctlz((__v2di)__A);
 }
 
 static __inline__ __m128i __DEFAULT_FN_ATTRS128
@@ -202,10 +207,9 @@ _mm_maskz_lzcnt_epi64 (__mmask8 __U, __m128i __A)
                                              (__v2di)_mm_setzero_si128());
 }
 
-static __inline__ __m256i __DEFAULT_FN_ATTRS256
-_mm256_lzcnt_epi64 (__m256i __A)
-{
-  return (__m256i) __builtin_ia32_vplzcntq_256 ((__v4di) __A);
+static __inline__ __m256i __DEFAULT_FN_ATTRS256_CONSTEXPR
+_mm256_lzcnt_epi64(__m256i __A) {
+  return (__m256i)__builtin_elementwise_ctlz((__v4di)__A);
 }
 
 static __inline__ __m256i __DEFAULT_FN_ATTRS256
diff --git a/clang/test/CodeGen/X86/avx512cd-builtins.c b/clang/test/CodeGen/X86/avx512cd-builtins.c
index 45df0fccb322f..03724844dabdd 100644
--- a/clang/test/CodeGen/X86/avx512cd-builtins.c
+++ b/clang/test/CodeGen/X86/avx512cd-builtins.c
@@ -5,6 +5,7 @@
 
 
 #include <immintrin.h>
+#include "builtin_test_helpers.h"
 
 __m512i test_mm512_conflict_epi64(__m512i __A) {
   // CHECK-LABEL: test_mm512_conflict_epi64
@@ -42,35 +43,41 @@ __m512i test_mm512_maskz_conflict_epi32(__mmask16 __U, __m512i __A) {
 }
 __m512i test_mm512_lzcnt_epi32(__m512i __A) {
   // CHECK-LABEL: test_mm512_lzcnt_epi32
-  // CHECK: call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %{{.*}}, i1 false)
+  // CHECK: call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %{{.*}}, i1 true)
   return _mm512_lzcnt_epi32(__A); 
 }
+
+TEST_CONSTEXPR(match_v16si(_mm512_lzcnt_epi32((__m512i)(__v16si){1, 2, 4, 8, 16, 32, 64, 128, 3, 5, 6, 7, 9, 10, 11, 12}), 31, 30, 29, 28, 27, 26, 25, 24, 30, 29, 29, 29, 28, 28, 28, 28));
+
 __m512i test_mm512_mask_lzcnt_epi32(__m512i __W, __mmask16 __U, __m512i __A) {
   // CHECK-LABEL: test_mm512_mask_lzcnt_epi32
-  // CHECK: call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %{{.*}}, i1 false)
+  // CHECK: call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %{{.*}}, i1 true)
   // CHECK: select <16 x i1> %{{.*}}, <16 x i32> %{{.*}}, <16 x i32> %{{.*}}
   return _mm512_mask_lzcnt_epi32(__W,__U,__A); 
 }
 __m512i test_mm512_maskz_lzcnt_epi32(__mmask16 __U, __m512i __A) {
   // CHECK-LABEL: test_mm512_maskz_lzcnt_epi32
-  // CHECK: call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %{{.*}}, i1 false)
+  // CHECK: call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %{{.*}}, i1 true)
   // CHECK: select <16 x i1> %{{.*}}, <16 x i32> %{{.*}}, <16 x i32> %{{.*}}
   return _mm512_maskz_lzcnt_epi32(__U,__A); 
 }
 __m512i test_mm512_lzcnt_epi64(__m512i __A) {
   // CHECK-LABEL: test_mm512_lzcnt_epi64
-  // CHECK: call {{.*}}<8 x i64> @llvm.ctlz.v8i64(<8 x i64> %{{.*}}, i1 false)
+  // CHECK: call {{.*}}<8 x i64> @llvm.ctlz.v8i64(<8 x i64> %{{.*}}, i1 true)
   return _mm512_lzcnt_epi64(__A); 
 }
+
+TEST_CONSTEXPR(match_v8di(_mm512_lzcnt_epi64((__m512i)(__v8di){1, 2, 4, 8, 16, 32, 64, 128}), 63, 62, 61, 60, 59, 58, 57, 56));
+
 __m512i test_mm512_mask_lzcnt_epi64(__m512i __W, __mmask8 __U, __m512i __A) {
   // CHECK-LABEL: test_mm512_mask_lzcnt_epi64
-  // CHECK: call {{.*}}<8 x i64> @llvm.ctlz.v8i64(<8 x i64> %{{.*}}, i1 false)
+  // CHECK: call {{.*}}<8 x i64> @llvm.ctlz.v8i64(<8 x i64> %{{.*}}, i1 true)
   // CHECK: select <8 x i1> %{{.*}}, <8 x i64> %{{.*}}, <8 x i64> %{{.*}}
   return _mm512_mask_lzcnt_epi64(__W,__U,__A); 
 }
 __m512i test_mm512_maskz_lzcnt_epi64(__mmask8 __U, __m512i __A) {
   // CHECK-LABEL: test_mm512_maskz_lzcnt_epi64
-  // CHECK: call {{.*}}<8 x i64> @llvm.ctlz.v8i64(<8 x i64> %{{.*}}, i1 false)
+  // CHECK: call {{.*}}<8 x i64> @llvm.ctlz.v8i64(<8 x i64> %{{.*}}, i1 true)
   // CHECK: select <8 x i1> %{{.*}}, <8 x i64> %{{.*}}, <8 x i64> %{{.*}}
   return _mm512_maskz_lzcnt_epi64(__U,__A); 
 }
diff --git a/clang/test/CodeGen/X86/avx512vlcd-builtins.c b/clang/test/CodeGen/X86/avx512vlcd-builtins.c
index 939fd6460e6a6..6eff5a6c5e6a8 100644
--- a/clang/test/CodeGen/X86/avx512vlcd-builtins.c
+++ b/clang/test/CodeGen/X86/avx512vlcd-builtins.c
@@ -5,6 +5,7 @@
 
 
 #include <immintrin.h>
+#include "builtin_test_helpers.h"
 
 __m128i test_mm_broadcastmb_epi64(__m128i a,__m128i b) {
   // CHECK-LABEL: test_mm_broadcastmb_epi64
@@ -136,80 +137,88 @@ __m256i test_mm256_maskz_conflict_epi32(__mmask8 __U, __m256i __A) {
 
 __m128i test_mm_lzcnt_epi32(__m128i __A) {
   // CHECK-LABEL: test_mm_lzcnt_epi32
-  // CHECK: call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %{{.*}}, i1 false)
+  // CHECK: call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %{{.*}}, i1 true)
   return _mm_lzcnt_epi32(__A); 
 }
 
+TEST_CONSTEXPR(match_v4si(_mm_lzcnt_epi32((__m128i)(__v4si){8, 16, 32, 64}), 28, 27, 26, 25));
+
 __m128i test_mm_mask_lzcnt_epi32(__m128i __W, __mmask8 __U, __m128i __A) {
   // CHECK-LABEL: test_mm_mask_lzcnt_epi32
-  // CHECK: call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %{{.*}}, i1 false)
+  // CHECK: call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %{{.*}}, i1 true)
   // CHECK: select <4 x i1> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}}
   return _mm_mask_lzcnt_epi32(__W, __U, __A); 
 }
 
 __m128i test_mm_maskz_lzcnt_epi32(__mmask8 __U, __m128i __A) {
   // CHECK-LABEL: test_mm_maskz_lzcnt_epi32
-  // CHECK: call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %{{.*}}, i1 false)
+  // CHECK: call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> %{{.*}}, i1 true)
   // CHECK: select <4 x i1> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}}
   return _mm_maskz_lzcnt_epi32(__U, __A); 
 }
 
 __m256i test_mm256_lzcnt_epi32(__m256i __A) {
   // CHECK-LABEL: test_mm256_lzcnt_epi32
-  // CHECK: call <8 x i32> @llvm.ctlz.v8i32(<8 x i32> %{{.*}}, i1 false)
+  // CHECK: call <8 x i32> @llvm.ctlz.v8i32(<8 x i32> %{{.*}}, i1 true)
   return _mm256_lzcnt_epi32(__A); 
 }
 
+TEST_CONSTEXPR(match_v8si(_mm256_lzcnt_epi32((__m256i)(__v8si){1, 2, 4, 8, 16, 32, 64, 128}), 31, 30, 29, 28, 27, 26, 25, 24));
+
 __m256i test_mm256_mask_lzcnt_epi32(__m256i __W, __mmask8 __U, __m256i __A) {
   // CHECK-LABEL: test_mm256_mask_lzcnt_epi32
-  // CHECK: call <8 x i32> @llvm.ctlz.v8i32(<8 x i32> %{{.*}}, i1 false)
+  // CHECK: call <8 x i32> @llvm.ctlz.v8i32(<8 x i32> %{{.*}}, i1 true)
   // CHECK: select <8 x i1> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}}
   return _mm256_mask_lzcnt_epi32(__W, __U, __A); 
 }
 
 __m256i test_mm256_maskz_lzcnt_epi32(__mmask8 __U, __m256i __A) {
   // CHECK-LABEL: test_mm256_maskz_lzcnt_epi32
-  // CHECK: call <8 x i32> @llvm.ctlz.v8i32(<8 x i32> %{{.*}}, i1 false)
+  // CHECK: call <8 x i32> @llvm.ctlz.v8i32(<8 x i32> %{{.*}}, i1 true)
   // CHECK: select <8 x i1> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}}
   return _mm256_maskz_lzcnt_epi32(__U, __A); 
 }
 
 __m128i test_mm_lzcnt_epi64(__m128i __A) {
   // CHECK-LABEL: test_mm_lzcnt_epi64
-  // CHECK: call {{.*}}<2 x i64> @llvm.ctlz.v2i64(<2 x i64> %{{.*}}, i1 false)
+  // CHECK: call {{.*}}<2 x i64> @llvm.ctlz.v2i64(<2 x i64> %{{.*}}, i1 true)
   return _mm_lzcnt_epi64(__A); 
 }
 
+TEST_CONSTEXPR(match_v2di(_mm_lzcnt_epi64((__m128i)(__v2di){1, 2}), 63, 62));
+
 __m128i test_mm_mask_lzcnt_epi64(__m128i __W, __mmask8 __U, __m128i __A) {
   // CHECK-LABEL: test_mm_mask_lzcnt_epi64
-  // CHECK: call {{.*}}<2 x i64> @llvm.ctlz.v2i64(<2 x i64> %{{.*}}, i1 false)
+  // CHECK: call {{.*}}<2 x i64> @llvm.ctlz.v2i64(<2 x i64> %{{.*}}, i1 true)
   // CHECK: select <2 x i1> %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> %{{.*}}
   return _mm_mask_lzcnt_epi64(__W, __U, __A); 
 }
 
 __m128i test_mm_maskz_lzcnt_epi64(__mmask8 __U, __m128i __A) {
   // CHECK-LABEL: test_mm_maskz_lzcnt_epi64
-  // CHECK: call {{.*}}<2 x i64> @llvm.ctlz.v2i64(<2 x i64> %{{.*}}, i1 false)
+  // CHECK: call {{.*}}<2 x i64> @llvm.ctlz.v2i64(<2 x i64> %{{.*}}, i1 true)
   // CHECK: select <2 x i1> %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> %{{.*}}
   return _mm_maskz_lzcnt_epi64(__U, __A); 
 }
 
 __m256i test_mm256_lzcnt_epi64(__m256i __A) {
   // CHECK-LABEL: test_mm256_lzcnt_epi64
-  // CHECK: call {{.*}}<4 x i64> @llvm.ctlz.v4i64(<4 x i64> %{{.*}}, i1 false)
+  // CHECK: call {{.*}}<4 x i64> @llvm.ctlz.v4i64(<4 x i64> %{{.*}}, i1 true)
   return _mm256_lzcnt_epi64(__A); 
 }
 
+TEST_CONSTEXPR(match_v4di(_mm256_lzcnt_epi64((__m256i)(__v4di){1, 2, 4, 8}), 63, 62, 61, 60));
+
 __m256i test_mm256_mask_lzcnt_epi64(__m256i __W, __mmask8 __U, __m256i __A) {
   // CHECK-LABEL: test_mm256_mask_lzcnt_epi64
-  // CHECK: call {{.*}}<4 x i64> @llvm.ctlz.v4i64(<4 x i64> %{{.*}}, i1 false)
+  // CHECK: call {{.*}}<4 x i64> @llvm.ctlz.v4i64(<4 x i64> %{{.*}}, i1 true)
   // CHECK: select <4 x i1> %{{.*}}, <4 x i64> %{{.*}}, <4 x i64> %{{.*}}
   return _mm256_mask_lzcnt_epi64(__W, __U, __A); 
 }
 
 __m256i test_mm256_maskz_lzcnt_epi64(__mmask8 __U, __m256i __A) {
   // CHECK-LABEL: test_mm256_maskz_lzcnt_epi64
-  // CHECK: call {{.*}}<4 x i64> @llvm.ctlz.v4i64(<4 x i64> %{{.*}}, i1 false)
+  // CHECK: call {{.*}}<4 x i64> @llvm.ctlz.v4i64(<4 x i64> %{{.*}}, i1 true)
   // CHECK: select <4 x i1> %{{.*}}, <4 x i64> %{{.*}}, <4 x i64> %{{.*}}
   return _mm256_maskz_lzcnt_epi64(__U, __A); 
 }

moorabbit (Contributor, Author)

> Convert `_mm_lzcnt_epi32` etc. (incl. mask/maskz variants) to use `__builtin_elementwise_ctlz`

It's not obvious to me how the mask/maskz variants (e.g. `_mm_maskz_lzcnt_epi32`) can be refactored to use `__builtin_elementwise_ctlz`.

Is it enough to replace this call:

    (__v16si)_mm512_lzcnt_epi32(__A),

with a call to `__builtin_elementwise_ctlz`?

moorabbit changed the title from "[Headers][X86] Use __builtin_elementwise_ctlz rather avx512cd intrinsics." to "[Headers][X86] Use __builtin_elementwise_ctlz instead of avx512cd intrinsics." on Aug 23, 2025
     __m512i test_mm512_lzcnt_epi32(__m512i __A) {
       // CHECK-LABEL: test_mm512_lzcnt_epi32
    -  // CHECK: call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %{{.*}}, i1 false)
    +  // CHECK: call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %{{.*}}, i1 true)
Contributor:

It's incorrect here. The is_zero_poison argument needs to stay false. The same below.

Contributor (Author):

Based on the code I see here:

    // The elementwise builtins always exhibit zero-is-undef behaviour
    Value *ZeroUndef = Builder.getInt1(
        HasFallback || getTarget().isCLZForZeroUndef() ||
        BuiltinIDIfNoAsmLabel == Builtin::BI__builtin_elementwise_cttz);

`__builtin_elementwise_ctlz` always emits `@llvm.ctlz.*(*, i1 true)`.

However, providing a second argument to `__builtin_elementwise_ctlz` is similar to having `is_zero_poison` set to `false`. That's my understanding based on https://clang.llvm.org/docs/LanguageExtensions.html#vector-builtins.
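To make the two behaviours being discussed concrete, here is a small sketch (my own illustration based on the documentation linked above, not code from this PR; the local vector typedef exists only for the example):

```c
// One-argument form: zero elements are poison, i.e. @llvm.ctlz(x, i1 true).
// Two-argument form: the second operand supplies the per-element result for
// zero inputs and lowers to a ctlz + icmp + select sequence.
typedef int v4i __attribute__((__vector_size__(16)));

v4i lzcnt_zero_poison(v4i x) {
  return __builtin_elementwise_ctlz(x);
}

v4i lzcnt_zero_defined(v4i x) {
  // 32 matches what the VPLZCNTD instruction returns for a zero element.
  return __builtin_elementwise_ctlz(x, (v4i){32, 32, 32, 32});
}
```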

Collaborator:

Please can you add checks for the additional icmp and select

Contributor (Author):

Tried to do that in a new commit. PTAL when you can.

phoebewang (Contributor)

> Convert `_mm_lzcnt_epi32` etc. (incl. mask/maskz variants) to use `__builtin_elementwise_ctlz`
>
> It's not obvious to me how the mask/maskz variants (e.g. `_mm_maskz_lzcnt_epi32`) can be refactored to use `__builtin_elementwise_ctlz`.
>
> Is it enough to replace this call:
>
>     (__v16si)_mm512_lzcnt_epi32(__A),
>
> with a call to `__builtin_elementwise_ctlz`?

No, calling the intrinsic is fine. We just need to change `__DEFAULT_FN_ATTRS` to `__DEFAULT_FN_ATTRS_CONSTEXPR` and add tests for them.
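As a sketch of the change being described for one of the masked variants (illustrative; the exact merged hunk may differ slightly, and the surrounding header definitions are assumed), only the attribute macro changes while the wrapper keeps calling the unmasked intrinsic:

```c
static __inline__ __m128i __DEFAULT_FN_ATTRS128_CONSTEXPR
_mm_maskz_lzcnt_epi32(__mmask8 __U, __m128i __A) {
  // __builtin_ia32_selectd_128 is already usable in constant expressions, so
  // the masked wrapper becomes constexpr as well.
  return (__m128i)__builtin_ia32_selectd_128((__mmask8)__U,
                                             (__v4si)_mm_lzcnt_epi32(__A),
                                             (__v4si)_mm_setzero_si128());
}
```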

phoebewang requested a review from RKSimon on August 26, 2025 at 02:26
RKSimon (Collaborator) left a comment:

The mask/maskz tests shouldn't be a problem - the `__builtin_ia32_select*` intrinsics are already constexpr - you just need to add tests similar to what we've done in files like avx512vpopcntdq-builtins.c.
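For example, a hypothetical constexpr test for one of the masked variants, in the style of the TEST_CONSTEXPR checks added by this PR (values are illustrative; it assumes the immintrin.h and builtin_test_helpers.h includes already present in the test file):

```c
// Mask 0x5 keeps lanes 0 and 2; the other lanes are zeroed by the maskz select.
TEST_CONSTEXPR(match_v4si(_mm_maskz_lzcnt_epi32(0x5, (__m128i)(__v4si){8, 16, 32, 64}), 28, 0, 26, 0));
```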

    -  return (__m512i) __builtin_ia32_vplzcntd_512 ((__v16si) __A);
    +static __inline__ __m512i __DEFAULT_FN_ATTRS_CONSTEXPR
    +_mm512_lzcnt_epi32(__m512i __A) {
    +  return (__m512i)__builtin_elementwise_ctlz((__v16si)__A);
Collaborator:

https://clang.llvm.org/docs/LanguageExtensions.html

You need to provide the second argument as well - in this case it'd be:

    (__m512i)__builtin_elementwise_ctlz((__v16si)__A, (__v16si)_mm512_set1_epi32(32));

Collaborator:

An alternative is we keep the `__builtin_ia32_vplzcnt` builtins and add them to `VectorExprEvaluator::VisitCallExpr` instead - it's annoying not to use the generics, but the two-operand `__builtin_elementwise_ctlz` variant will end up generating a ctlz+icmp+select sequence that won't be great in -O0 builds - but it's a question of whether we really care about that or not.

Contributor (Author):

I suspect that people using x86 intrinsics are primarily interested in the optimized builds.


moorabbit requested a review from RKSimon on August 28, 2025 at 14:55
       // CHECK: call <16 x i32> @llvm.ctlz.v16i32(<16 x i32> %{{.*}}, i1 true)
       // CHECK: icmp eq <16 x i32> %{{.*}}, zeroinitializer
       // CHECK: select <16 x i1> %{{.*}}, <16 x i32> %{{.*}}, <16 x i32> %{{.*}}
       return _mm512_lzcnt_epi32(__A);
Collaborator:

you might be able to improve on this with something like:

    // CHECK: [[ISZERO:%.+]] = icmp eq <16 x i32> %{{.*}}, zeroinitializer
    // CHECK: select <16 x i1> [[ISZERO]], <16 x i32> %{{.*}}, <16 x i32> %{{.*}}

which might help in the latest checks especially where you have multiple selects

Contributor (Author):

Thx for the suggestion. Tried to implement it in a new commit.
LMK if it would be better to avoid redefining [[ISZERO]] each time and instead reuse the existing value whenever possible.

RKSimon (Collaborator) left a comment:

LGTM - cheers

moorabbit (Contributor, Author)

Thx for the fast review + suggestions.

RKSimon enabled auto-merge (squash) on August 28, 2025 at 16:57
RKSimon merged commit 73cf62b into llvm:main on August 28, 2025 (9 checks passed)
moorabbit deleted the update-avx512cd-lzcnt branch on August 28, 2025 at 17:11