Reland Refactor WIDE_READ to allow finer control over high-performance function selection (#165613) #170738

Sterling-Augustine · 2025-12-04T20:37:57Z

[Previous commit had an incorrect default case when FIND_FIRST_CHARACTER_WIDE_READ_IMPL was not specified in config.json. This PR is identical to that one with one line fixed.]

As we implement more high-performance string-related functions, we have found a need for better control over their selection than the big-hammer LIBC_CONF_STRING_UNSAFE_WIDE_READ. For example, I have a memchr implementation coming, and unless I implement it in every variant, a simple binary value doesn't work.

This PR gives finer-grained control over high-performance functions than the generic LIBC_CONF_STRING_UNSAFE_WIDE_READ option. For each supported function (currently string length and find-first-character), the user can now select one of four implementations at build time:

element, which reads byte-by-byte (or wchar by wchar)
word, which reads word-by-word (unsigned int in this patch)
clang_vector, which uses the generic clang vector implementations, if available
arch_vector, which uses an architecture-specific implementation

(Reading the code carefully, you may note that a user can actually specify any namespace they want, so we aren't technically limited to those 4.)
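
For readers who want to see the selection mechanism in isolation, here is a minimal standalone sketch, not the patch itself. The macro `DEMO_STRING_LENGTH_IMPL`, the `demo` namespace, and the `unrolled` variant are hypothetical names made up for illustration; the patch uses `LIBC_COPT_STRING_LENGTH_IMPL` and the namespaces listed above. The macro expands to a namespace name, and a fallback `#define` covers the case where the build passes nothing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro for this sketch. A build system would normally pass
// -DDEMO_STRING_LENGTH_IMPL=<namespace>; fall back to the simplest variant.
#ifndef DEMO_STRING_LENGTH_IMPL
#define DEMO_STRING_LENGTH_IMPL element
#endif

namespace demo {

namespace element {
// Byte-by-byte reference implementation.
inline std::size_t string_length(const char *src) {
  std::size_t length = 0;
  while (src[length] != '\0')
    ++length;
  return length;
}
} // namespace element

namespace unrolled {
// Stand-in for a faster variant (illustrative only): checks two bytes per
// iteration, never reading past the terminator.
inline std::size_t string_length(const char *src) {
  std::size_t length = 0;
  for (;;) {
    if (src[length] == '\0')
      return length;
    if (src[length + 1] == '\0')
      return length + 1;
    length += 2;
  }
}
} // namespace unrolled

// The macro names a namespace, so this qualified name is resolved at compile
// time. Any namespace that provides string_length() can be selected.
static constexpr auto &string_length_impl =
    DEMO_STRING_LENGTH_IMPL::string_length;

inline std::size_t string_length(const char *src) {
  return string_length_impl(src);
}

} // namespace demo
```

Compiling with `-DDEMO_STRING_LENGTH_IMPL=unrolled` would switch the variant without touching any caller, which is also why any namespace name works, not only the four listed.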

We may also want to switch from command-line #defines, as is currently done, to something more like
llvm-project/llvm/include/llvm/Config/llvm-config.h.cmake, and move complexity out of the command line. But that's a future problem.

llvmbot · 2025-12-04T20:38:31Z

@llvm/pr-subscribers-libc

Author: None (Sterling-Augustine)

Changes

Patch is 25.78 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/170738.diff

13 Files Affected:

(modified) libc/cmake/modules/LLVMLibCCompileOptionRules.cmake (+2-3)
(modified) libc/config/config.json (+8-3)
(modified) libc/config/linux/arm/config.json (+5-2)
(modified) libc/config/linux/config.json (+5-2)
(modified) libc/config/linux/riscv/config.json (+5-2)
(modified) libc/docs/configure.rst (+2-1)
(modified) libc/src/string/memory_utils/aarch64/inline_strlen.h (+10-6)
(modified) libc/src/string/memory_utils/generic/inline_strlen.h (+2-3)
(modified) libc/src/string/memory_utils/x86_64/inline_strlen.h (+9-5)
(added) libc/src/string/string_length.h (+213)
(modified) libc/src/string/string_utils.h (+8-156)
(modified) utils/bazel/llvm-project-overlay/libc/BUILD.bazel (+4-1)
(modified) utils/bazel/llvm-project-overlay/libc/libc_configure_options.bzl (+2-1)

diff --git a/libc/cmake/modules/LLVMLibCCompileOptionRules.cmake b/libc/cmake/modules/LLVMLibCCompileOptionRules.cmake
index 4e9a9b66a63a7..f4e2a62d14b31 100644
--- a/libc/cmake/modules/LLVMLibCCompileOptionRules.cmake
+++ b/libc/cmake/modules/LLVMLibCCompileOptionRules.cmake
@@ -81,9 +81,8 @@ function(_get_compile_options_from_config output_var)
     list(APPEND config_options "-DLIBC_QSORT_IMPL=${LIBC_CONF_QSORT_IMPL}")
   endif()
 
-  if(LIBC_CONF_STRING_UNSAFE_WIDE_READ)
-    list(APPEND config_options "-DLIBC_COPT_STRING_UNSAFE_WIDE_READ")
-  endif()
+  list(APPEND config_options "-DLIBC_COPT_STRING_LENGTH_IMPL=${LIBC_CONF_STRING_LENGTH_IMPL}")
+  list(APPEND config_options "-DLIBC_COPT_FIND_FIRST_CHARACTER_IMPL=${LIBC_CONF_FIND_FIRST_CHARACTER_IMPL}")
 
   if(LIBC_CONF_MEMSET_X86_USE_SOFTWARE_PREFETCHING)
     list(APPEND config_options "-DLIBC_COPT_MEMSET_X86_USE_SOFTWARE_PREFETCHING")
diff --git a/libc/config/config.json b/libc/config/config.json
index a7844e4fe2dd1..f0ab3b9cce2e9 100644
--- a/libc/config/config.json
+++ b/libc/config/config.json
@@ -40,6 +40,7 @@
       "value": false,
       "doc": "Use an alternative printf float implementation based on 320-bit floats"
     },
+
     "LIBC_CONF_PRINTF_DISABLE_FIXED_POINT": {
       "value": false,
       "doc": "Disable printing fixed point values in printf and friends."
@@ -64,9 +65,13 @@
     }
   },
   "string": {
-    "LIBC_CONF_STRING_UNSAFE_WIDE_READ": {
-      "value": false,
-      "doc": "Read more than a byte at a time to perform byte-string operations like strlen."
+    "LIBC_CONF_STRING_LENGTH_IMPL": {
+      "value": "element",
+      "doc": "Selects the implementation for string-length: 'element', 'word', 'clang_vector', or 'arch_vector'."
+    },
+    "LIBC_CONF_FIND_FIRST_CHARACTER_IMPL": {
+      "value": "element",
+      "doc": "Selects the implementation for find-first-character-related functions: 'element', 'word', 'clang_vector', or 'arch_vector'."
     },
     "LIBC_CONF_MEMSET_X86_USE_SOFTWARE_PREFETCHING": {
       "value": false,
diff --git a/libc/config/linux/arm/config.json b/libc/config/linux/arm/config.json
index e7ad4544b104d..caa16744d389f 100644
--- a/libc/config/linux/arm/config.json
+++ b/libc/config/linux/arm/config.json
@@ -1,7 +1,10 @@
 {
   "string": {
-    "LIBC_CONF_STRING_UNSAFE_WIDE_READ": {
-      "value": false
+    "LIBC_CONF_STRING_LENGTH_IMPL": {
+      "value": "element"
+    },
+    "LIBC_CONF_FIND_FIRST_CHARACTER_IMPL": {
+      "value": "element"
     }
   }
 }
diff --git a/libc/config/linux/config.json b/libc/config/linux/config.json
index 30e8b2cdadabe..8e7db248dc1bd 100644
--- a/libc/config/linux/config.json
+++ b/libc/config/linux/config.json
@@ -1,7 +1,10 @@
 {
   "string": {
-    "LIBC_CONF_STRING_UNSAFE_WIDE_READ": {
-      "value": true
+    "LIBC_CONF_STRING_LENGTH_IMPL": {
+      "value": "clang_vector"
+    },
+    "LIBC_CONF_FIND_FIRST_CHARACTER_IMPL": {
+      "value": "word"
     }
   }
 }
diff --git a/libc/config/linux/riscv/config.json b/libc/config/linux/riscv/config.json
index e7ad4544b104d..caa16744d389f 100644
--- a/libc/config/linux/riscv/config.json
+++ b/libc/config/linux/riscv/config.json
@@ -1,7 +1,10 @@
 {
   "string": {
-    "LIBC_CONF_STRING_UNSAFE_WIDE_READ": {
-      "value": false
+    "LIBC_CONF_STRING_LENGTH_IMPL": {
+      "value": "element"
+    },
+    "LIBC_CONF_FIND_FIRST_CHARACTER_IMPL": {
+      "value": "element"
     }
   }
 }
diff --git a/libc/docs/configure.rst b/libc/docs/configure.rst
index 362e293a4b714..43d3c0ec06d3b 100644
--- a/libc/docs/configure.rst
+++ b/libc/docs/configure.rst
@@ -58,8 +58,9 @@ to learn about the defaults for your platform and target.
 * **"setjmp" options**
     - ``LIBC_CONF_SETJMP_AARCH64_RESTORE_PLATFORM_REGISTER``: Make setjmp save the value of x18, and longjmp restore it. The AArch64 ABI delegates this register to platform ABIs, which can choose whether to make it caller-saved.
 * **"string" options**
+    - ``LIBC_CONF_FIND_FIRST_CHARACTER_IMPL``: Selects the implementation for find-first-character-related functions: 'element', 'word', 'clang_vector', or 'arch_vector'.
     - ``LIBC_CONF_MEMSET_X86_USE_SOFTWARE_PREFETCHING``: Inserts prefetch for write instructions (PREFETCHW) for memset on x86 to recover performance when hardware prefetcher is disabled.
-    - ``LIBC_CONF_STRING_UNSAFE_WIDE_READ``: Read more than a byte at a time to perform byte-string operations like strlen.
+    - ``LIBC_CONF_STRING_LENGTH_IMPL``: Selects the implementation for string-length: 'element', 'word', 'clang_vector', or 'arch_vector'.
 * **"threads" options**
     - ``LIBC_CONF_THREAD_MODE``: The implementation used for Mutex, acceptable values are LIBC_THREAD_MODE_PLATFORM, LIBC_THREAD_MODE_SINGLE, and LIBC_THREAD_MODE_EXTERNAL.
 * **"time" options**
diff --git a/libc/src/string/memory_utils/aarch64/inline_strlen.h b/libc/src/string/memory_utils/aarch64/inline_strlen.h
index eafaca9776a42..87f6cb8cf9bd5 100644
--- a/libc/src/string/memory_utils/aarch64/inline_strlen.h
+++ b/libc/src/string/memory_utils/aarch64/inline_strlen.h
@@ -15,7 +15,7 @@
 #include <arm_neon.h>
 #include <stddef.h> // size_t
 namespace LIBC_NAMESPACE_DECL {
-namespace neon {
+namespace internal::neon {
 [[maybe_unused]] LIBC_NO_SANITIZE_OOB_ACCESS LIBC_INLINE static size_t
 string_length(const char *src) {
   using Vector __attribute__((may_alias)) = uint8x8_t;
@@ -43,7 +43,7 @@ string_length(const char *src) {
                                  (cpp::countr_zero(cmp) >> 3));
   }
 }
-} // namespace neon
+} // namespace internal::neon
 } // namespace LIBC_NAMESPACE_DECL
 #endif // __ARM_NEON
 
@@ -51,7 +51,7 @@ string_length(const char *src) {
 #include "src/__support/macros/optimization.h"
 #include <arm_sve.h>
 namespace LIBC_NAMESPACE_DECL {
-namespace sve {
+namespace internal::sve {
 [[maybe_unused]] LIBC_INLINE static size_t string_length(const char *src) {
   const uint8_t *ptr = reinterpret_cast<const uint8_t *>(src);
   // Initialize the first-fault register to all true
@@ -92,15 +92,19 @@ namespace sve {
   len += svcntp_b8(all_true, before_zero);
   return len;
 }
-} // namespace sve
+} // namespace internal::sve
 } // namespace LIBC_NAMESPACE_DECL
 #endif // LIBC_TARGET_CPU_HAS_SVE
 
 namespace LIBC_NAMESPACE_DECL {
+namespace internal::arch_vector {
+[[maybe_unused]] LIBC_INLINE size_t string_length(const char *src) {
 #ifdef LIBC_TARGET_CPU_HAS_SVE
-namespace string_length_impl = sve;
+  return sve::string_length(src);
 #elif defined(__ARM_NEON)
-namespace string_length_impl = neon;
+  return neon::string_length(src);
 #endif
+}
+} // namespace internal::arch_vector
 } // namespace LIBC_NAMESPACE_DECL
 #endif // LLVM_LIBC_SRC_STRING_MEMORY_UTILS_AARCH64_INLINE_STRLEN_H
diff --git a/libc/src/string/memory_utils/generic/inline_strlen.h b/libc/src/string/memory_utils/generic/inline_strlen.h
index 69700e801bcea..7a565b36617ed 100644
--- a/libc/src/string/memory_utils/generic/inline_strlen.h
+++ b/libc/src/string/memory_utils/generic/inline_strlen.h
@@ -14,7 +14,7 @@
 #include "src/__support/common.h"
 
 namespace LIBC_NAMESPACE_DECL {
-namespace internal {
+namespace clang_vector {
 
 // Exploit the underlying integer representation to do a variable shift.
 LIBC_INLINE constexpr cpp::simd_mask<char> shift_mask(cpp::simd_mask<char> m,
@@ -46,9 +46,8 @@ LIBC_NO_SANITIZE_OOB_ACCESS LIBC_INLINE size_t string_length(const char *src) {
              cpp::find_first_set(mask);
   }
 }
-} // namespace internal
+} // namespace clang_vector
 
-namespace string_length_impl = internal;
 } // namespace LIBC_NAMESPACE_DECL
 
 #endif // LLVM_LIBC_SRC_STRING_MEMORY_UTILS_GENERIC_INLINE_STRLEN_H
diff --git a/libc/src/string/memory_utils/x86_64/inline_strlen.h b/libc/src/string/memory_utils/x86_64/inline_strlen.h
index 9e10d58363393..07b4a470f0d77 100644
--- a/libc/src/string/memory_utils/x86_64/inline_strlen.h
+++ b/libc/src/string/memory_utils/x86_64/inline_strlen.h
@@ -15,7 +15,8 @@
 
 namespace LIBC_NAMESPACE_DECL {
 
-namespace string_length_internal {
+namespace internal::arch_vector {
+
 // Return a bit-mask with the nth bit set if the nth-byte in block_ptr is zero.
 template <typename Vector, typename Mask>
 LIBC_NO_SANITIZE_OOB_ACCESS LIBC_INLINE static Mask
@@ -92,15 +93,18 @@ namespace avx512 {
 }
 } // namespace avx512
 #endif
-} // namespace string_length_internal
 
+[[maybe_unused]] LIBC_INLINE size_t string_length(const char *src) {
 #if defined(__AVX512F__)
-namespace string_length_impl = string_length_internal::avx512;
+  return avx512::string_length(src);
 #elif defined(__AVX2__)
-namespace string_length_impl = string_length_internal::avx2;
+  return avx2::string_length(src);
 #else
-namespace string_length_impl = string_length_internal::sse2;
+  return sse2::string_length(src);
 #endif
+}
+
+} // namespace internal::arch_vector
 
 } // namespace LIBC_NAMESPACE_DECL
 
diff --git a/libc/src/string/string_length.h b/libc/src/string/string_length.h
new file mode 100644
index 0000000000000..c828c85c16a17
--- /dev/null
+++ b/libc/src/string/string_length.h
@@ -0,0 +1,213 @@
+//===-- String Length -------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Basic implementation and dispatch mechanism for performance-sensitive string-
+// related code.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_STRING_STRING_LENGTH_H
+#define LLVM_LIBC_SRC_STRING_STRING_LENGTH_H
+
+#include "hdr/limits_macros.h"
+#include "hdr/stdint_proxy.h" // uintptr_t
+#include "hdr/types/size_t.h"
+#include "src/__support/CPP/type_traits.h" // cpp::is_same_v
+
+#if LIBC_HAS_VECTOR_TYPE
+#include "src/string/memory_utils/generic/inline_strlen.h"
+#endif
+#if defined(LIBC_TARGET_ARCH_IS_X86)
+#include "src/string/memory_utils/x86_64/inline_strlen.h"
+#elif defined(LIBC_TARGET_ARCH_IS_AARCH64)
+#include "src/string/memory_utils/aarch64/inline_strlen.h"
+#endif
+
+// Set sensible defaults
+#ifndef LIBC_COPT_STRING_LENGTH_IMPL
+#define LIBC_COPT_STRING_LENGTH_IMPL element
+#endif
+#ifndef LIBC_COPT_FIND_FIRST_CHARACTER_IMPL
+#define LIBC_COPT_FIND_FIRST_CHARACTER_IMPL element
+#endif
+
+namespace LIBC_NAMESPACE_DECL {
+namespace internal {
+
+#if !LIBC_HAS_VECTOR_TYPE
+// Forward any clang vector impls to architecture specific ones
+namespace arch_vector {}
+namespace clang_vector = arch_vector;
+#endif
+
+namespace element {
+// Element-by-element (usually a byte, but wider for wchar) implementations of
+// functions that search for data.  Slow, but easy to understand and analyze.
+
+// Returns the length of a string, denoted by the first occurrence
+// of a null terminator.
+LIBC_INLINE size_t string_length(const char *src) {
+  size_t length;
+  for (length = 0; *src; ++src, ++length)
+    ;
+  return length;
+}
+
+template <typename T> LIBC_INLINE size_t string_length_element(const T *src) {
+  size_t length;
+  for (length = 0; *src; ++src, ++length)
+    ;
+  return length;
+}
+
+LIBC_INLINE void *find_first_character(const unsigned char *src,
+                                       unsigned char ch, size_t n) {
+  for (; n && *src != ch; --n, ++src)
+    ;
+  return n ? const_cast<unsigned char *>(src) : nullptr;
+}
+} // namespace element
+
+namespace word {
+// Non-vector implementations of functions that search for data by reading from
+// memory word-by-word.
+
+template <typename Word> LIBC_INLINE constexpr Word repeat_byte(Word byte) {
+  static_assert(CHAR_BIT == 8, "repeat_byte assumes a byte is 8 bits.");
+  constexpr size_t BITS_IN_BYTE = CHAR_BIT;
+  constexpr size_t BYTE_MASK = 0xff;
+  Word result = 0;
+  byte = byte & BYTE_MASK;
+  for (size_t i = 0; i < sizeof(Word); ++i)
+    result = (result << BITS_IN_BYTE) | byte;
+  return result;
+}
+
+// The goal of this function is to take in a block of arbitrary size and return
+// if it has any bytes equal to zero without branching. This is done by
+// transforming the block such that zero bytes become non-zero and non-zero
+// bytes become zero.
+// The first transformation relies on the properties of carrying in arithmetic
+// subtraction. Specifically, if 0x01 is subtracted from a byte that is 0x00,
+// then the result for that byte must be equal to 0xff (or 0xfe if the next byte
+// needs a carry as well).
+// The next transformation is a simple mask. All zero bytes will have the high
+// bit set after the subtraction, so each byte is masked with 0x80. This narrows
+// the set of bytes that result in a non-zero value to only zero bytes and bytes
+// with the high bit and any other bit set.
+// The final transformation masks the result of the previous transformations
+// with the inverse of the original byte. This means that any byte that had the
+// high bit set will no longer have it set, narrowing the list of bytes which
+// result in non-zero values to just the zero byte.
+template <typename Word> LIBC_INLINE constexpr bool has_zeroes(Word block) {
+  constexpr unsigned int LOW_BITS = repeat_byte<Word>(0x01);
+  constexpr Word HIGH_BITS = repeat_byte<Word>(0x80);
+  Word subtracted = block - LOW_BITS;
+  Word inverted = ~block;
+  return (subtracted & inverted & HIGH_BITS) != 0;
+}
+
+// Unsigned int is the default size for most processors, and on x86-64 it
+// performs better than larger sizes when the src pointer can't be assumed to
+// be aligned to a word boundary, so it's the size we use for reading the
+// string a block at a time.
+
+LIBC_INLINE size_t string_length(const char *src) {
+  using Word = unsigned int;
+  const char *char_ptr = src;
+  // Step 1: read 1 byte at a time to align to block size
+  for (; reinterpret_cast<uintptr_t>(char_ptr) % sizeof(Word) != 0;
+       ++char_ptr) {
+    if (*char_ptr == '\0')
+      return static_cast<size_t>(char_ptr - src);
+  }
+  // Step 2: read blocks
+  for (const Word *block_ptr = reinterpret_cast<const Word *>(char_ptr);
+       !has_zeroes<Word>(*block_ptr); ++block_ptr) {
+    char_ptr = reinterpret_cast<const char *>(block_ptr);
+  }
+  // Step 3: find the zero in the block
+  for (; *char_ptr != '\0'; ++char_ptr) {
+    ;
+  }
+  return static_cast<size_t>(char_ptr - src);
+}
+
+LIBC_NO_SANITIZE_OOB_ACCESS LIBC_INLINE void *
+find_first_character(const unsigned char *src, unsigned char ch,
+                     size_t max_strlen = cpp::numeric_limits<size_t>::max()) {
+  using Word = unsigned int;
+  const unsigned char *char_ptr = src;
+  size_t cur = 0;
+
+  // If the maximum size of the string is small, the overhead of aligning to a
+  // word boundary and generating a bitmask of the appropriate size may be
+  // greater than the gains from reading larger chunks. Based on some testing,
+  // the crossover point between when it's faster to just read bytewise and read
+  // blocks is somewhere between 16 and 32, so 4 times the size of the block
+  // should be in that range.
+  if (max_strlen < (sizeof(Word) * 4)) {
+    return element::find_first_character(src, ch, max_strlen);
+  }
+  size_t n = max_strlen;
+  // Step 1: read 1 byte at a time to align to block size
+  for (; cur < n && reinterpret_cast<uintptr_t>(char_ptr) % sizeof(Word) != 0;
+       ++cur, ++char_ptr) {
+    if (*char_ptr == ch)
+      return const_cast<unsigned char *>(char_ptr);
+  }
+
+  const Word ch_mask = repeat_byte<Word>(ch);
+
+  // Step 2: read blocks
+  const Word *block_ptr = reinterpret_cast<const Word *>(char_ptr);
+  for (; cur < n && !has_zeroes<Word>((*block_ptr) ^ ch_mask);
+       cur += sizeof(Word), ++block_ptr)
+    ;
+  char_ptr = reinterpret_cast<const unsigned char *>(block_ptr);
+
+  // Step 3: find the match in the block
+  for (; cur < n && *char_ptr != ch; ++cur, ++char_ptr) {
+    ;
+  }
+
+  if (cur >= n || *char_ptr != ch)
+    return static_cast<void *>(nullptr);
+
+  return const_cast<unsigned char *>(char_ptr);
+}
+
+} // namespace word
+
+// Dispatch mechanism for implementations of performance-sensitive
+// functions. Always measure, but generally from lower- to higher-performance
+// order:
+//
+// 1. element - read char-by-char or wchar-by-wchar
+// 2. word - read word-by-word
+// 3. clang_vector - read using clang's internal vector types
+// 4. arch_vector - hand-coded per architecture. Possibly in asm, or with
+// intrinsics.
+//
+// The called implementation is chosen at build-time by setting
+// LIBC_CONF_{FUNC}_IMPL in config.json
+static constexpr auto &string_length_impl =
+    LIBC_COPT_STRING_LENGTH_IMPL::string_length;
+static constexpr auto &find_first_character_impl =
+    LIBC_COPT_FIND_FIRST_CHARACTER_IMPL::find_first_character;
+
+template <typename T> LIBC_INLINE size_t string_length(const T *src) {
+  if constexpr (cpp::is_same_v<T, char>)
+    return string_length_impl(src);
+  return element::string_length_element<T>(src);
+}
+
+} // namespace internal
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif //  LLVM_LIBC_SRC_STRING_STRING_LENGTH_H
diff --git a/libc/src/string/string_utils.h b/libc/src/string/string_utils.h
index cbce62ead0328..b0144e01a9006 100644
--- a/libc/src/string/string_utils.h
+++ b/libc/src/string/string_utils.h
@@ -14,172 +14,17 @@
 #ifndef LLVM_LIBC_SRC_STRING_STRING_UTILS_H
 #define LLVM_LIBC_SRC_STRING_STRING_UTILS_H
 
-#include "hdr/limits_macros.h"
-#include "hdr/stdint_proxy.h" // uintptr_t
 #include "hdr/types/size_t.h"
 #include "src/__support/CPP/bitset.h"
-#include "src/__support/CPP/type_traits.h" // cpp::is_same_v
 #include "src/__support/macros/attributes.h"
 #include "src/__support/macros/config.h"
 #include "src/__support/macros/optimization.h" // LIBC_UNLIKELY
 #include "src/string/memory_utils/inline_memcpy.h"
-
-#if defined(LIBC_COPT_STRING_UNSAFE_WIDE_READ)
-#if LIBC_HAS_VECTOR_TYPE
-#include "src/string/memory_utils/generic/inline_strlen.h"
-#elif defined(LIBC_TARGET_ARCH_IS_X86)
-#include "src/string/memory_utils/x86_64/inline_strlen.h"
-#elif defined(LIBC_TARGET_ARCH_IS_AARCH64) && defined(__ARM_NEON)
-#include "src/string/memory_utils/aarch64/inline_strlen.h"
-#else
-namespace string_length_impl = LIBC_NAMESPACE::wide_read;
-#endif
-#endif // defined(LIBC_COPT_STRING_UNSAFE_WIDE_READ)
+#include "src/string/string_length.h"
 
 namespace LIBC_NAMESPACE_DECL {
 namespace internal {
 
-template <typename Word> LIBC_INLINE constexpr Word repeat_byte(Word byte) {
-  static_assert(CHAR_BIT == 8, "repeat_byte assumes a byte is 8 bits.");
-  constexpr size_t BITS_IN_BYTE = CHAR_BIT;
-  constexpr size_t BYTE_MASK = 0xff;
-  Word result = 0;
-  byte = byte & BYTE_MASK;
-  for (size_t i = 0; i < sizeof(Word); ++i)
-    result = (result << BITS_IN_BYTE) | byte;
-  return result;
-}
-
-// The goal of this function is to take in a block of arbitrary size and return
-// if it has any bytes equal to zero without branching. This is done by
-// transforming the block such that zero bytes become non-zero and non-zero
-// bytes become zero.
-// The first transformation relies on the properties of carrying in arithmetic
-// subtraction. Specifically, if 0x01 is subtracted from a byte that is 0x00,
-// then the result for that byte must be equal to 0xff (or 0xfe if the next byte
-// needs a carry as well).
-// The next transformation is a simple mask. All zero bytes will have the high
-// bit set after the subtraction, so each byte is masked with 0x80. This narrows
-// the set of bytes that result in a non-zero value to only zero bytes and bytes
-// with the high bit and any other bit set.
-// The final transformation masks the result of the previous transformations
-// with the inverse of the original byte. This means that any byte that had the
-// high bit set will no longer have it set, narrowing the list of bytes which
-// result in non-zero values to just the zero byte.
-template <typename Word> LIBC_INLINE constexpr bool has_zeroes(Word block) {
-  constexpr unsigned int LOW_BITS = repeat_byte<Word>(0x01);
-  constexpr Word HIGH_BITS = repeat_byte<Word>(0x80);
-  Word subtracted = block - LOW_BITS;
-  Word inverted = ~block;
-  return (subtracted & inverted & HIGH_BITS) != 0;
-}
-
-template <typename Word>
-LIBC_INLINE size_t string_length_wide_read(const char *src) {
-  const char *char_ptr = src;
-  // Step 1: read 1 byte at a t...
[truncated]
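
The zero-byte test described in the has_zeroes comment above, and the XOR step that find_first_character builds on, can be tried in isolation. The sketch below is an illustration under stated assumptions rather than the patch's code: it fixes the word type to `std::uint32_t`, uses a hypothetical `demo` namespace with made-up helpers `has_byte` and `find_first`, and loads words with `memcpy` while staying inside the buffer, whereas the patch aligns the pointer and relies on `LIBC_NO_SANITIZE_OOB_ACCESS`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace demo {

// Replicate one byte value into every byte of a 32-bit word.
constexpr std::uint32_t repeat_byte(std::uint32_t byte) {
  std::uint32_t result = 0;
  for (std::size_t i = 0; i < sizeof(std::uint32_t); ++i)
    result = (result << 8) | (byte & 0xff);
  return result;
}

// True if any byte of `block` is zero. Subtracting 0x01 from each byte sets
// the high bit of zero bytes; masking with ~block discards bytes whose high
// bit was already set before the subtraction.
constexpr bool has_zeroes(std::uint32_t block) {
  constexpr std::uint32_t LOW_BITS = repeat_byte(0x01);
  constexpr std::uint32_t HIGH_BITS = repeat_byte(0x80);
  return ((block - LOW_BITS) & ~block & HIGH_BITS) != 0;
}

// True if any byte of `block` equals `ch`: XOR turns matching bytes into
// zero, which reduces the search to the zero-byte test.
constexpr bool has_byte(std::uint32_t block, unsigned char ch) {
  return has_zeroes(block ^ repeat_byte(ch));
}

// Bounded search that only reads bytes inside [src, src + n).
inline const unsigned char *find_first(const unsigned char *src,
                                       unsigned char ch, std::size_t n) {
  std::size_t i = 0;
  // Skip whole words that cannot contain `ch`.
  for (; i + sizeof(std::uint32_t) <= n; i += sizeof(std::uint32_t)) {
    std::uint32_t block;
    std::memcpy(&block, src + i, sizeof(block)); // alignment-safe load
    if (has_byte(block, ch))
      break;
  }
  // Finish byte-by-byte: locates the match inside the word, or scans the tail.
  for (; i < n; ++i)
    if (src[i] == ch)
      return src + i;
  return nullptr;
}

} // namespace demo
```

The word loop only answers "is there a match somewhere in these four bytes", so a byte loop is still needed to locate it; that is the same split as steps 2 and 3 in the patch.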

llvmbot · 2025-12-04T20:38:31Z

@llvm/pr-subscribers-backend-risc-v

Author: None (Sterling-Augustine)

Changes

+} // namespace internal::sve
 } // namespace LIBC_NAMESPACE_DECL
 #endif // LIBC_TARGET_CPU_HAS_SVE
 
 namespace LIBC_NAMESPACE_DECL {
+namespace internal::arch_vector {
+[[maybe_unused]] LIBC_INLINE size_t string_length(const char *src) {
 #ifdef LIBC_TARGET_CPU_HAS_SVE
-namespace string_length_impl = sve;
+  return sve::string_length(src);
 #elif defined(__ARM_NEON)
-namespace string_length_impl = neon;
+  return neon::string_length(src);
 #endif
+}
+} // namespace internal::arch_vector
 } // namespace LIBC_NAMESPACE_DECL
 #endif // LLVM_LIBC_SRC_STRING_MEMORY_UTILS_AARCH64_INLINE_STRLEN_H
diff --git a/libc/src/string/memory_utils/generic/inline_strlen.h b/libc/src/string/memory_utils/generic/inline_strlen.h
index 69700e801bcea..7a565b36617ed 100644
--- a/libc/src/string/memory_utils/generic/inline_strlen.h
+++ b/libc/src/string/memory_utils/generic/inline_strlen.h
@@ -14,7 +14,7 @@
 #include "src/__support/common.h"
 
 namespace LIBC_NAMESPACE_DECL {
-namespace internal {
+namespace clang_vector {
 
 // Exploit the underlying integer representation to do a variable shift.
 LIBC_INLINE constexpr cpp::simd_mask<char> shift_mask(cpp::simd_mask<char> m,
@@ -46,9 +46,8 @@ LIBC_NO_SANITIZE_OOB_ACCESS LIBC_INLINE size_t string_length(const char *src) {
              cpp::find_first_set(mask);
   }
 }
-} // namespace internal
+} // namespace clang_vector
 
-namespace string_length_impl = internal;
 } // namespace LIBC_NAMESPACE_DECL
 
 #endif // LLVM_LIBC_SRC_STRING_MEMORY_UTILS_GENERIC_INLINE_STRLEN_H
diff --git a/libc/src/string/memory_utils/x86_64/inline_strlen.h b/libc/src/string/memory_utils/x86_64/inline_strlen.h
index 9e10d58363393..07b4a470f0d77 100644
--- a/libc/src/string/memory_utils/x86_64/inline_strlen.h
+++ b/libc/src/string/memory_utils/x86_64/inline_strlen.h
@@ -15,7 +15,8 @@
 
 namespace LIBC_NAMESPACE_DECL {
 
-namespace string_length_internal {
+namespace internal::arch_vector {
+
 // Return a bit-mask with the nth bit set if the nth-byte in block_ptr is zero.
 template <typename Vector, typename Mask>
 LIBC_NO_SANITIZE_OOB_ACCESS LIBC_INLINE static Mask
@@ -92,15 +93,18 @@ namespace avx512 {
 }
 } // namespace avx512
 #endif
-} // namespace string_length_internal
 
+[[maybe_unused]] LIBC_INLINE size_t string_length(const char *src) {
 #if defined(__AVX512F__)
-namespace string_length_impl = string_length_internal::avx512;
+  return avx512::string_length(src);
 #elif defined(__AVX2__)
-namespace string_length_impl = string_length_internal::avx2;
+  return avx2::string_length(src);
 #else
-namespace string_length_impl = string_length_internal::sse2;
+  return sse2::string_length(src);
 #endif
+}
+
+} // namespace internal::arch_vector
 
 } // namespace LIBC_NAMESPACE_DECL
 
diff --git a/libc/src/string/string_length.h b/libc/src/string/string_length.h
new file mode 100644
index 0000000000000..c828c85c16a17
--- /dev/null
+++ b/libc/src/string/string_length.h
@@ -0,0 +1,213 @@
+//===-- String Length -------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Basic implementation and dispatch mechanism for performance-sensitive string-
+// related code.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_STRING_STRING_LENGTH_H
+#define LLVM_LIBC_SRC_STRING_STRING_LENGTH_H
+
+#include "hdr/limits_macros.h"
+#include "hdr/stdint_proxy.h" // uintptr_t
+#include "hdr/types/size_t.h"
+#include "src/__support/CPP/type_traits.h" // cpp::is_same_v
+
+#if LIBC_HAS_VECTOR_TYPE
+#include "src/string/memory_utils/generic/inline_strlen.h"
+#endif
+#if defined(LIBC_TARGET_ARCH_IS_X86)
+#include "src/string/memory_utils/x86_64/inline_strlen.h"
+#elif defined(LIBC_TARGET_ARCH_IS_AARCH64)
+#include "src/string/memory_utils/aarch64/inline_strlen.h"
+#endif
+
+// Set sensible defaults
+#ifndef LIBC_COPT_STRING_LENGTH_IMPL
+#define LIBC_COPT_STRING_LENGTH_IMPL element
+#endif
+#ifndef LIBC_COPT_FIND_FIRST_CHARACTER_IMPL
+#define LIBC_COPT_FIND_FIRST_CHARACTER_IMPL element
+#endif
+
+namespace LIBC_NAMESPACE_DECL {
+namespace internal {
+
+#if !LIBC_HAS_VECTOR_TYPE
+// Forward any clang vector impls to architecture specific ones
+namespace arch_vector {}
+namespace clang_vector = arch_vector;
+#endif
+
+namespace element {
+// Element-by-element (usually a byte, but wider for wchar) implementations of
+// functions that search for data.  Slow, but easy to understand and analyze.
+
+// Returns the length of a string, denoted by the first occurrence
+// of a null terminator.
+LIBC_INLINE size_t string_length(const char *src) {
+  size_t length;
+  for (length = 0; *src; ++src, ++length)
+    ;
+  return length;
+}
+
+template <typename T> LIBC_INLINE size_t string_length_element(const T *src) {
+  size_t length;
+  for (length = 0; *src; ++src, ++length)
+    ;
+  return length;
+}
+
+LIBC_INLINE void *find_first_character(const unsigned char *src,
+                                       unsigned char ch, size_t n) {
+  for (; n && *src != ch; --n, ++src)
+    ;
+  return n ? const_cast<unsigned char *>(src) : nullptr;
+}
+} // namespace element
+
+namespace word {
+// Non-vector implementations of functions that search for data by reading from
+// memory word-by-word.
+
+template <typename Word> LIBC_INLINE constexpr Word repeat_byte(Word byte) {
+  static_assert(CHAR_BIT == 8, "repeat_byte assumes a byte is 8 bits.");
+  constexpr size_t BITS_IN_BYTE = CHAR_BIT;
+  constexpr size_t BYTE_MASK = 0xff;
+  Word result = 0;
+  byte = byte & BYTE_MASK;
+  for (size_t i = 0; i < sizeof(Word); ++i)
+    result = (result << BITS_IN_BYTE) | byte;
+  return result;
+}
+
+// The goal of this function is to take in a block of arbitrary size and return
+// if it has any bytes equal to zero without branching. This is done by
+// transforming the block such that zero bytes become non-zero and non-zero
+// bytes become zero.
+// The first transformation relies on the properties of carrying in arithmetic
+// subtraction. Specifically, if 0x01 is subtracted from a byte that is 0x00,
+// then the result for that byte must be equal to 0xff (or 0xfe if the next byte
+// needs a carry as well).
+// The next transformation is a simple mask. All zero bytes will have the high
+// bit set after the subtraction, so each byte is masked with 0x80. This narrows
+// the set of bytes that result in a non-zero value to only zero bytes and bytes
+// with the high bit and any other bit set.
+// The final transformation masks the result of the previous transformations
+// with the inverse of the original byte. This means that any byte that had the
+// high bit set will no longer have it set, narrowing the list of bytes which
+// result in non-zero values to just the zero byte.
+template <typename Word> LIBC_INLINE constexpr bool has_zeroes(Word block) {
+  constexpr Word LOW_BITS = repeat_byte<Word>(0x01);
+  constexpr Word HIGH_BITS = repeat_byte<Word>(0x80);
+  Word subtracted = block - LOW_BITS;
+  Word inverted = ~block;
+  return (subtracted & inverted & HIGH_BITS) != 0;
+}
+
+// Unsigned int is the default size for most processors, and on x86-64 it
+// performs better than larger sizes when the src pointer can't be assumed to
+// be aligned to a word boundary, so it's the size we use for reading the
+// string a block at a time.
+
+LIBC_INLINE size_t string_length(const char *src) {
+  using Word = unsigned int;
+  const char *char_ptr = src;
+  // Step 1: read 1 byte at a time to align to block size
+  for (; reinterpret_cast<uintptr_t>(char_ptr) % sizeof(Word) != 0;
+       ++char_ptr) {
+    if (*char_ptr == '\0')
+      return static_cast<size_t>(char_ptr - src);
+  }
+  // Step 2: read blocks
+  for (const Word *block_ptr = reinterpret_cast<const Word *>(char_ptr);
+       !has_zeroes<Word>(*block_ptr); ++block_ptr) {
+    char_ptr = reinterpret_cast<const char *>(block_ptr);
+  }
+  // Step 3: find the zero in the block
+  for (; *char_ptr != '\0'; ++char_ptr) {
+    ;
+  }
+  return static_cast<size_t>(char_ptr - src);
+}
+
+LIBC_NO_SANITIZE_OOB_ACCESS LIBC_INLINE void *
+find_first_character(const unsigned char *src, unsigned char ch,
+                     size_t max_strlen = cpp::numeric_limits<size_t>::max()) {
+  using Word = unsigned int;
+  const unsigned char *char_ptr = src;
+  size_t cur = 0;
+
+  // If the maximum size of the string is small, the overhead of aligning to a
+  // word boundary and generating a bitmask of the appropriate size may be
+  // greater than the gains from reading larger chunks. Based on some testing,
+  // the crossover point between when it's faster to just read bytewise and read
+  // blocks is somewhere between 16 and 32, so 4 times the size of the block
+  // should be in that range.
+  if (max_strlen < (sizeof(Word) * 4)) {
+    return element::find_first_character(src, ch, max_strlen);
+  }
+  size_t n = max_strlen;
+  // Step 1: read 1 byte at a time to align to block size
+  for (; cur < n && reinterpret_cast<uintptr_t>(char_ptr) % sizeof(Word) != 0;
+       ++cur, ++char_ptr) {
+    if (*char_ptr == ch)
+      return const_cast<unsigned char *>(char_ptr);
+  }
+
+  const Word ch_mask = repeat_byte<Word>(ch);
+
+  // Step 2: read blocks
+  const Word *block_ptr = reinterpret_cast<const Word *>(char_ptr);
+  for (; cur < n && !has_zeroes<Word>((*block_ptr) ^ ch_mask);
+       cur += sizeof(Word), ++block_ptr)
+    ;
+  char_ptr = reinterpret_cast<const unsigned char *>(block_ptr);
+
+  // Step 3: find the match in the block
+  for (; cur < n && *char_ptr != ch; ++cur, ++char_ptr) {
+    ;
+  }
+
+  if (cur >= n || *char_ptr != ch)
+    return static_cast<void *>(nullptr);
+
+  return const_cast<unsigned char *>(char_ptr);
+}
+
+} // namespace word
+
+// Dispatch mechanism for implementations of performance-sensitive
+// functions. Always measure, but generally from lower- to higher-performance
+// order:
+//
+// 1. element - read char-by-char or wchar-by-wchar
+// 2. word - read word-by-word
+// 3. clang_vector - read using clang's internal vector types
+// 4. arch_vector - hand-coded per architecture. Possibly in asm, or with
+// intrinsics.
+//
+// The called implementation is chosen at build-time by setting
+// LIBC_CONF_{FUNC}_IMPL in config.json
+static constexpr auto &string_length_impl =
+    LIBC_COPT_STRING_LENGTH_IMPL::string_length;
+static constexpr auto &find_first_character_impl =
+    LIBC_COPT_FIND_FIRST_CHARACTER_IMPL::find_first_character;
+
+template <typename T> LIBC_INLINE size_t string_length(const T *src) {
+  if constexpr (cpp::is_same_v<T, char>)
+    return string_length_impl(src);
+  return element::string_length_element<T>(src);
+}
+
+} // namespace internal
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif //  LLVM_LIBC_SRC_STRING_STRING_LENGTH_H
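
The dispatch at the end of string_length.h works because the configured value is spliced in as a namespace name. A minimal standalone sketch of that idea follows, assuming plain standard C++ rather than the libc-internal macros. The `demo` namespace and the `DEMO_STRING_LENGTH_IMPL` macro are hypothetical stand-ins for illustration only, and the `word` variant here is a placeholder, not the word-at-a-time code from the patch.

```cpp
#include <cstddef>

// Hypothetical stand-in for the value the build system would pass on the
// command line (for example -DDEMO_STRING_LENGTH_IMPL=word).
#ifndef DEMO_STRING_LENGTH_IMPL
#define DEMO_STRING_LENGTH_IMPL element
#endif

namespace demo {

namespace element {
// Byte-by-byte reference implementation.
inline std::size_t string_length(const char *src) {
  std::size_t length = 0;
  for (; *src; ++src, ++length)
    ;
  return length;
}
} // namespace element

namespace word {
// Placeholder only: a real word-at-a-time version would live here.
inline std::size_t string_length(const char *src) {
  return element::string_length(src);
}
} // namespace word

// The macro expands to a namespace name, so this reference binds to
// demo::element::string_length or demo::word::string_length at compile time.
static constexpr auto &string_length_impl =
    DEMO_STRING_LENGTH_IMPL::string_length;

inline std::size_t string_length(const char *src) {
  return string_length_impl(src);
}

} // namespace demo
```

Because the macro is only pasted in front of `::string_length`, any namespace that provides a function with that name and signature can be selected, which matches the remark in the PR description that the choice is not strictly limited to four values.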
diff --git a/libc/src/string/string_utils.h b/libc/src/string/string_utils.h
index cbce62ead0328..b0144e01a9006 100644
--- a/libc/src/string/string_utils.h
+++ b/libc/src/string/string_utils.h
@@ -14,172 +14,17 @@
 #ifndef LLVM_LIBC_SRC_STRING_STRING_UTILS_H
 #define LLVM_LIBC_SRC_STRING_STRING_UTILS_H
 
-#include "hdr/limits_macros.h"
-#include "hdr/stdint_proxy.h" // uintptr_t
 #include "hdr/types/size_t.h"
 #include "src/__support/CPP/bitset.h"
-#include "src/__support/CPP/type_traits.h" // cpp::is_same_v
 #include "src/__support/macros/attributes.h"
 #include "src/__support/macros/config.h"
 #include "src/__support/macros/optimization.h" // LIBC_UNLIKELY
 #include "src/string/memory_utils/inline_memcpy.h"
-
-#if defined(LIBC_COPT_STRING_UNSAFE_WIDE_READ)
-#if LIBC_HAS_VECTOR_TYPE
-#include "src/string/memory_utils/generic/inline_strlen.h"
-#elif defined(LIBC_TARGET_ARCH_IS_X86)
-#include "src/string/memory_utils/x86_64/inline_strlen.h"
-#elif defined(LIBC_TARGET_ARCH_IS_AARCH64) && defined(__ARM_NEON)
-#include "src/string/memory_utils/aarch64/inline_strlen.h"
-#else
-namespace string_length_impl = LIBC_NAMESPACE::wide_read;
-#endif
-#endif // defined(LIBC_COPT_STRING_UNSAFE_WIDE_READ)
+#include "src/string/string_length.h"
 
 namespace LIBC_NAMESPACE_DECL {
 namespace internal {
 
-template <typename Word> LIBC_INLINE constexpr Word repeat_byte(Word byte) {
-  static_assert(CHAR_BIT == 8, "repeat_byte assumes a byte is 8 bits.");
-  constexpr size_t BITS_IN_BYTE = CHAR_BIT;
-  constexpr size_t BYTE_MASK = 0xff;
-  Word result = 0;
-  byte = byte & BYTE_MASK;
-  for (size_t i = 0; i < sizeof(Word); ++i)
-    result = (result << BITS_IN_BYTE) | byte;
-  return result;
-}
-
-// The goal of this function is to take in a block of arbitrary size and return
-// if it has any bytes equal to zero without branching. This is done by
-// transforming the block such that zero bytes become non-zero and non-zero
-// bytes become zero.
-// The first transformation relies on the properties of carrying in arithmetic
-// subtraction. Specifically, if 0x01 is subtracted from a byte that is 0x00,
-// then the result for that byte must be equal to 0xff (or 0xfe if the next byte
-// needs a carry as well).
-// The next transformation is a simple mask. All zero bytes will have the high
-// bit set after the subtraction, so each byte is masked with 0x80. This narrows
-// the set of bytes that result in a non-zero value to only zero bytes and bytes
-// with the high bit and any other bit set.
-// The final transformation masks the result of the previous transformations
-// with the inverse of the original byte. This means that any byte that had the
-// high bit set will no longer have it set, narrowing the list of bytes which
-// result in non-zero values to just the zero byte.
-template <typename Word> LIBC_INLINE constexpr bool has_zeroes(Word block) {
-  constexpr unsigned int LOW_BITS = repeat_byte<Word>(0x01);
-  constexpr Word HIGH_BITS = repeat_byte<Word>(0x80);
-  Word subtracted = block - LOW_BITS;
-  Word inverted = ~block;
-  return (subtracted & inverted & HIGH_BITS) != 0;
-}
-
-template <typename Word>
-LIBC_INLINE size_t string_length_wide_read(const char *src) {
-  const char *char_ptr = src;
-  // Step 1: read 1 byte at a t...
[truncated]
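
The zero-byte test described in the has_zeroes comment can be tried in isolation. The following is a minimal sketch, assuming plain standard C++ in place of LIBC_INLINE and the other libc-internal macros. The function names mirror the diff, but the `demo` namespace and the `has_byte` helper are hypothetical additions for illustration. `LOW_BITS` is declared as `Word` here so that the constant keeps its full width when `Word` is wider than `unsigned int`.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

namespace demo {

// Copy the low byte of `byte` into every byte of a Word.
template <typename Word> constexpr Word repeat_byte(Word byte) {
  static_assert(CHAR_BIT == 8, "repeat_byte assumes a byte is 8 bits.");
  Word result = 0;
  byte = byte & 0xff;
  for (std::size_t i = 0; i < sizeof(Word); ++i)
    result = (result << CHAR_BIT) | byte;
  return result;
}

// True if any byte of `block` is zero: subtract 0x01 from every byte, keep
// only bits that were clear in the original, then look at the high bits.
template <typename Word> constexpr bool has_zeroes(Word block) {
  constexpr Word LOW_BITS = repeat_byte<Word>(0x01);
  constexpr Word HIGH_BITS = repeat_byte<Word>(0x80);
  Word subtracted = block - LOW_BITS;
  Word inverted = ~block;
  return (subtracted & inverted & HIGH_BITS) != 0;
}

// Hypothetical helper: searching for `ch` reduces to the zero test, because
// XOR with `ch` repeated in every byte turns matching bytes into zero.
template <typename Word>
constexpr bool has_byte(Word block, unsigned char ch) {
  return has_zeroes<Word>(block ^ repeat_byte<Word>(ch));
}

} // namespace demo
```

The XOR step in `has_byte` is the same transformation that the word variant of find_first_character applies with `ch_mask` before calling has_zeroes.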

michaelrj-google

LGTM

llvm-ci · 2025-12-05T00:01:18Z

LLVM Buildbot has detected a new failure on builder libc-arm32-qemu-debian-dbg running on libc-arm32-qemu-debian while building libc,utils at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/215/builds/11020

Here is the relevant piece of the build log for the reference

Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
    Missing ',' or '}' in object declaration

Call Stack (most recent call first):
  /home/llvm-libc-buildbot/buildbot-worker/libc-arm32-qemu-debian/libc-arm32-qemu-debian-dbg/llvm-project/libc/cmake/modules/LibcConfig.cmake:109 (read_libc_config)
  /home/llvm-libc-buildbot/buildbot-worker/libc-arm32-qemu-debian/libc-arm32-qemu-debian-dbg/llvm-project/libc/CMakeLists.txt:226 (load_libc_config)


-- Configuring incomplete, errors occurred!
See also "/home/llvm-libc-buildbot/buildbot-worker/libc-arm32-qemu-debian/libc-arm32-qemu-debian-dbg/build/CMakeFiles/CMakeOutput.log".
See also "/home/llvm-libc-buildbot/buildbot-worker/libc-arm32-qemu-debian/libc-arm32-qemu-debian-dbg/build/CMakeFiles/CMakeError.log".
FAILED: build.ninja 
/usr/bin/cmake --regenerate-during-build -S/home/llvm-libc-buildbot/buildbot-worker/libc-arm32-qemu-debian/libc-arm32-qemu-debian-dbg/llvm-project/runtimes -B/home/llvm-libc-buildbot/buildbot-worker/libc-arm32-qemu-debian/libc-arm32-qemu-debian-dbg/build
ninja: error: rebuilding 'build.ninja': subcommand failed
['ninja', 'libc'] exited with return code 1.
The build step threw an exception...
Traceback (most recent call last):
  File "/home/llvm-libc-buildbot/buildbot-worker/libc-arm32-qemu-debian/libc-arm32-qemu-debian-dbg/build/../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 181, in step
    yield
  File "/home/llvm-libc-buildbot/buildbot-worker/libc-arm32-qemu-debian/libc-arm32-qemu-debian-dbg/build/../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 143, in main
    run_command(['ninja', 'libc'])
  File "/home/llvm-libc-buildbot/buildbot-worker/libc-arm32-qemu-debian/libc-arm32-qemu-debian-dbg/build/../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 196, in run_command
    util.report_run_cmd(cmd, cwd=directory)
  File "/home/llvm-libc-buildbot/buildbot-worker/libc-arm32-qemu-debian/libc-arm32-qemu-debian-dbg/llvm-zorg/zorg/buildbot/builders/annotated/util.py", line 49, in report_run_cmd
    subprocess.check_call(cmd, shell=shell, *args, **kwargs)
  File "/usr/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ninja', 'libc']' returned non-zero exit status 1.
@@@STEP_FAILURE@@@
@@@BUILD_STEP libc-unit-tests@@@
Running: ninja libc-unit-tests
[0/1] Re-running CMake...
-- Performing standalone runtimes build.
-- Could NOT find LLVM (missing: LLVM_DIR)
-- Could NOT find Clang (missing: Clang_DIR)
-- LLVM host triple: x86_64-unknown-linux-gnu
-- LLVM default target triple: x86_64-unknown-linux-gnu
-- Setting LIBC_NAMESPACE namespace to '__llvm_libc_21_0_0_git'
-- Set COMPILER_RESOURCE_DIR to /usr/lib/llvm-14/lib/clang/14.0.6 using --print-resource-dir
-- Building libc for arm on linux with LIBC_COMPILE_OPTIONS_DEFAULT: --target=arm-linux-gnueabihf;--target=arm-linux-gnueabihf
-- LIBC_CONF_ENABLE_STRONG_STACK_PROTECTOR: ON
-- LIBC_CONF_KEEP_FRAME_POINTER: ON
-- LIBC_CONF_ERRNO_MODE: LIBC_ERRNO_MODE_DEFAULT
-- LIBC_ADD_NULL_CHECKS: ON
-- LIBC_CONF_FREXP_INF_NAN_EXPONENT: 
-- LIBC_CONF_MATH_OPTIMIZATIONS: 0
-- LIBC_CONF_PRINTF_DISABLE_FIXED_POINT: OFF
-- LIBC_CONF_PRINTF_DISABLE_FLOAT: OFF
-- LIBC_CONF_PRINTF_DISABLE_INDEX_MODE: OFF
-- LIBC_CONF_PRINTF_DISABLE_STRERROR: OFF

llvm-ci · 2025-12-05T00:01:24Z

LLVM Buildbot has detected a new failure on builder libc-aarch64-ubuntu-dbg running on libc-aarch64-ubuntu while building libc,utils at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/104/builds/36783

Here is the relevant piece of the build log for the reference

Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
    Missing '}' or object member name

Call Stack (most recent call first):
  /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu/llvm-project/libc/cmake/modules/LibcConfig.cmake:109 (read_libc_config)
  /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu/llvm-project/libc/CMakeLists.txt:226 (load_libc_config)


-- Configuring incomplete, errors occurred!
See also "/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu/build/CMakeFiles/CMakeOutput.log".
See also "/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu/build/CMakeFiles/CMakeError.log".
FAILED: build.ninja 
/usr/share/cmake-3.20.0/bin/cmake --regenerate-during-build -S/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu/llvm-project/runtimes -B/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu/build
ninja: error: rebuilding 'build.ninja': subcommand failed
['ninja', 'libc'] exited with return code 1.
The build step threw an exception...
Traceback (most recent call last):
  File "../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 181, in step
    yield
  File "../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 143, in main
    run_command(['ninja', 'libc'])
  File "../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 196, in run_command
    util.report_run_cmd(cmd, cwd=directory)
  File "/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu/llvm-zorg/zorg/buildbot/builders/annotated/util.py", line 49, in report_run_cmd
    subprocess.check_call(cmd, shell=shell, *args, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['ninja', 'libc']' returned non-zero exit status 1
@@@STEP_FAILURE@@@
@@@BUILD_STEP libc-unit-tests@@@
Running: ninja libc-unit-tests
[0/1] Re-running CMake...
-- Performing standalone runtimes build.
-- Could NOT find LLVM (missing: LLVM_DIR)
-- Could NOT find Clang (missing: Clang_DIR)
-- LLVM host triple: aarch64-unknown-linux-gnu
-- LLVM default target triple: aarch64-unknown-linux-gnu
-- Setting LIBC_NAMESPACE namespace to '__llvm_libc_20_0_0_git'
-- Set COMPILER_RESOURCE_DIR to /usr/lib/llvm-11/lib/clang/11.0.1 using --print-resource-dir
-- Building libc for aarch64 on linux with LIBC_COMPILE_OPTIONS_DEFAULT: 
-- LIBC_CONF_ENABLE_STRONG_STACK_PROTECTOR: ON
-- LIBC_CONF_KEEP_FRAME_POINTER: ON
-- LIBC_CONF_ERRNO_MODE: LIBC_ERRNO_MODE_DEFAULT
-- LIBC_ADD_NULL_CHECKS: ON
-- LIBC_CONF_FREXP_INF_NAN_EXPONENT: 
-- LIBC_CONF_MATH_OPTIMIZATIONS: 0
-- LIBC_CONF_PRINTF_DISABLE_FIXED_POINT: OFF
-- LIBC_CONF_PRINTF_DISABLE_FLOAT: OFF
-- LIBC_CONF_PRINTF_DISABLE_INDEX_MODE: OFF
-- LIBC_CONF_PRINTF_DISABLE_STRERROR: OFF

llvm-ci · 2025-12-05T00:01:26Z

LLVM Buildbot has detected a new failure on builder libc-aarch64-ubuntu-fullbuild-dbg running on libc-aarch64-ubuntu while building libc,utils at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/71/builds/36775

Here is the relevant piece of the build log for the reference

Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
    Missing '}' or object member name

Call Stack (most recent call first):
  /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/llvm-project/libc/cmake/modules/LibcConfig.cmake:109 (read_libc_config)
  /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/llvm-project/libc/CMakeLists.txt:226 (load_libc_config)


-- Configuring incomplete, errors occurred!
See also "/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/build/CMakeFiles/CMakeOutput.log".
See also "/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/build/CMakeFiles/CMakeError.log".
FAILED: build.ninja 
/usr/share/cmake-3.20.0/bin/cmake --regenerate-during-build -S/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/llvm-project/runtimes -B/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/build
ninja: error: rebuilding 'build.ninja': subcommand failed
['ninja', 'libc'] exited with return code 1.
The build step threw an exception...
Traceback (most recent call last):
  File "../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 181, in step
    yield
  File "../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 143, in main
    run_command(['ninja', 'libc'])
  File "../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 196, in run_command
    util.report_run_cmd(cmd, cwd=directory)
  File "/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/llvm-zorg/zorg/buildbot/builders/annotated/util.py", line 49, in report_run_cmd
    subprocess.check_call(cmd, shell=shell, *args, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['ninja', 'libc']' returned non-zero exit status 1
@@@STEP_FAILURE@@@
@@@BUILD_STEP build libc-startup@@@
Running: ninja libc-startup
[0/1] Re-running CMake...
-- Performing standalone runtimes build.
-- Could NOT find LLVM (missing: LLVM_DIR)
-- Could NOT find Clang (missing: Clang_DIR)
-- LLVM host triple: aarch64-unknown-linux-gnu
-- LLVM default target triple: aarch64-unknown-linux-gnu
-- Setting LIBC_NAMESPACE namespace to '__llvm_libc_20_0_0_git'
-- Set COMPILER_RESOURCE_DIR to /usr/lib/llvm-11/lib/clang/11.0.1 using --print-resource-dir
-- Building libc for aarch64 on linux with LIBC_COMPILE_OPTIONS_DEFAULT: 
-- LIBC_CONF_ENABLE_STRONG_STACK_PROTECTOR: ON
-- LIBC_CONF_KEEP_FRAME_POINTER: ON
-- LIBC_CONF_ERRNO_MODE: LIBC_ERRNO_MODE_DEFAULT
-- LIBC_ADD_NULL_CHECKS: ON
-- LIBC_CONF_FREXP_INF_NAN_EXPONENT: 
-- LIBC_CONF_MATH_OPTIMIZATIONS: 0
-- LIBC_CONF_PRINTF_DISABLE_FIXED_POINT: OFF
-- LIBC_CONF_PRINTF_DISABLE_FLOAT: OFF
-- LIBC_CONF_PRINTF_DISABLE_INDEX_MODE: OFF
-- LIBC_CONF_PRINTF_DISABLE_STRERROR: OFF

michaelrj-google · 2025-12-05T00:04:55Z

libc/config/linux/arm/config.json

-      "value": false
+    "LIBC_CONF_STRING_LENGTH_IMPL": {
+      "value": "element"
+    }


it seems like the arm and riscv configs don't have commas on line 5

Can be fixed with #170776

llvm-ci · 2025-12-05T00:43:30Z

LLVM Buildbot has detected a new failure on builder libc-riscv32-qemu-yocto-fullbuild-dbg running on rv32gc-qemu-system while building libc,utils at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/196/builds/14093

Here is the relevant piece of the build log for the reference

Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
  failed parsing json string: * Line 6, Column 5

    Missing ',' or '}' in object declaration

Call Stack (most recent call first):
  /home/libcrv32buildbot/bbroot/libc-riscv32-qemu-yocto-fullbuild-dbg/llvm-project/libc/cmake/modules/LibcConfig.cmake:109 (read_libc_config)
  /home/libcrv32buildbot/bbroot/libc-riscv32-qemu-yocto-fullbuild-dbg/llvm-project/libc/CMakeLists.txt:226 (load_libc_config)


-- Configuring incomplete, errors occurred!
FAILED: build.ninja 
/usr/bin/cmake --regenerate-during-build -S/home/libcrv32buildbot/bbroot/libc-riscv32-qemu-yocto-fullbuild-dbg/llvm-project/runtimes -B/home/libcrv32buildbot/bbroot/libc-riscv32-qemu-yocto-fullbuild-dbg/build
ninja: error: rebuilding 'build.ninja': subcommand failed
['ninja', 'libc'] exited with return code 1.
The build step threw an exception...
Traceback (most recent call last):
  File "/home/libcrv32buildbot/bbroot/libc-riscv32-qemu-yocto-fullbuild-dbg/build/../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 181, in step
    yield
  File "/home/libcrv32buildbot/bbroot/libc-riscv32-qemu-yocto-fullbuild-dbg/build/../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 143, in main
    run_command(['ninja', 'libc'])
  File "/home/libcrv32buildbot/bbroot/libc-riscv32-qemu-yocto-fullbuild-dbg/build/../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py", line 196, in run_command
    util.report_run_cmd(cmd, cwd=directory)
  File "/home/libcrv32buildbot/bbroot/libc-riscv32-qemu-yocto-fullbuild-dbg/llvm-zorg/zorg/buildbot/builders/annotated/util.py", line 49, in report_run_cmd
    subprocess.check_call(cmd, shell=shell, *args, **kwargs)
  File "/usr/lib/python3.12/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ninja', 'libc']' returned non-zero exit status 1.
@@@STEP_FAILURE@@@
@@@BUILD_STEP build libc-startup@@@
Running: ninja libc-startup
[0/1] Re-running CMake...
-- Performing standalone runtimes build.
-- Could NOT find LLVM (missing: LLVM_DIR)
-- Could NOT find Clang (missing: Clang_DIR)
-- LLVM host triple: x86_64-unknown-linux-gnu
-- LLVM default target triple: x86_64-unknown-linux-gnu
-- Undefining _LARGEFILE_SOURCE
-- Undefining _FILE_OFFSET_BITS=64
-- Undefining __STDC_CONSTANT_MACROS
-- Undefining __STDC_FORMAT_MACROS
-- Undefining __STDC_LIMIT_MACROS
-- Setting LIBC_NAMESPACE namespace to '__llvm_libc_21_0_0_git'
-- Set COMPILER_RESOURCE_DIR to /usr/local/lib/clang/20 using --print-resource-dir
-- Building libc for riscv on linux with LIBC_COMPILE_OPTIONS_DEFAULT: --target=riscv32-unknown-linux-gnu;--target=riscv32-unknown-linux-gnu
-- Path for config files is: /home/libcrv32buildbot/bbroot/libc-riscv32-qemu-yocto-fullbuild-dbg/llvm-project/libc/config/linux/riscv
-- LIBC_CONF_ENABLE_STRONG_STACK_PROTECTOR: ON
-- LIBC_CONF_KEEP_FRAME_POINTER: ON
-- LIBC_CONF_ERRNO_MODE: LIBC_ERRNO_MODE_DEFAULT
-- LIBC_ADD_NULL_CHECKS: ON

statham-arm · 2025-12-05T09:48:59Z

libc/src/string/memory_utils/aarch64/inline_strlen.h

+  return sve::string_length(src);
 #elif defined(__ARM_NEON)
-namespace string_length_impl = neon;
+  return neon::string_length(src);


This #if caused a build failure for us downstream, because it doesn't have a fallback else clause, so that if you compile for the rare AArch64 targets without SVE or NEON, string_length ends up being an empty function, provoking compiler warnings – or errors, if you build at -Werror – about not returning a value and not using the src parameter.

I am unable to test this, but hopefully it will be fixed with #170892
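
For illustration, one way to avoid the empty function body is a final `#else` branch that falls back to the byte-by-byte loop. This is only a sketch under stated assumptions and not necessarily what #170892 does (its title suggests it instead includes the header only when the target has vector instructions). The `demo` namespace and the `DEMO_HAS_SVE` / `DEMO_HAS_NEON` macros are hypothetical stand-ins, and the vector variants are placeholders.

```cpp
#include <cstddef>

namespace demo {

namespace element {
// Byte-by-byte implementation that is always available.
inline std::size_t string_length(const char *src) {
  std::size_t length = 0;
  for (; *src; ++src, ++length)
    ;
  return length;
}
} // namespace element

#if defined(DEMO_HAS_SVE)
namespace sve {
// Placeholder for an SVE implementation.
inline std::size_t string_length(const char *src) {
  return element::string_length(src);
}
} // namespace sve
#endif

#if defined(DEMO_HAS_NEON)
namespace neon {
// Placeholder for a NEON implementation.
inline std::size_t string_length(const char *src) {
  return element::string_length(src);
}
} // namespace neon
#endif

namespace arch_vector {
inline std::size_t string_length(const char *src) {
#if defined(DEMO_HAS_SVE)
  return sve::string_length(src);
#elif defined(DEMO_HAS_NEON)
  return neon::string_length(src);
#else
  // Fallback: every preprocessor path now returns a value and uses `src`,
  // so targets without either extension still compile cleanly.
  return element::string_length(src);
#endif
}
} // namespace arch_vector

} // namespace demo
```

With neither macro defined, `demo::arch_vector::string_length` simply runs the element loop, which trades speed for a build that succeeds on every target.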

Sterling-Augustine requested review from aaronmondal, keith and rupprecht as code owners December 4, 2025 20:37

Sterling-Augustine added the skip-precommit-approval PR for CI feedback, not intended for review label Dec 4, 2025

llvmbot added backend:RISC-V libc bazel "Peripheral" support tier build system: utils/bazel labels Dec 4, 2025

michaelrj-google approved these changes Dec 4, 2025

Sterling-Augustine merged commit ed7e66a into llvm:main Dec 4, 2025
34 checks passed

michaelrj-google reviewed Dec 5, 2025

statham-arm reviewed Dec 5, 2025

Sterling-Augustine mentioned this pull request Dec 5, 2025

Include inline_strlen.h on aarch64 only if the target has vector instrucions #170892

Merged
