@Sterling-Augustine
Contributor

On exit from the loop, char_ptr had not been updated to match block_ptr, resulting in erroneous results. Moving all updates out of the loop fixes that.

@llvmbot llvmbot added the libc label Nov 5, 2025
@llvmbot
Member

llvmbot commented Nov 5, 2025

@llvm/pr-subscribers-libc

Author: None (Sterling-Augustine)

Changes

On exit from the loop, char_ptr had not been updated to match block_ptr, resulting in erroneous results. Moving all updates out of the loop fixes that.


Full diff: https://github.com/llvm/llvm-project/pull/166594.diff

2 Files Affected:

  • (modified) libc/src/string/string_utils.h (+5-5)
  • (modified) libc/test/src/string/memchr_test.cpp (+5)
diff --git a/libc/src/string/string_utils.h b/libc/src/string/string_utils.h
index 7feef56fb3676..c9a720bef98a0 100644
--- a/libc/src/string/string_utils.h
+++ b/libc/src/string/string_utils.h
@@ -136,11 +136,11 @@ find_first_character_wide_read(const unsigned char *src, unsigned char ch,
   const Word ch_mask = repeat_byte<Word>(ch);
 
   // Step 2: read blocks
-  for (const Word *block_ptr = reinterpret_cast<const Word *>(char_ptr);
-       !has_zeroes<Word>((*block_ptr) ^ ch_mask) && cur < n;
-       ++block_ptr, cur += sizeof(Word)) {
-    char_ptr = reinterpret_cast<const unsigned char *>(block_ptr);
-  }
+  const Word *block_ptr = reinterpret_cast<const Word *>(char_ptr);
+  for (; !has_zeroes<Word>((*block_ptr) ^ ch_mask) && cur < n;
+       ++block_ptr, cur += sizeof(Word))
+    ;
+  char_ptr = reinterpret_cast<const unsigned char *>(block_ptr);
 
   // Step 3: find the match in the block
   for (; *char_ptr != ch && cur < n; ++char_ptr, ++cur) {
diff --git a/libc/test/src/string/memchr_test.cpp b/libc/test/src/string/memchr_test.cpp
index ede841118fe03..1db5ecaed40cd 100644
--- a/libc/test/src/string/memchr_test.cpp
+++ b/libc/test/src/string/memchr_test.cpp
@@ -21,6 +21,11 @@ const char *call_memchr(const void *src, int c, size_t size) {
   return reinterpret_cast<const char *>(LIBC_NAMESPACE::memchr(src, c, size));
 }
 
+TEST(LlvmLibcMemChrTest, FromProtoC) {
+  const char *src = "protobuf_cpp_version$\n";
+  ASSERT_STREQ(call_memchr(src, '$', 22), "$\n");
+}
+
 TEST(LlvmLibcMemChrTest, FindsCharacterAfterNullTerminator) {
   // memchr should continue searching after a null terminator.
   const size_t size = 5;
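
To see why the old loop went wrong, here is a minimal, self-contained sketch of Step 2 and Step 3 before and after the change. It is not the libc code: it skips the Step 1 alignment loop, uses offsets instead of pointers, loads words with `std::memcpy`, and relies on stand-in versions of `repeat_byte` and `has_zeroes`. The names `load_word`, `find_before`, `find_after`, and `check_lagging_pointer_example` are hypothetical, and the buffer is assumed to be zero-padded so every whole-word load stays inside it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Word = std::uint64_t;

// Stand-in for libc's repeat_byte: copies `ch` into every byte of a Word.
static Word repeat_byte(unsigned char ch) {
  return static_cast<Word>(ch) * 0x0101010101010101ULL;
}

// Stand-in for libc's has_zeroes: true if any byte of `w` is zero.
static bool has_zeroes(Word w) {
  return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

// Hypothetical helper: alignment-agnostic word load.
static Word load_word(const unsigned char *p) {
  Word w;
  std::memcpy(&w, p, sizeof(Word));
  return w;
}

// Pre-patch shape: the byte offset is refreshed only inside the loop body,
// so on exit it still points at the previous block.
static std::ptrdiff_t find_before(const unsigned char *buf, unsigned char ch,
                                  std::size_t n) {
  const Word ch_mask = repeat_byte(ch);
  std::size_t cur = 0;
  std::size_t char_off = 0;
  for (std::size_t block_off = 0;
       !has_zeroes(load_word(buf + block_off) ^ ch_mask) && cur < n;
       block_off += sizeof(Word), cur += sizeof(Word)) {
    char_off = block_off;
  }
  for (; buf[char_off] != ch && cur < n; ++char_off, ++cur) {
  }
  if (buf[char_off] != ch || cur >= n)
    return -1;
  return static_cast<std::ptrdiff_t>(char_off);
}

// Post-patch shape: the byte offset is taken from the block offset after
// the loop, so the two always agree.
static std::ptrdiff_t find_after(const unsigned char *buf, unsigned char ch,
                                 std::size_t n) {
  const Word ch_mask = repeat_byte(ch);
  std::size_t cur = 0;
  std::size_t block_off = 0;
  for (; !has_zeroes(load_word(buf + block_off) ^ ch_mask) && cur < n;
       block_off += sizeof(Word), cur += sizeof(Word))
    ;
  std::size_t char_off = block_off;
  for (; buf[char_off] != ch && cur < n; ++char_off, ++cur) {
  }
  if (buf[char_off] != ch || cur >= n)
    return -1;
  return static_cast<std::ptrdiff_t>(char_off);
}

// Runs both shapes on the string from the new test case.
static void check_lagging_pointer_example() {
  // 22 meaningful bytes, zero-padded to 32 so word loads stay in bounds.
  alignas(Word) unsigned char buf[32] = "protobuf_cpp_version$\n";
  assert(find_before(buf, '$', 22) == -1); // match is missed
  assert(find_after(buf, '$', 22) == 20);  // match is found at offset 20
}
```

With 8-byte words the match sits in the third block (offsets 16 to 23). The old shape leaves the byte offset at 8 while `cur` is already 16, so Step 3 exhausts its budget at offset 14 and never reaches the `$` at offset 20.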

Contributor

@michaelrj-google michaelrj-google left a comment

LGTM

-    char_ptr = reinterpret_cast<const unsigned char *>(block_ptr);
-  }
+  const Word *block_ptr = reinterpret_cast<const Word *>(char_ptr);
+  for (; !has_zeroes<Word>((*block_ptr) ^ ch_mask) && cur < n;
Contributor

I wonder if here (and everywhere in this function) the cur < n check should go first in the && condition, to ensure that whenever we compare a byte, that byte is actually inside the string (or, for a word-sized block, that its first byte is inside the string)?

Contributor Author

I think you are correct here.

If cur >= n, we shouldn't dereference block_ptr, because we might have just crossed a page boundary.

I don't think it matters for the first and third loops, as those are character by character so won't have the problem, but perhaps best to switch regardless for logical consistency.

Updating.

Contributor Author

Actually, it's worse than that. We should never dereference char_ptr if cur >= n. Fixing.
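
The ordering discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the merged code: it skips the Step 1 alignment loop, works on offsets, and uses stand-in helpers; `find_bounds_first` and `check_bounds_first_example` are hypothetical names. It assumes the storage behind `buf` is Word-aligned and padded to a multiple of `sizeof(Word)`, so a whole-word load that starts below `n` stays inside the storage. Because `&&` short-circuits, putting `cur < n` first means nothing is read once `cur` reaches `n`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Word = std::uint64_t;

// Stand-in for libc's repeat_byte: copies `ch` into every byte of a Word.
static Word repeat_byte(unsigned char ch) {
  return static_cast<Word>(ch) * 0x0101010101010101ULL;
}

// Stand-in for libc's has_zeroes: true if any byte of `w` is zero.
static bool has_zeroes(Word w) {
  return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

// Hypothetical helper: alignment-agnostic word load.
static Word load_word(const unsigned char *p) {
  Word w;
  std::memcpy(&w, p, sizeof(Word));
  return w;
}

// Every condition tests the bound before touching memory.
static std::ptrdiff_t find_bounds_first(const unsigned char *buf,
                                        unsigned char ch, std::size_t n) {
  const Word ch_mask = repeat_byte(ch);
  std::size_t cur = 0;
  // Step 2: a block is loaded only if its first byte is below n.
  for (; cur < n && !has_zeroes(load_word(buf + cur) ^ ch_mask);
       cur += sizeof(Word))
    ;
  // Step 3: a byte is compared only if it is below n.
  for (; cur < n && buf[cur] != ch; ++cur)
    ;
  if (cur >= n || buf[cur] != ch)
    return -1;
  return static_cast<std::ptrdiff_t>(cur);
}

static void check_bounds_first_example() {
  // 23 characters plus the terminating NUL fill exactly three 8-byte words.
  alignas(Word) unsigned char buf[24] = "protobuf_cpp_version$\n#";
  assert(find_bounds_first(buf, '$', 22) == 20);
  // '#' lives at offset 22, outside the first 22 bytes, so it is not found.
  assert(find_bounds_first(buf, '#', 22) == -1);
  assert(find_bounds_first(buf, '#', 23) == 22);
  // With n == 0 no condition gets past `cur < n`, so nothing is dereferenced.
  assert(find_bounds_first(nullptr, 'x', 0) == -1);
}
```

The `'#'` case shows why the order matters for correctness as well as safety: the third block does contain the byte, but the byte-wise scan stops at `n` before it can report a match outside the requested range.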

@Sterling-Augustine Sterling-Augustine merged commit 7ff8a51 into llvm:main Nov 6, 2025
20 checks passed
vinay-deshmukh pushed a commit to vinay-deshmukh/llvm-project that referenced this pull request Nov 8, 2025
…6594)

On exit from the loop, char_ptr had not been updated to match block_ptr,
resulting in erroneous results. Moving all updates out of the loop fixes
that.

Adjust dereferences to always be inside bounds checks.