-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[libc] Fix stale char_ptr for find_first_character_wide read #166594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc] Fix stale char_ptr for find_first_character_wide read #166594
Conversation
On exit from the loop, char_ptr had not been updated to match block_ptr, resulting in erroneous results. Moving all updates out of the loop fixes that.
|
@llvm/pr-subscribers-libc Author: None (Sterling-Augustine) ChangesOn exit from the loop, char_ptr had not been updated to match block_ptr, resulting in erroneous results. Moving all updates out of the loop fixes that. Full diff: https://github.com/llvm/llvm-project/pull/166594.diff 2 Files Affected:
diff --git a/libc/src/string/string_utils.h b/libc/src/string/string_utils.h
index 7feef56fb3676..c9a720bef98a0 100644
--- a/libc/src/string/string_utils.h
+++ b/libc/src/string/string_utils.h
@@ -136,11 +136,11 @@ find_first_character_wide_read(const unsigned char *src, unsigned char ch,
const Word ch_mask = repeat_byte<Word>(ch);
// Step 2: read blocks
- for (const Word *block_ptr = reinterpret_cast<const Word *>(char_ptr);
- !has_zeroes<Word>((*block_ptr) ^ ch_mask) && cur < n;
- ++block_ptr, cur += sizeof(Word)) {
- char_ptr = reinterpret_cast<const unsigned char *>(block_ptr);
- }
+ const Word *block_ptr = reinterpret_cast<const Word *>(char_ptr);
+ for (; !has_zeroes<Word>((*block_ptr) ^ ch_mask) && cur < n;
+ ++block_ptr, cur += sizeof(Word))
+ ;
+ char_ptr = reinterpret_cast<const unsigned char *>(block_ptr);
// Step 3: find the match in the block
for (; *char_ptr != ch && cur < n; ++char_ptr, ++cur) {
diff --git a/libc/test/src/string/memchr_test.cpp b/libc/test/src/string/memchr_test.cpp
index ede841118fe03..1db5ecaed40cd 100644
--- a/libc/test/src/string/memchr_test.cpp
+++ b/libc/test/src/string/memchr_test.cpp
@@ -21,6 +21,11 @@ const char *call_memchr(const void *src, int c, size_t size) {
return reinterpret_cast<const char *>(LIBC_NAMESPACE::memchr(src, c, size));
}
+TEST(LlvmLibcMemChrTest, FromProtoC) {
+ const char *src = "protobuf_cpp_version$\n";
+ ASSERT_STREQ(call_memchr(src, '$', 22), "$\n");
+}
+
TEST(LlvmLibcMemChrTest, FindsCharacterAfterNullTerminator) {
// memchr should continue searching after a null terminator.
const size_t size = 5;
|
michaelrj-google
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
libc/src/string/string_utils.h
Outdated
| char_ptr = reinterpret_cast<const unsigned char *>(block_ptr); | ||
| } | ||
| const Word *block_ptr = reinterpret_cast<const Word *>(char_ptr); | ||
| for (; !has_zeroes<Word>((*block_ptr) ^ ch_mask) && cur < n; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if here (and everywhere in this function, cur < n check should go first in the && condition? To ensure that whenever we're comparing the byte, this byte is actually inside the string (or a first byte of a word-size-block is inside the string)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are correct here.
if cur >= n, we shouldn't dereference block_ptr, because we might have just crossed a page boundary.
I don't think it matters for the first and third loops, as those are character by character so won't have the problem, but perhaps best to switch regardless for logical consistency.
Updating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, its worse than that. We should never dereference char_ptr if cur > n. Fixing.
…6594) On exit from the loop, char_ptr had not been updated to match block_ptr, resulting in erroneous results. Moving all updates out of the loop fixes that. Adjust derefences to always be inside bounds checks.
On exit from the loop, char_ptr had not been updated to match block_ptr, resulting in erroneous results. Moving all updates out of the loop fixes that.