-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Fix undefined-behaviour in regex engine. #73071
Conversation
@llvm/pr-subscribers-llvm-support Author: Tanmay (tanmaysachan) ChangesRunning the This patch adds a check. Full diff: https://github.com/llvm/llvm-project/pull/73071.diff 1 Files Affected:
diff --git a/llvm/lib/Support/regengine.inc b/llvm/lib/Support/regengine.inc
index f23993abc6e7e71..54dd96ab9cfada5 100644
--- a/llvm/lib/Support/regengine.inc
+++ b/llvm/lib/Support/regengine.inc
@@ -146,7 +146,9 @@ matcher(struct re_guts *g, const char *string, size_t nmatch,
const char *stop;
/* simplify the situation where possible */
- if (g->cflags®_NOSUB)
+ if (!string)
+ return(REG_INVARG);
+ if (g->cflags®_NOSUB)
nmatch = 0;
if (eflags®_STARTEND) {
start = string + pmatch[0].rm_so;
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
f882527
to
3b5b785
Compare
@nikic ping. Edit: build seems to be failing for some reason, trying to fix. |
Any chance of a test case? (Not sure, but I'd expect we have unit tests for the regex api that could be extended to cover this) |
We do have some tests here: https://github.com/llvm/llvm-project/blob/main/llvm/unittests/Support/RegexTest.cpp |
@dwblaikie Thanks! Added the unittest. |
735524b
to
9abb2cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks!
- Running the regex engine on an empty string causes "Applying non-zero offset to null pointer" UB. - Bug discovered through "mlir-text-parser-fuzzer" module. - This patch puts a check in the matcher and adds a corresponding test.
9abb2cb
to
c983585
Compare
Squashed the commits. |
Hmm, I was going to merge this but I started thinking about how to tidy up the commit message and then went down a rabbit hole of: Wait, why do we need this? Could you help me understand a bit more what code is invalid here, down in the regex implementation? Because we do pass down the length (which should be zero) - so the underlying code should never dereference the pointer, right? So how do we end up with UB? |
@dwblaikie In regengine.inc (line 152), we use the start and end (both 0 here) and add it to the string pointer to find the bounds (null in this case). This causes the UBSan to act. |
Oh, is it that null+0 is UB? I guess that's fair... Hmm :/ |
Null+0 is a UB, but even if we ignore that, the engine still crashes if string = null with length = 0 along with null bounds. I believe that's by design since the regex ^$ should match with length = 0 string, so the function should still execute (just not with a null string). |
I don't understand that bit, and what got me asking more questions/concerns - if you pass in a pointer+length where length is zero, the implementation cannot/should not dereference that pointer. So it shouldn't be a problem if that pointer is null, because it should never be dereferenced. (this comes up with memcpy, which technically requires a non-null pointer, and even when the length is zero it's still technically UB and is a problem ( https://www.imperialviolet.org/2016/06/26/nonnull.html ) but that doesn't usually come up in user code - they'd have to have specifically annotated a pointer parameter as nonnull for the compiler to make any assumptions there, etc). But, yeah, the null+0 is enough to explain why we need a fix here... and I guess this is as good as any. |
Running the
mlir-text-parser-fuzzer
on a random corpus discovers a path that causes application of offset to a null pointer (UB) in the regex engine.This patch adds a check.
Input:
Binary input, generated by fuzzer.
Output: