bugfix: ngx.re: split() might enter infinite loops #106

thibaultcha · 2017-04-27T21:56:45Z

This is a proposed fix for #104. The issue encountered is similar to #79, but #79 cannot address this issue, since the introduced branch only runs when regex == "".

This proposes to unify the behavior triggered by ambiguous regexes: return the subject untouched (because the regex is ambiguous). I believe this behavior might be preferable to splitting on every character, but sadly, it is a breaking change :'(

Thoughts on how to best handle this?

agentzh · 2017-04-27T22:37:08Z

@thibaultcha I think we should follow perl's split's behavior here:

$ perl -e 'print join ",", split /|/, "abcd"'
a,b,c,d

agentzh · 2017-04-27T22:39:30Z

@thibaultcha I think instead of checking if the regex is empty, we should check if the matched separator is an empty string.

thibaultcha · 2017-04-28T05:15:54Z

@agentzh Updated the patch. We now mimic the Perl behavior with empty matches and with /^/ as well:

$ perl -e 'print join ":", split //, "abcd"'
a:b:c:d

$ perl -e 'print join ":", split /|/, "abcd"'
a:b:c:d

$ perl -e 'print join ":", split /()/, "abcd"'
a::b::c::d

$ perl -e 'print join ":", split /^/, "ab\ncd"'
ab
:cd

$ perl -e 'print join ":", split /^/m, "ab\ncd"'
ab
:cd

$ perl -e 'print join ":", split / ^/, "ab\ncd"'
ab
cd

$ perl -e 'print join ":", split / ^/x, "ab\ncd"'
ab
:cd

What are your thoughts on respecting this behavior?

agentzh · 2017-04-28T05:19:23Z

lib/ngx/re.lua

            res_idx = res_idx + 1
            res[res_idx] = sub(subj, sub_idx, from - 1)

+


Extra blank line?

oops, oversight...

agentzh · 2017-04-28T05:20:50Z

lib/ngx/re.lua

@@ -89,6 +94,10 @@ function _M.split(subj, regex, opts, ctx, max, res)
    -- needed because of further calls to string.sub in this function.
    subj = tostring(subj)

+    if not opts then
+      opts = ""


2-space indentations?

agentzh · 2017-04-28T05:21:27Z

lib/ngx/re.lua

-            res_idx = res_idx + 1
-            pos = pos + 1
-        end
+    local start_regex = find(regex, "^", nil, true) ~= nil


Hmm, it is too hacky to scan the regex literal using string.find.

I think the regex engine should handle this automatically. If not, then there must be some other issues.

Not satisfied with it either... Open to suggestions!

I think the regex engine should handle this automatically. If not, then there must be some other issues.

Sorry missed this. Hmm, I will investigate.

On subj = "ab\ncd" with /^/, if we don't add the m flag, we get:

{ "a", "b\ncd" }

So we do not match on newlines automatically, but only at position 0, because it is our first character, and the rest doesn't get matched. It seemed to me (after a bit of online research) that Perl's split was giving some special care of its own to /^/, but I could definitely be wrong.

Maybe you should try the "m" regex flag in the Lua split? By default, ^ only matches the beginning of the string IIRC.

Maybe you should try the "m" regex flag in the Lua split?

You mean require users to add it themselves in the opts argument? Because Perl's split behavior is - apparently - to add the m automatically when /^/ is given.

@thibaultcha Well, here we should follow our own semantics instead of perl's:

$ resty -e 'print((ngx.re.match("a\nb", "^b", "")) and "true" or "false")'
false
$ resty -e 'print((ngx.re.match("a\nb", "^b", "m")) and "true" or "false")'
true

Makes our lives easier, I'm not against it! 😅 Will update.

thibaultcha · 2017-04-29T06:01:35Z

I updated the patch so that it no longer incorporates Perl's split behavior with /^/. However, to preserve the behavior introduced with #79, we still check for empty regexes. I'm not sure if we should check for all regexes that yield empty matches, like () or |, which currently behave differently from "":

-- empty regex:
ngx_re.split("abcd", "")   -- { "a", "b", "c", "d" }
-- others:
ngx_re.split("abcd", "|")  -- { " ", "a", "b", "c", "d" }
ngx_re.split("abcd", "()") -- { " ", " ", "a", "b", "c", "d" }

Should we do something about it?

agentzh · 2017-04-29T21:52:08Z

@thibaultcha I think instead of checking empty patterns, we should check empty captures instead.

thibaultcha · 2017-04-29T22:55:14Z

I'm not sure what you mean, nor how.

agentzh · 2017-04-29T23:18:02Z

@thibaultcha Just check if the separator pattern actually matches any non-zero length of input data (by checking the pos returned by PCRE). If not, simply move the cursor to the next character of the input.

thibaultcha · 2017-05-09T03:01:52Z

@agentzh Sorry for the delay! I pushed a new version of the patch. Moving to the next character on empty matches (what the patch was already doing, but only for empty regexes) does not seem to be enough for cases like:

$ perl -e 'print join ":", split /^/m, "ab\ncd"'
ab
:cd

We now mimic this behavior in ngx_re.split() if the user specifies the m flag:

ngx_re.split("ab\ncd", "^", "m")
{ "ab\n", "cd" }

To do this, we handle empty matches by running the regex a second time further ahead. It seems to be the only way to satisfy both the empty regex case and the /^/m one.

Let me know what you think of it!

PS: I don't really like this code duplication between the max and the non-max branches. It leads to an overly complicated function at first look, and needs twice as many tests for both branches :/

thibaultcha · 2017-05-15T22:36:18Z

@agentzh Have you had some time to review this? Considering it is a bugfix, I think it needs some attention. As pointed out in my previous comment, I think we need to look ahead when such empty matches happen, to know how many characters must be skipped (not necessarily just the next character, as previously mentioned and implemented). Thanks!

agentzh · 2017-05-15T23:46:19Z

lib/ngx/re.lua

+                local old_pos = ctx.pos
+
+                local from2, _, _, err2 = re_split_helper(subj, compiled,
+                                              compile_once, flags, ctx)


@thibaultcha It looks strange to me to perform a second match in the same location. Why? You said it is "the only way", but I still don't understand it. Will you elaborate?

Sure, this needs elaboration. So, considering the following string:

"abcd\nefgh"

And the following regex:

/^/m

We have a match at characters 0 and 5. If the only thing we do is move to the next character after the first match, we get something like the following result:

{ "a", "bcd", "e", "fgh" }

Because our function is not smart enough to know that an empty match does not necessarily mean we are splitting char-by-char. What we need is to keep the first match (0), and find where the next match is (we know it'll be empty too), which will be 5. We can then deduce our first sub-string is 1 to 4:

{ "abcd" }

Otherwise, a simple increment would have cut 0 to 1, and then 2 to 4:

{ "a", "bcd" }

Simply incrementing the string pos by one would work for regexps like /()/ where we know the next character is the next match, but not for subjects where we don't know where the next empty match will be. At least, that's what seems to be the only way (insisting on the "seems") :)
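
As a rough illustration of the look-ahead idea described above, here is a plain-Lua sketch. It is not the code from this patch: split_with, find_empty and find_line_start are hypothetical names, and string.find stands in for the PCRE match so that the snippet runs without OpenResty. Each finder is assumed to return from, to, with to == from - 1 for an empty match.

```lua
-- Hypothetical finder emulating an empty regex: matches the empty string
-- at whatever position the search starts from.
local function find_empty(s, init)
    return string.find(s, "", init)
end

-- Hypothetical finder emulating /^/ with the "m" flag: an empty match at
-- position 1 or right after a "\n".
local function find_line_start(s, init)
    if init == 1 then
        return 1, 0
    end
    local nl = string.find(s, "\n", init - 1, true)
    if not nl then
        return nil
    end
    return nl + 1, nl
end

-- Split subj on the separators reported by find_sep(subj, init).
local function split_with(subj, find_sep)
    local res, n = {}, 0
    local start = 1            -- first character of the piece being built
    local len = #subj

    while start <= len do
        local from, to = find_sep(subj, start)

        if from and to < from and from == start then
            -- Empty match right where the current piece begins: cutting
            -- here would make no progress, so look further ahead for the
            -- next match instead of assuming it is one character away.
            from, to = find_sep(subj, start + 1)
        end

        if not from or from > len then
            break
        end

        n = n + 1
        res[n] = string.sub(subj, start, from - 1)
        start = to + 1         -- equals from when the match was empty
    end

    if start <= len then
        n = n + 1
        res[n] = string.sub(subj, start)
    end

    return res
end

print(table.concat(split_with("abcd", find_empty), ":"))  -- a:b:c:d
local lines = split_with("ab\ncd", find_line_start)
print(#lines)                                             -- 2
```

With the empty finder the next match is always one character ahead, so the subject is split char-by-char; with the line-start finder the second search jumps to the character after the newline, so the pieces are "ab\n" and "cd" rather than "a" and "b\ncd".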

@thibaultcha It appears clearer to me now. Now I'd just like this match's result not to be wasted.

agentzh · 2017-05-16T00:39:25Z

lib/ngx/re.lua

+
+                from = from2
+                to = from2
+                ctx.pos = old_pos


Maybe we should not backtrack to the old position? This way we can make this extra match not get wasted. The same applies to the to index returned by this extra match.

Hmmm yeah I thought this was already the case (edited my explanatory comment), but just realized it actually is backtracking to the old position. Will try to get rid of it. Would want nothing less performant than that for lua-resty-core :)


thibaultcha · 2017-05-16T02:59:06Z

@agentzh I have updated the patch with a new, smaller implementation that I find myself quite fond of :) None of the existing tests were modified; some were added to cover additional or missing ground.

thibaultcha · 2017-05-16T03:00:17Z

Also this patch was rebased on master.

agentzh · 2017-05-16T19:24:09Z

@thibaultcha This indeed looks much better. Thanks!

@doujiang24 @dndx Will you review this PR when you guys have a chance? Thanks!

thibaultcha · 2017-06-05T23:26:55Z

Ping :)

agentzh · 2017-06-06T18:08:35Z

@thibaultcha Sorry for the delay! I'm still waiting for @doujiang24's review :)

doujiang24

@thibaultcha @agentzh Sorry for the delay!
LGTM :)

thibaultcha · 2017-06-09T18:00:01Z

Thanks @doujiang24 !

agentzh · 2017-06-09T20:23:55Z

@thibaultcha Merged. Thanks!

thibaultcha mentioned this pull request Apr 27, 2017

I use ngx.re.split(),but receive http 500. #104

Closed

thibaultcha force-pushed the fix/re-split-infinite-loop branch from 96169cc to b3fecd6 Compare April 28, 2017 05:10

agentzh reviewed Apr 28, 2017

thibaultcha force-pushed the fix/re-split-infinite-loop branch from b3fecd6 to e89ec2a Compare April 28, 2017 05:21

agentzh reviewed Apr 28, 2017

thibaultcha force-pushed the fix/re-split-infinite-loop branch 2 times, most recently from 999aaca to 247e63c Compare April 29, 2017 05:56

thibaultcha force-pushed the fix/re-split-infinite-loop branch 3 times, most recently from 3080f9e to 4ab563f Compare May 9, 2017 02:55

thibaultcha force-pushed the fix/re-split-infinite-loop branch 2 times, most recently from 7e61a8f to 590e523 Compare May 9, 2017 03:06

agentzh reviewed May 15, 2017

agentzh reviewed May 16, 2017

thibaultcha force-pushed the fix/re-split-infinite-loop branch 3 times, most recently from ae4006f to 1138688 Compare May 16, 2017 02:49

bugfix: ngx.re: split() might enter infinite loops when the regex yields empty matches (8d5e3b8)

thibaultcha force-pushed the fix/re-split-infinite-loop branch from 1138688 to 8d5e3b8 Compare May 16, 2017 02:57

doujiang24 approved these changes Jun 9, 2017

agentzh closed this Jun 9, 2017

thibaultcha deleted the fix/re-split-infinite-loop branch January 12, 2018 02:42
