Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regular Expression: java.lang.StackOverflow on matching long string #121

Open
TheMatt2 opened this issue Aug 6, 2021 · 0 comments
Open

Comments

@TheMatt2
Copy link
Contributor

TheMatt2 commented Aug 6, 2021

Jython's re module can cause a stack overflow if a sufficiently long string is matched.

Triggering a stack overflow this way will also generate a warning: "PyTableCode.call caught a Throwable that is not an Exception:"

This behavior seems to occur on any regular expression that uses a variable length expression, that is then repeated.
Furthermore, the crash is nondeterministic. Sometimes the program will crash on first attempt to match, sometimes it will fail on the second or third attempt to match the same string.

Here are a few examples of regular expressions that have this behavior

(a)+
(?:a)+
(a|b)+
(ab?)+
(a)*
(?:a)*
(a|b)*
(ab?)*
(a){0,}
(?:a){0,}
(a|b){0,}
(ab?){0,}

Example Traceback:

jython_re_bug.py:17: RuntimeWarning: PyTableCode.call caught a Throwable that is not an Exception:
java.lang.StackOverflowError
Jython internals might be in a bad state now that can cause deadlocks later on.
See http://bugs.jython.org/issue2536 for details.
  p.match(x)
Traceback (most recent call last):
  File "jython_re_bug.py", line 17, in <module>
    p.match(x)
RuntimeError: maximum recursion depth exceeded (Java StackOverflowError)

Steps to Reproduce

Here is a script that will reproduce the Stack Overflow

import re
re.match("(a)+", "a" * 15000)

Expected Behavior
The Python regular expression library should not be using recursion that would allow it to fail for very long strings.

Other Notes
System
Ubuntu 20.04.2 LTS
Jython 2.7.2b2-DEV (uncontrolled:+, Jan 12 2020, 08:32:17)
[OpenJDK 64-Bit Server VM (Ubuntu)] on java11.0.11

While it may take a large amount of characters for the example expression,
it has been observed more complicated patterns can trigger a Stack Overflow at around ~5000 characters.

There seems to be a similar bug in the Java regular expression library.
https://rules.sonarsource.com/java/RSPEC-5998

This affects the following pattern types:

(a|b)+
(ab?)+
(a|b)*
(ab?)*
(a|b){0,}
(ab?){0,}

However, that means there are patterns that are unsafe in Jython that are safe in both Python and Java.

Specifically,

(a)+
(?:a)+
(a)*
(?:a)*
(a){0,}
(?:a){0,}

These 6 represent the biggest problem, since they do not comply with either of the relevant library implementations.

Regardless, have a regular expression library that can cause in stack overflow from the matched string, and where stack overflowing can cause deadlocks (http://bugs.jython.org/issue2536), limits the usefulness of the regular expression library.

I am not aware of any good workarounds at this time, since even the Java regular expression library will cause a stack overflow in some cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant