-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Problem
Java's java.util.regex engine is susceptible to catastrophic backtracking (ReDoS) on certain regex patterns. Unlike Perl's NFA engine which has optimizations to avoid exponential backtracking, Java's engine can get stuck inside a single Matcher.find() call indefinitely, consuming 100% CPU.
This was discovered during ExifTool test runs where GIF.t and QuickTime.t hung with 100% CPU. Thread dumps showed the JVM stuck inside Pattern$Branch.match() called from RuntimeRegex.matchRegexDirect().
Thread dump (from GIF.t)
"main" runnable
at java.util.regex.Pattern$Branch.match(Pattern.java:4915)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4970)
at java.util.regex.Pattern$Begin.match(Pattern.java:3852)
at java.util.regex.Matcher.search(Matcher.java:1765)
at java.util.regex.Matcher.find(Matcher.java:785)
at org.perlonjava.runtime.regex.RuntimeRegex.matchRegexDirect(RuntimeRegex.java:456)
Why loop count limits don't work
The hang occurs inside a single Matcher.find() call — our while loop body never executes. Limiting loop iterations has no effect because the JVM is stuck in the regex engine's internal backtracking.
Solution implemented
The standard Java solution is a CharSequence wrapper that checks for timeout during charAt() calls. Since Java's regex engine calls charAt() millions of times during backtracking, this provides fine-grained interruption:
RegexTimeoutCharSequence: wraps the input string, checks elapsed time every 4096charAt()calls, throwsRegexTimeoutExceptionif a 10-second deadline is exceeded.- Applied to all three regex entry points:
matchRegexDirect(m//),replaceRegex(s///), andsplit. - On timeout, a warning is emitted and the operation is treated as a failed match.
Files
RegexTimeoutCharSequence.java— the CharSequence wrapperRegexTimeoutException.java— the exception classRuntimeRegex.java— match and replace protectedOperator.java— split protected
Notes
- The 10-second timeout is generous for any legitimate regex. Catastrophic backtracking would hang forever without it.
- Perl 5 largely avoids this problem due to NFA optimizations in its regex engine.
- The existing
matchRegexWithTimeout(usingExecutorService) was ineffective becauseThread.interrupt()cannot interrupt Java's regex engine — only theCharSequence.charAt()interception approach works.
Generated with Devin