Skip to content

Regex catastrophic backtracking causes infinite hang in Java regex engine #254

@fglock

Description

@fglock

Problem

Java's java.util.regex engine is susceptible to catastrophic backtracking (ReDoS) on certain regex patterns. Unlike Perl's NFA engine which has optimizations to avoid exponential backtracking, Java's engine can get stuck inside a single Matcher.find() call indefinitely, consuming 100% CPU.

This was discovered during ExifTool test runs where GIF.t and QuickTime.t hung with 100% CPU. Thread dumps showed the JVM stuck inside Pattern$Branch.match() called from RuntimeRegex.matchRegexDirect().

Thread dump (from GIF.t)

"main" runnable
  at java.util.regex.Pattern$Branch.match(Pattern.java:4915)
  at java.util.regex.Pattern$GroupHead.match(Pattern.java:4970)
  at java.util.regex.Pattern$Begin.match(Pattern.java:3852)
  at java.util.regex.Matcher.search(Matcher.java:1765)
  at java.util.regex.Matcher.find(Matcher.java:785)
  at org.perlonjava.runtime.regex.RuntimeRegex.matchRegexDirect(RuntimeRegex.java:456)

Why loop count limits don't work

The hang occurs inside a single Matcher.find() call — our while loop body never executes. Limiting loop iterations has no effect because the JVM is stuck in the regex engine's internal backtracking.

Solution implemented

The standard Java solution is a CharSequence wrapper that checks for timeout during charAt() calls. Since Java's regex engine calls charAt() millions of times during backtracking, this provides fine-grained interruption:

  • RegexTimeoutCharSequence: wraps the input string, checks elapsed time every 4096 charAt() calls, throws RegexTimeoutException if a 10-second deadline is exceeded.
  • Applied to all three regex entry points: matchRegexDirect (m//), replaceRegex (s///), and split.
  • On timeout, a warning is emitted and the operation is treated as a failed match.

Files

  • RegexTimeoutCharSequence.java — the CharSequence wrapper
  • RegexTimeoutException.java — the exception class
  • RuntimeRegex.java — match and replace protected
  • Operator.java — split protected

Notes

  • The 10-second timeout is generous for any legitimate regex. Catastrophic backtracking would hang forever without it.
  • Perl 5 largely avoids this problem due to NFA optimizations in its regex engine.
  • The existing matchRegexWithTimeout (using ExecutorService) was ineffective because Thread.interrupt() cannot interrupt Java's regex engine — only the CharSequence.charAt() interception approach works.

Generated with Devin

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions