Summary
RuntimeHelpers.matches() calls Pattern.compile(regexp) on every CEL evaluation, even when the regex pattern is a constant string literal. This is a significant performance bottleneck for libraries like protovalidate-java that evaluate the same regex patterns millions of times.
Problem
In RuntimeHelpers.java, the matches method recompiles the regex on every call:
public static boolean matches(String string, String regexp, CelOptions celOptions) {
    Pattern pattern = Pattern.compile(regexp); // called every evaluation
    ...
}
In protovalidate, CEL expressions like input.matches('^0x[0-9a-f]{64}$') are compiled into programs that are cached and reused. But the regex pattern itself is recompiled from the string on every eval() call. This applies to both runtime paths (DefaultInterpreter and ProgramPlanner), since both funnel through RuntimeHelpers.matches().
Impact
Benchmarks against real-world proto validation patterns show 38-82% end-to-end improvement when patterns are cached:
| Pattern | Unpatched | Patched | Improvement |
| --- | --- | --- | --- |
| Hex hash ^0x[0-9a-f]{64}$ | 4,683 ns | 2,897 ns | 38% |
| Blockchain address (alternation pattern) | 13,868 ns | 2,455 ns | 82% |
| UTXO pattern ^[0-9a-f]{64}:[0-9]+$ | 16,038 ns | 8,403 ns | 48% |
The cost of Pattern.compile() scales with pattern complexity — alternation patterns like ^(0x[0-9a-fA-F]{40}|[1-9A-HJ-NP-Za-km-z]{26,64}|bc1[0-9a-zA-Z]{25,87})$ cost ~11μs per compile.
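For anyone who wants a rough feel for the per-call compile cost, here is a minimal timing sketch. It is not the benchmark behind the table above: it assumes java.util.regex.Pattern as a stand-in (the regex engine cel-java actually uses may differ), the class and method names are hypothetical, and a plain System.nanoTime() loop is far less reliable than a proper harness such as JMH, so treat any numbers it prints as indicative only.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of cel-java.
final class CompileCostSketch {
    // Alternation pattern quoted in the text above.
    static final String ADDRESS_REGEX =
        "^(0x[0-9a-fA-F]{40}|[1-9A-HJ-NP-Za-km-z]{26,64}|bc1[0-9a-zA-Z]{25,87})$";

    // Written to so the JIT cannot discard the match results.
    static volatile int sink;

    /** Average nanoseconds per call when the pattern is recompiled on every call. */
    static long recompileEachCall(String input, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (Pattern.compile(ADDRESS_REGEX).matcher(input).matches()) {
                sink++;
            }
        }
        return (System.nanoTime() - start) / iterations;
    }

    /** Average nanoseconds per call when one compiled Pattern is reused. */
    static long reuseCompiled(String input, int iterations) {
        Pattern compiled = Pattern.compile(ADDRESS_REGEX); // compiled once, outside the loop
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (compiled.matcher(input).matches()) {
                sink++;
            }
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        String input = "0x" + "ab".repeat(20); // 40 hex characters after the prefix
        int iterations = 200_000;
        // Warm-up pass so both paths are JIT-compiled before measuring.
        recompileEachCall(input, iterations);
        reuseCompiled(input, iterations);
        System.out.println("recompile each call: " + recompileEachCall(input, iterations) + " ns/op");
        System.out.println("reuse compiled:      " + reuseCompiled(input, iterations) + " ns/op");
    }
}
```

The difference between the two figures approximates the compile overhead that a cache would remove for this pattern.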
Context: cel-go already solves this
cel-go has MatchesRegexOptimization in interpreter/optimizations.go, which precompiles regex patterns at program creation time. protovalidate-go enables it via cel.OptOptimize. cel-java has no equivalent mechanism.
Suggested fix
A minimal fix — add a ConcurrentHashMap cache to RuntimeHelpers.matches():
@SuppressWarnings("Immutable")
private static final ConcurrentHashMap<String, Pattern> COMPILED_PATTERNS =
    new ConcurrentHashMap<>();

public static boolean matches(String string, String regexp, CelOptions celOptions) {
    Pattern pattern = COMPILED_PATTERNS.computeIfAbsent(regexp, Pattern::compile);
    // ... rest unchanged
}
This is a 3-line change. The cache is unbounded, which is safe because:
- In practice, regex patterns in CEL come from compiled proto definitions (finite, small set)
- If bounded eviction is desired, a Caffeine or LRU cache could replace ConcurrentHashMap
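If bounded eviction turns out to be needed, one dependency-free option is an access-ordered LinkedHashMap behind a synchronized wrapper. The sketch below is only an illustration: BoundedPatternCache and its methods are hypothetical names, java.util.regex.Pattern is assumed as the pattern type, and the single lock would cost some throughput under heavy contention compared with ConcurrentHashMap or Caffeine.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cel-java.
final class BoundedPatternCache {
    private final Map<String, Pattern> cache;

    BoundedPatternCache(final int maxEntries) {
        // accessOrder = true keeps entries in least-recently-used order, so the
        // eldest entry is always the least recently used one.
        this.cache = Collections.synchronizedMap(
            new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                    return size() > maxEntries; // evict once the bound is exceeded
                }
            });
    }

    /** Returns the cached Pattern for regexp, compiling it on first use. */
    Pattern get(String regexp) {
        // The synchronized wrapper runs computeIfAbsent under its lock.
        return cache.computeIfAbsent(regexp, Pattern::compile);
    }

    int size() {
        return cache.size();
    }
}
```

An invalid pattern still fails as before: Pattern.compile throws, the exception propagates out of computeIfAbsent, and nothing is stored for that key.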
A more sophisticated approach (matching cel-go's MatchesRegexOptimization) would precompile at program creation time in ProgramPlanner.planCall(), but that only helps the planner runtime path and doesn't fix DefaultInterpreter.
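To illustrate the plan-time idea in isolation, here is a minimal sketch. It does not use cel-java's planner types: PlanTimeRegexSketch and planConstantMatches are hypothetical names, java.util.regex.Pattern is assumed, and the sketch uses find() for the match, whereas the real matching mode lives in the elided part of matches() and is not reproduced here.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical illustration, not cel-java's ProgramPlanner API.
final class PlanTimeRegexSketch {
    /**
     * Called once while the program is being built, when the pattern argument
     * is a constant string literal. The returned predicate captures the
     * compiled Pattern, so evaluation never compiles again.
     */
    static Predicate<String> planConstantMatches(String constantRegexp) {
        Pattern compiled = Pattern.compile(constantRegexp); // one-time cost at plan time
        return input -> compiled.matcher(input).find();
    }
}
```

One side effect of this design is that an invalid constant pattern would be reported when the program is created rather than on the first evaluation.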