
Cache compiled RegExp program to eliminate per-match recompilation #596

@frostney

Summary

Store the compiled TRegExpProgram on the RegExp object instance so that exec, test, match, matchAll, search, replace, and split reuse it instead of recompiling on every call.

Why

ExecuteRegExp calls CompileRegExp on every match attempt. Additionally, CreateRegExpObject calls ValidateRegExpPatternNew which also compiles and discards the result. This means every RegExp construction compiles twice (once to validate, once on first exec), and every subsequent match recompiles from scratch.

When String.prototype.replace(/pattern/g, ...) runs over a large string, or when a matchAll iterator is consumed, the pattern is recompiled at every match position. Compilation parses the entire pattern, builds the instruction array, and constructs character class tables, all of which is unnecessary work when the pattern hasn't changed.

Current behavior

CreateRegExpObject → ValidateRegExpPatternNew → CompileRegExp (result discarded)
exec/test/match    → ExecuteRegExp            → CompileRegExp (recompiled every call)

Expected behavior

CreateRegExpObject → CompileRegExp → store program on object
exec/test/match    → read cached program → ExecuteRegExpVM (no recompilation)
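
The expected flow can be illustrated with a small, self-contained Free Pascal sketch. It is not Goccia code: TSketchProgram, TSketchRegExp, SketchCompile, and CompileCount are hypothetical stand-ins for TRegExpProgram, the RegExp object, CompileRegExp, and a counter added only for demonstration, and "matching" is reduced to a plain substring search. The point is only the shape: compile once in the constructor (which doubles as validation), keep the result on the instance, and let every match read it.

```pascal
program RegExpCacheSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical stand-in for the real TRegExpProgram.
  TSketchProgram = record
    Source: string;
  end;

  // Hypothetical stand-in for the RegExp object instance.
  TSketchRegExp = class
  private
    FProgram: TSketchProgram; // compiled once, reused by every match
  public
    constructor Create(const APattern: string);
    function Test(const AInput: string): Boolean;
  end;

var
  CompileCount: Integer = 0; // demonstration only: counts compilations
  Hits: Integer = 0;
  R: TSketchRegExp;
  I: Integer;

// Hypothetical stand-in for CompileRegExp. The toy validity rule
// (an opening parenthesis with no closing one) is an assumption.
function SketchCompile(const APattern: string): TSketchProgram;
begin
  if (Pos('(', APattern) > 0) and (Pos(')', APattern) = 0) then
    raise Exception.Create('Invalid pattern: ' + APattern);
  Inc(CompileCount);
  Result.Source := APattern;
end;

constructor TSketchRegExp.Create(const APattern: string);
begin
  inherited Create;
  // Construction-time compile: if this succeeds, the pattern is valid,
  // so no separate validate-and-discard pass is needed.
  FProgram := SketchCompile(APattern);
end;

function TSketchRegExp.Test(const AInput: string): Boolean;
begin
  // Reads the cached program; never calls SketchCompile again.
  Result := Pos(FProgram.Source, AInput) > 0;
end;

begin
  R := TSketchRegExp.Create('ab');
  try
    for I := 1 to 1000 do
      if R.Test('xxabxx') then
        Inc(Hits);
    WriteLn('matches: ', Hits, ', compilations: ', CompileCount);
    // prints "matches: 1000, compilations: 1"
  finally
    R.Free;
  end;
end.
```

In the current behavior described above, the same loop would trigger one compilation per Test call plus one for validation. Whether the real program is held as an internal data property or as a plain field is left open by this sketch.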

Scope notes

  • source/units/Goccia.RegExp.Engine.pas:146: CompileRegExp call in ExecuteRegExp
  • source/units/Goccia.RegExp.Compiler.pas:1528-1534: ValidateRegExpPatternNew compiles and discards
  • source/units/Goccia.RegExp.Runtime.pas:134-135: CreateRegExpObject calls validate
  • The compiled program can be stored as an internal data property on the TGocciaObjectValue, or as a pointer field if a lighter mechanism is preferred
  • ValidateRegExpPatternNew can be replaced by the construction-time compile — if compilation succeeds, the pattern is valid
  • Related: PR #585, "Replace TRegExpr with purpose-built backtracking bytecode VM regex engine" (regex engine replacement)

Metadata

Labels

  • engine (TGocciaEngine: language semantics, ECMAScript built-ins, parser, interpreter, bytecode VM)
  • performance (Performance improvement)
