Fix RegExp lookbehind direction#673
Conversation
Implement backward-direction matching for lookbehind subpatterns so captures, backreferences, nested lookaround, and sticky/global matches follow ECMAScript lookbehind semantics. Closes #612
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR implements lookbehind pattern support with captures, back-references, and nested lookaround by introducing backward-mode compilation and execution. The compiler adds a ChangesLookbehind captures and bidirectional regex execution
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Benchmark Results407 benchmarks Interpreted: 🟢 22 improved · 🔴 129 regressed · 256 unchanged · avg -1.7% arraybuffer.js — Interp: 🔴 5, 9 unch. · avg -3.0% · Bytecode: 🟢 5, 9 unch. · avg +1.9%
arrays.js — Interp: 🔴 9, 10 unch. · avg -4.0% · Bytecode: 🔴 14, 5 unch. · avg -7.3%
async-await.js — Interp: 🔴 2, 4 unch. · avg -3.2% · Bytecode: 🔴 3, 3 unch. · avg -2.6%
async-generators.js — Interp: 🔴 1, 1 unch. · avg -2.9% · Bytecode: 🔴 1, 1 unch. · avg -2.2%
base64.js — Interp: 🔴 2, 8 unch. · avg -1.3% · Bytecode: 🔴 2, 8 unch. · avg -0.4%
classes.js — Interp: 🟢 5, 🔴 11, 15 unch. · avg -1.3% · Bytecode: 🟢 2, 🔴 11, 18 unch. · avg -1.5%
closures.js — Interp: 11 unch. · avg +0.1% · Bytecode: 🟢 1, 🔴 4, 6 unch. · avg -2.7%
collections.js — Interp: 🔴 2, 10 unch. · avg -1.2% · Bytecode: 🟢 2, 🔴 4, 6 unch. · avg -2.0%
csv.js — Interp: 🔴 6, 7 unch. · avg -2.9% · Bytecode: 🟢 1, 12 unch. · avg +0.5%
destructuring.js — Interp: 🟢 1, 🔴 1, 20 unch. · avg -0.9% · Bytecode: 🔴 10, 12 unch. · avg -2.3%
fibonacci.js — Interp: 🔴 3, 5 unch. · avg -1.7% · Bytecode: 🔴 2, 6 unch. · avg -1.4%
float16array.js — Interp: 🟢 1, 🔴 14, 17 unch. · avg -2.6% · Bytecode: 🟢 1, 🔴 8, 23 unch. · avg -3.6%
for-of.js — Interp: 🔴 3, 4 unch. · avg -0.5% · Bytecode: 🔴 4, 3 unch. · avg -4.5%
generators.js — Interp: 🔴 2, 2 unch. · avg -1.3% · Bytecode: 4 unch. · avg -0.1%
iterators.js — Interp: 🔴 37, 5 unch. · avg -6.4% · Bytecode: 🔴 21, 21 unch. · avg -3.1%
json.js — Interp: 🔴 4, 16 unch. · avg -1.2% · Bytecode: 🟢 1, 🔴 2, 17 unch. · avg -0.3%
jsx.jsx — Interp: 🟢 1, 20 unch. · avg +1.3% · Bytecode: 🔴 7, 14 unch. · avg -2.1%
modules.js — Interp: 🔴 2, 7 unch. · avg -1.9% · Bytecode: 🔴 2, 7 unch. · avg -1.4%
numbers.js — Interp: 11 unch. · avg -1.0% · Bytecode: 🟢 1, 🔴 2, 8 unch. · avg -0.6%
objects.js — Interp: 🟢 3, 4 unch. · avg +2.8% · Bytecode: 7 unch. · avg +0.7%
promises.js — Interp: 🔴 3, 9 unch. · avg -1.5% · Bytecode: 🟢 3, 🔴 2, 7 unch. · avg +2.1%
regexp.js — Interp: 🔴 1, 10 unch. · avg -1.2% · Bytecode: 🟢 2, 9 unch. · avg -0.1%
strings.js — Interp: 🟢 1, 🔴 2, 16 unch. · avg -0.6% · Bytecode: 🔴 5, 14 unch. · avg -3.0%
tsv.js — Interp: 🟢 2, 7 unch. · avg +1.6% · Bytecode: 🔴 2, 7 unch. · avg -2.1%
typed-arrays.js — Interp: 🟢 7, 🔴 3, 12 unch. · avg +13.8% · Bytecode: 🔴 13, 9 unch. · avg -22.4%
uint8array-encoding.js — Interp: 🟢 1, 🔴 8, 9 unch. · avg -1.0% · Bytecode: 🟢 10, 🔴 2, 6 unch. · avg +33.7%
weak-collections.js — Interp: 🔴 8, 7 unch. · avg -20.2% · Bytecode: 🟢 3, 🔴 3, 9 unch. · avg +3.3%
Deterministic profile diffDeterministic profile diff: no significant changes. Measured on ubuntu-latest x64. Benchmark ranges compare cached main-branch min/max ops/sec with the PR run; overlapping ranges are treated as unchanged noise. Percentage deltas are secondary context. |
Suite TimingTest Runner (interpreted: 9,736 passed; bytecode: 9,736 passed)
MemoryGC rows aggregate the main thread plus all worker thread-local GCs. Test runner worker shutdown frees thread-local heaps in bulk; that shutdown reclamation is not counted as GC collections or collected objects.
Benchmarks (interpreted: 407; bytecode: 407)
MemoryGC rows aggregate the main thread plus all worker thread-local GCs. Benchmark runner performs explicit between-file collections, so collection and collected-object counts can be much higher than the test runner.
Measured on ubuntu-latest x64. |
test262 Conformance
Areas closest to 100%
Per-test deltas (+12 / -0)Newly passing (12):
Steady-state failures are non-blocking; regressions vs the cached main baseline (lower total pass count, or any PASS → non-PASS transition) fail the conformance gate. Measured on ubuntu-latest x64, bytecode mode. Areas grouped by the first two test262 path components; minimum 25 attempted tests, areas already at 100% excluded. Δ vs main compares against the most recent cached |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/units/Goccia.RegExp.Compiler.pas`:
- Around line 2355-2390: CompileAlternative currently reverses only top-level
terms but multi-code-point class atoms (emitted in EmitClassContents / any
string-sequence emitter) are still emitted in forward order; update the emitter
(e.g., EmitClassContents or the routine that emits string/sequence atoms such as
EmitStringSequence) to check FBackward and, when true, emit the code points of
multi-code-point class strings in reverse order (or build a reversed temporary
buffer) so that class contents honor backward lookbehind; ensure this change
preserves the original OriginalStart/Length bookkeeping used by
CompileAlternative/EmitBodyAt.
In `@tests/built-ins/RegExp/prototype/exec.js`:
- Around line 446-457: The test is meant to verify sticky-mode behavior but the
RegExp assigned to the variable sticky uses the global flag (/g) instead of the
sticky flag; update the regex literal used in the test (the variable sticky in
the test "lookbehind works with variable-length and sticky matches") to use /y
(sticky) rather than /g so exec will honor lastIndex anchoring and properly test
sticky matches.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 28835f3d-d162-4056-8586-486001a74de0
📒 Files selected for processing (3)
source/units/Goccia.RegExp.Compiler.passource/units/Goccia.RegExp.VM.pastests/built-ins/RegExp/prototype/exec.js
- Emit unicode set string atoms backward inside lookbehind bodies. - Exercise sticky lookbehind with the sticky flag and explicit lastIndex.
Summary
Testing
Commands run:
./build.pas loadertest/built-ins/RegExp/lookBehind/*.jswith Goccia compatibility flags./build.pas testrunner./build/GocciaTestRunner tests/built-ins/RegExp/prototype/exec.js --asi --unsafe-ffi --no-progress./build/GocciaTestRunner tests/built-ins/RegExp/prototype/exec.js --asi --unsafe-ffi --mode=bytecode --no-progress./build/GocciaTestRunner tests/built-ins/RegExp --asi --unsafe-ffi --no-progress./format.pas --check./fixtures/ffi/build.sh./build/GocciaTestRunner tests --asi --unsafe-ffi --no-progress