Support regex duplicate named capture groups#291
Conversation
frostney
commented
Apr 12, 2026
- Allow duplicate names across disjunction alternatives
- Resolve named backreferences and match groups correctly
- Add docs and RegExp coverage for the new behavior
- Allow duplicate names across disjunction alternatives - Resolve named backreferences and match groups correctly - Add docs and RegExp coverage for the new behavior
…5) labels Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds support for duplicate named capture groups across disjunction alternatives: parser tracks disjunction paths, backreferences resolve to the group from the participating alternative, duplicates within the same alternative throw an early error, and runtime builds the Changes
Sequence DiagramsequenceDiagram
participant Parser as Parser/Engine
participant AltStack as Disjunction Stack
participant Validator as Duplicate-name Validator
participant Preprocessor as Preprocess / Backref Resolver
participant Runtime as Runtime (exec)
participant Builder as BuildMatchArray
participant Result as Groups Object
Parser->>AltStack: Push/pop disjunction alternative indices
Parser->>Parser: Collect named groups with DisjunctionPath
Parser->>Validator: Check for same-branch duplicate names
Validator-->>Parser: Throw error or continue
Preprocessor->>Parser: ResolveNamedBackreference using DisjunctionPath
Runtime->>Builder: Execute regex and pass captures
Builder->>Result: Initialize all named keys to undefined (pass 1)
Builder->>Result: Assign matched values for participating groups (pass 2)
Result-->>Runtime: Return finalized groups object
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/built-ins/RegExp/prototype/exec.js (1)
137-147: AssertSyntaxErrorexplicitly in the negative cases.These tests are checking user-visible semantics, so
toThrow(SyntaxError)is a safer guard than a generic throw assertion.Suggested tightening
test("exec with duplicate names in same alternative throws SyntaxError", () => { expect(() => { new RegExp("(?<x>a)(?<x>b)"); - }).toThrow(); + }).toThrow(SyntaxError); }); test("exec with duplicate names in nested same alternative throws SyntaxError", () => { expect(() => { new RegExp("(?<x>a(?<x>b))"); - }).toThrow(); + }).toThrow(SyntaxError); });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/built-ins/RegExp/prototype/exec.js` around lines 137 - 147, The tests use toThrow() with new RegExp("(?<x>...)") negative cases but should assert the exact error type; update both test blocks ("exec with duplicate names in same alternative throws SyntaxError" and "exec with duplicate names in nested same alternative throws SyntaxError") to call expect(...).toThrow(SyntaxError) instead of toThrow(), ensuring the RegExp constructor invocations (new RegExp("(?<x>a)(?<x>b)") and new RegExp("(?<x>a(?<x>b))")) are asserted to throw SyntaxError explicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@units/Goccia.RegExp.Engine.pas`:
- Around line 191-203: Add the repo's spec-annotation comment immediately above
each helper implementation using the required format; for example, add a line
like "// ES2025 §22.2.1.1 PathsShareBranch(APathA, APathB)" directly above the
PathsShareBranch function, and do the same for the other helper functions in the
same block (the helpers spanning the subsequent implementations) using their
corresponding ES spec section numbers and SpecMethodName(specParams) signatures
so each FreePascal function is annotated per the coding guidelines and placed
immediately above the function body in the implementation section.
- Around line 245-276: The fallback that builds '(?:\1|\2|...)' from
ANamedGroups/AllIndices is wrong because it lets non-participating
backreferences match empty; remove the AllIndices/ANamedGroups fallback and
instead only emit the alternation built from CompatibleIndices when
CompatCount>0, and when CompatCount=0 emit a failing construct (e.g. a negative
lookaround or '(?!)') so the pattern cannot silently succeed — update the code
paths that set Result (the blocks that currently reference AllIndices,
ANamedGroups, CompatCount, CompatibleIndices and build Result) to implement this
change.
---
Nitpick comments:
In `@tests/built-ins/RegExp/prototype/exec.js`:
- Around line 137-147: The tests use toThrow() with new RegExp("(?<x>...)")
negative cases but should assert the exact error type; update both test blocks
("exec with duplicate names in same alternative throws SyntaxError" and "exec
with duplicate names in nested same alternative throws SyntaxError") to call
expect(...).toThrow(SyntaxError) instead of toThrow(), ensuring the RegExp
constructor invocations (new RegExp("(?<x>a)(?<x>b)") and new
RegExp("(?<x>a(?<x>b))")) are asserted to throw SyntaxError explicitly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3077d45e-68a0-4af6-8c5a-eb9acde487c1
📒 Files selected for processing (6)
docs/built-ins.mddocs/language-restrictions.mdtests/built-ins/RegExp/prototype/exec.jsunits/Goccia.RegExp.Engine.pasunits/Goccia.RegExp.Runtime.pasunits/Goccia.Spec.pas
| if CompatCount = 0 then | ||
| begin | ||
| // Backreference outside the disjunction — collect all groups with this name | ||
| SetLength(AllIndices, AllCount); | ||
| AllCount := 0; | ||
| for I := 0 to High(ANamedGroups) do | ||
| if ANamedGroups[I].Name = AName then | ||
| begin | ||
| AllIndices[AllCount] := ANamedGroups[I].Index; | ||
| Inc(AllCount); | ||
| end; | ||
| // Emit alternation: (?:\1|\2|...) — the participating group's backreference | ||
| // succeeds while non-participating ones either match empty or fail through | ||
| Result := '(?:'; | ||
| for I := 0 to AllCount - 1 do | ||
| begin | ||
| if I > 0 then | ||
| Result := Result + '|'; | ||
| Result := Result + '\' + IntToStr(AllIndices[I]); | ||
| end; | ||
| Result := Result + ')'; | ||
| Exit; | ||
| end; | ||
| // Multiple compatible groups — emit alternation of compatible ones | ||
| Result := '(?:'; | ||
| for I := 0 to CompatCount - 1 do | ||
| begin | ||
| if I > 0 then | ||
| Result := Result + '|'; | ||
| Result := Result + '\' + IntToStr(CompatibleIndices[I]); | ||
| end; | ||
| Result := Result + ')'; |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In the Free Pascal/Delphi RegExpr (TRegExpr) engine, does a numeric backreference like \1 fail or match an empty string when the referenced capture group did not participate in the current match?
💡 Result:
In the Free Pascal/Delphi TRegExpr engine, a numeric backreference like \1 matches an empty string when the referenced capture group did not participate in the current match. It does not fail the overall match.
Citations:
- 1: https://github.com/andgineer/TRegExpr/blob/master/src/regexpr.pas
- 2: https://github.com/andgineer/TRegExpr/blob/master/History.txt
- 3: https://regex.sorokin.engineer/tregexpr/
The (?:\1|\2|...) fallback is incorrect and will make patterns too permissive.
TRegExpr's numeric backreferences match an empty string when the referenced capture group did not participate in the current match — they do not fail. This means the current approach allows non-participating groups' backreferences to silently succeed via empty match as fallback alternatives, violating the semantics.
Example: For the pattern (?:(?<x>a)|(?<x>b))\k<x> on input a:
- First part matches
a, capturing group 1 - Second part emits
(?:\1|...)with\1for group 1 and\2for any other named group (if compiled that way) \1fails to match (no seconda), but\2matches empty (group 2 never participated)- The overall pattern incorrectly accepts
ainstead of requiringaaorbb
The fallback must not use unmatched backreferences as alternatives. Instead, only include references to groups that actually participated, or redesign the fallback to enforce that the backreference must match a non-empty capture.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@units/Goccia.RegExp.Engine.pas` around lines 245 - 276, The fallback that
builds '(?:\1|\2|...)' from ANamedGroups/AllIndices is wrong because it lets
non-participating backreferences match empty; remove the AllIndices/ANamedGroups
fallback and instead only emit the alternation built from CompatibleIndices when
CompatCount>0, and when CompatCount=0 emit a failing construct (e.g. a negative
lookaround or '(?!)') so the pattern cannot silently succeed — update the code
paths that set Result (the blocks that currently reference AllIndices,
ANamedGroups, CompatCount, CompatibleIndices and build Result) to implement this
change.
There was a problem hiding this comment.
The web query result is incorrect — TRegExpr fails (does not match empty) on backreferences to non-participating groups. Empirically verified:
// TRegExpr fails \1 when group 1 didn't participate
new RegExp("(a)|(b)\\1").exec("b") // → null
// Alternation correctly falls through to the participating group
new RegExp("(?:(a)|(b))(?:\\1|\\2)").exec("aa") // → "aa" ✓
new RegExp("(?:(a)|(b))(?:\\1|\\2)").exec("bb") // → "bb" ✓
new RegExp("(?:(a)|(b))(?:\\1|\\2)").exec("a") // → null ✓
new RegExp("(?:(a)|(b))(?:\\1|\\2)").exec("ab") // → null ✓
// Concatenation approach fails entirely (non-participating backref aborts match)
new RegExp("(?:(a)|(b))\\1\\2").exec("aa") // → null ✗
new RegExp("(?:(a)|(b))\\1\\2").exec("bb") // → null ✗The alternation (?:\1|\2) is the correct approach precisely because TRegExpr fails on non-participating backreferences — the | lets the engine skip the failed backref and try the next one. Added a clarifying comment in 3ac60d6 documenting this TRegExpr behavior, plus a test covering the outside-disjunction backreference case.
There was a problem hiding this comment.
@frostney, thank you for the empirical verification and the correction — my original review comment was wrong. The web search result I relied on was incorrect in its characterization of TRegExpr's behavior.
To confirm the key distinction your examples demonstrate:
- JavaScript's regex engine: backreferences to non-participating groups match an empty string (zero-length), so they silently succeed —
(?:\1|\2)would be problematic there. - TRegExpr: backreferences to non-participating groups fail, so
(?:\1|\2)is exactly correct — the|lets the engine skip the failed backreference and try the participating group's backreference next.
The (?:\1|\2|...) alternation strategy is the right design for this engine, and the clarifying comment you added in 3ac60d6 along with the outside-disjunction backreference test makes the rationale clear. I apologize for the noise.
✏️ Learnings added
Learnt from: frostney
Repo: frostney/GocciaScript PR: 291
File: units/Goccia.RegExp.Engine.pas:255-286
Timestamp: 2026-04-12T21:29:56.163Z
Learning: In frostney/GocciaScript (units/Goccia.RegExp.Engine.pas), TRegExpr (the Free Pascal regex engine used as the backend) **fails** — does not match empty — when a numeric backreference targets a non-participating capture group. This is the opposite of JavaScript's native regex engine behavior (which matches empty/zero-length for non-participating backreferences). Because TRegExpr fails on non-participating backreferences, the `(?:\N1|\N2|...)` alternation strategy used in `ResolveNamedBackreference` is correct for handling duplicate named capture groups: the engine skips the failing non-participating backreference via `|` and matches the participating group's backreference. Do NOT flag this alternation approach as incorrect or suggest replacing it with a failing construct like `(?!)` — the `|`-based alternation is the intended and empirically verified correct behavior for TRegExpr.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: frostney
Repo: frostney/GocciaScript PR: 172
File: units/Goccia.Compiler.Expressions.pas:169-178
Timestamp: 2026-04-03T14:07:41.850Z
Learning: In frostney/GocciaScript (units/Goccia.Compiler.Expressions.pas), `CompileRegexLiteral` intentionally uses `OP_GET_GLOBAL` to fetch the `RegExp` constructor and then `OP_CALL` to create the regex object. This is safe because the `RegExp` global binding is stable and not user-mutable in this runtime, so there is no behavioral divergence from the interpreter path (`TGocciaRegexLiteralExpression.Evaluate` → `CreateRegExpObject`). Do NOT flag this as a semantic divergence or suggest a dedicated intrinsic opcode for regex literal creation.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 278
File: units/Goccia.Parser.pas:1010-1023
Timestamp: 2026-04-12T09:44:48.683Z
Learning: In frostney/GocciaScript (units/Goccia.Parser.pas, SplitTemplateAtBoundaries and the original ParseTaggedTemplate), the brace-counting used to find the closing `}` of a template interpolation expression (`${...}`) is NOT lexically aware — it does not account for `{`/`}` inside string literals, comments, or nested templates within the expression. This is a known pre-existing limitation that predates this codebase area; it was present in the original `ParseTaggedTemplate` on `main` and is carried forward. Fixing it properly requires lexer-time segmentation (embedding boundary markers during `ScanTemplate` while escape/string context is available). Do NOT re-flag the naive brace counter in SplitTemplateAtBoundaries or ParseTaggedTemplate as a regression or new bug — it is intentionally deferred to a follow-up. The limitation exists because `ScanTemplate` stores the entire template (including `${...}` expressions) as a single flat token with no expression-level lexical analysis.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 222
File: scripts/test262_syntax_filter.py:0-0
Timestamp: 2026-04-08T15:47:44.282Z
Learning: In frostney/GocciaScript, `scripts/test262_syntax_filter.py` uses regex patterns to filter unsupported test262 constructs. The `loose_equality` pattern uses the conservative character-class approach (`[^!=<>]==[^=]`) which is accepted as intentional (pragmatic, non-blocking CI, negligible false-negative rate). The `loose_inequality` pattern was previously broken (`[^!]=!=[^=]` required a literal `=` before `!=`, so it never matched plain `a!=b`) and was fixed in commit 5a44c9b to `(?<!!)!=[^=]` — a negative lookbehind that correctly matches `!=` without matching `!==`. Do not suggest reverting the lookbehind-based `!=` pattern.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 267
File: units/Goccia.Lexer.pas:504-533
Timestamp: 2026-04-11T09:43:56.558Z
Learning: In frostney/GocciaScript (units/Goccia.Lexer.pas), the lexer intentionally throws on malformed escape sequences (\u{, \xG1, etc.) regardless of whether the template literal is tagged or untagged. Supporting cooked=undefined for tagged templates (per TC39 proposal-template-literal-revision / ES2018 Template Literal Revision) requires per-segment escape validity tracking and deferred error handling — this is explicitly deferred to a future PR. The raw string (RawSB) is always correctly captured for valid escapes. Do NOT re-flag missing cooked=undefined / malformed-escape tolerance in ScanTemplate or ProcessEscapeSequence as a bug in the current PR.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 258
File: units/Goccia.Lexer.pas:668-721
Timestamp: 2026-04-10T18:09:27.501Z
Learning: In frostney/GocciaScript (units/Goccia.Lexer.pas), the numeric literal scanner explicitly rejects `_` immediately after a decimal point (e.g., `1._5`) as a lex-time syntax error: the guard `(Peek = '.') and (PeekNext = '_')` raises TGocciaLexerError("Numeric separator must be between digits", SSuggestNumericSeparator) before the fraction-digit loop is entered. This matches ES2021 spec — `1._5` is NOT valid property access in this context; the lexer treats `.` as part of the numeric literal and rejects the adjacent underscore. The separate conservative guard `CharInSet(PeekNext, ['0'..'9'])` (which excludes `_`) is retained to allow `1._5` only after the explicit error check, but that branch is never reached when PeekNext is `_`.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 258
File: tests/language/expressions/numeric-separators.js:0-0
Timestamp: 2026-04-10T16:47:01.633Z
Learning: In frostney/GocciaScript (tests/language/expressions/numeric-separators.js and related lexer tests): Invalid numeric separator placement (trailing underscore, consecutive underscores, underscore adjacent to decimal point, underscore after radix prefix, underscore after exponent indicator) is rejected at lex time — before any JS token is produced. This means the JS test framework (TestRunner) cannot observe these parse failures from within a running test file. Negative parse-error cases for numeric separators must be verified externally via ScriptLoader/CLI invocation, not via `eval()`-based `expect(() => ...).toThrow()` patterns. Do not suggest eval-based negative tests for lex-time errors in this project.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 201
File: units/Goccia.AST.Expressions.pas:1339-1346
Timestamp: 2026-04-07T21:00:57.707Z
Learning: When reviewing GocciaScript coverage/branch tracking, do not require call-sites to explicitly pre-register binary (two-arm) branches before recording hits. In units/Goccia.Coverage.pas, TGocciaFileCoverage.RecordBranchHit automatically ensures the opposite arm exists via EnsureBranchExists(ALine, AColumn, 1 - ABranchIndex), establishing a zero-hit baseline for the paired arm whenever either arm is first recorded. Therefore, call-sites such as TGocciaConditionalExpression.Evaluate (Goccia.AST.Expressions.pas), EvaluateIf/EvaluateBinary/EvaluateSwitch (Goccia.Evaluator.pas), and OP_JUMP_IF_* handlers (Goccia.VM.pas) should not be flagged for missing a separate RegisterBranch call for binary branches—pairing is handled inside RecordBranchHit.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 201
File: units/Goccia.VM.pas:3547-3561
Timestamp: 2026-04-07T21:01:16.129Z
Learning: When reviewing calls that record branch coverage via `TGocciaFileCoverage.RecordBranchHit(ALine, AColumn, ABranchIndex)` for binary branches (branch indices 0 and 1), do not require an explicit prior `RegisterBranch(...)` at the call site. `RecordBranchHit` internally calls `EnsureBranchExists(ALine, AColumn, 1 - ABranchIndex)` for binary branches, which guarantees both arms exist in the branch list (the untaken arm is pre-registered with zero hits). Therefore VM/evaluator/AST branch recording sites in `Goccia.VM.pas`, `Goccia.Evaluator.pas`, and `Goccia.AST.Expressions.pas` should not be flagged for missing explicit pre-registration before `RecordBranchHit(...)`.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 192
File: tests/built-ins/Goccia/semver/satisfies.js:0-0
Timestamp: 2026-04-07T08:53:57.319Z
Learning: In frostney/GocciaScript, `Goccia.semver.satisfies` (and by extension the underlying semver implementation in `units/Goccia.Semver.pas`) returns `false` for malformed/invalid version inputs rather than throwing an error, matching node-semver semantics. When reviewing tests for `satisfies`, invalid-input assertions should use `toBe(false)`, not `toThrow`.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 51
File: units/Goccia.AST.Expressions.pas:1140-1157
Timestamp: 2026-03-08T03:22:02.868Z
Learning: In `units/Goccia.AST.Expressions.pas`, `TGocciaIncrementExpression.Evaluate`: the `TGocciaMemberExpression` branch only handles non-computed members (`PropertyName`). Computed member increment/decrement (e.g., `arr[i]++`) is not supported — `PropertyName` is empty for computed members, causing `GetProperty('')` to return `nil` and incorrectly triggering the error path. Tracked in GitHub issue `#52`. Do NOT flag this as a new bug in reviews; it is a known pre-existing limitation.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 53
File: souffle/Souffle.Wasm.Emitter.pas:0-0
Timestamp: 2026-03-08T04:50:29.540Z
Learning: In format.pas (GocciaScript formatter), ParseParamNames previously misidentified parenthesized arguments inside one-liner function bodies as parameter names, causing the `A` prefix to be prepended to them. The fix (f5b2f23) makes ParseParamNames ignore parentheses appearing after the first semicolon. Be cautious when reviewing one-liner procedure/function bodies in .pas files for incorrectly prefixed identifiers — this was the root cause of the AGC_ANY_CONVERT_EXTERN → AGC_ANY_CONVERT_EXTERN typos in souffle/Souffle.Wasm.Emitter.pas.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 66
File: units/ScopeMap.pas:0-0
Timestamp: 2026-03-17T22:18:02.668Z
Learning: Repo: frostney/GocciaScript — ScopeMap.pas (units/ScopeMap.pas) was deleted (commit d99cf96) after profiling showed TScopeMap's linear-scan IndexOf was ~2.7× slower than hash-based lookup (CreateBridgedContext samples rose from 24% to 51%). Scope bindings now use TOrderedStringMap<TLexicalBinding> (DJB2 hash, O(1) per level); scope chain walking is done via recursive TGocciaScope-level methods (Contains/GetLexicalBinding) rather than a parent-pointer chain inside the map. Do not flag TScopeMap or its Parent property in future reviews — the type no longer exists.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 107
File: units/Goccia.Runtime.Operations.pas:211-212
Timestamp: 2026-03-23T12:37:30.936Z
Learning: Repo: frostney/GocciaScript — units/Goccia.Runtime.Operations.pas (PR `#107`)
Learning: In TGocciaRuntimeOperations.MarkExternalRoots, entries in FModuleCache (cached module namespace/JSON records) must be traversed and their TSouffleValue references marked as external roots to survive GC. This behavior was added in commit 36f69cc; do not re‑flag missing FModuleCache rooting unless the logic regresses.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 107
File: units/ModuleResolver.pas:70-79
Timestamp: 2026-03-23T13:40:44.656Z
Learning: In frostney/GocciaScript, the alias prefix matching logic in `TModuleResolver.HasAlias` and `ApplyAliases` (units/ModuleResolver.pas) was moved verbatim from the original `Goccia.Modules.Resolver`. The partial-prefix behavior (e.g. `lib` can match `library/foo`) is intentionally preserved from the pre-existing implementation. Do not re-flag this as a bug unless a concrete functional regression is demonstrated; changing alias semantics is a separate concern tracked independently.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 147
File: units/Goccia.VM.pas:3042-3084
Timestamp: 2026-03-31T18:38:23.367Z
Learning: Repo: frostney/GocciaScript — units/Goccia.VM.pas (PR `#147`)
As of commit 457e712: (1) TGocciaVM.GetPropertyValue performs own-only private static accessor lookup on '#…' for TGocciaClassValue and throws TypeError on class-brand misses; (2) TGocciaVM.SetPropertyValue avoids inherited private accessor resolution, but still allows raw private static writes via SetRawPrivateValue to support current class initialization that emits OP_SET_PROP_CONST for private static fields. Planned follow-up is to add dedicated opcodes to define private static fields at class-declaration time and then make SetPropertyValue strict (throw on brand miss with no fallback). Do not re-flag the current fallback until those opcodes exist.
Learnt from: CR
Repo: frostney/GocciaScript PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-08T04:38:51.694Z
Learning: Applies to **/*.pas : Use split constant units for runtime constants instead of hardcoded string literals: `Goccia.Constants.PropertyNames` for property names, `Goccia.Constants.TypeNames` for type names, `Goccia.Constants.ErrorNames` for error names, `Goccia.Constants.ConstructorNames` for constructor names, and `Goccia.Constants.SymbolNames` for symbol names.
Learnt from: CR
Repo: frostney/GocciaScript PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-20T22:00:59.812Z
Learning: Applies to units/**/*.pas : `type`/`interface` declarations and `import type`/`export type` must produce `TGocciaEmptyStatement` (no-op at runtime). Access modifiers (`public`, `protected`, `private`, `readonly`, `override`, `abstract`) in class bodies must be consumed and discarded.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 286
File: units/Goccia.Spec.pas:30-51
Timestamp: 2026-04-12T15:33:50.823Z
Learning: In frostney/GocciaScript, `Proxy` is fully supported — it was implemented in PR `#226`. The `Proxy` entry in `ES2015_FEATURES` in `units/Goccia.Spec.pas` is correct and intentional. Do not flag `Proxy` as unimplemented in future reviews.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 110
File: units/Goccia.Runtime.Operations.pas:0-0
Timestamp: 2026-03-23T11:25:00.381Z
Learning: In this repo’s GocciaScript Pascal code, when implementing thenable detection/callability checks (e.g., around Await/unwrap logic), always read the thenable’s `then` property using `Goccia.Constants.PropertyNames.PROP_THEN` rather than a string literal. If the value being inspected is a `TSouffleRecord`, unwrap the retrieved `then` property via `UnwrapToGocciaValue` before testing whether it is callable (e.g., in `TGocciaRuntimeOperations.AwaitValue`).
Learnt from: frostney
Repo: frostney/GocciaScript PR: 276
File: tests/built-ins/Goccia/build.js:6-7
Timestamp: 2026-04-11T18:29:30.289Z
Learning: In frostney/GocciaScript, the pattern `const hasGoccia = typeof Goccia !== "undefined"` combined with `describe.runIf(hasGoccia)(...)` is intentional and used across the entire `tests/` suite. It allows test files to run safely in non-GocciaScript environments (e.g., Vitest compatibility checks) where GocciaScript-specific globals like `Goccia` are not present. Do NOT flag this conditional-skip guard as a regression risk or suggest making suites unconditional — the portability requirement is by design.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 53
File: souffle/Souffle.Wasm.Emitter.pas:1467-1477
Timestamp: 2026-03-09T17:21:38.193Z
Learning: In this codebase (Frostney/GocciaScript), for Pascal/Delphi (FPC) files using AnsiString with {$mode delphi} {H+}, treat String as byte-indexed. Length(str) returns byte count and Byte(str[I]) is a no-op cast. Do not replace per-byte copy loops like Data[I-1] := Byte(AStr[I]) with TEncoding.GetBytes, since TEncoding is a Delphi RTL feature unavailable in FPC's SysUtils and would break the build. Apply this guidance to all .pas files under the project; ensure manual per-byte copying logic remains correct for UTF-8 content encoded in AnsiString, and avoid introducing RTL dependencies not supported by FPC.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 58
File: units/Goccia.Runtime.Operations.pas:5959-6132
Timestamp: 2026-03-10T11:17:46.617Z
Learning: GocciaScript’s Promise statics are intentionally not species/subclass-aware. Interpreter-level implementations (PromiseAll, PromiseRace, PromiseAllSettled, PromiseAny, withResolvers, try, resolve, reject) construct plain TGocciaPromiseValue via WrapAsPromise and do not use the call receiver (AThisValue). The Souffle VM bridge mirrors this by not forwarding AReceiver and by constructing TGocciaPromiseValue directly where applicable. Do not flag missing receiver-based construction for Promise statics. This applies to units/Goccia.Runtime.Operations.pas and related Promise implementations; ensure consistency across related Pas files in the frostney/GocciaScript repository.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 63
File: units/Goccia.Runtime.Operations.pas:0-0
Timestamp: 2026-03-11T01:50:11.071Z
Learning: Replace hardcoded string literals 'now', 'toJSON', and 'timeOrigin' with the new constants from Goccia.Constants.PropertyNames (PROP_NOW, PROP_TO_JSON, PROP_TIME_ORIGIN). Apply this refactor across the repository wherever these literals are used, including but not limited to units/Goccia.Runtime.Operations.pas and units/Goccia.Builtins.Performance.pas. Use a project-wide search (e.g., regex or code navigation) to identify all occurrences, update to the respective constants, and adjust imports or uses accordingly. Ensure all references compile and behavior remains unchanged.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 63
File: units/Goccia.Runtime.Operations.pas:7514-7544
Timestamp: 2026-03-11T08:22:20.482Z
Learning: When bridging a native namespace object to a TSouffleRecord (e.g., performance) in units/Goccia.Runtime.Operations.pas, always set Result.Delegate := ARuntime.VM.RecordDelegate after TSouffleRecord.Create(...). This ensures Object.prototype methods (toString/hasOwnProperty/valueOf/toLocaleString) work in bytecode mode and NativeRecordToString honors Symbol.toStringTag. Apply this pattern to BuildPerformanceRecord and any future bridged records.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 66
File: units/HashMap.pas:266-309
Timestamp: 2026-03-17T22:17:37.410Z
Learning: Do not flag enumeration in THashMap<TKey, TValue> as concurrent-modification unsafe. The no-mutation contract is maintained during enumeration: GetEnumerator shares the internal FSlots array (no per-call allocation) and only mutation occurs via destructor or Clear, not during active iteration. This pattern should be recognized across similar Pascal units/pascal collection implementations; when reviewing code, ensure no concurrent-modification warnings are raised for shared internal structures during enumeration if the no-mutation contract is preserved.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 66
File: units/OrderedMap.pas:0-0
Timestamp: 2026-03-17T22:17:55.588Z
Learning: Enforce the constraint on TOrderedMap<TKey, TValue> in units/OrderedMap.pas: TKey must be an unmanaged, fixed-size value type (no strings, dynamic arrays, or records with managed/pointer fields). HashKey should use fast paths for SizeOf(TKey)=8 (QWord) and SizeOf(TKey)=4 (Cardinal); KeysEqual should compare via direct PQWord/PCardinal for those sizes. The raw-byte DJB2/CompareMem fallback is dead code for current uses. Document this constraint in a unit-header comment as the enforcement mechanism. Do NOT re-flag the byte-wise hash/equality in future reviews as long as all TOrderedMap specializations use pointer-sized unmanaged keys.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 66
File: units/ScopeMap.pas:0-0
Timestamp: 2026-03-17T22:18:00.984Z
Learning: Do not re-introduce TScopeMap. In frostney/GocciaScript, TScopeMap<TValue> was removed due to performance regressions from its linear-scan Add-as-upsert behavior. Reviews should flag any use of TScopeMap in Units/ScopeMap.pas or elsewhere and ensure bindings use TOrderedStringMap<TLexicalBinding> instead. If reviewing similar Pascal files, prefer avoiding upsert-like add semantics that overwrite existing entries, as they correlated with bytecode performance drops. Ensure changes adhere to the established replacement: TOrderedStringMap<TLexicalBinding>.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 66
File: units/BaseMap.pas:22-22
Timestamp: 2026-03-18T08:12:24.165Z
Learning: Shared/utility units under the repository (e.g., units/BaseMap.pas, HashMap.pas, OrderedMap.pas, etc.) that are used across multiple parts of the project should not be required to use the Goccia.* prefix. A separate naming convention will be defined in a follow-up issue. Do not flag missing Goccia.* prefix for these shared infrastructure units in reviews.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 66
File: units/BaseMap.pas:22-22
Timestamp: 2026-03-18T08:12:24.165Z
Learning: Do not flag missing Goccia.* prefixes for shared/utility units that are used across multiple parts of the project or can be extracted as standalone infrastructure (e.g., BaseMap.pas, HashMap.pas, OrderedMap.pas, OrderedStringMap.pas, GC, test assertion units). A separate naming convention (to be defined in issue `#104`) should apply to these targets. When reviewing, apply this guideline to all similar shared utility units under the units directory; for one-off implementations outside this scope, continue with the existing naming expectations.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 107
File: units/Goccia.Runtime.Operations.pas:4349-4353
Timestamp: 2026-03-23T12:38:54.412Z
Learning: In Pascal source under units/, when decoding JSON string escape sequences that represent Unicode code points and then converting them to UTF-8, prefer using the existing helper (implemented via CodePointToUTF8) that correctly combines UTF-16 surrogate pairs (\uD800–\uDBFF + \uDC00–\uDFFF) into a single Unicode code point before UTF-8 encoding. Avoid ad-hoc conversions like Char(StrToInt(...)) for code-point→UTF-8 conversion logic; use the helper instead to ensure surrogate pairs are handled correctly.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 109
File: units/Goccia.Runtime.Operations.pas:6337-6344
Timestamp: 2026-03-23T15:57:55.802Z
Learning: In Pascal unit files under units/, any iterator result record objects produced by native iterators (e.g., in routines analogous to NativeMapIteratorNext / NativeSetIteratorNext) must: (1) set `Rec.Delegate := FVM.RecordDelegate`; and (2) set properties using `PROP_VALUE` / `PROP_DONE` constants from `Goccia.Constants.PropertyNames` rather than hard-coded string literals. Apply the same pattern to any future native iterator result objects so they interoperate correctly in bytecode mode (e.g., for `Object.prototype` method behavior) and keep property-name usage consistent via the shared constants.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 136
File: docs/adding-built-in-types.md:105-133
Timestamp: 2026-03-27T14:09:25.371Z
Learning: In frostney/GocciaScript Pascal code, do not flag or suggest refactoring the early-exit singleton initialization guard `if Assigned(FShared) then Exit` when it appears at the top of `InitializePrototype`. The project documentation treats this early-exit form as the canonical/shared-prototype singleton initialization template (i.e., avoid recommending an `if not Assigned(FShared) then ...` wrapping alternative).
Learnt from: frostney
Repo: frostney/GocciaScript PR: 189
File: units/Goccia.JSON.pas:499-526
Timestamp: 2026-04-05T17:33:32.709Z
Learning: In frostney/GocciaScript, when using `TStringBuffer` from the `StringBuffer` unit, treat it as an advanced record (stack-allocated) with automatic cleanup by the compiler. For `TStringBuffer` locals, do not require or flag missing `SB.Free` calls in `try/finally` blocks—no destructor/free call is needed or appropriate. Only look for `Free` when the variable is actually a heap-allocated class instance.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 189
File: units/Goccia.JSON.pas:535-597
Timestamp: 2026-04-05T17:33:35.381Z
Learning: When reviewing Free Pascal code in this repo, treat `TStringBuffer` (from the `StringBuffer` unit) as an advanced record (value type), meaning instances are stack-allocated. Do not report or require `SB.Free` / `SB.FreeAndNil` (or any `Free` call) for variables typed as `TStringBuffer`, and do not treat an empty or no-op `finally` block after `TStringBuffer` usage as an issue. Only expect heap-management calls for actual `class`/reference types; calling `Free` on an advanced record is incorrect.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 200
File: REPL.dpr:0-0
Timestamp: 2026-04-07T17:12:07.324Z
Learning: When reviewing timing/metrics code in the GocciaScript repo (e.g., REPL.dpr, ScriptLoader, TestRunner, BenchmarkRunner, and TGocciaEngine.Execute), follow the established timing convention: capture `StartTime` before the JSX pre-pass (`TGocciaJSXTransformer.Transform`) and `StringListToLFText`, then capture `LexEnd` after `ScanTokens`. The `Lex` duration (`LexEnd - StartTime`) is intended to include the JSX pre-pass because it is treated as part of the source→tokens pipeline. Do not flag this ordering as a bug and do not recommend moving `StartTime` to after the JSX transform, as it would break the bucket semantics.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 209
File: units/Goccia.FFI.DynamicLibrary.pas:7-21
Timestamp: 2026-04-08T13:56:59.262Z
Learning: In Free Pascal (FPC), `TLibHandle` is defined in the `System` unit (`TLibHandle = PtrInt`), not in `Dynlibs`. Since `System` is implicitly available in all Pascal units, referencing `TLibHandle` in the interface section (e.g., as a private field type of a class declared in the interface/`interface` section) does not require `Dynlibs` to be listed in the interface `uses` clause. Only ensure `Dynlibs` is included in the implementation `uses` clause when you actually use `LoadLibrary`, `GetProcAddress`, or `FreeLibrary`. Therefore, do not flag a missing `Dynlibs` entry in the interface `uses` clause when the interface section references only `TLibHandle`.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 242
File: AGENTS.md:396-400
Timestamp: 2026-04-09T11:50:01.452Z
Learning: In .pas code, avoid unsafe Int64→Double conversions/workarounds that trigger FPC 3.2.2 miscompilations. Specifically, do not rely on `Double(Int64Var)` when it performs bit reinterpretation (Delphi-mode front-end bug on affected versions/platforms). Also do not use arithmetic patterns like `Int64Var * 1.0` / `Int64Var * Double` that can produce incorrect results near ±2^31 on AArch64. Prefer value-safe conversions via implicit assignment (e.g., `var D: Double; D := SomeInt64`) or implicit promotion through function/constructor parameter passing (e.g., `TGocciaNumberLiteralValue.Create(SomeInt64)`).
Learnt from: frostney
Repo: frostney/GocciaScript PR: 253
File: units/Goccia.Values.Iterator.Concat.pas:84-88
Timestamp: 2026-04-10T12:44:59.067Z
Learning: In frostney/GocciaScript (GocciaScript), `TGocciaFunctionBase(callable).Call(args, thisValue)` is the project-standard way to invoke callable values. During code review, do not flag `TGocciaFunctionBase(...)` casts at call sites as unsafe and do not recommend replacing them with an alternative “generic callable path”—for this codebase, this cast-and-`Call` pattern is the intended generic callable invocation mechanism.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 273
File: units/Goccia.Values.URLValue.pas:465-470
Timestamp: 2026-04-11T17:42:27.330Z
Learning: In this codebase, when implementing or updating a `TGCManagedObject.MarkReferences` override for a held reference, treat `MarkReferences` as the act of marking *and* tracing children. The expected pattern is: after a nil check, call the held reference’s `MarkReferences` implementation directly (e.g., `TGocciaSomeValue(FRef).MarkReferences`) rather than invoking any non-existent separate `Mark(obj)`/`Mark()` helper. Do not suggest “call `Mark(obj)`”; this project’s marking flow starts by setting `FGCMark := FCurrentMark` and then tracing child references from within `MarkReferences` (consistent with `TGocciaMapValue`, `TGocciaSetValue`, `TGocciaArrayValue`, etc.).
Learnt from: frostney
Repo: frostney/GocciaScript PR: 271
File: units/Goccia.Builtins.GlobalObject.pas:104-129
Timestamp: 2026-04-11T19:43:06.627Z
Learning: In frostney/GocciaScript, `TGarbageCollector.AddTempRoot`/`RemoveTempRoot` are set-membership (not reference-counted). In built-in method implementations, ONLY call `AddTempRoot`/`RemoveTempRoot` if the value was freshly boxed by `ToObject` (i.e., the argument was not already a `TGocciaObjectValue`). Do not unconditionally add/remove temp roots for values passed in as method arguments, since a single `RemoveTempRoot(Obj)` will remove the root for all holders and can cause use-after-free on the next GC cycle. Pattern: `if Assigned(TGarbageCollector.Instance) and not (AArgs.GetElement(0) is TGocciaObjectValue) then AddTempRoot(Obj); try ... finally if Assigned(TGarbageCollector.Instance) and not (AArgs.GetElement(0) is TGocciaObjectValue) then RemoveTempRoot(Obj); end;`
Learnt from: frostney
Repo: frostney/GocciaScript PR: 278
File: units/Goccia.Parser.pas:1071-1111
Timestamp: 2026-04-12T07:39:20.695Z
Learning: Do not free the result of `TGocciaLexer.ScanTokens` (e.g., in `ParseInterpolationExpression` and any other caller). `ScanTokens` returns a reference to the lexer's internal `FTokens: TObjectList<TGocciaToken>` created with `OwnsObjects=True` and freed by `TGocciaLexer`’s destructor. Adding `Tokens.Free` (or otherwise freeing the returned list/object) will cause double-free/access violations; token lifetime is managed entirely by the `TGocciaLexer` instance.
Learnt from: frostney
Repo: frostney/GocciaScript PR: 285
File: units/Goccia.Lexer.pas:207-213
Timestamp: 2026-04-12T15:46:28.683Z
Learning: In this Frostney/GocciaScript project (Free Pascal / FPC, including files under units/), do not recommend adding explicit `{$B-}` directives (either inside units or in `Goccia.inc`) to “fix” short-circuit evaluation safety. `{$B-}` is already FPC’s default in all compiler modes (`fpc`, `delphi`, `objfpc`). Treat missing `{$B-}` as expected and do not flag it as a maintainability or safety issue. Only note `{$B-}` overrides if an explicit `{$B+}` is present in the file or via a highly visible project-level `fpc.cfg` setting.
Suite Timing
Measured on ubuntu-latest x64. |
Benchmark Results364 benchmarks Interpreted: 🟢 38 improved · 🔴 251 regressed · 75 unchanged · avg -3.2% arraybuffer.js — Interp: 🟢 1, 🔴 13 · avg -7.1% · Bytecode: 🟢 5, 9 unch. · avg +1.0%
arrays.js — Interp: 🔴 19 · avg -7.1% · Bytecode: 🟢 6, 🔴 8, 5 unch. · avg +0.3%
async-await.js — Interp: 🔴 6 · avg -7.9% · Bytecode: 🟢 4, 2 unch. · avg +2.3%
base64.js — Interp: 🔴 9, 1 unch. · avg -7.9% · Bytecode: 🟢 8, 2 unch. · avg +2.0%
classes.js — Interp: 🟢 3, 🔴 24, 4 unch. · avg -4.3% · Bytecode: 🟢 11, 🔴 3, 17 unch. · avg +3.7%
closures.js — Interp: 🔴 11 · avg -4.9% · Bytecode: 🟢 6, 5 unch. · avg +3.6%
collections.js — Interp: 🟢 1, 🔴 9, 2 unch. · avg -5.6% · Bytecode: 🟢 7, 🔴 3, 2 unch. · avg +1.6%
destructuring.js — Interp: 🟢 6, 🔴 8, 8 unch. · avg -0.9% · Bytecode: 🟢 15, 7 unch. · avg +5.4%
fibonacci.js — Interp: 🔴 6, 2 unch. · avg -5.3% · Bytecode: 🟢 5, 🔴 1, 2 unch. · avg +1.7%
float16array.js — Interp: 🟢 9, 🔴 15, 8 unch. · avg +2.6% · Bytecode: 🟢 28, 4 unch. · avg +3.7%
for-of.js — Interp: 🔴 2, 5 unch. · avg -1.5% · Bytecode: 🟢 7 · avg +6.5%
helpers/bench-module.js — Interp: 0 · Bytecode: 0
iterators.js — Interp: 🟢 1, 🔴 23, 18 unch. · avg -2.2% · Bytecode: 🟢 35, 🔴 2, 5 unch. · avg +6.8%
json.js — Interp: 🔴 19, 1 unch. · avg -6.8% · Bytecode: 🟢 17, 🔴 1, 2 unch. · avg +4.0%
jsx.jsx — Interp: 🔴 13, 8 unch. · avg -1.7% · Bytecode: 🟢 17, 🔴 1, 3 unch. · avg +4.2%
modules.js — Interp: 🔴 9 · avg -5.0% · Bytecode: 🟢 9 · avg +3.4%
numbers.js — Interp: 🟢 1, 🔴 9, 1 unch. · avg -6.1% · Bytecode: 🟢 6, 🔴 1, 4 unch. · avg +2.9%
objects.js — Interp: 🟢 2, 🔴 4, 1 unch. · avg -0.4% · Bytecode: 🟢 6, 🔴 1 · avg +5.1%
promises.js — Interp: 🔴 9, 3 unch. · avg -3.9% · Bytecode: 🟢 12 · avg +6.5%
regexp.js — Interp: 🔴 10, 1 unch. · avg -6.5% · Bytecode: 🟢 6, 🔴 1, 4 unch. · avg +1.7%
strings.js — Interp: 🔴 16, 3 unch. · avg -3.8% · Bytecode: 🟢 13, 🔴 1, 5 unch. · avg +3.1%
typed-arrays.js — Interp: 🟢 8, 🔴 8, 6 unch. · avg -0.8% · Bytecode: 🟢 11, 🔴 3, 8 unch. · avg +3.3%
uint8array-encoding.js — Interp: 🟢 6, 🔴 9, 3 unch. · avg +0.6% · Bytecode: 🟢 7, 🔴 8, 3 unch. · avg -0.0%
Measured on ubuntu-latest x64. Benchmark ranges compare cached main-branch min/max ops/sec with the PR run; overlapping ranges are treated as unchanged noise. Percentage deltas are secondary context. |
Add ES2025 spec section references to PathsShareBranch and ResolveNamedBackreference per the project's annotation conventions. Document why the alternation fallback (?:\1|\2) is correct: TRegExpr fails (rather than matching empty) on backreferences to non-participating groups, so the alternation falls through to the participating group's backreference. Add a test exercising \k<name> outside the disjunction containing the duplicate groups to cover this path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>