Reject unescaped line terminators in string literals#623
Conversation
The lexer now raises SyntaxError when a raw LF, CR, LS (U+2028), or PS (U+2029) appears inside a single- or double-quoted string literal. LineContinuation (backslash before line terminator) remains valid. Also fixes TGocciaLexerError not being recognized as SyntaxError by try/catch in user code and the toThrow() test matcher. Closes #619 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
|
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR enforces ES2026 compliance by rejecting unescaped line terminators (LF, CR) inside single- and double-quoted string literals. The lexer now raises ChangesString Literal Line Terminator Validation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Suite TimingTest Runner (interpreted: 9,215 passed; bytecode: 9,215 passed)
MemoryGC rows aggregate the main thread plus all worker thread-local GCs. Test runner worker shutdown frees thread-local heaps in bulk; that shutdown reclamation is not counted as GC collections or collected objects.
Benchmarks (interpreted: 407; bytecode: 407)
MemoryGC rows aggregate the main thread plus all worker thread-local GCs. Benchmark runner performs explicit between-file collections, so collection and collected-object counts can be much higher than the test runner.
Measured on ubuntu-latest x64. |
Benchmark Results407 benchmarks Interpreted: 🟢 34 improved · 🔴 179 regressed · 194 unchanged · avg -1.9% arraybuffer.js — Interp: 🟢 1, 🔴 2, 11 unch. · avg -0.8% · Bytecode: 🟢 4, 🔴 2, 8 unch. · avg +1.3%
arrays.js — Interp: 🔴 13, 6 unch. · avg -3.0% · Bytecode: 🟢 15, 4 unch. · avg +5.0%
async-await.js — Interp: 🟢 1, 🔴 3, 2 unch. · avg -0.8% · Bytecode: 🟢 5, 1 unch. · avg +5.2%
async-generators.js — Interp: 🔴 1, 1 unch. · avg -3.2% · Bytecode: 🟢 1, 1 unch. · avg +5.7%
base64.js — Interp: 🟢 7, 🔴 3 · avg +4.0% · Bytecode: 🟢 9, 1 unch. · avg +4.3%
classes.js — Interp: 🔴 16, 15 unch. · avg -2.9% · Bytecode: 🟢 2, 🔴 8, 21 unch. · avg -0.6%
closures.js — Interp: 🔴 3, 8 unch. · avg -2.6% · Bytecode: 🟢 3, 8 unch. · avg +2.0%
collections.js — Interp: 🔴 6, 6 unch. · avg -3.7% · Bytecode: 🟢 9, 3 unch. · avg +7.6%
csv.js — Interp: 🔴 11, 2 unch. · avg -6.3% · Bytecode: 🟢 2, 🔴 10, 1 unch. · avg -3.5%
destructuring.js — Interp: 🔴 9, 13 unch. · avg -2.5% · Bytecode: 🟢 15, 7 unch. · avg +4.6%
fibonacci.js — Interp: 🔴 7, 1 unch. · avg -4.8% · Bytecode: 🟢 4, 4 unch. · avg +4.0%
float16array.js — Interp: 🟢 6, 🔴 6, 20 unch. · avg +0.8% · Bytecode: 🟢 27, 🔴 4, 1 unch. · avg +7.1%
for-of.js — Interp: 7 unch. · avg -0.2% · Bytecode: 🟢 6, 1 unch. · avg +5.0%
generators.js — Interp: 🔴 1, 3 unch. · avg +0.1% · Bytecode: 4 unch. · avg +0.9%
iterators.js — Interp: 🟢 1, 🔴 20, 21 unch. · avg -4.1% · Bytecode: 🟢 27, 15 unch. · avg +3.9%
json.js — Interp: 🔴 17, 3 unch. · avg -8.0% · Bytecode: 🟢 3, 🔴 5, 12 unch. · avg -0.7%
jsx.jsx — Interp: 🔴 7, 14 unch. · avg -1.8% · Bytecode: 🔴 17, 4 unch. · avg -4.6%
modules.js — Interp: 🔴 3, 6 unch. · avg -5.4% · Bytecode: 🔴 7, 2 unch. · avg -5.4%
numbers.js — Interp: 🔴 2, 9 unch. · avg -2.1% · Bytecode: 🟢 5, 🔴 1, 5 unch. · avg +1.2%
objects.js — Interp: 🔴 2, 5 unch. · avg -2.4% · Bytecode: 🔴 5, 2 unch. · avg -6.4%
promises.js — Interp: 🟢 1, 🔴 4, 7 unch. · avg -0.9% · Bytecode: 🟢 3, 🔴 2, 7 unch. · avg +1.5%
regexp.js — Interp: 🟢 6, 🔴 1, 4 unch. · avg +4.2% · Bytecode: 🟢 6, 🔴 3, 2 unch. · avg -0.2%
strings.js — Interp: 🔴 9, 10 unch. · avg -3.1% · Bytecode: 🟢 4, 🔴 1, 14 unch. · avg +0.7%
tsv.js — Interp: 🔴 9 · avg -8.2% · Bytecode: 🔴 7, 2 unch. · avg -6.0%
typed-arrays.js — Interp: 🟢 2, 🔴 7, 13 unch. · avg +6.4% · Bytecode: 🟢 5, 🔴 11, 6 unch. · avg -0.1%
uint8array-encoding.js — Interp: 🟢 1, 🔴 16, 1 unch. · avg -24.9% · Bytecode: 🟢 4, 🔴 7, 7 unch. · avg -0.1%
weak-collections.js — Interp: 🟢 8, 🔴 1, 6 unch. · avg +29.6% · Bytecode: 🟢 10, 🔴 4, 1 unch. · avg -3.6%
Deterministic profile diffDeterministic profile diff: no significant changes. Measured on ubuntu-latest x64. Benchmark ranges compare cached main-branch min/max ops/sec with the PR run; overlapping ranges are treated as unchanged noise. Percentage deltas are secondary context. |
test262 Conformance
Areas closest to 100%
Per-test deltas (+527 / -0)Newly passing (527):
Steady-state failures are non-blocking; regressions vs the cached main baseline (lower total pass count, or any PASS → non-PASS transition) fail the conformance gate. Measured on ubuntu-latest x64, bytecode mode. Areas grouped by the first two test262 path components; minimum 25 attempted tests, areas already at 100% excluded. Δ vs main compares against the most recent cached |
Remove the JS test directory that used new Function() and add proper Pascal unit tests to Goccia.Lexer.Test.pas instead — covers all four line terminators (LF, CR, LS, PS) and verifies LineContinuation still produces the expected value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… tests ES2019 (proposal-json-superset) allows U+2028 and U+2029 in string literals. Only LF and CR remain forbidden. Corrects the lexer check from IsLineTerminator (which includes LS/PS) to Peek = #10 or #13. Moves tests from Pascal unit layer to CLI CI layer (scripts/test-cli-lexer.ts) — the correct home for lexer error verification. Adds positive LS/PS test to string-literals.js. Restores raw LS/PS bytes in RegExp/unicode.js that were incorrectly escaped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/test-cli-lexer.ts`:
- Around line 46-52: The test cases array named "cases" is missing the
single-quoted CRLF scenario; add an entry for the single-quoted CRLF rejection
(i.e., the string containing '\r\n' wrapped in single quotes) with the same
description "CRLF in single-quoted string" so both quote styles cover LF, CR,
and CRLF (update the const cases: [string, string][] to include the new
['\'hello\r\nworld\'', "CRLF in single-quoted string"] entry).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9330ac34-80fd-4fd7-a155-be6dae4249f8
📒 Files selected for processing (8)
scripts/test-cli-lexer.tssource/units/Goccia.Builtins.TestingLibrary.passource/units/Goccia.Error.Suggestions.passource/units/Goccia.Evaluator.passource/units/Goccia.Lexer.passource/units/Goccia.VM.pastests/built-ins/RegExp/unicode.jstests/language/expressions/string-literals.js
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
\n/\r.\before a line terminator) remains valid and produces the empty string, matching spec behavior.TGocciaLexerErrornot being recognized asSyntaxErrorby user-leveltry/catch(evaluator + VM) and thetoThrow(SyntaxError)test matcher — both previously fell through to genericError.tests/built-ins/RegExp/unicode.jsto use\nescapes instead of raw newline bytes in string literals.Testing