Add atob and btoa global functions#256
Conversation
- Implement WHATWG base64 encoding and forgiving decode - Add InvalidCharacterError DOMException handling - Cover both globals with built-in tests and docs
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds global functions btoa(data) and atob(data) with full Base64 encode/decode semantics: btoa enforces Latin‑1 input (throws InvalidCharacterError for code points > U+00FF); atob implements forgiving decoding (strips ASCII whitespace, tolerates missing padding, validates alphabet and length). Documentation, tests, benchmarks, and DOMException-style error helpers added. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/built-ins/btoa.js`:
- Around line 51-56: The test "coerces non-string arguments to string" currently
passes only string literals; update the expectations to actually pass non-string
values to btoa (e.g., call btoa(123), btoa(true), btoa(null), btoa(undefined))
while keeping the same expected Base64 outputs so the test verifies ToString
coercion in the btoa implementation; change the four expect(...) calls to use
the non-string inputs (123, true, null, undefined) while still asserting "MTIz",
"dHJ1ZQ==", "bnVsbA==", and "dW5kZWZpbmVk" respectively.
- Around line 38-49: Update the "throws InvalidCharacterError for characters >
U+00FF" test to assert the exact DOMException contract instead of just that
something threw: invoke btoa for each case inside a try/catch and in the catch
assert that error.name === "InvalidCharacterError" and error.code === 5 (keep
the same inputs: String.fromCharCode(0x0100), "中", and "abc" +
String.fromCharCode(0x100)); leave the "throws TypeError when called with no
arguments" test unchanged.
In `@units/Goccia.Builtins.Globals.pas`:
- Around line 713-763: The code in AtobCallback builds Cleaned and ResultStr by
repeated string concatenation causing O(n^2) allocations; instead preallocate
and fill buffers by index: for Cleaned, SetLength to Length(Data), iterate and
write accepted chars into Cleaned[Pos] (track Pos) then SetLength(Cleaned, Pos);
for ResultStr, compute max output size (Length(Decoded) * 2),
SetLength(ResultStr, MaxLen), write bytes/UTF-8 bytes into ResultStr by index
(tracking OutPos) and finally SetLength(ResultStr, OutPos); keep using
DecodeStringBase64, IsBase64Char and ThrowInvalidCharacterError unchanged, just
replace the concatenations with indexed writes to avoid quadratic behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: cd7f1050-a935-4541-8258-7f1840a1946e
📒 Files selected for processing (6)
docs/built-ins.mdtests/built-ins/atob.jstests/built-ins/btoa.jsunits/Goccia.Builtins.Globals.pasunits/Goccia.Constants.ErrorNames.pasunits/Goccia.Values.ErrorHelper.pas
Benchmark Results291 benchmarks Interpreted: 🟢 6 improved · 🔴 242 regressed · 🆕 10 new · 33 unchanged · avg -6.1% arraybuffer.js — Interp: 🟢 1, 🔴 11, 2 unch. · avg -7.0% · Bytecode: 🟢 7, 🔴 1, 6 unch. · avg +2.5%
arrays.js — Interp: 🔴 19 · avg -8.1% · Bytecode: 🟢 8, 🔴 3, 8 unch. · avg +1.1%
async-await.js — Interp: 🔴 6 · avg -10.5% · Bytecode: 🟢 4, 🔴 1, 1 unch. · avg +2.2%
base64.js — Interp: 🆕 10 · Bytecode: 🆕 10
classes.js — Interp: 🟢 1, 🔴 28, 2 unch. · avg -6.5% · Bytecode: 🟢 1, 🔴 3, 27 unch. · avg -0.3%
closures.js — Interp: 🔴 11 · avg -9.0% · Bytecode: 🟢 3, 8 unch. · avg +0.7%
collections.js — Interp: 🔴 11, 1 unch. · avg -8.3% · Bytecode: 🟢 8, 🔴 1, 3 unch. · avg +1.8%
destructuring.js — Interp: 🔴 17, 5 unch. · avg -4.2% · Bytecode: 🟢 11, 🔴 2, 9 unch. · avg +2.0%
fibonacci.js — Interp: 🔴 8 · avg -7.8% · Bytecode: 🔴 3, 5 unch. · avg -0.1%
for-of.js — Interp: 🔴 7 · avg -3.7% · Bytecode: 🟢 1, 🔴 2, 4 unch. · avg -1.1%
helpers/bench-module.js — Interp: 0 · Bytecode: 0
iterators.js — Interp: 🔴 24, 3 unch. · avg -6.1% · Bytecode: 🟢 9, 🔴 7, 11 unch. · avg +0.5%
json.js — Interp: 🔴 19, 1 unch. · avg -8.1% · Bytecode: 🟢 4, 🔴 3, 13 unch. · avg +0.2%
jsx.jsx — Interp: 🔴 11, 10 unch. · avg -2.1% · Bytecode: 🟢 4, 🔴 4, 13 unch. · avg -0.5%
modules.js — Interp: 🔴 8, 1 unch. · avg -5.0% · Bytecode: 🟢 2, 🔴 7 · avg -0.9%
numbers.js — Interp: 🔴 10, 1 unch. · avg -7.6% · Bytecode: 🟢 5, 🔴 1, 5 unch. · avg +6.0%
objects.js — Interp: 🟢 1, 🔴 5, 1 unch. · avg -3.1% · Bytecode: 🟢 3, 🔴 2, 2 unch. · avg -0.1%
promises.js — Interp: 🔴 11, 1 unch. · avg -5.6% · Bytecode: 🟢 6, 6 unch. · avg +1.9%
regexp.js — Interp: 🔴 11 · avg -5.6% · Bytecode: 🟢 2, 🔴 2, 7 unch. · avg -0.3%
strings.js — Interp: 🔴 11 · avg -7.5% · Bytecode: 🟢 5, 🔴 1, 5 unch. · avg +1.1%
typed-arrays.js — Interp: 🟢 3, 🔴 14, 5 unch. · avg -3.4% · Bytecode: 🟢 6, 🔴 5, 11 unch. · avg +1.3%
Measured on ubuntu-latest x64. Benchmark ranges compare cached main-branch min/max ops/sec with the PR run; overlapping ranges are treated as unchanged noise. Percentage deltas are secondary context. |
Suite Timing
Measured on ubuntu-latest x64. |
- Speed up whitespace stripping and Latin-1 decoding in `atob` - Strengthen `btoa` tests for invalid input and coercion - Add base64 benchmarks for `atob`/`btoa` round trips
There was a problem hiding this comment.
🧹 Nitpick comments (1)
units/Goccia.Builtins.Globals.pas (1)
607-652: Malformed UTF-8 input could yield incorrect code points.The helper doesn't validate that continuation bytes actually have the
10xxxxxxpattern, and when a sequence is truncated, it produces a partial result without signaling an error. Forbtoa's Latin-1 check this is likely benign (input comes fromToStringLiteral.Valuewhich should be valid UTF-8), but malformed input could theoretically pass the> $FFcheck incorrectly.Consider adding continuation-byte validation for robustness:
🛡️ Optional hardening
else if (B and $E0) = $C0 then begin Result := (B and $1F) shl 6; Inc(AIndex); - if AIndex <= Length(AUTF8String) then - Result := Result or (Ord(AUTF8String[AIndex]) and $3F); + if (AIndex <= Length(AUTF8String)) and ((Ord(AUTF8String[AIndex]) and $C0) = $80) then + Result := Result or (Ord(AUTF8String[AIndex]) and $3F) + else + Result := $FFFD; // Replacement character signals invalid sequence Inc(AIndex); end🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@units/Goccia.Builtins.Globals.pas` around lines 607 - 652, NextUTF8CodePoint currently assumes continuation bytes are valid and may produce incorrect code points for malformed or truncated UTF-8; update the function to validate each continuation byte before using it by checking AIndex <= Length(AUTF8String) and that (Ord(AUTF8String[AIndex]) and $C0) = $80 for each continuation, and if a check fails return the Unicode replacement character (U+FFFD / $FFFD) and advance by one (or otherwise skip the invalid byte) so malformed sequences do not produce partial/incorrect code points.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@units/Goccia.Builtins.Globals.pas`:
- Around line 607-652: NextUTF8CodePoint currently assumes continuation bytes
are valid and may produce incorrect code points for malformed or truncated
UTF-8; update the function to validate each continuation byte before using it by
checking AIndex <= Length(AUTF8String) and that (Ord(AUTF8String[AIndex]) and
$C0) = $80 for each continuation, and if a check fails return the Unicode
replacement character (U+FFFD / $FFFD) and advance by one (or otherwise skip the
invalid byte) so malformed sequences do not produce partial/incorrect code
points.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bbe920a1-e4aa-4b68-9745-c30565bdb430
📒 Files selected for processing (3)
benchmarks/base64.jstests/built-ins/btoa.jsunits/Goccia.Builtins.Globals.pas
- Make UTF-8 decoding return U+FFFD for truncated or invalid sequences - Tighten continuation-byte validation in the globals builtin decoder
Summary
atobandbtoabuilt-ins with WHATWG-style base64 behavior.InvalidCharacterErrorDOMException support for invalid Latin-1 and base64 input.docs/built-ins.md.Testing
./build.pas testrunner && ./build/TestRunner testsatobandbtoa