feat: add parsing of surrogate pairs in string literals
mend: related tests
leviongit committed May 23, 2024
1 parent 3af474d commit 3fdd234
Showing 6 changed files with 26 additions and 5 deletions.
21 changes: 21 additions & 0 deletions json.rb
@@ -278,6 +278,27 @@ def __read_escape(str)
cp = cp * 0x10 + __readexpect_hexdigit
cp = cp * 0x10 + __readexpect_hexdigit

if (cp >= 0xd800 && cp <= 0xdfff)
__failed("unexpected unpaired low surrogate") if cp >= 0xdc00

__expectb!(0x5c) && # \
__expectb!(0x75) || # u
__failed("expected second unicode escape in surrogate pair")

cp2 = __readexpect_hexdigit
cp2 = cp2 * 0x10 + __readexpect_hexdigit
cp2 = cp2 * 0x10 + __readexpect_hexdigit
cp2 = cp2 * 0x10 + __readexpect_hexdigit

__failed("low surrogate not in low surrogate range") unless cp2 >= 0xdc00 && cp2 <= 0xdfff

cp &= 0x3ff
cp2 &= 0x3ff
cp *= 0x400
cp += cp2
cp += 0x10000
end

str << [cp].pack(STRING_U) # is this faster than just doing the ~~mental~~ bit arithmetic? idk probably?
return true
when nil
2 changes: 1 addition & 1 deletion test/data/n_string_1_surrogate_then_escape.err
@@ -1 +1 @@

%q(Expected "u", but got "\"" at [1:10])
2 changes: 1 addition & 1 deletion test/data/n_string_1_surrogate_then_escape_u.err
@@ -1 +1 @@

%q(expected hex digit [0-9a-fA-F] got "\"" at [1:11])
2 changes: 1 addition & 1 deletion test/data/n_string_1_surrogate_then_escape_u1.err
@@ -1 +1 @@

%q(expected hex digit [0-9a-fA-F] got "\"" at [1:12])
2 changes: 1 addition & 1 deletion test/data/n_string_1_surrogate_then_escape_u1x.err
@@ -1 +1 @@

%q(expected hex digit [0-9a-fA-F] got "x" at [1:12])
2 changes: 1 addition & 1 deletion test/data/n_string_incomplete_surrogate_escape_invalid.err
@@ -1 +1 @@
- %q(unexpected escape "x" at [1:16])
+ %q(low surrogate not in low surrogate range at [1:15])
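
For reference, the surrogate-pair arithmetic added to `__read_escape` can be sketched in isolation. The helper name `combine_surrogates` is hypothetical and not part of json.rb. It raises `ArgumentError` where the parser calls `__failed`, and it assumes `STRING_U` is the `"U"` pack directive, which the diff does not show.

```ruby
# Hypothetical standalone helper (not part of json.rb): combines a UTF-16
# surrogate pair into one Unicode code point, mirroring the diff's arithmetic.
def combine_surrogates(high, low)
  # The first escape must be a high surrogate (0xD800..0xDBFF).
  raise ArgumentError, "expected high surrogate" unless (0xd800..0xdbff).cover?(high)
  # The second escape must be a low surrogate (0xDC00..0xDFFF).
  raise ArgumentError, "low surrogate not in low surrogate range" unless (0xdc00..0xdfff).cover?(low)

  # Keep the low 10 bits of each half, place the high half above the low
  # half, then add the 0x10000 offset of the supplementary planes.
  ((high & 0x3ff) * 0x400) + (low & 0x3ff) + 0x10000
end

cp = combine_surrogates(0xd83d, 0xde00)
puts format("U+%X", cp)   # U+1F600
puts [cp].pack("U")       # encodes the code point as UTF-8 (assumes STRING_U == "U")
```

Multiplying by `0x400` is the same as shifting left by 10 bits. Either form yields code points in the range 0x10000..0x10FFFF.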

1 comment on commit 3fdd234

@github-actions

Benchmark

| Benchmark suite | Current: 3fdd234 | Previous: 6643fc6 | Ratio |
|-|-|-|-|
| Time to parse crimes.json | 6605.283100000001 ms (± 64.48ms) | | |

This comment was automatically generated by workflow using github-action-benchmark.
