Fix parsing unicode on v1.0 #2074

elia · 2020-02-01T17:30:15Z

Backport of #2073.

JavaScript works with UTF-16/UCS-2, and the strings we're passing to
the lexer are not UTF-8, so the lexer tries to unpack them as 8-bit
chars. That throws off the source ranges used to look up pieces of
code and shifts every source lookup that comes after a unicode
string.

E.g.:

  # the string '5' is reported at index 10 when using C*, 6 with U*
  "123️⃣45".unpack('C*').index('5'.unpack('C*').first) # => 10
  "123️⃣45".unpack('U*').index('5'.unpack('U*').first) # => 6

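A minimal sketch of the offset drift described above, run under plain Ruby (MRI) rather than Opal. The variable names are illustrative and are not taken from the lexer; the keycap sequence is written with explicit escapes so the code points are visible.

```ruby
# "12", then keycap three ("3" + U+FE0F + U+20E3), then "45".
source = "123\u{FE0F}\u{20E3}45"
needle = '5'.ord

# 'C*' unpacks 8-bit values: each of the two non-ASCII code points
# takes three bytes in UTF-8, so everything after them shifts by four.
byte_index = source.unpack('C*').index(needle)

# 'U*' unpacks whole code points, which lines up with string indexing.
codepoint_index = source.unpack('U*').index(needle)

puts "C* index: #{byte_index}"       # 10
puts "U* index: #{codepoint_index}"  # 6

# Only the code point index maps back onto the original string.
puts source[codepoint_index].inspect # "5"
puts source[byte_index].inspect      # nil, past the end of the 7-char string
```

The gap between the two indexes grows with every multi-byte character that precedes the lookup, which matches the reported symptom of all later source lookups being shifted.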

elia added the bug, parser, encoding, and eval labels Feb 1, 2020

elia added this to the v1.0 milestone Feb 1, 2020

elia self-assigned this Feb 1, 2020

elia merged commit 6ea8f2e into 1-0-stable Feb 1, 2020

elia deleted the elia/fix-parsing-unicode-v1.0 branch February 1, 2020 18:05

This was referenced Feb 1, 2020

opal-parser of Opal 1.0 fails to parse non-ascii comments #2066

Closed

https://opalrb.com/try/ uses Opal 0.11.4 ? opal/opal.github.io#56

Closed
