Fix parsing unicode on v1.0 #2074

Merged: 1 commit, Feb 1, 2020

@elia (Member) commented Feb 1, 2020

backport of #2073

JavaScript works with UTF-16/UCS-2, and the strings we're passing to
the lexer are not UTF-8, so the lexer tries to unpack them as 8-bit
chars. This messes up the source ranges in which it looks for pieces
of code, and ultimately shifts every source lookup that comes after
the first unicode string it encounters.

E.g.:

  # the string '5' is reported at index 10 when using C*, 6 with U*
  "123️⃣45".unpack('C*').index('5'.unpack('C*').first) # => 10
  "123️⃣45".unpack('U*').index('5'.unpack('U*').first) # => 6
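
To make the shift concrete, here is a minimal sketch that runs under MRI Ruby (where these string literals are UTF-8), not inside Opal. The snippet `x = '123️⃣'; y = 5` is made up for illustration, and `byte_to_char_offset` is a hypothetical helper, not part of Opal or the lexer. U+FE0F and U+20E3 each take three bytes in UTF-8, so every position after the keycap comes out 4 too large when counted in 8-bit units:

```ruby
# Hypothetical source snippet: the keycap "3️⃣" is U+0033 U+FE0F U+20E3.
SOURCE = "x = '123\uFE0F\u20E3'; y = 5"

# Position of `y` counted in 8-bit units (what unpacking with C* gives)...
byte_pos = SOURCE.unpack('C*').index('y'.ord)
# ...and counted in codepoints (what unpacking with U* gives).
char_pos = SOURCE.unpack('U*').index('y'.ord)

p byte_pos        # => 17
p char_pos        # => 13

# String#[] indexes by character, so the byte-based offset overshoots
# by 4 and lands on a different token.
p SOURCE[byte_pos] # => "5"
p SOURCE[char_pos] # => "y"

# Hypothetical helper: map a byte offset back to a character offset by
# counting the characters contained in the leading bytes.
def byte_to_char_offset(str, byte_offset)
  str.byteslice(0, byte_offset).length
end

p byte_to_char_offset(SOURCE, byte_pos) # => 13
```

This only models the byte vs. codepoint mismatch. JavaScript strings are indexed by UTF-16 code units, which line up with codepoints only for characters in the Basic Multilingual Plane, such as the ones used here.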

@elia elia added this to the v1.0 milestone Feb 1, 2020
@elia elia self-assigned this Feb 1, 2020
@elia elia merged commit 6ea8f2e into 1-0-stable Feb 1, 2020
@elia elia deleted the elia/fix-parsing-unicode-v1.0 branch February 1, 2020 18:05