JavaScript engines use UTF-16 internally, but Strings are UTF-8 encoded by default #2117
Reasoning
I've been able to enable 22 specs 🎉

Most notably, `String#bytesize` is now working as expected. On the other hand, I had to disable 2 specs, one on `String#intern` and another one on `String#to_sym`. But if you take a closer look, they were working by luck, because the value of the `encoding` attribute on the String primitive was UTF-16LE. In fact, the method `String#to_sym` was always returning a UTF-16LE Symbol even when the String was not UTF-16LE encoded.

Implementation
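As a reference point, a minimal sketch of the expected Symbol behavior, assuming CRuby semantics (the disabled specs passed only because the primitive's `encoding` attribute happened to be UTF-16LE):

```ruby
# Expected behavior (as in CRuby): a Symbol inherits the encoding
# of the String it was created from.
ascii_sym = "abc".to_sym
utf8_sym  = "héllo".to_sym

puts ascii_sym.encoding  # => US-ASCII (ASCII-only symbols are US-ASCII)
puts utf8_sym.encoding   # => UTF-8

# The old behavior described above instead produced a UTF-16LE Symbol
# regardless of the String's actual encoding.
```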
The method `force_encoding` should only update the `encoding` attribute; it should not modify how the string is encoded. As a result, the methods `bytesize` and `each_byte` should not rely on the `encoding` attribute. I've introduced an "internal encoding" that will be updated when `encode` is called (but not when `force_encoding` is called). `bytesize` and `each_byte` now rely on the "internal encoding" instead of the `encoding` attribute.

/cc @mojavelinux