String#each_byte when the encoding is UTF-8 is incomplete #2138

Closed
ggrossetie opened this issue Dec 10, 2020 · 0 comments · Fixed by #2140

Comments

@ggrossetie
Member

The current implementation seems wrong/incomplete because we are using the UTF-16LE encoding by default (see #2117).

Anyway, this is what we get today:

$ cat test.rb -p
p '👋'.bytes
Ruby 2.6.3 (MRI)
$ ruby -v
ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]
$ ruby test.rb                 
[240, 159, 145, 139]
Opal v1.0.0 (f2d0d1adc)
$ bundle exec opal test.rb
[61, 216, 75, 220]
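
The Opal result matches the UTF-16LE bytes of the two surrogate code units that represent '👋' in a JavaScript string. A minimal sketch to reproduce those numbers (the helper name `utf16leBytes` is hypothetical and not part of Opal):

```javascript
// Hypothetical helper, for illustration only: split each UTF-16 code unit
// into its low byte followed by its high byte (little-endian order).
function utf16leBytes(string) {
  const bytes = [];
  for (let i = 0; i < string.length; i++) {
    const unit = string.charCodeAt(i); // one 16-bit code unit
    bytes.push(unit & 0xff, unit >> 8);
  }
  return bytes;
}

console.log(utf16leBytes('👋')); // [ 61, 216, 75, 220 ]
```

The code units are 0xD83D and 0xDC4B, so the bytes are 0x3D, 0xD8, 0x4B, 0xDC.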

And this is what I get if I force the encoding to UTF-8:

$ cat test.rb -p
p '👋'.force_encoding('utf-8').bytes
Ruby 2.6.3 (MRI)
$ ruby test.rb
[240, 159, 145, 139]
Opal v1.0.0 (f2d0d1adc)
$ bundle exec opal test.rb  
URIError: URI malformed
  from encodeURIComponent (<anonymous>)
  from corelib/string/encoding.rb:81:1:in `$each_byte'
  from corelib/runtime.js:1729:5:in `Opal.send2'
  from corelib/runtime.js:1719:5:in `Opal.send'
  from corelib/string/encoding.rb:178:5:in `each_byte'
  from corelib/basic_object.rb:34:1:in `$__send__'
  from corelib/runtime.js:1729:5:in `Opal.send2'
  from corelib/runtime.js:1719:5:in `Opal.send'
  from corelib/enumerator.rb:48:5:in `__send__'
  from corelib/enumerable.rb:428:1:in `$entries'
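
The `URIError` would be consistent with `encodeURIComponent` receiving one UTF-16 code unit at a time: half of a surrogate pair is not valid on its own and cannot be percent-encoded. That is an assumption drawn from the stack trace, not a reading of `corelib/string/encoding.rb`. A small sketch of the difference:

```javascript
const wave = '👋'; // one code point, two UTF-16 code units

// Encoding the whole surrogate pair works and exposes the UTF-8 bytes
// as %XX escapes.
const encoded = encodeURIComponent(wave);
console.log(encoded); // %F0%9F%91%8B

// Encoding only the first code unit (a lone surrogate) throws.
let error = null;
try {
  encodeURIComponent(wave.charAt(0));
} catch (e) {
  error = e;
}
console.log(error instanceof URIError); // true
```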

I found the following implementation: https://github.com/feross/buffer/blob/f52dffd9df0445b93c0c9065c2f8f0f46b2c729a/index.js#L1954-L2032 which seems to be working fine.

We could also use the following implementation in a Node.js environment:

function utf8ToBytes(string) {
  const buff = Buffer.from(string) // default encoding is utf8
  return Array.from(buff.values()) // return the bytes as a plain array
}
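
For environments without `Buffer`, the same bytes can be computed by walking the string by code point instead of by code unit. The following is only a sketch of that idea, not the feross/buffer code and not the fix that was merged. The name `utf8Bytes` is hypothetical, and replacing lone surrogates with U+FFFD is a choice made here for the example.

```javascript
// Hypothetical helper, for illustration only: encode a JavaScript string
// as UTF-8 bytes, one code point at a time.
function utf8Bytes(string) {
  const bytes = [];
  for (const char of string) { // for...of iterates by code point
    let cp = char.codePointAt(0);
    // Assumption for this sketch: a lone surrogate becomes U+FFFD.
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;
    if (cp < 0x80) {
      bytes.push(cp); // 1 byte
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f),
                 0x80 | (cp & 0x3f)); // 3 bytes
    } else {
      bytes.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
                 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)); // 4 bytes
    }
  }
  return bytes;
}

console.log(utf8Bytes('👋')); // [ 240, 159, 145, 139 ]
```

This gives the same result as MRI for the example above, and it never passes half of a surrogate pair to an encoder.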
ggrossetie added a commit to ggrossetie/opal that referenced this issue Dec 12, 2020
s-leroux pushed a commit to s-leroux/opal that referenced this issue May 24, 2021
s-leroux pushed a commit to s-leroux/opal that referenced this issue May 25, 2021
s-leroux pushed a commit to s-leroux/opal that referenced this issue May 26, 2021
ggrossetie added a commit to ggrossetie/opal that referenced this issue Jul 10, 2021