Encoding of String.name is ASCII-8BIT #5208

Closed
wezm opened this Issue Jun 5, 2018 · 4 comments


wezm commented Jun 5, 2018

Environment

  • JRuby version: jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 OpenJDK 64-Bit Server VM 25.172-b11 on 1.8.0_172-b11 +indy +jit [linux-x86_64]
  • Operating system and platform: Arch Linux Linux wes-thinkpad 4.16.13-1-ARCH #1 SMP PREEMPT Thu May 31 23:29:29 UTC 2018 x86_64 GNU/Linux

Expected Behavior

This is more an observed difference between MRI Ruby and JRuby 9.2.0.0 than a definite bug. I'm not sure the JRuby behaviour is incorrect, but the difference tripped up some code in our code base, so I thought I'd point it out. For the purposes of this report, the expected behaviour is to match MRI, though I understand the difference might be allowed.

The expected behaviour is for the string returned by name on a class to have the same encoding as on MRI (2.5.1, 2.6.0-preview2):

String.name.encoding
=> #<Encoding:US-ASCII>

class Hèllo; end
=> nil
Hèllo.name.encoding
=> #<Encoding:UTF-8>

Actual Behavior

String.name.encoding
=> #<Encoding:ASCII-8BIT>

class Hèllo; end
=> nil
Hèllo.name.encoding
=> #<Encoding:UTF-8>
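
To compare the two implementations side by side, the checks above could be wrapped in a small script and run under each interpreter. This is only a sketch: `name_encodings` is a hypothetical helper, not part of Ruby or JRuby, and the non-ASCII class is created with `const_set` so the script does not depend on the source file's encoding.

```ruby
# Hypothetical helper: map each class name to the encoding of the string
# returned by Class#name, so the output can be compared between MRI and JRuby.
def name_encodings(*classes)
  classes.map { |klass| [klass.name, klass.name.encoding] }.to_h
end

# Equivalent to `class Hèllo; end`, written with a Unicode escape.
hello = Object.const_set("H\u00E8llo", Class.new)

# Print the interpreter first so results from different runs can be told apart.
puts "#{RUBY_ENGINE} #{RUBY_VERSION}"
name_encodings(String, hello).each do |name, encoding|
  puts "#{name}: #{encoding}"
end
```

Running it under both MRI and JRuby 9.2.0.0 should show whether the encoding reported for `String.name` differs.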

wezm changed the title from "Encoding of Class.name is ASCII-8BIT" to "Encoding of String.name is ASCII-8BIT" Jun 5, 2018

enebo added this to the JRuby 9.2.1.0 milestone Jun 5, 2018

Member

enebo commented Jun 5, 2018

Ah, a regression from fixing all our encoding issues. Ironic. When we register class names we have no eager symbol, so it bubbles down to calculateRubyName, which calls runtime.newString(), which sets the encoding to ASCII-8BIT (because the default ByteList constructor assumes this encoding).

Our solution will be to be a bit smarter about this encoding: any class name which is 7-bit clean should be US-ASCII, regardless of the encoding specified (this is a more general rule for symbols, but it will behave the same for class names).
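
The 7-bit rule described above can be illustrated at the Ruby level. This is a minimal sketch and not the actual JRuby fix, which lives in the Java runtime; `normalize_name_encoding` is a hypothetical helper introduced only for illustration.

```ruby
# Hypothetical helper: return a copy of `name` whose encoding follows the
# rule "7-bit clean names are US-ASCII; anything else keeps its encoding".
def normalize_name_encoding(name)
  str = name.dup # dup so the caller's string (possibly frozen) is untouched
  if str.ascii_only?
    # Every byte is below 0x80, so relabelling as US-ASCII is always valid.
    str.force_encoding(Encoding::US_ASCII)
  else
    str
  end
end

binary_name = "String".b      # ASCII-8BIT, like the name JRuby 9.2.0.0 returns
unicode_name = "H\u00E8llo"   # UTF-8, contains a non-ASCII character

puts normalize_name_encoding(binary_name).encoding
puts normalize_name_encoding(unicode_name).encoding
```

`String#force_encoding` only relabels the bytes without transcoding them, which is safe here because `ascii_only?` has already confirmed that the string contains nothing outside the 7-bit range.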

Member

enebo commented Jun 5, 2018

I should not really call this a regression either, since 9.1 reports UTF-8 for String and it should be US-ASCII. UTF-8 is far preferable to ASCII-8BIT, but still wrong.

enebo closed this in 26bee8c Jun 11, 2018

Author

wezm commented Sep 6, 2018

Hi folks, would love to get this fix into a release so I can continue upgrading our project to work with JRuby >= 9.2. Any chance there will be a new release with this fix soon?

Member

headius commented Sep 6, 2018

@wezm There's a chance! 😁

Seriously though, we know it's been a while since the last release. We'll be putting out a 9.2.1 soon.
