Rational/Float/Fixnum/Bignum `.to_s.encoding` is US-ASCII #517

coffeejunk opened this Issue Jan 29, 2013 · 4 comments

@coffeejunk

When converting a number to a String with `.to_s`, the resulting String's encoding is US-ASCII (ASCII-8BIT for Rational) rather than the default UTF-8. (This is also the behavior of MRI and Rubinius; see rubinius/rubinius#2136.)

jruby-1.7.2 :001 > __ENCODING__
 => #<Encoding:UTF-8> 
jruby-1.7.2 :002 > Encoding.default_internal
 => nil 
jruby-1.7.2 :003 > Encoding.default_external
 => #<Encoding:UTF-8> 
jruby-1.7.2 :004 > "abc".encoding
 => #<Encoding:UTF-8> 
jruby-1.7.2 :005 > 1.to_s.encoding
 => #<Encoding:US-ASCII> 
jruby-1.7.2 :006 > 1.to_r.to_s.encoding
 => #<Encoding:ASCII-8BIT> 
jruby-1.7.2 :007 > 1.0.to_s.encoding
 => #<Encoding:US-ASCII> 
jruby-1.7.2 :008 > Encoding.default_internal = "UTF-8"
 => "UTF-8" 
jruby-1.7.2 :009 > 1.0.to_s.encoding
 => #<Encoding:US-ASCII> 
$ jruby -v
jruby 1.7.2 (1.9.3p327) 2013-01-04 302c706 on Java HotSpot(TM) 64-Bit Server VM 1.6.0_37-b06-434-11M3909 [darwin-x86_64]
$ uname -a
Darwin Mandallia.local 12.2.1 Darwin Kernel Version 12.2.1: Thu Oct 18 16:32:48 PDT 2012; root:xnu-2050.20.9~2/RELEASE_X86_64 x86_64
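
For readers wondering what this means in practice, here is a minimal sketch of how a US-ASCII string returned by `to_s` interacts with UTF-8 text. The variable names are illustrative only, and the sample word is an assumption added for the example. It relies on the fact that an ASCII-only string is compatible with UTF-8, so concatenation yields a UTF-8 result without any explicit conversion.

```ruby
# Sketch: mixing a number's string form with UTF-8 text.
number_text = 1.to_s          # US-ASCII, as in the session above
utf8_text   = "caf\u00e9"     # the \u escape makes this literal UTF-8

# Returns the encoding a combined string would get, or nil if incompatible.
compatible = Encoding.compatible?(number_text, utf8_text)

# Concatenation picks the compatible encoding automatically.
combined = number_text + " " + utf8_text

# An explicit conversion is available when a UTF-8 tag is required.
as_utf8 = number_text.encode(Encoding::UTF_8)

puts number_text.encoding
puts compatible
puts combined.encoding
puts as_utf8.encoding
```

If code compares `encoding` values directly instead of relying on compatibility, the explicit `encode` call is one way to normalize the result.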
@enebo
Member
enebo commented Jan 29, 2013

Ah...I was going to tell you to open an issue on redmine for this, but you did already. We will wait and see what MRI decides then. You might also want to provide an example on the redmine bug showing why it is undesirable.

@BanzaiMan
Member

http://bugs.ruby-lang.org/issues/7752#note-4

On current policy, strings which always include only US-ASCII characters are US-ASCII.
If there is a practical issue, I may change the policy in the future.

Note that US-ASCII string is faster than UTF-8 on getting length or index access.
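
One way to see the intuition behind the performance remark is that every US-ASCII character occupies exactly one byte, so character count and byte count always agree, whereas a UTF-8 string may contain multibyte characters. The sketch below only illustrates that relationship; it is not a benchmark, and the sample strings are assumptions added for the example.

```ruby
# Sketch: byte count versus character count in the two encodings.
ascii_text = 12345.to_s       # US-ASCII, one byte per character
utf8_text  = "caf\u00e9"      # UTF-8, the last character takes two bytes

ascii_same = ascii_text.bytesize == ascii_text.length
utf8_same  = utf8_text.bytesize == utf8_text.length

puts "US-ASCII: #{ascii_text.bytesize} bytes, #{ascii_text.length} chars"
puts "UTF-8:    #{utf8_text.bytesize} bytes, #{utf8_text.length} chars"
```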

Looks like the ticket would be closed as NOTABUG. (At least, there is an explicitly stated policy about encoding here.)

One thing we take away from here, though, is that JRuby is not doing something right. In particular:

1.to_r.to_s.encoding #=> #<Encoding:ASCII-8BIT> in JRuby, #<Encoding:US-ASCII> in MRI
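
A quick way to check an implementation against the MRI behavior described here is to collect the `to_s` encoding for each numeric type in the issue title. This is only a sketch: the labels in the hash are illustrative (newer Ruby versions merge Fixnum and Bignum into Integer), and it assumes the expected result is US-ASCII in every case, as stated for MRI above.

```ruby
# Sketch: report the encoding of to_s for each numeric type in the title.
samples = {
  "Fixnum"   => 1,
  "Bignum"   => 2**100,
  "Float"    => 1.0,
  "Rational" => 1.to_r
}

encodings = samples.map { |label, value| [label, value.to_s.encoding] }
encodings.each { |label, enc| puts "#{label}: #{enc}" }

# Anything listed here deviates from the US-ASCII policy quoted above.
mismatches = encodings.reject { |_label, enc| enc == Encoding::US_ASCII }
puts "mismatches: #{mismatches.map(&:first).inspect}"
```

On an affected JRuby 1.7.2 build, the report above suggests the Rational entry would show up as a mismatch.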
@BanzaiMan
Member

Fixed the issue above with e236a6a.

@enebo enebo closed this Feb 19, 2013