Hash has bug to set encoding into key string wrongly when the key string is used once with different encoding #3405

Closed
tagomoris opened this Issue Oct 21, 2015 · 7 comments

Comments

Projects
None yet
3 participants
@tagomoris

On JRuby 9.0.1.0, Hash sets wrong encoding only when the key string is used as Hash key with different encoding.

hash1 = {}
hash1['str'.force_encoding('ASCII-8BIT')] = 1
p hash1.keys.first.encoding # ASCII-8BIT

hash2 = {}
hash2['str'.force_encoding('UTF-8')] = 1
p hash2.keys.first.encoding # ASCII-8BIT !? (expected: UTF-8)

Script to show situations to reproduce bugs:

# encoding: ascii-8bit

str = 'hello'
obj1 = {'hello' => 1}
obj2 = {str => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'hello'.force_encoding('UTF-8')
obj1 = {'hello'.force_encoding('UTF-8') => 1}
obj2 = {str => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'hello'.force_encoding('UTF-8')
obj1 = {}
obj1[str] = 1
obj2 = {}
obj2[str] = 2
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'world'.force_encoding('UTF-8')
obj1 = {'world'.force_encoding('UTF-8') => 1}
obj2 = {str => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

str = 'unused'
obj1 = {'unused'.force_encoding('UTF-8') => 1}
obj2 = {str.force_encoding('UTF-8') => 2}
p({str: str, enc: str.encoding, k1enc: obj1.keys.first.encoding, k2enc: obj2.keys.first.encoding})

Expected result is:

  • First line: all keys are encoded in ASCII-8BIT
  • Rest: all keys are encoded in UTF-8

Actual result in JRuby 9.0.1.0 is:

  • First line: all keys are encoded in ASCII-8BIT
  • All hello as hash keys are encoded in ASCII-8BIT
@tagomoris

This comment has been minimized.

Show comment
Hide comment
@tagomoris

tagomoris Oct 21, 2015

JRuby 1.7.x and MRI works as expected, and only JRuby 9.0.1.0 works in wrong way.

All outputs from script above in JRuby 9.0.1.0, JRuby 1.7.22 and MRI 2.2.2:

# $ ruby -v ; ruby ~/encoding_test.rb
# jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 25.31-b07 on 1.8.0_31-b13 +jit [darwin-x86_64]
# {:str=>"hello", :enc=>#<Encoding:ASCII-8BIT>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"hello", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"world", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"unused", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}

# $ ruby -v ; ruby ~/encoding_test.rb
# jruby 1.7.22 (1.9.3p551) 2015-08-20 c28f492 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13 +jit [darwin-x86_64]
# {:str=>"hello", :enc=>#<Encoding:ASCII-8BIT>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"hello", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"world", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"unused", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}

# $ ruby -v ; ruby ~/encoding_test.rb
# ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
# {:str=>"hello", :enc=>#<Encoding:ASCII-8BIT>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"hello", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"world", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"unused", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}

JRuby 1.7.x and MRI works as expected, and only JRuby 9.0.1.0 works in wrong way.

All outputs from script above in JRuby 9.0.1.0, JRuby 1.7.22 and MRI 2.2.2:

# $ ruby -v ; ruby ~/encoding_test.rb
# jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 25.31-b07 on 1.8.0_31-b13 +jit [darwin-x86_64]
# {:str=>"hello", :enc=>#<Encoding:ASCII-8BIT>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"hello", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"world", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"unused", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}

# $ ruby -v ; ruby ~/encoding_test.rb
# jruby 1.7.22 (1.9.3p551) 2015-08-20 c28f492 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13 +jit [darwin-x86_64]
# {:str=>"hello", :enc=>#<Encoding:ASCII-8BIT>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"hello", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"world", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"unused", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}

# $ ruby -v ; ruby ~/encoding_test.rb
# ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
# {:str=>"hello", :enc=>#<Encoding:ASCII-8BIT>, :k1enc=>#<Encoding:ASCII-8BIT>, :k2enc=>#<Encoding:ASCII-8BIT>}
# {:str=>"hello", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"world", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}
# {:str=>"unused", :enc=>#<Encoding:UTF-8>, :k1enc=>#<Encoding:UTF-8>, :k2enc=>#<Encoding:UTF-8>}

@enebo enebo added this to the JRuby 9.0.4.0 milestone Oct 21, 2015

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Oct 23, 2015

Member

Huh, wacky.

Member

headius commented Oct 23, 2015

Huh, wacky.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Oct 23, 2015

Member

Ok, I think this is due to our Hash logic using the new "string".freeze optimization for String keys. They are getting deduplicated, but improperly not considering encoding in the dedup store.

Options:

  • Don't dedup keys except for literal hashes with literal string keys.
  • Fix dedup to consider encoding as well.
Member

headius commented Oct 23, 2015

Ok, I think this is due to our Hash logic using the new "string".freeze optimization for String keys. They are getting deduplicated, but improperly not considering encoding in the dedup store.

Options:

  • Don't dedup keys except for literal hashes with literal string keys.
  • Fix dedup to consider encoding as well.

@enebo enebo modified the milestones: JRuby 9.0.5.0, JRuby 9.0.4.0 Nov 12, 2015

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 19, 2015

Member

I will fix this today.

Member

headius commented Nov 19, 2015

I will fix this today.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 19, 2015

Member

I have chosen the second option for now. The proper way for us to dedup hash keys is to only do it for literal hashes with literal string keys, but that will require a bit more work in IR compilation. The short term fix of only using the cached string if the encoding matches will work for now (and may be necessary anyway).

Member

headius commented Nov 19, 2015

I have chosen the second option for now. The proper way for us to dedup hash keys is to only do it for literal hashes with literal string keys, but that will require a bit more work in IR compilation. The short term fix of only using the cached string if the encoding matches will work for now (and may be necessary anyway).

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 19, 2015

Member

I'm not sure the best way to spec this out, since it depends on literal strings in hashes having specific encodings; this is not easy to spec in a single file. I've filed ruby/spec#162 to discuss that, and we'll proceed for now without a spec or test to go with my fix.

Member

headius commented Nov 19, 2015

I'm not sure the best way to spec this out, since it depends on literal strings in hashes having specific encodings; this is not easy to spec in a single file. I've filed ruby/spec#162 to discuss that, and we'll proceed for now without a spec or test to go with my fix.

@headius headius closed this in f6553cf Nov 19, 2015

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Nov 19, 2015

Member

I've committed a fix for the dedup cache to consider encoding. It's a trivial change, preventing new strings with the same contents but different encodings from updating the cache or picking up a wrong-encoding string.

I'll file a separate issue to fix dedup behavior wrt Hash keys, and we'll work on specs separately in ruby/spec#162.

Member

headius commented Nov 19, 2015

I've committed a fix for the dedup cache to consider encoding. It's a trivial change, preventing new strings with the same contents but different encodings from updating the cache or picking up a wrong-encoding string.

I'll file a separate issue to fix dedup behavior wrt Hash keys, and we'll work on specs separately in ruby/spec#162.

headius added a commit to ruby/spec that referenced this issue Nov 20, 2015

Literal strings as literal hash keys must preserve encoding.
Specs for #162.

I confirmed it does test behavior from jruby/jruby#3405:

jruby 9.0.4.0 (2.2.2) 2015-11-12 b9fb7aa Java HotSpot(TM) 64-Bit Server VM 25.60-b23 on 1.8.0_60-b27 +jit [darwin-x86_64]
..................F

1)
Hash literal does not change encoding of literal string keys during creation FAILED
Expected #<Encoding:ASCII-8BIT>
 to equal #<Encoding:UTF-8>

/Users/headius/projects/rubyspec/language/hash_spec.rb:145:in `block in (root)'
org/jruby/RubyBasicObject.java:1633:in `instance_eval'
org/jruby/RubyEnumerable.java:1552:in `all?'
org/jruby/RubyFixnum.java:301:in `times'
org/jruby/RubyArray.java:1560:in `each'
/Users/headius/projects/rubyspec/language/hash_spec.rb:6:in `<top>'
org/jruby/RubyKernel.java:957:in `load'
org/jruby/RubyBasicObject.java:1633:in `instance_eval'
org/jruby/RubyArray.java:1560:in `each'

Finished in 0.091000 seconds

1 file, 19 examples, 44 expectations, 1 failure, 0 errors

jruby 9.0.5.0-SNAPSHOT (2.3.0) 2015-11-20 a8eab95 Java HotSpot(TM) 64-Bit Server VM 25.60-b23 on 1.8.0_60-b27 +jit [darwin-x86_64]
...................

Finished in 0.101000 seconds

1 file, 19 examples, 46 expectations, 0 failures, 0 errors
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment