Fixing hashing consistency across JVM instances.
Several of the core JRuby classes calculate hash codes based on Java or Ruby object ids. These ids differ between JVM instances, so the resulting hash codes are not consistent across JVMs, which distributed frameworks require. For example, Hadoop uses hashCode values to partition keys from the map phase, so that equal keys always go to the same reducer task.

This commit adds hashCode (and Ruby's hash method) implementations for RubyBoolean, RubyNil, and RubySymbol. RubyBoolean and RubyNil simply return hard-coded hashCode values that were randomly generated once. These replace the default Java Object#hashCode.

For RubySymbol, the previous implementation of hashCode returned the symbol's id, which can differ depending on the order in which symbols are created. This commit changes it to calculate a hashCode from the raw symbolBytes, like the RubyString implementation, but with a RubySymbol-specific seed and without the encoding addition for 1.9. The value is calculated once, when the symbol is instantiated, so the performance impact should be minimal.

This commit also adds a RubyInstanceConfig setting and CLI option for consistent hashing, jruby.consistent.hashing.enabled, which controls whether the Ruby runtime's hash seeds (k0 and k1) are generated randomly. When it is set to true, the seeds are set to static values. These seeds are used to hash RubyString objects, so enabling the option makes string hash codes consistent across JVMs.
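The approach described above can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: the class name, the constant values, and the simple polynomial byte hash are all hypothetical stand-ins (JRuby's actual string hashing uses the runtime's k0/k1 seeds). It only shows why a fixed constant for singletons such as nil, and a byte-based hash with a fixed seed for symbols, give the same result in every JVM, while an id-based hash does not.

```java
import java.nio.charset.StandardCharsets;

public class ConsistentHashSketch {
    // Hypothetical constants: picked once and hard-coded, so every JVM agrees on them.
    static final int NIL_HASH = 0x3b1f5a27;
    static final int SYMBOL_SEED = 0x5f2c91d3;

    // Simple polynomial hash over raw bytes, starting from a fixed seed.
    // This is a stand-in for the real hash function, chosen for brevity.
    static int hashBytes(byte[] bytes, int seed) {
        int h = seed;
        for (byte b : bytes) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    // A symbol's hash depends only on its bytes and the fixed seed,
    // not on the order in which symbols were created.
    static int symbolHash(String name) {
        return hashBytes(name.getBytes(StandardCharsets.UTF_8), SYMBOL_SEED);
    }

    // Maps a hash code to a reducer index; the mask keeps the result non-negative.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int h = symbolHash("user_id");
        System.out.println("symbol hash:   " + h);
        System.out.println("reducer index: " + partition(h, 8));
        System.out.println("nil hash:      " + NIL_HASH);
    }
}
```

Running this in two separate JVMs prints identical values, which is the property a partitioner needs. Computing the symbol hash once at construction time and caching it in a field, as the commit message describes, would keep later hashCode calls to a single field read.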
Showing with 76 additions and 4 deletions.
- +8 −2 src/org/jruby/Ruby.java
- +14 −0 src/org/jruby/RubyBoolean.java
- +8 −0 src/org/jruby/RubyInstanceConfig.java
- +11 −1 src/org/jruby/RubyNil.java
- +13 −1 src/org/jruby/RubySymbol.java
- +1 −0 src/org/jruby/util/cli/Options.java
- +12 −0 test/org/jruby/test/TestRubyNil.java
- +9 −0 test/org/jruby/test/TestRubySymbol.java