Fixing hashing consistency across JVM instances. #640
Several of the core JRuby classes calculate hash codes based on Java or Ruby object ids. This doesn't produce consistent hashing across JVM instances, which distributed frameworks need. For example, Hadoop uses hashCode values to route keys from the map phase to the same reducer task (partitioning).

This commit adds hashCode (and Ruby's hash method) implementations for RubyBoolean, RubyNil, and RubySymbol. RubyBoolean and RubyNil simply return static, hard-coded hashCode values that were randomly generated once. This replaces the default Java Object#hashCode.

For RubySymbol, the previous implementation of hashCode returned the symbol's id, which can differ depending on the order in which symbols are created. This updates it to calculate a hashCode from the raw symbolBytes, like the RubyString implementation, but with a RubySymbol-specific seed and without the encoding addition for 1.9. The value is calculated when the symbol is instantiated, so the performance impact should be minimal.

This commit also adds a RubyInstanceConfig setting and CLI option for consistent hashing, jruby.consistent.hashing.enabled, which controls whether the Ruby runtime's hash seeds (k0 and k1) are generated randomly. When it is set to true, the seeds are set to static values. These seeds are used to hash RubyString objects, so this makes string hash codes consistent across JVMs.

(later commit...)

Updating hashCode implementations. Per discussion on the last commit's pull request (#590), this updates the implementations of hashCode for RubyNil and RubyBoolean. The hashCode behavior for nil and booleans now changes only when consistent hashing is enabled. Adds a hashCode instance variable to RubyBoolean and RubyNil that is set in the constructor to either the Object#hashCode value (using System.identityHashCode) or a static value.
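To illustrate the two ideas above, here is a minimal, self-contained sketch. It is not the code from this PR: the class name, the seed and constant values, and the `consistent` flag are all hypothetical, and a simple 31-multiplier polynomial hash stands in for whatever hash function JRuby actually applies to symbol bytes. It only shows why hashing the bytes with a fixed seed is stable across JVMs, and how a constructor can choose between the identity hash and a static value.

```java
import java.nio.charset.StandardCharsets;

public class ConsistentHashSketch {
    // Hypothetical fixed values, chosen only for illustration.
    static final int SYMBOL_SEED = 0x5bd1e995;
    static final int NIL_HASH = 0x2f6e2b1;

    // Hash derived only from the symbol's bytes and a fixed seed, so it does
    // not depend on object identity or on symbol creation order.
    // Assumption: a plain polynomial hash stands in for JRuby's real function.
    static int symbolHash(byte[] symbolBytes) {
        int h = SYMBOL_SEED;
        for (byte b : symbolBytes) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    // Stand-in for a nil/boolean singleton: the hash is fixed once in the
    // constructor, either to a static value or to the identity hash.
    static final class NilLike {
        private final int hashCode;

        NilLike(boolean consistent) {
            this.hashCode = consistent ? NIL_HASH : System.identityHashCode(this);
        }

        @Override
        public int hashCode() {
            return hashCode;
        }
    }

    public static void main(String[] args) {
        byte[] foo = "foo".getBytes(StandardCharsets.UTF_8);
        // Same bytes give the same value on every run and every JVM.
        System.out.println(symbolHash(foo));
        // With the flag on, every instance reports the same static hash.
        System.out.println(new NilLike(true).hashCode() == new NilLike(true).hashCode());
    }
}
```

With the flag off, `NilLike` keeps the default identity-based behavior, which mirrors the later commit's goal of changing nil and boolean hashing only when consistent hashing is enabled.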
Ok, finally got this merged in. Appears to be working (note I shortened the property from "consistent.hashing.enabled" to "consistent.hashing").
Sorry for the delay!