Fixing hashing consistency across JVM instances. #640

Merged
merged 1 commit into from May 21, 2013


@rdblue

Fixing an older pull request #590 by squashing the commits into one.

@rdblue rdblue Fixing hashing consistency across JVM instances.
Several of the core JRuby classes calculate hash codes based on Java or Ruby
object ids. This doesn't produce consistent hashing across JVM instances, which
is needed for distributed frameworks. For example, Hadoop uses hashCode values
to distribute keys from the map phase to the same reducer task (partitioning).
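To illustrate why this matters, here is a minimal sketch of hash-based partitioning in the style of Hadoop's default hash partitioner. The class and method names (PartitionSketch, partitionFor) are made up for illustration and are not part of JRuby or Hadoop.

```java
// Sketch of hash-based partitioning: a key's hash code picks its reducer.
// PartitionSketch and partitionFor are illustrative names only.
final class PartitionSketch {
    // Mask off the sign bit so the modulo result is never negative.
    static int partitionFor(int hashCode, int numReducers) {
        return (hashCode & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // Content-based hash: every JVM computes the same value for "foo".
        int stable = "foo".hashCode();
        // Identity-based hash: not derived from content, so it is not
        // guaranteed to match across objects or JVM instances.
        int unstable = System.identityHashCode(new Object());
        System.out.println("stable   -> reducer " + partitionFor(stable, reducers));
        System.out.println("unstable -> reducer " + partitionFor(unstable, reducers));
    }
}
```

If two map tasks in different JVMs compute different hash codes for the same key, the key can land on different reducers, which is the problem this commit addresses.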

This commit adds hashCode (and Ruby's hash method) implementations for
RubyBoolean, RubyNil, and RubySymbol. RubyBoolean and RubyNil simply return
static, randomly-generated hashCode values that are hard-coded. This replaces
the default Java Object#hashCode.

For RubySymbol, the previous implementation of hashCode returned the symbol's
id, which could be different depending on the order in which symbols are
created. This updates it to calculate a hashCode based on the raw symbolBytes
like the RubyString implementation, but with a RubySymbol-specific seed and
without the encoding addition for 1.9. This value is calculated when symbols
are instantiated so the performance impact should be minimal.
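A rough sketch of the idea, not the actual RubySymbol code: the hash is derived from the symbol's bytes plus a fixed seed and cached at construction. SymbolSketch, SYMBOL_SEED, and the FNV-1a style mixing are assumptions chosen for brevity; the real implementation uses its own hash function and seed.

```java
import java.nio.charset.StandardCharsets;

// Illustrative stand-in for a symbol whose hash depends only on its bytes.
final class SymbolSketch {
    // Hypothetical symbol-specific seed; the real constant is not shown here.
    private static final int SYMBOL_SEED = 0x5bd1e995;

    private final byte[] symbolBytes;
    private final int hashCode; // computed once, when the symbol is created

    SymbolSketch(String name) {
        this.symbolBytes = name.getBytes(StandardCharsets.UTF_8);
        this.hashCode = hashBytes(symbolBytes, SYMBOL_SEED);
    }

    // FNV-1a style mixing, used here only as a simple deterministic stand-in.
    static int hashBytes(byte[] bytes, int seed) {
        int h = 0x811c9dc5 ^ seed;
        for (byte b : bytes) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    @Override
    public int hashCode() {
        return hashCode;
    }
}
```

Because the value depends only on the bytes and a constant, it no longer varies with the order in which symbols are created.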

This commit also adds a RubyInstanceConfig setting and CLI option for
consistent hashing, jruby.consistent.hashing.enabled, which controls whether
the Ruby runtime's hash seeds (k0 and k1) are generated randomly. When set to
true, they are set to static values. These hash seeds are used to hash
RubyString objects, so this will make string hash codes consistent across JVMs.
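The seed selection can be pictured roughly as follows. This is a sketch under assumptions: HashSeedsSketch and the FIXED_K0/FIXED_K1 constants are hypothetical, and the real static values live in the JRuby source.

```java
import java.security.SecureRandom;

// Illustrative holder for the two hash seeds described above.
final class HashSeedsSketch {
    // Hypothetical fixed seeds used when consistent hashing is enabled.
    static final long FIXED_K0 = 0x0123456789abcdefL;
    static final long FIXED_K1 = 0xfedcba9876543210L;

    final long k0;
    final long k1;

    HashSeedsSketch(boolean consistentHashing) {
        if (consistentHashing) {
            // Same seeds in every JVM, so string hashes match across instances.
            k0 = FIXED_K0;
            k1 = FIXED_K1;
        } else {
            // Default behavior: fresh random seeds for each runtime.
            SecureRandom random = new SecureRandom();
            k0 = random.nextLong();
            k1 = random.nextLong();
        }
    }
}
```

The trade-off is that fixed seeds give up the per-process randomization of string hashes, which is why the setting is opt-in.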

(later commit...)

Updating hashCode implementations.

Per discussion on the last commit's pull request [1], updating the
implementations of hashCode for RubyNil and RubyBoolean. Now the hashCode
behavior for nil and booleans will only change when consistent hashing is
enabled. Adds a hashCode instance variable to RubyBoolean and RubyNil that is
set in the constructor to the Object#hashCode value (using
System.identityHashCode) or a static value.
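In outline, the constructor logic might look like the sketch below. NilSketch and STATIC_HASH are hypothetical names and values for illustration, not the ones used in RubyNil or RubyBoolean.

```java
// Illustrative stand-in for a singleton-like object with a switchable hash.
final class NilSketch {
    // Hypothetical hard-coded value used when consistent hashing is enabled.
    private static final int STATIC_HASH = 34;

    private final int hashCode;

    NilSketch(boolean consistentHashing) {
        // Without consistent hashing, keep the default Object#hashCode behavior.
        this.hashCode = consistentHashing
                ? STATIC_HASH
                : System.identityHashCode(this);
    }

    @Override
    public int hashCode() {
        return hashCode;
    }
}
```

Storing the value in a field keeps hashCode() cheap and leaves existing behavior untouched unless the option is turned on.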

[1]: jruby#590
af1d387
@jrubyci jrubyci merged commit af1d387 into jruby:master May 21, 2013
@headius
JRuby Team member

Ok, finally got this merged in. Appears to be working (note I shortened the property from "consistent.hashing.enabled" to "consistent.hashing").

ext-jruby-local ~/projects/jruby $ jruby -e "p 'foo'.hash"
-1241175443

ext-jruby-local ~/projects/jruby $ jruby -e "p 'foo'.hash"
2083365819

ext-jruby-local ~/projects/jruby $ jruby -e "p 'foo'.hash"
-1992262216

ext-jruby-local ~/projects/jruby $ jruby -Xconsistent.hashing=true -e "p 'foo'.hash"
915909962

ext-jruby-local ~/projects/jruby $ jruby -Xconsistent.hashing=true -e "p 'foo'.hash"
915909962

Sorry for the delay!
