Fixing hashing consistency across JVM instances. #640

Merged 1 commit into jruby:master on May 21, 2013.


rdblue commented Apr 16, 2013

This replaces the older pull request #590, squashing its commits into one.

Several of the core JRuby classes calculate hash codes from Java or Ruby
object ids. Those values are not consistent across JVM instances, but
distributed frameworks need them to be. For example, Hadoop uses hashCode
values to partition map output, so that every record with a given key is sent
to the same reducer task.
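To see why this matters, here is a minimal Java sketch of hash-based partitioning. The formula mirrors the one used by Hadoop's HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numPartitions`; the names `PartitionDemo` and `partitionFor` are hypothetical. A key whose hashCode is derived from its contents lands in the same partition in every JVM, while a key that falls back on the identity-based Object#hashCode may land in a different partition on each run.

```java
/**
 * Hypothetical sketch of hash partitioning, using the same formula as
 * Hadoop's HashPartitioner.
 */
class PartitionDemo {

    // Mask off the sign bit so the index is non-negative, then reduce it
    // modulo the number of partitions (e.g. reducer tasks).
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // String.hashCode is specified as a function of the string's
        // characters, so "foo" maps to the same partition in every JVM.
        System.out.println("content-based key -> partition " + partitionFor("foo", 4));

        // A plain Object uses the identity-based default hashCode, which is
        // not tied to the object's contents and may change from run to run.
        System.out.println("identity-based key -> partition " + partitionFor(new Object(), 4));
    }
}
```

Ruby objects whose hashCode comes from an object id behave like the second key: the same logical key can be routed to different reducers by different JVMs.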

This commit adds hashCode (and Ruby's hash method) implementations for
RubyBoolean, RubyNil, and RubySymbol. RubyBoolean and RubyNil simply return
hard-coded constants, originally picked at random, in place of the default
Java Object#hashCode.
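A minimal sketch of the fixed-constant approach described in this first commit, for a hypothetical `NilValue` class standing in for RubyNil. The constant is an arbitrary placeholder, not the value JRuby uses, and the `hash` method is included only to mirror the pairing of Java's hashCode with Ruby's hash method.

```java
/**
 * Hypothetical stand-in for a singleton like RubyNil. Its hash code is a
 * hard-coded constant, so it is identical in every JVM instance, unlike the
 * identity-based default inherited from java.lang.Object.
 */
final class NilValue {
    // Arbitrary placeholder constant (assumption, not JRuby's actual value).
    private static final int NIL_HASH = 0x2f1c_7a35;

    static final NilValue NIL = new NilValue();

    private NilValue() {}

    @Override
    public int hashCode() {
        return NIL_HASH;
    }

    // Analogue of Ruby's #hash, delegating to the same constant.
    long hash() {
        return hashCode();
    }
}
```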

For RubySymbol, the previous hashCode implementation returned the symbol's id,
which can differ depending on the order in which symbols are created. This
commit instead computes the hashCode from the raw symbolBytes, as the
RubyString implementation does, but with a RubySymbol-specific seed and
without the encoding component that RubyString adds for 1.9. The value is
computed once, when the symbol is instantiated, so the performance impact
should be minimal.
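The idea can be sketched as follows. The hash function here is a 32-bit FNV-1a whose starting value is mixed with a seed, chosen only for brevity; it is an assumption, not the function JRuby or RubyString actually uses. `SymbolSketch`, `SYMBOL_SEED`, and `id` are hypothetical names. The point is that the hash depends only on the symbol's bytes and a fixed seed, so it no longer depends on creation order the way a sequential id does.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

final class SymbolSketch {
    // Hypothetical symbol-specific seed (assumption, not JRuby's value).
    private static final int SYMBOL_SEED = 0x5eed_5eed;

    // Sequential ids depend on creation order, like the old hashCode did.
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    final int id;
    private final byte[] symbolBytes;
    private final int hashCode; // computed once, at construction time

    SymbolSketch(String name) {
        this.id = NEXT_ID.incrementAndGet();
        this.symbolBytes = name.getBytes(StandardCharsets.UTF_8);
        this.hashCode = hashBytes(symbolBytes, SYMBOL_SEED);
    }

    // 32-bit FNV-1a, with the standard offset basis XORed with a seed.
    static int hashBytes(byte[] bytes, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : bytes) {
            h ^= (b & 0xFF);   // mix in the unsigned byte value
            h *= 0x01000193;   // FNV 32-bit prime
        }
        return h;
    }

    @Override
    public int hashCode() {
        return hashCode;
    }
}
```

Two JVMs that create `:foo` and `:bar` in opposite orders would assign them different ids, but in this sketch both would compute the same hash codes for them.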

This commit also adds a RubyInstanceConfig setting and CLI option for
consistent hashing, jruby.consistent.hashing.enabled, which controls whether
the Ruby runtime's hash seeds (k0 and k1) are generated randomly. When the
option is true, the seeds are set to static values instead. Because these
seeds are used to hash RubyString objects, enabling the option makes string
hash codes consistent across JVMs.
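A minimal sketch of how such a switch might select the seeds. The class name `HashSeeds`, the `consistentHashing` constructor parameter, and the fixed seed values are hypothetical; the sketch does not read JRuby's actual configuration.

```java
import java.security.SecureRandom;

final class HashSeeds {
    // Placeholder fixed seeds (assumption, not the values JRuby uses).
    static final long FIXED_K0 = 0x0123_4567_89AB_CDEFL;
    static final long FIXED_K1 = 0xFEDC_BA98_7654_3210L;

    final long k0;
    final long k1;

    HashSeeds(boolean consistentHashing) {
        if (consistentHashing) {
            // Same seeds in every JVM, so seeded string hashes are reproducible.
            k0 = FIXED_K0;
            k1 = FIXED_K1;
        } else {
            // Fresh random seeds per runtime, so hashes differ between runs.
            SecureRandom random = new SecureRandom();
            k0 = random.nextLong();
            k1 = random.nextLong();
        }
    }
}
```

One trade-off worth keeping in mind: per-runtime random seeds are commonly used to make hash-flooding attacks harder, so fixing them gives up that protection in exchange for reproducibility.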

(later commit...)

Updating hashCode implementations.

Per the discussion on the previous commit's pull request [1], this updates the
hashCode implementations for RubyNil and RubyBoolean so that their hashCode
behavior changes only when consistent hashing is enabled. It adds a hashCode
instance variable to RubyBoolean and RubyNil that the constructor sets either
to the default Object#hashCode value (via System.identityHashCode) or, when
consistent hashing is enabled, to a static value.

[1]: #590
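A minimal sketch of this revised approach, for a hypothetical `BooleanValue` class standing in for RubyBoolean. The constructor flag and the constant values are assumptions; the commit message only says the field is set either from System.identityHashCode or to a static value.

```java
final class BooleanValue {
    // Placeholder constants used only when consistent hashing is enabled
    // (assumption, not JRuby's values).
    private static final int TRUE_HASH = 0x1d7a_3c11;
    private static final int FALSE_HASH = 0x6b02_e4f9;

    final boolean value;
    private final int hashCode; // fixed at construction time

    BooleanValue(boolean value, boolean consistentHashing) {
        this.value = value;
        // Default: the same identity-based value Object#hashCode would return.
        // Consistent hashing: a static value that is identical in every JVM.
        this.hashCode = consistentHashing
                ? (value ? TRUE_HASH : FALSE_HASH)
                : System.identityHashCode(this);
    }

    @Override
    public int hashCode() {
        return hashCode;
    }
}
```

Storing the value in a field keeps hashCode a plain field read in both modes, and leaves the default behavior identical to what Object#hashCode would have produced.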

jrubyci merged commit af1d387 into jruby:master on May 21, 2013
headius commented May 21, 2013

Ok, finally got this merged in. Appears to be working (note I shortened the property from "consistent.hashing.enabled" to "consistent.hashing").

ext-jruby-local ~/projects/jruby $ jruby -e "p 'foo'.hash"

ext-jruby-local ~/projects/jruby $ jruby -e "p 'foo'.hash"

ext-jruby-local ~/projects/jruby $ jruby -e "p 'foo'.hash"

ext-jruby-local ~/projects/jruby $ jruby -Xconsistent.hashing=true -e "p 'foo'.hash"

ext-jruby-local ~/projects/jruby $ jruby -Xconsistent.hashing=true -e "p 'foo'.hash"

Sorry for the delay!
