New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Truffle] New implementation of non-small hashes #2328

Merged
merged 12 commits into from Dec 17, 2014

Conversation

Projects
None yet
4 participants
@chrisseaton
Contributor

chrisseaton commented Dec 16, 2014

@eregon @nirvdrum please review.

We've got two new things here. First of all we've got a proper implementation of hash where there are more than 3 key-value pairs. We have to have a custom data structure because we need to call Ruby methods for hash and eql? with somewhere to store the state for the call site. We can't re-use JRuby's, for the usual reasons, and using the Truffle-style of storage strategies we don't want encapsulation - we want the logic to be in the nodes.

Secondly I've also moved some of the logic of the hash into a node - the basic operation of finding the right bucket. That allows us to store the state and possibly do some branch profiling and value profiling but still provide a nice method call interface.

I've added a test to stress hash implementations.

We aren't doing any rebalancing for overloaded indices yet - we never grow the number of slots.

@Specialization
public RubyArray uniq(RubyArray array) {
notDesignedForCompilation();

This comment has been minimized.

@eregon

eregon Dec 16, 2014

Member

MRI uses a temporary Hash to avoid O(n2) here.

@eregon

eregon Dec 16, 2014

Member

MRI uses a temporary Hash to avoid O(n2) here.

This comment has been minimized.

@eregon

eregon Dec 16, 2014

Member

So maybe we could have an overload of notDesignedForCompilation() mapping to CompilerAsserts.neverPartOfCompilation(String message) so to keep track of what is not compilation ready when not obvious.

@eregon

eregon Dec 16, 2014

Member

So maybe we could have an overload of notDesignedForCompilation() mapping to CompilerAsserts.neverPartOfCompilation(String message) so to keep track of what is not compilation ready when not obvious.

This comment has been minimized.

@chrisseaton

chrisseaton Dec 16, 2014

Contributor

Good idea - we can go through and add a reason for all the notDesigneds when we do a spring clean after 0.6 is released.

@chrisseaton

chrisseaton Dec 16, 2014

Contributor

Good idea - we can go through and add a reason for all the notDesigneds when we do a spring clean after 0.6 is released.

public class HashOperations {
public static final int SMALL_HASH_SIZE = Options.TRUFFLE_HASHES_SMALL.load();
public static final int[] CAPACITIES = Arrays.copyOf(org.jruby.RubyHash.MRI_PRIMES, org.jruby.RubyHash.MRI_PRIMES.length - 1);

This comment has been minimized.

@eregon

eregon Dec 16, 2014

Member

This might be worth to re-examine.
I read a couple times primes are not really needed that much anymore if there is a good hash function. But maybe it's FUD.

@eregon

eregon Dec 16, 2014

Member

This might be worth to re-examine.
I read a couple times primes are not really needed that much anymore if there is a good hash function. But maybe it's FUD.

This comment has been minimized.

@chrisseaton

chrisseaton Dec 16, 2014

Contributor

Could be - but I'll leave it like this until we are in a position to benchmark it on real code.

@chrisseaton

chrisseaton Dec 16, 2014

Contributor

Could be - but I'll leave it like this until we are in a position to benchmark it on real code.

@eregon

This comment has been minimized.

Show comment
Hide comment
@eregon

eregon Dec 16, 2014

Member

Looks good globally.
Of course HashOperations should disappear, except maybe for some debug stuff. As well as DebugOps.send.

I am slightly uncomfortable with the naming of Bucket for what is a "hashtable entry".
But I understand the need to differentiate from a usual Map.Entry-like entry.
But maybe we don't want such Map.Entry-like entry? Walking directly on what is called Buckets here sounds good (might use an iterator), except maybe for data race issues.
In my intuition, a bucket is usually an element in the storage array, the array of buckets/slots. We likely don't have such a concept as an object in a practical implementation, as it's just a linked chain of hashtable entries.

Member

eregon commented Dec 16, 2014

Looks good globally.
Of course HashOperations should disappear, except maybe for some debug stuff. As well as DebugOps.send.

I am slightly uncomfortable with the naming of Bucket for what is a "hashtable entry".
But I understand the need to differentiate from a usual Map.Entry-like entry.
But maybe we don't want such Map.Entry-like entry? Walking directly on what is called Buckets here sounds good (might use an iterator), except maybe for data race issues.
In my intuition, a bucket is usually an element in the storage array, the array of buckets/slots. We likely don't have such a concept as an object in a practical implementation, as it's just a linked chain of hashtable entries.

@nirvdrum

This comment has been minimized.

Show comment
Hide comment
@nirvdrum

nirvdrum Dec 16, 2014

Contributor

This needs to merge in the change from 7a4ab54.

Contributor

nirvdrum commented Dec 16, 2014

This needs to merge in the change from 7a4ab54.

for (int n = 0; n < RubyHash.HASHES_SMALL; n++) {
for (int n = 0; n < HashOperations.SMALL_HASH_SIZE; n++) {
if (n < size && eqlNode.call(frame, store[n * 2], "eql?", null, key)) {
return store[n * 2 + 1];

This comment has been minimized.

@nirvdrum

nirvdrum Dec 16, 2014

Contributor

This code was already here, but maybe it'd be easier to follow if KEY_OFFSET and VALUE_OFFSET constants were used.

@nirvdrum

nirvdrum Dec 16, 2014

Contributor

This code was already here, but maybe it'd be easier to follow if KEY_OFFSET and VALUE_OFFSET constants were used.

This comment has been minimized.

@chrisseaton

chrisseaton Dec 16, 2014

Contributor

Good idea - I'll do that on the master branch later.

@chrisseaton

chrisseaton Dec 16, 2014

Contributor

Good idea - I'll do that on the master branch later.

@nirvdrum

This comment has been minimized.

Show comment
Hide comment
@nirvdrum

nirvdrum Dec 16, 2014

Contributor

I'll have to check out the branch to navigate the code since I'm finding it too annoying in GItHub's web UI. But at first blush this looks pretty good.

Contributor

nirvdrum commented Dec 16, 2014

I'll have to check out the branch to navigate the code since I'm finding it too annoying in GItHub's web UI. But at first blush this looks pretty good.

@eregon

This comment has been minimized.

Show comment
Hide comment
@eregon

eregon Dec 16, 2014

Member

Looks good!
previousBucket might not be needed for many operations, but this is future optimization.
It can also be re-computed cheaply if we know the chain at an index does not get too long.

Member

eregon commented on 35c1350 Dec 16, 2014

Looks good!
previousBucket might not be needed for many operations, but this is future optimization.
It can also be re-computed cheaply if we know the chain at an index does not get too long.

@chrisseaton

This comment has been minimized.

Show comment
Hide comment
@chrisseaton

chrisseaton Dec 16, 2014

Contributor

I renamed the buckets to entries and removed the backwards link in the bucket chain.

Contributor

chrisseaton commented Dec 16, 2014

I renamed the buckets to entries and removed the backwards link in the bucket chain.

@eregon

This comment has been minimized.

Show comment
Hide comment
@eregon

eregon Dec 16, 2014

Member

This looks funny but I guess it makes sense once you know this Entry is actually a chain of Entries, that is a bucket.

This looks funny but I guess it makes sense once you know this Entry is actually a chain of Entries, that is a bucket.

@eregon

This comment has been minimized.

Show comment
Hide comment
@eregon

eregon Dec 16, 2014

Member

needs a rename here.

needs a rename here.

@eregon

This comment has been minimized.

Show comment
Hide comment
@eregon

eregon Dec 16, 2014

Member

the chain of entries for a given index, forming a bucket

the chain of entries for a given index, forming a bucket

chrisseaton added a commit that referenced this pull request Dec 17, 2014

Merge pull request #2328 from jruby/truffle-hash
[Truffle] New implementation of non-small hashes

@chrisseaton chrisseaton merged commit 8c9f381 into master Dec 17, 2014

1 check failed

continuous-integration/travis-ci The Travis CI build failed
Details

@chrisseaton chrisseaton deleted the truffle-hash branch Dec 17, 2014

@enebo enebo added this to the JRuby 9.0.0.0 milestone Dec 22, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment