Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Truffle] New implementation of non-small hashes #2328

Merged
merged 12 commits into from Dec 17, 2014
Merged

[Truffle] New implementation of non-small hashes #2328

merged 12 commits into from Dec 17, 2014

Conversation

@chrisseaton
Copy link
Contributor

@chrisseaton chrisseaton commented Dec 16, 2014

@eregon @nirvdrum please review.

We've got two new things here. First of all we've got a proper implementation of hash where there are more than 3 key-value pairs. We have to have a custom data structure because we need to call Ruby methods for hash and eql? with somewhere to store the state for the call site. We can't re-use JRuby's, for the usual reasons, and using the Truffle-style of storage strategies we don't want encapsulation - we want the logic to be in the nodes.

Secondly I've also moved some of the logic of the hash into a node - the basic operation of finding the right bucket. That allows us to store the state and possibly do some branch profiling and value profiling but still provide a nice method call interface.

I've added a test to stress hash implementations.

We aren't doing any rebalancing for overloaded indices yet - we never grow the number of slots.


@Specialization
public RubyArray uniq(RubyArray array) {
notDesignedForCompilation();

This comment has been minimized.

@eregon

eregon Dec 16, 2014
Member

MRI uses a temporary Hash to avoid O(n2) here.

This comment has been minimized.

@eregon

eregon Dec 16, 2014
Member

So maybe we could have an overload of notDesignedForCompilation() mapping to CompilerAsserts.neverPartOfCompilation(String message) so to keep track of what is not compilation ready when not obvious.

This comment has been minimized.

@chrisseaton

chrisseaton Dec 16, 2014
Author Contributor

Good idea - we can go through and add a reason for all the notDesigneds when we do a spring clean after 0.6 is released.

/**
* A bucket in the Ruby hash. That is, a container for a key and a value, and a member of two lists - the chain of
* buckets for a given index, and the chain of buckets for the insertion order across the whole hash. Both of those
* chains are doubly-linked to enable O(1) deletions (after the correct bucket is found).

This comment has been minimized.

@eregon

eregon Dec 16, 2014
Member

Do we need double-linking in the bucket chain?
I am used to traditional singly-linked list for chaining buckets and since we have to walk the chain anyway, we might as well remember previous, no?

This comment has been minimized.

@chrisseaton

chrisseaton Dec 16, 2014
Author Contributor

Yes, the search result gives us what we need - fixed.

* <li>There is nothing at that index, in which case the bucket and last bucket in the chain will be
* {@code null}</li>
* <li>There were buckets at that index, but none for our key, in which case the bucket will be null, but the last
* bucket will the last bucket in the chain at that index, presumably where we will want to insert your new

This comment has been minimized.

@eregon

eregon Dec 16, 2014
Member

will be

This comment has been minimized.

@chrisseaton

chrisseaton Dec 16, 2014
Author Contributor

Fixed.

public class HashOperations {

public static final int SMALL_HASH_SIZE = Options.TRUFFLE_HASHES_SMALL.load();
public static final int[] CAPACITIES = Arrays.copyOf(org.jruby.RubyHash.MRI_PRIMES, org.jruby.RubyHash.MRI_PRIMES.length - 1);

This comment has been minimized.

@eregon

eregon Dec 16, 2014
Member

This might be worth to re-examine.
I read a couple times primes are not really needed that much anymore if there is a good hash function. But maybe it's FUD.

This comment has been minimized.

@chrisseaton

chrisseaton Dec 16, 2014
Author Contributor

Could be - but I'll leave it like this until we are in a position to benchmark it on real code.


if (removed == null) {
return getContext().getCoreLibrary().getNilObject();
if (bucket.getPreviousInLookup() == null) {

This comment has been minimized.

@eregon

eregon Dec 16, 2014
Member

So here is the only usage of getPreviousInLookup().
I think it is sensibly OK to walk the chain of buckets at that index there and check with Bucket ==.
(The terminology is confusing to me, a bucket should contain the chain of entries, no?).

@eregon
Copy link
Member

@eregon eregon commented Dec 16, 2014

Looks good globally.
Of course HashOperations should disappear, except maybe for some debug stuff. As well as DebugOps.send.

I am slightly uncomfortable with the naming of Bucket for what is a "hashtable entry".
But I understand the need to differentiate from a usual Map.Entry-like entry.
But maybe we don't want such Map.Entry-like entry? Walking directly on what is called Buckets here sounds good (might use an iterator), except maybe for data race issues.
In my intuition, a bucket is usually an element in the storage array, the array of buckets/slots. We likely don't have such a concept as an object in a practical implementation, as it's just a linked chain of hashtable entries.

public static boolean isOtherObjectArray(RubyHash hash, RubyHash other) {
return other.getStore() instanceof Object[];
// Arrays are covariant in Java!
return hash.getStore() instanceof Object[] && !(hash.getStore() instanceof Bucket[]);

This comment has been minimized.

@eregon

eregon Dec 16, 2014
Member

So this means there is no good way to check that the ObjectArray strategy is actually only using just a Object[] with instanceof?
But getClass() should do it then, so the assertions in RubyHash constructor should be adapted?
Wondering if 2 instanceof is also better than 1 getClass().

@nirvdrum
Copy link
Contributor

@nirvdrum nirvdrum commented Dec 16, 2014

This needs to merge in the change from 7a4ab54.


for (int n = 0; n < RubyHash.HASHES_SMALL; n++) {
for (int n = 0; n < HashOperations.SMALL_HASH_SIZE; n++) {
if (n < size && eqlNode.call(frame, store[n * 2], "eql?", null, key)) {
return store[n * 2 + 1];

This comment has been minimized.

@nirvdrum

nirvdrum Dec 16, 2014
Contributor

This code was already here, but maybe it'd be easier to follow if KEY_OFFSET and VALUE_OFFSET constants were used.

This comment has been minimized.

@chrisseaton

chrisseaton Dec 16, 2014
Author Contributor

Good idea - I'll do that on the master branch later.


import java.util.LinkedHashMap;
import java.util.*;

This comment has been minimized.

@nirvdrum

nirvdrum Dec 16, 2014
Contributor

Minor, but JRuby core prefers these be expanded.

@nirvdrum
Copy link
Contributor

@nirvdrum nirvdrum commented Dec 16, 2014

I'll have to check out the branch to navigate the code since I'm finding it too annoying in GItHub's web UI. But at first blush this looks pretty good.

@eregon

This comment has been minimized.

Copy link
Member

@eregon eregon commented on 35c1350 Dec 16, 2014

Looks good!
previousBucket might not be needed for many operations, but this is future optimization.
It can also be re-computed cheaply if we know the chain at an index does not get too long.

@chrisseaton
Copy link
Contributor Author

@chrisseaton chrisseaton commented Dec 16, 2014

I renamed the buckets to entries and removed the backwards link in the bucket chain.

@eregon

This comment has been minimized.

This looks funny but I guess it makes sense once you know this Entry is actually a chain of Entries, that is a bucket.

@eregon

This comment has been minimized.

needs a rename here.

@eregon

This comment has been minimized.

the chain of entries for a given index, forming a bucket

chrisseaton added a commit that referenced this pull request Dec 17, 2014
[Truffle] New implementation of non-small hashes
@chrisseaton chrisseaton merged commit 8c9f381 into master Dec 17, 2014
1 check failed
1 check failed
continuous-integration/travis-ci The Travis CI build failed
Details
@chrisseaton chrisseaton deleted the truffle-hash branch Dec 17, 2014
@enebo enebo added this to the JRuby 9.0.0.0 milestone Dec 22, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.