
[GSoC] Use a compact hash table for RubyHash instead of the buckets strategy #3172

Merged (30 commits) on Dec 13, 2023

Conversation

@moste00 (Contributor) commented Jul 22, 2023

This implements a "compact" approach to hash tables, described at a high level
here. TL;DR: this is a representation strategy that efficiently (hopefully!) supports insertion-order-preserving hash tables. It aims to replace the BucketsHashStore strategy.

Currently appears to pass all the tests.

The oracle-contributor-agreement bot added the OCA Verified (All contributors have signed the Oracle Contributor Agreement) label on Jul 22, 2023
@eregon (Member) left a comment:

I did a first pass. Good start :)

@moste00 requested a review from eregon on August 7, 2023 20:38
Comment on lines 362 to 364
int signMask = h >> 31;
// make positive
h = h ^ signMask;
Member:

Would h & ((1<<31)-1) do the same? (if so it seems more efficient)
And isn't h & (max - 1) already enough on its own to make it positive?
Maybe we can even use return h & (max - 2) to do all of it at once? And assert max >= 4 && isPowerOfTwo(4).
There is a definition of isPowerOfTwo() in IntegerNodes, you could move it to RubyGuards to reuse it.

I'm not sure we should discard the lowest bit of the hash though, it might be better to use (h & ((index.length>>1) - 1)) << 1 instead.
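For reference, a minimal sketch of what the combined check and mask could look like, assuming h is the key's hash and max is the index array length (names follow the discussion, not necessarily the PR's final code):

static boolean isPowerOfTwo(int n) {
    return n > 0 && (n & (n - 1)) == 0;
}

static int indexPos(int h, int max) {
    assert max >= 4 && isPowerOfTwo(max);
    // non-negative, smaller than max, and always even, in a single operation
    return h & (max - 2);
}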

Contributor Author:

Would h & ((1<<31)-1) do the same?

Hmmm, ((1 << 31) - 1) is the bitmask with a 0 followed by 31 ones, right? I had this idea too, although I thought of it more directly in terms of the mask clearing the sign bit of the two's-complement representation.

I rejected the idea at first because I thought it would be "less fair" than the shifted absolute-value function I'm using, meaning that it would result in some indices having more collisions than others. I don't remember my reasoning, but it was wrong! I checked now and that '&' actually has the same property as the absolute value: it maps each negative number to exactly one non-negative number, just not its absolute value. Yup, checks out, it would work.

Contributor Author:

Isn't h & (max - 1) already enough on its own to make it positive?

Also yes, thanks for pointing that out. Although I'm not sure that actually has the same semantics as "make positive, THEN take the mod", I don't think anybody cares about the mod operation per se; all we want is a number between 0 and max-1, and h & (max - 1) does that nicely.

return h & (max - 2)

Brilliant fusion of operations!

assert max >= 4 && isPowerOfTwo(4)

I think you meant putting max inside the isPowerOfTwo call, not 4. Also, 2 is technically a valid power of 2, right? It would always result in 0, which is the only valid index anyway. Regardless, max will always be >= 8 by the unstated precondition of the initialCapacity constructor, which is that the passed capacity be greater than or equal to the default initial capacity (8). I can assert that in the constructor itself.

Contributor Author:

I'm not sure we should discard the lowest bit of the hash though, it might be better to use (h & ((index.length>>1) - 1)) << 1 instead

Hmm, can you elaborate further? The current logic is not discarding the lowest bit of the hash, just of the resulting mod (or mod-like) remainder. As far as I can see, that's the simplest way to ensure an even array index. What are some bad consequences of this in your opinion?

Member:

I think you meant putting max inside the isPowerOfTwo call, not 4. Also, 2 is technically a valid power of 2, right? It would always result in 0, which is the only valid index anyway. Regardless, max will always be >= 8 by the unstated precondition of the initialCapacity constructor, which is that the passed capacity be greater than or equal to the default initial capacity (8). I can assert that in the constructor itself.

Yes, it needs to be a power of 2 >= 4, because if it's 2 then we would be &-ing with 0 and only cause collisions.
I would assert it here because this is the place where it needs to hold. You can also assert it in other places of course, and e.g. create a helper method that asserts those two conditions.

Hmm, can you elaborate further? The current logic is not discarding the lowest bit of the hash, just of the resulting mod (or mod-like) remainder. As far as I can see, that's the simplest way to ensure an even array index. What are some bad consequences of this in your opinion?

It is keeping the mod remainder; what is above it is discarded. But if we do return h & (max - 2), then we also discard the lowest bit, which seems not ideal.

Suppose we have some hash values like 6 and 7: if we discard the lowest bit, those collide; if we don't discard it, they wouldn't.
We can't take all 32 bits of the hash, because our array will never be that long, so I think the typical approach is to preserve all the lower bits, because the lower bits are the most likely to differ for objects that are "close" in value to each other (e.g. if an int's hash is just itself).
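A sketch of the variant suggested here, which keeps the low bits of the hash and only then doubles the result to get an even slot (assuming index.length is a power of 2 >= 4 and each index entry spans two ints, as in the discussion):

static int indexPos(int h, int[] index) {
    int entries = index.length >> 1;      // number of two-int entries in the index
    // hashes 6 and 7 land in different slots here; h & (max - 2) would make them collide
    return (h & (entries - 1)) << 1;
}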

@nirvdrum (Collaborator) left a comment:

There's a lot of code to work through here. I haven't done a comprehensive pass on the core logic yet. I think if you apply some of the feedback I suggested, it'll be easier to follow the logic.

As a couple of other general notes:

  • You have several branches without a profile. That's fine for a first pass, but it will likely limit your performance unless all branches are equally likely (see the sketch after this list).
  • You're missing some necessary boundaries.
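A minimal sketch of the kind of profiled branch the first bullet refers to, using Truffle's InlinedConditionProfile; the class, field, and variable names here are illustrative, not the PR's actual code:

import com.oracle.truffle.api.nodes.Node;
import com.oracle.truffle.api.profiles.InlinedConditionProfile;

final class ProfiledBranchSketch {
    // keyFound would typically come from an @Cached parameter in a DSL specialization
    static Object valueAt(Node node, InlinedConditionProfile keyFound, Object[] kvStore, int kvPos) {
        if (keyFound.profile(node, kvPos >= 0)) {
            return kvStore[kvPos + 1]; // value is stored right after the key
        } else {
            return null;               // key not present
        }
    }
}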

An easy way to check if you're missing a necessary boundary is to run jt build --env native. It's slow, but if you end up with output like:

Deepest level call tree path (47 calls):
   62 ; com.sun.org.apache.xerces.internal.impl.xpath.regex.ParserForXMLSchema.getRange(ParserForXMLSchema.java:381) ; com.sun.org.apache.xerces.internal.impl.xpath.regex.ParserForXMLSchema.setupRange(Token, String)
  489 ; com.sun.org.apache.xerces.internal.impl.xpath.regex.ParserForXMLSchema.getTokenForShorthand(ParserForXMLSchema.java:321) ; com.sun.org.apache.xerces.internal.impl.xpath.regex.ParserForXMLSchema.getRange(String, boolean)
  145 ; com.sun.org.apache.xerces.internal.impl.xpath.regex.RegexParser.parseAtom(RegexParser.java:783) ; com.sun.org.apache.xerces.internal.impl.xpath.regex.ParserForXMLSchema.getTokenForShorthand(int) ; com.sun.org.apache.xerces.internal.impl.xpath.regex.RegexParser.getTokenForShorthand(int)
  440 ; com.sun.org.apache.xerces.internal.impl.xpath.regex.RegexParser.parseFactor(RegexParser.java:677) ; com.sun.org.apache.xerces.internal.impl.xpath.regex.RegexParser.parseAtom()

then you'll know you missed one. The output is letting you know the Native Image attempted to compile too many methods for runtime compilation. Now, that's a slow process, so you may find it more advantageous to go through the code and try to reason about it or add boundaries everywhere and work to eliminate them.

Another option available is to modify mx.truffleruby/native-image.properties to add the PrintRuntimeCompilationCallTree host option:

diff --git a/mx.truffleruby/native-image.properties b/mx.truffleruby/native-image.properties
index 5be3cebd3b..72ade59fda 100644
--- a/mx.truffleruby/native-image.properties
+++ b/mx.truffleruby/native-image.properties
@@ -4,6 +4,7 @@
 Requires = language:nfi language:llvm language:regex

 Args = -H:MaxRuntimeCompileMethods=5400 \
+       -H:+PrintRuntimeCompilationCallTree \
        -H:+AddAllCharsets \
        --initialize-at-build-time=org.truffleruby,org.jcodings,org.joni

That'll modify the output of jt build --env native to include the call tree for each method made available for runtime compilation. It's a ton of output, but if you see a deeply nested entry and work your way back to its root you can identify areas that need boundaries that way. Alternatively, since all of this code lives in CompactHashStore, you can search for that and look for long call trees stemming from that entry.

// This tracks the next valid insertion position into kvStore, it starts at 0 and always increases by 2
// We can't use the size of the RubyHash for that, because deletion reduces its hash.size
// Whereas the insertion pos into kvStore can never decrease, we don't reuse empty slots
private int kvStoreInsertionPos;
Collaborator:

This is fine for an initial implementation, but unbounded memory growth sounds bad. It might even be a DoS vector (I need to think on that a bit more).

Contributor Author:

I'm with you on that one. There is a method resizeKvIfNeeded() that is currently responsible for expanding the KV store; I could also make it shrink the KV, triggering a compaction pass if hash.size is too small relative to the KV size.

The problem is that compaction probably implies both a rehash and touching the entire index array; there might be a clever way to avoid both, but I'm not sure I have all the details right.

Member:

It looks like this is something we did not handle in BucketsHashStore either, and java.util.HashMap does not seem to do it either.
But yes, maybe we should have a lower threshold too, e.g. shrinking when falling below 0.20 * capacity entries, or maybe an even lower factor (a rough sketch follows below). It must not trigger, though, e.g. when we create a new Hash with capacity=8 and add the first entries.
I think this is something to revisit later.
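A rough sketch of what such a lower-threshold check could look like; the 0.20 factor and the default capacity of 8 come from this discussion, while the field and method names are hypothetical:

// shrink only when well below capacity, and never below the default capacity of 8
private void shrinkKvIfTooSparse(int size) {
    int capacityInEntries = kvStore.length >> 1;               // two slots per entry (assumed layout)
    if (capacityInEntries > 8 && size < capacityInEntries / 5) {
        compactKvAndRebuildIndex(Math.max(8, 2 * size));       // hypothetical compaction pass
    }
}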

}

private static void assertConstructorPreconditions(int capacity) {
if (capacity < 1 || capacity > (1 << 28)) {
Collaborator:

Why 28? I realize we need the number to be effectively half the max array size so we can store pairs of values, but this seems smaller than I'd expect to be supported.

Contributor Author:

Well, that number will be rounded upwards to the nearest power of 2, so 2^28 + 1 will be rounded up to 2^29. That would be the kv capacity; the index capacity is 2 times that, so 2^30. That's a problem, because the capacity is in entries, so we allocate twice 2^30 array slots for the index, i.e. 2^31, and that's negative!

The thing we could drop to mitigate that is the requirement that the kv size be a multiple of 2; there is no real reason for it, we technically only require that index.length is a power of 2 for the efficient mod. So we could use the supplied capacity as-is for the KV, and then round upwards only for the index capacity.

But there's a pathological edge case there: what if the supplied capacity was already a power of 2? Rounding "upwards" will yield the same power of 2, and then you end up with a KV and an index array of the same size, which makes the index too crowded.

Member:

OK so 2^28 is correct then, could you add a comment explaining that?
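Something along these lines would capture the reasoning above (a sketch of the requested comment, not the PR's final wording):

// Capacity is capped at 2^28 entries: it gets rounded up to the next power of 2
// (2^28 + 1 would become 2^29), the index holds twice as many entries as the kvStore,
// and each index entry takes 2 ints, so anything above 2^28 would need an int[] of
// length 2^31, which overflows into a negative array length.
if (capacity < 1 || capacity > (1 << 28)) {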

@eregon self-requested a review on September 20, 2023 12:03
int[] oldIndex = index;
this.index = new int[oldIndex.length];

for (int i = 0; tillArrayEnd.inject(node, i < oldIndex.length); i += 2) {
Member:

This usage of InlinedLoopConditionProfile is not correct (it never reports the number of iterations done); it needs to be like 4ec8e4d, and in the other places too. Could you fix all InlinedLoopConditionProfile usages to match that?
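A sketch of the pattern being asked for, assuming the usual Truffle idiom of profiling the iteration count and reporting it via LoopNode.reportLoopCount (the exact shape may differ from commit 4ec8e4d):

int entries = oldIndex.length >> 1;                       // loop steps by 2, so this many iterations
tillArrayEnd.profileCounted(node, entries);
try {
    for (int i = 0; tillArrayEnd.inject(node, i < oldIndex.length); i += 2) {
        // ... rebuild the index entry starting at i ...
    }
} finally {
    LoopNode.reportLoopCount(node, entries);              // com.oracle.truffle.api.nodes.LoopNode
}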

}

@TruffleBoundary
private static void resizeIndexIfNeeded(CompactHashStore store, int size,
Member:

The heavy logic with the loop below is good to have behind a TruffleBoundary; there is almost no value in partially evaluating it, and doing so would make warmup slower.
But the early check if (indexResizingIsNotNeeded.profile(node, size < store.numSlotsForIndexRebuild)) { must be moved to each caller; we don't want to make a boundary call if it's not needed.
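In other words, something like this at each call site, keeping the cheap check in partially evaluated code (the rebuildIndex name is hypothetical):

if (indexResizingIsNotNeeded.profile(node, size < store.numSlotsForIndexRebuild)) {
    // fast path: nothing to do, no boundary call
} else {
    rebuildIndex(store, size); // the @TruffleBoundary method containing the heavy loop
}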

}
}

private static void relocateHashIfPossible(int currPos, int relocationPos, int[] index,
Member:

What does this do, could you document it? Is it necessary?

Contributor Author:

This just reuses deleted slots along the lookup chain. If we didn't do this, lots of deleted keys would slow down lookup considerably by lengthening the chain.

The way it works is that the hash lookup logic keeps track of the first deleted slot it encounters and returns it along with the position of the hash it's looking for (avoiding an allocation via the IntPair trick of packing 2 ints into a long). The caller of the hash lookup logic is then responsible for calling the relocation logic after it has used the hash position for the lookup.

Yes, it's not very pretty ¯\_(ツ)_/¯
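For context, the usual way to pack two ints into one long without allocating (the PR's IntPair helper may differ in the details):

static long pair(int first, int second) {
    return ((long) first << 32) | (second & 0xFFFF_FFFFL);
}

static int firstOf(long pair) {
    return (int) (pair >>> 32);
}

static int secondOf(long pair) {
    return (int) pair;
}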

@moste00 force-pushed the LovingHashes branch 3 times, most recently from fa3944a to eef1320, on October 25, 2023 20:06
@eregon changed the title from "First working skeleton of a compact hash table, still lots of cleaning up and profiling ahead." to "[GSoC] Use a compact hash table for RubyHash instead of the buckets strategy" on Oct 26, 2023
Comment on lines 66 to 70
private static final boolean bigHashTypeIsCompact;
static {
    String hashType = System.getProperty("BigHashStrategy");
    bigHashTypeIsCompact = hashType != null && hashType.equalsIgnoreCase("compact");
}
@eregon (Member) commented Oct 26, 2023:

Could you update this so it defaults to your new CompactHashStore? (i.e. use it unless BigHashStrategy == "buckets")
Then we'll be sure the new strategy passes the whole CI.

Member:

The same logic is duplicated in src/main/java/org/truffleruby/core/hash/HashLiteralNode.java, could you deduplicate?
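A sketch of the deduplicated check, with the default flipped to compact as requested above (the helper's name and location are hypothetical):

// One shared helper both call sites can use: compact is the default,
// buckets only when explicitly requested via -DBigHashStrategy=buckets.
public static boolean useCompactHashStore() {
    String hashType = System.getProperty("BigHashStrategy");
    return hashType == null || !hashType.equalsIgnoreCase("buckets");
}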

@eregon (Member) commented Nov 29, 2023:

avoid too many tombstones in kvArray. An easy partial fix is to kvStoreInsertionPos -= 2 if deleting the last kvArray entry, notably this is always the case for deleteLast. But we should also handle many deletions not at the end (either in the middle or at the beginning is problematic) and resize/recompute the kvArray + adapt the index for it.

The optimization to decrease kvStoreInsertionPos when deleting at the end is done.
The resizing on too many deletes is not done; I'm thinking of deferring that to another PR.
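The delete-at-the-end optimization amounts to something like this (a sketch; field names follow the discussion and the tombstone representation is assumed):

// if the deleted pair is the last one written, reclaim its slot immediately
// instead of leaving a tombstone behind
if (kvPos == kvStoreInsertionPos - 2) {
    kvStore[kvPos] = null;        // clear the key
    kvStore[kvPos + 1] = null;    // clear the value
    kvStoreInsertionPos -= 2;     // the next insertion reuses this slot
}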

@eregon self-assigned this on Nov 29, 2023
@eregon (Member) commented Dec 13, 2023:

I am sorry this is taking so long to merge.
The reason is that we got some CI failures (| ERROR: There was an exception in Java but rb_tr_jmp_buf is NULL.) which only happen on this branch (3 times) and nowhere else.
So while it's not clear how this change could cause that, it seems an odd coincidence.
I'll restart the gate and try to merge it; if that issue becomes too frequent, we can change the option default until we figure it out.

@eregon merged commit ca74b40 into oracle:master on Dec 13, 2023
14 checks passed