8372946 - TreeMap sub-map entry spliterator is expensive#28608
olivergillespie wants to merge 2 commits into openjdk:master
Tier2 test failure. I will investigate.
Fixed the test failure by skipping that case. The test intentionally modifies the backing map while holding an iterator, which is not safe in general. It got away with it before, but the new implementation reasonably throws CME.
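For readers unfamiliar with the fail-fast behaviour mentioned above, here is a minimal sketch of the general pattern: structurally modifying the backing `TreeMap` while a sub-map iterator is open. It is not the failing JDK test, and the class and method names are made up for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: SubMapCmeSketch is a hypothetical name, not part of the PR.
public class SubMapCmeSketch {

    /** Returns true if the sub-map iterator fails fast after the backing map changes. */
    static boolean throwsAfterBackingMapChange() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(1, "a");
        map.put(2, "b");
        map.put(3, "c");

        // Iterator over the entry set of a sub-map view of the backing map.
        Iterator<Map.Entry<Integer, String>> it =
                map.tailMap(1, true).entrySet().iterator();
        it.next();

        // Structural modification of the backing map while the iterator is open.
        map.put(4, "d");

        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + throwsAfterBackingMapChange());
    }
}
```

Fail-fast checks are documented as best-effort, so code should not rely on the exception for correctness.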
            return false;
        }

        public abstract Spliterator<Map.Entry<K,V>> spliterator();
I don't think you need this huge a patch. I think you should just do:
    - public abstract Spliterator<Map.Entry<K,V>> spliterator();
    + public Spliterator<Map.Entry<K,V>> spliterator() {
    +     return Spliterators.spliterator(iterator(), Spliterator.DISTINCT);
    + }
Your patch is introducing spliterator behavioral changes unrelated to the performance regression fix.
Thanks for looking.
I suppose you mean Spliterators.spliteratorUnknownSize?
Hmm - I made the change this way to be consistent with the existing SubMapKeyIterator and DescendingSubMapKeyIterator, simply adding the same functionality for the Entry versions. Do you think those are overcomplicated too, or there's a reason they're like that that doesn't apply to the Entry versions? I don't know why they were originally added, to be honest, I didn't find much useful context in the history.
I don't know Spliterator well enough to spot any subtle behavioural differences, that's one reason I chose to follow the existing patterns.
DescendingSubMapEntryIterator is SORTED but SubMapEntryIterator is not, so I'd have to account for that too.
> I suppose you mean Spliterators.spliteratorUnknownSize?
Yes. Thanks for correcting me.
> Hmm - I made the change this way to be consistent with the existing SubMapKeyIterator and DescendingSubMapKeyIterator, simply adding the same functionality for the Entry versions.
I made the recommendation because the starting point is to address a performance regression, not to enhance the sub-map entry spliterator to be on par with the DescendingSubMapKeyIterator.
From this starting point, I believe we can easily identify that EntrySetView inherits Set::spliterator, which is slow because the spliterator calls size() frequently. This root problem is easily fixed by using Spliterators.spliteratorUnknownSize, which also has minimal behavioral impact.
In contrast, functional enhancement of spliterators is really a can of worms with no end in sight: sometimes you add more flags, sometimes other splitting strategies. And in your example, you already have a test case failing due to the functional enhancements, even though you did add new tests to verify them.
So let's keep it simple, fix the bug, and leave the functional enhancements for another time. This also makes backporting the fix much easier.
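The suggestion above can be sketched as follows. This is not the actual TreeMap patch: `CountingView` is a hypothetical set view with a deliberately expensive `size()`, used only to show that the inherited `Set::spliterator` consults `size()` while a spliterator built with `Spliterators.spliteratorUnknownSize` does not.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: these class and method names are assumptions, not JDK code.
public class UnknownSizeSpliteratorSketch {

    /** Hypothetical stand-in for a set view whose size() must walk every element. */
    static final class CountingView extends AbstractSet<Integer> {
        private final List<Integer> backing;
        private final boolean unknownSize;
        final AtomicInteger sizeCalls = new AtomicInteger();

        CountingView(List<Integer> backing, boolean unknownSize) {
            this.backing = backing;
            this.unknownSize = unknownSize;
        }

        @Override
        public Iterator<Integer> iterator() {
            return backing.iterator();
        }

        @Override
        public int size() {
            sizeCalls.incrementAndGet();
            int n = 0;
            for (Integer ignored : backing) {
                n++; // stands in for an O(n) walk over the view
            }
            return n;
        }

        @Override
        public Spliterator<Integer> spliterator() {
            if (!unknownSize) {
                // Inherited Set::spliterator: reports SIZED and asks size().
                return super.spliterator();
            }
            // Suggested shape of the fix: not SIZED, never asks size().
            return Spliterators.spliteratorUnknownSize(iterator(), Spliterator.DISTINCT);
        }
    }

    /** Number of size() calls made while evaluating stream().limit(1).count(). */
    static int sizeCallsForLimitOne(boolean unknownSize) {
        CountingView view = new CountingView(List.of(1, 2, 3, 4, 5), unknownSize);
        view.stream().limit(1).count();
        return view.sizeCalls.get();
    }

    /** Whether the view's spliterator reports the SIZED characteristic. */
    static boolean reportsSized(boolean unknownSize) {
        CountingView view = new CountingView(List.of(1, 2, 3), unknownSize);
        return view.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        System.out.println("default spliterator, size() calls: " + sizeCallsForLimitOne(false));
        System.out.println("unknown-size spliterator, size() calls: " + sizeCallsForLimitOne(true));
    }
}
```

The trade-off is that a non-SIZED spliterator gives the stream pipeline no size hint, which is acceptable when computing the size is itself the expensive part.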
Okay sounds good to me! Thanks for the suggestion, I'll update later this week.
I created #29485 to add test cases before making this change - that required a slight functional tweak so I didn't want to include it in this change.
TreeMap sub-maps use the default IteratorSpliterator implementation for TreeMap$EntrySetView, which is slow for some operations because EntrySetView.size() iterates all elements. This is most trivially shown by something like largeTreeMap.tailMap(0L, false).entrySet().stream().limit(1).count() taking a long time. This showed up in my application, where it was trivial to mitigate by switching to a for loop, but I think the fix is easy enough.
keySet() does not have the same problem, as it provides a custom Spliterator implementation which is not Spliterator.SIZED, and returns Long.MAX_VALUE for estimateSize() (which is the recommended approach when the size is expensive to compute). I'm assuming this optimization was simply missed for the EntryIterator in the original implementation, but I don't know for sure.
This patch fixes the issue by providing a custom spliterator for EntrySetView, which is not SIZED. The implementation is copied almost exactly from the equivalent KeyIterator classes in this file (SubMapKeyIterator, DescendingSubMapKeyIterator). The only difference is in SubMapEntryIterator.getComparator, for which I copied the implementation from TreeMap$EntrySpliterator.
Basic performance test: map.tailMap(0L, false).entrySet().stream().limit(1).count() for a TreeMap with 10_000_000 entries.
Before (keySet is fast using SubMapKeyIterator, entrySet is slow using IteratorSpliterator):
After (entrySet is now fast, using SubMapEntryIterator):
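The comparison in the description can be reproduced roughly with a sketch like the one below. It is not the benchmark used for the PR: the class name, helper names, and the smaller default map size are assumptions, and `System.nanoTime()` timing is only indicative. Results depend on the JDK build, so no expected timings are given.

```java
import java.util.TreeMap;

// Illustrative only: SubMapStreamSketch and its helpers are hypothetical names.
public class SubMapStreamSketch {

    /** Builds a TreeMap with keys 1..n. */
    static TreeMap<Long, Long> build(int n) {
        TreeMap<Long, Long> map = new TreeMap<>();
        for (long i = 1; i <= n; i++) {
            map.put(i, i);
        }
        return map;
    }

    /** The entrySet() pipeline from the PR description. */
    static long firstEntryCount(TreeMap<Long, Long> map) {
        return map.tailMap(0L, false).entrySet().stream().limit(1).count();
    }

    /** The equivalent keySet() pipeline, described in the PR as already fast. */
    static long firstKeyCount(TreeMap<Long, Long> map) {
        return map.tailMap(0L, false).keySet().stream().limit(1).count();
    }

    public static void main(String[] args) {
        // Smaller than the 10_000_000 entries in the PR, to keep memory use modest.
        TreeMap<Long, Long> map = build(1_000_000);

        long t0 = System.nanoTime();
        long keys = firstKeyCount(map);
        long t1 = System.nanoTime();
        long entries = firstEntryCount(map);
        long t2 = System.nanoTime();

        System.out.println("keySet count=" + keys + " took " + (t1 - t0) / 1_000 + " us");
        System.out.println("entrySet count=" + entries + " took " + (t2 - t1) / 1_000 + " us");
    }
}
```

For trustworthy numbers a harness such as JMH would be a better fit than a single timed call.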
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28608/head:pull/28608
$ git checkout pull/28608
Update a local copy of the PR:
$ git checkout pull/28608
$ git pull https://git.openjdk.org/jdk.git pull/28608/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28608
View PR using the GUI difftool:
$ git pr show -t 28608
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28608.diff