8265029: Preserve SIZED characteristics on slice operations (skip, limit) #3427
Conversation
👋 Welcome back tvaleev! A progress list of the required criteria for merging this PR into
Webrevs
test/jdk/java/util/stream/test/org/openjdk/tests/java/util/stream/CountTest.java
// Original spliterator is split to [0..499] and [500..999] parts
// due to skip+limit, we have [50..499] and [500..849]
var prefix = parSpliterator.trySplit();
assertNotNull(prefix);
assertTrue(parSpliterator.hasCharacteristics(Spliterator.SIZED));
assertTrue(parSpliterator.hasCharacteristics(Spliterator.SUBSIZED));
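For reference, the SIZED/SUBSIZED contract being asserted here can be observed on a plain ArrayList spliterator, where both halves of a split keep an exact size. This is a standalone sketch, independent of the patched stream pipeline:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) list.add(i);

        Spliterator<Integer> suffix = list.spliterator();
        Spliterator<Integer> prefix = suffix.trySplit();

        // SUBSIZED guarantees both halves remain SIZED after a split,
        // and their exact sizes add up to the original size.
        System.out.println(suffix.hasCharacteristics(Spliterator.SIZED));    // true
        System.out.println(prefix.hasCharacteristics(Spliterator.SUBSIZED)); // true
        System.out.println(prefix.estimateSize() + suffix.estimateSize());   // 1000
    }
}
```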
It would be great to integrate the code comment into the assertion message; then a test failure would print the message, making it easier to understand. It would also be great to extract an assertHasCharacteristics method, so the failure prints the actual characteristics rather than "expected true got false".
The assertTrue/False(spltr.hasCharacteristics(...)) pattern appears in Stream API tests quite often; I can see dozens of occurrences. If we extract such a method, it would be better to replace all of them. At the same time, this would make the patch bigger, complicating its review and drifting from the original task. You can contribute such a change separately. In fact, the line number in the report points exactly to the problematic spliterator and characteristics, and a dedicated method would not provide you with more information. Anyway, you'll need to take a look at the test source code to see how the spliterator was created, so you won't save much time with a better message.
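For illustration, a hypothetical version of the suggested helper could look as follows. The name assertHasCharacteristics follows the review comment; it is not part of the existing test library:

```java
import java.util.List;
import java.util.Spliterator;

public class CharacteristicsAssert {
    // Hypothetical helper: on failure, reports the actual characteristics
    // bit set instead of "expected true got false".
    static void assertHasCharacteristics(Spliterator<?> spltr, int expected) {
        if (!spltr.hasCharacteristics(expected)) {
            throw new AssertionError(String.format(
                "Expected characteristics 0x%08X, actual 0x%08X",
                expected, spltr.characteristics()));
        }
    }

    public static void main(String[] args) {
        Spliterator<Integer> s = List.of(1, 2, 3).spliterator();
        // List.of spliterators are SIZED, so this passes silently
        assertHasCharacteristics(s, Spliterator.SIZED);
    }
}
```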
@Benchmark
public List<String> seq_baseline() {
    return IntStream.range(0, size)
Typically you want to move all the constants to state fields to avoid constant folding by the compiler. The compiler might accidentally use the fact that the range start is always 0 and produce dedicated optimized code for it.
See https://shipilev.net/blog/2014/java-scala-divided-we-fail/
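The suggestion above could be sketched as follows. JMH annotations are omitted so the snippet stays self-contained; in a real benchmark, `from` and `size` would be `@Param`/`@State` fields, and the names here are illustrative:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SeqBaselineSketch {
    // Reading both bounds from instance fields prevents the JIT from
    // specializing the generated loop for a compile-time-constant start of 0.
    int from = 0;   // hypothetical field replacing the literal 0
    int size = 10;

    public List<String> seqBaseline() {
        return IntStream.range(from, from + size)
                        .mapToObj(String::valueOf)
                        .collect(Collectors.toList());
    }
}
```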
I know this article. Here, the upper bound is a state field, so the whole range cannot be optimized. And even if the compiler optimizes the loop start, it's pretty common to have ranges starting with the constant 0 in production, so I would not say that having 0 as an iteration starting point makes the benchmark more artificial. The same approach is used in the neighboring benchmarks, and in fact, this is not the biggest problem with these benchmarks: a clean type profile makes them much more artificial than starting with zero. So I'd prefer to keep zero as is.
1. Comments in adjustSize
2. Repeating code extracted from testNoEvaluationForSizedStream
Even though there are not many changes, this cuts deep into how streams work.
I suspect there is some, possibly minor, impact for sized streams without limit/skip because of the increased cost to compute the exact size. Further, that cost is also incurred for AbstractWrappingSpliterator, where we might need to cache the exact size result.
I made some suggestions, mostly around naming, but I admit to being a little uncomfortable with the proposed change and need to think a little more about it. I am wondering if we need another stream flag indicating the size is known and has to be computed?
src/java.base/share/classes/java/util/stream/AbstractPipeline.java
Co-authored-by: Paul Sandoz <paul.d.sandoz@googlemail.com>
Co-authored-by: Paul Sandoz <paul.d.sandoz@googlemail.com>
Co-authored-by: Paul Sandoz <paul.d.sandoz@googlemail.com>
I see your concern. I made some additional benchmarks and added them here. First, CountSized, which just gets the stream size without actual traversal. We can see how the performance changes depending on the number of stream operations. I also added an optional type profile pollution that makes Baseline (The
Type pollution adds some near-constant ~2ns overhead to the non-patched version as well. Patched:
So indeed we have some performance drawback in the patched version. Here's a chart:
However, using the Stream API without actually iterating the stream is a very rare case. And if we iterate, the performance will be dominated by the number of iterations. I tried to model this case with the SumSized benchmark (replacing count with sum, for 5 and 10 stream elements), but got very confusing results. Baseline:
Patched:
For some reason, the non-patched polluted version is the slowest, and I cannot see any stable overhead in the patched version. For 4+ intermediate ops, the patched version numbers are always better than the corresponding non-patched ones. I would be glad if somebody explains these numbers or points to a flaw in this benchmark. What do you think, @PaulSandoz? Is this an acceptable overhead, or should I experiment with the new Stream flag?
@amaembo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@amaembo this dropped off my radar, but the Skara reminder made it visible again! Thank you for the detailed analysis. I cannot explain the results when triggering profile pollution over the kinds of stream.
Implementation-wise I still find it a little awkward that the operation is responsible for calling the prior op in the pipeline, and we see the consequences of that in the Slice op implementation. It might be cleaner if the op is presented with some exact size and adjusts it, something more pure, e.g. for SliceOp:

long exactOutputSize(long sourceSize) {
    return calcSize(sourceSize, skip, normalizedLimit);
}

The default impl would be:

long exactOutputSize(long sourceSize) {
    return sourceSize;
}

Then the pipeline helper is responsible for traversing the pipeline (be it sequentially or in parallel; in the latter case the above method would not get called, since the slice op becomes the source spliterator for the next stages). To do that efficiently does, I think, require a new flag, set by an op and only meaningful when SIZED is set (and cleared when SIZED is cleared, although perhaps we only need to do that when splitting stages for parallel execution).
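The size adjustment sketched for SliceOp could plausibly be implemented as below. This calcSize is a hypothetical free-standing version written for illustration, not the pipeline's actual method; it assumes a negative limit means "no limit":

```java
public class SliceSizeSketch {
    // Hypothetical: exact output size of skip(skip) followed by limit(limit)
    // given an exact input size. A negative limit means "unlimited".
    static long calcSize(long sourceSize, long skip, long limit) {
        long afterSkip = Math.max(0, sourceSize - skip);
        return limit >= 0 ? Math.min(afterSkip, limit) : afterSkip;
    }

    public static void main(String[] args) {
        System.out.println(calcSize(1000, 50, 800)); // 800: limit caps the 950 remaining
        System.out.println(calcSize(1000, 50, -1));  // 950: skip only
    }
}
```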
@PaulSandoz I added a new flag and updated
// 13, 0x04000000
SIZE_ADJUSTING(13,
               set(Type.OP));
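The 0x04000000 value follows from StreamOpFlag's two-bits-per-flag encoding: flag position n keeps its "set" bit at bit 2n and its "clear" bit at bit 2n+1, so position 13 yields a set mask of 1 << 26. A quick check:

```java
public class FlagMaskDemo {
    public static void main(String[] args) {
        int position = 13;
        int setMask   = 0b01 << (position * 2); // "set" bit of the pair
        int clearMask = 0b10 << (position * 2); // "clear" bit of the pair
        System.out.printf("0x%08X%n", setMask);   // 0x04000000
        System.out.printf("0x%08X%n", clearMask); // 0x08000000
    }
}
```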
// The following 2 flags are currently undefined and a free for any further |
For some reason, the comment said 'The following 2 flags' while there were three, so it was incorrect before but is correct now.
Very good. Thanks for making the adjustments. Architecturally, I think we are in a better place. I just have some comments, mostly around code comments.
src/java.base/share/classes/java/util/stream/AbstractPipeline.java
I agree the likelihood of some stateless size-adjusting op is small, but I think it's worthwhile capturing the thinking, since this area of streams is quite complex. Thanks for adding the comment.
@amaembo This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 754 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
…java Co-authored-by: Paul Sandoz <paul.d.sandoz@googlemail.com>
/integrate
@amaembo Since your change was applied there have been 754 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 0c9daa7. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
With the introduction of toList(), preserving the SIZED characteristics in more cases becomes more important. This patch preserves SIZED on skip() and limit() operations, so now every combination of map/mapToX/boxed/asXyzStream/skip/limit/sorted preserves size, and toList(), toArray() and count() may benefit from this. E.g., LongStream.range(0, 10_000_000_000L).skip(1).count() returns the result instantly with this patch.

Some microbenchmarks were added that confirm the reduced memory allocation in toList() and toArray() cases.

Before patch:

After patch:
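The effect described above can be checked against a sliced stream directly. The sizes here are deliberately small so the snippet also terminates quickly on JDKs without this patch; on a JDK that contains the patch, SIZED survives skip/limit and the exact size is known without traversal:

```java
import java.util.Spliterator;
import java.util.stream.LongStream;

public class SlicedSizeDemo {
    public static void main(String[] args) {
        // The count of a sliced range is exact regardless of JDK version
        long count = LongStream.range(0, 1_000_000).skip(1).count();
        System.out.println(count); // 999999

        Spliterator.OfLong s =
            LongStream.range(0, 1_000_000).skip(10).limit(100).spliterator();
        // With this patch (JDK 17+), this should report the exact size 100;
        // on older JDKs the slice spliterator only has an estimate.
        System.out.println(s.getExactSizeIfKnown());
    }
}
```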
Time improvements are less exciting. It's likely that inlining and vectorization dominate these tests over array allocations and unnecessary copying. Still, I notice a significant improvement in the SliceToArray.seq_limit case (2x) and a mild improvement (+12..16%) in other slice tests. There is no significant change in parallel execution time, though its performance is much less stable and I didn't run enough tests.
Before patch:
After patch:
I also modernized SliceOps a little bit, using a switch expression (with no explicit default!) and diamonds on anonymous classes.
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3427/head:pull/3427
$ git checkout pull/3427
Update a local copy of the PR:
$ git checkout pull/3427
$ git pull https://git.openjdk.java.net/jdk pull/3427/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3427
View PR using the GUI difftool:
$ git pr show -t 3427
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3427.diff