-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8319123: Implement JEP 461: Stream Gatherers (Preview) #16420
Conversation
👋 Welcome back vklang! A progress list of the required criteria for merging this PR into |
@viktorklang-ora The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
@Override | ||
@SuppressWarnings("unchecked") | ||
public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) { | ||
public <R, A> R collect(Collector<? super P_OUT, A, R> collector) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Un-final:ed to allow GathererOp to implement fusion of gather + collect
6a7d16a
to
2c6998e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very solid work, thank you! See some minor comments inline. I actually have much more ideas of specific gatherers, but they could be discussed separately.
One thing that bothers me is that Gatherer is similar to Collector, yet the APIs are totally different, so users may be confused. I like the Gatherer API much more and I see that evolving current Collector API might be hard, given tons of third-party implementations, so I don't see a good way to fix this. But probably we should provide Collector-to-Gatherer adapter (producing one-element stream)? And in general, it would be nice to use exactly-one-element-producing-gatherers as terminal operations, without explicit findFirst().get().
* @return a new Gatherer | ||
* @throws NullPointerException if any of the parameters are null | ||
*/ | ||
public static <T, R> Gatherer<T, ?, R> fold( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be explicitly named as foldLeft
? Because foldRight
is also possible (though will require complete stream buffering). Also see JDK-8133680 and JDK-8292845 (probably should be linked to gatherer's issue, and read the comment about naming)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I don't think foldRight
makes much sense for potentially unbounded structures such as Stream. In the case you need it, it might be better to export it to a List and then reversing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the unboundness argument does not hold, as it can be equally applied to any operation except the short-circuiting ones. Yes, this includes your fold()
. The only difference is that foldRight()
will fail with OutOfMemoryError while fold()
will be stuck in the endless loop. Or probably also fail depending on the folder function (for example, if it's a string concatenation, like in the Javadoc example). You may not like the fact the foldRight()
buffers the whole stream content, but we already have operations like this, namely sorted()
and parallel ordered distinct()
.
I'm not convincing you to add foldRight()
to the JDK. But if we don't have it, somebody will surely write it in third-party libraries. After all, this is what third-party libraries are for. In this case, simple name fold()
can be confusing.
Note that Haskell has separate foldl, foldr, scanl, scanr, so the order is always clear. Scala has fold()
(which is like reduce()
and can fold in any order), foldLeft()
and foldRight()
. Naming this method simply fold()
sounds quite confusing. It should be either foldLeft()
or reduceLeft()
(even if we don't provide the -right versions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The my argument around the unboundedness was semantical (starting from the end doesn't make semantical sense for something with no defined end). In this case it doesn't necessarily need to be an infinite stream for it to cause trouble, a large collection might be enough, since you need enough memory to reverse the order which means either making a full copy or having a (very) deep stack.
Having used "Left" and "Right"-named operations for a very long time, I am of the opinion that the "Left"-ones are the 98% use-case, so in this case I am inclined to be that it is going to be less confusing to developers to be "Left-biased" in the naming. I'm completely aware that I could be wrong here, yet with that said, I did give it a lot of thought before deciding.
My hope is that there will be a lot of Gatherer
s created by Java developers so that they can use Streams for more use-cases, and be able to stay within Streams for as long as they need.
* An operation which performs an ordered, <i>reduction-like</i>, | ||
* transformation for scenarios where no combiner-function can be | ||
* implemented, or for reductions which are intrinsically | ||
* order-dependent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we should highlight the most important difference between fold
and reduce
: reduce
must provide an associative function, while fold
does not require this restriction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look here, probably this comment was overlooked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about that. I tried to word it will "less jargon" so it is easier to casually read and understand.
I'm wondering if adding "associativity" helps in that regard? 🤔
* Optional<String> numberString = | ||
* Stream.of(1,2,3,4,5,6,7,8,9) | ||
* .gather( | ||
* Gatherers.fold(() -> "", (string, number) -> string + number) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular sample is not great, as it's totally replaceable with reduce() (map(String::valueOf).reduce(String::concat)
), so users may wonder why using fold instead. I can suggest another sample which I show everywhere:
anyList.stream().gather(Gatherers.fold(() -> 1, (acc, e) -> acc * 31 + Objects.hashCode(e))).findFirst().get();
(haven't tested, but the idea is like this)
It computes the hashCode of the list, according to the list specification. Here, the accumulator is inherently non-associative, so if you replace it with reduce()
and make the stream parallel, it will yield the wrong result. But Gatherers.fold
will work!
If you don't like it, we can think up something else, but concatenation sample is certainly not good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the best thing here is to change such that the supplier supplies something non-zero like "String: "?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such an example still looks artificial for me, as it's easy to rewrite with reduce()
(just prepend with prefix outside of the stream). Well, it's up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you might not want to exit the stream just yet, and have to create a whole new stream.
In any case, I think it would be great to have a better example, I'm just not sure what that'd be just yet. Open to suggestions! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should probably use {@snippet ...}
:
Thanks for the feedback @amaembo! I hope you find the updates reasonable :) |
Good idea—I might do so when I have the opportunity. |
Webrevs
|
…ring of Gatherers.java to put public at the top of the file.
Integrates Paul Sandoz's feedback/changes Co-authored-by: Paul Sandoz <paul.d.sandoz@googlemail.com>
…default impls for Gatherer-related interfaces
correcting descriptions of Integrator type parameters, and more consistently use "initializer function" instead of "supplier function".
ec84174
to
49c94c7
Compare
@viktorklang-ora Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information. |
/integrate |
@viktorklang-ora |
/sponsor |
Going to push as commit 33b26f7.
Your commit was automatically rebased without conflicts. |
@AlanBateman @viktorklang-ora Pushed as commit 33b26f7. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This Pull-Request implements JEP-461
Progress
Issues
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16420/head:pull/16420
$ git checkout pull/16420
Update a local copy of the PR:
$ git checkout pull/16420
$ git pull https://git.openjdk.org/jdk.git pull/16420/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16420
View PR using the GUI difftool:
$ git pr show -t 16420
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16420.diff
Webrev
Link to Webrev Comment