Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8319123: Implement JEP 461: Stream Gatherers (Preview) #16420

Closed
wants to merge 32 commits into from

Conversation

viktorklang-ora
Copy link
Contributor

@viktorklang-ora viktorklang-ora commented Oct 30, 2023

This Pull-Request implements JEP-461


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8319173 to be approved

Issues

  • JDK-8319123: Implement JEP 461: Stream Gatherers (Preview) (Enhancement - P4)
  • JDK-8319173: Implement JEP 461: Stream Gatherers (Preview) (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16420/head:pull/16420
$ git checkout pull/16420

Update a local copy of the PR:
$ git checkout pull/16420
$ git pull https://git.openjdk.org/jdk.git pull/16420/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16420

View PR using the GUI difftool:
$ git pr show -t 16420

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16420.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 30, 2023

👋 Welcome back vklang! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 30, 2023

@viktorklang-ora The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Oct 30, 2023
@viktorklang-ora viktorklang-ora changed the title Gatherer JDK-8319123 : Preview implementation of JEP-461 Oct 30, 2023
@viktorklang-ora viktorklang-ora changed the title JDK-8319123 : Preview implementation of JEP-461 JDK-8319123 : Implementation of JEP-461 Oct 30, 2023
@Override
@SuppressWarnings("unchecked")
public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
public <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Un-final:ed to allow GathererOp to implement fusion of gather + collect

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Oct 31, 2023
@viktorklang-ora viktorklang-ora changed the title JDK-8319123 : Implementation of JEP-461 JDK-8319123 : Implementation of JEP-461: Stream Gatherers (Preview) Nov 3, 2023
Copy link
Contributor

@amaembo amaembo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very solid work, thank you! See some minor comments inline. I actually have much more ideas of specific gatherers, but they could be discussed separately.

One thing that bothers me is that Gatherer is similar to Collector, yet the APIs are totally different, so users may be confused. I like the Gatherer API much more and I see that evolving current Collector API might be hard, given tons of third-party implementations, so I don't see a good way to fix this. But probably we should provide Collector-to-Gatherer adapter (producing one-element stream)? And in general, it would be nice to use exactly-one-element-producing-gatherers as terminal operations, without explicit findFirst().get().

src/java.base/share/classes/java/util/stream/Gatherer.java Outdated Show resolved Hide resolved
src/java.base/share/classes/java/util/stream/Gatherer.java Outdated Show resolved Hide resolved
* @return a new Gatherer
* @throws NullPointerException if any of the parameters are null
*/
public static <T, R> Gatherer<T, ?, R> fold(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be explicitly named as foldLeft? Because foldRight is also possible (though will require complete stream buffering). Also see JDK-8133680 and JDK-8292845 (probably should be linked to gatherer's issue, and read the comment about naming)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I don't think foldRight makes much sense for potentially unbounded structures such as Stream. In the case you need it, it might be better to export it to a List and then reversing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the unboundness argument does not hold, as it can be equally applied to any operation except the short-circuiting ones. Yes, this includes your fold(). The only difference is that foldRight() will fail with OutOfMemoryError while fold() will be stuck in the endless loop. Or probably also fail depending on the folder function (for example, if it's a string concatenation, like in the Javadoc example). You may not like the fact the foldRight() buffers the whole stream content, but we already have operations like this, namely sorted() and parallel ordered distinct().

I'm not convincing you to add foldRight() to the JDK. But if we don't have it, somebody will surely write it in third-party libraries. After all, this is what third-party libraries are for. In this case, simple name fold() can be confusing.

Note that Haskell has separate foldl, foldr, scanl, scanr, so the order is always clear. Scala has fold() (which is like reduce() and can fold in any order), foldLeft() and foldRight(). Naming this method simply fold() sounds quite confusing. It should be either foldLeft() or reduceLeft() (even if we don't provide the -right versions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The my argument around the unboundedness was semantical (starting from the end doesn't make semantical sense for something with no defined end). In this case it doesn't necessarily need to be an infinite stream for it to cause trouble, a large collection might be enough, since you need enough memory to reverse the order which means either making a full copy or having a (very) deep stack.

Having used "Left" and "Right"-named operations for a very long time, I am of the opinion that the "Left"-ones are the 98% use-case, so in this case I am inclined to be that it is going to be less confusing to developers to be "Left-biased" in the naming. I'm completely aware that I could be wrong here, yet with that said, I did give it a lot of thought before deciding.

My hope is that there will be a lot of Gatherers created by Java developers so that they can use Streams for more use-cases, and be able to stay within Streams for as long as they need.

* An operation which performs an ordered, <i>reduction-like</i>,
* transformation for scenarios where no combiner-function can be
* implemented, or for reductions which are intrinsically
* order-dependent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, we should highlight the most important difference between fold and reduce: reduce must provide an associative function, while fold does not require this restriction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please take a look here, probably this comment was overlooked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking about that. I tried to word it will "less jargon" so it is easier to casually read and understand.
I'm wondering if adding "associativity" helps in that regard? 🤔

* Optional<String> numberString =
* Stream.of(1,2,3,4,5,6,7,8,9)
* .gather(
* Gatherers.fold(() -> "", (string, number) -> string + number)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular sample is not great, as it's totally replaceable with reduce() (map(String::valueOf).reduce(String::concat)), so users may wonder why using fold instead. I can suggest another sample which I show everywhere:
anyList.stream().gather(Gatherers.fold(() -> 1, (acc, e) -> acc * 31 + Objects.hashCode(e))).findFirst().get();
(haven't tested, but the idea is like this)
It computes the hashCode of the list, according to the list specification. Here, the accumulator is inherently non-associative, so if you replace it with reduce() and make the stream parallel, it will yield the wrong result. But Gatherers.fold will work!

If you don't like it, we can think up something else, but concatenation sample is certainly not good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the best thing here is to change such that the supplier supplies something non-zero like "String: "?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such an example still looks artificial for me, as it's easy to rewrite with reduce() (just prepend with prefix outside of the stream). Well, it's up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you might not want to exit the stream just yet, and have to create a whole new stream.

In any case, I think it would be great to have a better example, I'm just not sure what that'd be just yet. Open to suggestions! :)

Copy link

@ExE-Boss ExE-Boss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably use {@snippet ...}:

@viktorklang-ora
Copy link
Contributor Author

Thanks for the feedback @amaembo! I hope you find the updates reasonable :)

@viktorklang-ora
Copy link
Contributor Author

These should probably use {@snippet ...}:

Good idea—I might do so when I have the opportunity.

@viktorklang-ora viktorklang-ora marked this pull request as ready for review November 8, 2023 15:36
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 8, 2023
@mlbridge
Copy link

mlbridge bot commented Nov 8, 2023

@openjdk
Copy link

openjdk bot commented Nov 30, 2023

@viktorklang-ora Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Nov 30, 2023
@viktorklang-ora
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Nov 30, 2023
@openjdk
Copy link

openjdk bot commented Nov 30, 2023

@viktorklang-ora
Your change (at version 49c94c7) is now ready to be sponsored by a Committer.

@AlanBateman
Copy link
Contributor

AlanBateman commented Nov 30, 2023

/sponsor

@openjdk
Copy link

openjdk bot commented Nov 30, 2023

Going to push as commit 33b26f7.
Since your change was applied there have been 6 commits pushed to the master branch:

  • 04ad98e: 8315458: Implement JEP 463: Implicitly Declared Classes and Instance Main Method (Second Preview)
  • 03759e8: 8320304: Refactor and simplify monitor deflation functions
  • da7cf25: 8320665: update jdk_core at open/test/jdk/TEST.groups
  • c9d15f7: 8321025: Enable Neoverse N1 optimizations for Neoverse V2
  • 61653a1: 8320830: [AIX] Dont mix os::dll_load() with direct dlclose() calls
  • 8b102ed: 8321063: AArch64: Zero build fails after JDK-8320368

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 30, 2023
@openjdk openjdk bot closed this Nov 30, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Nov 30, 2023
@openjdk
Copy link

openjdk bot commented Nov 30, 2023

@AlanBateman @viktorklang-ora Pushed as commit 33b26f7.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.