New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8180352: Add Stream.toList() method #1026
8180352: Add Stream.toList() method #1026
Conversation
👋 Welcome back smarks! A progress list of the required criteria for merging this PR into |
@stuart-marks The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
/label remove security |
@stuart-marks |
/csr |
@stuart-marks has indicated that a compatibility and specification (CSR) request is needed for this pull request. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a test that tests the default implementation in j.u.s.Stream
? Or maybe there is and I missed that?
Objects.requireNonNull(o); | ||
} | ||
|
||
switch (input.length) { // implicit null check of elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was a variable renamed? Should "elements" be "input"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, actually that comment belongs up above. I'll fix it, thanks.
/** | ||
* Accumulates the elements of this stream into a {@code List}. The elements in | ||
* the list will be in this stream's encounter order, if one exists. There are no | ||
* guarantees on the implementation type, mutability, serializability, or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful for callers to feel more confident that they will get an immutable instance. In java.time.* we have wording like "This interface places no restrictions on the mutability of implementations, however immutability is strongly recommended." Could something like that work here, emphasising that everyone implementing this method should seek to return an immutable list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point, the "no guarantee of mutability" clashes with the later statement about the possibility of the returned instance being value-based, which strongly implies immutability. I'll work on tuning this up to be a stronger statement on immutability, while retaining "no-guarantee" for implementation type, serializability, etc. I think we do want to preserve future implementation flexibility in those areas.
Mailing list message from Remi Forax on core-libs-dev: ----- Mail original -----
Hi Stuart, It's true that a Stream support nulls, we want people to be able map() with a method that returns null and then filter out the nulls (even if using flatMap for this case is usually a better idea), For me allowing in 2020 to store null in a collection is very backward, all collections since 1.6 doesn't support nulls. Also, adding a third immutable list creates a problem, it means that now when we get an immutable list it can be 3 different implementations but the VM only uses bimorphic inlining cache, I believe that instead of inventing a third semantics that allows to store null in a collection, we should use the semantics of stream.collect(Collectors.toUnmodifiableList()) even if it means that toList() will throw a NPE if one element is null. R?mi |
Mailing list message from Brian Goetz on core-libs-dev:
Uhm ... no and no. It does mean that all methods of the stream interface have to support The original idea was not "the nulls can be removed later."? The There is no value in making users remember which stream methods are I understand the hatred for nulls, but it's a fantasy to think we can |
@dfuch wrote:
The test method |
Mailing list message from John Rose on core-libs-dev: I agree with Stuart and Brian that streams should be null-tolerant, On Nov 3, 2020, at 6:53 AM, Remi Forax <forax at univ-mlv.fr> wrote:
Yes, this part concerns me, with my JIT hat on. It seems to me that the problem can be removed simply by allowing And if a call to Stream::toList call returns 0, 1, or 2 items, |
Mailing list message from forax at univ-mlv.fr on core-libs-dev: ----- Mail original -----
A stream is computation, so i agree that letting nulls to flow is the right call.
I think that ship has sailed a long ago. [...] Let's take a step back, Because of that, i don't think we even have the choice of the semantics for Stream.toList(), it has to be the same as stream.collect(Collectors.toList()). So you are right that toList() can not be null hostile because Collectors.toList() is not null hostile. But it can not be immutable too, for the same reason.
I would say half of the right choices, null hostile: right, immutable: wrong :) And if we at some point introduce a method toImmutableList() on Stream, it will have to be null hostile. R?mi |
Mailing list message from Brian Goetz on core-libs-dev:
Yes, and this is no mere "technical argument".? A Collector is a
This doesn't remotely follow.? (And, if it did, I would likely not even
Nope.? The spec of toList() very clearly says: no guarantees as to the (In fact, we would be entirely justified to change the behavior of Please, can we stop having this same tired "I hate nulls, let's find a |
Mailing list message from forax at univ-mlv.fr on core-libs-dev:
It seems you had an issue when replying to my mail, there is a paragraph before "Because"
I don't understand your argument. R?mi |
Mailing list message from Aaron Scott-Boddendijk on core-libs-dev:
type,
mean
Collectors::toList
spec was
too many
picking
Collectors.toList(), and you also know that if there is a method
same reason but you want the semantics of Stream.toList() to be different Surely, taking that position, List.of(...) should not have produced an In all of the teams I've worked with, through many Java version transitions This case is no different, developers will need to identify the cases in And static-analysis will be able to propose some of those replacements for Does this mean it's more complex than a text-replace, sure. Will there be Please don't blunt the tool to make it safer for children. -- On Wed, Nov 4, 2020 at 9:30 AM <forax at univ-mlv.fr> wrote:
|
Mailing list message from Martin Desruisseaux on core-libs-dev: Hello Le 03/11/2020 ? 21:30, forax at univ-mlv.fr a ?crit?:
Would they would be so numerous to do this change without verifying if ??? Martin |
Mailing list message from forax at univ-mlv.fr on core-libs-dev: Hi Aaron,
The problem is that people will see stream.toList() as a shorcut for stream.collect(Collectors.toList()),
We have Collectors.toList() and Collectors.toUnmodifiableList(), would it make more sense to have stream.toList() and stream.toUnmodifiableList() instead of having stream.toList() that behave half way in betwen the semantics of Collectors.toList() and Collectors.toUnmodifiableList().
I believe that a tool able to suggest to use Collectors.toUnmodifiableList() instead of Collectors.toList() will rely one a global interprocedural static analysis, so such analysis will be invalid once you will update one of your dependency.
You want immutability/unmodifiability over the cost of people miss mis-using the API. I think it's a sin as bad as me want stream.toList() to return a null hostile List :) And I said above perhaps we should not discussed about adding a method Stream.toList() but a method Stream.toUnmodifiableList().
R?mi
|
Mailing list message from Remi Forax on core-libs-dev: ----- Mail original -----
yes, that why i said to Brian that he is right that stream.toList() should return a List that accept null.
R?mi |
Mailing list message from Stuart Marks on core-libs-dev: On 11/3/20 3:10 AM, Florian Weimer wrote:
The new Stream.toList() actually permits null elements (by design) so it goes Rejection of nulls for List::copyOf is be handled by tests in test/jdk/java/util/List/ListFactories.java s'marks |
OK, there are rather too many forking threads here for me to try to reply to any particular message, so I'll try to summarize things and say what I intend to do. Null tolerance. While part of me wants to prohibit nulls, the fact is that Streams pass through nulls, and toList() would be much less useful if it were to reject nulls. The affinity here is closer to Stream.toArray(), which allows nulls, rather than Collectors.to[Unmodifiable]List. Unmodifiability. Unlike with nulls, this is where we've been taking a strong stand recently, where new constructs are unmodifiable ("shallowly immutable"). Consider the Java 9 unmodifiable collections, records, primitive classes, etc. -- they're all unmodifiable. They're data-race-free and are resistant to a whole class of bugs that arise from mutability. Unfortunately, the combination of the above means that the returned List is neither like an ArrayList nor like an unmodifiable list produced by List.of() or by Collectors.toUnmodifiableList(). [Heavy sigh.] While I've been somewhat reluctant to introduce this new thing, the alternatives of trying to reuse one of the existing things are worse. John and Rémi pointed out that the way I implemented this, using a subclass, reintroduces the possibility of problems with megamorphic dispatch which we had so carefully avoided by reducing the number of implementation classes in this area. I agree this is a problem. I also had an off-line discussion with John where we discussed the serialization format, which unfortunately is somewhat coupled with this issue. (Fortunately I think it can mostly be decoupled.) Given that we're introducing a new thing, which is an unmodifiable-list-that-allows-nulls, this needs to be manifested in the serialized form of these new objects. (This corresponds to the new tag value of IMM_LIST_NULLS in the CollSer class.) The alternative is to reuse the existing serialized form, IMM_LIST, for both of these cases, relaxing it to allow nulls. This would be a serialization immutability problem. Suppose I had an application that created a data structure that used lists from List.of(), and I had a global assertion over that structure that it contained no nulls. Further suppose that I serialized and deserizalized this structure. I'd want that assertion to be preserved after deserialization. If another app (or a future version of this app) created the structure using Stream.toList(), this would allow nulls to leak into that structure and violate that assertion. Therefore, the result of Stream.toList() should not be serialization-compatible with List.of() et. al. That's why there's the new IMM_LIST_NULLS tag in the serial format. However, the new representation in the serial format does NOT necessarily require the implementation to be a different class. I'm going to investigate collapsing ListN and ListNNullsAllowed back into a single class, while preserving the separate serialized forms. This should mitigate the concerns about megamorphic dispatch. |
*/ | ||
@SuppressWarnings("unchecked") | ||
default List<T> toList() { | ||
return (List<T>) Collections.unmodifiableList(new ArrayList<>(Arrays.asList(this.toArray()))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we return listFromTrustedArrayNullsAllowed
here as in ReferencePipeline
?
Or at least, we should avoid unnecessary copying of arrays. See how this is done in StreamEx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
listFromTrustedArrayNullsAllowed
is clearly not an option, as it will produce shared-secret leaking (see JDK-8254090 for a similar case). StreamEx solution is dirty as it relies on the implementation detail. I believe, OpenJDK team is not very interested in providing optimal implementations for third-party stream implementations, as third-party implementations will likely update by themselves when necessary. At least, my suggestion to make the default mapMulti
implementation better was declined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation duplicates the array twice, once in this.toArray() and once in the constructor of ArrayList that calls toArray() on Collections.ArrayList.
I'm sure there is a better way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In ReferencePipeline
class we have:
@Override
public final Object[] toArray() {
return toArray(Object[]::new);
}
@Override
@SuppressWarnings("unchecked")
public final <A> A[] toArray(IntFunction<A[]> generator) {
...
@SuppressWarnings("rawtypes")
IntFunction rawGenerator = (IntFunction) generator;
return (A[]) Nodes.flatten(evaluateToArrayNode(rawGenerator), rawGenerator)
.asArray(rawGenerator);
}
In Nodes
class we have:
public static <T> Node<T> flatten(Node<T> node, IntFunction<T[]> generator) {
if (node.getChildCount() > 0) {
...
T[] array = generator.apply((int) size);
new ToArrayTask.OfRef<>(node, array, 0).invoke();
return node(array);
} else {
return node;
}
}
It looks like it is required to implement toList
method in a similar way in order to avoid array copy. i.e. there will be IntFunction<List<T>> generator
which will generate 'ArrayList' with specified size and the list's add
method will be called to add elements to the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the default implementation in the Stream interface, which is overridden by an implementation in the ReferencePipeline class, so it will rarely be used in normal operation. The ReferencePipeline version (part of this changeset) is based on toArray() and avoids any copying. I'm thus not inclined to add new interfaces in order to support this default implementation.
As written it's true that the default implementation does perform apparently redundant copies, but we can't be assured that toArray() actually returns a freshly created array. Thus, we wrap it using Arrays.asList and then copy it using the ArrayList constructor. This is unfortunate but necessary to avoid situations where someone could hold a reference to the internal array of a List, allowing modification of a List that's supposed to be unmodifiable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative with similar performance would be to do a Stream.toArray() and then copy that array into new Object[] and then wrap that copy with listFromTrustedArrayNullsAllowed(). The difference would be in the serialization format of the resulting List and maybe also in the access performance of resulting List (no indirection via the unmodifiableList wrapper and different type for JIT to speculate about). So if we want the resulting List to behave exactly the same in both implementations of toList(), then this alternative might be preferable. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such alternative might also be faster using intrisified array copying method (here measuring just copying overhead):
Benchmark (len) Mode Cnt Score Error Units
ToListBench.toList1 10 avgt 10 14.213 ± 0.061 ns/op
ToListBench.toList1 1000 avgt 10 541.883 ± 3.845 ns/op
ToListBench.toList1 1000000 avgt 10 753223.523 ± 4656.664 ns/op
ToListBench.toList2 10 avgt 10 8.810 ± 0.052 ns/op
ToListBench.toList2 1000 avgt 10 264.748 ± 0.807 ns/op
ToListBench.toList2 1000000 avgt 10 349518.502 ± 3242.061 ns/op
https://gist.github.com/plevart/974b67b65210f8dd122773f481c0a603
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to change the default implementation and its specification, for a couple reasons. First, this implementation won't be used in normal operation, as it's overridden by the JDK implementation. (I expect that stream extension libraries will be updated to override it as well.) Second, I'm quite uncomfortable using an internal method from within a default implementation. Right now this method is an agreement between the unmodifiable collections implementation and the ReferencePipeline implementation, and we have complete freedom to change this agreement at any time. If this internal method were used from a default implementation, its behavior would have to be specified in @implSpec
somehow. Certainly one could write some very vague words for the specification that allow some freedom to make changes, but this has to choose between a better specification and more implementation freedom. I don't see the value in having to make this tradeoff at all.
An extra copy can be avoided via a private agreement between ArrayList and Arrays.asList, which should probably be done in any case.
Mailing list message from Simon Roberts on core-libs-dev: At the risk of a can of worms, or at least of raising something that has This discussion of unmodifiable lists brings me back to the thought that Is this something that has been considered and rejected? (If so, is it easy Cheers, On Thu, Nov 5, 2020 at 10:30 AM Stuart Marks <smarks at openjdk.java.net> -- |
checkUnmodifiable(objects); | ||
} | ||
|
||
@Test(dataProvider = "withNull:StreamTestData<Integer>", dataProviderClass = StreamTestDataProvider.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the non-default toList()
implementation defers to the toArray()
terminal op there is no need for this and the following tests, which are really designed to shake out the optimizations when producing and operating on arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll remove all the tests starting from here to the end of the file. I'm assuming that's the set of tests you're referring to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes from line 73 to G (the end).
@ePaul wrote:
The separation between streams and the java.util Collections Framework is a good design principle, but it isn't an ironclad rule. It's still easy to have streams create instances of other collections libraries using the Collector interface. What's different here is that the Collections Framework has "leaked into" streams a little bit more, so that they're now more interdependent. This doesn't seem to have any disadvantages; it seems unlikely that the Collections Framework will ever be unplugged from the JDK. However, the benefits are that a List is the closest thing we have to an unmodifiable array that also plays well with generics and that can be value-based; these benefits are considerable. |
At some point there probably will need to be a long article explaining all the issues here, but at the moment the best writeup I have is this one: https://stackoverflow.com/a/57926310/1441122 TL;DR there are a few different ways to approach retrofitting something like this, but they all have enough compromises that the net benefits are unclear. |
Add nulls-allowed empty list. Simplify indexOf/lastIndexOf. Reduce tests, add contains(null) tests.
Mailing list message from Remi Forax on core-libs-dev: ----- Mail original -----
This question is asked at least every six months since 1998
regards,
|
Mailing list message from Alan Snyder on core-libs-dev:
The question that Simon asked is not exactly the question that is answered in this link. The question of whether List should be a subtype of (some kind of) ImmutableList is answered in The question answered in the link is basically a straw man: why not capture every conceivable But a question that deserves ongoing review is whether Java should support immutable collections Much could be written, and probably has been, about the disadvantages of basing the The lack of subtyping between List and ImmutableList is not terrible. Arrays coexist with Alan |
Mailing list message from Martin Desruisseaux on core-libs-dev: Le 06/11/2020 ? 19:00, Alan Snyder a ?crit?:
Maybe an alternative way (admittedly more difficult) to have immutable I realize that it would be a difficult task: how to handle private ??? Regards ??? ??? Martin |
Hi Stuart, I would like to discuss the serialization. You introduce new CollSer.IMM_LIST_NULLS type of immutable collection. This means that if this change goes into JDK16 for example, JDK15 and before will not be able to deserialize such list as they know nothing about IMM_LIST_NULLS even if such lists don't contain nulls. The reason you say to chose new type of serialization format is the following:
I don't quite get this reasoning. Let's try to decompose the reasoning giving an example. Suppose we had the following data structure:
App v1 creates such structures using new Names(List.of(...)) and serializes/deserializes them. They keep the invariant that no nulls are present. Now comes App v2 that starts using new Names(stream.toList()) which allows nulls to be present. When such Names instance from app v2 is serialized and then deserialized in app v1, nulls "leak" into data structure of app v1 that does not expect them. But the question is how does having a separate CollSer.IMM_LIST_NULLS type prevent that from happening? |
I can see that having a separate IMM_LIST_NULLS type might be necessary to preserve the allows-null/disallows-null behaviour of indexOf and lastIndexOf methods... NOTE ALSO that ListN.equals(o) and ListN.hashCode() are inherited from AbstractImmutableList:
and
...which means they will throw NPE when the list contains null. The same goes for SubList. |
@plevart wrote:
When a serialized list with IMM_LIST_NULLS is deserialized on an older JDK, it'll throw InvalidObjectException since that tag isn't valid on older JDKs. Obviously this is still an error, but it's a fail-fast approach that avoids letting nulls leak into a data structure where they might cause a problem some arbitrary time later.
Good catch! Yes, this is a problem. I'll do some rearranging here and add more test cases. Thanks for spotting this. |
Yes, but that is JDK16+ vs. JDK15- and not App V1 vs. App V2 thing. If both apps run on JDK16+, there will be no exception. What I'm trying to say is that the same problem of injecting unexpected nulls via serialization/deserialization can happen also if App V2 starts using ArrayList to construct the data structure and serialize it while App V1 deserializes it and expects non-null values only. App V1 would already have to guard against null values during deserialization in that case, because possibility of null values in deserialized data structure is nothing new for App V1. |
Fix equals, hashCode, indexOf, lastIndexOf to handle nulls properly. Add MOAT tests for new lists; add equals and hashCode tests.
@plevart wrote:
Sure, the IMM_LIST_NULLS tag only helps with serialization compatibility across JDK releases. There are lots of ways an app can make incompatible changes to the serialized forms of its objects that we have no way of detecting. |
Mailing list message from Peter Levart on core-libs-dev: On 11/17/20 9:08 PM, Stuart Marks wrote:
I would say it goes the other way - it worsens the serialization
So why bother with making the result of serialized stream.toList() not But I see that the new? IMM_LIST_NULLS type is needed for one other Peter |
OK, I was imprecise. The IMM_LIST_NULLS tag has an effect only on serialization across JDK releases, not changes to the application's serialization format using the same JDK release, or even on many changes to the app's serialization format across JDK releases. By "helps with serialization compatibility" I meant that this new serialized form helps the general issue of serialization compatibility (really, incompatibility) by failing fast in certain cases, instead of possibly allowing polluted data to leak into the receiving application and causing some arbitrary exception later during the run. But as you noted last, this is a different kind of object, and it has different behavior, so it needs a different encoding in the serialized form. I've updated this PR with changes to fix null handling and other issues. |
return (List<E>) new List12<>(input[0], input[1]); | ||
default: | ||
return (List<E>) new ListN<>(input, false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a switch expression + arrow (->) here will allow you too factor the cast, even if it disappear in the generated bytecode, I think it makes the code more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh yes, I haven't gotten used to using those yet!
Mailing list message from Remi Forax on core-libs-dev: ----- Mail original -----
Hi Stuart, I still think that using a null friendly List is a mistake, but you, Brian and John all think it's not an issue, so let's go with it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommend you review the problem list of this source file in IntelliJ, which shows some potential cleanups. This can be done as a follow on PR if appropriate, as can any improvements to the default implementation.
* Null argument or null elements in the argument will result in NPE. | ||
* | ||
* @param <E> the List's element type | ||
* @param input the input array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rogue parameter declaration.
@stuart-marks This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 377 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
/integrate |
@stuart-marks Since your change was applied there have been 437 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 41dbc13. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This change introduces a new terminal operation on Stream. This looks like a convenience method for Stream.collect(Collectors.toList()) or Stream.collect(Collectors.toUnmodifiableList()), but it's not. Having this method directly on Stream enables it to do what can't easily by done by a Collector. In particular, it allows the stream to deposit results directly into a destination array (even in parallel) and have this array be wrapped in an unmodifiable List without copying.
In the past we've kept most things from the Collections Framework as implementations of Collector, not directly on Stream, whereas only fundamental things (like toArray) appear directly on Stream. This is true of most Collections, but it does seem that List is special. It can be a thin wrapper around an array; it can handle generics better than arrays; and unlike an array, it can be made unmodifiable (shallowly immutable); and it can be value-based. See John Rose's comments in the bug report:
https://bugs.openjdk.java.net/browse/JDK-8180352?focusedCommentId=14133065&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14133065
This operation is null-tolerant, which matches the rest of Streams. This isn't specified, though; a general statement about null handling in Streams is probably warranted at some point.
Finally, this method is indeed quite convenient (if the caller can deal with what this operation returns), as collecting into a List is the most common stream terminal operation.
Progress
Testing
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/1026/head:pull/1026
$ git checkout pull/1026