Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8180352: Add Stream.toList() method #1026

Closed

Conversation

stuart-marks
Copy link
Member

@stuart-marks stuart-marks commented Nov 3, 2020

This change introduces a new terminal operation on Stream. This looks like a convenience method for Stream.collect(Collectors.toList()) or Stream.collect(Collectors.toUnmodifiableList()), but it's not. Having this method directly on Stream enables it to do what can't easily by done by a Collector. In particular, it allows the stream to deposit results directly into a destination array (even in parallel) and have this array be wrapped in an unmodifiable List without copying.

In the past we've kept most things from the Collections Framework as implementations of Collector, not directly on Stream, whereas only fundamental things (like toArray) appear directly on Stream. This is true of most Collections, but it does seem that List is special. It can be a thin wrapper around an array; it can handle generics better than arrays; and unlike an array, it can be made unmodifiable (shallowly immutable); and it can be value-based. See John Rose's comments in the bug report:

https://bugs.openjdk.java.net/browse/JDK-8180352?focusedCommentId=14133065&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14133065

This operation is null-tolerant, which matches the rest of Streams. This isn't specified, though; a general statement about null handling in Streams is probably warranted at some point.

Finally, this method is indeed quite convenient (if the caller can deal with what this operation returns), as collecting into a List is the most common stream terminal operation.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux x64 Linux x86 Windows x64 macOS x64
Build ✔️ (5/5 passed) ✔️ (2/2 passed) ✔️ (2/2 passed) ✔️ (2/2 passed)
Test (tier1) ✔️ (9/9 passed) ✔️ (9/9 passed) ✔️ (9/9 passed) ✔️ (9/9 passed)

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1026/head:pull/1026
$ git checkout pull/1026

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 3, 2020

👋 Welcome back smarks! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Nov 3, 2020

@stuart-marks The following labels will be automatically applied to this pull request:

  • core-libs
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added security security-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Nov 3, 2020
@stuart-marks
Copy link
Member Author

/label remove security

@openjdk openjdk bot removed the security security-dev@openjdk.org label Nov 3, 2020
@openjdk
Copy link

openjdk bot commented Nov 3, 2020

@stuart-marks
The security label was successfully removed.

@stuart-marks
Copy link
Member Author

/csr

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Nov 3, 2020
@openjdk
Copy link

openjdk bot commented Nov 3, 2020

@stuart-marks has indicated that a compatibility and specification (CSR) request is needed for this pull request.
@stuart-marks please create a CSR request and add link to it in JDK-8180352. This pull request cannot be integrated until the CSR request is approved.

@stuart-marks stuart-marks marked this pull request as ready for review November 3, 2020 03:11
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 3, 2020
@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Webrevs

Copy link
Member

@dfuch dfuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a test that tests the default implementation in j.u.s.Stream? Or maybe there is and I missed that?

Objects.requireNonNull(o);
}

switch (input.length) { // implicit null check of elements
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was a variable renamed? Should "elements" be "input"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, actually that comment belongs up above. I'll fix it, thanks.

/**
* Accumulates the elements of this stream into a {@code List}. The elements in
* the list will be in this stream's encounter order, if one exists. There are no
* guarantees on the implementation type, mutability, serializability, or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful for callers to feel more confident that they will get an immutable instance. In java.time.* we have wording like "This interface places no restrictions on the mutability of implementations, however immutability is strongly recommended." Could something like that work here, emphasising that everyone implementing this method should seek to return an immutable list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point, the "no guarantee of mutability" clashes with the later statement about the possibility of the returned instance being value-based, which strongly implies immutability. I'll work on tuning this up to be a stronger statement on immutability, while retaining "no-guarantee" for implementation type, serializability, etc. I think we do want to preserve future implementation flexibility in those areas.

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from Remi Forax on core-libs-dev:

----- Mail original -----

De: "Stuart Marks" <smarks at openjdk.java.net>
?: "core-libs-dev" <core-libs-dev at openjdk.java.net>
Envoy?: Mardi 3 Novembre 2020 04:18:08
Objet: RFR: 8180352: Add Stream.toList() method

This change introduces a new terminal operation on Stream. This looks like a
convenience method for Stream.collect(Collectors.toList()) or
Stream.collect(Collectors.toUnmodifiableList()), but it's not. Having this
method directly on Stream enables it to do what can't easily by done by a
Collector. In particular, it allows the stream to deposit results directly into
a destination array (even in parallel) and have this array be wrapped in an
unmodifiable List without copying.

In the past we've kept most things from the Collections Framework as
implementations of Collector, not directly on Stream, whereas only fundamental
things (like toArray) appear directly on Stream. This is true of most
Collections, but it does seem that List is special. It can be a thin wrapper
around an array; it can handle generics better than arrays; and unlike an
array, it can be made unmodifiable (shallowly immutable); and it can be
value-based. See John Rose's comments in the bug report:

https://bugs.openjdk.java.net/browse/JDK-8180352?focusedCommentId=14133065&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14133065

This operation is null-tolerant, which matches the rest of Streams. This isn't
specified, though; a general statement about null handling in Streams is
probably warranted at some point.

Finally, this method is indeed quite convenient (if the caller can deal with
what this operation returns), as collecting into a List is the most common
stream terminal operation.

Hi Stuart,
I'm Okay with the idea of having a method toList() on Stream but really dislike the proposed semantics because tit is neither stream.collect(toList()) nor stream.collect(toUnmodifiableList()) but something in between.

It's true that a Stream support nulls, we want people to be able map() with a method that returns null and then filter out the nulls (even if using flatMap for this case is usually a better idea),
but it doesn't mean that all methods of the Stream interface has to support nulls, the original idea was more to allow nulls to flow in the stream because at some point they will be removed before being stored in a collection.

For me allowing in 2020 to store null in a collection is very backward, all collections since 1.6 doesn't support nulls.

Also, adding a third immutable list creates a problem, it means that now when we get an immutable list it can be 3 different implementations but the VM only uses bimorphic inlining cache,
so more callsites will fail to inline because of that. I think we have already reduced the number of implementation of immutable map from 3 to 2 for the very same reasons.

I believe that instead of inventing a third semantics that allows to store null in a collection, we should use the semantics of stream.collect(Collectors.toUnmodifiableList()) even if it means that toList() will throw a NPE if one element is null.

R?mi

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from Brian Goetz on core-libs-dev:

This change introduces a new terminal operation on Stream. This looks like a
convenience method for Stream.collect(Collectors.toList()) or
Stream.collect(Collectors.toUnmodifiableList()), but it's not. Having this
method directly on Stream enables it to do what can't easily by done by a
Collector. In particular, it allows the stream to deposit results directly into
a destination array (even in parallel) and have this array be wrapped in an
unmodifiable List without copying.

Hi Stuart,
I'm Okay with the idea of having a method toList() on Stream but really dislike the proposed semantics because tit is neither stream.collect(toList()) nor stream.collect(toUnmodifiableList()) but something in between.

It's true that a Stream support nulls, we want people to be able map() with a method that returns null and then filter out the nulls (even if using flatMap for this case is usually a better idea),
but it doesn't mean that all methods of the Stream interface has to support nulls, the original idea was more to allow nulls to flow in the stream because at some point they will be removed before being stored in a collection.

Uhm ... no and no.

It does mean that all methods of the stream interface have to support
nulls.? Streams are null tolerant.? Because ...

The original idea was not "the nulls can be removed later."? The
original idea was "Streams are plumbing, they pass values through
pipelines, to user-specified lambdas, into arrays, etc, and the stream
plumbing should not have an opinion on the values that are flowing
through."?? And this was the right choice.

There is no value in making users remember which stream methods are
null-hostile and which are null-tolerant; this is just more accidental
complexity.? And the root cause of that accidental complexity is the
misguided belief that we can (and should) contain the consequences of
nullity by making ad-hoc restrictions about nullity.?? That doesn't
work, and all it does is add more complexity and more ways for X to not
interact with Y.

I understand the hatred for nulls, but it's a fantasy to think we can
contain them with ad-hoc restrictions.?? And the aggregate cost in
complexity of all the ad-hoc decisions is pretty significant. Stuart
made the right choice here.

@stuart-marks
Copy link
Member Author

@dfuch wrote:

Should there be a test that tests the default implementation in j.u.s.Stream? Or maybe there is and I missed that?

The test method testDefaultOps does that. The stream test framework has a thing called DefaultMethodStreams that delegates everything except default methods to another Stream instance, so this should test the default implementation.

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from John Rose on core-libs-dev:

I agree with Stuart and Brian that streams should be null-tolerant,
including in this case. And of course I?m delighted to see an
immutable container as the output to the utility method; I?m
a fan of reduced-copy, race-free, bug-resistant algorithms you
get from immutability.

On Nov 3, 2020, at 6:53 AM, Remi Forax <forax at univ-mlv.fr> wrote:

Also, adding a third immutable list creates a problem, it means that now when we get an immutable list it can be 3 different implementations but the VM only uses bimorphic inlining cache,
so more callsites will fail to inline because of that. I think we have already reduced the number of implementation of immutable map from 3 to 2 for the very same reasons.

Yes, this part concerns me, with my JIT hat on.

It seems to me that the problem can be removed simply by allowing
the existing ListN object to contain nulls. That?s not the same
thing as allowing List.of(1,2,3, null): That factory method can
reject nulls, while the privileged factory method that makes
an array-backed ListN can simply sidestep the null check.

And if a call to Stream::toList call returns 0, 1, or 2 items,
it?s a straightforward matter to test if either is null, and then
choose to use either ListN or List12. No new implementation
classes, at all!

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from forax at univ-mlv.fr on core-libs-dev:

----- Mail original -----

De: "Brian Goetz" <brian.goetz at oracle.com>
?: "Remi Forax" <forax at univ-mlv.fr>, "Stuart Marks" <smarks at openjdk.java.net>
Cc: "core-libs-dev" <core-libs-dev at openjdk.java.net>
Envoy?: Mardi 3 Novembre 2020 19:02:37
Objet: Re: RFR: 8180352: Add Stream.toList() method

This change introduces a new terminal operation on Stream. This looks like a
convenience method for Stream.collect(Collectors.toList()) or
Stream.collect(Collectors.toUnmodifiableList()), but it's not. Having this
method directly on Stream enables it to do what can't easily by done by a
Collector. In particular, it allows the stream to deposit results directly into
a destination array (even in parallel) and have this array be wrapped in an
unmodifiable List without copying.

Hi Stuart,
I'm Okay with the idea of having a method toList() on Stream but really dislike
the proposed semantics because tit is neither stream.collect(toList()) nor
stream.collect(toUnmodifiableList()) but something in between.

It's true that a Stream support nulls, we want people to be able map() with a
method that returns null and then filter out the nulls (even if using flatMap
for this case is usually a better idea),
but it doesn't mean that all methods of the Stream interface has to support
nulls, the original idea was more to allow nulls to flow in the stream because
at some point they will be removed before being stored in a collection.

Uhm ... no and no.

It does mean that all methods of the stream interface have to support
nulls.? Streams are null tolerant.? Because ...

The original idea was not "the nulls can be removed later."? The
original idea was "Streams are plumbing, they pass values through
pipelines, to user-specified lambdas, into arrays, etc, and the stream
plumbing should not have an opinion on the values that are flowing
through."?? And this was the right choice.

A stream is computation, so i agree that letting nulls to flow is the right call.
So i stand corrected on that point.

There is no value in making users remember which stream methods are
null-hostile and which are null-tolerant; this is just more accidental
complexity.?

I think that ship has sailed a long ago.
Some collectors are null hostile, some are not.
You can make a point that a Collector is technically not the Stream API per se.

[...]

Let's take a step back,
if we introduce a method toList() on Stream it will be used a lot, i mean really a lot, to the point where people will change the code from stream.collect(Collectors.toList()) to use stream.toList() instead.

Because of that, i don't think we even have the choice of the semantics for Stream.toList(), it has to be the same as stream.collect(Collectors.toList()).

So you are right that toList() can not be null hostile because Collectors.toList() is not null hostile.

But it can not be immutable too, for the same reason.

Stuart made the right choice here.

I would say half of the right choices, null hostile: right, immutable: wrong :)

And if we at some point introduce a method toImmutableList() on Stream, it will have to be null hostile.

R?mi

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from Brian Goetz on core-libs-dev:

There is no value in making users remember which stream methods are
null-hostile and which are null-tolerant; this is just more accidental
complexity.
I think that ship has sailed a long ago.
Some collectors are null hostile, some are not.
You can make a point that a Collector is technically not the Stream API per se.

Yes, and this is no mere "technical argument".? A Collector is a
collection of behaviors, governed by its own spec.? The behaviors passed
to stream operations, whether they be lambdas passed by the user
(`.map(x -> x.hopeXIsntNull())`) or collector objects or comparators,
are under the control of the user, and it is the user's responsibility
to pass behaviors which are compatible with the data domain -- which the
user knows and the streams implementation cannot know.

Because of that, i don't think we even have the choice of the semantics for Stream.toList(), it has to be the same as stream.collect(Collectors.toList()).

This doesn't remotely follow.? (And, if it did, I would likely not even
support this RFE.)? The spec of Collectors::toList was crafted to
disavow pretty much anything other than List-hood, in part in
anticipation of this addition.

But it can not be immutable too, for the same reason.

Nope.? The spec of toList() very clearly says: no guarantees as to the
type, mutability, serializability, etc etc of the returned list.? That
doesn't mean that every method returning a List added to the JDK after
Collectors::toList must similarly disavow all such guarantees.

(In fact, we would be entirely justified to change the behavior of
Collectors::toList tomorrow, to match the proposed Stream::toList; the
spec was crafted to preserve this option.? We probably won't, because
there are too many programmers who refuse to read the specification, and
have baked in the assumption that it returns an ArrayList, and this
would likely just be picking a fight for little good reason, regardless
of who is right.)

Please, can we stop having this same tired "I hate nulls, let's find a
new way to punish people who like them" discussion for every single
feature?

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from forax at univ-mlv.fr on core-libs-dev:

De: "Brian Goetz" <brian.goetz at oracle.com>
?: "Remi Forax" <forax at univ-mlv.fr>
Cc: "Stuart Marks" <smarks at openjdk.java.net>, "core-libs-dev"
<core-libs-dev at openjdk.java.net>
Envoy?: Mardi 3 Novembre 2020 20:54:57
Objet: Re: RFR: 8180352: Add Stream.toList() method

There is no value in making users remember which stream methods are
null-hostile and which are null-tolerant; this is just more accidental
complexity.

I think that ship has sailed a long ago.
Some collectors are null hostile, some are not.
You can make a point that a Collector is technically not the Stream API per se.

Yes, and this is no mere "technical argument". A Collector is a collection of
behaviors, governed by its own spec. The behaviors passed to stream operations,
whether they be lambdas passed by the user (`.map(x -> x.hopeXIsntNull())`) or
collector objects or comparators, are under the control of the user, and it is
the user's responsibility to pass behaviors which are compatible with the data
domain -- which the user knows and the streams implementation cannot know.

Because of that, i don't think we even have the choice of the semantics for
Stream.toList(), it has to be the same as stream.collect(Collectors.toList()).

This doesn't remotely follow. (And, if it did, I would likely not even support
this RFE.)

It seems you had an issue when replying to my mail, there is a paragraph before "Because"

Let's take a step back,
if we introduce a method toList() on Stream it will be used a lot, i mean really
a lot, to the point where people will change the code from
stream.collect(Collectors.toList()) to use stream.toList() instead.

The spec of Collectors::toList was crafted to disavow pretty much anything other
than List-hood, in part in anticipation of this addition.

But it can not be immutable too, for the same reason.

Nope. The spec of toList() very clearly says: no guarantees as to the type,
mutability, serializability, etc etc of the returned list. That doesn't mean
that every method returning a List added to the JDK after Collectors::toList
must similarly disavow all such guarantees.

(In fact, we would be entirely justified to change the behavior of
Collectors::toList tomorrow, to match the proposed Stream::toList; the spec was
crafted to preserve this option. We probably won't, because there are too many
programmers who refuse to read the specification, and have baked in the
assumption that it returns an ArrayList, and this would likely just be picking
a fight for little good reason, regardless of who is right.)

I don't understand your argument.
You know that you can not change the implementation of Collectors.toList(), and you also know that if there is a method Stream.toList(), people will replace the calls to .collect(Collectors.toList()) by a call to Stream.toList() for the very same reason but you want the semantics of Stream.toList() to be different from the semantics of Collectors.toList().

R?mi

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from Aaron Scott-Boddendijk on core-libs-dev:

But it can not be immutable too, for the same reason.

Nope. The spec of toList() very clearly says: no guarantees as to the

type,

mutability, serializability, etc etc of the returned list. That doesn't

mean

that every method returning a List added to the JDK after

Collectors::toList

must similarly disavow all such guarantees.

(In fact, we would be entirely justified to change the behavior of
Collectors::toList tomorrow, to match the proposed Stream::toList; the

spec was

crafted to preserve this option. We probably won't, because there are

too many

programmers who refuse to read the specification, and have baked in the
assumption that it returns an ArrayList, and this would likely just be

picking

a fight for little good reason, regardless of who is right.)

I don't understand your argument.
You know that you can not change the implementation of

Collectors.toList(), and you also know that if there is a method
Stream.toList(), people will replace the calls to

.collect(Collectors.toList()) by a call to Stream.toList() for the very

same reason but you want the semantics of Stream.toList() to be different
from the semantics of Collectors.toList().

Surely, taking that position, List.of(...) should not have produced an
unmodifiable list since many combinations of creating Lists would not have
produced unmodifiable lists.

In all of the teams I've worked with, through many Java version transitions
(and 3rdparty library version transitions), dealing with the selective
adoption of new APIs is nothing new.

This case is no different, developers will need to identify the cases in
which they can accept the list as unmodifiable and replace those uses of
Collectors.toList() and Collectors.toUnmodifiableList().

And static-analysis will be able to propose some of those replacements for
us.

Does this mean it's more complex than a text-replace, sure. Will there be
mistakes, sure. But are there benefits to the immutability of the API
result? I suggest that's pretty hard to debate against given the risk of a
mistake is an exception rather than bad data outcomes.

Please don't blunt the tool to make it safer for children.

--
Aaron Scott-Boddendijk

On Wed, Nov 4, 2020 at 9:30 AM <forax at univ-mlv.fr> wrote:

De: "Brian Goetz" <brian.goetz at oracle.com>
?: "Remi Forax" <forax at univ-mlv.fr>
Cc: "Stuart Marks" <smarks at openjdk.java.net>, "core-libs-dev"
<core-libs-dev at openjdk.java.net>
Envoy?: Mardi 3 Novembre 2020 20:54:57
Objet: Re: RFR: 8180352: Add Stream.toList() method

There is no value in making users remember which stream methods are
null-hostile and which are null-tolerant; this is just more accidental
complexity.

I think that ship has sailed a long ago.
Some collectors are null hostile, some are not.
You can make a point that a Collector is technically not the Stream API
per se.

Yes, and this is no mere "technical argument". A Collector is a
collection of
behaviors, governed by its own spec. The behaviors passed to stream
operations,
whether they be lambdas passed by the user (`.map(x ->
x.hopeXIsntNull())`) or
collector objects or comparators, are under the control of the user, and
it is
the user's responsibility to pass behaviors which are compatible with
the data
domain -- which the user knows and the streams implementation cannot
know.

Because of that, i don't think we even have the choice of the semantics
for
Stream.toList(), it has to be the same as
stream.collect(Collectors.toList()).

This doesn't remotely follow. (And, if it did, I would likely not even
support
this RFE.)
It seems you had an issue when replying to my mail, there is a paragraph
before "Because"

Let's take a step back,
if we introduce a method toList() on Stream it will be used a lot, i
mean really
a lot, to the point where people will change the code from
stream.collect(Collectors.toList()) to use stream.toList() instead.

The spec of Collectors::toList was crafted to disavow pretty much
anything other
than List-hood, in part in anticipation of this addition.

But it can not be immutable too, for the same reason.

Nope. The spec of toList() very clearly says: no guarantees as to the
type,
mutability, serializability, etc etc of the returned list. That doesn't
mean
that every method returning a List added to the JDK after
Collectors::toList
must similarly disavow all such guarantees.

(In fact, we would be entirely justified to change the behavior of
Collectors::toList tomorrow, to match the proposed Stream::toList; the
spec was
crafted to preserve this option. We probably won't, because there are
too many
programmers who refuse to read the specification, and have baked in the
assumption that it returns an ArrayList, and this would likely just be
picking
a fight for little good reason, regardless of who is right.)
I don't understand your argument.
You know that you can not change the implementation of
Collectors.toList(), and you also know that if there is a method
Stream.toList(), people will replace the calls to
.collect(Collectors.toList()) by a call to Stream.toList() for the very
same reason but you want the semantics of Stream.toList() to be different
from the semantics of Collectors.toList().

R?mi

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from Martin Desruisseaux on core-libs-dev:

Hello

Le 03/11/2020 ? 21:30, forax at univ-mlv.fr a ?crit?:

You know that you can not change the implementation of
Collectors.toList(), and you also know that if there is a method
Stream.toList(), people will replace the calls to
.collect(Collectors.toList()) by a call to Stream.toList() for the
very same reason (?snip?)

Would they would be so numerous to do this change without verifying if
the specifications match? (on my side I do read the method javadoc). But
even if we worry about developers not reading javadoc, the argument
could go both ways: they could assume that all Stream methods accepts
null and not read that Stream.toList() specifies otherwise.

??? Martin

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from forax at univ-mlv.fr on core-libs-dev:

Hi Aaron,

De: "Aaron Scott-Boddendijk" <talden at gmail.com>
?: "Remi Forax" <forax at univ-mlv.fr>
Cc: "Brian Goetz" <brian.goetz at oracle.com>, "Stuart Marks"
<smarks at openjdk.java.net>, "core-libs-dev" <core-libs-dev at openjdk.java.net>
Envoy?: Mardi 3 Novembre 2020 21:59:37
Objet: Re: RFR: 8180352: Add Stream.toList() method

But it can not be immutable too, for the same reason.

Nope. The spec of toList() very clearly says: no guarantees as to the type,
mutability, serializability, etc etc of the returned list. That doesn't mean
that every method returning a List added to the JDK after Collectors::toList
must similarly disavow all such guarantees.

(In fact, we would be entirely justified to change the behavior of
Collectors::toList tomorrow, to match the proposed Stream::toList; the spec was
crafted to preserve this option. We probably won't, because there are too many
programmers who refuse to read the specification, and have baked in the
assumption that it returns an ArrayList, and this would likely just be picking
a fight for little good reason, regardless of who is right.)

I don't understand your argument.
You know that you can not change the implementation of Collectors.toList(), and
you also know that if there is a method Stream.toList(), people will replace
the calls to
.collect(Collectors.toList()) by a call to Stream.toList() for the very same
reason but you want the semantics of Stream.toList() to be different from the
semantics of Collectors.toList().

Surely, taking that position, List.of(...) should not have produced an
unmodifiable list since many combinations of creating Lists would not have
produced unmodifiable lists.

The problem is that people will see stream.toList() as a shorcut for stream.collect(Collectors.toList()),
i've not seen my student thinking that List.of() is a shortcut for creating an ArrayList with some elements, but maybe it's because we introduce ArrayList later compared to List.of(...).

In all of the teams I've worked with, through many Java version transitions (and
3rdparty library version transitions), dealing with the selective adoption of
new APIs is nothing new.

This case is no different, developers will need to identify the cases in which
they can accept the list as unmodifiable and replace those uses of
Collectors.toList() and Collectors.toUnmodifiableList().

We have Collectors.toList() and Collectors.toUnmodifiableList(), would it make more sense to have stream.toList() and stream.toUnmodifiableList() instead of having stream.toList() that behave half way in betwen the semantics of Collectors.toList() and Collectors.toUnmodifiableList().

And static-analysis will be able to propose some of those replacements for us.

I believe that a tool able to suggest to use Collectors.toUnmodifiableList() instead of Collectors.toList() will rely one a global interprocedural static analysis, so such analysis will be invalid once you will update one of your dependency.

Does this mean it's more complex than a text-replace, sure. Will there be
mistakes, sure. But are there benefits to the immutability of the API result? I
suggest that's pretty hard to debate against given the risk of a mistake is an
exception rather than bad data outcomes.

You want immutability/unmodifiability over the cost of people miss mis-using the API. I think it's a sin as bad as me want stream.toList() to return a null hostile List :)
As Brain implies, Java is a blue collar language, we should not forget it, whatever we think about null or immutability.

And I said above perhaps we should not discussed about adding a method Stream.toList() but a method Stream.toUnmodifiableList().

Please don't blunt the tool to make it safer for children.

--
Aaron Scott-Boddendijk

R?mi

On Wed, Nov 4, 2020 at 9:30 AM < [ mailto:forax at univ-mlv.fr | forax at univ-mlv.fr
] > wrote:

De: "Brian Goetz" < [ mailto:brian.goetz at oracle.com | brian.goetz at oracle.com ] >
?: "Remi Forax" < [ mailto:forax at univ-mlv.fr | forax at univ-mlv.fr ] >
Cc: "Stuart Marks" < [ mailto:smarks at openjdk.java.net | smarks at openjdk.java.net
] >, "core-libs-dev"
< [ mailto:core-libs-dev at openjdk.java.net | core-libs-dev at openjdk.java.net ] >
Envoy?: Mardi 3 Novembre 2020 20:54:57
Objet: Re: RFR: 8180352: Add Stream.toList() method

There is no value in making users remember which stream methods are
null-hostile and which are null-tolerant; this is just more accidental
complexity.

I think that ship has sailed a long ago.
Some collectors are null hostile, some are not.
You can make a point that a Collector is technically not the Stream API per se.

Yes, and this is no mere "technical argument". A Collector is a collection of
behaviors, governed by its own spec. The behaviors passed to stream operations,
whether they be lambdas passed by the user (`.map(x -> x.hopeXIsntNull())`) or
collector objects or comparators, are under the control of the user, and it is
the user's responsibility to pass behaviors which are compatible with the data
domain -- which the user knows and the streams implementation cannot know.

Because of that, i don't think we even have the choice of the semantics for
Stream.toList(), it has to be the same as stream.collect(Collectors.toList()).

This doesn't remotely follow. (And, if it did, I would likely not even support
this RFE.)
It seems you had an issue when replying to my mail, there is a paragraph before
"Because"

Let's take a step back,
if we introduce a method toList() on Stream it will be used a lot, i mean really
a lot, to the point where people will change the code from
stream.collect(Collectors.toList()) to use stream.toList() instead.

The spec of Collectors::toList was crafted to disavow pretty much anything other
than List-hood, in part in anticipation of this addition.

But it can not be immutable too, for the same reason.

Nope. The spec of toList() very clearly says: no guarantees as to the type,
mutability, serializability, etc etc of the returned list. That doesn't mean
that every method returning a List added to the JDK after Collectors::toList
must similarly disavow all such guarantees.

(In fact, we would be entirely justified to change the behavior of
Collectors::toList tomorrow, to match the proposed Stream::toList; the spec was
crafted to preserve this option. We probably won't, because there are too many
programmers who refuse to read the specification, and have baked in the
assumption that it returns an ArrayList, and this would likely just be picking
a fight for little good reason, regardless of who is right.)
I don't understand your argument.
You know that you can not change the implementation of Collectors.toList(), and
you also know that if there is a method Stream.toList(), people will replace
the calls to .collect(Collectors.toList()) by a call to Stream.toList() for the
very same reason but you want the semantics of Stream.toList() to be different
from the semantics of Collectors.toList().

R?mi

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from Remi Forax on core-libs-dev:

----- Mail original -----

De: "Martin Desruisseaux" <martin.desruisseaux at geomatys.com>
?: "core-libs-dev" <core-libs-dev at openjdk.java.net>
Envoy?: Mardi 3 Novembre 2020 23:16:49
Objet: Re: RFR: 8180352: Add Stream.toList() method

Hello

Le 03/11/2020 ? 21:30, forax at univ-mlv.fr a ?crit?:

You know that you can not change the implementation of
Collectors.toList(), and you also know that if there is a method
Stream.toList(), people will replace the calls to
.collect(Collectors.toList()) by a call to Stream.toList() for the
very same reason (?snip?)

Would they would be so numerous to do this change without verifying if
the specifications match? (on my side I do read the method javadoc). But
even if we worry about developers not reading javadoc, the argument
could go both ways: they could assume that all Stream methods accepts
null and not read that Stream.toList() specifies otherwise.

yes, that why i said to Brian that he is right that stream.toList() should return a List that accept null.

??? Martin

R?mi

@mlbridge
Copy link

mlbridge bot commented Nov 3, 2020

Mailing list message from Stuart Marks on core-libs-dev:

On 11/3/20 3:10 AM, Florian Weimer wrote:

I'd expect a test here that if the list contains a null element, `List::copyOf` throws `NullPointerException`.

The new Stream.toList() actually permits null elements (by design) so it goes
through a different path than List::copyOf. The new tests' data provider does
include nulls in the input, and these should be accepted.

Rejection of nulls for List::copyOf is be handled by tests in

test/jdk/java/util/List/ListFactories.java

s'marks

@stuart-marks
Copy link
Member Author

OK, there are rather too many forking threads here for me to try to reply to any particular message, so I'll try to summarize things and say what I intend to do.

Null tolerance. While part of me wants to prohibit nulls, the fact is that Streams pass through nulls, and toList() would be much less useful if it were to reject nulls. The affinity here is closer to Stream.toArray(), which allows nulls, rather than Collectors.to[Unmodifiable]List.

Unmodifiability. Unlike with nulls, this is where we've been taking a strong stand recently, where new constructs are unmodifiable ("shallowly immutable"). Consider the Java 9 unmodifiable collections, records, primitive classes, etc. -- they're all unmodifiable. They're data-race-free and are resistant to a whole class of bugs that arise from mutability.

Unfortunately, the combination of the above means that the returned List is neither like an ArrayList nor like an unmodifiable list produced by List.of() or by Collectors.toUnmodifiableList(). [Heavy sigh.] While I've been somewhat reluctant to introduce this new thing, the alternatives of trying to reuse one of the existing things are worse.

John and Rémi pointed out that the way I implemented this, using a subclass, reintroduces the possibility of problems with megamorphic dispatch which we had so carefully avoided by reducing the number of implementation classes in this area. I agree this is a problem.

I also had an off-line discussion with John where we discussed the serialization format, which unfortunately is somewhat coupled with this issue. (Fortunately I think it can mostly be decoupled.) Given that we're introducing a new thing, which is an unmodifiable-list-that-allows-nulls, this needs to be manifested in the serialized form of these new objects. (This corresponds to the new tag value of IMM_LIST_NULLS in the CollSer class.) The alternative is to reuse the existing serialized form, IMM_LIST, for both of these cases, relaxing it to allow nulls. This would be a serialization immutability problem. Suppose I had an application that created a data structure that used lists from List.of(), and I had a global assertion over that structure that it contained no nulls. Further suppose that I serialized and deserizalized this structure. I'd want that assertion to be preserved after deserialization. If another app (or a future version of this app) created the structure using Stream.toList(), this would allow nulls to leak into that structure and violate that assertion. Therefore, the result of Stream.toList() should not be serialization-compatible with List.of() et. al. That's why there's the new IMM_LIST_NULLS tag in the serial format.

However, the new representation in the serial format does NOT necessarily require the implementation to be a different class. I'm going to investigate collapsing ListN and ListNNullsAllowed back into a single class, while preserving the separate serialized forms. This should mitigate the concerns about megamorphic dispatch.

*/
@SuppressWarnings("unchecked")
default List<T> toList() {
return (List<T>) Collections.unmodifiableList(new ArrayList<>(Arrays.asList(this.toArray())));
Copy link

@orionll orionll Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we return listFromTrustedArrayNullsAllowed here as in ReferencePipeline?
Or at least, we should avoid unnecessary copying of arrays. See how this is done in StreamEx.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listFromTrustedArrayNullsAllowed is clearly not an option, as it will produce shared-secret leaking (see JDK-8254090 for a similar case). StreamEx solution is dirty as it relies on the implementation detail. I believe, OpenJDK team is not very interested in providing optimal implementations for third-party stream implementations, as third-party implementations will likely update by themselves when necessary. At least, my suggestion to make the default mapMulti implementation better was declined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation duplicates the array twice, once in this.toArray() and once in the constructor of ArrayList that calls toArray() on Collections.ArrayList.

I'm sure there is a better way

Copy link

@sergus13 sergus13 Nov 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ReferencePipeline class we have:

@Override
public final Object[] toArray() {
    return toArray(Object[]::new);
}

@Override
@SuppressWarnings("unchecked")
public final <A> A[] toArray(IntFunction<A[]> generator) {
    ...
    @SuppressWarnings("rawtypes")
    IntFunction rawGenerator = (IntFunction) generator;
    return (A[]) Nodes.flatten(evaluateToArrayNode(rawGenerator), rawGenerator)
                          .asArray(rawGenerator);
}

In Nodes class we have:

public static <T> Node<T> flatten(Node<T> node, IntFunction<T[]> generator) {
    if (node.getChildCount() > 0) {
        ... 
        T[] array = generator.apply((int) size);
        new ToArrayTask.OfRef<>(node, array, 0).invoke();
        return node(array);
    } else {
        return node;
    }
}

It looks like it is required to implement toList method in a similar way in order to avoid array copy. i.e. there will be IntFunction<List<T>> generator which will generate 'ArrayList' with specified size and the list's add method will be called to add elements to the list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default implementation in the Stream interface, which is overridden by an implementation in the ReferencePipeline class, so it will rarely be used in normal operation. The ReferencePipeline version (part of this changeset) is based on toArray() and avoids any copying. I'm thus not inclined to add new interfaces in order to support this default implementation.

As written it's true that the default implementation does perform apparently redundant copies, but we can't be assured that toArray() actually returns a freshly created array. Thus, we wrap it using Arrays.asList and then copy it using the ArrayList constructor. This is unfortunate but necessary to avoid situations where someone could hold a reference to the internal array of a List, allowing modification of a List that's supposed to be unmodifiable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative with similar performance would be to do a Stream.toArray() and then copy that array into new Object[] and then wrap that copy with listFromTrustedArrayNullsAllowed(). The difference would be in the serialization format of the resulting List and maybe also in the access performance of resulting List (no indirection via the unmodifiableList wrapper and different type for JIT to speculate about). So if we want the resulting List to behave exactly the same in both implementations of toList(), then this alternative might be preferable. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such alternative might also be faster using intrisified array copying method (here measuring just copying overhead):

Benchmark              (len)  Mode  Cnt       Score      Error  Units
ToListBench.toList1       10  avgt   10      14.213 ±    0.061  ns/op
ToListBench.toList1     1000  avgt   10     541.883 ±    3.845  ns/op
ToListBench.toList1  1000000  avgt   10  753223.523 ± 4656.664  ns/op
ToListBench.toList2       10  avgt   10       8.810 ±    0.052  ns/op
ToListBench.toList2     1000  avgt   10     264.748 ±    0.807  ns/op
ToListBench.toList2  1000000  avgt   10  349518.502 ± 3242.061  ns/op

https://gist.github.com/plevart/974b67b65210f8dd122773f481c0a603

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to change the default implementation and its specification, for a couple reasons. First, this implementation won't be used in normal operation, as it's overridden by the JDK implementation. (I expect that stream extension libraries will be updated to override it as well.) Second, I'm quite uncomfortable using an internal method from within a default implementation. Right now this method is an agreement between the unmodifiable collections implementation and the ReferencePipeline implementation, and we have complete freedom to change this agreement at any time. If this internal method were used from a default implementation, its behavior would have to be specified in @implSpec somehow. Certainly one could write some very vague words for the specification that allow some freedom to make changes, but this has to choose between a better specification and more implementation freedom. I don't see the value in having to make this tradeoff at all.

An extra copy can be avoided via a private agreement between ArrayList and Arrays.asList, which should probably be done in any case.

@mlbridge
Copy link

mlbridge bot commented Nov 5, 2020

Mailing list message from Simon Roberts on core-libs-dev:

At the risk of a can of worms, or at least of raising something that has
long since been discussed and rejected...

This discussion of unmodifiable lists brings me back to the thought that
there would be good client-side reasons for inserting an UnmodifiableList
interface as a parent of LIst, not least because all our unmodifiable
variants from the Collections.unmodifiableList proxy onward fail the Liskov
substitution test for actually "being contract-fulfilling Lists".

Is this something that has been considered and rejected? (If so, is it easy
to point me at the discussion, as I expect I'd find it fascinating). Is it
worth considering, might it have some merit, or merely horrible
unforeseen-by-me consequences to the implementation?

Cheers,
Simon

On Thu, Nov 5, 2020 at 10:30 AM Stuart Marks <smarks at openjdk.java.net>
wrote:

--
Simon Roberts
(303) 249 3613

checkUnmodifiable(objects);
}

@Test(dataProvider = "withNull:StreamTestData<Integer>", dataProviderClass = StreamTestDataProvider.class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the non-default toList() implementation defers to the toArray() terminal op there is no need for this and the following tests, which are really designed to shake out the optimizations when producing and operating on arrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll remove all the tests starting from here to the end of the file. I'm assuming that's the set of tests you're referring to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes from line 73 to G (the end).

@stuart-marks
Copy link
Member Author

@ePaul wrote:

The Stream API works just as well with [third party] collection libraries instead of the java.util ones, just using different collectors. Adding a method which directly returns a java.util.List somewhat couples it to the Java Collection API.

Now this was mentioned two and a half year ago. Did something change which made this consideration irrelevant? I would expect at least some mention of it in the discussion here.

The separation between streams and the java.util Collections Framework is a good design principle, but it isn't an ironclad rule. It's still easy to have streams create instances of other collections libraries using the Collector interface. What's different here is that the Collections Framework has "leaked into" streams a little bit more, so that they're now more interdependent. This doesn't seem to have any disadvantages; it seems unlikely that the Collections Framework will ever be unplugged from the JDK. However, the benefits are that a List is the closest thing we have to an unmodifiable array that also plays well with generics and that can be value-based; these benefits are considerable.

@stuart-marks
Copy link
Member Author

Simon Roberts wrote:

This discussion of unmodifiable lists brings me back to the thought that
there would be good client-side reasons for inserting an UnmodifiableList
interface as a parent of LIst, not least because all our unmodifiable
variants from the Collections.unmodifiableList proxy onward fail the Liskov
substitution test for actually "being contract-fulfilling Lists".

At some point there probably will need to be a long article explaining all the issues here, but at the moment the best writeup I have is this one:

https://stackoverflow.com/a/57926310/1441122

TL;DR there are a few different ways to approach retrofitting something like this, but they all have enough compromises that the net benefits are unclear.

Add nulls-allowed empty list. Simplify indexOf/lastIndexOf.
Reduce tests, add contains(null) tests.
@mlbridge
Copy link

mlbridge bot commented Nov 6, 2020

Mailing list message from Remi Forax on core-libs-dev:

----- Mail original -----

De: "Simon Roberts" <simon at dancingcloudservices.com>
?: "core-libs-dev" <core-libs-dev at openjdk.java.net>
Envoy?: Jeudi 5 Novembre 2020 18:40:44
Objet: Re: RFR: 8180352: Add Stream.toList() method

At the risk of a can of worms, or at least of raising something that has
long since been discussed and rejected...

This discussion of unmodifiable lists brings me back to the thought that
there would be good client-side reasons for inserting an UnmodifiableList
interface as a parent of LIst, not least because all our unmodifiable
variants from the Collections.unmodifiableList proxy onward fail the Liskov
substitution test for actually "being contract-fulfilling Lists".

Is this something that has been considered and rejected? (If so, is it easy
to point me at the discussion, as I expect I'd find it fascinating). Is it
worth considering, might it have some merit, or merely horrible
unforeseen-by-me consequences to the implementation?

This question is asked at least every six months since 1998
https://docs.oracle.com/javase/8/docs/technotes/guides/collections/designfaq.html#a1

Cheers,
Simon

regards,
R?mi Forax

On Thu, Nov 5, 2020 at 10:30 AM Stuart Marks <smarks at openjdk.java.net>
wrote:

--
Simon Roberts
(303) 249 3613

@mlbridge
Copy link

mlbridge bot commented Nov 6, 2020

Mailing list message from Alan Snyder on core-libs-dev:

This discussion of unmodifiable lists brings me back to the thought that
there would be good client-side reasons for inserting an UnmodifiableList
interface as a parent of LIst

On Nov 6, 2020, at 1:14 AM, Remi Forax <forax at univ-mlv.fr <mailto:forax at univ-mlv.fr>> wrote:

This question is asked at least every six months since 1998
https://docs.oracle.com/javase/8/docs/technotes/guides/collections/designfaq.html#a1 <https://docs.oracle.com/javase/8/docs/technotes/guides/collections/designfaq.html#a1>

The question that Simon asked is not exactly the question that is answered in this link.

The question of whether List should be a subtype of (some kind of) ImmutableList is answered in
Stuart?s stack overflow answer (https://stackoverflow.com/a/57926310/1441122 <https://stackoverflow.com/a/57926310/1441122>). The answer
is that it should not.

The question answered in the link is basically a straw man: why not capture every conceivable
semantic distinction in the collections type system. And the answer is, not surprisingly, that
there would be way too many types.

But a question that deserves ongoing review is whether Java should support immutable collections
using a separate type hierarchy. In other words, Immutable List would not be a subtype of List
and List would not be a subtype of Immutable List. The linked answer says:
"Adding this support to the type hierarchy requires four more interfaces.? Four additional
interfaces sounds like a small cost for a significant benefit. As Java evolves to better support
functional programming styles, immutable collections seems like an obvious next step.

Much could be written, and probably has been, about the disadvantages of basing the
collections framework on mutable collections. Let me just remark that in addition to the
already mentioned UnsupportedOperationException, there is also the more subtle
ConcurrentModificationException, both of which would be absent in fully immutable collections.

The lack of subtyping between List and ImmutableList is not terrible. Arrays coexist with
Lists with no subtyping relationship. java.io.File coexists with java.nio.filePath with no
subtyping relationship. It means that explicit conversions are required between List and
ImmutableList. Extra copying may be needed, but there are tricks for reducing copying, and the
need for defensive copying is removed.

Alan

@mlbridge
Copy link

mlbridge bot commented Nov 6, 2020

Mailing list message from Martin Desruisseaux on core-libs-dev:

Le 06/11/2020 ? 19:00, Alan Snyder a ?crit?:

(?snip?) But a question that deserves ongoing review is whether Java
should support immutable collections using a separate type hierarchy
(?snip?).

Maybe an alternative way (admittedly more difficult) to have immutable
collections without introducing new interfaces could be to take
inspiration from the C++ "const" keyword? Especially since in Java,
"const" is already a reserved keyword, just not used yet.

I realize that it would be a difficult task: how to handle private
fields that are just caches in a const object; where to add the "const"
keyword in existing JDK API; how to make such change in a backward
compatible way (e.g. when a legacy code overrides a "const" method). I
do not have any expectation. But given the inconvenient of alternatives,
I wonder if there is some research about the long-term feasibility of a
"const" semantic in Java? If it was the case, then maybe it would be
better to not add immutable collection interfaces in Java for now.

??? Regards

??? ??? Martin

@plevart
Copy link
Contributor

plevart commented Nov 8, 2020

Hi Stuart,

I would like to discuss the serialization. You introduce new CollSer.IMM_LIST_NULLS type of immutable collection. This means that if this change goes into JDK16 for example, JDK15 and before will not be able to deserialize such list as they know nothing about IMM_LIST_NULLS even if such lists don't contain nulls. The reason you say to chose new type of serialization format is the following:

"Suppose I had an application that created a data structure that used lists from List.of(), and I had a global assertion over that structure that it contained no nulls. Further suppose that I serialized and deserizalized this structure. I'd want that assertion to be preserved after deserialization. If another app (or a future version of this app) created the structure using Stream.to
List(), this would allow nulls to leak into that structure and violate that assertion. Therefore, the result of Stream.toList() should not be serialization-compatible with List.of() et. al. That's why there's the new IMM_LIST_NULLS tag in the serial format"

I don't quite get this reasoning. Let's try to decompose the reasoning giving an example. Suppose we had the following data structure:

public class Names implements Serializable {
  private final List<String> names;
  Names(List<String> names) {
    this.names = names;
  }
  public List<String> names() { return names; }
}

App v1 creates such structures using new Names(List.of(...)) and serializes/deserializes them. They keep the invariant that no nulls are present. Now comes App v2 that starts using new Names(stream.toList()) which allows nulls to be present. When such Names instance from app v2 is serialized and then deserialized in app v1, nulls "leak" into data structure of app v1 that does not expect them.

But the question is how does having a separate CollSer.IMM_LIST_NULLS type prevent that from happening?

@plevart
Copy link
Contributor

plevart commented Nov 8, 2020

I can see that having a separate IMM_LIST_NULLS type might be necessary to preserve the allows-null/disallows-null behaviour of indexOf and lastIndexOf methods...

NOTE ALSO that ListN.equals(o) and ListN.hashCode() are inherited from AbstractImmutableList:

        @Override
        public boolean equals(Object o) {
            if (o == this) {
                return true;
            }

            if (!(o instanceof List)) {
                return false;
            }

            Iterator<?> oit = ((List<?>) o).iterator();
            for (int i = 0, s = size(); i < s; i++) {
                if (!oit.hasNext() || !get(i).equals(oit.next())) {
                    return false;
                }
            }
            return !oit.hasNext();
        }

and

        public int hashCode() {
            int hash = 1;
            for (int i = 0, s = size(); i < s; i++) {
                hash = 31 * hash + get(i).hashCode();
            }
            return hash;
        }

...which means they will throw NPE when the list contains null. The same goes for SubList.

@stuart-marks
Copy link
Member Author

@plevart wrote:

But the question is how does having a separate CollSer.IMM_LIST_NULLS type prevent that from happening?

When a serialized list with IMM_LIST_NULLS is deserialized on an older JDK, it'll throw InvalidObjectException since that tag isn't valid on older JDKs. Obviously this is still an error, but it's a fail-fast approach that avoids letting nulls leak into a data structure where they might cause a problem some arbitrary time later.

NOTE ALSO that ListN.equals(o) and ListN.hashCode() are inherited from AbstractImmutableList [...] which means they will throw NPE when the list contains null. The same goes for SubList.

Good catch! Yes, this is a problem. I'll do some rearranging here and add more test cases. Thanks for spotting this.

@plevart
Copy link
Contributor

plevart commented Nov 10, 2020

@plevart wrote:

But the question is how does having a separate CollSer.IMM_LIST_NULLS type prevent that from happening?

When a serialized list with IMM_LIST_NULLS is deserialized on an older JDK, it'll throw InvalidObjectException since that tag isn't valid on older JDKs. Obviously this is still an error, but it's a fail-fast approach that avoids letting nulls leak into a data structure where they might cause a problem some arbitrary time later.

Yes, but that is JDK16+ vs. JDK15- and not App V1 vs. App V2 thing. If both apps run on JDK16+, there will be no exception.

What I'm trying to say is that the same problem of injecting unexpected nulls via serialization/deserialization can happen also if App V2 starts using ArrayList to construct the data structure and serialize it while App V1 deserializes it and expects non-null values only. App V1 would already have to guard against null values during deserialization in that case, because possibility of null values in deserialized data structure is nothing new for App V1.

Fix equals, hashCode, indexOf, lastIndexOf to handle nulls properly.
Add MOAT tests for new lists; add equals and hashCode tests.
@stuart-marks
Copy link
Member Author

@plevart wrote:

Yes, but that is JDK16+ vs. JDK15- and not App V1 vs. App V2 thing. If both apps run on JDK16+, there will be no exception.

Sure, the IMM_LIST_NULLS tag only helps with serialization compatibility across JDK releases. There are lots of ways an app can make incompatible changes to the serialized forms of its objects that we have no way of detecting.

@mlbridge
Copy link

mlbridge bot commented Nov 18, 2020

Mailing list message from Peter Levart on core-libs-dev:

On 11/17/20 9:08 PM, Stuart Marks wrote:

@plevart wrote:

Yes, but that is JDK16+ vs. JDK15- and not App V1 vs. App V2 thing. If both apps run on JDK16+, there will be no exception.
Sure, the IMM_LIST_NULLS tag only helps with serialization compatibility across JDK releases.

I would say it goes the other way - it worsens the serialization
compatibility. You are introducing a serialization format that is
incompatible with older JDK releases. OTOH the problem of introducing
nulls into List(s) in serializable data structures already exists
because serializable List implementations that allow nulls already
exist, so by introducing another one that is able to be deserialized on
older JDK would not worsen the situation.

There are lots of ways an app can make incompatible changes to the serialized forms of its objects that we have no way of detecting.

So why bother with making the result of serialized stream.toList() not
de-serializable on older JDK(s)? By making a favor to a hypothetical app
that runs on older JDK and does not expect nulls in Lists of a data
structure, you are also preventing an app that allows nulls in similar
data structure from deserializing the datastructure on older JDK when
the serialized form was produced with serializing stream.toList() on
newer JDK.

But I see that the new? IMM_LIST_NULLS type is needed for one other
thing - the immutable list implementation of that type has different
behavior of indexOf and lastIndexOf methods (it doesn't throw NPE when
null is passed to those methods) so this behavior has to be preserved in
the deserialized instance and there is not way to achieve that on old
JDK with existing serialization format, so there has to be an
incompatible change in the serialization format for that matter.

Peter

@stuart-marks
Copy link
Member Author

stuart-marks commented Nov 18, 2020

Sure, the IMM_LIST_NULLS tag only helps with serialization compatibility across JDK releases.

I would say it goes the other way - it worsens the serialization
compatibility.

OK, I was imprecise. The IMM_LIST_NULLS tag has an effect only on serialization across JDK releases, not changes to the application's serialization format using the same JDK release, or even on many changes to the app's serialization format across JDK releases. By "helps with serialization compatibility" I meant that this new serialized form helps the general issue of serialization compatibility (really, incompatibility) by failing fast in certain cases, instead of possibly allowing polluted data to leak into the receiving application and causing some arbitrary exception later during the run.

But as you noted last, this is a different kind of object, and it has different behavior, so it needs a different encoding in the serialized form.

I've updated this PR with changes to fix null handling and other issues.

@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label Nov 19, 2020
return (List<E>) new List12<>(input[0], input[1]);
default:
return (List<E>) new ListN<>(input, false);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a switch expression + arrow (->) here will allow you too factor the cast, even if it disappear in the generated bytecode, I think it makes the code more readable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh yes, I haven't gotten used to using those yet!

@mlbridge
Copy link

mlbridge bot commented Nov 25, 2020

Mailing list message from Remi Forax on core-libs-dev:

----- Mail original -----

De: "Stuart Marks" <smarks at openjdk.java.net>
?: "core-libs-dev" <core-libs-dev at openjdk.java.net>
Envoy?: Mardi 24 Novembre 2020 08:10:14
Objet: Re: RFR: 8180352: Add Stream.toList() method [v4]

This change introduces a new terminal operation on Stream. This looks like a
convenience method for Stream.collect(Collectors.toList()) or
Stream.collect(Collectors.toUnmodifiableList()), but it's not. Having this
method directly on Stream enables it to do what can't easily by done by a
Collector. In particular, it allows the stream to deposit results directly into
a destination array (even in parallel) and have this array be wrapped in an
unmodifiable List without copying.

In the past we've kept most things from the Collections Framework as
implementations of Collector, not directly on Stream, whereas only fundamental
things (like toArray) appear directly on Stream. This is true of most
Collections, but it does seem that List is special. It can be a thin wrapper
around an array; it can handle generics better than arrays; and unlike an
array, it can be made unmodifiable (shallowly immutable); and it can be
value-based. See John Rose's comments in the bug report:

https://bugs.openjdk.java.net/browse/JDK-8180352?focusedCommentId=14133065&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14133065

This operation is null-tolerant, which matches the rest of Streams. This isn't
specified, though; a general statement about null handling in Streams is
probably warranted at some point.

Finally, this method is indeed quite convenient (if the caller can deal with
what this operation returns), as collecting into a List is the most common
stream terminal operation.

Stuart Marks has updated the pull request incrementally with one additional
commit since the last revision:

Add comment and assert regarding array class; use switch expression.

-------------

Hi Stuart,
This version is Ok for me.

I still think that using a null friendly List is a mistake, but you, Brian and John all think it's not an issue, so let's go with it.

Copy link
Member

@PaulSandoz PaulSandoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend you review the problem list of this source file in IntelliJ, which shows some potential cleanups. This can be done as a follow on PR if appropriate, as can any improvements to the default implementation.

* Null argument or null elements in the argument will result in NPE.
*
* @param <E> the List's element type
* @param input the input array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rogue parameter declaration.

@openjdk
Copy link

openjdk bot commented Nov 25, 2020

@stuart-marks This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8180352: Add Stream.toList() method

Reviewed-by: psandoz

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 377 new commits pushed to the master branch:

  • 7c73fff: 8256486: Linux/Windows-x86 builds broken after JDK-8254231
  • 461c5fc: 8256450: Add gz option to jmap to write a gzipped heap dump
  • dee79d6: 8253936: Replace ... with {@code ...} for java.sql
  • cfb175d: 8256508: Improve CompileCommand flag
  • 7aed9b6: 8256016: Dacapo24H.java failed with "assert(false) failed: unscheduable graph"
  • 26e6cb3: 8256489: Make gtest for long path names on Windows more resilient in the presence of virus scanners
  • 911f16d: 8257056: Submit workflow should apt-get update to avoid package installation errors
  • b0bd0c2: 8256755: Update build.tools.depend.Depend to handle record components in API signatures
  • 9aeadbb: 8256865: Foreign Memory Access and Linker API are missing NPE checks
  • 8cd2e0f: 8243315: ParallelScavengeHeap::initialize() passes GenAlignment as page size to os::trace_page_sizes instead of actual page size
  • ... and 367 more: https://git.openjdk.java.net/jdk/compare/184db64dfb711c9cba426fb0b45bea4c962489de...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 25, 2020
@stuart-marks
Copy link
Member Author

/integrate

@openjdk openjdk bot closed this Nov 30, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 30, 2020
@openjdk
Copy link

openjdk bot commented Nov 30, 2020

@stuart-marks Since your change was applied there have been 437 commits pushed to the master branch:

  • 8969069: 8256995: [vector] Improve broadcast operations
  • 6eb25d7: 8254082: AbstractStringBuilder.insert(int dstOffset, CharSequence s, int start, int end) is missing fast-path for String
  • 4c86e46: 8256810: Incremental rebuild broken on Macosx
  • 02ba519: 8255001: Move G1PeriodicGCTask to its own file
  • 8aaee53: 8256187: [TEST_BUG] Automate bug4275046.java test
  • a3e1980: 8256541: Sort out what version of awk is used in the build system
  • e3abe51: 8257418: C2: Rename barrier data member in MemNode and LoadStoreNode
  • c071960: 8257083: Security infra test failures caused by JDK-8202343
  • 4db05e9: 8254042: gtest/GTestWrapper.java failed os.test_random
  • 962f7a3: 8257162: Initialize ThreadLocalAllocBuffer members
  • ... and 427 more: https://git.openjdk.java.net/jdk/compare/184db64dfb711c9cba426fb0b45bea4c962489de...master

Your commit was automatically rebased without conflicts.

Pushed as commit 41dbc13.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
10 participants