
ISPN-13428 Distributed streams collectors ignore distributedBatchSize #9622

Merged
merged 3 commits into from Nov 5, 2021

Conversation

@danberindei (Member) commented Nov 2, 2021

https://issues.redhat.com/browse/ISPN-13428
https://issues.redhat.com/browse/ISPN-13438

BytesObjectOutput does not handle overflow correctly
Add warning to CacheStream collector operations javadoc
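The overflow fix itself lives in Infinispan's marshalling internals, but the failure mode is generic: growing a byte buffer with int arithmetic can wrap negative as the position approaches 2 GB. A minimal sketch of overflow-safe growth, with hypothetical names (this is not BytesObjectOutput's actual code):

```java
import java.util.Arrays;

// Hypothetical sketch: a growable byte buffer whose capacity computation
// is done in long arithmetic so that pos + needed cannot silently wrap.
public class GrowableBuffer {
    private byte[] buf = new byte[32];
    private int pos;

    void ensureCapacity(int needed) {
        int required = pos + needed;          // may overflow to a negative int
        if (required < 0 || required > buf.length) {
            // Compute the target size in long, then clamp near the VM's
            // maximum array length instead of wrapping around.
            long target = Math.max((long) buf.length * 2, (long) pos + needed);
            if (target > Integer.MAX_VALUE - 8)
                throw new OutOfMemoryError("Buffer exceeds maximum array size");
            buf = Arrays.copyOf(buf, (int) target);
        }
    }

    void write(byte[] data) {
        ensureCapacity(data.length);
        System.arraycopy(data, 0, buf, pos, data.length);
        pos += data.length;
    }

    int size() { return pos; }
}
```

The `required < 0` check is the key line: once `pos + needed` wraps, a plain `required > buf.length` comparison would wrongly conclude there is enough room.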

@danberindei (Member, Author) commented:
@wburns if you are ok with only running the collectors on the originators, I can add another commit to deprecate the serializable collectors API and to remove the internal scaffolding.

@wburns (Member) commented Nov 2, 2021

I do not like this change at all. The javadoc for CacheStream states that operations are shipped to remote nodes and performed there. This is changing and removing functionality that users cannot reproduce anywhere else, whereas a user can very easily convert the stream to an iterator and do this themselves.

The distributedBatchSize javadoc also explicitly lists which methods utilize it.

@danberindei (Member, Author) commented Nov 2, 2021

The javadoc for CacheStream states that operations are shipped to remote nodes and performed there.

True, but the Javadoc does not say anything about not being able to use a collector with more than 2 GB of data from each node.

The first option I considered was to make the collectors use distributedBatchSize(), but that seemed much too complicated, so I went for the simpler option.

This is changing and removing functionality that you cannot reproduce anywhere else. Where as a user can very easily convert the stream to an iterator and do this themselves.

I believe users can also emulate a collector using reduce(), e.g.

      List<Map.Entry<Integer, String>> result2 =
            createStream(entrySet).reduce(Collections.emptyList(),
                                          (list, e) -> {
                                             if (list.isEmpty()) list = new ArrayList<>();
                                             list.add(e);
                                             return list;
                                          },
                                          (left, right) -> {
                                             if (left.isEmpty()) return right;
                                             left.addAll(right);
                                             return left;
                                          });
      assertEquals(range, result2.size());

I know it's ugly, but somehow I like that it's ugly and users would give it a second thought before using that code in production :)
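For comparison, the collect() path that the new javadoc warning targets expresses the same accumulation. A plain java.util.stream sketch (viaCollect and viaReduce are illustrative names; CacheStream's collect with CacheCollectors would be analogous):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectVsReduce {
    // collect(): the terminal operation the PR adds a javadoc warning to;
    // on a distributed stream the combined result is built on the originator.
    static List<Integer> viaCollect(Stream<Integer> s) {
        return s.collect(Collectors.toList());
    }

    // reduce(): the emulation from the comment above, with an immutable
    // empty-list identity and a copy-on-first-add accumulator so the
    // shared identity value is never mutated.
    static List<Integer> viaReduce(Stream<Integer> s) {
        return s.reduce(List.of(),
                        (list, e) -> {
                           List<Integer> l = list.isEmpty() ? new ArrayList<>() : list;
                           l.add(e);
                           return l;
                        },
                        (left, right) -> {
                           if (left.isEmpty()) return right;
                           left.addAll(right);
                           return left;
                        });
    }
}
```

Both produce the same list; the difference under discussion is only where the accumulation runs in a distributed stream.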

On the one hand I can imagine applications needing a "mutable reduction" that actually reduces the size of the values and is just nicer to express using the collector methods compared to the reduce methods; OTOH our CacheCollectors methods steer users towards java.util.Collectors, and basically all our tests/examples just add the values in a collection.

@wburns (Member) commented Nov 2, 2021

True, but the Javadoc does not say anything about not being able to use a collector with more than 2GB's worth of data from each node.

Yeah, sadly that is a limitation for everything we do since we are tied to Integer.MAX_VALUE for array lengths.

The first option I considered was to make the collectors use distributedBatchSize(), but that seemed much too complicated, so I went for the simpler option.

Yeah, you don't want to do that, heh.

I believe users can also emulate a collector using reduce(), e.g.

But then we would have different terminal operations where some go remote and some don't. Reduce and collect are very similar to each other, and it would be weird for only some of them to be local. Sorry, I meant that if we changed all of these then there would be no alternative.

OTOH our CacheCollectors methods steer users towards java.util.Collectors, and basically all our tests/examples just add the values in a collection.

Yeah, that is because this is using the Stream API, and java.util.stream.Collector is a pretty predominant API in it, so I was trying to keep aligned with it.

Once the new API comes in, assuming it exposes something like ClusterPublisherManager, it will be a lot simpler to distinguish local and remote invocations: only the functions that transform Publishers are sent remotely, and anything done to the returned Publisher will be local only.
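A rough sketch of that split, with entirely hypothetical names (not the actual ClusterPublisherManager API): only the transformer function would be serialized and shipped to each node, while everything applied to the combined result runs on the originator. Stream stands in for Publisher here to keep the sketch self-contained:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Stream;

public class PublisherSplitSketch {
    // Stand-in for a clustered invocation: in a real cluster the transformer
    // would be marshalled and applied to each node's local data, and only
    // the per-node results would flow back to the originator.
    static <T, R> Stream<R> invokeAcrossCluster(
            List<List<T>> perNodeData,
            Function<Stream<T>, Stream<R>> remoteTransformer) {
        // Each node applies the transformer to its own data (the remote part)...
        // ...and the originator concatenates the per-node results (the local part).
        return perNodeData.stream()
              .flatMap(nodeData -> remoteTransformer.apply(nodeData.stream()));
    }
}
```

Anything chained onto the returned stream (collect, reduce, iterate) would then unambiguously execute locally, which is exactly the distinction the collector methods currently blur.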

@danberindei (Member, Author) commented:
Updated to fix the overflow exception and to add a warning in the collector methods javadoc.

@wburns wburns merged commit b598450 into infinispan:main Nov 5, 2021
@wburns (Member) commented Nov 5, 2021

Integrated into main, thanks @danberindei !

@danberindei danberindei deleted the ISPN-13428_collectors branch November 5, 2021 19:23