
Added SeqBuffer<T> class as a replacement for some of Seq.toList() calls #305

Merged: 1 commit merged into jOOQ:master from tlinkowski:SeqBuffer on Jul 28, 2017

Conversation

@tlinkowski (Contributor) commented Apr 27, 2017

I propose a solution to the problems described in #195 for the following methods:

  • crossSelfJoin() (crossJoin() needs to be adapted by @lukaseder in jOOQ-tools)
  • inner(Self)Join()
  • leftOuter(Self)Join()

The solution for nearly all the remaining methods will consist of applying Seq.lazy() to them, for which I'll create a separate PR in order to demonstrate its usefulness, as discussed in #302.


PS. SeqBuffer could be used to create a simpler implementation of Seq.duplicate.
PPS. SeqBuffer could also be used to implement a method Seq.duplicate(int count). Such a method would return a List<Seq<T>> containing count Seqs. However, the only application for such a method that I can think of is implementing #49, which (to be honest) I don't find very useful either.
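The idea can be sketched with plain JDK types. This is an illustrative sketch only, assuming nothing about the actual jOOL sources: the names SeqBuffer, of() and seq() mirror the proposal, but the implementation details here are mine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Sketch: buffer a one-shot Stream so that several consumers can each
// traverse it from the start, pulling from the source lazily instead of
// materializing everything via toList().
class SeqBuffer<T> {
    private final Spliterator<T> source;
    private final List<T> buffer = new ArrayList<>();

    private SeqBuffer(Stream<T> stream) {
        this.source = stream.spliterator();
    }

    static <T> SeqBuffer<T> of(Stream<T> stream) {
        return new SeqBuffer<>(stream);
    }

    // Each call returns a fresh view over the same buffered elements.
    Stream<T> seq() {
        return StreamSupport.stream(new View(), false);
    }

    private class View extends Spliterators.AbstractSpliterator<T> {
        private int index = 0;

        View() {
            super(Long.MAX_VALUE, Spliterator.ORDERED);
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            // Guarded, since several views share the buffer and the source.
            synchronized (SeqBuffer.this) {
                if (index < buffer.size() || source.tryAdvance(buffer::add)) {
                    action.accept(buffer.get(index++));
                    return true;
                }
                return false;
            }
        }
    }
}
```

For example, SeqBuffer.of(Stream.of(1, 2, 3)) lets you call seq() twice and traverse the three elements twice, while the underlying stream is consumed only once.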

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request Apr 30, 2017
@tlinkowski (Contributor, Author) commented Apr 30, 2017

I added a new implementation of Seq.splitAt [#308] that uses the proposed SeqBuffer.

@lukaseder (Member) commented May 1, 2017

That looks very interesting, thank you very much for your suggestion. I'll review this later (hopefully in the next 2 days). We'll definitely need a reusable SeqBuffer.

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request May 1, 2017
    private final Spliterator<T> source;
    private final List<T> buffer = new ArrayList<>();

@lukaseder (Member) commented May 1, 2017

Hmm, this buffer is not thread safe. It may well be that one consumer of a buffered result consumes the buffer while another consumer produces new elements at the same time. E.g., this could happen in your splitAt() implementation suggestion here: b345bbb

    SeqBuffer<T> buffer = SeqBuffer.of(stream);
    return tuple(buffer.seq().limit(position), buffer.seq().skip(position));

@lukaseder (Member) commented May 1, 2017

Much nicer, indeed. The only caveat: thread safety in the current SeqBuffer implementation.

@tlinkowski (Contributor, Author) commented May 2, 2017

As a side note: do you realize that the current implementation of Seq.splitAt (as well as Seq.partition) is not thread safe? :)

@lukaseder (Member) commented Jul 28, 2017

Hmm, splitAt() doesn't look wrong to me, but partition may well be.

@billoneil (Contributor) commented Jul 28, 2017

I was under the impression that Seq wasn't thread safe to begin with. Or was it just that it will not be a parallel stream?

@lukaseder (Member) commented Jul 28, 2017

There's currently no guarantee in the API, as there are still too many flaws. But in principle, the results from methods like splitAt() (which produce several Seqs) should be thread safe. Or at least, we should have an option for them to be thread safe.

Perhaps this is a concept that should be reviewed more globally.

Note that thread safety and parallelism aren't the same thing in this context. Stream's parallelism allows for parallel processing of operations like map() or filter(). By keeping operations independent of one another, parallelism can drastically speed up the processing of a stream. Thread safety in our context just means that when two things (e.g. the splitAt() results) are consumed on different threads, we don't get wrong results.

But again, perhaps we should implement this more thoroughly, with a specific thread safety flag...
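The requirement can be illustrated with a small JDK-only sketch (the class SafeSplit and its API are mine, not jOOL's): two threads consume disjoint index ranges of one shared, lazily filled buffer, and the lock on the buffer is what keeps the results correct.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

// Sketch: a shared, lazily-filled buffer read from two threads. The
// synchronized accessor is the thread-safety property under discussion;
// without it, concurrent fills could corrupt the ArrayList's state.
class SafeSplit {
    private final Iterator<Integer> source;
    private final List<Integer> buffer = new ArrayList<>();

    SafeSplit(Iterator<Integer> source) {
        this.source = source;
    }

    // Returns element i, filling the buffer on demand.
    synchronized int get(int i) {
        while (buffer.size() <= i)
            buffer.add(source.next());
        return buffer.get(i);
    }
}

public class SafeSplitDemo {
    public static void main(String[] args) throws InterruptedException {
        SafeSplit s = new SafeSplit(IntStream.range(0, 100).boxed().iterator());
        long[] sums = new long[2];
        Thread first  = new Thread(() -> { for (int i = 0;  i < 50;  i++) sums[0] += s.get(i); });
        Thread second = new Thread(() -> { for (int i = 50; i < 100; i++) sums[1] += s.get(i); });
        first.start(); second.start();
        first.join(); second.join();
        // Disjoint halves of 0..99: sums are 1225 and 3725 regardless of interleaving.
        System.out.println(sums[0] + " " + sums[1]);
    }
}
```

The two "consumers" here stand in for the two Seqs returned by splitAt(); which thread triggers each fill is nondeterministic, but the observed elements are not.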

@tlinkowski (Contributor, Author) commented Jul 28, 2017

Well, Seq.splitAt was directly based on Seq.partition :) And Seq.partition definitely isn't thread-safe because both Seqs write to buffer1 and buffer2 without any synchronization.

@tlinkowski (Contributor, Author) commented Jul 28, 2017

@billoneil If I understand it right, @lukaseder distinguishes between two things:

  • a single Seq is sequential, so it's not thread safe and two threads cannot operate on the same Seq
  • there seems to be no restriction on consuming one Seq on one thread and another Seq on another thread, even if they both "derive" from the same Seq (hence the need for thread safety in such cases)

@billoneil (Contributor) commented Jul 28, 2017

Ah, I wasn't thinking; that makes sense.

@lukaseder lukaseder added this to the Version 0.9.13 milestone May 1, 2017

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request May 2, 2017
@tlinkowski (Contributor, Author) left a comment

@lukaseder, here's my shot at thread-safety of SeqBuffer. Let me know what you think of it.

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request May 2, 2017
    if (estimateSize > MAX_ARRAY_SIZE)
        throw new IllegalArgumentException("Stream is too long to be buffered: " + estimateSize);

    return new ArrayList<>((int) estimateSize);

@lukaseder (Member) commented May 3, 2017

Hm, no, I don't agree with this. If we create a large-capacity ArrayList, we're back to the original solution that you wanted to avoid: up-front large memory consumption for intermediate buffers. Imagine:

    Seq.seq(Collections.nCopies(1000000, "value")).splitAt(999999).v1.limit(1).forEach(System.out::println);

The splitAt() operation would create a large array list only to consume its first element...
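The laziness argument can be demonstrated with the JDK alone (a sketch, not jOOL code): a counting supplier shows that a lazy, short-circuiting pipeline pulls only what is actually consumed, instead of materializing a 1,000,000-element buffer up front.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

// Sketch: when the pipeline stays lazy, consuming the first element
// pulls exactly one element from the source.
public class LazyPullDemo {
    public static void main(String[] args) {
        AtomicLong pulled = new AtomicLong();
        Stream<String> source = Stream
                .generate(() -> { pulled.incrementAndGet(); return "value"; })
                .limit(1_000_000);

        // Analogous to splitAt(999999).v1.limit(1).forEach(...):
        source.limit(1).forEach(s -> {});

        System.out.println(pulled.get()); // prints 1
    }
}
```

An eagerly pre-sized ArrayList avoids none of this; it only trades one kind of up-front cost (CPU) for another (memory).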

@tlinkowski (Contributor, Author) commented May 3, 2017

Ha, you're right! I'll revert it.

PS. We wouldn't be entirely back to square one, because I mainly wanted to avoid up-front processor consumption (i.e. consuming the entire Seq), but you're absolutely right that it's best to also avoid up-front memory consumption.

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request May 3, 2017
Tomasz Linkowski
[jOOQ#305] reverted "added ArrayList initialization based on source.estimateSize()" because it introduced unnecessary up-front memory consumption

Reverted from commit f54482e
@lukaseder (Member) commented May 10, 2017

Thanks for the fix. Now, I wonder again if it would be possible to squash all commits into one for this PR (GitHub has such a feature). That would make the final review much simpler for me.

Tomasz Linkowski
[#305] Added thread-safe SeqBuffer
[#195] Applied SeqBuffer to certain Seq method implementations
[#122] Thread-safe Seq.duplicate()

@tlinkowski tlinkowski force-pushed the tlinkowski:SeqBuffer branch from bf9a107 to 61e7537 May 11, 2017

@tlinkowski (Contributor, Author) commented May 11, 2017

Done.

PS. I couldn't find such a feature (available to the PR author) anywhere on GitHub, so I squashed them manually.

@lukaseder (Member) commented May 16, 2017

PS. I couldn't find such feature (available to the PR author) anywhere on GitHub so I squashed them manually.

Oh, interesting. That was a wrong assumption then, sorry. I can indeed squash and merge, but not squash, review again, then merge... Will send a feature request to GitHub.

Thanks very much!

@lukaseder lukaseder referenced this pull request Jul 28, 2017

    // before:
    return tuple(seq(new Duplicate()), seq(new Duplicate()));
    // after:
    SeqBuffer<T> buffer = SeqBuffer.of(stream);
    return tuple(buffer.seq(), buffer.seq());

@lukaseder (Member) commented Jul 28, 2017

Hmm, I see. Much simpler, sure, but the flip side of this implementation is that if both duplicates are consumed at the same speed, we're wasting a lot of memory for a buffer that might no longer be needed. What are your thoughts on this?

@tlinkowski (Contributor, Author) commented Jul 28, 2017

You're absolutely right. On the other hand, the current implementation is thread safe thanks to SeqBuffer, and the previous one wasn't.

All in all, I'd be more inclined toward this simple implementation, but I'd leave an appropriate comment in this method in case someone reports memory-related issues in the future.
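The memory trade-off can be made concrete with a JDK-only sketch (all names here are mine): a Duplicate-style implementation retains only the gap between the two consumers, so lock-step consumption needs O(1) extra memory, whereas a full buffer retains every element ever produced.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Sketch of the Duplicate-style alternative: only elements that one
// consumer has seen and the other has not are retained, so two consumers
// advancing in lock-step keep the deque at size <= 1.
class GapDuplicate<T> {
    private final Iterator<T> source;
    private final Deque<T> gap = new ArrayDeque<>();
    private int leader = -1; // -1: both sides level; otherwise the side that is ahead

    GapDuplicate(Iterator<T> source) {
        this.source = source;
    }

    synchronized T next(int side) {
        if (leader == -1 || leader == side) {
            T t = source.next();   // this side moves ahead: pull a fresh element
            gap.addLast(t);
            leader = side;
            return t;
        }
        T t = gap.removeFirst();   // this side catches up from the gap
        if (gap.isEmpty())
            leader = -1;
        return t;
    }

    synchronized int gapSize() {
        return gap.size();
    }
}
```

If one consumer races far ahead, the gap still grows without bound, so neither approach dominates; that is presumably why a comment in the method (as suggested above) is a reasonable compromise.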


@lukaseder lukaseder merged commit 10741dd into jOOQ:master Jul 28, 2017

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)

@lukaseder lukaseder added the R: Fixed label Jul 28, 2017

@lukaseder (Member) commented Jul 28, 2017

Thanks again for this change. I'll review this later in detail, and will let you know.

@tlinkowski tlinkowski deleted the tlinkowski:SeqBuffer branch Jul 31, 2017

3 participants