Lazy dropEnd and friends #306

Bodigrim · 2020-10-18T14:58:15Z

There are no dropEnd, takeEnd, dropWhileEnd, takeWhileEnd, spanEnd and breakEnd operating on lazy bytestrings. However, it is not clear for me which degree of laziness is desired here.

Imagine a lazy bytestring of several chunks of length 100, and dropEnd 80.

One strategy is as follows: we look at the first chunk only, without forcing the rest of the string. It has 100 chars, so we can return a chunk containing the first 20 chars immediately, and put remaning 80 in a buffer. Then we force the tail and observe that the next chunk is also 100-chars-long, so we can flush the buffer, and also a chunk of 20 chars, etc. The benefit is that we operate as lazy as possible and do not trigger hidden effects (lazy IO) unnecessary. But the cost is that the result has twice as many chunks as the input: 20 chars, 80 chars, 20 chars, 80 chars...

Another strategy is more eager. Evaluate first two chunks immediately: if the length of the second is > 80, return the first one unchanged. Evaluate the third chunk: if it is longer than 80 chars, return the second chunk unchanged, etc. In this case the output string retain original chunks of length 100, except the last one, which is good because of less fragmentation and pointers involved. The downside is that we may cause more lazy IO than strictly necessary.

@vdukhovni @sjakobi @hsyl20 what do you think?

The text was updated successfully, but these errors were encountered:

vdukhovni · 2020-10-18T19:16:48Z

The second strategy is correct. It avoids lots of copying that's likely more expensive than any benefit from forcing an I/O early. With the first strategy you end up copying every buffer in the stream, with the second, all but the last are returned verbatim.

I should also mention that you're assuming that lazy bytestrings have pending lazy I/O operations, but this is not true in general, often they're just pure octet-strings stored as multiple chunks. And, for example, with takeWhileEnd you can't return anything until you've reached the actual end, because there could always be some later octets that don't meed the predicate.

It is perhaps worth noting that takeEnd and dropEnd can be performed in a streaming manner, within a finite memory budget, but dropWhileEnd and takeWhileEnd are not compatible with streaming in general. Though of course with dropWhileEnd, if any octet of a chunk fails the predicate, one can immediately release all prior chunks and also that one if the octet in question is the last one. The worst behaviour of the lot is takeWhileEnd, nothing can be output until the end has been seen. Don't do that with large data streams that involve lazy I/O.

FWIW, none of these operations are available in Streaming.ByteString, but that's not too surprising, since the idea was to build a "better lazy ByteString", with as much of the same API as made sense...

Bodigrim · 2020-10-20T21:39:50Z

Yeah, I'm also in favor of the second strategy.

Bodigrim · 2020-10-25T13:12:23Z

@sjakobi @hsyl20 opinions?

sjakobi · 2020-11-01T10:50:56Z

The second strategy sounds preferable to me. Thanks for explaining the trade-off so clearly!

* Add lazy dropEnd and friends Fixes #306. Utilizes an eager strategy as described in the original issue. Chunks are eagerly evaluated but the overall structure and pointers are kept intact. * Fix `since` version number * Use `foldrChunks` instead of `length` based checks * Add review changes - Pass Int64 to accumulators - Clean up `breakEnd` - Implement lazy version of `dropWhileEnd` - Add dropWhileEnd lazy test * Fix `breakEnd` and `spanEnd` lazyness * Make `dropEnd` lazier * Formatting * Fix lazy `dropEnd` * Add `Deque` module `dropEnd` now uses `Deque` for handling the accumulated bytestrigs * Normalize function names * Return `Maybe (S.ByteString, Deque)` from pops Plus all other review suggestions * Add review changes * Rename `dropElements` to `dropEndBytes` * Add examples + style fixes * Update Data/ByteString/Lazy/Internal/Deque.hs Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com> * Replace `elemLength` with `byteLength` * Upadate examples Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>

* Add lazy dropEnd and friends Fixes haskell#306. Utilizes an eager strategy as described in the original issue. Chunks are eagerly evaluated but the overall structure and pointers are kept intact. * Fix `since` version number * Use `foldrChunks` instead of `length` based checks * Add review changes - Pass Int64 to accumulators - Clean up `breakEnd` - Implement lazy version of `dropWhileEnd` - Add dropWhileEnd lazy test * Fix `breakEnd` and `spanEnd` lazyness * Make `dropEnd` lazier * Formatting * Fix lazy `dropEnd` * Add `Deque` module `dropEnd` now uses `Deque` for handling the accumulated bytestrigs * Normalize function names * Return `Maybe (S.ByteString, Deque)` from pops Plus all other review suggestions * Add review changes * Rename `dropElements` to `dropEndBytes` * Add examples + style fixes * Update Data/ByteString/Lazy/Internal/Deque.hs Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com> * Replace `elemLength` with `byteLength` * Upadate examples Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>

Bodigrim added the discussion/rfc label Oct 18, 2020

Bodigrim added pr-welcome and removed discussion/rfc labels Nov 1, 2020

Bodigrim mentioned this issue Jan 3, 2021

Iron out API discrepancies #289

Open

3kyro mentioned this issue May 26, 2021

Add lazy dropEnd and friends #395

Merged

Bodigrim added this to the 0.11.2.0 milestone Jul 28, 2021

Bodigrim closed this as completed in #395 Jul 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lazy dropEnd and friends #306

Lazy dropEnd and friends #306

Bodigrim commented Oct 18, 2020

vdukhovni commented Oct 18, 2020

Bodigrim commented Oct 20, 2020

Bodigrim commented Oct 25, 2020

sjakobi commented Nov 1, 2020

Lazy dropEnd and friends #306

Lazy dropEnd and friends #306

Comments

Bodigrim commented Oct 18, 2020

vdukhovni commented Oct 18, 2020

Bodigrim commented Oct 20, 2020

Bodigrim commented Oct 25, 2020

sjakobi commented Nov 1, 2020