-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lazy dropEnd and friends #306
Comments
The second strategy is correct. It avoids lots of copying that's likely more expensive than any benefit from forcing an I/O early. With the first strategy you end up copying every buffer in the stream, with the second, all but the last are returned verbatim. I should also mention that you're assuming that lazy bytestrings have pending lazy I/O operations, but this is not true in general, often they're just pure octet-strings stored as multiple chunks. And, for example, with takeWhileEnd you can't return anything until you've reached the actual end, because there could always be some later octets that don't meed the predicate. It is perhaps worth noting that FWIW, none of these operations are available in |
Yeah, I'm also in favor of the second strategy. |
The second strategy sounds preferable to me. Thanks for explaining the trade-off so clearly! |
* Add lazy dropEnd and friends Fixes #306. Utilizes an eager strategy as described in the original issue. Chunks are eagerly evaluated but the overall structure and pointers are kept intact. * Fix `since` version number * Use `foldrChunks` instead of `length` based checks * Add review changes - Pass Int64 to accumulators - Clean up `breakEnd` - Implement lazy version of `dropWhileEnd` - Add dropWhileEnd lazy test * Fix `breakEnd` and `spanEnd` lazyness * Make `dropEnd` lazier * Formatting * Fix lazy `dropEnd` * Add `Deque` module `dropEnd` now uses `Deque` for handling the accumulated bytestrigs * Normalize function names * Return `Maybe (S.ByteString, Deque)` from pops Plus all other review suggestions * Add review changes * Rename `dropElements` to `dropEndBytes` * Add examples + style fixes * Update Data/ByteString/Lazy/Internal/Deque.hs Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com> * Replace `elemLength` with `byteLength` * Upadate examples Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>
* Add lazy dropEnd and friends Fixes #306. Utilizes an eager strategy as described in the original issue. Chunks are eagerly evaluated but the overall structure and pointers are kept intact. * Fix `since` version number * Use `foldrChunks` instead of `length` based checks * Add review changes - Pass Int64 to accumulators - Clean up `breakEnd` - Implement lazy version of `dropWhileEnd` - Add dropWhileEnd lazy test * Fix `breakEnd` and `spanEnd` lazyness * Make `dropEnd` lazier * Formatting * Fix lazy `dropEnd` * Add `Deque` module `dropEnd` now uses `Deque` for handling the accumulated bytestrigs * Normalize function names * Return `Maybe (S.ByteString, Deque)` from pops Plus all other review suggestions * Add review changes * Rename `dropElements` to `dropEndBytes` * Add examples + style fixes * Update Data/ByteString/Lazy/Internal/Deque.hs Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com> * Replace `elemLength` with `byteLength` * Upadate examples Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>
* Add lazy dropEnd and friends Fixes haskell#306. Utilizes an eager strategy as described in the original issue. Chunks are eagerly evaluated but the overall structure and pointers are kept intact. * Fix `since` version number * Use `foldrChunks` instead of `length` based checks * Add review changes - Pass Int64 to accumulators - Clean up `breakEnd` - Implement lazy version of `dropWhileEnd` - Add dropWhileEnd lazy test * Fix `breakEnd` and `spanEnd` lazyness * Make `dropEnd` lazier * Formatting * Fix lazy `dropEnd` * Add `Deque` module `dropEnd` now uses `Deque` for handling the accumulated bytestrigs * Normalize function names * Return `Maybe (S.ByteString, Deque)` from pops Plus all other review suggestions * Add review changes * Rename `dropElements` to `dropEndBytes` * Add examples + style fixes * Update Data/ByteString/Lazy/Internal/Deque.hs Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com> * Replace `elemLength` with `byteLength` * Upadate examples Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>
There are no
dropEnd
,takeEnd
,dropWhileEnd
,takeWhileEnd
,spanEnd
andbreakEnd
operating on lazy bytestrings. However, it is not clear for me which degree of laziness is desired here.Imagine a lazy bytestring of several chunks of length 100, and
dropEnd 80
.One strategy is as follows: we look at the first chunk only, without forcing the rest of the string. It has 100 chars, so we can return a chunk containing the first 20 chars immediately, and put remaning 80 in a buffer. Then we force the tail and observe that the next chunk is also 100-chars-long, so we can flush the buffer, and also a chunk of 20 chars, etc. The benefit is that we operate as lazy as possible and do not trigger hidden effects (lazy IO) unnecessary. But the cost is that the result has twice as many chunks as the input: 20 chars, 80 chars, 20 chars, 80 chars...
Another strategy is more eager. Evaluate first two chunks immediately: if the length of the second is > 80, return the first one unchanged. Evaluate the third chunk: if it is longer than 80 chars, return the second chunk unchanged, etc. In this case the output string retain original chunks of length 100, except the last one, which is good because of less fragmentation and pointers involved. The downside is that we may cause more lazy IO than strictly necessary.
@vdukhovni @sjakobi @hsyl20 what do you think?
The text was updated successfully, but these errors were encountered: