Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lazy dropEnd and friends #395

Merged
merged 17 commits into from
Jul 28, 2021
Merged

Add lazy dropEnd and friends #395

merged 17 commits into from
Jul 28, 2021

Conversation

3kyro
Copy link
Contributor

@3kyro 3kyro commented May 26, 2021

Fixes #306.
Utilizes an eager strategy as described in the original issue. Chunks
are eagerly evaluated but the overall structure and pointers are kept
intact.

Fixes #306.
Utilizes an eager strategy as described in the original issue. Chunks
are eagerly evaluated but the overall structure and pointers are kept
intact.
Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have a couple of suggestions.

Data/ByteString/Lazy.hs Outdated Show resolved Hide resolved
Data/ByteString/Lazy.hs Outdated Show resolved Hide resolved
Data/ByteString/Lazy.hs Outdated Show resolved Hide resolved
@3kyro
Copy link
Contributor Author

3kyro commented May 27, 2021

Thanks for the help!

I used foldrChunks instead of length based checks, but I still have some concerns/questions: Will the tuple based folds work when used in a streaming manner, as discussed in a comment in the original issue? e.g. would take x . dropEnd y operate in constant space?

Would explicitly implementing the folds help? eg;

dropEnd i cs0 = cs 
  where 
    (_,cs) = go i cs0
    go :: Int64 -> ByteString -> (Int64, ByteString)
    -- n keeps track of dropped bytes    
    go n cs = ...

Hope this makes sense 😄

@Bodigrim
Copy link
Contributor

Sorry, @3kyro, I'm on the go and won't have time to review until two weeks from now.

Will the tuple based folds work when used in a streaming manner, as discussed in a comment in the original issue?

Good question. Could you possibly check it in REPL with something like Chunk "foo" (Chunk "bar" undefined)? Bonus points for adding tests, you can find some inspiration here.

@3kyro
Copy link
Contributor Author

3kyro commented May 28, 2021

No problem. I'll check lazy behavior and add some checks as well as fixing ghc-8,2 CI failures.

@3kyro
Copy link
Contributor Author

3kyro commented Jun 7, 2021

Hi @Bodigrim, I'm not really sure I can create a version of these end manipulation functions that is sufficiently lazy. Based on the discussion in #306, we evaluate all chunks when doing (take x . dropEnd y) cs. This seems necessary as cs needs to be fully evaluated to know if there's anything left to take.

Regarding style, I'm not sure if you'd prefer a more explicit implementation, similar to other functions in the module (drop, take etc). dropEnd would be something like:

dropEnd :: Int64 -> ByteString -> ByteString
dropEnd i p | i <= 0 = p
dropEnd i cs0 = cs0' 
  where (_ , cs0') = dropEnd' (fromIntegral i) cs0
        dropEnd' n cs | n <= 0 = (0, cs)
        dropEnd' n Empty = (n, Empty)
        dropEnd' n (Chunk c Empty) = 
            (n - S.length c, S.fromStrict $ S.dropEnd n c)
        dropEnd' n (Chunk c cs) = 
            let (n',cs') = dropEnd' n cs
              in (n' - S.length c, S.fromStrict (S.dropEnd n' c) `append` cs')

Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, @3kyro, I got distracted by other things for longer than planned. Hope it was not too long.

Data/ByteString/Lazy.hs Outdated Show resolved Hide resolved
Data/ByteString/Lazy.hs Outdated Show resolved Hide resolved
Data/ByteString/Lazy.hs Outdated Show resolved Hide resolved
- Pass Int64 to accumulators
- Clean up `breakEnd`
- Implement lazy version of `dropWhileEnd`
- Add dropWhileEnd lazy test
Data/ByteString/Lazy.hs Outdated Show resolved Hide resolved
@Bodigrim
Copy link
Contributor

Bodigrim commented Jul 1, 2021

@3kyro I'm afraid this is not quite correct, as you can see from CI. Try

cabal test --test-options '-p dropEnd --quickcheck-tests 1000'

@3kyro
Copy link
Contributor Author

3kyro commented Jul 14, 2021

Hi @Bodigrim, sorry for the delay but I was away for some days.

I have corrected the behavior of dropEnd, trying not to reverse chunk lists too much. I believe I'm still within O(n) but I would really appreciate your feedback. Having a FIFO container would make the code much simpler, but I guess importing containers is out of the question.

@sjakobi
Copy link
Member

sjakobi commented Jul 14, 2021

Having a FIFO container would make the code much simpler, but I guess importing containers is out of the question.

AFAIK adding a dependency on containers (or maybe alternatively array) would probably be okay. We'd have to check how it affects GHC build times though.

@Bodigrim
Copy link
Contributor

I think this PR does not justify an additional dependency, even containers. We do not really need Data.Sequence here, only a simple stack, which can be implemented in a small utility module (as two lists). But I guess reversing is not prohibitively expensive.

Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@3kyro, nice! I think we are almost there, please take a look at suggestions.

dropEnd :: Int64 -> ByteString -> ByteString
dropEnd i p | i <= 0 = p
dropEnd i p = go [] [] 0 0 p
where go hss tss acc h (Chunk c cs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a comment, explaining meaning of arguments? AFAIU hss and tss model a bidirectional queue (aka deque), acc tracks total length of hss + tss, and h tracks the length of head hss.

Computing length of head hss is O(1) operation, so I think we can simplify a bit by not passing h argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the need for the head of hss as its length is already included in the accumulated length. Furthermore, in a Deque, calculating the length of itshead might trigger a re-balancing which I think would not be ideal (e.g. it would always re balance on the second chunk). This seems a bit cleaner, although we might trigger getOutput a bit more often than before.

len c = fromIntegral (S.length c)
hLen cs = maybe 0 len (listToMaybe cs)

getOutput out [] [] acc = (out, [], [], acc)
Copy link
Contributor

@Bodigrim Bodigrim Jul 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, it would be nice to document meaning of arguments and return values.

What I can suggest is creating data Deque = Deque { hss :: [S.ByteString], tss :: [S.ByteString], acc :: Int64 } (bonus points for better fields names). Then go becomes go :: Deque -> ByteString -> ByteString, and getOutput :: [S.ByteString] -> Deque -> ([S.ByteString], Deque). This reduces cognitive load for a reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe now with the use of the Deque, arguments are quite easy to follow. I'd happily add comments though if necessary

getOutput out [] [] acc = (out, [], [], acc)
getOutput out [] bss acc = getOutput out (L.reverse bss) [] acc
getOutput out (x:xs) bss acc =
if len x <= acc - i - len x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks suspicious, there are len x on both sides. Wouldn't len x <= acc - i be enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, its redundant

@3kyro
Copy link
Contributor Author

3kyro commented Jul 19, 2021

Hi @Bodigrim, I've added an unexposed Data.Bytestring.Lazy.Internal.Deque module and modified dropEnd accordingly. I have not prepared tests for Deque's functions as the module only tests the exposed API.

`dropEnd` now uses `Deque` for handling the accumulated bytestrigs
Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

-- O(1) , occasionally O(n)
popFront :: Deque -> (Maybe S.ByteString, Deque)
popFront (Deque [] [] _) = (Nothing, empty)
popFront (Deque [] rs acc) = popFront (Deque (reverse rs) [] acc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such pattern is appealing, but GHC is likely to conclude that there is an unbounded general recursion going on and will resist to inline and optimize and such. I'd rewrite it as

popFront (Deque [] rs acc) = case reverse rs of 
  [] -> ... 
  x : xs -> ...

-- Pop a `S.ByteString` from the front of the `Deque`
-- Returns the bytestring, or Nothing if the Deque is empty, and the updated Deque
-- O(1) , occasionally O(n)
popFront :: Deque -> (Maybe S.ByteString, Deque)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
popFront :: Deque -> (Maybe S.ByteString, Deque)
popFront :: Deque -> Maybe (S.ByteString, Deque)

This is a bit more precise: if it's Nothing, there is little point to return a "new" Deque.

else (out, deque)

-- drop n elements from the rear of the accumulating `deque`
dropElements deque n
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add type signatures for go, getOutput, dropElements and fromDeque?

case D.popFront deque of
(Nothing, deque') -> (out, deque')
(Just x, deque') ->
if len x <= D.elemLength deque' - i
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deque' is deque without x, so I'd expect just if D.elemLength deque' >= i then ...


-- drop n elements from the rear of the accumulating `deque`
dropElements deque n
| D.null deque = Empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you do not need to handle this case specially: if deque is empty, then D.popRear returns Nothing and Empty naturally arises below.

@Bodigrim Bodigrim requested a review from sjakobi July 19, 2021 20:55
Plus all other review suggestions
@3kyro
Copy link
Contributor Author

3kyro commented Jul 20, 2021

Latest review changes added

bytestring.cabal Outdated Show resolved Hide resolved
Data/ByteString/Lazy.hs Outdated Show resolved Hide resolved
Data/ByteString/Lazy.hs Show resolved Hide resolved
Data/ByteString/Lazy/Internal/Deque.hs Show resolved Hide resolved
Data/ByteString/Lazy/Internal/Deque.hs Outdated Show resolved Hide resolved
Data/ByteString/Lazy/Internal/Deque.hs Outdated Show resolved Hide resolved

-- get all `S.ByteString` from the front of the accumulating deque
-- for which we know they won't be dropped
getOutput :: [S.ByteString] -> D.Deque -> ([S.ByteString], D.Deque)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of operating on lists of strict bytestrings, it seems convenient to simply use the lazy bytestring type. Is there a reason not to do that here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, seems nicer. The only problem is that there is no lazy left fold for chunks, and so we need to first reverse chunks and fold from the right. Might be even better to pay traveral than have the space leaked by the foldl?

| D.elemLength deque < i = go (D.snoc c deque) cs
| otherwise =
let (output, deque') = getOutput [] (D.snoc c deque)
in L.foldl (flip chunk) (go deque' cs) output
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for using a lazy foldl here instead of foldl'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lazy fold here is necessary so that we don't force go deque cs`. However this has now been changed (see above review comment)

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress. I'll probably be AFK until Saturday. I can review more then.

Data/ByteString/Lazy.hs Show resolved Hide resolved
@3kyro
Copy link
Contributor Author

3kyro commented Jul 21, 2021

Good progress. I'll probably be AFK until Saturday. I can review more then.

Hi, I'm sorry, I started replying before pushing the PR. Thanks for the review

edit: I've rebased as my previous commits were not clean. There is an additional change included, I'm returning a Deque from dropEndBytes instead of directly returning a ByteString as I believe makes the intent a bit clearer.

Sorry for the confusion

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments.

Since the implementation has turned out to be so complex, I'm relying on the tests to catch any bugs.

Data/ByteString/Lazy.hs Outdated Show resolved Hide resolved
Data/ByteString/Lazy.hs Outdated Show resolved Hide resolved
Comment on lines +806 to +809
-- | Returns the longest (possibly empty) suffix of elements
-- satisfying the predicate.
--
-- @'takeWhileEnd' p@ is equivalent to @'reverse' . 'takeWhile' p . 'reverse'@.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example would be nice to have here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used Chunk (pack [1,2]) (Chunk (pack [3,4,6])) Empty as an example of a lazy bytestring. Hope it's not too verbose, but textual representation of a bytestring is always tricky.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that would be helpful for users who will mostly not be aware of ByteString's internal constructors. If you want to represent the bytes as numbers, you can use the OverloadedLists syntax, e.g. [1,2,3,4,6].

Data/ByteString/Lazy.hs Show resolved Hide resolved
Data/ByteString/Lazy/Internal/Deque.hs Outdated Show resolved Hide resolved
3kyro and others added 2 commits July 27, 2021 15:32
Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>
Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more wibble – LGTM apart from that. :)

Comment on lines +806 to +809
-- | Returns the longest (possibly empty) suffix of elements
-- satisfying the predicate.
--
-- @'takeWhileEnd' p@ is equivalent to @'reverse' . 'takeWhile' p . 'reverse'@.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that would be helpful for users who will mostly not be aware of ByteString's internal constructors. If you want to represent the bytes as numbers, you can use the OverloadedLists syntax, e.g. [1,2,3,4,6].

@Bodigrim
Copy link
Contributor

@3kyro please check CI failure.

@3kyro
Copy link
Contributor Author

3kyro commented Jul 27, 2021

One more wibble – LGTM apart from that. :)

Hi @sjakobi, I've updated the examples using OverloadedList, however I still believe that we could have mentioned Chunk and Empty; for reference, "learn yourself haskell for real good" which is a pretty well known introduction to haskell explicitly mentions the lazy bytestrings constructors, eg in input and output.

Thanks again for the review :)

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job!

@Bodigrim Bodigrim added this to the 0.11.2.0 milestone Jul 28, 2021
@Bodigrim Bodigrim merged commit bb7540a into haskell:master Jul 28, 2021
@Bodigrim
Copy link
Contributor

Thanks @3kyro!

Bodigrim pushed a commit that referenced this pull request Aug 1, 2021
* Add lazy dropEnd and friends

Fixes #306.
Utilizes an eager strategy as described in the original issue. Chunks
are eagerly evaluated but the overall structure and pointers are kept
intact.

* Fix `since` version number

* Use `foldrChunks` instead of `length` based checks

* Add review changes

- Pass Int64 to accumulators
- Clean up `breakEnd`
- Implement lazy version of `dropWhileEnd`
- Add dropWhileEnd lazy test

* Fix `breakEnd` and `spanEnd` lazyness

* Make `dropEnd` lazier

* Formatting

* Fix lazy `dropEnd`

* Add `Deque` module

`dropEnd` now uses `Deque` for handling the accumulated bytestrigs

* Normalize function names

* Return `Maybe (S.ByteString, Deque)` from pops

Plus all other review suggestions

* Add review changes

* Rename `dropElements` to `dropEndBytes`

* Add examples + style fixes

* Update Data/ByteString/Lazy/Internal/Deque.hs

Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>

* Replace `elemLength` with `byteLength`

* Upadate examples

Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>
noughtmare pushed a commit to noughtmare/bytestring that referenced this pull request Dec 12, 2021
* Add lazy dropEnd and friends

Fixes haskell#306.
Utilizes an eager strategy as described in the original issue. Chunks
are eagerly evaluated but the overall structure and pointers are kept
intact.

* Fix `since` version number

* Use `foldrChunks` instead of `length` based checks

* Add review changes

- Pass Int64 to accumulators
- Clean up `breakEnd`
- Implement lazy version of `dropWhileEnd`
- Add dropWhileEnd lazy test

* Fix `breakEnd` and `spanEnd` lazyness

* Make `dropEnd` lazier

* Formatting

* Fix lazy `dropEnd`

* Add `Deque` module

`dropEnd` now uses `Deque` for handling the accumulated bytestrigs

* Normalize function names

* Return `Maybe (S.ByteString, Deque)` from pops

Plus all other review suggestions

* Add review changes

* Rename `dropElements` to `dropEndBytes`

* Add examples + style fixes

* Update Data/ByteString/Lazy/Internal/Deque.hs

Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>

* Replace `elemLength` with `byteLength`

* Upadate examples

Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lazy dropEnd and friends
3 participants