
Straighten folds and scans. #364

Merged: 15 commits into haskell:master on Aug 19, 2021

Conversation

@kindaro (contributor, author) commented on Feb 20, 2021

Resolve #373.
Resolve #372.

  • foldr'
  • foldr1'
  • scanl1
  • scanr
  • scanr1
  • Systematic strictness checks.

@kindaro (contributor, author) commented on Feb 20, 2021

I am going to keep this pull request updated so that fellow developers may be apprised of my progress. Feel free to comment!

@Bodigrim (contributor) commented:

(Windows test is flaky, do not pay attention to it; I'll rerun)

@kindaro (contributor, author) commented on Feb 20, 2021

@Bodigrim, I noticed something suspicious.

To get a feel for the ballpark my benchmarks should be in, I added a benchmark for the strict and the lazy foldl'. (Both are already in master; I did not touch them.) What I see is that the lazy foldl' runs ten times as fast as the strict one. How is that possible? Maybe my benchmarks are broken?

This is what I see: [benchmark screenshot omitted]

Do you by chance have an explanation?

@Bodigrim (contributor) commented on Feb 20, 2021

> Do you by chance have an explanation?

Yes, it is a known quirk of benchmarks. See #345, #329, #23.

Note that the benchmarks have the form nf (foldl' arg1 arg2) bs. Now compare the definitions of the strict and the lazy foldl':

-- the strict module, Data.ByteString:
foldl' :: (a -> Word8 -> a) -> a -> ByteString -> a
foldl' f v (BS fp len) = ...

-- the lazy module, Data.ByteString.Lazy:
foldl' :: (a -> Word8 -> a) -> a -> ByteString -> a
foldl' f = go

The application of the strict foldl' needs three arguments to become saturated and suitable for inlining. But under nf only two arguments are supplied, so it remains un-inlined, with a corresponding runtime penalty. The lazy foldl', by contrast, needs only one argument to be saturated, so it inlines perfectly even under nf.

I think the real-world consequences are not as severe as the benchmarks suggest: in the majority of client code foldl' has all three arguments (or is expanded by GHC to have them; see the examples with foldl' (+) 0 . id in #23). Still, it would be nice to fix this quirk, but #345 uncovered more issues with inlining (basically, it inlines better without {-# INLINE #-} at all), and I have not had time to experiment with it properly.
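For illustration, here is roughly the shape of the lazy definition (a sketch using the internal Empty and Chunk constructors, not the verbatim library source). Since foldl' f reduces at once to the local worker go, a single argument saturates it, and it can inline even under nf:

{-# LANGUAGE BangPatterns #-}

import Prelude hiding (foldl')
import qualified Data.ByteString as S
import Data.ByteString.Lazy.Internal (ByteString (..))
import Data.Word (Word8)

-- Sketch: saturated after one argument, because `foldl' f`
-- immediately reduces to the worker `go`.
foldl' :: (a -> Word8 -> a) -> a -> ByteString -> a
foldl' f = go
  where
    -- Fold each strict chunk with the strict S.foldl', threading
    -- the accumulator strictly across chunk boundaries.
    go !acc Empty        = acc
    go !acc (Chunk c cs) = go (S.foldl' f acc c) cs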

@kindaro (contributor, author) commented on Feb 23, 2021

Curious, I would never have thought that inlining could give a tenfold performance boost.

@Bodigrim I added some strictness checks, and it looks to me as though the functions I defined behave the way their names suggest. (It is all quite intricate, so a second look would be good.) However, I found out that I do not really know whether it is appropriate to call them right folds. A lazy byte string is essentially a list of lists, and there is no textbook definition of a left or right fold over that sort of thing… Does this look like a right fold to you?

foldr' f a (Chunk c cs) = S.foldr' f (foldr' f a cs) c
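Spelled out with a base case (the Empty equation below is my assumption of the obvious one, not a quote from the patch), the definition would read:

import qualified Data.ByteString as S
import Data.ByteString.Lazy.Internal (ByteString (..))
import Data.Word (Word8)

-- A right fold over a lazy ByteString: fold each strict chunk with
-- the strict S.foldr', seeding it with the fold of the later chunks.
foldr' :: (Word8 -> a -> a) -> a -> ByteString -> a
foldr' f a Empty        = a
foldr' f a (Chunk c cs) = S.foldr' f (foldr' f a cs) c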

I would like to make sure this piece of code is up to our quality standards before moving on to other functions, so your review would be appreciated!

@kindaro (contributor, author) commented on Mar 1, 2021

We have comments like this:

-- | 'scanl' is similar to 'foldl', but returns a list of successive
-- reduced values from the left. This function will fuse.

I suppose it would be good if I could claim something similar for the functions I add, like scanl1. But I do not know how.

How do we know that a given function will fuse? What does this even mean? What are the laws? Where is it documented and checked?

@kindaro changed the title from "Iron out API discrepancies" to "Straighten folds and scans." on Mar 1, 2021
@kindaro (contributor, author) commented on Mar 1, 2021

I think this is a good place to take a break and get things merged. Property checks tell us that the behaviour of the new functions is identical to that of their strict analogues.

@kindaro marked this pull request as ready for review on March 1, 2021
@Bodigrim (contributor) commented on Mar 1, 2021

> How do we know that a given function will fuse? What does this even mean? What are the laws? Where is it documented and checked?

Thanks for noticing this. The comment is 15 years old: the fusion framework in ByteString was replaced by a stub 13 years ago and removed entirely 10 years ago, but the comment has persisted. I'm afraid there is nothing we can call "fusion" in ByteString nowadays. Could you please clean up the surviving instances of this comment?

@kindaro (contributor, author) commented on Mar 1, 2021

> I'm afraid there is nothing we can call "fusion" in ByteString nowadays. Could you please clean up the surviving instances of this comment?

I opened an issue #374 to track this.

@kindaro force-pushed the iron-out-api-discrepancies branch from d7097a4 to b3ac740 on March 1, 2021
@kindaro requested a review from @Bodigrim on March 1, 2021
Review thread on the proposed scanr (diff excerpt):

-> ByteString
-- ^ input of length n
-> ByteString
-- ^ output of length n+1
scanr f z = pack . fmap (foldr f z) . tails
Contributor:

This is a bit unfortunate, because there is no sharing between tails, so you end up folding each tail with f independently, O(n^2) operations in total.
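For intuition (my arithmetic, not from the thread): an input of length n has n+1 tails, of lengths n, n-1, …, 0, and each is folded independently, so the total work is n + (n-1) + … + 0 = n(n+1)/2, i.e. O(n^2), for an output of only n+1 bytes.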

Author:

Yes, I would like to do something about it. I need to meditate on this. Can I maybe reverse the thing and then fold it from the start? It is forced all along anyway.

I am not sure how to think about this sort of thing.

Contributor:

I have not given it much thought, but maybe we can start from a high-level picture? Let's write lazy chunk-wise accumulating maps:

mapAccumLChunks :: (acc -> Strict.ByteString -> (acc, Strict.ByteString)) -> acc -> Lazy.ByteString -> (acc, Lazy.ByteString)

mapAccumRChunks :: (acc -> Strict.ByteString -> (acc, Strict.ByteString)) -> acc -> Lazy.ByteString -> (acc, Lazy.ByteString)

Then reuse them in the definitions of mapAccum{L,R}, and scan{l,r} as well (scans are just a special case of mapAccum).
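For concreteness, a minimal sketch of the left-to-right version (my own illustration of the proposal, using the internal constructors; not code from the PR):

import qualified Data.ByteString as Strict
import Data.ByteString.Lazy.Internal (ByteString (..))

-- Sketch: thread an accumulator through the chunks from the left,
-- rebuilding the lazy spine as we go. The tail stays a thunk, so
-- the transformed stream can be consumed chunk by chunk.
mapAccumLChunks :: (acc -> Strict.ByteString -> (acc, Strict.ByteString))
                -> acc -> ByteString -> (acc, ByteString)
mapAccumLChunks f = go
  where
    go acc Empty        = (acc, Empty)
    go acc (Chunk c cs) =
      let (acc' , c' ) = f acc c
          (acc'', cs') = go acc' cs
      in  (acc'', Chunk c' cs')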

@kindaro (author) commented on Mar 2, 2021:

My understanding is that, since we output a byte string about as long as the input, time and space complexity are bounded below by Ω(n). This means, to my mind, that reversing and then using scanl is as efficient as it gets. Is there anything I am getting wrong? What is the mark I am aiming at?

Also, should I write some benchmarks?

Member:

I think we can do a bit better than reversing the entire input. I think we can do one pass to reverse the order of the chunks and determine the total length. Then do a second pass over the reversed sequence of chunks and write the output.

Benchmarks would be useful to check that we're roughly in the right ballpark.
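For concreteness, a sketch of that first pass (a hypothetical helper, just to make the plan tangible):

{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString as Strict
import Data.ByteString.Lazy.Internal (ByteString (..))

-- Hypothetical first pass: one traversal collects the chunks in
-- reverse order and the total length. No bytes are copied; only
-- the spine is rebuilt.
reverseChunks :: ByteString -> (Int, [Strict.ByteString])
reverseChunks = go 0 []
  where
    go !len acc Empty        = (len, acc)
    go !len acc (Chunk c cs) = go (len + Strict.length c) (c : acc) cs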

Author:

Alright, I get it like this:

  • We agree that linear time and space are lower bounds.
  • We still want to avoid some expensive operations, like reversing all chunks.

Sounds about right?

Member:

Yep, sounds right to me! :)

@kindaro (contributor, author) commented on Mar 5, 2021

I ran some benchmarks and performance is orders of magnitude behind expectations. I am now going to follow the advice above and write everything in terms of mapAccumLChunks and mapAccumRChunks. It will take a few days.

@kindaro (contributor, author) commented on Apr 18, 2021

How about this? It turns out we can use standard recursion schemes. I am not sure whether there would be a performance drop on older compilers, but I did not notice any in my usual setup.
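For instance (a sketch of the idea under discussion, not necessarily the exact code in the commit), a left scan falls out of mapAccumL by emitting the running accumulator at each byte and appending the final one:

import Prelude hiding (scanl)
import Data.Word (Word8)
import qualified Data.ByteString.Lazy as L

-- scanl as a special case of mapAccumL: the output at each position
-- is the accumulator so far; the final accumulator is appended last.
scanl :: (Word8 -> Word8 -> Word8) -> Word8 -> L.ByteString -> L.ByteString
scanl f z bs = L.snoc body final
  where
    (final, body) = L.mapAccumL (\acc x -> (f acc x, acc)) z bs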

@kindaro marked this pull request as a draft on August 15, 2021
@kindaro (contributor, author) commented on Aug 15, 2021

Sorry I disappeared.

I had been trying to add checks for other functions, and it was hard. So I took a month-long creative break. It was fun, and now I am back with new ideas. This is the plan:

  1. Property check the evaluation of strict subjects against strict oracles, and the other way around (lazy against lazy).

    This should pass, but it could also pass because the check is not fine-grained enough, or because the functions are actually the same.

  2. Property check the evaluation of strict subjects against lazy oracles, and the other way around.

    Here we expect failure. If these checks fail, we know that they are fine-grained enough to detect the difference in laziness, and that the difference actually is there.

So our checks are going to check themselves, and also verify that our definitions of functions with different strictness are truly different.
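As a sketch of the kind of check involved, here is a laziness property like the one that later lands in the test suite (the definition of explosiveTail is my assumption):

import Data.Word (Word8)
import qualified Data.ByteString.Lazy as L
import Test.QuickCheck

-- Hypothetical helper: a byte stream whose tail explodes when forced.
explosiveTail :: L.ByteString -> L.ByteString
explosiveTail = (<> error "explosion")

-- A lazy subject against a pure oracle, on an input with an explosive
-- tail: a lazy scanl never forces the tail, while a strict one would
-- fail here, which is exactly the discriminating power the plan asks for.
prop_scanlIsLazy :: [Word8] -> Property
prop_scanlIsLazy ws =
  let xs = L.pack ws
  in  L.take (L.length xs + 1)
             (L.scanl (+) 0 (explosiveTail (xs <> L.singleton 1)))
        === (L.pack . fmap (L.foldr (+) 0) . L.inits) xs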

@sjakobi (member) commented on Aug 16, 2021

@kindaro I'm kind of wary of large, long-running PRs and shifting goal-posts. PRs like this tend to attract merge conflicts and ultimately consume a disproportionate amount of developers' and reviewers' resources.

Are there parts in this PR that could be merged already? Even if some parts are not "fully" ready, we could merge them and possibly consider not exposing them in the next release if we think they might not be safe for users yet.

@kindaro (contributor, author) commented on Aug 16, 2021

Yes, we can throw out the automated strictness checks and trust our own judgement as to whether the functions in question have the right strictness properties. Then we can merge tomorrow.

However, the way I see it, the whole point of automated strictness checks is to reduce the involvement of reviewers. So optimizing for merging sooner and optimizing for spending fewer reviewers' resources are actually in conflict. (I am fine with spending more developers' resources, since I am the only developer and I optimize for quality.)

We can also merge the library code tomorrow, then merge the checks when they are ready, and quickly fix the library code if any faults turn up.

Whatever the maintainers say, I will do.

@Bodigrim (contributor) commented:

I'm happy to spend additional reviewers' resources here whenever you mark this PR as ready for review. While I deeply appreciate your efforts on automatic strictness checks, it feels like their volume and dependency footprint warrant a separate PR (and potentially a separate package).

@kindaro marked this pull request as ready for review on August 18, 2021
@kindaro (contributor, author) commented on Aug 18, 2021

Yo, I took away the strictness checks and responded to the pending comments from previous reviews.

@Bodigrim (contributor) left a review:

Thanks! A couple of minor suggestions, but it looks good overall.

Review thread on the new laziness tests (excerpt):

L.take (L.length xs + 1) (L.scanl (+) 0 (explosiveTail (xs <> L.singleton 1))) === (L.pack . fmap (L.foldr (+) 0) . L.inits) xs
, testProperty "scanl1 is lazy" $ \ xs -> L.length xs > 0 ==>
L.take (L.length xs) (L.scanl1 (+) (explosiveTail (xs <> L.singleton 1))) === (L.pack . fmap (L.foldr1 (+)) . tail . L.inits) xs
]
Contributor:

I was wondering why scanr is less lazy than scanl. The thing is that its output starts from the accumulator, and Data.ByteString.mapAccumR is too strict in this respect:

go src dst = mapAccumR_ acc (len-1)
  where
    mapAccumR_ !s (-1) = return s
    mapAccumR_ !s !n = do
      x <- peekByteOff src n
      let (s', y) = f s x
      pokeByteOff dst n y
      mapAccumR_ s' (n-1)
acc' <- unsafeWithForeignPtr gp (go a)
return (acc', BS gp len)

I think this is fine: there are no particular expectations about strictness of scanr (there is no scanr' in Prelude).

@kindaro (author) commented on Aug 19, 2021:

Not sure I follow. Is there a specific proposition you are reasoning towards? Or a question I may answer? For example:

Proposition. Data.ByteString.Lazy.scanr cannot be lazy.

Proof.

As you noted, the output of a scanr starts from the end, so this is the sort of laziness we can have:

λ take 2 . reverse $ Prelude.scanr (+) 0 [undefined, 1]
[0,1]

So, first the spine of the input list is evaluated to the end, then elements are evaluated from the end backwards. (Whether the accumulator is evaluated before or after the first element depends on the order of evaluation of +.) Similarly, the byte stream's spine would have to be evaluated first. But the spine of the byte stream is strict in the leaf:

-- | A space-efficient representation of a 'Word8' vector, supporting many
-- efficient operations.
--
-- A lazy 'ByteString' contains 8-bit bytes, or by using the operations
-- from "Data.ByteString.Lazy.Char8" it can be interpreted as containing
-- 8-bit characters.
--
data ByteString = Empty | Chunk {-# UNPACK #-} !S.ByteString ByteString
deriving (Typeable, TH.Lift)
-- See 'invariant' function later in this module for internal invariants.

The leaf itself is a byte array and therefore also strict throughout. So, once we force the spine, every byte is also forced. There is no lazy scanr for byte streams. ∎

Something like this?

Contributor:

This was just a remark for myself and @sjakobi and anyone else who is puzzled why we have laziness properties for scanl, but not for scanr.

It's not like you cannot make Data.ByteString.Lazy.scanr a bit lazier. E.g., for the proposed implementation:

> Data.ByteString.Lazy.head $ Data.ByteString.Lazy.scanr const 42 ("foo" <> undefined)
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
  undefined, called at <interactive>:11:75 in interactive:Ghci1

However, if we are ready to sacrifice performance, one can define

scanr f z bs = cons hd tl
  where
    (_, tl) = mapAccumR (\x y -> (f y x, x)) z bs
    (hd, _) = List.mapAccumR (\x y -> (f y x, x)) z (unpack bs)

for which

> Data.ByteString.Lazy.head $ Data.ByteString.Lazy.scanr const 42 ("foo" <> undefined)
102

You can define an even lazier (and slower) version, capable of returning the first few chunks of the byte string, as long as f is very lazy (e.g., f = const).

My point is that this is a rare use case, which does not justify performance sacrifices, especially given that there is no general expectation of how lazy scanr should be. I'm fine with your implementation; no action required.

@Bodigrim requested a review from @sjakobi on August 18, 2021
@sjakobi (member) left a review:

I'm happy to see that this PR is close to being merged now! :)

Commit: Throw away `mapAccum[LR]Chunks`. Turns out we do not really need it: we thought we needed it to implement `scan[lr]`, but `mapAccum[LR]` is enough.
@sjakobi (member) left a review:

Thank you, @kindaro! :)

@Bodigrim added this to the 0.11.2.0 milestone on Aug 19, 2021
@Bodigrim merged commit 99b7ff6 into haskell:master on Aug 19, 2021
@Bodigrim (contributor) commented:

Great stuff, @kindaro! Thanks!

Bodigrim pushed a commit to Bodigrim/bytestring that referenced this pull request Aug 19, 2021
* Add strict right folds.

* Add property checks.

* Add benchmarks.

* Inline strictness checks.

* Straighten scans.

* Fix whitespace.

* Use `===` for equality.

* Use infix operator for brevity.

* Add bench marks for lazy scans.

* Use standard recursion schemes.

* Dodge import conflicts on older GHC versions.

* Final considerations according to the last review.

* Final considerations according to one more last review.

* Add bench mark for lazy accumulating maps.

* Throw away `mapAccum[LR]Chunks`.

Turns out we do not really need it. We thought we needed it to implement `scan[lr]`, but actually `mapAccum[LR]` is enough.