Conversation

@treeowl (Contributor) commented Apr 6, 2020

Make (<*) for Data.Sequence incrementally asymptotically optimal.
This finally completes the task, begun in December 2014, of making all
the Applicative methods for sequences asymptotically optimal
even when their results are consumed incrementally.

Closes #314

@treeowl (Contributor, Author) commented Apr 6, 2020

@int-e, @oisdk, and @sjakobi: I'd really appreciate a review of this one. I wrote it, but can't claim to understand it all properly. Some of the names are bloody awful, and I find it really hard to keep track of what things go on the left and what things go on the right. I haven't written nearly enough comments, and that's largely because any I write are likely to be at least partially wrong.

@treeowl treeowl force-pushed the seq-before branch 2 times, most recently from 07f0ec1 to 8af84b4 Compare April 6, 2020 08:47
@treeowl treeowl requested review from oisdk, dmwit and sjakobi April 6, 2020 08:54
@sjakobi (Member) commented Apr 6, 2020

Impressive!

Have you benchmarked whether this is in fact faster in practice?

BTW I'm wondering under what circumstances you'd even use <* on Seq. Any ideas?

@jwaldmann (Contributor) commented Apr 6, 2020

> BTW I'm wondering under what circumstances you'd even use <* on Seq. Any ideas?

I was wondering the same thing. Perhaps it appears in the result of an Applicative-Do transform?

I am not seeing it in the examples https://gitlab.haskell.org/ghc/ghc/-/wikis/applicative-do but perhaps there are further optimisations.

[EDIT] Section 8 of: https://doi.org/10.1145/2976002.2976007 https://simonmar.github.io/bib/papers/applicativedo.pdf says "Our desugaring does not currently exploit the *> or <* combinators that Applicative provides, and in certain cases using these operators instead of <*> can result in performance benefits."
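For concreteness, here is what `<*` computes on sequences — a small illustration of mine, not code from the PR (`example` is a name I made up):

```haskell
import Data.Sequence (Seq, fromList)

-- xs <* ys keeps each element of xs, repeated (length ys) times, in
-- order; the elements of ys themselves are discarded.
example :: Seq Char
example = fromList "ab" <* fromList [1, 2, 3 :: Int]
-- example == fromList "aaabbb"
```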

@treeowl (Contributor, Author) commented Apr 6, 2020

> Impressive!
>
> Have you benchmarked whether this is in fact faster in practice?
>
> BTW I'm wondering under what circumstances you'd even use <* on Seq. Any ideas?

No formal benchmarks, but informal tests are mighty convincing. The allocation and residency drops are enormous. I have no idea why you'd want this operation, sadly.

```haskell
--
-- '<*' takes time and space proportional to the product of the length
-- of its first argument and the logarithm of the length of its second
-- argument.
```
Review comment (Contributor):

Huh. Why do the asymptotics for <* and *> differ? Is that really right?

Reply (Contributor, Author):

Yes, but it needs more explanation. *> gets to reuse the original structure of the second sequence quite heavily, so that's correct for additional allocation and residency, but doesn't count the fact that it will retain large portions of the original structure of the second sequence as well. Please feel free to suggest a better blurb. For <*, we need to build an all new copy of a structure like the second sequence for every element of the first sequence.

Reply (Contributor, Author):

For *>, once we've turned the second sequence inside out (in time logarithmic in its length), we just stick that in a bunch of places unchanged. We can't get that kind of repetition in <*.

@sjakobi (Member) left a comment:

I admittedly haven't looked very closely at the code, but since the property test indicates that it's correct and you're happy with the performance, I don't see a reason not to merge this.

Once we know of a use case, we can still invest more effort.

We could also consider exporting beforeSeq' as e.g. replicateEach :: Int -> Seq a -> Seq a if that would be useful to anybody.

@treeowl (Contributor, Author) commented Apr 6, 2020

@sjakobi, I'm really looking for ideas to make the code more readable/understandable more than performance thoughts. I've thrown much more and larger QuickCheck runs at this on my computer than we have Travis do for us, so I'm pretty well convinced it's correct.

@sjakobi (Member) commented Apr 6, 2020

Thanks for the clarification @treeowl! I'm currently trying to focus on other projects, but I'll try to do an in-depth review if I find myself idle!

@treeowl (Contributor, Author) commented Apr 6, 2020

@Lysxia, you might find this interesting.

@treeowl (Contributor, Author) commented Apr 7, 2020

One way to go from here is to integrate the rreplicate machinery into beforeSeq' and raptyMiddle, fusing away the explicit intermediate spine representation and getting rid of most or all of the extra datatypes. That should improve immediate-indexing performance slightly (which doesn't matter much). What's less obvious to me is whether it will make the whole implementation easier to understand or even harder. It'll probably get shorter, but that only goes so far.

Make `(<*)` for `Data.Sequence` incrementally asymptotically optimal.
This finally completes the task, begun in December 2014, of making all
the `Applicative` methods for sequences asymptotically optimal
even when their results are consumed incrementally.
@treeowl (Contributor, Author) commented Apr 7, 2020

I added a commit that removes the intermediate structure. I'm not so sure which version is less unclear. Do you have an opinion?

@Lysxia (Contributor) commented Apr 7, 2020

I like the new version better.

One idea to improve the documentation is to write down the invariants of the various helpers. Once they are known, reviewing the code becomes pretty easy. It's good to remove intermediate structures as you did to make the code more compact, because there are fewer invariants to write.

@treeowl (Contributor, Author) commented Apr 7, 2020

@Lysxia, yes, that would be quite helpful.... I'll just need to figure out what the invariants are. That'll have the side benefit of helping work out whether there's a simpler or more efficient way to manage all the size arithmetic. Would you like to take a crack at it?

@Lysxia (Contributor) commented Apr 7, 2020

I can give it a try this weekend.

Remove the explicit intermediate spine structure from `(<*)`.
I don't really know if this is a good or a bad thing from a
clarity standpoint.
@treeowl (Contributor, Author) commented Apr 8, 2020

Many thanks, @Lysxia. Suggestions for better function and variable names will certainly be appreciated as well. Also, the way the nine arguments to raptyMiddle are currently bundled and arranged seems pretty ... arbitrary. Let me know if you have a better idea for that.

@int-e (Contributor) commented Apr 11, 2020

I think I mostly understand what's going on now... but those names :-/

For starters I'd suggest replicateEach as a name for beforeSeq'.

While trying to wrap my head around this problem (and code), I have implemented a version based on rigidify, but less aggressively optimized; see the gist. Some of the optimization should be recovered by inlining... but I have not done any benchmarks. Even if it's not competitive, the code may be a source for names.

@treeowl (Contributor, Author) commented Apr 11, 2020

@int-e, have you looked at the PR @Lysxia opened against my branch a couple hours ago? Some of the names get much less bad there, and there's some documentation!

```haskell
beforeSeq :: Seq a -> Seq b -> Seq a
beforeSeq xs ys = beforeSeq' xs (length ys)

beforeSeq' :: Seq a -> Int -> Seq a
```
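As a reading aid, here is a naive specification the helper must satisfy — a sketch of mine, usable as a QuickCheck oracle (`beforeSeqSpec` is a name I made up, not in the PR):

```haskell
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Naive reference semantics for beforeSeq': every element of xs is
-- repeated n times, in order. The optimized version must agree with
-- this while building its result incrementally.
beforeSeqSpec :: Seq a -> Int -> Seq a
beforeSeqSpec xs n = xs >>= Seq.replicate n
```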
Review comment (Contributor):

It would probably be a good idea to comment on the basic ideas somewhere around here... from a 2-3 tree point of view, the result is obtained by fixing a 2-3 tree shape of size lenys >= 2 (which can be constructed in logarithmic time and space) that is then planted at every leaf of xs.

The shape is defined recursively in a bottom-up fashion; at each level there is an initial node, a middle node that has to be replicated a number of times, and a final node: (l, n × m, r). For n = 0 or n = 1, a single node is enough to accomplish this; for n = 2, two nodes are required. In all other cases, we combine l with one or two m's into a single node and, symmetrically, one or two m's with r for the last node, such that the remaining number of m's (between n-4 and n-2) is divisible by 3. For the middle part, we can then combine the m's in groups of 3 and recurse.

The code is complicated by the fact that the first and last elements of the initial list are put on the fringes of the finger tree while the shape is being fixed. This involves quite a bit of bookkeeping (partly captured by RCountMid above).

Reply (Contributor, Author):

Yes, an explanation of the basic ideas is indeed in order. I don't yet understand the second paragraph in your comment above: specifically what you mean about "a middle node that has to be replicated a number of times". I was thinking about the general process as using a (virtual) rigid tree for the second argument (the one where we only care about the length) in which every node that's not on the spine is a 3-node. To create the corresponding 2-3 tree, we need to create three things: the squashed-up left spine, the squashed-up right spine, and (in case we have a Single at the bottom) a full 3-tree for the center. Fortunately, that full 3-tree is produced "for free" out of the process of building the other parts, since we need such trees at each level to fill in the internal nodes. But I know my thinking about this is rather fuzzy in general; I mostly worked things out by trial and error starting with the geometrical intuition. So it's very likely possible to explain it better some other way. But yours remains mysterious to me.

Reply (Contributor):

Well, your perspective is different from mine. In my mind, I build the full 2-3 tree first. The finger tree is then obtained by peeling off digits from the left and the right, leaving leftover digits of size 1 or 2. (This can be done on the fly as the 2-3 tree is being built.) Btw, my n corresponds to deep_count in your code.

Anyway, let's see an example. To build a full 2-3 tree with 14 nodes, we can combine them as follows:

```
abbbbbbbbbbbbc                  -- 14 nodes: combine triples in the middle, and pairs or triples at the ends
(ab)(bbb)(bbb)(bbb)(bbc)        -- 5 nodes: ditto
((ab)(bbb))((bbb)(bbb)(bbc))    -- 2 nodes: base case
(((ab)(bbb))((bbb)(bbb)(bbc)))
```

The digits that can be peeled off are a,b and (bbb) from the left, and b,b,c and (bbb),(bbb) from the right.
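The combining arithmetic in this example can be sketched as a hypothetical helper (mine, not the PR's actual code, which interleaves this with the fringe bookkeeping and size annotations):

```haskell
-- With an initial node l, n >= 2 replicated middle nodes m, and a final
-- node r, absorb one or two m's into each end so that the leftover m's
-- split into exact triples.
-- Returns (m's grouped with l, standalone triples, m's grouped with r).
chunk :: Int -> (Int, Int, Int)
chunk n = case n `mod` 3 of
  0 -> (1, (n - 3) `div` 3, 2)  -- e.g. n = 12: (ab)(bbb)(bbb)(bbb)(bbc)
  1 -> (2, (n - 4) `div` 3, 2)  -- e.g. n = 4:  (abb)(bbc)
  _ -> (1, (n - 2) `div` 3, 1)  -- e.g. n = 2:  (ab)(bc)
```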

@int-e (Contributor) commented Apr 11, 2020

@treeowl no, I missed it. Too tired to look now. (Maybe tomorrow...)

P.S. actually I'm not even sure how I could've discovered it, except by proactively looking at your repo and pull requests. A comment here might have helped. (Link: treeowl#2)

@treeowl (Contributor, Author) commented Apr 12, 2020

@int-e, indeed, discovering such things is not nearly as easy as it should be. I'm expecting to merge it to my branch tomorrow. When it came in, I was out in the woods. Now I'm helping put away a massive grocery order.

* Document the invariants of `raptyMiddle`.
* Simplify some of the arithmetic.
* Reorganize the arguments for clarity.
@treeowl (Contributor, Author) commented Apr 13, 2020

Any opinions on whether we need to use that lazy pattern match and build the 2-3 trees top-down rather than bottom-up? My intuition suggests we probably don't, and that we therefore probably shouldn't, since doing so adds four thunks to each level of each 2-3 tree and the trees are shallow enough that building them lazily probably doesn't improve locality.
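A toy model of the two construction strategies under discussion (mine, illustrative only; the real code builds size-annotated 2-3 trees):

```haskell
{-# LANGUAGE BangPatterns #-}

data Tree a = Leaf a | Node3 (Tree a) (Tree a) (Tree a)

-- Bottom-up, strict: forcing the result builds every level at once;
-- no thunks remain inside the structure.
fullStrict :: Int -> a -> Tree a
fullStrict 0 x = Leaf x
fullStrict d x = let !t = fullStrict (d - 1) x in Node3 t t t

-- Top-down, lazy: the root is available immediately; each deeper level
-- sits behind a shared thunk until someone matches on it.
fullLazy :: Int -> a -> Tree a
fullLazy 0 x = Leaf x
fullLazy d x = Node3 t t t  where t = fullLazy (d - 1) x
```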

@treeowl treeowl force-pushed the seq-before branch 2 times, most recently from 99d20e7 to fbb516a Compare April 13, 2020 19:33
```haskell
(RCountMid pr deep_count sf) -- deep_count > 1
  = case deep_count `quotRem` 3 of
      (q, 0)
        -> Deep (deep_count * sizec + lenys * (size midxs + 1))
```
Review comment (Contributor):

Observations:

- This can also be computed as lenys * (size midxs + 2) - ssf - spr.
- The expression is the same for all cases, so one could factor out the Deep (...) part.
- The value of lenys * (size midxs + 1) (or ... + 2) never changes; is it worth floating out?
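A sanity check of the first observation (mine, under an assumption I'm inferring from the identity, not one stated in the PR): the two formulas coincide exactly when lenys = spr + deep_count * sizec + ssf, i.e. the second sequence's length is prefix + replicated middles + suffix. `sizeMid` stands in for `size midxs`.

```haskell
-- Hypothetical check: under the assumed invariant on lenys, the two
-- size expressions are equal for all inputs.
checkSizes :: Int -> Int -> Int -> Int -> Int -> Bool
checkSizes spr ssf deep_count sizec sizeMid =
    deep_count * sizec + lenys * (sizeMid + 1)
      == lenys * (sizeMid + 2) - ssf - spr
  where
    lenys = spr + deep_count * sizec + ssf
```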

Reply (Contributor, Author):

Yes.... do you have an opinion about which calculation is better? Yes, we should factor out the Deep part; I'll try to remember to get that done. I don't know if it's worth floating out size midxs. Maybe. I haven't made any efforts at micro-optimizations like that because they require proper benchmarking, which is a bit fussy. I'm inclined not to bother for the moment.

@int-e (Contributor) commented Apr 14, 2020

I highly doubt the lazy pattern match will ever be worth it. Just look at its type, which is a nesting of Nodes (and adds another level on top), so we provably need O(log(n)) calls to reach that type. So to make an asymptotic difference, we'd have to look at several nodes on that level, but only inspect the top part of each... that's highly artificial.

Adding in that all this laziness isn't exactly free, I think the pattern match should be strict.

@treeowl (Contributor, Author) commented Apr 14, 2020

> I highly doubt the lazy pattern match will ever be worth it. Just look at its type, which is a nesting of Nodes (and adds another level on top), so we provably need O(log(n)) calls to reach that type. So to make an asymptotic difference, we'd have to look at several nodes on that level, but only inspect the top part of each... that's highly artificial.
>
> Adding in that all this laziness isn't exactly free, I think the pattern match should be strict.

That was my thought... but now I'm beginning to doubt myself. When we index into or split a sequence, we have to touch just the root of a number of nodes along the way, to find the right spot. So maybe we really can't do this after all?

@int-e (Contributor) commented Apr 14, 2020

> That was my thought... but now I'm beginning to doubt myself. When we index into or split a sequence, we have to touch just the root of a number of nodes along the way, to find the right spot. So maybe we really can't do this after all?

Uhm, but you won't be touching most of the roots anyway, because they'll be further up the tree? Most of the decisions will be made based on structure produced by mapMulFT. So you'll home in on just one or two of the trees produced by the fill23 functions, and you will almost certainly get to the bottom of those.

@Lysxia (Contributor) commented Apr 14, 2020

> When we index into or split a sequence, we have to touch just the root of a number of nodes along the way, to find the right spot.

The node that matters for this is the one constructed by the fill23_final functions. I think those are the ones that should be lazy so the size is immediately available, if you want to avoid forcing one or two trees needlessly. AFAICT as soon as a function starts to look under the node constructed by fill23_final, it is already about to go all the way down, that's when the whole thing will be forced anyway, that's why we might as well make lift_fill23 strict.

A microscopic concern is that the strict version consumes slightly more stack space than the lazy one, but only by an amount logarithmic in lenys. There's probably a proper tail-recursive version out there, but that seems too much trouble for the moment.

@treeowl (Contributor, Author) commented Apr 14, 2020

Aha! Yes, I think that's right. We can and should construct the root lazily and the rest strictly, and indeed that seems very natural in this context.

@treeowl (Contributor, Author) commented Apr 14, 2020

And yes, I think you're right, we probably could make this tail recursive. I'm not sure it's worth the trouble, but I'll think on it at some point.

@treeowl treeowl force-pushed the seq-before branch 2 times, most recently from 35b613a to 1af5a64 Compare April 14, 2020 19:26
@treeowl (Contributor, Author) commented Apr 14, 2020

Hmm... A tail recursive version seems a bit tricky, thanks to the polymorphic recursion. I think making it tail recursive essentially means reassociating function compositions from f . (g . (h . i)) to ((f . g) . h) . i. That presumably requires an explicit representation of the composition chain. If that's right, I doubt it's worth the trouble.
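The "explicit representation of the composition chain" mentioned above might look like the following sketch (mine, not code from the PR; `Chain` and `runChain` are names I made up):

```haskell
{-# LANGUAGE GADTs #-}

-- A type-aligned list of functions, stored first-applied-first.
-- Consuming it this way effectively reassociates f . (g . (h . i))
-- into ((f . g) . h) . i, and runChain is tail recursive.
data Chain a b where
  Done :: Chain a a
  (:>) :: (a -> b) -> Chain b c -> Chain a c
infixr 5 :>

runChain :: Chain a b -> a -> b
runChain Done      x = x
runChain (f :> fs) x = runChain fs (f x)  -- tail call at every step
```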

* Rename and clarify some functions based on discussions with
  @int-e and @Lysxia.
* Make `mapMulFT` unconditionally strict in its multiplier to avoid
  one silly box.
* Update changelog with credits.
* Build the 2-3 trees more eagerly to reduce heap allocation.
@treeowl treeowl merged commit 0061862 into haskell:master Apr 15, 2020
Successfully merging this pull request may close these issues: Optimize <* for sequences

6 participants