Reimplement (<*) for sequences #713
Conversation
@int-e, @oisdk, and @sjakobi: I'd really appreciate a review of this one. I wrote it, but can't claim to understand it all properly. Some of the names are bloody awful, and I find it really hard to keep track of what things go on the left and what things go on the right. I haven't written nearly enough comments, and that's largely because any I write are likely to be at least partially wrong.
Impressive! Have you benchmarked whether this is in fact faster in practice? BTW I'm wondering under what circumstances you'd even use `<*`.
I was wondering the same thing. Perhaps it appears in the result of an Applicative-Do transform? I am not seeing it in the examples at https://gitlab.haskell.org/ghc/ghc/-/wikis/applicative-do, but perhaps there are further optimisations. [EDIT] Section 8 of https://doi.org/10.1145/2976002.2976007 (https://simonmar.github.io/bib/papers/applicativedo.pdf) says "Our desugaring does not currently exploit the `<*` operator".
No formal benchmarks, but informal tests are mighty convincing. The drops in allocation and residency are enormous. I have no idea why you'd want this operation, sadly.
-- |
-- '<*' takes time and space proportional to the product of the length
-- of its first argument and the logarithm of the length of its second
-- argument.
Huh. Why do the asymptotics for `<*` and `*>` differ? Is that really right?
Yes, but it needs more explanation. `*>` gets to reuse the original structure of the second sequence quite heavily, so the documentation is correct for additional allocation and residency, but it doesn't count the fact that the result will retain large portions of the original structure of the second sequence as well. Please feel free to suggest a better blurb. For `<*`, we need to build an all-new copy of a structure like the second sequence for every element of the first sequence.
For `*>`, once we've turned the second sequence inside out (in time logarithmic in its length), we just stick that in a bunch of places unchanged. We can't get that kind of repetition in `<*`.
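For concreteness, here's a list-derived specification of the two operators being compared (a sketch only: `beforeSpec` and `thenSpec` are illustrative names, not the PR's optimized code):

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, fromList)

-- xs <* ys keeps each element of xs once per element of ys, so the
-- optimized version must build a fresh copy of a ys-shaped structure
-- for every element of xs.
beforeSpec :: Seq a -> Seq b -> Seq a
beforeSpec xs ys = fromList [ x | x <- toList xs, _ <- toList ys ]

-- xs *> ys repeats ys once per element of xs, so the optimized version
-- can share one "inside-out" copy of ys across all the repetitions.
thenSpec :: Seq a -> Seq b -> Seq b
thenSpec xs ys = fromList [ y | _ <- toList xs, y <- toList ys ]
```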
I admittedly haven't looked very closely at the code, but since the property test indicates that it's correct and you're happy with the performance, I don't see a reason not to merge this. Once we know of a use case we can still invest more effort.
We could also consider exporting `beforeSeq'` as e.g. `replicateEach :: Int -> Seq a -> Seq a` if that would be useful to anybody.
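If it were exported, a naive specification of the suggested `replicateEach` might look like this (a sketch; `replicateEachSpec` is an illustrative name, and the PR's `beforeSeq'` computes the same result in time logarithmic rather than linear in the repetition count):

```haskell
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- Repeat each element n times in place, e.g.
-- replicateEachSpec 3 (Seq.fromList "ab") == Seq.fromList "aaabbb".
replicateEachSpec :: Int -> Seq a -> Seq a
replicateEachSpec n xs = xs >>= Seq.replicate n
```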
@sjakobi, I'm really looking for ideas to make the code more readable and understandable, more than performance thoughts. I've thrown much more and larger QuickCheck runs at this on my computer than we have Travis do for us, so I'm pretty well convinced it's correct.
Thanks for the clarification @treeowl! I'm currently trying to focus on other projects, but I'll try to do an in-depth review if I find myself idle!
@Lysxia, you might find this interesting.
One way to go from here is to integrate the …
Make `(<*)` for `Data.Sequence` incrementally asymptotically optimal. This finally completes the task, begun in December 2014, of making all the `Applicative` methods for sequences asymptotically optimal even when their results are consumed incrementally.
I added a commit that removes the intermediate structure. I'm not so sure which version is less unclear. Do you have an opinion?
I like the new version better. One idea to improve the documentation is to write down the invariants of the various helpers. Once they are known, reviewing the code becomes pretty easy. It's good to remove intermediate structures as you did to make the code more compact, because there are fewer invariants to write.
@Lysxia, yes, that would be quite helpful.... I'll just need to figure out what the invariants are. That'll have the side benefit of helping work out whether there's a simpler or more efficient way to manage all the size arithmetic. Would you like to take a crack at it?
I can give it a try this weekend.
Remove the explicit intermediate spine structure from `(<*)`. I don't really know if this is a good or a bad thing from a clarity standpoint.
Many thanks, @Lysxia. Suggestions for better function and variable names will certainly be appreciated as well. Also, the way the nine arguments to …
I think I mostly understand what's going on now... but those names :-/ For starters I'd suggest …
While trying to wrap my head around this problem (and code), I have implemented a version based on …
beforeSeq :: Seq a -> Seq b -> Seq a
beforeSeq xs ys = beforeSeq' xs (length ys)

beforeSeq' :: Seq a -> Int -> Seq a
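The property test mentioned earlier presumably amounts to checking this against the list semantics; a minimal sketch of that kind of property, using the public `(<*)` rather than the internal helper (`prop_before` is an illustrative name):

```haskell
import Data.Foldable (toList)
import Data.Sequence (fromList)
import Test.QuickCheck (quickCheck)

-- The optimized (<*) must agree with the obvious list semantics.
prop_before :: [Int] -> [Int] -> Bool
prop_before xs ys =
  toList (fromList xs <* fromList ys) == [ x | x <- xs, _ <- ys ]

main :: IO ()
main = quickCheck prop_before
```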
It would probably be a good idea to comment on the basic ideas somewhere around here... From a 2-3 tree point of view, the result is obtained by fixing a 2-3 tree shape of size `lenys >= 2` (which can be constructed in logarithmic time and space) that is then planted at every leaf of `xs`.
The shape is defined recursively in a bottom-up fashion; at each level there is an initial node, a middle node that has to be replicated a number of times, and a final node, `(l, n x m, r)`. For n = 0 or n = 1, a single node is enough to accomplish this; for n = 2, two nodes are required. In all other cases, we combine l with one or two m's into a single node and, symmetrically, one or two m's with r for the last node, such that the remaining number of m's (n-4 to n-2) is divisible by 3. So for the middle part, we can combine 3 m's at a time and then recurse.
The code is complicated by the fact that the first and last elements of the initial list are put on the fringes of the finger tree while the shape is being fixed. This involves quite a bit of bookkeeping (partly captured by `RCountMid` above).
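The divisibility argument in the second paragraph can be captured in a few lines (a hypothetical helper, not code from the PR):

```haskell
-- Given n >= 3 copies of the middle subtree m at some level, decide how
-- many m's the initial and final nodes absorb (one or two each) so that
-- the leftover middle count is divisible by 3. Exactly one of n-2, n-3,
-- and n-4 is divisible by 3, so a choice always exists.
splitMiddle :: Int -> (Int, Int, Int)  -- (m's with l, leftover `div` 3, m's with r)
splitMiddle n = case (n - 2) `mod` 3 of
  0 -> (1, (n - 2) `div` 3, 1)
  1 -> (1, (n - 3) `div` 3, 2)  -- (2, ..., 1) would work just as well
  _ -> (2, (n - 4) `div` 3, 2)
```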
Yes, an explanation of the basic ideas is indeed in order. I don't yet understand the second paragraph in your comment above: specifically what you mean about "a middle node that has to be replicated a number of times". I was thinking about the general process as using a (virtual) rigid tree for the second argument (the one where we only care about the length) in which every node that's not on the spine is a 3-node. To create the corresponding 2-3 tree, we need to create three things: the squashed-up left spine, the squashed-up right spine, and (in case we have a `Single` at the bottom) a full 3-tree for the center. Fortunately, that full 3-tree is produced "for free" by the process of building the other parts, since we need such trees at each level to fill in the internal nodes. But I know my thinking about this is rather fuzzy in general; I mostly worked things out by trial and error starting from the geometrical intuition. So it's very likely possible to explain it better some other way. But yours remains mysterious to me.
Well, your perspective is different from mine. In my mind, I build the full 2-3 tree first. The finger tree is then obtained by peeling off digits from the left and the right, leaving leftover digits of size 1 or 2. (This can be done on the fly as the 2-3 tree is being built.) Btw, my n corresponds to `deep_count` in your code.
Anyway, let's see an example. To build a full 2-3 tree with 14 nodes, we can combine them as follows:
abbbbbbbbbbbbc -- 14 nodes: combine triples in the middle, and pairs or triples at the ends
(ab)(bbb)(bbb)(bbb)(bbc) -- 4 nodes: ditto
((ab)(bbb))((bbb)(bbb)(bbc)) -- 2 nodes: base case
(((ab)(bbb))((bbb)(bbb)(bbc)))
The digits that can be peeled off are `a,b` and `(bbb)` from the left, and `b,b,c` and `(bbb),(bbb)` from the right.
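Rendered as a toy pass over lists, the combining rule in this example might look like the following (illustrative only; the real code works on 2-3 tree nodes and tracks sizes, and this sketch assumes at least two nodes per level):

```haskell
-- Group one level into pairs and triples: a pair or triple at each end,
-- chosen so the middle is all triples. For instance,
-- combineLevel "abbbbbbbbbbbbc" == ["ab","bbb","bbb","bbb","bbc"].
combineLevel :: [a] -> [[a]]
combineLevel xs
  | n <= 3    = [xs]                     -- base case: one 2- or 3-node
  | otherwise = l : chunks3 mid ++ [r]
  where
    n = length xs
    (endL, endR) = case (n - 4) `mod` 3 of
                     0 -> (2, 2)
                     1 -> (2, 3)         -- the 14-node example hits this case
                     _ -> (3, 3)
    (l, rest) = splitAt endL xs
    (mid, r)  = splitAt (n - endL - endR) rest

chunks3 :: [a] -> [[a]]
chunks3 [] = []
chunks3 ys = case splitAt 3 ys of (a, b) -> a : chunks3 b
```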
@int-e, indeed, discovering such things is not nearly as easy as it should be. I'm expecting to merge it to my branch tomorrow. When it came in, I was out in the woods. Now I'm helping put away a massive grocery order.
* Document the invariants of `raptyMiddle`.
* Simplify some of the arithmetic.
* Reorganize the arguments for clarity.
Any opinions on whether we need to use that lazy pattern match and build the 2-3 trees top-down rather than bottom-up? My intuition suggests we probably don't, and that we therefore probably shouldn't, since doing so adds four thunks to each level of each 2-3 tree, and the trees are shallow enough that building them lazily probably doesn't improve locality.
(RCountMid pr deep_count sf) -- deep_count > 1
  = case deep_count `quotRem` 3 of
      (q,0)
        -> Deep (deep_count * sizec + lenys * (size midxs + 1))
Observations:
- This can also be computed as `lenys * (size midxs + 2) - ssf - spr`.
- The expression is the same for all cases, so one could factor out the `Deep (...)` part.
- The value of `lenys * (size midxs + 1)` (or `... + 2`) never changes; is it worth floating out?
Yes.... do you have an opinion about which calculation is better? Yes, we should factor out the `Deep` part; I'll try to remember to get that done. I don't know if it's worth floating out `size midxs`. Maybe. I haven't made any efforts at micro-optimizations like that, because they require proper benchmarking, which is a bit fussy. I'm inclined not to bother for the moment.
I highly doubt the lazy pattern match will ever be worth it. Just look at its type, which is a nesting of … Adding in that all this laziness isn't exactly free, I think the pattern match should be strict.
That was my thought... but now I'm beginning to doubt myself. When we index into or split a sequence, we have to touch just the root of a number of nodes along the way, to find the right spot. So maybe we really can't do this after all?
Uhm, but you won't be touching most of the roots anyway, because they'll be further up the tree? Most of the decisions will be made based upon structure produced by …
The node that matters for this is the one constructed by …
A microscopic concern is that the strict version consumes slightly more stack space than the lazy one, but only by an amount logarithmic in …
Aha! Yes, I think that's right. We can and should construct the root lazily and the rest strictly, and indeed that seems very natural in this context.
And yes, I think you're right, we probably could make this tail recursive. I'm not sure it's worth the trouble, but I'll think on it at some point.
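The trade-off being discussed can be seen in miniature with a recursive pair builder (a toy, not the PR's code; in the PR the recursion depth is only logarithmic in the sequence length, which is why the stack concern is microscopic):

```haskell
-- With the lazy (default) let-pattern, the outer pair comes back in
-- O(1) and each component is a thunk pointing into the recursive call.
buildLazy :: Int -> (Int, Int)
buildLazy 0 = (0, 0)
buildLazy n = let (a, b) = buildLazy (n - 1)
              in (a + 1, b + 1)

-- With a case expression, the recursive result is forced to WHNF before
-- anything is returned, so the recursion runs to the bottom first,
-- using stack proportional to the depth but allocating no selector thunks.
buildStrict :: Int -> (Int, Int)
buildStrict 0 = (0, 0)
buildStrict n = case buildStrict (n - 1) of
                  (a, b) -> (a + 1, b + 1)
```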
Hmm... A tail-recursive version seems a bit tricky, thanks to the polymorphic recursion. I think making it tail recursive essentially means reassociating function compositions from …
Make `(<*)` for `Data.Sequence` incrementally asymptotically optimal.

This finally completes the task, begun in December 2014, of making all the `Applicative` methods for sequences asymptotically optimal even when their results are consumed incrementally.

Closes #314