Apply Codensity transform on ConduitM type.
This greatly improves performance in some cases by forcing right-associativity. More importantly, it obviates the need for rewrite rules for many common cases: e.g., `yield >>= foo` no longer needs to be rewritten to be efficient. This was especially important given that these rules would not fire reliably in do-notation, since do-notation associates to the left. Pinging @feuerbach: I bet you thought I forgot about this entirely ;)
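The transform in question can be sketched in isolation. This is the standard Codensity construction (as found in the kan-extensions package), not conduit's exact code; `lift'` and `lower` are illustrative names:

```haskell
{-# LANGUAGE RankNTypes #-}

-- A computation in Codensity m is a continuation-consumer.
newtype Codensity m a = Codensity
  { runCodensity :: forall b. (a -> m b) -> m b }

instance Functor (Codensity m) where
  fmap f (Codensity c) = Codensity (\k -> c (k . f))

instance Applicative (Codensity m) where
  pure a = Codensity (\k -> k a)
  Codensity f <*> Codensity a = Codensity (\k -> f (\g -> a (k . g)))

instance Monad (Codensity m) where
  return = pure
  -- Bind is just continuation plumbing, so all binds effectively
  -- re-associate to the right no matter how the user wrote them.
  Codensity c >>= f = Codensity (\k -> c (\a -> runCodensity (f a) k))

lift' :: Monad m => m a -> Codensity m a
lift' m = Codensity (m >>=)

lower :: Monad m => Codensity m a -> m a
lower (Codensity c) = c return
```

Wrapping a left-nested tree of binds in `Codensity` and then lowering it yields the right-nested structure, which is exactly what the rewrite rules used to achieve only when they fired.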
Showing 8 changed files with 378 additions and 230 deletions.
5cacdc3
Pinging @klao. This change potentially improves performance of slidingVector significantly (it did in my tests at least). The same speedup can be achieved in the current conduit by dropping down to the raw constructors, but that's rather ugly.
:)
There's a new approach to this problem, by the way: http://homepages.cwi.nl/~ploeg/papers/zseq.pdf
Interesting paper, thanks for the link. After reading it through, I'm not convinced it will actually provide a performance advantage over the codensity approach. While the issue raised of switching between representations is important, in all of the code I rewrote in conduit it doesn't seem to present an actual problem. The reason is that any functions that need to inspect the `Pipe` values need to inspect the entire tree, and then I can simply combine the change in representations with the traversal. For example, see `toProducer`.

On the other hand, my experience with difference lists vs `Seq` is that, for cases of purely constructing a value by appending elements (a.k.a. snoc), difference lists are always faster, which would imply that codensity is similarly faster than the type-indexed sequences described in that paper.

I'm open to looking into this further, but I was wondering if you had thoughts on this.
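The difference-list analogy can be made concrete (a standard illustration, not conduit code):

```haskell
-- A difference list represents a list as a function that prepends it.
type DList a = [a] -> [a]

emptyD :: DList a
emptyD = id

-- Appending one element is O(1) no matter how the appends associate,
-- because it is just function composition.
snocD :: DList a -> a -> DList a
snocD xs x = xs . (x :)

toList :: DList a -> [a]
toList xs = xs []
```

Building with `toList (foldl snocD emptyD [1 .. n])` is O(n), whereas the same fold over plain lists with `acc ++ [x]` is O(n^2); the codensity-transformed monad is the same trick applied to binds instead of appends.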
Indeed, `Seq` has shown high constant factors in my experiments. But it isn't the only option. Here's the result of my comparison of various free monad implementations (for a somewhat different purpose).
It'd be interesting to see a similar benchmark for a typical conduit application.
This is great, Michael! I didn't know you were working on this (or, even that you were considering it).
We should check the performance of a naive vector builder with this. It might need no tricks anymore. Though, I think for that the whole `Pipe` should have been codensified...
About `Seq` and the paper Roman mentioned: I recently saw @ekmett talking on IRC about something which I now realize was this transformation. (I think. :)) It was about some data structure which is much more suitable for it (one that has constant-time concatenation, afair).
@snoyberg
The zseq paper approach turns out to be a win when you need to both "match on the outside of the monad" and continue binding on the inside.
The one case where this arises here is in pipes/conduit where you might want to chain two of them in series (categorical composition) which has to "match on the outside" to figure out when to pull from the second, but then bind the result of that composition into a third process (monadic composition), which takes place on the inside of the continuation.
Left-associated binds are expensive for the normal ADT approach; composition followed by binding is expensive for the codensity approach, because you have to force the whole continuation to start. On the other hand, the zseq approach trades off constant factors to never suffer an asymptotic hit for either. As you put things explicitly into the continuation-as-catenable-output-restricted-deque, you are paying as you go to get the reified continuation reassociated just enough that you always retain O(1) access to either end.
Ultimately the issue is that there will be code for which you absolutely need the zseq / reflection-without-remorse style, so then the question is whether you are willing to just say that those use cases are out of scope. This has been the approach taken by pipes so far, for instance, and it's a perfectly reasonable stance.
That said, I personally want something that is never suboptimal asymptotically, and the zseq paper finally offers us a path to get there. The question then becomes how to adjust the constant factors to make them palatable.
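The trade-off described above has a familiar list-level analogue (an analogy only; the real structures here are reified continuations and deques):

```haskell
-- Left-nested appends: each (++) re-walks the accumulated prefix, O(n^2).
slowBuild :: Int -> [Int]
slowBuild n = foldl (\acc i -> acc ++ [i]) [] [1 .. n]

-- Function composition (the codensity analogue) makes every append O(1)...
fastBuild :: Int -> [Int]
fastBuild n = foldl (\acc i -> acc . (i :)) id [1 .. n] []

-- ...but a difference list offers no O(1) way to split off the head and
-- keep appending to the tail: "matching on the outside" forces the whole
-- chain. A catenable output-restricted deque keeps both ends cheap.
```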
Also note: Seq is a suboptimal data structure for implementing the catenable output-restricted deque. You are paying an O(log n) cost on appends, while an output-restricted catenable deque can get you O(1) appends.
The issue is that you can wind up with cases that are O(n^2) in the number of steps taken if you write particularly bad cases in either the codensity or direct style, while the asymptotic cost never rises above O(n) with a catenable output-restricted deque in the mix.
With `Seq` you only get down to O(n log n), and of course the extra log n factor (with large constants to boot!) is going to lose in the common O(n) cases where you don't have the slowdown; but when you do, even `Seq` is a win.
@ekmett: Thank you for this extensive comment!
Where are you at implementing an efficient output-restricted catenable deque? Are you working on the basis of Kaplan & Tarjan, or is there a better one?
"better" is relative. There is a write up by Tarjan and his then-student Mihaescu from 2003 on a fancier form of catenable deque. That builds a completely catenable deque (not just output-restricted deque / steque), but it is rather complicated to implement in a typed world.
I have code I'm likely to push into free to make a `Control.Monad.Free.Reflection` module that will supply a reflection-without-remorse-based free monad using a catenable deque built over a realtime deque, like they do in the paper. The major difference is I swap some arguments around to make the deque act as a model for an actual free category, with folding/traversal based on (.) not (>>>).

The K&T deque is going to be more expensive than you want. Look up the reference code for the zseq paper that @atzeus has on his github if you don't want to wait for me to circle back to finishing up the code for free. ;)
@feuerbach Good call on the benchmarks, I've started a repo for it here: https://github.com/snoyberg/conduit-cps-benchmarks.
@ekmett Thanks for getting involved in the discussion. If you have some code available for the `Reflection` module, I'd love to see it. I have a few ideas of what such a data structure would look like, but I'd like to see what you're thinking of.

Maybe I'm being naive here, but I think this can be worked around. If you look at the Codensity module in my benchmark repo, my categorical composition (which I call fuse) is:
I believe this allows for the cheap append with low constant factors which we get from codensity, yet does not require forcing the left side of a monadic bind.
To be sure we're talking about the same thing, I'd imagine you're worried about code like the following:
My benchmarks (the "mix compositions" one in the linked repo) seem to support that this is still relatively efficient.
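The code blocks from this comment did not survive the page capture. Based on the surrounding text, the worrying pattern (series composition whose result is then bound monadically) presumably has a shape like the following; this is an illustrative guess, using `=$=` and Data.Conduit.List from conduit's public API, not the actual benchmark code:

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Categorical composition on the left of a monadic bind: under a naive
-- Codensity encoding, stepping (=$=) must "match on the outside", which
-- can force the whole reified continuation before the bind proceeds.
mixed :: Monad m => Conduit Int m Int
mixed = (CL.map (+ 1) =$= CL.filter even) >> CL.map (* 2)
```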
FYI, here are the benchmark results:
http://download.fpcomplete.com/benchmarks/cps-benchmark-20140818a.html
Michael, the problem in general is this:
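(The snippet that originally followed is missing from this capture; judging from the rest of the comment, it was presumably a recursive fusion along these lines, reconstructed hypothetically:)

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL

-- Recursively fusing a variable number of conduits together:
deepFuse :: Monad m => Int -> Conduit Int m Int
deepFuse 0 = CL.map id
deepFuse n = CL.map (+ 1) =$= deepFuse (n - 1)
```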
with a lot of conduits fused together. Admittedly, this is not how we normally program with conduits. But, I can imagine a problem, where the most natural solution is to recursively create a conduit by fusing things together. This is OK with the original solution, but would be quadratic with the codensity approach.
All in all, I don't think that this is a big issue, but it can appear in many different forms, e.g. zipping sources together is similar, and it's possibly more likely that someone will zip many (a variable number) of sources... So there might be a bigger issue here that we just don't see yet. But I don't think it's very likely, and the quadratic behavior of `replicateM` and co. is definitely problematic (I definitely ran into it unsuspectingly :)). So fixing it with the codensity transformation is really good, especially if it brings a general speed-up too!
I'm not sure that there's any increased complexity for the code you described with the codensity approach. Both in conduit 1.1 (a.k.a. "standard") and codensity, each fuse call requires traversing each argument and generating a new structure. (By the way, that's exactly what I'm hoping to eliminate by implementing stream fusion.) The way I've implemented codensity, I don't think there's any extra traversal of the newly generated structure when you proceed to fuse it again. I think zipping sources falls into this same category.
It's also entirely possible that I'm simply not seeing the problem, so if you think I'm mistaken, please do tell me so.
Yep, sorry, what I wrote was bogus.
Fortunately for conduit, it doesn't have any function corresponding to pipes' `next`: http://hackage.haskell.org/package/pipes-4.1.2/docs/Pipes.html#v:next

That is something that cannot be done if pipes were codensified. And it previously could have been written for conduit, but not anymore. Of course, you can still do it if you go down to the level of `Pipe`s.

I'll try to see if I can find a not-too-convoluted degenerate example for the new ConduitM.
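For reference, the `next` being discussed has this shape in pipes-4.1.2:

```haskell
-- From the Pipes module: peel off either the producer's final return
-- value, or the next yielded element together with the rest of the
-- producer, without consuming anything further.
next :: Monad m => Producer a m r -> m (Either r (a, Producer a m r))
```

The difficulty under a Codensity encoding is that the only way to observe such a value is to supply it a continuation; repeatedly lowering to inspect one step and re-wrapping is exactly the pattern that reintroduces the quadratic cost.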
Actually, conduit does have something like `next`: connect-and-resume. It would look like:

And that's exactly why `ResumableSource` is not codensity-fied. That does mean that it's inefficient to monadically compose `ResumableSource`s, but as it turns out, they aren't an instance of `Monad` anyway (due to finalizers). So it seems like (by chance) we keep the same semantics and performance of what we had previously.
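(The snippet after "It would look like:" is missing from this capture. A plausible sketch using conduit's connect-and-resume operator `($$+)`; the helper name `nextIsh` is made up for illustration:)

```haskell
import Data.Conduit

-- ($$+) connects a Source to a Sink and hands back a ResumableSource
-- for the unconsumed remainder. Pairing it with `await` gives a
-- next-like step: one element plus the resumable rest.
nextIsh :: Monad m => Source m a -> m (Maybe (a, ResumableSource m a))
nextIsh src = do
  (rsrc, ma) <- src $$+ await
  return $ fmap (\a -> (a, rsrc)) ma
```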