Add slidingVectorWindow #8

Open · klao wants to merge 3 commits into master
Conversation

@klao commented Aug 9, 2014

Hi Michael,

I probably should have checked first, but I only noticed that you had started working on the same functionality when I went to set up a pull request. :)

My code seems to have turned out quite a bit simpler than yours, so I decided to show it anyway. Of course, I was only going for amortized constant time, while your code is truly constant time. But it would be easy to keep the simpler structure and still get proper constant time, at the cost of a very slight inefficiency.

Anyway, feel free to ignore this! Or, if you'd like, I'll clean up my solution and turn it into a proper pull request. :)

Mihaly

@klao (Author) commented Aug 9, 2014

OK, I rewrote it to be properly constant time per element. It actually got shorter. :)

I also adapted the test from your branch, and it revealed that the existing slidingWindow was buggy: it yielded an empty sequence on an empty stream. While this is a matter of specification, I think it's better if it doesn't do that. (We both implemented slidingVector so that it yields nothing on an empty stream.) So I fixed that too.
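For concreteness, here is a hedged sketch of the behaviour I mean (operator names are from a recent conduit API, and the slidingVectorWindow type is my assumption about the interface, not necessarily the final one):

```haskell
import Control.Monad.ST (runST)
import Data.Conduit (runConduit, (.|))
import qualified Data.Conduit.List as CL
import qualified Data.Vector as V

-- Assumed interface (my naming):
--   slidingVectorWindow :: PrimMonad m => Int -> ConduitT a (V.Vector a) m ()
-- Behaviour I'd expect: window size 3 over [1..5] yields
-- [1,2,3], [2,3,4], [3,4,5]; an empty upstream yields no windows at all.
windows :: [Int] -> [V.Vector Int]
windows xs = runST $ runConduit $
  CL.sourceList xs .| slidingVectorWindow 3 .| CL.consume

-- windows [1..5] == map V.fromList [[1,2,3],[2,3,4],[3,4,5]]
-- windows []     == []
```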

@klao (Author) commented Aug 9, 2014

I am adding back the amortized-constant-time implementation, as according to the benchmark it's quite a bit faster! It's also more memory-efficient (it never needs two buffers at the same time).

I will leave it to you to decide which implementation you want to choose.

@klao (Author) commented Aug 9, 2014

Umm, also: the benchmarks you had on the slidingVector branch were flawed.

I wrote my own simple benchmarks when I started working on this today, and I got the same result as your benchmarks: the regular slidingWindow with vectors was much faster than my slidingVectorWindow. I just couldn't believe it! So I played with it for a while. There were two issues:

  1. You have to force the created vectors. Otherwise slidingWindow just yields unevaluated thunks, which is extremely cheap, obviously.
  2. The benchmarks in IO should not have a "constant, IO-independent structure". With optimization (full laziness in particular), GHC can float your benchmark out to the top level, and when you execute it multiple times it just "walks the spine" again and again; no new values are created. (Fortunately, I had been burned by this before and realized what was happening.) This does not happen with your Conduit benchmarks, but it could have; it depends solely on the quirks of the GHC optimizer.

With the corrected benchmarks the amortized version isn't faster anymore, so I don't think there's any reason to choose it. Anyway, I will stop for now. :)
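To make the two points above concrete, here is a hedged criterion sketch of the kind of benchmark I ended up with (slidingVectorWindow is assumed from this PR; everything else is standard criterion and conduit, with recent API names):

```haskell
import Control.Monad.ST (runST)
import Criterion.Main
import Data.Conduit (runConduit, (.|))
import qualified Data.Conduit.List as CL
import qualified Data.Vector as V

-- Summing every yielded window forces the produced vectors (point 1), and
-- benchmarking a function of the input size with whnf keeps full laziness
-- from floating the whole pipeline to the top level and sharing the
-- evaluated structure across runs (point 2).
windowSum :: Int -> Int
windowSum n = runST $ runConduit $
     CL.enumFromTo 1 n
  .| slidingVectorWindow 10
  .| CL.fold (\acc v -> acc + V.sum v) 0

main :: IO ()
main = defaultMain
  [ bench "slidingVectorWindow/10" $ whnf windowSum 100000 ]
```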

@snoyberg (Member)

Very interesting stuff, thanks! My slidingVector branch wasn't fully published; I added some deepseq calls to ensure that the values were being evaluated.

While I was working on optimizing this, I noticed some problems with rewrite rules in conduit, which is where my focus ended up being. But your implementation of slidingVector seems better than mine. I'll probably end up including that, after I work out a few other issues with optimizations.

@snoyberg (Member)

Actually, there's some evidence of the performance problem on your branch too: compare the performance of slidingVector with slidingWindow on Seq, where Seq far outperforms it for smaller window sizes. The issue has to do with extra PipeM constructors being created.

@klao (Author) commented Aug 10, 2014

Yes, I noticed it too, but it looked like a bigger thing to investigate. I'm glad that you are looking into it!

@snoyberg (Member)

It looks like there might be a fundamental flaw with this approach:

https://ghc.haskell.org/trac/ghc/ticket/9446

It seems like there are some GC assumptions about unsafeFreeze which I was unaware of. The only sane modification I can think of to this code is to fill up an entire mutable vector and then yield a number of immutable vectors from it at once. That introduces some buffering (a change in behavior), but I think that's reasonable.
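Roughly, I'm imagining something like this (a rough sketch only, with names and details not settled, using current conduit/vector type names; it assumes bufLen >= k):

```haskell
import Control.Monad (forM_)
import Control.Monad.Primitive (PrimMonad)
import Control.Monad.Trans.Class (lift)
import Data.Conduit
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as VM

-- Fill a buffer of bufLen elements completely, freeze it exactly once after
-- the last write, and only then yield its k-sized slices. No frozen vector is
-- ever mutated again, at the cost of buffering up to bufLen elements before
-- anything is yielded.
slidingBatched :: PrimMonad m => Int -> Int -> ConduitT a (V.Vector a) m ()
slidingBatched k bufLen = newBuffer V.empty
  where
    -- start a fresh buffer, pre-filled with the last (k-1) elements of the
    -- previous batch so windows spanning the boundary are not lost
    newBuffer carry = do
      buf <- lift (VM.new bufLen)
      forM_ [0 .. V.length carry - 1] $ \i -> lift (VM.write buf i (carry V.! i))
      fill buf (V.length carry)

    fill buf n
      | n == bufLen = flush buf n True
      | otherwise = do
          mx <- await
          case mx of
            Nothing -> flush buf n False
            Just x  -> lift (VM.write buf n x) >> fill buf (n + 1)

    -- freeze once, yield every full window, then start over (or stop)
    flush buf n more = do
      frozen <- lift (V.unsafeFreeze (VM.take n buf))
      forM_ [0 .. n - k] $ \i -> yield (V.slice i k frozen)
      if more then newBuffer (V.drop (n - k + 1) frozen) else return ()
```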

I'd like to ask for clarification on the GHC issue too, because this GC functionality is surprising to me. I don't have time to do so coherently right now, so if you'd like to jump in, go for it :).

@klao (Author) commented Aug 15, 2014

Yes, this is a serious blow. I didn't know about this quirk of the GC either. I've asked for some clarification on that trac ticket.

But even if this turns out to be unsolvable in general, I am quite confident that this works correctly for storable (and probably unboxed) vectors, as it is something that @bos does in his attoparsec library too: https://github.com/bos/attoparsec/blob/master/Data/Attoparsec/ByteString/Buffer.hs :)

@snoyberg (Member)

@bos's code in attoparsec is specifically using foreign pointers, which would be the same as storable vectors. It makes sense that the GC wouldn't affect them, since (1) GHC provides no means of distinguishing between mutable and immutable ptrs and (2) pointers are always pinned memory. I'm not convinced that this issue wouldn't affect us with unboxed vectors.

@snoyberg (Member)

Actually, looking more closely at attoparsec, it seems that the Text.Buffer module uses unboxed arrays, and it freezes and thaws the vector on each addition. Maybe that's all we need to do as well: keep thawing the vector until we're completely done mutating it.
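Something along these lines, just to illustrate the pattern (this is not attoparsec's code, and the names are made up):

```haskell
import Control.Monad.Primitive (PrimMonad)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as UM

-- Thaw-write-freeze sketch: the "immutable" vector is unsafeThaw'd, one slot
-- is written, and it is unsafeFreeze'd again, reusing the same memory. This
-- is only safe while we are the sole owner of the vector, i.e. before it has
-- been handed to anyone else.
appendInPlace :: (PrimMonad m, U.Unbox a)
              => Int          -- ^ slots already used (must be < U.length buf)
              -> a            -- ^ element to append
              -> U.Vector a   -- ^ buffer allocated with spare capacity
              -> m (U.Vector a)
appendInPlace used x buf = do
  mbuf <- U.unsafeThaw buf
  UM.write mbuf used x
  U.unsafeFreeze mbuf
```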

@klao (Author) commented Aug 17, 2014

I think what's important is that ByteArrays (and ForeignPtrs) are opaque from the GC's perspective: the GC doesn't have to look inside them. That's why it shouldn't be a problem with them.

About freezing/thawing: I am not sure it's a viable option for us. It works for attoparsec because attoparsec is the sole user of its buffer, so if the code is organized properly it can thaw/freeze the buffer and access it as mutable/immutable correctly within a single execution thread. We, on the other hand, release the frozen vectors to the user. If we sneakily thaw them later, we violate the restriction that an immutable vector must not be accessed after it's been thawed; the user could actually access it concurrently from another thread while we mutate it. So this whole thing might work with boxed vectors, but I don't think we can just assume it.

I'd like to better understand primitive arrays (which underpin boxed vectors) and their relation to the GC before thinking about whether this could be done or not.
