Add slidingVectorWindow #8

Open · klao wants to merge 3 commits into master
Conversation

@klao commented Aug 9, 2014

Hi Michael,

I probably should have checked first, but I only noticed that you had started working on the same functionality when I went to set up a pull request. :)

My code seems to have turned out quite a bit simpler than yours, so I decided to show it anyway. Of course, I was only going for amortized constant time, while your code is truly constant time. But it would be easy to keep the simpler structure and still get proper constant time, at the cost of a very slight inefficiency.

Anyway, feel free to ignore this! Or, if you'd like, I'll clean up my solution and turn it into a proper pull request. :)

Mihaly

@klao (Author) commented Aug 9, 2014

OK, I rewrote it to be properly constant time per element. It actually got shorter. :)

I also adapted the test from your branch, and it revealed that the existing slidingWindow was buggy: it yielded an empty sequence on an empty stream. While this is a matter of specification, I think it's better if it doesn't do that. (We both implemented slidingVector so that it yields nothing on an empty stream.) So I fixed that too.
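For concreteness, here is a hedged sketch of the behaviour I mean (operator names are from a recent conduit API, and the slidingVectorWindow type is my assumption about the interface, not necessarily the final one):

```haskell
import Control.Monad.ST (runST)
import Data.Conduit (runConduit, (.|))
import qualified Data.Conduit.List as CL
import qualified Data.Vector as V

-- Assumed interface (my naming):
--   slidingVectorWindow :: PrimMonad m => Int -> ConduitT a (V.Vector a) m ()
-- Behaviour I'd expect: window size 3 over [1..5] yields
-- [1,2,3], [2,3,4], [3,4,5]; an empty upstream yields no windows at all.
windows :: [Int] -> [V.Vector Int]
windows xs = runST $ runConduit $
  CL.sourceList xs .| slidingVectorWindow 3 .| CL.consume

-- windows [1..5] == map V.fromList [[1,2,3],[2,3,4],[3,4,5]]
-- windows []     == []
```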

@klao (Author) commented Aug 9, 2014

I am adding back the amortized-constant-time implementation, as according to the benchmark it's quite a bit faster! It's also more memory-efficient (it never needs two buffers at the same time).

I will leave it to you to decide which implementation you want to choose.

@klao (Author) commented Aug 9, 2014

Umm, also: the benchmarks you had on the slidingVector branch were flawed.

I wrote my own simple benchmarks when I started working on this today, and I got the same result as your benchmarks: the regular slidingWindow with vectors was much faster than my slidingVectorWindow. I just couldn't believe it! So I played with it for a while. There were two issues:

  1. You have to force the created vectors. Otherwise slidingWindow just yields unevaluated thunks, which is extremely cheap, obviously.
  2. The benchmarks in IO should not have a "constant, IO-independent structure". With optimization (full laziness in particular), GHC can float your benchmark out to the top level, and when you execute it multiple times it just "walks the spine" again and again; no new values are created. (Fortunately, I had been burned by this before and realized what was happening.) This does not happen with your Conduit benchmarks, but it could have; it depends solely on the quirks of the GHC optimizer.

With the corrected benchmarks the amortized version isn't faster anymore, so I don't think there's any reason to choose it. Anyway, I will stop for now. :)
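To make the two points above concrete, here is a hedged criterion sketch of the kind of benchmark I ended up with (slidingVectorWindow is assumed from this PR; everything else is standard criterion and conduit, with recent API names):

```haskell
import Control.Monad.ST (runST)
import Criterion.Main
import Data.Conduit (runConduit, (.|))
import qualified Data.Conduit.List as CL
import qualified Data.Vector as V

-- Summing every yielded window forces the produced vectors (point 1), and
-- benchmarking a function of the input size with whnf keeps full laziness
-- from floating the whole pipeline to the top level and sharing the
-- evaluated structure across runs (point 2).
windowSum :: Int -> Int
windowSum n = runST $ runConduit $
     CL.enumFromTo 1 n
  .| slidingVectorWindow 10
  .| CL.fold (\acc v -> acc + V.sum v) 0

main :: IO ()
main = defaultMain
  [ bench "slidingVectorWindow/10" $ whnf windowSum 100000 ]
```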

@snoyberg (Member)

Very interesting stuff, thanks! My slidingVector branch wasn't fully published; I added some deepseq calls to ensure that the values were being evaluated.

While I was working on optimizing this, I noticed some problems with rewrite rules in conduit, which is where my focus ended up being. But your implementation of slidingVector seems better than mine. I'll probably end up including that, after I work out a few other issues with optimizations.

@snoyberg (Member)

Actually, there's some evidence of the performance problem on your branch too: compare the performance of slidingVector with slidingWindow on Seq, where Seq far outperforms it for smaller window sizes. The issue has to do with extra PipeM constructors being created.

@klao (Author) commented Aug 10, 2014

Yes, I noticed it too, but it looked like a bigger thing to investigate. I'm glad that you are looking into it!

@snoyberg (Member)

It looks like there might be a fundamental flaw with this approach:

https://ghc.haskell.org/trac/ghc/ticket/9446

It seems like there are some GC assumptions about unsafeFreeze which I was unaware of. The only sane modification I can think of to this code is to fill up an entire mutable vector and then yield a number of immutable vectors from it at once. That introduces some buffering (a change in behavior), but I think that's reasonable.
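Roughly, I'm imagining something like this (a rough sketch only, with names and details not settled, using current conduit/vector type names; it assumes bufLen >= k):

```haskell
import Control.Monad (forM_)
import Control.Monad.Primitive (PrimMonad)
import Control.Monad.Trans.Class (lift)
import Data.Conduit
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as VM

-- Fill a buffer of bufLen elements completely, freeze it exactly once after
-- the last write, and only then yield its k-sized slices. No frozen vector is
-- ever mutated again, at the cost of buffering up to bufLen elements before
-- anything is yielded.
slidingBatched :: PrimMonad m => Int -> Int -> ConduitT a (V.Vector a) m ()
slidingBatched k bufLen = newBuffer V.empty
  where
    -- start a fresh buffer, pre-filled with the last (k-1) elements of the
    -- previous batch so windows spanning the boundary are not lost
    newBuffer carry = do
      buf <- lift (VM.new bufLen)
      forM_ [0 .. V.length carry - 1] $ \i -> lift (VM.write buf i (carry V.! i))
      fill buf (V.length carry)

    fill buf n
      | n == bufLen = flush buf n True
      | otherwise = do
          mx <- await
          case mx of
            Nothing -> flush buf n False
            Just x  -> lift (VM.write buf n x) >> fill buf (n + 1)

    -- freeze once, yield every full window, then start over (or stop)
    flush buf n more = do
      frozen <- lift (V.unsafeFreeze (VM.take n buf))
      forM_ [0 .. n - k] $ \i -> yield (V.slice i k frozen)
      if more then newBuffer (V.drop (n - k + 1) frozen) else return ()
```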

I'd like to ask for clarification on the GHC issue too, because this GC functionality is surprising to me. I don't have time to do so coherently right now, so if you'd like to jump in, go for it :).

@klao (Author) commented Aug 15, 2014

Yes, this is a serious blow. I didn't know about this quirk of the GC either. I've asked for some clarification on that trac ticket.

But even if this turns out to be unsolvable in general, I am quite confident that this works correctly for storable (and probably unboxed) vectors, as it is something that @bos does in his attoparsec library too: https://github.com/bos/attoparsec/blob/master/Data/Attoparsec/ByteString/Buffer.hs :)

@snoyberg (Member)

@bos's code in attoparsec is specifically using foreign pointers, which would be the same as storable vectors. It makes sense that the GC wouldn't affect them, since (1) GHC provides no means of distinguishing between mutable and immutable ptrs and (2) pointers are always pinned memory. I'm not convinced that this issue wouldn't affect us with unboxed vectors.

@snoyberg (Member)

Actually, looking more closely at attoparsec, it seems that the Text.Buffer module uses unboxed arrays, and it freezes and thaws the vector on each addition. Maybe that's all we need to do as well: keep thawing the vector until we're completely done mutating it.
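Something along these lines, just to illustrate the pattern (this is not attoparsec's code, and the names are made up):

```haskell
import Control.Monad.Primitive (PrimMonad)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as UM

-- Thaw-write-freeze sketch: the "immutable" vector is unsafeThaw'd, one slot
-- is written, and it is unsafeFreeze'd again, reusing the same memory. This
-- is only safe while we are the sole owner of the vector, i.e. before it has
-- been handed to anyone else.
appendInPlace :: (PrimMonad m, U.Unbox a)
              => Int          -- ^ slots already used (must be < U.length buf)
              -> a            -- ^ element to append
              -> U.Vector a   -- ^ buffer allocated with spare capacity
              -> m (U.Vector a)
appendInPlace used x buf = do
  mbuf <- U.unsafeThaw buf
  UM.write mbuf used x
  U.unsafeFreeze mbuf
```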

@klao (Author) commented Aug 17, 2014

I think what's important is that ByteArrays (and ForeignPtrs) are opaque from the GC's perspective: the GC doesn't have to look inside them. That's why it shouldn't be a problem with them.

About freezing/thawing: I am not sure it's a viable option for us. It works for attoparsec because attoparsec is the sole user of its buffer, so if the code is organized properly it can thaw/freeze the buffer and access it as mutable/immutable correctly within a single execution thread. We, on the other hand, release the frozen vectors to the user. If we sneakily thaw them later, we violate the restriction that an immutable vector must not be accessed after it's been thawed; the user could actually access it concurrently from another thread while we mutate it. So this whole thing might work with boxed vectors, but I don't think we can just assume it.

I'd like to better understand primitive arrays (which underpin boxed vectors) and their relation to the GC before thinking about whether this could be done or not.
