Use RRB-Tree in Vector #38
Not yet known, you're definitely giving it a good stress test. Is there a more contrived version of this that you can share so I can figure out what's up?
Oh, I just re-read this and realized you said splice(), not slice(). Yes, splice is quite inefficient.
If you're calling it repeatedly, you might try replacing .splice(...) with .splice(...).toVector()
For now I'm just building a new vector as they're quite small. Is this inefficiency inherent in the nature of the trie data structure you're using, or something implementation-specific? Do the ClojureScript data structures exhibit this issue?
splice() is definitely an expensive operation regardless, as everything after the splice point needs to be shifted (moved in memory) for regular arrays. The same is true for a HAMT. I was thinking about this a bit and wonder if you might benefit from a different kind of data structure that's designed around making splice() fast. Linked lists are really fast at this if you're willing to give up O(1) access and immutability. Perhaps some kind of b-tree would be well adapted to an immutable (persistent) form and deal well with splicing. What's the specific use case here?
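The cost described above can be seen in miniature with plain arrays. Here is a small sketch (`immutableSplice` is a made-up helper, not an Immutable.js API): an immutable splice has to copy every element outside the removed span, so even a one-element edit costs O(n).

```javascript
// Hypothetical illustration of why splice is expensive for immutable
// structures: everything outside the removed span must be copied into a
// fresh array, so the cost is linear in the array size.
function immutableSplice(arr, start, deleteCount, ...items) {
  return arr
    .slice(0, start)                         // copy the prefix
    .concat(items)                           // insert the replacement items
    .concat(arr.slice(start + deleteCount)); // copy the suffix
}

const xs = [1, 2, 3, 4, 5];
const ys = immutableSplice(xs, 1, 2, 9);
console.log(ys); // [ 1, 9, 4, 5 ]
console.log(xs); // [ 1, 2, 3, 4, 5 ] (the input is untouched)
```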
Yeah, great point. I'll just replace the entire
I'll definitely keep it in mind, so I'll keep this open. At the very least I think a real TODO is clarifying splice()'s use with withMutable and even finding incremental perf improvements.
FWIW efficient splice (and other ops) is supported if you implement RRB-Trees - a variant of bitmapped vector tries (i.e. Vectors) http://infoscience.epfl.ch/record/169879/files/RMTrees.pdf
BTW, that official paper on RRB trees is lacking many details. I would recommend looking at this thesis.
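For context on why bitmapped vector tries are so fast at random access (and why RRB relaxation only adds a constant factor along sliced/concatenated paths), here is a rough sketch of radix lookup in a 32-way trie. The node layout is made up for illustration, not Immutable.js's or Mori's actual one: internal nodes and leaves are both plain 32-slot arrays, so lookup is pure bit arithmetic with no per-node search.

```javascript
// Sketch of index lookup in a 32-way bitmapped vector trie, the structure
// that RRB-trees generalize.
const BITS = 5;                // 2^5 = 32-way branching
const MASK = (1 << BITS) - 1;  // the low 5 bits of an index

function lookup(root, depth, index) {
  let node = root;
  // Peel off 5 bits of the index per level, most significant first.
  for (let level = depth * BITS; level > 0; level -= BITS) {
    node = node[(index >>> level) & MASK];
  }
  return node[index & MASK];
}

// A depth-1 trie holding 0..1023: 32 leaves of 32 values each.
const root = [];
for (let i = 0; i < 1024; i++) {
  (root[i >>> BITS] = root[i >>> BITS] || [])[i & MASK] = i;
}
console.log(lookup(root, 1, 700)); // 700
```

An RRB "relaxed" node can hold fewer than 32 children, so it needs a small size table and a search step instead of the single mask; that is the constant-factor cost discussed below.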
Interesting. I'll read up! Thanks guys.
@hypirion (author of the thesis above): didn't you say you found RRB-trees slower in practice (like, by a constant factor) than the Clojure-style vector for some operations? I bring this up because if so, that may be an argument for presenting RRB-trees as an alternative implementation rather than replacing the current one.
My understanding was the constant perf hit only came if the RRB nodes are in the path, which means the result of slice/concat. I'm interested to hear more though.
There is a lot of additional code required for the RRB-tree cases, but it's designed to switch over to the persistent vector code whenever possible: RRB-tree code can generally detect when the substructure of a node is persistent vector-ish, and will jump to pvec performance at that point in the tree. I did some benchmarks on that, and found no notable performance difference if no slices/concatenations were done on the RRB-tree. That being said, those were implemented with a lot of twiddling in C with Boehm-GC, and I'd hardly call that equivalent to a JavaScript engine.

I'm no JS expert, but I'd guess the initial implementation of the RRB-tree will be slower than the immutable list for non-slice and concat use, until it's tweaked to the JS engines. I suspect it would be harder to inline and optimise for the optimising compiler. Making it optimisable may take some time (at least based on my own experience), and you probably want to not degrade immutable list performance while you do it. So if I were you, I would probably start out having the RRB-tree be a different class, and let

Again, I don't have much knowledge and experience on how the JS engines optimise and how one would tweak JS code. Perhaps the points I bring up are of no concern at all, and for some reason the performance is equivalent from the get-go.
Thanks @hypirion this is really helpful.
@leebyron You hinted about something using
Nevermind. #37 explains. Thanks!
Another alternative implementation is to use AVL trees (any binary tree will do, really, but AVL trees are fast and easy to implement) where each AVL node contains a JavaScript array. Once the JavaScript array reaches a certain size, it splits it into two AVL nodes. I found through benchmarking that 125 is a good limit for the JavaScript arrays. I wrote a library that implements immutable lists in the above manner, you can find the benchmarks here: https://github.com/Pauan/Immutable/blob/javascript/benchmarks/List/2015-01-13 As you can see, they are quite a bit faster than Immutable-js, for most operations. I'm willing to go into more detail on their implementation, if you want.
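A minimal sketch of that design, with made-up names and with rebalancing and the full insert path omitted: each node holds a JavaScript array chunk plus a cached subtree size, so index lookup walks O(depth) nodes and then indexes directly into a chunk.

```javascript
// Rough sketch of an "AVL node holding a JS array" list. Balancing is not
// shown; in a real implementation splits trigger AVL rotations to keep the
// tree depth logarithmic.
const CHUNK_LIMIT = 125; // the limit reported above as a good fit

function node(left, chunk, right) {
  return {
    left, chunk, right,
    // Cached total element count of this subtree, used for index lookup.
    size: (left ? left.size : 0) + chunk.length + (right ? right.size : 0),
  };
}

function nth(n, index) {
  const leftSize = n.left ? n.left.size : 0;
  if (index < leftSize) return nth(n.left, index);
  index -= leftSize;
  if (index < n.chunk.length) return n.chunk[index];
  return nth(n.right, index - n.chunk.length);
}

// When an insert grows a chunk past CHUNK_LIMIT, split it into two nodes.
function splitIfNeeded(n) {
  if (n.chunk.length <= CHUNK_LIMIT) return n;
  const mid = n.chunk.length >> 1;
  return node(
    node(n.left, n.chunk.slice(0, mid), null),
    n.chunk.slice(mid),
    n.right
  );
}
```

The appeal for splice-heavy workloads is that an insert only copies one small chunk plus the O(log n) spine, instead of everything after the splice point.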
@Pauan in the case of Mori you're not measuring what you think you're measuring, there's a pending update to Mori which will permit performance comparisons of the actual data structures from plain JavaScript. Also your AVL tree implementation seems to come in consistently behind for random access/update in these benchmarks which is one of the value propositions for Bitmapped Array Tries. Also you're not benchmarking against transients in the Immutable.js case nor the Mori case.
@swannodette Those are all excellent points. Once the new version of Mori comes out, I would love to re-run the benchmarks. I was actually quite disappointed when I first benchmarked Mori: I was expecting it to be faster, so an improvement there would make me happy.

Do you have any recommendations for how I can run the benchmarks better? I think my usage is pretty straightforward: https://github.com/Pauan/Immutable/blob/javascript/src/Benchmark/List.js I simply load up Mori as a Node.js module and use it in the typical way that I would expect a developer to do. I did the same with the other libraries.

The random access issue is a bit complicated. You are right that it is consistently slower than Immutable-js, but not by that much. In exchange for that, it has vastly better performance for lots of other things, especially random inserts (which is a big part of the reason I wrote the library in the first place: I needed decent-speed random inserts). This could change if Immutable-js gets the aforementioned RRB-tree optimizations.

It's also important to note that the performance of "100 values" is misleading: as an optimization, I use cons cells for O(1) push at the end, but that means that lookup degrades to O(n) for lists with fewer than 125 values in them. Once it hits the 125 limit, performance jumps up significantly. You can see that with "1000 values".

In the past I have benchmarked transients (in both Immutable-js and Mori), and I found that in some cases they were slower; for bulk inserts they were generally faster, but in general they don't make a huge difference. It wasn't worth the extra time it took to benchmark them, so I took them out. But I can add them back in if you would like to see the results.

P.S. O(1) concat and slice in Mori is insane. I'm curious how that's implemented, and whether it involves some other tradeoffs or not.
@Pauan: Slice in Mori is O(1) time because it creates a view over the original vector. In essence, it's like:

```js
function slice(vec, start, length) {
  if (isAlreadyASlice(vec)) {
    start += vec.offset;
    vec = vec.originalVector;
  }
  return {
    originalVector: vec,
    offset: start,
    length: length
  };
}
```

Totally made-up pseudocode, but conceptually, that's what is going on. So, as long as the slice lives, nothing in the original vector can be garbage collected. I copied this behavior in my immutable-vector implementation, but warned about it in the docs, since it's a pretty big caveat, and can lead to memory leaks if not understood.
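To make the retention caveat concrete, here is a runnable version of that made-up pseudocode (names are still hypothetical), with a `get()` added to show how reads go through the view. Note how nested slices flatten to a single view over the original backing array, which therefore cannot be collected while any slice survives.

```javascript
// Hypothetical O(1) slice-as-view, mirroring the pseudocode above.
function makeSlice(vec, start, length) {
  if (vec.originalVector !== undefined) { // already a slice: flatten the view
    start += vec.offset;
    vec = vec.originalVector;
  }
  return { originalVector: vec, offset: start, length };
}

function get(slice, i) {
  return slice.originalVector[slice.offset + i];
}

const big = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
const s1 = makeSlice(big, 2, 5); // views elements 2..6
const s2 = makeSlice(s1, 1, 3);  // views elements 3..5, still backed by `big`
console.log(get(s2, 0));                // 3
console.log(s2.originalVector === big); // true: `big` stays alive
```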
@swannodette: Correct me if I'm remembering wrong or the above behavior has since changed. As for concat, I don't remember concat in Mori being O(1), and I'm confused by the notion that it is.
@graue your assessment of
@graue I see, I was afraid it might be something like that. Thanks for the explanation. @graue @swannodette I mentioned Mori's concat being O(1) because in the benchmarks I showed, you can clearly see that it has O(1) behavior. I have checked, and my benchmark code appears to be correct. Since Mori vectors don't seem to have a specialized concat function, I'm using
@Pauan
@swannodette Okay, cool, so it's like a concat over ES6 Iterables. Thanks for answering my many questions.
With @swannodette's help, I updated the benchmarks to use the latest versions of Immutable-js and Mori. You can find it here: https://github.com/Pauan/Immutable/blob/javascript/benchmarks/List/2015-01-16

Mori 0.3.2 is significantly faster than 0.2.9. Most of the speed gain comes from using fixed-arity functions in the benchmark code. My library (without transients) holds up well compared to Mori and Immutable-js (with transients).

My library has extremely fast concat and slice. It does not "cheat": concat and slice are not lazy, they return new Lists, and the old Lists can be garbage collected. The price for this is a very minor loss of lookup speed: for a list with 1,000,000 elements, my library takes 13 operations to look up a value; Mori and Immutable-js take 4 operations.

So, if you only need to insert / remove at the end of a list, then either Mori or Immutable-js is an excellent choice. But if you want good concat / slice performance, or you need to insert at arbitrary indexes, my library offers a much faster alternative. I think the performance of AVL trees + arrays is a lot better than most people think. I'm curious to see if RRB-trees improve the performance of splice / concat / slice.
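The two depth figures quoted above line up with simple arithmetic: a 32-way trie over n elements has about log32(n) levels, while a balanced binary tree over roughly 125-element chunks has about log2(n / 125) levels before the final chunk index.

```javascript
// Back-of-envelope check of the lookup depths quoted above for a list of
// 1,000,000 elements.
const n = 1000000;
const trieDepth = Math.ceil(Math.log(n) / Math.log(32)); // 32-way trie levels
const avlDepth = Math.ceil(Math.log2(n / 125));          // binary tree over 125-wide chunks
console.log(trieDepth); // 4
console.log(avlDepth);  // 13
```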
Nice work @Pauan! I haven't spent much time benchmarking or optimizing Immutable.js yet, so there are likely some wins to be had. A huge thanks to you and @swannodette for illustrating some baselines to strive for. It's always awesome and inspiring to see the headroom to strive to close :).

However, one thing I want to be cautious of, and part of why I've under-prioritized benchmarking, is comparing benchmarks to everyday use. I don't think benchmarks are enough to prove or disprove an idea unless they closely mirror common use scenarios. AVL trees are really interesting and I'm looking forward to seeing how they may play a role in this lib, or at least in the new landscape of these structures as a whole.

May I also suggest a composite test where reads and writes are interleaved at different frequencies? Most common use tends to read a lot more than write, which is part of why Clojure/Scala/immutable chose to optimize for reads instead of writes.
@leebyron Indeed, real-world examples always trump synthetic benchmarks. But it's also much harder to find and test real-world examples. Adding in more benchmarks for different scenarios (like interleaving read/write) is a good idea, and I might add them in at some point.

I agree, reads tend to be more frequent than writes, so optimizing for read is a good idea. But when you're trading minor read performance for multiple orders of magnitude of write performance, I think that's a good tradeoff. It will of course depend upon the particular programmer's use-cases, though.

My library also supports Tuples, for the situations where read performance is needed much more than write performance. I think this is a good idea: a general-purpose List that's good at everything (reading, inserting, concatenating, etc.), and a more specialized data structure that can be faster for a particular operation (like reading).
Totally. The beauty of data structures is that there are so many variants and each one has its ideal use.
@Pauan the speedups do not come primarily from fixed-arity invokes. Two things changed in ClojureScript, we no longer leak

@leebyron while I appreciate the conservatism wrt. micro benchmarking, let's not be disingenuous: Clojure and thus ClojureScript's many optimizations are the direct result of 7 years of observing common usage patterns in production applications.
@swannodette I'm sure there were plenty of other optimizations, but I ran the Mori 0.3.2 benchmarks both with the

The results I posted were with the
@swannodette Exactly my point. Clojure & ClojureScript's performance is the result of multiple smart people working diligently for quite a long time with production performance in mind. And it's seriously impressive and inspiring (like, literally inspiring: it literally inspired this project).

My hesitancy towards benchmarks is simply to ensure they don't become the driving force for optimization work, which in my experience on other projects has often led to worse performance for production apps, or distraction from more important work. Always focus on production performance first.

Immutable.js has a long road to follow, and many great ideas to borrow to get there (including this task). Putting together a benchmarking rig is on my todo list, with benchmarks designed to mimic common patterns, and I'll want to do that before going deep on performance tweaking, but production use and perf is still going to be my end-goal :). Not there yet!
@Pauan btw all your Mori

Amortization via laziness is a well-known strategy; it's why Clojure(Script) users have got along fine for so long without needing AVL or RRB trees. Still, both are useful and thus have been available to Clojure(Script) as separate contrib libs.
@swannodette That is intentional: my benchmark code is correct. Mori's

I have no problem with laziness, and lazy views are a great idea. Perhaps Immutable-js and Immutable could benefit from a lazy slice. I would probably call it something like

But, currently, JavaScript's

There's also the issue that Mori's

You might ask: why not have a second benchmark, for lazy slice? Because if all the libraries implemented lazy slice, they would all be O(1), and there is no point in benchmarking if all the libraries are O(1).

You might ask: why not put in the benchmark anyways, even though not all the libraries have lazy slice? Because that's not fair. Not only is it not fair, it's heavily misleading to anybody who looks at the benchmarks. They would think that Mori's

Lazy slice and non-lazy slice do different things, have different behavior, and have different caveats and tradeoffs. They should be compared separately.
@Pauan
Ah, apologies, it looks like the
Perhaps benchmarking non-lazy slice is useless, perhaps not. But benchmarking lazy slice is definitely useless, because it's always O(1). If you prefer, I can simply remove the
I had to make slice non-lazy to fix some behavioral issues; my intent is to make it lazy again in the future (and by lazy, I mean just a view on a backing List, same as subvec).
Yeah, guys, I'm gunna have to call this convo over (at least in this context) because we've gotten pretty off-topic from RRB trees :). You've both inspired me to a) investigate some alternative options for speeding things up and b) add a benchmarking harness of some kind to start tracking this stuff better. When I get a benchmark harness up, I'll be psyched to get your feedback.
Sounds good
I'm attempting to use vectors to represent the contents of 5-line tiles in Atom, and I'm keeping the groups of lines up to date with repeated applications of `splice`. After not very long, however, performance of iteration seems to degrade. Is this a known issue?