rewrite selector compiler for minimongo #480
I've written a new selector compiler that works without `eval`.
All the tests pass, but the test suite is still very incomplete, so I will probably work on the tests next (depending on the time I find).
I know it's not in a mergeable state right now (both feature-wise and quality-wise), but I thought I'd let you know what I'm working on, and I would really like to hear your comments.
This looks like a great start! Let us know when it's ready to fully review and merge.
My main thought: let's make sure that we're benchmarking the right thing. There are two operations whose speed can be affected here: compiling a selector and executing it. For, e.g., an observed cursor, the execution may happen many, many more times than the compilation. My impression was that some of the advantage of eval is that the generated selector function could end up being very simple, so execution could be fast. Now, without a benchmark, who's to say if that's actually true... and it looks like your code shouldn't add TOO much work to execution time.
But my point is just: let's make sure we're not just trying to optimize compilation in a vacuum without also looking at the effect on execution!
I'm testing this mostly with my app, which I believe has a fairly standard usage of minimongo. I simply profile actual use of the site and check which functions take up a lot of time. With this work, the
I agree on your concerns with observed cursors, which call
Now, actually proving that a function is easily JITable is a different thing, I know. After I am sure that it works correctly (right now I'm looking into expanding the tests before I continue working on this branch), I will try to make sure that the hottest functions are monomorphic and such. I'm not an outright expert on optimizing JS, but there seems to be a lot of reading material on it, so I'm optimistic :)
On repeatedly calling selectors: long-term, the best strategy might be something I vaguely recall from the CouchDB implementation: a cursor call always saves the result set together with the selector. Whenever the collection changes, the saved selector functions are run over the new/changed docs and the result sets are updated (this can be done lazily, when the cursor is next called). Removed docs are removed from all result sets, of course.
That would make observed cursors really cheap, and it would also greatly help the typical case of most queries being the same (check if a cursor with the same selector exists -> use that, together with its result set). Result sets could be precalculated or cached on the server, making startup faster. Memory might be an issue (but a result set could be discarded after some time).
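To make the idea concrete, here's a minimal sketch of a cached result set that is updated incrementally; `LiveResultSet` and its methods are hypothetical names for illustration, not existing minimongo API:

```javascript
// Sketch: keep a compiled selector together with its result set, and
// re-run the selector only over documents that actually changed.
function LiveResultSet(selectorFn) {
  this.selectorFn = selectorFn;   // compiled selector, kept with its results
  this.results = {};              // _id -> document
}

LiveResultSet.prototype.applyChange = function (doc) {
  // A new or changed doc either enters or leaves this result set.
  if (this.selectorFn(doc))
    this.results[doc._id] = doc;
  else
    delete this.results[doc._id];
};

LiveResultSet.prototype.applyRemove = function (id) {
  // Removed docs leave every result set.
  delete this.results[id];
};
```

Reusing such a set across identical queries is where the real win would be: the selector runs once per changed document instead of once per document per query.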
Then again, although I'd love to implement something like that, I doubt that I have the time to do it (gotta earn money). Might be an idea for someone, though ...
Yes, that's roughly how cursors work now, client-side. The extra rub though, is that there are roughly two modes: one mode for applying one change at a time to all results (which is basically what you described), and a different mode for applying a batch of many changes. The reason is that since we can observe ordered lists, if you want to get an optimal set of "moved" callbacks you really want to process all of the changes which can affect ordering at once.
I rebased the branch on #485, because the test coverage was not good enough before (maybe I should have merged instead, but I didn't think that through beforehand).
I'm quite confident that my implementation is faster than the original. The common cases (_id check or empty selector) are handled very quickly.
Also, addressing values in arrays (e.g. people.2.name) as well as $elemMatch are supported.
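The dotted-path lookup with numeric array indices can be sketched like this; `lookupPath` is an illustrative helper, not the actual function name in the branch:

```javascript
// Sketch: resolve a dotted path such as "people.2.name" against a document.
// Numeric path parts index into arrays; other parts name object fields.
// (In JS, array[i] and array["2"] behave the same, so one lookup suffices.)
function lookupPath(doc, path) {
  var parts = path.split('.');
  var value = doc;
  for (var i = 0; i < parts.length; i++) {
    if (value == null)
      return undefined;          // path runs off the end of the document
    value = value[parts[i]];
  }
  return value;
}

// lookupPath({people: [{}, {}, {name: "Ed"}]}, "people.2.name") -> "Ed"
```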
There is just one remaining issue: I didn't find any documentation on how MongoDB handles comparisons ($lt, $gt) between objects. I made my implementation match the tests, but I'd feel better if I knew how these comparisons are defined. If you could give me hints on that, I'd be really thankful.
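For context, the kind of comparison in question can be sketched as follows. The type ranks and the recursive field-by-field rule here are an assumption based on BSON's documented sort order (different types compare by a fixed type order; documents compare key-then-value in field order), not necessarily what minimongo actually does:

```javascript
// Rough sketch of BSON-style ordering; treat the exact ranks as assumptions.
var TYPE_ORDER = { number: 1, string: 2, object: 3, array: 4, boolean: 7 };

function typeRank(v) {
  if (Array.isArray(v)) return TYPE_ORDER.array;
  return TYPE_ORDER[typeof v];
}

function cmp(a, b) {
  var ta = typeRank(a), tb = typeRank(b);
  if (ta !== tb) return ta - tb;          // different types: type order decides
  if (typeof a === 'number') return a - b;
  if (typeof a === 'string') return a < b ? -1 : (a > b ? 1 : 0);
  if (typeof a === 'boolean') return (a === b) ? 0 : (a ? 1 : -1);
  // Objects compare field by field, in field order: first keys, then values.
  var ka = Object.keys(a), kb = Object.keys(b);
  for (var i = 0; i < Math.min(ka.length, kb.length); i++) {
    var k = cmp(ka[i], kb[i]);
    if (k !== 0) return k;
    var v = cmp(a[ka[i]], b[kb[i]]);
    if (v !== 0) return v;
  }
  return ka.length - kb.length;           // shorter document sorts first
}
```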
Other than that, I believe the branch is ready for pulling.
Hi Ed! Sorry this took so long, but I finally got around to reviewing your patch (I spent most of today on it!). You can take a look at the 'pr-480' branch to see what I did.
The first commit there is your changes, squashed together (and also removing some dead code from the original compiler): c5f4d5e
The next commit there is some cleanup I did while reviewing it: 8180d8e
After that, I decided to benchmark this. You can see the benchmark I chose on the branch as 47b31c8 (though I really just kept it in a git stash and popped it onto various versions of the code as I worked). I ran tests for the minimongo package only and looked just at the time for the selector_compiler test. This benchmark commit makes it run the test 100 times, and for each match check, it compiled the selector once and executed it 100 times. I figured this would show a reasonable tradeoff between compilation and evaluation time.
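The shape of that benchmark (compile once, execute many times per check) can be sketched roughly like this; `compileSelector` stands in for whichever compiler is under test, and the names are illustrative rather than the actual benchmark code in 47b31c8:

```javascript
// Sketch: time one compilation plus N executions per document, to weigh
// compilation cost against evaluation cost the way the benchmark did.
function benchmark(compileSelector, selector, docs, execsPerCompile) {
  var start = Date.now();
  var matcher = compileSelector(selector);     // compilation: once
  for (var i = 0; i < execsPerCompile; i++) {  // execution: many times
    for (var j = 0; j < docs.length; j++)
      matcher(docs[j]);
  }
  return Date.now() - start;                   // milliseconds elapsed
}
```

The ratio of executions to compilations is the knob that decides which compiler wins, which is why the choice of 100 matters.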
Unfortunately, at least in Chrome on my Mac, this benchmark showed that your code was a slowdown. The test took about 2 seconds with the "old compiler" code but about 7.5 seconds with the "Ed compiler" (or, well, Ed compiler with my modifications). Yes, the compiler was definitely faster --- and I think your code was more readable and maintainable --- but when combined with evaluation time, that definitely seems like a slowdown.
I was inspired to keep trying, though. The old compiler managed to hardcode a pretty simple function with not too many nested function calls, but it used the (slow) eval. Your compiler did very little compilation, so evaluation was still full of lots of table lookups. What if we tried something in the middle? I did another pass: ce204e0. This avoided using eval like yours did, but tried to make as many choices as possible at compile time rather than at match time. This performed a little better than yours (around 5.5 seconds) but still not as well as the current code.
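The "middle ground" approach can be sketched as follows: inspect the selector once and return a closure specialized for it, so that no per-document table lookups remain. This is a simplified illustration (only literal equality and `$lt`), not the actual code in ce204e0:

```javascript
// Sketch: compile a selector into composed closures, deciding the shape of
// each clause at compile time instead of re-inspecting it per document.
function compileSelector(selector) {
  var keys = Object.keys(selector);
  if (keys.length === 0)
    return function () { return true; };       // empty selector: match all

  // Build one test closure per clause, once, at compile time.
  var tests = keys.map(function (key) {
    var spec = selector[key];
    if (spec && typeof spec === 'object' && '$lt' in spec) {
      var bound = spec.$lt;
      return function (doc) { return doc[key] < bound; };
    }
    return function (doc) { return doc[key] === spec; };  // literal equality
  });

  return function (doc) {
    for (var i = 0; i < tests.length; i++)
      if (!tests[i](doc)) return false;
    return true;
  };
}
```

At match time the returned function only calls the prebuilt closures; the branching on selector shape has already happened.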
I'm not sure what to do next. I don't want to merge your compiler or mine, since their main inspiration is speed and they don't seem to improve speed in my benchmarks.
Maybe my benchmark is bad? Or maybe there are browsers where one of our compilers is so much faster than the old compiler that it makes up for the decrease in speed on Chrome? (i.e., if it takes another browser from "incredibly slow" to "reasonably fast" while only dropping Chrome from "very fast" to "pretty fast", maybe that's good enough?)
Another possibility is that we should just do a function cache like you suggested. Did you say you already implemented one? A cache plus the eval compiler could be the best case (except for code clarity and extensibility :( )
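Such a cache could be as simple as keying the compiled function on a serialization of the selector. This is a sketch only: `JSON.stringify` is used for illustration, and a real version would need a key-order-stable encoding (and some eviction policy):

```javascript
// Sketch: memoize compiled selector functions so the (possibly slow, eval-
// based) compile step runs at most once per distinct selector.
var selectorCache = {};

function cachedCompile(compile, selector) {
  var key = JSON.stringify(selector);          // assumption: stable key order
  if (!(key in selectorCache))
    selectorCache[key] = compile(selector);    // compile once, reuse forever
  return selectorCache[key];
}
```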
In any case, I was inspired to do an implementation of "foo.1.bar" indexing for the old eval compiler, which I think works.
Thanks for putting the time into this --- I hope we can get some real speedups for minimongo out of this work.
BTW, I took a quick stab at doing something that was just like the old eval compiler, except that it cached the results of eval. (ie, it called
Thanks for all the feedback! When time permits, I'm going to look more
One main motivation for my patch (that I didn't mention before, because as
In this plan, I have indexes as hashmaps (+ordered arrays for range
If you are interested in indexes for minimongo, I would be happy to draft
2012/12/18 David Glasser firstname.lastname@example.org
Hi @Ed-von-Schleck! The original inspiration for this change was performance, and so my original decision not to accept the pull request was based on my benchmark not showing a performance improvement (in fact, it showed a regression, though possibly not one observable in real use).
I was disappointed, though, because your version was definitely more maintainable.
Over on the
So congratulations! This is one of the larger changes we've been able to merge from our open source community so far.