Big ambitious plan for native secondary indexes and purge() #3775
Also, the adapter-specific …
We could dust off the old alternative plan we decided against when doing dependent DBs. The idea was to build a stripped-down storage API for making a space where an index can be stored; this would make it possible to build other types of indexes that don't map well to map/reduce (e.g. geopouch).
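A stripped-down storage API along these lines might expose little more than ordered key/value primitives. The sketch below is hypothetical — the `IndexStore` class and its `put`/`get`/`iterate` methods are illustrative inventions, not an actual PouchDB interface — using an in-memory map to show how an adapter-agnostic index store could back non-map/reduce indexes like geopouch's:

```javascript
// Hypothetical minimal index-storage API: ordered key/value pairs plus
// range iteration, which is all most index structures need underneath.
class IndexStore {
  constructor() {
    this.entries = new Map(); // key -> value
  }
  put(key, value) {
    this.entries.set(key, value);
  }
  get(key) {
    return this.entries.get(key);
  }
  // Iterate keys in [startKey, endKey] in sorted order, the way a
  // LevelDB-style read stream would.
  iterate(startKey, endKey) {
    return [...this.entries.keys()]
      .sort()
      .filter((k) => k >= startKey && k <= endKey)
      .map((k) => ({ key: k, value: this.entries.get(k) }));
  }
}

// A spatial plugin could store its own key encoding however it likes:
const store = new IndexStore();
store.put('geo\u000041.88,-87.63', 'chicago-doc');
store.put('geo\u000040.71,-74.00', 'nyc-doc');
const hits = store.iterate('geo\u000040', 'geo\u000042');
console.log(hits.map((h) => h.value));
```

The point of such an API is that the index's shape (R-tree, inverted index, …) is the plugin's business; the adapter only promises durable, ordered storage.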
I think that's basically what the …
I think I'm going to move mapreduce back into the pouchdb repo, and then expose …
So I am a little bit wary of the proposal. In general I think it may be OK to have implementations define their own viewCleanup etc.; I'm somewhat worried about … My vague idea for secondary indexes is to change our underlying storage format: we currently have …

alongside …

I suggest we do …

My main worry with that is that creating dynamic indexes with IndexedDB doesn't seem particularly easy; I believe there is some movement to support that, but it's likely going to take a while. As for how that API is exposed to the adapters, we may just want to let them pass it through, but honestly I would probably skip implementing native …
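The key layouts referenced above were lost in formatting, but the general shape of such a scheme can be sketched: keep everything in one ordered keyspace, namespaced by a per-record-type prefix, so a secondary index is just another prefix rather than a separate database. The prefix names and padding below (`doc-store`, `by-seq`, `view`) are illustrative assumptions, not PouchDB's actual storage format:

```javascript
// Hypothetical single-keyspace layout: every record lives in one ordered
// store, namespaced by a prefix. Prefix names are illustrative only.
const SEP = '\u0000'; // sorts before any printable character

function docKey(id) {
  return `doc-store${SEP}${id}`;
}

function seqKey(seq) {
  // zero-pad so lexicographic order matches numeric order
  return `by-seq${SEP}${String(seq).padStart(16, '0')}`;
}

function viewKey(viewName, emittedKey, docId) {
  // docId disambiguates rows when two docs emit the same key
  return `view${SEP}${viewName}${SEP}${emittedKey}${SEP}${docId}`;
}

console.log(docKey('mydoc'));
console.log(seqKey(42));
console.log(viewKey('by_title', 'abc', 'mydoc'));
```

A layout like this works naturally on LevelDB, where key ranges are cheap; the worry voiced above is that IndexedDB's indexes can only be created inside a version-change upgrade, so "dynamic" per-view indexes are awkward there.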
Basically, CouchDB implemented a fairly clever btree log format because it was dealing with the native filesystem. We are dealing with databases, and by not using them as such we are at least 2x slower.
Folding both … However, the big problem is the lack of support for … In general, though, I'm not sure that we need to implement the "single …

Don't worry, though; it's looking more and more like I won't have time for this this summer, so most likely I will just work on stabilizing/speeding up the solution we already have. So there's no hurry. :)
I have spent a bit of time on the views, and created this plugin: https://github.com/sgenoud/pouchdb-dr-view

What is in there? First, I have refactored the update part of the query, trying to decouple the writes from the reads. I hoped it would improve some things, but it does not seem to do much, according to my little tests (perhaps 10% by tweaking the buffer parameters). Note that it passes the mapreduce test suite. Additionally, it should make it "easier" to implement another view engine, by exposing some methods that could be overridden (in the same way the plugins do it for pouchdb).

I have thrown lots of stuff in there – there is definitely more work to be done on the API exposed, as well as baking in some performance tests to help develop a more performant view. I'd love to get some feedback; I don't want to sink too much more time into it if there is not much support to continue in this direction…
@sgenoud Cool - is there any way you could publish this as a separate plugin that is compatible with the PouchDB plugin ecosystem? In principle, map/reduce is a plugin, meaning that someone who wrote a better version could just publish it as a separate module (e.g. see pouchdb-find). As for the "direction" we are going in, yeah, we are probably going to work to have the secondary indexes within the adapters themselves, rather than create an entirely separate database. However, if you have suggestions for improving performance in core map/reduce, then we'd be happy to accept them. :)
@nolanlawson actually it is a compatible pouchdb plugin (but I have not published it yet). It uses the …

With this plugin I have come to realize (on a leveldb adapter – I have not looked at how this translates in the browser) that the slowest part seems to be reading the changes – if I write one document for every change, the writer tends to wait for the reader. I was thinking of changing the behaviour to something that works more like couch itself – compute all the views of one ddoc together as a group (one read, multiple writes). It would also be interesting to see what happens when we put the reader in a web worker (or something similar).

On the other hand, we could solve the "cold start" problem with something similar to the pouchdb dump tool you created. If the views are stored as pouch databases we can import them easily; you would need to bootstrap from scratch both the main db and the views (i.e. to make sure the data is consistent). Obviously if the data changes a lot we lose all the wins – but it should cover some common cases.

Generally, I am not sure how you want to implement map/reduce more natively, as you need to run an arbitrary javascript function on every document anyway (and get outside of the C layer to do it). You could get wins for pouchdb-find though, because, as far as I understand, it creates map functions that "only" pull some fields (and do not do crazy things like running a double metaphone).
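The "one read, multiple writes" idea — walk the changes feed once and update every view of a design doc from that single pass — can be sketched roughly as follows. The `changes` and `mapFns` shapes here are simplified stand-ins, not the plugin's real internals:

```javascript
// Sketch: update all views of one design doc in a single pass over the
// changes feed, producing one write batch per view, instead of
// re-reading the changes once per view.
function updateViewGroup(changes, mapFns) {
  // mapFns: { viewName: (doc, emit) => void }
  const batches = {};
  for (const name of Object.keys(mapFns)) batches[name] = [];

  for (const change of changes) {          // single read of the feed
    for (const [name, mapFn] of Object.entries(mapFns)) {
      mapFn(change.doc, (key, value) => {  // emit()
        batches[name].push({ key, value, id: change.doc._id });
      });
    }
  }
  return batches; // caller flushes each batch with one bulk write
}

const changes = [
  { doc: { _id: 'a', type: 'post', title: 'hello' } },
  { doc: { _id: 'b', type: 'comment', title: 'hi' } },
];
const batches = updateViewGroup(changes, {
  by_title: (doc, emit) => emit(doc.title, null),
  posts: (doc, emit) => { if (doc.type === 'post') emit(doc._id, null); },
});
console.log(batches.by_title.length, batches.posts.length);
```

This mirrors what CouchDB does per design doc: the reader is no longer the bottleneck for each individual view, because its output is shared.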
Is speeding up secondary indexes still on the roadmap, or have priorities changed after a year?
My primary goal in #4984 is to dramatically speed up …
Seems we can close this issue now that idb-next is well underway |
Is there another issue we can watch or a place that shows the progress on idb-next? Sorry if I missed something, and thanks for all the great work 🙌 |
It's at #5207; it's basically the pouchdb-find implementation that needs doing, then to worry about migrations, verify performance, etc. I've been hammered at work recently but am hoping to get to it soon; very happy if anyone else wants to take a look.
I've got a plan for how I'd like to proceed with solving both native secondary indexes and the `purge()` feature (Feature Request - Method to Purge a document #802).

**My proposal:** move the `_query()` and `_viewCleanup()` implementations into the adapters themselves. Yes, this will require writing it three times for IDB/WebSQL/LevelDB, but it offers the benefit of vastly improved map/reduce performance, which I think we sorely need, because as it stands we're about 100x slower than native indexes, and 100 is a big number (source: #3039 (comment)).

**What would these `_query()` and `_viewCleanup()` functions look like?** Basically like this: here's the socket-pouch custom adapter defining its own implementations, and here's the modest change to mapreduce required to pull off this trick. So basically we would just need to move the `_query()` and `_viewCleanup()` implementations into their respective adapters (http already having been written) and deprecate the mapreduce plugin.

**How does this affect pouchdb-find, etc.?** I think I've already demonstrated with pouchdb-abstract-mapreduce that you can build `pouchdb-find` on top of map/reduce and it works just peachy. I did the same thing for pouchdb-quick-search. Both codebases could be very easily rewritten to use native `_query()` and `_viewCleanup()` functions.

**Doesn't it sacrifice modularity to re-integrate the mapreduce plugin?** A bit, but it could still be possible to build PouchDB without the `_query()` and `_viewCleanup()` functions. We'd just have to have a separate build script that uses some browserify tricks to sub out those functions for stubs that throw "not implemented" errors, and include them in separate files: `idb-mapreduce.js`, `websql-mapreduce.js`, `leveldb-mapreduce.js`. Not a big deal.

**What's the migration plan?** None at the moment, except that existing secondary indexes would have to be rebuilt. I think this is the easiest fix and not really a big ask for users. Anyway, given that the schema would look different between the old and new implementations, such a migration is inevitable.
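Those "not implemented" stubs would be tiny. A sketch of what the build-time substitute module might look like — the object shape and names here are assumptions, not actual PouchDB build output:

```javascript
// Hypothetical stub module substituted by the build script when
// map/reduce is excluded: same surface as the native _query() and
// _viewCleanup(), but every call throws.
function notImplemented(name) {
  return function () {
    throw new Error(
      name + ' is not implemented in this build; include the ' +
      'adapter-specific mapreduce file (e.g. idb-mapreduce.js)');
  };
}

const mapreduceStubs = {
  _query: notImplemented('_query'),
  _viewCleanup: notImplemented('_viewCleanup'),
};
// In the real build, this object would be the module's exports,
// swapped in via a browserify transform.
```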
**What are the other benefits?** Plenty. No more awkward filtering of mrview databases in pouchdb-all-dbs, no more IndexedDB `destroy()` race conditions, no more having to manage "dependentDbs" at all. As well as perf improvements and making it vastly simpler to implement `purge()`, since you don't have to worry about coordinating the dependent databases.

In general, I think our current map/reduce implementation was a fine solution at the time. It gave us robust secondary indexes for all three adapters, without having to write a lot of adapter-specific code. And we now have tons of map/reduce tests that are going to serve us well going forward. But speed is a priority, and `purge()` needs to be implemented, and in my mind this is the best way to do it.

If y'all are okay with this proposal, I'll start implementing it and could hopefully have it done within a month or two.
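As a rough illustration of the shape this could take, here is a hypothetical adapter supplying its own `_query()`/`_viewCleanup()`, in the spirit of the socket-pouch example linked above. The adapter object and the dispatcher standing in for the mapreduce plugin are simplified assumptions, not actual PouchDB internals:

```javascript
// Sketch: an adapter defining native _query()/_viewCleanup(), plus a
// thin dispatcher that prefers the adapter's implementation when present.
const myAdapter = {
  _query(fun, opts, callback) {
    // A real adapter would run `fun` against its own index structures
    // (IDB object stores, WebSQL tables, LevelDB key ranges, ...).
    callback(null, { total_rows: 0, offset: 0, rows: [] });
  },
  _viewCleanup(callback) {
    callback(null, { ok: true });
  },
};

function query(db, fun, opts, callback) {
  if (typeof db._query === 'function') {
    return db._query(fun, opts, callback); // fast native path
  }
  // Otherwise fall back to the shared JS map/reduce implementation.
  throw new Error('no native _query; JS fallback would run here');
}

query(myAdapter, 'ddoc/by_title', {}, (err, res) => {
  console.log(err, res.rows.length);
});
```

The http adapter already works this way in effect, since it forwards queries to CouchDB; this proposal extends the same pattern to the local adapters.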