Big amibitous plan for native secondary indexes and purge() #3775

nolanlawson · 2015-04-30T12:34:31Z

I've got a plan for how I'd like to proceed with solving both:

faster, "native" secondary indexes (Native secondary indexes #2280), and
the purge() feature (Feature Request - Method to Purge a document #802).

My proposal: move the _query() and _viewCleanup() implementations into the adapters themselves. Yes, this will require writing it three times for IDB/WebSQL/LevelDB, but it offers the benefit of vastly improved map/reduce performance. Which I think we sorely need, because as it stands, we're about 100x slower than native indexes, and 100 is a big number (source: #3039 (comment)).

What would these _query() and _viewCleanup() functions look like? Basically like this: here's the socket-pouch custom adapter defining its own implementations and here's the modest change to mapreduce required to pull off this trick. So basically we would just need to move the _query() and _viewCleanup() implementations into their respective adapters (http already having been written) and deprecate the mapreduce plugin.

How does this affect pouchdb-find, etc.? I think I've already demonstrated with pouchdb-abstract-mapreduce that you can build pouchdb-find on top of map/reduce and it works just peachy. I did the same thing for pouchdb-quick-search. Both codebases could be very easily rewritten to use native _query() and _viewCleanup() functions.

Doesn't it sacrifice modularity to re-integrate the mapreduce plugin? A bit, but it could still be possible to build PouchDB without the _query() and _viewCleanup() functions. We'd just have to have a separate build script that uses some browserify tricks to sub out those functions for stubs that throw "not implemented" errors, and include them in separate files: idb-mapreduce.js, websql-mapreduce.js, leveldb-mapreduce.js. Not a big deal.

What's the migration plan? None at the moment, except that existing secondary indexes would have to be rebuilt. I think this is the easiest fix and not really a big ask for users. Anyway, given that the schema would look different between the old and new implementations, such a migration is inevitable.

What are the other benefits? Plenty. No more awkward filtering of mrview databases in pouchdb-all-dbs, no more IndexedDB destroy() race conditions, no more having to manage "dependentDbs" at all. As well as perf improvements and making it vastly simpler to implement purge(), since you don't have to worry about coordinating the dependent databases.

In general, I think our current map/reduce implementation was a fine solution at the time. It gave us robust secondary indexes for all three adapters, without having to write a lot of adapter-specific code. And we now have tons of map/reduce tests that are going to serve us well going forward. But speed is a priority, and purge() needs to be implemented, and in my mind this is the best way to do it.

If y'all are okay with this proposal, I'll start implementing it and could hopefully have it done within a month or two.

The text was updated successfully, but these errors were encountered:

nolanlawson · 2015-04-30T12:37:45Z

Also, the adapter-specific _query() and _viewCleanup() functions don't have be perfectly user-facing (e.g. with evaling map functions and running reduce and all that jazz). They could be more modest and low-level, and look more like what abstract-mapreduce is doing.

calvinmetcalf · 2015-04-30T15:15:45Z

we could dust off the old alternative plan we decided against when doing dependent dbs. The idea was to a striped down storage api for making space where an index can be stored, this would make it possible to make other types of indices that don't map well to map reduce (i.e. geopouch)

nolanlawson · 2015-05-01T02:27:58Z

I think that's basically what the abstract-mapreduce plan would be. :) Although maybe a bit higher-level than what we originally wrote.

nolanlawson · 2015-05-25T18:42:25Z

I think I'm going to move mapreduce back into the pouchdb repo, and then expose abstract-mapreduce under extras. It would make things a lot easier to manage if it were a single repo.

daleharvey · 2015-07-20T13:33:32Z

So I am a little bit wary of the proposal, in general I think it may be ok to have implementations define their own viewCleanup etc, somewhat worried about query becoming a first class citizen in the adapter api.

My vague idea for secondary indexes is to change our underlying storage format, we currently have

|    BY_ID    |
  { _id: docId, 
    _rev: docRev,
    _winningSeq: etc }

alongside

|    BY_SEQ    |

{ id: seq, 
  data: '{some: json}' }

I suggest we do

|    BY_ID    |
{ _id: 'hello', 
  _winningRev: 'xxx',
  _seq: X, // may have to be an array?
  _rev_tree: [],
  winningDoc: {
    some: json
  },
  revs: {
     // the old docs
  }
}

My main worry with that is creating dynamic indexes with indexeddb doesnt seem particularly easy, I believe there is some movement to support that but its likely gonna take a while.

As for how that API is exposed to the adapters, we may just want to let them pass it through, but honestly I would probably skip implementing native query and focus on native pouchdb-find at this point

daleharvey · 2015-07-20T13:35:20Z

Basically CouchDB implemented a fairly clever btree log format because it was dealing with the native filesystem, we are dealing with databases and by not using them as such we are at the least 2x slower

nolanlawson · 2015-07-20T15:33:22Z

Folding both objectStores into a single one is a great idea, especially given that joins in IndexedDB are very slow.

However, the big problem is the lack of support for multiEntry in IE's IndexedDB. Yes, _seq would have to be an array, and so would a _secondaryIndexes field, but IE wouldn't support that, so we'd have to come up with some kind of workaround. I looked into iegap and asked @dfahlander about it, and it seems it would only add about 5KB and passes the W3C tests, so it's a reasonable solution.

In general, though, I'm not sure that we need to implement the "single objectStore" solution just to add native secondary indexes. And yeah, you can't dynamically add indexes in IndexedDB, so my idea was to create a single index and then prefix them.

Don't worry, though; it's looking more and more like I won't have time for this this summer, so most likely I will just work on stabilizing/speeding up the solution we already have. So there's no hurry. :)

sgenoud · 2016-02-16T16:45:43Z

I have spent a bit of time on the views. So I created this plugin: https://github.com/sgenoud/pouchdb-dr-view

What is in there? Ok, first I have refactored the update part of the query, trying to decouple the write from the read. I hoped it would improve some things, does not seem to do it too much, according to my little tests (perhaps 10% by tweeking the buffer parameters).

Note that it passes the mapreduce test suite.

Additionally it should make it "easier" to implement your another view engine, by exposing some methods that could be overridden (in the same way than the plugins do it for pouchdb).

I have thrown lots of stuff there - there is definitely more work to be done on the API exposed, as well as baking in some performance test to help for developing a more performant view. I'd love to get some feedback, I don't want to sunk too much more time in there is not much support to continue in this direction...

nolanlawson · 2016-02-17T18:58:03Z

@sgenoud Cool - is there any way you could publish this as a separate plugin that is compatible with the PouchDB plugin ecosystem? In principle, map/reduce is a plugin, meaning that someone who wrote a better version could just publish it as a separate module (e.g. see pouchdb-find).

As for the "direction" we are going in, yeah, we are probably going to work to have the secondary indexes within the adapters themselves, rather than create an entirely separate database. However, if you have suggestions for improving performance in core map/reduce, then we'd be happy to accept them. :)

sgenoud · 2016-02-18T11:18:29Z

@nolanlawson actually it is a compatible pouchdb plugin (but I have not published it yet). I uses the _query hook you added to override the behaviour.

With this plugin I have come to realize (on a leveldb adapter - I have not looked how this translates in the browser) that the slowest part seems to be the changes reading – if I write one document for every change the writer tends to wait for the reader. I was thinking of changing the behaviour to something that actually works more like couch itself – compute all the views of one ddoc together as a group (one read, multiple writes).

I would also be interesting to see what happens when we put the reader in a web worker (or something similar).

On the other hand we could solve the "cold start" problem with something similar to the pouchdb dump tool you created. If the views are stored as pouch databases we can import them easily. You would need to bootstrap from scratch, both the main db and the views (i.e. to make sure the data is consistent). Obviously if the data changes a lot we loose all the wins – but it should cover some common cases.

Generally, I am not sure how you want to implement the map/reduce done more natively - as you need to run an arbitrary javascript function on every document anyway (and get outside of the C layer to do it). You could get wins for pouchdb-find though, because, as far as I understand, it creates map functions that "only" pull some fields (and do not do some crazy things like running a double metaphone).

nrw · 2016-04-22T18:25:44Z

is speeding up secondary indexes still on the roadmap or have priorities changed after a year?

daleharvey · 2016-04-22T18:52:36Z

My primary goal in #4984 is to dramatically speed up pouchdb-find queries

nolanlawson · 2016-12-13T22:40:53Z

Seems we can close this issue now that idb-next is well underway

gr2m · 2016-12-13T22:45:41Z

Is there another issue we can watch or a place that shows the progress on idb-next? Sorry if I missed something, and thanks for all the great work 🙌

daleharvey · 2016-12-13T23:14:37Z

Its @ #5207, its basically the pouchdb-find implementation that needs done then to worry about migrations and verify performance etc. Been hammered at work recently but hoping to get on it soon, very happy if anyone else wants to take a look

rsutphin mentioned this issue Jun 1, 2015

docs / query about deleted item tombstones and which system introduces them pouchdb-community/ember-pouch#61

Closed

daleharvey mentioned this issue Jul 20, 2015

Native secondary indexes #2280

Closed

nolanlawson mentioned this issue Jul 28, 2015

Querying when removing document can sometimes cause incorrect query result. #4088

Closed

daleharvey added the enhancement Feature request label Aug 4, 2015

This was referenced Jan 28, 2016

slow query times using persistent views #4817

Closed

Slow behavior on a Cordova App during View building. #4851

Closed

nolanlawson mentioned this issue Feb 18, 2016

Slow query performance even in nodejs #4902

Closed

nolanlawson closed this as completed Dec 13, 2016

nolanlawson mentioned this issue Jun 7, 2017

Implement pouchdb-find for idb-next #6543

Closed

clooudch mentioned this issue Mar 16, 2018

Discussion about PouchDB Attachment-Performance #7152

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Big amibitous plan for native secondary indexes and purge() #3775

Big amibitous plan for native secondary indexes and purge() #3775

nolanlawson commented Apr 30, 2015

nolanlawson commented Apr 30, 2015

calvinmetcalf commented Apr 30, 2015

nolanlawson commented May 1, 2015

nolanlawson commented May 25, 2015

daleharvey commented Jul 20, 2015

daleharvey commented Jul 20, 2015

nolanlawson commented Jul 20, 2015

sgenoud commented Feb 16, 2016

nolanlawson commented Feb 17, 2016

sgenoud commented Feb 18, 2016

nrw commented Apr 22, 2016

daleharvey commented Apr 22, 2016

nolanlawson commented Dec 13, 2016

gr2m commented Dec 13, 2016

daleharvey commented Dec 13, 2016

Big amibitous plan for native secondary indexes and purge() #3775

Big amibitous plan for native secondary indexes and purge() #3775

Comments

nolanlawson commented Apr 30, 2015

nolanlawson commented Apr 30, 2015

calvinmetcalf commented Apr 30, 2015

nolanlawson commented May 1, 2015

nolanlawson commented May 25, 2015

daleharvey commented Jul 20, 2015

daleharvey commented Jul 20, 2015

nolanlawson commented Jul 20, 2015

sgenoud commented Feb 16, 2016

nolanlawson commented Feb 17, 2016

sgenoud commented Feb 18, 2016

nrw commented Apr 22, 2016

daleharvey commented Apr 22, 2016

nolanlawson commented Dec 13, 2016

gr2m commented Dec 13, 2016

daleharvey commented Dec 13, 2016