WIP - (#6031) - faster IDB changes() using getAll() #6031

Closed
wants to merge 1 commit into
from

Projects

None yet

1 participant

@nolanlawson
Member

I've been trying to find ways to improve the performance of secondary indexes. Obviously we'd rather go with native secondary indexes, but that may be a ways off, and I think there are ways we can at least stop the bleeding for current slow paths in secondary indexes.

From profiling, I noticed that a large amount of time is spent using IDBCursors in our IDB changes() implementation. Cursors are slow because we effectively do a stair-step pattern pulling in all changes() (e.g. the first 50 changes after seq 2), applying filters and limits and checking doc_ids and checking that the document is the winning doc and checking that the sequence is the winning sequence, etc. We do this using cursors because the filters often need to be applied in JavaScript and then checked against the limit to ensure we return exactly the right results in exactly the right order.

However, for the vast majority of cases (and for the cases that affect secondary indexes), the changes() queries are rather simple, and can essentially be expressed as "fetch all documents with doc.seq >= seq and with limit limit for the given seq and limit". If none of the documents have later seqs (e.g. in the rare case of a non-winning document being replicated after a winning document), then this query is all we need, and can get done with two simple IDB operations: getAll() and getAllKeys() (for browsers that support it, currently Chrome and Firefox but Safari is implementing it as well and it's under consideration for Edge).

So my technique is to write a separate changes() implementation that uses the "fast" strategy, and then we fall back to the old "slow" strategy whenever necessary. Running the mapreduce and integration tests, I found that the fast path was hit 6587 times and the slow path was hit 545 times, so 92% of the time we're using the fast option (in Firefox/Chrome anyway).

Furthermore this has a huge impact on our perf. Using the temp-views test (which is a good proxy for intense secondary index usage), I got the following improvement (10 iterations):

Before After Improvement
Chrome 55 100185ms 47574ms 52.5%
Firefox 50 96430ms 61164ms 35.6%
@nolanlawson nolanlawson (#6031) - faster IDB changes() using getAll()
2923c27
@nolanlawson nolanlawson changed the title from (#6031) - faster IDB changes() using getAll() to WIP - (#6031) - faster IDB changes() using getAll() Dec 19, 2016
@nolanlawson
Member

Adding WIP because on second thought I think we need new tests and I want to ensure this implementation is bulletproof

@nolanlawson
Member
nolanlawson commented Dec 19, 2016 edited

On second thought I believe this introduces bugs (I can probably find a test that fails if you modify the same document multiple times and then multiple documents one time immediately after). Also I think this can be abstracted into a single implementation that does a "batched cursor" – i.e. a pseudocursor that uses getAll() with limit for browsers that support it and regular cursors for other browsers.

But at the very least this demonstrates the perf improvement!

@nolanlawson nolanlawson reopened this Dec 19, 2016
@nolanlawson
Member

Reopening because I want to remind myself to keep looking into this. The batched cursor will be challenging to implement, but I think it can be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment