
Ultra slow processing when syncing a document with many revisions #5327

Closed
rague opened this issue Jun 10, 2016 · 13 comments
Labels
bug Confirmed bug

Comments

@rague

rague commented Jun 10, 2016

Environment (Node.js/browser/hybrid app/etc.)

browser

Browser/platform

Chrome 51 + Safari 9.1.1

Adapter

IndexedDB (Chrome) / WebSQL (Safari)

Server

CouchDB 1.6.1

Issue description

When syncing a document with many remote revisions (> 1000), processing the response is so slow that it takes a near-infinite time to finish (> 10 min), freezing the page.

I configured the local database with revs_limit: 50 and auto_compaction: true.
I configured the sync operation with batch_size: 25.
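
For reference, a minimal sketch of that setup (database name and remote URL are placeholders; note PouchDB spells the option auto_compaction):

```js
// Local database capped at 50 revs of history, with automatic compaction.
const db = new PouchDB('local', { revs_limit: 50, auto_compaction: true });

// Sync against the CouchDB server in batches of 25 documents.
db.sync('http://localhost:5984/mydb', { batch_size: 25 })
  .on('error', console.error);
```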

The Chrome profile shows that 90% of the time is spent in the stem function.

The response for the document request weighs 1.5 MB.

config_2_utest.txt

Here is the request:

Edit by Nolan: moved to this Gist.

@daleharvey
Member

Hrm, so if I remember correctly where the slowness is: that array of 919 revisions is going to be turned into a tree (which looks something like [rev1, [rev2, rev3]]); the stemming function will then turn that tree back into a long array, slice it, then turn it back into a tree. We have a hard problem with the inefficiency of the tree format. It could likely be improved, although I can't easily think of how, and it's also going to be a hard change to make (it might be worth looking at this in idb-next).
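
To illustrate, a simplified sketch of that round-trip (this is not PouchDB's exact internal rev-tree format, just the same shape):

```js
// A node is [revId, [childNode, ...]].
const tree = ['rev1', [['rev2', [['rev3', []]]]]];

// Flatten one linear branch of the tree into an array of rev ids.
function toPath([rev, children]) {
  return children.length ? [rev, ...toPath(children[0])] : [rev];
}

// Rebuild a single-branch tree from an array of rev ids.
function toTree(path) {
  return path.reduceRight((child, rev) => [rev, child ? [child] : []], null);
}

// Stemming: tree -> long array -> slice to the newest `limit` revs -> tree.
function stem(node, limit) {
  return toTree(toPath(node).slice(-limit));
}

console.log(JSON.stringify(stem(tree, 2))); // ["rev2",[["rev3",[]]]]
```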

But one possibly easy win is applying rev_tree limits on the way in to creating that tree, instead of after the tree has been created.
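
A rough sketch of that idea, assuming a document fetched with ?revs=true (the helper name is hypothetical):

```js
// Hypothetical helper: cap the incoming ancestry before the tree is built,
// rather than building the full 919-node tree and stemming it afterwards.
// CouchDB lists _revisions.ids newest-first, so keep the first `revsLimit`
// entries; _revisions.start (the newest generation) is unaffected.
function truncateIncomingRevisions(doc, revsLimit) {
  if (doc._revisions && doc._revisions.ids.length > revsLimit) {
    doc._revisions.ids = doc._revisions.ids.slice(0, revsLimit);
  }
  return doc;
}
```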

I think the first step for this bug is to write a script that generates a database exhibiting the problems you are seeing during replication, so everyone can debug it.
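
Something along these lines, perhaps (URL and doc id are placeholders):

```js
// Rough sketch: update one document ~1000 times against CouchDB so its
// _revisions ancestry grows huge, then replicate it into a fresh local
// PouchDB to reproduce the slow stemming.
const PouchDB = require('pouchdb');

async function buildPathologicalDb() {
  const db = new PouchDB('http://localhost:5984/stem-repro');
  const doc = { _id: 'victim', count: 0 };
  doc._rev = (await db.put(doc)).rev;
  for (let i = 1; i <= 1000; i++) {
    doc.count = i;
    doc._rev = (await db.put(doc)).rev;
  }
}

buildPathologicalDb().catch(console.error);
```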

daleharvey added the bug label on Jun 10, 2016
@nolanlawson
Member

Wow, that config_2_utest.txt file is huge merely because of the _revisions list, i.e. the list of all revision IDs. It's enormous.

@daleharvey From reading the bug report, it seems like the issue here is not our merging algorithm, but rather just the overhead of pulling down a 1.5 MB file in order to sync a single document. Note that that's 1.5 MB of just revision IDs.

To be quite frank, this seems like a design error in CouchDB's replication protocol that we can't get around without modifying the protocol.
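
For illustration (ids below are made up), the bulk of that payload is per-revision ancestry:

```js
// One entry of an open_revs response. Every open revision carries its full
// ancestry in _revisions.ids, so hundreds of 32-char hashes per leaf,
// times many conflicting leaves, reaches megabytes for a single document.
const openRevsEntry = {
  ok: {
    _id: 'mydoc',
    _rev: '919-0abc...',
    _revisions: {
      start: 919,
      ids: ['0abc...', '1def...' /* ...917 more hashes... */],
    },
  },
};
```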

@rague
Author

rague commented Jun 18, 2016

I understood why there is such a heavy _revisions list. It's a bug in my client conflict resolution routine: when two running applications try to resolve the same conflict concurrently, they get stuck in an endless loop, producing many conflicting revisions.

I've solved the problem, but my database is still filled with these enormous revision lists. What now?

I see that being able to change the MAX_SIMULTANEOUS_REVS used for the open_revs argument could perhaps help.
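
For context, a sketch of the request that constant shapes (my reconstruction, not the replicator's exact code):

```js
// Leaf revs are chunked so that at most MAX_SIMULTANEOUS_REVS of them are
// requested in one open_revs query.
const MAX_SIMULTANEOUS_REVS = 50; // PouchDB's value at the time, if I recall
function openRevsUrl(remote, docId, revs) {
  const batch = revs.slice(0, MAX_SIMULTANEOUS_REVS);
  return remote + '/' + encodeURIComponent(docId) +
    '?open_revs=' + encodeURIComponent(JSON.stringify(batch)) +
    '&revs=true';
}
```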

@rague
Author

rague commented Jun 18, 2016

I've tested with MAX_SIMULTANEOUS_REVS = 1. It doesn't resolve the problem. It seems the problem is simply that we have a lot of revisions, each with a long _revisions list.

This case is perhaps exceptional (it was produced by a bug in my application). However, it shouldn't break PouchDB. My CouchDB database is not broken, so PouchDB should be able to work with it.

Since it seems it could be hard to optimize this part of PouchDB, what can I do with my database to resolve my problem? Any advice?

@daleharvey
Member

@rague any chance you could write some example code showing how the conflict resolution could build a dep graph in this way? Then I would definitely be able to look into it.

@nolanlawson
Member

nolanlawson commented Jun 18, 2016 via email

@rague
Author

rague commented Jun 20, 2016

The rev generation issue (#4642) is indeed related to my problem.

Here is my conflict resolution routine:

Edit by Nolan: moved to this Gist

@willholley
Member

I have an implementation of pluggable rev generation using md5 or the current uuid strategy. I hesitated to submit a PR because there was some discussion in the CouchDB community about developing a common rev generation algorithm which didn't depend upon Erlang internals. That thread seems to have evaporated so I could resurrect my branch if there's interest.

@nolanlawson
Member

nolanlawson commented Jun 20, 2016

I'm interested in that @willholley. I think we've reached the point where we can:

  1. do deterministic revs by default and ignore whether or not it's compatible with CouchDB; if CouchDB wants to become deterministic at some point, they can use our system (assuming we don't do anything that's JS-specific). Even if CouchDB doesn't, we still have the benefit of Pouch clients not going into an infinite loop.
  2. fork the replicator into a separate module that doesn't do the deterministic revs, in case it turns out to be a perf issue and people want a way out.

@ermouth
Contributor

ermouth commented Jun 25, 2016

It seems that if the deterministic rev hasher took into account not only the current doc JSON but also the rev hash of the doc's previous revision, the tree of document versions would automagically turn into something like a blockchain.
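
A minimal sketch of that idea (my illustration, not PouchDB's actual implementation; a real version would need a canonical JSON serialization so key order can't change the hash):

```js
const crypto = require('crypto');

// Chain-style deterministic rev: hash the doc body together with the
// parent rev, so identical edits on identical parents yield the same rev
// id on every client, and each rev commits to its whole ancestry.
function deterministicRev(doc, prevRev) {
  const gen = prevRev ? parseInt(prevRev, 10) + 1 : 1;
  const body = Object.assign({}, doc);
  delete body._rev; // hash the content, not the bookkeeping field
  const hash = crypto.createHash('md5')
    .update((prevRev || '') + JSON.stringify(body))
    .digest('hex');
  return gen + '-' + hash;
}
```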

@nolanlawson
Member

Closing as a dup of #4642. Please reopen if I'm incorrect.

@angkec

angkec commented May 11, 2017

I'm running into the same problem: having a huge revision history makes sync very slow with PouchDB. CPU utilization reaches 100% for the entirety of the slow sync process. Only a fraction of the time is spent on networking; since I was using pouchdb-load, it's the database-building process that takes too long. I'm solving this by limiting the revs saved on CouchDB to a minimal number, but PouchDB should probably handle documents with long lists of revs better.
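
For anyone else landing here, a sketch of that server-side cap (URL is a placeholder; _revs_limit and _compact are standard CouchDB endpoints, and compaction requires admin rights):

```js
(async () => {
  const base = 'http://localhost:5984/mydb'; // placeholder
  // Lower the per-database revision limit kept by CouchDB itself...
  await fetch(base + '/_revs_limit', { method: 'PUT', body: '50' });
  // ...then compact, pruning old ids from every doc's _revisions list.
  await fetch(base + '/_compact', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
  });
})();
```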

@courageDeveloper

My app drags and hangs when it syncs with CouchDB, but whenever I disconnect my network cable from the server, it becomes fast. Why is this? Is something wrong with my pagination, or am I syncing incorrectly? It's really a pain and I need answers ASAP.
