
Ultra slow processing when syncing a document with many revisions #5327

Closed
rague opened this issue Jun 10, 2016 · 13 comments
Labels
bug Confirmed bug

Comments

@rague

rague commented Jun 10, 2016

Environment (Node.js/browser/hybrid app/etc.)

browser

Browser/platform

Chrome 51 + Safari 9.1.1

Adapter

IndexedDB (Chrome) / WebSQL (Safari)

Server

CouchDB 1.6.1

Issue description

When syncing a document with many remote revisions (> 1000), processing the response is so slow that it takes a near-infinite time to finish (> 10 min), freezing the page.

I configured the local database with revs_limit: 50 and auto_compaction: true.
I configured the sync operation with batch_size: 25.
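
For reference, a minimal sketch of that setup (database name and remote URL are placeholders; note PouchDB spells the option auto_compaction):

```js
// Local database capped at 50 revs of history, with automatic compaction.
const db = new PouchDB('local', { revs_limit: 50, auto_compaction: true });

// Sync against the CouchDB server in batches of 25 documents.
db.sync('http://localhost:5984/mydb', { batch_size: 25 })
  .on('error', console.error);
```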

The Chrome profile shows that 90% of the time is spent in the stem function.

The response for the document request weighs 1.5 MB.

config_2_utest.txt

Here is the request:

Edit by Nolan: moved to this Gist.

@daleharvey
Member

Hrm, so if I remember correctly where the slowness is: that array of 919 revisions is going to be turned into a tree (which looks something like [rev1, [rev2, rev3]]); the stemming function will then turn that tree back into a long array, slice it, then turn it back into a tree. We have a hard problem with the inefficiency of the tree format. It could likely be improved, although I can't easily think of how, and it's also going to be a hard change to make (it might be worth looking at this in idb-next).
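
To illustrate, a simplified sketch of that round-trip (this is not PouchDB's exact internal rev-tree format, just the same shape):

```js
// A node is [revId, [childNode, ...]].
const tree = ['rev1', [['rev2', [['rev3', []]]]]];

// Flatten one linear branch of the tree into an array of rev ids.
function toPath([rev, children]) {
  return children.length ? [rev, ...toPath(children[0])] : [rev];
}

// Rebuild a single-branch tree from an array of rev ids.
function toTree(path) {
  return path.reduceRight((child, rev) => [rev, child ? [child] : []], null);
}

// Stemming: tree -> long array -> slice to the newest `limit` revs -> tree.
function stem(node, limit) {
  return toTree(toPath(node).slice(-limit));
}

console.log(JSON.stringify(stem(tree, 2))); // ["rev2",[["rev3",[]]]]
```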

But one possibly easy win is applying rev_tree limits on the way in to creating that tree, instead of after the tree has been created.
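
A rough sketch of that idea, assuming a document fetched with ?revs=true (the helper name is hypothetical):

```js
// Hypothetical helper: cap the incoming ancestry before the tree is built,
// rather than building the full 919-node tree and stemming it afterwards.
// CouchDB lists _revisions.ids newest-first, so keep the first `revsLimit`
// entries; _revisions.start (the newest generation) is unaffected.
function truncateIncomingRevisions(doc, revsLimit) {
  if (doc._revisions && doc._revisions.ids.length > revsLimit) {
    doc._revisions.ids = doc._revisions.ids.slice(0, revsLimit);
  }
  return doc;
}
```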

I think the first step for this bug is to write a script that generates a database exhibiting the problems you are seeing during replication, so everyone can debug it.
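
Something along these lines, perhaps (URL and doc id are placeholders):

```js
// Rough sketch: update one document ~1000 times against CouchDB so its
// _revisions ancestry grows huge, then replicate it into a fresh local
// PouchDB to reproduce the slow stemming.
const PouchDB = require('pouchdb');

async function buildPathologicalDb() {
  const db = new PouchDB('http://localhost:5984/stem-repro');
  const doc = { _id: 'victim', count: 0 };
  doc._rev = (await db.put(doc)).rev;
  for (let i = 1; i <= 1000; i++) {
    doc.count = i;
    doc._rev = (await db.put(doc)).rev;
  }
}

buildPathologicalDb().catch(console.error);
```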

daleharvey added the bug label on Jun 10, 2016
@nolanlawson
Member

Wow, that config_2_utest.txt file is huge merely because of the _revisions list, i.e. the list of all revision IDs. It's enormous.

@daleharvey From reading the bug report, it seems like the issue here is not our merging algorithm, but rather just the overhead of pulling down a 1.5 MB file in order to sync a single document. Note that that's 1.5 MB of just revision IDs.

To be quite frank, this seems like a design error in CouchDB's replication protocol that we can't get around without modifying the protocol.
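
For illustration (ids below are made up), the bulk of that payload is per-revision ancestry:

```js
// One entry of an open_revs response. Every open revision carries its full
// ancestry in _revisions.ids, so hundreds of 32-char hashes per leaf,
// times many conflicting leaves, reaches megabytes for a single document.
const openRevsEntry = {
  ok: {
    _id: 'mydoc',
    _rev: '919-0abc...',
    _revisions: {
      start: 919,
      ids: ['0abc...', '1def...' /* ...917 more hashes... */],
    },
  },
};
```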

@rague
Author

rague commented Jun 18, 2016

I understood why there is such a heavy _revisions list. It's a bug in my client conflict resolution routine: when two running applications try to resolve the same conflict concurrently, they get stuck in an endless loop, producing many conflicting revisions.

I've solved the problem, but my database is still filled with these enormous revision lists. What now?

I see that being able to change the MAX_SIMULTANEOUS_REVS used for the open_revs argument could perhaps help.
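
For context, a sketch of the request that constant shapes (my reconstruction, not the replicator's exact code):

```js
// Leaf revs are chunked so that at most MAX_SIMULTANEOUS_REVS of them are
// requested in one open_revs query.
const MAX_SIMULTANEOUS_REVS = 50; // PouchDB's value at the time, if I recall
function openRevsUrl(remote, docId, revs) {
  const batch = revs.slice(0, MAX_SIMULTANEOUS_REVS);
  return remote + '/' + encodeURIComponent(docId) +
    '?open_revs=' + encodeURIComponent(JSON.stringify(batch)) +
    '&revs=true';
}
```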

@rague
Author

rague commented Jun 18, 2016

I've tested with MAX_SIMULTANEOUS_REVS = 1. It doesn't resolve the problem. It seems the problem is simply that we have a lot of revisions, each with a long _revisions list.

This case is perhaps exceptional (it was produced by a bug in my application). However, it shouldn't break PouchDB. My CouchDB database is not broken, so PouchDB should be able to work with it.

Since it seems it could be hard to optimize this part of PouchDB, what can I do with my database to resolve my problem? Any advice?

@daleharvey
Member

@rague any chance you could write some example code showing how the conflict resolution could build a dep graph in this way? Then I would definitely be able to look into it.

@nolanlawson
Member

nolanlawson commented Jun 18, 2016 via email

@rague
Author

rague commented Jun 20, 2016

The rev generation issue (#4642) is indeed related to my problem.

Here is my conflict resolution routine:

Edit by Nolan: moved to this Gist

@willholley
Member

I have an implementation of pluggable rev generation using md5 or the current uuid strategy. I hesitated to submit a PR because there was some discussion in the CouchDB community about developing a common rev generation algorithm which didn't depend upon Erlang internals. That thread seems to have evaporated so I could resurrect my branch if there's interest.

@nolanlawson
Member

nolanlawson commented Jun 20, 2016

I'm interested in that @willholley. I think we've reached the point where we can:

  1. do deterministic revs by default and ignore whether or not it's compatible with CouchDB; if CouchDB wants to become deterministic at some point, they can use our system (assuming we don't do anything that's JS-specific). Even if CouchDB doesn't, we still have the benefit of Pouch clients not going into an infinite loop.
  2. fork the replicator into a separate module that doesn't do the deterministic revs, in case it turns out to be a perf issue and people want a way out.

@ermouth
Contributor

ermouth commented Jun 25, 2016

It seems that if the deterministic rev hasher took into account not only the current doc JSON but also the rev hash of the doc's previous revision, the tree of document versions would automagically turn into something like a blockchain.
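
A minimal sketch of that idea (my illustration, not PouchDB's actual implementation; a real version would need a canonical JSON serialization so key order can't change the hash):

```js
const crypto = require('crypto');

// Chain-style deterministic rev: hash the doc body together with the
// parent rev, so identical edits on identical parents yield the same rev
// id on every client, and each rev commits to its whole ancestry.
function deterministicRev(doc, prevRev) {
  const gen = prevRev ? parseInt(prevRev, 10) + 1 : 1;
  const body = Object.assign({}, doc);
  delete body._rev; // hash the content, not the bookkeeping field
  const hash = crypto.createHash('md5')
    .update((prevRev || '') + JSON.stringify(body))
    .digest('hex');
  return gen + '-' + hash;
}
```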

@nolanlawson
Member

Closing as a dup of #4642. Please reopen if I'm incorrect.

@angkec

angkec commented May 11, 2017

I'm running into the same problem: having a huge revision history makes sync very slow with PouchDB. CPU utilization reaches 100% for the entirety of the slow sync process. Only a fraction of the time is spent on networking; since I was using pouchdb-load, it's the database-building process that takes too long. I'm solving this by limiting the revs saved on CouchDB to a minimal number, but PouchDB should probably handle documents with long lists of revs better.
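
For anyone else landing here, a sketch of that server-side cap (URL is a placeholder; _revs_limit and _compact are standard CouchDB endpoints, and compaction requires admin rights):

```js
(async () => {
  const base = 'http://localhost:5984/mydb'; // placeholder
  // Lower the per-database revision limit kept by CouchDB itself...
  await fetch(base + '/_revs_limit', { method: 'PUT', body: '50' });
  // ...then compact, pruning old ids from every doc's _revisions list.
  await fetch(base + '/_compact', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
  });
})();
```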

@courageDeveloper

My app drags and hangs when it syncs with CouchDB, but whenever I disconnect my network cable from the server, it becomes fast. Why is this? Is something wrong with my pagination, or am I syncing incorrectly? It's really a pain and I need answers ASAP.
