Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revs_limit hides, but does not actually delete, old document data #4372

Closed
elafargue opened this issue Sep 28, 2015 · 41 comments
Closed

revs_limit hides, but does not actually delete, old document data #4372

elafargue opened this issue Sep 28, 2015 · 41 comments
Labels
bug Confirmed bug

Comments

@elafargue
Copy link

I am using PouchDB for a data logging application, and one of my databases holds a single 'status' document. This document is updated up to twice per minute, so new revisions are generated fairly rapidly.

I have noticed that even when using compaction - either automatic or explicit -, the storage requirements for this document get out of hand very quick: for a 100 byte JSON structure, I end up with several hundred megabytes of "*.ldb" files in the LevelDB storage directory. The server running for a bit more than a week now uses more than 1GB for the status document...

My expectation was that once I reached the hardcoded 1000 revisions limit, the space required for this status document would be large but stop increasing, but that is not the case.

When fetching all revisions of the 'status' document, I correctly only get the last 1000 revisions, all of them "missing" but the latest, since I use compaction. But exploring the ldb files, I can find references to every single revision ever created, which I suppose explains why my storage increases continuously.

Is this the expected behavior, or is this a bug?

PouchDB version is the latest as of 2015.09.28
OS is Linux

@nolanlawson
Copy link
Member

Definitely a bug; sounds like when we trim 1000+ revisions, we don't actually remove the revisions; we just update the metadata.

@elafargue
Copy link
Author

@nolanlawson thanks for the feedback - let me know how I can help... is this independent from the underlying database system, or is this linked to LevelDB specifically? I can take a look if you point me in the right direction in the source tree...

@nolanlawson
Copy link
Member

Yep, so ideally this would need fixes in all three adapters and a new test (to guarantee that the fix worked). For LevelDB you would be looking at the bulkDocs implementation and what happens when metadata gets trimmed, which currently happens in shared utility functions here and here. Note that in one case the code path applies for new_edits=false, whereas for the other one it applies to new_edits=true.

So yeah, it is very very subtle, but if you have the inclination to attack the bug then I encourage you to do so. :) You can verify that your fix works by writing a test that modifies a document >1000 times and then verifying that you can no longer fetch the oldest revision via a get().

@elafargue
Copy link
Author

Hmm, not an ideal candidate for a first PR on PouchDB, I see :) Not sure I'll be able to help in a timely manner here, unfortunately - though it's currently kind of a blocker in my project... Well, I'll see what I can do but don't let me hold you back if you want to tackle it first ;-)

@nolanlawson
Copy link
Member

As a temporary workaround, you can replicate from new PouchDB('/path/to/first') to new PouchDB('/path/to/second') and the second one will end up being compacted/smaller.

@elafargue
Copy link
Author

Good point. I am doing a few more tests - if I do 50'000 sequential updates to a simple { ts: timestamp} document, the levelDB database size can go up to 50MB which is a lot for such a small doc, but actually the size goes up and down all the time during the loop, from as low as 9M to at high as 50M, in just a couple of seconds. I'm a bit surprised...

Another issue: when running the same test program multiple times, each time the test is restarted the database size increases slightly, and ends up taking hundreds of megabytes even with auto compaction...

The test is a simple:

function update() {
   db.get('001').then(function (result) {
       result.ts = new Date().getTime();
      return db.put(result);
   }).then(function (response) {
      if (i++ < 50000)
         update();
    }).catch(function (err) {
        console.log(err);
    });
}

update();

Also, doing a "db.compact()" after each put seems to make matters worse, the database grew up to 500MB in a couple of minutes on this test.

@nolanlawson
Copy link
Member

Interesting. It might have something to do with the internals of LevelDB, which I'm not too familiar with.

@daleharvey daleharvey added the bug Confirmed bug label Oct 5, 2015
@christopherreay
Copy link

Just out of interest, does this plugin suffer from the same issues?
I would guess not.
They might be able to help:
https://github.com/redgeoff/delta-pouch

@nolanlawson
Copy link
Member

No, delta-pouch wouldn't, because every document only ever has one revision.

@elafargue
Copy link
Author

By the way: PouchDB 5.1.0 also suffers from this - I have the same issue on my software on this latest version. It also looks like in some instances, some documents disappear after many updates - this is on code that never deletes a document but updates the docs every 30 seconds...

@elafargue
Copy link
Author

I'd love to get an update on this issue: is there something I don't understand with auto compaction or non-leaf revisions that explains this behaviour, or is this really a bug?

This is kind of a show stopper for any NodeJS app that relies on PouchDB for long term operations, especially since I am now getting corruption issues on the same code after a couple of days or running...

@nolanlawson
Copy link
Member

If you auto-compact, then that should actually fix your problem, because the old revisions will be deleted every time you make an update.

The bug described in this issue is that, even if you have autocompact disabled, PouchDB should "compact" those >1000 old revisions, but it's not doing that.

@elafargue
Copy link
Author

I tried with both autocompact and without, the problem is pretty much the
same with the test code I shared above...

On Sat, Nov 21, 2015, 13:10 Nolan Lawson notifications@github.com wrote:

If you auto-compact, then that should actually fix your problem, because
the old revisions will be deleted every time you make an update.

The bug described in this issue is that, even if you have autocompact
disabled, PouchDB should "compact" those >1000 old revisions, but it's
not doing that.


Reply to this email directly or view it on GitHub
#4372 (comment).

@sgenoud
Copy link
Contributor

sgenoud commented Jan 26, 2016

I had a look at what is not pruned at 1000 documents even with auto_compaction on, and it looks like the rev_map is never pruned at all - my guess is that we should add it somewhere around here: https://github.com/pouchdb/pouchdb/blob/master/src/adapters/leveldb/index.js#L1303

I will spend a bit more time next week to make sure this is what we want to do...

sgenoud pushed a commit to sgenoud/pouchdb that referenced this issue Jan 26, 2016
We only keep the revision - sequence map for revisions that have a
document in the database.
sgenoud pushed a commit to sgenoud/pouchdb that referenced this issue Jan 29, 2016
We only keep the revision - sequence map for revisions that have a
document in the database.
sgenoud pushed a commit to sgenoud/pouchdb that referenced this issue Jan 29, 2016
When a document has more revisions than the revlimit the revisions are
now not only stemm from the revision tree but also removed from the
database.
@nolanlawson nolanlawson changed the title Storage requirements on single document seem to grow indefinitely with LevelDB revs_limit hides, but does not actually delete, old document data Feb 1, 2016
sgenoud pushed a commit to sgenoud/pouchdb that referenced this issue Feb 4, 2016
Added a simple test that checks that old revisions are not directly
accessible anymore. The test revealed a bug in the websql implementation
that was fixed.
Also, indentation
sgenoud pushed a commit to sgenoud/pouchdb that referenced this issue Feb 4, 2016
sgenoud pushed a commit to sgenoud/pouchdb that referenced this issue Feb 4, 2016
daleharvey pushed a commit that referenced this issue Feb 8, 2016
We only keep the revision - sequence map for revisions that have a
document in the database.
@daleharvey
Copy link
Member

Fixed in 4cccf61

@zhakhalov
Copy link

Please update npm version(add tag v5.2.2)

@daleharvey
Copy link
Member

We do releases around every month, so this will be released in around 2 weeks time, cheers

@ghost
Copy link

ghost commented May 12, 2016

I've created PouchDB (5.3.2) with the option not to create revisions:

var db = new PouchDB('test1', { revs_limit: 1, auto_compaction: true })

screen shot 2016-05-12 at 11 11 28

When I inspect indexedDB in chrome it looks like revisions are stored there. I'd like to keep size of db small as it is used in cordova project, no sync with CouchDB is needed.

http://stackoverflow.com/questions/37182802/pouchdb-growing-with-revisions-even-when-revs-limit-1

@gianbasagre
Copy link

@nolanlawson @daleharvey It seems this issue is still happening (tested on 5.3.2 and nightly).

@nolanlawson nolanlawson reopened this May 13, 2016
@nolanlawson
Copy link
Member

Yep, I believe I saw it reported on StackOverflow as well.

@gianbasagre
Copy link

@nolanlawson Is there a previous version where this issue is fixed? I'm using this in a Cordova app and the data just keeps on growing and growing which is not very good for a mobile app.

@nolanlawson
Copy link
Member

I don't actually know if this was ever fixed, to be honest. It looks like 4cccf61 only fixes it for the LevelDB adapter.

@gianbasagre
Copy link

Thanks. Is there anyway we can delete previous revisions manually? Via SQL query perhaps?

@nolanlawson
Copy link
Member

I don't believe so. If this is SQLite/WebSQL, you could try doing a VACUUM command though.

@sgenoud
Copy link
Contributor

sgenoud commented May 20, 2016

4cccf61 was fixing something different. The rev_map (which keeps the list of all the revisions of a document) was not pruned. But the documents were removed at compaction – unless I have missed something (my manual tests were done in leveldb).

But not much has changed since my PR, so either this other commit 14449b7 has broken something, there is something rotten with my PR or it was there all along and I missed it 😉 .

Note that the fix was also implemented for idb 4cccf61#diff-c6e3e4d1d80300a313a8c584a7ad5c8bR238 and websql 4cccf61#diff-c0839fc9da99b43081b6ae6a04f6beefR223) - but the code was simpler.

@daleharvey
Copy link
Member

oh wow, so I made a test case @ http://paste.pouchdb.com/paste/fob5op/, with revs_limit only everything works as expected, however if I set revs_limit: 1, auto_compaction: true I am seeing every revision stored, something fairly serious wrong there.

As a workaround it looks like everything is fine with revs_limits: 1 only, if revs_limit is 1 then auto_compaction isnt needed anyway

@daleharvey
Copy link
Member

auto_compaction on its own does the right thing, revs_limit: 5 + auto_compaction does the right thing, but revs_limit: 1, auto_compaction: true saves every revision, the exact opposite of what it is supposed to be doing. With that configuration we are going to be try to compact the same rev twice within the same transaction, its possible we are mucking outselves up there somehow

@sgenoud
Copy link
Contributor

sgenoud commented Jun 1, 2016

interesting! With the help of your test case, break points and some head scratching I found the source of the bug (not sure how to fix it though - it will need some more head scratching).

In this line we use the metadata to decide which data to auto compact. But, at this stage, if we have revs_limit=1, we only have one revision, the one that needs to stay (and we don't know about the older revisions that would need to be compacted). Therefore we don't compact them. So they stay forever and the db keeps growing.

@daleharvey
Copy link
Member

Ah yeh, here is the problem, 4cccf61#diff-c6e3e4d1d80300a313a8c584a7ad5c8bR236

I thought we would be compacting twice, but we arent we are doing one or the other.

if (docInfo.stemmedRevs.length) {
  compactRevs(docInfo.stemmedRevs, docInfo.metadata.id, txn);
}

should be unconditional and done before autoCompaction

@daleharvey
Copy link
Member

So the hard part about this is testing :( its specifically not possible to test via integration tests / the pouch api, it seems like the test needs adapter specific implementations which we have thus far avoided

@ghost
Copy link

ghost commented Jun 2, 2016

Thanks for info. Using only revs_limit=1 solved my problem.

sgenoud pushed a commit to sgenoud/pouchdb that referenced this issue Jun 2, 2016
@sgenoud
Copy link
Contributor

sgenoud commented Jun 2, 2016

The only way I could see is to ask each adapter to provide a size indice (that you would compute from the number of documents and some random documents query). This might be of some interest for application as well as for tests, but I feel like I am stretching it.

I have been working on a simple PR and wanted to do some manual testing - do you have a simple way you do those?

@daleharvey
Copy link
Member

For manual testing I have just been opening the web inspector in chrome and checking to see if the by-seq table has more documents in it than it should

@sgenoud
Copy link
Contributor

sgenoud commented Jun 2, 2016

I mean generally with the test setup, do you have a trick to serve the library to test in the browser?

@daleharvey
Copy link
Member

oh yeh npm run dev will open a server on localhost:8000, it will also start watching for file changes and rebuild the code when you change it. you can use ?grep=myuniquetest on the url to only run your own test

@sgenoud
Copy link
Contributor

sgenoud commented Jun 3, 2016

I have done manual testing on websql, idb and leveldb and it fixes the issue!

@daleharvey
Copy link
Member

Fixed in e09ba8a

@sgenoud I have invited you to become a pouchdb maintainer, all PR's still go through code review but this means you can review and merge others code if you wish, Thanks for the contributions :)

@sgenoud
Copy link
Contributor

sgenoud commented Jun 8, 2016

Thanks!

daleharvey added a commit that referenced this issue Jun 8, 2016
@nolanlawson
Copy link
Member

Welcome!! 🎉 💃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed bug
Projects
None yet
Development

No branches or pull requests

8 participants