revs_limit hides, but does not actually delete, old document data #4372
Definitely a bug; sounds like when we trim 1000+ revisions, we don't actually remove the revisions; we just update the metadata.
@nolanlawson thanks for the feedback - let me know how I can help... Is this independent of the underlying database system, or is it linked to LevelDB specifically? I can take a look if you point me in the right direction in the source tree...
Yep, so ideally this would need fixes in all three adapters and a new test (to guarantee that the fix worked). For LevelDB you would be looking at the bulkDocs implementation and what happens when metadata gets trimmed, which currently happens in shared utility functions here and here. Note that in one case the code path applies for … So yeah, it is very subtle, but if you have the inclination to attack the bug then I encourage you to do so. :) You can verify that your fix works by writing a test that modifies a document >1000 times and then verifies that you can no longer fetch the oldest revision via a …
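To make the shape of the bug concrete, here is an illustrative sketch (not PouchDB source; all names are hypothetical) of what trimming at `revs_limit` should produce: the revisions to keep in the metadata, and the revisions whose bodies must *also* be deleted from the store.

```javascript
// Hypothetical sketch: split a revision history that exceeds revs_limit into
// the part to keep and the part whose document bodies must be removed.
function stemHistory(revs, revsLimit) {
  // revs: array of rev ids ordered oldest -> newest
  const keep = revs.slice(-revsLimit);
  const removeBodies = revs.slice(0, Math.max(0, revs.length - revsLimit));
  return { keep, removeBodies };
}

// The bug described in this issue: only `keep` was written back to the
// metadata, while the bodies behind `removeBodies` were left on disk.
```

The point of the sketch is that both halves of the split need acting on; updating the metadata alone leaves the old bodies behind.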
Hmm, not an ideal candidate for a first PR on PouchDB, I see :) Not sure I'll be able to help in a timely manner here, unfortunately - though it's currently kind of a blocker in my project... Well, I'll see what I can do, but don't let me hold you back if you want to tackle it first ;-)
As a temporary workaround, you can replicate from …
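The replication workaround relies on the fact that replication only transfers leaf revisions, so the accumulated old revision bodies stay behind in the source database, which can then be discarded. A minimal sketch, assuming PouchDB-style `src.replicate.to(dst)`; `copyLeafRevisions` is a hypothetical helper name:

```javascript
// Hedged sketch of the workaround: replicate into a fresh database so only
// leaf revisions are copied, then delete the bloated source database.
async function copyLeafRevisions(src, dst) {
  // src.replicate.to(dst) resolves with a summary object (docs_written etc.)
  const result = await src.replicate.to(dst);
  return result;
}
```

After replicating, the application would switch over to the new database and destroy the old one.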
Good point. I am doing a few more tests - if I do 50,000 sequential updates to a simple { ts: timestamp } document, the LevelDB database size can go up to 50MB, which is a lot for such a small doc, but actually the size goes up and down all the time during the loop, from as low as 9MB to as high as 50MB, in just a couple of seconds. I'm a bit surprised... Another issue: when running the same test program multiple times, each time the test is restarted the database size increases slightly, and it ends up taking hundreds of megabytes even with auto compaction... The test is a simple:
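The original snippet did not survive formatting; below is a minimal reconstruction of the kind of loop being described (sequential updates to a single `{ ts }` document), assuming a PouchDB-style `db` with `get`/`put`. The helper name and structure are illustrative, not the commenter's actual code:

```javascript
// Hypothetical reconstruction: repeatedly update one small status document,
// creating a new revision on every put.
async function hammerStatusDoc(db, iterations) {
  // Seed the document (a real run would need to handle an existing doc).
  await db.put({ _id: 'status', ts: Date.now() });
  for (let i = 0; i < iterations; i++) {
    const doc = await db.get('status');
    doc.ts = Date.now();
    await db.put(doc); // each put creates a new revision
  }
}
```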
Also, doing a db.compact() after each put seems to make matters worse; the database grew to 500MB in a couple of minutes in this test.
Interesting. It might have something to do with the internals of LevelDB, which I'm not too familiar with.
Just out of interest, does this plugin suffer from the same issues?
No, delta-pouch wouldn't, because every document only ever has one revision.
By the way: PouchDB 5.1.0 also suffers from this - I see the same issue in my software on this latest version. It also looks like in some instances some documents disappear after many updates - this is in code that never deletes a document but updates the docs every 30 seconds...
I'd love to get an update on this issue: is there something I don't understand with auto compaction or non-leaf revisions that explains this behaviour, or is this really a bug? This is kind of a show stopper for any NodeJS app that relies on PouchDB for long-term operations, especially since I am now getting corruption issues on the same code after a couple of days of running...
If you auto-compact, then that should actually fix your problem, because the old revisions will be deleted every time you make an update. The bug described in this issue is that, even if you have autocompact disabled, PouchDB should "compact" those >1000 old revisions, but it's not doing that.
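For reference, the two constructor options being compared in this thread can be sketched as follows; exact behaviour depends on the PouchDB version in use:

```javascript
// Hedged configuration sketch of the options discussed here.
const options = {
  // auto_compaction: remove non-leaf revision bodies on every write.
  auto_compaction: true,
  // revs_limit: how much revision-history metadata is kept per document;
  // as this issue shows, on its own it did not reliably delete the old
  // revision bodies, only the metadata.
  revs_limit: 1000,
};
// Usage (assumes PouchDB is installed): new PouchDB('status', options)
```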
I tried with both autocompact and without; the problem is pretty much the same.
I had a look at what is not pruned at 1000 documents even with … I will spend a bit more time next week to make sure this is what we want to do...
We only keep the revision-to-sequence map for revisions that have a document in the database.
When a document has more revisions than revs_limit, the revisions are now not only stemmed from the revision tree but also removed from the database.
Added a simple test that checks that old revisions are no longer directly accessible. The test revealed a bug in the websql implementation that was fixed. Also, indentation fixes.
Fixed in 4cccf61
Please update the npm version (add a tag …)
We do releases around every month, so this will be released in around two weeks' time, cheers
I've created a PouchDB (5.3.2) database with the option not to create revisions:
When I inspect IndexedDB in Chrome, it looks like revisions are stored there. I'd like to keep the size of the db small as it is used in a Cordova project; no sync with CouchDB is needed. http://stackoverflow.com/questions/37182802/pouchdb-growing-with-revisions-even-when-revs-limit-1
@nolanlawson @daleharvey It seems this issue is still happening (tested on 5.3.2 and nightly).
Yep, I believe I saw it reported on StackOverflow as well.
@nolanlawson Is there a previous version where this issue is fixed? I'm using this in a Cordova app and the data just keeps on growing and growing, which is not very good for a mobile app.
I don't actually know if this was ever fixed, to be honest. It looks like 4cccf61 only fixes it for the LevelDB adapter.
Thanks. Is there any way we can delete previous revisions manually? Via a SQL query, perhaps?
I don't believe so. If this is SQLite/WebSQL, you could try doing a …
4cccf61 fixed something different. The rev_map (which keeps the list of all the revisions of a document) was not pruned, but the documents were removed at compaction - unless I have missed something (my manual tests were done in leveldb). But not much has changed since my PR, so either this other commit 14449b7 has broken something, there is something rotten with my PR, or it was there all along and I missed it 😉. Note that the fix was also implemented for idb (4cccf61#diff-c6e3e4d1d80300a313a8c584a7ad5c8bR238) and websql (4cccf61#diff-c0839fc9da99b43081b6ae6a04f6beefR223) - but the code was simpler.
oh wow, so I made a test case @ http://paste.pouchdb.com/paste/fob5op/, with … As a workaround it looks like everything is fine with …
auto_compaction on its own does the right thing, and revs_limit: 5 + auto_compaction does the right thing, but …
Interesting! With the help of your test case, breakpoints, and some head scratching, I found the source of the bug (not sure how to fix it yet, though - it will need some more head scratching). In this line we use the metadata to decide which data to auto compact. But, at this stage, if we have …
Ah yeh, here is the problem: 4cccf61#diff-c6e3e4d1d80300a313a8c584a7ad5c8bR236. I thought we would be compacting twice, but we aren't; we are doing one or the other.
… should be unconditional and done before …
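The ordering problem described above can be sketched in isolation (names hypothetical, not PouchDB source): if the metadata is stemmed before the compaction step consults it, compaction never sees the stemmed revisions, so their seq entries are never deleted. The fix is to delete the seq entries for the stemmed revisions unconditionally, before (or independently of) the auto-compaction pass:

```javascript
// Conceptual sketch of the fixed ordering: stemmed revisions get their seq
// entries deleted directly, rather than relying on a later compaction pass
// that reads the (already trimmed) metadata.
function processWrite(revs, revsLimit, deleteSeqs) {
  const stemmed = revs.slice(0, Math.max(0, revs.length - revsLimit));
  // Unconditional deletion of the stemmed revisions' entries.
  deleteSeqs(stemmed);
  // Only now is the trimmed metadata written back.
  return revs.slice(-revsLimit);
}
```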
So the hard part about this is testing :( It's specifically not possible to test via integration tests / the Pouch API; it seems like the test needs adapter-specific implementations, which we have thus far avoided.
Thanks for the info. Using only revs_limit: 1 solved my problem.
The only way I could see is to ask each adapter to provide a size indicator (which you would compute from the number of documents and some random document queries). This might be of some interest for applications as well as for tests, but I feel like I am stretching it. I have been working on a simple PR and wanted to do some manual testing - do you have a simple way you do those?
For manual testing I have just been opening the web inspector in Chrome and checking to see if the …
I mean generally, with the test setup - do you have a trick to serve the library to test in the browser?
oh yeh
I have done manual testing on websql, idb and leveldb and it fixes the issue!
Thanks!
Welcome!! 🎉 💃
I am using PouchDB for a data logging application, and one of my databases holds a single 'status' document. This document is updated up to twice per minute, so new revisions are generated fairly rapidly.
I have noticed that even when using compaction (either automatic or explicit), the storage requirements for this document get out of hand very quickly: for a 100-byte JSON structure, I end up with several hundred megabytes of "*.ldb" files in the LevelDB storage directory. The server, now running for a bit more than a week, uses more than 1GB for the status document...
My expectation was that once I reached the hardcoded 1000 revisions limit, the space required for this status document would be large but stop increasing, but that is not the case.
When fetching all revisions of the 'status' document, I correctly only get the last 1000 revisions, all of them "missing" except the latest, since I use compaction. But exploring the ldb files, I can find references to every single revision ever created, which I suppose explains why my storage grows continuously.
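The kind of per-revision status check described above can be sketched as follows, assuming a PouchDB/CouchDB-style `db.get(id, { revs_info: true })` that returns a `_revs_info` array of `{ rev, status }` entries; the helper name is hypothetical:

```javascript
// Hedged sketch: list the status ('available', 'missing', 'deleted') of each
// known revision of a document, newest first.
async function revisionStatuses(db, id) {
  const doc = await db.get(id, { revs_info: true });
  return doc._revs_info.map((info) => info.status);
}
```

A compacted document would show 'available' only for the leaf revision and 'missing' for the rest, matching what the reporter observes.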
Is this the expected behavior, or is this a bug?
PouchDB version is the latest as of 2015.09.28
OS is Linux