
Dataloss possible when system crashes during loadDatabase #92

Closed
szwacz opened this issue Oct 18, 2013 · 27 comments

@szwacz
Contributor

szwacz commented Oct 18, 2013

Hello @louischatriot

loadDatabase, while running, rewrites the whole database file. This is a quite fragile moment: if the system crashes at this time, data will be lost.

I don't know how probable this issue is, but I ran a dummy test...

var Datastore = require('nedb');
var db = new Datastore({ filename: 'test.db' }); // file was 13MB

db.loadDatabase();

setTimeout(function () {
    process.exit(1);
}, 50);

...and the data was indeed lost.

A possible fix would of course be something like the following.
For file test.db:

  1. Change file name to test.db.bac
  2. Do loadDatabase and save test.db
  3. Remove test.db.bac
    And while loading data for the first time, we have to look first for test.db.bac.

What do you think? Is it a real problem, or too much of an edge case to bother with? Heh, life taught me there is no such thing as too much of an edge case ;)
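
A rough sketch of the save side of this idea, assuming Node's fs module (names and error handling are illustrative only, not actual nedb code):

var fs = require('fs');

// Rename the current datafile to a .bac backup before rewriting it,
// so a crash mid-write always leaves one intact copy on disk
function saveWithBackup(filename, data, callback) {
  var backup = filename + '.bac';
  fs.rename(filename, backup, function (err) {
    if (err && err.code !== 'ENOENT') { return callback(err); }   // "no datafile yet" is fine
    fs.writeFile(filename, data, function (err) {                 // crash here? .bac survives
      if (err) { return callback(err); }
      fs.unlink(backup, function () { callback(null); });         // cleanup; ignore unlink errors
    });
  });
}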

@louischatriot
Owner

Hello,

This is actually a very good point, I hadn't considered this case. Your solution is also the first idea I had; I need to give it a bit more thought, but that's definitely something I want to do.

Thanks !

@spolu

spolu commented Oct 20, 2013

This would break hard links, but it avoids copying the whole file. This is more or less how emacs behaves, so I guess it's acceptable.

@szwacz
Contributor Author

szwacz commented Nov 16, 2013

Hah, this just happened to me in real life.
I mistakenly clicked the launch icon of my node-webkit application, thought "I don't need you now", and immediately hit the close button. That was enough :) The database file is 0KB now.
I can patch it in my app by not letting the node-webkit process be killed until nedb is in a safe state, but a general, bullet-proof solution would of course be much better.

But this proves it is a real issue, not just theoretical speculation.

@spolu

spolu commented Nov 18, 2013

+1 to implement this

@louischatriot
Owner

I've been planning to do it for the last month but your last message gave me the nudge :) I'm on it.

@louischatriot
Owner

Hello @szwacz @spolu ,

I finally had the time to code this :) Since this is a crucial part of NeDB and you are the two who raised this issue, it would be great if you could take a look at the code and tell me if you think there is a problem! The branch is no-dataloss-during-loaddatabase.

The two parts of the failsafe loadDatabase are thoroughly tested:

  • Persistence.persistCachedDatabase which actually persists the database, and uses a temporary datafile to ensure no data is lost
  • Persistence.ensureDatafileIntegrity which is the first step of a loadDatabase and makes sure the data is in the datafile, even if a crash happened during a previous loadDatabase.
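
One possible shape of such a two-step scheme, purely illustrative (the actual branch code may differ):

var fs = require('fs');

// Keep the old data in a temporary file while the real datafile is rewritten,
// and delete the temporary file once the rewrite has succeeded
function persistCachedDatabase(filename, serializedData, callback) {
  var tempFilename = filename + '~';
  fs.readFile(filename, 'utf8', function (err, oldData) {
    if (err && err.code !== 'ENOENT') { return callback(err); }
    fs.writeFile(tempFilename, oldData || '', function (err) {
      if (err) { return callback(err); }
      fs.writeFile(filename, serializedData, function (err) {   // the fragile moment
        if (err) { return callback(err); }
        fs.unlink(tempFilename, function () { callback(null); });
      });
    });
  });
}

// If the datafile is missing or empty but a temp file exists, a crash happened
// during the previous persist: recover the old data from the temp file
function ensureDatafileIntegrity(filename, callback) {
  fs.exists(filename + '~', function (tempExists) {
    if (!tempExists) { return callback(null); }
    fs.stat(filename, function (err, stats) {
      if (!err && stats.size > 0) { return fs.unlink(filename + '~', callback); }
      fs.rename(filename + '~', filename, callback);
    });
  });
}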

There is also a test that simulates a crash during a loadDatabase. It is timed for my machine though; if you want to go that deep, you'll need to find the proper timing for your machine.

I'm planning on merging this into master and releasing the new version on Thursday. If you don't have time to check it, no worries, it is well tested.

Cheers,
Louis

@szwacz
Contributor Author

szwacz commented Nov 24, 2013

Thanks a lot. I will look into it tomorrow.

@spolu

spolu commented Nov 25, 2013

Looks good!

@spolu

spolu commented Nov 25, 2013

Ideally, it would be interesting to dig into vim/emacs' strategy to see if they lock the files while reading/moving/writing them. I guess that's not a "necessity" as you explicitly say that nedb does not support parallel access to a DB file.

In any case, I'm looking forward to seeing this merged in master!

@szwacz
Contributor Author

szwacz commented Nov 25, 2013

I have a few thoughts about the test suite.

  1. You are testing against "file exists" and "file is empty", which I guess is OK because nedb writes the whole file in one chunk, but...
    a) I don't know the details of how file systems work, but is it definitely impossible to end up with a non-empty file containing only part of the data?
    b) If in the future you migrate to stream writing, this will definitely become possible.
    But I imagine guarding against it would require some marker lines in the file to be sure about its integrity, so complications and backwards-compatibility issues.
  2. The test "If system crashes during a loadDatabase, the former version is not lost" is a pain in the butt :) All edge cases are simulated in the previous tests; if there is a need to test the main db class, I would do it the same way - by manipulating (simulating) malformed data on disk, not by trying to actually crash the process.
    But this is your test suite :)

@spolu

spolu commented Nov 25, 2013

1- Perfectly possible indeed! You're right.
@louischatriot An easy fix for that: move f to f~, then write to f.tmp, and finally move f.tmp to f.

@szwacz
Contributor Author

szwacz commented Nov 25, 2013

@spolu I don't think it solves anything. If you have malformed files with data inside, there is no way to tell which one to use other than checking which is both the newest and intact. In nedb's case this is actually hard, because data is always appended to the end of the file, so you can't tell which should truly be the last line of data just by looking at it.

@spolu

spolu commented Nov 25, 2013

@szwacz in that very case it does solve the problem. If the file is incomplete it means that the final rename hasn't happened and f will not exist. In that case the f~ file should be recovered.

The fact that the rename is atomic makes it work as a lock and you can't end up with a malformed f file. You can have a malformed f.tmp file, but in that case f does not exist and f~ should be recovered.

@szwacz
Contributor Author

szwacz commented Nov 25, 2013

Ok, now I understand. You want to:

Saving:

  1. rename main to old
  2. write tmp
  3. rename tmp to main
  4. delete old

Initial reading:

  1. look for main
  2. if main doesn't exist look for old (never bother with tmp)
  3. if neither exists, start a new db

So the additional rename works as a step ensuring the file is complete. It might indeed work.
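
A minimal sketch of that scheme, assuming Node's fs module (the .old and .tmp suffixes are just illustrative):

var fs = require('fs');

// Saving: rename main to old, write tmp, atomically rename tmp to main, delete old
function crashSafeSave(filename, data, callback) {
  var oldFilename = filename + '.old', tempFilename = filename + '.tmp';
  fs.rename(filename, oldFilename, function (err) {                 // 1. rename main to old
    if (err && err.code !== 'ENOENT') { return callback(err); }
    fs.writeFile(tempFilename, data, function (err) {               // 2. write tmp
      if (err) { return callback(err); }
      fs.rename(tempFilename, filename, function (err) {            // 3. rename tmp to main (atomic)
        if (err) { return callback(err); }
        fs.unlink(oldFilename, function () { callback(null); });    // 4. delete old
      });
    });
  });
}

// Initial reading: use main if it exists, otherwise fall back to old, otherwise start fresh
function resolveDatafile(filename, callback) {
  fs.exists(filename, function (mainExists) {
    if (mainExists) { return callback(filename); }
    fs.exists(filename + '.old', function (oldExists) {
      callback(oldExists ? filename + '.old' : null);   // null means: start a new db
    });
  });
}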

@spolu

spolu commented Nov 25, 2013

That's it!

@FlorianBruniaux

Hi,

Same issue here:

var Datastore = require('nedb'), db = new Datastore('database.db'); // database file = 5KB
db.loadDatabase(); // database file = 0KB

@szwacz
Contributor Author

szwacz commented Nov 25, 2013

@FlorianBruniaux I'm pretty certain the snippet you provided is not enough to cause any problem. If you have a similar issue, please tell us more about how it happens.

@FlorianBruniaux

Hi, here is my whole snippet:

var requirejs = require('requirejs'),
    Datastore = require('nedb'), db = new Datastore('database.db');

requirejs.config({
    nodeRequire: require
});

requirejs([

], function () {

    exports.colisages = function(req, res) {

        db.loadDatabase();

        db.find({}, function (err, docs) {
            res.send(docs);
            console.log(docs);
        });

    };

});

@szwacz
Contributor Author

szwacz commented Nov 25, 2013

I think it has nothing to do with this topic :). db.loadDatabase() is not a synchronous function, please see the documentation.
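
For reference, waiting for the loadDatabase callback (or using the autoload option) before querying looks something like this:

var Datastore = require('nedb');
var db = new Datastore({ filename: 'database.db' });

// loadDatabase is asynchronous: only query once its callback has fired
db.loadDatabase(function (err) {
  if (err) { return console.error(err); }
  db.find({}, function (err, docs) {
    console.log(docs);
  });
});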

@FlorianBruniaux

Yeah, exactly. Sorry!

But is it normal that this single line:

var Datastore = require('nedb'), db = new Datastore({ filename: 'database.db', autoload: true });

deletes all the data in database.db? (Maybe I missed something... ^^')

@szwacz
Contributor Author

szwacz commented Nov 25, 2013

@FlorianBruniaux This is definitely not normal, and it looks like such a huge bug that I can't believe it is the case. Please investigate your problem a little more, and if you still think you spotted a bug, report it in a separate issue.

@FlorianBruniaux

The problem came from the fact that I used JSON data from another test (with another db solution). The id of my object was

"id":"0001" 

instead of

"_id":"mjoNZvETbK3mBVeS"

Okay, I'm an idiot 👍!

Problem solved with a bit of attention ^^'

@louischatriot
Owner

@spolu @szwacz The scheme you suggested is much better than what I had done, so I changed it. I will merge this evening (it's 1pm in France, so in about 7 hours); feel free to comment before then if you want.

Thanks for the help!

@szwacz
Contributor Author

szwacz commented Nov 28, 2013

No further comments. Thanks for awesome work :)

@louischatriot
Owner

Thanks :) It's now published as v0.8.13.

@spolu

spolu commented Nov 29, 2013

Awesome

@louischatriot
Owner

This implementation was in fact not correct, and prone to data loss in case of an OS (not just process) crash if the kernel buffer cache was not flushed to disk.

The new implementation is very much inspired by Redis' AOF persistence; see this very interesting article by antirez: http://oldblog.antirez.com/post/redis-persistence-demystified.html

Implementation here if you want to look at it: https://github.com/louischatriot/nedb/blob/master/lib/storage.js

You'll notice it doesn't need two temp files to ensure data integrity, but what it does need, and what was missing before, are fsync calls after every data change.
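
A simplified sketch of what such an fsync-based write can look like (the real code lives in lib/storage.js linked above; this is not a copy of it):

var fs = require('fs');

// Write to a temporary file, fsync it so the kernel buffer cache reaches the disk,
// then atomically rename it over the real datafile
function crashSafeWriteFile(filename, data, callback) {
  var tempFilename = filename + '~';
  fs.writeFile(tempFilename, data, function (err) {
    if (err) { return callback(err); }
    fs.open(tempFilename, 'r+', function (err, fd) {
      if (err) { return callback(err); }
      fs.fsync(fd, function (err) {              // flush to disk before the rename
        if (err) { return callback(err); }
        fs.close(fd, function (err) {
          if (err) { return callback(err); }
          fs.rename(tempFilename, filename, callback);   // atomic replace
        });
      });
    });
  });
}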
