data loss with --no-direct-io #1703

Closed
wojons opened this issue Nov 25, 2013 · 14 comments

@wojons
Contributor

wojons commented Nov 25, 2013

Lost power a few hours after creating a small test table with maybe 5 rows in it. When I restarted the server, the table was empty. I MAY also have lost 100k records from a different table. I'm not sure how often fsync is called with --no-direct-io; if it just waits for Linux to flush the data on its own, that could be the issue. I will update the ticket with the mount output when I get back to my workstation, but it should just be default ext4 with normal Linux Mint (Ubuntu) defaults.

@danielmewes
Member

As I understand it, our syncing scheme doesn't rely on direct I/O at all. @srh: Can you confirm that?
So --no-direct-io should behave the same as the default direct I/O in this respect.

I wonder if the table could appear empty if the file for the namespace in the rethinkdb data directory got lost?

@coffeemug
Contributor

Could someone ping @srh in person about this when you get the chance? (he tends to read github issues less frequently when he's in the middle of a big project)

@srh
Contributor

srh commented Nov 26, 2013

The --no-direct-io option is indeed independent of syncing. How writes are synced is affected by the hard/soft durability options and the noreply option. fsync is called whenever a write to disk happens; what the hard/soft/noreply options affect is how frequently we write to disk.
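
To illustrate that distinction, here is a toy sketch (Python, POSIX assumed). It is not RethinkDB's storage code: `ToyLog`, its methods, and the `durability` argument are hypothetical names made up for this example. It only models the idea that every disk write is followed by a sync, while the durability mode decides how often a disk write happens.

```python
import os


class ToyLog:
    """Toy append-only log; hypothetical, not RethinkDB's storage engine."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.pending = []     # soft writes acknowledged but not yet on disk
        self.fsync_calls = 0  # counts how often we actually hit the disk

    def _write_and_sync(self, records):
        # Every disk write is followed by a sync, regardless of durability mode.
        os.write(self.fd, b"".join(records))
        os.fsync(self.fd)
        self.fsync_calls += 1

    def write(self, record, durability="hard"):
        if durability == "hard":
            # Flush earlier soft writes too, so on-disk order matches write order.
            self._write_and_sync(self.pending + [record])
            self.pending = []
        else:
            # Soft: acknowledge immediately, write to disk later.
            self.pending.append(record)

    def flush(self):
        if self.pending:
            self._write_and_sync(self.pending)
            self.pending = []

    def close(self):
        self.flush()
        os.close(self.fd)
```

In this model, a power failure between a soft `write` and the next `flush` loses the pending records even though fsync is never skipped on an actual disk write.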

@coffeemug
Contributor

I don't think there is enough actionable data for us here. Is everyone ok with me moving this issue into backlog so we can do further testing when time permits?

@mlucy
Member

mlucy commented Nov 30, 2013

I feel like we should create a new issue for writing automated power-failure tests (maybe by kill -9ing a VM running RethinkDB?). We might be able to reproduce that way.
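
A rough sketch of what such a test harness could look like (Python, POSIX assumed). The writer here is a stand-in script, not RethinkDB, and `WRITER` and `run_crash_test` are hypothetical names. Note that killing a process leaves the kernel page cache intact, so this only checks crash consistency; catching a missing fsync would need the whole VM to be killed, as suggested above.

```python
import os
import subprocess
import sys
import tempfile

# Child process: appends numbered records, fsyncs each one, then acknowledges
# it on stdout. A record is only acknowledged after it has been synced.
WRITER = r"""
import os, sys
fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
i = 0
while True:
    os.write(fd, b"%d\n" % i)
    os.fsync(fd)
    sys.stdout.write("%d\n" % i)
    sys.stdout.flush()
    i += 1
"""


def run_crash_test(acks_before_kill=50):
    """Kill a writer mid-stream; return (acknowledged, recovered) record lists."""
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "records.log")
        proc = subprocess.Popen(
            [sys.executable, "-c", WRITER, data_path],
            stdout=subprocess.PIPE,
        )
        acknowledged = []
        for line in proc.stdout:
            acknowledged.append(int(line))
            if len(acknowledged) >= acks_before_kill:
                break
        proc.kill()  # sends SIGKILL on POSIX, like kill -9
        proc.wait()
        proc.stdout.close()
        # "Restart": read back whatever survived the kill.
        with open(data_path, "rb") as f:
            recovered = [int(x) for x in f.read().split()]
        return acknowledged, recovered
```

The invariant to check is that every acknowledged record is among the recovered ones, e.g. `set(acknowledged) <= set(recovered)`; extra unacknowledged records in the file are fine.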

@srh
Contributor

srh commented Dec 3, 2013

The problem could be that we call fdatasync, but we never call fsync in a way that guarantees the file's entry is actually present in its directory. After the power failure the file then doesn't exist on startup, but the metadata says the table exists, and the server (perhaps) silently creates a new, empty table when it can't find the table's file.
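
For illustration, a minimal sketch of the general POSIX pattern being described (Python, Linux-style semantics assumed). This is not the RethinkDB fix itself; `fsync_directory` and `create_file_durably` are hypothetical helper names. The point is that syncing the file covers its contents, while the file's name lives in the parent directory and needs its own sync.

```python
import os


def fsync_directory(path):
    """Flush a directory's entries (e.g. a newly created file name) to disk."""
    # Assumes a POSIX system where a directory can be opened read-only and fsynced.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def create_file_durably(path, data):
    """Create `path` with `data` so contents and directory entry are both synced."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flushes the file's data and metadata
    finally:
        os.close(fd)
    # Without this step the file's name may be missing from the directory
    # after a power loss, even though its contents were synced.
    fsync_directory(os.path.dirname(os.path.abspath(path)))
```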

@danielmewes
Member

It seems we would also have to call fsync on the directory in which we create rethinkdb_data in the case of rethinkdb create.
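
The same idea applied to a freshly created directory, as a hedged sketch (Python, POSIX assumed; `create_directory_durably` is a hypothetical name, not a RethinkDB function): after `mkdir`, the parent directory is synced so the new directory's entry is on disk.

```python
import os


def create_directory_durably(path):
    """Create directory `path` and sync its parent so the new entry is on disk."""
    os.mkdir(path)
    parent = os.path.dirname(os.path.abspath(path))
    # Assumes a POSIX system where a directory can be opened read-only and fsynced.
    fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
```

Files created inside the new directory afterwards would still need the directory itself synced, in the same way.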

@wojons
Contributor Author

wojons commented Dec 4, 2013

@mlucy I think that would be a great idea. Running in the cloud these days, you don't know how and when your server will be stopped, or whether there will be some sort of automatic migration mid-process. I also think it would be useful to see how it handles other types of failures, like random parts of memory falling out of sync, simulated kernel panics, and so on.

@danielmewes and @srh sounds like you guys have figured out the problem

@danielmewes
Member

@srh: Ok if I take this?

ghost assigned danielmewes Dec 4, 2013

@danielmewes
Member

A fix for the possible cause of this is in code review 1070 by @srh.

@danielmewes
Member

The fix has been merged into next as of 5ed5667 and cherry-picked into v1.11.x as of 85ef29d.

@danielmewes
Member

@wojons: The fix will be included in the next release of RethinkDB, whether it is a point release (1.11.2) or a major one (1.12).

@wojons
Contributor Author

wojons commented Dec 5, 2013

thanks @danielmewes

@AtnNn
Member

AtnNn commented Dec 6, 2013

The fix has been released in RethinkDB 1.11.2
