data loss with --no-direct-io #1703
As I understand, our syncing scheme doesn't rely on direct i/o at all. @srh: Can you confirm that? I wonder if the table could appear empty if the file for the namespace in the rethinkdb data directory got lost? |
Could someone ping @srh in person about this when you get the chance? (He tends to read GitHub issues less frequently when he's in the middle of a big project.) |
The |
I don't think there is enough actionable data for us here. Is everyone ok with me moving this issue into backlog so we can do further testing when time permits? |
I feel like we should create a new issue for writing automated power-failure tests (maybe by kill -9ing a VM running RethinkDB?). We might be able to reproduce that way. |
The problem could be that we call fdatasync on the table file but never fsync the containing directory, so nothing guarantees that the file's directory entry has actually reached disk. After a power failure the file then doesn't exist on startup, but the metadata says the table exists, and (perhaps) the server silently creates a new, empty table when it can't find the file for it. |
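To illustrate the suspected gap, here is a minimal sketch in Python (not RethinkDB's actual code) of creating a file so that both its contents and its directory entry are flushed. The helper name `create_file_durably` is hypothetical, and the sketch assumes a POSIX system where a directory can be opened read-only and passed to fsync. It uses `os.fsync` rather than fdatasync for portability.

```python
import os


def create_file_durably(path: str, data: bytes) -> None:
    """Create `path` with `data`, then flush the file and its parent directory.

    Hypothetical helper for illustration only. Assumes a POSIX system.
    """
    # O_EXCL makes the call fail if the file already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file's contents and metadata to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Flushing the file alone does not guarantee that the new directory
    # entry is on disk; fsync the parent directory as well.
    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the second fsync, the file's data may be on disk while the name that points to it is not, which would match the "metadata says the table exists but the file is missing" scenario described above.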
It seems we would also have to call |
@mlucy I think that would be a great idea. Running in the cloud these days, you don't know how and when your server will be stopped, or whether there will be some sort of automatic migration mid-process. I would also think it would be useful to see how it handles other types of failures, like random parts of memory falling out of sync, or simulated kernel panics. @danielmewes and @srh: sounds like you guys have figured out the problem. |
@srh: Ok if I take this? |
A fix for the possible cause of this is in code review 1070 by @srh. |
@wojons: The fix will be included in the next release of RethinkDB, whether it is a point release (1.11.2) or a major one (1.12). |
thanks @danielmewes |
The fix has been released in RethinkDB 1.11.2 |
Lost power a few hours after creating a small test table with maybe 5 rows in it. When I restarted the server, the table was empty. I MAY also have lost 100k records from a different table. I'm not sure how often fsync is called with --no-direct-io; if it just waits for Linux to flush the data on its own, that could be the issue. I will update the ticket with the mount output when I get back to my workstation, but it should just be default ext4 with normal Linux Mint (Ubuntu) defaults.