Confusing LevelDB corruption. #203
Comments
Comment #1 originally posted by gavinandresen on 2013-08-12T06:16:12.000Z: We're seeing very similar corruption reported, running Bitcoin on OSX: bitcoin/bitcoin#2770
Comment #2 originally posted by sudosurootdev on 2013-08-12T07:58:35.000Z: This issue is affecting a ton of the cryptocurrency clients... bitcoin, litecoin, novacoin, worldcoin, on and on and on... all LevelDB errors, and for me it happens when I shut the client down and start it back up. It is so annoying because I have to delete the DB files and re-download the whole block chain. Also, it is not just OSX: I saw at least one person say Windows XP, and I am on Ubuntu 13.04 Desktop (with Windows too) and on Ubuntu Server...
Comment #3 originally posted by jonas.schnelli on 2013-08-13T06:22:17.000Z: I would recommend setting the priority of this defect to -->"Priority-High"<--. People are getting unusable/destroyed LevelDBs because of this issue.
Comment #4 originally posted by mh.in.england on 2013-08-15T08:06:46.000Z: The issue on OS X may be that fsync apparently doesn't tell the hard disk to flush to the platters. There's a separate magic incantation for that.
Comment #5 originally posted by dana.powers on 2013-08-15T17:45:26.000Z: I think it would be very helpful for bug reports on corruption to include version specifics for both the OS and the filesystem. This issue is probably related to getting writes flushed to disk properly, and the steps necessary to do that can depend on both the OS and the filesystem. LevelDB is likely tuned very well for the Linux stack used at Google, but for other stacks we may need to tweak the use of fsync/fdatasync etc. -- I think this is what port/port_posix.h is intended for. On Mac OS X, for most filesystems, this will probably require using fcntl F_FULLFSYNC instead of a simple fsync() in order to guarantee writes reach non-volatile storage before returning. Other OS/filesystem pairs may require other tweaks. Unfortunately, that may significantly degrade performance, as F_FULLFSYNC will force all buffers to write, including those unrelated to LevelDB (i.e., I don't believe it is file-specific). Patch from my local git repo is attached.
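The F_FULLFSYNC workaround described in this comment can be sketched as follows. This is an illustrative model, not the attached patch (which targets LevelDB's C++ port layer); the helper name `full_sync` is mine. The key point is that on macOS `fsync()` only pushes data to the drive, not through its write cache, so `fcntl(fd, F_FULLFSYNC)` is needed for real durability, with a plain `fsync()` fallback elsewhere:

```python
import fcntl
import os
import tempfile

def full_sync(fd):
    """Flush fd's data all the way to stable storage.

    On macOS, fcntl's F_FULLFSYNC asks the drive to flush its write
    cache, which fsync() alone does not guarantee (see fsync(2) on
    macOS). On platforms without F_FULLFSYNC, fall back to fsync().
    """
    flag = getattr(fcntl, "F_FULLFSYNC", None)
    if flag is not None:
        try:
            fcntl.fcntl(fd, flag)
            return
        except OSError:
            pass  # some filesystems (e.g. SMB) reject F_FULLFSYNC
    os.fsync(fd)

# Demo: write a record and force it to stable storage.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"log record\n")
    full_sync(fd)
finally:
    os.close(fd)
    os.unlink(path)
```

As the comment notes, F_FULLFSYNC flushes the drive's entire write cache, not just this file's buffers, so the durability comes at a real throughput cost.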
Comment #6 originally posted by jtolds on 2013-08-15T18:16:15.000Z: Come to think of it, the originally reported corruption in this ticket was on an OS X system as well.
Comment #7 originally posted by jtolds on 2013-08-15T18:21:11.000Z: Oops, I was just informed that it was actually a Linux VM on top of OS X. I'm sure the VM stack called the appropriate F_FULLFSYNC, but I don't know for certain, and I don't know the specific VM used at this point. :( Sorry guys.
Original issue 197 created by jtolds on 2013-07-29T15:48:54.000Z:
Like issue #196, we recently decided to enable paranoid mode to see how well LevelDB was actually doing with respect to corruption and data integrity.
We found this wacky case of corruption and can't explain it. It appears as if two threads raced on adding a record to the log file: one with a short record and one with a long record. The short record wrote, the long record wrote, the short record updated the pointer, then the long record updated the pointer. It ended up looking like some random bytes were inserted, but the rest of the records lined up on block boundaries perfectly. When loading, the reader sees the run of zeros (how coincidental) and jumps to the next block, which, fortunately, began with an end-type record, and so it complained under paranoid mode. The scary part is that if the record at the beginning of the next block had been a full-type record, it would have been silent data loss, even under paranoid mode.
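For context, LevelDB's log format divides the file into 32 KiB blocks, and each record carries a 7-byte header (crc32, length, type). The skip-on-zeros behavior described above can be modeled like this; it is an illustrative Python sketch of the reader's scanning logic, not the actual C++ implementation, the function name `scan_records` is mine, and CRC checking is omitted:

```python
import struct

BLOCK_SIZE = 32768   # LevelDB log files are divided into 32 KiB blocks
HEADER_SIZE = 7      # per-record header: crc32 (4) + length (2) + type (1)
FULL, FIRST, MIDDLE, LAST = 1, 2, 3, 4   # record types; type 0 is "zero/preallocated"

def scan_records(buf):
    """Walk the log the way the reader does: a zeroed header is taken to
    be block-tail padding, so the scan jumps to the next 32 KiB boundary
    -- which is how a run of stray zero bytes can be skipped silently."""
    records, pos = [], 0
    while pos + HEADER_SIZE <= len(buf):
        _crc, length, rtype = struct.unpack('<IHB', buf[pos:pos + HEADER_SIZE])
        if rtype == 0 and length == 0:
            # Looks like zeroed padding: jump to the next block boundary.
            pos = (pos // BLOCK_SIZE + 1) * BLOCK_SIZE
            continue
        records.append((rtype, buf[pos + HEADER_SIZE:pos + HEADER_SIZE + length]))
        pos += HEADER_SIZE + length
    return records

# A full record, a block tail of zeros (padding or corruption -- the
# reader can't tell the difference), then another record:
buf = (struct.pack('<IHB', 0, 5, FULL) + b'hello').ljust(BLOCK_SIZE, b'\x00') \
      + struct.pack('<IHB', 0, 2, FULL) + b'ok'
print(scan_records(buf))   # both records come back; the zeros vanish silently
```

This is why the failure mode hinges on the type of the record at the start of the next block: a non-full type there is inconsistent and trips paranoid checks, while a full type parses cleanly and the inserted bytes disappear without complaint.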
All of the hex dumps are sequential bytes in the file, partitioned into headers, data, and the strange data in the middle of the log.
I have no idea how this happened or how to fix it.