Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug: After a power-loss, LevelDB does not discard partially-flushed transactions #251

Open
cmumford opened this issue Sep 9, 2014 · 5 comments
Labels

Comments

@cmumford
Copy link
Contributor

cmumford commented Sep 9, 2014

Original issue 245 created by madthanu on 2014-07-31T00:30:01.000Z:

This (possible) bug is about what happens during a power loss. The bug also depends on how the database is opened after rebooting: we run RepairDB (with paranoid checksum switched on) before trying to retrieve values.

The bug: When a Put() is issued, LevelDB appends to the log file; the appends result from write() calls issued by the internal EmitPhysicalRecord() function. Consider that a power loss happens, and the last append corresponding to the Put() results in the appended portion of the file containing garbage or zeros (as possible with writeback journaling file systems). If we then try to retrieve values from the database after rebooting from the power loss (and after executing RepairDB), we get back a corrupted value corresponding to the Put().

Similar behavior occurs if the power loss results in a situation as follows: one of the appends is buffered in-memory, but other appends following it (to the same file) are flushed to the disk by the time of the power loss. The region corresponding to the buffered append can be filled with zeros or garbage; this situation does not typically happen, but it might (with a writeback file system) if the buffer cache and the disk cache behaves atypically.

If RepairDB is not used, the result of the bug is different than described above. Also, if RepairDB is not used, there are many power-loss-related situations in which LevelDB seems to behave badly. Please let me know if it is not necessary (or not a good practice) to use RepairDB after every power loss.

What steps will reproduce the problem?

  1. Use a separate partition with the ext3 file system under the writeback mode (mount -o data=writeback), for the database. No other background process should be writing to the file system; this lets us easily simulate the timing interleaving necessary for the bug to happen.
  2. The EmitPhysicalRecord function in log_writer.cc has a Flush() call on the log file (line 94 in version 1.15). Just before that call, add an fdatasync() to the log file. This is again for the timing interleaving.
  3. Insert a 45000 characters-long key-value pair, using an asynchronous Put(), and then do an infinite loop.
  4. Wait for 5 seconds, and pull off the power chord (the power chord should be pulled back between the 5th and the 25th second).
  5. After rebooting the machine, re-open the database with paranoid checksums, run RepairDB, and try reading the values.

What is the expected output? What do you see instead?
The inserted value, or an empty database, is expected. A corrupted value is seen.

What version of the product are you using? On what operating system?
LevelDB 1.15, on Ubuntu 12.04.

Please provide any additional information below.

  1. I have not tried to obtain a minimal test-case (steps to reproduce the problem), let me know if a smaller test-case will help.
  2. I found this bug by using a tool that lets us simulate various power-loss scenarios; let me know if the tool will be useful.
  3. In general, it seems as if LevelDB tries to retrieve Put()-s that were only partially persisted to the log before the power loss, instead of discarding them after reboot. I’m guessing this also based on the behavior if RepairDB is not used after the reboot.
@cmumford
Copy link
Contributor Author

cmumford commented Sep 9, 2014

Comment #1 originally posted by madthanu on 2014-08-08T23:09:51.000Z:

After further testing, this possible bug also seems to exist in LevelDB-1.17 (not solved by https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#). LevelDB-1.17 does not, however, require RepairDB to be called for simple process-crash scenarios.

Also, if this bug is triggered atop LevelDB-1.17 without RepairDB, instead of getting back a corrupted value, an error is returned (assuming paranoid checksums is used).

@cmumford cmumford self-assigned this Sep 9, 2014
@cmumford
Copy link
Contributor Author

cmumford commented Sep 9, 2014

Comment #2 originally posted by madthanu on 2014-08-08T23:38:17.000Z:

After further testing, this possible bug also seems to exist in LevelDB-1.17 (not solved by https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#). LevelDB-1.17 does not, however, require RepairDB to be called for simple process-crash scenarios.

Also, if this bug is triggered atop LevelDB-1.17 without RepairDB, instead of getting back a corrupted value, an error is returned (assuming paranoid checksums is used).

The same problems are also exhibited during the append to MANIFEST files (again done by EmitPhysicalRecord), during compaction.

@xiw
Copy link

xiw commented Aug 11, 2015

The code comment ("do checksumming") doesn't seem to match the code ("do not checksum") in RepairDB.

https://github.com/google/leveldb/blob/master/db/repair.cc#L189

// We intentionally make log::Reader do checksumming so that
// corruptions cause entire commits to be skipped instead of
// propagating bad information (like overly large sequence
// numbers).
log::Reader reader(lfile, &reporter, false/*do not checksum*/,
                   0/*initial_offset*/);

@Diapolo
Copy link

Diapolo commented Oct 21, 2015

Was this fixed with 1.18 or is it sill an open issue?

@cmumford
Copy link
Contributor Author

@Diapolo: If this was an issue, then it likely still is an issue - not fixed by 1.18. I have not attempted to write a reproduction for this bug.

@cmumford cmumford removed their assignment Jan 16, 2016
maochongxin pushed a commit to maochongxin/leveldb that referenced this issue Jul 21, 2022
Add policy CMP0056, which honors the link flags in try_compile and
try_run. This allows for building against non-system libc++ by providing
the correct -L and rpath options in a containing project.

For example:

    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -L ${LLVM_ROOT}/lib -l c++ -l c++abi")
    set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,-rpath,${LLVM_ROOT}/lib")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants