Possible bug: After a power-loss, LevelDB does not discard partially-flushed transactions #251

cmumford · 2014-09-09T17:42:44Z

Original issue 245 created by madthanu on 2014-07-31T00:30:01.000Z:

This (possible) bug is about what happens during a power loss. The bug also depends on how the database is opened after rebooting: we run RepairDB (with paranoid checksum switched on) before trying to retrieve values.

The bug: When a Put() is issued, LevelDB appends to the log file; the appends result from write() calls issued by the internal EmitPhysicalRecord() function. Consider that a power loss happens, and the last append corresponding to the Put() results in the appended portion of the file containing garbage or zeros (as possible with writeback journaling file systems). If we then try to retrieve values from the database after rebooting from the power loss (and after executing RepairDB), we get back a corrupted value corresponding to the Put().

Similar behavior occurs if the power loss results in a situation as follows: one of the appends is buffered in-memory, but other appends following it (to the same file) are flushed to the disk by the time of the power loss. The region corresponding to the buffered append can be filled with zeros or garbage; this situation does not typically happen, but it might (with a writeback file system) if the buffer cache and the disk cache behaves atypically.

If RepairDB is not used, the result of the bug is different than described above. Also, if RepairDB is not used, there are many power-loss-related situations in which LevelDB seems to behave badly. Please let me know if it is not necessary (or not a good practice) to use RepairDB after every power loss.

What steps will reproduce the problem?

Use a separate partition with the ext3 file system under the writeback mode (mount -o data=writeback), for the database. No other background process should be writing to the file system; this lets us easily simulate the timing interleaving necessary for the bug to happen.
The EmitPhysicalRecord function in log_writer.cc has a Flush() call on the log file (line 94 in version 1.15). Just before that call, add an fdatasync() to the log file. This is again for the timing interleaving.
Insert a 45000 characters-long key-value pair, using an asynchronous Put(), and then do an infinite loop.
Wait for 5 seconds, and pull off the power chord (the power chord should be pulled back between the 5th and the 25th second).
After rebooting the machine, re-open the database with paranoid checksums, run RepairDB, and try reading the values.

What is the expected output? What do you see instead?
The inserted value, or an empty database, is expected. A corrupted value is seen.

What version of the product are you using? On what operating system?
LevelDB 1.15, on Ubuntu 12.04.

Please provide any additional information below.

I have not tried to obtain a minimal test-case (steps to reproduce the problem), let me know if a smaller test-case will help.
I found this bug by using a tool that lets us simulate various power-loss scenarios; let me know if the tool will be useful.
In general, it seems as if LevelDB tries to retrieve Put()-s that were only partially persisted to the log before the power loss, instead of discarding them after reboot. I’m guessing this also based on the behavior if RepairDB is not used after the reboot.

cmumford · 2014-09-09T17:42:44Z

Comment #1 originally posted by madthanu on 2014-08-08T23:09:51.000Z:

After further testing, this possible bug also seems to exist in LevelDB-1.17 (not solved by https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#). LevelDB-1.17 does not, however, require RepairDB to be called for simple process-crash scenarios.

Also, if this bug is triggered atop LevelDB-1.17 without RepairDB, instead of getting back a corrupted value, an error is returned (assuming paranoid checksums is used).

cmumford · 2014-09-09T17:42:46Z

Comment #2 originally posted by madthanu on 2014-08-08T23:38:17.000Z:

After further testing, this possible bug also seems to exist in LevelDB-1.17 (not solved by https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#). LevelDB-1.17 does not, however, require RepairDB to be called for simple process-crash scenarios.

Also, if this bug is triggered atop LevelDB-1.17 without RepairDB, instead of getting back a corrupted value, an error is returned (assuming paranoid checksums is used).

The same problems are also exhibited during the append to MANIFEST files (again done by EmitPhysicalRecord), during compaction.

xiw · 2015-08-11T07:07:43Z

The code comment ("do checksumming") doesn't seem to match the code ("do not checksum") in RepairDB.

https://github.com/google/leveldb/blob/master/db/repair.cc#L189

// We intentionally make log::Reader do checksumming so that
// corruptions cause entire commits to be skipped instead of
// propagating bad information (like overly large sequence
// numbers).
log::Reader reader(lfile, &reporter, false/*do not checksum*/,
                   0/*initial_offset*/);

Diapolo · 2015-10-21T20:32:26Z

Was this fixed with 1.18 or is it sill an open issue?

cmumford · 2016-01-15T18:00:57Z

@Diapolo: If this was an issue, then it likely still is an issue - not fixed by 1.18. I have not attempted to write a reproduction for this bug.

Add policy CMP0056, which honors the link flags in try_compile and try_run. This allows for building against non-system libc++ by providing the correct -L and rpath options in a containing project. For example: set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -L ${LLVM_ROOT}/lib -l c++ -l c++abi") set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,-rpath,${LLVM_ROOT}/lib")

cmumford self-assigned this Sep 9, 2014

cmumford added Type-Defect labels Sep 9, 2014

cmumford added bug and removed Priority-Medium labels Jan 15, 2016

cmumford removed their assignment Jan 16, 2016

mj-ramos mentioned this issue Sep 21, 2023

PebblesDB does not discard partially-flushed values utsaslab/pebblesdb#28

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible bug: After a power-loss, LevelDB does not discard partially-flushed transactions #251

Possible bug: After a power-loss, LevelDB does not discard partially-flushed transactions #251

cmumford commented Sep 9, 2014

cmumford commented Sep 9, 2014

cmumford commented Sep 9, 2014

xiw commented Aug 11, 2015

Diapolo commented Oct 21, 2015

cmumford commented Jan 15, 2016

Possible bug: After a power-loss, LevelDB does not discard partially-flushed transactions #251

Possible bug: After a power-loss, LevelDB does not discard partially-flushed transactions #251

Comments

cmumford commented Sep 9, 2014

cmumford commented Sep 9, 2014

cmumford commented Sep 9, 2014

xiw commented Aug 11, 2015

Diapolo commented Oct 21, 2015

cmumford commented Jan 15, 2016