Second attempt at db_stress crash-recovery verification #3793

ajkr · 2018-04-30T03:27:09Z

Original commit: a4fb1f8
Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22d

This PR includes the contents of the original commit plus two bug fixes, which are:

1. In the whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs.
2. Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that the value was not `UNKNOWN_SENTINEL`. But post-crash-recovery expected values can be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true is that a `SingleDelete` may sometimes delete no data. Had we returned false instead, `db_stress` could write a second version of a key that disallows overwrites, and a later `SingleDelete` would then be called on a key with multiple older versions, which is not supported.
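The first fix amounts to a simple rule for when the expected values file may be passed. A minimal sketch of that rule, assuming a hypothetical helper `db_stress_args` that builds the argument list for one `db_stress` run from the elapsed and total crash-test time; this is an illustration, not the actual `db_crashtest.py` code:

```python
def db_stress_args(base_args, elapsed_secs, total_secs, expected_values_path):
    """Hypothetical helper: build the argument list for one db_stress run.

    First half of the crash test: the DB persists across runs, so the
    expected values file can persist (and be verified) too.
    Second half: each run starts from a fresh DB, so no expected state
    is carried over and the flag is omitted.
    """
    args = list(base_args)
    if elapsed_secs < total_secs / 2:
        args.append("--expected_values_path=" + expected_values_path)
    return args
```

For example, with a 100-second test, a run starting at second 10 gets the flag, while a run starting at second 60 does not.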
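The second fix can be illustrated by how an "exists" check interacts with keys that disallow overwrites. A minimal sketch, assuming a `DELETION_SENTINEL` marker for keys expected to be absent alongside `UNKNOWN_SENTINEL`; the numeric sentinel values and the helper `can_put` are assumptions for illustration, not the `db_stress` source:

```python
DELETION_SENTINEL = 0xFFFFFFFF  # assumed marker: key is expected to be absent
UNKNOWN_SENTINEL = 0xFFFFFFFE   # assumed marker: crash interrupted an update


def exists(expected_value):
    # Treat UNKNOWN_SENTINEL as "exists": the key may hold a value on disk,
    # so it must not be written again if overwrites are disallowed.
    return expected_value != DELETION_SENTINEL


def can_put(expected_value, allows_overwrite):
    # Hypothetical helper: a key that disallows overwrites may only be
    # written while absent, so it never gains a second version that a
    # later SingleDelete could not handle.
    return allows_overwrite or not exists(expected_value)
```

The cost of this choice is the one noted above: a key in the unknown state may actually be absent, so a `SingleDelete` on it can delete nothing, which is harmless.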

Test Plan:

$ python -u tools/db_crashtest.py --simple whitebox --random_kill_odd 888887

Summary: Previously, our `db_stress` tool held the expected state of the DB in memory, so after crash-recovery there was no way to verify data correctness. This PR adds an option, `--expected_values_path`, which specifies a file holding the expected values.

In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the expected values file must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Strictly speaking, this does not fully guarantee what we want, since `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by checking `std::atomic<uint32_t>::is_always_lock_free`.

For the `mmap`'d file, we had no existing way to expose its contents as a raw memory buffer. This PR adds one: the `Env::NewMemoryMappedFileBuffer` function and the `MemoryMappedFileBuffer` class.

`db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided, since `db_stress` will populate it when it runs. On subsequent iterations, the same filename is provided so `db_stress` can check on startup that the data is as expected.

Closes facebook#3629
Differential Revision: D7463144
Pulled By: ajkr
fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
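A minimal sketch of the expected-values file idea: one fixed-size 4-byte slot per key in a memory-mapped file, so each update lands directly in the file's shared pages and survives the process being killed (though not necessarily a machine crash, without an explicit flush). Python has no counterpart to `std::atomic<uint32_t>`, so `struct.pack_into` here is not guaranteed to be a single atomic store; in `db_stress` that guarantee comes from the C++ side described above. The helper names (`open_expected_values`, `get_value`, `set_value`), the little-endian layout, and initializing a fresh file to an all-ones "deleted" marker are assumptions for illustration, not the actual `db_stress` file format:

```python
import mmap
import os
import struct

DELETION_SENTINEL = 0xFFFFFFFF  # assumed marker: key is expected to be absent
_SLOT = struct.Struct("<I")      # assumed layout: one little-endian uint32 per key


def open_expected_values(path, num_keys):
    """Map the expected-values file, sizing and initializing it if empty.

    An empty file corresponds to the first crash-test iteration (fresh DB);
    a non-empty file is reused as-is so its contents can be verified.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        size = num_keys * _SLOT.size
        fresh = os.fstat(fd).st_size == 0
        if fresh:
            os.ftruncate(fd, size)
        buf = mmap.mmap(fd, size)  # shared, read/write mapping by default
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed
    if fresh:
        for key in range(num_keys):
            _SLOT.pack_into(buf, key * _SLOT.size, DELETION_SENTINEL)
    return buf


def get_value(buf, key):
    return _SLOT.unpack_from(buf, key * _SLOT.size)[0]


def set_value(buf, key, value):
    # Writes go straight into the mapped pages, i.e. into the file itself.
    _SLOT.pack_into(buf, key * _SLOT.size, value)
```

Reopening the same path on a later iteration maps the existing contents instead of reinitializing them, which mirrors the first-iteration versus subsequent-iteration flow described above.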

ajkr · 2018-04-30T03:29:20Z

btw, this PR is split into two commits. The first commit is exactly a4fb1f8, and the second contains the two bug fixes mentioned in the description.

facebook-github-bot

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

anand1976

Good catch!

facebook-github-bot

@ajkr is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ajkr added 2 commits April 27, 2018 18:13

fix whitebox and unknown sentinel

2baad82

facebook-github-bot added the CLA Signed label Apr 30, 2018

ajkr requested a review from anand1976 April 30, 2018 03:29

facebook-github-bot reviewed Apr 30, 2018


anand1976 approved these changes Apr 30, 2018


facebook-github-bot reviewed Apr 30, 2018


facebook-github-bot closed this in 46152d5 Apr 30, 2018
