Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Second attempt at db_stress crash-recovery verification #3793

Closed
wants to merge 2 commits into from

Conversation

ajkr
Copy link
Contributor

@ajkr ajkr commented Apr 30, 2018

  • Original commit: a4fb1f8
  • Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22d

This PR includes the contents of the original commit plus two bug fixes, which are:

  • In whitebox crash test, only set --expected_values_path for db_stress runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each db_stress run, so we cannot maintain expected state across db_stress runs.
  • Made Exists() return true for UNKNOWN_SENTINEL values. I previously had an assert in Exists() that value was not UNKNOWN_SENTINEL. But it is possible for post-crash-recovery expected values to be UNKNOWN_SENTINEL (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a SingleDelete deletes no data. But if we had returned false, the effect would be calling SingleDelete on a key with multiple older versions, which is not supported.

Test Plan:

$ python -u tools/db_crashtest.py --simple whitebox --random_kill_odd 888887

Summary:
Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values.

In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`.

For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class.

`db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup.
Closes facebook#3629

Differential Revision: D7463144

Pulled By: ajkr

fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
@ajkr
Copy link
Contributor Author

ajkr commented Apr 30, 2018

btw, this PR is split into two commits. The first commit is exactly a4fb1f8, and the second is the two bug fixes mentioned in the description.

@ajkr ajkr requested a review from anand1976 April 30, 2018 03:29
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@anand1976 anand1976 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants