-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Second attempt at db_stress crash-recovery verification #3793
Conversation
Summary: Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values. In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`. For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class. `db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup. Closes facebook#3629 Differential Revision: D7463144 Pulled By: ajkr fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
btw, this PR is split into two commits. The first commit is exactly a4fb1f8, and the second is the two bug fixes mentioned in the description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This PR includes the contents of the original commit plus two bug fixes, which are:
--expected_values_path
fordb_stress
runs in the first half of the crash test's duration. In the second half, a fresh DB is created for eachdb_stress
run, so we cannot maintain expected state acrossdb_stress
runs.Exists()
return true forUNKNOWN_SENTINEL
values. I previously had an assert inExists()
that value was notUNKNOWN_SENTINEL
. But it is possible for post-crash-recovery expected values to beUNKNOWN_SENTINEL
(i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where aSingleDelete
deletes no data. But if we had returned false, the effect would be callingSingleDelete
on a key with multiple older versions, which is not supported.Test Plan:
$ python -u tools/db_crashtest.py --simple whitebox --random_kill_odd 888887