Verify write batch checksum before WAL #10114
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@cbi42 has updated the pull request. You must reimport the pull request before landing.
LGTM. It would be nice to have a test for a write group containing two or more batches, with corruption happening before the merge.
@@ -512,15 +512,18 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
}
PERF_TIMER_START(write_pre_and_post_process_time);

if (!io_s.ok()) {
The changes to the error handling in this file look good, and could perhaps go even further by moving the IOStatusCheck() next to the point where io_s is assigned, and reducing the scope of io_s. My understanding is that these changes aren't strictly needed for this PR. LMK if this is incorrect. It's fine to include them here either way.
I changed the error handling of io_s in this file because I was worried about the case where WriteToWAL returns a corruption and w.CallbackFailed() is true: before the change in this PR, this case either hits an assert on io_s.ok() or does not check io_s at all. I'm not familiar with the writer callback; whether the error handling change needs to be included in this PR depends on whether the above scenario is possible.
Sorry, the longer I look at the existing code, the more confusing it becomes. It might be because FinalStatus() returns any non-callback failure first. However, if the callback failed, then the callback failure is the first failure that happened, so it should be the one returned by FinalStatus(). So when WriteToWAL() and the leader callback both fail, we should only record the callback failure.
Why do we even proceed to WriteToWAL() after a callback failure, considering this comment (lines 18 to 20 in ad135f3):
// Will be called while on the write thread before the write executes. If
// this function returns a non-OK status, the write will be aborted and this
// status will be returned to the caller of DB::Write().
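To make the precedence question concrete, here is a toy model. It is a minimal sketch, not RocksDB code: `ToyStatus`, `ToyWriter`, and both `FinalStatus*` functions are hypothetical names. The first function only mirrors the ordering described above for the existing `FinalStatus()` (non-callback failures reported first); the second shows the ordering the reviewer argues for, where a callback failure, being the earliest failure, wins.

```cpp
#include <cassert>
#include <string>

// Hypothetical simplified status: an empty message means OK.
struct ToyStatus {
  std::string msg;
  bool ok() const { return msg.empty(); }
};

// Hypothetical stand-in for a writer in a write group
// (not RocksDB's WriteThread::Writer).
struct ToyWriter {
  ToyStatus callback_status;  // result of the user's write callback
  ToyStatus status;           // result of the WAL / memtable work
};

// Ordering described for the existing code: any non-callback
// failure is reported first.
ToyStatus FinalStatusNonCallbackFirst(const ToyWriter& w) {
  if (!w.status.ok()) return w.status;
  return w.callback_status;
}

// Ordering the reviewer argues for: the callback runs before the
// write executes, so its failure is the first one and should win.
ToyStatus FinalStatusCallbackFirst(const ToyWriter& w) {
  if (!w.callback_status.ok()) return w.callback_status;
  return w.status;
}
```

With both a callback failure and a WAL corruption recorded, the first ordering reports the WAL error while the second reports the callback failure, which is exactly the "worst case" discussed in the next comment.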
Anyway, I don't want to derail this. This sets a DB-wide error when WriteToWAL() fails, and that's good with me. The worst case seems to be that a WriteToWAL() error can be returned when the actual first failure was in the leader callback, which is fine with me.
> Why do we even proceed to WriteToWAL() after a callback failure, considering

I'm guessing that, for a write group, the writes whose callbacks return a failure are skipped in the subsequent WAL/memtable operations, but we still proceed to WriteToWAL() with the writes in the group whose callbacks succeeded.
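The guess above can be sketched as follows. This assumes, as the commenter does (the thread does not confirm it), that the leader skips writers whose callbacks failed and still writes the remaining batches; `GroupMember` and `MergeForWal` are hypothetical names and real RocksDB write groups are not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one writer in a write group (not RocksDB's types).
struct GroupMember {
  std::string batch;     // serialized write batch contents
  bool callback_failed;  // true if this writer's callback returned non-OK
};

// Builds the payload the leader would hand to a WAL write under the
// assumption above: failed-callback writers are excluded, while the
// rest of the group still proceeds.
std::string MergeForWal(const std::vector<GroupMember>& group) {
  std::string merged;
  for (const GroupMember& m : group) {
    if (m.callback_failed) continue;  // skipped for WAL/memtable work
    merged += m.batch;
  }
  return merged;
}
```

For a group with batches "a", "b", "c" where only the callback for "b" failed, `MergeForWal` returns "ac": the WAL write still happens, just without the rejected batch.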
Thanks for the suggestion! I added some tests with a write group of two batches, with corruption happening before the merge.
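A test of this shape can be illustrated with the following self-contained sketch. It assumes a simplified model in which merging a write group copies each entry together with the checksum computed when the entry was added; `ProtectedBatch`, `Merge`, and `Verify` are hypothetical, and `std::hash` stands in for RocksDB's actual per-key protection.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in per-entry checksum; not RocksDB's protection scheme.
std::size_t Checksum(const std::string& entry) {
  return std::hash<std::string>{}(entry);
}

// Hypothetical protected batch: checksums[i] protects entries[i].
struct ProtectedBatch {
  std::vector<std::string> entries;
  std::vector<std::size_t> checksums;

  void Put(const std::string& key, const std::string& value) {
    entries.push_back(key + "=" + value);
    checksums.push_back(Checksum(entries.back()));
  }
};

// Merge a write group: entries are copied with their ORIGINAL checksums
// instead of being re-hashed, so damage done before the merge stays detectable.
ProtectedBatch Merge(const std::vector<ProtectedBatch>& group) {
  ProtectedBatch merged;
  for (const ProtectedBatch& b : group) {
    merged.entries.insert(merged.entries.end(), b.entries.begin(), b.entries.end());
    merged.checksums.insert(merged.checksums.end(), b.checksums.begin(),
                            b.checksums.end());
  }
  return merged;
}

// Returns true only if every entry still matches its checksum.
bool Verify(const ProtectedBatch& b) {
  if (b.entries.size() != b.checksums.size()) return false;
  for (std::size_t i = 0; i < b.entries.size(); ++i) {
    if (Checksum(b.entries[i]) != b.checksums[i]) return false;
  }
  return true;
}
```

Re-hashing during the merge would defeat such a test, because fresh checksums would be computed over already-corrupted data; carrying the original checksums through the merge is what makes pre-merge corruption observable at verification time.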
LGTM, great work!
…n is turned on (#10201)

Summary: This bug was discovered after write batch checksum verification before WAL was added (#10114) and write batch checksum protection was turned on in the stress test (#10037). In this [line](https://github.com/facebook/rocksdb/blob/d5d8920f2cfd06d1803b0976acbe8b564b88b6b1/db/write_batch.cc#L2887), the number of checksums may not be consistent with `batch->Count()`. This PR fixes this issue.

Pull Request resolved: #10201

Test Plan:
```
./db_stress --batch_protection_bytes_per_key=8 --destroy_db_initially=1 --max_key=100000 --use_txn=1
```

Reviewed By: ajkr
Differential Revision: D37260799
Pulled By: cbi42
fbshipit-source-id: ff8dce7dcce295d689333bc9d892d17a843bf0ea
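The invariant behind this fix can be stated as a simple consistency check, sketched below. `BatchView` and `ProtectionCountConsistent` are hypothetical names and the actual change in #10201 is not reproduced here; the point is only that per-entry protection data should have exactly one element per entry counted by `batch->Count()` before any verification indexes into it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of a protected batch: `count` is what batch->Count()
// would report, and `protections` is expected to hold one value per entry.
struct BatchView {
  std::uint32_t count;
  std::vector<std::uint64_t> protections;
};

// A verifier that walks entries by position and looks up protections[i]
// must first confirm the two agree; otherwise it could read past the end
// of the protection data or leave entries unchecked.
bool ProtectionCountConsistent(const BatchView& b) {
  return b.protections.size() == static_cast<std::size_t>(b.count);
}
```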
Summary:
Context: a WriteBatch can carry key-value checksums when it is created with protection_bytes_per_key > 0. This PR adds checksum verification for write batches before they are written to the WAL.
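A minimal sketch of the idea, using a toy model rather than RocksDB's implementation: each entry stores a checksum computed when it is added, and the hypothetical `ToyWal::Append` recomputes every checksum before writing and rejects the whole batch on any mismatch. `std::hash` is a stand-in for the real protection scheme, and the `bool` return is a stand-in for returning a `Status::Corruption`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in checksum over a key/value pair; not RocksDB's protection scheme.
std::size_t EntryChecksum(const std::string& key, const std::string& value) {
  return std::hash<std::string>{}(key + '\0' + value);
}

struct Entry {
  std::string key, value;
  std::size_t checksum;  // computed when the entry was created
};

Entry MakeEntry(const std::string& k, const std::string& v) {
  return Entry{k, v, EntryChecksum(k, v)};
}

// Hypothetical WAL: verifies the whole batch before appending anything,
// so a corrupted batch never reaches the log.
struct ToyWal {
  std::vector<Entry> log;

  bool Append(const std::vector<Entry>& batch) {
    for (const Entry& e : batch) {
      if (EntryChecksum(e.key, e.value) != e.checksum) return false;
    }
    log.insert(log.end(), batch.begin(), batch.end());
    return true;
  }
};
```

Verifying every entry before the first append keeps the write all-or-nothing: a batch corrupted in memory after creation is reported instead of being persisted to the WAL.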
Test plan:
make check -j32
./db_bench --benchmarks=fillrandom[-X20] -db=/dev/shm/test_rocksdb -write_batch_protection_bytes_per_key=8
Before this PR: fillrandom [AVG 20 runs] : 198875 (± 3006) ops/sec; 22.0 (± 0.3) MB/sec
After this PR: fillrandom [AVG 20 runs] : 196487 (± 2279) ops/sec; 21.7 (± 0.3) MB/sec
Mean throughput regressed by about 1.2% (198875 -> 196487 ops/sec).
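For reference, the relative regression implied by the two fillrandom means works out as:

```latex
\frac{198875 - 196487}{198875} = \frac{2388}{198875} \approx 0.0120 = 1.20\%
```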