Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify write batch checksum before WAL #10114

Closed
wants to merge 8 commits into from

Conversation

cbi42
Copy link
Member

@cbi42 cbi42 commented Jun 4, 2022

Summary:
Context: WriteBatch can have key-value checksums when it was created with protection_bytes_per_key > 0.
This PR added checksum verification for write batches before they are written to WAL.

Test plan:

  • Added new unit tests to db_kv_checksum_test.cc: make check -j32
  • benchmark on performance regression: ./db_bench --benchmarks=fillrandom[-X20] -db=/dev/shm/test_rocksdb -write_batch_protection_bytes_per_key=8
    • Pre-PR:
      fillrandom [AVG 20 runs] : 198875 (± 3006) ops/sec; 22.0 (± 0.3) MB/sec
    • Post-PR:
      fillrandom [AVG 20 runs] : 196487 (± 2279) ops/sec; 21.7 (± 0.3) MB/sec
      Mean regressed about 1% (198875 -> 196487 ops/sec).

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cbi42 cbi42 requested a review from ajkr June 4, 2022 05:33
@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It would be nice if there's a test for a write group containing two or more batches, with corruption happening before the merge.

@@ -512,15 +512,18 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
}
PERF_TIMER_START(write_pre_and_post_process_time);

if (!io_s.ok()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to the error handling in this file look good, and could perhaps go even further by moving the IOStatusCheck() next to the point io_s is assigned, and reducing the scope of io_s. My understanding is these changes aren't strictly needed for this PR. LMK if this is incorrect. It's fine to include them here either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the error handling of io_s in this file since I was worried about the case when WriteToWAL returns corruption and w.CallbackFailed() is true: either there is an assert for io_s.okay() or there is no checking for io_s in this case before the change in this PR. I'm not familiar with the writer callback, whether the error handling change need to be included in this PR depends on if the above scenario is possible.

Copy link
Contributor

@ajkr ajkr Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, the longer I look at the existing code, the more confusing it becomes. It might be because FinalStatus() returns any non-callback failure first. However, if the callback failed, then the callback failure is the first failure that happened so should be returned in FinalStatus(). So when WriteToWAL() and leader callback both failed, we should only record the callback failure.

Why we even proceed to WriteToWAL() after callback failure considering

// Will be called while on the write thread before the write executes. If
// this function returns a non-OK status, the write will be aborted and this
// status will be returned to the caller of DB::Write().
is a mystery to me. What happens to callback failures in non-leader writers is also unclear.

Copy link
Contributor

@ajkr ajkr Jun 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyways, I don't want to derail this. This is setting a DB wide error when WriteToWAL() fails and that's good for me. Worst case looks like a WriteToWAL() error can be returned when the actual first failure was in the leader callback, which is fine with me.

Copy link
Member Author

@cbi42 cbi42 Jun 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we even proceed to WriteToWAL() after callback failure considering

I'm guessing that for a write group, the writes whose callbacks return failure will be ignored in the following WAL/memtable operations, but we still proceed to WriteToWAL() with the writes in the group whose callbacks were successful.

db/db_impl/db_impl_write.cc Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

3 similar comments
@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cbi42 cbi42 requested a review from ajkr June 10, 2022 23:07
@cbi42
Copy link
Member Author

cbi42 commented Jun 11, 2022

LGTM. It would be nice if there's a test for a write group containing two or more batches, with corruption happening before the merge.

Thanks for the suggestion! I added some tests with write group of two batches with corruption happening before the merge.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work!

db/write_batch.cc Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

db/write_batch.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

cbi42 added a commit to cbi42/rocksdb that referenced this pull request Jun 16, 2022
facebook-github-bot pushed a commit that referenced this pull request Jun 17, 2022
Summary:
Update HISTORY.md for #10114: write batch checksum verification before writing to WAL.

Pull Request resolved: #10189

Reviewed By: ajkr

Differential Revision: D37226366

Pulled By: cbi42

fbshipit-source-id: cd2f076961abc35f35783e0f2cc3beda68cdb446
facebook-github-bot pushed a commit that referenced this pull request Jun 18, 2022
…n is turned on (#10201)

Summary:
This bug was discovered after write batch checksum verification before WAL is added (#10114) and stress test with write batch checksum protection is turned on (#10037). In this [line](https://github.com/facebook/rocksdb/blob/d5d8920f2cfd06d1803b0976acbe8b564b88b6b1/db/write_batch.cc#L2887), the number of checksums may not be consistent with `batch->Count()`. This PR fixes this issue.

Pull Request resolved: #10201

Test Plan:
```
./db_stress --batch_protection_bytes_per_key=8 --destroy_db_initially=1 --max_key=100000 --use_txn=1
```

Reviewed By: ajkr

Differential Revision: D37260799

Pulled By: cbi42

fbshipit-source-id: ff8dce7dcce295d689333bc9d892d17a843bf0ea
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants