Skip to content

Commit

Permalink
Enable force_consistency_checks by default (#7446)
Browse files Browse the repository at this point in the history
Summary:
This has been running in production on some key workloads, so
we believe it to be safe and extremely low cost. Nevertheless, I've
added code to ensure that "force_consistency_checks" is mentioned in
any corruption reports so that people know how to disable in case of
false positive corruption reports.

Pull Request resolved: #7446

Test Plan:
make check, CI, temporary debug print new message with
./version_builder_test

Reviewed By: ajkr

Differential Revision: D23972101

Pulled By: pdillinger

fbshipit-source-id: 9623e400f3752577c0ecf977e6d0915562cf9968
  • Loading branch information
pdillinger authored and facebook-github-bot committed Sep 30, 2020
1 parent 5f33436 commit ddbc5da
Show file tree
Hide file tree
Showing 3 changed files with 34 additions and 14 deletions.
3 changes: 2 additions & 1 deletion HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@
* The methods to create and manage EncrypedEnv have been changed. The EncryptionProvider is now passed to NewEncryptedEnv as a shared pointer, rather than a raw pointer. Comparably, the CTREncryptedProvider now takes a shared pointer, rather than a reference, to a BlockCipher. CreateFromString methods have been added to BlockCipher and EncryptionProvider to provide a single API by which different ciphers and providers can be created, respectively.
* The internal classes (CTREncryptionProvider, ROT13BlockCipher, CTRCipherStream) associated with the EncryptedEnv have been moved out of the public API. To create a CTREncryptionProvider, one can either use EncryptionProvider::NewCTRProvider, or EncryptionProvider::CreateFromString("CTR"). To create a new ROT13BlockCipher, one can either use BlockCipher::NewROT13Cipher or BlockCipher::CreateFromString("ROT13").
* The EncryptionProvider::AddCipher method has been added to allow keys to be added to an EncryptionProvider. This API will allow future providers to support multiple cipher keys.
* Add a new option "allow_data_in_errors". When this new option is set by users, it allows users to opt-in to get error messages containing corrupted keys/values. Corrupt keys, values will be logged in the messages, logs, status etc. that will help users with the useful information regarding affected data. By default value of this option is set false to prevent users data to be exposed in the messages so currently, data will be redacted from logs, messages, status by default.
* Add a new option "allow_data_in_errors". When this new option is set by users, it allows users to opt-in to get error messages containing corrupted keys/values. Corrupt keys, values will be logged in the messages, logs, status etc. that will help users with the useful information regarding affected data. By default value of this option is set false to prevent users data to be exposed in the messages so currently, data will be redacted from logs, messages, status by default.
* AdvancedColumnFamilyOptions::force_consistency_checks is now true by default, for more proactive DB corruption detection at virtually no cost (estimated two extra CPU cycles per million on a major production workload). Corruptions reported by these checks now mention "force_consistency_checks" in case a false positive corruption report is suspected and the option needs to be disabled (unlikely). Since existing column families have a saved setting for force_consistency_checks, only new column families will pick up the new default.

### General Improvements
* The settings of the DBOptions and ColumnFamilyOptions are now managed by Configurable objects (see New Features). The same convenience methods to configure these options still exist but the backend implementation has been unified under a common implementation.
Expand Down
33 changes: 25 additions & 8 deletions db/version_builder.cc
Original file line number Diff line number Diff line change
Expand Up @@ -270,14 +270,7 @@ class VersionBuilder::Rep {
(*expected_linked_ssts)[blob_file_number].emplace(table_file_number);
}

Status CheckConsistency(VersionStorageInfo* vstorage) {
#ifdef NDEBUG
if (!vstorage->force_consistency_checks()) {
// Dont run consistency checks in release mode except if
// explicitly asked to
return Status::OK();
}
#endif
Status CheckConsistencyDetails(VersionStorageInfo* vstorage) {
// Make sure the files are sorted correctly and that the links between
// table files and blob files are consistent. The latter is checked using
// the following mapping, which is built using the forward links
Expand Down Expand Up @@ -389,6 +382,30 @@ class VersionBuilder::Rep {
return ret_s;
}

Status CheckConsistency(VersionStorageInfo* vstorage) {
// Always run consistency checks in debug build
#ifdef NDEBUG
if (!vstorage->force_consistency_checks()) {
return Status::OK();
}
#endif
Status s = CheckConsistencyDetails(vstorage);
if (s.IsCorruption() && s.getState()) {
// Make it clear the error is due to force_consistency_checks = 1 or
// debug build
#ifdef NDEBUG
auto prefix = "force_consistency_checks";
#else
auto prefix = "force_consistency_checks(DEBUG)";
#endif
s = Status::Corruption(prefix, s.getState());
} else {
// was only expecting corruption with message, or OK
assert(s.ok());
}
return s;
}

bool CheckConsistencyForNumLevels() const {
// Make sure there are no files on or beyond num_levels().
if (has_invalid_levels_) {
Expand Down
12 changes: 7 additions & 5 deletions include/rocksdb/advanced_options.h
Original file line number Diff line number Diff line change
Expand Up @@ -651,11 +651,13 @@ struct AdvancedColumnFamilyOptions {
// Dynamically changeable through SetOptions() API
bool paranoid_file_checks = false;

// In debug mode, RocksDB run consistency checks on the LSM every time the LSM
// change (Flush, Compaction, AddFile). These checks are disabled in release
// mode, use this option to enable them in release mode as well.
// Default: false
bool force_consistency_checks = false;
// In debug mode, RocksDB runs consistency checks on the LSM every time the
// LSM changes (Flush, Compaction, AddFile). When this option is true, these
// checks are also enabled in release mode. These checks were historically
// disabled in release mode, but are now enabled by default for proactive
// corruption detection, at almost no cost in extra CPU.
// Default: true
bool force_consistency_checks = true;

// Measure IO stats in compactions and flushes, if true.
//
Expand Down

0 comments on commit ddbc5da

Please sign in to comment.