db_stress verify with lost unsynced operations #8966
Conversation
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
I'm not understanding how it's able to sync up the replayed trace and the DB state. Can you explain? I don't see for example a seqno filter or threshold on what we apply to expected state.
db_stress_tool/db_stress_common.cc (Outdated)
@@ -233,6 +233,11 @@ size_t GenerateValue(uint32_t rand, char* v, size_t max_sz) {
  return value_sz;  // the size of the value set.
}

uint32_t GetValueBase(Slice s) {
  assert(s.size() >= sizeof(uint32_t));
  return *((uint32_t*)s.data());
You could take this opportunity to change to something like Get/PutUnaligned instead of a C-style cast.
Done.
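For reference, one portable way to drop the C-style cast is a `memcpy`-based load. This is a minimal sketch with a stand-in `Slice` type, not RocksDB's actual `GetUnaligned` helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Minimal stand-in for rocksdb::Slice, just enough for this sketch.
struct Slice {
  const char* data_;
  size_t size_;
  const char* data() const { return data_; }
  size_t size() const { return size_; }
};

// Reads the leading 32-bit value base without assuming s.data() is
// suitably aligned, unlike the original `*((uint32_t*)s.data())` cast.
uint32_t GetValueBase(Slice s) {
  assert(s.size() >= sizeof(uint32_t));
  uint32_t base;
  std::memcpy(&base, s.data(), sizeof(base));  // alignment-safe load
  return base;
}
```

Compilers typically lower this `memcpy` to a single load on platforms that permit unaligned access, so there is no performance penalty over the cast.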
db_stress_tool/expected_state.cc (Outdated)
Status PutCF(uint32_t column_family_id, const Slice& key,
             const Slice& value) override {
  uint64_t key_id;
  if (!GetIntVal(key.ToString(), &key_id)) {
I'm going to try to ignore the inherited craziness of GetIntVal (extern inline taking a string by value, apparently decoding to unsigned where the encoder encodes from signed, and building a temporary vector rather than using values as they are decoded).
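As a purely hypothetical illustration of a tightened signature (not the actual db_stress code or fix), a decoder could take a `std::string_view` and parse in place with no temporary allocations. This sketch assumes the key is a zero-padded decimal number, which may not match db_stress's real key encoding:

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical replacement: takes a view instead of a std::string by
// value and parses directly with std::from_chars, building no temporary
// vector. Assumes zero-padded decimal keys.
bool DecodeKeyId(std::string_view key, uint64_t* id) {
  auto [ptr, ec] = std::from_chars(key.data(), key.data() + key.size(), *id);
  // Require that parsing succeeded and consumed the whole key.
  return ec == std::errc() && ptr == key.data() + key.size();
}
```

`std::from_chars` reports failure through `std::errc` rather than exceptions, which keeps the hot verification path allocation- and exception-free.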
db_stress_tool/expected_state.cc (Outdated)
Status FileExpectedStateManager::Restore(DB* db) {
  // An `ExpectedStateTraceRecordHandler` applies a configurable number of
  // write operation trace records to the configured expected state.
  class ExpectedStateTraceRecordHandler : public TraceRecord::Handler,
A little big for a local class IMHO
Done, made it non-local.
db_stress_tool/expected_state.cc (Outdated)
if (s.ok()) {
  s = replayer->Prepare();
}
while (true) {
I prefer for (;;) {
Done.
};

SequenceNumber seqno = db->GetLatestSequenceNumber();
if (seqno < saved_seqno_) {
For clarity, I would assert(HasHistory())
Done.
  s = Env::Default()->DeleteFile(temp_path);
} else if (s.IsNotFound()) {
  s = Status::OK();
  saved_seqno_ = kMaxSequenceNumber;
Maybe say something like "SaveAtAndAfter(...) will be called after Restore(...) to initialize tracing".
It might or might not be called after reaching here, depending on whether the current run uses -sync_fault_injection=1. Reaching this code path only means the previous db_stress run used -sync_fault_injection=1.
Thanks very much for the review! Will get to your other comments soon - just wanted to answer the fundamental question first.
db_stress_tool/expected_state.cc (Outdated)
if (num_write_ops_ == max_write_ops_) {
  return Status::OK();
}
I'm not understanding how it's able to sync up the replayed trace and the DB state. Can you explain? I don't see for example a seqno filter or threshold on what we apply to expected state.
This returns without applying the write op once we've reached max_write_ops_. The client configures max_write_ops_ by subtracting the base sequence number of the trace determined by the filename ("X.trace" has base seqno X) from the post-recovery GetLatestSequenceNumber().
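A minimal sketch of that cutoff (illustrative names, not the actual `ExpectedStateTraceRecordHandler`): the handler counts applied writes and stops once it reaches the bound the client computed from the recovered seqno and the trace's base seqno.

```cpp
#include <cstdint>

// Sketch of the replay cutoff: apply exactly
// (post-recovery latest seqno) - (trace file base seqno) write ops.
class ReplayCutoff {
 public:
  ReplayCutoff(uint64_t recovered_seqno, uint64_t trace_base_seqno)
      : max_write_ops_(recovered_seqno - trace_base_seqno) {}

  // Returns true if the next replayed write op should be applied to the
  // expected state; ops past the recovered prefix are dropped silently.
  bool ShouldApply() {
    if (num_write_ops_ == max_write_ops_) {
      return false;
    }
    ++num_write_ops_;
    return true;
  }

 private:
  uint64_t num_write_ops_ = 0;
  uint64_t max_write_ops_;
};
```

Because each write op in the trace corresponds to one sequence number, stopping after this count means the expected state reflects exactly the prefix of operations the DB actually recovered.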
OK thanks. Of course it looks less hidden now that you've shown it to me, but I looked for it for a while.
The design/implementation doc explains this in a "Verification algorithm" subsection. It was written after you reviewed. Copy/pasted the subsection here since the doc is not shared publicly.
Verification algorithm

1. FileExpectedStateManager::Open() discovers from the filesystem ("*.{trace,state}" files) the sequence number synced by the previous db_stress run. It is stored in saved_seqno_.
2. FileExpectedStateManager::Restore() reconstructs a "LATEST.state" tailored to the DB that just recovered with possibly lost unsynced operations:
   - Copies "saved_seqno_.state" to ".LATEST.state.tmp".
   - Creates an ExpectedStateTraceRecordHandler configured to apply DB::GetLatestSequenceNumber() - saved_seqno_ updates to ".LATEST.state.tmp".
   - Replays "saved_seqno_.trace" using the above custom record handler.
   - Renames ".LATEST.state.tmp" to "LATEST.state".
3. NonBatchedOpsStressTest::VerifyDb() compares the recovered DB against the values now in "LATEST.state".
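The filename convention the algorithm relies on can be sketched with a pair of helpers (names hypothetical; the real manager composes full paths under the configured expected-values directory):

```cpp
#include <cstdint>
#include <string>

// A prior run that synced expected state at sequence number X (its
// saved_seqno_) leaves behind "X.state" and "X.trace".
std::string StateFileName(uint64_t saved_seqno) {
  return std::to_string(saved_seqno) + ".state";
}
std::string TraceFileName(uint64_t saved_seqno) {
  return std::to_string(saved_seqno) + ".trace";
}

// Restore() rebuilds this file from the pair above; VerifyDb() reads it.
const char kLatestStateFile[] = "LATEST.state";
const char kTempLatestStateFile[] = ".LATEST.state.tmp";
```

The temp-file-then-rename pattern in step 2 means a crash during Restore() itself cannot leave a half-written "LATEST.state" behind.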
LGTM modulo the cosmetic feedback
This reverts commit 57625597e6794ff503b8f2b998edd99061f6198a.
Thanks for the review!
@@ -307,7 +307,20 @@ void StressTest::FinishInitDb(SharedState* shared) {
    fprintf(stdout, "Compaction filter factory: %s\n",
            compaction_filter_factory->Name());
  }
  // TODO(ajkr): First restore if there's already a trace.

  if (shared->HasHistory() && IsStateTracked()) {
This IsStateTracked() causes a problem in the following sequence:

1. -sync_fault_injection=1 -test_batches_snapshots=0
2. -test_batches_snapshots=1
3. -test_batches_snapshots=0

The second command can cause the DB seqno to advance beyond what is recoverable in expected values, which causes the third command to fail.
This fixes two bugs in the recently committed DB verification following crash-recovery with unsynced data loss (facebook#8966):

First, we cannot trust `GetLatestSequenceNumber()` post-recovery to return the sequence number of the latest recovered record. Oddly, it can return a larger seqno than that in case WAL data older than the latest `LogAndApply()` is lost. This behavior is inherited from LevelDB, which does not have `FileMetaData::largest_seqno` and so could not get the exact latest record seqno from WAL+MANIFEST, whereas we could if we wanted to. For now I added a workaround.

Second, there was a bug in crash test runs involving mixed values for `-test_batches_snapshots`. The problem was we were neither restoring expected values nor enabling tracing when `-test_batches_snapshots=1`. This caused a future `-test_batches_snapshots=0` run to not find enough trace data to restore expected values. The fix is to restore expected values at the start of `-test_batches_snapshots=1` runs, but still leave tracing disabled as we do not need to track those KVs.

Test Plan: The below command runs for several minutes (still fails eventually) whereas it used to fail every time at the start of the second `db_stress` run.

```
$ TEST_TMPDIR=/dev/shm /usr/local/bin/python3 -u tools/db_crashtest.py blackbox --interval=10 --max_key=100000 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152 --value_size_mult=33 --sync_fault_injection=1
```
…pshots` (#9302)

Summary: This fixes two bugs in the recently committed DB verification following crash-recovery with unsynced data loss (#8966):

The first bug was in crash test runs involving mixed values for `-test_batches_snapshots`. The problem was we were neither restoring expected values nor enabling tracing when `-test_batches_snapshots=1`. This caused a future `-test_batches_snapshots=0` run to not find enough trace data to restore expected values. The fix is to restore expected values at the start of `-test_batches_snapshots=1` runs, but still leave tracing disabled as we do not need to track those KVs.

The second bug was in `db_stress` runs that restore the expected values file and use a compaction filter. The compaction filter was initialized to use the pre-restore expected values, which would be `munmap()`'d during `FileExpectedStateManager::Restore()`. Then the compaction filter would run into a segfault. The fix is just to reorder compaction filter init after expected values restore.

Pull Request resolved: #9302

Test Plan:

- To verify the first problem, the below sequence used to fail; now it passes.

```
$ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=0
$ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=1
$ ./db_stress --db=./test-db/ --expected_values_dir=./test-db-expected/ --max_key=100000 --ops_per_thread=1000 --sync_fault_injection=1 --clear_column_family_one_in=0 --destroy_db_initially=0 -reopen=0 -test_batches_snapshots=0
```

- The second problem occurred rarely in the form of a SIGSEGV on a file that was `munmap()`'d. I have not seen it after this PR, though this doesn't prove much.
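The second fix is an ordering constraint that can be modeled in miniature with toy types (not RocksDB's actual classes): a filter that captures the expected-values buffer before `Restore()` replaces it would hold a dangling pointer, so filter initialization must happen after the restore.

```cpp
#include <memory>
#include <vector>

// Toy model of the use-after-munmap bug: ExpectedState owns a buffer that
// Restore() swaps out, invalidating any pointer captured beforehand.
struct ExpectedState {
  std::unique_ptr<std::vector<int>> values =
      std::make_unique<std::vector<int>>(4, 0);
  void Restore() {  // models munmap() + remap of the state file
    values = std::make_unique<std::vector<int>>(4, 1);
  }
  const std::vector<int>* current() const { return values.get(); }
};

struct CompactionFilterModel {
  const std::vector<int>* snapshot;  // pointer captured at init time
};

// Correct order per the fix: restore first, then init the filter so it
// points at the post-restore buffer rather than a freed one.
CompactionFilterModel InitAfterRestore(ExpectedState& es) {
  es.Restore();
  return CompactionFilterModel{es.current()};
}
```

With the pre-fix ordering, `snapshot` would be captured before `Restore()` replaced the buffer, leaving it dangling; the reordering makes the captured pointer match the live state.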
Reviewed By: jay-zhuang
Differential Revision: D33155283
Pulled By: ajkr
fbshipit-source-id: 66fd0f0edf34015a010c30015f14f104734e964e
…9338)

Summary: Recently we added the ability to verify that some prefix of operations is recovered (AKA no "hole" in the recovered data) (#8966). Besides testing unsynced data loss scenarios, it is also useful to test WAL-disabled use cases, where unflushed writes are expected to be lost. Note RocksDB only offers the prefix-recovery guarantee to WAL-disabled use cases that use atomic flush, so the crash test always enables atomic flush when WAL is disabled.

To verify WAL-disabled crash-recovery correctness globally, i.e., also in whitebox and blackbox transaction tests, is possible but requires further changes. I added TODOs in db_crashtest.py.

Depends on #9305.

Pull Request resolved: #9338

Test Plan: Running all crash tests and many instances of blackbox. Sandcastle links are in the Phabricator diff test plan.

Reviewed By: riversand963
Differential Revision: D33345333
Pulled By: ajkr
fbshipit-source-id: f56dd7d2e5a78d59301bf4fc3fedb980eb31e0ce
When a previous run left behind historical state/trace files (implying it was run with --sync_fault_injection set), this PR uses them to restore the expected state according to the DB's recovered sequence number. That way, a tail of the latest unsynced operations is permitted to be dropped, as is the case when data in page cache or certain `Env`s is lost. The point of the verification in this scenario is just to ensure there is no hole in the recovered data.

Test Plan: