Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db_stress support tracking historical values #8960

Closed
wants to merge 16 commits into from

Conversation

ajkr
Copy link
Contributor

@ajkr ajkr commented Sep 26, 2021

When --sync_fault_injection is set, this PR takes a snapshot of the expected values and starts an operation trace when the DB is opened. These files are stored in --expected_values_dir. They will be used for recovering the expected state of the DB following a crash where a suffix of unsynced operations are allowed to be lost.

Test Plan: injected crashed at various points in FileExpectedStateManager and verified the next run recovers the state/trace file with highest seqno and removes all older/temporary files. Note we don't use sync_fault_injection in CI crash tests yet.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ajkr for the PR. Left a few initial minor comments before taking closer look.

@@ -789,7 +790,10 @@ DEFINE_int32(get_property_one_in, 1000,

DEFINE_bool(sync_fault_injection, false,
"If true, FaultInjectionTestFS will be used for write operations, "
" and unsynced data in DB will lost after crash.");
"and unsynced data in DB will lost after crash. In such a case we "
"track DB changes in a trace file in --expected_values_dir for "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe give an example of the name of a trace file? For example, it can be reworded a bit like:

track DB changes in a trace file (*.trace)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expectation if expected_values_dir is empty but sync_fault_injection is set to true? The doc for sync_fault_injection seems to imply that "in such a case, we will need expected_values_dir.

If so, can we add a check in command line arguments parsing and processing, so that we make sure user sets expected_values_dir if they set sync_fault_injection to true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe give an example of the name of a trace file?

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expectation if expected_values_dir is empty but sync_fault_injection is set to true?

Empty expected_values_dir means no verification at startup, and verification at shutdown based only on the operations performed by that particular db_stress run. Combining it with sync_fault_injection=true would not bring additional coverage (compared to sync_fault_injection=false) because sync_fault_injection only causes unsynced data to be dropped at process end, which is after all verifications when expected_values_dir is empty.

Since I believe the combination brings no additional coverage, it is currently not supported. The error looks like this:

$ ./db_stress  --expected_values_dir="" --sync_fault_injection=true
...
Error enabling history tracing: Not implemented:

@@ -239,6 +386,10 @@ Status AnonExpectedStateManager::Open() {
return latest_->Open(true /* create */);
}

Status AnonExpectedStateManager::SaveAtAndAfter(DB* /* db */) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can this be inlined just to save some typing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

std::string temp_path = GetTempPathForFilename(kLatestFilename);
Status s = Env::Default()->FileExists(temp_path);
#ifndef ROCKSDB_LITE
Status FileExpectedStateManager::SaveAtAndAfter(DB* db) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is SaveAtAndAfter only going to be called on DB open, or eventually also anytime the WAL is synced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to call it anytime the WAL is synced. However there are some difficulties with parallel writers mutating the "*.state" file while it's being copied, for example. We would need to figure out the locking to make it possible.

@ajkr ajkr force-pushed the dbstress-expected-state-saveatandafter branch from 4741881 to 8242f86 Compare December 2, 2021 22:39
@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

2 similar comments
@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ajkr ajkr force-pushed the dbstress-expected-state-saveatandafter branch from ecacfa3 to db46d2e Compare December 4, 2021 06:37
@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a reasonable starting place to me 👍

}
}
if (s.ok()) {
if (old_saved_seqno != kMaxSequenceNumber &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: duplicate if could be simplified

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor Author

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

}
}
if (s.ok()) {
if (old_saved_seqno != kMaxSequenceNumber &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants