Test correctness with WAL disabled in non-txn blackbox crash tests #9338

ajkr · 2021-12-27T08:05:29Z

Recently we added the ability to verify that some prefix of operations is recovered (AKA no "hole" in the recovered data) (#8966). Besides testing unsynced data loss scenarios, this is also useful for testing WAL-disabled use cases, where unflushed writes are expected to be lost. Note that RocksDB only offers the prefix-recovery guarantee to WAL-disabled use cases that use atomic flush, so the crash test always enables atomic flush when WAL is disabled.
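As a rough illustration of what "no hole" means, here is a minimal toy model in Python. It is not how `db_stress` implements the check; the names `find_recovered_prefix` and `Write` are hypothetical, and the model assumes a single column family with only puts.

```python
from typing import Dict, List, Optional, Tuple

# A write is (key, value); its index in the list is the order it was applied.
Write = Tuple[str, str]


def find_recovered_prefix(
    writes: List[Write], recovered: Dict[str, str]
) -> Optional[int]:
    """Return n such that replaying writes[:n] yields `recovered`.

    Returns None when no prefix matches, i.e. the recovered data has a
    "hole" (a later write survived while an earlier one was lost).
    If several prefixes produce the same state, the shortest is returned.
    """
    state: Dict[str, str] = {}
    if state == recovered:
        return 0  # nothing recovered is still a valid (empty) prefix
    for n, (key, value) in enumerate(writes, start=1):
        state[key] = value
        if state == recovered:
            return n
    return None


writes = [("a", "1"), ("b", "2"), ("c", "3")]
ok = find_recovered_prefix(writes, {"a": "1", "b": "2"})    # prefix of length 2
hole = find_recovered_prefix(writes, {"a": "1", "c": "3"})  # "b" lost, "c" kept
```

With WAL disabled, losing the unflushed tail (`c` here) is expected; losing `b` while keeping `c` is the kind of hole the check is meant to catch.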

Verifying WAL-disabled crash-recovery correctness globally, i.e., also in whitebox and blackbox transaction tests, is possible but requires further changes. I added TODOs in db_crashtest.py.
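The rule stated in the description, that the crash test always enables atomic flush when WAL is disabled, can be sketched as a parameter-sanitization step. This is a simplified sketch under assumptions, not the actual code in db_crashtest.py: the function name `sanitize_params` is hypothetical, and the dictionary keys are assumed to mirror the `db_stress` flag names `disable_wal` and `atomic_flush`.

```python
from typing import Any, Dict


def sanitize_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `params` with the WAL/atomic-flush rule applied.

    Prefix recovery with WAL disabled is only guaranteed under atomic
    flush, so force atomic flush on whenever WAL is off.
    """
    out = dict(params)  # do not mutate the caller's dict
    if out.get("disable_wal", 0) == 1:
        out["atomic_flush"] = 1
    return out


wal_off = sanitize_params({"disable_wal": 1, "atomic_flush": 0})
wal_on = sanitize_params({"disable_wal": 0, "atomic_flush": 0})
```

Applying the rule after all other parameter sources are merged means a randomly chosen `atomic_flush=0` cannot slip through when `disable_wal=1`.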

Depends on #9305.

Test Plan: Ran all crash tests and many instances of blackbox. Sandcastle links are in the Phabricator diff test plan.

facebook-github-bot · 2021-12-29T01:49:35Z

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-01-02T00:34:47Z

@ajkr has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-01-05T02:55:42Z

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: The LastSequence field in the MANIFEST file is the baseline seqno for a recovered DB. Recovering WAL entries might cause the recovered DB's seqno to advance above this baseline, but the recovered DB will never use a smaller seqno.

Before this PR, we were writing the DB's seqno at the time of LogAndApply() as the LastSequence value. This works in the sense that it is a large enough baseline for the recovered DB that it'll never overwrite any records in existing SST files. At the same time, it's arbitrarily larger than what's needed. This behavior comes from LevelDB, where there was no tracking of largest seqno in an SST file.

Now we know the largest seqno of newly written SST files, so we can write an exact value in LastSequence that actually reflects the largest seqno in any file referred to by the MANIFEST. This is primarily useful for correctness testing with unsynced data loss, where the recovered DB's seqno needs to indicate what records were recovered.

Pull Request resolved: #9305

Test Plan:
- #9338 adds crash-recovery correctness testing coverage for WAL disabled use cases
- #9357 will extend that testing to cover file ingestion
- Added assertion at end of LogAndApply() for `VersionSet::descriptor_last_sequence_` consistency with files
- Manually tested upgrade/downgrade compatibility with a custom crash test that randomly picks between a `db_stress` built with and without this PR (for old code it must run with `-disable_wal=0`)

Reviewed By: riversand963

Differential Revision: D33182770

Pulled By: ajkr

fbshipit-source-id: 0bfafaf685f347cc8cb0e1d62e0186340a738f7d
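The difference between the old and new LastSequence values described in the summary above can be illustrated with a small Python model. It is a sketch only: `SstFileMeta`, `exact_last_sequence`, and the sample numbers are hypothetical and do not correspond to RocksDB's C++ types.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SstFileMeta:
    """Hypothetical stand-in for per-file metadata tracked in the MANIFEST."""
    name: str
    largest_seqno: int


def exact_last_sequence(files: Iterable[SstFileMeta]) -> int:
    """Largest seqno in any file the MANIFEST refers to (0 if there are none)."""
    return max((f.largest_seqno for f in files), default=0)


files = [SstFileMeta("000010.sst", 10), SstFileMeta("000012.sst", 42)]
db_seqno_at_log_and_apply = 100  # includes writes not yet in any SST file

old_baseline = db_seqno_at_log_and_apply  # safe, but larger than needed
new_baseline = exact_last_sequence(files)  # reflects what is actually persisted
```

In this model a recovery with no WAL would report seqno 42 under the new behavior, which tells a correctness test that writes 43 through 100 were not recovered; the old baseline of 100 would hide that.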

facebook-github-bot added the CLA Signed label Dec 27, 2021

ajkr mentioned this pull request Dec 28, 2021

Recover to exact latest seqno of data committed to MANIFEST #9305

Closed

ajkr requested a review from riversand963 December 29, 2021 00:10

ajkr force-pushed the db-crashtest-disable-wal branch from a1ca703 to 807f84b Compare December 29, 2021 01:38

ajkr changed the title ~~Test correctness with WAL disabled in all non-transaction crash tests~~ Test correctness with WAL disabled in non-txn blackbox crash tests Dec 29, 2021

riversand963 approved these changes Dec 29, 2021

toggle disable_wal in crash test

6c2bd68

ajkr force-pushed the db-crashtest-disable-wal branch from 807f84b to 6c2bd68 Compare January 2, 2022 00:34

facebook-github-bot closed this in 6892f19 Jan 6, 2022

fuatbasik mentioned this pull request Jul 13, 2023

Verify WAL-disabled crash-recovery consistency globally #11613

Open
