
Remove corrupted WAL files in kPointInTimeRecovery mode with avoid_flush_during_recovery set true #9634

Closed

Conversation

@akankshamahajan15 (Contributor) commented Feb 24, 2022

Summary:

  1. In the case of a non-TransactionDB with avoid_flush_during_recovery = true, RocksDB avoids
    flushing the data from the WAL to L0 for all column families when possible. As a
    result, not every column family advances its log_number, and
    min_log_number_to_keep does not change.
  2. For a TransactionDB (allow_2pc = true), even with the flush there may be old WAL files that it must not delete because they can contain data of uncommitted transactions, so min_log_number_to_keep does not change either.

If we persist a new MANIFEST with advanced log_numbers for some column families, then after a second crash that follows persisting the MANIFEST, RocksDB will see some column families' log_numbers larger than the corrupted WAL's number, hit the "column family inconsistency" error, and fail to recover.
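For context, here is a minimal sketch (not part of the PR) of opening a DB with the two settings this issue concerns; the path is only a placeholder:

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Point-in-time recovery: replay the WALs up to the first corruption.
  options.wal_recovery_mode = rocksdb::WALRecoveryMode::kPointInTimeRecovery;
  // Do not force recovered WAL data to be flushed to L0 during DB::Open().
  options.avoid_flush_during_recovery = true;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/rocksdb_recovery_demo", &db);
  assert(s.ok());
  delete db;
  return 0;
}
```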

As a solution,

  1. the WAL files whose numbers are larger than the corrupted WAL's and smaller than the new WAL's are moved to the archive folder instead of being deleted (see the sketch after this list).
  2. Currently, RocksDB DB::Open() may create and write two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes a new MANIFEST only after recovery is successful.
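A minimal, self-contained sketch of the archiving idea (not the PR's actual code; the file-name helper and the archive layout are assumptions, and std::filesystem stands in for RocksDB's file-system abstraction):

```cpp
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: "<number>.log", zero-padded like RocksDB WAL names.
std::string WalFileName(uint64_t number) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%06llu.log",
                static_cast<unsigned long long>(number));
  return buf;
}

// Move every WAL whose number lies in (corrupted_wal, new_wal) from the DB
// directory into "<db_dir>/archive" instead of deleting it.
void ArchiveWalsInRange(const fs::path& db_dir, uint64_t corrupted_wal,
                        uint64_t new_wal) {
  fs::path archive_dir = db_dir / "archive";
  fs::create_directories(archive_dir);
  for (uint64_t n = corrupted_wal + 1; n < new_wal; ++n) {
    fs::path src = db_dir / WalFileName(n);
    if (fs::exists(src)) {
      fs::rename(src, archive_dir / src.filename());
    }
  }
}
```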

Test Plan:
  1. Added new unit tests
  2. make crash_test -j

@facebook-github-bot (Contributor)

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor)

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@riversand963 (Contributor)

Thanks @akankshamahajan15 for the PR. Hopefully we can improve the recovery process after this series of PRs.

One thing that I should have emphasized: currently, RocksDB will write and persist new MANIFEST files even before recovery succeeds. See https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L611, https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L1280, https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L523, https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L521, etc. (I didn't do a thorough search).

This behavior is problematic. I think one important thing we want to achieve in this PR is to write the new MANIFEST (i.e., call LogAndApply) only when we are sure the db is consistent.
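A rough, self-contained sketch of that "buffer the edits, write the MANIFEST once" idea. The RecoveryContext name and the stubbed functions below are illustrative stand-ins, not the PR's actual code:

```cpp
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-ins for RocksDB's internal types (not the real ones).
struct VersionEdit { std::string description; };
struct Status {
  bool ok_ = true;
  bool ok() const { return ok_; }
};

// Hypothetical structure that buffers the edits produced while replaying
// WALs, instead of writing a new MANIFEST for each recovery step.
struct RecoveryContext {
  std::vector<VersionEdit> edits;
  void Add(VersionEdit e) { edits.push_back(std::move(e)); }
};

// Stubbed recovery steps, just to make the control flow concrete.
Status RecoverLogFiles(RecoveryContext* ctx) {
  ctx->Add({"cf1: log_number advanced"});
  ctx->Add({"cf2: new table file from flush"});
  return {};
}
Status SyncNewWal() { return {}; }  // create and fsync the new WAL

// Single MANIFEST write carrying every buffered edit.
Status LogAndApplyAll(const std::vector<VersionEdit>& edits) {
  std::cout << "writing one MANIFEST with " << edits.size() << " edits\n";
  return {};
}

Status OpenImpl() {
  RecoveryContext ctx;
  Status s = RecoverLogFiles(&ctx);  // no MANIFEST is written here
  if (!s.ok()) return s;
  s = SyncNewWal();                  // make the new WAL durable first
  if (!s.ok()) return s;
  // Only now, when the DB state is known to be consistent, persist it.
  return LogAndApplyAll(ctx.edits);
}

int main() { return OpenImpl().ok() ? 0 : 1; }
```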

@facebook-github-bot (Contributor)

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@akankshamahajan15 (Contributor, Author)

One thing that I should have emphasized: currently, RocksDB will write and persist new MANIFEST files even before recovery succeeds. See https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L611, https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L1280, https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L523, https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L521, etc. (I didn't do a thorough search).

Yes, you mentioned it in the task, but that step wasn't clear to me at the time. I will update the code now to persist the new MANIFEST only after the DB is consistent.

@facebook-github-bot (Contributor)

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor)

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Review threads on db/corruption_test.cc and db/db_impl/db_impl.h (outdated, resolved).
@facebook-github-bot (Contributor)

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@akankshamahajan15 (Contributor, Author) commented Apr 11, 2022

Also, can you update HISTORY.md?

I updated HISTORY.md with two separate entries: one for moving the corrupted WALs to the archive folder, and one for creating only one MANIFEST after the DB is recovered successfully.

@facebook-github-bot (Contributor)

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor)

@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ajkr (Contributor) commented Apr 26, 2022

@akankshamahajan15 Is this right?

the WAL files whose numbers are larger than the corrupted WAL's and smaller than the new WAL's are moved to the archive folder.

In the following recovery we archived (126119, 126216].

2022/04/20-20:12:59.858251 556 [db/db_impl/db_impl_open.cc:1163] Point in time recovered to log #126119 seq #1362971066
2022/04/20-20:12:59.858951 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126125 mode 2
2022/04/20-20:13:02.307344 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126128 mode 2
2022/04/20-20:13:05.299616 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126137 mode 2
2022/04/20-20:13:06.112663 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126142 mode 2
2022/04/20-20:13:06.807948 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126145 mode 2
2022/04/20-20:13:06.922859 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126151 mode 2
2022/04/20-20:13:08.407391 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126154 mode 2
2022/04/20-20:13:12.479291 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126160 mode 2
2022/04/20-20:13:13.141196 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126165 mode 2
2022/04/20-20:13:13.689726 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126168 mode 2
2022/04/20-20:13:14.429917 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126170 mode 2
2022/04/20-20:13:14.791170 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126172 mode 2
2022/04/20-20:13:15.354656 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126178 mode 2
2022/04/20-20:13:17.160268 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126183 mode 2
2022/04/20-20:13:19.541990 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126193 mode 2
2022/04/20-20:13:20.461264 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126196 mode 2
2022/04/20-20:13:21.247930 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126202 mode 2
2022/04/20-20:13:23.510047 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126205 mode 2
2022/04/20-20:13:24.541666 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126211 mode 2
2022/04/20-20:13:26.070769 556 [db/db_impl/db_impl_open.cc:913] Recovering log #126216 mode 2

Then the following recovery processes 126119 followed by 126219, whose sequence numbers are not consecutive, so recovery terminates at 126119:

2022/04/20-23:38:04.973200 548 [db/db_impl/db_impl_open.cc:1163] Point in time recovered to log #126119 seq #1362971066
2022/04/20-23:38:04.973920 548 [db/db_impl/db_impl_open.cc:913] Recovering log #126219 mode 2
2022/04/20-23:38:05.117620 548 [WARN] [db/db_impl/db_impl_open.cc:919] 126219.log: dropping 47392297 bytes
2022/04/20-23:38:05.118405 548 [ERROR] [db/db_impl/db_impl_open.cc:1217] Column family inconsistency: SST file contains data beyond the point of corruption.

126119 is too early -- I'd expect it to be no earlier than where the previous run recovered to, and in this case it is earlier than a flushed file too.

@ajkr (Contributor) commented Apr 26, 2022

BTW, I still haven't figured out why we need to remove the corrupt WALs rather than skip over them during recovery, so any clarification on that would be appreciated.

@riversand963 (Contributor)

BTW, I still haven't figured out why we need to remove the corrupt WALs rather than skip over them during recovery, so any clarification on that would be appreciated.

We can but we do not have to.

Thinking more about this, I think we do not have to remove the corrupted WALs as long as we persist the new MANIFEST after successfully syncing the new WAL.

1. If a future recovery starts from the new MANIFEST, then it means the new WAL was successfully synced. Due to the sentinel empty write batch at the beginning, kPointInTimeRecovery of the WAL is guaranteed to go past this point.
2. If a future recovery starts from the old MANIFEST, it means writing the new MANIFEST failed. We won't have the "SST ahead of WAL" error.

@riversand963 (Contributor)

Wait, I remember something, though.
It seems we only sync the new WAL (the one with the empty sentinel batch) during recovery. It seems possible for an unsynced WAL older than the new WAL to become corrupted without violating the contract between RocksDB and the FS.
Assume we have 10.log, 11.log and 12.log. None of them has been synced, and all of them are higher than "min_log_number_to_keep".
In the first recovery we stop at 11.log, but in the second attempt, is it possible to find a corruption in 10.log that causes the sequence numbers of 10.log and 11.log to not be consecutive? The WALs are not synced.
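To make the scenario concrete, here is a small self-contained helper (plain C++, not a RocksDB test utility) that simulates this kind of corruption by dropping the tail of an unsynced WAL between recovery attempts; the path and byte count are only examples:

```cpp
#include <cstdint>
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

// Drop the last `bytes_to_drop` bytes of a WAL file, as if the tail had
// never reached disk because the file was not fsync'ed before a crash.
bool TruncateWalTail(const fs::path& wal_path, std::uintmax_t bytes_to_drop) {
  std::error_code ec;
  auto size = fs::file_size(wal_path, ec);
  if (ec || size < bytes_to_drop) return false;
  fs::resize_file(wal_path, size - bytes_to_drop, ec);
  return !ec;
}

int main() {
  // Example: corrupt 10.log before the second DB::Open() attempt, so its
  // sequence numbers no longer line up with 11.log.
  if (!TruncateWalTail("/tmp/rocksdb_recovery_demo/000010.log", 512)) {
    std::cerr << "could not truncate WAL tail\n";
    return 1;
  }
  return 0;
}
```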

riversand963 added a commit to riversand963/rocksdb that referenced this pull request Apr 26, 2022
ajkr added a commit to ajkr/rocksdb that referenced this pull request Apr 26, 2022
facebook-github-bot pushed a commit that referenced this pull request Apr 26, 2022
Summary:
Left HISTORY.md and unit tests.
Added a new unit test to repro the corruption scenario that this PR fixes, and HISTORY.md line for that.

Pull Request resolved: #9906

Reviewed By: riversand963

Differential Revision: D35940093

Pulled By: ajkr

fbshipit-source-id: 9816f99e1ce405ba36f316beb4f6378c37c8c86b
akankshamahajan15 pushed a commit that referenced this pull request Apr 26, 2022