Remove corrupted WAL files in kPointRecoveryMode with avoid_flush_during_recovery set true #9634
Conversation
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 5683a27 to 238c252.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
Force-pushed from 238c252 to 5ea2d7c.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thanks @akankshamahajan15 for the PR. Hopefully we can improve the recovery process after this series of PRs. One thing that I should have emphasized: currently, RocksDB will write and persist new MANIFEST files even before recovery succeeds. See https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L611, https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L1280, https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L523, https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl_open.cc#L521, etc. (I didn't do a thorough search). This behavior is problematic. I think one important thing we want to achieve in this PR is to write a new MANIFEST (i.e. call LogAndApply) only when we are sure the db is consistent.
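For context, a minimal sketch of that intent: accumulate version edits while replaying WALs and write the MANIFEST only once recovery is known to be consistent. RecoveryContext, RecoverLogFiles, and the LogAndApply shown here are simplified stand-ins for illustration, not the actual RocksDB types or signatures.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-ins for RocksDB's VersionEdit / VersionSet; the real
// classes in db/version_set.h have different shapes.
struct VersionEdit {
  std::string description;
};

struct RecoveryContext {
  // Edits accumulated while replaying WALs; nothing is persisted yet.
  std::vector<VersionEdit> edits;
  bool consistent = false;
};

// Stand-in for VersionSet::LogAndApply: the only place a new MANIFEST
// record gets written in this sketch.
void LogAndApply(const std::vector<VersionEdit>& edits) {
  for (const auto& e : edits) {
    std::cout << "MANIFEST <- " << e.description << "\n";
  }
}

bool RecoverLogFiles(RecoveryContext* ctx) {
  // Replay WALs and remember the edits, but do not touch the MANIFEST.
  ctx->edits.push_back({"log_number advanced for cf 'default'"});
  ctx->consistent = true;  // set only when replay finished cleanly
  return ctx->consistent;
}

int main() {
  RecoveryContext ctx;
  if (RecoverLogFiles(&ctx)) {
    // Persist the new MANIFEST exactly once, after recovery is known good.
    LogAndApply(ctx.edits);
  }
  return 0;
}
```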
Force-pushed from 5ea2d7c to 9f3bc57.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
Yes, you mentioned it in the task, but that step wasn't clear to me at the time. I will update the code now to persist the new MANIFEST only after the DB is consistent.
Force-pushed from 9f3bc57 to fd251eb.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
Force-pushed from fd251eb to daca761.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
Force-pushed from daca761 to f84cb3d.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
Force-pushed from f84cb3d to d31948b.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
Force-pushed from d31948b to 9f3162d.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
… InstallSuperVersions
Force-pushed from 4dbbde0 to e42ddbb.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
I updated the HISTORY.md. I added two separate fixes. One is moving the corrupted WALs; the second is creating only one MANIFEST after the DB is recovered successfully.
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from e42ddbb to 3207a3d.
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.
@akankshamahajan15 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@akankshamahajan15 Is this right?
In the following recovery we archived (126119, 126216].
Then the following recovery processes 126119 followed by 126219, which don't have consecutive seqnos, so recovery terminates at 126119:
126119 is too early -- I'd expect it to be earlier than the previous run, and in this case it is earlier than a flushed file too.
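To make the failure mode concrete, here is a minimal, self-contained sketch of how point-in-time recovery stops at a sequence-number gap; WalBatch and ReplayUntilGap are hypothetical names for illustration only, not the actual recovery code.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical WAL batch: first sequence number and record count.
struct WalBatch {
  uint64_t seq;
  uint64_t count;
};

// Simplified version of the point-in-time recovery check: stop replaying as
// soon as sequence numbers are no longer consecutive.
uint64_t ReplayUntilGap(const std::vector<WalBatch>& batches) {
  uint64_t next_expected = batches.empty() ? 0 : batches.front().seq;
  for (const auto& b : batches) {
    if (b.seq != next_expected) {
      break;  // hole in the sequence -> recovery ends here
    }
    next_expected = b.seq + b.count;
  }
  return next_expected == 0 ? 0 : next_expected - 1;  // last recovered seqno
}

int main() {
  // Mirrors the scenario above: 126119 is followed by 126219, so recovery
  // terminates at 126119 even though later data exists in the WALs.
  std::vector<WalBatch> batches = {{126119, 1}, {126219, 1}};
  std::cout << "recovered up to seqno " << ReplayUntilGap(batches) << "\n";
  return 0;
}
```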
BTW, I still haven't figured out why we need to remove the corrupt WALs rather than skip over them during recovery, so any clarification on that would be appreciated.
We can, but we do not have to.
Wait, I remember something, though.
…lush_duing_recovery set true (facebook#9634)". This reverts commit ae82d91.
Summary: Left HISTORY.md and unit tests. Added a new unit test to repro the corruption scenario that this PR fixes, and HISTORY.md line for that. Pull Request resolved: #9906 Reviewed By: riversand963 Differential Revision: D35940093 Pulled By: ajkr fbshipit-source-id: 9816f99e1ce405ba36f316beb4f6378c37c8c86b
Summary:
When avoid_flush_during_recovery is set to true, RocksDB does not flush the data from the WAL to L0 for all column families if possible. As a result, not all column families can increase their log_numbers, and min_log_number_to_keep won't change.
If we persist a new MANIFEST with advanced log_numbers for some column families, then during a second crash after persisting the MANIFEST, RocksDB will see some column families' log_numbers larger than the corrupted WAL, and the "column family inconsistency" error will be hit, causing recovery to fail.
As a solution, WAL files with numbers larger than the corrupted WAL and smaller than the new WAL will be moved to the archive folder.
Test Plan:
1. Added new unit tests
2. make crash_test -j
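As an illustration of the archiving step described in the summary, here is a minimal sketch that moves WAL files in the affected number range into the archive directory. ArchiveWalRange is a hypothetical helper built on std::filesystem for demonstration; the actual change goes through RocksDB's own Env/FileSystem abstraction and recovery code paths.

```cpp
#include <cstdint>
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Illustrative only: move WAL files whose number lies between the corrupted
// WAL and the newly created WAL into <db_dir>/archive instead of deleting
// them. Assumes the directory holds RocksDB-style "<number>.log" files.
void ArchiveWalRange(const fs::path& db_dir, uint64_t corrupted_wal,
                     uint64_t new_wal) {
  if (!fs::exists(db_dir)) return;
  const fs::path archive_dir = db_dir / "archive";
  fs::create_directories(archive_dir);

  // Collect matching WALs first so the directory is not mutated while we
  // iterate over it.
  std::vector<fs::path> to_archive;
  for (const auto& entry : fs::directory_iterator(db_dir)) {
    const fs::path& p = entry.path();
    if (p.extension() != ".log") continue;
    uint64_t number = std::stoull(p.stem().string());
    if (number > corrupted_wal && number < new_wal) {
      to_archive.push_back(p);
    }
  }
  for (const auto& p : to_archive) {
    fs::rename(p, archive_dir / p.filename());
    std::cout << "archived " << p.filename() << "\n";
  }
}

int main() {
  // Hypothetical example: WALs numbered strictly between 12 and 16 under
  // ./testdb get moved to ./testdb/archive.
  ArchiveWalRange("./testdb", 12, 16);
  return 0;
}
```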