Force a new manifest file if append to current one fails #6331

Closed

anand1976
Contributor

Fix for issue #6316

When an append/sync of the manifest file fails due to an IO error such
as NoSpace, we don't always put the DB in read-only mode. This is true
for flush and compactions, as well as foreground operations such as column family
add/drop, CompactFiles etc. Subsequent changes to the DB will be
recorded in the same manifest file, which would have a corrupted record
in the middle due to the previous failure. On next DB::Open(), it will
fail to process the full manifest and data will be lost.

To fix this, we reset VersionSet::descriptor_log_ on append/sync
failure, which will force a new manifest file to be written on the next
append.
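
A minimal sketch of the idea described above, assuming a toy in-memory model rather than the real VersionSet. The names ToyManifest, ToyVersionSet, inject_failure and the accessor methods are hypothetical and exist only for illustration; only descriptor_log_ mirrors a name from the PR. The sketch also assumes that a newly created manifest starts from a snapshot of the state applied so far.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for one manifest file (hypothetical, not a RocksDB type).
struct ToyManifest {
  int file_number = 0;
  std::vector<std::string> records;
};

// Toy stand-in for the part of VersionSet that owns the manifest writer.
class ToyVersionSet {
 public:
  // Appends one record. `inject_failure` simulates an IO error such as
  // NoSpace during append/sync. Returns true on success.
  bool LogAndApply(const std::string& record, bool inject_failure) {
    if (!descriptor_log_) {
      // No manifest is open: start a new file that begins with a snapshot
      // of everything applied so far.
      descriptor_log_.reset(new ToyManifest());
      descriptor_log_->file_number = next_file_number_++;
      descriptor_log_->records = applied_;
    }
    if (inject_failure) {
      // The failed append may have left a partial record in the file, so
      // drop the writer. The next call is then forced to create a new file.
      descriptor_log_.reset();
      return false;
    }
    descriptor_log_->records.push_back(record);
    applied_.push_back(record);
    return true;
  }

  // Returns -1 when no manifest is currently open.
  int current_manifest_number() const {
    return descriptor_log_ ? descriptor_log_->file_number : -1;
  }

  std::size_t current_manifest_records() const {
    return descriptor_log_ ? descriptor_log_->records.size() : 0;
  }

 private:
  std::unique_ptr<ToyManifest> descriptor_log_;
  std::vector<std::string> applied_;
  int next_file_number_ = 1;
};
```

In this model, a successful append after a failed one lands in a manifest with a different file number, which is the property the new unit tests check with ASSERT_NE(new_manifest, old_manifest).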

Test Plan:
Add new unit tests in error_handler_test.cc

ASSERT_NE(new_manifest, old_manifest);

Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
Contributor

Can we check the value of key(1) also?

}
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
Contributor

Also here

@@ -204,6 +198,12 @@ Status ErrorHandler::SetBGError(const Status& bg_err, BackgroundErrorReason reas

new_bg_err = Status(bg_err, sev);

// Check if recovery is currently in progress. If it is, we will save this
Contributor

Can we move this block just before if (new_bg_err == Status::NoSpace()) ?

Contributor Author

We could, but is there a particular reason? The reason for moving this down here is so recovery_error_ will have the correct severity. This way, the caller of DB::Resume() will receive an error indicating whether they can retry resume later or not. We don't necessarily care how new_bg_err_ is handled from this point on.

Contributor

Thanks for the explanation. Can you remove the ";" in line 210 in this PR?

Contributor

riversand963 left a comment

What if the process crashes after closing the previous, potentially corrupted MANIFEST, but before creating the new MANIFEST? Will the db still be able to recover?

DestroyAndReopen(options);
ASSERT_OK(dbfull()->GetLiveFiles(live_files, &manifest_size, false));
for (auto& file : live_files) {
if (file.find("MANIFEST") != std::string::npos) {
Contributor

Can we use ParseFileName() here and in a few other places?
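
To illustrate why structured parsing is stricter than a substring search, here is a small standalone sketch. ParseManifestName is a hypothetical helper written for this example; it is not RocksDB's ParseFileName() and only assumes that manifest names look like "MANIFEST-<digits>", optionally preceded by a single '/'.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not RocksDB's ParseFileName): accepts names of the
// form "MANIFEST-<digits>", optionally preceded by one '/'. On success it
// stores the file number in *number. Overflow of very long digit strings is
// not handled in this sketch.
bool ParseManifestName(const std::string& name, uint64_t* number) {
  static const std::string kPrefix = "MANIFEST-";
  std::size_t pos = (!name.empty() && name[0] == '/') ? 1 : 0;
  if (name.compare(pos, kPrefix.size(), kPrefix) != 0) return false;
  pos += kPrefix.size();
  if (pos >= name.size()) return false;  // no digits after the prefix
  uint64_t value = 0;
  for (; pos < name.size(); ++pos) {
    if (!std::isdigit(static_cast<unsigned char>(name[pos]))) return false;
    value = value * 10 + static_cast<uint64_t>(name[pos] - '0');
  }
  *number = value;
  return true;
}
```

A name such as "backup_of_MANIFEST-3.txt" passes a find("MANIFEST") check but is rejected by this kind of parser, which is the sort of false match the suggestion avoids.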

ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
Destroy(options);
Contributor

I suggest that we let DBErrorHandlingTest, a subclass of DBTestBase observe KEEP_DB env variable. See ~DBTestBase(). This allows us to optionally preserve the db directory for debugging purpose.
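
A rough sketch of the pattern being suggested, assuming the fixture only needs to know whether the KEEP_DB environment variable is set. ShouldKeepDb and CleanupAction are hypothetical names for this example and are not taken from DBTestBase.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical policy check: keep the DB directory whenever KEEP_DB is set
// in the environment, regardless of its value.
bool ShouldKeepDb() { return std::getenv("KEEP_DB") != nullptr; }

// Hypothetical stand-in for fixture teardown. It returns a description of
// the action instead of touching the filesystem.
std::string CleanupAction(const std::string& dbname) {
  return ShouldKeepDb() ? "keep " + dbname : "destroy " + dbname;
}
```

With a teardown of this shape, individual tests would not need an explicit Destroy(options) call, and running with KEEP_DB=1 would leave the directory in place for inspection.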

@anand1976
Contributor Author

@riversand963

> What if the process crashes after closing the previous, potentially corrupted MANIFEST, but before creating the new MANIFEST? Will the db still be able to recover?

Isn't it the same as basic crash recovery?

@riversand963
Contributor

> @riversand963
>
> > What if the process crashes after closing the previous, potentially corrupted MANIFEST, but before creating the new MANIFEST? Will the db still be able to recover?
>
> Isn't it the same as basic crash recovery?

It is, and will cause later recovery to fail iiuc. Just trying to understand the scope of this fix.

@anand1976
Contributor Author

> It is, and will cause later recovery to fail iiuc. Just trying to understand the scope of this fix.

Actually, I think there is a difference, especially if the writes are batched. A crash while writing manifest will only result in the last record being partially written, and that will be skipped on open I think. The previous records in the batch will be valid. However, if the DB stays up (either read-only or read-write), the files referenced by the previous records will be deleted and recovery on re-open will fail. I think this is the issue you were trying to fix in #5379?

@riversand963
Contributor

> > It is, and will cause later recovery to fail iiuc. Just trying to understand the scope of this fix.
>
> Actually, I think there is a difference, especially if the writes are batched. A crash while writing manifest will only result in the last record being partially written, and that will be skipped on open I think. The previous records in the batch will be valid. However, if the DB stays up (either read-only or read-write), the files referenced by the previous records will be deleted and recovery on re-open will fail. I think this is the issue you were trying to fix in #5379?

Thanks for reminding me of this. I checked log::Reader, and you are right, the trailing incomplete version edit in the MANIFEST will not be reported as failure.
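
The distinction discussed in this thread can be sketched with a toy record log. This is an illustration under simplifying assumptions, not the real log::Reader: the format here (one length byte, the payload, one additive checksum byte) and the names EncodeRecord and ReadLog are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy encoding: [1-byte length][payload][1-byte additive checksum].
// Payloads must be shorter than 256 bytes in this sketch.
std::string EncodeRecord(const std::string& payload) {
  uint8_t sum = 0;
  for (unsigned char c : payload) sum = static_cast<uint8_t>(sum + c);
  std::string out;
  out.push_back(static_cast<char>(payload.size()));
  out += payload;
  out.push_back(static_cast<char>(sum));
  return out;
}

// Reads records in order. An incomplete record at the very end is treated
// as a torn final write and silently dropped. A checksum mismatch anywhere
// else is reported as corruption by returning false.
bool ReadLog(const std::string& log, std::vector<std::string>* records) {
  std::size_t pos = 0;
  while (pos < log.size()) {
    std::size_t len = static_cast<unsigned char>(log[pos]);
    if (pos + 1 + len + 1 > log.size()) {
      return true;  // truncated tail: stop without an error
    }
    std::string payload = log.substr(pos + 1, len);
    uint8_t sum = 0;
    for (unsigned char c : payload) sum = static_cast<uint8_t>(sum + c);
    if (sum != static_cast<unsigned char>(log[pos + 1 + len])) {
      return false;  // a partial record followed by more data looks corrupt
    }
    records->push_back(payload);
    pos += 1 + len + 1;
  }
  return true;
}
```

In this model, a log that ends in a partial record reads back cleanly with the earlier records intact, while a partial record followed by further appends fails the read. That second case corresponds to continuing to write to the same manifest after a failed append.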

Contributor

riversand963 left a comment

LGTM. Thanks @anand1976 for the fix.

ASSERT_NE(new_manifest, old_manifest);

Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
Destroy(options);
ASSERT_EQ("val", Get(Key(1)));
Close();
Contributor

This Close() at the end of each test is not needed, but it's harmless.

anand76 added 3 commits January 29, 2020 15:41
Summary:
When an append/sync of the manifest file fails due to an IO error such
as NoSpace, we don't always put the DB in read-only mode. This is true
for flush and compactions, as well as foreground operations such as column family
add/drop, CompactFiles etc. Subsequent changes to the DB will be
recorded in the same manifest file, which would have a corrupted record
in the middle due to the previous failure. On next DB::Open(), it will
fail to process the full manifest and data will be lost.

To fix this, we reset VersionSet::descriptor_log_ on append/sync
failure, which will force a new manifest file to be written on the next
append.

Test Plan:
Add new unit tests in error_handler_test.cc

Contributor

facebook-github-bot left a comment

@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@anand1976
Contributor Author

Appveyor failure is unrelated.

@anand1976
Contributor Author

The FB internal failures are due to an infra issue and unrelated to this PR. So going ahead with landing this PR.

@facebook-github-bot
Contributor

@anand1976 merged this pull request in fb05b5a.

anand1976 pushed a commit that referenced this pull request Jan 31, 2020
Summary:
Fix for issue #6316

When an append/sync of the manifest file fails due to an IO error such
as NoSpace, we don't always put the DB in read-only mode. This is true
for flush and compactions, as well as foreground operations such as column family
add/drop, CompactFiles etc. Subsequent changes to the DB will be
recorded in the same manifest file, which would have a corrupted record
in the middle due to the previous failure. On next DB::Open(), it will
fail to process the full manifest and data will be lost.

To fix this, we reset VersionSet::descriptor_log_ on append/sync
failure, which will force a new manifest file to be written on the next
append.
Pull Request resolved: #6331

Test Plan: Add new unit tests in error_handler_test.cc

Differential Revision: D19632951

Pulled By: anand1976

fbshipit-source-id: 68d527cb6e59a94cbbbf9f5a17a7f464381d51e3
anand1976 pushed a commit that referenced this pull request Feb 1, 2020
yiwu-arbug pushed a commit to tikv/rocksdb that referenced this pull request Feb 5, 2020
yiwu-arbug pushed a commit to tikv/rocksdb that referenced this pull request Feb 6, 2020
Signed-off-by: Yi Wu <yiwu@pingcap.com>
yiwu-arbug pushed a commit to tikv/rocksdb that referenced this pull request Feb 19, 2020
levichen94 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 18, 2021
Signed-off-by: Changlong Chen <levisonchen@live.cn>
mm304321141 pushed a commit to bytedance/terarkdb that referenced this pull request Jun 23, 2021
Signed-off-by: Changlong Chen <levisonchen@live.cn>