Force a new manifest file if append to current one fails #6331

anand1976 · 2020-01-24T23:08:49Z

Fix for issue #6316

When an append/sync of the manifest file fails due to an IO error such
as NoSpace, we don't always put the DB in read-only mode. This is true
for flush and compactions, as well as foreground operations such as column family
add/drop, CompactFiles etc. Subsequent changes to the DB will be
recorded in the same manifest file, which would have a corrupted record
in the middle due to the previous failure. On next DB::Open(), it will
fail to process the full manifest and data will be lost.

To fix this, we reset VersionSet::descriptor_log_ on append/sync
failure, which will force a new manifest file to be written on the next
append.
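
The idea of dropping the writer on failure can be sketched outside RocksDB. The following is a minimal, self-contained illustration and not the actual patch: `ToyManifestFile`, `ToyVersionSet`, and `InjectFailureOnNextAppend` are hypothetical names, and the "snapshot" written into a new file is simply a copy of the edits committed so far.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for one manifest file. Not a RocksDB class.
struct ToyManifestFile {
  uint64_t number = 0;
  std::vector<std::string> records;
};

// Toy stand-in for the part of VersionSet that owns the manifest writer.
class ToyVersionSet {
 public:
  // Makes the next append fail, simulating an IO error such as NoSpace.
  void InjectFailureOnNextAppend() { fail_next_append_ = true; }

  bool has_manifest() const { return log_ != nullptr; }
  uint64_t current_manifest_number() const { return log_ ? log_->number : 0; }
  std::size_t current_record_count() const {
    return log_ ? log_->records.size() : 0;
  }

  // Records one edit. Returns false if the append failed.
  bool LogAndApply(const std::string& edit) {
    assert(!edit.empty());
    if (!log_) {
      // No usable manifest: start a new file and seed it with everything
      // committed so far, so it does not depend on the old file.
      log_ = std::make_unique<ToyManifestFile>();
      log_->number = next_file_number_++;
      log_->records = committed_;
    }
    if (fail_next_append_) {
      fail_next_append_ = false;
      // The old file may now end in a partial record. Drop the writer so the
      // next call starts a fresh manifest instead of appending after it.
      log_.reset();
      return false;
    }
    log_->records.push_back(edit);
    committed_.push_back(edit);
    return true;
  }

 private:
  std::unique_ptr<ToyManifestFile> log_;
  std::vector<std::string> committed_;
  uint64_t next_file_number_ = 1;
  bool fail_next_append_ = false;
};
```

In this sketch, a failed `LogAndApply` leaves no current manifest, and the next successful call writes to a file with a new number that already contains the earlier committed edits.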

Test Plan:
Add new unit tests in error_handler_test.cc

zhichao-cao · 2020-01-24T23:58:26Z

db/error_handler_test.cc

+  ASSERT_NE(new_manifest, old_manifest);
+
+  Reopen(options);
+  ASSERT_EQ("val", Get(Key(0)));


Can we check the value of key(1) also?

zhichao-cao · 2020-01-24T23:59:08Z

db/error_handler_test.cc

+  }
+  ASSERT_NE(new_manifest, old_manifest);
+  Reopen(options);
+  ASSERT_EQ("val", Get(Key(0)));


zhichao-cao · 2020-01-25T00:10:35Z

db/error_handler.cc

@@ -204,6 +198,12 @@ Status ErrorHandler::SetBGError(const Status& bg_err, BackgroundErrorReason reas

  new_bg_err = Status(bg_err, sev);

+  // Check if recovery is currently in progress. If it is, we will save this


Can we move this block just before if (new_bg_err == Status::NoSpace()) ?

We could, but is there a particular reason? The reason for moving this down here is so recovery_error_ will have the correct severity. This way, the caller of DB::Resume() will receive an error indicating whether they can retry resume later or not. We don't necessarily care how new_bg_err_ is handled from this point on.

Thanks for the explanation. Can you remove the ";" in line 210 in this PR?

riversand963

What if the process crashes after closing the previous, potentially corrupted MANIFEST, but before creating the new MANIFEST? Will the db still be able to recover?

riversand963 · 2020-01-27T19:18:28Z

db/error_handler_test.cc

+  DestroyAndReopen(options);
+  ASSERT_OK(dbfull()->GetLiveFiles(live_files, &manifest_size, false));
+  for (auto& file : live_files) {
+    if (file.find("MANIFEST") != std::string::npos) {


Can we use ParseFileName() here and in a few other places?
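
To illustrate why a structured parse is safer than a substring search, here is a minimal stand-alone sketch. It is not RocksDB's `ParseFileName()`, which handles all file types; `ParseManifestFileName` is a hypothetical helper that only recognizes names of the form `MANIFEST-<digits>`, and the optional leading `/` is an assumption about how live file names are reported.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true and stores the file number if `name` is
// exactly "MANIFEST-<digits>", optionally preceded by a single '/'.
// Overflow of very long digit strings is not handled in this sketch.
bool ParseManifestFileName(const std::string& name, uint64_t* number) {
  assert(number != nullptr);
  const std::string prefix = "MANIFEST-";
  const std::size_t start = (!name.empty() && name[0] == '/') ? 1 : 0;
  if (name.compare(start, prefix.size(), prefix) != 0) {
    return false;  // does not begin with the manifest prefix
  }
  std::size_t pos = start + prefix.size();
  if (pos >= name.size()) {
    return false;  // prefix with no file number
  }
  uint64_t value = 0;
  for (; pos < name.size(); ++pos) {
    const unsigned char c = static_cast<unsigned char>(name[pos]);
    if (!std::isdigit(c)) {
      return false;  // trailing garbage such as ".bak"
    }
    value = value * 10 + static_cast<uint64_t>(c - '0');
  }
  *number = value;
  return true;
}
```

Unlike `file.find("MANIFEST") != std::string::npos`, this rejects names that merely contain the word, such as `/OLD-MANIFEST-000005` or `/MANIFEST-000005.bak`.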

riversand963 · 2020-01-27T19:28:36Z

db/error_handler_test.cc

+  ASSERT_NE(new_manifest, old_manifest);
+  Reopen(options);
+  ASSERT_EQ("val", Get(Key(0)));
+  Destroy(options);


I suggest that we let DBErrorHandlingTest, a subclass of DBTestBase observe KEEP_DB env variable. See ~DBTestBase(). This allows us to optionally preserve the db directory for debugging purpose.
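
A rough sketch of the pattern being suggested, assuming the fixture decides in its destructor whether to clean up: `ToyDbFixture` is a hypothetical class, and setting a flag stands in for actually destroying the DB directory. Only the `KEEP_DB` variable name comes from the comment above.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical test fixture: cleans up on destruction unless the KEEP_DB
// environment variable is set, in which case the directory is left in place.
class ToyDbFixture {
 public:
  ToyDbFixture(std::string dir, bool* destroyed)
      : dir_(std::move(dir)), destroyed_(destroyed) {
    assert(destroyed_ != nullptr);
  }

  ~ToyDbFixture() {
    if (std::getenv("KEEP_DB") != nullptr) {
      // Leave the files behind so they can be inspected after the test.
      std::printf("DB is still at %s\n", dir_.c_str());
    } else {
      *destroyed_ = true;  // stands in for deleting the DB directory
    }
  }

 private:
  std::string dir_;
  bool* destroyed_;
};
```

With this shape, individual tests do not need an explicit `Destroy(options)` call at the end; cleanup is centralized and can be switched off from the environment when debugging.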

anand1976 · 2020-01-28T01:37:11Z

@riversand963

> What if the process crashes after closing the previous, potentially corrupted MANIFEST, but before creating the new MANIFEST? Will the db still be able to recover?

Isn't it the same as basic crash recovery?

riversand963 · 2020-01-28T19:06:54Z

> @riversand963
>
> > What if the process crashes after closing the previous, potentially corrupted MANIFEST, but before creating the new MANIFEST? Will the db still be able to recover?
>
> Isn't it the same as basic crash recovery?

It is, and will cause later recovery to fail iiuc. Just trying to understand the scope of this fix.

anand1976 · 2020-01-28T21:00:20Z

> It is, and will cause later recovery to fail iiuc. Just trying to understand the scope of this fix.

Actually, I think there is a difference, especially if the writes are batched. A crash while writing manifest will only result in the last record being partially written, and that will be skipped on open I think. The previous records in the batch will be valid. However, if the DB stays up (either read-only or read-write), the files referenced by the previous records will be deleted and recovery on re-open will fail. I think this is the issue you were trying to fix in #5379?

riversand963 · 2020-01-28T21:59:08Z

> > It is, and will cause later recovery to fail iiuc. Just trying to understand the scope of this fix.
>
> Actually, I think there is a difference, especially if the writes are batched. A crash while writing manifest will only result in the last record being partially written, and that will be skipped on open I think. The previous records in the batch will be valid. However, if the DB stays up (either read-only or read-write), the files referenced by the previous records will be deleted and recovery on re-open will fail. I think this is the issue you were trying to fix in #5379?

Thanks for reminding me of this. I checked log::Reader, and you are right, the trailing incomplete version edit in the MANIFEST will not be reported as failure.
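
The distinction discussed here, a truncated record at the tail versus a partial record followed by more appends, can be shown with a toy log. This is only a sketch under an assumed record format (one length byte, the payload, one additive checksum byte); it is not the format or the logic of RocksDB's `log::Reader`, and `EncodeRecord`, `ReadAll`, and `ReadResult` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy record layout (an assumption): [len:1][payload:len][checksum:1].
std::string EncodeRecord(const std::string& payload) {
  assert(payload.size() <= 255);
  std::string out;
  out.push_back(static_cast<char>(payload.size()));
  out += payload;
  uint8_t sum = 0;
  for (unsigned char c : payload) sum = static_cast<uint8_t>(sum + c);
  out.push_back(static_cast<char>(sum));
  return out;
}

struct ReadResult {
  bool ok;                           // false if a corrupted record was found
  std::vector<std::string> records;  // records read before stopping
};

ReadResult ReadAll(const std::string& log) {
  ReadResult result{true, {}};
  std::size_t pos = 0;
  while (pos < log.size()) {
    const std::size_t len = static_cast<unsigned char>(log[pos]);
    if (pos + len + 2 > log.size()) {
      break;  // incomplete record at the tail: skip it, not an error
    }
    const std::string payload = log.substr(pos + 1, len);
    uint8_t sum = 0;
    for (unsigned char c : payload) sum = static_cast<uint8_t>(sum + c);
    if (sum != static_cast<uint8_t>(log[pos + 1 + len])) {
      result.ok = false;  // damaged record with data after it: corruption
      break;
    }
    result.records.push_back(payload);
    pos += len + 2;
  }
  return result;
}
```

In this toy model, a log that ends in a half-written record reads cleanly up to the last complete record, while appending a further record after the half-written one makes the reader report corruption, which corresponds to the case this PR avoids by switching to a new manifest.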

riversand963

LGTM. Thanks @anand1976 for the fix.

riversand963 · 2020-01-29T22:51:44Z

db/error_handler_test.cc

  ASSERT_NE(new_manifest, old_manifest);

  Reopen(options);
  ASSERT_EQ("val", Get(Key(0)));
-  Destroy(options);
+  ASSERT_EQ("val", Get(Key(1)));
+  Close();


This Close() at the end of each test is not needed, but it's harmless.

facebook-github-bot

@anand1976 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

anand1976 · 2020-01-30T18:05:07Z

Appveyor failure is unrelated.

anand1976 · 2020-01-30T18:38:37Z

The FB internal failures are due to an infra issue and unrelated to this PR. So going ahead with landing this PR.

facebook-github-bot · 2020-01-30T20:10:42Z

@anand1976 merged this pull request in fb05b5a.

Summary: Fix for issue #6316

When an append/sync of the manifest file fails due to an IO error such as NoSpace, we don't always put the DB in read-only mode. This is true for flush and compactions, as well as foreground operations such as column family add/drop, CompactFiles etc. Subsequent changes to the DB will be recorded in the same manifest file, which would have a corrupted record in the middle due to the previous failure. On next DB::Open(), it will fail to process the full manifest and data will be lost.

To fix this, we reset VersionSet::descriptor_log_ on append/sync failure, which will force a new manifest file to be written on the next append.

Pull Request resolved: #6331

Test Plan: Add new unit tests in error_handler_test.cc

Differential Revision: D19632951

Pulled By: anand1976

fbshipit-source-id: 68d527cb6e59a94cbbbf9f5a17a7f464381d51e3

anand1976 requested review from siying, zhichao-cao and riversand963 January 24, 2020 23:08

facebook-github-bot added the CLA Signed label Jan 24, 2020

anand1976 mentioned this pull request Jan 24, 2020

Database corruption on "No space left on device" #6316

Closed

zhichao-cao reviewed Jan 24, 2020

zhichao-cao reviewed Jan 25, 2020

zhichao-cao approved these changes Jan 27, 2020

riversand963 reviewed Jan 27, 2020

riversand963 approved these changes Jan 29, 2020

anand76 added 3 commits January 29, 2020 15:41

Address review comments

2d252b0

Update HISTORY.md

ebe2496

anand1976 force-pushed the manifest_write_error branch from 1ec461a to ebe2496 Compare January 29, 2020 23:43

facebook-github-bot reviewed Jan 29, 2020

facebook-github-bot closed this in fb05b5a Jan 30, 2020

facebook-github-bot added the Merged label Jan 30, 2020
