[RFC] Disable automatic error recovery for user write failures #8321

anand1976 · 2021-05-20T22:44:41Z

In the current error recovery logic, background write errors during flush/compaction are automatically retried under some circumstances (NoSpace, retryable errors in distributed file systems). Normally, these errors are not visible to the user and we can try to recover from them in the background. However, if recovery takes a long time, the memtables eventually fill up with buffered writes and the write controller stops writes. When that happens, we currently return Status::Incomplete rather than hang the write thread indefinitely. There are two problems with this approach:

1. The Incomplete error may not be handled correctly by the user. It used to be returned only when write_options.no_slowdown was set.
2. Other writes may be queued behind the incomplete write. If the background error recovery succeeds, the queued writes may succeed, which might cause inconsistency, especially with TransactionDB.

The solution is to stop all further writes once we return an error for a user write. This PR does so as follows:

1. When the write controller stops writes and there is a background error, stop all further writes by setting the severity in bg_error_ to kHardError.
2. Return the bg_error_ rather than Status::Incomplete.
3. Disable automatic error recovery in TransactionDB::Open() by setting db_options.max_bgerror_resume_count to 0. (Is this still required if we have #1?)
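
To make the first two steps above concrete, here is a minimal, self-contained sketch. It is a toy model rather than RocksDB code: `ToyDB`, `ToyStatus`, `Code`, `SetBackgroundError` and `StopWrites` are names invented for illustration, and only the escalation to kHardError and the returned error follow the description above.

```cpp
#include <cassert>

// Toy model only: these types mimic the idea, not RocksDB's real classes.
enum class Code { kOk, kIncomplete, kNoSpace };
enum class Severity { kNoError, kSoftError, kHardError };

struct ToyStatus {
  Code code;
  Severity severity;
  bool ok() const { return code == Code::kOk; }
};

class ToyDB {
 public:
  // A flush or compaction failed; recovery is being retried in the background.
  void SetBackgroundError(Code code) {
    bg_error_ = ToyStatus{code, Severity::kSoftError};
  }

  // Memtables filled up, so the write controller stops writes.
  void StopWrites() { writes_stopped_ = true; }

  ToyStatus Write() {
    if (bg_error_.severity == Severity::kHardError) {
      return bg_error_;  // DB already stopped: every later write fails too.
    }
    if (writes_stopped_ && !bg_error_.ok()) {
      // Step 1: escalate so that all further writes are rejected.
      bg_error_.severity = Severity::kHardError;
      // Step 2: surface the real error instead of Status::Incomplete.
      return bg_error_;
    }
    if (writes_stopped_) {
      // Plain stall with no background error. A real write would wait; this
      // toy returns Incomplete, as a no_slowdown write would.
      return ToyStatus{Code::kIncomplete, Severity::kNoError};
    }
    return ToyStatus{Code::kOk, Severity::kNoError};
  }

 private:
  ToyStatus bg_error_{Code::kOk, Severity::kNoError};
  bool writes_stopped_ = false;
};

// Usage: the error stays invisible until the stall, then sticks.
inline void Example() {
  ToyDB db;
  db.SetBackgroundError(Code::kNoSpace);
  assert(db.Write().ok());  // Error is still confined to the background.
  db.StopWrites();
  ToyStatus s = db.Write();
  assert(s.code == Code::kNoSpace && s.severity == Severity::kHardError);
  assert(!db.Write().ok());  // Writes queued behind it fail as well.
}
```

The point of the sketch is that once one user write has seen the background error, no later write can slip through.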

zhichao-cao · 2021-05-20T23:55:54Z

db/error_handler.cc

+    new_bg_err = OverrideNoSpaceError(new_bg_err, reason, &auto_recovery);
+  }
+
+  if ((!db_options_.max_bgerror_resume_count || !auto_recovery) &&


How about compaction? Compaction does not do auto recovery; it just reschedules itself. We set it to a soft error. Should it be a hard error?

If the user has disabled auto recovery, I think we should set it to hard error even for compaction. Otherwise, too many pending compactions could also lead to a write stall.
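
The suggestion in this reply could be sketched roughly as below. `CompactionErrorSeverity` is a hypothetical helper, not a RocksDB function; its condition mirrors the `!db_options_.max_bgerror_resume_count || !auto_recovery` check in the diff excerpt above. Treating every other case as a soft error is an assumption taken from the reviewer's description of current behavior.

```cpp
#include <cassert>

// Toy model only: a reduced severity enum for illustration.
enum class Severity { kSoftError, kHardError };

// Hypothetical helper: pick the severity for a compaction error.
// If auto recovery is disabled, nothing will clear the error in the
// background, so pending compactions could pile up and stall writes.
inline Severity CompactionErrorSeverity(int max_bgerror_resume_count,
                                        bool auto_recovery) {
  const bool recovery_disabled =
      (max_bgerror_resume_count == 0) || !auto_recovery;
  return recovery_disabled ? Severity::kHardError : Severity::kSoftError;
}

// Usage: only the "recovery enabled" case stays a soft error.
inline void Example() {
  assert(CompactionErrorSeverity(0, true) == Severity::kHardError);
  assert(CompactionErrorSeverity(3, true) == Severity::kSoftError);
}
```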

siying · 2021-05-21T17:42:35Z

db/db_impl/db_impl_write.cc

+    // error. Since the background error is now user visible and caused a
+    // write to fail, stop the DB and fail subsequent writes as well. There
+    // may be other writes in the queue and might cause inconsistency if the
+    // recovery succeeds and the queued writes are allowed to go through.


It's true that if users use RocksDB in a certain way, as MyRocks does, any write error might not be recoverable. How about other use cases, where writes are more or less independent, so one write failure can be skipped or independently retried later?

That's a good point. Should we introduce an option to control this behavior? And perhaps it could apply to other user write failures, such as an IO error during WAL append. The automatic recovery will flush memtables and create a new WAL if there was a WAL append failure. Any writes during that time will fail, but subsequent writes will succeed.

I think if we want to do that, we probably should go with an option. Do we have an API that allows users to manually recover? If we do, the option can be for manual recovery only.

Yes, DB::Resume() allows users to manually recover. I added an option to freeze the DB on a user write failure. I kept it independent of auto recovery, since the auto recovery can continue as long as it's confined to the background and not visible to the user. If the freeze option is set and there is a user visible failure (either due to reason kWriteCallback or write controller stoppage), then we put the DB in read-only mode and cancel any ongoing recovery.
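
The behavior described in this reply might look like the following toy model. This is a sketch under stated assumptions: the option name `freeze_on_write_failure` is hypothetical (the thread does not name the option), `ToyDB` is not RocksDB's class, and `Resume()` here simply assumes the underlying problem has been fixed, whereas the real DB::Resume() can itself fail.

```cpp
#include <cassert>

// Toy model only: the names below are illustrative, not RocksDB's API.
struct ToyOptions {
  bool freeze_on_write_failure = false;  // Hypothetical option name.
};

class ToyDB {
 public:
  explicit ToyDB(ToyOptions opts) : opts_(opts) {}

  // A background error occurred and automatic recovery has started.
  void StartBackgroundRecovery() {
    bg_error_ = true;
    recovery_in_progress_ = true;
  }

  // The background error surfaces to a user write, e.g. the write
  // controller stopped writes while recovery was still running.
  // Returns false because the user's write fails.
  bool FailUserWrite() {
    if (opts_.freeze_on_write_failure) {
      read_only_ = true;              // Freeze the DB.
      recovery_in_progress_ = false;  // Cancel the ongoing recovery.
    }
    return false;
  }

  // Background recovery completes, unless it was cancelled.
  void FinishBackgroundRecovery() {
    if (!recovery_in_progress_) return;
    recovery_in_progress_ = false;
    bg_error_ = false;
  }

  // Manual recovery, modeled on the role of DB::Resume().
  void Resume() {
    bg_error_ = false;
    read_only_ = false;
    recovery_in_progress_ = false;
  }

  // A write succeeds only if the DB is writable and error-free.
  bool Write() const { return !read_only_ && !bg_error_; }

 private:
  ToyOptions opts_;
  bool bg_error_ = false;
  bool read_only_ = false;
  bool recovery_in_progress_ = false;
};

// Usage: with the freeze option set, only a manual Resume() reopens writes.
inline void Example() {
  ToyOptions opts;
  opts.freeze_on_write_failure = true;
  ToyDB db(opts);
  db.StartBackgroundRecovery();
  assert(!db.FailUserWrite());
  db.FinishBackgroundRecovery();  // No effect: recovery was cancelled.
  assert(!db.Write());
  db.Resume();
  assert(db.Write());
}
```

Without the option, the same sequence lets background recovery finish and later writes succeed, which matches the independent-writes use case raised earlier in this thread.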

Disable automatic error recovery for user write failures

5bd34be

anand1976 requested review from siying and zhichao-cao May 20, 2021 22:44

facebook-github-bot added the CLA Signed label May 20, 2021

zhichao-cao reviewed May 20, 2021

siying reviewed May 21, 2021

anand76 added 2 commits May 25, 2021 12:07

Add an option to stop all further writes on a write error

a857e89

Freeze the DB if an error happens with reason kWriteCallback

dfbd949

zhichao-cao May 20, 2021

anand1976 May 21, 2021

siying May 21, 2021

anand1976 May 21, 2021

siying May 24, 2021

anand1976 May 25, 2021
