Revise LockWAL/UnlockWAL implementation #11020

riversand963 · 2022-12-06T23:13:19Z

RocksDB has two public APIs: DB::LockWAL()/DB::UnlockWAL(). The current implementation acquires and
releases the internal DBImpl::log_write_mutex_.

According to the comment on DBImpl::log_write_mutex_: https://github.com/facebook/rocksdb/blob/7.8.fb/db/db_impl/db_impl.h#L2287:L2288

Note: to avoid deadlock, if needed to acquire both log_write_mutex_ and mutex_, the order should be first mutex_ and then log_write_mutex_.

This limits how applications can use the LockWAL() API. After LockWAL() returns OK, the application
should not perform any operation that acquires mutex_. Currently, the use case of LockWAL() is MyRocks implementing
the MySQL storage engine handlerton lock_hton_log interface. The operation that MyRocks performs after LockWAL()
is GetSortedWalFiles(), which acquires not only mutex_ but also log_write_mutex_.

There are two issues:

1. Applications using these two APIs may hang if one thread calls GetSortedWalFiles() after
   calling LockWAL(), because log_write_mutex_ is not recursive.
2. Two threads may deadlock due to lock order inversion.

To fix these issues, we can modify the implementation of LockWAL so that it does not keep
log_write_mutex_ held until UnlockWAL. To achieve the goal of locking the WAL, we can
instead manually inject a write stall so that all future writes will be stopped.
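
The idea of stalling writes instead of holding a mutex across API calls can be sketched with a toy model. This is not the RocksDB implementation: ToyDb, TryWrite(), ListWal() and wal_stall_count_ are hypothetical names made up for illustration, and ListWal() merely stands in for GetSortedWalFiles(). The sketch assumes a single stall counter guarded by mutex_ and a condition variable that writers wait on.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <vector>

// Toy model only: ToyDb and its members are hypothetical and do not mirror
// DBImpl. A stall counter replaces a mutex held between LockWAL/UnlockWAL.
class ToyDb {
 public:
  // Raises the stall counter and returns with no lock held.
  void LockWAL() {
    std::lock_guard<std::mutex> guard(mutex_);
    ++wal_stall_count_;
  }

  // Lowers the stall counter and wakes writers blocked in TryWrite().
  void UnlockWAL() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      if (wal_stall_count_ > 0) --wal_stall_count_;
    }
    stall_cv_.notify_all();
  }

  // Waits up to `timeout` for the stall to clear, then appends a record.
  // Returns false if the stall was still in effect at the deadline.
  bool TryWrite(const std::string& record, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    if (!stall_cv_.wait_for(lock, timeout,
                            [this] { return wal_stall_count_ == 0; })) {
      return false;
    }
    // Lock order: mutex_ first, then log_write_mutex_.
    std::lock_guard<std::mutex> log_guard(log_write_mutex_);
    wal_.push_back(record);
    return true;
  }

  // Stand-in for GetSortedWalFiles(): takes both mutexes in the documented
  // order. It cannot self-deadlock after LockWAL(), which left neither held.
  std::vector<std::string> ListWal() {
    std::lock_guard<std::mutex> guard(mutex_);
    std::lock_guard<std::mutex> log_guard(log_write_mutex_);
    return wal_;
  }

 private:
  std::mutex mutex_;
  std::mutex log_write_mutex_;
  std::condition_variable stall_cv_;
  int wal_stall_count_ = 0;
  std::vector<std::string> wal_;
};
```

In this model a caller can invoke LockWAL(), then ListWal(), then UnlockWAL() from the same thread without hanging, while any TryWrite() issued in between waits until the stall is cleared or its timeout expires.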

Test plan:
make check

facebook-github-bot · 2022-12-06T23:15:43Z

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-12-07T05:13:48Z

@riversand963 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-12-07T05:15:59Z

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-12-07T05:54:32Z

@riversand963 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-12-07T05:56:41Z

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

riversand963 · 2022-12-07T05:58:40Z

Previous impl can be found at #5146

riversand963 · 2022-12-07T06:16:00Z

db/db_impl/db_impl_write.cc

@@ -924,6 +924,38 @@ Status DBImpl::WriteImplWALOnly(
      write_thread->ExitAsBatchGroupLeader(write_group, status);
      return status;
    }
+  } else {
+    // TODO(yanqin): maybe move this block into a refactored version of


Maybe I can at least put this block into a helper function

ajkr

Thanks for taking this approach. Was wondering about a few more possible simplifications.

db/db_impl/db_impl.cc

db/db_impl/db_impl_write.cc

facebook-github-bot · 2022-12-13T19:16:10Z

@riversand963 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-12-13T19:27:03Z

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ajkr

LGTM!

db/db_impl/db_impl_write.cc

facebook-github-bot · 2022-12-13T21:19:52Z

@riversand963 has updated the pull request. You must reimport the pull request before landing.

riversand963 · 2022-12-13T21:22:51Z

Thanks @ajkr for the review!

facebook-github-bot · 2022-12-13T21:23:09Z

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-12-13T21:30:15Z

@riversand963 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-12-13T21:31:43Z

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-12-13T21:39:04Z

@riversand963 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-12-13T21:41:24Z

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

riversand963 · 2022-12-13T22:45:50Z

Looking into flaky tests.

facebook-github-bot · 2022-12-13T23:20:33Z

@riversand963 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-12-13T23:40:51Z

@riversand963 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-12-14T03:42:08Z

@riversand963 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-12-14T03:43:44Z

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-12-14T05:47:12Z

@riversand963 merged this pull request in c93ba7d.

Summary: PR #11020 fixed a case where it was easy to deadlock the DB with LockWAL() but introduced a bug showing up as a rare assertion failure in the stress test. Specifically, `assert(w->state == STATE_INIT)` in `WriteThread::LinkOne()` called from `BeginWriteStall()`, `DelayWrite()`, `WriteImplWALOnly()`. I haven't been able to generate a unit test that reproduces this failure, but I believe the root cause is that DelayWrite() was never meant to be re-entrant, only called from the DB's write_thread_ leader. #11020 introduced a call to DelayWrite() from the nonmem_write_thread_ group leader.

This fix is to make DelayWrite() apply to the specific write queue that it is being called from (inject a dummy write stall entry to the head of the appropriate write queue). WriteController is re-entrant, based on polling and state changes signalled with bg_cv_, so it can manage stalling two queues. The only anticipated complication (called out by Andrew in the previous PR) is that we don't want timed write delays being injected in parallel for the two queues, because that diminishes the intended throttling effect. Thus, we only allow timed delays for the primary write queue.

HISTORY not updated because this is intended for the same release where the bug was introduced.

Pull Request resolved: #11130

Test Plan: Although I was not able to reproduce the assertion failure, I was able to reproduce a distinct flaw with what I believe is the same root cause: a kind of deadlock if both write queues need to wake up from stopped writes. Only one will be waiting on bg_cv_ (the other waiting in `LinkOne()` for the write queue to open up), so a single SignalAll() will only unblock one of the queues, with the other reinstating the stop until another signal on bg_cv_. A simple unit test is added for this case. Will also run crash_test_with_multiops_wc_txn for a while looking for issues.

Reviewed By: ajkr
Differential Revision: D42749330
Pulled By: pdillinger
fbshipit-source-id: 4317dd899a93d57c26fd5af7143038f82d4d4d1b

riversand963 requested a review from ajkr December 6, 2022 23:13

facebook-github-bot added the CLA Signed label Dec 6, 2022

riversand963 mentioned this pull request Dec 6, 2022

Revise LockWAL API and WAL collection #10953

Closed

riversand963 mentioned this pull request Dec 6, 2022

pfs.log_status query hangs if rocksdb_file_deletions were not disabled beforehand facebook/mysql-5.6#1253

Closed

ajkr requested changes Dec 13, 2022

ajkr approved these changes Dec 13, 2022

riversand963 force-pushed the fix-deadlock-2 branch from 2ab98d3 to b54eedf Compare December 13, 2022 21:30

riversand963 added 8 commits December 13, 2022 13:36

init

2df3779

Unit test

dbc4fb6

Do not overwrite status

07dd98e

Fix CI

5b83532

Some comments

c036176

Address review comments

9518ace

address comments

f08cee1

Update HISTORY

60d0294

riversand963 force-pushed the fix-deadlock-2 branch from b54eedf to 60d0294 Compare December 13, 2022 21:39

Add sync point to deflake a test

a3171f2

minor

d2e616a

Remove assertion to account for manual write stop by LockWal

5ef8c38

facebook-github-bot closed this in c93ba7d Dec 14, 2022

facebook-github-bot added the Merged label Dec 14, 2022

riversand963 deleted the fix-deadlock-2 branch December 14, 2022 06:03

hermanlee mentioned this pull request Jan 11, 2023

MyRocks Clone Plugin Support facebook/mysql-5.6#1250

Closed

pdillinger mentioned this pull request Jan 25, 2023

Fix DelayWrite() calls for two_write_queues #11130

Closed

Yuval-Ariel mentioned this pull request May 4, 2023

unit tests: transaction_test UnlockWALStallCleared fails speedb-io/speedb#493

Closed

igorcanadi mentioned this pull request Jan 17, 2024

[SYS-6913] Upgrade RocksDB-Cloud to 8.9.1 rockset/rocksdb-cloud#315

Closed
