
WritePrepared: reduce prepared_mutex_ overhead #5420

Conversation

@maysamyabandeh (Contributor) commented Jun 6, 2019:

The patch reduces the contention over prepared_mutex_ using these techniques (a sketch of the resulting locking scheme follows this list):

  1. Move ::RemovePrepared() to be called from the commit callback when we have two write queues.
  2. Use two separate mutexes for PreparedHeap: prepared_mutex_, needed for ::RemovePrepared, and ::push_pop_mutex(), needed for ::AddPrepared(). Given that ::AddPrepared is called only from the first write queue and ::RemovePrepared mostly from the second, the two write queues no longer compete over a single mutex. ::RemovePrepared might occasionally need to acquire ::push_pop_mutex() if ::erase() ends up calling ::pop().
  3. Acquire ::push_pop_mutex() on the first callback of the write queue and release it on the last.
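As a rough, hypothetical sketch of how item 2 splits the work (simplified standalone C++; the class and member layout are assumptions, not the actual RocksDB PreparedHeap):

#include <cstdint>
#include <deque>
#include <mutex>
#include <set>

// Simplified sketch of the two-mutex scheme described above.
class TwoMutexPreparedHeap {
  std::mutex push_pop_mutex_;  // taken by AddPrepared (first write queue)
  std::mutex prepared_mutex_;  // taken by RemovePrepared (second write queue)
  std::deque<uint64_t> heap_;  // prepare seqs, pushed in increasing order
  std::set<uint64_t> erased_;  // entries erased before reaching the top

 public:
  // AddPrepared path: only the first write queue calls this.
  void push(uint64_t seq) {
    std::lock_guard<std::mutex> lock(push_pop_mutex_);
    heap_.push_back(seq);
  }

  // RemovePrepared path: mostly called from the second write queue.
  void erase(uint64_t seq) {
    std::lock_guard<std::mutex> lock(prepared_mutex_);
    if (!heap_.empty() && heap_.front() == seq) {
      // Only this branch competes with push, so only it takes push_pop_mutex_.
      std::lock_guard<std::mutex> pop_lock(push_pop_mutex_);
      heap_.pop_front();
      // Also drop entries that were erased earlier and have reached the top.
      while (!heap_.empty() && erased_.erase(heap_.front()) > 0) {
        heap_.pop_front();
      }
    } else {
      erased_.insert(seq);  // defer removal until the entry reaches the top
    }
  }
};

Under this split, the common case is that the two write queues touch different mutexes; they only meet on push_pop_mutex_ when an erase actually has to pop.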

@@ -55,25 +55,17 @@ TEST(PreparedHeap, BasicsTest) {
heap.push(34l);
// Test that old min is still on top
ASSERT_EQ(14l, heap.top());
heap.push(13l);
Contributor Author:
With ::AddPrepared being called from the main write queue, the push is now called in order. So we should remove tests with unordered calls to push. This will later help to simplify the implementation of PreparedHeap as well.

Contributor:
Is this only true in the case of two write queues, though? If so, maybe we should just decide to only support two write queues.

Contributor Author:
No, it is true regardless. I did not mention the one-write-queue case because with one write queue the inserts into PreparedHeap have always been in order.
But I am in favor of deprecating the one-write-queue case, which would make the implementation simpler. We should keep that direction in mind.
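As a hedged illustration of the ordered-only coverage discussed above (assuming the same WritePreparedTxnDB::PreparedHeap type and the push/top calls that BasicsTest already exercises; the test name and values here are made up):

TEST(PreparedHeap, OrderedPushSketch) {
  WritePreparedTxnDB::PreparedHeap heap;
  // AddPrepared is driven by the main write queue, so pushes arrive in
  // increasing order; no out-of-order push case needs to be exercised.
  heap.push(14l);
  heap.push(34l);
  heap.push(64l);
  ASSERT_EQ(14l, heap.top());  // the smallest prepare seq stays on top
}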

{"AddPrepared::begin:pause", "AddPreparedBeforeMax::read_thread:start"},
{"AdvanceMaxEvictedSeq::update_max:pause", "AddPrepared::begin:resume"},
{"AddPrepared::end", "AdvanceMaxEvictedSeq::update_max:resume"},
{"AddPreparedCallback::AddPrepared::begin:pause", "AddPreparedBeforeMax::read_thread:start"},
Contributor Author:
Since we now acquire the mutex outside AddPrepared, the synchronization points were moved from within ::AddPrepared to outside it (into AddPreparedCallback).

@@ -263,16 +263,19 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
size_t total_count = 0;
size_t valid_batches = 0;
size_t total_byte_size = 0;
size_t pre_release_callback_cnt = 0;
Contributor Author:
The changes to db_impl_write.cc are separately approved in #5381

uint64_t log_number, size_t index,
size_t total) override {
assert(index < total);
// To reduce lock intention with the conccurrent prepare requests, lock on
Contributor:
lock contention with the concurrent

auto to_be_popped = prepared_txns_.top();
delayed_prepared_.insert(to_be_popped);
ROCKS_LOG_WARN(info_log_,
"prepared_mutex_ overhead %" PRIu64 " (prep=%" PRIu64
" new_max=%" PRIu64,
static_cast<uint64_t>(delayed_prepared_.size()),
to_be_popped, new_max);
prepared_txns_.pop();
prepared_txns_.pop(true /*locked*/);
Contributor:
Is there a potential race condition between the prepared_txns_.top() <= new_max check and the pop here (i.e., is it possible to store into delayed_prepared_empty_ something greater than the new max)?

Contributor Author:
delayed_prepared_empty_ is a bool, so it cannot be greater than a uint64_t. But if you mean prepared_txns_.top(), it is supposed to be updated after the pop; otherwise the loop would continue indefinitely.

Contributor:
Ah, I meant delayed_prepared_.insert(to_be_popped);

Contributor Author:
Oh, yes. It is possible. Will submit a fix soon.

@lth self-requested a review June 7, 2019 21:53

@lth (Contributor) commented Jun 7, 2019:
I didn't mean to accept actually.

@lth (Contributor) left a comment:
Looks good to me.

to_be_popped, new_max);
prepared_txns_.pop();
delayed_prepared_empty_.store(false, std::memory_order_release);
// Need to fetch feresh values of ::top after mutex is acquired
Contributor:
"fresh" values

uint64_t log_number, size_t index,
size_t total) override {
assert(index < total);
// To reduce lock intention with the concurrent prepare requests, lock on
Contributor:
Lock contention?

Also, I don't know if I'd call it lock contention because it looks like you actually increase lock contention by holding the lock for longer periods of time (whereas before, it was shorter, but more frequent). I think you're just saving CPU?

Contributor Author:
Perhaps I can say "reduce lock acquisition cost".
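For readers skimming the thread, a small standalone sketch of the pattern being debated (names and structure are illustrative, not the PR's actual AddPreparedCallback): the first callback of a write group takes the mutex and the last one releases it, so the lock is held longer but is acquired once per group instead of once per sub-batch.

#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical, simplified illustration of "lock on the first callback,
// release on the last". The (index, total) pair mirrors the callback
// arguments shown in the diff above; it assumes all callbacks of a group
// run on the same thread, so the same thread that locks also unlocks.
struct GroupScopedLockSketch {
  std::mutex push_pop_mutex_;

  void Callback(size_t index, size_t total) {
    assert(index < total);
    if (index == 0) {
      push_pop_mutex_.lock();  // first sub-batch of the group takes the lock
    }
    // ... AddPrepared-style work would run here with the mutex already held ...
    if (index + 1 == total) {
      push_pop_mutex_.unlock();  // last sub-batch releases it
    }
  }
};

Whether that is best described as less contention or simply fewer lock acquisitions is exactly the wording question raised above.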

@facebook-github-bot left a comment:
@maysamyabandeh is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot:
@maysamyabandeh merged this pull request in c292dc8.

vagogte pushed a commit to vagogte/rocksdb that referenced this pull request Jun 18, 2019
Summary:
The patch reduces the contention over prepared_mutex_ using these techniques:
1) Move ::RemovePrepared() to be called from the commit callback when we have two write queues.
2) Use two separate mutexes for PreparedHeap: prepared_mutex_, needed for ::RemovePrepared, and ::push_pop_mutex(), needed for ::AddPrepared(). Given that ::AddPrepared is called only from the first write queue and ::RemovePrepared mostly from the second, the two write queues no longer compete over a single mutex. ::RemovePrepared might occasionally need to acquire ::push_pop_mutex() if ::erase() ends up calling ::pop().
3) Acquire ::push_pop_mutex() on the first callback of the write queue and release it on the last.
Pull Request resolved: facebook#5420

Differential Revision: D15741985

Pulled By: maysamyabandeh

fbshipit-source-id: 84ce8016007e88bb6e10da5760ba1f0d26347735