WritePrepared: reduce prepared_mutex_ overhead #5420
Conversation
053afe0 to b5c8fed
@@ -55,25 +55,17 @@ TEST(PreparedHeap, BasicsTest) {
  heap.push(34l);
  // Test that old min is still on top
  ASSERT_EQ(14l, heap.top());
  heap.push(13l);
With ::AddPrepared being called from the main write queue, the push is now called in order. So we should remove tests with unordered calls to push. This will later help to simplify the implementation of PreparedHeap as well.
Is this only true in the case of two write queues though? If so, maybe we should just decide to only support two write queues.
No, it is true regardless. I did not mention one write queue because it has always been the case for one write queue to insert into PreparedHeap in order.
But I am in favor of deprecating one-write-queue case, which would make the implementation simpler. We should keep that direction in mind.
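The simplification alluded to above can be sketched as follows: if pushes always arrive in increasing order, the heap degenerates into a FIFO. A minimal sketch assuming ordered pushes (hypothetical class, not the actual RocksDB PreparedHeap):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical sketch: once push() is only ever called with monotonically
// increasing sequence numbers, the "heap" is just a FIFO deque and top()
// is simply the front element -- no heap property to maintain.
class OrderedPreparedHeap {
 public:
  void push(uint64_t seq) {
    // Ordered-push invariant assumed in this sketch.
    assert(seq_.empty() || seq >= seq_.back());
    seq_.push_back(seq);
  }
  void pop() { seq_.pop_front(); }
  uint64_t top() const { return seq_.front(); }
  bool empty() const { return seq_.empty(); }

 private:
  std::deque<uint64_t> seq_;
};
```

Under this assumption, tests that push out of order (like `heap.push(13l)` after `heap.push(34l)` above) no longer describe valid usage, which is why the thread suggests removing them.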
{"AddPrepared::begin:pause", "AddPreparedBeforeMax::read_thread:start"},
{"AdvanceMaxEvictedSeq::update_max:pause", "AddPrepared::begin:resume"},
{"AddPrepared::end", "AdvanceMaxEvictedSeq::update_max:resume"},
{"AddPreparedCallback::AddPrepared::begin:pause", "AddPreparedBeforeMax::read_thread:start"},
Since we now acquire the mutex outside ::AddPrepared, the synchronization points were moved from within ::AddPrepared to outside it (into AddPreparedCallback).
@@ -263,16 +263,19 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
  size_t total_count = 0;
  size_t valid_batches = 0;
  size_t total_byte_size = 0;
  size_t pre_release_callback_cnt = 0;
The changes to db_impl_write.cc are separately approved in #5381
uint64_t log_number, size_t index,
size_t total) override {
  assert(index < total);
  // To reduce lock intention with the conccurrent prepare requests, lock on
lock contention with the concurrent
auto to_be_popped = prepared_txns_.top();
delayed_prepared_.insert(to_be_popped);
ROCKS_LOG_WARN(info_log_,
               "prepared_mutex_ overhead %" PRIu64 " (prep=%" PRIu64
               " new_max=%" PRIu64,
               static_cast<uint64_t>(delayed_prepared_.size()),
               to_be_popped, new_max);
prepared_txns_.pop();
prepared_txns_.pop(true /*locked*/);
Is there a potential race condition here between the prepared_txns_.top() <= new_max check and the pop here (i.e., is it possible to store into delayed_prepared_empty_ something greater than the new max)?
delayed_prepared_empty_ is a bool, so it cannot be greater than a uint64_t. But if you mean prepared_txns_.top(), it is supposed to be updated after the pop; otherwise the loop would continue indefinitely.
Ah, I meant delayed_prepared_.insert(to_be_popped);
Oh, yes. It is possible. Will submit a fix soon.
I didn't mean to accept actually.
Looks good to me.
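The remedy the thread converges on is the usual fix for a check-then-act race: evaluate the top() <= new_max comparison only while the mutex is held, so nothing greater than new_max can be moved into delayed_prepared_. A minimal sketch of that pattern (hypothetical names loosely mirroring the PR, not the actual RocksDB code):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <set>
#include <vector>

// Hypothetical sketch of the safe pattern: top() is re-read on every loop
// iteration while prepared_mutex_ is held, so the comparison and the
// insert/pop form one atomic step with respect to concurrent pushes.
struct PreparedTracker {
  std::mutex prepared_mutex_;
  std::priority_queue<uint64_t, std::vector<uint64_t>,
                      std::greater<uint64_t>>
      prepared_txns_;  // min-heap: top() is the smallest prepared seq
  std::set<uint64_t> delayed_prepared_;

  void AdvanceMaxEvictedSeq(uint64_t new_max) {
    std::lock_guard<std::mutex> lock(prepared_mutex_);
    // The check and the move happen under the same lock; an entry greater
    // than new_max can never land in delayed_prepared_.
    while (!prepared_txns_.empty() && prepared_txns_.top() <= new_max) {
      delayed_prepared_.insert(prepared_txns_.top());
      prepared_txns_.pop();
    }
  }
};
```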
               to_be_popped, new_max);
prepared_txns_.pop();
delayed_prepared_empty_.store(false, std::memory_order_release);
// Need to fetch feresh values of ::top after mutex is acquired
"fresh" values
uint64_t log_number, size_t index,
size_t total) override {
  assert(index < total);
  // To reduce lock intention with the concurrent prepare requests, lock on
Lock contention?
Also, I don't know if I'd call it lock contention because it looks like you actually increase lock contention by holding the lock for longer periods of time (whereas before, it was shorter, but more frequent). I think you're just saving CPU?
Perhaps I can say "reduce lock acquisition cost".
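The "acquisition cost" being saved here can be sketched as a callback that locks on the first entry of a write group and unlocks on the last, replacing per-entry lock/unlock pairs with one pair per group. A hypothetical simplification (the real AddPreparedCallback in the PR carries more state):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical sketch: the callback is invoked once per batch in the write
// group, in order. The mutex is taken on the first invocation (index == 0)
// and released on the last (index + 1 == total), so N batches pay for one
// lock/unlock pair instead of N.
class AddPreparedCallbackSketch {
 public:
  AddPreparedCallbackSketch(std::mutex* mu, std::vector<uint64_t>* heap)
      : mu_(mu), heap_(heap) {}

  void Callback(uint64_t seq, size_t index, size_t total) {
    assert(index < total);
    if (index == 0) mu_->lock();            // first entry of the group
    heap_->push_back(seq);                  // AddPrepared under the lock
    if (index + 1 == total) mu_->unlock();  // last entry of the group
  }

 private:
  std::mutex* mu_;
  std::vector<uint64_t>* heap_;
};
```

As the comment above notes, this holds the lock for longer stretches; the win is fewer acquisitions (less CPU spent on lock/unlock), not less contention per se.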
@maysamyabandeh is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@maysamyabandeh merged this pull request in c292dc8.
Summary: The patch reduces the contention over prepared_mutex_ using these techniques:
1) Move ::RemovePrepared() to be called from the commit callback when we have two write queues.
2) Use two separate mutexes for PreparedHeap: prepared_mutex_, needed for ::RemovePrepared, and ::push_pop_mutex(), needed for ::AddPrepared(). Given that we call ::AddPrepared only from the first write queue and ::RemovePrepared mostly from the second, this results in the two write queues not competing with each other over a single mutex. ::RemovePrepared might occasionally need to acquire ::push_pop_mutex() if ::erase() ends up calling ::pop().
3) Acquire ::push_pop_mutex() on the first callback of the write queue and release it on the last.
Pull Request resolved: facebook#5420
Differential Revision: D15741985
Pulled By: maysamyabandeh
fbshipit-source-id: 84ce8016007e88bb6e10da5760ba1f0d26347735
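The two-mutex split described in the summary can be sketched roughly as follows (a hypothetical simplification, not the actual implementation; it also exploits the ordered-push property so a deque stands in for the heap):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <set>

// Hypothetical sketch of the two-mutex idea: push()/top() from the first
// write queue take only push_pop_mutex_, while erase() from the second
// queue takes prepared_mutex_, so the common paths of the two queues do
// not contend on the same mutex. erase() additionally needs
// push_pop_mutex_ only when compacting erased entries off the front.
class TwoMutexPreparedHeap {
 public:
  void push(uint64_t seq) {  // first write queue (AddPrepared)
    std::lock_guard<std::mutex> lk(push_pop_mutex_);
    heap_.push_back(seq);  // pushes arrive in order, so a deque suffices
  }
  void erase(uint64_t seq) {  // second write queue (RemovePrepared)
    std::lock_guard<std::mutex> lk(prepared_mutex_);
    erased_.insert(seq);
    Compact();
  }
  uint64_t top() {
    std::lock_guard<std::mutex> lk(push_pop_mutex_);
    return heap_.empty() ? 0 : heap_.front();  // 0 = "empty" in this sketch
  }

 private:
  void Compact() {
    // The uncommon path: pop entries that were erased at the front.
    // Lock order is always prepared_mutex_ then push_pop_mutex_.
    std::lock_guard<std::mutex> lk(push_pop_mutex_);
    while (!heap_.empty() && erased_.count(heap_.front())) {
      erased_.erase(heap_.front());
      heap_.pop_front();
    }
  }
  std::mutex push_pop_mutex_;
  std::mutex prepared_mutex_;
  std::deque<uint64_t> heap_;
  std::set<uint64_t> erased_;
};
```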