-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify write thread logic #2134
Conversation
@al13n321 adding you to have one more eye on the logic, since you initially add the logic around log sync synchronization. |
@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@@ -170,15 +173,13 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options, | |||
} | |||
} | |||
|
|||
uint64_t last_sequence = versions_->LastSequence(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of moving up the definition of last_sequence for 50 lines? Is not it better to delay the definition until the variable is actually used? In this way a code reader has less logic to keep in mind when reading the code.
I know in the classic programming it was suggested to put all the variable definitions together but in such a long function i do not see much benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving simply because I need last_sequence
when I call versions_->SetLastSequence
at the end.
@@ -213,26 +214,18 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options, | |||
PERF_TIMER_GUARD(write_memtable_time); | |||
|
|||
if (!parallel) { | |||
status = WriteBatchInternal::InsertInto( | |||
w.status = WriteBatchInternal::InsertInto( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for this change?
// | ||
// Is setting bg_error_ enough here? This will at least stop | ||
// compaction and fail any further writes. | ||
if (!status.ok() && !w.CallbackFailed()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check for CallbackFailed is not done when calling UpdateBackgroundError. Can you explain why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if callback fail we should never try insert into memtable. I call UpdateBackgroundError
only for memtable insert error. Any suggestion to renaming/change of logic?
} | ||
} | ||
} | ||
PERF_TIMER_START(write_pre_and_post_process_time); | ||
|
||
// | ||
// Is setting bg_error_ enough here? This will at least stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is move from another location where we set bg_error_ independent of paranoid_check status. Can you explain more what it means and why it made sense to move it down here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should move it to insideUpdateBackgroundError
.
// CompleteParallelWorker returns true if this thread should | ||
// handle exit, false means somebody else did | ||
exit_completed_early = !write_thread_.CompleteParallelWorker(&w); | ||
status = w.FinalStatus(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we always set status whether or not its current value is ok. However later in Line 277 we set it only if status.ok(). This two behavior does not seem equivalent to me.
write_thread_.ExitAsBatchGroupLeader(&w, last_writer, w.status); | ||
} | ||
|
||
if (status.ok()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refer to my comment at Line 253 where status is updated independent of its current value.
@@ -326,7 +319,7 @@ Status DBImpl::PreprocessWrite(const WriteOptions& write_options, | |||
} | |||
|
|||
if (UNLIKELY(status.ok() && !bg_error_.ok())) { | |||
status = bg_error_; | |||
return bg_error_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to also check the return value of PreprocessWrite and if it is not OK return immediately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can. But then there will be one more place we need to have paranoid check.
} | ||
} | ||
} | ||
PERF_TIMER_START(write_pre_and_post_process_time); | ||
|
||
// | ||
// Is setting bg_error_ enough here? This will at least stop | ||
// compaction and fail any further writes. | ||
if (immutable_db_options_.paranoid_checks && !status.ok() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check was previously done by leader AFTER writing to memtable since we would know that leader is the last one finishing when there is an error. In the refactored version i) we do it BEFORE writing to memtable, ii) only the leader does it if it is not the last one finishing. Can you explain more how these two are equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status
previously include memtable insert error, with the patch now it represent any error except memtable insert error. Memtable insert error is merged intostatus
right before return fromWriteImpl
.- Paranoid check don't include memtable insert error, same as previous.
- leader now execute paranoid check in parallel working phase. It is executed no matter whether it is last one finishing.
I agree this introduce new confusion after the patch. Any suggestion to make the logic clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (assuming that the tests pass)
Summary: The concept about early exit in write thread implementation is a confusing one. It means that if early exit is allowed, batch group leader will not responsible to exit the batch group, but the last finished writer do. In case we need to mark log synced, or encounter memtable insert error, early exit is disallowed. This patch remove such a concept by: * In all cases, the last finished writer (not necessary leader) is responsible to exit batch group. * In case of parallel memtable write, leader will also mark log synced after memtable insert and before signal finish (call `CompleteParallelWorker()`). The purpose is to allow mark log synced (which require locking mutex) can run in parallel to memtable insert in other writers. * The last finish writer should handle memtable insert error (update bg_error_) before exiting batch group. Closes #2134 Differential Revision: D4869667 Pulled By: yiwu-arbug fbshipit-source-id: aec170847c85b90f4179d6a4608a4fe1361544e3
Summary:
The concept about early exit in write thread implementation is a confusing one. It means that if early exit is allowed, batch group leader will not responsible to exit the batch group, but the last finished writer do. In case we need to mark log synced, or encounter memtable insert error, early exit is disallowed.
This patch remove such a concept by:
CompleteParallelWorker()
). The purpose is to allow mark log synced (which require locking mutex) can run in parallel to memtable insert in other writers.Test Plan:
existing tests.