New WriteImpl to pipeline WAL/memtable write #2286
Conversation
db/write_thread.cc (outdated)
}
Writer w;
if (!LinkOne(&w, newest_memtable_writer_)) {
  AwaitState(&w, STATE_MEMTABLE_WRITER_LEADER, &ctx);
How do you make sure the current thread is never group-committed into another writer's group when it is not the leader?
In EnterAsMemTableWriter, a leader will never pick a follower with batch == nullptr.
It's nice. This is my last comment, and I look forward to more unit tests.
@siying I fixed two tests that were failing with pipelined write enabled (write_callback_test and DBTest.FlushSchedule), and addressed the comment about dummy writers and the one about WriteGroup::ToVector(). As for unit tests, MultiThreadedDBTest was very good at catching any error I made during development, so I think they are sufficient. Please take a look again when you have time.
Summary: PipelineWriteImpl is an alternative approach to WriteImpl. In WriteImpl, only one thread is allowed to write at a time; this thread performs both the WAL and memtable writes for all write threads in the write group, and pending writers wait in a queue until the current writer finishes. In the pipelined write approach, two queues are maintained: a WAL writer queue and a memtable writer queue. All writers (regardless of whether they need to write the WAL) first join the WAL writer queue; after the housekeeping work and WAL writing, they join the memtable writer queue if needed. The benefits of this approach are:
1. Writers without memtable writes (e.g. the prepare phase of two-phase commit) can exit the write thread once the WAL write is finished; they don't need to wait for memtable writes in case of group commit.
2. Pending writers only need to wait for the previous WAL writer to finish before joining the write thread, instead of also waiting for previous memtable writes.
Merging #2056 and #2058 into this PR.
Closes #2286
Differential Revision: D5054606
Pulled By: yiwu-arbug
fbshipit-source-id: ee5b11efd19d3e39d6b7210937b11cefdd4d1c8d
Also, I think we should add this as a part of the daily stress test.
last_batch_group_size_ =
    write_thread_.EnterAsBatchGroupLeader(&w, &wal_write_group);
const SequenceNumber current_sequence =
    write_thread_.UpdateLastSequence(versions_->LastSequence()) + 1;
I forgot to check something.
versions_->LastSequence() is only updated after memtable insertion is done.
So if we have a sequence of: WAL Write 1, WAL Write 2, Memtable Write 1, will WAL Write 1 and 2 get the same sequence number?
The only use of the logic here is to update the write thread's internal last sequence after DB open. After that, the internal last sequence will always be no less than versions_->LastSequence().
This is something I forgot to clean up. I might rather get rid of the internal last sequence and put it inside versions_. Will update with a follow-up patch.
Sure. Adding it to the stress test is a good idea.
assert(checking_set_.count(cfd) == 0);
checking_set_.insert(cfd);
}
std::lock_guard<std::mutex> lock(checking_mutex_);
I wonder why this lock was brought outside the block? The result is that in debug mode the entire ::ScheduleFlush is protected by a lock, while in release mode it is not.
I don't recall the exact reason, but from the patch it seems to prevent a race condition with TakeNextColumnFamily or Empty. The lock should not be used in release mode.
It seems safe to reduce the scope, though.
Thanks @yiwu-arbug for chiming in. We should use the same concurrency mechanism in debug mode as in release mode, so that if there is a concurrency issue it will show up in our tsan tests.
The lock here is to guard checking_set_, which is used in debug mode to verify correctness. Otherwise FlushScheduler is lock-free.
Of course, it would be great if we could remove the checking_set_ entirely and come up with another way of verification without interfering with the concurrency mechanism.
I am confused. Why can't we simply reduce the scope of the lock back to checking_set_? Why does the entire function have to be protected by the lock?
I left too many comments and probably confused you. I mean, reducing the scope seems right :)
@yiwu-arbug do you recall why the scope of this lock was extended to the entire function? (Line 16 in aa56b7e)
Summary: FlushScheduler's methods are instrumented with debug-time locks to check the scheduler state against a simple container definition. Since #2286 the scope of these locks was widened to the entire methods' bodies. The result is that the concurrency tested in debug mode is stricter than the concurrency level manifested at runtime in release mode. This patch reverts that change to reduce the scope of the locks.
Pull Request resolved: #5372
Differential Revision: D15545831
Pulled By: maysamyabandeh
fbshipit-source-id: 01d69191afb1dd807d4bdc990fc74813ae7b5426
Test Plan:
Set db_options.enable_pipelined_write=(true|false) and run all tests.