Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New WriteImpl to pipeline WAL/memtable write #2286

Closed
wants to merge 19 commits into from

Conversation

yiwu-arbug
Copy link
Contributor

@yiwu-arbug yiwu-arbug commented May 12, 2017

Summary:
PipelineWriteImpl is an alternative approach to WriteImpl. In WriteImpl, only one thread is allow to write at the same time. This thread will do both WAL and memtable writes for all write threads in the write group. Pending writers wait in queue until the current writer finishes. In the pipeline write approach, two queue is maintained: one WAL writer queue and one memtable writer queue. All writers (regardless of whether they need to write WAL) will still need to first join the WAL writer queue, and after the house keeping work and WAL writing, they will need to join memtable writer queue if needed. The benefit of this approach is that

  1. Writers without memtable writes (e.g. the prepare phase of two phase commit) can exit write thread once WAL write is finish. They don't need to wait for memtable writes in case of group commit.
  2. Pending writers only need to wait for previous WAL writer finish to be able to join the write thread, instead of wait also for previous memtable writes.

Merging #2056 and #2058 into this PR.

Test Plan:
Set db_options.enable_pipelined_write=(true|false) and run all tests.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@yiwu-arbug yiwu-arbug changed the title Add pipeline write option (pipeline write part 2) New WriteImpl to pipeline WAL/memtable write May 12, 2017
@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

}
Writer w;
if (!LinkOne(&w, newest_memtable_writer_)) {
AwaitState(&w, STATE_MEMTABLE_WRITER_LEADER, &ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you make sure the current thread is never group committed into other groups and it is not the leader?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In EnterAsMemTableWriter, a leader will never pick a follower with batch == nullptr.

@siying
Copy link
Contributor

siying commented May 12, 2017

It's nice. This is my last comment, and I look forward to more unit tests.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@yiwu-arbug
Copy link
Contributor Author

@siying I fix two tests that was failing with pipelined write enabled (write_callback_test and DBTest.FlushSchedule), and address the comment about dummy writers and the one with WriteGroup::ToVector(). For unit tests it appears MultiThreadedDBTest was very good at catching any error that I have during my development, so I think they are sufficient. Please take a look again when you have time.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug updated the pull request - view changes - changes since last import

@facebook-github-bot
Copy link
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

yiwu-arbug pushed a commit that referenced this pull request May 19, 2017
Summary:
PipelineWriteImpl is an alternative approach to WriteImpl. In WriteImpl, only one thread is allow to write at the same time. This thread will do both WAL and memtable writes for all write threads in the write group. Pending writers wait in queue until the current writer finishes. In the pipeline write approach, two queue is maintained: one WAL writer queue and one memtable writer queue. All writers (regardless of whether they need to write WAL) will still need to first join the WAL writer queue, and after the house keeping work and WAL writing, they will need to join memtable writer queue if needed. The benefit of this approach is that
1. Writers without memtable writes (e.g. the prepare phase of two phase commit) can exit write thread once WAL write is finish. They don't need to wait for memtable writes in case of group commit.
2. Pending writers only need to wait for previous WAL writer finish to be able to join the write thread, instead of wait also for previous memtable writes.

Merging #2056 and #2058 into this PR.
Closes #2286

Differential Revision: D5054606

Pulled By: yiwu-arbug

fbshipit-source-id: ee5b11efd19d3e39d6b7210937b11cefdd4d1c8d
Copy link
Contributor

@siying siying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think we should add this as a part of daily stress test.

last_batch_group_size_ =
write_thread_.EnterAsBatchGroupLeader(&w, &wal_write_group);
const SequenceNumber current_sequence =
write_thread_.UpdateLastSequence(versions_->LastSequence()) + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to check something.

versions_->LastSequence() is only updated after memtable insertion is done.

So that if we have a sequence of:

WAL Write 1, WAL Write 2, Memtable Write 1, will WAL write 1 and 2 get the same sequence number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only use of the logic here is to update write thread's internal last sequence after db open. After that the internal last sequence will always be no less than versions_->LastSequence() This is something I forget to cleanup. I might rather get rid of the internal last seuqence, put it inside versions_. Will update with a followup patch.

@yiwu-arbug yiwu-arbug deleted the pipeline_option branch May 20, 2017 02:55
@yiwu-arbug
Copy link
Contributor Author

Sure. Adding it to stress test is a good idea.

yiwu-arbug pushed a commit that referenced this pull request May 30, 2017
Summary:
PipelineWriteImpl is an alternative approach to WriteImpl. In WriteImpl, only one thread is allow to write at the same time. This thread will do both WAL and memtable writes for all write threads in the write group. Pending writers wait in queue until the current writer finishes. In the pipeline write approach, two queue is maintained: one WAL writer queue and one memtable writer queue. All writers (regardless of whether they need to write WAL) will still need to first join the WAL writer queue, and after the house keeping work and WAL writing, they will need to join memtable writer queue if needed. The benefit of this approach is that
1. Writers without memtable writes (e.g. the prepare phase of two phase commit) can exit write thread once WAL write is finish. They don't need to wait for memtable writes in case of group commit.
2. Pending writers only need to wait for previous WAL writer finish to be able to join the write thread, instead of wait also for previous memtable writes.

Merging #2056 and #2058 into this PR.
Closes #2286

Differential Revision: D5054606

Pulled By: yiwu-arbug

fbshipit-source-id: ee5b11efd19d3e39d6b7210937b11cefdd4d1c8d
assert(checking_set_.count(cfd) == 0);
checking_set_.insert(cfd);
}
std::lock_guard<std::mutex> lock(checking_mutex_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this lock is brought to outside the block? The result would be that in debug mode the entire :: ScheduleFlush is protected by a lock and in release mode, it is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall the exact reason, but from the patch it seems to prevent race condition with TakeNextColumnFamily or Empty. The lock should not be used in release mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems safe to reduce the scope, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yiwu-arbug for chiming in. We should use the same concurrency mechanism in debug mode as in the release mode so if there is a concurrency issue it would show up in our tsan tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock here is to guard the checking_set_ which is used in debug mode to verify correctness. Otherwise FlushScheduler is lock free.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of cause it would be great if we can remove the checking_set_ entirely and come up with another way for verification without interfering the concurrency mechanism.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused. Why cannot we simply reduce the scope of the lock back to checking_set_. Why the entire function has to be protected by the lock?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left too many comments and probably confuse you. I mean, reduce the scope seems right :)

@maysamyabandeh
Copy link
Contributor

maysamyabandeh commented Apr 24, 2019

@yiwu-arbug do you recall why the scope of this lock is extended to entire function?

std::lock_guard<std::mutex> lock(checking_mutex_);

facebook-github-bot pushed a commit that referenced this pull request Jun 10, 2019
Summary:
FlushScheduler's methods are instrumented with debug-time locks to check the scheduler state against a simple container definition. Since #2286 the scope of such locks are widened to the entire methods' body. The result is that the concurrency tested during testing (in debug mode) is stricter than the concurrency level manifested at runtime (in release mode).
The patch reverts this change to reduce the scope of such locks.
Pull Request resolved: #5372

Differential Revision: D15545831

Pulled By: maysamyabandeh

fbshipit-source-id: 01d69191afb1dd807d4bdc990fc74813ae7b5426
vagogte pushed a commit to vagogte/rocksdb that referenced this pull request Jun 18, 2019
Summary:
FlushScheduler's methods are instrumented with debug-time locks to check the scheduler state against a simple container definition. Since facebook#2286 the scope of such locks are widened to the entire methods' body. The result is that the concurrency tested during testing (in debug mode) is stricter than the concurrency level manifested at runtime (in release mode).
The patch reverts this change to reduce the scope of such locks.
Pull Request resolved: facebook#5372

Differential Revision: D15545831

Pulled By: maysamyabandeh

fbshipit-source-id: 01d69191afb1dd807d4bdc990fc74813ae7b5426
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants