New WriteImpl to pipeline WAL/memtable write #2286
Conversation
db/write_thread.cc (outdated)
}
Writer w;
if (!LinkOne(&w, newest_memtable_writer_)) {
  AwaitState(&w, STATE_MEMTABLE_WRITER_LEADER, &ctx);
How do you make sure the current thread is never group-committed into another writer's group when it is not the leader?
In EnterAsMemTableWriter, a leader will never pick a follower with batch == nullptr.
It's nice. This is my last comment, and I look forward to more unit tests.
@siying I fixed two tests that were failing with pipelined write enabled (write_callback_test and DBTest.FlushSchedule), and addressed the comment about dummy writers and the one about WriteGroup::ToVector(). As for unit tests, MultiThreadedDBTest was very good at catching any error I made during development, so I think they are sufficient. Please take a look again when you have time.
Summary: PipelineWriteImpl is an alternative approach to WriteImpl. In WriteImpl, only one thread is allowed to write at a time; this thread performs both the WAL and memtable writes for all write threads in the write group, and pending writers wait in a queue until the current writer finishes. In the pipelined write approach, two queues are maintained: a WAL writer queue and a memtable writer queue. All writers (regardless of whether they need to write the WAL) first join the WAL writer queue; after the housekeeping work and WAL writing, they join the memtable writer queue if needed. The benefits of this approach are:
1. Writers without memtable writes (e.g. the prepare phase of two-phase commit) can exit the write thread once the WAL write is finished; they don't need to wait for memtable writes in case of group commit.
2. Pending writers only need to wait for the previous WAL writer to finish before joining the write thread, instead of also waiting for previous memtable writes.
Merging #2056 and #2058 into this PR.
Closes #2286
Differential Revision: D5054606
Pulled By: yiwu-arbug
fbshipit-source-id: ee5b11efd19d3e39d6b7210937b11cefdd4d1c8d
Also, I think we should add this as a part of the daily stress test.
last_batch_group_size_ =
    write_thread_.EnterAsBatchGroupLeader(&w, &wal_write_group);
const SequenceNumber current_sequence =
    write_thread_.UpdateLastSequence(versions_->LastSequence()) + 1;
I forgot to check something.
versions_->LastSequence() is only updated after memtable insertion is done.
So if we have a sequence of: WAL Write 1, WAL Write 2, Memtable Write 1, will WAL Write 1 and 2 get the same sequence number?
The only use of the logic here is to update the write thread's internal last sequence after DB open. After that, the internal last sequence will always be no less than versions_->LastSequence().
This is something I forgot to clean up. I might rather get rid of the internal last sequence and put it inside versions_. Will update with a follow-up patch.
Sure. Adding it to the stress test is a good idea.
assert(checking_set_.count(cfd) == 0);
checking_set_.insert(cfd);
}
std::lock_guard<std::mutex> lock(checking_mutex_);
I wonder why this lock was brought outside the block? The result is that in debug mode the entire ::ScheduleFlush is protected by a lock, while in release mode it is not.
I don't recall the exact reason, but from the patch it seems to prevent a race condition with TakeNextColumnFamily or Empty. The lock should not be used in release mode.
It seems safe to reduce the scope, though.
Thanks @yiwu-arbug for chiming in. We should use the same concurrency mechanism in debug mode as in release mode, so that if there is a concurrency issue it will show up in our tsan tests.
The lock here is to guard checking_set_, which is used in debug mode to verify correctness. Otherwise FlushScheduler is lock-free.
Of course, it would be great if we could remove the checking_set_ entirely and come up with another way of verification without interfering with the concurrency mechanism.
I am confused. Why can't we simply reduce the scope of the lock back to checking_set_? Why does the entire function have to be protected by the lock?
I left too many comments and probably confused you. I mean, reducing the scope seems right :)
@yiwu-arbug do you recall why the scope of this lock was extended to the entire function? (Line 16 in aa56b7e)
Summary: FlushScheduler's methods are instrumented with debug-time locks to check the scheduler state against a simple container definition. Since #2286 the scope of these locks was widened to the entire methods' bodies. The result is that the concurrency tested in debug mode is stricter than the concurrency level manifested at runtime in release mode. This patch reverts that change to reduce the scope of the locks.
Pull Request resolved: #5372
Differential Revision: D15545831
Pulled By: maysamyabandeh
fbshipit-source-id: 01d69191afb1dd807d4bdc990fc74813ae7b5426
Test Plan:
Set db_options.enable_pipelined_write=(true|false) and run all tests.