
Do not hold mutex when write keys if not necessary #7516

Closed
wants to merge 55 commits

Conversation

Little-Wallace
Contributor

@Little-Wallace commented on Oct 7, 2020

Problem Summary

RocksDB acquires the global mutex of the DB instance every time the user calls Write. When RocksDB schedules many compaction jobs, they compete with the write thread for the mutex, which hurts write performance.

Problem Solution

I want to use log_write_mutex to replace the global mutex in most cases, so that we do not acquire the global mutex in the write thread unless a write-stall or write-buffer-full event occurs.
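
To illustrate the intended split, here is a minimal sketch under assumptions (not the actual DBImpl code; only the name log_write_mutex comes from the description above, everything else is hypothetical):

```cpp
#include <mutex>

// Hypothetical sketch of the locking split: the common write path only
// touches a narrower log/write mutex, while the global DB mutex is taken
// only on the slow path (write stall, full write buffer).
class WritePathSketch {
 public:
  void Write(/* WriteBatch, WriteOptions, ... */) {
    if (NeedStallOrMemtableSwitch()) {
      // Slow path: coordination with flush/compaction still uses the
      // global DB mutex, as before.
      std::lock_guard<std::mutex> db_lock(db_mutex_);
      HandleWriteStallOrSwitchMemtable();
    }
    // Fast path: WAL and memtable bookkeeping is guarded by its own mutex,
    // so writers no longer compete with background compaction scheduling
    // that holds the global DB mutex.
    std::lock_guard<std::mutex> log_lock(log_write_mutex_);
    AppendToWalAndMemtable();
  }

 private:
  bool NeedStallOrMemtableSwitch() { return false; }  // placeholder check
  void HandleWriteStallOrSwitchMemtable() {}           // placeholder
  void AppendToWalAndMemtable() {}                     // placeholder

  std::mutex db_mutex_;         // the global DB mutex (hot before this PR)
  std::mutex log_write_mutex_;  // narrower mutex for the WAL write path
};
```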

Test plan

  1. make check
  2. CI
  3. COMPILE_WITH_TSAN=1 make db_stress
    make crash_test
    make crash_test_with_multiops_wp_txn
    make crash_test_with_multiops_wc_txn
    make crash_test_with_atomic_flush

@Little-Wallace changed the title from "Do not hold mutex when write keys." to "Do not hold mutex when write keys if not necessary" on Oct 7, 2020
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace
Contributor Author

@yiwu-arbug PTAL again

@Little-Wallace
Contributor Author

I have fixed the CI failures caused by a data race and a deadlock.

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@riversand963 self-requested a review on May 20, 2022 18:18
@riversand963
Contributor

Thanks for the PR. I plan to take a look this week.

@riversand963
Contributor

Thanks @Little-Wallace for the PR! Overall I think this is an improvement.
I will need to take another look. In the meantime, I have proposed #10078 to remove single_column_family_mode_, so that we don't have to make it lock-free.

facebook-github-bot pushed a commit that referenced this pull request May 31, 2022
Summary:
This variable is actually not being used for anything meaningful, thus remove it.

This can make #7516 slightly simpler by reducing the amount of state that must be made lock-free.

Pull Request resolved: #10078

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D36779817

Pulled By: riversand963

fbshipit-source-id: ffb0d9ad6149616917ae5e02bb28102cb90fc406
@riversand963
Contributor

I tried db_bench with simple benchmarks, e.g. fillrandom, overwrite, fillseq, and have not been able to see perf improvement in terms of ops/sec (kind of expected). I will try more tweaking later.

Based on my experience, you need sustained high write throughput (with a large WriteBatch) over a long period and a small target_file_size_base (such as 8MB). When the number of SST files grows to around one hundred thousand, you can observe the slowdown.

Thanks for the suggestion. Will try later.
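
For reference, a run along the lines of that suggestion might look like the following db_bench invocation (illustrative only; the flag values are assumptions, not taken from this thread):

TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=overwrite -duration=1800 -batch_size=100 -target_file_size_base=8388608 -perf_level=5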

@facebook-github-bot
Contributor

@Little-Wallace has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@Little-Wallace has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@riversand963
Contributor

The good thing is that I haven't observed a perf regression so far.

@riversand963 changed the title from "Do not hold mutex when write keys if not necessary" to "[1st attempt] Do not hold mutex when write keys if not necessary" on Jul 21, 2022
@riversand963 changed the title from "[1st attempt] Do not hold mutex when write keys if not necessary" to "Do not hold mutex when write keys if not necessary" on Jul 21, 2022
@tabokie
Contributor

tabokie commented Jul 21, 2022

@riversand963 One suggestion for the benchmark is to use a smaller target_file_size_base: in TiKV we use a compaction guard, which typically cuts SST files into 10MB chunks, much smaller than RocksDB's 64MB default. The higher file count is why we hit the mutex hard. Here's an issue from a production cluster with the same problem: tikv/tikv#12601.

Edit: Oops, apparently Little-Wallace already made this point 🤣 ignore me.

@riversand963
Contributor

riversand963 commented Jul 21, 2022

While showing an end-to-end performance gain requires more effort, it's easy to show that the time spent holding the db mutex has drastically decreased. One of the updated unit tests in this PR, perf_context_test, has an assertion.
Before this PR,

ASSERT_GT(total_db_mutex_nanos, 2000U);

After this PR,

ASSERT_LT(total_db_mutex_nanos, 100U);

I did another simple benchmark on a non-VM host.

TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillseq,overwrite -duration=60 -batch_size=100 -perf_level=5

Results show

db_mutex_lock_nanos = 564408021 (before)
db_mutex_lock_nanos = 11142 (after)
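
The same counter can also be read programmatically through the public perf-context API; here is a minimal sketch (the DB path and option values are placeholders):

```cpp
#include <cassert>
#include <iostream>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/perf_context.h"
#include "rocksdb/perf_level.h"

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/perf_ctx_demo", &db);
  assert(s.ok());

  // PerfLevel::kEnableTime (perf_level=5 in db_bench) is required to time
  // mutex acquisitions; lower levels skip mutex instrumentation.
  rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableTime);
  rocksdb::get_perf_context()->Reset();

  for (int i = 0; i < 1000; ++i) {
    s = db->Put(rocksdb::WriteOptions(), "key" + std::to_string(i), "value");
    assert(s.ok());
  }

  // Nanoseconds the write path spent acquiring the global DB mutex.
  std::cout << "db_mutex_lock_nanos = "
            << rocksdb::get_perf_context()->db_mutex_lock_nanos << std::endl;

  delete db;
  return 0;
}
```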

@facebook-github-bot
Contributor

@Little-Wallace has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@riversand963 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

tabokie pushed a commit to tabokie/rocksdb that referenced this pull request Jul 22, 2022
Summary:
This variable is actually not being used for anything meaningful, thus remove it.

This can make facebook#7516 slightly simpler by reducing the amount of state that must be made lock-free.

Pull Request resolved: facebook#10078

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D36779817

Pulled By: riversand963

fbshipit-source-id: ffb0d9ad6149616917ae5e02bb28102cb90fc406
Signed-off-by: tabokie <xy.tao@outlook.com>
tabokie pushed a commit to tabokie/rocksdb that referenced this pull request Jul 25, 2022
…#10187)

Summary:
Resolves facebook#10129

I extracted this fix from facebook#7516 since it's already a bug in the main branch, and we want to separate it from the main part of the PR.

There can be a race condition between two threads. Thread 1 executes
`DBImpl::FindObsoleteFiles()` while thread 2 executes `GetSortedWals()`.
```
Time   thread 1                                thread 2
  |  mutex_.lock
  |  read disable_delete_obsolete_files_
  |  ...
  |  wait on log_sync_cv and release mutex_
  |                                          mutex_.lock
  |                                          ++disable_delete_obsolete_files_
  |                                          mutex_.unlock
  |                                          mutex_.lock
  |                                          while (pending_purge_obsolete_files > 0) { bg_cv.wait;}
  |                                          wake up with mutex_ locked
  |                                          compute WALs tracked by MANIFEST
  |                                          mutex_.unlock
  |  wake up with mutex_ locked
  |  ++pending_purge_obsolete_files_
  |  mutex_.unlock
  |
  |  delete obsolete WAL
  |                                          WAL missing but tracked in MANIFEST.
  V
```

The proposed fix eliminates this possibility by incrementing `pending_purge_obsolete_files_` before `FindObsoleteFiles()` can release the mutex.

Pull Request resolved: facebook#10187

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D37214235

Pulled By: riversand963

fbshipit-source-id: 556ab1b58ae6d19150169dfac4db08195c797184
Signed-off-by: tabokie <xy.tao@outlook.com>
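
To illustrate the ordering fix described in the commit message above, here is a minimal, hypothetical sketch using plain std::mutex/std::condition_variable (not the actual RocksDB implementation):

```cpp
#include <condition_variable>
#include <mutex>

std::mutex mu;
std::condition_variable bg_cv;
std::condition_variable log_sync_cv;
bool log_syncing = false;              // protected by mu
int pending_purge_obsolete_files = 0;  // protected by mu

void FindObsoleteFilesSketch() {
  std::unique_lock<std::mutex> lk(mu);
  // Fix: announce the pending purge *before* any wait that can release mu,
  // so a concurrent reader that checks the counter under the same mutex
  // blocks instead of racing past.
  ++pending_purge_obsolete_files;
  // This wait releases mu; previously the increment happened only after it.
  log_sync_cv.wait(lk, [] { return !log_syncing; });
  // ... collect obsolete WALs under mu, unlock, delete the files ...
  --pending_purge_obsolete_files;
  bg_cv.notify_all();
}

void GetSortedWalsSketch() {
  std::unique_lock<std::mutex> lk(mu);
  // Now sees the purge in flight and waits for it to finish before
  // computing the WALs tracked by the MANIFEST.
  bg_cv.wait(lk, [] { return pending_purge_obsolete_files == 0; });
  // ... compute the list of WALs tracked by the MANIFEST ...
}
```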
tabokie pushed a commit to tabokie/rocksdb that referenced this pull request Jul 25, 2022
Summary:
RocksDB acquires the global mutex of the DB instance every time the user calls `Write`. When RocksDB schedules many compaction jobs, they compete with the write thread for the mutex, which hurts write performance.

I want to use log_write_mutex to replace the global mutex in most cases, so that we do not acquire it in the write thread unless a write-stall or write-buffer-full event occurs.

Pull Request resolved: facebook#7516

Test Plan:
1. make check
2. CI
3. COMPILE_WITH_TSAN=1 make db_stress
make crash_test
make crash_test_with_multiops_wp_txn
make crash_test_with_multiops_wc_txn
make crash_test_with_atomic_flush

Reviewed By: siying

Differential Revision: D36908702

Pulled By: riversand963

fbshipit-source-id: 59b13881f4f5c0a58fd3ca79128a396d9cd98efe
Signed-off-by: tabokie <xy.tao@outlook.com>
@mdcallag
Contributor

I see large improvements with fillseq (leveled, universal) and overwrite (universal) in IO-bound workloads -- up to 1.5X more throughput. Thank you for making RocksDB better.

https://twitter.com/MarkCallaghanDB/status/1574425353564475394
