Prevent a case of WriteBufferManager flush thrashing #6364

ajkr · 2020-02-03T20:34:59Z

Previously, the flushes triggered by WriteBufferManager could affect
the same CF repeatedly if that CF happened to receive consecutive writes.
Such flushes are not particularly useful for reducing memory usage since
they switch nearly-empty memtables to immutable when they have only just
begun filling their first arena block. In fact, they may not even reduce the
mutable memory count if they replace one mutable memtable containing
one arena block with a new mutable memtable containing one arena block.
Further, if such switches happen even a few times before a flush finishes,
the immutable memtable limit will be reached and writes will stall.
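
To make the failure mode above concrete, here is a minimal toy model of it. This is a sketch, not RocksDB code: `ToyColumnFamily`, `WritesUntilStall`, and the limit passed in are hypothetical names chosen for illustration, and the model assumes the manager stays over its limit and no flush completes during the loop.

```cpp
#include <cstddef>

// Toy model, not RocksDB code: every name here is hypothetical.
struct ToyColumnFamily {
  std::size_t mutable_arena_blocks = 1;  // a fresh memtable holds one arena block
  int immutable_memtables = 0;           // switched but not yet flushed
};

// Old behavior as described above: while the manager is over its limit,
// each write switches the written CF's memtable, however empty it is.
// Assumes no flush finishes in the meantime.
// Returns the number of writes accepted before the immutable limit is hit.
int WritesUntilStall(ToyColumnFamily& cf, int max_immutable_memtables) {
  int writes = 0;
  while (cf.immutable_memtables < max_immutable_memtables) {
    ++writes;
    // Switch: the current memtable becomes immutable and a new one,
    // again holding a single arena block, takes its place.
    ++cf.immutable_memtables;
    cf.mutable_arena_blocks = 1;
  }
  return writes;
}
```

In this model, a limit of three immutable memtables is reached after only three writes, and the mutable side still holds exactly one arena block, so the switches bought no reduction in mutable memory.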

This PR adds a heuristic to not switch memtables to immutable for CFs
that already have one or more immutable memtables awaiting flush. There
is a memory usage regression when all of the following hold: the user
continues writing to the same CF, that DB does not have any CFs eligible
for switching, flushes are not finishing, and the WriteBufferManager was
constructed with allow_stall=false. Before, memory would grow by switching
nearly empty memtables until writes stall. Now, it would grow by filling
memtables until writes stall. This feels like an acceptable behavior change
because users who prefer stalling over violating the memory limit should be
using allow_stall=true, which is unaffected by this PR.
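
The heuristic can be sketched with the same kind of toy model. Again, this is an illustration under stated assumptions rather than the code in db/db_impl/db_impl_write.cc: `ToyColumnFamily`, `EligibleForSwitch`, and `WriteWhileOverLimit` are hypothetical names, and the model assumes the manager is over its limit for every write and that no flush completes.

```cpp
#include <cstddef>

// Toy model, not RocksDB code: every name here is hypothetical.
struct ToyColumnFamily {
  std::size_t mutable_bytes = 0;  // bytes in the current mutable memtable
  int immutable_memtables = 0;    // switched but not yet flushed
};

// Heuristic described above: a CF is eligible for a switch only if it has
// no immutable memtable still awaiting flush.
bool EligibleForSwitch(const ToyColumnFamily& cf) {
  return cf.immutable_memtables == 0;
}

// Applies one write of `bytes` to `cf` while the manager is over its limit.
// Returns true if the write caused a memtable switch.
bool WriteWhileOverLimit(ToyColumnFamily& cf, std::size_t bytes) {
  bool switched = false;
  if (EligibleForSwitch(cf)) {
    // Hand the current memtable to flush and start a new one.
    ++cf.immutable_memtables;
    cf.mutable_bytes = 0;
    switched = true;
  }
  // Otherwise keep filling the existing mutable memtable.
  cf.mutable_bytes += bytes;
  return switched;
}
```

With five consecutive writes to one CF, only the first write switches. The remaining four fill the mutable memtable, which is the "grow by filling memtables" behavior described above.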

Test Plan:

Command:

rm -rf /dev/shm/dbbench/ && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num_multi_db=8 -num_column_families=2 -write_buffer_size=4194304 -db_write_buffer_size=16777216 -compression_type=none -statistics=true -target_file_size_base=4194304 -max_bytes_for_level_base=16777216

rocksdb.db.write.stall count before this PR: 175
rocksdb.db.write.stall count after this PR: 0

db/db_impl/db_impl_write.cc

facebook-github-bot

@Cheng-Chang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ajkr · 2020-02-28T21:20:14Z

@Cheng-Chang I talked to @siying and he prefers WriteBufferManager to be strict, while this PR would make it less strict. So I am not sure this change is desirable, although we didn't discuss this PR specifically.

ghost · 2020-02-28T23:11:10Z

@ajkr I thought it should be imported. In that case, I'll discard the diff in Phabricator.

facebook-github-bot · 2022-07-14T06:45:51Z

@ajkr has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-14T06:54:19Z

@ajkr has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-14T06:57:39Z

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-07-14T06:59:07Z

@ajkr has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-07-14T07:00:16Z

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jay-zhuang

LGTM. The change makes sense to me, and the user can still explicitly stall writes with allow_stall=true in WBM.

It is a behavior change, so the doc should be updated, for example:

rocksdb/include/rocksdb/options.h

Lines 906 to 908 in b283f04

    // the DBs. If the total size of all live memtables of all the DBs exceeds
    // a limit, a flush will be triggered in the next DB to which the next write
    // is issued.


facebook-github-bot · 2022-08-17T19:26:12Z

@ajkr has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-08-17T19:37:20Z

@ajkr has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-08-17T19:43:20Z

@ajkr has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-08-17T19:43:45Z

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-08-17T20:14:00Z

@ajkr has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-08-17T20:14:36Z

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2022-08-17T20:37:28Z

@ajkr has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-08-17T20:38:26Z

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot added the CLA Signed label Feb 3, 2020

ajkr requested review from siying and al13n321 February 3, 2020 22:03

ajkr commented Feb 4, 2020


ajkr force-pushed the fix-write-buffer-manager-flush-thrashing branch from 8d0fcbd to 4394b14 Compare February 4, 2020 18:55

facebook-github-bot reviewed Feb 28, 2020


ajkr removed the request for review from al13n321 July 13, 2022 00:06

ajkr force-pushed the fix-write-buffer-manager-flush-thrashing branch from 4394b14 to 5684a05 Compare July 14, 2022 06:45

ajkr requested review from hx235 and jay-zhuang July 14, 2022 06:51

ajkr force-pushed the fix-write-buffer-manager-flush-thrashing branch from 5684a05 to ac62444 Compare July 14, 2022 06:54

jay-zhuang approved these changes Jul 18, 2022


ajkr and others added 4 commits August 17, 2022 09:43

fix bug eg for num unflushed < min to merge

bec6cc6

make format

9071ce6

add unit test

8bc5f79

ajkr force-pushed the fix-write-buffer-manager-flush-thrashing branch from 11d07f0 to 8bc5f79 Compare August 17, 2022 19:26

HISTORY.md

9c2dc96

update options.h

0b5ff8a

fix lite

509e61c

fix test sleeping task usage

2a09338

facebook-github-bot closed this in 9116601 Aug 17, 2022
