-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent a case of WriteBufferManager flush thrashing #6364
Conversation
8d0fcbd
to
4394b14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Cheng-Chang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@Cheng-Chang I talked to @siying and he prefers |
@ajkr I thought it should be imported, in that case, I'll discard the diff in Phabricator. |
4394b14
to
5684a05
Compare
@ajkr has updated the pull request. You must reimport the pull request before landing. |
5684a05
to
ac62444
Compare
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The change does make sense to me, plus the user is able to explicitly stall the write with allow_stall=true
in WBM.
It is a behavior change, should update the doc like:
rocksdb/include/rocksdb/options.h
Lines 906 to 908 in b283f04
// the DBs. If the total size of all live memtables of all the DBs exceeds | |
// a limit, a flush will be triggered in the next DB to which the next write | |
// is issued. |
Previously, the flushes triggered by `WriteBufferManager` could affect the same CF repeatedly if it happens to get consecutive writes. Such flushes are not particularly useful for reducing memory usage since they switch nearly-empty memtables to immutable while they've just begun filling their first arena block. In fact they may not even reduce the mutable memory count if they involve replacing one mutable memtable with one arena block with a new mutable memtable with one arena block. Further, if such switches happen even a few times before a flush finishes, the immutable memtable limit will be reached and writes will stall. This PR adds a heuristic to not switch memtables to immutable for CFs that already have one or more immutable memtables awaiting flush. There is a memory usage regression if the user continues writing to the same CF, that DB does not have any CFs eligible for switching, and flushes are not finishing. Before it would grow by switching nearly empty memtables until writes stall. Now it would grow by filling memtables until writes stall. Test Plan: - Command: `rm -rf /dev/shm/dbbench/ && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num_multi_db=8 -num_column_families=2 -write_buffer_size=4194304 -db_write_buffer_size=16777216 -compression_type=none -statistics=true -target_file_size_base=4194304 -max_bytes_for_level_base=16777216` - `rocksdb.db.write.stall` count before this PR: 175 - `rocksdb.db.write.stall` count after this PR: 0
11d07f0
to
8bc5f79
Compare
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Previously, the flushes triggered by
WriteBufferManager
could affectthe same CF repeatedly if it happens to get consecutive writes. Such
flushes are not particularly useful for reducing memory usage since
they switch nearly-empty memtables to immutable while they've just begun
filling their first arena block. In fact they may not even reduce the
mutable memory count if they involve replacing one mutable memtable containing
one arena block with a new mutable memtable containing one arena block.
Further, if such switches happen even a few times before a flush finishes,
the immutable memtable limit will be reached and writes will stall.
This PR adds a heuristic to not switch memtables to immutable for CFs
that already have one or more immutable memtables awaiting flush. There
is a memory usage regression if the user continues writing to the same
CF, that DB does not have any CFs eligible for switching, flushes
are not finishing, and the
WriteBufferManager
was constructed withallow_stall=false
. Before, it would grow by switching nearly emptymemtables until writes stall. Now, it would grow by filling memtables
until writes stall. This feels like an acceptable behavior change because
users who prefer to stall over violate the memory limit should be using
allow_stall=true
, which is unaffected by this PR.Test Plan:
rm -rf /dev/shm/dbbench/ && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num_multi_db=8 -num_column_families=2 -write_buffer_size=4194304 -db_write_buffer_size=16777216 -compression_type=none -statistics=true -target_file_size_base=4194304 -max_bytes_for_level_base=16777216
rocksdb.db.write.stall
count before this PR: 175rocksdb.db.write.stall
count after this PR: 0