-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stall writes in WriteBufferManager when memory_usage exceeds buffer_size #7898
Stall writes in WriteBufferManager when memory_usage exceeds buffer_size #7898
Conversation
9c08dc2
to
f522f44
Compare
TODO: If memory stays close to the limit, will FreeMem() cause all the dbs to be added and removed from the queue frequently. For example, one write can cause all writing dbs to enqueue, while a cache eviction can cause all of them to be removed. Then another write comes and the memory usage goes above the limit and all dbs are queued again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
memtable/write_buffer_manager.cc
Outdated
} | ||
|
||
bool WriteBufferManager::ShouldStall() { | ||
if (enabled()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to impact existing behavior. Its better if the stall behavior is enabled through an option in the WriteBufferManager
constructor. And if its enabled, I think the stall condition should only be memory_usage() >= buffer_size_
, since the idea is to have a hard limit. Also, once the stall is triggered, it might be a good idea to have a low watermark for removing the stall in order to give some time for flushes to complete.
da87b12
to
55d658b
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Sorry for the late comment - was just passing by and had a question about these. Is it really better for a DB to be in stall mode for a longer time than it is to thrash in and out of stall mode? I tend to think a stalled DB is basically unavailable so we'd only want to stall when absolutely necessary to avoid crashing. If the problem is the thrashing is so crazy it slows the DB from recovering, maybe there can be a gentler low watermark, just low enough to prevent irrelevant things like block evictions from changing the stall mode? |
@ajkr, I see. If i set lower watermark to be 90% instead of 50% of buffer_size, does that seem ok? Low watermark is added because if memory stays close to buffer_size that it will not add/remove DBs from stall continously and give time for memory to go down so that stall doesn't happen frequently/continously. |
50% might be too aggressive. |
Probably, though I am wondering about a couple things related to the approach:
|
For 2nd point: Condition I wanted to add is if flush is going on and we stall, we wait for some percentage of flush to complete before releasing the stall. But i see your point and we cannot gurantee that with buffer_size_. I am not sure if we can add lower watermark in that case considering in terms memory in flush as it will be keep changing as DB who will write might/might not flush and stall and if yes, what it should be. |
55d658b
to
b69f0d8
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
b69f0d8
to
02167e0
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
Removed lower watermark as of now and updated code to remove stall when buffer_size is increased dynamically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@ajkr I believe this is the case, since its a one way relationship from WBM to the block cache. WBM doesn't know about the block cache memory usage. |
02167e0
to
80b6cf9
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Ran ./db_bench --benchmarks="fillseq,readrandom,readseq" -cost_write_buffer_to_cache=true -db_write_buffer_size=10000000 (Cache size = 8MB) to test any hang. |
9c5598f
to
441f58d
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. A couple of minor comments.
…ize. Summary: Even when WriteBufferManager is shared across DBs and column families to maintain memory usage under a limit, OOMs have been observed when flush cannot finish but writes continuously insert to memtables. In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write, this change will stall incoming writers until flush is completed and memory_usage drops. Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_ (memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true. Stalling will be enabled by user by setting WriteBufferManager::allow_stall = true during object creation. DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall() (which adds dummy stall object to the writer's queue). Then DB is blocked on a state State::Blocked (current write doesn't go through). WBStallInterface object maintained by every DB instance is added to the queue of WriteBufferManager. If multiple DBs tries to write during this stall, they will also be blocked when check WriteBufferManager::ShouldStall() returns true. End Stall condition: When flush is finished and memory usage goes down below buffer_size, stall will be removed and all DBs will be signalled. WriterBufferManager::EndWriteStall() is called, which removes all instances from its queue and signal them to continue. WBStallInterface's state is changed to State::Running and DB instances are unblocked. DBImpl then signal all incoming writers of that DB to continue by calling write_thread_.EndWriteStall() (which removes dummy stall object from the queue). If buffer_size is increased through SetBufferSize, it will also end the stall if memory usage drops below new buffer_size. DB instance creates WBMStallInterface which is an interface to block and signal DBs during stall. When DB needs to be blocked or signalled by WriteBufferManager, state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED). Test Plan: Added a new test db/db_write_buffer_manager_test.cc Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
335ee06
to
b3616c5
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
b3616c5
to
1c81a1a
Compare
@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing. |
@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
@akankshamahajan15 merged this pull request in 596e900. |
Summary: #7898 enable write buffer manager to stall write when memory_usage exceeds buffer_size, this is really useful for container running case to limit the memory usage. However, this feature is not visiable for rocksJava yet. This PR targets to introduce this feature for rocksJava. Pull Request resolved: #9076 Reviewed By: akankshamahajan15 Differential Revision: D31931092 Pulled By: anand1976 fbshipit-source-id: 5531c16a87598663a02368c07b5e13a503164578
Summary: facebook/rocksdb#7898 enable write buffer manager to stall write when memory_usage exceeds buffer_size, this is really useful for container running case to limit the memory usage. However, this feature is not visiable for rocksJava yet. This PR targets to introduce this feature for rocksJava. Pull Request resolved: facebook/rocksdb#9076 Reviewed By: akankshamahajan15 Differential Revision: D31931092 Pulled By: anand1976 fbshipit-source-id: 5531c16a87598663a02368c07b5e13a503164578
Summary: When WriteBufferManager is shared across DBs and column families
to maintain memory usage under a limit, OOMs have been observed when flush cannot
finish but writes continuously insert to memtables.
In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write,
this change will stall incoming writers until flush is completed and memory_usage
drops.
Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_
(memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true.
DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall()
(which adds dummy stall object to the writer's queue).
Then DB is blocked on a state State::Blocked (current write doesn't go
through). WBStallInterface object maintained by every DB instance is added to the queue of
WriteBufferManager.
If multiple DBs tries to write during this stall, they will also be
blocked when check WriteBufferManager::ShouldStall() returns true.
End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory
waiting to be flushed is less than buffer_size/2. This lower limit will give time for flush
to complete and avoid continous stalling if memory usage remains close to buffer_size.
WriterBufferManager::EndWriteStall() is called,
which removes all instances from its queue and signal them to continue.
Their state is changed to State::Running and they are unblocked. DBImpl
then signal all incoming writers of that DB to continue by calling
write_thread_.EndWriteStall() (which removes dummy stall object from the
queue).
DB instance creates WBMStallInterface which is an interface to block and
signal DBs during stall.
When DB needs to be blocked or signalled by WriteBufferManager,
state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED).
Test Plan: Added a new test db/db_write_buffer_manager_test.cc
Reviewers:
Subscribers:
Tasks:
Tags: