Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stall writes in WriteBufferManager when memory_usage exceeds buffer_size #7898

Closed

Conversation

akankshamahajan15
Copy link
Contributor

@akankshamahajan15 akankshamahajan15 commented Jan 26, 2021

Summary: When WriteBufferManager is shared across DBs and column families
to maintain memory usage under a limit, OOMs have been observed when flush cannot
finish but writes continuously insert to memtables.
In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write,
this change will stall incoming writers until flush is completed and memory_usage
drops.

Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_
(memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true.

DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall()
(which adds dummy stall object to the writer's queue).
Then DB is blocked on a state State::Blocked (current write doesn't go
through). WBStallInterface object maintained by every DB instance is added to the queue of
WriteBufferManager.

If multiple DBs tries to write during this stall, they will also be
blocked when check WriteBufferManager::ShouldStall() returns true.

End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory
waiting to be flushed is less than buffer_size/2. This lower limit will give time for flush
to complete and avoid continous stalling if memory usage remains close to buffer_size.

WriterBufferManager::EndWriteStall() is called,
which removes all instances from its queue and signal them to continue.
Their state is changed to State::Running and they are unblocked. DBImpl
then signal all incoming writers of that DB to continue by calling
write_thread_.EndWriteStall() (which removes dummy stall object from the
queue).

DB instance creates WBMStallInterface which is an interface to block and
signal DBs during stall.
When DB needs to be blocked or signalled by WriteBufferManager,
state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED).

Test Plan: Added a new test db/db_write_buffer_manager_test.cc

Reviewers:

Subscribers:

Tasks:

Tags:

@akankshamahajan15
Copy link
Contributor Author

TODO: If memory stays close to the limit, will FreeMem() cause all the dbs to be added and removed from the queue frequently. For example, one write can cause all writing dbs to enqueue, while a cache eviction can cause all of them to be removed. Then another write comes and the memory usage goes above the limit and all dbs are queued again.
Need to find a way to avoid it.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

memtable/write_buffer_manager.cc Outdated Show resolved Hide resolved
db/db_impl/db_impl.h Outdated Show resolved Hide resolved
db/db_impl/db_impl.h Outdated Show resolved Hide resolved
db/db_impl/db_impl_write.cc Show resolved Hide resolved
}

bool WriteBufferManager::ShouldStall() {
if (enabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to impact existing behavior. Its better if the stall behavior is enabled through an option in the WriteBufferManager constructor. And if its enabled, I think the stall condition should only be memory_usage() >= buffer_size_, since the idea is to have a hard limit. Also, once the stall is triggered, it might be a good idea to have a low watermark for removing the stall in order to give some time for flushes to complete.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@akankshamahajan15 akankshamahajan15 changed the title Stall writes in WriteBufferManager when soft limit is reached. Stall writes in WriteBufferManager when memory_usage exceeds buffer_size Feb 17, 2021
@akankshamahajan15
Copy link
Contributor Author

TODO: If memory stays close to the limit, will FreeMem() cause all the dbs to be added and removed from the queue frequently. For example, one write can cause all writing dbs to enqueue, while a cache eviction can cause all of them to be removed. Then another write comes and the memory usage goes above the limit and all dbs are queued again.
Need to find a way to avoid it.
It has been addressed by adding a limit and wait until memory being flushed is less than buffer_size/2.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ajkr
Copy link
Contributor

ajkr commented Feb 18, 2021

TODO: If memory stays close to the limit, will FreeMem() cause all the dbs to be added and removed from the queue frequently. For example, one write can cause all writing dbs to enqueue, while a cache eviction can cause all of them to be removed. Then another write comes and the memory usage goes above the limit and all dbs are queued again.
Need to find a way to avoid it.

Also, once the stall is triggered, it might be a good idea to have a low watermark for removing the stall in order to give some time for flushes to complete.

End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory
waiting to be flushed is less than buffer_size/2.

Sorry for the late comment - was just passing by and had a question about these. Is it really better for a DB to be in stall mode for a longer time than it is to thrash in and out of stall mode? I tend to think a stalled DB is basically unavailable so we'd only want to stall when absolutely necessary to avoid crashing. If the problem is the thrashing is so crazy it slows the DB from recovering, maybe there can be a gentler low watermark, just low enough to prevent irrelevant things like block evictions from changing the stall mode?

@akankshamahajan15
Copy link
Contributor Author

Sorry for the late comment - was just passing by and had a question about these. Is it really better for a DB to be in stall mode for a longer time than it is to thrash in and out of stall mode? I tend to think a stalled DB is basically unavailable so we'd only want to stall when absolutely necessary to avoid crashing. If the problem is the thrashing is so crazy it slows the DB from recovering, maybe there can be a gentler low watermark, just low enough to prevent irrelevant things like block evictions from changing the stall mode?

@ajkr, I see. If i set lower watermark to be 90% instead of 50% of buffer_size, does that seem ok? Low watermark is added because if memory stays close to buffer_size that it will not add/remove DBs from stall continously and give time for memory to go down so that stall doesn't happen frequently/continously.

@akankshamahajan15
Copy link
Contributor Author

Sorry for the late comment - was just passing by and had a question about these. Is it really better for a DB to be in stall mode for a longer time than it is to thrash in and out of stall mode? I tend to think a stalled DB is basically unavailable so we'd only want to stall when absolutely necessary to avoid crashing. If the problem is the thrashing is so crazy it slows the DB from recovering, maybe there can be a gentler low watermark, just low enough to prevent irrelevant things like block evictions from changing the stall mode?

@ajkr, I see. If i set lower watermark to be 90% instead of 50% of buffer_size, does that seem ok? Low watermark is added because if memory stays close to buffer_size that it will not add/remove DBs from stall continously and give time for memory to go down so that stall doesn't happen frequently/continously.

50% might be too aggressive.

@ajkr
Copy link
Contributor

ajkr commented Feb 18, 2021

@ajkr, I see. If i set lower watermark to be 90% instead of 50% of buffer_size, does that seem ok?

Probably, though I am wondering about a couple things related to the approach:

  • If the end of write stall mode can only be triggered when flush finishes (and not when a block is evicted from cache), would that also prevent thrashing?
  • Is it guaranteed that size of memtables awaiting flush is any particular portion of buffer_size at the point stalling starts? I recall during a write it picks only the "best" CF in the current DB to flush, so we might not flush all memtables before reaching the limit. If that's the case, then no matter the low watermark we choose (50% of buffer_size, 90%, or something else), it's possible that watermark is already satisfied as soon as we enter stall mode, before any flush finishes.

@akankshamahajan15
Copy link
Contributor Author

@ajkr, I see. If i set lower watermark to be 90% instead of 50% of buffer_size, does that seem ok?

Probably, though I am wondering about a couple things related to the approach:

  • If the end of write stall mode can only be triggered when flush finishes (and not when a block is evicted from cache), would that also prevent thrashing?
  • Is it guaranteed that size of memtables awaiting flush is any particular portion of buffer_size at the point stalling starts? I recall during a write it picks only the "best" CF in the current DB to flush, so we might not flush all memtables before reaching the limit. If that's the case, then no matter the low watermark we choose (50% of buffer_size, 90%, or something else), it's possible that watermark is already satisfied as soon as we enter stall mode, before any flush finishes.

For 2nd point: Condition I wanted to add is if flush is going on and we stall, we wait for some percentage of flush to complete before releasing the stall. But i see your point and we cannot gurantee that with buffer_size_. I am not sure if we can add lower watermark in that case considering in terms memory in flush as it will be keep changing as DB who will write might/might not flush and stall and if yes, what it should be.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@akankshamahajan15
Copy link
Contributor Author

For 2nd point: Condition I wanted to add is if flush is going on and we stall, we wait for some percentage of flush to complete before releasing the stall. But i see your point and we cannot gurantee that with buffer_size_. I am not sure if we can add lower watermark in that case considering in terms memory in flush as it will be keep changing as DB who will write might/might not flush and stall and if yes, what it should be.

Removed lower watermark as of now and updated code to remove stall when buffer_size is increased dynamically.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

memtable/write_buffer_manager.cc Show resolved Hide resolved
memtable/write_buffer_manager.cc Outdated Show resolved Hide resolved
memtable/write_buffer_manager.cc Outdated Show resolved Hide resolved
db/db_impl/db_impl.h Outdated Show resolved Hide resolved
@anand1976
Copy link
Contributor

Probably, though I am wondering about a couple things related to the approach:

  • If the end of write stall mode can only be triggered when flush finishes (and not when a block is evicted from cache), would that also prevent thrashing?

@ajkr I believe this is the case, since its a one way relationship from WBM to the block cache. WBM doesn't know about the block cache memory usage.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@akankshamahajan15
Copy link
Contributor Author

Ran ./db_bench --benchmarks="fillseq,readrandom,readseq" -cost_write_buffer_to_cache=true -db_write_buffer_size=10000000 (Cache size = 8MB) to test any hang.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

db/db_impl/db_impl.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@anand1976 anand1976 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A couple of minor comments.

db/db_impl/db_impl.cc Outdated Show resolved Hide resolved
memtable/write_buffer_manager.cc Show resolved Hide resolved
…ize.

Summary: Even when WriteBufferManager is shared across DBs and column families
to maintain memory usage under a limit, OOMs have been observed when flush cannot
finish but writes continuously insert to memtables.
In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write,
this change will stall incoming writers until flush is completed and memory_usage
drops.

Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_
(memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true.
Stalling will be enabled by user by setting
WriteBufferManager::allow_stall = true during object creation.

DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall()
(which adds dummy stall object to the writer's queue).
Then DB is blocked on a state State::Blocked (current write doesn't go
through). WBStallInterface object maintained by every DB instance is added to the queue of
WriteBufferManager.

If multiple DBs tries to write during this stall, they will also be
blocked when check WriteBufferManager::ShouldStall() returns true.

End Stall condition: When flush is finished and memory usage goes down below buffer_size, stall will be removed and all DBs will be signalled.

WriterBufferManager::EndWriteStall() is called,
which removes all instances from its queue and signal them to continue.
WBStallInterface's state is changed to State::Running and DB instances are unblocked. DBImpl
then signal all incoming writers of that DB to continue by calling
write_thread_.EndWriteStall() (which removes dummy stall object from the
queue).

If buffer_size is increased through SetBufferSize, it will also end the stall if memory
usage drops below new buffer_size.

DB instance creates WBMStallInterface which is an interface to block and
signal DBs during stall.
When DB needs to be blocked or signalled by WriteBufferManager,
state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED).

Test Plan: Added a new test db/db_write_buffer_manager_test.cc

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@akankshamahajan15 merged this pull request in 596e900.

facebook-github-bot pushed a commit that referenced this pull request Oct 26, 2021
Summary:
#7898 enable write buffer manager to stall write when memory_usage exceeds buffer_size, this is really useful for container running case to limit the memory usage. However, this feature is not visiable for rocksJava yet.

This PR targets to introduce this feature for rocksJava.

Pull Request resolved: #9076

Reviewed By: akankshamahajan15

Differential Revision: D31931092

Pulled By: anand1976

fbshipit-source-id: 5531c16a87598663a02368c07b5e13a503164578
yoori pushed a commit to yoori/rocksdb that referenced this pull request Nov 26, 2023
Summary:
facebook/rocksdb#7898 enable write buffer manager to stall write when memory_usage exceeds buffer_size, this is really useful for container running case to limit the memory usage. However, this feature is not visiable for rocksJava yet.

This PR targets to introduce this feature for rocksJava.

Pull Request resolved: facebook/rocksdb#9076

Reviewed By: akankshamahajan15

Differential Revision: D31931092

Pulled By: anand1976

fbshipit-source-id: 5531c16a87598663a02368c07b5e13a503164578
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants