Stall writes in WriteBufferManager when memory_usage exceeds buffer_size #7898

akankshamahajan15 · 2021-01-26T20:56:48Z

Summary: When WriteBufferManager is shared across DBs and column families
to maintain memory usage under a limit, OOMs have been observed when flush cannot
finish but writes continuously insert to memtables.
In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write,
this change will stall incoming writers until flush is completed and memory_usage
drops.

Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_
(memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true.

DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall()
(which adds dummy stall object to the writer's queue).
Then DB is blocked on a state State::Blocked (current write doesn't go
through). WBStallInterface object maintained by every DB instance is added to the queue of
WriteBufferManager.

If multiple DBs tries to write during this stall, they will also be
blocked when check WriteBufferManager::ShouldStall() returns true.

End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory
waiting to be flushed is less than buffer_size/2. This lower limit will give time for flush
to complete and avoid continous stalling if memory usage remains close to buffer_size.

WriterBufferManager::EndWriteStall() is called,
which removes all instances from its queue and signal them to continue.
Their state is changed to State::Running and they are unblocked. DBImpl
then signal all incoming writers of that DB to continue by calling
write_thread_.EndWriteStall() (which removes dummy stall object from the
queue).

DB instance creates WBMStallInterface which is an interface to block and
signal DBs during stall.
When DB needs to be blocked or signalled by WriteBufferManager,
state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED).

Test Plan: Added a new test db/db_write_buffer_manager_test.cc

Reviewers:

Subscribers:

Tasks:

Tags:

akankshamahajan15 · 2021-01-26T21:18:53Z

TODO: If memory stays close to the limit, will FreeMem() cause all the dbs to be added and removed from the queue frequently. For example, one write can cause all writing dbs to enqueue, while a cache eviction can cause all of them to be removed. Then another write comes and the memory usage goes above the limit and all dbs are queued again.
Need to find a way to avoid it.

facebook-github-bot

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

memtable/write_buffer_manager.cc

db/db_impl/db_impl.h

db/db_impl/db_impl_write.cc

anand1976 · 2021-02-08T22:41:47Z

memtable/write_buffer_manager.cc

+}
+
+bool WriteBufferManager::ShouldStall() {
+  if (enabled()) {


I don't think we want to impact existing behavior. Its better if the stall behavior is enabled through an option in the WriteBufferManager constructor. And if its enabled, I think the stall condition should only be memory_usage() >= buffer_size_, since the idea is to have a hard limit. Also, once the stall is triggered, it might be a good idea to have a low watermark for removing the stall in order to give some time for flushes to complete.

facebook-github-bot · 2021-02-17T04:51:09Z

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

akankshamahajan15 · 2021-02-17T04:53:45Z

TODO: If memory stays close to the limit, will FreeMem() cause all the dbs to be added and removed from the queue frequently. For example, one write can cause all writing dbs to enqueue, while a cache eviction can cause all of them to be removed. Then another write comes and the memory usage goes above the limit and all dbs are queued again.
Need to find a way to avoid it.
It has been addressed by adding a limit and wait until memory being flushed is less than buffer_size/2.

facebook-github-bot

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ajkr · 2021-02-18T06:13:57Z

TODO: If memory stays close to the limit, will FreeMem() cause all the dbs to be added and removed from the queue frequently. For example, one write can cause all writing dbs to enqueue, while a cache eviction can cause all of them to be removed. Then another write comes and the memory usage goes above the limit and all dbs are queued again.
Need to find a way to avoid it.

Also, once the stall is triggered, it might be a good idea to have a low watermark for removing the stall in order to give some time for flushes to complete.

End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory
waiting to be flushed is less than buffer_size/2.

Sorry for the late comment - was just passing by and had a question about these. Is it really better for a DB to be in stall mode for a longer time than it is to thrash in and out of stall mode? I tend to think a stalled DB is basically unavailable so we'd only want to stall when absolutely necessary to avoid crashing. If the problem is the thrashing is so crazy it slows the DB from recovering, maybe there can be a gentler low watermark, just low enough to prevent irrelevant things like block evictions from changing the stall mode?

akankshamahajan15 · 2021-02-18T17:52:11Z

Sorry for the late comment - was just passing by and had a question about these. Is it really better for a DB to be in stall mode for a longer time than it is to thrash in and out of stall mode? I tend to think a stalled DB is basically unavailable so we'd only want to stall when absolutely necessary to avoid crashing. If the problem is the thrashing is so crazy it slows the DB from recovering, maybe there can be a gentler low watermark, just low enough to prevent irrelevant things like block evictions from changing the stall mode?

@ajkr, I see. If i set lower watermark to be 90% instead of 50% of buffer_size, does that seem ok? Low watermark is added because if memory stays close to buffer_size that it will not add/remove DBs from stall continously and give time for memory to go down so that stall doesn't happen frequently/continously.

akankshamahajan15 · 2021-02-18T17:52:43Z

Sorry for the late comment - was just passing by and had a question about these. Is it really better for a DB to be in stall mode for a longer time than it is to thrash in and out of stall mode? I tend to think a stalled DB is basically unavailable so we'd only want to stall when absolutely necessary to avoid crashing. If the problem is the thrashing is so crazy it slows the DB from recovering, maybe there can be a gentler low watermark, just low enough to prevent irrelevant things like block evictions from changing the stall mode?

@ajkr, I see. If i set lower watermark to be 90% instead of 50% of buffer_size, does that seem ok? Low watermark is added because if memory stays close to buffer_size that it will not add/remove DBs from stall continously and give time for memory to go down so that stall doesn't happen frequently/continously.

50% might be too aggressive.

ajkr · 2021-02-18T19:17:03Z

@ajkr, I see. If i set lower watermark to be 90% instead of 50% of buffer_size, does that seem ok?

Probably, though I am wondering about a couple things related to the approach:

If the end of write stall mode can only be triggered when flush finishes (and not when a block is evicted from cache), would that also prevent thrashing?
Is it guaranteed that size of memtables awaiting flush is any particular portion of buffer_size at the point stalling starts? I recall during a write it picks only the "best" CF in the current DB to flush, so we might not flush all memtables before reaching the limit. If that's the case, then no matter the low watermark we choose (50% of buffer_size, 90%, or something else), it's possible that watermark is already satisfied as soon as we enter stall mode, before any flush finishes.

akankshamahajan15 · 2021-02-18T20:37:20Z

@ajkr, I see. If i set lower watermark to be 90% instead of 50% of buffer_size, does that seem ok?

Probably, though I am wondering about a couple things related to the approach:

If the end of write stall mode can only be triggered when flush finishes (and not when a block is evicted from cache), would that also prevent thrashing?

Is it guaranteed that size of memtables awaiting flush is any particular portion of buffer_size at the point stalling starts? I recall during a write it picks only the "best" CF in the current DB to flush, so we might not flush all memtables before reaching the limit. If that's the case, then no matter the low watermark we choose (50% of buffer_size, 90%, or something else), it's possible that watermark is already satisfied as soon as we enter stall mode, before any flush finishes.

For 2nd point: Condition I wanted to add is if flush is going on and we stall, we wait for some percentage of flush to complete before releasing the stall. But i see your point and we cannot gurantee that with buffer_size_. I am not sure if we can add lower watermark in that case considering in terms memory in flush as it will be keep changing as DB who will write might/might not flush and stall and if yes, what it should be.

facebook-github-bot · 2021-03-09T03:25:34Z

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-03-09T03:25:51Z

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

akankshamahajan15 · 2021-03-09T03:29:33Z

For 2nd point: Condition I wanted to add is if flush is going on and we stall, we wait for some percentage of flush to complete before releasing the stall. But i see your point and we cannot gurantee that with buffer_size_. I am not sure if we can add lower watermark in that case considering in terms memory in flush as it will be keep changing as DB who will write might/might not flush and stall and if yes, what it should be.

Removed lower watermark as of now and updated code to remove stall when buffer_size is increased dynamically.

facebook-github-bot

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

memtable/write_buffer_manager.cc

db/db_impl/db_impl.h

anand1976 · 2021-03-29T22:44:32Z

Probably, though I am wondering about a couple things related to the approach:

If the end of write stall mode can only be triggered when flush finishes (and not when a block is evicted from cache), would that also prevent thrashing?

@ajkr I believe this is the case, since its a one way relationship from WBM to the block cache. WBM doesn't know about the block cache memory usage.

facebook-github-bot · 2021-04-19T01:02:41Z

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-04-19T02:22:55Z

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

akankshamahajan15 · 2021-04-19T19:08:23Z

Ran ./db_bench --benchmarks="fillseq,readrandom,readseq" -cost_write_buffer_to_cache=true -db_write_buffer_size=10000000 (Cache size = 8MB) to test any hang.

facebook-github-bot · 2021-04-19T20:42:56Z

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

db/db_impl/db_impl.cc

facebook-github-bot · 2021-04-20T01:08:41Z

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

anand1976

LGTM. A couple of minor comments.

db/db_impl/db_impl.cc

memtable/write_buffer_manager.cc

…ize. Summary: Even when WriteBufferManager is shared across DBs and column families to maintain memory usage under a limit, OOMs have been observed when flush cannot finish but writes continuously insert to memtables. In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write, this change will stall incoming writers until flush is completed and memory_usage drops. Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_ (memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true. Stalling will be enabled by user by setting WriteBufferManager::allow_stall = true during object creation. DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall() (which adds dummy stall object to the writer's queue). Then DB is blocked on a state State::Blocked (current write doesn't go through). WBStallInterface object maintained by every DB instance is added to the queue of WriteBufferManager. If multiple DBs tries to write during this stall, they will also be blocked when check WriteBufferManager::ShouldStall() returns true. End Stall condition: When flush is finished and memory usage goes down below buffer_size, stall will be removed and all DBs will be signalled. WriterBufferManager::EndWriteStall() is called, which removes all instances from its queue and signal them to continue. WBStallInterface's state is changed to State::Running and DB instances are unblocked. DBImpl then signal all incoming writers of that DB to continue by calling write_thread_.EndWriteStall() (which removes dummy stall object from the queue). If buffer_size is increased through SetBufferSize, it will also end the stall if memory usage drops below new buffer_size. DB instance creates WBMStallInterface which is an interface to block and signal DBs during stall. When DB needs to be blocked or signalled by WriteBufferManager, state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED). Test Plan: Added a new test db/db_write_buffer_manager_test.cc Reviewers: Subscribers: Tasks: Tags:

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

facebook-github-bot · 2021-04-21T17:37:48Z

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-04-21T18:08:14Z

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

facebook-github-bot · 2021-04-21T19:02:39Z

@akankshamahajan15 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2021-04-21T19:03:45Z

@akankshamahajan15 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-04-21T20:54:19Z

@akankshamahajan15 merged this pull request in 596e900.

Summary: #7898 enable write buffer manager to stall write when memory_usage exceeds buffer_size, this is really useful for container running case to limit the memory usage. However, this feature is not visiable for rocksJava yet. This PR targets to introduce this feature for rocksJava. Pull Request resolved: #9076 Reviewed By: akankshamahajan15 Differential Revision: D31931092 Pulled By: anand1976 fbshipit-source-id: 5531c16a87598663a02368c07b5e13a503164578

Summary: facebook/rocksdb#7898 enable write buffer manager to stall write when memory_usage exceeds buffer_size, this is really useful for container running case to limit the memory usage. However, this feature is not visiable for rocksJava yet. This PR targets to introduce this feature for rocksJava. Pull Request resolved: facebook/rocksdb#9076 Reviewed By: akankshamahajan15 Differential Revision: D31931092 Pulled By: anand1976 fbshipit-source-id: 5531c16a87598663a02368c07b5e13a503164578

facebook-github-bot added the CLA Signed label Jan 26, 2021

akankshamahajan15 force-pushed the stall_writes_wbm branch 3 times, most recently from 9c08dc2 to f522f44 Compare January 26, 2021 21:12

akankshamahajan15 requested a review from anand1976 January 26, 2021 21:20

facebook-github-bot reviewed Jan 27, 2021

View reviewed changes

anand1976 reviewed Feb 8, 2021

View reviewed changes

matthewvon mentioned this pull request Feb 14, 2021

Feature: add SetBufferSize() so that managed size can be dynamic #7961

Closed

akankshamahajan15 force-pushed the stall_writes_wbm branch from da87b12 to 55d658b Compare February 17, 2021 04:51

akankshamahajan15 changed the title ~~Stall writes in WriteBufferManager when soft limit is reached.~~ Stall writes in WriteBufferManager when memory_usage exceeds buffer_size Feb 17, 2021

facebook-github-bot reviewed Feb 17, 2021

View reviewed changes

akankshamahajan15 force-pushed the stall_writes_wbm branch from 55d658b to b69f0d8 Compare March 9, 2021 03:25

akankshamahajan15 force-pushed the stall_writes_wbm branch from b69f0d8 to 02167e0 Compare March 9, 2021 03:25

facebook-github-bot reviewed Mar 9, 2021

View reviewed changes

akankshamahajan15 requested review from ajkr and anand1976 March 9, 2021 03:31

anand1976 reviewed Mar 29, 2021

View reviewed changes

memtable/write_buffer_manager.cc Show resolved Hide resolved

memtable/write_buffer_manager.cc Outdated Show resolved Hide resolved

memtable/write_buffer_manager.cc Outdated Show resolved Hide resolved

db/db_impl/db_impl.h Outdated Show resolved Hide resolved

akankshamahajan15 force-pushed the stall_writes_wbm branch from 02167e0 to 80b6cf9 Compare April 1, 2021 01:31

akankshamahajan15 force-pushed the stall_writes_wbm branch from 9c5598f to 441f58d Compare April 19, 2021 20:42

anand1976 reviewed Apr 19, 2021

View reviewed changes

db/db_impl/db_impl.cc Outdated Show resolved Hide resolved

akankshamahajan15 requested a review from anand1976 April 20, 2021 17:22

anand1976 approved these changes Apr 21, 2021

View reviewed changes

db/db_impl/db_impl.cc Outdated Show resolved Hide resolved

memtable/write_buffer_manager.cc Show resolved Hide resolved

akankshamahajan15 added 7 commits April 20, 2021 21:32

Add test db_write_buffer_manager_test.cc to test stalling of writes.

6eda1f5

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

Fix ASAN failures

d4ddbd4

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

Addressed comments and remove stall if buffer_size is increased.

8963ab8

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

Addressed comments

46059a2

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

Address missed signals to end stall

fa00b33

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

Remove DB from queue when DB is in stall and DB::Close is called

8f8d503

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

akankshamahajan15 force-pushed the stall_writes_wbm branch from 335ee06 to b3616c5 Compare April 21, 2021 17:37

Update History.md and minor changes

1c81a1a

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:

akankshamahajan15 force-pushed the stall_writes_wbm branch from b3616c5 to 1c81a1a Compare April 21, 2021 19:02

facebook-github-bot closed this in 596e900 Apr 21, 2021

facebook-github-bot added the Merged label Apr 21, 2021

Myasuka mentioned this pull request Oct 26, 2021

[RocksJava] Introduce allowStall option for write buffer manager constructor #9076

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stall writes in WriteBufferManager when memory_usage exceeds buffer_size #7898

Stall writes in WriteBufferManager when memory_usage exceeds buffer_size #7898

akankshamahajan15 commented Jan 26, 2021 •

edited

Loading

akankshamahajan15 commented Jan 26, 2021

facebook-github-bot left a comment

anand1976 Feb 8, 2021

facebook-github-bot commented Feb 17, 2021

akankshamahajan15 commented Feb 17, 2021

facebook-github-bot left a comment

ajkr commented Feb 18, 2021 •

edited

Loading

akankshamahajan15 commented Feb 18, 2021

akankshamahajan15 commented Feb 18, 2021

ajkr commented Feb 18, 2021

akankshamahajan15 commented Feb 18, 2021

facebook-github-bot commented Mar 9, 2021

facebook-github-bot commented Mar 9, 2021

akankshamahajan15 commented Mar 9, 2021

facebook-github-bot left a comment

anand1976 commented Mar 29, 2021

facebook-github-bot commented Apr 19, 2021

facebook-github-bot commented Apr 19, 2021

akankshamahajan15 commented Apr 19, 2021

facebook-github-bot commented Apr 19, 2021

facebook-github-bot commented Apr 20, 2021

anand1976 left a comment

facebook-github-bot commented Apr 21, 2021

facebook-github-bot commented Apr 21, 2021

facebook-github-bot commented Apr 21, 2021

facebook-github-bot commented Apr 21, 2021

facebook-github-bot commented Apr 21, 2021

Stall writes in WriteBufferManager when memory_usage exceeds buffer_size #7898

Stall writes in WriteBufferManager when memory_usage exceeds buffer_size #7898

Conversation

akankshamahajan15 commented Jan 26, 2021 • edited Loading

akankshamahajan15 commented Jan 26, 2021

facebook-github-bot left a comment

Choose a reason for hiding this comment

anand1976 Feb 8, 2021

Choose a reason for hiding this comment

facebook-github-bot commented Feb 17, 2021

akankshamahajan15 commented Feb 17, 2021

facebook-github-bot left a comment

Choose a reason for hiding this comment

ajkr commented Feb 18, 2021 • edited Loading

akankshamahajan15 commented Feb 18, 2021

akankshamahajan15 commented Feb 18, 2021

ajkr commented Feb 18, 2021

akankshamahajan15 commented Feb 18, 2021

facebook-github-bot commented Mar 9, 2021

facebook-github-bot commented Mar 9, 2021

akankshamahajan15 commented Mar 9, 2021

facebook-github-bot left a comment

Choose a reason for hiding this comment

anand1976 commented Mar 29, 2021

facebook-github-bot commented Apr 19, 2021

facebook-github-bot commented Apr 19, 2021

akankshamahajan15 commented Apr 19, 2021

facebook-github-bot commented Apr 19, 2021

facebook-github-bot commented Apr 20, 2021

anand1976 left a comment

Choose a reason for hiding this comment

facebook-github-bot commented Apr 21, 2021

facebook-github-bot commented Apr 21, 2021

facebook-github-bot commented Apr 21, 2021

facebook-github-bot commented Apr 21, 2021

facebook-github-bot commented Apr 21, 2021

akankshamahajan15 commented Jan 26, 2021 •

edited

Loading

ajkr commented Feb 18, 2021 •

edited

Loading