Fix interference between max_total_wal_size and db_write_buffer_size checks #1893

Closed
wants to merge 1 commit

Conversation

al13n321
Contributor

This is a trivial fix for OOMs we saw a few days ago in logdevice.

RocksDB gets into the following state:
(1) Write throughput is too high for flushes to keep up. Compactions are out of the picture: automatic compactions are disabled, and for manual compactions we don't care that much if they fall behind. We write to many CFs, with only a few L0 sst files in each, so compactions are not needed most of the time.
(2) total_log_size_ is consistently greater than GetMaxTotalWalSize(). It doesn't get smaller since flushes are falling ever further behind.
(3) The total size of memtables is way above db_write_buffer_size and keeps growing. But write_buffer_manager_->ShouldFlush() is not checked, because (2) prevents it (for no good reason, afaict; this is what this commit fixes; see the sketch after this list).
(4) Every call to WriteImpl() hits the MaybeFlushColumnFamilies() path. This keeps flushing the memtables one by one in order of increasing log file number.
(5) No write stalling trigger is hit. We rely on max_write_buffer_number to stall writes when flushes can't keep up, but (3) prevents it from kicking in. Memtables keep piling up in memory until we OOM.
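To make the interference concrete, here is a minimal, self-contained sketch of the check ordering, assuming the two triggers sit in an if/else-if chain inside WriteImpl() as the scenario above implies. The stub functions, values, and names like WriteBufferShouldFlush() and HandleWriteBufferPressure() are hypothetical stand-ins so the snippet compiles on its own; this is not the actual RocksDB source.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-ins; the real members live on DBImpl and WriteBufferManager.
uint64_t total_log_size_ = 2 << 20;
uint64_t GetMaxTotalWalSize() { return 1 << 20; }
bool WriteBufferShouldFlush() { return true; }   // stand-in for write_buffer_manager_->ShouldFlush()
void MaybeFlushColumnFamilies() { std::puts("flush CFs holding the oldest WAL alive"); }
void HandleWriteBufferPressure() { std::puts("flush to get under db_write_buffer_size"); }

// Before the fix: one else-if chain, so the db_write_buffer_size check is
// skipped on every write while total_log_size_ stays above the WAL limit.
void ChecksBeforeFix() {
  if (total_log_size_ > GetMaxTotalWalSize()) {
    MaybeFlushColumnFamilies();
  } else if (WriteBufferShouldFlush()) {
    HandleWriteBufferPressure();
  }
}

// After the fix: the two conditions are evaluated independently, so memtable
// pressure is acted on even when the WAL-size check keeps firing.
void ChecksAfterFix() {
  if (total_log_size_ > GetMaxTotalWalSize()) {
    MaybeFlushColumnFamilies();
  }
  if (WriteBufferShouldFlush()) {
    HandleWriteBufferPressure();
  }
}

int main() {
  ChecksBeforeFix();  // prints only the WAL-side flush
  ChecksAfterFix();   // prints both
}
```

Checking the two conditions independently means WAL-size pressure can no longer mask memtable pressure, which is exactly the interference described in (3).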

@facebook-github-bot
Contributor

@yiwu-arbug has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yiwu-arbug
Contributor

Thanks for the fix! Just trying to fully understand the bug: why didn't (4) ultimately flush one of the large memtables, which should have resolved the situation?

@al13n321
Contributor Author

why didn't (4) ultimately flush one of the large memtables, which should have resolved the situation?

At the time of sending the PR I thought it was because flushes couldn't keep up (i.e., (4) kept flushing memtables one by one, but new writes kept coming, creating new memtables faster than (4) could flush them). This seems like a plausible scenario, and worth fixing.

But apparently that's not what was happening there. Instead, a MaybeFlushColumnFamilies() call set alive_log_files_.begin()->getting_flushed to true and requested some flushes. Completion of the requested flushes was supposed to make the oldest file obsolete and remove it from the alive_log_files_ list, but that never happened; I have no idea why and am still investigating. A few hours later we got an OOM.

@yiwu-arbug
Contributor

Makes sense. When alive_log_files_.begin()->getting_flushed is true, MaybeFlushColumnFamilies() is short-circuited as well: https://fburl.com/ykmm3z3s
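For readers without access to that link, the short-circuit being described has roughly this shape. The struct and container below are hypothetical stand-ins, just enough to make the control flow compilable; they are not the actual RocksDB types.

```cpp
#include <cstdio>
#include <deque>

// Hypothetical stand-ins for the alive-WAL bookkeeping.
struct AliveLogFile { bool getting_flushed = false; };
std::deque<AliveLogFile> alive_log_files_(1);

// Sketch of the early return: once the oldest alive WAL is already marked as
// getting flushed, the call does nothing, so progress depends entirely on the
// previously requested flushes finishing and retiring that WAL.
void MaybeFlushColumnFamiliesSketch() {
  if (alive_log_files_.begin()->getting_flushed) {
    return;  // already requested; nothing more happens on this call
  }
  alive_log_files_.begin()->getting_flushed = true;
  std::puts("request flushes for CFs still referencing the oldest WAL");
}

int main() {
  MaybeFlushColumnFamiliesSketch();  // requests flushes, sets getting_flushed
  MaybeFlushColumnFamiliesSketch();  // short-circuits: returns immediately
}
```

Once getting_flushed is set, every subsequent call bails out immediately, so if the requested flushes never retire the oldest WAL, nothing in this path relieves memory pressure, matching the situation described above.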

@al13n321
Copy link
Contributor Author

But apparently that's not what was happening there.

I take that back after looking a bit more at the logs. Both this and #1903 were happening; most of the OOMs followed the scenario from this PR.

@al13n321 deleted the fl branch on February 23, 2017 at 20:49.