-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable manual compaction during ReFitLevel()
#7250
Conversation
Will let the crash test run for a long time. While this diff was still undergoing changes, I saw an unexplained assertion failure on |
afd19a2
to
61afb8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
manual_compaction_paused_.store(false, std::memory_order_release); | ||
InstrumentedMutexLock l(&mutex_); | ||
assert(manual_compaction_paused_ > 0); | ||
manual_compaction_paused_.fetch_add(-1, std::memory_order_release); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a signal here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any pending manual compactions were already finished (typically by failing with Status::Incomplete
) during DisableManualCompaction()
, so it's my understanding that nothing is waiting on this condition. manual_compaction_paused_
is quite a confusing name as nothing seems to be paused..
Tangential - another name that confused me a lot in this PR is "incomplete", as a manual compaction returning Status::Incomplete
, and a manual compaction setting ManualCompactionState::incomplete
, mean completely different things :p.
@@ -848,11 +848,17 @@ Status DBImpl::CompactRange(const CompactRangeOptions& options, | |||
if (options.change_level) { | |||
ROCKS_LOG_INFO(immutable_db_options_.info_log, | |||
"[RefitLevel] waiting for background threads to stop"); | |||
DisableManualCompaction(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see this would prevent some issues, but is only disabling manual compaction in refitting phase enough? If it is not, should we disable it in the whole CompactRange() phase if change_level is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR ensures any concurrent manual compaction commits its files before we do ReFitLevel()
. We know ReFitLevel()
has a bug in identifying which levels are empty -- but that bug appears independent of whether those levels became nonempty due to a manual compaction or automatic compaction. I don't know of any bug at this point that is specific to manual compaction running concurrently with a change_level
CompactRange()
and committing prior to ReFitLevel()
.
That said, I wouldn't mind if we disabled all flushes, compactions, and ingestions for the entire duration of a change_level
CompactRange()
. Then we can simplify things a lot. But it's a significant change in terms of code and behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense. Disabling manual compaction here and check level emptiness might be good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: DBImpl::refitting_level_
(see https://github.com/facebook/rocksdb/blob/master/db/db_impl/db_impl_compaction_flush.cc#L1300-L1306) can probably be removed now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be possible two manual compactions both finish all their RunManualCompaction()
steps before either one calls DisableManualCompaction()
? In that case the calls to DisableManualCompaction()
do not appear effective since manual compactions are only registered during RunManualCompaction()
steps. Then, the two ReFitLevel()
s can execute in parallel. I do hope we can consolidate these eventually, just am not sure we're there yet.
30765c2
to
02cfa16
Compare
@ajkr has updated the pull request. You must reimport the pull request before landing. |
The crash test succeeds for several hours. |
@ajkr has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@ajkr has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with the change. Maybe @ltamasi can give it another pass, as it is tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix @ajkr ! LGTM, just a couple of minor comments.
s = PauseBackgroundWork(); | ||
bool bg_paused = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor but we could achieve the same effect by moving ContinueBackgroundWork
inside the if
block, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, done.
manual_compaction_paused_.store(false, std::memory_order_release); | ||
InstrumentedMutexLock l(&mutex_); | ||
assert(manual_compaction_paused_ > 0); | ||
manual_compaction_paused_.fetch_add(-1, std::memory_order_release); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use fetch_sub
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
db/db_test2.cc
Outdated
auto paused = reinterpret_cast<std::atomic<bool>*>(arg); | ||
ASSERT_FALSE(paused->load(std::memory_order_acquire)); | ||
paused->store(true, std::memory_order_release); | ||
auto paused = reinterpret_cast<std::atomic<int>*>(arg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: static_cast
would suffice here and below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -848,11 +848,17 @@ Status DBImpl::CompactRange(const CompactRangeOptions& options, | |||
if (options.change_level) { | |||
ROCKS_LOG_INFO(immutable_db_options_.info_log, | |||
"[RefitLevel] waiting for background threads to stop"); | |||
DisableManualCompaction(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: DBImpl::refitting_level_
(see https://github.com/facebook/rocksdb/blob/master/db/db_impl/db_impl_compaction_flush.cc#L1300-L1306) can probably be removed now?
Manual compaction with `CompactRangeOptions::change_levels` set could refit to a level targeted by another manual compaction. If force_consistency_checks were disabled, it could be possible for overlapping files to be written at that target level. This PR prevents the possibility by calling `DisableManualCompaction()` prior to `ReFitLevel()`. It also improves the manual compaction disabling mechanism to wait for pending manual compactions to complete before returning, and support disabling from multiple threads. Fixes facebook#6432. Test Plan: crash test command that repro'd the bug reliably: ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --simple -target_file_size_base=524288 -write_buffer_size=1048576 -clear_column_family_one_in=0 -reopen=0 -max_key=10000000 -column_families=1 -max_background_compactions=8 -compact_range_one_in=100000 -compression_type=none -compaction_style=1 -num_levels=5 -universal_min_merge_width=4 -universal_max_merge_width=8 -level0_file_num_compaction_trigger=12 -rate_limiter_bytes_per_sec=1048576000 -universal_max_size_amplification_percent=100 --duration=3600 --interval=60 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --enable_compaction_filter=0 ```
09eb9d5
to
46cf49c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for the fast and detailed reviews!
@@ -848,11 +848,17 @@ Status DBImpl::CompactRange(const CompactRangeOptions& options, | |||
if (options.change_level) { | |||
ROCKS_LOG_INFO(immutable_db_options_.info_log, | |||
"[RefitLevel] waiting for background threads to stop"); | |||
DisableManualCompaction(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be possible two manual compactions both finish all their RunManualCompaction()
steps before either one calls DisableManualCompaction()
? In that case the calls to DisableManualCompaction()
do not appear effective since manual compactions are only registered during RunManualCompaction()
steps. Then, the two ReFitLevel()
s can execute in parallel. I do hope we can consolidate these eventually, just am not sure we're there yet.
s = PauseBackgroundWork(); | ||
bool bg_paused = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, done.
manual_compaction_paused_.store(false, std::memory_order_release); | ||
InstrumentedMutexLock l(&mutex_); | ||
assert(manual_compaction_paused_ > 0); | ||
manual_compaction_paused_.fetch_add(-1, std::memory_order_release); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
db/db_test2.cc
Outdated
auto paused = reinterpret_cast<std::atomic<bool>*>(arg); | ||
ASSERT_FALSE(paused->load(std::memory_order_acquire)); | ||
paused->store(true, std::memory_order_release); | ||
auto paused = reinterpret_cast<std::atomic<int>*>(arg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@ajkr has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Manual compaction with `CompactRangeOptions::change_levels` set could refit to a level targeted by another manual compaction. If force_consistency_checks were disabled, it could be possible for overlapping files to be written at that target level. This PR prevents the possibility by calling `DisableManualCompaction()` prior to `ReFitLevel()`. It also improves the manual compaction disabling mechanism to wait for pending manual compactions to complete before returning, and support disabling from multiple threads. Fixes facebook#6432. Pull Request resolved: facebook#7250 Test Plan: crash test command that repro'd the bug reliably: ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --simple -target_file_size_base=524288 -write_buffer_size=1048576 -clear_column_family_one_in=0 -reopen=0 -max_key=10000000 -column_families=1 -max_background_compactions=8 -compact_range_one_in=100000 -compression_type=none -compaction_style=1 -num_levels=5 -universal_min_merge_width=4 -universal_max_merge_width=8 -level0_file_num_compaction_trigger=12 -rate_limiter_bytes_per_sec=1048576000 -universal_max_size_amplification_percent=100 --duration=3600 --interval=60 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --enable_compaction_filter=0 ``` Reviewed By: ltamasi Differential Revision: D23090800 Pulled By: ajkr fbshipit-source-id: afcbcd51b42ce76789fdb907d8b9ada790709c13
Manual compaction with
CompactRangeOptions::change_levels
set couldrefit to a level targeted by another manual compaction. If
force_consistency_checks were disabled, it could be possible for
overlapping files to be written at that target level.
This PR prevents the possibility by calling
DisableManualCompaction()
prior to
ReFitLevel()
. It also improves the manual compaction disablingmechanism to wait for pending manual compactions to complete before
returning, and support disabling from multiple threads.
Fixes #6432.
Test Plan: crash test command that repro'd the bug reliably: