Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable manual compaction during ReFitLevel() #7250

Closed
wants to merge 7 commits into from

Conversation

ajkr
Copy link
Contributor

@ajkr ajkr commented Aug 12, 2020

Manual compaction with CompactRangeOptions::change_levels set could
refit to a level targeted by another manual compaction. If
force_consistency_checks were disabled, it could be possible for
overlapping files to be written at that target level.

This PR prevents the possibility by calling DisableManualCompaction()
prior to ReFitLevel(). It also improves the manual compaction disabling
mechanism to wait for pending manual compactions to complete before
returning, and support disabling from multiple threads.

Fixes #6432.

Test Plan: crash test command that repro'd the bug reliably:

$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --simple -target_file_size_base=524288 -write_buffer_size=1048576 -clear_column_family_one_in=0 -reopen=0 -max_key=10000000 -column_families=1 -max_background_compactions=8 -compact_range_one_in=100000 -compression_type=none -compaction_style=1 -num_levels=5 -universal_min_merge_width=4 -universal_max_merge_width=8 -level0_file_num_compaction_trigger=12 -rate_limiter_bytes_per_sec=1048576000 -universal_max_size_amplification_percent=100 --duration=3600 --interval=60 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --enable_compaction_filter=0

@ajkr
Copy link
Contributor Author

ajkr commented Aug 12, 2020

Will let the crash test run for a long time. While this diff was still undergoing changes, I saw an unexplained assertion failure on AssertHeld(), and another unexplained failure saying "Compaction input files inconsistent". I am not sure those are fixed in the current version, or if they are related to this problem at all.

@ajkr ajkr requested a review from siying August 12, 2020 23:10
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

manual_compaction_paused_.store(false, std::memory_order_release);
InstrumentedMutexLock l(&mutex_);
assert(manual_compaction_paused_ > 0);
manual_compaction_paused_.fetch_add(-1, std::memory_order_release);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a signal here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any pending manual compactions were already finished (typically by failing with Status::Incomplete) during DisableManualCompaction(), so it's my understanding that nothing is waiting on this condition. manual_compaction_paused_ is quite a confusing name as nothing seems to be paused..

Tangential - another name that confused me a lot in this PR is "incomplete", as a manual compaction returning Status::Incomplete, and a manual compaction setting ManualCompactionState::incomplete, mean completely different things :p.

@@ -848,11 +848,17 @@ Status DBImpl::CompactRange(const CompactRangeOptions& options,
if (options.change_level) {
ROCKS_LOG_INFO(immutable_db_options_.info_log,
"[RefitLevel] waiting for background threads to stop");
DisableManualCompaction();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see this would prevent some issues, but is only disabling manual compaction in refitting phase enough? If it is not, should we disable it in the whole CompactRange() phase if change_level is true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR ensures any concurrent manual compaction commits its files before we do ReFitLevel(). We know ReFitLevel() has a bug in identifying which levels are empty -- but that bug appears independent of whether those levels became nonempty due to a manual compaction or automatic compaction. I don't know of any bug at this point that is specific to manual compaction running concurrently with a change_level CompactRange() and committing prior to ReFitLevel().

That said, I wouldn't mind if we disabled all flushes, compactions, and ingestions for the entire duration of a change_level CompactRange(). Then we can simplify things a lot. But it's a significant change in terms of code and behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense. Disabling manual compaction here and check level emptiness might be good enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: DBImpl::refitting_level_ (see https://github.com/facebook/rocksdb/blob/master/db/db_impl/db_impl_compaction_flush.cc#L1300-L1306) can probably be removed now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be possible two manual compactions both finish all their RunManualCompaction() steps before either one calls DisableManualCompaction()? In that case the calls to DisableManualCompaction() do not appear effective since manual compactions are only registered during RunManualCompaction() steps. Then, the two ReFitLevel()s can execute in parallel. I do hope we can consolidate these eventually, just am not sure we're there yet.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

@ajkr
Copy link
Contributor Author

ajkr commented Aug 13, 2020

Will let the crash test run for a long time. While this diff was still undergoing changes, I saw an unexplained assertion failure on AssertHeld(), and another unexplained failure saying "Compaction input files inconsistent". I am not sure those are fixed in the current version, or if they are related to this problem at all.

The crash test succeeds for several hours.
Also, added a unit test, which is verified to repro the overlapping invariant violation.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ajkr ajkr requested a review from ltamasi August 13, 2020 17:47
Copy link
Contributor

@siying siying left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with the change. Maybe @ltamasi can give it another pass, as it is tricky.

Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @ajkr ! LGTM, just a couple of minor comments.

s = PauseBackgroundWork();
bool bg_paused = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but we could achieve the same effect by moving ContinueBackgroundWork inside the if block, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, done.

manual_compaction_paused_.store(false, std::memory_order_release);
InstrumentedMutexLock l(&mutex_);
assert(manual_compaction_paused_ > 0);
manual_compaction_paused_.fetch_add(-1, std::memory_order_release);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use fetch_sub here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

db/db_test2.cc Outdated
auto paused = reinterpret_cast<std::atomic<bool>*>(arg);
ASSERT_FALSE(paused->load(std::memory_order_acquire));
paused->store(true, std::memory_order_release);
auto paused = reinterpret_cast<std::atomic<int>*>(arg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: static_cast would suffice here and below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -848,11 +848,17 @@ Status DBImpl::CompactRange(const CompactRangeOptions& options,
if (options.change_level) {
ROCKS_LOG_INFO(immutable_db_options_.info_log,
"[RefitLevel] waiting for background threads to stop");
DisableManualCompaction();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: DBImpl::refitting_level_ (see https://github.com/facebook/rocksdb/blob/master/db/db_impl/db_impl_compaction_flush.cc#L1300-L1306) can probably be removed now?

Manual compaction with `CompactRangeOptions::change_levels` set could
refit to a level targeted by another manual compaction. If
force_consistency_checks were disabled, it could be possible for
overlapping files to be written at that target level.

This PR prevents the possibility by calling `DisableManualCompaction()`
prior to `ReFitLevel()`. It also improves the manual compaction disabling
mechanism to wait for pending manual compactions to complete before
returning, and support disabling from multiple threads.

Fixes facebook#6432.

Test Plan: crash test command that repro'd the bug reliably:

```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --simple -target_file_size_base=524288 -write_buffer_size=1048576 -clear_column_family_one_in=0 -reopen=0 -max_key=10000000 -column_families=1 -max_background_compactions=8 -compact_range_one_in=100000 -compression_type=none -compaction_style=1 -num_levels=5 -universal_min_merge_width=4 -universal_max_merge_width=8 -level0_file_num_compaction_trigger=12 -rate_limiter_bytes_per_sec=1048576000 -universal_max_size_amplification_percent=100 --duration=3600 --interval=60 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --enable_compaction_filter=0
```
Copy link
Contributor Author

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for the fast and detailed reviews!

@@ -848,11 +848,17 @@ Status DBImpl::CompactRange(const CompactRangeOptions& options,
if (options.change_level) {
ROCKS_LOG_INFO(immutable_db_options_.info_log,
"[RefitLevel] waiting for background threads to stop");
DisableManualCompaction();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be possible two manual compactions both finish all their RunManualCompaction() steps before either one calls DisableManualCompaction()? In that case the calls to DisableManualCompaction() do not appear effective since manual compactions are only registered during RunManualCompaction() steps. Then, the two ReFitLevel()s can execute in parallel. I do hope we can consolidate these eventually, just am not sure we're there yet.

s = PauseBackgroundWork();
bool bg_paused = false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, done.

manual_compaction_paused_.store(false, std::memory_order_release);
InstrumentedMutexLock l(&mutex_);
assert(manual_compaction_paused_ > 0);
manual_compaction_paused_.fetch_add(-1, std::memory_order_release);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

db/db_test2.cc Outdated
auto paused = reinterpret_cast<std::atomic<bool>*>(arg);
ASSERT_FALSE(paused->load(std::memory_order_acquire));
paused->store(true, std::memory_order_release);
auto paused = reinterpret_cast<std::atomic<int>*>(arg);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@facebook-github-bot
Copy link
Contributor

@ajkr has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ajkr merged this pull request in a1aa3f8.

codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021
Summary:
Manual compaction with `CompactRangeOptions::change_levels` set could
refit to a level targeted by another manual compaction. If
force_consistency_checks were disabled, it could be possible for
overlapping files to be written at that target level.

This PR prevents the possibility by calling `DisableManualCompaction()`
prior to `ReFitLevel()`. It also improves the manual compaction disabling
mechanism to wait for pending manual compactions to complete before
returning, and support disabling from multiple threads.

Fixes facebook#6432.

Pull Request resolved: facebook#7250

Test Plan:
crash test command that repro'd the bug reliably:

```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --simple -target_file_size_base=524288 -write_buffer_size=1048576 -clear_column_family_one_in=0 -reopen=0 -max_key=10000000 -column_families=1 -max_background_compactions=8 -compact_range_one_in=100000 -compression_type=none -compaction_style=1 -num_levels=5 -universal_min_merge_width=4 -universal_max_merge_width=8 -level0_file_num_compaction_trigger=12 -rate_limiter_bytes_per_sec=1048576000 -universal_max_size_amplification_percent=100 --duration=3600 --interval=60 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --enable_compaction_filter=0
```

Reviewed By: ltamasi

Differential Revision: D23090800

Pulled By: ajkr

fbshipit-source-id: afcbcd51b42ce76789fdb907d8b9ada790709c13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data corruption with concurrent manual compactions
4 participants