Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DisableManualCompaction may fail to cancel an unscheduled task #9659

Closed
wants to merge 7 commits into from

Conversation

jay-zhuang
Copy link
Contributor

@jay-zhuang jay-zhuang commented Mar 4, 2022

#9625 didn't change the unschedule condition which was waiting for the background thread to clean-up the compaction.
make sure we only unschedule the task when it's scheduled.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jay-zhuang jay-zhuang changed the title Avoid DisableManualCompaction to unschedule an unscheduled task Avoid DisableManualCompaction to cancel an unscheduled task Mar 4, 2022
@jay-zhuang jay-zhuang changed the title Avoid DisableManualCompaction to cancel an unscheduled task DisableManualCompaction may unable to cancel an unscheduled task Mar 4, 2022
@jay-zhuang jay-zhuang changed the title DisableManualCompaction may unable to cancel an unscheduled task DisableManualCompaction may fail to cancel an unscheduled task Mar 4, 2022
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment on lines 1910 to 1912
manual.done = true;
manual.status =
Status::Incomplete(Status::SubCode::kManualCompactionPaused);
Copy link
Contributor

@ajkr ajkr Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a compaction has been scheduled and is currently executing, should we still wait for !manual.in_progress?

edit: For example if RunManualCompaction() ends while BG compaction is still running, BG compaction may attempt to modify an already-deleted ManualCompactionState.

What if we have this just below the bg_cv_.Wait() ?

if (manual.in_progress) {
  continue;
}

Copy link
Contributor Author

@jay-zhuang jay-zhuang Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. after looking into it, just checking manual.in_progress might not be enough, because the BG thread might already picked the compaction task but not yet update the manual.in_progress (as the thread-pool queue is not protected by db.mutex, I just added an unittest for such case).
How about make manual a shared_ptr?

(other options I considered but has their problems:

  1. env_->UnSchedule() returns the number of tasks it cancelled, we can check that to make sure a scheduled compaction is cancelled, otherwise wait. The problem is it cannot work for multiple non-exclusive manual compactions as one manual compaction will unschedule() 2 jobs, another won't unschedule() any.
  2. add manual job id in the Tag (hacky and may have problem if there're too many manual compactions)
  3. coordinate between all FG manual compaction threads to make sure all tasks are unscheduled, if not, wait. (I think it's possible, but seems more complicated than just make manual a shared_ptr.)

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@jay-zhuang jay-zhuang marked this pull request as ready for review March 9, 2022 20:32
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

db_->DisableManualCompaction();

compact_thread.join();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests work for me before the fix. Are they expected to fail?

Copy link
Contributor Author

@jay-zhuang jay-zhuang Mar 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DisableJustStartedManualCompaction should fail. I consistently get segfault on macos:
image

For DisableMultiManualCompaction, yes, it success without the fix. It is used to test one foreground manual compaction unschedule 2 tasks from the queue and the other one can still return with InComplete status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's fine on linux, even passed ASAN and TSAN...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info - tests LGTM now. Some notes below -

The first test could produce a hang without the fix if I deleted the early-return in RunManualCompaction() based on manual compaction being paused. That early return seems to prevent us from actually scheduling background compactions to the low pri pool.

Regarding the second test, there might be something platform dependent about accessing a variable from an already-returned stack frame. I can't find an error even though the ManualCompactionState clearly went out of scope and should've been destroyed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's fine on linux, even passed ASAN and TSAN...

I have some task about enabling ASAN_OPTIONS=detect_stack_use_after_return=1 (T94055374). Can't remember what motivated that but it'd be interesting to see if it'd help here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it helps with ASAN_OPTIONS="detect_stack_use_after_return=1":
image

(another hacky way to force it fail is calling some other functions after compactRange() to mess up the stack, but I think we should use detect_stack_use_after_return).

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ajkr pushed a commit that referenced this pull request Mar 13, 2022
Summary:
#9625 didn't change the unschedule condition which was waiting for the background thread to clean-up the compaction.
make sure we only unschedule the task when it's scheduled.

Pull Request resolved: #9659

Reviewed By: ajkr

Differential Revision: D34651820

Pulled By: jay-zhuang

fbshipit-source-id: 23f42081b15ec8886cd81cbf131b116e0c74dc2f
jay-zhuang added a commit to jay-zhuang/rocksdb that referenced this pull request Mar 15, 2022
In facebook#9659, when `DisableManualCompaction()` is issued, the foreground
manual compaction thread does not have to wait background compaction
thread to finish. Which could be a problem that the user re-enable
manual compaction with `EnableManualCompaction()`, it may re-enable the
BG compaction which supposed be cancelled.
This patch makes the FG compaction wait on
`manual_compaction_state.done`, which either be set by BG compaction or
Unschedule callback. Then when FG manual compaction thread returns, it
should not have BG compaction running. So shared_ptr is no longer needed
for `manual_compaction_state`.

Test Plan: a StressTest and unittest
facebook-github-bot pushed a commit that referenced this pull request Mar 15, 2022
Summary:
In #9659, when `DisableManualCompaction()` is issued, the foreground
manual compaction thread does not have to wait background compaction
thread to finish. Which could be a problem that the user re-enable
manual compaction with `EnableManualCompaction()`, it may re-enable the
BG compaction which supposed be cancelled.
This patch makes the FG compaction wait on
`manual_compaction_state.done`, which either be set by BG compaction or
Unschedule callback. Then when FG manual compaction thread returns, it
should not have BG compaction running. So shared_ptr is no longer needed
for `manual_compaction_state`.

Pull Request resolved: #9694

Test Plan: a StressTest and unittest

Reviewed By: ajkr

Differential Revision: D34885472

Pulled By: jay-zhuang

fbshipit-source-id: e6476175b43e8c59cd49f5c09241036a0716c274
ajkr pushed a commit that referenced this pull request Mar 23, 2022
Summary:
#9625 didn't change the unschedule condition which was waiting for the background thread to clean-up the compaction.
make sure we only unschedule the task when it's scheduled.

Pull Request resolved: #9659

Reviewed By: ajkr

Differential Revision: D34651820

Pulled By: jay-zhuang

fbshipit-source-id: 23f42081b15ec8886cd81cbf131b116e0c74dc2f
ajkr pushed a commit that referenced this pull request Mar 23, 2022
Summary:
In #9659, when `DisableManualCompaction()` is issued, the foreground
manual compaction thread does not have to wait background compaction
thread to finish. Which could be a problem that the user re-enable
manual compaction with `EnableManualCompaction()`, it may re-enable the
BG compaction which supposed be cancelled.
This patch makes the FG compaction wait on
`manual_compaction_state.done`, which either be set by BG compaction or
Unschedule callback. Then when FG manual compaction thread returns, it
should not have BG compaction running. So shared_ptr is no longer needed
for `manual_compaction_state`.

Pull Request resolved: #9694

Test Plan: a StressTest and unittest

Reviewed By: ajkr

Differential Revision: D34885472

Pulled By: jay-zhuang

fbshipit-source-id: e6476175b43e8c59cd49f5c09241036a0716c274
ajkr pushed a commit that referenced this pull request Mar 29, 2022
Summary:
In #9659, when `DisableManualCompaction()` is issued, the foreground
manual compaction thread does not have to wait background compaction
thread to finish. Which could be a problem that the user re-enable
manual compaction with `EnableManualCompaction()`, it may re-enable the
BG compaction which supposed be cancelled.
This patch makes the FG compaction wait on
`manual_compaction_state.done`, which either be set by BG compaction or
Unschedule callback. Then when FG manual compaction thread returns, it
should not have BG compaction running. So shared_ptr is no longer needed
for `manual_compaction_state`.

Pull Request resolved: #9694

Test Plan: a StressTest and unittest

Reviewed By: ajkr

Differential Revision: D34885472

Pulled By: jay-zhuang

fbshipit-source-id: e6476175b43e8c59cd49f5c09241036a0716c274
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants