
Prevent corruption with parallel manual compactions and change_level == true #9077

Closed

Conversation

@ajkr (Contributor) commented Oct 26, 2021

The bug can impact the following scenario. There must be two CompactRange() calls, call them A and B. Compaction A must have change_level=true. Compactions A and B must run in parallel, and new data must be inserted while they run.

Now, on to the details of the race condition. Compaction A must reach the refitting phase while B's next step is to trivially move new data (i.e., data that has been inserted behind A) down to the same level that A's refit targets (CompactRangeOptions::target_level). B must be unregistered (i.e., it has not yet called AddManualCompaction() for the current RunManualCompaction()) while A invokes DisableManualCompaction() to prepare for refitting. In the old code, B could still proceed to register a manual compaction while A had manual compactions disabled.

The next part of the race condition is that B picks and schedules a trivial move while A has released the lock during the refitting phase in order to persist the LSM state change (i.e., the log phase of LogAndApply()). That way, B does not see the refitted data when picking a trivial-move compaction, so it is susceptible to picking one that overlaps.

Finally, B executes the picked trivial-move compaction. Trivial-move compactions are special in that they never check whether manual compaction is disabled. So the picked compaction that causes overlap ends up being applied, leading to LSM corruption if force_consistency_checks=false, or to entering read-only mode with Status::Corruption if force_consistency_checks=true (the default).

The fix is simply to prevent B from registering itself in RunManualCompaction() while manual compactions are disabled, consequently preventing any trivial move or other compaction from being picked/scheduled.
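A simplified, self-contained model of the idea behind the fix (the names manual_compaction_paused_ and AddManualCompaction() follow the discussion above, but this class is a toy sketch, not the actual RocksDB implementation): registration must not proceed while the pause counter is nonzero.

```cpp
#include <condition_variable>
#include <mutex>
#include <vector>

// Toy model of the manual-compaction registration path.
class ManualCompactionState {
 public:
  // Models DisableManualCompaction(): bump the pause counter.
  void Disable() {
    std::lock_guard<std::mutex> lock(mu_);
    ++paused_;
  }

  // Models EnableManualCompaction(): drop the counter and wake waiters.
  void Enable() {
    std::lock_guard<std::mutex> lock(mu_);
    --paused_;
    cv_.notify_all();
  }

  // Models RunManualCompaction(). With the fix, we wait until manual
  // compactions are re-enabled before registering; the old code
  // registered unconditionally, enabling the race described above.
  void RunManualCompaction(int id) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return paused_ == 0; });
    registered_.push_back(id);  // models AddManualCompaction()
  }

  size_t NumRegistered() {
    std::lock_guard<std::mutex> lock(mu_);
    return registered_.size();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int paused_ = 0;               // models manual_compaction_paused_
  std::vector<int> registered_;  // models the manual-compaction queue
};
```

In this model, if A calls Disable() before its refit and B then calls RunManualCompaction(), B blocks instead of registering (and later picking a stale trivial move), until A calls Enable() after the refit is applied.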

Thanks to @siying for finding the bug.

Test Plan: The test does not go all the way in exposing the bug because doing so requires a compaction to be picked/scheduled while logging the LSM state change for RefitLevel(). But the fix makes such a compaction never get picked/scheduled in the first place, so any repro of that scenario would end up hanging RefitLevel()'s logging. So instead I just verified that no such compaction is registered in the scenario where RefitLevel() disables manual compactions.

@facebook-github-bot: @ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot: @ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot: @ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot: @ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying (Contributor) left a comment:

Thanks for the fix. I have a comment on the unit test.

            {
                "DBCompactionTest::ChangeLevelConflictsWithManual:"
                "PostForegroundCompactRange",
                "DBImpl::CompactRange:PostRefitLevel",
@siying (Contributor) commented on the lines above:

The bug only triggers when refitting has finished and EnableManualCompaction() is called before the background job starts (for the trivial move). If possible, we should keep this order in the test. Of course, after the fix, there won't be a background job anymore, so that sync dependency would be a no-op.

@ajkr (Contributor, Author):

Let me try this again. IIRC it can only repro with a sync point ensuring BackgroundCallCompaction starts before the refit level MANIFEST write proceeds (to ensure the trivial move is picked before the refitting is applied). And that sync point must hang after this fix.

@ajkr (Contributor, Author):

I do agree my unit test is misleading in any case. I've been considering whether to repurpose it to just verify that manual compactions are not registered while manual_compaction_paused_ > 0.

@siying (Contributor):

> Let me try this again. IIRC it can only repro with a sync point ensuring BackgroundCallCompaction starts before the refit level MANIFEST write proceeds (to ensure the trivial move is picked before the refitting is applied). And that sync point must hang after this fix.

But how would that fail? BackgroundCallCompaction would check manual_compaction_paused and fail. It only gets through if refitting completely finishes and releases manual_compaction_paused. Maybe it's another bug?

@ajkr (Contributor, Author):

> But how would that fail? BackgroundCallCompaction would check manual_compaction_paused and fail. It only gets through if refitting completely finishes and releases manual_compaction_paused. Maybe it's another bug?

Sorry, I meant the dependency I mentioned is needed in addition to the one you mentioned, not in place of it. Looking at your test case, it seems to have both:

            {
                "BackgroundCallCompaction:0",
                "VersionSet::LogAndApply:WriteManifestStart",
            },
            {
                "DBImpl::CompactRange:PostRefitLevel:ManualCompactionEnabled",
                "BackgroundCallCompaction:0.5",
            },

The first dependency shown above causes a hang when no compaction is picked/scheduled, but it is necessary to repro.

@siying (Contributor):

> The first dependency shown above causes a hang when no compaction is picked/scheduled, but it is necessary to repro.

I actually meant the second one. Is there an equivalent here?

@ajkr (Contributor, Author):

I know, but the second one alone doesn't repro anything. Anyways, I can just add it.

@siying (Contributor):

Got it. Forget about it then. LGTM.

@ajkr (Contributor, Author):

Well, I meant it isn't guaranteed to repro. But it should be good anyways. This is done now.

@facebook-github-bot: @ajkr has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot: @ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot: @ajkr merged this pull request in f24c39a.
