Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cut output files at compaction cursors #10227

Closed

Conversation

littlepig2013
Copy link
Contributor

@littlepig2013 littlepig2013 commented Jun 22, 2022

Summary:

The files behind the compaction cursor contain newer data than the files ahead of it. If a compaction writes a file that spans from before its output level’s cursor to after it, then data before the cursor will be contaminated with the old timestamp from the data after the cursor. To avoid this, we can split the output file into two – one entirely before the cursor and one entirely after the cursor. Note that, in rare cases, we DO NOT need to cut the file if it is a trivial move since the file will not be contaminated by older files. In such case, the compact cursor is not guaranteed to be the boundary of the file, but it does not hurt the round-robin selection process.

Test Plan: Add 'RoundRobinCutOutputAtCompactCursor' unit test in db_compaction_test

Task: T122216351

@littlepig2013 littlepig2013 force-pushed the cut_output_at_cmpct_cursors branch 5 times, most recently from cf949b1 to 0ecab03 Compare June 22, 2022 19:02
@littlepig2013 littlepig2013 changed the title [DRAFT] Cut output files at compaction cursors Cut output files at compaction cursors Jun 22, 2022
Copy link
Contributor

@jay-zhuang jay-zhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@facebook-github-bot
Copy link
Contributor

@littlepig2013 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

const InternalKey output_split_key = c->GetOutputSplitKey();
if (output_split_key.Valid()) {
output_split_user_key = ExtractUserKey(output_split_key.Encode());
bounds.emplace_back(output_split_key.Encode());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusingly, InternalKey::Encode() does not make a copy. I think that makes it unsafe for use in bounds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace output_split_key in 648 with c->GetOutputSplitKey() should be fine?

facebook-github-bot pushed a commit that referenced this pull request Jul 27, 2022
Summary:
Update HISTORY.md for CompactionPri::kRoundRobin. Detailed implementation can be found in [PR10107](#10107), [PR10227](#10227), [PR10250](#10250), [PR10278](#10278), [PR10316](#10316), and [PR10341](#10341)

Pull Request resolved: #10421

Reviewed By: ajkr

Differential Revision: D38194070

Pulled By: littlepig2013

fbshipit-source-id: 4ce153dc0bf22cd865d09c5429955023dbc90f37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants