Refactor and support option migration for db with multiple CFs #14059
hx235 wants to merge 2 commits into facebook:main
Conversation
(force-pushed from 3cadcba to 632e2ce)
crash test failed with an irrelevant error
xingbowang left a comment:
Looks good to me. I am curious whether our existing crash test supports config migration. If so, do we need to expand it to support multi-CF?
```cpp
// violated.
//
// WARNING: using this to migrate from non-FIFO to FIFO compaction
// with `max_table_files_size` > 0 can cause the whole DB to be dropped right
```
Given the data-dropping concern, is it possible to add an extra flag `fail_on_data_deletion=true` during the migration, so that if any data would be deleted during migration, the migration fails and prints how much data would be deleted? If the deletion is intentional, the client could set the flag to false and retry, to make sure it is indeed the intended behavior. I know we don't like to add new flags; however, given the potential impact of accidental data deletion, I feel it is worth adding a new flag for this if possible.
> so that if there is any data deleted during migration, the migration would fail, and print how much data would be deleted. ... client could set the flag to false and then retry
That's a good suggestion. The challenge is correctly estimating the to-be-deleted data without actually doing the compaction, so users can retry later. I will need to think more about this in a different PR. The API has been here for 5+ years, and hopefully the WARNING (and the well-known nature of FIFO compaction dropping data) has existed long enough that existing users won't make this mistake.
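As a rough illustration of what such an estimate would involve: FIFO compaction drops the oldest files once the total size exceeds `max_table_files_size`, so a first-order estimate could look like this sketch (the helper name is hypothetical, and it ignores concurrent writes and other real-world accounting):

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: estimate how many bytes FIFO compaction would drop
// right after migration. FIFO deletes the oldest files until the remaining
// total fits under max_table_files_size. `file_sizes` is ordered oldest first.
uint64_t EstimateFifoDroppedBytes(const std::vector<uint64_t>& file_sizes,
                                  uint64_t max_table_files_size) {
  uint64_t total = std::accumulate(file_sizes.begin(), file_sizes.end(),
                                   uint64_t{0});
  uint64_t dropped = 0;
  for (uint64_t size : file_sizes) {
    if (total <= max_table_files_size) {
      break;  // remaining files fit under the budget
    }
    dropped += size;  // the oldest file goes first
    total -= size;
  }
  return dropped;
}
```

The hard part the reply alludes to is producing `file_sizes` and the threshold accurately before any compaction has run, which is why the flag is deferred to a separate PR.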
```cpp
     new_opts.compaction_style == CompactionStyle::kCompactionStyleLevel) &&
    new_opts.num_levels == 1) ||
   new_opts.compaction_style == CompactionStyle::kCompactionStyleFIFO) {
  base_opts->target_file_size_base = 999999999999999;
```
Thanks - it ends up having to be UINT64_MAX/2 to avoid some overflow in later computation with target_file_size_base.
Actually, there exists a subtle bug in the later computation with target_file_size_base, in MultiplyCheckOverflow() (line 1077 in b9951de):
```cpp
uint64_t MultiplyCheckOverflow(uint64_t op1, double op2) {
  if (op1 == 0 || op2 <= 0) {
    return 0;
  }
  // Buggy check!
  if (std::numeric_limits<uint64_t>::max() / op1 < op2) {
    return op1;
  }
  return static_cast<uint64_t>(op1 * op2);
}
```
This check does not catch the troublesome case where op1 * op2 == std::numeric_limits<uint64_t>::max(). It's troublesome because op2 is a double, so op1 * op2 is promoted to double first. std::numeric_limits<uint64_t>::max() cannot be precisely represented as a double, and the promotion rounds it up to the next representable double, 2^64, when rounding up has less error than rounding down. The rounded-up promotion result is therefore greater than std::numeric_limits<uint64_t>::max(), so casting it back to uint64_t fails, as in https://github.com/facebook/rocksdb/actions/runs/19486118069/job/55768715545?pr=14059
For this case, changing < to <= is enough (see #14132 for the fix).
But I'm worried about the case where op1 * op2 equals a number slightly smaller than std::numeric_limits<uint64_t>::max() that still can't be precisely represented and gets rounded up to 2^64 similarly. The `std::numeric_limits<uint64_t>::max() / op1 <= op2` check doesn't seem enough to catch that case. Until I figure out whether that case is possible, I will use the old/existing value 999999999999999 instead of making it bigger (which would make this bug, if it exists, more likely to hit).
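To make the rounding hazard concrete, here is a small self-contained sketch of the repaired check (mirroring the `<` to `<=` change mentioned above; the function name is a stand-in, not the actual RocksDB code):

```cpp
#include <cstdint>
#include <limits>

// Stand-in for the fixed check: cap at op1 instead of casting an
// out-of-range double back to uint64_t (which is undefined behavior).
uint64_t MultiplyCheckOverflowFixed(uint64_t op1, double op2) {
  if (op1 == 0 || op2 <= 0) {
    return 0;
  }
  // max / op1 is integer division. Using <= also rejects the equality case
  // where op1 * op2 == 2^64 - 1: that product, promoted to double, rounds up
  // to exactly 2^64 because 2^64 - 1 is not representable as a double.
  if (std::numeric_limits<uint64_t>::max() / op1 <= op2) {
    return op1;
  }
  return static_cast<uint64_t>(op1 * op2);
}
```

With op1 == UINT64_MAX and op2 == 1.0, the integer division gives 1, so the buggy `<` check (1 < 1.0) falls through and casts double(2^64); the `<=` version caps the result at op1 instead.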
Option migration is on my TODO list to add to stress/crash test. FIFO is more challenging but possible.
```cpp
  return s;
}

// Step 2: Prepare no-compaction CF descriptors
```
Confused by the naming here: what does PrepareNoCompaction* mean?
It means creating CF options under which no compaction happens in that CF, so the migration can rely on manual compaction instead.
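For illustration, "no compaction" options usually mean disabling automatic compaction and raising the triggers so flushes alone never schedule one. This self-contained sketch uses stand-in types; the real code would set the corresponding fields on rocksdb::ColumnFamilyOptions (e.g. disable_auto_compactions):

```cpp
#include <string>
#include <vector>

// Stand-in for rocksdb::ColumnFamilyOptions (fields chosen for illustration).
struct CFOptions {
  bool disable_auto_compactions = false;
  int level0_file_num_compaction_trigger = 4;
};

struct CFDescriptor {
  std::string name;
  CFOptions options;
};

// Sketch: derive per-CF descriptors whose options prevent any automatic
// compaction, so the migration fully controls compaction manually.
std::vector<CFDescriptor> MakeNoCompactionDescriptors(
    const std::vector<std::string>& cf_names, const CFOptions& base) {
  std::vector<CFDescriptor> descs;
  for (const auto& name : cf_names) {
    CFOptions opts = base;
    opts.disable_auto_compactions = true;  // no background compaction
    opts.level0_file_num_compaction_trigger = 1 << 30;  // L0 never triggers
    descs.push_back({name, opts});
  }
  return descs;
}
```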
```cpp
  cro.target_level = dest_level;

  if (dest_level == 0) {
    // cannot use kForceOptimized because the compaction is expected to
```
Do you know why we want a single L0 file?
Looking at the history (420bdb4), it was to prevent a trivial move to L0 from violating an assertion. Will add a note.
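A minimal sketch of the decision being discussed, with the enum as a stand-in and the rationale assumed from this thread (the real code uses rocksdb::CompactRangeOptions and its bottommost_level_compaction field):

```cpp
// Stand-in for rocksdb::BottommostLevelCompaction.
enum class BottommostLevelCompaction { kForce, kForceOptimized };

// Sketch: when compacting down to L0, force a full rewrite so the result is
// a single L0 file; an optimized path could otherwise trivially move files
// into L0 and trip the assertion referenced in commit 420bdb4.
BottommostLevelCompaction ChooseBottommostCompaction(int dest_level) {
  return dest_level == 0 ? BottommostLevelCompaction::kForce
                         : BottommostLevelCompaction::kForceOptimized;
}
```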
```cpp
  // Step 6: Reopen DB if needed to rewrite manifest
  if (s.ok() && any_need_reopen) {
    db.reset();
```
Fixed - had to fix a few more places here where reset() had been used to close the db.
(force-pushed from 49e423e to 3436d15)
Context/Summary:
This PR adds multi-CF support to option migration. The original implementation sets options, opens the db, compacts files, and reopens the db in almost all of the three branches below. Such a design makes expanding to multi-CF difficult, as every change must be repeated within each branch, causing code redundancy.
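Based on the function names in this summary, the refactored flow might look roughly like the following; all types and signatures here are illustrative stand-ins, not the actual RocksDB definitions:

```cpp
#include <string>
#include <vector>

// Stand-in types for illustration only.
struct Status {
  bool ok() const { return true; }
};
struct CFDescriptor {
  std::string name;
  bool disable_auto_compactions;
};
struct DB {
  std::vector<CFDescriptor> cfs;
};

// Build per-CF descriptors whose options suppress automatic compaction.
std::vector<CFDescriptor> PrepareNoCompactionCFDescriptors(
    const std::vector<std::string>& cf_names) {
  std::vector<CFDescriptor> descs;
  for (const auto& name : cf_names) {
    descs.push_back({name, /*disable_auto_compactions=*/true});
  }
  return descs;
}

DB OpenDBWithCFs(const std::vector<CFDescriptor>& descs) { return DB{descs}; }

// With setup factored out, the migration body handles compaction only.
Status MigrateAllCFs(DB& db) {
  for (auto& cf : db.cfs) {
    (void)cf;  // manually compact this CF toward the target shape
  }
  return Status{};
}

Status OptionChangeMigrationSketch(const std::vector<std::string>& cf_names) {
  auto descs = PrepareNoCompactionCFDescriptors(cf_names);
  DB db = OpenDBWithCFs(descs);
  Status s = MigrateAllCFs(db);
  // A reopen with the final options would follow here if needed.
  return s;
}
```

The point of this shape is that adding multi-CF support only touches the loop in MigrateAllCFs(), rather than every branch of the old single-CF flow.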
Therefore this PR refactors OptionChangeMigration() through PrepareNoCompactionCFDescriptors() and OpenDBWithCFs() so MigrateAllCFs() can focus on compaction only.

A few notes:
- ApplySpecialSingleLevelSettings() in PrepareNoCompactionCFDescriptors()
- `return CompactToLevel(new_opts, dbname, new_opts.num_levels - 1, /*l0_file_size=*/0, false);`
- PrepareNoCompactionCFDescriptors() is where we handle those decisions.

Test plan: