Fix a bug where compaction with range deletion can persist kTypeMaxValid in file metadata#14122
cbi42 wants to merge 5 commits into facebook:main
Conversation
@cbi42 thanks. Curious: how far back are you going to backport this fix?
The strategy I discussed with @cbi42 was to just fix forward; any backport would simply relax the check to allow the keys occurring in the manifest, perhaps by inserting an enumerator in the ValueType enum before kMaxValue. (We aren't too worried about outside uses of PreferredSeqno value types being caught up in such a relaxation; it's a niche feature.) And by the way, nothing in (this draft of) this PR fixes the problem with reading existing DBs.
Thanks for the context, @pdillinger. If I am understanding this correctly, the "problem with reading existing DBs" will be fixed by #14127. Some questions:
The "problem with reading existing DBs" applies only to older versions of RocksDB that don't recognize the new enum values. They can be fixed by backporting the new enum types (specifically kTypeValuePreferredSeqno and kTypeColumnFamilyValuePreferredSeqno). We won't be backporting to all versions, but we can help backport to the versions that you use, e.g. 8.11.
…ypeMaxValid in file metadata (facebook#14122)". This reverts commit 8c7c8b8.
…ypeMaxValid in file metadata (facebook#14122)". Add a new unit test to capture the situation found by the stress test. This reverts commit 8c7c8b8.
Summary: Revert "Fix a bug where compaction with range deletion can persist kTypeMaxValid in file metadata (#14122)". Add a new unit test to capture the situation found by the stress test. This reverts commit 8c7c8b8.

Pull Request resolved: #14170
Test Plan: Unit Test
Reviewed By: anand1976
Differential Revision: D88395956
Pulled By: xingbowang
fbshipit-source-id: 226649dc79a86010ad326ffb2eae35109dc96bc4
…etion boundaries (#14184)

Summary:
**Context/Summary:** A truncated range deletion in input files can be output by CompactionIterator with type kTypeMaxValid instead of kTypeRangeDeletion, to satisfy the ordering requirement between the truncated range deletion start key and a file's point keys. There was a plan to skip such keys in #14122, but blockers remain to fulfill that plan.

Resumable compaction cannot yet handle resumption from a range deletion well, and should treat kTypeMaxValid the same as kTypeRangeDeletion for resumption. Previously it didn't, and mistakenly allowed resumption from a delete range. That led to an assertion failure, complaining about lacking the information needed to update file boundaries in the presence of a range deletion while cutting an output file, after the compaction resumed from that delete range and happened to cut the output file shortly after, without any point keys in between.

```
frame #9: 0x00007f4f4743bc93 libc.so.6`__GI___assert_fail(assertion="meta.smallest.size() > 0", file="db/compaction/compaction_outputs.cc", line=530, function="rocksdb::Status rocksdb::CompactionOutputs::AddRangeDels(rocksdb::CompactionRangeDelAggregator&, const rocksdb::Slice*, const rocksdb::Slice*, rocksdb::CompactionIterationStats&, bool, const rocksdb::InternalKeyComparator&, rocksdb::SequenceNumber, std::pair<long unsigned int, long unsigned int>, const rocksdb::Slice&, const string&)") at assert.c:101:3
frame #10: 0x00007f4f4808c68c librocksdb.so.10.9`rocksdb::CompactionOutputs::AddRangeDels(this=0x00007f4f0c27e1a0, range_del_agg=0x00007f4f0c21ecc0, comp_start_user_key=0x0000000000000000, comp_end_user_key=0x0000000000000000, range_del_out_stats=0x00007f4f0dffa140, bottommost_level=false, icmp=0x00007f4ef4c93040, earliest_snapshot=13108729, keep_seqno_range=<unavailable>, next_table_min_key=0x00007f4ef4c8f540, full_history_ts_low="") at compaction_outputs.cc:530:7
frame #11: 0x00007f4f480480dd librocksdb.so.10.9`rocksdb::CompactionJob::FinishCompactionOutputFile(this=0x00007f4f0dffb890, input_status=<unavailable>, prev_table_last_internal_key=0x00007f4f0dffa650, next_table_min_key=0x00007f4ef4c8f540, comp_start_user_key=0x0000000000000000, comp_end_user_key=0x0000000000000000, c_iter=0x00007f4ef4c8f400, sub_compact=0x00007f4f0c27e000, outputs=0x00007f4f0c27e1a0) at compaction_job.cc:1917:31
```

This PR simply prevents kTypeMaxValid from being a resumption point, just like a regular range deletion; see commit 842d66e. Besides that, the PR also improves the testing, variable naming, and logging in the resumable compaction code, which were needed to debug this assertion failure; see commit aecd4e7. These improvements are covered by existing tests.

Pull Request resolved: #14184

Test Plan:
- The stress test initially surfaced the error. Using the exact same LSM shapes and files that were used in the stress test, but in a unit test, I was able to get a deterministic repro and confirmed the fix resolves the error. The repro test is hx235@1075936.

```
./compaction_service_test --gtest_filter=ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume

# Before fix
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ResumableCompactionServiceTest
[ RUN      ] ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
compaction_service_test: db/compaction/compaction_outputs.cc:530: rocksdb::Status rocksdb::CompactionOutputs::AddRangeDels(rocksdb::CompactionRangeDelAggregator&, const rocksdb::Slice*, const rocksdb::Slice*, rocksdb::CompactionIterationStats&, bool, const rocksdb::InternalKeyComparator&, rocksdb::SequenceNumber, std::pair<long unsigned int, long unsigned int>, const rocksdb::Slice&, const string&): Assertion `meta.smallest.size() > 0' failed.
Received signal 6 (Aborted)
Invoking GDB for stack trace...
[New LWP 2621610] [New LWP 2621611] [New LWP 2621612] [New LWP 2621613] [New LWP 2621614] [New LWP 2621630] [New LWP 2621631]

# After fix
Note: Google Test filter = ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ResumableCompactionServiceTest
[ RUN      ] ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
[       OK ] ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume (4722 ms)
[----------] 1 test from ResumableCompactionServiceTest (4722 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4722 ms total)
[  PASSED  ] 1 test.
```

- Follow-up: I tried a couple of times to coerce the truncated range deletion from scratch in the unit test but failed to do so. Considering that kTypeMaxValid may no longer be output by the compaction iterator once https://github.com/facebook/rocksdb/pull/14122/files lands again (which would obsolete the bug), AND the simple nature of this fix (842d66e), AND that the worst case of this fix going wrong is just less resumption, I decided to leave writing a unit test that coerces a truncated range deletion from scratch as a follow-up. Maybe I will draw inspiration from https://github.com/facebook/rocksdb/pull/14122/files.

Reviewed By: jaykorean
Differential Revision: D88912663
Pulled By: hx235
fbshipit-source-id: 80a01135684c8fea659650faaa00c2dc452c482a
Summary: Range deletion start keys are considered during compaction for cutting output files. Due to an ordering requirement (see the comment above InsertNextValidRangeTombstoneAtLevel()) between a truncated range deletion start key and a file's point keys, there was logic in:

- rocksdb/db/range_del_aggregator.cc, line 39 (at f6c9c3b)
- rocksdb/db/dbformat.h, lines 75 to 76 (at f6c9c3b)
For existing SST files, we want to avoid using this kTypeMaxValid, so this PR also introduces a new placeholder value type. This allows us to re-strengthen the relevant value type checks (IsExtendedValueType()) that were loosened for kTypeMaxValid.
Test plan:
python3 ./tools/db_crashtest.py blackbox --delrangepercent=11 --readpercent=35