
Fix a bug where compaction with range deletion can persist kTypeMaxValid in file metadata #14122

Closed
cbi42 wants to merge 5 commits into facebook:main from cbi42:fix-range-del-file-boundary

Conversation


@cbi42 cbi42 commented Nov 13, 2025

Summary: Range deletion start keys are considered during compaction for cutting output files. Due to an ordering requirement between a truncated range deletion start key and a file's point keys (see the comment above InsertNextValidRangeTombstoneAtLevel()), there was logic that changed the value type to kTypeMaxValid:

    parsed_smallest.type = kTypeMaxValid;

However, kTypeMaxValid is not supposed to be persisted, per

rocksdb/db/dbformat.h, lines 75 to 76 at f6c9c3b:

    kTypeMaxValid,  // Should be after the last valid type, only used for
                    // validation

This can cause the forward compatibility issue reported in #14101. This PR fixes it by removing the logic that sets kTypeMaxValid and by always skipping the truncated range deletion start key in CompactionMergingIterator.

For existing SST files, we want to avoid using kTypeMaxValid, so this PR also introduces a new placeholder value type. This allows us to re-strengthen the relevant value type check (IsExtendedValueType()) that had been loosened to accommodate kTypeMaxValid.

Test plan:

  • a unit test that persists kTypeMaxValid before this fix
  • crash test with frequent range deletions: python3 ./tools/db_crashtest.py blackbox --delrangepercent=11 --readpercent=35
  • generate SST files with 0x1A as the value type (kTypeMaxValid before this change) in file metadata, then run ldb with the strengthened IsExtendedValueType() check to dump the MANIFEST. As expected, it failed to parse the MANIFEST before this PR and succeeds after it:

```
Error in processing file /tmp/rocksdbtest-543376/db_range_del_test_2549357_6547198162080866792/MANIFEST-000005 Corruption: VersionEdit: new-file4 entry  The file /tmp/rocksdbtest-543376/db_range_del_test_2549357_6547198162080866792/MANIFEST-000005 may be corrupted.
```

@meta-cla meta-cla bot added the CLA Signed label Nov 13, 2025

meta-codesync bot commented Nov 13, 2025

@cbi42 has imported this pull request. If you are a Meta employee, you can view this in D87016541.

@cbi42 cbi42 marked this pull request as ready for review November 14, 2025 18:39

spraza commented Nov 17, 2025

@cbi42 thanks, curious how far behind are you going to backport this fix?

@pdillinger
Contributor

> @cbi42 thanks, curious how far behind are you going to backport this fix?

The strategy I discussed with @cbi42 was to just fix-forward and any backport would be simply relaxing the check to allow the keys occurring in the manifest, perhaps by simply inserting an enumerator in the ValueType enum before kMaxValue. (We aren't too worried about outside uses of PreferredSeqno value types being caught up in such a relaxation; it's a niche feature.)

And btw, nothing in (this draft of) this PR fixes the problem with reading existing DBs.


spraza commented Nov 20, 2025

> > @cbi42 thanks, curious how far behind are you going to backport this fix?
>
> The strategy I discussed with @cbi42 was to just fix-forward and any backport would be simply relaxing the check to allow the keys occurring in the manifest, perhaps by simply inserting an enumerator in the ValueType enum before kMaxValue. (We aren't too worried about outside uses of PreferredSeqno value types being caught up in such a relaxation; it's a niche feature.)
>
> And btw, nothing in (this draft of) this PR fixes the problem with reading existing DBs.

Thanks for the context, @pdillinger.

If I am understanding this correctly, the "problem with reading existing DBs" will be fixed by #14127. Some questions:

  1. Is my understanding above correct?
  2. If so, looks like we are ok with backporting this change to 8.11 (MANIFEST file incompatible between 9.7.3 to 8.11.4 version #14101 (comment)). Is that correct?
  3. If so, how long before 8.11 can be patched?


cbi42 commented Nov 20, 2025

> If I am understanding this correctly, the "problem with reading existing DBs" will be fixed by #14127. Some questions:
>
>   1. Is my understanding above correct?
>   2. If so, looks like we are ok with backporting this change to 8.11 (MANIFEST file incompatible between 9.7.3 to 8.11.4 version #14101 (comment)). Is that correct?
>   3. If so, how long before 8.11 can be patched?

The "problem with reading existing DBs" applies only to older versions of RocksDB that don't recognize the new enum values. They can be fixed by backporting the new enum types (specifically kTypeValuePreferredSeqno and kTypeColumnFamilyValuePreferredSeqno).

We won't be backporting to all versions, but we can help backport to the versions that you use, e.g. 8.11.

@pdillinger pdillinger left a comment

With #14127, LGTM


meta-codesync bot commented Nov 21, 2025

@cbi42 merged this pull request in 8c7c8b8.

xingbowang added a commit to xingbowang/rocksdb that referenced this pull request Dec 4, 2025
…ypeMaxValid in file metadata (facebook#14122)"

Add a new unit test to capture the situation found by stress test

This reverts commit 8c7c8b8.
meta-codesync bot pushed a commit that referenced this pull request Dec 4, 2025
Summary:
Revert "Fix a bug where compaction with range deletion can persist kTypeMaxValid in file metadata (#14122)"

Add a new unit test to capture the situation found by stress test

This reverts commit 8c7c8b8.

Pull Request resolved: #14170

Test Plan: Unit Test

Reviewed By: anand1976

Differential Revision: D88395956

Pulled By: xingbowang

fbshipit-source-id: 226649dc79a86010ad326ffb2eae35109dc96bc4
xingbowang added a commit that referenced this pull request Dec 4, 2025
meta-codesync bot pushed a commit that referenced this pull request Dec 12, 2025
…etion boundaries (#14184)

Summary:
**Context/Summary:**

Truncated range deletions in input files can be output by CompactionIterator with type kMaxValid instead of kTypeRangeDeletion, to satisfy an ordering requirement between the truncated range deletion start key and a file's point keys. There was a plan to skip such keys in #14122, but blockers remain before that plan can be fulfilled.

Resumable compaction cannot yet handle resumption from a range deletion and should, for resumption purposes, treat the kMaxValid type the same as kTypeRangeDeletion. Previously it did not, and it mistakenly allowed resumption from a range deletion. That led to an assertion failure complaining about missing information for updating file boundaries in the presence of range deletions while cutting an output file, after the compaction resumed from that range deletion and happened to cut an output file shortly afterward without any point keys in between.

```
frame #9: 0x00007f4f4743bc93 libc.so.6`__GI___assert_fail(assertion="meta.smallest.size() > 0", file="db/compaction/compaction_outputs.cc", line=530, function="rocksdb::Status rocksdb::CompactionOutputs::AddRangeDels(rocksdb::CompactionRangeDelAggregator&, const rocksdb::Slice*, const rocksdb::Slice*, rocksdb::CompactionIterationStats&, bool, const rocksdb::InternalKeyComparator&, rocksdb::SequenceNumber, std::pair<long unsigned int, long unsigned int>, const rocksdb::Slice&, const string&)") at assert.c:101:3
frame #10: 0x00007f4f4808c68c librocksdb.so.10.9`rocksdb::CompactionOutputs::AddRangeDels(this=0x00007f4f0c27e1a0, range_del_agg=0x00007f4f0c21ecc0, comp_start_user_key=0x0000000000000000, comp_end_user_key=0x0000000000000000, range_del_out_stats=0x00007f4f0dffa140, bottommost_level=false, icmp=0x00007f4ef4c93040, earliest_snapshot=13108729, keep_seqno_range=<unavailable>, next_table_min_key=0x00007f4ef4c8f540, full_history_ts_low="") at compaction_outputs.cc:530:7
frame #11: 0x00007f4f480480dd librocksdb.so.10.9`rocksdb::CompactionJob::FinishCompactionOutputFile(this=0x00007f4f0dffb890, input_status=<unavailable>, prev_table_last_internal_key=0x00007f4f0dffa650, next_table_min_key=0x00007f4ef4c8f540, comp_start_user_key=0x0000000000000000, comp_end_user_key=0x0000000000000000, c_iter=0x00007f4ef4c8f400, sub_compact=0x00007f4f0c27e000, outputs=0x00007f4f0c27e1a0) at compaction_job.cc:1917:31
```

This PR simply prevents kMaxValid from being a resumption point, just like a regular range deletion (see commit 842d66e).

Besides that, the PR also improves the testing, variable naming, and logging in the resumable compaction code, which were needed to debug this assertion failure (see commit aecd4e7). These improvements are covered by existing tests.

Pull Request resolved: #14184

Test Plan:
- The stress test initially surfaced the error. Using the exact same LSM shapes and files as the stress test, but in a unit test, I was able to get a deterministic repro and confirmed that the fix resolves the error. The repro test is hx235@1075936:
```
./compaction_service_test --gtest_filter=ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
# Before fix
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ResumableCompactionServiceTest
[ RUN      ] ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
compaction_service_test: db/compaction/compaction_outputs.cc:530: rocksdb::Status rocksdb::CompactionOutputs::AddRangeDels(rocksdb::CompactionRangeDelAggregator&, const rocksdb::Slice*, const rocksdb::Slice*, rocksdb::CompactionIterationStats&, bool, const rocksdb::InternalKeyComparator&, rocksdb::SequenceNumber, std::pair<long unsigned int, long unsigned int>, const rocksdb::Slice&, const string&): Assertion `meta.smallest.size() > 0' failed.
Received signal 6 (Aborted)
Invoking GDB for stack trace...
[New LWP 2621610]
[New LWP 2621611]
[New LWP 2621612]
[New LWP 2621613]
[New LWP 2621614]
[New LWP 2621630]
[New LWP 2621631]

# After fix
Note: Google Test filter = ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ResumableCompactionServiceTest
[ RUN      ] ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
[       OK ] ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume (4722 ms)
[----------] 1 test from ResumableCompactionServiceTest (4722 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4722 ms total)
[  PASSED  ] 1 test.

```
- Follow-up: I tried a couple of times to coerce the truncated range deletion from scratch in a unit test but failed to do so. Considering that kMaxValid may no longer be output by the compaction iterator once https://github.com/facebook/rocksdb/pull/14122/files lands again (which would make this bug obsolete), the simple nature of the fix in 842d66e, AND that the worst case of the fix going wrong is merely less resumption, I decided to leave writing a unit test that coerces a truncated range deletion from scratch as a follow-up. Maybe I will draw inspiration from https://github.com/facebook/rocksdb/pull/14122/files.

Reviewed By: jaykorean

Differential Revision: D88912663

Pulled By: hx235

fbshipit-source-id: 80a01135684c8fea659650faaa00c2dc452c482a
hx235 added a commit to hx235/rocksdb that referenced this pull request Dec 12, 2025
…etion boundaries (facebook#14184)
