DROP support for block-based SST format_version < 2 #14315
Closed
pdillinger wants to merge 3 commits into facebook:main from
Summary: ... and remove some old code and tech debt in the process.

This is arguably a great milestone and precedent in RocksDB history, as for the first time we are explicitly dropping support for reading source-of-truth data in old formats. (We previously dropped support for reading some old bloom filters, but those are performance optimizers, not source-of-truth. facebook#10184) However, DBs written with default settings since release 4.6.0, which came out very nearly 10 years ago, can still be read. And by using compaction with intermediate versions, there's an upgrade path going back to (AFAIK) early releases of LevelDB (from which RocksDB was forked).

Some detail:
* The magic number for LevelDB SST files (0xdb4775248b80fb57, most recently called kLegacyBlockBasedTableMagicNumber) now only exists in the code to provide a good error message and to test that good error message.
* There is some notable refactoring and renaming around format_version handling. This is a bit of a messy area of code: because the footer code is shared between different table formats (block-based, plain, cuckoo), format_version in the footer is in some ways tied to all of them, but in other ways is tied only to block-based table, where we have been making updates. Hopefully code comments keep this clear.
* Now that there are old format_versions we can't read (and can't write authoritatively in tests), I've needed to split kMinSupportedFormatVersion into one constant for reads and one for writes, currently both at format_version=2. Comments describe how to update these in the future.
* The idea of versioning the compression format is basically going away, though we're keeping BuiltinV2 in places just because it's already there. There's lots of room in the BuiltinV2 schema to expand to new built-in compression types, or new ways of handling existing compression algorithms. CompressionManager with CompatibilityName gives users the power to customize compression without the need for versions tied to format_version.

Immediate follow-up:
* Clean up compression loose ends like OLD_Compress

Suggested follow-up:
* Update plain table builder to migrate to new footer version so that we can drop support for legacy footer. We have to be careful that the (likely untested) forward compatibility path I put in place a while back works (or fix it and wait a while) before dropping support for plain table with legacy footer.

Test Plan:
* Some tests updated / added
* A couple tests are obsolete: removed
* Also updated format compatible test, which now doesn't need to dig as far back into history building RocksDB.
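To illustrate the first detail above (the legacy magic number surviving only to produce a good error message), here is a minimal sketch. It is not the code from this PR: `CheckFooterMagic`, `DecodeFixed64LE`, the error strings, and the assumption that the magic number occupies the last 8 bytes of the file in little-endian order are all illustrative. Only the constant's name and value come from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Name and value as given in the PR description.
constexpr uint64_t kLegacyBlockBasedTableMagicNumber = 0xdb4775248b80fb57ull;

// Decode 8 bytes as a little-endian unsigned 64-bit integer.
inline uint64_t DecodeFixed64LE(const unsigned char* p) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i) {
    v = (v << 8) | p[i];
  }
  return v;
}

// Hypothetical helper: returns an empty string when the trailing magic number
// is not the legacy one, otherwise a descriptive error. Assumes (for this
// sketch) that the magic number is the last 8 bytes, little-endian.
inline std::string CheckFooterMagic(const std::string& file_tail) {
  if (file_tail.size() < 8) {
    return "Corruption: file too short to contain a footer";
  }
  const unsigned char* p = reinterpret_cast<const unsigned char*>(
      file_tail.data() + file_tail.size() - 8);
  if (DecodeFixed64LE(p) == kLegacyBlockBasedTableMagicNumber) {
    return "NotSupported: legacy block-based SST (format_version < 2); "
           "compact with an intermediate release to upgrade";
  }
  return "";
}
```

The point of keeping the constant is that a reader can distinguish "old but recognizable" from "corrupt" and tell the user about the upgrade path rather than reporting a generic failure.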
@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D92577766.
hx235 reviewed Feb 10, 2026

    // When phasing out old format versions, first increase the write minimum,
    // then later increase the read minimum when removing the implementation for
    // both read and write.
    constexpr uint32_t kMinSupportedBbtFormatVersionForWrite = 2;

Is it worth adding an assert that kMinSupportedBbtFormatVersionForWrite >= kMinSupportedBbtFormatVersionForRead? InitializeOptions() depends on this invariant.
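One way to express the invariant raised in this comment is a compile-time check. The sketch below is illustrative only: the two minimum constants use the names and values quoted in this conversation, while `kLatestBbtFormatVersion` (and its value), `IsSupportedForRead`, and `IsSupportedForWrite` are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstdint>

// Names and values as quoted in the PR (both minimums are currently 2).
constexpr uint32_t kMinSupportedBbtFormatVersionForRead = 2;
constexpr uint32_t kMinSupportedBbtFormatVersionForWrite = 2;
// Hypothetical upper bound, chosen only so the example is complete.
constexpr uint32_t kLatestBbtFormatVersion = 7;

// Compile-time guard: anything we are willing to write must also be readable.
// If someone raises the read minimum above the write minimum, the build fails.
static_assert(kMinSupportedBbtFormatVersionForWrite >=
                  kMinSupportedBbtFormatVersionForRead,
              "write minimum must not be below read minimum");

// Hypothetical helpers showing how the two constants would be consulted.
constexpr bool IsSupportedForRead(uint32_t version) {
  return version >= kMinSupportedBbtFormatVersionForRead &&
         version <= kLatestBbtFormatVersion;
}

constexpr bool IsSupportedForWrite(uint32_t version) {
  return version >= kMinSupportedBbtFormatVersionForWrite &&
         version <= kLatestBbtFormatVersion;
}
```

A `static_assert` costs nothing at runtime and documents the phase-out order described in the code comment: raise the write minimum first, then the read minimum.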
hx235 reviewed Feb 10, 2026

    inline bool IsSupportedFormatVersion(uint32_t version) {
      return version <= kLatestFormatVersion;
    // Minimum format version supported for writing new SST files in block-based
    // format. This should be >= kMinSupportedFormatVersionForRead.

">= kMinSupportedBbtFormatVersionForRead"?
@pdillinger merged this pull request in d8b1893.
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request on Feb 11, 2026

Summary: See facebook#14240 which brought this to my attention. Here I've added range deletions and compactions to the format compatible test, and fixed (likely longstanding) compatibility issues.

The first fix was in Version::MaybeInitializeFileMetaData for an assertion failure simply from adding range deletions from some 5.x version.

The second fix is a broader work-around for older SST files with unreliable num_entries/num_range_deletions/num_deletions statistics in their table properties. We depend on them only for some paranoid checks for compaction, so in my assessment the best way to deal with those files is to exclude the paranoid checks when dealing with the files with unreliable data. (Details in code comments.) The important part is that compacting old files is exceptionally rare, so we aren't really interfering with the paranoid checks doing their job on an ongoing basis.

This depends on facebook#14315 (just landed) because there is a remaining undiagnosed problem with some very early releases, but I'm not fixing that because support for them is being dropped.

Test Plan: test extended (ran locally excluding some releases)
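The work-around described above, skipping a paranoid count check when any input file's statistics cannot be trusted, might be sketched as follows. This is an assumption-laden illustration, not the change itself: `InputFileStats`, `EntryCountCheck`, `VerifyInputEntryCount`, and the `stats_reliable` flag are hypothetical names, and how reliability is actually determined is left to the code comments the commit refers to.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-file view of the table properties used by the check.
struct InputFileStats {
  uint64_t num_entries;  // entry count recorded in table properties
  bool stats_reliable;   // false for old files whose counts cannot be trusted
};

enum class EntryCountCheck { kPassed, kFailed, kSkipped };

// Compare the number of entries a compaction actually read against the sum
// recorded in the input files. If any input has unreliable statistics, the
// comparison is meaningless, so it is skipped rather than reported as failure.
inline EntryCountCheck VerifyInputEntryCount(
    const std::vector<InputFileStats>& inputs, uint64_t entries_read) {
  uint64_t expected = 0;
  for (const InputFileStats& f : inputs) {
    if (!f.stats_reliable) {
      return EntryCountCheck::kSkipped;
    }
    expected += f.num_entries;
  }
  return expected == entries_read ? EntryCountCheck::kPassed
                                  : EntryCountCheck::kFailed;
}
```

Because old files are compacted away once and never reappear, a skipped check affects only that first compaction; later compactions over freshly written files are checked as usual.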
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request on Feb 12, 2026

Summary: In follow-up to facebook#14315

Remove obsolete code replaced by new Compressor/Decompressor interface:
* CompressionInfo and UncompressionInfo classes
* UncompressionDict class
* Individual compression/decompression functions (Snappy_*, Zlib_*, BZip2_*, LZ4_*, LZ4HC_*, XPRESS_*, ZSTD_Compress, ZSTD_Uncompress)
* OLD_CompressData and OLD_UncompressData
* compression::PutDecompressedSizeInfo and GetDecompressedSizeInfo

The only small refactoring in this change that is not pure code removal or movement is in blob_file_builder_test.cc.

Move some function implementations etc. from compression.h to compression.cc:
* CompressionTypeToString, CompressionTypeFromString, CompressionOptionsToString
* ZSTD_TrainDictionary (both overloads), ZSTD_FinalizeDictionary
* DecompressorDict::Populate
* Most compression library includes

Also cleaned up other includes of compression.h, which caused some other files to need new includes.

Test Plan: existing tests
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request on Feb 12, 2026

…ok#14315)

Summary: After PR facebook#14315 dropped support for block-based table format_version < 2, several code paths became obsolete. This change removes them.

Investigation findings:

1. Table properties are now a hard requirement for block-based SST files:
   - format_version >= 2 guarantees a properties block exists
   - Removed defensive conditionals like `if (rep_->table_properties)`
   - Missing properties block now returns Status::Corruption instead of just logging an error. This is important because some properties affect the semantic interpretation of the file.
2. Index type property (kIndexType) is now required:
   - kIndexType was introduced in Feb 2014 (commit 74939a9), ~11 months BEFORE format_version was introduced in Jan 2015
   - BlockBasedTablePropertiesCollector::Finish() has always written kIndexType unconditionally for all block-based tables
   - Therefore all format_version >= 2 files have this property
   - Now returns Status::Corruption if missing instead of silently defaulting to kBinarySearch
3. Removed SetOldTableOptions() from sst_file_dumper:
   - This fallback handled files without a properties block
   - Dead code since format_version >= 2 guarantees properties exist
4. Removed kPropertiesBlockOldName ("rocksdb.stats") fallback:
   - The properties block was renamed from "rocksdb.stats" to "rocksdb.properties" in RocksDB 2.7 (April 2014)
   - format_version 2 was introduced in RocksDB 3.10 (Oct 2015)
   - All table formats (block-based, plain, cuckoo) were created after the rename, so they all use "rocksdb.properties"
   - The backward compatibility fallback in FindOptionalMetaBlock() was dead code for all supported table formats
5. Removed obsolete assertion about format_version 0 checksum in BlockBasedTableBuilder::WriteFooter()

Test Plan: some tests updated for updated requirements. Mostly, CI including format compatible test
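The shift in findings 1 and 2, from silently defaulting to failing with a corruption status when a required property is absent, can be sketched as below. This is only an illustration of the pattern: the `Status` struct is a tiny stand-in rather than RocksDB's class, and `GetRequiredProperty` and the `"index_type"` key used in the usage note are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimal stand-in for a status type; not RocksDB's Status class.
struct Status {
  enum Code { kOk, kCorruption };
  Code code;
  std::string message;

  bool ok() const { return code == kOk; }
  static Status OK() { return Status{kOk, ""}; }
  static Status Corruption(const std::string& msg) {
    return Status{kCorruption, msg};
  }
};

// Hypothetical lookup: a property that every supported file must carry is
// treated as required. Absence is reported as corruption instead of being
// papered over with a default value.
inline Status GetRequiredProperty(
    const std::map<std::string, std::string>& properties,
    const std::string& name, std::string* value) {
  auto it = properties.find(name);
  if (it == properties.end()) {
    return Status::Corruption("missing required table property: " + name);
  }
  *value = it->second;
  return Status::OK();
}
```

For example, calling `GetRequiredProperty(props, "index_type", &v)` on an empty map yields a corruption status, so the caller cannot accidentally interpret the file with a guessed index type.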
meta-codesync Bot pushed a commit that referenced this pull request on Feb 13, 2026

Summary: See #14240 which brought this to my attention. Here I've added range deletions and compactions to the format compatible test, and fixed or worked around compatibility issues (likely longstanding).

The first fix was in Version::MaybeInitializeFileMetaData for an assertion failure simply from adding range deletions from some 5.x version.

The second fix is a broader work-around for older SST files with unreliable num_entries/num_range_deletions/num_deletions statistics in their table properties. We depend on them only for some paranoid checks for compaction, so in my assessment the best way to deal with those files is to exclude the paranoid checks when dealing with the files with unreliable data. (Details in code comments.) The important part is that compacting old files is exceptionally rare, so we aren't really interfering with the paranoid checks doing their job on an ongoing basis.

This depends on #14315 (just landed) because there is a remaining undiagnosed problem with some very early releases, but I'm not fixing that because support for them is being dropped.

Pull Request resolved: #14323

Test Plan: test extended (ran locally excluding some releases)

Reviewed By: xingbowang
Differential Revision: D93032653
Pulled By: pdillinger
fbshipit-source-id: f90b32f30ba4764692e68d23705f42c778e0dc1d
meta-codesync Bot pushed a commit that referenced this pull request on Feb 13, 2026

Summary: In follow-up to #14315

Remove obsolete code replaced by new Compressor/Decompressor interface:
* OLD_CompressData and OLD_UncompressData
* Individual compression/decompression functions (Snappy_*, Zlib_*, BZip2_*, LZ4_*, LZ4HC_*, XPRESS_*, ZSTD_Compress, ZSTD_Uncompress)
* CompressionInfo and UncompressionInfo classes
* UncompressionDict class
* compression::PutDecompressedSizeInfo and GetDecompressedSizeInfo

The only small refactoring in this change that is not pure code removal or movement is in blob_file_builder_test.cc.

Move some function implementations etc. from compression.h to compression.cc:
* CompressionTypeToString, CompressionTypeFromString, CompressionOptionsToString
* ZSTD_TrainDictionary (both overloads), ZSTD_FinalizeDictionary
* DecompressorDict::Populate
* Most compression library includes

Also cleaned up other includes of compression.h, which caused some other files to need new includes.

Pull Request resolved: #14325

Test Plan: existing tests

Reviewed By: hx235
Differential Revision: D93120580
Pulled By: pdillinger
fbshipit-source-id: ab5c50db7379c0387a8c0e379642c9ea2799eae5
meta-codesync Bot pushed a commit that referenced this pull request on Feb 18, 2026

#14327)

Summary: After PR #14315 dropped support for block-based table format_version < 2, several code paths became obsolete. This change removes them.

Investigation findings:

1. Table properties are now a hard requirement for block-based SST files:
   - format_version >= 2 guarantees a properties block exists
   - Removed defensive conditionals like `if (rep_->table_properties)`
   - Missing properties block now returns Status::Corruption instead of just logging an error. This is important because some properties affect the semantic interpretation of the file.
2. Index type property (kIndexType) is now required:
   - kIndexType was introduced in Feb 2014 (commit 74939a9), ~11 months BEFORE format_version was introduced in Jan 2015
   - BlockBasedTablePropertiesCollector::Finish() has always written kIndexType unconditionally for all block-based tables
   - Therefore all format_version >= 2 files have this property
   - Now returns Status::Corruption if missing instead of silently defaulting to kBinarySearch
3. Removed SetOldTableOptions() from sst_file_dumper:
   - This fallback handled files without a properties block
   - Dead code since format_version >= 2 guarantees properties exist
4. Removed kPropertiesBlockOldName ("rocksdb.stats") fallback:
   - The properties block was renamed from "rocksdb.stats" to "rocksdb.properties" in RocksDB 2.7 (April 2014)
   - format_version 2 was introduced in RocksDB 3.10 (Oct 2015)
   - All table formats (block-based, plain, cuckoo) were created after the rename, so they all use "rocksdb.properties"
   - The backward compatibility fallback in FindOptionalMetaBlock() was dead code for all supported table formats
5. Removed obsolete assertion about format_version 0 checksum in BlockBasedTableBuilder::WriteFooter()

Pull Request resolved: #14327

Test Plan: some tests updated for updated requirements. Mostly, CI including format compatible test

Reviewed By: mszeszko-meta
Differential Revision: D93124820
Pulled By: pdillinger
fbshipit-source-id: eb12cbdca0e69f34a08051d5160c282384128a4a