DROP support for block-based SST format_version < 2 #14315

Closed
pdillinger wants to merge 3 commits into facebook:main from pdillinger:remove_fv_1

@pdillinger

Summary: ... and remove some old code and tech debt in the process.

This is arguably a great milestone and precedent in RocksDB history, as for the first time we are explicitly dropping support for reading source-of-truth data in old formats. (We previously dropped support for reading some old bloom filters, but those are performance optimizations, not source-of-truth data. #10184) However, DBs written with default settings since release 4.6.0, nearly 10 years ago, can still be read. And by running compaction with intermediate versions, there's an upgrade path going back to (AFAIK) early releases of LevelDB (from which RocksDB was forked).

Some detail:

  • The magic number for LevelDB SST files (0xdb4775248b80fb57, most recently called kLegacyBlockBasedTableMagicNumber) now only exists in the code to provide a good error message and to test that good error message.
  • There is some notable refactoring and renaming around format_version handling. This is a somewhat messy area of code: because the footer code is shared between different table formats (block-based, plain, cuckoo), format_version in the footer is in some ways tied to all of them, but in other ways tied only to block-based table, where we have been making updates. Hopefully code comments keep this clear.
  • Now that there are old format_versions we can't read (and can't write authoritatively in tests), I've needed to split kMinSupportedFormatVersion into separate constants for reads and for writes, currently both at format_version=2. Comments describe how to update these in the future.
  • The idea of versioning the compression format is basically going away, though we're keeping BuiltinV2 in places just because it's already there. There's lots of room in the BuiltinV2 schema to expand to new built-in compression types, or new ways of handling existing compression algorithms. CompressionManager with CompatibilityName gives users the power to customize compression without the need for versions tied to format_version.
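To make the read/write split and the legacy-magic error concrete, here is a minimal C++ sketch, not the actual RocksDB implementation. The constant names follow table/format.h as quoted in the review threads (kMinSupportedBbtFormatVersionForRead is assumed from the reviewer's suggested rename), the value of kLatestFormatVersion is an assumption, and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Names follow table/format.h; the Read constant's name and the value of
// kLatestFormatVersion are assumptions for illustration.
constexpr uint32_t kMinSupportedBbtFormatVersionForRead = 2;
constexpr uint32_t kMinSupportedBbtFormatVersionForWrite = 2;
constexpr uint32_t kLatestFormatVersion = 7;  // assumed value
// Magic number of LevelDB-era SST files, kept only for the error message.
constexpr uint64_t kLegacyBlockBasedTableMagicNumber = 0xdb4775248b80fb57ULL;

// Hypothetical helpers (not RocksDB API). Each returns an empty string when
// the input is acceptable, otherwise a human-readable error message.
inline std::string CheckMagicNumber(uint64_t magic) {
  if (magic == kLegacyBlockBasedTableMagicNumber) {
    return "Legacy (LevelDB-style) block-based SST file is no longer "
           "supported; rewrite it via compaction with an older release";
  }
  return "";
}

inline std::string CheckFormatVersionForRead(uint32_t version) {
  if (version < kMinSupportedBbtFormatVersionForRead ||
      version > kLatestFormatVersion) {
    return "Unsupported format_version for reading: " +
           std::to_string(version);
  }
  return "";
}

inline std::string CheckFormatVersionForWrite(uint32_t version) {
  if (version < kMinSupportedBbtFormatVersionForWrite ||
      version > kLatestFormatVersion) {
    return "Unsupported format_version for writing: " +
           std::to_string(version);
  }
  return "";
}
```

In this sketch a reader would check the magic number before decoding format_version from the footer, so a LevelDB-era file gets the specific message rather than a generic corruption error.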

Immediate follow-up:

  • Clean up compression loose ends like OLD_Compress, OLD_Uncompress

Suggested follow-up:

  • Update plain table builder to migrate to new footer version so that we can drop support for legacy footer. We have to be careful that the (likely untested) forward compatibility path I put in place a while back works (or fix it and wait a while) before dropping support for plain table with legacy footer.

Test Plan:

  • Some tests updated / added
  • A couple tests are obsolete: removed
  • Also updated format compatible test, which now doesn't need to dig as far back into history building RocksDB.

pdillinger requested a review from hx235 February 7, 2026 00:16
meta-cla Bot added the CLA Signed label Feb 7, 2026

meta-codesync Bot commented Feb 7, 2026

@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D92577766.

Comment thread table/format.h
// When phasing out old format versions, first increase the write minimum,
// then later increase the read minimum when removing the implementation for
// both read and write.
constexpr uint32_t kMinSupportedBbtFormatVersionForWrite = 2;

is it worth adding an assert on kMinSupportedBbtFormatVersionForWrite >= kMinSupportedBbtFormatVersionForRead? The InitializeOptions() depends on this invariant.
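A sketch of the suggested check, assuming C++17: a static_assert turns the invariant into a compile-time failure instead of something InitializeOptions() only implicitly relies on. The two predicates are hypothetical (not RocksDB API) and exist only to show that, with the invariant in place, every writable format_version is also readable.

```cpp
#include <cassert>
#include <cstdint>

// Values as in this PR's table/format.h (Read constant name per the review).
constexpr uint32_t kMinSupportedBbtFormatVersionForRead = 2;
constexpr uint32_t kMinSupportedBbtFormatVersionForWrite = 2;

// Fails the build if the read minimum is ever raised above the write
// minimum, which would let us write files we can no longer read.
static_assert(kMinSupportedBbtFormatVersionForWrite >=
                  kMinSupportedBbtFormatVersionForRead,
              "BBT write minimum format_version must be >= read minimum");

// Hypothetical predicates; `latest` stands in for kLatestFormatVersion.
inline bool IsReadableBbtFormatVersion(uint32_t v, uint32_t latest) {
  return v >= kMinSupportedBbtFormatVersionForRead && v <= latest;
}

inline bool IsWritableBbtFormatVersion(uint32_t v, uint32_t latest) {
  return v >= kMinSupportedBbtFormatVersionForWrite && v <= latest;
}
```

Following the comment in format.h, phasing out a version means bumping the write minimum first (the static_assert still holds) and only later raising the read minimum to match.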

Comment thread table/format.h
inline bool IsSupportedFormatVersion(uint32_t version) {
return version <= kLatestFormatVersion;
// Minimum format version supported for writing new SST files in block-based
// format. This should be >= kMinSupportedFormatVersionForRead.

hx235 Feb 10, 2026

">= kMinSupportedBbtFormatVersionForRead"?

hx235 left a comment

Tests look good!

meta-codesync Bot commented Feb 11, 2026

@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D92577766.

meta-codesync Bot closed this in d8b1893 Feb 11, 2026

meta-codesync Bot commented Feb 11, 2026

@pdillinger merged this pull request in d8b1893.

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Feb 11, 2026
Summary: See facebook#14240 which brought this to my attention. Here I've added
range deletions and compactions to the format compatible test, and fixed
(likely longstanding) compatibility issues.

The first fix was in Version::MaybeInitializeFileMetaData for an assertion
failure simply from adding range deletions from some 5.x version.

The second fix is a broader work-around for older SST files with
unreliable num_entries/num_range_deletions/num_deletions statistics in
their table properties. We depend on them only for some paranoid checks
for compaction, so in my assessment the best way to deal with those files
is to exclude the paranoid checks when dealing with the files with
unreliable data. (Details in code comments.) The important part is that
compacting old files is exceptionally rare, so we aren't really
interfering with the paranoid checks doing their job on an ongoing
basis.

This depends on facebook#14315 (just landed) because there is a remaining
undiagnosed problem with some very early releases, but I'm not fixing
that because support for those releases is being dropped.

Test Plan: test extended (ran locally excluding some releases)
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Feb 12, 2026
Summary: In follow-up to facebook#14315

Remove obsolete code replaced by new Compressor/Decompressor interface:
* CompressionInfo and UncompressionInfo classes
* UncompressionDict class
* Individual compression/decompression functions (Snappy_*, Zlib_*, BZip2_*,
  LZ4_*, LZ4HC_*, XPRESS_*, ZSTD_Compress, ZSTD_Uncompress)
* OLD_CompressData and OLD_UncompressData
* compression::PutDecompressedSizeInfo and GetDecompressedSizeInfo

The only small refactoring in this change that is not pure code removal
or movement is in blob_file_builder_test.cc.

Move some function implementations etc. from compression.h to compression.cc:
* CompressionTypeToString, CompressionTypeFromString, CompressionOptionsToString
* ZSTD_TrainDictionary (both overloads), ZSTD_FinalizeDictionary
* DecompressorDict::Populate
* Most compression library includes

Also cleaned up other includes of compression.h, which caused some other
files to need new includes.

Test Plan: existing tests
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Feb 12, 2026
…ok#14315)

Summary:
After PR facebook#14315 dropped support for block-based table format_version < 2,
several code paths became obsolete. This change removes them.

Investigation findings:

1. Table properties are now a hard requirement for block-based SST files:
   - format_version >= 2 guarantees a properties block exists
   - Removed defensive conditionals like `if (rep_->table_properties)`
   - Missing properties block now returns Status::Corruption instead of
     just logging an error. This is important because some properties
     affect the semantic interpretation of the file.

2. Index type property (kIndexType) is now required:
   - kIndexType was introduced in Feb 2014 (commit 74939a9), ~11 months
     BEFORE format_version was introduced in Jan 2015
   - BlockBasedTablePropertiesCollector::Finish() has always written
     kIndexType unconditionally for all block-based tables
   - Therefore all format_version >= 2 files have this property
   - Now returns Status::Corruption if missing instead of silently
     defaulting to kBinarySearch

3. Removed SetOldTableOptions() from sst_file_dumper:
   - This fallback handled files without a properties block
   - Dead code since format_version >= 2 guarantees properties exist

4. Removed kPropertiesBlockOldName ("rocksdb.stats") fallback:
   - The properties block was renamed from "rocksdb.stats" to
     "rocksdb.properties" in RocksDB 2.7 (April 2014)
   - format_version 2 was introduced in RocksDB 3.10 (March 2015)
   - All table formats (block-based, plain, cuckoo) were created after
     the rename, so they all use "rocksdb.properties"
   - The backward compatibility fallback in FindOptionalMetaBlock() was
     dead code for all supported table formats

5. Removed obsolete assertion about format_version 0 checksum in
   BlockBasedTableBuilder::WriteFooter()

Test Plan: some tests updated for updated requirements. Mostly, CI including format compatible test
meta-codesync Bot pushed a commit that referenced this pull request Feb 13, 2026
Summary:
See #14240 which brought this to my attention. Here I've added range deletions and compactions to the format compatible test, and fixed or worked-around compatibility issues (likely longstanding).

The first fix was in Version::MaybeInitializeFileMetaData for an assertion failure simply from adding range deletions from some 5.x version.

The second fix is a broader work-around for older SST files with unreliable num_entries/num_range_deletions/num_deletions statistics in their table properties. We depend on them only for some paranoid checks for compaction, so in my assessment the best way to deal with those files is to exclude the paranoid checks when dealing with the files with unreliable data. (Details in code comments.) The important part is that compacting old files is exceptionally rare, so we aren't really interfering with the paranoid checks doing their job on an ongoing basis.

This depends on #14315 (just landed) because there is a remaining undiagnosed problem with some very early releases, but I'm not fixing that because support for those releases is being dropped.

Pull Request resolved: #14323

Test Plan: test extended (ran locally excluding some releases)

Reviewed By: xingbowang

Differential Revision: D93032653

Pulled By: pdillinger

fbshipit-source-id: f90b32f30ba4764692e68d23705f42c778e0dc1d
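As a simplified sketch of the second fix in #14323 (not the actual RocksDB code): it assumes a hypothetical per-file stats_reliable flag, whose real criteria are described in RocksDB code comments rather than here, and checks only num_entries even though the commit also lists num_deletions and num_range_deletions as unreliable. If any input file's counters are untrustworthy, the record-count check is skipped rather than failing the compaction.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-file summary; `stats_reliable` is assumed to be decided
// elsewhere (e.g., from what is known about the file's origin).
struct InputFileStats {
  uint64_t num_entries = 0;  // as recorded in table properties
  bool stats_reliable = true;
};

// Paranoid check: records actually read by compaction should equal the sum
// of num_entries across inputs. Returns "" if OK or skipped, else an error.
inline std::string VerifyCompactionInputCount(
    const std::vector<InputFileStats>& inputs, uint64_t records_read) {
  uint64_t expected = 0;
  for (const auto& f : inputs) {
    if (!f.stats_reliable) {
      // Old file with untrustworthy counters: skip the check instead of
      // failing a legitimate compaction.
      return "";
    }
    expected += f.num_entries;
  }
  if (expected != records_read) {
    return "Compaction input count mismatch: expected " +
           std::to_string(expected) + ", read " + std::to_string(records_read);
  }
  return "";
}
```

Because compacting such old files is rare, skipping the whole check for a job that touches one of them costs little coverage in practice.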
meta-codesync Bot pushed a commit that referenced this pull request Feb 13, 2026
Summary:
In follow-up to #14315

Remove obsolete code replaced by new Compressor/Decompressor interface:
* OLD_CompressData and OLD_UncompressData
* Individual compression/decompression functions (Snappy_*, Zlib_*, BZip2_*, LZ4_*, LZ4HC_*, XPRESS_*, ZSTD_Compress, ZSTD_Uncompress)
* CompressionInfo and UncompressionInfo classes
* UncompressionDict class
* compression::PutDecompressedSizeInfo and GetDecompressedSizeInfo

The only small refactoring in this change that is not pure code removal or movement is in blob_file_builder_test.cc.

Move some function implementations etc. from compression.h to compression.cc:
* CompressionTypeToString, CompressionTypeFromString, CompressionOptionsToString
* ZSTD_TrainDictionary (both overloads), ZSTD_FinalizeDictionary
* DecompressorDict::Populate
* Most compression library includes

Also cleaned up other includes of compression.h, which caused some other files to need new includes.

Pull Request resolved: #14325

Test Plan: existing tests

Reviewed By: hx235

Differential Revision: D93120580

Pulled By: pdillinger

fbshipit-source-id: ab5c50db7379c0387a8c0e379642c9ea2799eae5
pdillinger added a commit that referenced this pull request Feb 16, 2026
Summary:
See #14240 which brought this to my attention. Here I've added range deletions and compactions to the format compatible test, and fixed or worked-around compatibility issues (likely longstanding).

The first fix was in Version::MaybeInitializeFileMetaData for an assertion failure simply from adding range deletions from some 5.x version.

The second fix is a broader work-around for older SST files with unreliable num_entries/num_range_deletions/num_deletions statistics in their table properties. We depend on them only for some paranoid checks for compaction, so in my assessment the best way to deal with those files is to exclude the paranoid checks when dealing with the files with unreliable data. (Details in code comments.) The important part is that compacting old files is exceptionally rare, so we aren't really interfering with the paranoid checks doing their job on an ongoing basis.

This depends on #14315 (just landed) because there is a remaining undiagnosed problem with some very early releases, but I'm not fixing that because support for those releases is being dropped.

Pull Request resolved: #14323

Test Plan: test extended (ran locally excluding some releases)

Reviewed By: xingbowang

Differential Revision: D93032653

Pulled By: pdillinger

fbshipit-source-id: f90b32f30ba4764692e68d23705f42c778e0dc1d
meta-codesync Bot pushed a commit that referenced this pull request Feb 18, 2026
#14327)

Summary:
After PR #14315 dropped support for block-based table format_version < 2, several code paths became obsolete. This change removes them.

Investigation findings:

1. Table properties are now a hard requirement for block-based SST files:
   - format_version >= 2 guarantees a properties block exists
   - Removed defensive conditionals like `if (rep_->table_properties)`
   - Missing properties block now returns Status::Corruption instead of just logging an error. This is important because some properties affect the semantic interpretation of the file.

2. Index type property (kIndexType) is now required:
   - kIndexType was introduced in Feb 2014 (commit 74939a9), ~11 months BEFORE format_version was introduced in Jan 2015
   - BlockBasedTablePropertiesCollector::Finish() has always written kIndexType unconditionally for all block-based tables
   - Therefore all format_version >= 2 files have this property
   - Now returns Status::Corruption if missing instead of silently defaulting to kBinarySearch

3. Removed SetOldTableOptions() from sst_file_dumper:
   - This fallback handled files without a properties block
   - Dead code since format_version >= 2 guarantees properties exist

4. Removed kPropertiesBlockOldName ("rocksdb.stats") fallback:
   - The properties block was renamed from "rocksdb.stats" to "rocksdb.properties" in RocksDB 2.7 (April 2014)
   - format_version 2 was introduced in RocksDB 3.10 (March 2015)
   - All table formats (block-based, plain, cuckoo) were created after the rename, so they all use "rocksdb.properties"
   - The backward compatibility fallback in FindOptionalMetaBlock() was dead code for all supported table formats

5. Removed obsolete assertion about format_version 0 checksum in BlockBasedTableBuilder::WriteFooter()

Pull Request resolved: #14327

Test Plan: some tests updated for updated requirements. Mostly, CI including format compatible test

Reviewed By: mszeszko-meta

Differential Revision: D93124820

Pulled By: pdillinger

fbshipit-source-id: eb12cbdca0e69f34a08051d5160c282384128a4a
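Findings 1 and 2 above amount to turning silent fallbacks into hard errors. A minimal sketch, using hypothetical stand-ins (SimpleStatus, TablePropsSketch) rather than rocksdb::Status and TableProperties, and assuming the property name and little-endian fixed32 encoding used for kIndexType:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins, not rocksdb::Status / rocksdb::TableProperties.
struct SimpleStatus {
  bool ok = true;
  std::string msg;
  static SimpleStatus OK() { return {}; }
  static SimpleStatus Corruption(std::string m) {
    return {false, std::move(m)};
  }
};

struct TablePropsSketch {
  // User-collected properties: name -> encoded value.
  std::map<std::string, std::string> user_collected;
};

// Assumed to match the string behind BlockBasedTablePropertyNames::kIndexType.
const char* const kIndexTypePropName = "rocksdb.block.based.table.index.type";

// Returns Corruption instead of falling back when the properties block or the
// index type property is missing (format_version >= 2 guarantees both).
inline SimpleStatus ReadIndexType(const TablePropsSketch* props,
                                  uint32_t* index_type) {
  if (props == nullptr) {
    return SimpleStatus::Corruption("Missing table properties block");
  }
  auto it = props->user_collected.find(kIndexTypePropName);
  if (it == props->user_collected.end()) {
    // Old behavior: silently default to kBinarySearch.
    return SimpleStatus::Corruption("Missing index type table property");
  }
  if (it->second.size() != 4) {  // this sketch requires exactly 4 bytes
    return SimpleStatus::Corruption("Malformed index type table property");
  }
  // Decode a little-endian fixed32.
  const auto* p = reinterpret_cast<const unsigned char*>(it->second.data());
  *index_type = uint32_t{p[0]} | (uint32_t{p[1]} << 8) |
                (uint32_t{p[2]} << 16) | (uint32_t{p[3]} << 24);
  return SimpleStatus::OK();
}
```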
doxtop pushed a commit to flyingw/rocksdb that referenced this pull request Apr 7, 2026
Summary:
... and remove some old code and tech debt in the process.

This is arguably a great milestone and precedent in RocksDB history, as for the first time we are explicitly dropping support for reading source-of-truth data in old formats. (We previously dropped support for reading some old bloom filters, but those are performance optimizations, not source-of-truth data. facebook#10184) However, DBs written with default settings since release 4.6.0, nearly 10 years ago, can still be read. And by running compaction with intermediate versions, there's an upgrade path going back to (AFAIK) early releases of LevelDB (from which RocksDB was forked).

Some detail:
* The magic number for LevelDB SST files (0xdb4775248b80fb57, most recently called kLegacyBlockBasedTableMagicNumber) now only exists in the code to provide a good error message and to test that good error message.
* There is some notable refactoring and renaming around format_version handling. This is a somewhat messy area of code: because the footer code is shared between different table formats (block-based, plain, cuckoo), format_version in the footer is in some ways tied to all of them, but in other ways tied only to block-based table, where we have been making updates. Hopefully code comments keep this clear.
* Now that there are old format_versions we can't read (and can't write authoritatively in tests), I've needed to split kMinSupportedFormatVersion into separate constants for reads and for writes, currently both at format_version=2. Comments describe how to update these in the future.
* The idea of versioning the compression format is basically going away, though we're keeping BuiltinV2 in places just because it's already there. There's lots of room in the BuiltinV2 schema to expand to new built-in compression types, or new ways of handling existing compression algorithms. CompressionManager with CompatibilityName gives users the power to customize compression without the need for versions tied to format_version.

Immediate follow-up:
* Clean up compression loose ends like OLD_Compress, OLD_Uncompress

Suggested follow-up:
* Update plain table builder to migrate to new footer version so that we can drop support for legacy footer. We have to be careful that the (likely untested) forward compatibility path I put in place a while back works (or fix it and wait a while) before dropping support for plain table with legacy footer.

Pull Request resolved: facebook#14315

Test Plan:
* Some tests updated / added
* A couple tests are obsolete: removed
* Also updated format compatible test, which now doesn't need to dig as far back into history building RocksDB.

Reviewed By: hx235

Differential Revision: D92577766

Pulled By: pdillinger

fbshipit-source-id: a23be846189d901ce087af4ca9a99cef18445cb7
doxtop pushed a commit to flyingw/rocksdb that referenced this pull request Apr 7, 2026
Summary:
See facebook#14240 which brought this to my attention. Here I've added range deletions and compactions to the format compatible test, and fixed or worked-around compatibility issues (likely longstanding).

The first fix was in Version::MaybeInitializeFileMetaData for an assertion failure simply from adding range deletions from some 5.x version.

The second fix is a broader work-around for older SST files with unreliable num_entries/num_range_deletions/num_deletions statistics in their table properties. We depend on them only for some paranoid checks for compaction, so in my assessment the best way to deal with those files is to exclude the paranoid checks when dealing with the files with unreliable data. (Details in code comments.) The important part is that compacting old files is exceptionally rare, so we aren't really interfering with the paranoid checks doing their job on an ongoing basis.

This depends on facebook#14315 (just landed) because there is a remaining undiagnosed problem with some very early releases, but I'm not fixing that because support for those releases is being dropped.

Pull Request resolved: facebook#14323

Test Plan: test extended (ran locally excluding some releases)

Reviewed By: xingbowang

Differential Revision: D93032653

Pulled By: pdillinger

fbshipit-source-id: f90b32f30ba4764692e68d23705f42c778e0dc1d
doxtop pushed a commit to flyingw/rocksdb that referenced this pull request Apr 7, 2026
…14325)

Summary:
In follow-up to facebook#14315

Remove obsolete code replaced by new Compressor/Decompressor interface:
* OLD_CompressData and OLD_UncompressData
* Individual compression/decompression functions (Snappy_*, Zlib_*, BZip2_*, LZ4_*, LZ4HC_*, XPRESS_*, ZSTD_Compress, ZSTD_Uncompress)
* CompressionInfo and UncompressionInfo classes
* UncompressionDict class
* compression::PutDecompressedSizeInfo and GetDecompressedSizeInfo

The only small refactoring in this change that is not pure code removal or movement is in blob_file_builder_test.cc.

Move some function implementations etc. from compression.h to compression.cc:
* CompressionTypeToString, CompressionTypeFromString, CompressionOptionsToString
* ZSTD_TrainDictionary (both overloads), ZSTD_FinalizeDictionary
* DecompressorDict::Populate
* Most compression library includes

Also cleaned up other includes of compression.h, which caused some other files to need new includes.

Pull Request resolved: facebook#14325

Test Plan: existing tests

Reviewed By: hx235

Differential Revision: D93120580

Pulled By: pdillinger

fbshipit-source-id: ab5c50db7379c0387a8c0e379642c9ea2799eae5
doxtop pushed a commit to flyingw/rocksdb that referenced this pull request Apr 7, 2026
…ok#14315) (facebook#14327)

Summary:
After PR facebook#14315 dropped support for block-based table format_version < 2, several code paths became obsolete. This change removes them.

Investigation findings:

1. Table properties are now a hard requirement for block-based SST files:
   - format_version >= 2 guarantees a properties block exists
   - Removed defensive conditionals like `if (rep_->table_properties)`
   - Missing properties block now returns Status::Corruption instead of just logging an error. This is important because some properties affect the semantic interpretation of the file.

2. Index type property (kIndexType) is now required:
   - kIndexType was introduced in Feb 2014 (commit 74939a9), ~11 months BEFORE format_version was introduced in Jan 2015
   - BlockBasedTablePropertiesCollector::Finish() has always written kIndexType unconditionally for all block-based tables
   - Therefore all format_version >= 2 files have this property
   - Now returns Status::Corruption if missing instead of silently defaulting to kBinarySearch

3. Removed SetOldTableOptions() from sst_file_dumper:
   - This fallback handled files without a properties block
   - Dead code since format_version >= 2 guarantees properties exist

4. Removed kPropertiesBlockOldName ("rocksdb.stats") fallback:
   - The properties block was renamed from "rocksdb.stats" to "rocksdb.properties" in RocksDB 2.7 (April 2014)
   - format_version 2 was introduced in RocksDB 3.10 (March 2015)
   - All table formats (block-based, plain, cuckoo) were created after the rename, so they all use "rocksdb.properties"
   - The backward compatibility fallback in FindOptionalMetaBlock() was dead code for all supported table formats

5. Removed obsolete assertion about format_version 0 checksum in BlockBasedTableBuilder::WriteFooter()

Pull Request resolved: facebook#14327

Test Plan: some tests updated for updated requirements. Mostly, CI including format compatible test

Reviewed By: mszeszko-meta

Differential Revision: D93124820

Pulled By: pdillinger

fbshipit-source-id: eb12cbdca0e69f34a08051d5160c282384128a4a