Blog file format: unified WAL and blob file format (1/n) by pdillinger · Pull Request #14675 · facebook/rocksdb

pdillinger · 2026-04-28T02:14:04Z

Summary:
Introduces the "blog" file format (portmanteau of "blob" + "log"), a new unified file format for WAL and blob files in RocksDB. This change makes the new format an opt-in option for blob files ONLY. An immediate follow-up will add WAL support. This format is intended to be the future default for both WAL and blob files, and likely also manifest files.

The impetus for this new file format was an apparent convergence in requirements for interesting and useful future directions for RocksDB, along with some tech debt:

* Supporting blob "direct write" (key-value separation in the memtable) with WAL enabled and at least the option to have all the WAL+blob data go into one file to reduce overheads in some cases like WAL sync write with blob direct write. (In other cases, separating WAL WriteBatches and blobs into distinct files would likely be the better choice.) The "preamble start" marker record is intended to support this case so that WriteBatches can carry external values in a "preamble" in memory and the WriteBatch doesn't need to be rewritten on storage to a single blog file serving both WAL and blob functions. (Details in later work.)
* Preserve the continuity of each blob value for efficient reads (NOTE: WAL/Manifest format often breaks up payloads), and extend this continuity to WriteBatches so that keys/values with known checksums could be carried and extended to the WriteBatch and its contiguous encoding in the blog-as-WAL file. (The goal is to leverage checksums across layers as much as possible rather than computing new ones at each layer; only CRC checksums are "extendable.")
* Support some "linear log" workloads with monotonically increasing keys and FIFO pruning of old data. A CF could be configured to use its own blog-as-WAL files writing this data, and those files could get indexing information written to them as each file is sealed. This would enable moderately efficient read queries that process WriteBatch records for results, and no WAL->SST write amplification.
* Modernize blob and WAL formats with features like explicit versioning and extensibility, configurable and context-aware checksums, debugging and statistical information, customizable compression (CompressionManager aware), and more.
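
The parenthetical remark above that only CRC checksums are "extendable" can be illustrated with a small sketch. This is not code from the PR: `Crc32cExtend` and `Crc32c` are hypothetical names for a plain bitwise CRC32C, written only to show the property that the checksum of a concatenation can be derived from the checksum of the prefix plus the new bytes, without re-reading the prefix. One plausible reading is that this is what would let a known key/value checksum be carried forward into a WriteBatch checksum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Illustrative bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
// Hypothetical helper, not RocksDB's implementation.
// `crc` is the finalized CRC of the bytes seen so far (0 if none).
inline uint32_t Crc32cExtend(uint32_t crc, const char* data, size_t n) {
  crc = ~crc;  // undo the final XOR to recover the running state
  for (size_t i = 0; i < n; ++i) {
    crc ^= static_cast<uint8_t>(data[i]);
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right by one; XOR in the polynomial if the low bit was set.
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return ~crc;  // apply the final XOR again
}

// CRC32C of a whole buffer, expressed as "extend from nothing".
inline uint32_t Crc32c(const std::string& s) {
  return Crc32cExtend(0, s.data(), s.size());
}
```

With these helpers, `Crc32cExtend(Crc32c("key"), "value", 5)` yields the same result as `Crc32c("keyvalue")`. Hash-style checksums such as XXH3 do not offer an equivalent operation on finalized values, which may be why the description singles out CRC.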

New DB options: use_blog_format_for_blobs, blog_checksum.

Other public API changes:

* ChecksumType moved to include/rocksdb/checksum_type.h.
* kStreamingCompressionSentinel (0x7F) added to CompressionType enum.

Some included refactoring:

* BlobLogWriter::log_number_ removed (was unused).
* BlobLogWriter::AppendFooter renamed to LegacyAppendFooterAndClose.

Test Plan:
New unit tests validate the blog file format core (33 tests in blog_format_test) covering header encode/decode round-trips, property encoding, escape sequence generation and verification, padding scheme, irregular varints, context checksums, footer locator/properties, schema version rejection, and typed property accessors. Writer/reader round-trip tests (11 tests in blog_writer_test) cover single and multiple blob records, compact vs full format selection, mixed record types, preamble-start stub, footer records, checksum corruption detection, alignment invariants, and header properties.

Existing blob file builder, reader, cache, and source tests (41 tests across 4 test binaries) pass unmodified, verifying legacy blob format is not broken. The options_settable_test validates that use_blog_format_for_blobs and blog_checksum are properly wired through the options system. The log_test (211 tests) confirms legacy WAL format is completely unaffected.

Blog-as-blob integration is exercised by db_crashtest.py with use_blog_format_for_blobs and blog_checksum randomized across iterations, stress-testing write/crash/recovery cycles with various checksum types (CRC32c, xxHash, xxHash64, XXH3), compression configurations, and fault injection.


meta-codesync · 2026-04-28T02:14:56Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D102718613.

github-actions · 2026-04-28T02:31:08Z

⚠️ clang-tidy: 1 warning(s) on changed lines

Completed in 1715.2s.

Summary by check

| Check | Count |
|---|---|
| `cppcoreguidelines-pro-type-member-init` | 1 |
| Total | 1 |

Details

db/blog/blog_format.cc (1 warning(s))

db/blog/blog_format.cc:594:1: warning: constructor does not initialize these fields: bytes [cppcoreguidelines-pro-type-member-init]

github-actions · 2026-04-28T03:10:13Z

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 5ba6eb5

Code Review: Blog File Format — Unified WAL and Blob File Format (1/n)

PR: Blog file format: unified WAL and blob file format (1/n)
Author: pdillinger
Scope: 45 files changed, 3700 insertions, 191 deletions
Review method: Multi-agent parallel review (9 agents) with cross-agent debate and synthesis

Critical Findings

F1. MultiGetBlob Not Updated for Blog Format (HIGH)

BlobFileReader::MultiGetBlob() (db/blob/blob_file_reader.cc:416-570) was not modified and contains four legacy-format assumptions that break with blog format:

Line 445: IsValidBlobOffset() uses legacy header/footer sizes — will incorrectly reject valid blog offsets.
Line 450: req->compression != compression_type_ — for blog format, compression_type_ is kNoCompression but BlobIndex stores per-record actual types. Every compressed blog blob fails with "Compression type mismatch."
Line 458: CalculateAdjustmentForRecordHeader(key_size) computes wrong offset — blog format has a 5-byte trailer after the payload, not a legacy header before it.
Line 551: VerifyBlob() uses legacy CRC, not blog's VerifyBlogRecordTrailer().

Impact: Any MultiGet hitting a blog-format blob file fails with corruption errors. SingleGet is properly updated; MultiGet is not.

Recommendation: Implement blog format support in MultiGetBlob() or add an early guard routing blog-format files through single-blob reads.

Medium Findings

F2. BlobIndex Size Semantics Change Affects GC Statistics (MEDIUM)

Legacy stores always-compressed size; blog stores actual on-disk size (may be uncompressed when compression is ineffective). This is correct for read/write but changes accounting for compaction_iterator.cc:1480 (total_blob_bytes_relocated += blob_index.size()) and blob_garbage_meter.cc.

Recommendation: Document the semantic change; verify GC ratio calculations aren't affected.

F3. Footer Locator Offset Division Without Alignment Assertion (MEDIUM)

In CloseBlobFile: static_cast<uint32_t>((locator_offset - props_offset) / 4) — integer division truncates if not 4-byte aligned. No assertion guards this.

Recommendation: Add assert((locator_offset - props_offset) % 4 == 0).
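
A minimal sketch of the guarded conversion that F3 asks for, assuming the footer stores the distance between the two offsets in 4-byte units as the quoted expression suggests. `EncodeDistanceInWords` is a hypothetical helper, not a function from the PR; it returns an empty optional instead of asserting so that the failure cases are visible in a release build as well.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper (not from the PR): converts the byte distance between
// two file offsets into a count of 4-byte words, refusing any input where
// plain integer division would silently lose information.
inline std::optional<uint32_t> EncodeDistanceInWords(uint64_t locator_offset,
                                                     uint64_t props_offset) {
  if (locator_offset < props_offset) {
    return std::nullopt;  // subtraction would wrap around
  }
  const uint64_t distance = locator_offset - props_offset;
  if (distance % 4 != 0) {
    return std::nullopt;  // not 4-byte aligned; division would truncate
  }
  const uint64_t words = distance / 4;
  if (words > std::numeric_limits<uint32_t>::max()) {
    return std::nullopt;  // does not fit in the 32-bit field
  }
  return static_cast<uint32_t>(words);
}
```

For example, `EncodeDistanceInWords(116, 100)` produces 4, while `EncodeDistanceInWords(118, 100)` produces no value because 18 is not a multiple of 4. Whether the real code should assert, return a Status, or rely on the padding scheme to guarantee alignment is a design choice for the author.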

F4. Generic Decompressor for Blog Format (MEDIUM)

Blog format uses GetDecompressor() (generic) vs legacy's GetDecompressorOptimizeFor(). Functionally correct per the CompressionManager API contract, but may have performance implications.

Recommendation: Benchmark blob read throughput to quantify any difference.

Suggestions

S1. Move blog_format_spec.html from repo root to docs/components/blob/.
S2. Consider format-aware minimum file size validation in OpenFile.
S3. Verify CompressBlock handles max_compressed_size = blob.size() constraint correctly for all compression types.

Positive Observations

Robust format detection via 12-byte magic in BlogFileHeader::IsBlogFormat() — transparent to consumers.
Clean backward compatibility — legacy files work identically; dual-path is well-structured.
Per-record compression eliminates the legacy WART of storing expanded compressed data.
Defense-in-depth context checksums at every level using incarnation_id.
Complete API migration — all callers of renamed/removed methods properly updated.
Extensible footer reusing the record infrastructure for integrity protection.
Comprehensive unit tests — 33 format tests + 11 writer tests.

The full review is written to review-findings.md. The primary blocker is F1 (MultiGetBlob), which needs either implementation or explicit documentation as a known limitation.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

Claude may miss context from files not in the diff
Large PRs may be truncated
Always apply human judgment to AI suggestions

Commands:

/claude-review [context] — Request a code review
/claude-query <question> — Ask about the PR or codebase

meta-codesync · 2026-05-20T05:09:38Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this in D102718613.

pdillinger requested a review from xingbowang April 28, 2026 02:14

meta-cla Bot added the CLA Signed label Apr 28, 2026

pdillinger added 3 commits May 19, 2026 10:22

Merge remote-tracking branch 'origin/main' into blog_format_no_wal

6c623e4

Many fixes / updates

6bcb65f

More updates/enhancements

8802960


pdillinger wants to merge 4 commits into facebook:main from pdillinger:blog_format_no_wal
