Extract IndexFactory as unified pluggable index abstraction#14602
Extract IndexFactory as unified pluggable index abstraction#14602zaidoon1 wants to merge 3 commits intofacebook:mainfrom
Conversation
|
| Check | Count |
|---|---|
cert-err58-cpp |
1 |
| Total | 1 |
Details
table/block_based/block_based_table_factory.cc (1 warning(s))
table/block_based/block_based_table_factory.cc:230:5: warning: initialization of 'block_base_table_index_mode_string_map' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
a826b42 to
72a081c
Compare
|
/claude-review |
✅ Claude Code ReviewRequested by @xingbowang Code Review: Extract IndexFactory as unified pluggable index abstractionRecommendation: Request Changes The architectural direction is sound — unifying the index abstraction and introducing CRITICALC1. Broken backward-compatibility shim — The C2. Meta block key prefix change — C3. C4. Options migration missing — HIGHH1. kPrimaryOnly null safety — H2. H3. SUGGESTIONS
Full report in ℹ️ About this responseGenerated by Claude Code. Limitations:
Commands:
|
side question, can i run this command to get the review started or does it only work if a member from rocksdb team do it? |
✅ Claude Code ReviewRequested by @zaidoon1 Code Review: Extract IndexFactory as Unified Pluggable Index Abstraction35 files changed, 3205 insertions, 1409 deletions | Diff truncated — review covers visible portions. Critical FindingsC1. Meta Block Prefix Change Breaks Cross-Version SST CompatibilitySeverity: Critical The prefix changes from Fix: Support both prefixes during lookup in s = FindMetaBlock(meta_iter, kIndexFactoryMetaPrefix + udi_name, &handle);
if (!s.ok()) {
s = FindMetaBlock(meta_iter, kUserDefinedIndexPrefix + udi_name, &handle);
}High Severity FindingsH1. Verify All UserDefinedIndexFactory Implementations Are UpdatedFiles using old API ( H2. IndexFactoryOptions Default Comparator Changed to nullptrOld H3. Loss of Factory Name Validation in ReadOptionsOld H4. kPrimaryOnly Has No Recovery Path for Corrupted Custom IndexNo built-in index fallback. Corrupted custom meta block → zero keys readable. Backup/restore without factory configured → unreadable DB. Medium Severity
Suggestions
Full report written to ℹ️ About this responseGenerated by Claude Code. Limitations:
Commands:
|
|
@xingbowang so the main points from the AI review is about backwards compatibility. Given this api is experimental, do we need to worry about it? Not sure if you have internal use cases for it right now? |
I added you in the allow list, so it works for you as well. |
|
We do have internal usage, but since most of the times are deployed in one binary, instead of dynamic loading, we should be able to fix that. |
| inline constexpr const char* kIndexFactoryMetaPrefix = "rocksdb.index_factory."; | ||
|
|
||
| // ============================================================================ | ||
| // IndexFactory: pluggable index for BlockBasedTable SST files. |
There was a problem hiding this comment.
Could we add an explicit comment that this API is intentionally asymmetric between build and read?
Right now IndexFactory reads as if built-in and custom indexes both go through the same abstraction on both sides. But the actual design is narrower: write/build is unified through IndexFactory, while built-in reads still use the richer internal BlockBasedTable::IndexReader path, and custom reads are adapted into that path via IndexFactoryReaderWrapper.
That seems like the right design for now, but it is not obvious from the header, and readers could easily assume builtin NewReader() returning NotSupported is an incomplete refactor rather than an intentional boundary. A short note near IndexFactoryReader / IndexFactory::NewReader() would make the design intent much clearer.
If you want suggested code-comment text, I'd use:
// NOTE: The IndexFactory API is intentionally asymmetric.
// Built-in and custom indexes share the factory abstraction for SST
// construction, but built-in index reads continue to use the internal
// BlockBasedTable::IndexReader path. That internal reader contract carries
// table-local behaviors such as cache/prefetch/pinning and iterator reuse
// that are not part of this public SPI. Custom IndexFactoryReader
// implementations are adapted to the internal reader contract via
// IndexFactoryReaderWrapper.| // for any mode other than kBuiltinOnly. | ||
| // | ||
| // kBuiltinOnly (default): | ||
| // Only the built-in binary search index is used. |
There was a problem hiding this comment.
"binary search index" is not accurate, it could be other built in types.
| // - Partitioned index (kTwoLevelIndexSearch) in kPrimary/kPrimaryOnly | ||
| // - Partitioned filters in kPrimary/kPrimaryOnly | ||
| // - Parallel compression in any mode that uses a custom index | ||
| enum class IndexMode { |
There was a problem hiding this comment.
BlockBasedTableOptions::IndexMode still feels hard to read in its current form. The main issue is the value names: kSecondary / kPrimary make the reader stop and decode what is actually primary here. In database terminology, "primary index" and "secondary index" usually refer to different logical indexes over the data, so those names already carry a strong meaning for users. Here, though, the enum is not modeling multiple user-visible indexes in that sense; it is controlling which SST index implementations are built and which one is the default on reads. kPrimaryOnly also changes the physical SST layout, not just index priority, which makes the naming even less obvious. kBuiltinOnly is also a bit misleading, since the "built-in" side is not always binary search; index_type can still make it hash or partitioned.
I think this would be easier to understand if we keep the single enum, but rename the values to describe the semantics directly, e.g.:
enum class IndexMode {
kStandardOnly,
kStandardDefault,
kCustomDefault,
kCustomOnly,
};That maps cleanly to the current behavior:
kStandardOnly: only the standard index is built/usedkStandardDefault: both indexes are built, standard is the defaultkCustomDefault: both indexes are built, custom is the defaultkCustomOnly: only the custom index is built
This keeps the API simple, preserves the rollout model, and makes the mode names self-describing without forcing readers to translate "primary/secondary" into actual behavior.
|
Since the refactor touches existing internal indexes, could you run some benchmark to measure this refactor does not introduce performance regression. Essentially, it means run benchmark on read and flush without UDI and make sure no perf regression is observed. Please share some number. |
| // kCustom: force the custom IndexFactory index for this read. | ||
| // In kSecondary mode, this is how you select the custom index | ||
| // for individual reads without changing index_mode. | ||
| enum class ReadIndex : uint8_t { |
There was a problem hiding this comment.
It may be worth adding one explicit sentence here that ReadIndex is intentionally a two-way selector because BlockBasedTable currently supports exactly two index read targets per table: one standard index selected by BlockBasedTableOptions::index_type, and at most one custom index from user_defined_index_factory. Right now that constraint is mostly implied by the API shape, but not stated clearly, so a reader could reasonably wonder why this is a fixed enum (kBuiltin / kCustom) instead of something more general like an index ID/name.
Also, the wording "built-in binary search index" is a bit misleading here, since the standard index path can still be hash or partitioned depending on index_type. If we keep the current naming, I think "standard index" would be more accurate than "built-in binary search index."
| enum class ReadIndex : uint8_t { | ||
| kDefault = 0, | ||
| kBuiltin = 1, | ||
| kCustom = 2, |
There was a problem hiding this comment.
Related to the semantics here: kCustom reads more like a strict selection ("must use the custom index for this read"), but the implementation behaves more like a best-effort preference. If no custom index is available for a table/file, reads fall back to the standard index instead of treating this as an invalid request. That seems like a reasonable migration/compatibility behavior, but the current name does not make it obvious.
If that fallback is intentional, would it make sense to rename kCustom to something like kPreferCustom? That would better match the actual contract and reduce the chance that readers interpret this as a strict selector.
| // Fault injection note: the custom index meta block is vulnerable to | ||
| // metadata write fault injection (metadata_write_fault_one_in). If the | ||
| // meta block is corrupted, kPrimaryOnly has no fallback index and the | ||
| // compaction iterator reads zero keys from the affected SST. This is | ||
| // expected behavior — the standard binary search index (in kPrimary and | ||
| // below) is part of the SST's main block layout and is not affected by | ||
| // metadata write faults, providing a natural fallback. The stress tool | ||
| // disables compaction_verify_record_count for kPrimary/kPrimaryOnly | ||
| // when write fault injection is active. Without fault injection, all | ||
| // modes pass the compaction record count check correctly. |
There was a problem hiding this comment.
I don't think we need this detail here. The one in stress test is good enough.
| // | ||
| // Thread safety: all methods except EstimatedSize() are called from a | ||
| // single thread (the emit thread in BlockBasedTableBuilder). Parallel | ||
| // compression is not supported for custom IndexFactory implementations. |
There was a problem hiding this comment.
Given the new API we provided on this interface, the limitation of not support parallel compression is just a limit for specific UDI implementation, right? If a new UDI implement some of the interface, it would be able to be supported by parallel compression.
…ntation Address feedback from xingbowang on PR facebook#14602: 1. Rename IndexMode enum values to be self-describing: kBuiltinOnly -> kStandardOnly kSecondary -> kStandardDefault kPrimary -> kCustomDefault kPrimaryOnly -> kCustomOnly The new names describe behavior directly (which index is built, which is the default) instead of using primary/secondary terminology that conflicts with database index semantics. 2. Replace 'binary search index' with 'standard index' in all comments. The built-in index is not always binary search — it can be hash or partitioned depending on BlockBasedTableOptions::index_type. 3. Add asymmetric design note to IndexFactory header explaining that the build path is unified through IndexFactory while the read path uses the internal BlockBasedTable::IndexReader for built-ins and IndexFactoryReaderWrapper as an adapter for custom implementations. 4. Add ReadIndex rationale comment explaining why it is a fixed two-way enum (exactly two read targets per SST: standard + custom). 5. Document kCustom fallback behavior: when no custom index is available for a given SST, reads fall back to the standard index. 6. Remove fault injection implementation detail from the public index_factory.h header (kept in db_stress_test_base.cc where it belongs). 7. Clarify that parallel compression support is per-implementation: custom IndexFactory implementations can support it by overriding SupportsParallelAddEntry/PrepareAddEntry/FinishAddEntry.
Refactors the block-based table's index subsystem to make custom indexes
first-class citizens alongside the built-in binary search index. Both
are IndexFactory subclasses at the same abstraction level, following the
FilterPolicy model where built-in implementations are proper subclasses
of the public interface.
New public API:
include/rocksdb/index_factory.h:
IndexFactory, IndexFactoryBuilder, IndexFactoryReader,
IndexFactoryIterator — unified interface for all index types.
BlockBasedTableOptions::IndexMode enum:
kBuiltinOnly — standard binary search only (default)
kSecondary — both indexes; standard primary, custom per-read
kPrimary — both indexes; custom primary for all reads
kPrimaryOnly — custom only; no standard index built
ReadOptions::ReadIndex enum:
kDefault — use whatever IndexMode says
kBuiltin — force built-in for this read
kCustom — force custom index for this read
Built-in IndexFactory implementations:
BinarySearchIndexFactory, HashIndexFactory, PartitionedIndexFactory
wrap the existing internal IndexBuilder/IndexReader behind the public
interface. The table builder creates the built-in index through the
factory, same as custom indexes.
Architecture:
BlockBasedTableBuilder manages all indexes through IndexFactoryBuilder.
The built-in index uses a fast path (AddIndexEntryDirect) that passes
internal keys directly to the underlying IndexBuilder, avoiding the
user-key translation layer. Custom indexes receive user keys through
the standard AddIndexEntry path. Zero per-block overhead for the common
case (kBuiltinOnly with no custom index).
PartitionCoordinator interface decouples PartitionedFilterBlockBuilder
from the concrete PartitionedIndexBuilder type, allowing pluggable
index implementations without leaking internal types.
IndexFactoryReaderWrapper dispatches reads between built-in and custom
indexes based on IndexMode and per-read ReadIndex selection.
Single-index mode (kPrimaryOnly):
The standard index is not built — only the custom IndexFactory produces
an index. A minimal empty block satisfies the SST footer format. The
index_key_is_user_key property is set to 0 to match the custom index
wrapper's internal key format.
Backward compatibility:
user_defined_index.h provides using aliases (UserDefinedIndexFactory =
IndexFactory, etc.). Existing code compiles without changes. The old
use_udi_as_primary_index, skip_standard_index, and fail_if_no_udi_on_open
booleans are replaced by the single IndexMode enum.
Performance:
kBuiltinOnly path: zero per-block overhead (fast path passes internal
keys directly), 2 well-predicted branches per key (~0ns), 24 bytes
empty vector per SST. No read-path overhead (wrapper not installed).
Factory constructed on stack (no heap allocation). BlockBasedTableOptions
referenced by pointer (no copy).
…ntation Address feedback from xingbowang on PR facebook#14602: 1. Rename IndexMode enum values to be self-describing: kBuiltinOnly -> kStandardOnly kSecondary -> kStandardDefault kPrimary -> kCustomDefault kPrimaryOnly -> kCustomOnly The new names describe behavior directly (which index is built, which is the default) instead of using primary/secondary terminology that conflicts with database index semantics. 2. Replace 'binary search index' with 'standard index' in all comments. The built-in index is not always binary search — it can be hash or partitioned depending on BlockBasedTableOptions::index_type. 3. Add asymmetric design note to IndexFactory header explaining that the build path is unified through IndexFactory while the read path uses the internal BlockBasedTable::IndexReader for built-ins and IndexFactoryReaderWrapper as an adapter for custom implementations. 4. Add ReadIndex rationale comment explaining why it is a fixed two-way enum (exactly two read targets per SST: standard + custom). 5. Document kCustom fallback behavior: when no custom index is available for a given SST, reads fall back to the standard index. 6. Remove fault injection implementation detail from the public index_factory.h header (kept in db_stress_test_base.cc where it belongs). 7. Clarify that parallel compression support is per-implementation: custom IndexFactory implementations can support it by overriding SupportsParallelAddEntry/PrepareAddEntry/FinishAddEntry.
Wrap long lines exceeding 80-column limit introduced by the longer IndexMode enum names (kStandardOnly/kStandardDefault/kCustomDefault/ kCustomOnly). No functional change.
a4b25e7 to
9132665
Compare
I've addressed all the feedback, working on the benchmarks now. |
Benchmark Results: No Regression ObservedRan Setup
Write workloads (flush path — most affected by this refactor)
Read workloadsTwo DB states tested for reads: Post-overwrite (LSM has L0 churn):
Post-compact (clean LSM):
ConclusionAll deltas (using median, robust to outliers) are within ±2%, well inside benchmark noise (per-benchmark stdev was 1–5% on this hardware). No regression observed in either flush or read paths when UDI is not configured. This matches expectations: the refactor's hot path for |
Reproducing the BenchmarksBuild (both versions)# Baseline (merge-base of this PR with upstream/main)
git checkout bad2d5b0a
make clean
DEBUG_LEVEL=0 make -j$(nproc) db_bench
mv db_bench /tmp/db_bench_baseline
# PR branch
git checkout a4b25e7a4
make clean
DEBUG_LEVEL=0 make -j$(nproc) db_bench
mv db_bench /tmp/db_bench_prfillseq (write-only flush benchmark)$binary \
--benchmarks=fillseq \
--db=/tmp/bench_db \
--num=10000000 \
--value_size=100 \
--key_size=16 \
--seed=42 \
--threads=1 \
--use_existing_db=0Combined: write + compact + read benchmarks$binary \
--benchmarks="fillrandom,compact,overwrite,readrandom,readseq" \
--db=/tmp/bench_db \
--num=10000000 \
--reads=1000000 \
--value_size=100 \
--key_size=16 \
--seed=42 \
--threads=1 \
--use_existing_db=0This benchmark sequence:
The Focused read benchmark (post-compact, no overwrite)$binary \
--benchmarks="fillrandom,compact,readrandom,readseq" \
--db=/tmp/bench_db \
--num=10000000 \
--reads=1000000 \
--value_size=100 \
--key_size=16 \
--seed=42 \
--threads=1 \
--use_existing_db=0The Iterations
DB path was wiped ( |
Refactors the block-based table's index subsystem to make custom indexes first-class citizens alongside the built-in binary search index. Both are IndexFactory subclasses at the same abstraction level, following the FilterPolicy model where built-in implementations are proper subclasses of the public interface.
New public API:
include/rocksdb/index_factory.h:
IndexFactory, IndexFactoryBuilder, IndexFactoryReader,
IndexFactoryIterator — unified interface for all index types.
BlockBasedTableOptions::IndexMode enum:
kBuiltinOnly — standard binary search only (default)
kSecondary — both indexes; standard primary, custom per-read
kPrimary — both indexes; custom primary for all reads
kPrimaryOnly — custom only; no standard index built
ReadOptions::ReadIndex enum:
kDefault — use whatever IndexMode says
kBuiltin — force built-in for this read
kCustom — force custom index for this read
Built-in IndexFactory implementations:
BinarySearchIndexFactory, HashIndexFactory, PartitionedIndexFactory
wrap the existing internal IndexBuilder/IndexReader behind the public
interface. The table builder creates the built-in index through the
factory, same as custom indexes.
Architecture:
BlockBasedTableBuilder manages all indexes through IndexFactoryBuilder.
The built-in index uses a fast path (AddIndexEntryDirect) that passes
internal keys directly to the underlying IndexBuilder, avoiding the
user-key translation layer. Custom indexes receive user keys through
the standard AddIndexEntry path. Zero per-block overhead for the common
case (kBuiltinOnly with no custom index).
PartitionCoordinator interface decouples PartitionedFilterBlockBuilder
from the concrete PartitionedIndexBuilder type, allowing pluggable
index implementations without leaking internal types.
IndexFactoryReaderWrapper dispatches reads between built-in and custom
indexes based on IndexMode and per-read ReadIndex selection.
Single-index mode (kPrimaryOnly):
The standard index is not built — only the custom IndexFactory produces
an index. A minimal empty block satisfies the SST footer format. The
index_key_is_user_key property is set to 0 to match the custom index
wrapper's internal key format.
Backward compatibility:
user_defined_index.h provides using aliases (UserDefinedIndexFactory =
IndexFactory, etc.). Existing code compiles without changes. The old
use_udi_as_primary_index, skip_standard_index, and fail_if_no_udi_on_open
booleans are replaced by the single IndexMode enum.
Performance:
kBuiltinOnly path: zero per-block overhead (fast path passes internal
keys directly), 2 well-predicted branches per key (~0ns), 24 bytes
empty vector per SST. No read-path overhead (wrapper not installed).
Factory constructed on stack (no heap allocation). BlockBasedTableOptions
referenced by pointer (no copy).
This attempts to address #14547 (comment)