Extract IndexFactory as unified pluggable index abstraction #14602

Open
zaidoon1 wants to merge 3 commits into facebook:main from zaidoon1:zaidoon/UDI-refactor

@zaidoon1
Contributor

Refactors the block-based table's index subsystem to make custom indexes first-class citizens alongside the built-in binary search index. Both are IndexFactory subclasses at the same abstraction level, following the FilterPolicy model where built-in implementations are proper subclasses of the public interface.

New public API:

include/rocksdb/index_factory.h:
IndexFactory, IndexFactoryBuilder, IndexFactoryReader,
IndexFactoryIterator — unified interface for all index types.

BlockBasedTableOptions::IndexMode enum:
kBuiltinOnly — standard binary search only (default)
kSecondary — both indexes; standard primary, custom per-read
kPrimary — both indexes; custom primary for all reads
kPrimaryOnly — custom only; no standard index built

ReadOptions::ReadIndex enum:
kDefault — use whatever IndexMode says
kBuiltin — force built-in for this read
kCustom — force custom index for this read

Built-in IndexFactory implementations:

BinarySearchIndexFactory, HashIndexFactory, PartitionedIndexFactory
wrap the existing internal IndexBuilder/IndexReader behind the public
interface. The table builder creates the built-in index through the
factory, same as custom indexes.

Architecture:

BlockBasedTableBuilder manages all indexes through IndexFactoryBuilder.
The built-in index uses a fast path (AddIndexEntryDirect) that passes
internal keys directly to the underlying IndexBuilder, avoiding the
user-key translation layer. Custom indexes receive user keys through
the standard AddIndexEntry path. Zero per-block overhead for the common
case (kBuiltinOnly with no custom index).
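
The two entry paths described above can be sketched as follows. This is a minimal illustration, not the PR's code: the class names (`SketchIndexBuilder`, `CustomSketchBuilder`, `BuiltinSketchBuilder`) and the method signatures are hypothetical. The only fact relied on is that a RocksDB internal key is the user key followed by an 8-byte trailer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// RocksDB internal keys are the user key followed by an 8-byte trailer
// (sequence number and value type packed together).
constexpr std::size_t kInternalKeyTrailerSize = 8;

// Strips the trailer to recover the user key.
inline std::string_view ExtractUserKey(std::string_view internal_key) {
  assert(internal_key.size() >= kInternalKeyTrailerSize);
  internal_key.remove_suffix(kInternalKeyTrailerSize);
  return internal_key;
}

// Hypothetical stand-in for the builder interface (assumption, not the
// real IndexFactoryBuilder).
class SketchIndexBuilder {
 public:
  virtual ~SketchIndexBuilder() = default;
  // Standard path: implementations see user keys only.
  virtual void AddIndexEntry(std::string_view last_user_key_in_block) = 0;
  // Direct path: by default, translate to a user key and forward.
  virtual void AddIndexEntryDirect(std::string_view last_internal_key) {
    AddIndexEntry(ExtractUserKey(last_internal_key));
  }
};

// A custom index only implements the user-key path.
class CustomSketchBuilder : public SketchIndexBuilder {
 public:
  void AddIndexEntry(std::string_view k) override { keys.emplace_back(k); }
  std::vector<std::string> keys;
};

// A built-in wrapper overrides the direct path and keeps the internal key
// intact, skipping the translation step.
class BuiltinSketchBuilder : public SketchIndexBuilder {
 public:
  void AddIndexEntry(std::string_view k) override { keys.emplace_back(k); }
  void AddIndexEntryDirect(std::string_view ik) override {
    keys.emplace_back(ik);
  }
  std::vector<std::string> keys;
};
```

In this sketch, feeding the same internal key to both builders leaves the custom builder with the bare user key and the built-in builder with the full internal key.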

PartitionCoordinator interface decouples PartitionedFilterBlockBuilder
from the concrete PartitionedIndexBuilder type, allowing pluggable
index implementations without leaking internal types.

IndexFactoryReaderWrapper dispatches reads between built-in and custom
indexes based on IndexMode and per-read ReadIndex selection.

Single-index mode (kPrimaryOnly):

The standard index is not built — only the custom IndexFactory produces
an index. A minimal empty block satisfies the SST footer format. The
index_key_is_user_key property is set to 0 to match the custom index
wrapper's internal key format.

Backward compatibility:

user_defined_index.h provides using aliases (UserDefinedIndexFactory =
IndexFactory, etc.). Existing code compiles without changes. The old
use_udi_as_primary_index, skip_standard_index, and fail_if_no_udi_on_open
booleans are replaced by the single IndexMode enum.

Performance:

kBuiltinOnly path: zero per-block overhead (fast path passes internal
keys directly), 2 well-predicted branches per key (~0ns), 24 bytes
empty vector per SST. No read-path overhead (wrapper not installed).
Factory constructed on stack (no heap allocation). BlockBasedTableOptions
referenced by pointer (no copy).

This attempts to address #14547 (comment)

@meta-cla meta-cla Bot added the CLA Signed label Apr 11, 2026
@github-actions

github-actions Bot commented Apr 11, 2026

⚠️ clang-tidy: 1 warning(s) on changed lines

Completed in 3282.3s.

Summary by check

| Check | Count |
|---|---|
| cert-err58-cpp | 1 |
| Total | 1 |

Details

table/block_based/block_based_table_factory.cc (1 warning(s))
table/block_based/block_based_table_factory.cc:230:5: warning: initialization of 'block_base_table_index_mode_string_map' with static storage duration may throw an exception that cannot be caught [cert-err58-cpp]
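
One common way to address this class of warning is to move the map into a function-local static, so it is constructed on first use rather than during static initialization. The sketch below is illustrative only: the enum is a stand-in for `BlockBasedTableOptions::IndexMode`, and the accessor name, the string keys, and the `ParseIndexMode` helper are assumptions, not the PR's code.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for BlockBasedTableOptions::IndexMode (assumption).
enum class IndexMode { kBuiltinOnly, kSecondary, kPrimary, kPrimaryOnly };

// Function-local static: the map is built on the first call, so an exception
// thrown while constructing it propagates to that caller instead of escaping
// during static initialization, before main() runs.
inline const std::unordered_map<std::string, IndexMode>& IndexModeStringMap() {
  static const std::unordered_map<std::string, IndexMode> kMap = {
      {"kBuiltinOnly", IndexMode::kBuiltinOnly},
      {"kSecondary", IndexMode::kSecondary},
      {"kPrimary", IndexMode::kPrimary},
      {"kPrimaryOnly", IndexMode::kPrimaryOnly},
  };
  return kMap;
}

// Hypothetical helper: returns true and sets *out when the name is known.
inline bool ParseIndexMode(const std::string& name, IndexMode* out) {
  assert(out != nullptr);
  const auto& map = IndexModeStringMap();
  const auto it = map.find(name);
  if (it == map.end()) return false;
  *out = it->second;
  return true;
}
```

Whether this warning needs fixing at all is a project decision; the sketch only shows the shape of the usual workaround.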

@zaidoon1 zaidoon1 force-pushed the zaidoon/UDI-refactor branch 15 times, most recently from a826b42 to 72a081c Compare April 12, 2026 00:43
@xingbowang
Contributor

/claude-review

@github-actions

✅ Claude Code Review

Requested by @xingbowang


Code Review: Extract IndexFactory as unified pluggable index abstraction

Recommendation: Request Changes

The architectural direction is sound — unifying the index abstraction and introducing IndexMode are improvements over the boolean flags. However, the PR has 4 critical issues that must be resolved:

CRITICAL

C1. Broken backward-compatibility shim — The using UserDefinedIndexFactory = IndexFactory alias cannot preserve the old virtual method signatures (NewBuilder() returning raw pointer vs Status NewBuilder(opts, unique_ptr&)). Any existing subclass breaks. Additionally, IndexFactoryOptions::comparator defaults to nullptr (was BytewiseComparator()).

C2. Meta block key prefix change: kIndexFactoryMetaPrefix = "rocksdb.index_factory." vs old "rocksdb.user_defined_index.". SSTs written by new code are unreadable by old binaries and vice versa. Fix: keep old prefix or add dual-prefix reader lookup.

C3. ReadOptions::table_index_factory removal incomplete — 76+ usages in utilities/trie_index/, 6 in table_test.cc, 1 in db_bench_tool.cc are not shown in the diff. The PR appears to only update db_stress_tool/ and db/wide/ tests.

C4. Options migration missing: use_udi_as_primary_index/fail_if_no_udi_on_open marked kDeprecated (silently ignored) with no conversion to IndexMode enum. Existing OPTIONS files lose their configuration on upgrade.
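
A conversion of the kind C4 asks for might look like the sketch below. The mapping is an assumption inferred from the flag and enum names in the PR description; the real semantics of the old booleans may differ. `LegacyUdiFlags` and `IndexModeFromLegacyFlags` are hypothetical names. `fail_if_no_udi_on_open` is left out because the description gives no IndexMode equivalent for it.

```cpp
#include <cassert>
#include <optional>

// Stand-in for BlockBasedTableOptions::IndexMode (names from the PR text).
enum class IndexMode { kBuiltinOnly, kSecondary, kPrimary, kPrimaryOnly };

// Hypothetical bundle of the legacy settings.
struct LegacyUdiFlags {
  bool has_custom_factory = false;
  bool use_udi_as_primary_index = false;
  bool skip_standard_index = false;
};

// Returns the assumed equivalent IndexMode, or std::nullopt for flag
// combinations that this sketch treats as invalid.
inline std::optional<IndexMode> IndexModeFromLegacyFlags(
    const LegacyUdiFlags& f) {
  if (!f.has_custom_factory) {
    // Custom-index flags without a factory make no sense.
    if (f.use_udi_as_primary_index || f.skip_standard_index) {
      return std::nullopt;
    }
    return IndexMode::kBuiltinOnly;
  }
  if (f.skip_standard_index) {
    // Skipping the standard index only works if the custom one is primary.
    if (!f.use_udi_as_primary_index) return std::nullopt;
    return IndexMode::kPrimaryOnly;
  }
  return f.use_udi_as_primary_index ? IndexMode::kPrimary
                                    : IndexMode::kSecondary;
}
```

Returning `std::nullopt` for inconsistent combinations lets an options parser surface an error instead of silently dropping the configuration.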

HIGH

H1. kPrimaryOnly null safety: index_builder is null; diff is truncated so not all dereference sites are verifiable. Also operational risks: sst_dump/ldb/backup-restore without custom factory fail.

H2. IndexFactory::Type() returns "IndexFactory" (was "UserDefinedIndexFactory") — breaks ObjectLibrary registration and CreateFromString.

H3. index_key_is_user_key property inconsistency — PR description says 0, implementation sets 1 for kPrimaryOnly.

SUGGESTIONS

  • Split into phased migration (enum + interface + deprecation + removal)
  • Add dual meta block key reader support for SST compatibility
  • Extract PartitionCoordinator decoupling as separate smaller PR

Full report in review-findings.md.


ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

@zaidoon1
Contributor Author

/claude-review

Side question: can I run this command to start the review, or does it only work if a member of the RocksDB team does it?

@github-actions

✅ Claude Code Review

Requested by @zaidoon1


Code Review: Extract IndexFactory as Unified Pluggable Index Abstraction

35 files changed, 3205 insertions, 1409 deletions | Diff truncated — review covers visible portions.


Critical Findings

C1. Meta Block Prefix Change Breaks Cross-Version SST Compatibility

Severity: Critical

The prefix changes from "rocksdb.user_defined_index." to "rocksdb.index_factory.". The writer uses the new prefix, the reader searches for the new prefix. SSTs written with the old prefix won't be found by new readers, and vice versa. This breaks rolling upgrades.

Fix: Support both prefixes during lookup in block_based_table_reader.cc:

s = FindMetaBlock(meta_iter, kIndexFactoryMetaPrefix + udi_name, &handle);
if (!s.ok()) {
  s = FindMetaBlock(meta_iter, kUserDefinedIndexPrefix + udi_name, &handle);
}
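
The dual-prefix idea can be exercised in isolation with the self-contained sketch below. It models the meta-index as a plain `std::map` rather than using RocksDB's real `FindMetaBlock`, so `MetaIndex`, `SketchBlockHandle`, `FindCustomIndexBlock`, and the constant name `kLegacyUserDefinedIndexPrefix` are all hypothetical. Only the two prefix strings come from the review.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <map>
#include <optional>
#include <string>

// Prefix strings as quoted in the review.
constexpr const char* kIndexFactoryMetaPrefix = "rocksdb.index_factory.";
constexpr const char* kLegacyUserDefinedIndexPrefix =
    "rocksdb.user_defined_index.";

// Hypothetical stand-in for a block handle.
struct SketchBlockHandle {
  std::uint64_t offset = 0;
  std::uint64_t size = 0;
};

// Hypothetical model of the meta-index: block name -> handle.
using MetaIndex = std::map<std::string, SketchBlockHandle>;

// Looks for the custom index block under the new prefix first, then under
// the legacy prefix, so SSTs written before the rename stay readable.
inline std::optional<SketchBlockHandle> FindCustomIndexBlock(
    const MetaIndex& meta, const std::string& index_name) {
  for (const char* prefix :
       {kIndexFactoryMetaPrefix, kLegacyUserDefinedIndexPrefix}) {
    const auto it = meta.find(std::string(prefix) + index_name);
    if (it != meta.end()) return it->second;
  }
  return std::nullopt;
}
```

A reader-side fallback like this only covers new binaries reading old SSTs. Old binaries reading SSTs written with the new prefix would still need the writer to keep or duplicate the old key.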

High Severity Findings

H1. Verify All UserDefinedIndexFactory Implementations Are Updated

Files using old API (table_index_factory, old NewBuilder()/NewReader() signatures) include utilities/trie_index/trie_index_factory.h (TrieIndexFactory), trie_index_db_test.cc (75+ refs), trie_index_test.cc (7+ refs), table/table_test.cc (6+ refs), tools/db_bench_tool.cc. The diff is truncated — verify these are all updated.

H2. IndexFactoryOptions Default Comparator Changed to nullptr

Old UserDefinedIndexOption::comparator defaulted to BytewiseComparator(). New IndexFactoryOptions::comparator defaults to nullptr. External code relying on the default will crash.
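
One defensive option for H2 is to resolve a null comparator to a bytewise default at the point of use. The sketch below uses stand-in types (`SketchComparator`, `SketchIndexFactoryOptions`, `EffectiveComparator`), all hypothetical. Real code would use `rocksdb::Comparator` and `BytewiseComparator()` instead.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for rocksdb::Comparator.
class SketchComparator {
 public:
  virtual ~SketchComparator() = default;
  // Negative, zero, or positive, like memcmp.
  virtual int Compare(std::string_view a, std::string_view b) const = 0;
};

// Byte-by-byte ordering; std::string_view::compare compares as unsigned bytes.
class SketchBytewiseComparator : public SketchComparator {
 public:
  int Compare(std::string_view a, std::string_view b) const override {
    return a.compare(b);
  }
};

// Process-wide default instance, built on first use.
inline const SketchComparator* DefaultBytewise() {
  static const SketchBytewiseComparator kInstance{};
  return &kInstance;
}

// Mirrors the reported new default: comparator starts out null.
struct SketchIndexFactoryOptions {
  const SketchComparator* comparator = nullptr;
};

// Never returns a null comparator: falls back to bytewise ordering.
inline const SketchComparator& EffectiveComparator(
    const SketchIndexFactoryOptions& options) {
  return options.comparator != nullptr ? *options.comparator
                                       : *DefaultBytewise();
}
```

Restoring the old default in the options struct itself would be the simpler fix; the fallback is shown only for the case where `nullptr` is kept deliberately.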

H3. Loss of Factory Name Validation in ReadOptions

Old table_index_factory pointer allowed name validation (name_ == read_options.table_index_factory->Name()). New read_index = kCustom enum has no factory reference — no validation that the requested index type matches the SST.

H4. kPrimaryOnly Has No Recovery Path for Corrupted Custom Index

No built-in index fallback. Corrupted custom meta block → zero keys readable. Backup/restore without factory configured → unreadable DB.


Medium Severity

  • M1: Parallel compression null guards for index_builder are defensive but unreachable (custom indexes disable parallel compression). Prefer assert over silent null checks.
  • M4: Stress test doesn't exercise ReadOptions::read_index = kBuiltin override.

Suggestions

  • S1: Consider splitting into smaller PRs (API, built-in wrappers, IndexMode, cleanup).
  • S2: Add LIKELY/UNLIKELY hints to ForwardAddIndexEntryToAll() fast path.
  • S3: Verify db_bench is updated for new options.
  • S4: Add a release note for the API changes.

Full report written to review-findings.md. To the reviewer's side question: yes, anyone can trigger this review — it runs against the diff provided in the prompt.



@zaidoon1
Contributor Author

@xingbowang So the main points from the AI review are about backward compatibility. Given that this API is experimental, do we need to worry about it? I'm not sure whether you have internal use cases for it right now.

@xingbowang
Contributor

Side question: can I run this command to start the review, or does it only work if a member of the RocksDB team does it?

I added you to the allow list, so it works for you as well.
Normally the review is kicked off automatically, but we had a bug when you submitted the diff, so it didn't start. I therefore kicked it off manually by leaving a comment.

@xingbowang
Contributor

We do have internal usage, but since it is mostly deployed as part of a single binary rather than dynamically loaded, we should be able to fix that.

inline constexpr const char* kIndexFactoryMetaPrefix = "rocksdb.index_factory.";

// ============================================================================
// IndexFactory: pluggable index for BlockBasedTable SST files.
Contributor

@xingbowang xingbowang Apr 30, 2026

Could we add an explicit comment that this API is intentionally asymmetric between build and read?

Right now IndexFactory reads as if built-in and custom indexes both go through the same abstraction on both sides. But the actual design is narrower: write/build is unified through IndexFactory, while built-in reads still use the richer internal BlockBasedTable::IndexReader path, and custom reads are adapted into that path via IndexFactoryReaderWrapper.

That seems like the right design for now, but it is not obvious from the header, and readers could easily assume builtin NewReader() returning NotSupported is an incomplete refactor rather than an intentional boundary. A short note near IndexFactoryReader / IndexFactory::NewReader() would make the design intent much clearer.

If you want suggested code-comment text, I'd use:

// NOTE: The IndexFactory API is intentionally asymmetric.
// Built-in and custom indexes share the factory abstraction for SST
// construction, but built-in index reads continue to use the internal
// BlockBasedTable::IndexReader path. That internal reader contract carries
// table-local behaviors such as cache/prefetch/pinning and iterator reuse
// that are not part of this public SPI. Custom IndexFactoryReader
// implementations are adapted to the internal reader contract via
// IndexFactoryReaderWrapper.

Comment thread include/rocksdb/table.h Outdated
// for any mode other than kBuiltinOnly.
//
// kBuiltinOnly (default):
// Only the built-in binary search index is used.
Contributor

"binary search index" is not accurate, it could be other built in types.

Comment thread include/rocksdb/table.h
// - Partitioned index (kTwoLevelIndexSearch) in kPrimary/kPrimaryOnly
// - Partitioned filters in kPrimary/kPrimaryOnly
// - Parallel compression in any mode that uses a custom index
enum class IndexMode {
Contributor

BlockBasedTableOptions::IndexMode still feels hard to read in its current form. The main issue is the value names: kSecondary / kPrimary make the reader stop and decode what is actually primary here. In database terminology, "primary index" and "secondary index" usually refer to different logical indexes over the data, so those names already carry a strong meaning for users. Here, though, the enum is not modeling multiple user-visible indexes in that sense; it is controlling which SST index implementations are built and which one is the default on reads. kPrimaryOnly also changes the physical SST layout, not just index priority, which makes the naming even less obvious. kBuiltinOnly is also a bit misleading, since the "built-in" side is not always binary search; index_type can still make it hash or partitioned.

I think this would be easier to understand if we keep the single enum, but rename the values to describe the semantics directly, e.g.:

enum class IndexMode {
  kStandardOnly,
  kStandardDefault,
  kCustomDefault,
  kCustomOnly,
};

That maps cleanly to the current behavior:

  • kStandardOnly: only the standard index is built/used
  • kStandardDefault: both indexes are built, standard is the default
  • kCustomDefault: both indexes are built, custom is the default
  • kCustomOnly: only the custom index is built

This keeps the API simple, preserves the rollout model, and makes the mode names self-describing without forcing readers to translate "primary/secondary" into actual behavior.
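
The proposed names map onto three yes/no questions, which the sketch below encodes directly from the bullet list above. The helper names (`BuildsStandardIndex`, `BuildsCustomIndex`, `CustomIsDefault`) are hypothetical and not part of the PR.

```cpp
#include <cassert>

// Enum values as proposed in the review comment.
enum class IndexMode {
  kStandardOnly,
  kStandardDefault,
  kCustomDefault,
  kCustomOnly,
};

// Is the standard index written into the SST?
constexpr bool BuildsStandardIndex(IndexMode m) {
  return m != IndexMode::kCustomOnly;
}

// Is the custom index written into the SST?
constexpr bool BuildsCustomIndex(IndexMode m) {
  return m != IndexMode::kStandardOnly;
}

// Do reads use the custom index unless told otherwise?
constexpr bool CustomIsDefault(IndexMode m) {
  return m == IndexMode::kCustomDefault || m == IndexMode::kCustomOnly;
}

// The two middle modes build both indexes and differ only in the default.
static_assert(BuildsStandardIndex(IndexMode::kStandardDefault) &&
              BuildsCustomIndex(IndexMode::kStandardDefault));
static_assert(BuildsStandardIndex(IndexMode::kCustomDefault) &&
              BuildsCustomIndex(IndexMode::kCustomDefault));
static_assert(!CustomIsDefault(IndexMode::kStandardDefault));
static_assert(CustomIsDefault(IndexMode::kCustomDefault));
```

Each name answers the first two questions by its suffix (Only vs Default) and the third by its prefix (Standard vs Custom), which is the self-describing property the comment argues for.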

@xingbowang
Copy link
Copy Markdown
Contributor

Since the refactor touches the existing internal indexes, could you run some benchmarks to confirm that it does not introduce a performance regression? Essentially, that means running benchmarks on reads and flushes without UDI and making sure no regression is observed. Please share some numbers.

Comment thread include/rocksdb/options.h
// kCustom: force the custom IndexFactory index for this read.
// In kSecondary mode, this is how you select the custom index
// for individual reads without changing index_mode.
enum class ReadIndex : uint8_t {
Contributor

It may be worth adding one explicit sentence here that ReadIndex is intentionally a two-way selector because BlockBasedTable currently supports exactly two index read targets per table: one standard index selected by BlockBasedTableOptions::index_type, and at most one custom index from user_defined_index_factory. Right now that constraint is mostly implied by the API shape, but not stated clearly, so a reader could reasonably wonder why this is a fixed enum (kBuiltin / kCustom) instead of something more general like an index ID/name.

Also, the wording "built-in binary search index" is a bit misleading here, since the standard index path can still be hash or partitioned depending on index_type. If we keep the current naming, I think "standard index" would be more accurate than "built-in binary search index."

Comment thread include/rocksdb/options.h
enum class ReadIndex : uint8_t {
kDefault = 0,
kBuiltin = 1,
kCustom = 2,
Contributor

Related to the semantics here: kCustom reads more like a strict selection ("must use the custom index for this read"), but the implementation behaves more like a best-effort preference. If no custom index is available for a table/file, reads fall back to the standard index instead of treating this as an invalid request. That seems like a reasonable migration/compatibility behavior, but the current name does not make it obvious.

If that fallback is intentional, would it make sense to rename kCustom to something like kPreferCustom? That would better match the actual contract and reduce the chance that readers interpret this as a strict selector.
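
The best-effort contract described in this comment can be written down as a small resolution function. This is a sketch under assumptions: `SstIndexes`, `IndexTarget`, and `ResolveReadIndex` are hypothetical names, and the thread does not say what happens when kBuiltin is requested on a file that has no standard index, so the sketch returns no target in that case and leaves the decision to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Values as shown in the diff excerpt above.
enum class ReadIndex : std::uint8_t { kDefault = 0, kBuiltin = 1, kCustom = 2 };

// Which index a read ends up using (hypothetical type).
enum class IndexTarget { kStandard, kCustom };

// What a given SST file offers (hypothetical type).
struct SstIndexes {
  bool has_standard = true;
  bool has_custom = false;
  bool custom_is_default = false;  // derived from the table's IndexMode
};

inline std::optional<IndexTarget> ResolveReadIndex(ReadIndex requested,
                                                   const SstIndexes& sst) {
  assert(sst.has_standard || sst.has_custom);
  const bool want_custom =
      requested == ReadIndex::kCustom ||
      (requested == ReadIndex::kDefault && sst.custom_is_default);
  if (want_custom) {
    // Best-effort: fall back to the standard index when the file has no
    // custom index, as the review comment describes.
    return sst.has_custom ? IndexTarget::kCustom : IndexTarget::kStandard;
  }
  if (sst.has_standard) return IndexTarget::kStandard;
  // Standard index requested but not present: unspecified in the thread.
  return std::nullopt;
}
```

Under this reading, kCustom on a file written before the custom index was enabled quietly resolves to the standard index, which is the behavior a name like kPreferCustom would advertise.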

Comment thread include/rocksdb/index_factory.h Outdated
Comment on lines +64 to +73
// Fault injection note: the custom index meta block is vulnerable to
// metadata write fault injection (metadata_write_fault_one_in). If the
// meta block is corrupted, kPrimaryOnly has no fallback index and the
// compaction iterator reads zero keys from the affected SST. This is
// expected behavior — the standard binary search index (in kPrimary and
// below) is part of the SST's main block layout and is not affected by
// metadata write faults, providing a natural fallback. The stress tool
// disables compaction_verify_record_count for kPrimary/kPrimaryOnly
// when write fault injection is active. Without fault injection, all
// modes pass the compaction record count check correctly.
Contributor

I don't think we need this detail here. The one in the stress test is good enough.

Comment thread include/rocksdb/index_factory.h Outdated
//
// Thread safety: all methods except EstimatedSize() are called from a
// single thread (the emit thread in BlockBasedTableBuilder). Parallel
// compression is not supported for custom IndexFactory implementations.
Contributor

Given the new API we provide on this interface, the lack of parallel compression support is just a limitation of specific UDI implementations, right? If a new UDI implements the relevant parts of the interface, it could be supported by parallel compression.

zaidoon1 added a commit to zaidoon1/rocksdb that referenced this pull request May 1, 2026
…ntation

Address feedback from xingbowang on PR facebook#14602:

1. Rename IndexMode enum values to be self-describing:
   kBuiltinOnly  -> kStandardOnly
   kSecondary    -> kStandardDefault
   kPrimary      -> kCustomDefault
   kPrimaryOnly  -> kCustomOnly

   The new names describe behavior directly (which index is built,
   which is the default) instead of using primary/secondary terminology
   that conflicts with database index semantics.

2. Replace 'binary search index' with 'standard index' in all comments.
   The built-in index is not always binary search — it can be hash or
   partitioned depending on BlockBasedTableOptions::index_type.

3. Add asymmetric design note to IndexFactory header explaining that
   the build path is unified through IndexFactory while the read path
   uses the internal BlockBasedTable::IndexReader for built-ins and
   IndexFactoryReaderWrapper as an adapter for custom implementations.

4. Add ReadIndex rationale comment explaining why it is a fixed two-way
   enum (exactly two read targets per SST: standard + custom).

5. Document kCustom fallback behavior: when no custom index is available
   for a given SST, reads fall back to the standard index.

6. Remove fault injection implementation detail from the public
   index_factory.h header (kept in db_stress_test_base.cc where it
   belongs).

7. Clarify that parallel compression support is per-implementation:
   custom IndexFactory implementations can support it by overriding
   SupportsParallelAddEntry/PrepareAddEntry/FinishAddEntry.
zaidoon1 added 3 commits May 1, 2026 04:07
Wrap long lines exceeding 80-column limit introduced by the longer
IndexMode enum names (kStandardOnly/kStandardDefault/kCustomDefault/
kCustomOnly). No functional change.
@zaidoon1 zaidoon1 force-pushed the zaidoon/UDI-refactor branch from a4b25e7 to 9132665 Compare May 1, 2026 08:08
@zaidoon1
Contributor Author

zaidoon1 commented May 1, 2026

Since the refactor touches the existing internal indexes, could you run some benchmarks to confirm that it does not introduce a performance regression? Essentially, that means running benchmarks on reads and flushes without UDI and making sure no regression is observed. Please share some numbers.

I've addressed all the feedback and am working on the benchmarks now.

@zaidoon1
Contributor Author

zaidoon1 commented May 1, 2026

Benchmark Results: No Regression Observed

Ran db_bench comparing PR branch (a4b25e7a4) vs merge-base (bad2d5b0a) without UDI (default options = kStandardOnly mode).

Setup

  • 10M keys, 16B key, 100B value, Snappy compression, 1 thread
  • Apple M-series, 14 cores
  • Fresh DB per run, release build (DEBUG_LEVEL=0)
  • Both binaries built with identical flags

Write workloads (flush path — most affected by this refactor)

| Benchmark | N | Baseline (ops/s) | PR (ops/s) | Δ median |
|---|---|---|---|---|
| fillseq | 3 | 641,557 | 642,000 | +0.07% |
| fillrandom | 3 | 372,002 | 368,696 | -0.89% |
| overwrite | 3 | 369,388 | 373,713 | +1.17% |

Read workloads

Two DB states tested for reads:

Post-overwrite (LSM has L0 churn):

| Benchmark | N | Baseline (ops/s) | PR (ops/s) | Δ median |
|---|---|---|---|---|
| readrandom | 3 | 121,017 | 123,444 | +2.01% |
| readseq | 3 | 8,497,981 | 8,357,360 | -1.65% |

Post-compact (clean LSM):

| Benchmark | N | Baseline (ops/s) | PR (ops/s) | Δ median |
|---|---|---|---|---|
| readrandom | 5 | 372,883 | 368,647 | -1.14% |
| readseq | 5 | 8,999,199 | 8,943,983 | -0.61% |

Conclusion

All deltas (using the median, which is robust to outliers) are within about ±2% (the largest is +2.01%, for post-overwrite readrandom), inside the benchmark noise on this hardware (per-benchmark stdev was 1–5%). No regression observed in either the flush or read paths when UDI is not configured.

This matches expectations: the refactor's hot path for kStandardOnly mode goes through BuiltinIndexFactoryBuilder::AddIndexEntryDirect() which forwards directly to the same internal IndexBuilder as before, with no extra abstraction overhead. Read paths for the standard built-in index continue to use the existing BlockBasedTable::IndexReader path unchanged.
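
For anyone re-deriving the Δ median column, the arithmetic is a median per binary followed by a relative difference. The helpers below are a minimal sketch with hypothetical names (`Median`, `PercentDelta`). The per-run samples are not included in this thread, so only the reported medians can be checked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty sample; for an even count, the mean of the two
// middle values.
inline double Median(std::vector<double> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();
  return n % 2 == 1 ? samples[n / 2]
                    : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

// Relative change of the PR median against the baseline median, in percent.
inline double PercentDelta(double baseline_median, double pr_median) {
  assert(baseline_median != 0.0);
  return (pr_median - baseline_median) / baseline_median * 100.0;
}
```

For example, `PercentDelta(641557.0, 642000.0)` evaluates to roughly 0.069, which rounds to the +0.07% reported for fillseq.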

@zaidoon1
Contributor Author

zaidoon1 commented May 1, 2026

Reproducing the Benchmarks

Build (both versions)

# Baseline (merge-base of this PR with upstream/main)
git checkout bad2d5b0a
make clean
DEBUG_LEVEL=0 make -j$(nproc) db_bench
mv db_bench /tmp/db_bench_baseline

# PR branch
git checkout a4b25e7a4
make clean
DEBUG_LEVEL=0 make -j$(nproc) db_bench
mv db_bench /tmp/db_bench_pr

fillseq (write-only flush benchmark)

$binary \
  --benchmarks=fillseq \
  --db=/tmp/bench_db \
  --num=10000000 \
  --value_size=100 \
  --key_size=16 \
  --seed=42 \
  --threads=1 \
  --use_existing_db=0

Combined: write + compact + read benchmarks

$binary \
  --benchmarks="fillrandom,compact,overwrite,readrandom,readseq" \
  --db=/tmp/bench_db \
  --num=10000000 \
  --reads=1000000 \
  --value_size=100 \
  --key_size=16 \
  --seed=42 \
  --threads=1 \
  --use_existing_db=0

This benchmark sequence:

  1. fillrandom — write 10M random keys (exercises memtable + flush path)
  2. compact — full manual compaction (settles LSM into clean state)
  3. overwrite — overwrite all 10M keys (exercises flush + L0 churn)
  4. readrandom — 1M random point lookups (exercises index seek path)
  5. readseq — sequential scan of 1M keys (exercises iterator)

The readrandom/readseq numbers in the "post-overwrite (LSM has L0 churn)" table come from this combined run.

Focused read benchmark (post-compact, no overwrite)

$binary \
  --benchmarks="fillrandom,compact,readrandom,readseq" \
  --db=/tmp/bench_db \
  --num=10000000 \
  --reads=1000000 \
  --value_size=100 \
  --key_size=16 \
  --seed=42 \
  --threads=1 \
  --use_existing_db=0

The readrandom/readseq numbers in the "post-compact (clean LSM)" table come from this run, with iteration order alternated between baseline/PR to reduce systematic bias from cold-cache / system warm-up effects.

Iterations

  • 3 iterations for write workloads (fillseq, fillrandom, overwrite)
  • 3 iterations for post-overwrite reads (from combined run)
  • 5 iterations for post-compact reads (focused run, alternating order)

DB path was wiped (rm -rf) between every run to ensure a fresh starting state.
