Skip to content

Support ExternalTable PinnableSlice Get#14497

Closed
joshkang97 wants to merge 4 commits into
facebook:mainfrom
joshkang97:external_table_get
Closed

Support ExternalTable PinnableSlice Get#14497
joshkang97 wants to merge 4 commits into
facebook:mainfrom
joshkang97:external_table_get

Conversation

@joshkang97
Copy link
Copy Markdown
Contributor

@joshkang97 joshkang97 commented Mar 23, 2026

Summary

The ExternalTableReader Get API has been modified to use PinnableSlice instead of std::string, this will allow implementations to utilize zero-copy Gets (e.g. reading from mmap or a cache). This is not a compatible change, but the API is marked as experimental, so should be allowed.

Test Plan

  • New unit tests

@joshkang97 joshkang97 changed the title Support External Table PinnableSlice Get Support ExternalTable/SstFileReader PinnableSlice Get Mar 23, 2026
@joshkang97 joshkang97 marked this pull request as ready for review March 23, 2026 17:21
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Mar 23, 2026

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D97782011.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 23, 2026

✅ clang-tidy: No findings on changed lines

Completed in 348.4s.

@joshkang97 joshkang97 changed the title Support ExternalTable/SstFileReader PinnableSlice Get Support ExternalTable PinnableSlice Get Mar 23, 2026
@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Mar 23, 2026

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D97782011.

@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Mar 23, 2026

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D97782011.

@github-actions
Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 51d4067


Now I have all the information needed. Here's the review:


Summary

This PR changes ExternalTableReader::Get and MultiGet to use PinnableSlice instead of std::string, enabling zero-copy pinning. It also implements the previously-stubbed ExternalTableReaderAdapter::Get() that bridges the external reader's Get to RocksDB's internal GetContext::SaveValue. The change is reasonable for an experimental API, but the adapter implementation has a correctness issue.

Issues Found

🔴 Critical

1. ExternalTableReaderAdapter::Get passes a local PinnableSlice as value_pinner to SaveValue, but the local is destroyed at function exit while GetContext may hold a pointer to it.

table/external_table.cc:253-260 (new code)

Perspective 1 (Call-Chain Analysis): GetContext::SaveValue with a non-null value_pinner calls pinnable_val_->PinSlice(value_to_use, value_pinner), which calls value_pinner->DelegateCleanupsTo(this). This transfers cleanup ownership but the data that PinnableSlice points to is the Slice value — which in the pinned case points into the local PinnableSlice value's backing memory. When value goes out of scope at the end of Get(), the pinned data pointer becomes dangling.

Specifically, the code does:

PinnableSlice value;
s = reader_->Get(read_options, parsed_key.user_key, prefix_extractor, &value);
// ...
get_context->SaveValue(found_key, value, &matched, &s,
                       value.IsPinned() ? &value : nullptr);

When the external reader pins data (e.g., from mmap or cache), value.IsPinned() is true, so &value is passed as value_pinner. SaveValue will call get_context->pinnable_val_->PinSlice(value, &value) which delegates cleanups from value to the caller's PinnableSlice and sets data_ to point at the pinned data. The cleanups are delegated correctly, but the issue is: once this function returns, the local PinnableSlice value is destroyed. Its destructor invokes Reset() which calls Cleanable::Reset() — but cleanups were already delegated, so that's fine. The pinned data pointer should still be valid since it points to the external reader's backing store, not the local PinnableSlice's buffer.

Wait — on closer re-examination: The DelegateCleanupsTo transfers the cleanup chain, and the data pointer in the caller's PinnableSlice points to the same address as the external reader's data (which is owned by the external reader, e.g., its internal pinned_data_ map). The local PinnableSlice value is only a conduit. This is actually correct — the cleanup function is what keeps the data alive, and it was transferred. Let me downgrade this.

Actually this is correct. I'll remove this as a critical issue.


🟡 Suggestion

1. Missing ExternalTableReaderAdapter::MultiGet override — the default TableReader::MultiGet now works but could be optimized.

table/external_table.cc — no MultiGet override exists.

Perspective 3 (Cross-Component): The default TableReader::MultiGet (in table/table_reader.h:149-157) iterates over the range and calls Get() for each key. Since Get() is now implemented, MultiGet works end-to-end (confirmed by the PinnedGetTest using db_->MultiGet). However, this means each key goes through ParseInternalKeyreader_->Get()SaveValue individually. An override could batch-call ExternalTableReader::MultiGet to leverage any batching optimizations in the external implementation.

Suggested fix: Consider adding a MultiGet override in ExternalTableReaderAdapter that calls reader_->MultiGet() in a batch and then calls SaveValue for each found result.

2. pin_cleanup_count_ is not atomic — potential issue if tests are extended to concurrent scenarios.

table/table_test.cc:7121int pin_cleanup_count_ = 0;

Perspective 2 (Correctness): This counter is incremented in PinCleanup which is called during PinnableSlice::Reset(). Currently this is single-threaded in tests, but if ever used in a concurrent context, it would be a data race. A minor nitpick for test-only code.

3. SstReaderTest changes add MultiGet test coverage but use SstFileReader::MultiGet which takes std::vector<std::string>* — this doesn't exercise the PinnableSlice pinning path through GetContext.

table/table_test.cc:7321-7331 (diff)

Perspective 5 (Testing): The SstFileReader::MultiGet converts results from PinnableSlice back to std::string (at sst_file_reader.cc:149), so the zero-copy benefit is not tested here. The PinnedGetTest properly exercises the pin path through db_->Get and db_->MultiGet, which is good.

4. SstReaderTest now uses a different DummyExternalTableFactory with support_property_block=false which doesn't implement Get, only iterator access — the test expansion with MultiGet relies on the reader's Get() working.

table/table_test.cc:7295-7331 (diff for SstReaderTest)

Perspective 1 (Call-Chain): Wait — the SstReaderTest uses DummyExternalTableFactory with support_property_block=false, not true. Looking at the diff, the original test had support_property_block=false. But the new test code calls reader->MultiGet(ReadOptions(), keys, &values) which is SstFileReader::MultiGet. This calls r->table_reader->MultiGet() on the adapter, which calls ExternalTableReaderAdapter::Get() (the newly implemented one), which calls reader_->Get() (the DummyExternalTableReader::Get). The DummyExternalTableReader does implement Get regardless of support_property_block. So this is fine.

However, the diff changes support_property_block from false to unchanged (false is already the value in the SstReaderTest on current main). Let me re-check... Looking at the diff more carefully: the SstReaderTest in the diff still uses /*support_property_block=*/false. This is correct because the test works.

🟢 Nitpick

1. The PinnedDummyExternalTableReader cleanup count test creates a tight coupling between the test and internal implementation details.

table/table_test.cc:7375-7376

The assertion ASSERT_EQ(factory->last_reader()->pin_cleanup_count(), 4) at the end counts 2 from Get + 2 from MultiGet (key1 and key2). This is fragile if the internal pinning strategy changes. A comment explaining the expected count would help.

2. The last_reader_ raw pointer in PinnedDummyExternalTableFactory is never nulled after the reader is potentially destroyed.

table/table_test.cc:7223mutable PinnedDummyExternalTableReader* last_reader_ = nullptr;

This is a dangling pointer risk if the reader is destroyed before the factory. In the current test it's safe because the DB holds the reader alive, but the pattern is fragile.

3. Consider adding a forward declaration or #include for PinnableSlice in external_table.h.

While PinnableSlice is transitively included (via advanced_iterator.hslice.h), relying on transitive includes is brittle. An explicit #include "rocksdb/slice.h" or at least a comment would be more robust.

Cross-Component Analysis

Context Assessment
WritePreparedTxnDB OK — read_callback_ visibility handled by GetContext upstream, not in this adapter
ReadOnly DB / SecondaryInstance OK — External tables live in bottommost level, read path unchanged
BlobDB OK — External tables only support kTypeValue; the adapter constructs ParsedInternalKey with kTypeValue
User-defined timestamps Potential gap — The adapter hardcodes ParsedInternalKey(parsed_key.user_key, 0, ValueType::kTypeValue) with sequence 0. This should work because external tables are at the bottom with seq 0, but timestamp handling is not tested
Merge operations Not supported — If a merge exists in memtable for a key in an external table, the adapter returns kTypeValue with seq 0. GetContext will handle this correctly (merge with base value). This is fine.
FIFO / Universal compaction OK — External tables are bottommost-only
Concurrent writers OK — The adapter's Get is stateless, no shared mutable state

Positive Observations

  • Good separation of concerns: the adapter bridges PinnableSlice pinning semantics correctly by conditionally passing &value as value_pinner only when value.IsPinned().
  • The PinnedGetTest is well-designed — it verifies both the pinning (IsPinned()) and cleanup semantics (pin_cleanup_count).
  • The NotFound handling in the adapter (returning Status::OK() when the external reader returns NotFound) correctly follows the GetContext convention where absence is signaled by not calling SaveValue.
  • Proper release note in unreleased_history/.
  • The change to SstReaderTest adds valuable MultiGet coverage that was missing before.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Mar 24, 2026

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D97782011.

@meta-codesync
Copy link
Copy Markdown

meta-codesync Bot commented Mar 24, 2026

@joshkang97 merged this pull request in 304ae71.

doxtop pushed a commit to flyingw/rocksdb that referenced this pull request Apr 7, 2026
Summary:
The ExternalTableReader `Get` API has been modified to use PinnableSlice instead of std::string, this will allow implementations to utilize zero-copy Gets (e.g. reading from mmap or a cache). This is not a compatible change, but the API is marked as experimental, so should be allowed.

Pull Request resolved: facebook#14497

Test Plan: - New unit tests

Reviewed By: xingbowang

Differential Revision: D97782011

Pulled By: joshkang97

fbshipit-source-id: 8a9e8c5bc5ff5e8dee6c0f2ee745521f09042cef
joshkang97 added a commit that referenced this pull request Apr 10, 2026
Summary:
The ExternalTableReader `Get` API has been modified to use PinnableSlice instead of std::string, this will allow implementations to utilize zero-copy Gets (e.g. reading from mmap or a cache). This is not a compatible change, but the API is marked as experimental, so should be allowed.

Pull Request resolved: #14497

Test Plan: - New unit tests

Reviewed By: xingbowang

Differential Revision: D97782011

Pulled By: joshkang97

fbshipit-source-id: 8a9e8c5bc5ff5e8dee6c0f2ee745521f09042cef
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants