batch read catalog#464

Merged
lokax merged 9 commits into eloqdata:main from lokax:yf-catalog-slow
Feb 28, 2026
Conversation

Collaborator

@lokax lokax commented Feb 26, 2026

Summary by CodeRabbit

  • Refactor
    • Improved catalog retrieval to use batched prefetching and a bulk catalog-read path, reducing initialization latency and making catalog iteration faster and more efficient.
  • Chores
    • Updated an internal module reference for the data substrate.

Copilot AI review requested due to automatic review settings February 26, 2026 06:17
coderabbitai bot commented Feb 26, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between f086fbf and be48daa.

📒 Files selected for processing (4)
  • src/mongo/db/modules/eloq/data_substrate
  • src/mongo/db/modules/eloq/src/eloq_record_store.cpp
  • src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp
  • src/mongo/db/modules/eloq/src/eloq_recovery_unit.h
Walkthrough

This PR updates the eloq data_substrate submodule pointer, adds EloqRecoveryUnit::batchReadCatalog to perform batched catalog reads, and refactors EloqCatalogRecordStoreCursor to prefetch catalog metadata in bulk during construction and consume from an internal prefetch cache.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Submodule Reference | src/mongo/db/modules/eloq/data_substrate | Updated submodule pointer from commit 4b74b90098f7... to 7a52f73c01c5... (pointer-only change). |
| Cursor Prefetch Implementation | src/mongo/db/modules/eloq/src/eloq_record_store.cpp | Refactored EloqCatalogRecordStoreCursor to perform a bulk batchReadCatalog during construction, store results in _prefetched with _prefetchIndex, and have next() consume from the prefetch cache instead of per-record catalog reads. Removed per-entry on-demand catalog read logic. |
| Recovery Unit Batch Read API | src/mongo/db/modules/eloq/src/eloq_recovery_unit.h, src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp | Added void batchReadCatalog(OperationContext*, const std::vector<std::string>&, std::vector<std::pair<bool, txservice::CatalogRecord>>* out), which builds CatalogKey entries, issues a BatchReadTxRequest marked as a catalog batch to the tx service, waits for results, asserts success, and returns per-name (exists, CatalogRecord) pairs in input order. |

Sequence Diagram(s)

sequenceDiagram
    participant Cursor as EloqCatalogRecordStoreCursor
    participant RecoveryUnit as EloqRecoveryUnit
    participant TxService as TransactionService
    participant Cache as PrefetchCache

    Cursor->>Cursor: ctor(tableNameVector)
    Cursor->>RecoveryUnit: batchReadCatalog(tableNameVector)
    RecoveryUnit->>RecoveryUnit: build CatalogKey entries
    RecoveryUnit->>TxService: submit BatchReadTxRequest (catalog batch)
    TxService->>TxService: fetch multiple catalog entries
    TxService-->>RecoveryUnit: return batch results
    RecoveryUnit->>Cache: populate _prefetched (exists, CatalogRecord) pairs
    RecoveryUnit-->>Cursor: ctor returns (prefetch ready)

    Note over Cursor,Cache: iteration
    Cursor->>Cache: next() reads _prefetched[_prefetchIndex]
    Cache-->>Cursor: return Record (prefetched metadata)
    Cursor->>Cursor: increment _prefetchIndex
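
The flow in the diagram can be sketched outside the real codebase as follows. This is a minimal illustration, not the PR's implementation: FakeCatalogRecord, FakeCatalog, PrefetchingCursor, and the roundTrips counter are hypothetical stand-ins for txservice::CatalogRecord, the tx service, and EloqCatalogRecordStoreCursor. Skipping missing entries in next() is an assumption of the sketch; the walkthrough does not say how the real cursor treats them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for txservice::CatalogRecord.
struct FakeCatalogRecord {
    std::string schema;
};

// Hypothetical stand-in for the catalog held by the tx service.
using FakeCatalog = std::map<std::string, FakeCatalogRecord>;

// Reads all names in one call; the result at index i corresponds to names[i].
// roundTrips is incremented once per call to make the batching visible.
std::vector<std::pair<bool, FakeCatalogRecord>> batchReadCatalog(
    const FakeCatalog& catalog, const std::vector<std::string>& names, int& roundTrips) {
    ++roundTrips;
    std::vector<std::pair<bool, FakeCatalogRecord>> out;
    out.reserve(names.size());
    for (const std::string& name : names) {
        auto it = catalog.find(name);
        if (it != catalog.end()) {
            out.emplace_back(true, it->second);
        } else {
            out.emplace_back(false, FakeCatalogRecord{});
        }
    }
    return out;
}

class PrefetchingCursor {
public:
    // The constructor issues the single batched read; next() never reads again.
    PrefetchingCursor(const FakeCatalog& catalog, std::vector<std::string> names, int& roundTrips)
        : _names(std::move(names)), _prefetched(batchReadCatalog(catalog, _names, roundTrips)) {}

    // Returns the next existing (name, schema) pair, or nullopt when exhausted.
    // Missing entries are skipped (an assumption of this sketch).
    std::optional<std::pair<std::string, std::string>> next() {
        while (_prefetchIndex < _prefetched.size()) {
            const std::size_t i = _prefetchIndex++;
            if (_prefetched[i].first) {
                return std::make_pair(_names[i], _prefetched[i].second.schema);
            }
        }
        return std::nullopt;
    }

private:
    std::vector<std::string> _names;  // declared first so it is initialized first
    std::vector<std::pair<bool, FakeCatalogRecord>> _prefetched;
    std::size_t _prefetchIndex = 0;
};
```

With a per-entry read path the counter would grow with the number of tables; here it stays at one no matter how often next() is called.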

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested Reviewers

  • xiexiaoy
  • lzxddz
  • liunyl

Poem

🐰 With twitching ears I sniff the cache,
I gather catalogs in one swift dash.
No hop-by-hop reads to slow my race,
A prefetch heap keeps pace apace.
Hooray—one batch, a happy trace! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title 'batch read catalog' directly and clearly summarizes the main change: introducing batch catalog reading functionality across multiple files. |

Copilot AI left a comment

Pull request overview

This PR adds a batch catalog read path and updates the catalog record store cursor to prefetch catalog metadata in bulk.

Changes:

  • Added EloqRecoveryUnit::batchReadCatalog() API and implementation using a batch read TX request.
  • Updated EloqCatalogRecordStoreCursor to prefetch catalog schema metadata for all tables up front and iterate over prefetched results.
  • Updated the data_substrate submodule revision.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/mongo/db/modules/eloq/src/eloq_recovery_unit.h | Declares new batch catalog read API with doc comment |
| src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp | Implements batch catalog read via BatchReadTxRequest |
| src/mongo/db/modules/eloq/src/eloq_record_store.cpp | Switches catalog cursor to use batch prefetch instead of per-table reads |
| src/mongo/db/modules/eloq/data_substrate | Bumps submodule to pick up batch read support dependencies |

records.reserve(tableNames.size());
for (const std::string& name : tableNames) {
keys.emplace_back(txservice::TableName(std::string_view(name),
txservice::TableType::RangePartition,
Copilot AI Feb 26, 2026

This constructs txservice::TableName with TableType::RangePartition, but the previous per-table path in EloqCatalogRecordStoreCursor used TableType::Primary. If the catalog entries are keyed by the table type, this will cause misses or reading the wrong catalog records. Use the same TableType as the existing single-read path (or derive the correct type from the input list) so batch reads address the same catalog keys.

Suggested change
-    txservice::TableType::RangePartition,
+    txservice::TableType::Primary,
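
The concern can be illustrated with a minimal sketch. It assumes, as the comment does conditionally, that the table type is part of the lookup key; the PR does not confirm whether that holds for the real catalog. TableType, TableKey, Catalog, and catalogContains here are hypothetical stand-ins, not txservice types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical mirror of the two table types mentioned in the review.
enum class TableType { Primary, RangePartition };

// Hypothetical key: the table type participates in ordering and equality.
struct TableKey {
    std::string name;
    TableType type;

    bool operator<(const TableKey& other) const {
        return std::tie(name, type) < std::tie(other.name, other.type);
    }
};

using Catalog = std::map<TableKey, std::string>;

// Looks up a catalog entry by (name, type).
bool catalogContains(const Catalog& catalog, const std::string& name, TableType type) {
    return catalog.find(TableKey{name, type}) != catalog.end();
}
```

Under that assumption, an entry stored as (name, Primary) is not found by a lookup for (name, RangePartition), even though the name matches. That is the silent miss the comment warns about.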

Comment on lines +107 to +114
/**
* Batch read catalog for multiple tables. out[i] corresponds to tableNames[i]:
* (true, record) if exists, else (false, empty). Uses readCatalog per table
* (serial); Phase 3 may switch to BatchReadCatalogTxRequest for concurrent read.
*/
void batchReadCatalog(OperationContext* opCtx,
const std::vector<std::string>& tableNames,
std::vector<std::pair<bool, txservice::CatalogRecord>>* out);
Copilot AI Feb 26, 2026

The comment says this method uses readCatalog serially and may later switch to a batch request, but the implementation already uses txservice::BatchReadTxRequest. Please update the comment to match the current behavior (and name the actual request type used) to avoid misleading future maintainers.

Comment on lines +385 to +397
txservice::BatchReadTxRequest req(&txservice::catalog_ccm_name,
0,
read_batch,
false,
false,
false,
coro.yieldFuncPtr,
coro.resumeFuncPtr,
_txm,
false,
0,
false,
true); // is_catalog_batch
Copilot AI Feb 26, 2026

This constructor call has many positional bool/numeric arguments (several false and 0), which is hard to read and easy to accidentally break when the API changes. Consider introducing clearly named local variables for each flag/value (or a small options struct/builder if available in txservice) and/or adding inline comments per argument so the intent of each parameter is explicit.

Suggested change
-    txservice::BatchReadTxRequest req(&txservice::catalog_ccm_name,
-                                      0,
-                                      read_batch,
-                                      false,
-                                      false,
-                                      false,
-                                      coro.yieldFuncPtr,
-                                      coro.resumeFuncPtr,
-                                      _txm,
-                                      false,
-                                      0,
-                                      false,
-                                      true);  // is_catalog_batch
+    const auto* ccmName = &txservice::catalog_ccm_name;
+    const int64_t txPriority = 0;
+    const bool enableStrongConsistency = false;
+    const bool enableSnapshotRead = false;
+    const bool enableTracing = false;
+    auto yieldFunc = coro.yieldFuncPtr;
+    auto resumeFunc = coro.resumeFuncPtr;
+    auto* txManager = _txm;
+    const bool allowDirtyRead = false;
+    const int64_t timeoutMs = 0;
+    const bool allowPartialResult = false;
+    const bool isCatalogBatch = true;  // is_catalog_batch
+    txservice::BatchReadTxRequest req(ccmName,
+                                      txPriority,
+                                      read_batch,
+                                      enableStrongConsistency,
+                                      enableSnapshotRead,
+                                      enableTracing,
+                                      yieldFunc,
+                                      resumeFunc,
+                                      txManager,
+                                      allowDirtyRead,
+                                      timeoutMs,
+                                      allowPartialResult,
+                                      isCatalogBatch);
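
The other option the comment mentions, a small options struct, could look roughly like the sketch below. Everything in it is hypothetical: FakeBatchReadOptions and FakeBatchReadRequest are not txservice types, and the field names priority and snapshotRead are placeholders, since the PR does not document what the positional arguments mean. Only isCatalogBatch mirrors a name that appears in the PR (the is_catalog_batch comment).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical options bundle. Each flag has a name and a default, so a call
// site only mentions the settings it changes.
struct FakeBatchReadOptions {
    std::int64_t priority = 0;    // placeholder field, not a real txservice parameter
    bool snapshotRead = false;    // placeholder field, not a real txservice parameter
    bool isCatalogBatch = false;  // mirrors the is_catalog_batch flag named in the PR
};

// Hypothetical request type that takes the bundle instead of positional flags.
struct FakeBatchReadRequest {
    explicit FakeBatchReadRequest(const FakeBatchReadOptions& opts) : options(opts) {}
    FakeBatchReadOptions options;
};

// Builds a request for a catalog batch; the one non-default setting is
// visible by name rather than by argument position.
FakeBatchReadRequest makeCatalogBatchRequest() {
    FakeBatchReadOptions opts;
    opts.isCatalogBatch = true;
    return FakeBatchReadRequest(opts);
}
```

Compared with named local variables, a struct also survives a change in parameter order, at the cost of requiring a matching constructor in the library.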

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (1)

355-406: LGTM! Batch catalog read implementation is correct.

The function correctly manages the lifetime of local vectors (keys, records) that back the TxKey pointers in read_batch, and properly waits for completion before extracting results.
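
The lifetime point can be shown with a small standalone sketch. It assumes the batch holds raw pointers into caller-owned vectors; the real layout of read_batch is not shown in this PR, so FakeKey, FakeRecord, FakeScanItem, and buildBatch are hypothetical. What is standard C++ is that after reserve(n), appending up to n elements does not reallocate, so pointers to earlier elements stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the key, record, and batch-entry types.
struct FakeKey {
    std::string name;
};
struct FakeRecord {
    std::string payload;
};
struct FakeScanItem {
    const FakeKey* key;
    FakeRecord* record;
};

// Fills keys and records, and returns a batch whose entries point into them.
// Reserving the final size first means no push_back reallocates, so the
// pointers stored in earlier batch entries remain valid.
std::vector<FakeScanItem> buildBatch(const std::vector<std::string>& names,
                                     std::vector<FakeKey>& keys,
                                     std::vector<FakeRecord>& records) {
    keys.clear();
    records.clear();
    keys.reserve(names.size());
    records.reserve(names.size());

    std::vector<FakeScanItem> batch;
    batch.reserve(names.size());
    for (const std::string& name : names) {
        keys.push_back(FakeKey{name});
        records.push_back(FakeRecord{});
        batch.push_back(FakeScanItem{&keys.back(), &records.back()});
    }
    return batch;
}
```

The caller must keep keys and records alive until the batch has been fully consumed, which is why waiting for completion before reading results matters.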

Minor style note: Consider using [[maybe_unused]] instead of (void)opCtx for unused parameter suppression, which is more idiomatic in modern C++:

 void EloqRecoveryUnit::batchReadCatalog(
-    OperationContext* opCtx,
+    [[maybe_unused]] OperationContext* opCtx,
     const std::vector<std::string>& tableNames,
     std::vector<std::pair<bool, txservice::CatalogRecord>>* out) {
-    (void)opCtx;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp` around lines 355 - 406,
Replace the C-style unused-parameter suppression in
EloqRecoveryUnit::batchReadCatalog by marking the parameter as unused with the
modern C++ attribute: remove the (void)opCtx statement and annotate the opCtx
parameter with [[maybe_unused]] (or if you prefer, use [[maybe_unused]] on a
local alias) so the compiler knows opCtx is intentionally unused; update the
function signature reference to opCtx accordingly.
ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72b1461 and f149ac4.

📒 Files selected for processing (4)
  • src/mongo/db/modules/eloq/data_substrate
  • src/mongo/db/modules/eloq/src/eloq_record_store.cpp
  • src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp
  • src/mongo/db/modules/eloq/src/eloq_recovery_unit.h

Comment on lines +107 to +114
/**
* Batch read catalog for multiple tables. out[i] corresponds to tableNames[i]:
* (true, record) if exists, else (false, empty). Uses readCatalog per table
* (serial); Phase 3 may switch to BatchReadCatalogTxRequest for concurrent read.
*/
void batchReadCatalog(OperationContext* opCtx,
const std::vector<std::string>& tableNames,
std::vector<std::pair<bool, txservice::CatalogRecord>>* out);
⚠️ Potential issue | 🟡 Minor

Documentation is inconsistent with implementation.

The comment states "Uses readCatalog per table (serial); Phase 3 may switch to BatchReadCatalogTxRequest for concurrent read." However, the implementation already uses BatchReadTxRequest with is_catalog_batch=true for batch reading. Please update the comment to reflect the actual implementation.

Also, note that opCtx is explicitly unused in the implementation ((void)opCtx;). If this is intentional for API consistency, consider adding a comment. Otherwise, it could be removed.

📝 Suggested documentation fix
     /**
      * Batch read catalog for multiple tables. out[i] corresponds to tableNames[i]:
-     * (true, record) if exists, else (false, empty). Uses readCatalog per table
-     * (serial); Phase 3 may switch to BatchReadCatalogTxRequest for concurrent read.
+     * (true, record) if exists, else (false, empty). Uses BatchReadTxRequest for
+     * concurrent batch catalog reads.
      */
     void batchReadCatalog(OperationContext* opCtx,
Suggested change
     /**
      * Batch read catalog for multiple tables. out[i] corresponds to tableNames[i]:
-     * (true, record) if exists, else (false, empty). Uses readCatalog per table
-     * (serial); Phase 3 may switch to BatchReadCatalogTxRequest for concurrent read.
+     * (true, record) if exists, else (false, empty). Uses BatchReadTxRequest for
+     * concurrent batch catalog reads.
      */
     void batchReadCatalog(OperationContext* opCtx,
                           const std::vector<std::string>& tableNames,
                           std::vector<std::pair<bool, txservice::CatalogRecord>>* out);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/mongo/db/modules/eloq/src/eloq_recovery_unit.h` around lines 107 - 114,
The doc comment for batchReadCatalog is outdated: update the description to
state that batchReadCatalog performs a batched catalog read using
BatchReadTxRequest with is_catalog_batch=true (not serial per-table readCatalog
calls), and mention that Phase 3 may adjust concurrency if needed; also either
remove the unused opCtx parameter from the signature or explicitly document why
it is unused (e.g., add a short inline comment noting "(void)opCtx; // kept for
API compatibility") so callers/readers understand the intentionality; reference
the batchReadCatalog method, the readCatalog mention in the comment, and the use
of BatchReadTxRequest/is_catalog_batch in the implementation when making these
edits.

@lokax lokax removed the trigger-ci label Feb 28, 2026
@lokax lokax merged commit f21afdb into eloqdata:main Feb 28, 2026
1 of 2 checks passed
3 participants