Remove the retry in EloqCatalogRecordStore::insertRecord() on failed to create table. by lzxddz · Pull Request #222 · eloqdata/eloqdoc

lzxddz · 2025-09-29T07:24:09Z

If the TxExecution executed UpsertTableTxRequest, it should be re-used and must abort or commit the EloqRecoveryUnit.

So, we should do retry at the higher level of calling EloqCatalogRecordStore::insertRecord() if create table failed instead of retrying in function.

Summary by CodeRabbit

Bug Fixes
- Improves reliability of catalog and table creation under contention with clearer, actionable error statuses.
- Adds retry logic for marking feature usage, reducing spurious failures during collection setup.
- Ensures status checks prevent unexpected terminations during catalog updates.
Refactor
- APIs now return explicit statuses for feature tracking and catalog updates, enabling better error propagation.
- Introduces a write-context hint for record lookups to optimize discovery paths.
- Enhances diagnostic logging for catalog operations (including transaction context) to aid troubleshooting.

…insertRecord() on failed to create table

coderabbitai · 2025-09-29T07:24:17Z

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly and accurately conveys the removal of retry logic in EloqCatalogRecordStore::insertRecord when table creation fails, which is the primary change of the pull request. It is clear, specific, and free of extraneous details, enabling a reviewer to understand the main update at a glance.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/mongo/db/storage/kv/kv_catalog.cpp (2)

240-245: Inconsistent error handling in markNonRepairableFeatureAsNotInUse.

While markNonRepairableFeatureAsInUse and markRepairableFeatureAsInUse now return Status, this function still returns void and ignores the Status from putInfo on line 244. This creates an inconsistent API where some feature-marking operations can report errors while others cannot.

Consider updating this function to also return Status for consistency:

-void KVCatalog::FeatureTracker::markNonRepairableFeatureAsNotInUse(OperationContext* opCtx,
-                                                                   NonRepairableFeature feature) {
+Status KVCatalog::FeatureTracker::markNonRepairableFeatureAsNotInUse(OperationContext* opCtx,
+                                                                      NonRepairableFeature feature) {
     FeatureBits versionInfo = getInfo(opCtx);
     versionInfo.nonRepairableFeatures &= ~static_cast<NonRepairableFeatureMask>(feature);
-    putInfo(opCtx, versionInfo);
+    return putInfo(opCtx, versionInfo);
 }

260-265: Inconsistent error handling in markRepairableFeatureAsNotInUse.

Similar to markNonRepairableFeatureAsNotInUse, this function still returns void while its counterpart markRepairableFeatureAsInUse now returns Status. The Status from putInfo is being ignored on line 264.

Update for consistency:

-void KVCatalog::FeatureTracker::markRepairableFeatureAsNotInUse(OperationContext* opCtx,
-                                                                RepairableFeature feature) {
+Status KVCatalog::FeatureTracker::markRepairableFeatureAsNotInUse(OperationContext* opCtx,
+                                                                   RepairableFeature feature) {
     FeatureBits versionInfo = getInfo(opCtx);
     versionInfo.repairableFeatures &= ~static_cast<RepairableFeatureMask>(feature);
-    putInfo(opCtx, versionInfo);
+    return putInfo(opCtx, versionInfo);
 }

🧹 Nitpick comments (5)

src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (1)

538-539: Minor: unify logging style

Nit: consider a consistent prefix, e.g., EloqRecoveryUnit::createTable txNumber=..., table=... for easier grep.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

321-327: Consider improving the error message to indicate retry handling.

The error message should clarify that retries are expected to be handled by the caller, not that the operation failed permanently.

     auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
     if (errorCode != txservice::TxErrorCode::NO_ERROR) {
         MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
                         "may do DDL on the same table.";
         return {ErrorCodes::InternalError,
-                "[Create Table] Another transaction may do DDL on the same table"};
+                "[Create Table] DDL contention detected - caller should retry"};

src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp (2)

207-223: Consider adding logging for retry attempts and capping the backoff duration.

The retry loop correctly implements the higher-level retry pattern as per PR objectives. However, consider:

Adding debug logging to track retry attempts for troubleshooting
Capping the backoff to avoid excessive delays (current max is 10ms)

             int retryCount = 0;
             const int maxRetry = 10;
             Status tmp_st = Status::OK();
             while (retryCount++ < maxRetry) {
                 newRU->beginUnitOfWork(opCtx);
                 if (!_catalog->getFeatureTracker()->isRepairableFeatureInUse(opCtx, feature)) {
                     tmp_st =
                         _catalog->getFeatureTracker()->markRepairableFeatureAsInUse(opCtx, feature);
                     if (tmp_st.isOK()) {
                         newRU->commitUnitOfWork();
                         break;
                     }
                 }
                 newRU->abortUnitOfWork();
-                opCtx->sleepFor(retryCount * Milliseconds{1});
+                if (retryCount > 1) {
+                    MONGO_LOG(1) << "Retrying markRepairableFeatureAsInUse, attempt " << retryCount;
+                }
+                opCtx->sleepFor(Milliseconds{std::min(retryCount, 5)});
             }

240-256: Extract the retry logic into a reusable helper to eliminate duplication.

This retry pattern is identical to the one above (lines 207-223). Consider extracting it into a helper function or lambda to reduce code duplication and ensure consistent behavior.

+        auto retryFeatureUpdate = [&](auto featureUpdateFunc) -> Status {
+            int retryCount = 0;
+            const int maxRetry = 10;
+            Status status = Status::OK();
+            while (retryCount++ < maxRetry) {
+                newRU->beginUnitOfWork(opCtx);
+                status = featureUpdateFunc();
+                if (status.isOK()) {
+                    newRU->commitUnitOfWork();
+                    break;
+                }
+                newRU->abortUnitOfWork();
+                if (retryCount > 1) {
+                    MONGO_LOG(1) << "Retrying feature update, attempt " << retryCount;
+                }
+                opCtx->sleepFor(Milliseconds{std::min(retryCount, 5)});
+            }
+            return status;
+        };
+
             int retryCount = 0;
             const int maxRetry = 10;
             Status tmp_st = Status::OK();
-            while (retryCount++ < maxRetry) {
-                newRU->beginUnitOfWork(opCtx);
-                if (!_catalog->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx, feature)) {
-                    tmp_st = _catalog->getFeatureTracker()->markNonRepairableFeatureAsInUse(
-                        opCtx, feature);
-                    if (tmp_st.isOK()) {
-                        newRU->commitUnitOfWork();
-                        break;
-                    }
-                }
-                newRU->abortUnitOfWork();
-                opCtx->sleepFor(retryCount * Milliseconds{1});
-            }
+            tmp_st = retryFeatureUpdate([&]() -> Status {
+                if (!_catalog->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx, feature)) {
+                    return _catalog->getFeatureTracker()->markNonRepairableFeatureAsInUse(
+                        opCtx, feature);
+                }
+                return Status::OK();
+            });
             fassert(70004, tmp_st);

Then apply the same helper to the RepairableFeature retry loop above.

src/mongo/db/storage/kv/kv_catalog.cpp (1)

301-337: Clarify findRecord’s isForWrite parameter and the diagnostic log
The fourth argument to findRecord is isForWrite, not an upsert flag—passing true here is correct for write operations. However, the MONGO_LOG call still prints opCtx->isUpsert() (which will always be false in this context) and conflates unrelated semantics. Change the log to reflect the isForWrite argument, for example:
MONGO_LOG(1) << "FeatureTracker::putInfo: findRecord isForWrite=" << true;

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76c626b and dafe3c6.

📒 Files selected for processing (10)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (3 hunks)
src/mongo/db/modules/eloq/src/eloq_record_store.h (1 hunks)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (5 hunks)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.h (1 hunks)
src/mongo/db/modules/eloq/tx_service (1 hunks)
src/mongo/db/storage/kv/kv_catalog.cpp (4 hunks)
src/mongo/db/storage/kv/kv_catalog_feature_tracker.h (3 hunks)
src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp (2 hunks)
src/mongo/db/storage/kv/kv_database_catalog_entry_base.cpp (1 hunks)
src/mongo/db/storage/record_store.h (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (6)

src/mongo/db/modules/eloq/src/eloq_recovery_unit.h (3)

src/mongo/db/modules/eloq/src/base/eloq_util.h (3)

txservice (41-41)

TableName (77-83)

TableName (86-121)

src/mongo/db/modules/eloq/src/eloq_record_store.h (1)

TableName (203-205)

src/mongo/db/storage/record_store.h (3)

bool (313-343)

bool (359-368)

bool (370-375)

src/mongo/db/modules/eloq/src/eloq_record_store.h (1)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (16)

findRecord (178-182)

findRecord (178-180)

findRecord (184-219)

findRecord (184-187)

findRecord (741-772)

findRecord (741-743)

opCtx (142-144)

opCtx (142-142)

opCtx (460-469)

opCtx (460-460)

opCtx (575-580)

opCtx (575-575)

id (123-125)

id (123-123)

id (508-544)

id (508-508)

src/mongo/db/storage/record_store.h (1)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (12)

findRecord (178-182)

findRecord (178-180)

findRecord (184-219)

findRecord (184-187)

findRecord (741-772)

findRecord (741-743)

opCtx (142-144)

opCtx (142-142)

opCtx (460-469)

opCtx (460-460)

opCtx (575-580)

opCtx (575-575)

src/mongo/db/storage/kv/kv_catalog_feature_tracker.h (1)

src/mongo/db/storage/kv/kv_catalog.cpp (8)

markNonRepairableFeatureAsInUse (233-238)

markNonRepairableFeatureAsInUse (233-234)

markNonRepairableFeatureAsNotInUse (240-245)

markNonRepairableFeatureAsNotInUse (240-241)

markRepairableFeatureAsInUse (253-258)

markRepairableFeatureAsInUse (253-254)

putInfo (301-337)

putInfo (301-301)

src/mongo/db/storage/kv/kv_catalog.cpp (1)

src/mongo/db/storage/kv/kv_catalog_feature_tracker.h (1)

_rid (191-198)

src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (1)

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (6)

MONGO_LOG (93-121)

MONGO_LOG (471-506)

MONGO_LOG (546-550)

MONGO_LOG (552-558)

MONGO_LOG (560-566)

MONGO_LOG (568-573)

🔇 Additional comments (7)

src/mongo/db/modules/eloq/tx_service (1)

1-1: Confirm submodule bump & API compatibility

Submodule src/mongo/db/modules/eloq/tx_service advanced from 2bd068afeac7e7621c15bd7ac718f6e96cd5ce73 to af75de6e5d0d068fa9dda2a21b6cdf621ecd1bd0 (SHA exists on remote).

Validate upstream change moves create-table retry logic out of EloqCatalogRecordStore::insertRecord without altering its public interface.

Verify callers now implement bounded-backoff retries and reuse TxExecution, with no implicit loops in insertRecord.

Confirm tests cover: retry-then-succeed for createTable; idempotency (no duplicates); and correct recovery-unit abort/commit on all paths.

Ensure CI invokes git submodule update --init --recursive for reproducible builds.

src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (2)

324-327: Extra context in logs looks fine

The added txn/context in readCatalog logs is helpful for traceability. Ensure log volume at level 1 is acceptable in production paths.

Also applies to: 347-351

764-775: Propagating isForWrite through discoverTable/readCatalog is correct

This aligns with the new findRecord overload. Once the default in RecordStore is fixed, callers can opt-in safely.

src/mongo/db/modules/eloq/src/eloq_record_store.h (1)

176-178: Remove unused kMaxRetryLimit constant

The kMaxRetryLimit constant is no longer referenced after moving retries to callers; delete it to prevent confusion.

src/mongo/db/modules/eloq/src/eloq_recovery_unit.h (1)

193-195: Signature extension approved; default only in header
Header declares isForWrite = false, the .cpp definition omits the default, and all call sites correctly compile.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

178-182: LGTM! Clean delegation pattern for the new findRecord overload.

The addition of the isForWrite parameter and delegation from the existing method maintains backward compatibility while enabling write-context control through the recovery unit.

Also applies to: 184-187, 196-196

src/mongo/db/storage/kv/kv_catalog.cpp (1)

253-257: LGTM! Consistent error handling pattern.

The function signature change mirrors the non-repairable version, maintaining consistency in the API and enabling proper error propagation.

coderabbitai · 2025-09-29T07:31:23Z

src/mongo/db/modules/eloq/src/eloq_record_store.cpp

+    // for (uint16_t i = 1; i < kMaxRetryLimit; ++i) {
+    //     auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
+    //     if (errorCode != txservice::TxErrorCode::NO_ERROR) {
+    //         MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
+    //                         "may do DDL on the same table.";
+    //     } else {
+    //         if (exist) {
+    //             const char* msg = "Collection already exists in Eloq storage engine";
+    //             warning() << msg << ", ns: " << tableName.StringView();
+    //             return {ErrorCodes::NamespaceExists, msg};
+    //         }
+
+    //         auto status = ru->createTable(tableName, metadata);
+    //         if (status.isOK()) {
+    //             return {recordId};
+    //         }
+    //     }
+
+    //     mongo::Milliseconds duration{uniformDist(randomEngine)};
+    //     MONGO_LOG(1) << "Fail to create table in Eloq. Sleep for " << duration.count() << "ms";
+    //     opCtx->sleepFor(duration);
+    //     MONGO_LOG(1) << "Retry count: " << i;
+    //     catalogRecord.Reset();
+    // }
+
+    // return {ErrorCodes::InternalError,
+    //         "[Create Table] opertion reaches the maximum number of retries."};


🛠️ Refactor suggestion | 🟠 Major

Remove the commented-out retry loop code.

Since the retry logic has been intentionally moved to a higher level as per the PR objectives, the commented-out code should be deleted rather than retained. This improves code readability and prevents confusion about the intended behavior.

- // for (uint16_t i = 1; i < kMaxRetryLimit; ++i) { - // auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true); - // if (errorCode != txservice::TxErrorCode::NO_ERROR) { - // MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction " - // "may do DDL on the same table."; - // } else { - // if (exist) { - // const char* msg = "Collection already exists in Eloq storage engine"; - // warning() << msg << ", ns: " << tableName.StringView(); - // return {ErrorCodes::NamespaceExists, msg}; - // } - - // auto status = ru->createTable(tableName, metadata); - // if (status.isOK()) { - // return {recordId}; - // } - // } - - // mongo::Milliseconds duration{uniformDist(randomEngine)}; - // MONGO_LOG(1) << "Fail to create table in Eloq. Sleep for " << duration.count() << "ms"; - // opCtx->sleepFor(duration); - // MONGO_LOG(1) << "Retry count: " << i; - // catalogRecord.Reset(); - // } - - // return {ErrorCodes::InternalError, - // "[Create Table] opertion reaches the maximum number of retries."}; - auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// for (uint16_t i = 1; i < kMaxRetryLimit; ++i) {

// auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);

// if (errorCode != txservice::TxErrorCode::NO_ERROR) {

// MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "

// "may do DDL on the same table.";

// } else {

// if (exist) {

// const char* msg = "Collection already exists in Eloq storage engine";

// warning() << msg << ", ns: " << tableName.StringView();

// return {ErrorCodes::NamespaceExists, msg};

// }

// auto status = ru->createTable(tableName, metadata);

// if (status.isOK()) {

// return {recordId};

// }

// }

// mongo::Milliseconds duration{uniformDist(randomEngine)};

// MONGO_LOG(1) << "Fail to create table in Eloq. Sleep for " << duration.count() << "ms";

// opCtx->sleepFor(duration);

// MONGO_LOG(1) << "Retry count: " << i;

// catalogRecord.Reset();

// }

// return {ErrorCodes::InternalError,

// "[Create Table] opertion reaches the maximum number of retries."};

// (Removed obsolete retry loop)

auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);

if (errorCode != txservice::TxErrorCode::NO_ERROR) {

MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "

"may do DDL on the same table.";

return {ErrorCodes::InternalError, "Error reading catalog"};

}

if (exist) {

const char* msg = "Collection already exists in Eloq storage engine";

warning() << msg << ", ns: " << tableName.StringView();

return {ErrorCodes::NamespaceExists, msg};

}

auto status = ru->createTable(tableName, metadata);

if (!status.isOK()) {

return {status.code(), status.reason()};

}

return {recordId};

🤖 Prompt for AI Agents

In src/mongo/db/modules/eloq/src/eloq_record_store.cpp around lines 293 to 319, remove the entire commented-out retry loop and the commented return that follows (the block of code starting with the for-loop comments through the final commented return), since retry logic was moved up the call stack; delete these commented lines to improve readability and ensure there are no leftover dead comments, then run a build/format to confirm no leftover references or formatting issues.

coderabbitai · 2025-09-29T07:31:24Z

src/mongo/db/modules/eloq/src/eloq_record_store.h

+    bool findRecord(OperationContext* opCtx,
+                    const RecordId& id,
+                    RecordData* out,
+                    bool isForWrite) const override;



🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

New overload OK; tighten ownership and return semantics in implementation

In the corresponding .cpp (see provided snippet), two fixes are advisable:

Construct txservice::TableName with owned storage to avoid dangling references when keys are copied into RU maps.

Return owned RecordData to avoid lifetime issues.

Suggested changes in src/mongo/db/modules/eloq/src/eloq_record_store.cpp (lines ~183–218):

- txservice::TableName tableName{ - id.getStringView(), txservice::TableType::Primary, txservice::TableEngine::EloqDoc}; + // Own the underlying string to avoid dangling string_view in downstream caches/maps. + // If available, prefer the helper: + // const auto tableName = MongoTableToTxServiceTableName(id.getStringView(), /*own_string=*/true); + txservice::TableName tableName{ + std::string{id.getStringView()}, + txservice::TableType::Primary, + txservice::TableEngine::EloqDoc}; @@ - *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())}; + *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())}.getOwned(); return true; @@ - *out = RecordData{metadata.data(), static_cast<int>(metadata.size())}; + *out = RecordData{metadata.data(), static_cast<int>(metadata.size())}.getOwned();

If using the helper, ensure the include exists at the top of the .cpp:

+#include "mongo/db/modules/eloq/src/base/eloq_util.h"

🏁 Script executed:

#!/bin/bash set -e echo "Module files:" fd -t f src/mongo/db/modules/eloq/src echo "---- eloq_record_store.cpp (1-200) ----" sed -n '1,200p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp || echo "Not found" echo "---- eloq_record_store.cpp (150-260) ----" sed -n '150,260p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp || echo "Not found" echo "---- eloq_util.h (1-200) ----" sed -n '1,200p' src/mongo/db/modules/eloq/src/base/eloq_util.h || echo "eloq_util.h not found" echo "---- Helper search ----" grep -R "MongoTableToTxServiceTableName" -n src/mongo/db/modules/eloq/src || echo "Helper not found" echo "---- getOwned search ----" grep -R "getOwned" -n src echo "---- RecordData definition ----" grep -R "class RecordData" -n src/mongo/db

Length of output: 435

🏁 Script executed:

#!/usr/bin/env bash set -e echo "Module files in src/mongo/db/modules/eloq/src:" fd . src/mongo/db/modules/eloq/src -t f echo "---- eloq_record_store.cpp (1-200) ----" sed -n '1,200p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp echo "---- eloq_record_store.cpp (150-260) ----" sed -n '150,260p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp echo "---- eloq_util.h (1-200) ----" sed -n '1,200p' src/mongo/db/modules/eloq/src/base/eloq_util.h echo "---- Searching for helper ----" grep -R "MongoTableToTxServiceTableName" -n src/mongo/db/modules/eloq/src echo "---- Searching for getOwned ----" grep -R "getOwned" -n src echo "---- RecordData definition ----" grep -R "class RecordData" -n src/mongo/db

Length of output: 60084

Tighten ownership in findRecord: copy table name and return owned RecordData

In src/mongo/db/modules/eloq/src/eloq_record_store.cpp, update the overload to:

// Replace string_view‐based TableName with owned string const auto tableName = Eloq::MongoTableToTxServiceTableName(id.getStringView(), /*own_string=*/true);

And ensure both RecordData assignments call getOwned():

- *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())}; + *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())}.getOwned(); … - *out = RecordData{metadata.data(), static_cast<int>(metadata.size())}; + *out = RecordData{metadata.data(), static_cast<int>(metadata.size())}.getOwned();

No new includes required—the helper in eloq_util.h is already imported.

🤖 Prompt for AI Agents

In src/mongo/db/modules/eloq/src/eloq_record_store.h around lines 74 to 78, tighten ownership for findRecord by updating the implementation in eloq_record_store.cpp to construct tableName with an owned string (use Eloq::MongoTableToTxServiceTableName(id.getStringView(), /*own_string=*/true)) instead of a string_view, and ensure both RecordData assignments return owned buffers by calling getOwned() on the RecordData values before assigning to out; no new includes are needed.

coderabbitai · 2025-09-29T07:31:24Z

src/mongo/db/storage/kv/kv_catalog_feature_tracker.h

+    // void markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);
+    Status markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);


🛠️ Refactor suggestion | 🟠 Major

Remove commented-out void method signatures.

The API change from void to Status return type is appropriate for enabling error propagation. However, the old signatures should be removed from the code. If migration guidance is needed, it should be documented in release notes or migration guides rather than as commented code.

- // void markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature); Status markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature); - // void markRepairableFeatureAsInUse(OperationContext* opCtx, RepairableFeature feature); Status markRepairableFeatureAsInUse(OperationContext* opCtx, RepairableFeature feature); - // void putInfo(OperationContext* opCtx, const FeatureBits& versionInfo); Status putInfo(OperationContext* opCtx, const FeatureBits& versionInfo);

Also applies to: 157-158, 181-182

🤖 Prompt for AI Agents

In src/mongo/db/storage/kv/kv_catalog_feature_tracker.h around lines 138-139 (and similarly at 157-158 and 181-182), remove the commented-out old void method signatures so only the new Status-returning declarations remain; delete those commented lines, ensure whitespace and formatting remain consistent, and run a quick build to verify no references rely on the old signatures.

coderabbitai · 2025-09-29T07:31:24Z

src/mongo/db/storage/kv/kv_catalog.cpp

+Status KVCatalog::FeatureTracker::markNonRepairableFeatureAsInUse(OperationContext* opCtx,
+                                                                  NonRepairableFeature feature) {
    FeatureBits versionInfo = getInfo(opCtx);
    versionInfo.nonRepairableFeatures |= static_cast<NonRepairableFeatureMask>(feature);
-    putInfo(opCtx, versionInfo);
+    return putInfo(opCtx, versionInfo);


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

LGTM! Proper error propagation added.

The function signature change from void to Status allows callers to handle failures when marking non-repairable features, which aligns with the PR objective of moving retry logic to higher levels.

Let me verify that all callers of this function handle the new Status return value:

🏁 Script executed:

#!/bin/bash # Search for calls to markNonRepairableFeatureAsInUse to ensure they handle the Status return ast-grep --pattern 'markNonRepairableFeatureAsInUse($_)' # Also search with ripgrep for any usage patterns rg -n "markNonRepairableFeatureAsInUse" --type cpp -C3

Length of output: 10431

🏁 Script executed:

#!/bin/bash sed -n '120,160p' src/mongo/db/storage/kv/kv_catalog_feature_tracker.h rg -n "MONGO_WARN_UNUSED_RESULT" -C1 src/mongo/db/storage/kv/kv_catalog_feature_tracker.h

Length of output: 1822

🏁 Script executed:

#!/bin/bash rg -n "markRepairableFeatureAsInUse" -C3 src

Length of output: 9327

Annotate new Status-returning methods with MONGO_WARN_UNUSED_RESULT and update tests
Add MONGO_WARN_UNUSED_RESULT to both markNonRepairableFeatureAsInUse and markRepairableFeatureAsInUse in kv_catalog_feature_tracker.h to enforce handling their returned Status, and update all tests that currently ignore these calls to wrap them in ASSERT_OK(…).

🤖 Prompt for AI Agents

In src/mongo/db/storage/kv/kv_catalog.cpp lines ~233-237 and the corresponding declarations in src/mongo/db/storage/kv/kv_catalog_feature_tracker.h, the new Status-returning methods markNonRepairableFeatureAsInUse and markRepairableFeatureAsInUse must be annotated with MONGO_WARN_UNUSED_RESULT so callers must check the returned Status; add MONGO_WARN_UNUSED_RESULT to both method declarations in the header, then search tests that call these methods without checking the return and update each to wrap the call in ASSERT_OK(...). Ensure compile includes the header change and all modified tests use ASSERT_OK to verify the returned Status.

coderabbitai · 2025-09-29T07:31:24Z

src/mongo/db/storage/kv/kv_database_catalog_entry_base.cpp

        if (_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,
                                                                                    feature)) {
-            _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(opCtx,
-                                                                                        feature);
+            auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(
+                opCtx, feature);
+            fassert(70005, st);
        }


⚠️ Potential issue | 🔴 Critical

Fix inverted condition and incorrect fassert usage

Condition is likely inverted: we should mark the feature only if it's not already marked in use.

fassert(70005, st) is invalid for a Status; use fassertStatusOK (or uassertStatusOK/invariantStatusOK).

Apply:

- if (_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx, - feature)) { - auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse( - opCtx, feature); - fassert(70005, st); - } + if (!_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx, + feature)) { + auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse( + opCtx, feature); + fassertStatusOK(70005, st); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,

feature)) {

_engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(opCtx,

feature);

auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(

opCtx, feature);

fassert(70005, st);

}

if (!_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,

feature)) {

auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(

opCtx, feature);

fassertStatusOK(70005, st);

}

🤖 Prompt for AI Agents

In src/mongo/db/storage/kv/kv_database_catalog_entry_base.cpp around lines 249-254, the code currently calls markNonRepairableFeatureAsInUse only when isNonRepairableFeatureInUse returns true and then uses fassert(70005, st) on a Status; invert the condition so you call markNonRepairableFeatureAsInUse only when the feature is NOT already in use, capture its returned Status, and replace the invalid fassert call with fassertStatusOK(70005, st) (or another status-check helper) to properly assert on the Status result.

coderabbitai · 2025-09-29T07:31:24Z

src/mongo/db/storage/record_store.h

+    virtual bool findRecord(OperationContext* opCtx,
+                            const RecordId& loc,
+                            RecordData* out,
+                            bool isForWrite) const {
+        assert(false);
+    }


⚠️ Potential issue | 🔴 Critical

Avoid UB: provide a safe default for the new overload

assert(false) with no return will be compiled out in release, causing undefined behavior. Default to existing 3‑arg path.

- virtual bool findRecord(OperationContext* opCtx, - const RecordId& loc, - RecordData* out, - bool isForWrite) const { - assert(false); - } + virtual bool findRecord(OperationContext* opCtx, + const RecordId& loc, + RecordData* out, + bool /*isForWrite*/) const { + // Default behavior: ignore isForWrite and delegate to existing API. + return findRecord(opCtx, loc, out); + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

virtual bool findRecord(OperationContext* opCtx,

const RecordId& loc,

RecordData* out,

bool isForWrite) const {

assert(false);

}

virtual bool findRecord(OperationContext* opCtx,

const RecordId& loc,

RecordData* out,

bool /*isForWrite*/) const {

// Default behavior: ignore isForWrite and delegate to existing API.

return findRecord(opCtx, loc, out);

}

🤖 Prompt for AI Agents

In src/mongo/db/storage/record_store.h around lines 370 to 375, the new 4-arg findRecord implementation uses assert(false) which is unsafe in release builds; replace the assert with a safe default that forwards to the existing 3‑argument overload, e.g., return findRecord(opCtx, loc, out); so the method returns the same result as the 3‑arg path and avoids undefined behavior.

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 332fe4b and 5496331.

📒 Files selected for processing (1)

src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp (2 hunks)

coderabbitai · 2025-09-29T10:03:38Z

src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp

+                } else {
+                    newRU->commitUnitOfWork();
+                    break;
+                }


⚠️ Potential issue | 🔴 Critical

Reset tmp_st before breaking on “feature already in use”.

If a prior iteration of this loop failed (e.g. mark*FeatureAsInUse returned NamespaceExists), tmp_st retains that error. On the next pass, if we observe the feature already in use we commit and break, but tmp_st still holds the earlier failure. The subsequent uassertStatusOK(tmp_st) therefore throws even though the operation logically succeeded. Please clear tmp_st to Status::OK() in the “already in use” branch for both retry loops.

Apply this diff (showing both sites):

} else { + tmp_st = Status::OK(); newRU->commitUnitOfWork(); break; } @@ } else { + tmp_st = Status::OK(); newRU->commitUnitOfWork(); break; }

Also applies to: 262-264

🤖 Prompt for AI Agents

In src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp around lines 223-226 (and also apply same change at 262-264), tmp_st may retain a prior non-OK error from a failed mark*FeatureAsInUse call; update the “feature already in use” branch so that before calling newRU->commitUnitOfWork() and breaking you reset tmp_st to Status::OK() (i.e., clear the stored error) so the subsequent uassertStatusOK(tmp_st) does not incorrectly throw; apply the same reset in both retry loops at the two indicated sites.

coderabbitai · 2025-09-29T10:03:38Z

src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp

+            uassertStatusOK(tmp_st);

-            newRU->commitUnitOfWork();
            opCtx->setRecoveryUnit(oldRU, oldState);



⚠️ Potential issue | 🔴 Critical

Restore the original RecoveryUnit before calling uassertStatusOK.

Right now a failure status triggers uassertStatusOK(tmp_st) before we restore oldRU. That throws and exits the function while opCtx is still bound to newRU, leaking the temporary recovery unit and leaving the context in the wrong state. Please restore oldRU (ideally via ScopeGuard, or at minimum by swapping the two statements) before asserting, at both retry sites.

Suggested fix:

- uassertStatusOK(tmp_st); - - opCtx->setRecoveryUnit(oldRU, oldState); + opCtx->setRecoveryUnit(oldRU, oldState); + uassertStatusOK(tmp_st);

…and likewise for the collation feature block.

Also applies to: 266-269

🤖 Prompt for AI Agents

In src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp around lines 228-231 (and likewise the retry site at 266-269 and the collation feature block), the code calls uassertStatusOK(tmp_st) before restoring the original RecoveryUnit, which leaks the temporary RU on failure; restore oldRU on opCtx (preferably via a ScopeGuard that swaps back in the destructor) before calling uassertStatusOK so that any thrown uassert will occur after the opCtx is returned to its original state; apply the same change at both retry sites and in the collation block to ensure oldRU is always restored prior to asserting the status.

…to create table. (#222) * Retry at call CreateCollection instead of in EloqCatalogRecordStore::insertRecord() on failed to create table

Retry at call CreateCollection instead of in EloqCatalogRecordStore::…

dafe3c6

…insertRecord() on failed to create table

lzxddz self-assigned this Sep 29, 2025

lzxddz added the trigger-ci label Sep 29, 2025

lzxddz linked an issue Sep 29, 2025 that may be closed by this pull request

Test the mongo tool mongodump/mongorestore #213

Closed

5 tasks

coderabbitai bot reviewed Sep 29, 2025

View reviewed changes

update

332fe4b

xiexiaoy self-requested a review September 29, 2025 09:34

xiexiaoy approved these changes Sep 29, 2025

View reviewed changes

change to uassert

5496331

coderabbitai bot reviewed Sep 29, 2025

View reviewed changes

lzxddz merged commit 077a8d2 into eloqdata:main Sep 30, 2025
3 checks passed

coderabbitai bot mentioned this pull request Sep 30, 2025

Retry after sleep random milliseconds when failed to create collection #226

Merged

This was referenced Oct 15, 2025

Use brpc Module Interface to refactor ThreadGroup loop #232

Merged

Fix compile error when eloq module is disabled #251

Merged

xiexiaoy pushed a commit that referenced this pull request Oct 21, 2025

Remove the retry in EloqCatalogRecordStore::insertRecord() on failed …

8b2d3d9

…to create table. (#222) * Retry at call CreateCollection instead of in EloqCatalogRecordStore::insertRecord() on failed to create table

coderabbitai bot mentioned this pull request Oct 24, 2025

Acquire write intent for base table only #269

Merged

This was referenced Dec 8, 2025

optimize large scan reqs #347

Merged

Retry auto commit upserts on out-of-memory error #369

Merged

This was referenced Dec 18, 2025

Support TTL Feature #361

Merged

Remove EloqUnitRecovery::_kvPair #387

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove the retry in EloqCatalogRecordStore::insertRecord() on failed to create table.#222

Remove the retry in EloqCatalogRecordStore::insertRecord() on failed to create table.#222
lzxddz merged 3 commits intoeloqdata:mainfrom
lzxddz:fix-feature-tracker

lzxddz commented Sep 29, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 29, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

coderabbitai bot Sep 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		// void markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);
		Status markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);

Conversation

lzxddz commented Sep 29, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lzxddz commented Sep 29, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 29, 2025 •

edited

Loading