Skip to content

Remove the retry in EloqCatalogRecordStore::insertRecord() on failed to create table.#222

Merged
lzxddz merged 3 commits intoeloqdata:mainfrom
lzxddz:fix-feature-tracker
Sep 30, 2025
Merged

Remove the retry in EloqCatalogRecordStore::insertRecord() on failed to create table.#222
lzxddz merged 3 commits intoeloqdata:mainfrom
lzxddz:fix-feature-tracker

Conversation

@lzxddz
Copy link
Collaborator

@lzxddz lzxddz commented Sep 29, 2025

If the TxExecution executed UpsertTableTxRequest, it should be re-used and must abort or commit the EloqRecoveryUnit.

So, we should do retry at the higher level of calling EloqCatalogRecordStore::insertRecord() if create table failed instead of retrying in function.

Summary by CodeRabbit

  • Bug Fixes

    • Improves reliability of catalog and table creation under contention with clearer, actionable error statuses.
    • Adds retry logic for marking feature usage, reducing spurious failures during collection setup.
    • Ensures status checks prevent unexpected terminations during catalog updates.
  • Refactor

    • APIs now return explicit statuses for feature tracking and catalog updates, enabling better error propagation.
    • Introduces a write-context hint for record lookups to optimize discovery paths.
    • Enhances diagnostic logging for catalog operations (including transaction context) to aid troubleshooting.

@lzxddz lzxddz self-assigned this Sep 29, 2025
@coderabbitai
Copy link

coderabbitai bot commented Sep 29, 2025

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly and accurately conveys the removal of retry logic in EloqCatalogRecordStore::insertRecord when table creation fails, which is the primary change of the pull request. It is clear, specific, and free of extraneous details, enabling a reviewer to understand the main update at a glance.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Comment @coderabbitai help to get the list of available commands and usage tips.

@lzxddz lzxddz linked an issue Sep 29, 2025 that may be closed by this pull request
5 tasks
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/mongo/db/storage/kv/kv_catalog.cpp (2)

240-245: Inconsistent error handling in markNonRepairableFeatureAsNotInUse.

While markNonRepairableFeatureAsInUse and markRepairableFeatureAsInUse now return Status, this function still returns void and ignores the Status from putInfo on line 244. This creates an inconsistent API where some feature-marking operations can report errors while others cannot.

Consider updating this function to also return Status for consistency:

-void KVCatalog::FeatureTracker::markNonRepairableFeatureAsNotInUse(OperationContext* opCtx,
-                                                                   NonRepairableFeature feature) {
+Status KVCatalog::FeatureTracker::markNonRepairableFeatureAsNotInUse(OperationContext* opCtx,
+                                                                      NonRepairableFeature feature) {
     FeatureBits versionInfo = getInfo(opCtx);
     versionInfo.nonRepairableFeatures &= ~static_cast<NonRepairableFeatureMask>(feature);
-    putInfo(opCtx, versionInfo);
+    return putInfo(opCtx, versionInfo);
 }

260-265: Inconsistent error handling in markRepairableFeatureAsNotInUse.

Similar to markNonRepairableFeatureAsNotInUse, this function still returns void while its counterpart markRepairableFeatureAsInUse now returns Status. The Status from putInfo is being ignored on line 264.

Update for consistency:

-void KVCatalog::FeatureTracker::markRepairableFeatureAsNotInUse(OperationContext* opCtx,
-                                                                RepairableFeature feature) {
+Status KVCatalog::FeatureTracker::markRepairableFeatureAsNotInUse(OperationContext* opCtx,
+                                                                   RepairableFeature feature) {
     FeatureBits versionInfo = getInfo(opCtx);
     versionInfo.repairableFeatures &= ~static_cast<RepairableFeatureMask>(feature);
-    putInfo(opCtx, versionInfo);
+    return putInfo(opCtx, versionInfo);
 }
🧹 Nitpick comments (5)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (1)

538-539: Minor: unify logging style

Nit: consider a consistent prefix, e.g., EloqRecoveryUnit::createTable txNumber=..., table=... for easier grep.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

321-327: Consider improving the error message to indicate retry handling.

The error message should clarify that retries are expected to be handled by the caller, not that the operation failed permanently.

     auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
     if (errorCode != txservice::TxErrorCode::NO_ERROR) {
         MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
                         "may do DDL on the same table.";
         return {ErrorCodes::InternalError,
-                "[Create Table] Another transaction may do DDL on the same table"};
+                "[Create Table] DDL contention detected - caller should retry"};
src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp (2)

207-223: Consider adding logging for retry attempts and capping the backoff duration.

The retry loop correctly implements the higher-level retry pattern as per PR objectives. However, consider:

  1. Adding debug logging to track retry attempts for troubleshooting
  2. Capping the backoff to avoid excessive delays (current max is 10ms)
             int retryCount = 0;
             const int maxRetry = 10;
             Status tmp_st = Status::OK();
             while (retryCount++ < maxRetry) {
                 newRU->beginUnitOfWork(opCtx);
                 if (!_catalog->getFeatureTracker()->isRepairableFeatureInUse(opCtx, feature)) {
                     tmp_st =
                         _catalog->getFeatureTracker()->markRepairableFeatureAsInUse(opCtx, feature);
                     if (tmp_st.isOK()) {
                         newRU->commitUnitOfWork();
                         break;
                     }
                 }
                 newRU->abortUnitOfWork();
-                opCtx->sleepFor(retryCount * Milliseconds{1});
+                if (retryCount > 1) {
+                    MONGO_LOG(1) << "Retrying markRepairableFeatureAsInUse, attempt " << retryCount;
+                }
+                opCtx->sleepFor(Milliseconds{std::min(retryCount, 5)});
             }

240-256: Extract the retry logic into a reusable helper to eliminate duplication.

This retry pattern is identical to the one above (lines 207-223). Consider extracting it into a helper function or lambda to reduce code duplication and ensure consistent behavior.

+        auto retryFeatureUpdate = [&](auto featureUpdateFunc) -> Status {
+            int retryCount = 0;
+            const int maxRetry = 10;
+            Status status = Status::OK();
+            while (retryCount++ < maxRetry) {
+                newRU->beginUnitOfWork(opCtx);
+                status = featureUpdateFunc();
+                if (status.isOK()) {
+                    newRU->commitUnitOfWork();
+                    break;
+                }
+                newRU->abortUnitOfWork();
+                if (retryCount > 1) {
+                    MONGO_LOG(1) << "Retrying feature update, attempt " << retryCount;
+                }
+                opCtx->sleepFor(Milliseconds{std::min(retryCount, 5)});
+            }
+            return status;
+        };
+
             int retryCount = 0;
             const int maxRetry = 10;
             Status tmp_st = Status::OK();
-            while (retryCount++ < maxRetry) {
-                newRU->beginUnitOfWork(opCtx);
-                if (!_catalog->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx, feature)) {
-                    tmp_st = _catalog->getFeatureTracker()->markNonRepairableFeatureAsInUse(
-                        opCtx, feature);
-                    if (tmp_st.isOK()) {
-                        newRU->commitUnitOfWork();
-                        break;
-                    }
-                }
-                newRU->abortUnitOfWork();
-                opCtx->sleepFor(retryCount * Milliseconds{1});
-            }
+            tmp_st = retryFeatureUpdate([&]() -> Status {
+                if (!_catalog->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx, feature)) {
+                    return _catalog->getFeatureTracker()->markNonRepairableFeatureAsInUse(
+                        opCtx, feature);
+                }
+                return Status::OK();
+            });
             fassert(70004, tmp_st);

Then apply the same helper to the RepairableFeature retry loop above.

src/mongo/db/storage/kv/kv_catalog.cpp (1)

301-337: Clarify findRecord’s isForWrite parameter and the diagnostic log
The fourth argument to findRecord is isForWrite, not an upsert flag—passing true here is correct for write operations. However, the MONGO_LOG call still prints opCtx->isUpsert() (which will always be false in this context) and conflates unrelated semantics. Change the log to reflect the isForWrite argument, for example:

MONGO_LOG(1) << "FeatureTracker::putInfo: findRecord isForWrite=" << true;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76c626b and dafe3c6.

📒 Files selected for processing (10)
  • src/mongo/db/modules/eloq/src/eloq_record_store.cpp (3 hunks)
  • src/mongo/db/modules/eloq/src/eloq_record_store.h (1 hunks)
  • src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (5 hunks)
  • src/mongo/db/modules/eloq/src/eloq_recovery_unit.h (1 hunks)
  • src/mongo/db/modules/eloq/tx_service (1 hunks)
  • src/mongo/db/storage/kv/kv_catalog.cpp (4 hunks)
  • src/mongo/db/storage/kv/kv_catalog_feature_tracker.h (3 hunks)
  • src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp (2 hunks)
  • src/mongo/db/storage/kv/kv_database_catalog_entry_base.cpp (1 hunks)
  • src/mongo/db/storage/record_store.h (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.h (3)
src/mongo/db/modules/eloq/src/base/eloq_util.h (3)
  • txservice (41-41)
  • TableName (77-83)
  • TableName (86-121)
src/mongo/db/modules/eloq/src/eloq_record_store.h (1)
  • TableName (203-205)
src/mongo/db/storage/record_store.h (3)
  • bool (313-343)
  • bool (359-368)
  • bool (370-375)
src/mongo/db/modules/eloq/src/eloq_record_store.h (1)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (16)
  • findRecord (178-182)
  • findRecord (178-180)
  • findRecord (184-219)
  • findRecord (184-187)
  • findRecord (741-772)
  • findRecord (741-743)
  • opCtx (142-144)
  • opCtx (142-142)
  • opCtx (460-469)
  • opCtx (460-460)
  • opCtx (575-580)
  • opCtx (575-575)
  • id (123-125)
  • id (123-123)
  • id (508-544)
  • id (508-508)
src/mongo/db/storage/record_store.h (1)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (12)
  • findRecord (178-182)
  • findRecord (178-180)
  • findRecord (184-219)
  • findRecord (184-187)
  • findRecord (741-772)
  • findRecord (741-743)
  • opCtx (142-144)
  • opCtx (142-142)
  • opCtx (460-469)
  • opCtx (460-460)
  • opCtx (575-580)
  • opCtx (575-575)
src/mongo/db/storage/kv/kv_catalog_feature_tracker.h (1)
src/mongo/db/storage/kv/kv_catalog.cpp (8)
  • markNonRepairableFeatureAsInUse (233-238)
  • markNonRepairableFeatureAsInUse (233-234)
  • markNonRepairableFeatureAsNotInUse (240-245)
  • markNonRepairableFeatureAsNotInUse (240-241)
  • markRepairableFeatureAsInUse (253-258)
  • markRepairableFeatureAsInUse (253-254)
  • putInfo (301-337)
  • putInfo (301-301)
src/mongo/db/storage/kv/kv_catalog.cpp (1)
src/mongo/db/storage/kv/kv_catalog_feature_tracker.h (1)
  • _rid (191-198)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (1)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (6)
  • MONGO_LOG (93-121)
  • MONGO_LOG (471-506)
  • MONGO_LOG (546-550)
  • MONGO_LOG (552-558)
  • MONGO_LOG (560-566)
  • MONGO_LOG (568-573)
🔇 Additional comments (7)
src/mongo/db/modules/eloq/tx_service (1)

1-1: Confirm submodule bump & API compatibility

  • Submodule src/mongo/db/modules/eloq/tx_service advanced from 2bd068afeac7e7621c15bd7ac718f6e96cd5ce73 to af75de6e5d0d068fa9dda2a21b6cdf621ecd1bd0 (SHA exists on remote).
  • Validate upstream change moves create-table retry logic out of EloqCatalogRecordStore::insertRecord without altering its public interface.
  • Verify callers now implement bounded-backoff retries and reuse TxExecution, with no implicit loops in insertRecord.
  • Confirm tests cover: retry-then-succeed for createTable; idempotency (no duplicates); and correct recovery-unit abort/commit on all paths.
  • Ensure CI invokes git submodule update --init --recursive for reproducible builds.
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (2)

324-327: Extra context in logs looks fine

The added txn/context in readCatalog logs is helpful for traceability. Ensure log volume at level 1 is acceptable in production paths.

Also applies to: 347-351


764-775: Propagating isForWrite through discoverTable/readCatalog is correct

This aligns with the new findRecord overload. Once the default in RecordStore is fixed, callers can opt-in safely.

src/mongo/db/modules/eloq/src/eloq_record_store.h (1)

176-178: Remove unused kMaxRetryLimit constant

The kMaxRetryLimit constant is no longer referenced after moving retries to callers; delete it to prevent confusion.

src/mongo/db/modules/eloq/src/eloq_recovery_unit.h (1)

193-195: Signature extension approved; default only in header
Header declares isForWrite = false, the .cpp definition omits the default, and all call sites correctly compile.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

178-182: LGTM! Clean delegation pattern for the new findRecord overload.

The addition of the isForWrite parameter and delegation from the existing method maintains backward compatibility while enabling write-context control through the recovery unit.

Also applies to: 184-187, 196-196

src/mongo/db/storage/kv/kv_catalog.cpp (1)

253-257: LGTM! Consistent error handling pattern.

The function signature change mirrors the non-repairable version, maintaining consistency in the API and enabling proper error propagation.

Comment on lines +293 to +319
// for (uint16_t i = 1; i < kMaxRetryLimit; ++i) {
// auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
// if (errorCode != txservice::TxErrorCode::NO_ERROR) {
// MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
// "may do DDL on the same table.";
// } else {
// if (exist) {
// const char* msg = "Collection already exists in Eloq storage engine";
// warning() << msg << ", ns: " << tableName.StringView();
// return {ErrorCodes::NamespaceExists, msg};
// }

// auto status = ru->createTable(tableName, metadata);
// if (status.isOK()) {
// return {recordId};
// }
// }

// mongo::Milliseconds duration{uniformDist(randomEngine)};
// MONGO_LOG(1) << "Fail to create table in Eloq. Sleep for " << duration.count() << "ms";
// opCtx->sleepFor(duration);
// MONGO_LOG(1) << "Retry count: " << i;
// catalogRecord.Reset();
// }

// return {ErrorCodes::InternalError,
// "[Create Table] opertion reaches the maximum number of retries."};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Remove the commented-out retry loop code.

Since the retry logic has been intentionally moved to a higher level as per the PR objectives, the commented-out code should be deleted rather than retained. This improves code readability and prevents confusion about the intended behavior.

-    // for (uint16_t i = 1; i < kMaxRetryLimit; ++i) {
-    //     auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
-    //     if (errorCode != txservice::TxErrorCode::NO_ERROR) {
-    //         MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
-    //                         "may do DDL on the same table.";
-    //     } else {
-    //         if (exist) {
-    //             const char* msg = "Collection already exists in Eloq storage engine";
-    //             warning() << msg << ", ns: " << tableName.StringView();
-    //             return {ErrorCodes::NamespaceExists, msg};
-    //         }
-
-    //         auto status = ru->createTable(tableName, metadata);
-    //         if (status.isOK()) {
-    //             return {recordId};
-    //         }
-    //     }
-
-    //     mongo::Milliseconds duration{uniformDist(randomEngine)};
-    //     MONGO_LOG(1) << "Fail to create table in Eloq. Sleep for " << duration.count() << "ms";
-    //     opCtx->sleepFor(duration);
-    //     MONGO_LOG(1) << "Retry count: " << i;
-    //     catalogRecord.Reset();
-    // }
-
-    // return {ErrorCodes::InternalError,
-    //         "[Create Table] opertion reaches the maximum number of retries."};
-
     auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// for (uint16_t i = 1; i < kMaxRetryLimit; ++i) {
// auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
// if (errorCode != txservice::TxErrorCode::NO_ERROR) {
// MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
// "may do DDL on the same table.";
// } else {
// if (exist) {
// const char* msg = "Collection already exists in Eloq storage engine";
// warning() << msg << ", ns: " << tableName.StringView();
// return {ErrorCodes::NamespaceExists, msg};
// }
// auto status = ru->createTable(tableName, metadata);
// if (status.isOK()) {
// return {recordId};
// }
// }
// mongo::Milliseconds duration{uniformDist(randomEngine)};
// MONGO_LOG(1) << "Fail to create table in Eloq. Sleep for " << duration.count() << "ms";
// opCtx->sleepFor(duration);
// MONGO_LOG(1) << "Retry count: " << i;
// catalogRecord.Reset();
// }
// return {ErrorCodes::InternalError,
// "[Create Table] opertion reaches the maximum number of retries."};
// (Removed obsolete retry loop)
auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
if (errorCode != txservice::TxErrorCode::NO_ERROR) {
MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
"may do DDL on the same table.";
return {ErrorCodes::InternalError, "Error reading catalog"};
}
if (exist) {
const char* msg = "Collection already exists in Eloq storage engine";
warning() << msg << ", ns: " << tableName.StringView();
return {ErrorCodes::NamespaceExists, msg};
}
auto status = ru->createTable(tableName, metadata);
if (!status.isOK()) {
return {status.code(), status.reason()};
}
return {recordId};
🤖 Prompt for AI Agents
In src/mongo/db/modules/eloq/src/eloq_record_store.cpp around lines 293 to 319,
remove the entire commented-out retry loop and the commented return that follows
(the block of code starting with the for-loop comments through the final
commented return), since retry logic was moved up the call stack; delete these
commented lines to improve readability and ensure there are no leftover dead
comments, then run a build/format to confirm no leftover references or
formatting issues.

Comment on lines +74 to 78
bool findRecord(OperationContext* opCtx,
const RecordId& id,
RecordData* out,
bool isForWrite) const override;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

New overload OK; tighten ownership and return semantics in implementation

In the corresponding .cpp (see provided snippet), two fixes are advisable:

  • Construct txservice::TableName with owned storage to avoid dangling references when keys are copied into RU maps.
  • Return owned RecordData to avoid lifetime issues.

Suggested changes in src/mongo/db/modules/eloq/src/eloq_record_store.cpp (lines ~183–218):

-    txservice::TableName tableName{
-        id.getStringView(), txservice::TableType::Primary, txservice::TableEngine::EloqDoc};
+    // Own the underlying string to avoid dangling string_view in downstream caches/maps.
+    // If available, prefer the helper:
+    //   const auto tableName = MongoTableToTxServiceTableName(id.getStringView(), /*own_string=*/true);
+    txservice::TableName tableName{
+        std::string{id.getStringView()},
+        txservice::TableType::Primary,
+        txservice::TableEngine::EloqDoc};

@@
-            *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())};
+            *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())}.getOwned();
             return true;
@@
-    *out = RecordData{metadata.data(), static_cast<int>(metadata.size())};
+    *out = RecordData{metadata.data(), static_cast<int>(metadata.size())}.getOwned();

If using the helper, ensure the include exists at the top of the .cpp:

+#include "mongo/db/modules/eloq/src/base/eloq_util.h"

🏁 Script executed:

#!/bin/bash
set -e

echo "Module files:"
fd -t f src/mongo/db/modules/eloq/src

echo "---- eloq_record_store.cpp (1-200) ----"
sed -n '1,200p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp || echo "Not found"

echo "---- eloq_record_store.cpp (150-260) ----"
sed -n '150,260p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp || echo "Not found"

echo "---- eloq_util.h (1-200) ----"
sed -n '1,200p' src/mongo/db/modules/eloq/src/base/eloq_util.h || echo "eloq_util.h not found"

echo "---- Helper search ----"
grep -R "MongoTableToTxServiceTableName" -n src/mongo/db/modules/eloq/src || echo "Helper not found"

echo "---- getOwned search ----"
grep -R "getOwned" -n src

echo "---- RecordData definition ----"
grep -R "class RecordData" -n src/mongo/db

Length of output: 435


🏁 Script executed:

#!/usr/bin/env bash
set -e

echo "Module files in src/mongo/db/modules/eloq/src:"
fd . src/mongo/db/modules/eloq/src -t f

echo "---- eloq_record_store.cpp (1-200) ----"
sed -n '1,200p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp

echo "---- eloq_record_store.cpp (150-260) ----"
sed -n '150,260p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp

echo "---- eloq_util.h (1-200) ----"
sed -n '1,200p' src/mongo/db/modules/eloq/src/base/eloq_util.h

echo "---- Searching for helper ----"
grep -R "MongoTableToTxServiceTableName" -n src/mongo/db/modules/eloq/src

echo "---- Searching for getOwned ----"
grep -R "getOwned" -n src

echo "---- RecordData definition ----"
grep -R "class RecordData" -n src/mongo/db

Length of output: 60084


Tighten ownership in findRecord: copy table name and return owned RecordData

In src/mongo/db/modules/eloq/src/eloq_record_store.cpp, update the overload to:

// Replace string_view‐based TableName with owned string
const auto tableName =
    Eloq::MongoTableToTxServiceTableName(id.getStringView(), /*own_string=*/true);

And ensure both RecordData assignments call getOwned():

-    *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())};
+    *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())}.getOwned();-    *out = RecordData{metadata.data(), static_cast<int>(metadata.size())};
+    *out = RecordData{metadata.data(), static_cast<int>(metadata.size())}.getOwned();

No new includes required—the helper in eloq_util.h is already imported.

🤖 Prompt for AI Agents
In src/mongo/db/modules/eloq/src/eloq_record_store.h around lines 74 to 78,
tighten ownership for findRecord by updating the implementation in
eloq_record_store.cpp to construct tableName with an owned string (use
Eloq::MongoTableToTxServiceTableName(id.getStringView(), /*own_string=*/true))
instead of a string_view, and ensure both RecordData assignments return owned
buffers by calling getOwned() on the RecordData values before assigning to out;
no new includes are needed.

Comment on lines +138 to +139
// void markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);
Status markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Remove commented-out void method signatures.

The API change from void to Status return type is appropriate for enabling error propagation. However, the old signatures should be removed from the code. If migration guidance is needed, it should be documented in release notes or migration guides rather than as commented code.

-    // void markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);
     Status markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);

-    // void markRepairableFeatureAsInUse(OperationContext* opCtx, RepairableFeature feature);
     Status markRepairableFeatureAsInUse(OperationContext* opCtx, RepairableFeature feature);

-    // void putInfo(OperationContext* opCtx, const FeatureBits& versionInfo);
     Status putInfo(OperationContext* opCtx, const FeatureBits& versionInfo);

Also applies to: 157-158, 181-182

🤖 Prompt for AI Agents
In src/mongo/db/storage/kv/kv_catalog_feature_tracker.h around lines 138-139
(and similarly at 157-158 and 181-182), remove the commented-out old void method
signatures so only the new Status-returning declarations remain; delete those
commented lines, ensure whitespace and formatting remain consistent, and run a
quick build to verify no references rely on the old signatures.

Comment on lines +233 to +237
Status KVCatalog::FeatureTracker::markNonRepairableFeatureAsInUse(OperationContext* opCtx,
NonRepairableFeature feature) {
FeatureBits versionInfo = getInfo(opCtx);
versionInfo.nonRepairableFeatures |= static_cast<NonRepairableFeatureMask>(feature);
putInfo(opCtx, versionInfo);
return putInfo(opCtx, versionInfo);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

LGTM! Proper error propagation added.

The function signature change from void to Status allows callers to handle failures when marking non-repairable features, which aligns with the PR objective of moving retry logic to higher levels.

Let me verify that all callers of this function handle the new Status return value:


🏁 Script executed:

#!/bin/bash
# Search for calls to markNonRepairableFeatureAsInUse to ensure they handle the Status return
ast-grep --pattern 'markNonRepairableFeatureAsInUse($_)'

# Also search with ripgrep for any usage patterns
rg -n "markNonRepairableFeatureAsInUse" --type cpp -C3

Length of output: 10431


🏁 Script executed:

#!/bin/bash
sed -n '120,160p' src/mongo/db/storage/kv/kv_catalog_feature_tracker.h
rg -n "MONGO_WARN_UNUSED_RESULT" -C1 src/mongo/db/storage/kv/kv_catalog_feature_tracker.h

Length of output: 1822


🏁 Script executed:

#!/bin/bash
rg -n "markRepairableFeatureAsInUse" -C3 src

Length of output: 9327


Annotate new Status-returning methods with MONGO_WARN_UNUSED_RESULT and update tests
Add MONGO_WARN_UNUSED_RESULT to both markNonRepairableFeatureAsInUse and markRepairableFeatureAsInUse in kv_catalog_feature_tracker.h to enforce handling their returned Status, and update all tests that currently ignore these calls to wrap them in ASSERT_OK(…).

🤖 Prompt for AI Agents
In src/mongo/db/storage/kv/kv_catalog.cpp lines ~233-237 and the corresponding
declarations in src/mongo/db/storage/kv/kv_catalog_feature_tracker.h, the new
Status-returning methods markNonRepairableFeatureAsInUse and
markRepairableFeatureAsInUse must be annotated with MONGO_WARN_UNUSED_RESULT so
callers must check the returned Status; add MONGO_WARN_UNUSED_RESULT to both
method declarations in the header, then search tests that call these methods
without checking the return and update each to wrap the call in ASSERT_OK(...).
Ensure compile includes the header change and all modified tests use ASSERT_OK
to verify the returned Status.

Comment on lines 249 to 254
if (_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,
feature)) {
_engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(opCtx,
feature);
auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(
opCtx, feature);
fassert(70005, st);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix inverted condition and incorrect fassert usage

  • Condition is likely inverted: we should mark the feature only if it's not already marked in use.
  • fassert(70005, st) is invalid for a Status; use fassertStatusOK (or uassertStatusOK/invariantStatusOK).

Apply:

-        if (_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,
-                                                                                    feature)) {
-            auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(
-                opCtx, feature);
-            fassert(70005, st);
-        }
+        if (!_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,
+                                                                                     feature)) {
+            auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(
+                opCtx, feature);
+            fassertStatusOK(70005, st);
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,
feature)) {
_engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(opCtx,
feature);
auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(
opCtx, feature);
fassert(70005, st);
}
if (!_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,
feature)) {
auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(
opCtx, feature);
fassertStatusOK(70005, st);
}
🤖 Prompt for AI Agents
In src/mongo/db/storage/kv/kv_database_catalog_entry_base.cpp around lines
249-254, the code currently calls markNonRepairableFeatureAsInUse only when
isNonRepairableFeatureInUse returns true and then uses fassert(70005, st) on a
Status; invert the condition so you call markNonRepairableFeatureAsInUse only
when the feature is NOT already in use, capture its returned Status, and replace
the invalid fassert call with fassertStatusOK(70005, st) (or another
status-check helper) to properly assert on the Status result.

Comment on lines +370 to +375
virtual bool findRecord(OperationContext* opCtx,
const RecordId& loc,
RecordData* out,
bool isForWrite) const {
assert(false);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Avoid UB: provide a safe default for the new overload

assert(false) with no return will be compiled out in release, causing undefined behavior. Default to existing 3‑arg path.

-    virtual bool findRecord(OperationContext* opCtx,
-                            const RecordId& loc,
-                            RecordData* out,
-                            bool isForWrite) const {
-        assert(false);
-    }
+    virtual bool findRecord(OperationContext* opCtx,
+                            const RecordId& loc,
+                            RecordData* out,
+                            bool /*isForWrite*/) const {
+        // Default behavior: ignore isForWrite and delegate to existing API.
+        return findRecord(opCtx, loc, out);
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
virtual bool findRecord(OperationContext* opCtx,
const RecordId& loc,
RecordData* out,
bool isForWrite) const {
assert(false);
}
virtual bool findRecord(OperationContext* opCtx,
const RecordId& loc,
RecordData* out,
bool /*isForWrite*/) const {
// Default behavior: ignore isForWrite and delegate to existing API.
return findRecord(opCtx, loc, out);
}
🤖 Prompt for AI Agents
In src/mongo/db/storage/record_store.h around lines 370 to 375, the new 4-arg
findRecord implementation uses assert(false) which is unsafe in release builds;
replace the assert with a safe default that forwards to the existing 3‑argument
overload, e.g., return findRecord(opCtx, loc, out); so the method returns the
same result as the 3‑arg path and avoids undefined behavior.

@xiexiaoy xiexiaoy self-requested a review September 29, 2025 09:34
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 332fe4b and 5496331.

📒 Files selected for processing (1)
  • src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp (2 hunks)

Comment on lines +223 to +226
} else {
newRU->commitUnitOfWork();
break;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Reset tmp_st before breaking on “feature already in use”.

If a prior iteration of this loop failed (e.g. mark*FeatureAsInUse returned NamespaceExists), tmp_st retains that error. On the next pass, if we observe the feature already in use we commit and break, but tmp_st still holds the earlier failure. The subsequent uassertStatusOK(tmp_st) therefore throws even though the operation logically succeeded. Please clear tmp_st to Status::OK() in the “already in use” branch for both retry loops.

Apply this diff (showing both sites):

                 } else {
+                    tmp_st = Status::OK();
                     newRU->commitUnitOfWork();
                     break;
                 }
@@
                 } else {
+                    tmp_st = Status::OK();
                     newRU->commitUnitOfWork();
                     break;
                 }

Also applies to: 262-264

🤖 Prompt for AI Agents
In src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp around lines 223-226
(and also apply same change at 262-264), tmp_st may retain a prior non-OK error
from a failed mark*FeatureAsInUse call; update the “feature already in use”
branch so that before calling newRU->commitUnitOfWork() and breaking you reset
tmp_st to Status::OK() (i.e., clear the stored error) so the subsequent
uassertStatusOK(tmp_st) does not incorrectly throw; apply the same reset in both
retry loops at the two indicated sites.

Comment on lines +228 to 231
uassertStatusOK(tmp_st);

newRU->commitUnitOfWork();
opCtx->setRecoveryUnit(oldRU, oldState);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Restore the original RecoveryUnit before calling uassertStatusOK.

Right now a failure status triggers uassertStatusOK(tmp_st) before we restore oldRU. That throws and exits the function while opCtx is still bound to newRU, leaking the temporary recovery unit and leaving the context in the wrong state. Please restore oldRU (ideally via ScopeGuard, or at minimum by swapping the two statements) before asserting, at both retry sites.

Suggested fix:

-            uassertStatusOK(tmp_st);
-
-            opCtx->setRecoveryUnit(oldRU, oldState);
+            opCtx->setRecoveryUnit(oldRU, oldState);
+            uassertStatusOK(tmp_st);

…and likewise for the collation feature block.

Also applies to: 266-269

🤖 Prompt for AI Agents
In src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp around lines 228-231
(and likewise the retry site at 266-269 and the collation feature block), the
code calls uassertStatusOK(tmp_st) before restoring the original RecoveryUnit,
which leaks the temporary RU on failure; restore oldRU on opCtx (preferably via
a ScopeGuard that swaps back in the destructor) before calling uassertStatusOK
so that any thrown uassert will occur after the opCtx is returned to its
original state; apply the same change at both retry sites and in the collation
block to ensure oldRU is always restored prior to asserting the status.

@lzxddz lzxddz merged commit 077a8d2 into eloqdata:main Sep 30, 2025
3 checks passed
xiexiaoy pushed a commit that referenced this pull request Oct 21, 2025
…to create table. (#222)

* Retry at call CreateCollection instead of in EloqCatalogRecordStore::insertRecord() on failed to create table
This was referenced Dec 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test the mongo tool mongodump/mongorestore

2 participants