Remove the retry in EloqCatalogRecordStore::insertRecord() on failure to create a table (#222)
Conversation
Pre-merge checks and finishing touches: ❌ 1 failed check (warning), ✅ 2 passed checks
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/mongo/db/storage/kv/kv_catalog.cpp (2)
240-245: Inconsistent error handling in markNonRepairableFeatureAsNotInUse.
While markNonRepairableFeatureAsInUse and markRepairableFeatureAsInUse now return Status, this function still returns void and ignores the Status from putInfo on line 244. This creates an inconsistent API where some feature-marking operations can report errors while others cannot. Consider updating this function to also return Status for consistency:
```diff
-void KVCatalog::FeatureTracker::markNonRepairableFeatureAsNotInUse(OperationContext* opCtx,
-                                                                   NonRepairableFeature feature) {
+Status KVCatalog::FeatureTracker::markNonRepairableFeatureAsNotInUse(OperationContext* opCtx,
+                                                                     NonRepairableFeature feature) {
     FeatureBits versionInfo = getInfo(opCtx);
     versionInfo.nonRepairableFeatures &= ~static_cast<NonRepairableFeatureMask>(feature);
-    putInfo(opCtx, versionInfo);
+    return putInfo(opCtx, versionInfo);
 }
```
260-265: Inconsistent error handling in markRepairableFeatureAsNotInUse.
Similar to markNonRepairableFeatureAsNotInUse, this function still returns void while its counterpart markRepairableFeatureAsInUse now returns Status. The Status from putInfo is being ignored on line 264. Update for consistency:
```diff
-void KVCatalog::FeatureTracker::markRepairableFeatureAsNotInUse(OperationContext* opCtx,
-                                                                RepairableFeature feature) {
+Status KVCatalog::FeatureTracker::markRepairableFeatureAsNotInUse(OperationContext* opCtx,
+                                                                  RepairableFeature feature) {
     FeatureBits versionInfo = getInfo(opCtx);
     versionInfo.repairableFeatures &= ~static_cast<RepairableFeatureMask>(feature);
-    putInfo(opCtx, versionInfo);
+    return putInfo(opCtx, versionInfo);
 }
```
🧹 Nitpick comments (5)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (1)
538-539: Minor: unify logging style.
Nit: consider a consistent prefix, e.g., EloqRecoveryUnit::createTable txNumber=..., table=..., for easier grep.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)
321-327: Consider improving the error message to indicate retry handling.The error message should clarify that retries are expected to be handled by the caller, not that the operation failed permanently.
```diff
 auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
 if (errorCode != txservice::TxErrorCode::NO_ERROR) {
     MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
                     "may do DDL on the same table.";
     return {ErrorCodes::InternalError,
-            "[Create Table] Another transaction may do DDL on the same table"};
+            "[Create Table] DDL contention detected - caller should retry"};
```

src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp (2)
207-223: Consider adding logging for retry attempts and capping the backoff duration.The retry loop correctly implements the higher-level retry pattern as per PR objectives. However, consider:
- Adding debug logging to track retry attempts for troubleshooting
- Capping the backoff to avoid excessive delays (current max is 10ms)
```diff
 int retryCount = 0;
 const int maxRetry = 10;
 Status tmp_st = Status::OK();
 while (retryCount++ < maxRetry) {
     newRU->beginUnitOfWork(opCtx);
     if (!_catalog->getFeatureTracker()->isRepairableFeatureInUse(opCtx, feature)) {
         tmp_st = _catalog->getFeatureTracker()->markRepairableFeatureAsInUse(opCtx, feature);
         if (tmp_st.isOK()) {
             newRU->commitUnitOfWork();
             break;
         }
     }
     newRU->abortUnitOfWork();
-    opCtx->sleepFor(retryCount * Milliseconds{1});
+    if (retryCount > 1) {
+        MONGO_LOG(1) << "Retrying markRepairableFeatureAsInUse, attempt " << retryCount;
+    }
+    opCtx->sleepFor(Milliseconds{std::min(retryCount, 5)});
 }
```
240-256: Extract the retry logic into a reusable helper to eliminate duplication.This retry pattern is identical to the one above (lines 207-223). Consider extracting it into a helper function or lambda to reduce code duplication and ensure consistent behavior.
```diff
+    auto retryFeatureUpdate = [&](auto featureUpdateFunc) -> Status {
+        int retryCount = 0;
+        const int maxRetry = 10;
+        Status status = Status::OK();
+        while (retryCount++ < maxRetry) {
+            newRU->beginUnitOfWork(opCtx);
+            status = featureUpdateFunc();
+            if (status.isOK()) {
+                newRU->commitUnitOfWork();
+                break;
+            }
+            newRU->abortUnitOfWork();
+            if (retryCount > 1) {
+                MONGO_LOG(1) << "Retrying feature update, attempt " << retryCount;
+            }
+            opCtx->sleepFor(Milliseconds{std::min(retryCount, 5)});
+        }
+        return status;
+    };
+
     int retryCount = 0;
     const int maxRetry = 10;
     Status tmp_st = Status::OK();
-    while (retryCount++ < maxRetry) {
-        newRU->beginUnitOfWork(opCtx);
-        if (!_catalog->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx, feature)) {
-            tmp_st = _catalog->getFeatureTracker()->markNonRepairableFeatureAsInUse(
-                opCtx, feature);
-            if (tmp_st.isOK()) {
-                newRU->commitUnitOfWork();
-                break;
-            }
-        }
-        newRU->abortUnitOfWork();
-        opCtx->sleepFor(retryCount * Milliseconds{1});
-    }
+    tmp_st = retryFeatureUpdate([&]() -> Status {
+        if (!_catalog->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx, feature)) {
+            return _catalog->getFeatureTracker()->markNonRepairableFeatureAsInUse(
+                opCtx, feature);
+        }
+        return Status::OK();
+    });
     fassert(70004, tmp_st);
```

Then apply the same helper to the RepairableFeature retry loop above.
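As a standalone illustration of the capped-backoff retry shape suggested in this comment, the sketch below uses plain std::function and std::this_thread in place of MongoDB's Status/OperationContext (those stand-ins are assumptions of this sketch, not the real API): the wait grows with the attempt number but is clamped to a small cap.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for a Status-like result: true == OK.
using TryStatus = bool;

// Run `attempt` up to `maxRetry` times. Between failed tries, sleep for a
// backoff that grows linearly with the attempt number but never exceeds capMs.
inline TryStatus retryWithCappedBackoff(const std::function<TryStatus()>& attempt,
                                        int maxRetry = 10,
                                        int capMs = 5) {
    TryStatus st = true;
    for (int retryCount = 1; retryCount <= maxRetry; ++retryCount) {
        st = attempt();
        if (st) {
            break;  // success: stop retrying
        }
        // Capped backoff: min(attempt, capMs) milliseconds.
        std::this_thread::sleep_for(
            std::chrono::milliseconds(std::min(retryCount, capMs)));
    }
    return st;
}
```

The cap keeps the worst-case total delay bounded (here at most maxRetry * capMs milliseconds), which is the concern the review raises about `retryCount * Milliseconds{1}` growing without limit.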
src/mongo/db/storage/kv/kv_catalog.cpp (1)
301-337: Clarify findRecord's isForWrite parameter and the diagnostic log.
The fourth argument to findRecord is isForWrite, not an upsert flag; passing true here is correct for write operations. However, the MONGO_LOG call still prints opCtx->isUpsert() (which will always be false in this context) and conflates unrelated semantics. Change the log to reflect the isForWrite argument, for example: MONGO_LOG(1) << "FeatureTracker::putInfo: findRecord isForWrite=" << true;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- src/mongo/db/modules/eloq/src/eloq_record_store.cpp (3 hunks)
- src/mongo/db/modules/eloq/src/eloq_record_store.h (1 hunk)
- src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (5 hunks)
- src/mongo/db/modules/eloq/src/eloq_recovery_unit.h (1 hunk)
- src/mongo/db/modules/eloq/tx_service (1 hunk)
- src/mongo/db/storage/kv/kv_catalog.cpp (4 hunks)
- src/mongo/db/storage/kv/kv_catalog_feature_tracker.h (3 hunks)
- src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp (2 hunks)
- src/mongo/db/storage/kv/kv_database_catalog_entry_base.cpp (1 hunk)
- src/mongo/db/storage/record_store.h (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (6)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.h (3)
- src/mongo/db/modules/eloq/src/base/eloq_util.h (3): txservice (41-41), TableName (77-83), TableName (86-121)
- src/mongo/db/modules/eloq/src/eloq_record_store.h (1): TableName (203-205)
- src/mongo/db/storage/record_store.h (3): bool (313-343), bool (359-368), bool (370-375)

src/mongo/db/modules/eloq/src/eloq_record_store.h (1)
- src/mongo/db/modules/eloq/src/eloq_record_store.cpp (16): findRecord (178-182), findRecord (178-180), findRecord (184-219), findRecord (184-187), findRecord (741-772), findRecord (741-743), opCtx (142-144), opCtx (142-142), opCtx (460-469), opCtx (460-460), opCtx (575-580), opCtx (575-575), id (123-125), id (123-123), id (508-544), id (508-508)

src/mongo/db/storage/record_store.h (1)
- src/mongo/db/modules/eloq/src/eloq_record_store.cpp (12): findRecord (178-182), findRecord (178-180), findRecord (184-219), findRecord (184-187), findRecord (741-772), findRecord (741-743), opCtx (142-144), opCtx (142-142), opCtx (460-469), opCtx (460-460), opCtx (575-580), opCtx (575-575)

src/mongo/db/storage/kv/kv_catalog_feature_tracker.h (1)
- src/mongo/db/storage/kv/kv_catalog.cpp (8): markNonRepairableFeatureAsInUse (233-238), markNonRepairableFeatureAsInUse (233-234), markNonRepairableFeatureAsNotInUse (240-245), markNonRepairableFeatureAsNotInUse (240-241), markRepairableFeatureAsInUse (253-258), markRepairableFeatureAsInUse (253-254), putInfo (301-337), putInfo (301-301)

src/mongo/db/storage/kv/kv_catalog.cpp (1)
- src/mongo/db/storage/kv/kv_catalog_feature_tracker.h (1): _rid (191-198)

src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (1)
- src/mongo/db/modules/eloq/src/eloq_record_store.cpp (6): MONGO_LOG (93-121), MONGO_LOG (471-506), MONGO_LOG (546-550), MONGO_LOG (552-558), MONGO_LOG (560-566), MONGO_LOG (568-573)
🔇 Additional comments (7)
src/mongo/db/modules/eloq/tx_service (1)
1-1: Confirm submodule bump & API compatibility
- Submodule src/mongo/db/modules/eloq/tx_service advanced from 2bd068afeac7e7621c15bd7ac718f6e96cd5ce73 to af75de6e5d0d068fa9dda2a21b6cdf621ecd1bd0 (SHA exists on remote).
- Validate that the upstream change moves create-table retry logic out of EloqCatalogRecordStore::insertRecord without altering its public interface.
- Verify callers now implement bounded-backoff retries and reuse TxExecution, with no implicit loops in insertRecord.
- Confirm tests cover: retry-then-succeed for createTable; idempotency (no duplicates); and correct recovery-unit abort/commit on all paths.
- Ensure CI invokes git submodule update --init --recursive for reproducible builds.

src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (2)
324-327: Extra context in logs looks fineThe added txn/context in readCatalog logs is helpful for traceability. Ensure log volume at level 1 is acceptable in production paths.
Also applies to: 347-351
764-775: Propagating isForWrite through discoverTable/readCatalog is correctThis aligns with the new findRecord overload. Once the default in RecordStore is fixed, callers can opt-in safely.
src/mongo/db/modules/eloq/src/eloq_record_store.h (1)
176-178: Remove the unused kMaxRetryLimit constant.
The kMaxRetryLimit constant is no longer referenced after moving retries to callers; delete it to prevent confusion.

src/mongo/db/modules/eloq/src/eloq_recovery_unit.h (1)
193-195: Signature extension approved; default only in header
The header declares isForWrite = false, the .cpp definition omits the default, and all call sites compile correctly.
178-182: LGTM! Clean delegation pattern for the new findRecord overload.
The addition of the isForWrite parameter and delegation from the existing method maintains backward compatibility while enabling write-context control through the recovery unit. Also applies to: 184-187, 196-196.

src/mongo/db/storage/kv/kv_catalog.cpp (1)
src/mongo/db/storage/kv/kv_catalog.cpp (1)
253-257: LGTM! Consistent error handling pattern.The function signature change mirrors the non-repairable version, maintaining consistency in the API and enabling proper error propagation.
```cpp
// for (uint16_t i = 1; i < kMaxRetryLimit; ++i) {
//     auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
//     if (errorCode != txservice::TxErrorCode::NO_ERROR) {
//         MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
//                         "may do DDL on the same table.";
//     } else {
//         if (exist) {
//             const char* msg = "Collection already exists in Eloq storage engine";
//             warning() << msg << ", ns: " << tableName.StringView();
//             return {ErrorCodes::NamespaceExists, msg};
//         }
//
//         auto status = ru->createTable(tableName, metadata);
//         if (status.isOK()) {
//             return {recordId};
//         }
//     }
//
//     mongo::Milliseconds duration{uniformDist(randomEngine)};
//     MONGO_LOG(1) << "Fail to create table in Eloq. Sleep for " << duration.count() << "ms";
//     opCtx->sleepFor(duration);
//     MONGO_LOG(1) << "Retry count: " << i;
//     catalogRecord.Reset();
// }
//
// return {ErrorCodes::InternalError,
//         "[Create Table] opertion reaches the maximum number of retries."};
```
🛠️ Refactor suggestion | 🟠 Major
Remove the commented-out retry loop code.
Since the retry logic has been intentionally moved to a higher level as per the PR objectives, the commented-out code should be deleted rather than retained. This improves code readability and prevents confusion about the intended behavior.
```diff
- // for (uint16_t i = 1; i < kMaxRetryLimit; ++i) {
- //     auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
- //     if (errorCode != txservice::TxErrorCode::NO_ERROR) {
- //         MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
- //                         "may do DDL on the same table.";
- //     } else {
- //         if (exist) {
- //             const char* msg = "Collection already exists in Eloq storage engine";
- //             warning() << msg << ", ns: " << tableName.StringView();
- //             return {ErrorCodes::NamespaceExists, msg};
- //         }
-
- //         auto status = ru->createTable(tableName, metadata);
- //         if (status.isOK()) {
- //             return {recordId};
- //         }
- //     }
-
- //     mongo::Milliseconds duration{uniformDist(randomEngine)};
- //     MONGO_LOG(1) << "Fail to create table in Eloq. Sleep for " << duration.count() << "ms";
- //     opCtx->sleepFor(duration);
- //     MONGO_LOG(1) << "Retry count: " << i;
- //     catalogRecord.Reset();
- // }
-
- // return {ErrorCodes::InternalError,
- //         "[Create Table] opertion reaches the maximum number of retries."};
-
  auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
// (Removed obsolete retry loop)
auto [exist, errorCode] = ru->readCatalog(catalogKey, catalogRecord, true);
if (errorCode != txservice::TxErrorCode::NO_ERROR) {
    MONGO_LOG(1) << "Eloq readCatalog error with write intent. Another transaction "
                    "may do DDL on the same table.";
    return {ErrorCodes::InternalError, "Error reading catalog"};
}
if (exist) {
    const char* msg = "Collection already exists in Eloq storage engine";
    warning() << msg << ", ns: " << tableName.StringView();
    return {ErrorCodes::NamespaceExists, msg};
}
auto status = ru->createTable(tableName, metadata);
if (!status.isOK()) {
    return {status.code(), status.reason()};
}
return {recordId};
```
🤖 Prompt for AI Agents
In src/mongo/db/modules/eloq/src/eloq_record_store.cpp around lines 293 to 319,
remove the entire commented-out retry loop and the commented return that follows
(the block of code starting with the for-loop comments through the final
commented return), since retry logic was moved up the call stack; delete these
commented lines to improve readability and ensure there are no leftover dead
comments, then run a build/format to confirm no leftover references or
formatting issues.
```cpp
bool findRecord(OperationContext* opCtx,
                const RecordId& id,
                RecordData* out,
                bool isForWrite) const override;
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
New overload OK; tighten ownership and return semantics in implementation
In the corresponding .cpp (see provided snippet), two fixes are advisable:
- Construct txservice::TableName with owned storage to avoid dangling references when keys are copied into RU maps.
- Return owned RecordData to avoid lifetime issues.
Suggested changes in src/mongo/db/modules/eloq/src/eloq_record_store.cpp (lines ~183–218):
```diff
-    txservice::TableName tableName{
-        id.getStringView(), txservice::TableType::Primary, txservice::TableEngine::EloqDoc};
+    // Own the underlying string to avoid dangling string_view in downstream caches/maps.
+    // If available, prefer the helper:
+    //   const auto tableName = MongoTableToTxServiceTableName(id.getStringView(), /*own_string=*/true);
+    txservice::TableName tableName{
+        std::string{id.getStringView()},
+        txservice::TableType::Primary,
+        txservice::TableEngine::EloqDoc};
@@
-    *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())};
+    *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())}.getOwned();
     return true;
@@
-    *out = RecordData{metadata.data(), static_cast<int>(metadata.size())};
+    *out = RecordData{metadata.data(), static_cast<int>(metadata.size())}.getOwned();
```

If using the helper, ensure the include exists at the top of the .cpp:

```diff
+#include "mongo/db/modules/eloq/src/base/eloq_util.h"
```

🏁 Script executed:
+#include "mongo/db/modules/eloq/src/base/eloq_util.h"🏁 Script executed:
#!/bin/bash
set -e
echo "Module files:"
fd -t f src/mongo/db/modules/eloq/src
echo "---- eloq_record_store.cpp (1-200) ----"
sed -n '1,200p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp || echo "Not found"
echo "---- eloq_record_store.cpp (150-260) ----"
sed -n '150,260p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp || echo "Not found"
echo "---- eloq_util.h (1-200) ----"
sed -n '1,200p' src/mongo/db/modules/eloq/src/base/eloq_util.h || echo "eloq_util.h not found"
echo "---- Helper search ----"
grep -R "MongoTableToTxServiceTableName" -n src/mongo/db/modules/eloq/src || echo "Helper not found"
echo "---- getOwned search ----"
grep -R "getOwned" -n src
echo "---- RecordData definition ----"
grep -R "class RecordData" -n src/mongo/dbLength of output: 435
🏁 Script executed:
```bash
#!/usr/bin/env bash
set -e
echo "Module files in src/mongo/db/modules/eloq/src:"
fd . src/mongo/db/modules/eloq/src -t f
echo "---- eloq_record_store.cpp (1-200) ----"
sed -n '1,200p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp
echo "---- eloq_record_store.cpp (150-260) ----"
sed -n '150,260p' src/mongo/db/modules/eloq/src/eloq_record_store.cpp
echo "---- eloq_util.h (1-200) ----"
sed -n '1,200p' src/mongo/db/modules/eloq/src/base/eloq_util.h
echo "---- Searching for helper ----"
grep -R "MongoTableToTxServiceTableName" -n src/mongo/db/modules/eloq/src
echo "---- Searching for getOwned ----"
grep -R "getOwned" -n src
echo "---- RecordData definition ----"
grep -R "class RecordData" -n src/mongo/db
```

Length of output: 60084
Tighten ownership in findRecord: copy table name and return owned RecordData
In src/mongo/db/modules/eloq/src/eloq_record_store.cpp, update the overload to:
```cpp
// Replace string_view-based TableName with an owned string
const auto tableName =
    Eloq::MongoTableToTxServiceTableName(id.getStringView(), /*own_string=*/true);
```

And ensure both RecordData assignments call getOwned():
```diff
-    *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())};
+    *out = RecordData{bsonObj.objdata(), static_cast<int>(bsonObj.objsize())}.getOwned();
 …
-    *out = RecordData{metadata.data(), static_cast<int>(metadata.size())};
+    *out = RecordData{metadata.data(), static_cast<int>(metadata.size())}.getOwned();
```

No new includes are required; the helper in eloq_util.h is already imported.
🤖 Prompt for AI Agents
In src/mongo/db/modules/eloq/src/eloq_record_store.h around lines 74 to 78,
tighten ownership for findRecord by updating the implementation in
eloq_record_store.cpp to construct tableName with an owned string (use
Eloq::MongoTableToTxServiceTableName(id.getStringView(), /*own_string=*/true))
instead of a string_view, and ensure both RecordData assignments return owned
buffers by calling getOwned() on the RecordData values before assigning to out;
no new includes are needed.
```cpp
// void markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);
Status markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);
```
🛠️ Refactor suggestion | 🟠 Major
Remove commented-out void method signatures.
The API change from void to Status return type is appropriate for enabling error propagation. However, the old signatures should be removed from the code. If migration guidance is needed, it should be documented in release notes or migration guides rather than as commented code.
```diff
- // void markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);
  Status markNonRepairableFeatureAsInUse(OperationContext* opCtx, NonRepairableFeature feature);
- // void markRepairableFeatureAsInUse(OperationContext* opCtx, RepairableFeature feature);
  Status markRepairableFeatureAsInUse(OperationContext* opCtx, RepairableFeature feature);
- // void putInfo(OperationContext* opCtx, const FeatureBits& versionInfo);
  Status putInfo(OperationContext* opCtx, const FeatureBits& versionInfo);
```

Also applies to: 157-158, 181-182
🤖 Prompt for AI Agents
In src/mongo/db/storage/kv/kv_catalog_feature_tracker.h around lines 138-139
(and similarly at 157-158 and 181-182), remove the commented-out old void method
signatures so only the new Status-returning declarations remain; delete those
commented lines, ensure whitespace and formatting remain consistent, and run a
quick build to verify no references rely on the old signatures.
```diff
 Status KVCatalog::FeatureTracker::markNonRepairableFeatureAsInUse(OperationContext* opCtx,
                                                                   NonRepairableFeature feature) {
     FeatureBits versionInfo = getInfo(opCtx);
     versionInfo.nonRepairableFeatures |= static_cast<NonRepairableFeatureMask>(feature);
-    putInfo(opCtx, versionInfo);
+    return putInfo(opCtx, versionInfo);
```
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
LGTM! Proper error propagation added.
The function signature change from void to Status allows callers to handle failures when marking non-repairable features, which aligns with the PR objective of moving retry logic to higher levels.
Let me verify that all callers of this function handle the new Status return value:
🏁 Script executed:
```bash
#!/bin/bash
# Search for calls to markNonRepairableFeatureAsInUse to ensure they handle the Status return
ast-grep --pattern 'markNonRepairableFeatureAsInUse($_)'
# Also search with ripgrep for any usage patterns
rg -n "markNonRepairableFeatureAsInUse" --type cpp -C3
```

Length of output: 10431
🏁 Script executed:
```bash
#!/bin/bash
sed -n '120,160p' src/mongo/db/storage/kv/kv_catalog_feature_tracker.h
rg -n "MONGO_WARN_UNUSED_RESULT" -C1 src/mongo/db/storage/kv/kv_catalog_feature_tracker.h
```

Length of output: 1822
🏁 Script executed:
```bash
#!/bin/bash
rg -n "markRepairableFeatureAsInUse" -C3 src
```

Length of output: 9327
Annotate new Status-returning methods with MONGO_WARN_UNUSED_RESULT and update tests
Add MONGO_WARN_UNUSED_RESULT to both markNonRepairableFeatureAsInUse and markRepairableFeatureAsInUse in kv_catalog_feature_tracker.h to enforce handling their returned Status, and update all tests that currently ignore these calls to wrap them in ASSERT_OK(…).
🤖 Prompt for AI Agents
In src/mongo/db/storage/kv/kv_catalog.cpp lines ~233-237 and the corresponding
declarations in src/mongo/db/storage/kv/kv_catalog_feature_tracker.h, the new
Status-returning methods markNonRepairableFeatureAsInUse and
markRepairableFeatureAsInUse must be annotated with MONGO_WARN_UNUSED_RESULT so
callers must check the returned Status; add MONGO_WARN_UNUSED_RESULT to both
method declarations in the header, then search tests that call these methods
without checking the return and update each to wrap the call in ASSERT_OK(...).
Ensure compile includes the header change and all modified tests use ASSERT_OK
to verify the returned Status.
```diff
 if (_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,
                                                                             feature)) {
-    _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(opCtx,
-                                                                                feature);
+    auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(
+        opCtx, feature);
+    fassert(70005, st);
 }
```
Fix inverted condition and incorrect fassert usage
- Condition is likely inverted: we should mark the feature only if it's not already marked in use.
- fassert(70005, st) is invalid for a Status; use fassertStatusOK (or uassertStatusOK / invariantStatusOK).
Apply:
```diff
-    if (_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,
-                                                                                feature)) {
-        auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(
-            opCtx, feature);
-        fassert(70005, st);
-    }
+    if (!_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,
+                                                                                 feature)) {
+        auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(
+            opCtx, feature);
+        fassertStatusOK(70005, st);
+    }
```

📝 Committable suggestion
```cpp
if (!_engine->getCatalog()->getFeatureTracker()->isNonRepairableFeatureInUse(opCtx,
                                                                             feature)) {
    auto st = _engine->getCatalog()->getFeatureTracker()->markNonRepairableFeatureAsInUse(
        opCtx, feature);
    fassertStatusOK(70005, st);
}
```
🤖 Prompt for AI Agents
In src/mongo/db/storage/kv/kv_database_catalog_entry_base.cpp around lines
249-254, the code currently calls markNonRepairableFeatureAsInUse only when
isNonRepairableFeatureInUse returns true and then uses fassert(70005, st) on a
Status; invert the condition so you call markNonRepairableFeatureAsInUse only
when the feature is NOT already in use, capture its returned Status, and replace
the invalid fassert call with fassertStatusOK(70005, st) (or another
status-check helper) to properly assert on the Status result.
```cpp
virtual bool findRecord(OperationContext* opCtx,
                        const RecordId& loc,
                        RecordData* out,
                        bool isForWrite) const {
    assert(false);
}
```
Avoid UB: provide a safe default for the new overload
assert(false) with no return will be compiled out in release builds, causing undefined behavior. Default to the existing 3-arg path.
```diff
-    virtual bool findRecord(OperationContext* opCtx,
-                            const RecordId& loc,
-                            RecordData* out,
-                            bool isForWrite) const {
-        assert(false);
-    }
+    virtual bool findRecord(OperationContext* opCtx,
+                            const RecordId& loc,
+                            RecordData* out,
+                            bool /*isForWrite*/) const {
+        // Default behavior: ignore isForWrite and delegate to existing API.
+        return findRecord(opCtx, loc, out);
+    }
```

📝 Committable suggestion
```cpp
virtual bool findRecord(OperationContext* opCtx,
                        const RecordId& loc,
                        RecordData* out,
                        bool /*isForWrite*/) const {
    // Default behavior: ignore isForWrite and delegate to existing API.
    return findRecord(opCtx, loc, out);
}
```
🤖 Prompt for AI Agents
In src/mongo/db/storage/record_store.h around lines 370 to 375, the new 4-arg
findRecord implementation uses assert(false) which is unsafe in release builds;
replace the assert with a safe default that forwards to the existing 3‑argument
overload, e.g., return findRecord(opCtx, loc, out); so the method returns the
same result as the 3‑arg path and avoids undefined behavior.
```cpp
} else {
    newRU->commitUnitOfWork();
    break;
}
```
Reset tmp_st before breaking on “feature already in use”.
If a prior iteration of this loop failed (e.g. mark*FeatureAsInUse returned NamespaceExists), tmp_st retains that error. On the next pass, if we observe the feature already in use we commit and break, but tmp_st still holds the earlier failure. The subsequent uassertStatusOK(tmp_st) therefore throws even though the operation logically succeeded. Please clear tmp_st to Status::OK() in the “already in use” branch for both retry loops.
Apply this diff (showing both sites):
```diff
         } else {
+            tmp_st = Status::OK();
             newRU->commitUnitOfWork();
             break;
         }
@@
         } else {
+            tmp_st = Status::OK();
             newRU->commitUnitOfWork();
             break;
         }
```
}Also applies to: 262-264
🤖 Prompt for AI Agents
In src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp around lines 223-226
(and also apply same change at 262-264), tmp_st may retain a prior non-OK error
from a failed mark*FeatureAsInUse call; update the “feature already in use”
branch so that before calling newRU->commitUnitOfWork() and breaking you reset
tmp_st to Status::OK() (i.e., clear the stored error) so the subsequent
uassertStatusOK(tmp_st) does not incorrectly throw; apply the same reset in both
retry loops at the two indicated sites.
```cpp
uassertStatusOK(tmp_st);

newRU->commitUnitOfWork();
opCtx->setRecoveryUnit(oldRU, oldState);
```
Restore the original RecoveryUnit before calling uassertStatusOK.
Right now a failure status triggers uassertStatusOK(tmp_st) before we restore oldRU. That throws and exits the function while opCtx is still bound to newRU, leaking the temporary recovery unit and leaving the context in the wrong state. Please restore oldRU (ideally via ScopeGuard, or at minimum by swapping the two statements) before asserting, at both retry sites.
Suggested fix:
```diff
-    uassertStatusOK(tmp_st);
-
-    opCtx->setRecoveryUnit(oldRU, oldState);
+    opCtx->setRecoveryUnit(oldRU, oldState);
+    uassertStatusOK(tmp_st);
```

…and likewise for the collation feature block.
Also applies to: 266-269
🤖 Prompt for AI Agents
In src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp around lines 228-231
(and likewise the retry site at 266-269 and the collation feature block), the
code calls uassertStatusOK(tmp_st) before restoring the original RecoveryUnit,
which leaks the temporary RU on failure; restore oldRU on opCtx (preferably via
a ScopeGuard that swaps back in the destructor) before calling uassertStatusOK
so that any thrown uassert will occur after the opCtx is returned to its
original state; apply the same change at both retry sites and in the collation
block to ensure oldRU is always restored prior to asserting the status.
…to create table. (#222) * Retry at the CreateCollection call site instead of in EloqCatalogRecordStore::insertRecord() when table creation fails
If the TxExecution executed an UpsertTableTxRequest, it must be re-used and must abort or commit the EloqRecoveryUnit. So the retry should happen at the higher level that calls EloqCatalogRecordStore::insertRecord() when table creation fails, instead of retrying inside the function.

Summary by CodeRabbit
Bug Fixes
Refactor