
align bulk write error with mongodb#402

Merged
lokax merged 16 commits into eloqdata:main from lokax:yf-bulk-error
Dec 31, 2025

Conversation


@lokax lokax commented Dec 29, 2025

Summary by CodeRabbit

  • New Features

    • Preflight duplicate-key checks for bulk inserts and added duplicate-key validation during updates to detect conflicts earlier.
  • Bug Fixes

    • Bulk writes now permit partial success: non-conflicting documents are inserted while per-document duplicate-key errors are reported without aborting the whole operation.
    • Improved error-path recovery allowing command-level continuation after certain insert failures.
  • Tests

    • Updated bulk-write and counter tests to assert partial-success semantics and detailed duplicate-key error reporting.


Copilot AI review requested due to automatic review settings December 29, 2025 08:25

coderabbitai bot commented Dec 29, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds storage- and index-level batch duplicate-key checks that run before inserts and updates, and implements those checks for the Eloq engine. Insert and update control flow now runs the checks before writing; error handling allows command-level transaction continuation for DuplicateKey on specific paths; and tests are updated to expect partial-success results.

Changes

Cohort / File(s) — Summary

  • Collection insert/update flow — src/mongo/db/catalog/collection_impl.cpp
    Build BSON pointers for input docs; call RecordStore::batchCheckDuplicateKey and IndexCatalog::batchCheckDuplicateKey before insertRecords; perform indexing only after inserts; add per-index checkDuplicateKeysForUpdate during updates.
  • Index catalog API & impl — src/mongo/db/catalog/index_catalog.h, src/mongo/db/catalog/index_catalog_impl.h, src/mongo/db/catalog/index_catalog_impl.cpp
    Add a batchCheckDuplicateKey(OperationContext*, const std::vector<const BSONObj*>&) API and an implementation that iterates ready unique indexes and delegates to index access methods; update indexRecords to apply per-index operations.
  • Index access method layer — src/mongo/db/index/index_access_method.h, src/mongo/db/index/index_access_method.cpp
    Expose getSortedDataInterface(); add batchCheckDuplicateKey(...) and checkDuplicateKeysForUpdate(...) that delegate to SortedDataInterface (no-op fallback); add an UpdateTicket wrapper.
  • Storage engine abstractions — src/mongo/db/storage/record_store.h, src/mongo/db/storage/sorted_data_interface.h
    Add virtual RecordStore::batchCheckDuplicateKey(...) and SortedDataInterface::batchCheckDuplicateKey(...) plus checkDuplicateKeysForUpdate(...) with default no-op implementations.
  • Eloq storage engine — src/mongo/db/modules/eloq/src/eloq_index.h, .../eloq_index.cpp, .../eloq_record_store.h, .../eloq_record_store.cpp, src/mongo/db/modules/eloq/data_substrate
    Implement _checkDuplicateKeysInternal, batchCheckDuplicateKey, and checkDuplicateKeysForUpdate; add BatchReadEntry; move the preflight duplicate check into a dedicated batch API and switch Eloq insert paths to per-record setKV with logging; submodule pointer update.
  • Write execution & commands — src/mongo/db/ops/write_ops_exec.cpp, src/mongo/db/commands/write_commands/write_commands.cpp
    Add an allowCommandLevelTransaction parameter to handleError; adjust error handling to allow continuation for DuplicateKey in command-level transactions and reset RecoveryUnit state where appropriate; minor formatting.
  • OperationContext RU API — src/mongo/db/operation_context.h, src/mongo/db/operation_context.cpp
    Add setRecoveryUnitState(WriteUnitOfWork::RecoveryUnitState) to reset RU state without aborting the outer transaction.
  • Tests updated — tests/jstests/core/batch_write_command_insert.js, tests/jstests/core/write_result.js, tests/jstests/core/opcounters_write_cmd.js
    Update expectations to partial-success semantics (ok: 1 with writeErrors, nInserted > 0), add detailed writeErrors assertions, and adjust opcounter expectations.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Collection
    participant RecordStore
    participant IndexCatalog
    participant IndexAccessMethod
    participant SortedDataInterface

    Client->>Collection: bulkInsert(docs)

    rect rgb(232,240,255)
    Note over Collection,RecordStore: Primary-key pre-check
    Collection->>RecordStore: batchCheckDuplicateKey(records)
    RecordStore-->>Collection: Status (OK / DuplicateKey)
    alt DuplicateKey
        Collection-->>Client: Error (early return)
    end
    end

    rect rgb(232,240,255)
    Note over Collection,IndexCatalog: Index pre-checks
    Collection->>IndexCatalog: batchCheckDuplicateKey(bsonObjPtrs)
    IndexCatalog->>IndexAccessMethod: batchCheckDuplicateKey(...)
    IndexAccessMethod->>SortedDataInterface: batchCheckDuplicateKey(...)
    SortedDataInterface-->>IndexAccessMethod: Status (OK / DuplicateKey)
    alt DuplicateKey
        IndexAccessMethod-->>IndexCatalog: Error
        IndexCatalog-->>Collection: Error
        Collection-->>Client: Error (early return)
    end
    end

    rect rgb(230,255,230)
    Note over Collection,RecordStore: Insert phase
    Collection->>RecordStore: insertRecords(...)
    RecordStore-->>Collection: inserted Records + RecordIds
    end

    rect rgb(255,244,214)
    Note over Collection,IndexCatalog: Post-insert indexing
    Collection->>IndexCatalog: indexRecords(bsonRecords)
    IndexCatalog->>IndexAccessMethod: index per-record keys / checkDuplicateKeysForUpdate
    IndexAccessMethod->>SortedDataInterface: write/index operations
    SortedDataInterface-->>IndexAccessMethod: OK
    IndexAccessMethod-->>IndexCatalog: OK
    IndexCatalog-->>Collection: OK
    end

    Collection-->>Client: WriteResult (nInserted, writeErrors)
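The two-phase flow in the diagram — preflight checks first, writes only if both checks pass — can be sketched in standalone C++. All types here (the string keys, the `Status` struct, the flat `store` set) are simplified stand-ins for the MongoDB classes in the diagram, not the real APIs:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Simplified stand-in for mongo::Status.
struct Status {
    bool ok;
    std::string reason;
};

// Preflight: reject the batch if any key already exists in the store
// or appears twice within the batch itself.
Status batchCheckDuplicateKey(const std::unordered_set<std::string>& store,
                              const std::vector<std::string>& keys) {
    std::unordered_set<std::string> seenInBatch;
    for (const auto& k : keys) {
        if (store.count(k) || !seenInBatch.insert(k).second) {
            return {false, "DuplicateKey: " + k};
        }
    }
    return {true, ""};
}

// The insert phase runs only after the preflight succeeded, so a
// duplicate key never leaves partially written records behind.
Status bulkInsert(std::unordered_set<std::string>& store,
                  const std::vector<std::string>& keys) {
    Status pre = batchCheckDuplicateKey(store, keys);
    if (!pre.ok) {
        return pre;  // early return, nothing written
    }
    store.insert(keys.begin(), keys.end());
    return {true, ""};
}
```

The point of the ordering is that the expensive write phase is never entered for a batch the preflight would have rejected.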

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

trigger-ci

Suggested reviewers

  • xiexiaoy
  • lzxddz
  • liunyl

Poem

🐰 I hop through keys both old and new,
I sniff for duplicates in the queue,
Pre-flight checks tidy every nest,
Some hops succeed while others rest,
I twitch my nose — records tucked and blessed.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name — Status — Explanation
  • Docstring Coverage — ⚠️ Warning — Docstring coverage is 34.69%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)
  • Description Check — ✅ Passed — Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed — The title 'align bulk write error with mongodb' directly reflects the core purpose of the changeset: aligning bulk write error handling behavior with MongoDB standards.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (9)
src/mongo/db/commands/write_commands/write_commands.cpp (1)

29-31: Question logging component change and commented-out basic.h

You’re now defining the default log component as kStorage and have commented out mongo/platform/basic.h. The log component switch may be intentional, but it will move all log() output in this file under the storage category, which is different from typical command logging. Also, basic.h is usually included first in MongoDB source files; relying on transitive includes can be fragile.

Consider:

  • Confirming that kStorage is really the desired component for write command logs.
  • Reinstating #include "mongo/platform/basic.h" unless there is a concrete reason to omit it.

Also applies to: 58-59

tests/jstests/core/batch_write_command_insert.js (1)

190-248: Duplicate-key tests for unique index align with bulk write semantics

The single and multi-document duplicate-key tests now:

  • Expect result.ok with writeErrors populated.
  • Distinguish ordered vs unordered behavior via n and writeErrors.length.
  • Verify error indices and basic types of code/errmsg.

This matches MongoDB’s bulk-write API, where duplicate-key errors are reported per-operation without failing the whole command.

You might optionally tighten these by asserting writeErrors[i].code === ErrorCodes.DuplicateKey everywhere you are explicitly testing duplicate-key behavior, for consistency with the new tests below.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2)

60-70: Batch primary-key duplicate check for Eloq is well-structured

The new BatchReadEntry helper and EloqRecordStore::batchCheckDuplicateKey implementation look sound:

  • _id key extraction + checkKeySize mirror existing single-insert validation.
  • batchKeys correctly enforces uniqueness within the incoming batch.
  • The batchGetKV call checks for existing keys (or pending writes) in one round-trip and interprets RecordStatus::Normal as a duplicate.
  • Assigning record.id from the KeyString ensures a stable mapping between primary key and RecordId, which is later used by the indexing path.

This gives Eloq an efficient, pre-insert duplicate check on the primary key consistent with MongoDB semantics.

If you ever need more performance, you could consider reusing the BatchReadEntry array from this function in _insertRecords instead of rebuilding it, but that’s an optimization rather than a correctness issue.

Also applies to: 811-876
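The single-round-trip lookup described above can be illustrated with a minimal sketch. The `BatchReadEntry`, `batchGetKV`, and `RecordStatus` names mirror the ones the review mentions, but the types and signatures here are invented stand-ins (a `std::map` plays the role of the KV store), not the Eloq/txservice API:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for txservice::RecordStatus: Normal means the key exists.
enum class RecordStatus { Normal, Deleted };

// One entry per document in the batch: the encoded key plus the
// status filled in by the batched lookup.
struct BatchReadEntry {
    std::string key;
    RecordStatus status = RecordStatus::Deleted;
};

// One "round-trip": resolve the status of every key at once instead
// of issuing a separate read per record.
void batchGetKV(const std::map<std::string, std::string>& store,
                std::vector<BatchReadEntry>& entries) {
    for (auto& e : entries) {
        e.status = store.count(e.key) ? RecordStatus::Normal
                                      : RecordStatus::Deleted;
    }
}

// A key that came back Normal already exists, so it is a duplicate.
bool hasDuplicate(const std::vector<BatchReadEntry>& entries) {
    for (const auto& e : entries) {
        if (e.status == RecordStatus::Normal) return true;
    }
    return false;
}
```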


1139-1175: Assumptions and reuse in _insertRecords

In _insertRecords you:

  • Fetch DiscoveredTable and compute pkeySchemaVersion once.
  • Rebuild BatchReadEntry to generate primary-key KeyStrings and MongoKeys and call setKV with checkUnique = true per record.

A couple of points to call out:

  • This implementation implicitly assumes that callers have already run batchCheckDuplicateKey (for key-size and duplicate detection) and that records[i].id has been initialized consistently with the KeyString you’re now using. That’s true for CollectionImpl::_insertDocuments in this PR, but please double-check that no other call sites invoke insertRecords directly without the pre-check, or you may lose early duplicate detection or key-size errors for those paths.
  • You’re recomputing KeyStrings here that were already built in batchCheckDuplicateKey; if this becomes a hot path, consolidating that computation would avoid duplication.

Functionally this looks correct, but it would be good to document or assert the expected call ordering so future refactors don’t accidentally bypass the pre-check.

src/mongo/db/catalog/collection_impl.cpp (2)

66-78: Duplicate key_string.h include

mongo/db/storage/key_string.h is now included twice (lines 66 and 77). It’s harmless but redundant; you can safely drop the second include.


507-566: New pre-insert batch duplicate-check pipeline

The updated _insertDocuments now:

  1. Builds records/timestamps as before.
  2. Calls _recordStore->batchCheckDuplicateKey(opCtx, &records) to preflight primary-key duplicates (Eloq overrides this; other engines are effectively no-ops).
  3. Constructs bsonRecords and a parallel bsonObjPtrs array from the incoming documents.
  4. Calls _indexCatalog.batchCheckDuplicateKey(opCtx, bsonObjPtrs) to check all ready unique indexes in a single pass.
  5. Performs the actual insert via _recordStore->insertRecords(...).
  6. Copies the resulting RecordIds from records back into bsonRecords and then calls _indexCatalog.indexRecords(...).

This is a reasonable layering: it preserves the existing insert+index behavior while adding efficient pre-insert duplicate detection for engines that opt in.

Two things worth confirming:

  • That all RecordStore implementations you care about either override batchCheckDuplicateKey or are happy with the new default of “no pre-check, rely on insert-time enforcement”.
  • That the performInserts / WriteResult machinery in write_ops_exec.cpp still attributes errors per-document correctly for ordered vs unordered batches now that _insertDocuments returns a single Status for the whole group, with duplicate detection potentially happening before any writes.

If both hold, this pipeline should give you MongoDB-like bulk duplicate-key semantics while avoiding unnecessary writes when duplicates are detected early.
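The six-step ordering described above can be condensed into a sketch where each stage is a callback. Everything here is a hypothetical simplification — documents are plain strings and `Status` is a two-field struct — but the control flow (both prechecks before any write, indexing only after the insert) matches the pipeline:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct Status { bool ok; std::string reason; };

using DocFn = std::function<Status(const std::vector<std::string>&)>;
using PtrFn = std::function<Status(const std::vector<const std::string*>&)>;

// Both preflight checks must pass before the record store is written;
// indexing runs only after the records exist.
Status insertDocuments(const std::vector<std::string>& docs,
                       const DocFn& rsPrecheck,
                       const PtrFn& indexPrecheck,
                       const DocFn& insertRecords,
                       const DocFn& indexRecords) {
    Status s = rsPrecheck(docs);            // 1. primary-key preflight
    if (!s.ok) return s;

    std::vector<const std::string*> ptrs;   // 2. parallel pointer array
    ptrs.reserve(docs.size());
    for (const auto& d : docs) ptrs.push_back(&d);

    s = indexPrecheck(ptrs);                // 3. unique-index preflight
    if (!s.ok) return s;

    s = insertRecords(docs);                // 4. actual insert
    if (!s.ok) return s;

    return indexRecords(docs);              // 5. post-insert indexing
}
```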

src/mongo/db/modules/eloq/src/eloq_index.cpp (3)

661-663: Consider defensive error handling instead of invariant.

The invariant assertion will crash the process if tuple.status_ is neither Normal nor Deleted. If the underlying txservice API introduces additional status values in the future, this could cause unexpected crashes.

🔎 Proposed fix
             } else {
-                invariant(tuple.status_ == txservice::RecordStatus::Deleted);
+                // Key does not exist - this is expected
+                if (tuple.status_ != txservice::RecordStatus::Deleted) {
+                    MONGO_LOG(1) << "batchCheckDuplicateKey unexpected status: "
+                                 << static_cast<int>(tuple.status_);
+                }
             }

764-774: Consider removing commented-out code or adding a clear explanation.

The commented-out existence check block documents the prior validation approach, but leaving large commented code blocks can reduce readability. If this is intentionally kept for reference, consider adding a brief comment explaining why (e.g., // Prior approach - now handled by batchCheckDuplicateKey). Otherwise, remove it.


786-788: Debug log line exceeds typical line length limits.

The log statement on line 787 is quite long. Consider breaking it up or simplifying for readability.

🔎 Proposed fix
     if (err != txservice::TxErrorCode::NO_ERROR) {
-        MONGO_LOG(1) << "yf: EloqUniqueIndex::insert setKV failed, index: " << _indexName.StringView() << ", key: " << key.toString() << ", error: " << txservice::TxErrorMessage(err);
+        MONGO_LOG(1) << "EloqUniqueIndex::insert setKV failed, index: "
+                     << _indexName.StringView() << ", key: " << key.toString()
+                     << ", error: " << txservice::TxErrorMessage(err);
     }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 331d3f2 and 4a2edc3.

📒 Files selected for processing (17)
  • src/mongo/db/catalog/collection_impl.cpp
  • src/mongo/db/catalog/index_catalog.h
  • src/mongo/db/catalog/index_catalog_impl.cpp
  • src/mongo/db/catalog/index_catalog_impl.h
  • src/mongo/db/commands/write_commands/write_commands.cpp
  • src/mongo/db/index/index_access_method.cpp
  • src/mongo/db/index/index_access_method.h
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.h
  • src/mongo/db/modules/eloq/src/eloq_record_store.cpp
  • src/mongo/db/modules/eloq/src/eloq_record_store.h
  • src/mongo/db/operation_context.cpp
  • src/mongo/db/operation_context.h
  • src/mongo/db/ops/write_ops_exec.cpp
  • src/mongo/db/storage/record_store.h
  • src/mongo/db/storage/sorted_data_interface.h
  • tests/jstests/core/batch_write_command_insert.js
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-28T10:55:21.020Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_index.h
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.

Applied to files:

  • src/mongo/db/commands/write_commands/write_commands.cpp
🧬 Code graph analysis (11)
src/mongo/db/catalog/index_catalog_impl.h (5)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (576-668)
  • batchCheckDuplicateKey (576-577)
  • opCtx (217-222)
  • opCtx (217-217)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog.h (4)
  • batchCheckDuplicateKey (540-543)
  • vector (478-480)
  • BSONObj (449-452)
  • BSONObj (454-457)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/index/index_access_method.h (4)
src/mongo/db/storage/sorted_data_interface.h (3)
  • Status (136-140)
  • Status (146-148)
  • Status (188-191)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (576-668)
  • batchCheckDuplicateKey (576-577)
  • opCtx (217-222)
  • opCtx (217-217)
src/mongo/db/catalog/index_catalog.h (4)
  • batchCheckDuplicateKey (540-543)
  • vector (478-480)
  • BSONObj (449-452)
  • BSONObj (454-457)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/index/index_access_method.cpp (5)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (576-668)
  • batchCheckDuplicateKey (576-577)
  • opCtx (217-222)
  • opCtx (217-217)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog.h (1)
  • batchCheckDuplicateKey (540-543)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/index/index_access_method.h (1)
  • getSortedDataInterface (165-167)
src/mongo/db/operation_context.h (1)
src/mongo/db/operation_context.cpp (2)
  • setRecoveryUnitState (485-487)
  • setRecoveryUnitState (485-485)
src/mongo/db/modules/eloq/src/eloq_index.cpp (6)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (15)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
  • MONGO_LOG (98-126)
  • MONGO_LOG (450-483)
  • MONGO_LOG (519-526)
  • MONGO_LOG (528-533)
  • MONGO_LOG (535-540)
  • MONGO_LOG (542-547)
  • mongoKey (724-724)
src/mongo/db/catalog/index_catalog.h (1)
  • batchCheckDuplicateKey (540-543)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (2)
  • get (277-279)
  • get (277-277)
src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)
  • TxErrorCodeToMongoStatus (11-75)
  • TxErrorCodeToMongoStatus (11-11)
src/mongo/db/catalog/index_catalog.h (5)
src/mongo/db/storage/sorted_data_interface.h (3)
  • Status (136-140)
  • Status (146-148)
  • Status (188-191)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (576-668)
  • batchCheckDuplicateKey (576-577)
  • opCtx (217-222)
  • opCtx (217-217)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/storage/sorted_data_interface.h (4)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (576-668)
  • batchCheckDuplicateKey (576-577)
  • opCtx (217-222)
  • opCtx (217-217)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog.h (3)
  • batchCheckDuplicateKey (540-543)
  • BSONObj (449-452)
  • BSONObj (454-457)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (3)
src/mongo/db/modules/eloq/src/eloq_index.cpp (11)
  • mongoKey (236-236)
  • batchCheckDuplicateKey (576-668)
  • batchCheckDuplicateKey (576-577)
  • opCtx (217-222)
  • opCtx (217-217)
  • MONGO_LOG (187-189)
  • MONGO_LOG (191-195)
  • MONGO_LOG (197-208)
  • MONGO_LOG (210-215)
  • MONGO_LOG (331-363)
  • MONGO_LOG (389-413)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (2)
  • get (277-279)
  • get (277-277)
src/mongo/db/modules/eloq/src/eloq_record_store.h (1)
  • insertRecords (87-92)
src/mongo/db/storage/record_store.h (6)
src/mongo/db/storage/sorted_data_interface.h (4)
  • Status (136-140)
  • Status (146-148)
  • Status (188-191)
  • std (403-405)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (576-668)
  • batchCheckDuplicateKey (576-577)
  • opCtx (217-222)
  • opCtx (217-217)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog.h (2)
  • batchCheckDuplicateKey (540-543)
  • vector (478-480)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/modules/eloq/src/eloq_record_store.h (4)
src/mongo/db/storage/sorted_data_interface.h (3)
  • Status (136-140)
  • Status (146-148)
  • Status (188-191)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/catalog/index_catalog_impl.cpp (4)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (576-668)
  • batchCheckDuplicateKey (576-577)
  • opCtx (217-222)
  • opCtx (217-217)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog.h (1)
  • batchCheckDuplicateKey (540-543)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: CodeQL analysis (javascript-typescript)
🔇 Additional comments (21)
src/mongo/db/storage/sorted_data_interface.h (1)

129-140: LGTM! Well-designed extension point for batch duplicate-key validation.

The new batchCheckDuplicateKey method provides a clean extension point for storage engines to implement batch duplicate-key checking prior to inserts. The default no-op implementation ensures backward compatibility for engines that don't require this functionality, following the same pattern as other optional methods like compact() and touch().
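The "default no-op" pattern praised here is a standard virtual-hook design. A minimal self-contained sketch (string keys and a trimmed-down `Status` stand in for the real MongoDB types):

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Status { bool ok; std::string reason; };

// Base interface: the default implementation reports OK, so engines
// that don't implement batch checking need no changes and fall back
// to insert-time enforcement.
class SortedDataInterface {
public:
    virtual ~SortedDataInterface() = default;
    virtual Status batchCheckDuplicateKey(const std::vector<std::string>& keys) {
        return {true, ""};  // default: no pre-check
    }
};

// An engine that opts in overrides the hook. This toy override just
// flags any non-empty batch to show the override being dispatched.
class CheckingIndex : public SortedDataInterface {
public:
    Status batchCheckDuplicateKey(const std::vector<std::string>& keys) override {
        return keys.empty() ? Status{true, ""}
                            : Status{false, "DuplicateKey"};
    }
};
```

Callers hold a `SortedDataInterface*` and never need to know whether the engine opted in.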

src/mongo/db/operation_context.cpp (1)

485-487: LGTM! Clean setter for recovery unit state management.

The implementation correctly provides direct state manipulation capability as documented in the header. This enables resetting state after nested WriteUnitOfWork failures while maintaining the outer transaction.

src/mongo/db/catalog/index_catalog_impl.h (1)

344-345: LGTM! Proper catalog-level batch duplicate-key checking declaration.

The method declaration correctly exposes batch duplicate-key checking at the catalog implementation level, enabling validation across all unique indexes prior to insertion. The implementation (as shown in related snippets) appropriately delegates to individual index access methods.

src/mongo/db/operation_context.h (1)

120-125: LGTM! Clear documentation for recovery unit state setter.

The method declaration and documentation clearly describe the purpose: enabling state resets after nested WriteUnitOfWork failures while keeping the outer transaction active. This is a useful addition to the existing recovery unit management API.
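The state-reset idea can be sketched with a toy state machine. The enum values echo the `WriteUnitOfWork::RecoveryUnitState` names the review refers to, but this class is an invented illustration, not the real `OperationContext`/`RecoveryUnit` API:

```cpp
#include <cassert>

// A nested unit-of-work failure leaves the state at kFailedUnitOfWork;
// the setter restores kActiveUnitOfWork directly, without aborting the
// outer transaction the way a full rollback would.
enum class RecoveryUnitState { kInactive, kActiveUnitOfWork, kFailedUnitOfWork };

class RecoveryUnit {
public:
    void begin() { _state = RecoveryUnitState::kActiveUnitOfWork; }
    void markFailed() { _state = RecoveryUnitState::kFailedUnitOfWork; }
    // Direct state reset: no abort, the outer transaction stays live.
    void setState(RecoveryUnitState s) { _state = s; }
    RecoveryUnitState state() const { return _state; }

private:
    RecoveryUnitState _state = RecoveryUnitState::kInactive;
};
```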

src/mongo/db/modules/eloq/src/eloq_index.h (2)

25-27: LGTM! Required includes for batch duplicate-key checking.

The added includes are necessary for the batchCheckDuplicateKey implementation:

  • index_catalog.h and multikey_paths.h for index key extraction
  • simple_bsonobj_comparator.h for intra-batch duplicate detection

83-92: LGTM! Proper override of batch duplicate-key checking for EloqIndex.

The method declaration correctly overrides the SortedDataInterface API, enabling Eloq-specific batch duplicate-key validation. The documentation clearly describes the parameters and return values. The implementation (shown in related snippets) demonstrates comprehensive validation including intra-batch duplicate detection.

src/mongo/db/storage/record_store.h (1)

420-432: LGTM! Well-designed extension point for batch primary key validation.

The new batchCheckDuplicateKey method provides a clean extension point for record stores to validate primary keys (RecordId) in batch before insertion. The default no-op implementation maintains backward compatibility for storage engines that don't require this functionality. The input/output parameter design (records with potentially set RecordId) is appropriate for this use case.

src/mongo/db/index/index_access_method.h (2)

159-167: LGTM! Useful accessor for storage engine-specific implementations.

The getSortedDataInterface() accessor provides a clean way to access storage engine-specific implementations (like EloqIndex), enabling delegation patterns for specialized functionality. The documentation clearly describes its purpose.


169-177: LGTM! Appropriate index access method API for batch duplicate checking.

The method declaration correctly introduces batch duplicate-key checking at the index access layer, with clear documentation. The implementation (shown in related snippets) properly delegates to the underlying SortedDataInterface when available, falling back to OK otherwise, which maintains correct behavior across different storage engines.

src/mongo/db/modules/eloq/src/eloq_record_store.h (1)

240-241: LGTM! Proper override for Eloq record store batch duplicate checking.

The method declaration correctly overrides the RecordStore API, enabling Eloq-specific batch duplicate-key validation for primary keys. The implementation (shown in related snippets) demonstrates comprehensive validation including intra-batch duplicate detection and efficient batch KV lookups.

src/mongo/db/index/index_access_method.cpp (1)

621-630: Delegating batch duplicate checks to SDI looks correct

The new IndexAccessMethod::batchCheckDuplicateKey cleanly delegates to the underlying SortedDataInterface when present and is a no-op otherwise, which is the right behavior for engines that don't implement batch checking.

src/mongo/db/commands/write_commands/write_commands.cpp (1)

93-95: No semantic change in shouldSkipOutput guard

Wrapping if (shouldSkipOutput(opCtx)) in braces is a cosmetic/readability change only; behavior is unchanged.

tests/jstests/core/batch_write_command_insert.js (3)

105-121: Illegal-key insert now validated via ok: 1 + writeErrors

Switching this test from !result.ok to result.ok with a single writeErrors entry matches modern write-command behavior (command succeeds, operation-level failure). Assertions on n == 0 and collection count keep the semantics tight.


393-408: Ordered bulk duplicate-key count expectation corrected

Changing the expectation from coll.count() == 1 to coll.count() == 50 in the ordered bulk test is correct: documents _id:1..49 should be inserted before the duplicate _id:50 stops the batch. This makes the test consistent with upstream bulk-insert semantics.


428-473: New ordered vs unordered DuplicateKey tests look solid

The added tests:

  • For ordered inserts, assert n: 0, a single writeErrors[0] at index 0, and ErrorCodes.DuplicateKey.
  • For unordered inserts, assert n: 2, at-least-one writeErrors, all with ErrorCodes.DuplicateKey, and final document count of 3 (1 initial + 2 successes).

These precisely capture the expected BulkWriteException-style behavior and provide good regression coverage for future changes.

src/mongo/db/catalog/index_catalog_impl.cpp (1)

1449-1478: Unique-index batch duplicate check implementation is appropriate

IndexCatalogImpl::batchCheckDuplicateKey correctly:

  • Skips unfinished indexes and non-unique indexes.
  • Delegates to each ready unique index’s access method.
  • Short-circuits on first non-OK status.

This matches the intended use as a pre-insert guard over all relevant unique indexes and composes properly with the new SDI-level hook.

src/mongo/db/catalog/index_catalog.h (1)

234-236: IndexCatalog API for batch duplicate checking is consistent

Adding Impl::batchCheckDuplicateKey and the public IndexCatalog::batchCheckDuplicateKey wrapper cleanly exposes the new batch-check capability at the catalog layer, matching the implementations in IndexCatalogImpl and IndexAccessMethod.

The doc comment is clear about the primary contract (OK vs DuplicateKey), while still allowing other error statuses to propagate.

Also applies to: 533-543

src/mongo/db/ops/write_ops_exec.cpp (4)

225-250: LGTM with a note on intentional asymmetry.

The new allowCommandLevelTransaction parameter enables error handling within command-level transactions specifically for insert operations. Update and delete operations continue to throw (via the default false value). Ensure this asymmetry between insert and update/delete error handling is intentional and documented in design docs.
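The defaulted-parameter pattern behind this asymmetry can be sketched as follows. The function name matches the `handleError` the review discusses, but the signature and error model here are heavily simplified assumptions — the return value means "continue the batch":

```cpp
#include <cassert>

// Only the code the review discusses, plus a catch-all.
enum class ErrorCode { Ok, DuplicateKey, Other };

// Defaulting the new parameter to false preserves the old throwing
// behavior for update/delete call sites; only insert paths pass true.
bool handleError(ErrorCode code,
                 bool inCommandLevelTxn,
                 bool allowCommandLevelTransaction = false) {
    if (inCommandLevelTxn) {
        // Continue (recording a per-document writeError) only when the
        // caller opted in and the failure is a duplicate key.
        return allowCommandLevelTransaction && code == ErrorCode::DuplicateKey;
    }
    return code != ErrorCode::Other;  // outside a txn: most errors continue
}
```

Because unmodified call sites pick up the `false` default, the asymmetry between insert and update/delete is encoded entirely at the call sites that pass `true`.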


476-485: Clarify whether abandonSnapshot() is needed when resetting RecoveryUnit state.

The commented-out abandonSnapshot() call on line 483 suggests uncertainty about proper state cleanup. Directly resetting the RecoveryUnit state to kActiveUnitOfWork without clearing any failed snapshot state could potentially leave the recovery unit in an inconsistent state.

Please verify if abandonSnapshot() should be called, and either uncomment it or remove it with an explanatory comment if it's confirmed unnecessary.


515-532: Consistent error handling for single inserts.

The error handling pattern here mirrors the batch failure handling above, which is good for consistency. The same concern about the commented-out abandonSnapshot() applies here as well.


646-649: LGTM - consistent with other insert error handling paths.

The true argument for allowCommandLevelTransaction aligns with the insert error handling behavior established elsewhere in this file.


Copilot AI left a comment


Pull request overview

This pull request aligns bulk write error handling with MongoDB's standard behavior by changing error responses from ok: false to ok: true with a writeErrors array for duplicate key errors. The implementation introduces batch duplicate key checking for improved performance in the Eloq storage engine.

Key changes:

  • Modified bulk write error format to return ok: 1 with writeErrors array instead of ok: false
  • Implemented batch duplicate key checking in EloqRecordStore and EloqIndex to detect duplicates before insertion
  • Added recovery unit state management to handle nested WriteUnitOfWork failures in command-level transactions

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 27 comments.

Show a summary per file
File Description
tests/jstests/core/batch_write_command_insert.js Updated test assertions to expect ok: 1 with writeErrors; added new test cases for ordered/unordered batch error handling
src/mongo/db/ops/write_ops_exec.cpp Modified error handling to support command-level transactions; added recovery unit state reset logic
src/mongo/db/operation_context.h/cpp Added setRecoveryUnitState() method to manipulate recovery unit state without aborting transactions
src/mongo/db/storage/sorted_data_interface.h Declared virtual batchCheckDuplicateKey() method for duplicate checking in storage interfaces
src/mongo/db/storage/record_store.h Declared virtual batchCheckDuplicateKey() method for primary key duplicate checking
src/mongo/db/modules/eloq/src/eloq_record_store.h/cpp Implemented batchCheckDuplicateKey() with batch key validation and duplicate detection
src/mongo/db/modules/eloq/src/eloq_index.h/cpp Implemented batchCheckDuplicateKey() for unique index duplicate checking; commented out inline duplicate check in insert
src/mongo/db/index/index_access_method.h/cpp Added batchCheckDuplicateKey() method that delegates to underlying SortedDataInterface
src/mongo/db/catalog/index_catalog.h/cpp Added batchCheckDuplicateKey() to check all unique indexes before insertion
src/mongo/db/catalog/collection_impl.cpp Modified _insertDocuments() to call batch duplicate checks before actual insertion
src/mongo/db/commands/write_commands/write_commands.cpp Added logging include and minor formatting changes


<< idObj.toString();
return {ErrorCodes::DuplicateKey, "DuplicateKey"};
}
batchKeys.insert(idObj.getOwned());

Copilot AI Dec 29, 2025


Potential performance issue: calling getOwned() for every key in the batch may cause unnecessary memory allocations. Since the BSONObj keys are extracted from documents that will remain in scope, consider using a different approach or ensuring this overhead is acceptable for the expected batch sizes.

Suggested change
batchKeys.insert(idObj.getOwned());
batchKeys.insert(idObj);

Comment on lines +519 to +520
MONGO_LOG(0) << "yf: handleError, exception = " << ex.toString() << ", code = " << ex.code() << ", key = " << it->doc.firstElement().fieldNameStringData() << ", value = " << it->doc.firstElement().toString() ;


Copilot AI Dec 29, 2025


Debug logging statements should be removed before merging to production. These MONGO_LOG statements were likely added during development for debugging purposes.

Suggested change
MONGO_LOG(0) << "yf: handleError, exception = " << ex.toString() << ", code = " << ex.code() << ", key = " << it->doc.firstElement().fieldNameStringData() << ", value = " << it->doc.firstElement().toString() ;


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (4)

838-842: Remove developer identifier prefix and improve error context.

The logging statement contains a developer identifier "yf:" that should be removed before merging, and the getOwned() call may cause unnecessary allocations.

These issues were previously flagged:

  • Line 838: Developer identifier prefix "yf:" should be removed from logging statements
  • Line 840: Error message "DuplicateKey" is too generic and lacks context about which key caused the duplicate
  • Line 842: Calling getOwned() for every key in the batch may cause unnecessary memory allocations

867-869: Remove developer identifier prefix and improve error context.

The logging statement contains a developer identifier "yf:" that should be removed before merging, and the error message lacks context.

These issues were previously flagged:

  • Line 867: Developer identifier prefix "yf:" should be removed from logging statements
  • Line 869: Error message "DuplicateKey" is too generic and doesn't provide information about which key caused the duplicate or which collection/index was affected

1166-1166: Remove developer identifier prefix from logging.

The logging statement contains the developer identifier "yf:" which should be removed before merging to production.

This issue was previously flagged in past reviews.


1174-1174: Remove developer identifier prefix from logging.

The logging statement contains the developer identifier "yf:" which should be removed before merging to production.

This issue was previously flagged in past reviews.

🧹 Nitpick comments (1)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

1146-1153: Optimize by reusing record.id if batchCheckDuplicateKey was called.

If batchCheckDuplicateKey is called before _insertRecords (as suggested by the assertion at line 1132), then record.id is already set and contains the computed RecordId. This code redundantly re-extracts the ID from the BSONObj.

Once the integration issue (see previous comment on lines 1131-1132) is resolved, consider optimizing this to reuse the existing record.id:

🔎 Proposed optimization (apply after fixing the integration)
-    // All keys are valid (no duplicates found). Now insert all records.
-    // Rebuild batchEntries for insertion
     auto batchEntries = std::make_unique<BatchReadEntry[]>(nRecords);
     for (size_t i = 0; i < nRecords; i++) {
         Record& record = records[i];
-        BSONObj obj{record.data.data()};
-        const BSONObj idObj = getIdBSONObjWithoutFieldName(obj);
         BatchReadEntry& entry = batchEntries[i];
-        entry.resetToKey(idObj);
+        // Reuse the record.id that was already computed by batchCheckDuplicateKey
+        entry.keyString.resetFromBuffer(record.id.repr(), record.id.reprSize());
+        entry.mongoKey = std::make_unique<Eloq::MongoKey>(record.id);
     }

Note: Verify that the KeyString and MongoKey constructors support initialization from the RecordId representation.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a2edc3 and 37f82aa.

📒 Files selected for processing (1)
  • src/mongo/db/modules/eloq/src/eloq_record_store.cpp
🔇 Additional comments (1)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

60-70: LGTM: Struct definition is clean and well-structured.

The BatchReadEntry structure consolidates the previously duplicated definition and provides a clear helper method for key initialization.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (7)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (7)

840-840: Improve error message with context.

The error message lacks information about which key caused the duplicate or which collection was affected, making debugging difficult.

Based on past review comments.

Enhance error message
-        return {ErrorCodes::DuplicateKey, "DuplicateKey"};
+        return {ErrorCodes::DuplicateKey, 
+                str::stream() << "Duplicate key found in batch for collection " 
+                             << _tableName.StringView() << ": " << idObj.toString()};

842-842: Potential performance issue with getOwned() calls in batch.

Calling getOwned() for every key in the batch may cause unnecessary memory allocations. Since the BSONObj keys are extracted from documents that remain in scope throughout the batch operation, consider whether owned copies are truly necessary here.

Based on past review comments. The concern is that for large batches, repeated allocations could impact performance. Evaluate whether batchKeys.insert(idObj) would suffice, or implement a check to avoid getOwned() when the object is already owned.


869-869: Improve error message with context.

The error message lacks information about which key caused the duplicate or which index was affected, making debugging difficult.

Based on past review comments.

Enhance error message
-        return {ErrorCodes::DuplicateKey, "DuplicateKey"};
+        return {ErrorCodes::DuplicateKey, 
+                str::stream() << "Duplicate key found for collection " 
+                             << _tableName.StringView() << ": " 
+                             << batchEntries[i].keyString.toString()};

838-838: Remove developer identifier prefix from production code.

The logging prefix "yf:" is a developer identifier that should be removed before merging to production.

Based on past review comments.

Apply this change
-        MONGO_LOG(1) << "yf: EloqRecordStore::batchCheckDuplicateKey duplicate key found within batch, key: "
+        MONGO_LOG(1) << "EloqRecordStore::batchCheckDuplicateKey duplicate key found within batch, key: "
                      << idObj.toString();

867-867: Remove developer identifier prefix from production code.

The logging prefix "yf:" should be removed before merging to production.

Based on past review comments.

Apply this change
-        MONGO_LOG(1) << "yf: EloqRecordStore::batchCheckDuplicateKey duplicate key found, key: "
+        MONGO_LOG(1) << "EloqRecordStore::batchCheckDuplicateKey duplicate key found, key: "
                      << batchEntries[i].keyString.toString();

1166-1166: Remove developer identifier prefix from production code.

The logging prefix "yf:" should be removed before merging to production.

Based on past review comments.

Apply this change
-        MONGO_LOG(1) << "yf: EloqRecordStore::_insertRecords setKV, table: " << _tableName.StringView() << ", txn: " << ru->getTxm()->TxNumber() << ", key: " << ks.toString() << ", checkUnique: " << checkUnique;
+        MONGO_LOG(1) << "EloqRecordStore::_insertRecords setKV, table: " << _tableName.StringView() << ", txn: " << ru->getTxm()->TxNumber() << ", key: " << ks.toString() << ", checkUnique: " << checkUnique;

1174-1174: Remove developer identifier prefix from production code.

The logging prefix "yf:" should be removed before merging to production.

Based on past review comments.

Apply this change
-            MONGO_LOG(1) << "yf: _insertRecords setKV failed, table: " << _tableName.StringView() << ", txn: " << ru->getTxm()->TxNumber() << ", key: " << ks.toString() << ", checkUnique: " << checkUnique << ", error: " << txservice::TxErrorMessage(err);
+            MONGO_LOG(1) << "_insertRecords setKV failed, table: " << _tableName.StringView() << ", txn: " << ru->getTxm()->TxNumber() << ", key: " << ks.toString() << ", checkUnique: " << checkUnique << ", error: " << txservice::TxErrorMessage(err);
🧹 Nitpick comments (1)
tests/jstests/core/write_result.js (1)

166-171: Test expectations correctly updated for partial success behavior.

The updated test now expects that when a bulk insert encounters a duplicate key on the second document, the first document successfully inserts (nInserted == 1), with write errors present but no write concern error. This aligns with the PR's goal to implement batch duplicate-key validation that enables partial successes.

Consider removing commented-out code

The commented-out lines 170-171 represent the old behavior and could be removed for code cleanliness, unless they're intentionally preserved for reference during the transition period.

 assert.eq(coll.count(), 1);
-// assert.commandFailedWithCode(result, ErrorCodes.DuplicateKey, result.errmsg)
-// assert.eq(coll.count(), 0);
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 37f82aa and 738be3f.

📒 Files selected for processing (3)
  • src/mongo/db/catalog/collection_impl.cpp
  • src/mongo/db/modules/eloq/src/eloq_record_store.cpp
  • tests/jstests/core/write_result.js
🔇 Additional comments (4)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2)

60-70: LGTM! Well-structured helper for batch operations.

The BatchReadEntry struct provides a clean abstraction for batch read operations, encapsulating key string conversion and storage pointers. The resetToKey helper method properly manages the KeyString and MongoKey lifecycle.


1139-1176: LGTM! Insertion flow correctly sequences ID assignment and validation.

The implementation properly:

  1. Validates the precondition that RecordIds are null on entry (line 1131)
  2. Rebuilds batch entries and assigns RecordIds from key strings (lines 1145-1153)
  3. Performs per-record insertion with duplicate checking (lines 1167-1172)

This design correctly separates the pre-insert validation phase (via batchCheckDuplicateKey) from the actual insertion phase, where RecordIds are assigned.

src/mongo/db/catalog/collection_impl.cpp (2)

77-77: Added include for KeyString support.

The new include is necessary for the batch duplicate-key validation infrastructure.


518-555: Well-structured batch duplicate-key validation flow.

The implementation correctly sequences the validation and insertion phases:

  1. Pre-insert validation (lines 520, 534): Batch checks for duplicates at both the record store and index catalog levels before any mutations
  2. Actual insertion (line 540): Only proceeds if all duplicate checks pass
  3. Post-insert index updates (lines 545-558): Uses the newly assigned RecordIds for index operations

This design ensures atomicity—either all records pass validation and are inserted, or none are—which aligns with the PR's goal of proper bulk write error handling. The assertion at line 551 appropriately verifies that insertRecords has assigned RecordIds.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (12)
src/mongo/db/ops/write_ops_exec.cpp (6)

488-488: Remove commented-out code.

🔎 Proposed fix
-            //opCtx->recoveryUnit()->abandonSnapshot(); // Clear any failed snapshot state

535-535: Remove commented-out code.

🔎 Proposed fix
-                // opCtx->recoveryUnit()->abandonSnapshot();

371-371: Remove debug log before merge.

This always-on debug log with a developer prefix should be removed or changed to a higher log level.

🔎 Proposed fix
-    MONGO_LOG(0) <<  "yf: insertDocuments, commit";

Remove the line or change to:

-    MONGO_LOG(0) <<  "yf: insertDocuments, commit";
+    MONGO_LOG(3) <<  "insertDocuments commit";

477-477: Remove debug log before merge.

🔎 Proposed fix
-        MONGO_LOG(0) <<  "yf: catach, exception = " << e.toString() << ", code = " << e.code();

503-503: Remove debug log before merge.

🔎 Proposed fix
-                    MONGO_LOG(0) <<  "yf: insertDocuments, key = " << it->doc.firstElement().fieldNameStringData() << ", value = " << it->doc.firstElement().toString();

528-529: Remove debug log before merge.

🔎 Proposed fix
-            MONGO_LOG(0) <<  "yf: handleError, exception = " << ex.toString() << ", code = " << ex.code() << ", key = " << it->doc.firstElement().fieldNameStringData() << ", value = " << it->doc.firstElement().toString()    ;
src/mongo/db/modules/eloq/src/eloq_index.cpp (6)

702-703: Consider avoiding getOwned() for keys in batch.

Calling getOwned() for every key creates unnecessary copies. Since the source BSON documents remain in scope for the duration of the batch check, consider whether owned copies are truly necessary.

If owned copies are required for safety, the current approach is correct. Otherwise:

💡 Alternative approach
         // Add all keys to the vector
         for (const BSONObj& key : keys) {
-            allKeys.push_back(key.getOwned());
+            allKeys.push_back(key);  // Unowned reference, valid for batch duration
         }

Only use this if the lifetime of the source documents is guaranteed.


577-674: Critical: Memory safety issue with vector reallocation.

The keyStringBuffers vector is not pre-reserved, and as elements are added via emplace_back (line 616), reallocation can occur. This invalidates pointers held by MongoKey objects in mongoKeys (line 619-620), creating dangling pointers. Additionally, for multikey indexes, GetKeys() can return multiple keys per document, but the code at line 650/661 uses keyStringBuffers[batchIdx] assuming a 1:1 mapping with indexBatchTuples, which will be incorrect.

🔎 Proposed fix

Use a stable container for key storage and track the correct key-to-tuple association:

-    std::vector<std::string> keyStringBuffers;  // Store KeyString buffer data
-    std::vector<std::unique_ptr<Eloq::MongoKey>> mongoKeys;  // Store MongoKey objects
+    // Use deque for stable storage - elements don't move on push_back
+    std::deque<std::string> keyStringBuffers;
+    std::vector<std::unique_ptr<Eloq::MongoKey>> mongoKeys;
+    std::vector<size_t> tupleToBufferIndex;  // Map batch tuple index to buffer index
     std::vector<Eloq::MongoRecord> mongoRecords;  // Store MongoRecord objects
     std::vector<txservice::ScanBatchTuple> indexBatchTuples;
     
+    keyStringBuffers.reserve(keys.size());  // Reserve estimate
+    mongoKeys.reserve(keys.size());
+    mongoRecords.reserve(keys.size());
+    indexBatchTuples.reserve(keys.size());
+    tupleToBufferIndex.reserve(keys.size());

Then when building the batch:

         // Store KeyString buffer data
         keyStringBuffers.emplace_back(keyString.getBuffer(), keyString.getSize());
         
+        size_t bufferIndex = keyStringBuffers.size() - 1;
+        
         // Create MongoKey from buffer
         auto mongoKey = std::make_unique<Eloq::MongoKey>(
             keyStringBuffers.back().data(), keyStringBuffers.back().size());
         mongoKeys.push_back(std::move(mongoKey));
         
         // Create MongoRecord
         mongoRecords.emplace_back();

         // Add to batch
         indexBatchTuples.emplace_back(txservice::TxKey(mongoKeys.back().get()),
                                       &mongoRecords.back());
+        tupleToBufferIndex.push_back(bufferIndex);

Then when checking results:

         for (size_t batchIdx = 0; batchIdx < indexBatchTuples.size(); batchIdx++) {
             const txservice::ScanBatchTuple& tuple = indexBatchTuples[batchIdx];
+            size_t bufferIdx = tupleToBufferIndex[batchIdx];
             if (tuple.status_ == txservice::RecordStatus::Normal) {
                 // Key exists in database or writeset
                 if (currentRecordId.isNull()) {
                     // For insert operations, any existing key is a duplicate
                     MONGO_LOG(1) << "yf: _checkDuplicateKeysInternal duplicate key found, index: "
-                                 << _indexName.StringView() << ", key = " << keyStringBuffers[batchIdx];
+                                 << _indexName.StringView() << ", key = " << keyStringBuffers[bufferIdx];

606-608: Remove debug log prefix and improve error message.

🔎 Proposed fix
-            MONGO_LOG(1) << "yf: _checkDuplicateKeysInternal duplicate key found within batch, index: "
+            MONGO_LOG(1) << "_checkDuplicateKeysInternal duplicate key found within batch, index: "
                          << _indexName.StringView() << ", key: " << key.toString();
-            return {ErrorCodes::DuplicateKey, "DuplicateKey"};
+            return {ErrorCodes::DuplicateKey, 
+                    str::stream() << "Duplicate key in batch for index " << _indexName.StringView() 
+                                  << ": " << key.toString()};

649-651: Remove debug log prefix and improve error message.

🔎 Proposed fix
-                    MONGO_LOG(1) << "yf: _checkDuplicateKeysInternal duplicate key found, index: "
+                    MONGO_LOG(1) << "_checkDuplicateKeysInternal duplicate key found, index: "
                                  << _indexName.StringView() << ", key = " << keyStringBuffers[batchIdx];
-                    return {ErrorCodes::DuplicateKey, "DuplicateKey"};
+                    return {ErrorCodes::DuplicateKey,
+                            str::stream() << "Duplicate key for index " << _indexName.StringView()};

660-664: Remove debug log prefix and improve error message.

🔎 Proposed fix
-                        MONGO_LOG(1) << "yf: _checkDuplicateKeysInternal duplicate key found, index: "
+                        MONGO_LOG(1) << "_checkDuplicateKeysInternal duplicate key found, index: "
                                      << _indexName.StringView() << ", key = " << keyStringBuffers[batchIdx]
                                      << ", existingRecordId = " << existingRecordId
                                      << ", currentRecordId = " << currentRecordId;
-                        return {ErrorCodes::DuplicateKey, "DuplicateKey"};
+                        return {ErrorCodes::DuplicateKey,
+                                str::stream() << "Duplicate key for index " << _indexName.StringView()
+                                              << " (existing doc: " << existingRecordId.toString() << ")"};

840-840: Remove debug log prefix.

🔎 Proposed fix
-        MONGO_LOG(1) << "yf: EloqUniqueIndex::insert setKV failed, index: " << _indexName.StringView() << ", key: " << key.toString() << ", error: " << txservice::TxErrorMessage(err);
+        MONGO_LOG(1) << "EloqUniqueIndex::insert setKV failed, index: " << _indexName.StringView() << ", key: " << key.toString() << ", error: " << txservice::TxErrorMessage(err);
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 738be3f and e28f3ea.

📒 Files selected for processing (8)
  • src/mongo/db/catalog/collection_impl.cpp
  • src/mongo/db/index/index_access_method.cpp
  • src/mongo/db/index/index_access_method.h
  • src/mongo/db/modules/eloq/data_substrate
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.h
  • src/mongo/db/ops/write_ops_exec.cpp
  • src/mongo/db/storage/sorted_data_interface.h
✅ Files skipped from review due to trivial changes (1)
  • src/mongo/db/modules/eloq/data_substrate
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/mongo/db/storage/sorted_data_interface.h
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-28T10:55:21.020Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_index.h
  • src/mongo/db/ops/write_ops_exec.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.

Applied to files:

  • src/mongo/db/ops/write_ops_exec.cpp
📚 Learning: 2025-11-24T03:11:24.955Z
Learnt from: lzxddz
Repo: eloqdata/eloqdoc PR: 325
File: src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp:1362-1384
Timestamp: 2025-11-24T03:11:24.955Z
Learning: The EloqDoc codebase uses an older version of glog that supports the two-parameter signature `google::InitGoogleLogging(const char* argv0, PrefixCallback)` for custom log prefix formatting, as seen in src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp line 1383.

Applied to files:

  • src/mongo/db/ops/write_ops_exec.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
🧬 Code graph analysis (3)
src/mongo/db/index/index_access_method.h (4)
src/mongo/db/storage/sorted_data_interface.h (4)
  • Status (136-140)
  • Status (152-157)
  • Status (163-165)
  • Status (205-208)
src/mongo/db/modules/eloq/src/eloq_index.cpp (6)
  • batchCheckDuplicateKey (676-709)
  • batchCheckDuplicateKey (676-677)
  • opCtx (218-223)
  • opCtx (218-218)
  • checkDuplicateKeysForUpdate (711-721)
  • checkDuplicateKeysForUpdate (711-713)
src/mongo/db/index/index_access_method.cpp (6)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
  • checkDuplicateKeysForUpdate (632-642)
  • checkDuplicateKeysForUpdate (632-634)
  • checkDuplicateKeysForUpdate (644-648)
  • checkDuplicateKeysForUpdate (644-645)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/modules/eloq/src/eloq_index.h (3)
src/mongo/db/modules/eloq/src/eloq_index.cpp (8)
  • batchCheckDuplicateKey (676-709)
  • batchCheckDuplicateKey (676-677)
  • opCtx (218-223)
  • opCtx (218-218)
  • checkDuplicateKeysForUpdate (711-721)
  • checkDuplicateKeysForUpdate (711-713)
  • _checkDuplicateKeysInternal (579-674)
  • _checkDuplicateKeysInternal (579-581)
src/mongo/db/catalog/index_catalog.h (4)
  • batchCheckDuplicateKey (540-543)
  • vector (478-480)
  • BSONObj (449-452)
  • BSONObj (454-457)
src/mongo/db/storage/sorted_data_interface.h (4)
  • Status (136-140)
  • Status (152-157)
  • Status (163-165)
  • Status (205-208)
src/mongo/db/modules/eloq/src/eloq_index.cpp (5)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (15)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
  • MONGO_LOG (98-126)
  • MONGO_LOG (450-483)
  • MONGO_LOG (519-526)
  • MONGO_LOG (528-533)
  • MONGO_LOG (535-540)
  • MONGO_LOG (542-547)
  • mongoKey (724-724)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (2)
  • get (277-279)
  • get (277-277)
src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)
  • TxErrorCodeToMongoStatus (11-75)
  • TxErrorCodeToMongoStatus (11-11)
src/mongo/db/index/index_access_method.cpp (6)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
  • checkDuplicateKeysForUpdate (632-642)
  • checkDuplicateKeysForUpdate (632-634)
  • checkDuplicateKeysForUpdate (644-648)
  • checkDuplicateKeysForUpdate (644-645)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
🔇 Additional comments (13)
src/mongo/db/index/index_access_method.h (1)

159-201: LGTM! Well-documented batch duplicate-check API.

The new public methods provide a clean interface for batch duplicate-key validation and update-time checks. Documentation is thorough and the delegation pattern to SortedDataInterface is appropriate.

src/mongo/db/index/index_access_method.cpp (1)

621-648: LGTM! Straightforward delegation pattern.

The implementations correctly delegate to the underlying SortedDataInterface with appropriate fallback behavior when the interface is unavailable. The wrapper method properly extracts added keys and location from the UpdateTicket.

src/mongo/db/catalog/collection_impl.cpp (4)

519-524: Batch duplicate-key check at record store level.

The pre-insert duplicate check at the record store level is correctly positioned before index checks and actual insertion.


727-743: Update-time duplicate-key validation added.

The code correctly iterates through ready unique indexes and performs duplicate-key checks before applying updates. The check uses the UpdateTicket which contains the added keys and current RecordId.


526-537: Batches are already bounded; multikey index performance for unique indexes warrants monitoring.

Batch sizes are capped at 256 KB (insertVectorMaxBytes), so the batch size concern is mitigated. The implementation properly uses batch operations (batchGetKV) for efficient checking. However, for collections with multiple unique indexes—especially multikey indexes where each document generates multiple keys—monitor performance as the index catalog iterates all unique indexes during the batch check.


545-555: RecordId availability after insertRecords is correctly managed.

Verification confirms that RecordStore::insertRecords always sets record.id for all records before returning, and RecordStore::batchCheckDuplicateKey never modifies record.id (the id assignment in EloqRecordStore::batchCheckDuplicateKey is commented out). The code flow is correct.

src/mongo/db/modules/eloq/src/eloq_index.h (2)

25-27: Necessary includes for batch duplicate-check functionality.

The added includes support the new batch duplicate-check methods.


83-119: Well-documented batch duplicate-check interface.

The new public methods and private helper are clearly documented with parameter descriptions and return value semantics. The separation of the internal helper allows code reuse between insert and update paths.

src/mongo/db/ops/write_ops_exec.cpp (1)

225-250: Command-level transaction support added for inserts.

The new allowCommandLevelTransaction parameter enables insert operations to handle errors within a command-level transaction without throwing. The logic correctly distinguishes between multi-document transactions (which should still throw) and command-level transactions.

src/mongo/db/modules/eloq/src/eloq_index.cpp (4)

30-30: Include added for MongoRecord support.


676-709: Batch duplicate-check delegates to internal helper.

The implementation correctly extracts keys from documents using the index schema and delegates to the internal helper with a null RecordId for insert operations.


711-721: Update duplicate-check delegates to internal helper.

Correctly delegates to the internal helper with the currentRecordId to allow the existing key if it belongs to the document being updated.


816-842: Constraint enforcement is properly delegated to TxUpsert.

The setKV method with the checkUnique parameter (not checkConstraint) correctly delegates unique constraint enforcement to the underlying _txm->TxUpsert() call. Error codes including DUPLICATE_KEY and UNIQUE_CONSTRAINT are properly mapped to ErrorCodes::DuplicateKey via TxErrorCodeToMongoStatus(). The method correctly returns the error status, ensuring duplicates are detected and reported to the caller.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (14)
src/mongo/db/ops/write_ops_exec.cpp (7)

371-371: Remove or adjust debug logging before merge.

This debug log uses level 0 (always logged) and has a developer prefix "yf:". This will cause excessive logging in production.


463-463: Variable name is inconsistent.

The name e for the caught exception differs from the other catch blocks in this file, which use ex. Consider renaming it for consistency.


477-477: Remove debug log before committing.

The debug log contains developer initials and a typo ("catach" instead of "catch"), indicating this is unfinished development code.


488-488: Commented-out code should be removed.

If the abandonSnapshot() call is not needed, the commented-out line should be deleted to improve code cleanliness.


503-503: Debug logging statements should be removed before merging to production.

These MONGO_LOG statements were likely added during development for debugging purposes.


528-528: Debug logging statements should be removed before merging to production.

These MONGO_LOG statements were likely added during development for debugging purposes.


535-535: Commented-out code should be removed.

If recovery unit snapshot abandonment is not needed, the comments should be deleted to improve code cleanliness.

src/mongo/db/storage/sorted_data_interface.h (1)

129-157: Strengthen documentation for new duplicate-key hooks

The new hooks are well-placed and the default OK implementations preserve backward compatibility, but the comments leave some behavior ambiguous:

  • For batchCheckDuplicateKey, please document explicitly whether implementers should:
    • detect duplicates within bsonObjPtrs themselves, and
    • detect duplicates against existing index entries, and that this is typically relevant only for unique indexes.
  • For checkDuplicateKeysForUpdate, clarify that:
    • addedKeys are the new keys for a single document,
    • currentRecordId should be excluded from duplicate detection, and
    • the method is a preflight; uniqueness must still be enforced in the actual write path.

Also consider stating that the default implementations are no-ops and may return Status::OK() even when duplicates exist, so callers must treat these as best-effort checks.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

811-876: Reintroduce key-size validation in _insertRecords and improve duplicate-key reporting

Two related concerns here:

  1. Key-size validation appears to be skipped for some insert paths
  • batchCheckDuplicateKey() validates _id size via checkKeySize(idObj, "RecordStore"), but _insertRecords() (lines 1145–1153) now only rebuilds keys and record.id without any size check.
  • EloqRecordStore::insertRecord() calls _insertRecords() directly and never calls batchCheckDuplicateKey(), so single-document inserts will bypass the _id key-size validation that batchCheckDuplicateKey() performs.
  • To keep behavior consistent and robust, _insertRecords() itself should validate key size for every record before constructing the KeyString / RecordId, so that:
    • single-document inserts are covered, and
    • bulk callers are protected even if they forget to invoke batchCheckDuplicateKey() first.

Consider adding something like:

BSONObj obj{record.data.data()};
const BSONObj idObj = getIdBSONObjWithoutFieldName(obj);
Status s = checkKeySize(idObj, "RecordStore");
if (!s.isOK()) {
    return s;
}
BatchReadEntry& entry = batchEntries[i];
entry.resetToKey(idObj);
record.id = RecordId{entry.keyString.getBuffer(), entry.keyString.getSize()};

inside _insertRecords()’s loop, and optionally factoring the common “extract id + checkKeySize + resetToKey” into a small helper shared with batchCheckDuplicateKey().

  2. Duplicate-key diagnostics and logging are still very minimal / personalized
  • Log messages include a developer-specific prefix ("yf: ..."), e.g. in batchCheckDuplicateKey and _insertRecords (setKV logs). These should be cleaned up for production.
  • Both duplicate-key returns in batchCheckDuplicateKey() use a bare "DuplicateKey" message, without indicating the offending key or collection/table, which makes debugging harder.

Consider:

  • Dropping the "yf:" prefix and using a neutral message.
  • Including at least the offending key and _tableName in the returned Status message, e.g.:
return Status(ErrorCodes::DuplicateKey,
              str::stream() << "Duplicate primary key " << idObj
                            << " in table " << _tableName.StringView());

and similarly for the on-disk duplicate case using batchEntries[i].keyString.toString() plus _tableName.

This keeps wire-level behavior aligned with ErrorCodes::DuplicateKey while giving operators actionable context.

Also applies to: 1139-1176

src/mongo/db/catalog/collection_impl.cpp (1)

518-543: Verify that batchCheckDuplicateKey no longer sets RecordId prematurely.

A past review comment (from Copilot) flagged that batchCheckDuplicateKey sets record.id at line 846 in eloq_record_store.cpp, which would cause the assertion record.id.isNull() at line 1131 in _insertRecords to fail. While the current code flow appears correct (check duplicates → insert records → use RecordIds), please verify that batchCheckDuplicateKey implementations do not modify record.id fields.

Run the following script to verify that batchCheckDuplicateKey does not set record.id:

#!/bin/bash
# Search for record.id assignments within batchCheckDuplicateKey implementations
rg -A 10 -B 2 'batchCheckDuplicateKey.*\{' --type=cpp | rg 'record\.id\s*='
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)

610-610: Consider removing redundant getOwned() call.

The key parameter comes from the keys vector (line 603), which should already contain owned BSONObj instances. Calling getOwned() here may create an unnecessary copy. Consider inserting key directly into batchKeys if the keys are guaranteed to be owned.

🔎 Proposed optimization
-        batchKeys.insert(key.getOwned());
+        batchKeys.insert(key);

Only apply this change if you can verify that the keys parameter always contains owned BSONObj instances.

Based on learnings, past review from Copilot identified a similar concern.


817-842: Verify that removing inline duplicate check is safe.

The existing duplicate-key check via getKV (lines 817-827) has been commented out, presumably because duplicate checking now occurs earlier via batchCheckDuplicateKey. Please verify that:

  1. All code paths that call EloqUniqueIndex::insert are protected by prior duplicate-key checks
  2. There are no scenarios where insert could be called without prior validation

Additionally, the log message at line 840 contains the developer identifier "yf:" which should be removed (duplicate of past review comment).

#!/bin/bash
# Find all call sites of EloqUniqueIndex::insert to verify they perform prior duplicate checks
ast-grep --pattern $'$_.insert($$$)' | rg -B 5 'EloqUniqueIndex'

592-629: Critical: Memory safety issue due to vector reallocation.

The keyStringBuffers vector (line 594) is not pre-reserved. When emplace_back at line 616 triggers reallocation, all existing elements are moved to a new memory location, invalidating the pointers stored in mongoKeys (created at lines 619-620 using keyStringBuffers.back().data()). When these dangling pointers are later used in indexBatchTuples at line 627 and in batchGetKV at line 634, this causes undefined behavior.

🔎 Recommended fix

Use a stable container like std::deque for keyStringBuffers, or have MongoKey take ownership of the data:

     // Build batch for this unique index
-    // Use vectors to store KeyString buffers and MongoKey objects
-    std::vector<std::string> keyStringBuffers;  // Store KeyString buffer data
+    // Use deque to prevent pointer invalidation on reallocation
+    std::deque<std::string> keyStringBuffers;  // Store KeyString buffer data
     std::vector<std::unique_ptr<Eloq::MongoKey>> mongoKeys;  // Store MongoKey objects
     std::vector<Eloq::MongoRecord> mongoRecords;  // Store MongoRecord objects
     std::vector<txservice::ScanBatchTuple> indexBatchTuples;

This ensures that adding new elements doesn't invalidate pointers to existing elements.

Based on learnings, the past review from coderabbitai[bot] identified this same issue.


606-608: Remove developer identifier from logs and improve error messages.

Multiple issues flagged in past reviews:

  1. Log messages at lines 606, 649, and 660 contain the developer identifier prefix "yf:" which should be removed before merging.
  2. Error messages at lines 608, 651, and 664 return generic "DuplicateKey" without context about which key or index caused the duplicate.
🔎 Proposed fix
         if (batchKeys.find(key) != batchKeys.end()) {
-            MONGO_LOG(1) << "yf: _checkDuplicateKeysInternal duplicate key found within batch, index: "
+            MONGO_LOG(1) << "_checkDuplicateKeysInternal duplicate key found within batch, index: "
                          << _indexName.StringView() << ", key: " << key.toString();
-            return {ErrorCodes::DuplicateKey, "DuplicateKey"};
+            return {ErrorCodes::DuplicateKey, 
+                    str::stream() << "Duplicate key in index " << _indexName.StringView() 
+                                  << ": " << key.toString()};
         }

Apply similar changes to lines 649-651 and 660-664.

Also applies to: 649-651, 660-664

🧹 Nitpick comments (2)
src/mongo/db/storage/record_store.h (1)

420-432: Clarify contract for batchCheckDuplicateKey in the interface doc

The API surface looks fine, but the comment is quite vague for storage engines implementing it. Consider clarifying a few points:

  • Whether implementations are expected to check both:
    • duplicates within records, and
    • duplicates between records and existing data in the store.
  • That this is a best-effort preflight and engines must still enforce uniqueness at insert time.
  • That the default implementation is a no-op and may return Status::OK() even if duplicates exist.

This will make it easier for other engines to implement consistent behavior.

src/mongo/db/catalog/index_catalog.h (1)

234-235: Index-level batch duplicate check is wired correctly; consider documenting uniqueness/scope more precisely

The new Impl::batchCheckDuplicateKey plus the inline wrapper on IndexCatalog are consistent with the rest of the design (delegation down to IndexAccessMethod/SortedDataInterface).

To make this easier to use correctly from higher layers, consider extending the comment around the public wrapper to note that:

  • Only ready, unique indexes are expected to participate (as enforced in IndexCatalogImpl::batchCheckDuplicateKey).
  • The method is a best-effort preflight; actual insert paths must still enforce uniqueness.
  • Callers should not assume per-document granularity from the Status alone (it represents “some duplicate in this batch”), which matters for how they map errors back to individual writes.

Functionally the addition looks good.

Also applies to: 533-543

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e28f3ea and 1a5f701.

📒 Files selected for processing (18)
  • src/mongo/db/catalog/collection_impl.cpp
  • src/mongo/db/catalog/index_catalog.h
  • src/mongo/db/catalog/index_catalog_impl.cpp
  • src/mongo/db/catalog/index_catalog_impl.h
  • src/mongo/db/commands/write_commands/write_commands.cpp
  • src/mongo/db/index/index_access_method.cpp
  • src/mongo/db/index/index_access_method.h
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.h
  • src/mongo/db/modules/eloq/src/eloq_record_store.cpp
  • src/mongo/db/modules/eloq/src/eloq_record_store.h
  • src/mongo/db/operation_context.cpp
  • src/mongo/db/operation_context.h
  • src/mongo/db/ops/write_ops_exec.cpp
  • src/mongo/db/storage/record_store.h
  • src/mongo/db/storage/sorted_data_interface.h
  • tests/jstests/core/batch_write_command_insert.js
  • tests/jstests/core/write_result.js
🚧 Files skipped from review as they are similar to previous changes (6)
  • tests/jstests/core/write_result.js
  • src/mongo/db/commands/write_commands/write_commands.cpp
  • src/mongo/db/catalog/index_catalog_impl.h
  • src/mongo/db/operation_context.h
  • src/mongo/db/index/index_access_method.h
  • src/mongo/db/operation_context.cpp
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.

Applied to files:

  • src/mongo/db/ops/write_ops_exec.cpp
📚 Learning: 2025-11-24T03:11:24.955Z
Learnt from: lzxddz
Repo: eloqdata/eloqdoc PR: 325
File: src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp:1362-1384
Timestamp: 2025-11-24T03:11:24.955Z
Learning: The EloqDoc codebase uses an older version of glog that supports the two-parameter signature `google::InitGoogleLogging(const char* argv0, PrefixCallback)` for custom log prefix formatting, as seen in src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp line 1383.

Applied to files:

  • src/mongo/db/ops/write_ops_exec.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
📚 Learning: 2025-11-28T10:55:21.020Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.

Applied to files:

  • src/mongo/db/ops/write_ops_exec.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.h
🧬 Code graph analysis (10)
src/mongo/db/catalog/index_catalog_impl.cpp (5)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (676-709)
  • batchCheckDuplicateKey (676-677)
  • opCtx (218-223)
  • opCtx (218-218)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog.h (3)
  • batchCheckDuplicateKey (540-543)
  • accessMethod (117-119)
  • indexRecords (515-519)
src/mongo/db/commands/create_indexes.cpp (4)
  • opCtx (261-457)
  • opCtx (261-265)
  • opCtx (460-478)
  • opCtx (460-462)
src/mongo/db/modules/eloq/src/eloq_record_store.h (5)
src/mongo/db/storage/sorted_data_interface.h (4)
  • Status (136-140)
  • Status (152-157)
  • Status (163-165)
  • Status (205-208)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (676-709)
  • batchCheckDuplicateKey (676-677)
  • opCtx (218-223)
  • opCtx (218-218)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/storage/sorted_data_interface.h (3)
src/mongo/db/modules/eloq/src/eloq_index.cpp (6)
  • batchCheckDuplicateKey (676-709)
  • batchCheckDuplicateKey (676-677)
  • opCtx (218-223)
  • opCtx (218-218)
  • checkDuplicateKeysForUpdate (711-721)
  • checkDuplicateKeysForUpdate (711-713)
src/mongo/db/index/index_access_method.cpp (6)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
  • checkDuplicateKeysForUpdate (632-642)
  • checkDuplicateKeysForUpdate (632-634)
  • checkDuplicateKeysForUpdate (644-648)
  • checkDuplicateKeysForUpdate (644-645)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (676-709)
  • batchCheckDuplicateKey (676-677)
  • opCtx (218-223)
  • opCtx (218-218)
src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)
  • TxErrorCodeToMongoStatus (11-75)
  • TxErrorCodeToMongoStatus (11-11)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (15)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
  • MONGO_LOG (98-126)
  • MONGO_LOG (450-483)
  • MONGO_LOG (519-526)
  • MONGO_LOG (528-533)
  • MONGO_LOG (535-540)
  • MONGO_LOG (542-547)
  • mongoKey (724-724)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)
  • TxErrorCodeToMongoStatus (11-75)
  • TxErrorCodeToMongoStatus (11-11)
src/mongo/db/index/index_access_method.cpp (6)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
  • checkDuplicateKeysForUpdate (632-642)
  • checkDuplicateKeysForUpdate (632-634)
  • checkDuplicateKeysForUpdate (644-648)
  • checkDuplicateKeysForUpdate (644-645)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/catalog/collection_impl.cpp (2)
src/mongo/db/commands/create_indexes.cpp (4)
  • opCtx (261-457)
  • opCtx (261-265)
  • opCtx (460-478)
  • opCtx (460-462)
src/mongo/db/catalog/index_catalog_entry.h (1)
  • begin (306-308)
src/mongo/db/modules/eloq/src/eloq_index.h (1)
src/mongo/db/modules/eloq/src/eloq_index.cpp (8)
  • batchCheckDuplicateKey (676-709)
  • batchCheckDuplicateKey (676-677)
  • opCtx (218-223)
  • opCtx (218-218)
  • checkDuplicateKeysForUpdate (711-721)
  • checkDuplicateKeysForUpdate (711-713)
  • _checkDuplicateKeysInternal (579-674)
  • _checkDuplicateKeysInternal (579-581)
src/mongo/db/storage/record_store.h (6)
src/mongo/db/storage/sorted_data_interface.h (5)
  • Status (136-140)
  • Status (152-157)
  • Status (163-165)
  • Status (205-208)
  • std (420-422)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (676-709)
  • batchCheckDuplicateKey (676-677)
  • opCtx (218-223)
  • opCtx (218-218)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog.h (2)
  • batchCheckDuplicateKey (540-543)
  • vector (478-480)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/index/index_access_method.cpp (4)
src/mongo/db/modules/eloq/src/eloq_index.cpp (6)
  • batchCheckDuplicateKey (676-709)
  • batchCheckDuplicateKey (676-677)
  • opCtx (218-223)
  • opCtx (218-218)
  • checkDuplicateKeysForUpdate (711-721)
  • checkDuplicateKeysForUpdate (711-713)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/index/index_access_method.h (1)
  • getSortedDataInterface (165-167)
src/mongo/db/catalog/index_catalog.h (5)
src/mongo/db/storage/sorted_data_interface.h (4)
  • Status (136-140)
  • Status (152-157)
  • Status (163-165)
  • Status (205-208)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (676-709)
  • batchCheckDuplicateKey (676-677)
  • opCtx (218-223)
  • opCtx (218-218)
src/mongo/db/index/index_access_method.cpp (2)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (8)
  • batchCheckDuplicateKey (811-876)
  • batchCheckDuplicateKey (811-812)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
🔇 Additional comments (14)
src/mongo/db/ops/write_ops_exec.cpp (5)

225-230: LGTM: Command-level transaction parameter addition.

The new allowCommandLevelTransaction parameter with a default value of false maintains backward compatibility while enabling special error handling for insert operations within command-level transactions, consistent with the PR objective.


246-250: LGTM: Differential error handling for insert vs update/delete.

The conditional logic correctly allows command-level transaction error handling for insert operations while preserving the original throw behavior for update/delete operations, aligning with MongoDB's bulk write semantics.


466-475: LGTM: Correct guard for multi-document transactions and non-DuplicateKey errors.

The logic appropriately rethrows exceptions for multi-document transactions and non-DuplicateKey errors, allowing only DuplicateKey errors to be handled in command-level transactions. This aligns with the PR objective of supporting partial inserts with duplicate-key violations.


487-490: LGTM: Correct recovery unit state reset.

The state reset to kActiveUnitOfWork is correct and allows continued processing within the outer command-level transaction after a nested WriteUnitOfWork failure, as documented in previous reviews.


656-658: LGTM: Consistent handleError call for insert validation errors.

The updated call correctly passes true for allowCommandLevelTransaction, consistent with the new command-level transaction support for insert operations.

tests/jstests/core/batch_write_command_insert.js (2)

105-121: Updated expectations correctly reflect bulk insert semantics

The revised assertions for:

  • illegal-key insert (lines 105–121),
  • single and multi-document unique-index duplicates (lines ~190–248), and
  • the ordered bulk failure count (lines ~398–408)

all now expect ok: 1 with writeErrors and an appropriate n, which matches bulk write behavior (partial success with per-write errors). The final collection counts in each case (1 doc for unique-index scenarios) also look correct.

Also applies to: 190-248, 398-408


428-473: New bulk write error-format tests provide good coverage

The new ordered and unordered tests at the end:

  • Assert ok: 1 and verify writeErrors[].code === ErrorCodes.DuplicateKey and proper index values.
  • Distinguish ordered behavior (n: 0, stop at first error) from unordered behavior (n: 2, continue after duplicate errors).
  • Confirm final collection counts (1 and 3 respectively) match expected partial inserts.

These checks nicely pin down the wire protocol shape for DuplicateKey in bulk writes.

src/mongo/db/modules/eloq/src/eloq_record_store.h (1)

240-242: Header override for batchCheckDuplicateKey matches implementation

The new override in EloqRecordStore cleanly exposes the batch duplicate-key hook from RecordStore and matches the implementation in the .cpp. No interface issues here.

src/mongo/db/catalog/index_catalog_impl.cpp (1)

1449-1478: Index-level batch duplicate check is implemented correctly over unique, ready indexes

IndexCatalogImpl::batchCheckDuplicateKey:

  • Iterates all entries, skips unfinished and non-unique indexes, and
  • Delegates to each index’s IndexAccessMethod::batchCheckDuplicateKey, returning the first non-OK status.

This matches the intended layering (catalog → access method → SDI/EloqIndex) and keeps non-participating engines effectively no-op via their defaults. The updated indexRecords loop is also consistent with how _indexRecords is applied per entry.

src/mongo/db/index/index_access_method.cpp (1)

621-648: Delegation pattern is clean; verify call sites don't rely on these as the sole uniqueness check

The new methods correctly delegate to SortedDataInterface:

  • batchCheckDuplicateKey(...) → forwards to getSortedDataInterface()->batchCheckDuplicateKey(...) or returns OK if SDI is unavailable.
  • checkDuplicateKeysForUpdate(...) (both overloads) → similarly forward to SDI or return OK.

Storage engines can override these (e.g., Eloq does). Call sites in collection_impl.cpp check return statuses and fail if non-OK, indicating these are relied upon. Since default SDI implementations are no-ops returning Status::OK(), verify that the existing unique-index logic is still enforced in the same code paths—these should be preflight checks in addition to, not instead of, the underlying storage engine's uniqueness guarantees.

src/mongo/db/catalog/collection_impl.cpp (1)

727-743: Good addition: pre-update duplicate-key validation.

The new duplicate-key check for update operations correctly validates unique index constraints before applying the update. This aligns with the pre-insert validation pattern and should catch duplicate-key violations early.

src/mongo/db/modules/eloq/src/eloq_index.h (1)

25-27: LGTM: Clean API additions with comprehensive documentation.

The new includes, public methods (batchCheckDuplicateKey, checkDuplicateKeysForUpdate), and private helper (_checkDuplicateKeysInternal) are well-documented and integrate cleanly with the existing interface. The documentation clearly explains the purpose, parameters, and return values.

Also applies to: 83-119

src/mongo/db/modules/eloq/src/eloq_index.cpp (2)

676-709: LGTM: Batch duplicate-key check implementation is correct.

The method correctly:

  1. Early-returns for non-unique indexes and empty batches
  2. Extracts keys from documents using the key schema (handling multikey indexes)
  3. Delegates to the internal helper with a null RecordId (indicating insert operation)

The getOwned() call at line 703 is necessary to ensure extracted keys outlive the loop.


711-721: LGTM: Update duplicate-key check delegates correctly.

The method appropriately delegates to _checkDuplicateKeysInternal with the currentRecordId parameter, enabling the internal method to distinguish between updates (where the existing record with the same RecordId is acceptable) and inserts (where any existing key is a duplicate).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
src/mongo/db/modules/eloq/src/eloq_index.cpp (3)

594-628: Vector reallocation can invalidate pointers stored in mongoKeys.

This issue was flagged in a previous review. The keyStringBuffers vector is not pre-reserved, so as elements are emplaced, reallocation can invalidate pointers to previously stored data. The MongoKey objects in mongoKeys hold pointers to keyStringBuffers[i].data(), which become dangling after reallocation.


606-608: Remove developer identifier prefix and improve error message.

The "yf:" prefix should be removed from log messages before merging, and the error message "DuplicateKey" should include context (key value, index name) to aid debugging. These issues were flagged in previous reviews.


800-827: Approve the refactored insert path; note the log prefix issue.

Commenting out the pre-existence getKV check aligns with the new batch duplicate-checking flow. The setKV call remains as the actual insertion mechanism.

The "yf:" prefix on line 824 should be removed as noted in previous reviews.

🧹 Nitpick comments (1)
src/mongo/db/catalog/collection_impl.cpp (1)

726-743: Consider consolidating duplicate index iterations.

The code correctly performs pre-update duplicate-key checks for ready, unique indexes before applying the update. However, this results in two separate iterations through the indexes (lines 708-725 for validateUpdate, then lines 728-743 for checkDuplicateKeysForUpdate).

🔎 Optional refactor to combine index iterations
     IndexCatalog::IndexIterator ii = _indexCatalog.getIndexIterator(opCtx, true);
     while (ii.more()) {
         IndexDescriptor* descriptor = ii.next();
         IndexCatalogEntry* entry = ii.catalogEntry(descriptor);
         IndexAccessMethod* iam = ii.accessMethod(descriptor);
 
         InsertDeleteOptions options;
         IndexCatalog::prepareInsertDeleteOptions(opCtx, descriptor, &options);
         UpdateTicket* updateTicket = new UpdateTicket();
         updateTickets.mutableMap()[descriptor] = updateTicket;
         uassertStatusOK(iam->validateUpdate(opCtx,
                                             oldDoc.value(),
                                             newDoc,
                                             oldLocation,
                                             options,
                                             updateTicket,
                                             entry->getFilterExpression()));
+        
+        // Check for duplicate keys immediately after validation
+        if (entry->isReady(opCtx) && descriptor->unique()) {
+            uassertStatusOK(iam->checkDuplicateKeysForUpdate(opCtx, *updateTicket));
+        }
     }
-
-    // Check for duplicate keys in the added keys of each index before updating the record
-    ii = _indexCatalog.getIndexIterator(opCtx, true);
-    while (ii.more()) {
-        IndexDescriptor* descriptor = ii.next();
-        IndexCatalogEntry* entry = ii.catalogEntry(descriptor);
-        IndexAccessMethod* iam = ii.accessMethod(descriptor);
-        if (!entry->isReady(opCtx)) {
-            continue;
-        }
-
-        if (!descriptor->unique()) {
-            continue;
-        }
-
-        UpdateTicket* updateTicket = updateTickets.mutableMap()[descriptor];
-        uassertStatusOK(iam->checkDuplicateKeysForUpdate(opCtx, *updateTicket));
-    }
 }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a5f701 and 486dfc5.

📒 Files selected for processing (3)
  • src/mongo/db/catalog/collection_impl.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.h
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-24T03:11:24.955Z
Learnt from: lzxddz
Repo: eloqdata/eloqdoc PR: 325
File: src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp:1362-1384
Timestamp: 2025-11-24T03:11:24.955Z
Learning: The EloqDoc codebase uses an older version of glog that supports the two-parameter signature `google::InitGoogleLogging(const char* argv0, PrefixCallback)` for custom log prefix formatting, as seen in src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp line 1383.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_index.cpp
📚 Learning: 2025-11-28T10:55:21.020Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_index.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.h
🧬 Code graph analysis (1)
src/mongo/db/modules/eloq/src/eloq_index.h (5)
src/mongo/db/modules/eloq/src/eloq_index.cpp (8)
  • batchCheckDuplicateKey (660-693)
  • batchCheckDuplicateKey (660-661)
  • opCtx (218-223)
  • opCtx (218-218)
  • checkDuplicateKeysForUpdate (695-705)
  • checkDuplicateKeysForUpdate (695-697)
  • _checkDuplicateKeysInternal (579-658)
  • _checkDuplicateKeysInternal (579-581)
src/mongo/db/index/index_access_method.cpp (6)
  • batchCheckDuplicateKey (621-630)
  • batchCheckDuplicateKey (621-622)
  • checkDuplicateKeysForUpdate (632-642)
  • checkDuplicateKeysForUpdate (632-634)
  • checkDuplicateKeysForUpdate (644-648)
  • checkDuplicateKeysForUpdate (644-645)
src/mongo/db/catalog/index_catalog.h (4)
  • batchCheckDuplicateKey (540-543)
  • vector (478-480)
  • BSONObj (449-452)
  • BSONObj (454-457)
src/mongo/db/catalog/index_catalog_impl.cpp (2)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
src/mongo/db/storage/sorted_data_interface.h (4)
  • Status (136-140)
  • Status (152-157)
  • Status (163-165)
  • Status (205-208)
🔇 Additional comments (5)
src/mongo/db/catalog/collection_impl.cpp (3)

77-77: Duplicate include already flagged.

This duplicate include was already identified in a previous review.


545-555: LGTM: Correct use of assigned RecordIds.

The code correctly retrieves RecordIds from the records vector after insertRecords has assigned them. The assertion at line 551 ensures the RecordId is valid before constructing the BsonRecord.


518-543: No action required. The code correctly handles RecordIds.

The batchCheckDuplicateKey function (line 846 in eloq_record_store.cpp) has the RecordId assignment commented out (// record.id = RecordId{...}), so it does not modify record.id. The subsequent insertRecords call at line 540 proceeds with null RecordIds, satisfying the assertion at line 1131 (assert(record.id.isNull())).

Likely an incorrect or invalid review comment.

src/mongo/db/modules/eloq/src/eloq_index.h (1)

83-119: LGTM! Well-documented batch duplicate-key checking API.

The new public methods align with the SortedDataInterface base class declarations, and the private helper appropriately consolidates the duplicate-checking logic. The documentation clearly explains the purpose and parameters of each method.

src/mongo/db/modules/eloq/src/eloq_index.cpp (1)

660-705: Implementation structure is correct; depends on fixing the internal helper.

The batchCheckDuplicateKey method correctly extracts keys from documents and the checkDuplicateKeysForUpdate properly passes the currentRecordId. Both delegate to _checkDuplicateKeysInternal, so they will work correctly once the currentRecordId exclusion logic is implemented in that helper.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (4)
tests/jstests/core/batch_write_command_insert.js (1)

426-426: Remove trailing whitespace.

This line contains trailing whitespace that should be cleaned up.

src/mongo/db/ops/write_ops_exec.cpp (3)

371-371: Remove debug logging before merge.

This debug log uses level 0 (always logged) and has a developer prefix "yf:". This will cause excessive logging in production. Similar debug logs appear at lines 477, 503, and 529.

🔎 Proposed fix
-    MONGO_LOG(0) <<  "yf: insertDocuments, commit";

Apply similar removal to lines 477, 503, and 529.


529-530: Remove debug logging.

This debug log with the "yf:" prefix and level 0 should be removed before merge.

🔎 Proposed fix
-            MONGO_LOG(0) <<  "yf: handleError, exception = " << ex.toString() << ", code = " << ex.code() << ", key = " << it->doc.firstElement().fieldNameStringData() << ", value = " << it->doc.firstElement().toString()    ;

503-503: Remove debug logging.

This debug log should be removed before merge.

🔎 Proposed fix
-                    MONGO_LOG(0) <<  "yf: insertDocuments, key = " << it->doc.firstElement().fieldNameStringData() << ", value = " << it->doc.firstElement().toString();
🧹 Nitpick comments (2)
src/mongo/db/ops/write_ops_exec.cpp (1)

481-481: Remove trailing whitespace.

-        
+
src/mongo/db/catalog/collection_impl.cpp (1)

520-522: Nitpick: Inconsistent brace placement.

The opening brace on line 520 is on a separate line, which differs from the predominant style in this codebase (e.g., lines 519, 533, 539). Consider aligning with the established pattern for consistency.

🔎 Proposed fix
-    if (!status.isOK())
-    {
+    if (!status.isOK()) {
         return status;
-    }
+    }
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 486dfc5 and 8a14cfd.

📒 Files selected for processing (3)
  • src/mongo/db/catalog/collection_impl.cpp
  • src/mongo/db/ops/write_ops_exec.cpp
  • tests/jstests/core/batch_write_command_insert.js
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.

Applied to files:

  • src/mongo/db/ops/write_ops_exec.cpp
📚 Learning: 2025-11-24T03:11:24.955Z
Learnt from: lzxddz
Repo: eloqdata/eloqdoc PR: 325
File: src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp:1362-1384
Timestamp: 2025-11-24T03:11:24.955Z
Learning: The EloqDoc codebase uses an older version of glog that supports the two-parameter signature `google::InitGoogleLogging(const char* argv0, PrefixCallback)` for custom log prefix formatting, as seen in src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp line 1383.

Applied to files:

  • src/mongo/db/ops/write_ops_exec.cpp
📚 Learning: 2025-11-28T10:55:21.020Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.

Applied to files:

  • src/mongo/db/ops/write_ops_exec.cpp
🧬 Code graph analysis (2)
src/mongo/db/catalog/collection_impl.cpp (1)
src/mongo/db/commands/create_indexes.cpp (4)
  • opCtx (261-457)
  • opCtx (261-265)
  • opCtx (460-478)
  • opCtx (460-462)
src/mongo/db/ops/write_ops_exec.cpp (1)
src/mongo/db/commands/write_commands/write_commands.cpp (10)
  • opCtx (230-230)
  • opCtx (232-247)
  • opCtx (232-232)
  • opCtx (258-265)
  • opCtx (258-258)
  • opCtx (267-282)
  • opCtx (267-267)
  • opCtx (306-315)
  • opCtx (306-306)
  • opCtx (353-361)
🔇 Additional comments (14)
tests/jstests/core/batch_write_command_insert.js (5)

115-120: LGTM - Test now validates MongoDB-compatible bulk write error semantics.

The change from asserting !result.ok to result.ok with writeErrors array correctly aligns with MongoDB's BulkWriteException format where partial failures return ok: 1 with errors in the writeErrors array.


214-226: LGTM - Thorough validation of unordered batch error semantics.

The test correctly validates that in unordered mode:

  1. First document succeeds (n: 1)
  2. Subsequent duplicates generate writeErrors at correct indices (1 and 2)
  3. Error structure includes required fields (code, errmsg)

238-246: LGTM - Ordered batch error handling is correctly validated.

The test verifies that ordered inserts stop at the first error, resulting in 1 successful insert and 1 writeError at index 1.


405-406: Verify the expected count logic.

The test inserts {_id: 50} initially, then bulk inserts {_id: 1} through {_id: 99}. Since this is an ordered bulk operation:

  • Documents _id: 1 through _id: 49 insert successfully (49 docs)
  • _id: 50 fails with duplicate key, stopping the ordered batch
  • Total count: 1 (initial) + 49 (successful inserts) = 50

The assertion coll.count() == 50 appears correct for the new partial-success semantics. The commented-out assertion expecting 1 document reflects the old behavior where the entire batch would fail.


433-472: LGTM - Comprehensive new tests for bulk write error format.

These new tests thoroughly validate:

  1. Ordered batch (lines 438-450): Stops at first error, returns ok: 1, n: 0, single writeError at index 0 with DuplicateKey code
  2. Unordered batch (lines 456-472): Continues after errors, returns ok: 1, n: 2, writeErrors for all duplicates

The forEach validation ensures each error has the proper code (number) and errmsg (string) types.

src/mongo/db/ops/write_ops_exec.cpp (7)

229-230: LGTM - New parameter enables command-level transaction handling for inserts.

The allowCommandLevelTransaction parameter with a default of false maintains backward compatibility while allowing insert operations to handle errors within command-level transactions.


246-250: LGTM - Correctly differentiates insert vs update/delete error handling.

The logic allows inserts (with allowCommandLevelTransaction=true) to continue error handling within a WriteUnitOfWork, while updates and deletes maintain the original behavior of throwing.


463-475: Exception handling order is correct.

The flow correctly:

  1. Checks for multi-document transaction (rethrows if true)
  2. Checks for non-DuplicateKey errors (rethrows if true)
  3. Falls through to recovery logic for DuplicateKey errors

This allows only DuplicateKey errors to be handled gracefully during batch inserts.


487-490: State reset logic is correct.

The reset from kFailedUnitOfWork to kActiveUnitOfWork is necessary for the fallback single-insert path. The commented-out abandonSnapshot() is correctly left commented as it would violate invariants within the active WriteUnitOfWork.


521-526: Error handling order is now correct.

The code correctly checks ex.code() != ErrorCodes::DuplicateKey before calling handleError, ensuring that non-DuplicateKey errors are rethrown without being recorded in out->results. This addresses the previously flagged critical issue about error handling order.


535-538: State reset logic is consistent with the batch fallback path.

The same pattern used in lines 487-490 is correctly applied here for the single-insert error recovery path.


658-658: Consistent use of allowCommandLevelTransaction=true for insert error handling.

This ensures document validation errors during inserts also benefit from the new command-level transaction error handling.

src/mongo/db/catalog/collection_impl.cpp (2)

724-741: Good: Duplicate key validation added for updates.

The second pass through indexes to check for duplicate keys before applying updates is well-structured. It correctly:

  • Filters for ready indexes only
  • Processes only unique indexes
  • Uses the prepared UpdateTickets from the first pass

Ensure that transaction isolation or collection-level locking prevents concurrent inserts from creating duplicate keys between the check (line 740) and the actual update (line 746+).


516-541: No critical issue exists with RecordId handling.

The review comment's premise is incorrect. In batchCheckDuplicateKey (eloq_record_store.cpp), the line that would modify record.id at line 846 is commented out:

// record.id = RecordId{entry.keyString.getBuffer(), entry.keyString.getSize()};

As a result, batchCheckDuplicateKey does not modify record.id. Records retain their null RecordId values when passed to insertRecords, which correctly expects null ids (assert at line 1131). The implementation follows the correct sequence:

  1. Records created with null RecordId
  2. batchCheckDuplicateKey validates without modifying RecordId
  3. insertRecords receives null RecordIds and sets them during actual insertion
  4. Post-insertion, RecordIds are populated for index operations

No RecordId clearing is needed. The code is correct as written.

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/jstests/core/opcounters_write_cmd.js (2)

48-53: Correct alignment with documented behavior for ordered inserts.

The updated expectation opCounters.insert + 2 correctly implements the documented behavior on lines 21-23: "counted as K + 1 ops, where K is number of docs successfully inserted, adding the failed attempt." In this scenario, the first document succeeds (K=1) and the second fails on duplicate key (+1), totaling 2 operations.

Consider removing the commented-out assertion on line 53 to reduce clutter, as the old behavior is superseded by this PR's changes.

🔎 Suggested cleanup
     assert.eq(opCounters.insert + 2, newdb.serverStatus().opcounters.insert);
-    // assert.eq(opCounters.insert, newdb.serverStatus().opcounters.insert);

55-61: Correct alignment with documented behavior for unordered inserts.

The updated expectation opCounters.insert + 3 correctly implements the documented behavior on lines 19-20: "counted as N ops, regardless of errors." All three insert attempts are counted in unordered mode, even though the second insert fails due to duplicate _id: 5.

Remove the commented-out assertion on line 61 since it represents the previous incorrect behavior.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 8a14cfd and b76e87a.

📒 Files selected for processing (1)
  • tests/jstests/core/opcounters_write_cmd.js

@lokax lokax removed the trigger-ci label Dec 31, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
src/mongo/db/modules/eloq/src/eloq_index.cpp (3)

671-683: Consider avoiding unnecessary getOwned() calls.

Calling getOwned() for every key in the batch may cause unnecessary memory allocations. Since the source BSON documents remain in scope for the duration of this batch, the keys could be used directly if lifetime is carefully managed.

This is an optional optimization—only pursue if profiling shows this is a bottleneck.


577-652: Critical: currentRecordId parameter is accepted but never used.

The method signature accepts currentRecordId and the documentation states it should be excluded from duplicate checks for update operations. However, the implementation never references currentRecordId. Lines 641-647 return DuplicateKey for any existing key without checking whether the existing record belongs to the document being updated.

This will cause update operations to incorrectly fail with duplicate key errors when the document's own indexed key is being checked.

🔎 Required fix
         // Check results for duplicates
         for (size_t batchIdx = 0; batchIdx < indexBatchTuples.size(); batchIdx++) {
             const txservice::ScanBatchTuple& tuple = indexBatchTuples[batchIdx];
             if (tuple.status_ == txservice::RecordStatus::Normal) {
-                // For insert operations, any existing key is a duplicate
-                return {ErrorCodes::DuplicateKey, "DuplicateKey"};
-                
+                // For update operations, check if the existing record belongs to the current document
+                if (currentRecordId.isValid()) {
+                    const Eloq::MongoRecord* existingRecord = 
+                        static_cast<const Eloq::MongoRecord*>(tuple.record_);
+                    if (existingRecord) {
+                        RecordId existingRecordId = existingRecord->ToRecordId(false);
+                        if (existingRecordId == currentRecordId) {
+                            // Same document, not a duplicate
+                            continue;
+                        }
+                    }
+                }
+                // Different document or insert operation - this is a duplicate
+                return {ErrorCodes::DuplicateKey, 
+                        str::stream() << "Duplicate key in index " << _indexName.StringView()};
             } else if (tuple.status_ == txservice::RecordStatus::Deleted) {
-                invariant(tuple.status_ == txservice::RecordStatus::Deleted);
+                // Expected case
+            } else {
+                return {ErrorCodes::InternalError, 
+                        str::stream() << "Unexpected record status: " << static_cast<int>(tuple.status_)};
             }
         }

592-627: Critical: Vector reallocation causes dangling pointers in MongoKey objects.

keyStringBuffers is not pre-reserved, so as elements are emplaced, reallocation can invalidate pointers to previously stored data. The MongoKey objects in mongoKeys hold pointers to keyStringBuffers[i].data(), which become dangling after reallocation.

Additionally, for multikey indexes where GetKeys() returns multiple keys per document, the current 1:1 index mapping assumption breaks.

🔎 Proposed fix
     // Build batch for this unique index
-    // Use vectors to store KeyString buffers and MongoKey objects
-    std::vector<std::string> keyStringBuffers;  // Store KeyString buffer data
+    // Use deque for stable addresses even after insertions
+    std::deque<std::string> keyStringBuffers;  // Store KeyString buffer data
     std::vector<std::unique_ptr<Eloq::MongoKey>> mongoKeys;  // Store MongoKey objects
     std::vector<Eloq::MongoRecord> mongoRecords;  // Store MongoRecord objects
     std::vector<txservice::ScanBatchTuple> indexBatchTuples;

Alternatively, store the key data inside MongoKey or use reserve() with careful capacity management.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

836-840: Improve error message with context.

The error message "DuplicateKey" is too generic and doesn't identify which key or table is affected. Include the duplicate key value and table name for better debugging.

🔎 Suggested improvement
         // Check if this key already exists in the batch
         if (batchKeys.find(idObj) != batchKeys.end()) {
-            return {ErrorCodes::DuplicateKey, "DuplicateKey"};
+            return {ErrorCodes::DuplicateKey, 
+                    str::stream() << "Duplicate key in batch for table " 
+                                  << _tableName.StringView() << ": " << idObj};
         }
🧹 Nitpick comments (1)
src/mongo/db/modules/eloq/src/eloq_index.cpp (1)

795-805: Remove commented-out code before merging.

The old duplicate-check logic is commented out rather than deleted. Since the new approach delegates duplicate detection to setKV with checkUnique=true, the old code should be removed to avoid clutter.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between b76e87a and 9425741.

📒 Files selected for processing (3)
  • src/mongo/db/commands/write_commands/write_commands.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
  • src/mongo/db/modules/eloq/src/eloq_record_store.cpp
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.

Applied to files:

  • src/mongo/db/commands/write_commands/write_commands.cpp
📚 Learning: 2025-11-24T03:11:24.955Z
Learnt from: lzxddz
Repo: eloqdata/eloqdoc PR: 325
File: src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp:1362-1384
Timestamp: 2025-11-24T03:11:24.955Z
Learning: The EloqDoc codebase uses an older version of glog that supports the two-parameter signature `google::InitGoogleLogging(const char* argv0, PrefixCallback)` for custom log prefix formatting, as seen in src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp line 1383.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_index.cpp
📚 Learning: 2025-11-28T10:55:21.020Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_index.cpp
🧬 Code graph analysis (2)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (3)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (654-687)
  • batchCheckDuplicateKey (654-655)
  • opCtx (218-223)
  • opCtx (218-218)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (2)
  • get (277-279)
  • get (277-277)
src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)
  • TxErrorCodeToMongoStatus (11-75)
  • TxErrorCodeToMongoStatus (11-11)
src/mongo/db/modules/eloq/src/eloq_index.cpp (3)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (9)
  • opCtx (147-149)
  • opCtx (147-147)
  • opCtx (439-448)
  • opCtx (439-439)
  • opCtx (549-554)
  • opCtx (549-549)
  • mongoKey (724-724)
  • batchCheckDuplicateKey (811-872)
  • batchCheckDuplicateKey (811-812)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (2)
  • get (277-279)
  • get (277-277)
src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)
  • TxErrorCodeToMongoStatus (11-75)
  • TxErrorCodeToMongoStatus (11-11)
🔇 Additional comments (3)
src/mongo/db/commands/write_commands/write_commands.cpp (1)

57-57: Remove unused logging include.

The mongo/util/log.h include at line 57 is unused—no logging calls (LOG, LOGV2, warning, error) are present in this file. Remove it to minimize unnecessary compilation dependencies.

Likely an incorrect or invalid review comment.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2)

60-70: LGTM! Clean batch entry structure.

The BatchReadEntry struct provides a well-organized container for batch read operations with proper initialization via resetToKey.


1113-1149: Well-designed batch insert flow.

The method correctly handles the ID extraction and assignment workflow:

  1. Expects null record.id on entry (line 1127)
  2. Extracts IDs from BSON documents (lines 1141-1148)
  3. Sets record.id before insertion

This properly addresses the previous concern about insertRecord failing due to null IDs.

Comment on lines +861 to +869
    // Check results for duplicates
    for (size_t i = 0; i < nRecords; i++) {
        const txservice::ScanBatchTuple& tuple = batchTuples[i];
        if (tuple.status_ == txservice::RecordStatus::Normal) {
            return {ErrorCodes::DuplicateKey, "DuplicateKey"};
        } else {
            invariant(tuple.status_ == txservice::RecordStatus::Deleted);
        }
    }


⚠️ Potential issue | 🔴 Critical

Replace unsafe invariant with explicit status handling.

The invariant at line 867 assumes only Normal or Deleted statuses, but the getKV() function explicitly handles txservice::RecordStatus::Unknown with retry logic. If Unknown appears in batch results, the invariant will crash the server.

🔎 Proposed fix
     // Check results for duplicates
     for (size_t i = 0; i < nRecords; i++) {
         const txservice::ScanBatchTuple& tuple = batchTuples[i];
         if (tuple.status_ == txservice::RecordStatus::Normal) {
-            return {ErrorCodes::DuplicateKey, "DuplicateKey"};
-        } else {
-            invariant(tuple.status_ == txservice::RecordStatus::Deleted);
+            return {ErrorCodes::DuplicateKey, 
+                    str::stream() << "Duplicate key for table " << _tableName.StringView()};
+        } else if (tuple.status_ == txservice::RecordStatus::Deleted) {
+            // Expected case for insert - key doesn't exist
+        } else {
+            return {ErrorCodes::InternalError, 
+                    str::stream() << "Unexpected record status from batchGetKV: " 
+                                  << static_cast<int>(tuple.status_)};
         }
     }
🤖 Prompt for AI Agents
In src/mongo/db/modules/eloq/src/eloq_record_store.cpp around lines 861-869,
replace the unsafe invariant that assumes only Normal or Deleted statuses with
explicit handling for all RecordStatus values: for Normal return DuplicateKey,
for Deleted continue, and for Unknown (or any other unexpected status) return a
transient/ retryable error (e.g. WriteConflict or a clearly named error) so
callers that implement getKV() retry logic can handle it safely; add a short log
or comment explaining the Unknown case is transient.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (7)
src/mongo/db/ops/write_ops_exec.cpp (1)

485-488: Consider removing commented-out code.

The commented-out abandonSnapshot() call adds noise. The surrounding comment already explains why we reset state without aborting. If you want to preserve the historical context, consider a brief inline comment instead of the dead code.

🔎 Proposed fix
         if (opCtx->getRecoveryUnitState() == WriteUnitOfWork::RecoveryUnitState::kFailedUnitOfWork) {
-            //opCtx->recoveryUnit()->abandonSnapshot(); // Clear any failed snapshot state
+            // Note: abandonSnapshot() is intentionally NOT called here - it would violate
+            // invariant(!_inUnitOfWork) in recovery unit implementations.
             opCtx->setRecoveryUnitState(WriteUnitOfWork::RecoveryUnitState::kActiveUnitOfWork);
         }
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)

671-683: Consider avoiding getOwned() if keys remain in scope.

Line 681 calls getOwned() for every key. If the source documents in bsonObjPtrs remain valid for the duration of the batch check, consider storing references or ensuring the source lifetime is sufficient to avoid the allocation overhead.


594-627: Memory safety issue with vector reallocation.

keyStringBuffers is not pre-reserved, so reallocation can invalidate pointers held by MongoKey objects (line 617-618), creating dangling pointers. Additionally, for multikey indexes where GetKeys() returns multiple keys per document, the indexing assumption between keyStringBuffers[batchIdx] breaks.

🔎 Proposed fix
     // Build batch for this unique index
-    std::vector<std::string> keyStringBuffers;
-    std::vector<std::unique_ptr<Eloq::MongoKey>> mongoKeys;
-    std::vector<Eloq::MongoRecord> mongoRecords;
-    std::vector<txservice::ScanBatchTuple> indexBatchTuples;
+    // Use deque for stable addresses
+    std::deque<std::string> keyStringBuffers;
+    std::vector<std::unique_ptr<Eloq::MongoKey>> mongoKeys;
+    std::vector<Eloq::MongoRecord> mongoRecords;
+    std::vector<txservice::ScanBatchTuple> indexBatchTuples;
+    
+    // Reserve space to reduce allocations
+    mongoKeys.reserve(keys.size());
+    mongoRecords.reserve(keys.size());
+    indexBatchTuples.reserve(keys.size());

579-643: Critical: currentRecordId parameter is accepted but never used.

The method accepts currentRecordId and documentation states it should exclude the current record from duplicate checks during updates. However, line 641-643 returns DuplicateKey for any existing key without verifying whether it belongs to the document being updated. This will cause update operations to incorrectly fail with duplicate key errors.

🔎 Proposed fix
         for (size_t batchIdx = 0; batchIdx < indexBatchTuples.size(); batchIdx++) {
             const txservice::ScanBatchTuple& tuple = indexBatchTuples[batchIdx];
             if (tuple.status_ == txservice::RecordStatus::Normal) {
-                // For insert operations, any existing key is a duplicate
-                return {ErrorCodes::DuplicateKey, "DuplicateKey"};
+                // Check if this is the current document being updated
+                if (currentRecordId.isValid()) {
+                    const Eloq::MongoRecord* existingRecord = 
+                        static_cast<const Eloq::MongoRecord*>(tuple.record_);
+                    if (existingRecord) {
+                        RecordId existingRecordId = existingRecord->ToRecordId(false);
+                        if (existingRecordId == currentRecordId) {
+                            continue;  // Same document, not a duplicate
+                        }
+                    }
+                }
+                return {ErrorCodes::DuplicateKey, 
+                        str::stream() << "Duplicate key in index " << _indexName.StringView()};
             } else {
                 invariant(tuple.status_ == txservice::RecordStatus::Deleted);
             }
         }

645-647: Replace unsafe invariant with explicit status handling.

The invariant assumes only Normal or Deleted statuses, but getKV() in the codebase handles txservice::RecordStatus::Unknown with retry logic. If Unknown appears, the invariant will crash the server.

🔎 Proposed fix
             if (tuple.status_ == txservice::RecordStatus::Normal) {
                 // ... duplicate handling
-            } else {
-                invariant(tuple.status_ == txservice::RecordStatus::Deleted);
+            } else if (tuple.status_ == txservice::RecordStatus::Deleted) {
+                // Expected: key doesn't exist
+            } else {
+                return {ErrorCodes::InternalError, 
+                        str::stream() << "Unexpected record status: " 
+                                      << static_cast<int>(tuple.status_)};
             }
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2)

837-840: Generic error message lacks context.

The error message "DuplicateKey" doesn't indicate which key caused the duplicate or which collection was affected. Including the key value and table name would aid debugging.


861-869: Replace unsafe invariant with explicit status handling.

The invariant at line 867 assumes only Normal or Deleted statuses exist, but getKV() elsewhere handles txservice::RecordStatus::Unknown with retry logic. If Unknown appears in batch results, this invariant will crash the server.

🔎 Proposed fix
     // Check results for duplicates
     for (size_t i = 0; i < nRecords; i++) {
         const txservice::ScanBatchTuple& tuple = batchTuples[i];
         if (tuple.status_ == txservice::RecordStatus::Normal) {
             return {ErrorCodes::DuplicateKey, 
                     str::stream() << "Duplicate key in table " << _tableName.StringView()};
-        } else {
-            invariant(tuple.status_ == txservice::RecordStatus::Deleted);
+        } else if (tuple.status_ == txservice::RecordStatus::Deleted) {
+            // Expected: key doesn't exist
+        } else {
+            return {ErrorCodes::InternalError, 
+                    str::stream() << "Unexpected record status from batchGetKV: " 
+                                  << static_cast<int>(tuple.status_)};
         }
     }
🧹 Nitpick comments (2)
src/mongo/db/ops/write_ops_exec.cpp (1)

462-462: Fix formatting: missing space after &.

There's a missing space between & and e in the catch clause.

🔎 Proposed fix
-    } catch (const DBException&e) {
+    } catch (const DBException& e) {
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

1135-1149: Consider refactoring duplicate key extraction logic.

Lines 1141-1149 duplicate the key extraction logic found in batchCheckDuplicateKey (lines 828-846). Consider extracting this into a shared helper method to reduce code duplication and ensure consistency.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9425741 and c029e6f.

📒 Files selected for processing (4)
  • src/mongo/db/commands/write_commands/write_commands.cpp
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
  • src/mongo/db/modules/eloq/src/eloq_record_store.cpp
  • src/mongo/db/ops/write_ops_exec.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/mongo/db/commands/write_commands/write_commands.cpp
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-24T03:11:24.955Z
Learnt from: lzxddz
Repo: eloqdata/eloqdoc PR: 325
File: src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp:1362-1384
Timestamp: 2025-11-24T03:11:24.955Z
Learning: The EloqDoc codebase uses an older version of glog that supports the two-parameter signature `google::InitGoogleLogging(const char* argv0, PrefixCallback)` for custom log prefix formatting, as seen in src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp line 1383.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_index.cpp
📚 Learning: 2025-11-28T10:55:21.020Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_index.cpp
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.

Applied to files:

  • src/mongo/db/ops/write_ops_exec.cpp
🧬 Code graph analysis (1)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (4)
src/mongo/db/modules/eloq/src/eloq_index.cpp (4)
  • batchCheckDuplicateKey (654-687)
  • batchCheckDuplicateKey (654-655)
  • opCtx (218-223)
  • opCtx (218-218)
src/mongo/db/catalog/index_catalog_impl.cpp (3)
  • batchCheckDuplicateKey (1449-1478)
  • batchCheckDuplicateKey (1449-1450)
  • i (1576-1576)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (2)
  • get (277-279)
  • get (277-277)
src/mongo/db/modules/eloq/src/base/eloq_util.cpp (2)
  • TxErrorCodeToMongoStatus (11-75)
  • TxErrorCodeToMongoStatus (11-11)
🔇 Additional comments (11)
src/mongo/db/ops/write_ops_exec.cpp (4)

225-230: LGTM on the parameter addition.

The new allowCommandLevelTransaction parameter with a default of false maintains backward compatibility with existing call sites while enabling the new DuplicateKey handling behavior for insert operations.


245-250: Logic is correct.

The conditional properly gates error handling: inserts (with allowCommandLevelTransaction=true) can continue processing in command-level transactions, while updates/deletes maintain the original throw behavior.


517-536: Error handling logic is correct.

The order is now correct: non-DuplicateKey errors are rethrown immediately (lines 518-520), then handleError is called only for DuplicateKey errors. This addresses the previous review concern about recording errors before type checking.

The same commented-out abandonSnapshot() at line 531 should be cleaned up (same suggestion as above).


651-654: Verify: does the fixedDoc error path match the intended error handling scope?

The code comment at lines 245-247 states "For insert operations, allow error handling in command-level transactions," which appears to justify this behavior. However, the DuplicateKey paths (lines 472, 518) explicitly check if (ex.code() != ErrorCodes::DuplicateKey) { throw; } before using allowCommandLevelTransaction=true, whereas this fixedDoc validation error path calls handleError(..., true) without filtering the error code. If the intent is to allow only DuplicateKey errors to continue in command-level transactions, this path should similarly filter before passing true. If the intent is to allow all insert errors to continue, the other paths are unnecessarily selective.

src/mongo/db/modules/eloq/src/eloq_record_store.cpp (4)

60-70: LGTM: Clean batch entry structure.

The BatchReadEntry struct provides a clear container for batch read operations, managing KeyString, MongoKey, and MongoRecord together.


874-880: LGTM: Clean delegation to internal insert.

The updated signature accepts vector pointers and delegates appropriately to _insertRecords.


1127-1129: Verify assertion expectations are documented.

The assertion at line 1127 expects record.id.isNull() for new inserts. Ensure callers understand this precondition, as the method sets record.id during processing (line 1148).


1161-1167: Verify checkUnique flag behavior.

Line 1161 sets checkUnique = true for all inserts, which will cause setKV to perform duplicate checks at the storage layer. Since batchCheckDuplicateKey is expected to have been called before this point, confirm whether the redundant check is intentional (defense-in-depth) or if it introduces unnecessary overhead.

src/mongo/db/modules/eloq/src/eloq_index.cpp (3)

30-30: LGTM: Include addition.

The include for eloq_record.h is appropriate for the new batch checking functionality.


689-699: LGTM once the underlying bug is fixed.

This method correctly delegates to _checkDuplicateKeysInternal with the currentRecordId parameter. However, it depends on that method actually using the parameter (currently flagged as a critical bug).


795-818: Verify consistency with batch duplicate checking.

The pre-existence check via getKV (lines 796-804) is now commented out, and the code relies on setKV with checkUnique=true (line 816) to detect duplicates. Confirm this approach is consistent with the new batch checking flow where batchCheckDuplicateKey should be called before insert. If batch checking already validates uniqueness, the checkUnique=true flag may be redundant or serve as a defense-in-depth mechanism.

@lokax lokax merged commit c453f4d into eloqdata:main Dec 31, 2025
1 of 2 checks passed
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (5)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (2)

861-869: Replace unsafe invariant with explicit status handling.

The invariant at line 867 assumes only Normal or Deleted statuses, but the getKV() function explicitly handles txservice::RecordStatus::Unknown with retry logic. If Unknown appears in batch results, the invariant will crash the server.

🔎 Proposed fix
     // Check results for duplicates
     for (size_t i = 0; i < nRecords; i++) {
         const txservice::ScanBatchTuple& tuple = batchTuples[i];
         if (tuple.status_ == txservice::RecordStatus::Normal) {
-            return {ErrorCodes::DuplicateKey, "DuplicateKey"};
-        } else {
-            invariant(tuple.status_ == txservice::RecordStatus::Deleted);
+            return {ErrorCodes::DuplicateKey, 
+                    str::stream() << "Duplicate key for table " << _tableName.StringView()};
+        } else if (tuple.status_ == txservice::RecordStatus::Deleted) {
+            // Expected case for insert - key doesn't exist
+        } else {
+            return {ErrorCodes::InternalError, 
+                    str::stream() << "Unexpected record status from batchGetKV: " 
+                                  << static_cast<int>(tuple.status_)};
         }
     }

Based on learnings, the retry logic in getKV() suggests Unknown is a valid transient state that should be handled, not asserted away.


837-838: Improve error message context for debugging.

The generic "DuplicateKey" message doesn't indicate which key or table caused the duplicate, making debugging difficult in production. Similar issue exists at line 865.

🔎 Proposed fix
-        if (batchKeys.find(idObj) != batchKeys.end()) {
-            return {ErrorCodes::DuplicateKey, "DuplicateKey"};
-        }
+        if (batchKeys.find(idObj) != batchKeys.end()) {
+            return {ErrorCodes::DuplicateKey,
+                    str::stream() << "Duplicate key in batch for table " 
+                                  << _tableName.StringView() << ": " << idObj};
+        }

Apply similar fix at line 865:

-            return {ErrorCodes::DuplicateKey, "DuplicateKey"};
+            return {ErrorCodes::DuplicateKey,
+                    str::stream() << "Duplicate key exists for table " 
+                                  << _tableName.StringView() << ": " 
+                                  << getIdBSONObjWithoutFieldName(BSONObj{records->at(i).data.data()})};
src/mongo/db/modules/eloq/src/eloq_index.cpp (3)

594-627: Memory safety issue: vector reallocation invalidates pointers.

keyStringBuffers is not pre-reserved. As elements are emplaced (line 614), reallocation can invalidate pointers to previously stored data. The MongoKey objects in mongoKeys (lines 617-619) hold pointers to keyStringBuffers[i].data(), which become dangling after reallocation.

🔎 Proposed fix
     // Build batch for this unique index
-    std::vector<std::string> keyStringBuffers;
+    std::vector<std::string> keyStringBuffers;
+    keyStringBuffers.reserve(keys.size());
     std::vector<std::unique_ptr<Eloq::MongoKey>> mongoKeys;
+    mongoKeys.reserve(keys.size());
     std::vector<Eloq::MongoRecord> mongoRecords;
+    mongoRecords.reserve(keys.size());
     std::vector<txservice::ScanBatchTuple> indexBatchTuples;
+    indexBatchTuples.reserve(keys.size());

Note: For multikey indexes, GetKeys() at line 677 can return multiple keys per document, so the actual count may exceed keys.size(). Consider using std::deque<std::string> for keyStringBuffers: a deque preserves references to existing elements when it grows at either end, so the stored pointers cannot dangle.


579-581: Critical bug: currentRecordId parameter is ignored.

The method signature accepts currentRecordId (for update operations to exclude the current document from duplicate checks), but the implementation never uses it. Line 642 returns DuplicateKey for any existing key without checking whether it belongs to the document being updated.

This will cause update operations to incorrectly fail with duplicate key errors when checking the document's own indexed keys.

🔎 Proposed fix
         // Check results for duplicates
         for (size_t batchIdx = 0; batchIdx < indexBatchTuples.size(); batchIdx++) {
             const txservice::ScanBatchTuple& tuple = indexBatchTuples[batchIdx];
             if (tuple.status_ == txservice::RecordStatus::Normal) {
-                // For insert operations, any existing key is a duplicate
-                return {ErrorCodes::DuplicateKey, "DuplicateKey"};
+                // For update operations, check if the existing record is the current document
+                if (currentRecordId.isValid()) {
+                    const Eloq::MongoRecord* existingRecord = 
+                        static_cast<const Eloq::MongoRecord*>(tuple.record_);
+                    if (existingRecord) {
+                        RecordId existingRecordId = existingRecord->ToRecordId(false);
+                        if (existingRecordId == currentRecordId) {
+                            // Same document, not a duplicate
+                            continue;
+                        }
+                    }
+                }
+                // Different document or insert operation - this is a duplicate
+                return {ErrorCodes::DuplicateKey,
+                        str::stream() << "Duplicate key for index " << _indexName.StringView()};
             } else if (tuple.status_ == txservice::RecordStatus::Deleted) {
                 // Expected case
             } else {
                return {ErrorCodes::InternalError,
                        str::stream() << "Unexpected record status: "
                                      << static_cast<int>(tuple.status_)};
             }
         }

Also applies to: 641-643


645-647: Replace unsafe invariant with explicit status handling.

The invariant assumes only Normal or Deleted statuses, but getKV() explicitly handles txservice::RecordStatus::Unknown with retry logic (per the related code in eloq_recovery_unit.cpp). If Unknown status appears in batch results, the invariant will crash the server.

🔎 Proposed fix
         for (size_t batchIdx = 0; batchIdx < indexBatchTuples.size(); batchIdx++) {
             const txservice::ScanBatchTuple& tuple = indexBatchTuples[batchIdx];
             if (tuple.status_ == txservice::RecordStatus::Normal) {
                 // ... duplicate key handling
+            } else if (tuple.status_ == txservice::RecordStatus::Deleted) {
+                // Expected case for insert
             } else {
-                invariant(tuple.status_ == txservice::RecordStatus::Deleted);
+                return {ErrorCodes::InternalError,
+                        str::stream() << "Unexpected record status from batchGetKV: "
+                                      << static_cast<int>(tuple.status_)};
             }
         }

Based on learnings, the retry logic in getKV() suggests Unknown is a valid transient state that should be handled.

🧹 Nitpick comments (2)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (1)

840-840: Consider performance impact of getOwned() in tight loop.

Calling getOwned() for every key in the batch allocates memory for each BSON object. For large batches, this overhead may be significant.

If the source documents remain in scope for the batch lifetime (which is typical), consider storing pointers or references instead, or profile to verify this overhead is acceptable for expected batch sizes.

src/mongo/db/modules/eloq/src/eloq_index.cpp (1)

680-682: Consider performance impact of getOwned() in tight loop.

Calling getOwned() for every key allocates memory. For large batches or multikey indexes (which can produce many keys per document), this overhead may be significant.

If the source documents remain in scope, consider using references or measuring whether this overhead is acceptable for expected workloads.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c029e6f and 383294a.

📒 Files selected for processing (5)
  • src/mongo/db/commands/write_commands/write_commands.cpp
  • src/mongo/db/modules/eloq/data_substrate
  • src/mongo/db/modules/eloq/src/eloq_index.cpp
  • src/mongo/db/modules/eloq/src/eloq_record_store.cpp
  • src/mongo/db/ops/write_ops_exec.cpp
✅ Files skipped from review due to trivial changes (1)
  • src/mongo/db/modules/eloq/data_substrate
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-24T03:11:24.955Z
Learnt from: lzxddz
Repo: eloqdata/eloqdoc PR: 325
File: src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp:1362-1384
Timestamp: 2025-11-24T03:11:24.955Z
Learning: The EloqDoc codebase uses an older version of glog that supports the two-parameter signature `google::InitGoogleLogging(const char* argv0, PrefixCallback)` for custom log prefix formatting, as seen in src/mongo/db/modules/eloq/src/eloq_kv_engine.cpp line 1383.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_index.cpp
📚 Learning: 2025-11-28T10:55:21.020Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/modules/eloq/CMakeLists.txt:54-55
Timestamp: 2025-11-28T10:55:21.020Z
Learning: In src/mongo/db/modules/eloq/CMakeLists.txt, the CMake options EXT_TX_PROC_ENABLED, FORK_HM_PROCESS, and STATISTICS are used by the data_substrate submodule build system and should be retained as configuration interface options.

Applied to files:

  • src/mongo/db/modules/eloq/src/eloq_index.cpp
📚 Learning: 2025-11-14T03:10:22.027Z
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 279
File: src/mongo/db/dbmain.cpp:0-0
Timestamp: 2025-11-14T03:10:22.027Z
Learning: In this repository, avoid defining --config via gflags in src/mongo/db/dbmain.cpp; MongoDB’s options_parser owns --config. Prefer gflags flags with a distinct prefix (e.g., --data_substrate_config) parsed before options_parser, with options_parser configured to allow unregistered options.

Applied to files:

  • src/mongo/db/ops/write_ops_exec.cpp
  • src/mongo/db/commands/write_commands/write_commands.cpp
🧬 Code graph analysis (1)
src/mongo/db/modules/eloq/src/eloq_record_store.cpp (3)
src/mongo/db/modules/eloq/src/eloq_index.cpp (11)
  • mongoKey (237-237)
  • batchCheckDuplicateKey (654-687)
  • batchCheckDuplicateKey (654-655)
  • opCtx (218-223)
  • opCtx (218-218)
  • MONGO_LOG (188-190)
  • MONGO_LOG (192-196)
  • MONGO_LOG (198-209)
  • MONGO_LOG (211-216)
  • MONGO_LOG (332-364)
  • MONGO_LOG (390-414)
src/mongo/db/modules/eloq/src/eloq_recovery_unit.cpp (2)
  • get (277-279)
  • get (277-277)
src/mongo/db/modules/eloq/src/eloq_record_store.h (1)
  • insertRecords (87-92)
