Ensure insertedIds contain ids from all batches #850

jyemin · 2022-01-06T13:28:50Z

The previous code was incorrect because it was comparing absolute write indexes
with indexes that are relative to the current batch. This patch avoids that
by using the insertedId map from SplittablePayload directly, which already
contains absolute write indexes.

JAVA-4436
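
To make the index mismatch concrete, here is a minimal sketch, not the driver's actual code. The class and method names are hypothetical, and it assumes five writes split into batches [0, 1, 2] and [3, 4]. Comparing the second batch's absolute indexes against a batch-relative position drops every id, while comparing in one consistent index space keeps them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in the driver.
final class IndexMismatchSketch {

    // Buggy variant: an absolute write index is compared against a
    // position that is relative to the current batch.
    static List<Integer> buggyFilter(List<Integer> absoluteIndexes, int relativePosition) {
        List<Integer> kept = new ArrayList<>();
        for (int index : absoluteIndexes) {
            if (index < relativePosition) {
                kept.add(index);
            }
        }
        return kept;
    }

    // Consistent variant: the relative position is shifted by the batch
    // offset so both sides of the comparison are absolute.
    static List<Integer> consistentFilter(List<Integer> absoluteIndexes, int batchOffset, int relativePosition) {
        List<Integer> kept = new ArrayList<>();
        for (int index : absoluteIndexes) {
            if (index < batchOffset + relativePosition) {
                kept.add(index);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Second batch: absolute indexes 3 and 4, relative position 2.
        List<Integer> secondBatch = Arrays.asList(3, 4);
        System.out.println(buggyFilter(secondBatch, 2));         // prints []
        System.out.println(consistentFilter(secondBatch, 3, 2)); // prints [3, 4]
    }
}
```

The patch itself sidesteps the conversion entirely by reading a map that is already keyed by absolute index.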


jyemin · 2022-01-06T13:32:37Z

driver-core/src/main/com/mongodb/internal/operation/BulkWriteBatch.java

@@ -282,21 +281,11 @@ private BulkWriteResult getBulkWriteResult(final BsonDocument result) {
    }

    private List<BulkWriteInsert> getInsertedItems(final BsonDocument result) {
-        if (payload.getPayloadType() == SplittablePayload.Type.INSERT) {


I removed the conditional just to simplify the code. If the payload type is anything but INSERT, then payload.getInsertedIds will be empty anyway.

jyemin · 2022-01-06T13:33:13Z

driver-core/src/main/com/mongodb/internal/operation/BulkWriteBatch.java

-        if (payload.getPayloadType() == SplittablePayload.Type.INSERT) {
-
-            Stream<WriteRequestWithIndex> writeRequests = payload.getWriteRequestWithIndexes().stream();
-            List<Integer> writeErrors = getWriteErrors(result).stream().map(BulkWriteError::getIndex).collect(Collectors.toList());


Changed this to a Set to make the contains check more efficient.
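
For reference, a small sketch of the List-to-Set change, with plain integers standing in for BulkWriteError::getIndex. The class and method names are hypothetical. Collectors.toSet() typically yields a hash-based set, so each contains call avoids the linear scan a List would need.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only; integers stand in for write error indexes.
final class WriteErrorIndexSketch {

    // Collects the failed write indexes into a Set for fast membership checks.
    static Set<Integer> errorIndexes(List<Integer> rawIndexes) {
        return rawIndexes.stream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Integer> writeErrors = errorIndexes(Arrays.asList(1, 4, 4));
        System.out.println(writeErrors.contains(4)); // prints true
        System.out.println(writeErrors.contains(2)); // prints false
        System.out.println(writeErrors.size());      // prints 2 (duplicates collapse)
    }
}
```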

jyemin · 2022-01-06T13:34:06Z

driver-core/src/main/com/mongodb/internal/operation/BulkWriteBatch.java

-
-            Stream<WriteRequestWithIndex> writeRequests = payload.getWriteRequestWithIndexes().stream();
-            List<Integer> writeErrors = getWriteErrors(result).stream().map(BulkWriteError::getIndex).collect(Collectors.toList());
-            if (!writeErrors.isEmpty()) {


Removed this conditional, as it seems to be an unnecessary optimization. The contains check will be fast enough on an empty set.

jyemin · 2022-01-06T13:35:53Z

driver-core/src/main/com/mongodb/internal/operation/BulkWriteBatch.java

-        }
-        return Collections.emptyList();
+        Set<Integer> writeErrors = getWriteErrors(result).stream().map(BulkWriteError::getIndex).collect(Collectors.toSet());
+        return payload.getInsertedIds().entrySet().stream()


It's safe to use SplittablePayload's insertedIds map directly, since it will only contain the ids from the previously inserted batch.
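
A rough sketch of the resulting shape, assuming a map keyed by absolute write index. Strings stand in for the BSON id values and for BulkWriteInsert, and the class and method names are hypothetical, so this is an approximation rather than the code in BulkWriteBatch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the driver's implementation.
final class InsertedIdsSketch {

    // Keeps every inserted id whose absolute write index has no write error.
    static List<String> insertedItems(Map<Integer, String> insertedIds, Set<Integer> writeErrors) {
        return insertedIds.entrySet().stream()
                .filter(entry -> !writeErrors.contains(entry.getKey()))
                .map(entry -> entry.getKey() + ":" + entry.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // TreeMap keeps the entries ordered by index for a stable result.
        Map<Integer, String> insertedIds = new TreeMap<>();
        insertedIds.put(0, "a");
        insertedIds.put(1, "b");
        insertedIds.put(2, "c");

        Set<Integer> writeErrors = new HashSet<>(Arrays.asList(1));
        System.out.println(insertedItems(insertedIds, writeErrors)); // prints [0:a, 2:c]
    }
}
```

With an empty error set the filter passes every entry through, which is why no separate isEmpty branch is needed.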


jyemin · 2022-01-06T13:37:07Z

driver-core/src/test/unit/com/mongodb/internal/operation/BulkWriteBatchSpecification.groovy

@@ -228,23 +229,71 @@ class BulkWriteBatchSpecification extends Specification {
        !bulkWriteBatch.hasAnotherBatch()
    }

-    def 'should only map inserts up to the payload position'() {
+    def 'should map all inserted ids'() {


I might have made this test more complicated than it needs to be: I changed it first just to reproduce the bug, before I settled on the simplified approach in BulkWriteBatch.

jyemin · 2022-01-06T13:37:14Z

driver-core/src/test/unit/com/mongodb/internal/operation/BulkWriteBatchSpecification.groovy

+                                               new BulkWriteInsert(2, new BsonInt32(2))]
+    }
+
+    def 'should not map inserted id with a write error'() {


There wasn't a test for this, so I added one.

rozza

Much easier to read now.

I think you mentioned checking upsertedIds as well - does anything need to happen with them?

rozza · 2022-01-06T14:33:34Z

driver-core/src/main/com/mongodb/internal/operation/BulkWriteBatch.java

-        }
-        return Collections.emptyList();
+        Set<Integer> writeErrors = getWriteErrors(result).stream().map(BulkWriteError::getIndex).collect(Collectors.toSet());
+        return payload.getInsertedIds().entrySet().stream()


So much neater and easier to read.

jyemin · 2022-01-06T14:39:42Z

> I think you mentioned checking upsertedIds as well - does anything need to happen with them?

No, I was wrong about that, as upsertedIds come directly from the server's response and wouldn't suffer from the same mapping issue.

jyemin requested a review from rozza January 6, 2022 13:28

jyemin self-assigned this Jan 6, 2022

jyemin force-pushed the j4436 branch from 9f76675 to c9df41b Compare January 6, 2022 13:31

rozza approved these changes Jan 6, 2022

jyemin merged commit 3e5992e into mongodb:master Jan 6, 2022

jyemin deleted the j4436 branch January 6, 2022 14:39
