
Avoid contacting master on noop mapping update #102915

Conversation

felixbarny (Member)

In an effort to make #96235 smaller, I've extracted the changes to the bulk execution logic.

This PR does two things:

  1. Avoid sending a mapping update to the master node if the mapping update is a noop, i.e., if the preflight merge yields the same mapping as the current mapping (see the sketch after this list).
  2. Avoid the risk of entering an infinite retry loop by failing the request in the following situation: an index/update document request yields a noop mapping update and is retried, and the second attempt yields another noop mapping update even though the mapping hasn't changed in the meantime. More details in the code comment.
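
A minimal, self-contained sketch of the noop check in point 1, using a toy field-to-type map in place of the real Mapping classes (all names here are illustrative, not the actual Elasticsearch code):

    import java.util.Map;
    import java.util.TreeMap;

    // Toy model: a "mapping" is a sorted field->type map.
    class NoopMappingCheck {
        static Map<String, String> merge(Map<String, String> current, Map<String, String> update) {
            Map<String, String> merged = new TreeMap<>(current);
            merged.putAll(update); // simplified merge: new fields are added, existing ones kept
            return merged;
        }

        /** True when the preflight merge yields the same mapping, i.e. the dynamic
         *  update adds nothing and the round trip to the master can be skipped. */
        static boolean isNoopUpdate(Map<String, String> current, Map<String, String> update) {
            return merge(current, update).equals(current);
        }

        public static void main(String[] args) {
            Map<String, String> current = Map.of("user.name", "keyword");
            System.out.println(isNoopUpdate(current, Map.of("user.name", "keyword"))); // true: skip master
            System.out.println(isNoopUpdate(current, Map.of("user.age", "long")));     // false: contact master
        }
    }

In the real code path, the comparison happens on the Mapping produced by the preflight merge, as point 1 describes; this toy just illustrates the equality check that decides whether the master is contacted.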

@felixbarny felixbarny added the :Distributed/CRUD label Dec 4, 2023
@felixbarny felixbarny self-assigned this Dec 4, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed, v8.12.0, and external-contributor labels Dec 4, 2023
elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

henningandersen (Contributor) left a comment

I think the main part of the PR looks good, but the retry loop detection could be flawed for updates, and it seems like it should behave the same whether the master detects the noop or the data node itself detects it.

@@ -1072,6 +1091,70 @@ public void testPerformOnPrimaryReportsBulkStats() throws Exception {
latch.await();
}

public void testNoopMappingUpdateInfiniteLoopPrevention() throws Exception {
Contributor

I wonder if this test tests a non-existing scenario?

The noop prevention would only concern a case where the mapping used during parsing is different from the mapping used during preflight, containing the mapping changes added. And if so, we'd expect any retry to always succeed?

felixbarny (Member, Author)

> I wonder if this test tests a non-existing scenario?

I hope it does :)

This is meant as a safety net in case there's an unexpected bug that leads to a situation where we accept a field in DocumentParserContext even though it turns out we don't have enough capacity for it when merging the mappings.

*/
public void resetForNoopMappingUpdateRetry(long mappingVersion) {
assert assertInvariants(ItemProcessingState.TRANSLATED);
if (noopMappingUpdateRetryForMappingVersion == mappingVersion) {
Contributor

I am not sure I follow why this infinite loop detection is necessary. I'd expect that since all dynamic fields from the previous parse are now in the mappings, the retry would always succeed in one go?

The only case where I could see this not being true is updates, where a new version of the doc could appear and thus a new dynamic mapping update could potentially be required. That would be at odds with the detection here, I think.

To some extent, having some detection in assertions would be good, but it would have to not trigger for updates, and it would then have to be the same for other mapping update retries. I could accept not adding that now, though.

felixbarny (Member, Author)

> I'd expect that since all dynamic fields from the previous parse are now in the mappings, the retry would always succeed in one go?

If all goes right, this is true. However, there's no strict guarantee about that, and the failure mode I have in mind is not related to concurrency. The risk comes from the fact that there are two distinct places that reason about a mapping's size: one is DocumentParserContext, and the other is Mapping::merge/MappingLookup. In DocumentParserContext, we're just estimating the size of the mapping when adding dynamic mappers. If there's a bug in that estimation (#102885 tries to make that less likely), there's a chance that we add more dynamic mappers than the mapping can hold.

We normally find out that we have accepted too many dynamic mappers in DocumentParserContext when the preflight check throws an exception. However, with #96235, for indices that have the ignore_dynamic_beyond_limit setting enabled, the preflight check won't fail but will silently ignore fields that are beyond the limit. After detecting that a noop mapping update has been requested, the indexing request gets retried. But the mapping hasn't changed in the meantime, so when the document gets parsed, we add the same dynamic mapper in DocumentParserContext again, which leads to the same dynamic mapping update that the preflight check will ignore again: we've entered an infinite loop.
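
To make the failure mode concrete, here is a self-contained toy model (assumed names and numbers, not Elasticsearch code) in which a broken size estimate keeps producing the same dynamic update that the limit-ignoring merge keeps turning into a noop:

    // Toy model of the loop: the parser's estimate accepts a field, but the merge
    // (with ignore_dynamic_beyond_limit semantics) drops it, so the "update" is a noop.
    class FieldLimitLoop {
        static final int FIELD_LIMIT = 1;

        // Buggy estimate: always thinks there is room (stand-in for a
        // DocumentParserContext size-accounting bug).
        static boolean estimateHasCapacity(int currentFieldCount) {
            return true;
        }

        // Merge that silently ignores fields beyond the limit instead of failing.
        static int mergeIgnoringBeyondLimit(int currentFieldCount, int newFields) {
            return Math.min(currentFieldCount + newFields, FIELD_LIMIT);
        }

        public static void main(String[] args) {
            int mappingFieldCount = FIELD_LIMIT; // index is already at the limit
            for (int attempt = 1; attempt <= 3; attempt++) {
                if (estimateHasCapacity(mappingFieldCount)) {
                    int merged = mergeIgnoringBeyondLimit(mappingFieldCount, 1);
                    boolean noop = merged == mappingFieldCount;
                    System.out.println("attempt " + attempt + ": noop update = " + noop);
                    // noop == true every time -> without loop detection we'd retry forever
                }
            }
        }
    }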

Maybe another solution to this could be that the preflight merge reason, in contrast to the auto-update merge reason, doesn't ignore fields that are beyond the limit but throws an exception instead. However, due to a race condition, we might have added a dynamic mapper under the assumption that enough space was still available: after accepting the mapper, but before the preflight check, the mapping might get updated with a version that already puts us at the limit. If we then throw a validation error during the preflight check, it will lead to data loss instead of ignoring the fields that are above the limit.

> The only case where I could see this not being true is updates, where a new version of the doc could appear and thus a new dynamic mapping update could potentially be required.

You're referring to the situation when the retry_on_conflict parameter is set, right? I think this case is covered by the fact that resetForExecutionRetry() resets noopMappingUpdateRetryForMappingVersion. This means that the infinite loop prevention only kicks in if the BulkPrimaryExecutionContext has been reset via resetForNoopMappingUpdateRetry twice in a row, and the mapping also hasn't changed in the meantime.
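
A minimal, self-contained sketch of that interplay, modeled on the excerpt above (the real BulkPrimaryExecutionContext is more involved; the sentinel value and exception here are illustrative assumptions):

    // Simplified model of the retry bookkeeping, not the actual
    // BulkPrimaryExecutionContext. UNSET means "no noop retry recorded yet".
    class RetryContext {
        private static final long UNSET = -1;
        private long noopMappingUpdateRetryForMappingVersion = UNSET;

        /** Regular retry (e.g. a version conflict with retry_on_conflict):
         *  clears the marker, so a later noop retry starts fresh. */
        void resetForExecutionRetry() {
            noopMappingUpdateRetryForMappingVersion = UNSET;
        }

        /** Retry after a noop mapping update: fails if we already retried on
         *  the same mapping version, which would otherwise loop forever. */
        void resetForNoopMappingUpdateRetry(long mappingVersion) {
            if (noopMappingUpdateRetryForMappingVersion == mappingVersion) {
                throw new IllegalStateException(
                    "noop mapping update retried twice on mapping version " + mappingVersion);
            }
            noopMappingUpdateRetryForMappingVersion = mappingVersion;
        }
    }

With this bookkeeping, update retries that go through resetForExecutionRetry never trip the guard, because the marker is cleared in between; only two consecutive noop retries on the same mapping version fail.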

felixbarny (Member, Author)

A way to reproduce the infinite loop is to purposefully introduce a bug in how DocumentParserContext counts the size of a Mapper.Builder in #96235: if you change that count to int builderMapperSize = 1; and execute org.elasticsearch.index.mapper.DynamicMappingIT#testIgnoreDynamicBeyondLimitSingleMultiField, it will trigger this.

Contributor

> You're referring to the situation when the retry_on_conflict parameter is set, right?

No, I am referring to a case where an update results in a mapping update and thus a retry afterwards. That could potentially be a noop mapping update (due to concurrency). The retried update could then result in new dynamic fields.

Contributor

I think it is safe with respect to such updates in the current form, since:

  1. Any retry will at least use a mapping corresponding to the last mapping version, thus only adding fields missing there.
  2. A noop result should thus not be possible.

I think #96235 does not change this, but the argument that we'll not run into it becomes more subtle.

A worry is that an update will fail with this error if it first adds a field f (noop detected) and then another field g (also a noop, towards the same version). That seems impossible now, but it is hard to verify.

Contributor

About the potential discrepancy between what the mapper results in and what the document parser holds, I suggest adding tests to verify this, as well as some assertions to ensure they are aligned at relevant times. That belongs to the other PR, though (perhaps we can have the size tracking as a separate PR?).

felixbarny (Member, Author) Dec 31, 2023

This is what #102885 is about. It significantly simplifies the calculation of how a mapper counts against the field limit and aligns the implementations as much as possible (Mapper.Builder#getTotalFieldsCount, Mapper#getTotalFieldsCount, MappingLookup#getTotalFieldsCount, and MappingLookup#checkFieldLimit).

felixbarny (Member, Author) Jan 2, 2024

I've also added an assertion here: befd7f6

henningandersen (Contributor) left a comment

LGTM.

Left a few items I'd like to see addressed.


@felixbarny felixbarny merged commit 68093e6 into elastic:main Jan 2, 2024
15 checks passed
@felixbarny felixbarny deleted the avoid-contacting-master-on-noop-mapping-update branch January 2, 2024 07:40
elasticsearchmachine pushed a commit that referenced this pull request Feb 2, 2024

Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit`
index setting.

When set to `true`, new fields are added to the mapping as long as the
field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields
that would exceed the limit are not added to the mapping, similar to
`dynamic: false`.  Ignored fields are added to the `_ignored` metadata
field.
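
As a quick usage sketch of the setting described above (the index name and limit value are illustrative; the request shape follows the standard create-index API):

    PUT my-index
    {
      "settings": {
        "index.mapping.total_fields.limit": 100,
        "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
      }
    }

With this in place, a dynamically mapped field beyond the 100-field limit is dropped from the mapping and recorded in the document's `_ignored` metadata field, instead of the indexing request failing.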

Relates to #89911

To make this easier to review, this is split into the following PRs:

- [x] #102915
- [x] #102936
- [x] #104769

Related but not a prerequisite:

- [ ] #102885