
Avoid contacting master on noop mapping update #102915

Conversation

felixbarny (Member)

In an effort to make #96235 smaller, I've extracted the changes to the bulk execution logic.

This PR does two things:

  1. Avoid sending a mapping update to the master node if the mapping update is a noop, i.e., if the preflight merge yields the same mapping as the current mapping (see the sketch after this list).
  2. Avoid the risk of entering an infinite retry loop by failing the request in the following situation: an index/update document request yields a noop mapping update and is retried, and the second attempt yields another noop mapping update even though the mapping hasn't changed in the meantime. More details in the code comment.
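
A minimal, self-contained sketch of the noop check in point 1, using a toy field-to-type map in place of the real Mapping classes (all names here are illustrative, not the actual Elasticsearch code):

    import java.util.Map;
    import java.util.TreeMap;

    // Toy model: a "mapping" is a sorted field->type map.
    class NoopMappingCheck {
        static Map<String, String> merge(Map<String, String> current, Map<String, String> update) {
            Map<String, String> merged = new TreeMap<>(current);
            merged.putAll(update); // simplified merge: new fields are added, existing ones kept
            return merged;
        }

        /** True when the preflight merge yields the same mapping, i.e. the dynamic
         *  update adds nothing and the round trip to the master can be skipped. */
        static boolean isNoopUpdate(Map<String, String> current, Map<String, String> update) {
            return merge(current, update).equals(current);
        }

        public static void main(String[] args) {
            Map<String, String> current = Map.of("user.name", "keyword");
            System.out.println(isNoopUpdate(current, Map.of("user.name", "keyword"))); // true: skip master
            System.out.println(isNoopUpdate(current, Map.of("user.age", "long")));     // false: contact master
        }
    }

In the real code path, the comparison happens on the Mapping produced by the preflight merge, as point 1 describes; this toy just illustrates the equality check that decides whether the master is contacted.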

@felixbarny felixbarny added the :Distributed/CRUD label Dec 4, 2023
@felixbarny felixbarny self-assigned this Dec 4, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed, v8.12.0, and external-contributor labels Dec 4, 2023
elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

henningandersen (Contributor) left a comment

I think the main part of the PR looks good, but the retry loop detection could be flawed for updates, and it seems like it should behave the same whether the master detects the noop or the data node itself detects it.

@@ -1072,6 +1091,70 @@ public void testPerformOnPrimaryReportsBulkStats() throws Exception {
latch.await();
}

public void testNoopMappingUpdateInfiniteLoopPrevention() throws Exception {
Contributor

I wonder if this test tests a non-existing scenario?

The noop prevention would only concern a case where the mapping used during parsing is different from the mapping used during preflight, containing the mapping changes added. And if so, we'd expect any retry to always succeed?

felixbarny (Member, Author)

> I wonder if this test tests a non-existing scenario?

I hope it does :)

This is meant as a safety net in case there's an unexpected bug that leads to a situation where we accept a field in DocumentParserContext even though it turns out we don't have enough capacity for it when merging the mappings.

*/
public void resetForNoopMappingUpdateRetry(long mappingVersion) {
assert assertInvariants(ItemProcessingState.TRANSLATED);
if (noopMappingUpdateRetryForMappingVersion == mappingVersion) {
Contributor

I am not sure I follow why this infinite loop detection is necessary. I'd expect that since all dynamic fields from the previous parse are now in the mappings, the retry would always succeed in one go?

The only case where I could see this not being true is updates, where a new version of the doc could appear and thus a new dynamic mapping update could potentially be required. That would be at odds with the detection here, I think.

To some extent, having some detection in assertions would be good, but it would have to not trigger for updates, and it would then have to be the same for other mapping update retries. I could accept not adding that now, though.

felixbarny (Member, Author)

> I'd expect that since all dynamic fields from the previous parse are now in the mappings, the retry would always succeed in one go?

If all goes right, this is true. However, there's no strict guarantee about that, and the failure mode I have in mind is not related to concurrency. The risk comes from the fact that there are two distinct places that reason about a mapping's size: one is DocumentParserContext, and the other is Mapping::merge/MappingLookup. In DocumentParserContext, we're just estimating the size of the mapping when adding dynamic mappers. If there's a bug in that estimation (#102885 tries to make that less likely), there's a chance that we add more dynamic mappers than the mapping can hold.

We normally find out that we have accepted too many dynamic mappers in DocumentParserContext when the preflight check throws an exception. However, with #96235, for indices that have the ignore_dynamic_beyond_limit setting enabled, the preflight check won't fail but will silently ignore fields that are beyond the limit. After detecting that a noop mapping update has been requested, the indexing request gets retried. But the mapping hasn't changed in the meantime, so when the document gets parsed, we add the same dynamic mapper in DocumentParserContext again, which leads to the same dynamic mapping update that the preflight check will ignore again: we've entered an infinite loop.
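
To make the failure mode concrete, here is a self-contained toy model (assumed names and numbers, not Elasticsearch code) in which a broken size estimate keeps producing the same dynamic update that the limit-ignoring merge keeps turning into a noop:

    // Toy model of the loop: the parser's estimate accepts a field, but the merge
    // (with ignore_dynamic_beyond_limit semantics) drops it, so the "update" is a noop.
    class FieldLimitLoop {
        static final int FIELD_LIMIT = 1;

        // Buggy estimate: always thinks there is room (stand-in for a
        // DocumentParserContext size-accounting bug).
        static boolean estimateHasCapacity(int currentFieldCount) {
            return true;
        }

        // Merge that silently ignores fields beyond the limit instead of failing.
        static int mergeIgnoringBeyondLimit(int currentFieldCount, int newFields) {
            return Math.min(currentFieldCount + newFields, FIELD_LIMIT);
        }

        public static void main(String[] args) {
            int mappingFieldCount = FIELD_LIMIT; // index is already at the limit
            for (int attempt = 1; attempt <= 3; attempt++) {
                if (estimateHasCapacity(mappingFieldCount)) {
                    int merged = mergeIgnoringBeyondLimit(mappingFieldCount, 1);
                    boolean noop = merged == mappingFieldCount;
                    System.out.println("attempt " + attempt + ": noop update = " + noop);
                    // noop == true every time -> without loop detection we'd retry forever
                }
            }
        }
    }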

Maybe another solution to this could be that the preflight merge reason, in contrast to the auto-update merge reason, doesn't ignore fields that are beyond the limit but throws an exception instead. However, due to a race condition, we might have added a dynamic mapper under the assumption that enough space was still available: after accepting the mapper, but before the preflight check, the mapping might get updated with a version that already puts us at the limit. If we then throw a validation error during the preflight check, it will lead to data loss instead of ignoring the fields that are above the limit.

> The only case where I could see this not being true is updates, where a new version of the doc could appear and thus a new dynamic mapping update could potentially be required.

You're referring to the situation when the retry_on_conflict parameter is set, right? I think this case is covered by the fact that resetForExecutionRetry() resets noopMappingUpdateRetryForMappingVersion. This means that the infinite loop prevention only kicks in if the BulkPrimaryExecutionContext has been reset via resetForNoopMappingUpdateRetry twice in a row, and the mapping also hasn't changed in the meantime.
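
A minimal, self-contained sketch of that interplay, modeled on the excerpt above (the real BulkPrimaryExecutionContext is more involved; the sentinel value and exception here are illustrative assumptions):

    // Simplified model of the retry bookkeeping, not the actual
    // BulkPrimaryExecutionContext. UNSET means "no noop retry recorded yet".
    class RetryContext {
        private static final long UNSET = -1;
        private long noopMappingUpdateRetryForMappingVersion = UNSET;

        /** Regular retry (e.g. a version conflict with retry_on_conflict):
         *  clears the marker, so a later noop retry starts fresh. */
        void resetForExecutionRetry() {
            noopMappingUpdateRetryForMappingVersion = UNSET;
        }

        /** Retry after a noop mapping update: fails if we already retried on
         *  the same mapping version, which would otherwise loop forever. */
        void resetForNoopMappingUpdateRetry(long mappingVersion) {
            if (noopMappingUpdateRetryForMappingVersion == mappingVersion) {
                throw new IllegalStateException(
                    "noop mapping update retried twice on mapping version " + mappingVersion);
            }
            noopMappingUpdateRetryForMappingVersion = mappingVersion;
        }
    }

With this bookkeeping, update retries that go through resetForExecutionRetry never trip the guard, because the marker is cleared in between; only two consecutive noop retries on the same mapping version fail.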

felixbarny (Member, Author)

A way to reproduce the infinite loop is to purposefully introduce a bug in how DocumentParserContext counts the size of a Mapper.Builder in #96235: if you change that count to int builderMapperSize = 1; and execute org.elasticsearch.index.mapper.DynamicMappingIT#testIgnoreDynamicBeyondLimitSingleMultiField, it will trigger this.

Contributor

> You're referring to the situation when the retry_on_conflict parameter is set, right?

No, I am referring to a case where an update results in a mapping update and thus a retry afterwards. That could potentially be a noop mapping update (due to concurrency). The retried update could then result in new dynamic fields.

Contributor

I think it is safe with respect to such updates in the current form, since:

  1. Any retry will at least use a mapping corresponding to the last mapping version, thus only adding fields missing there.
  2. A noop result should thus not be possible.

I think #96235 does not change this, but the argument that we'll not run into it becomes more subtle.

A worry is that an update will fail with this error if it first adds a field f (noop detected) and then another field g (also a noop, towards the same version). That seems impossible now, but it is hard to verify.

Contributor

About the potential discrepancy between what the mapper results in and what the document parser holds, I suggest adding tests to verify this, as well as some assertions to ensure they are aligned at relevant times. That belongs to the other PR, though (perhaps we can have the size tracking as a separate PR?).

felixbarny (Member, Author) Dec 31, 2023

This is what #102885 is about. It significantly simplifies the calculation of how a mapper counts against the field limit and aligns the implementations as much as possible (Mapper.Builder#getTotalFieldsCount, Mapper#getTotalFieldsCount, MappingLookup#getTotalFieldsCount, and MappingLookup#checkFieldLimit).

felixbarny (Member, Author) Jan 2, 2024

I've also added an assertion here: befd7f6

henningandersen (Contributor) left a comment

LGTM.

Left a few items I'd like to see addressed.


@felixbarny felixbarny merged commit 68093e6 into elastic:main Jan 2, 2024
15 checks passed
@felixbarny felixbarny deleted the avoid-contacting-master-on-noop-mapping-update branch January 2, 2024 07:40
elasticsearchmachine pushed a commit that referenced this pull request Feb 2, 2024

Adds a new `index.mapping.total_fields.ignore_dynamic_beyond_limit`
index setting.

When set to `true`, new fields are added to the mapping as long as the
field limit (`index.mapping.total_fields.limit`) is not exceeded. Fields
that would exceed the limit are not added to the mapping, similar to
`dynamic: false`.  Ignored fields are added to the `_ignored` metadata
field.
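
As a quick usage sketch of the setting described above (the index name and limit value are illustrative; the request shape follows the standard create-index API):

    PUT my-index
    {
      "settings": {
        "index.mapping.total_fields.limit": 100,
        "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
      }
    }

With this in place, a dynamically mapped field beyond the 100-field limit is dropped from the mapping and recorded in the document's `_ignored` metadata field, instead of the indexing request failing.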

Relates to #89911

To make this easier to review, this is split into the following PRs:

- [x] #102915
- [x] #102936
- [x] #104769

Related but not a prerequisite:

- [ ] #102885