Add setting to ignore dynamic fields when field limit is reached #96235

felixbarny · 2023-05-20T13:00:44Z

Adds a new index.mapping.total_fields.ignore_dynamic_beyond_limit index setting.

When set to true, new fields are added to the mapping as long as the field limit (index.mapping.total_fields.limit) is not exceeded. Fields that would exceed the limit are not added to the mapping, similar to dynamic: false. Ignored fields are added to the _ignored metadata field.
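The behaviour described above can be pictured with a small toy model. This is only a sketch in Python, not Elasticsearch code: the `ToyIndex` class and its `index_document` method are hypothetical names, and the model leaves out object mappers, multi-fields and runtime fields. It only shows that, with the setting enabled, fields beyond the limit are skipped and reported as ignored instead of failing the document.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical stand-in for an index; not an Elasticsearch API."""
    total_fields_limit: int            # models index.mapping.total_fields.limit
    ignore_dynamic_beyond_limit: bool  # models the new setting
    mapping: set = field(default_factory=set)

    def index_document(self, doc: dict) -> list:
        """Map new fields while there is room; return the names put in _ignored."""
        accepted, ignored = [], []
        for name in doc:
            if name in self.mapping:
                continue  # already mapped, nothing to add
            if len(self.mapping) + len(accepted) < self.total_fields_limit:
                accepted.append(name)
            elif self.ignore_dynamic_beyond_limit:
                ignored.append(name)  # kept out of the mapping, like dynamic: false
            else:
                # Without the setting the whole document is rejected.
                raise ValueError(
                    f"Limit of total fields [{self.total_fields_limit}] has been exceeded"
                )
        self.mapping.update(accepted)
        return ignored


index = ToyIndex(total_fields_limit=2, ignore_dynamic_beyond_limit=True)
ignored_fields = index.index_document({"a": 1, "b": 2, "c": 3})
```

In this model the rejected document in the strict case leaves the mapping untouched, which mirrors the trade-off discussed in this PR: dropping a document versus keeping it with some fields unmapped.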

Relates to #89911

To make this easier to review, this is split into the following PRs:

Related but not a prerequisite:

Make field limit more predictable #102885

github-actions · 2023-05-20T13:00:56Z

Documentation preview:

✨ Changed pages

elasticsearchmachine · 2023-05-20T13:02:41Z

Hi @felixbarny, I've created a changelog YAML for you.

elasticsearchmachine · 2023-05-20T13:02:42Z

Pinging @elastic/es-search (Team:Search)

server/src/main/java/org/elasticsearch/action/bulk/BulkPrimaryExecutionContext.java

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

ruflin · 2023-05-22T09:13:03Z

I like the general approach here. It makes it clear that the limit is for dynamic mappings. If the overall limit is 1024 and the user has specified 1020 fields, 4 are open for dynamic fields. Every time an index rolls over, the game begins again. If a user accidentally created too many fields in an index under a data stream but has since fixed ingestion, a rollover resets the counter for the new index.
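The arithmetic in the comment above can be written down as a short sketch. The helper name `remaining_dynamic_slots` is hypothetical, and the model assumes that a rolled-over index starts again with only the predefined fields in its mapping.

```python
def remaining_dynamic_slots(total_fields_limit: int, mapped_fields: int) -> int:
    """Number of dynamic fields that can still be added before the limit is hit."""
    return max(total_fields_limit - mapped_fields, 0)


LIMIT = 1024       # overall field limit from the example
PREDEFINED = 1020  # fields the user specified up front

# A fresh backing index has room for 4 dynamic fields.
slots_at_start = remaining_dynamic_slots(LIMIT, PREDEFINED)

# Once 4 dynamic fields were added, further dynamic fields would be ignored.
slots_when_full = remaining_dynamic_slots(LIMIT, PREDEFINED + 4)

# After a rollover the new index only carries the predefined fields again.
slots_after_rollover = remaining_dynamic_slots(LIMIT, PREDEFINED)
```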

For the future

I'm starting to wonder whether index.mapping.total_fields.limit should actually only apply to dynamic fields. If a user puts predefined fields in, there is likely a reason for it and Elasticsearch should not block these users. Instead, field limits would only cover dynamic fields. Having this config option on by default and having a limit for dynamic fields only would also solve the problem. But we likely can't differentiate, after creation, between a dynamic and a static field.

felixbarny · 2023-05-22T09:16:44Z

> I'm starting to wonder, if index.mapping.total_fields.limit should actually only apply to dynamic fields.

Makes sense to me. This has also been discussed in #89911

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

javanna · 2023-05-22T10:38:14Z

When we last discussed this with the team, we talked about introducing a new dynamic mode rather than a new index setting that affects how existing dynamic modes work. Also, we said we'd want to better understand how users are going to consume the additional info added to the _ignored field. It is not so straightforward to understand why a field name is added to the _ignored field, and the field is also not yet aggregatable. We'd like to figure out the overall plan before committing individual changes that may be incremental steps but not necessarily in the desired direction.

ruflin · 2023-05-22T10:50:43Z

> We'd like to figure out the overall plan before committing individual changes that may be incremental steps but not necessarily in the desired direction.

We are one of the main requestors of this feature. For us, this incremental change solves the problem for now. I agree that eventually there should be a more holistic overhaul of how all the pieces work together, but this should not block the addition of this config option, especially as it is a rather small change.

> we said we'd want to better understand how users are going to consume the additional info added to the _ignored field.

The tradeoff we are making here is dropping documents vs. having the info available. We should not drop documents, and should optimise for this first. In a second step, we can work on making the info easier for users to retrieve.

felixbarny · 2023-05-22T11:14:24Z

> It is not so straight-forward to understand the reason why a field name is added to the _ignored field,

Agreed, but that's a problem that already exists today (for example, is _ignored set because of ignore_above or ignore_malformed?). I acknowledge that this change exacerbates the problem by adding more reasons why the _ignored field is set, but this is an orthogonal problem that we'll need to look at regardless.

I don't see why we should block progress on this until we have a way to store the _ignored reason.

> and also the field is not yet aggregatable.

Again, this is orthogonal and should be handled via #59946

felixbarny · 2023-05-22T16:11:30Z

@javanna do you agree on the approach of just relying on _ignored for now and look at how to improve the experience around the _ignored field as a separate step that can come later? I think even with the limitations around _ignored this improvement already greatly simplifies the experience for users when dealing with dynamic fields. Progress over perfection 🙂

> When we discussed this last with the team we talked about introducing a new dynamic mode rather than a new index settings that affects how existing dynamic modes work.

I've tried that too, in this PR: #96233
While that works, it would require two new modes for dynamic: one for adding runtime fields and one for adding concrete fields until the limit. Therefore, I found the setting a bit more elegant, as it applies to both true and runtime. But I don't have strong feelings here. If you prefer adding more modes to the dynamic option, I can change it.

elasticsearchmachine · 2024-01-25T08:45:53Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Code has changed considerably since it was reviewed, another round needed.

This is in preparation of elastic#96235. At the moment, there's no difference between MAPPING_AUTO_UPDATE and MAPPING_AUTO_UPDATE_PREFLIGHT. After the other PR is merged, when the merge reason is auto-update and if ignore_dynamic_beyond_limit is set, the merge process will only add dynamically mapped fields until the field limit is reached and ignores additional ones.
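The planned merge behaviour can be sketched as follows. This is a toy model in Python, not the Elasticsearch implementation: `merge_dynamic_fields` is a hypothetical function, the enum lists only the two merge reasons named above, and treating every other reason as "add all incoming fields" is a simplifying assumption (limit validation for those reasons is not modelled).

```python
from enum import Enum, auto


class MergeReason(Enum):
    """Only the two reasons named in the description; the real enum has more."""
    MAPPING_AUTO_UPDATE = auto()
    MAPPING_AUTO_UPDATE_PREFLIGHT = auto()


def merge_dynamic_fields(existing, incoming, limit, reason, ignore_dynamic_beyond_limit):
    """Return the merged field list for a toy mapping merge."""
    merged = list(existing)
    for name in incoming:
        if name in merged:
            continue
        at_limit = len(merged) >= limit
        if (
            at_limit
            and reason is MergeReason.MAPPING_AUTO_UPDATE
            and ignore_dynamic_beyond_limit
        ):
            continue  # ignore the additional dynamically mapped field
        merged.append(name)  # assumption: other cases are left unchanged here
    return merged


auto_update = merge_dynamic_fields(
    ["a", "b"], ["c", "d"], limit=3,
    reason=MergeReason.MAPPING_AUTO_UPDATE, ignore_dynamic_beyond_limit=True,
)
```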

javanna

Left a few comments, this is not far at all!

docs/reference/troubleshooting/common-issues/mapping-explosion.asciidoc

server/src/main/java/org/elasticsearch/index/IndexSettings.java

server/src/main/java/org/elasticsearch/index/mapper/DocumentParserContext.java

server/src/main/java/org/elasticsearch/index/mapper/MappingLookup.java

docs/reference/troubleshooting/common-issues/mapping-explosion.asciidoc

server/src/internalClusterTest/java/org/elasticsearch/index/mapper/DynamicMappingIT.java

javanna

Did another round. The main bit that's left open is how we expose the new behaviour. I will get back to you as soon as that's discussed and decided with the team.

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

server/src/main/java/org/elasticsearch/index/mapper/DocumentParserContext.java

server/src/main/java/org/elasticsearch/index/mapper/MapperService.java

javanna

LGTM!

felixbarny · 2024-02-02T05:50:26Z

@elasticmachine update branch

@javanna

Today, we're counting all mappers, including mappers for subfields that aren't explicitly added to the mapping towards the field limit. This means that some field types, such as `search_as_you_type` or `percolator` count as more than one field even though that's not apparent to users as they're just defining them as a single field in the mapping. This change makes it so that each field mapper only counts as one. We're still counting multi-fields. This makes it easier to understand for users why the field limit is hit.

~In addition to that, it also simplifies #96235 as it makes the implementation of `Mapper.Builder#getTotalFieldsCount` much easier and easier to align with `Mapper#getTotalFieldsCount`. This reduces the risk of over- or under-estimating the field count of a `Mapper.Builder` in `DocumentParserContext#addDynamicMapper`, which in turn reduces the risk of data loss due to the issue described here: #96235 (comment)~

*Edit: due to #103865, we don't need an implementation of `getTotalFieldsCount` or `mapperSize` in `Mapper.Builder`. Still, this PR more closely aligns `Mapper#getTotalFieldsCount` with `MappingLookup#getTotalFieldsCount`, which `DocumentParserContext#addDynamicMapper` uses to determine whether the field limit is hit*

A potential risk of this is that we're now effectively allowing more fields in the mapping. It may be surprising to users that more fields can be added to a mapping. Although, I'd not expect negative consequences from that. Generally, I'd expect users to be happy about any change that reduces the risk of data loss.

We could also think about whether to apply the new counting logic only to new indices (depending on the `IndexVersion`). However, that would add more complexity and I'm not convinced about the value. We'd then need to maintain two different ways of counting fields and also require passing in the `IndexVersion` to `MappingLookup` which previously didn't require the `IndexVersion`.

This PR is meant as a conversation starter. It would also simplify #96235 but I don't think this blocks that PR in any way. I'm curious about the opinion of @javanna and @jpountz on this.
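The difference between the two counting rules can be illustrated with a toy model. The `ToyMapper` class and both counting functions are hypothetical and do not mirror the Elasticsearch classes; the number of internal subfield mappers used in the example is purely illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMapper:
    """Hypothetical field mapper; not an Elasticsearch class."""
    name: str
    # Subfield mappers the field type creates on its own (illustrative number).
    internal_subfields: int = 0
    # Multi-fields the user declared explicitly under this field.
    multi_fields: List["ToyMapper"] = field(default_factory=list)


def count_all_mappers(mapper: ToyMapper) -> int:
    """Old rule: every mapper counts, including internal subfield mappers."""
    return (
        1
        + mapper.internal_subfields
        + sum(count_all_mappers(m) for m in mapper.multi_fields)
    )


def count_one_per_field(mapper: ToyMapper) -> int:
    """New rule: each field mapper counts once; multi-fields still count."""
    return 1 + sum(count_one_per_field(m) for m in mapper.multi_fields)


title = ToyMapper(
    "title",
    internal_subfields=3,
    multi_fields=[ToyMapper("title.raw")],
)
old_count = count_all_mappers(title)
new_count = count_one_per_field(title)
```

Under the old rule the single user-defined field plus its multi-field consumes more of the limit than the two entries the user actually wrote, which is the mismatch the comment describes.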

felixbarny added 2 commits May 19, 2023 19:20

Add new dynamic until_limit option

83b507c

Add setting

2993820

felixbarny requested review from ruflin, javanna and romseygeek May 20, 2023 13:00

felixbarny mentioned this pull request May 20, 2023

Add new dynamic until_limit option #96233

Closed

elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team needs:triage Requires assignment of a team area label v8.9.0 labels May 20, 2023

felixbarny changed the title ~~Ignore dynamic beyond limit~~ Add setting to ignore dynamic fields when field limit is reached May 20, 2023

felixbarny added >enhancement :Search/Mapping Index mappings, including merging and defining field types labels May 20, 2023

elasticsearchmachine added Team:Search Meta label for search team and removed needs:triage Requires assignment of a team area label labels May 20, 2023

Update docs/changelog/96235.yaml

6402346

felixbarny added 2 commits May 21, 2023 09:54

Fix test

0407747

Prevent infinite retry loops and simplify code by retrying all mappin…

677f749

…g update failures once

felixbarny commented May 21, 2023

server/src/main/java/org/elasticsearch/action/bulk/BulkPrimaryExecutionContext.java

ruflin reviewed May 22, 2023

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

javanna reviewed May 22, 2023

server/src/main/java/org/elasticsearch/index/mapper/DocumentParser.java

Avoid catching exception

0634d26

AlexanderWert linked an issue Jan 25, 2024 that may be closed by this pull request

Add setting to ignore dynamic fields when field limit is reached #104733

Closed

siposea added the :StorageEngine/Logs You know, for Logs label Jan 25, 2024

elasticsearchmachine added the Team:StorageEngine label Jan 25, 2024

felixbarny mentioned this pull request Jan 25, 2024

Introduce MAPPING_AUTO_UPDATE merge reason #104769

Merged

Merge remote-tracking branch 'origin/main' into ignore-dynamic-beyond…

49191bf

…-limit

javanna reviewed Jan 29, 2024

felixbarny added 3 commits January 29, 2024 13:14

Add docs for index.mapping.total_fields.ignore_dynamic_beyond_limit

eade390

Align exceedsLimit with remainingFieldsUntilLimit

488befa

Make setters in IndexSettings private

58f900f

javanna reviewed Jan 31, 2024

felixbarny added 4 commits January 31, 2024 13:31

Replace AtomicInteger with a private DynamicMapperSize class

9efde90

Make MapperService#mergeMappings static again

5b462d4

Merge remote-tracking branch 'origin/main' into ignore-dynamic-beyond…

f27c88c

…-limit

Merge remote-tracking branch 'origin/main' into ignore-dynamic-beyond…

da16f8d

…-limit

javanna approved these changes Feb 1, 2024

felixbarny added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Feb 1, 2024

elasticmachine added 3 commits February 2, 2024 16:21

Merge branch 'main' into ignore-dynamic-beyond-limit

381a5d0

Merge branch 'main' into ignore-dynamic-beyond-limit

43d7412

Merge branch 'main' into ignore-dynamic-beyond-limit

059132d

elasticsearchmachine merged commit f642b8a into elastic:main Feb 2, 2024
16 checks passed

felixbarny deleted the ignore-dynamic-beyond-limit branch February 2, 2024 10:54

felixbarny mentioned this pull request Feb 6, 2024

Make total fields limit less of a nuisance to users #89911

Open
