Add limit to total number of fields in mapping #17357

Closed. Wants to merge 1 commit into base: master. 5 participants.

yanjunh (Contributor) commented Mar 27, 2016

This is to prevent mapping explosion when dynamic keys such as UUIDs are used as field names. index.mapping.total_fields.limit specifies the total number of fields an index can have; an exception will be thrown when the limit is exceeded. The default limit is 1000, and a value of 0 means no limit. This setting is adjustable at runtime.
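The semantics described above can be sketched as a standalone check. This is a hypothetical sketch for illustration (the class name, method signature, and message text are assumptions, not the actual MapperService code):

```java
// Hypothetical sketch of the described semantics: a limit of 0 disables the
// check; otherwise an exception is thrown once the number of fields in the
// mapping exceeds the limit. The default mirrors the 1000 mentioned above.
public class TotalFieldsLimit {
    static final long DEFAULT_LIMIT = 1000;

    static void checkTotalFieldsLimit(long totalFields, long limit) {
        if (limit > 0 && totalFields > limit) {
            throw new IllegalArgumentException(
                "Limit of total fields [" + limit + "] in index has been exceeded");
        }
    }
}
```

With the default of 1000, the 1001st field trips the check, while a limit of 0 turns the check off entirely; that special case is what the reviewers question below.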

yanjunh (Contributor) commented Mar 27, 2016

related to #17203

s1monw (Contributor) commented Mar 29, 2016

I wonder if we really need to make 0 a special value; why can't folks just use Long.MAX_VALUE or any high number? I also wonder if this PR can reject documents that were previously accepted on the primary and trigger a mapping update on the replica due to cluster state delays. It might also reject mapping merges coming from the master due to concurrent indexing. @jpountz should know more here; in any case, I suspect we need to make this check higher up the stack.

@@ -403,6 +406,13 @@ private void checkNestedFieldsLimit(Map<String, ObjectMapper> fullPathObjectMapp
        }
    }

    private void checkTotalFieldsLimit(long totalMappers) {
        long allowedTotalFields = indexSettings.getValue(INDEX_MAPPING_TOTAL_FIELDS_LIMIT_SETTING);
        if (allowedTotalFields > 0 && allowedTotalFields < totalMappers) {

jpountz (Contributor) commented Mar 29, 2016

Like Simon suggested, I would just do: if (allowedTotalFields < totalMappers) {. If someone needs to have many fields anyway and understands the issue, (s)he can still set index.mapping.total_fields.limit=1000000 for instance.
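The simplification suggested here can be sketched as follows. This is a hypothetical standalone version, not the actual patch; the class name and message text are assumptions:

```java
// Hypothetical sketch of the suggested simplification: no special case for 0.
// A user who understands the mapping-explosion risk simply raises the limit
// (e.g. to 1000000) instead of disabling the check.
public class SimplifiedFieldsLimit {
    static void checkTotalFieldsLimit(long totalMappers, long allowedTotalFields) {
        if (allowedTotalFields < totalMappers) {
            throw new IllegalArgumentException(
                "Limit of total fields [" + allowedTotalFields + "] in index has been exceeded");
        }
    }
}
```

This removes a branch and avoids giving 0 a surprising "unlimited" meaning; a very high limit behaves the same way in practice.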

jpountz (Contributor) commented Mar 29, 2016

@s1monw The code has the check under the following if statement: if (reason == MergeReason.MAPPING_UPDATE). This means the mapping is being merged due to a call to the update mapping API, so only the master node performs these checks, and only when that API is called. In all other cases, such as on data nodes or on the master node when it restores mappings from disk, the MergeReason is MergeReason.MAPPING_RECOVERY, so the check is skipped. So I think we are fine?
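The gating described in this comment can be sketched like this. The enum values match the ones named above, but the surrounding class and helper method are hypothetical stand-ins for the real MapperService logic:

```java
// Hypothetical sketch of the gating described above: field-limit checks run
// only when a mapping merge is a user-driven update via the update mapping
// API. Recovery merges (data nodes applying cluster state, or the master
// restoring mappings from disk) skip the checks entirely.
public class MergeGate {
    enum MergeReason { MAPPING_UPDATE, MAPPING_RECOVERY }

    static boolean checksFieldLimits(MergeReason reason) {
        return reason == MergeReason.MAPPING_UPDATE;
    }
}
```

This is why the concern about replicas rejecting already-accepted documents does not apply: replicas merge mappings with the recovery reason and never run the limit check.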

s1monw (Contributor) commented Mar 29, 2016

@jpountz good, can we add a comment there?

jpountz (Contributor) commented Mar 29, 2016

I will do it in a separate commit.

jpountz (Contributor) commented Mar 29, 2016

@s1monw Done in c7bdfb1.

s1monw (Contributor) commented Mar 29, 2016

thx @jpountz

jpountz added a commit that referenced this pull request Mar 29, 2016

Add limit to total number of fields in mapping. #17357

Closes #11443

jpountz (Contributor) commented Mar 29, 2016

@yanjunh It was so close that I applied the changes I suggested and then merged (361adcf). I hope you don't mind.

yanjunh (Contributor) commented Mar 29, 2016

@jpountz Not at all. Thanks for merging this feature.

imotov added a commit that referenced this pull request Mar 29, 2016

Increase the total number of allowed fields in UpdateMappingIntegrationIT#testDynamicUpdates

With the restriction on the total number of fields introduced in #17357, this test can fail if a large number of records is randomly selected for indexing.

clintongormley referenced this pull request Mar 30, 2016:

Add safeguards to prevent simple user errors #11511 (Closed, 13 of 25 tasks complete)

segalziv commented Apr 19, 2016

@jpountz is there a chance that this PR can get backported to 2.3.x? I've discussed this with @yanjunh, who pointed me to a 2.3.1 fork with this fix here: https://github.com/yanjunh/elasticsearch/tree/v2.3.1-maplimit
Thanks!

clintongormley (Member) commented Apr 19, 2016

@segalziv no, this is a breaking change, so we'll keep it in 5.0 only.


segalziv commented Apr 19, 2016

ok @clintongormley, thanks anyway
