Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ignore_above option for ICUCollationKeywordFieldMapper #40413

Closed
clement-tourriere opened this issue Mar 25, 2019 · 7 comments · Fixed by #40414
Closed

Add ignore_above option for ICUCollationKeywordFieldMapper #40413

clement-tourriere opened this issue Mar 25, 2019 · 7 comments · Fixed by #40414
Labels
>enhancement :Search/Analysis How text is split into tokens

Comments

@clement-tourriere
Copy link

ICUCollationKeywordFieldMapper is not derived from KeywordFieldMapper, but from FieldMapper, so it doesn't have the ignore_above configuration.

The use case is to be able to add a sort field for specific language, ignoring long values.
For the moment the only solution is to perform this check in the client and separate the sort field from the keyword field.

I can provide a PR for this feature.

@cbuescher cbuescher added >enhancement :Search/Analysis How text is split into tokens labels Mar 25, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@jtibshirani
Copy link
Contributor

@clement-tourriere to check my understanding of your use case: would you expect that ignore_above be applied to the input value, before collation is run?

@clement-tourriere
Copy link
Author

In fact my use case is to be able to configure a max length on text field indexed with a sort "sub field" https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html. The use case is to ignore long field (considering that they are not sortable) and in the worst case to protect against lucene byte-ref limit https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html.

It might be more logical to apply the ignore_above check after collation instead of the input value.

@jtibshirani
Copy link
Contributor

jtibshirani commented Apr 8, 2019

Thanks for the additional details. I could also see an argument for applying ignore_above before collation -- this matches the behavior for keyword fields, where we check ignore_above against the input before any normalizer is applied. I've asked other team members for their thoughts on this issue as well.

@jpountz
Copy link
Contributor

jpountz commented Apr 16, 2019

+1 to support ignore_above like keyword and make it behave similarly, ie. check the length of the original string rather than the collation. @clement-tourriere you suggested the opposite but I suspect you would be fine with this behavior, wouldn't you?

@clement-tourriere
Copy link
Author

Yes, it would be absolutely fine for our use case to check the length before collation ;)

@jtibshirani
Copy link
Contributor

Great, now that we are clear on that I will take a look at your PR.

jtibshirani pushed a commit that referenced this issue Apr 19, 2019
Add the possibility to use ignore_above parameter in ICUCollationKeywordFieldMapper.

Close #40413
jtibshirani pushed a commit that referenced this issue Apr 19, 2019
Add the possibility to use ignore_above parameter in ICUCollationKeywordFieldMapper.

Close #40413
clement-tourriere pushed a commit to opendatasoft/elasticsearch that referenced this issue May 10, 2019
clement-tourriere pushed a commit to opendatasoft/elasticsearch that referenced this issue May 21, 2019
clement-tourriere pushed a commit to opendatasoft/elasticsearch that referenced this issue May 21, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
Add the possibility to use ignore_above parameter in ICUCollationKeywordFieldMapper.

Close elastic#40413
clement-tourriere pushed a commit to opendatasoft/elasticsearch that referenced this issue Jul 19, 2019
clement-tourriere pushed a commit to opendatasoft/elasticsearch that referenced this issue Aug 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Analysis How text is split into tokens
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants