Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix position increment gap on phrase/prefix analyzers #70096

Merged
merged 2 commits into from
Mar 9, 2021

Conversation

romseygeek
Copy link
Contributor

Custom position increments are handled by wrapping analyzers
with a NamedAnalyzer and passing the custom increment through
to its constructor. However, phrase and prefix analyzers use
delegating analyzer wrappers to add extra filtering to their parent
analyzers, and we can't wrap analyzers multiple times because this
wrecks reuse strategies, so we unwrap the parent before passing
it to phrase and prefix builders. This unwrapping means that we
lose the custom position increments; in particular, it means that
we can end up with a position increment gap of -1, which is the
sentinel value for the unset parameter - and that means exceptions
at index time for backwards-moving positions on fields with multiple
values.

This commit removes the sentinel value and uses standard parameter
defaults and the isConfigured() method instead, plus it adds some
more comprehensive testing for position increments when combined
with phrase/prefix index options on text fields.

Fixes #70049

@romseygeek romseygeek self-assigned this Mar 8, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Mar 8, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@romseygeek
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/rest-compat

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@romseygeek romseygeek merged commit 49897be into elastic:master Mar 9, 2021
@romseygeek romseygeek deleted the bug/index-phrases branch March 9, 2021 12:10
romseygeek added a commit that referenced this pull request Mar 9, 2021
Custom position increments are handled by wrapping analyzers
with a NamedAnalyzer and passing the custom increment through
to its constructor. However, phrase and prefix analyzers use
delegating analyzer wrappers to add extra filtering to their parent
analyzers, and we can't wrap analyzers multiple times because this
wrecks reuse strategies, so we unwrap the parent before passing
it to phrase and prefix builders. This unwrapping means that we
lose the custom position increments; in particular, it means that
we can end up with a position increment gap of -1, which is the
sentinel value for the unset parameter - and that means exceptions
at index time for backwards-moving positions on fields with multiple
values.

This commit removes the sentinel value and uses standard parameter
defaults and the isConfigured() method instead, plus it adds some
more comprehensive testing for position increments when combined
with phrase/prefix index options on text fields.

Fixes #70049
romseygeek added a commit that referenced this pull request Mar 9, 2021
Custom position increments are handled by wrapping analyzers
with a NamedAnalyzer and passing the custom increment through
to its constructor. However, phrase and prefix analyzers use
delegating analyzer wrappers to add extra filtering to their parent
analyzers, and we can't wrap analyzers multiple times because this
wrecks reuse strategies, so we unwrap the parent before passing
it to phrase and prefix builders. This unwrapping means that we
lose the custom position increments; in particular, it means that
we can end up with a position increment gap of -1, which is the
sentinel value for the unset parameter - and that means exceptions
at index time for backwards-moving positions on fields with multiple
values.

This commit removes the sentinel value and uses standard parameter
defaults and the isConfigured() method instead, plus it adds some
more comprehensive testing for position increments when combined
with phrase/prefix index options on text fields.

Fixes #70049
easyice pushed a commit to easyice/elasticsearch that referenced this pull request Mar 25, 2021
Custom position increments are handled by wrapping analyzers
with a NamedAnalyzer and passing the custom increment through
to its constructor. However, phrase and prefix analyzers use
delegating analyzer wrappers to add extra filtering to their parent
analyzers, and we can't wrap analyzers multiple times because this
wrecks reuse strategies, so we unwrap the parent before passing
it to phrase and prefix builders. This unwrapping means that we
lose the custom position increments; in particular, it means that
we can end up with a position increment gap of -1, which is the
sentinel value for the unset parameter - and that means exceptions
at index time for backwards-moving positions on fields with multiple
values.

This commit removes the sentinel value and uses standard parameter
defaults and the isConfigured() method instead, plus it adds some
more comprehensive testing for position increments when combined
with phrase/prefix index options on text fields.

Fixes elastic#70049
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Analysis How text is split into tokens Team:Search Meta label for search team v7.12.1 v7.13.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Indexing doc to index with "index_phrase" mapping causes illegal argument exception
4 participants