Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use index_phrases on graph queries #44340

Merged
merged 1 commit into from
Jul 17, 2019

Conversation

romseygeek
Copy link
Contributor

Due to https://issues.apache.org/jira/browse/LUCENE-8916, when you
try to use a synonym filter with the index_phrases option on a text field,
you can end up with null values in a Phrase query, leading to weird
exceptions further down the querying chain. As a workaround, this commit
disables the index_phrases optimization for queries that produce token
graphs.

Fixes #43976

@romseygeek romseygeek added >bug :Search/Search Search-related issues that do not fall into other categories v8.0.0 v7.3.0 v7.4.0 labels Jul 15, 2019
@romseygeek romseygeek requested a review from jimczi July 15, 2019 09:58
@romseygeek romseygeek self-assigned this Jul 15, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

@jpountz jpountz added v7.3.1 and removed v7.3.0 labels Jul 15, 2019
Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @romseygeek. Makes sense

// we can't use the index_phrases shortcut with slop, if there are gaps in the stream,
// or if the incoming token stream is the output of a token graph due to
// https://issues.apache.org/jira/browse/LUCENE-8916
if (indexPhrases && slop == 0 && hasGaps(stream) == false && stream.hasAttribute(BytesTermAttribute.class) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also want to add stream.hasAttribute(BytesTermAttribute.class) condition for multiPhraseQuery below?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about graph token filters, but I wonder if another way to check that index_phrases is not used with synonyms is to modify hasGaps to check also that posIncAtt.getPositionIncrement() != 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a workaround for an implementation detail, namely that we represent the paths through a token graph using BytesTermAttribute instead of CharTermAttribute - once LUCENE-8916 has been fixed then this workaround should stop working, so I think we probably should keep checking for the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this for multiPhraseQuery, because the results of calling getFiniteStrings() on a token graph are always simple token streams with no stacked tokens. The bug appears because in MatchQuery.MatchQueryBuilder.analyzeGraphBoolean() we iterate through all possible paths in the graph and send them back through createFieldQuery(), which because the paths are all simple will end up delegating to either FieldType#termQuery() or FieldType#phraseQuery - multiPhraseQuery will never be encountered in this second loop, so there's no need to check for the attribute.

@romseygeek romseygeek merged commit c8ae530 into elastic:master Jul 17, 2019
@romseygeek romseygeek deleted the index-phrases-synonyms branch July 17, 2019 15:08
romseygeek added a commit that referenced this pull request Jul 17, 2019
Due to https://issues.apache.org/jira/browse/LUCENE-8916, when you
try to use a synonym filter with the index_phrases option on a text field,
you can end up with null values in a Phrase query, leading to weird
exceptions further down the querying chain. As a workaround, this commit
disables the index_phrases optimization for queries that produce token
graphs.

Fixes #43976
romseygeek added a commit that referenced this pull request Jul 17, 2019
Due to https://issues.apache.org/jira/browse/LUCENE-8916, when you
try to use a synonym filter with the index_phrases option on a text field,
you can end up with null values in a Phrase query, leading to weird
exceptions further down the querying chain. As a workaround, this commit
disables the index_phrases optimization for queries that produce token
graphs.

Fixes #43976
@jpountz jpountz added v7.3.0 and removed v7.3.1 labels Jul 26, 2019
romseygeek added a commit that referenced this pull request Nov 21, 2019
Lucene 8.3 included a root fix for #43976, which was temporarily fixed in elasticsearch
by #44340. Since we have upgraded to 8.3 we no longer need this workaround. This
commit fixes the test that was added to check the workaround, and instead checks that
fields with index_phrases enabled correctly build queries when used with multi-term
synonyms.

Closes #47777
romseygeek added a commit that referenced this pull request Nov 21, 2019
Lucene 8.3 included a root fix for #43976, which was temporarily fixed in elasticsearch
by #44340. Since we have upgraded to 8.3 we no longer need this workaround. This
commit fixes the test that was added to check the workaround, and instead checks that
fields with index_phrases enabled correctly build queries when used with multi-term
synonyms.

Closes #47777
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v7.3.0 v7.4.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

index_phrases together with synonym_graph filter causes NullPointerException
5 participants