Don't use index_phrases on graph queries #44340

romseygeek · 2019-07-15T09:58:37Z

Due to https://issues.apache.org/jira/browse/LUCENE-8916, when you
try to use a synonym filter with the index_phrases option on a text field,
you can end up with null values in a phrase query, leading to unexpected
exceptions further down the querying chain. As a workaround, this commit
disables the index_phrases optimization for queries that produce token
graphs.

Fixes #43976
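For context, index_phrases indexes two-term word combinations (shingles) into a separate field, so that exact phrase queries (slop 0) can be answered from shingle terms instead of positions. The plain-Java sketch below only illustrates that shingling step on a simple token sequence. It does not use Lucene, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration of the index_phrases idea: adjacent term pairs
 * (two-term shingles) are what gets indexed in the extra field.
 * This is not Elasticsearch or Lucene code.
 */
final class ShingleSketch {

    /** Builds the two-term shingles for a simple (non-graph) token sequence. */
    static List<String> twoTermShingles(List<String> terms) {
        List<String> shingles = new ArrayList<>();
        // Join each token with the token at the next position.
        // Fewer than two tokens yields no shingles.
        for (int i = 0; i + 1 < terms.size(); i++) {
            shingles.add(terms.get(i) + " " + terms.get(i + 1));
        }
        return shingles;
    }

    public static void main(String[] args) {
        System.out.println(twoTermShingles(List.of("quick", "brown", "fox")));
    }
}
```

The sketch only covers a simple stream with one token per position; it says nothing about token graphs, for which this change skips the shortcut.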

elasticmachine · 2019-07-15T09:58:39Z

Pinging @elastic/es-search

mayya-sharipova

Thanks @romseygeek. Makes sense

mayya-sharipova · 2019-07-17T12:55:29Z

server/src/main/java/org/elasticsearch/index/mapper/TextFieldMapper.java

+            // we can't use the index_phrases shortcut with slop, if there are gaps in the stream,
+            // or if the incoming token stream is the output of a token graph due to
+            // https://issues.apache.org/jira/browse/LUCENE-8916
+            if (indexPhrases && slop == 0 && hasGaps(stream) == false && stream.hasAttribute(BytesTermAttribute.class) == false) {


Do we also want to add stream.hasAttribute(BytesTermAttribute.class) condition for multiPhraseQuery below?

I don't know much about graph token filters, but I wonder if another way to check that index_phrases is not used with synonyms would be to modify hasGaps to also check that posIncAtt.getPositionIncrement() != 0.
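The suggestion can be sketched by classifying position increments: an increment of 0 marks a token stacked on the previous position (for example a single-term synonym), and an increment greater than 1 marks a gap, which is assumed here to be what hasGaps detects. The int[] below is a hypothetical stand-in for reading PositionIncrementAttribute from a Lucene TokenStream. As the reply explains, the paths re-analyzed from a graph are simple streams with no stacked tokens, so a check like this would not catch them on its own.

```java
/**
 * Hypothetical sketch of the reviewer's idea: skip the index_phrases shortcut
 * when the stream has gaps OR stacked tokens. An int[] of position increments
 * stands in for Lucene's PositionIncrementAttribute.
 */
final class PositionIncrementCheck {

    /** True if any token skips one or more positions (increment > 1). */
    static boolean hasGaps(int[] positionIncrements) {
        for (int inc : positionIncrements) {
            if (inc > 1) {
                return true;
            }
        }
        return false;
    }

    /** True if any token sits at the same position as the previous one (increment == 0). */
    static boolean hasStackedTokens(int[] positionIncrements) {
        for (int inc : positionIncrements) {
            if (inc == 0) {
                return true;
            }
        }
        return false;
    }

    /** The combined check the suggestion describes. */
    static boolean hasGapsOrStackedTokens(int[] positionIncrements) {
        return hasGaps(positionIncrements) || hasStackedTokens(positionIncrements);
    }

    public static void main(String[] args) {
        // "fast car" with the synonym "quick" stacked on "fast": fast(+1) quick(+0) car(+1)
        int[] withSynonym = {1, 0, 1};
        System.out.println(hasGapsOrStackedTokens(withSynonym));
    }
}
```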

romseygeek · 2019-07-17

This is a workaround for an implementation detail, namely that we represent the paths through a token graph using BytesTermAttribute instead of CharTermAttribute. Once LUCENE-8916 has been fixed this workaround should stop working, so I think we should keep checking for the implementation detail itself.
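The guard in the diff reduces to four boolean conditions. This hypothetical sketch restates it with plain parameters: isGraphPath stands in for stream.hasAttribute(BytesTermAttribute.class), the marker the workaround relies on, and hasGaps is assumed to be computed elsewhere. It is a model of the condition, not the TextFieldMapper code.

```java
/**
 * Hypothetical restatement of the index_phrases guard using plain booleans.
 * isGraphPath stands in for stream.hasAttribute(BytesTermAttribute.class).
 */
final class IndexPhrasesGuard {

    static boolean canUseIndexPhrases(boolean indexPhrases, int slop, boolean hasGaps, boolean isGraphPath) {
        // The shingle field only serves exact phrases, so any slop disables it.
        // Gaps (e.g. removed stop words) mean consecutive tokens are not at consecutive positions.
        // Streams that are paths through a token graph are excluded as the LUCENE-8916 workaround.
        return indexPhrases && slop == 0 && hasGaps == false && isGraphPath == false;
    }

    public static void main(String[] args) {
        System.out.println(canUseIndexPhrases(true, 0, false, false)); // simple exact phrase
        System.out.println(canUseIndexPhrases(true, 0, false, true));  // path from a token graph
    }
}
```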

We don't need this for multiPhraseQuery, because calling getFiniteStrings() on a token graph always yields simple token streams with no stacked tokens. The bug appears because MatchQuery.MatchQueryBuilder.analyzeGraphBoolean() iterates through all possible paths in the graph and sends each one back through createFieldQuery(). Because the paths are all simple, this delegates to either FieldType#termQuery() or FieldType#phraseQuery(); multiPhraseQuery is never reached in this second loop, so there's no need to check for the attribute there.
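The path-by-path expansion can be sketched with a small depth-first search. Lucene enumerates paths over an automaton via getFiniteStrings(); here the token graph is a hypothetical list of position-to-position edges, and the `wi fi` / `wifi` synonym is an invented example. Each path is a simple sequence, so the dispatch only ever chooses between a term query and a phrase query, never a multi-phrase query.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for enumerating the finite paths of a token graph.
 * Nodes are positions; each edge carries one term. The graph is assumed acyclic.
 */
final class TokenGraphPaths {

    /** An edge from one position node to another, labelled with a term. */
    static final class Edge {
        final int from;
        final int to;
        final String term;

        Edge(int from, int to, String term) {
            this.from = from;
            this.to = to;
            this.term = term;
        }
    }

    /** Collects every term sequence from startNode to endNode. */
    static List<List<String>> paths(List<Edge> edges, int startNode, int endNode) {
        List<List<String>> out = new ArrayList<>();
        walk(edges, startNode, endNode, new ArrayList<>(), out);
        return out;
    }

    private static void walk(List<Edge> edges, int node, int end, List<String> prefix, List<List<String>> out) {
        if (node == end) {
            out.add(new ArrayList<>(prefix)); // copy: prefix is reused while backtracking
            return;
        }
        for (Edge e : edges) {
            if (e.from == node) {
                prefix.add(e.term);
                walk(edges, e.to, end, prefix, out);
                prefix.remove(prefix.size() - 1); // backtrack
            }
        }
    }

    /** Each path is a simple stream, so it maps to a term query or a phrase query. */
    static String queryKind(List<String> path) {
        return path.size() == 1 ? "term" : "phrase";
    }

    public static void main(String[] args) {
        // "wi fi" with the multi-term synonym "wifi" spanning both positions
        List<Edge> graph = List.of(
                new Edge(0, 1, "wi"),
                new Edge(1, 2, "fi"),
                new Edge(0, 2, "wifi"));
        for (List<String> path : paths(graph, 0, 2)) {
            System.out.println(queryKind(path) + " " + path);
        }
    }
}
```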

Lucene 8.3 included a fix for the root cause of #43976, which was temporarily worked around in Elasticsearch by #44340. Since we have upgraded to 8.3, we no longer need this workaround. This commit changes the test that was added to check the workaround so that it instead checks that fields with index_phrases enabled correctly build queries when used with multi-term synonyms. Closes #47777

Don't use index_phrases on graph queries · 1025786

romseygeek added >bug :Search/Search v8.0.0 v7.3.0 v7.4.0 labels Jul 15, 2019

romseygeek requested a review from jimczi July 15, 2019 09:58

romseygeek self-assigned this Jul 15, 2019

jpountz added v7.3.1 and removed v7.3.0 labels Jul 15, 2019

mayya-sharipova approved these changes Jul 17, 2019

romseygeek merged commit c8ae530 into elastic:master Jul 17, 2019

romseygeek deleted the index-phrases-synonyms branch July 17, 2019 15:08

jpountz added v7.3.0 and removed v7.3.1 labels Jul 26, 2019

romseygeek mentioned this pull request Nov 20, 2019

Fix test for index phrases shortcut with multi-term synonyms #49366 (merged)

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
