Multi_match should not enable coordination in bool query with BM25 #18944

clintongormley · 2016-06-17T14:44:40Z

In 5.0 we use BM25, which means that query coordination should always be disabled. This works correctly with the bool query but the multi_match query enables coordination incorrectly:

PUT t/t/1
{
  "foo": "one",
  "bar": "two"
}

GET t/_search
{
  "query": {
    "multi_match": {
      "query": "one two",
      "fields": ["foo", "bar"]
    }
  },
  "explain": true
}

Returns:

            {
              "value": 0.5,
              "description": "coord(1/2)",
              "details": []
            }
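
For readers unfamiliar with the factor in that explain output, here is a minimal sketch of how a coordination factor of 0.5 can arise. It assumes the classic TF/IDF definition (matching clauses divided by total clauses) and assumes the query "one two" becomes, per field, a bool query with one optional clause per term. The function name `coord` is only illustrative.

```python
def coord(overlap: int, max_overlap: int) -> float:
    """Classic TF/IDF coordination factor: fraction of optional clauses that matched."""
    return overlap / max_overlap


# Assumption: for the field "foo" the query expands to two optional term
# clauses (foo:one, foo:two). The document has foo="one", so one clause matches.
factor = coord(overlap=1, max_overlap=2)
print(factor)  # 0.5, the value reported as coord(1/2)
```

The score of the matching clause is multiplied by this factor, which is why a document matching only one of the two terms in a field has its score halved.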

jimczi · 2016-06-17T17:32:36Z

> In 5.0 we use BM25, which means that query coordination should always be disabled.

The default similarity is still TF/IDF, which is referred to as classic. I opened #18948 to change the default similarity to BM25.

> This works correctly with the bool query but the multi_match query enables coordination incorrectly

This is how the match_query works. It's the same on 2.x; I didn't test 1.7, but it should do the same.
Coords are disabled only when multiple terms are at the same position in the query; otherwise the coords are always enabled, and we rely on this functionality for relevancy (documents matching a lot of terms are scored first). Regarding BM25, things will change since the coords are not taken into account in this similarity, but this should not be considered a bug? To be honest, I don't know what the impact on relevancy is for queries produced by a match_query or a multi_match_query. @jpountz @rmuir WDYT?

rmuir · 2016-06-17T19:37:53Z

> This is how the match_query works.

This is why a SynonymQuery was added when defaulting to BM25 that handles this case in a more generic way for any scoring system (including classic TF/IDF):

> One issue was the generation of synonym queries (posinc=0) by QueryBuilder (used by parsers). This is kind of a corner case (query-time synonyms), but we should make it nicer. The current code in trunk disables coord, which makes no sense for anything but the vector space impl. Instead, this patch adds a SynonymQuery which treats occurrences of any term as a single pseudoterm. With English WordNet as a query-time synonym dict, this query gives 12% improvement in MAP for title queries on BM25, and 2% with Classic (not significant). So it's a better generic approach for synonyms that works with all scoring models.
>
> I wanted to use BlendedTermQuery, but at a glance it seems to have problems: it tries to "take on the world", and it does not work with distributed scoring (doesn't consult indexsearcher for stats). Anyway, this one is a different, simpler approach, which only works for a single field, and which calls tf(sum) a single time.

https://issues.apache.org/jira/browse/LUCENE-6789

Please use it :)
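
To illustrate the "single pseudoterm" idea quoted above, here is a rough sketch, not Lucene's actual code. It assumes a classic-style tf function (square root of the frequency) and ignores IDF, norms, and boosts entirely. The function names are hypothetical.

```python
import math


def classic_tf(freq: float) -> float:
    """Classic-style term frequency component (assumed here: sqrt of the frequency)."""
    return math.sqrt(freq)


def pseudoterm_tf(freqs, tf=classic_tf):
    """Treat all synonyms as one term: sum their frequencies, then call tf once."""
    return tf(sum(freqs))


def per_term_tf(freqs, tf=classic_tf):
    """Contrast case: score each synonym separately and add the results."""
    return sum(tf(f) for f in freqs if f > 0)


# A document containing two synonyms, each occurring twice.
freqs = [2, 2]
print(pseudoterm_tf(freqs))  # tf(4)
print(per_term_tf(freqs))    # tf(2) + tf(2)
```

Because tf is applied once to the summed frequency, a document is not rewarded just for using several different synonyms of the same word, and no coord adjustment is needed to compensate.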

rpedela · 2016-06-20T03:05:43Z

I am currently using 2.3.3 and planning to start experimenting with BM25 since Lucene 6.0 makes that the default. I assumed it was ready to be used in ES as well. Is that not the case? Should I wait until 5.0, especially since I use multi_match heavily?

jimczi · 2016-06-20T07:01:25Z

Thanks for the clarification @rmuir.
@rpedela please use it as well ;) Concerning the multi_match, you may want to experiment with different boosts, as the scoring for BM25 is different and the range of possible values differs.
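
As a starting point for that kind of experiment, here is a small sketch that builds a multi_match request body with per-field boosts using the `field^boost` syntax. The helper name `multi_match_body` and the boost values are arbitrary placeholders, not recommendations.

```python
import json


def multi_match_body(query: str, boosts: dict) -> dict:
    """Build a multi_match search body with one "field^boost" entry per field."""
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {
        "query": {"multi_match": {"query": query, "fields": fields}},
        "explain": True,  # keep explain on to compare score breakdowns
    }


# Hypothetical boosts: weight "foo" twice as much as "bar".
body = multi_match_body("one two", {"foo": 2, "bar": 1})
print(json.dumps(body, indent=2))
```

The printed JSON can be sent with GET t/_search as in the original report, changing the boosts between runs and comparing the explain output.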

jimczi · 2016-06-21T11:38:56Z

@clintongormley I think we can close this issue (please reopen if you disagree). The coords are a TF/IDF thing that was added as a countermeasure for terms with very high term frequency, where the score keeps increasing and never reaches a saturation point as it does in BM25.
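
The saturation point mentioned above can be seen with a quick numeric sketch. It assumes the textbook forms of the term frequency components (square root for classic TF/IDF, and the usual BM25 formula with k1=1.2 and b=0.75), and a document of average length. IDF and other factors are left out.

```python
import math


def classic_tf(freq: float) -> float:
    """Classic TF/IDF term frequency component: grows without bound."""
    return math.sqrt(freq)


def bm25_tf(freq: float, k1: float = 1.2, b: float = 0.75,
            doc_len: float = 1.0, avg_doc_len: float = 1.0) -> float:
    """BM25 term frequency component: approaches k1 + 1 as freq grows."""
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + norm)


# Compare how the two components react to increasing term frequency.
for freq in (1, 10, 100, 1000):
    print(freq, round(classic_tf(freq), 2), round(bm25_tf(freq), 2))
```

The classic column keeps climbing while the BM25 column levels off below k1 + 1, so a single very frequent term cannot dominate the score the way it can under TF/IDF.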

clintongormley added the >bug, :Search/Search, and v5.0.0-alpha4 labels Jun 17, 2016

clintongormley assigned jimczi Jun 17, 2016

jimczi mentioned this issue Jun 17, 2016

Change default similarity to BM25 #18948

Merged

jimczi closed this as completed Jun 21, 2016

clintongormley removed the v5.0.0-alpha4 label Jun 21, 2016
