Improving how search relevance score is calculated for multi_match queries on optional fields #81466
Hi,
I might be missing something trivial here, so please bear with me. In specific situations, the search relevance score for multi_match queries on optional fields (properties in a document that do not always have a value assigned to them) seems to be very different from what an end user might expect, mainly because of the way IDF is calculated.
Please consider the example below:
Test data:
Search Query:
If I run a search query similar to the one below:
Expectation
The end user would expect Documents 9 and 10 to score higher than the others, because they contain both words of the search query in their optional_field.
Reality
Document 1 scores better than Document 10, even though it contains only one of the two words of the search query, which is the opposite of what end users would most likely expect.
A closer look at _explain
Here are the _explain results of running the same search query for Document 1:
And here are the _explain results of running the same search query for Document 10:
As you can see, Document 10 scores worse, mainly because of its lower IDF value (0.18232156). Looking closely, this happens because IDF takes N from "total number of documents with field" (2) instead of from the total number of documents in the index (10).
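To make the effect of N concrete, here is a rough sketch of the IDF arithmetic, assuming the BM25 formula that _explain reports, log(1 + (N - n + 0.5) / (n + 0.5)). The function name bm25_idf is mine, and n = 2 is inferred rather than quoted: it is the only value that gives 0.18232156 when N = 2.

```python
import math

def bm25_idf(n: int, N: int) -> float:
    """BM25 IDF as reported by _explain: log(1 + (N - n + 0.5) / (n + 0.5)).

    n: number of documents containing the term in the field
    N: number of documents counted as the population
    """
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

# Current behaviour: N counts only the documents that have the optional field.
idf_field_only = bm25_idf(n=2, N=2)

# Requested behaviour: N counts every document in the index.
idf_whole_index = bm25_idf(n=2, N=10)

print(f"N=2:  idf = {idf_field_only:.8f}")   # 0.18232156, matches _explain
print(f"N=10: idf = {idf_whole_index:.8f}")  # about 1.4816
```

Under these assumptions, counting all 10 documents would raise the IDF of the optional_field terms roughly eightfold, which is the kind of difference that would move Document 10 above Document 1.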
Feature Request
Is there any way to force a multi_match query to consider all the documents in the index (instead of only those that contain the field) when computing the IDF value for an optional field, so that the resulting relevance score is closer to what end users expect?
Disclaimer:
I originally posted this as a question on the official Elasticsearch forum and on Stack Overflow, but so far I haven't received any suggestion indicating that there is an easy way of doing what I've described using multi_match. I admit there might be a better way to write the search query to achieve the intended result, one that I am not aware of and that ideally does not rely on something like combined_fields, because of the various limitations that would impose.
Any feedback would be greatly appreciated. Thanks.