Exactness ranking rule takes an excessive amount of time #3116
Labels
milli
Related to the milli workspace
performance
Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption
v1.0.0
PRs/issues solved in v1.0.0 released on 2023-02-06
For context, the `exactness` ranking rule sorts documents as follows:
It is also, by default, the last ranking rule. It is therefore very performance sensitive since it can be called many times for the same search query.
For example, I noticed that with the songs dataset, a search query consisting of ten common words would take 5 seconds to execute. This is because of the implementation of the 3rd sorting rule, which tries to find the documents containing any combination of 10/9/8/7/etc. of the ten words in the query. That is a total of 1022 combinations.
Each of those combinations is computed by intersecting 5 bitmaps on average. Additionally, this very expensive operation is repeated on each call to the ranking rule, which happens a lot since `exactness` is the last rule by default.
It's possible to greatly speed up the current implementation. Nevertheless, I think we might need to rethink the design of the third sorting rule.
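To make the cost above concrete, here is a minimal sketch that enumerates word subsets and intersects their document-id sets. It is not milli's actual code: `DocIds`, `SubsetStats`, and `enumerate_subsets` are hypothetical names, a `BTreeSet<u32>` stands in for a real bitmap, and it assumes the 1022 figure means every non-empty proper subset of the ten words (2^10 - 2).

```rust
use std::collections::BTreeSet;

/// Stand-in for a bitmap of document ids (assumption: the real code uses
/// compressed bitmaps; a BTreeSet keeps this sketch dependency-free).
type DocIds = BTreeSet<u32>;

/// Counters gathered while enumerating subsets (hypothetical type).
#[derive(Debug, Default, PartialEq, Eq)]
struct SubsetStats {
    /// Number of word subsets visited.
    subsets: u64,
    /// Total number of per-word sets fed into intersections.
    bitmaps_touched: u64,
    /// Number of subsets whose intersection is non-empty.
    non_empty: u64,
}

/// Number of non-empty proper subsets of an n-word query: 2^n - 2.
fn proper_subset_count(n: u32) -> u64 {
    (1u64 << n).saturating_sub(2)
}

/// Naively intersects the doc-id sets of every non-empty proper subset
/// of the query words, the way the issue describes the 3rd sorting rule.
fn enumerate_subsets(word_docids: &[DocIds]) -> SubsetStats {
    let n = word_docids.len();
    assert!(n < 32, "sketch only supports short queries");
    let full: u32 = (1u32 << n) - 1; // mask selecting every word
    let mut stats = SubsetStats::default();

    // Masks 1..full skip both the empty subset (0) and the full query (full).
    for mask in 1..full {
        let mut selected = (0..n)
            .filter(|&i| mask & (1u32 << i) != 0)
            .map(|i| &word_docids[i]);
        let mut acc: DocIds = selected.next().cloned().unwrap_or_default();
        stats.bitmaps_touched += 1;
        for docids in selected {
            acc = acc.intersection(docids).copied().collect();
            stats.bitmaps_touched += 1;
        }
        if !acc.is_empty() {
            stats.non_empty += 1;
        }
        stats.subsets += 1;
    }
    stats
}
```

For ten words this visits 1022 subsets and touches 5110 sets in total, which is the average of 5 per combination mentioned above. Since the work depends only on the query words, caching the result across calls to the ranking rule would avoid repeating it, although the count itself still grows exponentially with query length.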
One potential solution is to simply remove the 3rd sorting rule from the implementation of `exactness`. This would somewhat impact the relevancy of the search results because ngrams and split words would no longer be disadvantaged.
Another potential solution is to sort documents by the number of ngrams and split words that they contain. This will be a lot easier to implement and more efficient after we redesign the query tree.
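The second option could look roughly like the sketch below. It assumes that a per-document count of matches coming from ngrams or split words is already available (for instance from a redesigned query tree) and that fewer such matches should rank higher. `Candidate`, `derived_matches`, and `rank_by_derived_matches` are hypothetical names, not milli types.

```rust
/// Hypothetical per-document summary; not a milli type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    doc_id: u32,
    /// How many matched terms came from an ngram or a split word
    /// (assumed to be precomputed elsewhere).
    derived_matches: u32,
}

/// Orders candidates so that documents matched through fewer ngrams and
/// split words come first. `sort_by_key` is stable, so documents with the
/// same count keep the order given by the previous ranking rules.
fn rank_by_derived_matches(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by_key(|c| c.derived_matches);
    candidates
}
```

This is a single O(n log n) sort over the candidates rather than an enumeration of word subsets, which is why it should be much cheaper than the current third sorting rule.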
TODO
main