Exactness ranking rule takes an excessive amount of time #3116

loiclec · 2022-11-23T12:21:52Z

For context, the exactness ranking rule sorts documents as follows:

a document's attribute is exactly equal to the search query
a document's attribute starts exactly by the search query
a document contains many of the search query's exact words (i.e. as opposed to derived terms such as ngrams or split words)

It is also, by default, the last ranking rule. It is therefore very performance sensitive since it can be called many times for the same search query.

For example, I noticed that with the songs dataset, a search query consisting of ten common words would take 5 seconds to be executed. This is because of the implementation of the 3rd sorting rule, which tries to find the documents containing any combinations of 10/9/8/7/etc. of the ten words in the query. That is a total of 1022 combinations

Each of those combination is computed by doing an intersection of 5 bitmaps on average. Additionally, this very expensive operation is repeated on each call to the ranking rule, which happens a lot since exactness is the last rule, by default.

It's possible to greatly speed up the current implementation. Nevertheless, I think we might need to rethink the design of the third sorting rule.

One potential solution is to simply remove the 3rd sorting rule from the implementation of exactness. This would somewhat impact the relevancy of the search results because ngrams and split words will not be disadvantaged anymore.
Another potential solution is to sort documents by the number of ngrams and split words that they contain. This will be a lot easier to implement and more efficient after we redesign the query tree.

TODO

Implement changes in Milli
Release a Milli version containing these changes
Bump this new Milli version in Meilisearch and merge it into main

The text was updated successfully, but these errors were encountered:

709: Optimise the `ExactWords` sub-criterion within `Exactness` r=loiclec a=loiclec # Pull Request ## Related issue Fixes (partially) meilisearch/meilisearch#3116 ## What does this PR do? 1. Reduces the algorithmic complexity of finding the documents containing N exact words from something that is exponential to something that is polynomial. 2. Cache intermediary results between different calls to the `exactness` criterion. ## Performance Results On the `smol_songs.csv` dataset, a request containing 10 common words now takes about 60ms instead of 5 seconds to execute. For example, this is the case with this (admittedly nonsensical) request: `Rock You Hip Hop Folk World Country Electronic Love The`. Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>

curquiza · 2023-01-04T17:43:26Z

Close by #3269, which integrates milli v0.38.0, which integrates the changes for this issue.

loiclec added the performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption label Nov 23, 2022

loiclec self-assigned this Nov 23, 2022

loiclec mentioned this issue Nov 23, 2022

Search performance issues #3111

Closed

8 tasks

curquiza transferred this issue from meilisearch/milli Nov 23, 2022

curquiza added the milli Related to the milli workspace label Nov 23, 2022

loiclec mentioned this issue Nov 24, 2022

Optimise the ExactWords sub-criterion within Exactness meilisearch/milli#709

Merged

curquiza added this to the v1.0.0 milestone Jan 4, 2023

curquiza closed this as completed Jan 4, 2023

meili-bot added the v1.0.0 PRs/issues solved in v1.0.0 released on 2023-02-06 label Feb 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exactness ranking rule takes an excessive amount of time #3116

Exactness ranking rule takes an excessive amount of time #3116

loiclec commented Nov 23, 2022 •

edited

curquiza commented Jan 4, 2023

Exactness ranking rule takes an excessive amount of time #3116

Exactness ranking rule takes an excessive amount of time #3116

Comments

loiclec commented Nov 23, 2022 • edited

TODO

curquiza commented Jan 4, 2023

loiclec commented Nov 23, 2022 •

edited