Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exactness ranking rule takes an excessive amount of time #3116

Closed
3 tasks
Tracked by #3111
loiclec opened this issue Nov 23, 2022 · 1 comment
Closed
3 tasks
Tracked by #3111

Exactness ranking rule takes an excessive amount of time #3116

loiclec opened this issue Nov 23, 2022 · 1 comment
Assignees
Labels
milli Related to the milli workspace performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption v1.0.0 PRs/issues solved in v1.0.0 released on 2023-02-06
Milestone

Comments

@loiclec
Copy link
Contributor

loiclec commented Nov 23, 2022

For context, the exactness ranking rule sorts documents as follows:

  1. a document's attribute is exactly equal to the search query
  2. a document's attribute starts exactly by the search query
  3. a document contains many of the search query's exact words (i.e. as opposed to derived terms such as ngrams or split words)

It is also, by default, the last ranking rule. It is therefore very performance sensitive since it can be called many times for the same search query.

For example, I noticed that with the songs dataset, a search query consisting of ten common words would take 5 seconds to be executed. This is because of the implementation of the 3rd sorting rule, which tries to find the documents containing any combinations of 10/9/8/7/etc. of the ten words in the query. That is a total of 1022 combinations

Each of those combination is computed by doing an intersection of 5 bitmaps on average. Additionally, this very expensive operation is repeated on each call to the ranking rule, which happens a lot since exactness is the last rule, by default.

It's possible to greatly speed up the current implementation. Nevertheless, I think we might need to rethink the design of the third sorting rule.

  1. One potential solution is to simply remove the 3rd sorting rule from the implementation of exactness. This would somewhat impact the relevancy of the search results because ngrams and split words will not be disadvantaged anymore.

  2. Another potential solution is to sort documents by the number of ngrams and split words that they contain. This will be a lot easier to implement and more efficient after we redesign the query tree.


TODO

  • Implement changes in Milli
  • Release a Milli version containing these changes
  • Bump this new Milli version in Meilisearch and merge it into main
@loiclec loiclec added the performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption label Nov 23, 2022
@loiclec loiclec self-assigned this Nov 23, 2022
@loiclec loiclec mentioned this issue Nov 23, 2022
8 tasks
@curquiza curquiza transferred this issue from meilisearch/milli Nov 23, 2022
@curquiza curquiza added the milli Related to the milli workspace label Nov 23, 2022
bors bot added a commit to meilisearch/milli that referenced this issue Jan 2, 2023
709: Optimise the `ExactWords` sub-criterion within `Exactness` r=loiclec a=loiclec

# Pull Request

## Related issue
Fixes (partially) meilisearch/meilisearch#3116

## What does this PR do?
1. Reduces the algorithmic complexity of finding the documents containing N exact words from something that is exponential to something that is polynomial.
2. Cache intermediary results between different calls to the `exactness` criterion.

## Performance Results
On the `smol_songs.csv` dataset, a request containing 10 common words now takes about 60ms instead of 5 seconds to execute. For example, this is the case with this (admittedly nonsensical) request: `Rock You Hip Hop Folk World Country Electronic Love The`.


Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
@curquiza curquiza added this to the v1.0.0 milestone Jan 4, 2023
@curquiza
Copy link
Member

curquiza commented Jan 4, 2023

Close by #3269, which integrates milli v0.38.0, which integrates the changes for this issue.

@curquiza curquiza closed this as completed Jan 4, 2023
@meili-bot meili-bot added the v1.0.0 PRs/issues solved in v1.0.0 released on 2023-02-06 label Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
milli Related to the milli workspace performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption v1.0.0 PRs/issues solved in v1.0.0 released on 2023-02-06
Projects
None yet
Development

No branches or pull requests

3 participants