Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid exhaustively sorting buckets that will be discarded due to the "offset" search parameter #3123

Closed
Tracked by #3111
loiclec opened this issue Nov 23, 2022 · 1 comment
Labels
milli Related to the milli workspace performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption v1.2.0 PRs/issues solved in v1.2.0 released on 2023-06-05
Milestone

Comments

@loiclec
Copy link
Contributor

loiclec commented Nov 23, 2022

Meilisearch works by performing a bucket sort of the documents that match a search query (as explained shortly in the Meilisearch documentation).

For example, with the following ranking rules:

words
typo
proximity

Then, conceptually, the following things happen:

  1. words prepares the first bucket of documents. These are the documents which contain all the words from the search query. It gives this bucket to the next ranking rule, typo.
  2. typo prepares a sub-bucket by finding all the documents which contain these words from the search query with 0 typo. It gives this sub-bucket to proximity.
  3. proximity prepares a sub-sub-bucket by finding all documents where consecutive words in the query are also consecutive in the document.
  4. Since there are no more ranking rules afterward, this sub-sub-bucket is returned. If we don't have enough results yet, we ask proximity for its next sub-bucket. If there aren't any, we ask typo, and finally words again. We go up and down the ranking rules in this way until we have exhaustively sorted enough documents.

However, if a user asks for results starting from offset 500, for example, we should apply a slightly different logic. When a ranking rule computes its bucket, it should check whether any document in this bucket could possibly be returned in the results. If not, it should discard this bucket and compute the next one.

Currently, this logic that skips sorting a bucket if it doesn't intersect with the possible range of results is not implemented. In practice, it means that searches given a large offset parameter will be much slower than necessary.

@loiclec loiclec added milli Related to the milli workspace performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption labels Nov 23, 2022
@loiclec
Copy link
Contributor Author

loiclec commented May 15, 2023

Fixed by #3542

@loiclec loiclec closed this as completed May 15, 2023
@curquiza curquiza added this to the v1.2.0 milestone May 15, 2023
@meili-bot meili-bot added the v1.2.0 PRs/issues solved in v1.2.0 released on 2023-06-05 label Jun 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
milli Related to the milli workspace performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption v1.2.0 PRs/issues solved in v1.2.0 released on 2023-06-05
Projects
None yet
Development

No branches or pull requests

3 participants