Avoid exhaustively sorting buckets that will be discarded due to the "offset" search parameter #3123
Labels:
milli: Related to the milli workspace
performance: Related to performance in terms of search/indexation speed or RAM/CPU/disk consumption
v1.2.0: PRs/issues solved in v1.2.0, released on 2023-06-05
Meilisearch works by performing a bucket sort of the documents that match a search query (as explained briefly in the Meilisearch documentation).
For example, with the ranking rules `words`, `typo`, and `proximity`, in that order, the following things happen, conceptually:
1. `words` prepares the first bucket of documents. These are the documents which contain all the words from the search query. It gives this bucket to the next ranking rule, `typo`.
2. `typo` prepares a sub-bucket by finding all the documents which contain these words from the search query with 0 typos. It gives this sub-bucket to `proximity`.
3. `proximity` prepares a sub-sub-bucket by finding all documents where consecutive words in the query are also consecutive in the document.
4. We then ask `proximity` for its next sub-bucket. If there aren't any, we ask `typo`, and finally `words` again. We go up and down the ranking rules in this way until we have exhaustively sorted enough documents.

However, if a user asks for results starting from offset `500`, for example, we should apply a slightly different logic. When a ranking rule computes its bucket, it should check whether any document in this bucket could possibly be returned in the results. If not, it should discard this bucket and compute the next one.

Currently, this logic that skips sorting a bucket if it doesn't intersect with the possible range of results is not implemented. In practice, it means that searches given a large `offset` parameter will be much slower than necessary.
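To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not milli's actual code: the `Doc`, `RankingRule`, `Search`, `group_by`, `sample_docs`, and `search` names are all hypothetical, the input slice stands in for the bucket produced by `words`, and only `typo` and `proximity` are modeled as ranking rules. The `skip_discarded` flag toggles the proposed check so the two behaviours can be compared.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug)]
struct Doc { id: u32, typos: u8, proximity: u8 }

/// Hypothetical ranking-rule signature: split a bucket into ordered
/// sub-buckets, best first.
type RankingRule = fn(&[Doc]) -> Vec<Vec<Doc>>;

/// Group documents by a small integer key, lowest key first.
fn group_by(docs: &[Doc], key: fn(&Doc) -> u8) -> Vec<Vec<Doc>> {
    let mut groups: BTreeMap<u8, Vec<Doc>> = BTreeMap::new();
    for d in docs {
        groups.entry(key(d)).or_default().push(*d);
    }
    groups.into_values().collect()
}

fn typo(docs: &[Doc]) -> Vec<Vec<Doc>> { group_by(docs, |d: &Doc| d.typos) }
fn proximity(docs: &[Doc]) -> Vec<Vec<Doc>> { group_by(docs, |d: &Doc| d.proximity) }

struct Search {
    offset: usize,
    limit: usize,
    skip_discarded: bool, // enable the check proposed in this issue
    seen: usize,          // documents ranked before the current bucket
    rule_calls: usize,    // number of buckets refined by a ranking rule
    results: Vec<u32>,
}

impl Search {
    fn visit(&mut self, bucket: &[Doc], rules: &[RankingRule]) {
        if self.results.len() >= self.limit {
            return; // enough documents have been sorted
        }
        // The whole bucket lies before `offset`: none of its documents can be
        // returned, so count them as skipped without sorting them further.
        if self.skip_discarded && self.seen + bucket.len() <= self.offset {
            self.seen += bucket.len();
            return;
        }
        match rules.split_first() {
            Some((rule, rest)) => {
                self.rule_calls += 1;
                for sub_bucket in rule(bucket) {
                    self.visit(&sub_bucket, rest);
                }
            }
            // No ranking rule left: the documents in this bucket are tied.
            None => {
                for doc in bucket {
                    if self.seen >= self.offset && self.results.len() < self.limit {
                        self.results.push(doc.id);
                    }
                    self.seen += 1;
                }
            }
        }
    }
}

/// Returns the ids of the selected page and the number of ranking-rule calls.
fn search(docs: &[Doc], offset: usize, limit: usize, skip_discarded: bool) -> (Vec<u32>, usize) {
    let rules: [RankingRule; 2] = [typo, proximity];
    let mut s = Search { offset, limit, skip_discarded, seen: 0, rule_calls: 0, results: Vec::new() };
    s.visit(docs, &rules);
    (s.results, s.rule_calls)
}

/// Made-up documents that all match the query (the `words` bucket).
fn sample_docs() -> Vec<Doc> {
    vec![
        Doc { id: 0, typos: 0, proximity: 2 },
        Doc { id: 1, typos: 0, proximity: 1 },
        Doc { id: 2, typos: 0, proximity: 1 },
        Doc { id: 3, typos: 1, proximity: 2 },
        Doc { id: 4, typos: 1, proximity: 1 },
        Doc { id: 5, typos: 2, proximity: 1 },
    ]
}
```

With `sample_docs()`, an offset of 3, and a limit of 2, both modes return the same page, but with `skip_discarded` set the 0-typo bucket is never handed to `proximity`, because its three documents all fall before the offset. The saving in this toy example is a single ranking-rule call; the expectation is that it grows with the size of the offset.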