Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Attributes Criterion #3378

Closed
ManyTheFish opened this issue Apr 13, 2021 · 1 comment
Closed

Optimize Attributes Criterion #3378

ManyTheFish opened this issue Apr 13, 2021 · 1 comment
Labels
enhancement New feature or improvement milli Related to the milli workspace performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption v1.2.0 PRs/issues solved in v1.2.0 released on 2023-06-05
Milestone

Comments

@ManyTheFish
Copy link
Member

ManyTheFish commented Apr 13, 2021

The algorithm

The algorithm based on Set iterate over intervals of position giving sets of docids (word_level_position_docids database), and choose between the different Query derivations the best interval giving docids.

Explanations

For each query, we have several Query derivations mainly because of ngrams or word-split,
to be able to fetch the best docids we have to build meta-intervals for each Query derivation and keep the best.

What is an Interval?

An interval is defined by a word a left, a right and contains a set of docids:

  • word: the word we want to find.
  • left: the left-most word-position where the searched word can match.
  • right: the right-most word-position where the searched word can match.
  • set of docids: the list of docids that match the word word at a word-position between left and right.

The interval ("hello", 16, 32) would contain a set of docids that match the word "hello" at a word-position between 16 and 32.

Merge of intervals of words in the same Query derivations to build the meta-intervals

Because each word in the same Query derivations have intervals of position containing sets of docids, we have to merge these intervals creating meta-intervals:
To create meta-intervals we have to cross merge word-intervals to keep the best relevancy when we select docids.
Attributes criterion (set)
In This schemas, we can see that farther is the intervals more cross-merge is needed to create it.

The issue

This algorithm is reset at every call of the criterion, meaning that it needs to restart the iteration over all intervals to fetch the next Set of docids.
We could keep the last state of the algorithm transforming it from a simple function to a real iterator and store it into the criterion instance.

Warning

There is some intervals-level stuff that is not explained in this issue which makes the implementation of this optimization more difficult.

self note (@LegendreM)

  • we should keep higher level iterators in the BinaryHeap

@ManyTheFish ManyTheFish self-assigned this Apr 13, 2021
@curquiza curquiza added the enhancement New feature or improvement label Aug 19, 2021
@curquiza curquiza added the performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption label Jul 5, 2022
@curquiza curquiza added the milli Related to the milli workspace label Jan 16, 2023
@curquiza curquiza transferred this issue from meilisearch/milli Jan 18, 2023
@loiclec
Copy link
Contributor

loiclec commented May 15, 2023

Fixed by #3542

@loiclec loiclec closed this as completed May 15, 2023
@curquiza curquiza added this to the v1.2.0 milestone May 15, 2023
@meili-bot meili-bot added the v1.2.0 PRs/issues solved in v1.2.0 released on 2023-06-05 label Jun 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or improvement milli Related to the milli workspace performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption v1.2.0 PRs/issues solved in v1.2.0 released on 2023-06-05
Projects
None yet
Development

No branches or pull requests

4 participants