Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MatchingWords structure uses too much memory #3115

Closed
3 tasks
Tracked by #3111
loiclec opened this issue Nov 23, 2022 · 1 comment
Closed
3 tasks
Tracked by #3111

MatchingWords structure uses too much memory #3115

loiclec opened this issue Nov 23, 2022 · 1 comment
Assignees
Labels
milli Related to the milli workspace performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption v1.0.0 PRs/issues solved in v1.0.0 released on 2023-02-06
Milestone

Comments

@loiclec
Copy link
Contributor

loiclec commented Nov 23, 2022

Certain search queries create very large DFAs (deterministic finite automata) in MatchingWords. They are the single biggest consumer of RAM within the execution a search query. This is because the DFAs:

  1. are duplicated.
  2. match words which exceed Meilisearch’s word length limit (200 letters iirc)

Simply deduplicating the DFAs and avoiding creating them for long words would already go a long way towards mitigating the problem. I wrote a proof of concept that deduplicates the DFAs on the branch cache_matching_words of milli. It does indeed dramatically reduce memory usage and also speed up some particular search requests significantly.


TODO

  • Implement changes in Milli
  • Release a Milli version containing these changes
  • Bump this new Milli version in Meilisearch and merge it into main
@loiclec loiclec added the performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption label Nov 23, 2022
@loiclec loiclec self-assigned this Nov 23, 2022
@loiclec loiclec mentioned this issue Nov 23, 2022
8 tasks
@curquiza curquiza transferred this issue from meilisearch/milli Nov 23, 2022
@curquiza curquiza added the milli Related to the milli workspace label Nov 23, 2022
bors bot added a commit to meilisearch/milli that referenced this issue Nov 30, 2022
708: Reduce memory usage of the MatchingWords structure r=ManyTheFish a=loiclec

# Pull Request

## Related issue
Fixes (partially) meilisearch/meilisearch#3115 

## What does this PR do?
1. Reduces the memory usage caused by the creation of a 10-word query tree by 20x. 
   This is done by deduplicating the `MatchingWord` values, which are heavy because of their inner DFA. The deduplication works by wrapping each `MatchingWord` in a reference-counted box and using a hash map to determine whether a  `MatchingWord` DFA already exists for a certain signature, or whether a new one needs to be built.
 
2. Avoid the worst-case scenario of creating a `MatchingWord` for extremely long words that cannot be indexed by milli.

Co-authored-by: Loïc Lecrenier <loic.lecrenier@me.com>
@curquiza curquiza added this to the v1.0.0 milestone Jan 4, 2023
@curquiza
Copy link
Member

curquiza commented Jan 4, 2023

Close by #3269, which integrates milli v0.38.0, which integrates the changes for this issue.

@curquiza curquiza closed this as completed Jan 4, 2023
@meili-bot meili-bot added the v1.0.0 PRs/issues solved in v1.0.0 released on 2023-02-06 label Feb 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
milli Related to the milli workspace performance Related to the performance in term of search/indexation speed or RAM/CPU/Disk consumption v1.0.0 PRs/issues solved in v1.0.0 released on 2023-02-06
Projects
None yet
Development

No branches or pull requests

3 participants