Improve indexation time when inserting documents #2203
Comments
I have meilisearch deployed on GKE and can only see it using a max of one core also. |
Hello @aariacarterweir, you might be more interested in the open discussion dedicated to indexing performance. We will keep working on the indexing time for as long as it is not satisfactory for our users. Thanks for your interest in Meilisearch! |
Here are the benchmarks before and after my improvements to the time spent in the prefix databases. These databases and data structures are used by the engine to reduce the time spent searching for all the words (or pairs of words) that start with a given prefix. Computing those can take time, so we now use the difference between the previous and the newly created prefix word FST instead. In this experiment, we indexed 80 million songs and then sent 10 batches of 10 documents that were not known to the engine.
Settings
{
"searchableAttributes":
[
"title",
"album",
"artist"
],
"displayedAttributes":
[
"id",
"title",
"album",
"artist",
"genre",
"country",
"released",
"duration"
],
"criteria":
[
"words",
"typo",
"proximity",
"attribute",
"released-timestamp:desc"
],
"filterableAttributes":
[
"released-timestamp",
"duration-float",
"genre",
"country",
"artist"
]
}
Before meilisearch/milli@45f5262
Here is the time taken by the updates, in seconds, from the most recent to the least recent. We first sent the whole 80 million documents (2932s) and then sent the documents 10 by 10.
It takes 3 hours and 10 minutes to index with the previous version of the engine.
After meilisearch/milli@25d7ed8
Here is the time taken by the updates, in seconds, from the most recent to the least recent. We first sent the whole 80 million documents (3090s) and then sent the documents 10 by 10.
It takes 1 hour and 8 minutes to index with the newly patched version of the engine. |
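To illustrate the prefix-database change described above, here is a minimal sketch of diffing the previous and new prefix sets so that only the changed prefixes need work. It is not milli's implementation: it assumes the prefixes fit in plain sorted sets (`BTreeSet`) rather than FSTs, and `PrefixDelta` and `diff_prefixes` are hypothetical names used only for this example.

```rust
use std::collections::BTreeSet;

/// Result of comparing the previous prefix set with the new one.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct PrefixDelta {
    /// Only in the new set: entries for these prefixes must be computed.
    added: Vec<String>,
    /// Only in the old set: entries for these prefixes can be dropped.
    removed: Vec<String>,
    /// In both sets: assumed here to need no full recomputation.
    common: Vec<String>,
}

/// Split prefixes into added, removed and common ones.
/// Each list comes out in ascending order because `BTreeSet` iterates in order.
fn diff_prefixes(old: &BTreeSet<String>, new: &BTreeSet<String>) -> PrefixDelta {
    PrefixDelta {
        added: new.difference(old).cloned().collect(),
        removed: old.difference(new).cloned().collect(),
        common: old.intersection(new).cloned().collect(),
    }
}

fn main() {
    let old: BTreeSet<String> = ["a", "al", "b"].iter().map(|s| s.to_string()).collect();
    let new: BTreeSet<String> = ["a", "al", "ar", "c"].iter().map(|s| s.to_string()).collect();

    let delta = diff_prefixes(&old, &new);
    println!("added:   {:?}", delta.added);
    println!("removed: {:?}", delta.removed);
    println!("common:  {:?}", delta.common);
}
```

With a small batch of new documents, the added and removed lists would typically be short compared to the full prefix set, which is where the time saving would come from under these assumptions.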
@meilisearch/devrel-team so that you can follow the issue, it might interest you for your communication (v0.27.0, not now) |
Adding to this issue: meilisearch/milli#467 should roughly halve the times announced by @Kerollmops! 🏎️ |
Milli was bumped in #2244, with milli v0.24.0 containing the current improvements regarding the indexation speed. |
@Kerollmops you need to provide new metrics using milli v0.26.3 🚀 |
Hey @curquiza, I ran the same benchmarks as in #2203 (comment) and the engine is much faster. Note that I ran my benchmarks on v0.27.0 (meilisearch/milli@2aae19d, the latest commit).
It takes 51 minutes to index 80 million and 100 new documents now! Good job @meilisearch/core-team 🎉 |
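For reference, the three totals reported in this thread (3 hours 10 minutes, 1 hour 8 minutes, 51 minutes) can be compared with a small calculation. This is only arithmetic on the figures quoted above; `to_minutes` and `speedup` are helper names made up for this sketch.

```rust
/// Convert a duration given as hours and minutes to total minutes.
fn to_minutes(hours: u32, minutes: u32) -> u32 {
    hours * 60 + minutes
}

/// How many times faster the `after` run is compared to the `before` run.
fn speedup(before_min: u32, after_min: u32) -> f64 {
    before_min as f64 / after_min as f64
}

fn main() {
    let before = to_minutes(3, 10); // before meilisearch/milli@45f5262
    let patched = to_minutes(1, 8); // after meilisearch/milli@25d7ed8
    let latest = to_minutes(0, 51); // v0.27.0 run

    println!("patched vs before: {:.2}x", speedup(before, patched));
    println!("latest vs before:  {:.2}x", speedup(before, latest));
    println!("latest vs patched: {:.2}x", speedup(patched, latest));
}
```

That works out to roughly 2.8x faster for the prefix-database patch and roughly 3.7x faster for the v0.27.0 run, both relative to the original 3 hours 10 minutes.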
🔥 🔥 🔥 I don't know why it's faster but 🔥 |
🤘🤘🤘 |
Closing this then! |
Thanks to @Kerollmops's work on Milli's side, we succeeded in improving the indexation time when inserting documents into an already existing database.
See the base PR:
And the fixes to improve it
Steps:
Edit by @curquiza 16/03/2022
Another PR, meilisearch/milli#467 by @MarinPostma, also improved the indexation speed.