Batch document additions and deletions together #3440

Closed
loiclec opened this issue Jan 31, 2023 · 6 comments · Fixed by #3470 or #3670
Labels: enhancement, impacts docs, indexing, performance, v1.3.0

Comments

@loiclec
Contributor

loiclec commented Jan 31, 2023

When someone sends many document addition tasks interspersed with document deletion tasks, meilisearch is currently forced to process these tasks serially. For example, if the following tasks are sent in this order:

1. document addition
2. document addition
3. document addition
4. document deletion
5. document deletion
6. document addition

Then they will be batched as follows:

a. (1–3) document additions
b. (4–5) document deletions
c. (6) document addition

This can cause significant indexing performance problems, because we rely on incremental indexing speed to keep up with the updates. If updates are sent faster than meilisearch can process them, the task queue will keep growing.

Ideally, we want to batch all of these tasks into:

a. (1–6) document additions and deletions
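
To make the current grouping concrete, here is a minimal sketch of batching by consecutive runs of the same task kind. It is an illustration only, not the actual autobatcher code: `TaskKind` and `batch_by_kind` are hypothetical names, and real tasks carry much more than a kind.

```rust
/// Simplified task kind (hypothetical; real tasks carry payloads and metadata).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskKind {
    Addition,
    Deletion,
}

/// Groups consecutive tasks of the same kind into one batch.
/// Returns, for each batch, its kind and the 1-based positions of its tasks.
fn batch_by_kind(tasks: &[TaskKind]) -> Vec<(TaskKind, Vec<usize>)> {
    let mut batches: Vec<(TaskKind, Vec<usize>)> = Vec::new();
    for (i, &kind) in tasks.iter().enumerate() {
        // Extend the last batch if it has the same kind as this task.
        if let Some((last_kind, positions)) = batches.last_mut() {
            if *last_kind == kind {
                positions.push(i + 1);
                continue;
            }
        }
        // Otherwise a change of kind forces a new batch.
        batches.push((kind, vec![i + 1]));
    }
    batches
}
```

With the six tasks from the example, this yields three batches (1 to 3, 4 to 5, and 6), whereas the goal of this issue is a single batch covering all six.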

Proposed solution

We could add a new function in milli which can accumulate document additions and deletions, respecting the order of the operations. The output of this function should be two things:

  1. A set (roaring bitmap) of documents to delete
  2. A Transform or TransformOutput containing the documents to add

Then we can use the existing indexing functions to process (1) and (2) serially.
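
A rough sketch of what such an accumulating function could look like, under several assumptions: it uses only the standard library, a `BTreeSet<u32>` stands in for the roaring bitmap, a `BTreeMap` stands in for the `Transform` output, document ids are plain `u32` values, and additions replace the whole document. `DocOp`, `Accumulated`, and `accumulate` are hypothetical names, not milli APIs.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One document operation, in the order it was received (hypothetical type).
enum DocOp {
    /// Add or replace a document; the String stands in for the document body.
    Add(u32, String),
    /// Delete the document with this id.
    Delete(u32),
}

/// Result of merging a mixed sequence of operations.
struct Accumulated {
    /// Ids to delete first (a roaring bitmap in the proposal).
    to_delete: BTreeSet<u32>,
    /// Documents to add afterwards (a Transform/TransformOutput in the proposal).
    to_add: BTreeMap<u32, String>,
}

/// Folds the operations in order so that "delete everything in `to_delete`,
/// then add everything in `to_add`" gives the same final state as applying
/// the operations one by one (assuming additions replace whole documents).
fn accumulate(ops: impl IntoIterator<Item = DocOp>) -> Accumulated {
    let mut to_delete = BTreeSet::new();
    let mut to_add = BTreeMap::new();
    for op in ops {
        match op {
            DocOp::Add(id, body) => {
                // A later addition overrides an earlier one for the same id.
                to_add.insert(id, body);
            }
            DocOp::Delete(id) => {
                // A deletion cancels any pending addition of the same id
                // and schedules the already-indexed version for deletion.
                to_add.remove(&id);
                to_delete.insert(id);
            }
        }
    }
    Accumulated { to_delete, to_add }
}
```

Because deletions are applied before additions, a document that is deleted and then re-added ends up in both outputs and is present in the final index, while a document that is added and then deleted only appears in the deletion set.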


Impacted team

This can impact @meilisearch/docs-team, in the parts of the docs that talk about auto-batching.

@loiclec added the enhancement, performance, and indexing labels Jan 31, 2023
@gmourier
Member

gmourier commented Jan 31, 2023

Thank you, @loiclec!

In the future (not saying it will be needed), could we leverage this enhancement to provide an endpoint that accepts hybrid document ops, i.e. additions (updates/replacements) and deletions in the same HTTP request?

I see it as a friction reducer for the case where you receive a mixed batch of ops (from a message queue or a file system) and are forced to separate them by type before sending them to Meilisearch.

@loiclec
Contributor Author

loiclec commented Jan 31, 2023

@gmourier Sure, it's technically possible :) The main challenge would be in making a good API for it.

@gmourier
Member

gmourier commented Feb 1, 2023

Thank you @loiclec

It could be related to meilisearch/product#554

@curquiza added this to the v1.1.0 milestone Feb 6, 2023
@curquiza
Member

curquiza commented Feb 6, 2023

We will try to investigate this, and maybe get it done, during v1. No guarantee, however.

@curquiza added the impacts docs label Feb 6, 2023
@irevoire self-assigned this Feb 9, 2023
@bors (bot) closed this as completed in b08a49a Feb 20, 2023
@meili-bot added the v1.1.0 label (PRs/issues solved in v1.1.0, released on 2023-04-03) Apr 6, 2023
@irevoire
Member

After an unexpected bug in the implementation that was hard to find and debug, we decided to cancel this feature: #3667

We'll work on it again later.

@irevoire reopened this Apr 24, 2023
@irevoire modified the milestones: v1.1.0, v1.2.0 Apr 24, 2023
@curquiza removed the v1.1.0 label May 4, 2023
@curquiza removed this from the v1.2.0 milestone May 4, 2023
@meili-bors (bot) closed this as completed in 45636d3 Jun 19, 2023
@gillian-meilisearch added this to the v1.3.0 milestone Jun 20, 2023
@gillian-meilisearch
Contributor

Hello everyone 👋

We have just released the first RC (release candidate) of Meilisearch containing this fix!
You can test it by using

docker run -it --rm -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:v1.3.0-rc.0

If you encounter any bugs, please report them here.
Thanks in advance for your help and your involvement in Meilisearch ❤️

🎉 The official and stable release containing this change will be available on July 31st, 2023

⚠️ RC (release candidates) are not recommended for production

@meili-bot added the v1.3.0 label (PRs/issues solved in v1.3.0, released on 2023-07-31) Aug 2, 2023