memory leak problem #677
Comments
Hey, since you say you are less experienced with Rust, it would be good to understand a bit more about your problem. How long is "a long time"? Also, I don't recommend that you commit in a loop: that will slow down your indexing pipeline and lead to many small segments. |
Your polling loop looks like it batches, but it does not. As a result, you create a lot of tiny segments. Tantivy is a bit dumb and does not know how to wait for merging threads, so you end up with an ever-growing number of merging threads, and this is the source of your OOM. Can you batch your commits and see if your problem is solved? |
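To make "batch your commits" concrete, here is a minimal sketch of a loop that commits once per batch rather than once per document. It deliberately does not use tantivy: the `DocSink` trait, `CountingSink`, and `index_in_batches` are hypothetical names standing in for an index writer, so the sketch compiles without external crates. With a real writer, the same counter logic would wrap its add and commit calls.

```rust
/// Hypothetical stand-in for an index writer. This is NOT tantivy's API;
/// it only models "add a document" and "commit".
pub trait DocSink {
    fn add(&mut self, doc: String);
    fn commit(&mut self);
}

/// Illustration-only sink that counts calls instead of writing an index.
#[derive(Default)]
pub struct CountingSink {
    pub pending: usize,
    pub docs_committed: usize,
    pub commits: usize,
}

impl DocSink for CountingSink {
    fn add(&mut self, _doc: String) {
        self.pending += 1;
    }

    fn commit(&mut self) {
        self.docs_committed += self.pending;
        self.pending = 0;
        self.commits += 1;
    }
}

/// Adds every document and commits once per `batch_size` documents,
/// plus one final commit if a partial batch remains.
pub fn index_in_batches<S: DocSink>(
    sink: &mut S,
    docs: impl IntoIterator<Item = String>,
    batch_size: usize,
) {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut uncommitted = 0;
    for doc in docs {
        sink.add(doc);
        uncommitted += 1;
        if uncommitted == batch_size {
            sink.commit();
            uncommitted = 0;
        }
    }
    // Flush the tail so that no document is left uncommitted.
    if uncommitted > 0 {
        sink.commit();
    }
}
```

A common mistake with a counter-based loop is to commit unconditionally and only use the counter for logging, or to never reset it after the first batch; either way every document ends up producing its own commit and its own tiny segment.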
@petr-tik @fulmicoton After the program has been running for several days, every index has about 250GB of data (about 5TB in total), and the process holds 180G of memory in RES (SHR 300M). Yes, it gets killed by the oom-killer when the system's memory is exhausted. @fulmicoton Actually, I use a counter to control batching (if that counts as batching). |
@vsop-479 Sweet. It seems like you know how to monitor your program, so we should not have too much trouble solving your issue. It sounds like your index is large. The larger your commits, the better your indexing throughput. Also, if you can share what kind of data you are trying to index, please let us know! It is always awesome to hear about users. |
@fulmicoton Considering there are 20 indices, maybe the number of merging threads (270) is normal? The data I am trying to index is TCP flow data, which looks like: |
So you have 20 indices in the same process? Each with its own index writer? |
Yes. |
Ok... How many threads do you have on your CPU? Setting the number of indexing threads to 1 per index writer, for instance, will ensure that the segments you create are larger for the same amount of memory. If this does not solve your problem... Right now, no mechanism bounds the number of merging threads. If indexing outpaces merging, which happens as the amount of data you have gets larger, you end up with more merging threads running at the same time. As your segments become larger and larger, there are fewer and fewer good reasons to actually merge them. One thing that would be reasonable to do with your use case is to write your own merge policy. This is assuming that you do not have any deletes in your index. |
It's fairly fun and easy to write your own merge policy. Let us know if you have trouble doing it. |
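As a rough illustration of the idea (stop merging segments once they are large enough), the sketch below selects merge candidates by document count. It is not written against tantivy's `MergePolicy` trait: `SegmentInfo`, `merge_candidates`, `max_docs_to_merge`, and `group_size` are hypothetical names chosen for this example, and the grouping rule is an assumption rather than anything the maintainers prescribed.

```rust
/// Hypothetical description of a segment. This is a stand-in for
/// illustration, not tantivy's own segment metadata type.
#[derive(Debug, Clone, PartialEq)]
pub struct SegmentInfo {
    pub id: u32,
    pub num_docs: u32,
}

/// Returns groups of segment ids that should be merged together.
///
/// Segments holding `max_docs_to_merge` documents or more are never
/// selected, so large segments are left alone. The remaining segments are
/// grouped in input order, `group_size` at a time; a trailing group with
/// fewer than `group_size` segments is not merged yet.
pub fn merge_candidates(
    segments: &[SegmentInfo],
    max_docs_to_merge: u32,
    group_size: usize,
) -> Vec<Vec<u32>> {
    assert!(group_size >= 2, "a merge needs at least two segments");
    let small: Vec<u32> = segments
        .iter()
        .filter(|s| s.num_docs < max_docs_to_merge)
        .map(|s| s.id)
        .collect();
    small
        .chunks_exact(group_size)
        .map(|group| group.to_vec())
        .collect()
}
```

Capping by document count bounds how much work any single merge can involve, which in turn limits how long each merging thread stays alive. As noted in the thread, this reasoning assumes the index has no deletes.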
@fulmicoton |
@vsop-479 Can you rely on the number of documents? |
@fulmicoton |
I'm also facing this issue and haven't solved it yet. |
@dearsxx0918 are you sure you have the same issue? On #666, the problem you described was very different. If you share your code, we can maybe help. |
@vsop-479 I am afraid there is no way to get the size of the segments in MB. |
I can't give you the source code, but I can give you a valgrind log (a massif log). |
I am not interested in the valgrind log.
|
@fulmicoton Maybe tantivy should control the size of the segments chosen for merging, and the number of threads used to execute merge tasks. |
My current workaround is to reload the Index on every commit, and there is no memory issue now. |
Hi, I'm a Rust rookie. My program suffers from a memory leak problem; it gets killed by the kernel
after running for a long time.
It gets data from Redis and adds it to tantivy with Rust: