[Bug]: [benchmark] Continuous concurrent upsert and count(*) testing causes Milvus OOM #31705
Comments
I think this might be as expected: you need enough memory buffer before compaction completes.
/assign @bigsheeper
Another question is how we can improve compaction throughput to keep up with an intense write/delete workload.
I think an 8c16g standalone pod should be able to hold 2M 128-dim entities under 5 concurrent upsert requests without OOM.
It depends on how fast compaction can work.
I believe this is because the memory growth is too rapid, and the quota center takes several seconds to provide feedback to the proxy interceptor.
Could we apply the insertion throughput limit earlier? Could that be an option? For example, at 0%-60% memory usage, no limit.
Yes, we support such functionality, provided that the …
@elstic Maybe you can try to set …
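The parameter name was cut off above, but the knobs under discussion live under `quotaAndLimits` in `milvus.yaml`. A sketch with hypothetical values (key names follow Milvus 2.x; exact names and defaults vary by version):

```yaml
quotaAndLimits:
  enabled: true
  dml:
    enabled: true
    insertRate:
      max: 8        # MB/s; hypothetical cap
    upsertRate:
      max: 8        # MB/s; hypothetical cap
  limitWriting:
    memProtection:
      enabled: true
      # Writes slow down between the low and high water levels and are
      # denied above the high water level.
      dataNodeMemoryLowWaterLevel: 0.85
      dataNodeMemoryHighWaterLevel: 0.95
      queryNodeMemoryLowWaterLevel: 0.85
      queryNodeMemoryHighWaterLevel: 0.95
```

This is essentially the 0%-60% idea above: below the low water level writes run unthrottled, between the water levels they are progressively throttled, and above the high water level they are rejected.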
Milvus still crashes; here are the test details.
I changed the quota parameters as you suggested.
milvus resources:
client parameters:
It means inserting 2 million 128-dim vectors plus a string column of length 100, then running continuous serial count(*) and upsert (concurrency reduced to 1), upserting 2000 rows at a time.
@elstic
I think it's irrelevant; the count(*) operation was only added to verify whether upsert causes the row count to increase or decrease.
The crash comes from upsert alone.
@elstic Has the …
No modification; it's at the default value.
Nope, I'll share a user document about quotas and limits with you.
So this is how the memory back pressure works?
This case works fine on the 2.4 branch but crashes on the master branch.
server:
deploy config:
client pod: fouramf-6tth5-60-4232-milvus-standalone-5f9fd547d-m5mgs
@bigsheeper please pay attention to this issue.
/assign @elstic
Verified and fixed.
The issue has resurfaced.
server:
@bigsheeper
issue: #31705
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
If a request is limited by the rate limiter, the limiter should not "Cancel". When a request is limited, tokens are not deducted, so a "Cancel" operation would incorrectly increase the token count.
issue: #31705
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
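To see why, consider a toy token bucket (a minimal sketch in Python for illustration; not Milvus's actual Go limiter): a limited request never took tokens out of the bucket, so "cancelling" it credits tokens that were never spent.

```python
class TokenBucket:
    """Minimal token bucket illustrating the fix above.
    Refill over time is omitted for brevity; not Milvus's real limiter."""

    def __init__(self, burst: float) -> None:
        self.burst = burst
        self.tokens = burst

    def allow(self, n: float) -> bool:
        """Deduct n tokens if they fit; deduct nothing when limited."""
        if self.tokens < n:
            return False  # limited: no tokens were deducted
        self.tokens -= n
        return True

    def cancel(self, n: float) -> None:
        """Return quota for a previously *admitted* request that failed later."""
        self.tokens = min(self.tokens + n, self.burst)


bucket = TokenBucket(burst=100.0)
bucket.allow(60.0)           # an admitted request: tokens drop to 40
if not bucket.allow(50.0):   # limited: tokens stay at 40
    # The bug being fixed: calling bucket.cancel(50.0) here would push the
    # tokens back up to 90, crediting 50 tokens that were never deducted
    # and quietly loosening the limit.
    print("request rejected by rate limiter")
```

So "Cancel" is only correct for requests that were actually admitted and had their tokens deducted.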
/assign @elstic
Verified and fixed.
Is there an existing issue for this?
Environment
Current Behavior
After inserting 2 million vectors and a string scalar (the string column is the partition key), the test runs continuous concurrent upserts and count(*) with concurrency 5, upserting 2000 rows at a time. Milvus OOMs and restarts. A workload sketch follows below.
argo task: upsert-count-5d5dk
client pod: upsert-count-5d5dk-99120586 (qa ns)
test env: 4am cluster, qa-milvus ns
resource: 8c16g
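For reference, a minimal pymilvus-style sketch of the workload (the collection name, schema layout, and data generation here are assumptions; the real test runs five such loops concurrently against a collection preloaded with the 2 million rows):

```python
import random
import string

from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
coll = Collection("upsert_count_test")  # hypothetical collection name

BATCH = 2000       # rows per upsert, as in the report
DIM = 128          # vector dimension
TOTAL = 2_000_000  # rows inserted before the upsert loop starts

def random_batch(start_pk: int):
    """One upsert batch: pk column, 100-char partition-key strings, vectors."""
    return [
        list(range(start_pk, start_pk + BATCH)),
        ["".join(random.choices(string.ascii_letters, k=100)) for _ in range(BATCH)],
        [[random.random() for _ in range(DIM)] for _ in range(BATCH)],
    ]

# Continuous upsert + count(*); assumes the collection is already loaded.
pk = 0
while True:
    coll.upsert(random_batch(pk))
    print(coll.query(expr="", output_fields=["count(*)"]))
    pk = (pk + BATCH) % TOTAL  # cycle through existing primary keys
```

Each upsert rewrites existing primary keys, which internally generates deletes plus inserts; that is the write/delete pressure the compaction discussion above refers to.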
Expected Behavior
No response
Steps To Reproduce
Milvus Log
No response
Anything else?
No response