[Bug]: v2.4.0 query node auto_balance no available #32714

Open
1 task done
yesyue opened this issue Apr 29, 2024 · 9 comments
Assignees
Labels
kind/bug Issues or changes related a bug triage/needs-information Indicates an issue needs more information in order to work on it.
Milestone

Comments

@yesyue

yesyue commented Apr 29, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.4.0
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):  kafka   
- SDK version(e.g. pymilvus v2.0.0rc2): 2.7
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 
- GPU:  0
- Others:

Current Behavior

My collection loads roughly 40 million entities into an in-memory index. I have enabled 40 query nodes, but the data is loaded onto only 2 of them. Here is my configuration; how should I adjust it to enable automatic memory balancing?
d034f7bfdac8a24c666948e9965755e

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@yesyue yesyue added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 29, 2024
@xiaofan-luan
Contributor

So you are saying you have 40M entities, but only 2 nodes hold any data?

  1. Could you provide the querycoord and querynode logs?
  2. Could you share your segment distribution?
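
For readers unsure what a "segment distribution" summary might look like, here is a minimal sketch that totals loaded rows per query node. The input shape (pairs of node IDs and row counts) and the sample numbers are hypothetical; in practice the data could come from birdwatcher or, assuming your SDK version provides it, from a call such as pymilvus's `utility.get_query_segment_info`.

```python
from collections import defaultdict


def rows_per_node(segments):
    """Sum loaded row counts per query node.

    `segments` is an iterable of (node_ids, num_rows) pairs. This shape is an
    assumption made for illustration, not a Milvus data structure.
    """
    totals = defaultdict(int)
    for node_ids, num_rows in segments:
        # A segment may be served by more than one node (e.g. replicas).
        for node_id in node_ids:
            totals[node_id] += num_rows
    return dict(totals)


# Hypothetical sample: three segments spread over two nodes.
sample = [([1], 1_000_000), ([1], 500_000), ([2], 750_000)]
distribution = rows_per_node(sample)

for node_id, rows in sorted(distribution.items()):
    print(f"node {node_id}: {rows} rows")
```

A heavily skewed result (most rows on one or two node IDs while the others have none) would match the symptom reported in this issue.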

@xiaofan-luan
Contributor

/assign @sunby
Please help follow up on this.

@yesyue
Author

yesyue commented Apr 29, 2024

The Attu client shows the following error message:

show collection failed: load segment failed, OOM if load, maxSegmentSize = 205.4959774017334 MB, memUsage = 121813.59168624878 MB, predictMemUsage = 122019.08766365051 MB, totalMem = 122880 MB thresholdFactor = 0.900000
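
The numbers in this message can be checked by hand. The sketch below assumes, based only on the values in the message, that `predictMemUsage` is `memUsage + maxSegmentSize` and that the load is rejected when the prediction exceeds `totalMem * thresholdFactor`; the exact check inside Milvus may differ, and the function name is hypothetical.

```python
def would_oom(mem_usage_mb, max_segment_mb, total_mem_mb, threshold_factor):
    """Return (predicted usage, allowed limit, whether the load is rejected).

    Assumed rule, inferred from the error message rather than from Milvus
    source: reject when mem_usage + max_segment > total_mem * threshold_factor.
    """
    predicted = mem_usage_mb + max_segment_mb
    limit = total_mem_mb * threshold_factor
    return predicted, limit, predicted > limit


# Values copied from the error message above.
predicted, limit, oom = would_oom(
    mem_usage_mb=121813.59168624878,
    max_segment_mb=205.4959774017334,
    total_mem_mb=122880,
    threshold_factor=0.9,
)
print(f"predicted={predicted:.2f} MB, limit={limit:.2f} MB, rejected={oom}")
```

Under this assumption the limit is 110592 MB, so the node reporting the error was already above it before the next segment was considered. That fits data piling up on a few nodes instead of being spread across all 40.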

@yesyue
Author

yesyue commented Apr 29, 2024

segment:
  assignmentExpiration: 2000
  compactableProportion: 0.85
  diskSegmentMaxSize: 2048
  enableLevelZero: true
  expansionRate: 1.25
  maxBinlogFileNumber: 32
  maxIdleTime: 600
  maxLife: 86400
  maxSize: 1024
  minSizeFromIdleToSealed: 16
  sealProportion: 0.12
  smallProportion: 0.5

@yanliang567
Contributor

@yesyue Could you please refer to this doc to export the complete Milvus logs for investigation?
Could you also attach an etcd backup? See https://github.com/milvus-io/birdwatcher for details on how to back up etcd with birdwatcher.
/assign @yesyue
/unassign

@sre-ci-robot sre-ci-robot assigned yesyue and unassigned yanliang567 Apr 30, 2024
@yanliang567 yanliang567 added triage/needs-information Indicates an issue needs more information in order to work on it. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 30, 2024
@yanliang567 yanliang567 added this to the 2.4.1 milestone Apr 30, 2024
@xiaofan-luan
Contributor

It seems the collection cannot be loaded, possibly because of the imbalance.

You can use birdwatcher to check whether all the segments are sealed or indexed.
We cannot help without detailed logs and info from birdwatcher.

Running birdwatcher's show segment command can help you figure out why.

@yesyue
Author

yesyue commented May 4, 2024

Only one query node has high memory usage, and it keeps increasing.

querynode (3).log
image

@yanliang567 yanliang567 modified the milestones: 2.4.1, 2.4.2 May 7, 2024
@sunby
Contributor

sunby commented May 10, 2024

Can you provide the querynode Segment Loaded Num and Queryable Entity Num metrics?

@sunby
Contributor

sunby commented May 10, 2024

You can use birdwatcher to run the download global-distribution command and paste the generated distribution file here.

@yanliang567 yanliang567 modified the milestones: 2.4.6, 2.4.7 Jul 19, 2024