[Bug]: standalone panic with error panic: failed to seek, error please subcribe the channel, channel name =by-dev-rootcoord-dml during test #34221

Closed
1 task done
zhuwenxing opened this issue Jun 27, 2024 · 2 comments
Assignees
Labels
kind/bug Issues or changes related a bug priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.4-20240626-6423b6c7-amd64
- Deployment mode(standalone or cluster):standalone 
- MQ type(rocksmq, pulsar or kafka): kafka   
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2024/06/26 06:53:27.793 +00:00] [DEBUG] [querynodev2/services.go:895] ["received query request"] [traceID=3bb9249ca168a84cfd5dce8788066fa1] [collectionID=450726161379583808] [shards="[by-dev-rootcoord-dml_7_450726161379583808v0]"] [outputFields="[100,1]"] [segmentIDs="[]"] [guaranteeTimestamp=450726411033903108] [mvccTimestamp=0] [isCount=false]
[2024/06/26 06:53:27.793 +00:00] [DEBUG] [querynodev2/handlers.go:201] ["start do query with channel"] [traceID=3bb9249ca168a84cfd5dce8788066fa1] [msgID=450726411033903108] [collectionID=450726161379583808] [channel=by-dev-rootcoord-dml_7_450726161379583808v0] [scope=All] [segmentIDs="[]"]
[2024/06/26 06:53:27.793 +00:00] [DEBUG] [querynodev2/services.go:895] ["received query request"] [traceID=3bb9249ca168a84cfd5dce8788066fa1] [collectionID=450726161379583808] [shards="[by-dev-rootcoord-dml_8_450726161379583808v1]"] [outputFields="[100,1]"] [segmentIDs="[]"] [guaranteeTimestamp=450726411033903108] [mvccTimestamp=0] [isCount=false]
[2024/06/26 06:53:27.793 +00:00] [DEBUG] [querynodev2/handlers.go:201] ["start do query with channel"] [traceID=3bb9249ca168a84cfd5dce8788066fa1] [msgID=450726411033903108] [collectionID=450726161379583808] [channel=by-dev-rootcoord-dml_8_450726161379583808v1] [scope=All] [segmentIDs="[]"]
[2024/06/26 06:53:27.793 +00:00] [DEBUG] [datacoord/index_service.go:892] ["GetIndexInfos successfully"] [traceID=ad70d3ebe56c1723500abac334d09c83] [collectionID=450726056263956632] [indexName=]
[2024/06/26 06:53:27.793 +00:00] [INFO] [indexnode/indexnode_service.go:244] ["Get Index Job Stats"] [traceID=0ded30ec0436bfbe6a051a0cee90e032] [unissued=0] [active=1] [slot=0]
[2024/06/26 06:53:27.794 +00:00] [WARN] [checkers/index_checker.go:127] ["failed to get indexInfo for segment"] [collectionID=450726056263956632] [segmentID=450726161378190070] [error="index not found[segmentID=450726161378190070]"]
[2024/06/26 06:53:27.793 +00:00] [ERROR] [msgdispatcher/manager.go:241] ["split failed"] [role=datanode] [nodeID=2] [vchannel=by-dev-rootcoord-dml_0_450726161379785451v1] [error="failed to seek, error please subcribe the channel, channel name =by-dev-rootcoord-dml"] [stack="github.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*dispatcherManager).split\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/manager.go:241\ngithub.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*dispatcherManager).Run.func1\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/manager.go:171\ngithub.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*dispatcherManager).Run.(*ConcurrentMap[...]).Range.func2\n\t/go/src/github.com/milvus-io/milvus/pkg/util/typeutil/map.go:54\nsync.(*Map).Range\n\t/usr/local/go/src/sync/map.go:476\ngithub.com/milvus-io/milvus/pkg/util/typeutil.(*ConcurrentMap[...]).Range\n\t/go/src/github.com/milvus-io/milvus/pkg/util/typeutil/map.go:51\ngithub.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*dispatcherManager).Run\n\t/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/manager.go:170"]
[2024/06/26 06:53:27.794 +00:00] [INFO] [datacoord/index_builder.go:202] ["there is no idle indexing node, wait a minute..."]
panic: failed to seek, error please subcribe the channel, channel name =by-dev-rootcoord-dml

goroutine 16248 [running]:
panic({0x51b4040?, 0xc057eff290?})
	/usr/local/go/src/runtime/panic.go:1017 +0x3ac fp=0xc006293b10 sp=0xc006293a60 pc=0x1debb2c
github.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*dispatcherManager).split(0xc00208c360, 0xc030b79940)
	/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/manager.go:242 +0x5aa fp=0xc006293cf8 sp=0xc006293b10 pc=0x45f1c4a
github.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*dispatcherManager).Run.func1({0xc00dd20000, 0x2b}, 0xc006e68270?)
	/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/manager.go:171 +0x2c fp=0xc006293d30 sp=0xc006293cf8 pc=0x45f0ccc
github.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*dispatcherManager).Run.(*ConcurrentMap[...]).Range.func2({0x52aa640?, 0xc030b79940?})
	/go/src/github.com/milvus-io/milvus/pkg/util/typeutil/map.go:54 +0x4b fp=0xc006293d58 sp=0xc006293d30 pc=0x45f0c2b
sync.(*Map).Range(0xc017e81020, 0xc006293e90)
	/usr/local/go/src/sync/map.go:476 +0x228 fp=0xc006293df0 sp=0xc006293d58 pc=0x1e45728
github.com/milvus-io/milvus/pkg/util/typeutil.(*ConcurrentMap[...]).Range(...)
	/go/src/github.com/milvus-io/milvus/pkg/util/typeutil/map.go:51
github.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*dispatcherManager).Run(0xc00208c360)
	/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/manager.go:170 +0x489 fp=0xc006293fc8 sp=0xc006293df0 pc=0x45f0b49
github.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*client).Register.func2()
	/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/client.go:74 +0x25 fp=0xc006293fe0 sp=0xc006293fc8 pc=0x45ebe85
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc006293fe8 sp=0xc006293fe0 pc=0x1e25381
created by github.com/milvus-io/milvus/pkg/mq/msgdispatcher.(*client).Register in goroutine 2013
	/go/src/github.com/milvus-io/milvus/pkg/mq/msgdispatcher/client.go:74 +0x498

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/2558/pipeline
log:
artifacts-kafka-standalone-reinstall-2558-server-second-deployment-logs.tar.gz

Anything else?

No response

@zhuwenxing zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 27, 2024
@zhuwenxing zhuwenxing added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. labels Jun 27, 2024
@zhuwenxing zhuwenxing added this to the 2.4.6 milestone Jun 27, 2024
@bigsheeper
Contributor

/assign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 29, 2024
@yanliang567 yanliang567 removed their assignment Jun 29, 2024
sre-ci-robot pushed a commit that referenced this issue Jul 1, 2024
Converting the same msgposition's vchannel to a pchannel multiple times
would result in an invalid pchannel, leading to seek failure and panic.
This PR:
1. Make a copy of msgposition in msgdispatcher.
2. Check if channel is already a pchannel, no further channel conversion
is performed.

issue: #34221

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
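
The failure mode described in the commit message above can be illustrated with a minimal sketch. It is not the actual Milvus code: `position`, `naiveToPChannel`, `looksLikePChannel`, `guardedToPChannel`, and `pchannelPosition` are hypothetical names, and the sketch assumes that a vchannel-to-pchannel conversion simply truncates at the last underscore and that a vchannel's final segment contains a `v` (as in the channel names in the log). Under those assumptions, converting twice produces exactly the bare `by-dev-rootcoord-dml` name seen in the panic message.

```go
package main

import (
	"fmt"
	"strings"
)

// position is a hypothetical stand-in for a message position; only the
// channel name matters for this illustration.
type position struct {
	ChannelName string
}

// naiveToPChannel drops everything after the last underscore.
// Assumption: this mirrors how a vchannel name is reduced to a pchannel name.
func naiveToPChannel(channel string) string {
	i := strings.LastIndex(channel, "_")
	if i < 0 {
		return channel
	}
	return channel[:i]
}

// looksLikePChannel is a heuristic (an assumption based on the names in the
// log): the last segment of a vchannel contains a "v", a pchannel's does not.
func looksLikePChannel(channel string) bool {
	i := strings.LastIndex(channel, "_")
	if i < 0 {
		return true
	}
	return !strings.Contains(channel[i+1:], "v")
}

// guardedToPChannel converts only when the name is not already a pchannel,
// so calling it repeatedly is harmless.
func guardedToPChannel(channel string) string {
	if looksLikePChannel(channel) {
		return channel
	}
	return naiveToPChannel(channel)
}

// pchannelPosition works on a copy so the caller's position is never mutated.
func pchannelPosition(pos *position) *position {
	cp := *pos
	cp.ChannelName = guardedToPChannel(cp.ChannelName)
	return &cp
}

func main() {
	vchannel := "by-dev-rootcoord-dml_0_450726161379785451v1"

	once := naiveToPChannel(vchannel)
	twice := naiveToPChannel(once)
	fmt.Println(once)  // by-dev-rootcoord-dml_0
	fmt.Println(twice) // by-dev-rootcoord-dml (invalid: the index is lost)

	orig := &position{ChannelName: vchannel}
	p1 := pchannelPosition(orig)
	p2 := pchannelPosition(p1)
	fmt.Println(p2.ChannelName)   // by-dev-rootcoord-dml_0
	fmt.Println(orig.ChannelName) // original vchannel, unchanged
}
```

The guard makes the conversion idempotent, and the copy keeps a shared position from being rewritten in place, which corresponds to the two changes listed in the commit message.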
sre-ci-robot pushed a commit that referenced this issue Jul 1, 2024
Converting the same msgposition's vchannel to a pchannel multiple times
would result in an invalid pchannel, leading to seek failure and panic.
This PR:
1. Make a copy of msgposition in msgdispatcher.
2. Check if channel is already a pchannel, no further channel conversion
is performed.

issue: #34221

pr: #34229

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
yellow-shine pushed a commit to yellow-shine/milvus that referenced this issue Jul 2, 2024
Converting the same msgposition's vchannel to a pchannel multiple times
would result in an invalid pchannel, leading to seek failure and panic.
This PR:
1. Make a copy of msgposition in msgdispatcher.
2. Check if channel is already a pchannel, no further channel conversion
is performed.

issue: milvus-io#34221

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
@zhuwenxing
Contributor Author

Verified; fixed in 2.4-20240705-f5a0353f-amd64.

sre-ci-robot pushed a commit that referenced this issue Jul 13, 2024
This PR cherry-picks the following commits that fix bugs:

- #34563
- #34230
- #34071
- #34302
- #34566

issue: #34255,
#34221,
#34068,
#34247,

pr: #34563,
#34230,
#34071,
#34302,
#34566

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: SimFG <bang.fu@zilliz.com>