[Bug]: Standalone pod panic after a failed bulk insert task #24478

Closed
1 task done
zhuwenxing opened this issue May 29, 2023 · 4 comments
Assignees
Labels
kind/bug Issues or changes related a bug priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@zhuwenxing
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.2.0-20230529-c314d546
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

bulk insert state

[2023-05-29 13:51:07 - INFO - ci_test]: bulk insert state:False in 3.4916129112243652 with states: {441801171575775918: <Bulk insert state:
    - taskID          : 441801171575775918,
    - state           : Failed,
    - row_count       : 0,
    - infos           : {'files': 'uid.npy,vectors.npy,$meta.npy', 'collection': 'dynamic_schema_YQkuPHPr', 'partition': '_default', 'failed_reason': "illegal data type Int64 of numpy file for float vector field 'vectors'", 'progress_percent': '0'},
    - id_ranges       : [],
    - create_ts       : 2023-05-29 13:51:04
>} (test_bulk_insert.py:980)

error log

[2023/05/29 05:52:46.387 +00:00] [INFO] [datanode/data_node.go:1096] ["import time range"] [start_ts=0] [end_ts=18446744073709551615]
[2023/05/29 05:52:46.387 +00:00] [INFO] [importutil/import_wrapper.go:255] ["import wrapper: begin import"] [filePaths="[uid.npy,vectors.npy,$meta.npy]"] [options="{\"OnlyValidate\":false,\"TsStartPoint\":0,\"TsEndPoint\":18446744073709551615,\"IsBackup\":false}"]
[2023/05/29 05:52:46.394 +00:00] [INFO] [importutil/numpy_adapter.go:107] ["Numpy adapter: numpy header info"] [shape="[100]"] [dType=<i8] [majorVer=1] [minorVer=0] [ByteOrder=LittleEndian]
[2023/05/29 05:52:46.398 +00:00] [INFO] [importutil/numpy_adapter.go:107] ["Numpy adapter: numpy header info"] [shape="[100,128]"] [dType=<f4] [majorVer=1] [minorVer=0] [ByteOrder=LittleEndian]
[2023/05/29 05:52:46.400 +00:00] [INFO] [importutil/numpy_adapter.go:107] ["Numpy adapter: numpy header info"] [shape="[]"] [dType=<U10332] [majorVer=1] [minorVer=0] [ByteOrder=LittleEndian]
[2023/05/29 05:52:46.400 +00:00] [INFO] [datanode/data_node.go:973] ["DataNode finish import request"] ["task ID"=441801171575776448]
panic: runtime error: index out of range [0] with length 0

goroutine 99331 [running]:
github.com/milvus-io/milvus/internal/util/importutil.(*NumpyParser).validateHeader(0x448d6e0?, 0xc008ff5380?)
        /go/src/github.com/milvus-io/milvus/internal/util/importutil/numpy_parser.go:273 +0x1b2f
github.com/milvus-io/milvus/internal/util/importutil.(*NumpyParser).createReaders(0xc0009c9570, {0xc0095b2d80, 0x3, 0xc0095d6cf8?})
        /go/src/github.com/milvus-io/milvus/internal/util/importutil/numpy_parser.go:238 +0x2a5
github.com/milvus-io/milvus/internal/util/importutil.(*NumpyParser).Parse(0xc0095bd1c0?, {0xc0095b2d80, 0x3, 0x4})
        /go/src/github.com/milvus-io/milvus/internal/util/importutil/numpy_parser.go:127 +0x7c
github.com/milvus-io/milvus/internal/util/importutil.(*ImportWrapper).Import(0xc009590ab0, {0xc0095b2d80?, 0x3, 0x4}, {0x1?, 0x20000000?, 0xc000e84c00?, 0xd8?})
        /go/src/github.com/milvus-io/milvus/internal/util/importutil/import_wrapper.go:304 +0x499
github.com/milvus-io/milvus/internal/datanode.(*DataNode).Import(0xc001ec8360, {0x44a8a38?, 0xc0095af3e0}, 0xc0095ab4a0)
        /go/src/github.com/milvus-io/milvus/internal/datanode/data_node.go:1097 +0x1936
github.com/milvus-io/milvus/internal/distributed/datanode.(*Server).Import(0xf?, {0x44a8a38?, 0xc0095af3e0?}, 0x10?)
        /go/src/github.com/milvus-io/milvus/internal/distributed/datanode/service.go:379 +0x2b
github.com/milvus-io/milvus/internal/proto/datapb._DataNode_Import_Handler.func1({0x44a8a38, 0xc0095af3e0}, {0x3e76700?, 0xc0095ab4a0})
        /go/src/github.com/milvus-io/milvus/internal/proto/datapb/data_coord.pb.go:7403 +0x78
github.com/milvus-io/milvus/internal/util/logutil.UnaryTraceLoggerInterceptor({0x44a8a38?, 0xc0095af350?}, {0x3e76700, 0xc0095ab4a0}, 0x4491700?, 0xc0095994b8)
        /go/src/github.com/milvus-io/milvus/internal/util/logutil/grpc_interceptor.go:22 +0x49
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x44a8a38?, 0xc0095af350?}, {0x3e76700?, 0xc0095ab4a0?})
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.UnaryServerInterceptor.func1({0x44a8a38, 0xc0095af2c0}, {0x3e76700, 0xc0095ab4a0}, 0xc0095a8ac0?, 0xc0095a8ae0)
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tracing/opentracing/server_interceptors.go:38 +0x16a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1({0x44a8a38?, 0xc0095af2c0?}, {0x3e76700?, 0xc0095ab4a0?})
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25 +0x3a
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x44a8a38, 0xc0095af2c0}, {0x3e76700, 0xc0095ab4a0}, 0xc001172af0?, 0x3c4be00?)
        /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34 +0xbf
github.com/milvus-io/milvus/internal/proto/datapb._DataNode_Import_Handler({0x3f131e0?, 0xc0009d9720}, {0x44a8a38, 0xc0095af2c0}, 0xc0095be120, 0xc004e7b5f0)
        /go/src/github.com/milvus-io/milvus/internal/proto/datapb/data_coord.pb.go:7405 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc001620e00, {0x44b98b8, 0xc006d20000}, 0xc0095b8ea0, 0xc004e7b770, 0x5a65618, 0x0)
        /go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:1283 +0xcfd
google.golang.org/grpc.(*Server).handleStream(0xc001620e00, {0x44b98b8, 0xc006d20000}, 0xc0095b8ea0, 0x0)
        /go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:1620 +0xa1b
google.golang.org/grpc.(*Server).serveStreams.func1.2()
        /go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:922 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /go/pkg/mod/google.golang.org/grpc@v1.46.0/server.go:920 +0x28a
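
The last numpy header logged before the panic reports shape="[]", and the trace ends in validateHeader with "index out of range [0] with length 0". One possible reading, which nothing in this issue confirms, is that the parser reads the first dimension of a zero-dimensional array without checking the shape length. The sketch below shows that kind of guard in isolation. `checkShape` is a hypothetical helper written for illustration, not the Milvus code.

```go
package main

import "fmt"

// checkShape is a hypothetical helper, not the Milvus implementation.
// It returns the row count (first dimension) of a numpy shape, or an
// error when the shape has no dimensions, as with shape "[]" in the log.
func checkShape(fieldName string, shape []int) (int, error) {
	if len(shape) == 0 {
		return 0, fmt.Errorf("numpy file for field '%s' has an empty shape, expected at least one dimension", fieldName)
	}
	return shape[0], nil
}

func main() {
	// Shapes taken from the three header lines in the log above.
	files := []struct {
		name  string
		shape []int
	}{
		{"uid", []int{100}},
		{"vectors", []int{100, 128}},
		{"$meta", []int{}},
	}
	for _, f := range files {
		rows, err := checkShape(f.name, f.shape)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%s: %d rows\n", f.name, rows)
	}
}
```

With a check like this, an unexpected shape would surface as a failed task with a reason string rather than a runtime panic.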

Expected Behavior

The pod should not panic when a bulk insert task fails.
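
As a rough illustration of that expectation, a panic inside an import task can be turned into an ordinary error with `recover`, so that a single bad file fails the task instead of taking down the process. This is a minimal standard-library sketch. `safeImport` is a hypothetical name, and the issue does not say how Milvus chooses to handle this.

```go
package main

import "fmt"

// safeImport is a hypothetical wrapper, not part of Milvus.
// It runs task and converts any panic raised inside it into an error.
func safeImport(task func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("import task panicked: %v", r)
		}
	}()
	return task()
}

func main() {
	err := safeImport(func() error {
		var shape []int // empty, like the shape "[]" in the log
		_ = shape[0]    // panics with an index-out-of-range runtime error
		return nil
	})
	// The process keeps running and the failure is reported as an error.
	fmt.Println("task result:", err)
}
```

The go-grpc-middleware module that appears in the stack trace also provides a recovery interceptor that applies the same idea at the gRPC layer. Recovering from the panic only contains the crash, so the missing validation would still need a fix of its own.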

Steps To Reproduce

No response

Milvus Log

standalone.log

Anything else?

No response

@zhuwenxing zhuwenxing added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 29, 2023
@zhuwenxing zhuwenxing added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. labels May 29, 2023
@zhuwenxing
Contributor Author

/assign @yhmo

@zhuwenxing
Contributor Author

When I run a bulk insert with the correct data type, the pod panics again.

standalone.log

@zhuwenxing
Contributor Author

zhuwenxing commented May 29, 2023

bulk insert file

files.zip

@yanliang567
Contributor

/unassign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 29, 2023