
[QUESTION]: Import has failed when using BulkWriter() API. #1927

Closed

steviego opened this issue Feb 7, 2024 · 1 comment

Comments

steviego commented Feb 7, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What is your question?

Failure when loading data using the BulkWriter API

I am currently deploying a Milvus cluster with k8s on my server, and I have set up port forwarding on 0.0.0.0 to allow external access to the following k8s services.

my-release-milvus ClusterIP 10.96.163.96 <none> 19530/TCP,9091/TCP 4h7m
my-release-minio ClusterIP 10.108.47.114 <none> 9000/TCP 4h11m

Connecting to the Milvus cluster and accessing MinIO works fine. However, when I import the Parquet file uploaded to MinIO into the collection, following the bulk_writer example in pymilvus/examples (roughly the flow sketched below), I get the following error.

The task 447549569471477180 failed, reason : failed to get file size of 'bulk_data/4abc0d48-a8aa-46a1-bdf9-141737e71893/1.parquet', error:NoSuchKey(key=bulk_data/4abc0d48-a8aa-46a1-bdf9-141737e71893/1.parquet): importing data failed

You can see that the 1.parquet file was successfully uploaded to MinIO:

mc ls --recursive --versions myminio/a-bucket
[2024-02-07 16:19:54 KST]  19MiB STANDARD null v1 PUT bulk_data/4abc0d48-a8aa-46a1-bdf9-141737e71893/1.parquet
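For context, the upload-and-import flow follows the bulk_writer example roughly as below. This is a minimal sketch rather than the exact script: the endpoint, credentials, schema, and collection name are placeholders, and the connect-parameter class name may differ between pymilvus versions (newer releases call it S3ConnectParam).

```python
# Minimal sketch of the upload + import flow, adapted from
# pymilvus/examples/example_bulkwriter.py. Endpoint, credentials, schema,
# and collection name are placeholders for this deployment.
from pymilvus import (
    connections, utility,
    Collection, CollectionSchema, FieldSchema, DataType,
    RemoteBulkWriter, BulkFileType,
)

connections.connect(host="127.0.0.1", port=19530)   # forwarded Milvus port

schema = CollectionSchema(fields=[
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=8),
])
Collection("demo", schema)   # target collection for the import

# Connection parameters for the MinIO bucket the writer uploads to.
# The example hard-codes "a-bucket" here.
conn = RemoteBulkWriter.ConnectParam(
    endpoint="127.0.0.1:9000",   # forwarded MinIO port
    access_key="minioadmin",     # placeholder credentials
    secret_key="minioadmin",
    bucket_name="a-bucket",
    secure=False,
)

writer = RemoteBulkWriter(
    schema=schema,
    remote_path="bulk_data",
    connect_param=conn,
    file_type=BulkFileType.PARQUET,
)
for i in range(1000):
    writer.append_row({"id": i, "vector": [0.1] * 8})
writer.commit()   # uploads e.g. bulk_data/<uuid>/1.parquet to MinIO

# Ask Milvus to import the files the writer just uploaded.
for files in writer.batch_files:
    task_id = utility.do_bulk_insert(collection_name="demo", files=files)
    print(utility.get_bulk_insert_state(task_id))
```

The bucket_name passed to the writer is the value the example hard-codes, 'a-bucket'.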

I am currently using Milvus v2.3.7 and pymilvus v2.3.6.

What am I doing wrong?

Anything else?

No response


steviego commented Feb 9, 2024

Oh, this is totally my fault.
The bulk_writer example in pymilvus uses 'a-bucket' as the bucket name. That wasn't a problem with Milvus standalone deployed with Docker, but the cluster I deployed with k8s uses my-release as its default bucket, so Milvus couldn't find the object.

I found the following log message from the datanode:

[WARN] [storage/remote_chunk_manager.go:117] ["failed to stat object"] [bucket=my-release] [path=bulk_data/3af5cab6-522c-4f4b-a357-0fade7611540/1.parquet] [error="NoSuchKey(key=bulk_data/3af5cab6-522c-4f4b-a357-0fade7611540/1.parquet)"]
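So for a k8s deployment like this one, the writer needs to target the bucket that Milvus itself is configured to use (minio.bucketName, here my-release) instead of the example's hard-coded a-bucket. A minimal sketch of the corrected connect parameters, where everything except the bucket name is a placeholder:

```python
# Point the writer at the bucket Milvus is actually configured to use
# (minio.bucketName in the Milvus config). For this Helm-based deployment
# that is "my-release", as the datanode log above shows.
conn = RemoteBulkWriter.ConnectParam(
    endpoint="127.0.0.1:9000",   # placeholder: forwarded MinIO endpoint
    access_key="minioadmin",     # placeholder credentials
    secret_key="minioadmin",
    bucket_name="my-release",    # must match the cluster's configured bucket
    secure=False,
)
```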

@steviego steviego closed this as completed Feb 9, 2024