[Bug]: Spark Milvus Connector DataType.ARRAY #2078

Closed
1 task done
juanandreas opened this issue May 9, 2024 · 1 comment
Labels
kind/bug Something isn't working

Comments

juanandreas commented May 9, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

When I try to write data into a collection with a predefined schema, my Spark write job aborts with:

scala.MatchError: Array (of class io.milvus.grpc.DataType)

When I try to write with the spark-milvus connector without predefining a schema, the job fails with:

java.lang.Exception: Unsupported data type array
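
Both messages are consistent with a type mapping that has no branch for array values. The sketch below is only an illustration of that failure mode in Python, not the connector's actual code; `SUPPORTED_TYPES` and `to_milvus_type` are hypothetical names, and the table of supported types is an assumption made for the example.

```python
# Hypothetical illustration only: this is NOT the spark-milvus connector's code.
# It shows how a type lookup with no entry for arrays ends in an
# "Unsupported data type array" style error.
SUPPORTED_TYPES = {
    "string": "VarChar",
    "bigint": "Int64",
    "float": "Float",
    "double": "Double",
    "boolean": "Bool",
}


def to_milvus_type(spark_type: str) -> str:
    """Map a Spark simple type string to a Milvus type name, or raise."""
    # "array<string>" -> "array"; scalar types are returned unchanged.
    base = spark_type.split("<", 1)[0]
    if base not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported data type {base}")
    return SUPPORTED_TYPES[base]
```

Under these assumptions, a column typed `array<string>` falls through the lookup and raises, while scalar columns map normally.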

Expected Behavior

Shouldn't pymilvus be able to recognize array data types? Shouldn't writing to a collection with a predefined schema also work? Is this a bug in the spark-milvus connector?

Steps/Code To Reproduce behavior

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

fields = [
  FieldSchema(name="id", is_primary=True, dtype=DataType.VARCHAR, max_length=100),
  # ARRAY field with VARCHAR elements; the errors above mention this data type
  FieldSchema(name="countries", dtype=DataType.ARRAY, element_type=DataType.VARCHAR, max_length=100, max_capacity=100),
  # FieldSchema(name="vector_field", dtype=DataType.FLOAT_VECTOR, dim=705),
  FieldSchema(name="vector_field", dtype=DataType.SPARSE_FLOAT_VECTOR),
]
schema = CollectionSchema(
  fields,
  description="collection",
  enable_dynamic_field=True
)

collection = Collection(COLLECTION_NAME, schema)

df.write \
  .mode("append") \
  .option("milvus.host", MILVUS_HOST) \
  .option("milvus.port", MILVUS_PORT) \
  .option("milvus.collection.name", COLLECTION_NAME) \
  .option("milvus.collection.vectorField", "vector_field") \
  .option("milvus.collection.vectorDim", "705") \
  .option("milvus.collection.primaryKeyField", "id") \
  .option("milvus.database.name", "default") \
  .format("milvus") \
  .save()
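
To narrow down which columns are involved before the write runs, a small pre-flight check may help. This is a minimal sketch, assuming the input has the shape of PySpark's `df.dtypes` (a list of `(column_name, type_string)` tuples, where array columns appear as strings such as `array<string>`); `find_array_columns` is a hypothetical helper, not part of pymilvus or the connector.

```python
def find_array_columns(dtypes):
    """Return the names of columns whose Spark type string is an array type.

    `dtypes` is expected to look like PySpark's `df.dtypes`:
    a list of (column_name, type_string) tuples.
    """
    return [name for name, type_str in dtypes if type_str.startswith("array<")]


# Example input shaped like df.dtypes for the schema in this issue (assumed).
example_dtypes = [("id", "string"), ("countries", "array<string>")]

array_columns = find_array_columns(example_dtypes)
if array_columns:
    print(f"Array columns that may fail to write: {array_columns}")
```

With a real DataFrame the call would be `find_array_columns(df.dtypes)`; any column it reports is a candidate for the errors described above.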

Environment details

- Hardware/Software conditions (OS, CPU, GPU, Memory):
- Method of installation (Docker, or from source):
- Milvus version (v0.3.1, or v0.4.0): 2.4.1
- Milvus configuration (Settings you made in `server_config.yaml`):

Anything else?

juanandreas added the kind/bug label May 9, 2024

juanandreas (Author) commented

Closing; moving this bug ticket to the spark-milvus repo:
zilliztech/spark-milvus#15
