feat:zilliz-vectordb-integrated #771
Looks great. Thanks for adding support, @LuciAkirami.
Can you please resolve my comments? I think we are mostly good to go after that.
embedchain/vectordb/zilliz.py (outdated)
else:
    fields = [
        FieldSchema(name="id", dtype=DataType.VARCHAR, is_primary=True, max_length=512),
        FieldSchema(name="doc", dtype=DataType.VARCHAR, max_length=2048),
Do we have to define max_length here? Also, can we change the name of the field to text for consistency?
Hey @deshraj, I have made all the changes that were discussed. Can you please review it?
Thanks @LuciAkirami. Looks mostly good now, although I am getting the following error when trying to run the app:
In [11]: from embedchain import App
In [12]: from embedchain.vectordb.zilliz import ZillizVectorDB
In [13]: app = App(db=ZillizVectorDB())
2023-10-10 22:24:55,532 [pymilvus.milvus_client.milvus_client] [DEBUG] Created new connection using: 63043b8cf3314f708636d3f62489af89
In [14]: app.query("What is the net worth of Elon Musk?")
2023-10-10 22:25:07,914 [pymilvus.decorators] [ERROR] RPC error: [search], <MilvusException: (code=1, message=checkIfLoaded failed when search, collection:embedchain_store, partitions:[], err = GetCollectionInfo failed, collection = embedchain_store, err = collection 444143658175637249 has not been loaded to memory or load failed)>, <Time:{'RPC start': '2023-10-10 22:25:07.862157', 'RPC error': '2023-10-10 22:25:07.914419'}>
2023-10-10 22:25:07,915 [pymilvus.milvus_client.milvus_client] [ERROR] Failed to search collection: embedchain_store
---------------------------------------------------------------------------
MilvusException Traceback (most recent call last)
[... skipping hidden 1 frame]
Cell In[14], line 1
----> 1 app.query("What is the net worth of Elon Musk?")
File ~/Projects/embedchain/embedchain/embedchain.py:484, in EmbedChain.query(self, input_query, config, dry_run, where)
466 """
467 Queries the vector database based on the given input query.
468 Gets relevant doc based on the query and then passes it to an
(...)
482 :rtype: str
483 """
--> 484 contexts = self.retrieve_from_database(input_query=input_query, config=config, where=where)
485 answer = self.llm.query(input_query=input_query, contexts=contexts, config=config, dry_run=dry_run)
File ~/Projects/embedchain/embedchain/embedchain.py:456, in EmbedChain.retrieve_from_database(self, input_query, config, where)
454 db_query = ClipProcessor.get_text_features(query=input_query)
--> 456 contents = self.db.query(
457 input_query=db_query,
458 n_results=query_config.number_documents,
459 where=where,
460 skip_embedding=(hasattr(config, "query_type") and config.query_type == "Images"),
461 )
463 return contents
File ~/Projects/embedchain/embedchain/vectordb/zilliz.py:160, in ZillizVectorDB.query(self, input_query, n_results, where, skip_embedding)
158 query_vector = input_query_vector[0]
--> 160 query_result = self.client.search(
161 collection_name=self.config.collection_name,
162 data=[query_vector],
163 limit=n_results,
164 output_fields=["text"],
165 )
167 doc_list = []
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/milvus_client/milvus_client.py:259, in MilvusClient.search(self, collection_name, data, filter, limit, output_fields, search_params, timeout, **kwargs)
258 logger.error("Failed to search collection: %s", collection_name)
--> 259 raise ex from ex
261 ret = []
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/milvus_client/milvus_client.py:246, in MilvusClient.search(self, collection_name, data, filter, limit, output_fields, search_params, timeout, **kwargs)
245 try:
--> 246 res = conn.search(
247 collection_name,
248 data,
249 "",
250 search_params or {},
251 expression=filter,
252 limit=limit,
253 output_fields=output_fields,
254 timeout=timeout,
255 **kwargs,
256 )
257 except Exception as ex:
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/decorators.py:127, in error_handler.<locals>.wrapper.<locals>.handler(*args, **kwargs)
126 LOGGER.error(f"RPC error: [{inner_name}], {e}, <Time:{record_dict}>")
--> 127 raise e from e
128 except grpc.FutureTimeoutError as e:
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/decorators.py:123, in error_handler.<locals>.wrapper.<locals>.handler(*args, **kwargs)
122 record_dict["RPC start"] = str(datetime.datetime.now())
--> 123 return func(*args, **kwargs)
124 except MilvusException as e:
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/decorators.py:162, in tracing_request.<locals>.wrapper.<locals>.handler(self, *args, **kwargs)
161 self.set_onetime_request_id(req_id)
--> 162 return func(self, *args, **kwargs)
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/decorators.py:102, in retry_on_rpc_failure.<locals>.wrapper.<locals>.handler(*args, **kwargs)
101 else:
--> 102 raise e from e
103 except Exception as e:
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/decorators.py:68, in retry_on_rpc_failure.<locals>.wrapper.<locals>.handler(*args, **kwargs)
67 try:
---> 68 return func(*args, **kwargs)
69 except grpc.RpcError as e:
70 # Reference: https://grpc.github.io/grpc/python/grpc.html#grpc-status-code
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py:774, in GrpcHandler.search(self, collection_name, data, anns_field, param, limit, expression, partition_names, output_fields, round_decimal, timeout, **kwargs)
762 requests = Prepare.search_requests_with_expr(
763 collection_name,
764 data,
(...)
772 **kwargs,
773 )
--> 774 return self._execute_search_requests(
775 requests, timeout, round_decimal=round_decimal, **kwargs
776 )
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py:735, in GrpcHandler._execute_search_requests(self, requests, timeout, **kwargs)
734 return SearchFuture(None, None, pre_err)
--> 735 raise pre_err from pre_err
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py:726, in GrpcHandler._execute_search_requests(self, requests, timeout, **kwargs)
725 if response.status.error_code != 0:
--> 726 raise MilvusException(response.status.error_code, response.status.reason)
728 raws.append(response)
MilvusException: <MilvusException: (code=1, message=checkIfLoaded failed when search, collection:embedchain_store, partitions:[], err = GetCollectionInfo failed, collection = embedchain_store, err = collection 444143658175637249 has not been loaded to memory or load failed)>
The above exception was the direct cause of the following exception:
MilvusException Traceback (most recent call last)
Cell In[14], line 1
----> 1 app.query("What is the net worth of Elon Musk?")
File ~/Projects/embedchain/embedchain/embedchain.py:484, in EmbedChain.query(self, input_query, config, dry_run, where)
465 def query(self, input_query: str, config: BaseLlmConfig = None, dry_run=False, where: Optional[Dict] = None) -> str:
466 """
467 Queries the vector database based on the given input query.
468 Gets relevant doc based on the query and then passes it to an
(...)
482 :rtype: str
483 """
--> 484 contexts = self.retrieve_from_database(input_query=input_query, config=config, where=where)
485 answer = self.llm.query(input_query=input_query, contexts=contexts, config=config, dry_run=dry_run)
487 # Send anonymous telemetry
File ~/Projects/embedchain/embedchain/embedchain.py:456, in EmbedChain.retrieve_from_database(self, input_query, config, where)
452 from embedchain.models.clip_processor import ClipProcessor
454 db_query = ClipProcessor.get_text_features(query=input_query)
--> 456 contents = self.db.query(
457 input_query=db_query,
458 n_results=query_config.number_documents,
459 where=where,
460 skip_embedding=(hasattr(config, "query_type") and config.query_type == "Images"),
461 )
463 return contents
File ~/Projects/embedchain/embedchain/vectordb/zilliz.py:160, in ZillizVectorDB.query(self, input_query, n_results, where, skip_embedding)
157 input_query_vector = self.embedder.embedding_fn([input_query])
158 query_vector = input_query_vector[0]
--> 160 query_result = self.client.search(
161 collection_name=self.config.collection_name,
162 data=[query_vector],
163 limit=n_results,
164 output_fields=["text"],
165 )
167 doc_list = []
168 for query in query_result:
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/milvus_client/milvus_client.py:259, in MilvusClient.search(self, collection_name, data, filter, limit, output_fields, search_params, timeout, **kwargs)
257 except Exception as ex:
258 logger.error("Failed to search collection: %s", collection_name)
--> 259 raise ex from ex
261 ret = []
262 for hits in res:
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/milvus_client/milvus_client.py:246, in MilvusClient.search(self, collection_name, data, filter, limit, output_fields, search_params, timeout, **kwargs)
244 conn = self._get_connection()
245 try:
--> 246 res = conn.search(
247 collection_name,
248 data,
249 "",
250 search_params or {},
251 expression=filter,
252 limit=limit,
253 output_fields=output_fields,
254 timeout=timeout,
255 **kwargs,
256 )
257 except Exception as ex:
258 logger.error("Failed to search collection: %s", collection_name)
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/decorators.py:127, in error_handler.<locals>.wrapper.<locals>.handler(*args, **kwargs)
125 record_dict["RPC error"] = str(datetime.datetime.now())
126 LOGGER.error(f"RPC error: [{inner_name}], {e}, <Time:{record_dict}>")
--> 127 raise e from e
128 except grpc.FutureTimeoutError as e:
129 record_dict["gRPC timeout"] = str(datetime.datetime.now())
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/decorators.py:123, in error_handler.<locals>.wrapper.<locals>.handler(*args, **kwargs)
121 try:
122 record_dict["RPC start"] = str(datetime.datetime.now())
--> 123 return func(*args, **kwargs)
124 except MilvusException as e:
125 record_dict["RPC error"] = str(datetime.datetime.now())
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/decorators.py:162, in tracing_request.<locals>.wrapper.<locals>.handler(self, *args, **kwargs)
160 if req_id:
161 self.set_onetime_request_id(req_id)
--> 162 return func(self, *args, **kwargs)
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/decorators.py:102, in retry_on_rpc_failure.<locals>.wrapper.<locals>.handler(*args, **kwargs)
100 back_off = min(back_off * back_off_multiplier, max_back_off)
101 else:
--> 102 raise e from e
103 except Exception as e:
104 raise e from e
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/decorators.py:68, in retry_on_rpc_failure.<locals>.wrapper.<locals>.handler(*args, **kwargs)
66 while True:
67 try:
---> 68 return func(*args, **kwargs)
69 except grpc.RpcError as e:
70 # Reference: https://grpc.github.io/grpc/python/grpc.html#grpc-status-code
71 if e.code() in (
72 grpc.StatusCode.DEADLINE_EXCEEDED,
73 grpc.StatusCode.PERMISSION_DENIED,
(...)
77 grpc.StatusCode.RESOURCE_EXHAUSTED,
78 ):
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py:774, in GrpcHandler.search(self, collection_name, data, anns_field, param, limit, expression, partition_names, output_fields, round_decimal, timeout, **kwargs)
752 check_pass_param(
753 limit=limit,
754 round_decimal=round_decimal,
(...)
759 guarantee_timestamp=kwargs.get("guarantee_timestamp", None),
760 )
762 requests = Prepare.search_requests_with_expr(
763 collection_name,
764 data,
(...)
772 **kwargs,
773 )
--> 774 return self._execute_search_requests(
775 requests, timeout, round_decimal=round_decimal, **kwargs
776 )
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py:735, in GrpcHandler._execute_search_requests(self, requests, timeout, **kwargs)
733 if kwargs.get("_async", False):
734 return SearchFuture(None, None, pre_err)
--> 735 raise pre_err from pre_err
File ~/Projects/embedchain/.venv/lib/python3.11/site-packages/pymilvus/client/grpc_handler.py:726, in GrpcHandler._execute_search_requests(self, requests, timeout, **kwargs)
723 response = self._stub.Search(request, timeout=timeout)
725 if response.status.error_code != 0:
--> 726 raise MilvusException(response.status.error_code, response.status.reason)
728 raws.append(response)
729 round_decimal = kwargs.get("round_decimal", -1)
MilvusException: <MilvusException: (code=1, message=checkIfLoaded failed when search, collection:embedchain_store, partitions:[], err = GetCollectionInfo failed, collection = embedchain_store, err = collection 444143658175637249 has not been loaded to memory or load failed)>
Can you please resolve this issue so that we can merge it?
Hey @deshraj. The error occurs because no data has been loaded into the Zilliz DB: here you are querying the app without adding anything to the database first, so it throws an error. For example, here I add a YouTube video and then query it.
This gives the output:
Thanks for the explanation. I don't think we want to error out in this case, though. If there is no data in the db, we will want the
Also, I can confirm that the example that you shared above works well on my system as well.
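One possible way to avoid erroring out on an empty store is to catch the "not loaded" failure at query time and return no results instead. The sketch below is only an illustration under stated assumptions: `ToyVectorStore`, `CollectionNotLoadedError`, and `safe_query` are hypothetical stand-ins written for this example, not the pymilvus client and not the code that was merged in this PR.

```python
class CollectionNotLoadedError(Exception):
    """Hypothetical stand-in for the 'has not been loaded to memory' error."""


class ToyVectorStore:
    """In-memory stand-in for a vector collection (not the pymilvus client)."""

    def __init__(self):
        self._rows = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self._rows.append((vector, text))

    def search(self, query_vector, limit):
        # Mimic the reported behaviour: searching before any data exists fails.
        if not self._rows:
            raise CollectionNotLoadedError("collection has not been loaded to memory")

        # Rank stored rows by squared Euclidean distance, closest first.
        def distance(row):
            return sum((a - b) ** 2 for a, b in zip(row[0], query_vector))

        ranked = sorted(self._rows, key=distance)
        return [text for _, text in ranked[:limit]]


def safe_query(store, query_vector, n_results):
    """Return matching texts, or an empty list when the store has no data yet."""
    try:
        return store.search(query_vector, limit=n_results)
    except CollectionNotLoadedError:
        return []


store = ToyVectorStore()
empty_result = safe_query(store, [0.0, 1.0], n_results=2)  # no data added yet

store.add([0.0, 1.0], "first document")
store.add([1.0, 0.0], "second document")
filled_result = safe_query(store, [0.1, 0.9], n_results=1)
```

With this shape, a query against an empty store yields an empty context list, so the caller can still pass the question on without any retrieved documents.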
Fixed the issue, @deshraj.
Output:
This is awesome. Thanks for adding this! ❤️
@LuciAkirami, it looks like you have unused imports, which is why the build is failing. Can you please run
@deshraj, calling the
It's OK, you can skip that. Can you please incorporate this comment: #771 (comment)?
Thanks @LuciAkirami. Released your feature in https://github.com/embedchain/embedchain/releases/tag/v0.0.69
That's cool, @deshraj. It's my first contribution to open source, and I'm looking forward to more 😄
Description
Added Zilliz as a vector database
Fixes #692
Type of change
Added a new feature
How Has This Been Tested?
Created a unit test for each possibility. Because the vector database requires a URI and a token, these must be provided in a .env file to run the tests.
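Since the tests depend on credentials, one common pattern is to skip them when the credentials are absent rather than let them fail. The following is a minimal sketch using only the standard library. The variable names `ZILLIZ_CLOUD_URI` and `ZILLIZ_CLOUD_TOKEN` are assumptions for illustration, and `missing_credentials` and `TestZillizIntegration` are hypothetical names, not part of this PR.

```python
import os
import unittest

# Assumed variable names for the Zilliz credentials; adjust them to match
# whatever the .env file actually defines.
REQUIRED_ENV_VARS = ("ZILLIZ_CLOUD_URI", "ZILLIZ_CLOUD_TOKEN")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


@unittest.skipIf(missing_credentials(), "Zilliz URI/token not configured")
class TestZillizIntegration(unittest.TestCase):
    def test_credentials_present(self):
        # Placeholder body: a real test would construct the database client here.
        self.assertEqual(missing_credentials(), [])
```

The skip condition is evaluated once when the module is imported, so the variables from the .env file need to be loaded into the environment before the test module is collected.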