[Bug]: IngestionPipeline 0.9.36 pipeline.arun to Redis raises NotImplementedError #10248
Comments
🤖 Hello @JoeSF49! I'm Dosu, an AI here to assist you with your inquiries and issues relating to the LlamaIndex repository. I'm here to help you navigate through bugs, answer your questions, and guide you in contributing to the project while we're waiting for a human maintainer. I've taken notice of the issue you've described and I'm currently looking into it. I'll get back to you with a thorough answer shortly.
Ah yea, need to implement the async methods here
🤖 Thank you for reporting this issue. It seems like there's a problem with the implementation of the async methods in the Redis key-value store in the LlamaIndex version 0.9.36. To better understand and address this issue, could you please provide the following additional information:
This information will help us in diagnosing the problem more accurately. Thank you for your cooperation. This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
I believe it has to do with the docstore strategy in the pipeline. I also traced it: it starts failing at 0.9.33; it works up to llama-index 0.9.32.

ERROR:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[18], line 2
1 ## Async
----> 2 nodes = await pipeline.arun(documents=documents,show_progress=True)
3 # nodes = pipeline.run(documents=documents,show_progress=True)
4 print(f"Ingested {len(nodes)} Nodes")
File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/ingestion/pipeline.py:570, in IngestionPipeline.arun(self, show_progress, documents, nodes, cache_collection, in_place, store_doc_text, num_workers, **kwargs)
565 if self.docstore is not None and self.vector_store is not None:
566 if self.docstore_strategy in (
567 DocstoreStrategy.UPSERTS,
568 DocstoreStrategy.UPSERTS_AND_DELETE,
569 ):
--> 570 nodes_to_run = await self._ahandle_upserts(
571 input_nodes, store_doc_text=store_doc_text
572 )
573 elif self.docstore_strategy == DocstoreStrategy.DUPLICATES_ONLY:
574 nodes_to_run = await self._ahandle_duplicates(
575 input_nodes, store_doc_text=store_doc_text
576 )
File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/ingestion/pipeline.py:513, in IngestionPipeline._ahandle_upserts(self, nodes, store_doc_text)
509 """Handle docstore upserts by checking hashes and ids."""
510 assert self.docstore is not None
512 existing_doc_ids_before = set(
--> 513 (await self.docstore.aget_all_document_hashes()).values()
514 )
515 doc_ids_from_nodes = set()
516 deduped_nodes_to_run = {}
File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/storage/docstore/keyval_docstore.py:524, in KVDocumentStore.aget_all_document_hashes(self)
522 """Get the stored hash for all documents."""
523 hashes = {}
--> 524 for doc_id in await self._kvstore.aget_all(
525 collection=self._metadata_collection
526 ):
527 hash = await self.aget_document_hash(doc_id)
528 if hash is not None:
File /opt/miniforge/envs/pytorch/lib/python3.10/site-packages/llama_index/storage/kvstore/redis_kvstore.py:140, in RedisKVStore.aget_all(self, collection)
138 async def aget_all(self, collection: str = DEFAULT_COLLECTION) -> Dict[str, dict]:
139 """Get all values from the store."""
--> 140 raise NotImplementedError
NotImplementedError:
As far as OSes go: I use Docker workloads on an ML hardware platform.
Yeah, as mentioned above, the async Redis docstore operations just need to be implemented.
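Until the async methods land upstream, one possible stopgap is to delegate the missing async method to its synchronous counterpart on a worker thread via `asyncio.to_thread`, so `pipeline.arun` is not blocked on the event loop while Redis is queried. The sketch below uses a hypothetical minimal stand-in class rather than the real `RedisKVStore` API, purely to illustrate the pattern:

```python
import asyncio

# Hypothetical stand-in for RedisKVStore: only the sync get_all exists,
# mirroring the situation in llama-index 0.9.33-0.9.36.
class SyncOnlyKVStore:
    def __init__(self, data):
        self._data = data

    def get_all(self, collection="data"):
        """Blocking call (would hit Redis in the real store)."""
        return dict(self._data.get(collection, {}))

class PatchedKVStore(SyncOnlyKVStore):
    async def aget_all(self, collection="data"):
        # Run the blocking sync implementation on a worker thread so the
        # event loop stays free while the lookup happens.
        return await asyncio.to_thread(self.get_all, collection)

store = PatchedKVStore({"data": {"doc1": "hash1"}})
result = asyncio.run(store.aget_all())
print(result)  # {'doc1': 'hash1'}
```

The same pattern could be applied by subclassing the real store and forwarding each unimplemented `a*` method to its sync twin, until true async Redis support ships.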
@dosu-bot Hi, and sorry, I rarely file bugs, so I'm new to this process. Do I have to provide any more information, or will this fix be implemented in upcoming releases? I cannot run the pipeline without async, as it fills the GPU memory during a regular `.run`. Thanks
🤖 Thank you for the detailed report, JoeSF49. It seems like the issue is indeed related to the async methods not being implemented for the Redis docstore operations in the versions after 0.9.32. To help us further investigate and address this issue, could you please provide the following additional information:
This information will help us understand the context better and expedite the process of finding a solution. Thank you for your cooperation.
Bug Description
Using IngestionPipeline with docstore=RedisDocumentStore, then running pipeline.arun against Redis raises NotImplementedError on llama-index 0.9.36. Downgrading to 0.9.30 allows me to run pipeline.arun. I am just using SimpleDirectoryReader load_data documents and passing them to run or arun.
Version
0.9.36 llama-index
Steps to Reproduce
install llama-index 0.9.36
create an IngestionPipeline (import paths shown below are for llama-index 0.9.x; embed_model is defined elsewhere):

from llama_index.ingestion import (
    DocstoreStrategy,
    IngestionCache,
    IngestionPipeline,
)
from llama_index.ingestion.cache import RedisCache
from llama_index.node_parser import SentenceSplitter
from llama_index.storage.docstore import RedisDocumentStore
from llama_index.vector_stores import RedisVectorStore

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        embed_model,
    ],
    docstore=RedisDocumentStore.from_host_and_port(
        "localhost", 6379, namespace="document_store"
    ),
    vector_store=RedisVectorStore(
        index_name="redis_vector_store",
        index_prefix="vectore_store",
        redis_url="redis://localhost:6379",
    ),
    cache=IngestionCache(
        cache=RedisCache.from_host_and_port("localhost", 6379),
        collection="redis_cache",
    ),
    docstore_strategy=DocstoreStrategy.UPSERTS,
)
attempt pipeline.arun to a Redis vector store
observe the NotImplementedError
Workaround:
downgrade to llama-index 0.9.30; both pipeline.arun and pipeline.run work
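The downgrade workaround as a concrete command (assuming pip; 0.9.32 is reported above as the last working release, 0.9.30 as the one the reporter verified):

```shell
# Pin to a release where pipeline.arun works with the Redis docstore
pip install "llama-index==0.9.30"
```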
Relevant Logs/Tracebacks
No response