Indexing of long text documents is tricky #127
Comments
We are working on a feature that will allow the user to have multiple…
I don't know yet what it will look like.
In this case you would need your own version of AnnLiteIndexer that indexes the different parts in different DocArrays, but yes, the current implementation does not handle this case.
Could you please explain how sub_indices would work? Thanks.
@tommykoctur The subindex has been released. https://docarray.jina.ai/fundamentals/documentarray/subindex/
Thank you, but I don't think this would help me. I would probably add another LMDB to store the root doc information to save some space.
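For the workaround described above (chunk embeddings in the vector index, root-doc information in a separate store such as an extra LMDB), the query side could look roughly like the sketch below. It is not AnnLite's API: `Chunk`, `search_roots` and the dict standing in for LMDB are hypothetical, and brute-force cosine similarity replaces AnnLite's approximate nearest-neighbour search. Each root document is scored by its best-matching chunk.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical chunk record: only chunks carry an embedding.
    id: str
    parent_id: str
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_roots(query, chunks, root_store, top_k=2):
    """Brute-force chunk search; returns the top_k root docs with their best chunk score."""
    best: dict[str, float] = {}
    for chunk in chunks:
        score = cosine(query, chunk.embedding)
        # A root document is scored by its best-matching chunk.
        if score > best.get(chunk.parent_id, float("-inf")):
            best[chunk.parent_id] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Root metadata is looked up in the separate store, not in the vector index.
    return [(root_store[root_id], score) for root_id, score in ranked]


# A plain dict stands in for the extra LMDB that would hold root-doc info.
root_store = {"doc1": {"title": "Doc one"}, "doc2": {"title": "Doc two"}}
chunks = [
    Chunk("doc1-0", "doc1", "First sentence.", [1.0, 0.0]),
    Chunk("doc1-1", "doc1", "Second sentence.", [0.0, 1.0]),
    Chunk("doc2-0", "doc2", "Another sentence.", [0.7, 0.7]),
]
results = search_roots([1.0, 0.1], chunks, root_store)
for doc, score in results:
    print(doc["title"], round(score, 3))
```

Taking the maximum chunk score per document is only one choice; summing or averaging the top few chunk scores are common alternatives.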
Hello,
my use case is searching in long text documents.
Documents are split into chunks (let's say sentences), and each chunk has its own embedding. The root document has no embedding.
I am not able to index the documents with the AnnLite indexer because the root documents have no embedding; only the chunks can be indexed.
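A minimal sketch of the chunking step described above, assuming a naive regex sentence splitter and plain dict records. The `split_into_chunks` helper and the `parent_id` field name are illustrative only, not DocArray's or AnnLite's API. Each chunk keeps a reference to its root document, so only the chunks need to go into the vector index while hits can still be mapped back to the root.

```python
import re


def split_into_chunks(root_id: str, text: str) -> list:
    """Split a long text into sentence-level chunk records that remember their parent."""
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        {"id": f"{root_id}-{i}", "parent_id": root_id, "text": s}
        for i, s in enumerate(sentences)
    ]


root = {"id": "doc1", "text": "Long documents are hard. Split them first! Then index chunks?"}
chunks = split_into_chunks(root["id"], root["text"])
# Only the chunks would get embeddings and go to the vector index; the root
# record (id + full text) would be kept in a plain key-value store instead.
for chunk in chunks:
    print(chunk["id"], chunk["text"])
```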
If I store the documents directly in LMDB via
self._index.doc_store(0).insert(root_docs)
then loading the query flow fails with: ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10,) + inhomogeneous part.
The 10 is 5 root docs and 5 chunks together (dummy data).
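A possible cause, not confirmed against the AnnLite source: this is the message NumPy raises when asked to build a single array from ragged input. If the query flow collects the embeddings of all 10 stored docs into one array, the 5 root docs contribute no vector while the 5 chunks contribute one each, which would be consistent with the reported shape of (10,) plus an inhomogeneous part. The plain-Python sketch below (hypothetical `stack_embeddings` helper, dict-based docs) shows the kind of filtering that avoids this: only docs with an embedding go into the batch, and every row must have the same dimension.

```python
def stack_embeddings(docs):
    """Return (ids, rows) for docs that have an embedding, checking all rows share one dimension."""
    ids, rows = [], []
    dim = None
    for doc in docs:
        emb = doc.get("embedding")
        if emb is None:
            # Root docs without an embedding are left out of the vector batch
            # instead of producing a ragged (inhomogeneous) array.
            continue
        if dim is None:
            dim = len(emb)
        elif len(emb) != dim:
            raise ValueError(f"doc {doc['id']!r} has dim {len(emb)}, expected {dim}")
        ids.append(doc["id"])
        rows.append(list(emb))
    return ids, rows


# Dummy data mirroring the report: 5 root docs without embeddings + 5 chunks.
docs = [{"id": f"root{i}", "embedding": None} for i in range(5)]
docs += [
    {"id": f"chunk{i}", "parent_id": f"root{i}", "embedding": [0.1 * i, 0.2, 0.3]}
    for i in range(5)
]
ids, rows = stack_embeddings(docs)
print(len(docs), "docs stored,", len(ids), "embeddings stacked")
```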
Can you please help me?
Thanks