ParentDocumentRetriever: parent_splitter and ids are incompatible #11982
Comments
🤖 Hello,

Thank you for reaching out and providing a detailed description of the issue you're facing.

The error you're encountering is due to a mismatch between the number of documents and the number of IDs you're providing. Here's the relevant code snippet:

    if ids is None:
        doc_ids = [str(uuid.uuid4()) for _ in documents]
        if not add_to_docstore:
            raise ValueError(
                "If ids are not passed in, `add_to_docstore` MUST be True"
            )
    else:
        if len(documents) != len(ids):
            raise ValueError(
                "Got uneven list of documents and ids. "
                "If `ids` is provided, should be same length as `documents`."
            )
        doc_ids = ids

If you don't provide IDs, the method generates a random UUID for each document. If you do provide IDs, the number of IDs must match the number of documents. If you don't want to add documents to the docstore, you must provide IDs: leaving ids unset while add_to_docstore is False raises a ValueError.

To resolve the ValueError, provide exactly one ID per document, or omit ids and leave add_to_docstore set to True.

I hope this helps! If you have any other questions or need further clarification, please don't hesitate to ask.

Best regards.
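The length check can be reproduced without LangChain. The sketch below is only an illustration: `split_into_chunks` and `resolve_doc_ids` are hypothetical names, and it assumes, as the issue describes, that the parent splitter runs before the IDs are validated.

```python
import uuid
from typing import List, Optional


def split_into_chunks(texts: List[str], chunk_size: int) -> List[str]:
    """Hypothetical stand-in for a parent splitter: fixed-size character chunks."""
    chunks = []
    for text in texts:
        for start in range(0, len(text), chunk_size):
            chunks.append(text[start:start + chunk_size])
    return chunks


def resolve_doc_ids(
    documents: List[str],
    ids: Optional[List[str]] = None,
    add_to_docstore: bool = True,
) -> List[str]:
    """Mirror of the ID validation quoted in this thread (hypothetical wrapper)."""
    if ids is None:
        doc_ids = [str(uuid.uuid4()) for _ in documents]
        if not add_to_docstore:
            raise ValueError(
                "If ids are not passed in, `add_to_docstore` MUST be True"
            )
    else:
        if len(documents) != len(ids):
            raise ValueError(
                "Got uneven list of documents and ids. "
                "If `ids` is provided, should be same length as `documents`."
            )
        doc_ids = ids
    return doc_ids


# Two original documents become four parent chunks after splitting,
# so the two caller-supplied IDs no longer match the document count.
originals = ["a" * 25, "b" * 5]
parents = split_into_chunks(originals, chunk_size=10)

try:
    resolve_doc_ids(parents, ids=["doc-1", "doc-2"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

The caller counted two documents, but the check sees the documents after splitting, which is why passing both a splitter and IDs fails.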
@pprados That part of the interface doesn't make sense. Not sure of a good solution right now, but I don't think that a parent splitter should be allowed as an argument.
@eyurtsev I have started submitting preliminary pull requests to enrich langchain, ahead of my full proposal eventually being accepted.
I can't get one pull request validated because of a lint error in code that is not mine.
@eyurtsev
System Info
description
With parent_splitter, the number of fragments, and therefore the number of IDs required, cannot be known before the split.
So the caller cannot know the ID of each fragment.
As a result, the life cycle of the fragments cannot be managed, because there is no way to know which IDs belong to the original large document.
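One possible way to keep that association, sketched here without any LangChain dependency, is to split the documents yourself and derive each fragment ID from the ID of its original document. `assign_fragment_ids` and `split_by_paragraph` are hypothetical helpers, and the `<original id>-<index>` naming scheme is an assumption made for illustration, not something the library provides.

```python
from typing import Callable, Dict, List, Tuple


def assign_fragment_ids(
    originals: Dict[str, str],
    split: Callable[[str], List[str]],
) -> Tuple[List[str], List[str], Dict[str, List[str]]]:
    """Split each original text and give every fragment a derived ID.

    Returns the fragments, their IDs in the same order, and a mapping from
    each original ID to the IDs of its fragments.
    """
    fragments: List[str] = []
    fragment_ids: List[str] = []
    ids_by_original: Dict[str, List[str]] = {}
    for original_id, text in originals.items():
        pieces = split(text)
        piece_ids = [f"{original_id}-{index}" for index in range(len(pieces))]
        fragments.extend(pieces)
        fragment_ids.extend(piece_ids)
        ids_by_original[original_id] = piece_ids
    return fragments, fragment_ids, ids_by_original


def split_by_paragraph(text: str) -> List[str]:
    """Hypothetical splitter: one fragment per non-empty paragraph."""
    return [part for part in text.split("\n\n") if part.strip()]


fragments, fragment_ids, ids_by_original = assign_fragment_ids(
    {"report": "intro\n\nbody\n\nconclusion", "memo": "single paragraph"},
    split_by_paragraph,
)
```

Because the split happens before any ID is needed, `fragments` and `fragment_ids` always have the same length, and `ids_by_original` records which fragments to update or delete when an original document changes.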
Who can help?
@eyurtsev
Reproduction
Produce:
Expected behavior
No error.