langchain[patch]: Add Possibility to use Contextual chunk headers in Parent Document Retriever #4651
Nice one, thank you!
Looks like you might need to run
You are right, sorry. Fixed.
@@ -159,7 +162,11 @@ export class ParentDocumentRetriever extends MultiVectorRetriever {
    for (let i = 0; i < parentDocs.length; i += 1) {
      const parentDoc = parentDocs[i];
      const parentDocId = parentDocIds[i];
      const subDocs = await this.childSplitter.splitDocuments([parentDoc]);
      const chunkHeaderOptions = chunkHeader && parentDoc?.metadata?.chunkHeader ? {
        chunkHeader: parentDoc.metadata.chunkHeader,
Depending on this to be in the metadata field of the document is a bit weird - could we make this a property on the class instead? Like this.childSplitterOptions?
Ignore the above, I see the problem. Let me think about it a little longer - would be nice to not have a dependency on an untyped metadata key like this.
I think we just have to try passing chunkHeader the same way it is currently implemented in LangChainJs. That would be a better solution; I will see if it can be done that way. Thanks for spotting this.
See comment, also please run
Thank you again for the suggestions. I've pushed the improved version that looks cleaner and doesn't use
Thank you!
Add possibility to use Contextual chunk headers in Parent Document Retriever.
This is particularly important if you have several fine-grained child chunks that need to be correctly retrieved from the vector store.
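For readers unfamiliar with the idea, here is a minimal, self-contained sketch of what a contextual chunk header does when a parent document is split into child chunks. Everything in it (the Doc interface, splitText, splitWithChunkHeader, the parentId metadata key) is a hypothetical stand-in for illustration and is not the LangChain API or the code in this PR. The header is passed as an explicit, typed argument rather than read from an untyped metadata key, which is the direction the review discussion pointed in.

```typescript
// Hypothetical types and helpers for illustration only; not the LangChain API.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Naive fixed-size splitter standing in for a real child text splitter.
function splitText(text: string, chunkSize: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Split one parent document into child chunks and prefix each chunk with the
// header, so every child carries document-level context into the vector store.
function splitWithChunkHeader(
  parent: Doc,
  parentId: string,
  chunkSize: number,
  chunkHeader = ""
): Doc[] {
  return splitText(parent.pageContent, chunkSize).map((chunk) => ({
    pageContent: chunkHeader + chunk,
    // Keep a link back to the parent so it can be looked up after retrieval.
    metadata: { ...parent.metadata, parentId },
  }));
}

const parent: Doc = {
  pageContent: "abcdefghij",
  metadata: { source: "guide.md" },
};
const children = splitWithChunkHeader(parent, "p1", 4, "DOC: guide\n\n");
```

Because each child chunk starts with the same header, a small chunk that would be ambiguous on its own still embeds with some context about the document it came from.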