I think, even if not yet in scope for langchainrb, this should be discussed, as people will inevitably come across this problem. Especially when embedding documents with langchainrb, what is a good strategy to prevent the same documents / strings from being re-added repeatedly?
For a whole document, I think checksums could work (although for big docs the cost of computing a checksum will increase), but what about individual pages of a document or text chunks? Would love some guidance, and maybe later down the road langchainrb can help with this.
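To make the chunk-level question concrete, here is a minimal sketch of the checksum idea applied per chunk instead of per document. It uses only Ruby's standard library; `chunk_fingerprint` and `ChunkDeduplicator` are hypothetical names made up for illustration and are not part of langchainrb. It assumes exact-match deduplication after whitespace normalization is good enough, so near-duplicates would still get through.

```ruby
require "digest"
require "set"

# Hypothetical helper (not a langchainrb API): collapse whitespace so that
# copies differing only in spacing produce the same SHA-256 fingerprint.
def chunk_fingerprint(text)
  normalized = text.strip.gsub(/\s+/, " ")
  Digest::SHA256.hexdigest(normalized)
end

# Hypothetical in-memory index of fingerprints already embedded.
# In a real setup the fingerprints would need to be persisted,
# for example next to the vectors or in a separate table.
class ChunkDeduplicator
  def initialize
    @seen = Set.new
  end

  # Returns only the chunks whose fingerprint has not been seen before.
  # Set#add? returns nil when the element is already present, so
  # duplicates within the same batch are dropped as well.
  def filter_new(chunks)
    chunks.select { |chunk| @seen.add?(chunk_fingerprint(chunk)) }
  end
end
```

The idea would be to run every batch of pages or chunks through `filter_new` and only embed what comes back. Since each chunk is small, hashing cost stays low even when the full document is large.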
Thanks, that's really useful. It would be great to have something like this in langchainrb, at least a basic version to start with, as it is a real PITA to do this manually.