[Community] : Added SentenceWindowRetriever #20981

Closed
wants to merge 7 commits into from


Contributor

@rsk2327 rsk2327 commented Apr 28, 2024

Thank you for contributing to LangChain!

  • PR title: "package: description"

Description: Updated TextSplitter with a new add_chunk_id argument that adds a chunk_id variable to each document's metadata.

This is a prerequisite for implementing the Sentence Window Retriever broadly across all databases, because it makes it easy to identify neighboring chunks of text from the same text source.

  • Add tests and docs
    This is a simple argument addition that is False by default and has basic logic. Specific unit tests might not be necessary.

  • Lint and test: Run make format, make lint and make test from the root of the package(s) you've modified.

If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.
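
To make the description concrete, here is a minimal sketch of what an add_chunk_id option could produce. It is not the code from this PR: the function name split_with_chunk_ids, the fixed-width character splitting, and the choice of a per-source counter are all assumptions made for illustration. The counter keeps running across pages that share a source, which is one reading of the later commit about persisting chunk_id across pages.

```python
from collections import defaultdict


def split_with_chunk_ids(pages, chunk_size=40, add_chunk_id=True):
    """Split (source, text) pages into chunks, optionally tagging each with a chunk_id.

    Hypothetical helper: fixed-width splitting stands in for a real text splitter.
    """
    next_id = defaultdict(int)  # assumption: one running counter per source
    chunks = []
    for source, text in pages:
        for start in range(0, len(text), chunk_size):
            metadata = {"source": source}
            if add_chunk_id:
                # The counter is not reset per page, so ids continue
                # across pages that belong to the same source.
                metadata["chunk_id"] = next_id[source]
                next_id[source] += 1
            chunks.append(
                {"text": text[start:start + chunk_size], "metadata": metadata}
            )
    return chunks


# Two pages from the same source, then one page from another source.
pages = [("a.pdf", "x" * 100), ("a.pdf", "y" * 50), ("b.pdf", "z" * 30)]
chunks = split_with_chunk_ids(pages)
chunk_ids = [(c["metadata"]["source"], c["metadata"]["chunk_id"]) for c in chunks]
```

With the default off (add_chunk_id=False), the metadata is left unchanged, which matches the description of the argument as opt-in.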

Updated TextSplitter to include a new add_chunk_id to add a chunk_id variable into document metadata
@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Apr 28, 2024

vercel bot commented Apr 28, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment

Name | Status | Updated (UTC)
langchain | Ignored | May 3, 2024 4:33pm

@dosubot dosubot bot added Ɑ: text splitters Related to text splitters package 🤖:improvement Medium size change to existing code to handle new use-cases labels Apr 28, 2024
Updated chunk_id logic to persist chunk_id across different pages of the same source text
Contributor Author

rsk2327 commented Apr 29, 2024

@eyurtsev @baskaryan This metadata variable is a prerequisite for setting up the new implementation of the Sentence Window Retrieval method.
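
As a rough illustration of why the metadata matters for sentence window retrieval, the sketch below expands a retrieved chunk into its neighbors by chunk_id. It is an assumption about how such a lookup might work, not the SentenceWindowRetriever from this PR: the function sentence_window, the in-memory list standing in for a database, and the sample chunks are all hypothetical.

```python
def sentence_window(chunks, hit, window=1):
    """Return the hit's text joined with up to `window` neighbors on each side.

    Hypothetical helper: `chunks` is a plain list standing in for a document store.
    """
    source = hit["metadata"]["source"]
    chunk_id = hit["metadata"]["chunk_id"]
    neighbors = [
        c
        for c in chunks
        if c["metadata"]["source"] == source
        and abs(c["metadata"]["chunk_id"] - chunk_id) <= window
    ]
    # Restore reading order before joining.
    neighbors.sort(key=lambda c: c["metadata"]["chunk_id"])
    return " ".join(c["text"] for c in neighbors)


# Sample store: four chunks from one source and one from another.
store = [
    {"text": f"s{i}", "metadata": {"source": "a.txt", "chunk_id": i}}
    for i in range(4)
]
store.append({"text": "other", "metadata": {"source": "b.txt", "chunk_id": 1}})

middle = sentence_window(store, store[1])  # neighbors on both sides
first = sentence_window(store, store[0])   # no left neighbor at the start
```

Filtering on both source and chunk_id is what keeps the window from pulling in chunks from unrelated documents that happen to share an id.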

Contributor Author

rsk2327 commented May 3, 2024

@hwchase17 @efriis Can I get a review on this?

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels May 3, 2024
@rsk2327 rsk2327 changed the title Text Splitters : Add new metadata variable - chunk_id [Community] : Added SentenceWindowRetriever May 3, 2024
@rsk2327 rsk2327 closed this May 3, 2024
Labels
🤖:improvement Medium size change to existing code to handle new use-cases size:L This PR changes 100-499 lines, ignoring generated files. Ɑ: text splitters Related to text splitters package