[FEATURE] Support for chunking strategies #1081
Comments
Hi @orpiske, this sounds great! We are looking forward to improving this in LC4J, so any contributions are welcome! What chunking strategies do you have in mind?
It's great news that you have plans for Markdown (and/or AsciiDoc) and a semantic splitter! Those would have been very useful for our project. In general, specialized chunkers/splitters (and/or an interface for implementing them) could be particularly helpful for a subset of our data (e.g., so we could deal with YAML, XML, etc.). I also think that API/service/LLM-based chunking, where we defer the chunking to an external service, could be useful.
The interface you are looking for is |
Noted, thanks!
Is your feature request related to a problem? Please describe.
One of the lessons we learned from a project we worked on recently was that there doesn't seem to be great or widespread support for chunking in Java. We were particularly looking for support for different chunking strategies, which could have helped us maximize our ability to store, retrieve and match data in our VectorDB.
Describe the solution you'd like
We would like to discuss with the Langchain4j community whether support for chunking is feasible within this project and aligned with the project's goals and feature set.
Describe alternatives you've considered
Among other things, we have considered creating a chunking library as a separate project. However, we believe that adding chunking as part of Langchain4j would result in a better developer experience and would also make it easier for the project to implement strategies that involve LLM-based chunking.
Additional context
If the community believes that this is in line with the project, we are motivated to contribute and help maintain this feature.