
[Feature Request]Using Ragflow for Document Preprocessing with custom chunking strategies #568

Open
JahnKhan opened this issue Apr 26, 2024 · 1 comment
Labels: Feature, question (Further information is requested)


JahnKhan commented Apr 26, 2024

Describe your problem

Hi,
I am currently working on a project where the way documents are segmented into chunks is crucial and varies with the task at hand. For example, for a dictionary it is useful to segment the text into word-and-explanation pairs.
I am interested in using ragflow for the preprocessing phase of my project. Specifically, I would like to know:

  1. Can ragflow be configured to perform custom chunking of documents? For instance, can it segment documents based on specific delimiters or structural patterns unique to the content being processed?
  2. Is it possible to use ragflow solely for preprocessing data, where I can specify how the documents should be chunked?
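Pattern-based chunking of the kind asked about above can be sketched in a few lines of plain Python. This is not RAGFlow code or API: the entry format (each entry starts at the beginning of a line with `headword:`, and its explanation may continue on indented lines) and the names `Chunk`, `HEADWORD` and `chunk_by_pattern` are hypothetical. Keeping character offsets with each chunk makes it possible to mark the chunks on the original document afterwards.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    start: int  # character offset where the chunk begins (inclusive)
    end: int    # character offset where the chunk ends (exclusive)
    head: str   # the headword that opened the chunk
    text: str   # chunk text with surrounding whitespace stripped


# Hypothetical dictionary format: an entry begins at the start of a line with
# "headword:"; indented continuation lines belong to the same entry.
HEADWORD = re.compile(r"^(?P<head>[^\s:][^:\n]*):", re.MULTILINE)


def chunk_by_pattern(text: str, pattern: re.Pattern = HEADWORD) -> list[Chunk]:
    """Split text into one chunk per match of `pattern`.

    Each chunk runs from the start of one match to the start of the next
    match (or the end of the text). Text before the first match is ignored.
    """
    matches = list(pattern.finditer(text))
    chunks = []
    for i, m in enumerate(matches):
        start = m.start()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append(Chunk(start, end, m.group("head").strip(), text[start:end].strip()))
    return chunks


if __name__ == "__main__":
    sample = "apple: a round fruit\nbanana: a long yellow fruit,\n  usually eaten raw\n"
    for c in chunk_by_pattern(sample):
        print(c.head, c.start, c.end)
```

Swapping in a different regular expression (for example, one that matches blank-line separators or numbered headings) changes the chunking rule without touching the splitting logic. Note that a non-indented explanation line containing a colon would be read as a new entry under this assumed format.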

I would like a tool that can preprocess my documents and show me visually how the chunks are created, marking them on the document itself so I can see how it is segmented and, if necessary, adjust a chunk by marking a smaller or larger text area.
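The visual review described above can be approximated in plain text by wrapping each chunk's character span in numbered markers, and resizing a chunk by moving the boundary it shares with its neighbour. This is only a sketch of the idea, not a RAGFlow feature: the function names `render_chunks` and `move_boundary` and the `[[n>` / `<n]]` marker syntax are hypothetical, and the spans are assumed to be sorted, non-overlapping `(start, end)` character offsets.

```python
def render_chunks(text: str, spans: list[tuple[int, int]]) -> str:
    """Return text with each (start, end) span wrapped in numbered markers.

    Assumes the spans do not overlap; text outside every span is kept as-is.
    """
    out, pos = [], 0
    for n, (start, end) in enumerate(sorted(spans), 1):
        out.append(text[pos:start])  # text not covered by any chunk
        out.append(f"[[{n}>{text[start:end]}<{n}]]")
        pos = end
    out.append(text[pos:])
    return "".join(out)


def move_boundary(spans: list[tuple[int, int]], i: int, new_pos: int) -> list[tuple[int, int]]:
    """Move the boundary between chunk i and chunk i + 1 to new_pos.

    Growing one chunk shrinks its neighbour, so contiguous spans stay contiguous.
    """
    (s0, _), (_, e1) = spans[i], spans[i + 1]
    if not s0 < new_pos < e1:
        raise ValueError("new boundary must fall strictly inside the two chunks")
    return spans[:i] + [(s0, new_pos), (new_pos, e1)] + spans[i + 2:]


if __name__ == "__main__":
    text = "alpha beta gamma"
    spans = [(0, 6), (6, 16)]
    print(render_chunks(text, spans))
    print(render_chunks(text, move_boundary(spans, 0, 11)))
```

Returning a new list from `move_boundary` instead of editing in place keeps the previous segmentation available, which makes it easy to compare or undo an adjustment.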

Thank you very much!

JahnKhan added the question (Further information is requested) label on Apr 26, 2024.
yingfeng (Member)

Hi,

  1. Currently, ragflow cannot apply a customized chunking approach. However, chunking according to a user-supplied pattern is not a difficult requirement, so we may provide it later.
  2. We are going to provide an API for that purpose, which will return the chunked results.

KevinHuSh changed the title from "Using Ragflow for Document Preprocessing with custom chunking strategies" to "[Feature Request]Using Ragflow for Document Preprocessing with custom chunking strategies" on Apr 28, 2024.