Replies: 2 comments 1 reply
Hi @yang2233! I'm Dosu and I'm helping the Dify team.
This is confirmed behavior caused by how Dify's extraction pipeline handles [...]
What's happening: When you upload a [...] In contrast, [...] The file routing is determined purely by file extension in [...]
Workaround: Rename your [...] This is admittedly not ideal since it forces you to change file formats.
The underlying issue is that markdown header-based pre-segmentation runs unconditionally before custom chunking; there's currently no option to disable it when using custom separators.
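To make the described ordering concrete, here is a minimal sketch of header pre-segmentation running before a custom separator split. It is not Dify's code: the function names, the header regex, and the sample document are all assumptions chosen for illustration, and Dify's real pipeline may segment differently.

```python
import re


def split_on_separator(text: str, separator: str = "---\n") -> list[str]:
    """Split on a custom separator and drop empty pieces."""
    return [part.strip() for part in text.split(separator) if part.strip()]


def split_on_headers(text: str) -> list[str]:
    """Start a new segment at every Markdown ATX header line (#, ##, ...)."""
    segments = re.split(r"(?m)^(?=#{1,6} )", text)
    return [segment for segment in segments if segment.strip()]


def chunk_plain_text(text: str, separator: str = "---\n") -> list[str]:
    """Hypothetical plain-text path: only the custom separator applies."""
    return split_on_separator(text, separator)


def chunk_markdown(text: str, separator: str = "---\n") -> list[str]:
    """Hypothetical Markdown path: header segmentation runs first,
    then the custom separator is applied inside each segment."""
    chunks: list[str] = []
    for segment in split_on_headers(text):
        chunks.extend(split_on_separator(segment, separator))
    return chunks


# Invented sample: three sections divided by the custom separator.
SAMPLE = (
    "# Release notes\n"
    "Intro text\n"
    "---\n"
    "## Fix A\n"
    "### Details\n"
    "Body A\n"
    "---\n"
    "## Fix B\n"
    "### Details\n"
    "Body B\n"
)

if __name__ == "__main__":
    print(len(chunk_plain_text(SAMPLE)), "chunks via the plain-text path")
    print(len(chunk_markdown(SAMPLE)), "chunks via the Markdown path")
```

On this sample the plain-text path yields 3 chunks and the Markdown path yields 5, because every header opens a new segment before the separator is ever consulted. A document with more headers would fragment further, which is consistent with the 10+ chunks reported for the MD upload.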
Thank you for your detailed explanation. I'll go with the file format workaround for now. @dosu
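If the file format workaround is used, one way to avoid hand-maintaining two copies is to keep the .md file as the only source and generate the .txt copy just before upload. The sketch below assumes the workaround amounts to uploading the same content with a .txt extension; `export_txt_copy` and the paths are hypothetical names, not part of Dify.

```python
from pathlib import Path
import shutil


def export_txt_copy(md_path, out_dir):
    """Copy a Markdown file to out_dir under a .txt extension.

    The content is copied byte for byte; only the extension changes,
    so the .md file stays the single source of truth.
    """
    src = Path(md_path)
    dest_dir = Path(out_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / (src.stem + ".txt")
    shutil.copyfile(src, dest)
    return dest


if __name__ == "__main__":
    # Hypothetical paths: adjust to your own layout.
    print(export_txt_copy("docs/vulnerability-fixes.md", "upload"))
```

Running this before each upload means the .txt file is always a throwaway artifact rather than a second document to keep in sync.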
Self Checks
Content
Issue Description
I'm experiencing inconsistent behavior with document chunking in Dify. The same document content, using the same custom separator `---\n`, produces different chunking results depending on the file format.
TXT format: Works correctly; splits into 3 chunks as expected (headers and two vulnerability fix sections)
MD format: Does NOT respect the separator; splits into 10+ small chunks
Document Structure Example
Expected Behavior
Both MD and TXT files should chunk identically when using the same custom separator `---\n`.
Actual Behavior
TXT file: correctly chunks into 3 sections based on `---\n`
MD file: ignores `---\n` and creates many smaller chunks (10+)
Question
Is there a way to make Markdown files respect the custom separator `---\n` the same way TXT files do?
Why This Matters
I want to avoid maintaining two separate copies of the same document locally (one .md and one .txt), which leads to synchronization issues and duplicated effort.
Environment
Dify version: 1.13.2
Deployment method: Docker
Additional Context
The separator `---` is also standard Markdown horizontal rule syntax. It seems Dify might be interpreting it differently in Markdown parsing vs. plain text parsing.