Help
Self-hosted Dify: No retrieval results from Knowledge Base A (200 docs) while Knowledge Base B (1 same doc) works with parent-child chunking & hybrid retrieval #33762
Hi @Misakamikoto527! I'm Dosu and I'm helping the Dify team.
This looks like a known issue with Dify's retrieval when you have 100+ documents. Based on my research, here are the most likely causes and solutions:
Most Likely Cause: 100+ Document Metadata Filter Limitation
When using metadata filtering with more than 100 documents, Dify constructs filters by chaining OR conditions for document IDs. With 101+ documents, this creates an overly large filter that causes protobuf serialization errors and returns empty results. This is a confirmed bug, particularly affecting Weaviate. A maintainer acknowledged this issue and committed to submitting a fix (as of December 30, 2025).
Other Contributing Factors:
Recommended Troubleshooting Steps:
Additional Info Needed:
If you're using Weaviate, the 100+ document filter limitation is very likely your root cause. Upgrading to the latest Dify version may help, as many critical retrieval fixes were merged in December 2025 - January 2026.
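To make the filter limitation described above concrete, here is a minimal Python sketch of how a chained OR filter grows with the number of document IDs, and how splitting the IDs into batches keeps each filter small. This is not Dify's or Weaviate's actual code: the dictionary shape, the function names `build_or_filter` and `batched_filters`, and the constant `MAX_IDS_PER_FILTER` are all hypothetical, and the limit of 100 is an assumption taken from the "100+ documents" description.

```python
from typing import Dict, Iterator, List

# Assumed limit, taken from the "100+ documents" description; not a documented constant.
MAX_IDS_PER_FILTER = 100


def build_or_filter(doc_ids: List[str]) -> Dict:
    """Build one filter that ORs together an equality condition per document ID.

    The dict layout is illustrative only; real vector stores use their own filter types.
    """
    return {
        "operator": "Or",
        "operands": [
            {"path": ["document_id"], "operator": "Equal", "value": doc_id}
            for doc_id in doc_ids
        ],
    }


def batched_filters(
    doc_ids: List[str], batch_size: int = MAX_IDS_PER_FILTER
) -> Iterator[Dict]:
    """Yield one OR filter per batch so that no single filter exceeds batch_size operands."""
    for start in range(0, len(doc_ids), batch_size):
        yield build_or_filter(doc_ids[start:start + batch_size])


# 200 documents, as in Knowledge Base A.
doc_ids = [f"doc-{i}" for i in range(200)]

single = build_or_filter(doc_ids)        # one filter with 200 operands
batches = list(batched_filters(doc_ids))  # two filters with 100 operands each

print(len(single["operands"]))
print([len(b["operands"]) for b in batches])
```

With batching, each filter would be sent as a separate query and the results merged afterwards. Whether the eventual fix in Dify takes this approach is not stated in this thread.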
Self Checks
1. Is this request related to a challenge you're experiencing? Tell me about your story.
I was trying to retrieve content from my knowledge bases using parent-child chunking and hybrid retrieval in my self-hosted Dify instance. I created two knowledge bases with identical chunking and model configurations:
Knowledge Base A: 200 documents uploaded, using parent-child chunking (parent: 600 tokens with separator \n\n, child: 200 tokens with separator 。 (Chinese full stop)), configured with bge-m3 embedding model and bge-rerank reranking model.
Knowledge Base B: 1 document uploaded (the exact same document was also uploaded to Knowledge Base A), with the same chunking and model settings as Knowledge Base A.
When I run the same query (targeting content from the document that exists in both knowledge bases):
In Knowledge Base B, the document is successfully retrieved using hybrid retrieval (topK = 5, threshold = 0.3).
In Knowledge Base A, no results are returned at all for the same query. I have tested many different queries related to all 200 documents in Knowledge Base A, and none of them return any results. I have tried full-text search, vector matching, and hybrid retrieval modes for Knowledge Base A, all without success.
This is frustrating because Knowledge Base B works correctly with the same document and configuration, but Knowledge Base A (with more documents) fails to retrieve any content.
2. Additional context or comments
Deployment type: Self-hosted local deployment
Chunking parameters:
Parent chunk: 600 tokens, separator \n\n
Child chunk: 200 tokens, separator 。 (Chinese full stop)
Models configured: bge-m3 (embedding), bge-rerank (reranking)
Retrieval settings tested: Hybrid retrieval (topK = 5, threshold = 0.3), full-text search, vector matching
Key observation: The problematic document exists in both knowledge bases, so the issue appears specific to Knowledge Base A (with 200 documents) rather than the document or configuration itself.
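For readers unfamiliar with the parent-child setup described above, the following is a rough Python sketch of two-level chunking with the same separators (`\n\n` for parents, `。` for children). It is not Dify's implementation: the function names are hypothetical, and it counts characters as a stand-in for tokens, so the 600 and 200 limits are only approximations of the real settings.

```python
from typing import Dict, List


def split_with_limit(text: str, separator: str, max_len: int) -> List[str]:
    """Split on `separator`, then greedily merge neighbouring pieces up to `max_len` characters.

    A single piece longer than `max_len` is kept as one oversized chunk.
    Separators at the very end of the text are dropped by the split.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= max_len or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


def parent_child_chunks(
    text: str, parent_len: int = 600, child_len: int = 200
) -> List[Dict]:
    """Split text into parent chunks on blank lines, then each parent into children on '。'."""
    return [
        {"parent": parent, "children": split_with_limit(parent, "。", child_len)}
        for parent in split_with_limit(text, "\n\n", parent_len)
    ]


# Small limits are used here only so that the short sample splits visibly.
sample = "第一句。第二句。\n\n第三句。"
for chunk in parent_child_chunks(sample, parent_len=10, child_len=4):
    print(repr(chunk["parent"]), chunk["children"])
```

In a parent-child scheme, the child chunks are typically what gets embedded and matched, while the parent chunk is returned as context, so an empty result for every query points at the index or the filter stage rather than at the chunk sizes themselves.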