[Help] Self-hosted Dify: No retrieval results from Knowledge Base A (200 docs) while Knowledge Base B (1 same doc) works with parent-child chunking & hybrid retrieval #33762

Misakamikoto527 · 2026-03-19T13:03:54Z


Self Checks

I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

I was trying to retrieve content from my knowledge bases using parent-child chunking and hybrid retrieval in my self-hosted Dify instance. I created two knowledge bases with identical chunking and model configurations:
Knowledge Base A: 200 documents uploaded, using parent-child chunking (parent: 600 tokens with separator \n\n, child: 200 tokens with separator 。 (Chinese full stop)), configured with bge-m3 embedding model and bge-rerank reranking model.
Knowledge Base B: 1 document uploaded (the exact same document was also uploaded to Knowledge Base A), with the same chunking and model settings as Knowledge Base A.
When I run the same query (targeting content from the document that exists in both knowledge bases):
In Knowledge Base B, the document is successfully retrieved using hybrid retrieval (topK = 5, threshold = 0.3).
In Knowledge Base A, no results are returned at all for the same query. I have tested many different queries related to all 200 documents in Knowledge Base A, and none of them return any results. I have tried full-text search, vector matching, and hybrid retrieval modes for Knowledge Base A, all without success.
This is frustrating because Knowledge Base B works correctly with the same document and configuration, but Knowledge Base A (with more documents) fails to retrieve any content.
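For readers unfamiliar with the setup, here is a minimal sketch of what parent-child chunking with these separators does. It is not Dify's implementation: the function names are made up for illustration, and lengths are counted in characters rather than tokens to keep the example dependency-free.

```python
def split_with_limit(text: str, separator: str, max_len: int) -> list[str]:
    """Split on `separator`, then hard-wrap any piece longer than `max_len`.

    Assumption: length is measured in characters, not tokens.
    """
    pieces = []
    for part in text.split(separator):
        part = part.strip()
        if not part:
            continue
        # Hard-wrap oversized pieces so no chunk exceeds max_len.
        for start in range(0, len(part), max_len):
            pieces.append(part[start:start + max_len])
    return pieces


def parent_child_chunks(text: str, parent_max: int = 600, child_max: int = 200) -> list[dict]:
    """Build parent chunks (split on blank lines) and their child chunks (split on 。)."""
    result = []
    for parent in split_with_limit(text, "\n\n", parent_max):
        children = split_with_limit(parent, "。", child_max)
        result.append({"parent": parent, "children": children})
    return result


doc = "第一句。第二句。\n\n第三句。"
chunks = parent_child_chunks(doc)
```

In a setup like this, the small child chunks are what get embedded and matched, while the larger parent chunk supplies the surrounding context that is returned.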

2. Additional context or comments

Deployment type: Self-hosted local deployment
Chunking parameters:
Parent chunk: 600 tokens, separator \n\n
Child chunk: 200 tokens, separator 。 (Chinese full stop)
Models configured: bge-m3 (embedding), bge-rerank (reranking)
Retrieval settings tested: Hybrid retrieval (topK = 5, threshold = 0.3), full-text search, vector matching
Key observation: The problematic document exists in both knowledge bases, so the issue appears specific to Knowledge Base A (with 200 documents) rather than the document or configuration itself.
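As a generic illustration of how the topK and threshold settings interact, the sketch below filters a list of scored hits. The function name and the sample scores are invented, and it does not claim to match the order in which Dify applies reranking and thresholds.

```python
def apply_retrieval_settings(scored_hits, top_k=5, score_threshold=0.3):
    """Keep hits scoring at or above the threshold, best first, at most top_k."""
    kept = [hit for hit in scored_hits if hit[1] >= score_threshold]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:top_k]


# Made-up candidates: all of them score below 0.3.
hits = [("chunk-a", 0.25), ("chunk-b", 0.10), ("chunk-c", 0.29)]

filtered = apply_retrieval_settings(hits)                         # threshold 0.3
unfiltered = apply_retrieval_settings(hits, score_threshold=0.0)  # threshold off
```

With the threshold at 0.3 every candidate is dropped, while a threshold of 0 returns them all. Comparing the two is one way to tell "nothing was retrieved" apart from "everything was filtered out".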

@Misakamikoto527 · 2026-03-19T13:17:58Z

dosubot[bot]
Bot Mar 19, 2026

Hi @Misakamikoto527! I'm Dosu, and I'm helping the Dify team.

This looks like a known issue with Dify's retrieval when you have 100+ documents. Based on my research, here are the most likely causes and solutions:

Most Likely Cause: 100+ Document Metadata Filter Limitation

When using metadata filtering with more than 100 documents, Dify constructs filters by chaining OR conditions for document IDs. With 101+ documents, this creates an overly large filter that causes protobuf serialization errors and returns empty results. This is a confirmed bug, particularly affecting Weaviate. A maintainer acknowledged this issue and committed to submitting a fix (as of December 30, 2025).

Other Contributing Factors:

Stale index issues - Hybrid retrieval with bge-m3 and bge-rerank can suffer from stale index problems after incremental updates, returning fewer results than expected
Connection pool exhaustion - Parent-child chunking with 100+ documents can cause QueuePool exhaustion, leaving documents incompletely indexed
PostgreSQL timeouts - Large embedding tasks can trigger idle-in-transaction timeouts causing indexing failures

Recommended Troubleshooting Steps:

Check your backend logs for QueuePool limit, protobuf.message.DecodeError, or PostgreSQL timeout errors
Verify all 200 documents show "completed" indexing status - some may be stuck in "queuing" or "indexing"
Test retrieval without reranking - temporarily disable bge-rerank and set score_threshold to 0 to isolate the issue
Perform a full reindex - export your data, delete Knowledge Base A, and reimport in smaller batches (50 documents at a time)
Try vector search only (not hybrid) to determine if the issue is specific to hybrid retrieval
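The log check in the first step can be sketched as a small scanner. The patterns below are assumptions based on the error names mentioned above; the exact wording in your logs may differ, and the sample lines are made up for the example.

```python
import re

# Assumed substrings for the three error families named in the steps above.
ERROR_PATTERNS = {
    "queue_pool": re.compile(r"QueuePool limit"),
    "protobuf_decode": re.compile(r"DecodeError"),
    "pg_timeout": re.compile(r"idle-in-transaction", re.IGNORECASE),
}


def scan_log_lines(lines):
    """Group log lines by which known error pattern they match."""
    found = {name: [] for name in ERROR_PATTERNS}
    for line in lines:
        for name, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                found[name].append(line.rstrip("\n"))
    return found


# Made-up sample lines, only to show the shape of the result.
sample = [
    "INFO retrieval request received\n",
    "sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached\n",
    "google.protobuf.message.DecodeError: Error parsing message\n",
]
report = scan_log_lines(sample)
```

To run it against a real log, pass an open file object, since iterating over a file yields its lines; the log path depends on your deployment.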

Additional Info Needed:

Which vector database are you using (Weaviate, Milvus, Elasticsearch, pgvector, etc.)?
What Dify version are you running?
Do any error messages appear in your backend logs during retrieval?

If you're using Weaviate, the 100+ document filter limitation is very likely your root cause. Upgrading to the latest Dify version may help, as many critical retrieval fixes were merged in December 2025 - January 2026.

To reply, just mention @dosu.

Docs are dead. Just use Dosu.


Misakamikoto527 Mar 19, 2026
Author

@dosu,
Here is the additional information you requested:

Vector database: Weaviate
Dify version: 0.15.3
Indexing status: All 200 documents in Knowledge Base A have been successfully indexed and show a "completed" status (no documents are stuck in "queuing" or "indexing" state).

I will follow your recommended troubleshooting steps next, especially performing a full reindex by reimporting the documents in smaller batches. I'll update you if I get any results or encounter new issues!

dosubot[bot] Bot Mar 19, 2026

Thanks for the additional details! Since you're using Weaviate with 200 documents, this strongly points to the 100+ document metadata filter limitation as the root cause. This is a confirmed bug where Dify chains OR conditions for document ID filtering, and with 101+ documents, Weaviate's protobuf serialization fails, returning empty results.

Dify 0.15.3 is quite outdated (released in early 2025). Many critical fixes for retrieval, parent-child chunking, and this specific Weaviate issue were merged in late 2025 and early 2026. I'd strongly recommend upgrading to the latest version before doing the full reindex, as:

The 100+ document filter fix was committed after December 2025
Multiple performance optimizations (100-1000x improvements for large datasets) were added
Several parent-child chunking and hybrid retrieval bugs were resolved

If upgrading isn't feasible right now, splitting the documents across several knowledge bases (for example, 50 documents each) should work as a workaround, since each knowledge base then stays under the 100-document limit.
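A minimal sketch of that split, assuming you have a list of file names to upload. The helper name and the file names are hypothetical, and the upload itself is left out.

```python
def split_into_batches(items, batch_size=50):
    """Return consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical file names standing in for the 200 documents.
documents = [f"doc_{i:03d}.txt" for i in range(200)]
batches = split_into_batches(documents)
```

Each of the four resulting batches could then go into its own knowledge base, which keeps every knowledge base below 100 documents.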

Good luck with the troubleshooting, and please do share your results!


Misakamikoto527 Mar 19, 2026
Author

Hi @dosu, thanks a lot for the detailed explanation and suggestions.

If I upgrade to Dify version 1.13.1 while still using Weaviate as my vector database, will I be able to perform normal retrieval after uploading 200 documents into a single knowledge base? I'd like to confirm if this version already includes the fix for the 100+ document metadata filter limitation issue.

Looking forward to your confirmation!

dosubot[bot] Bot Mar 19, 2026

Great news! Yes, Dify 1.13.1 includes the fix for the 100+ document metadata filter limitation with Weaviate.

PR #30559 was merged on January 5, 2026 and is included in version 1.13.1. The fix replaced the problematic OR-chaining approach with Weaviate's contains_any method:

# Before (caused protobuf errors with 101+ docs):
where = where | Filter.by_property("document_id").equal(doc_id)

# After (fixed):
where = Filter.by_property("document_id").contains_any(doc_ids)

After upgrading to 1.13.1, you should be able to upload 200 documents into a single knowledge base and perform normal retrieval without the empty results issue. However, I'd still recommend doing a full reindex of your knowledge base after the upgrade to ensure the new indexing logic is applied cleanly to all documents.
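To make the structural difference between the two filter styles concrete, here is a dependency-free sketch. The classes are stand-ins invented for illustration, not the Weaviate client's types, and the sketch only shows how the filter tree grows; it does not reproduce the protobuf error itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Equal:
    prop: str
    value: str


@dataclass(frozen=True)
class Or:
    left: object
    right: object


@dataclass(frozen=True)
class ContainsAny:
    prop: str
    values: tuple


def chain_or(doc_ids):
    """Mimic `where = where | equal(doc_id)`: one extra nesting level per ID."""
    where = None
    for doc_id in doc_ids:
        condition = Equal("document_id", doc_id)
        where = condition if where is None else Or(where, condition)
    return where


def depth(node):
    """Nesting depth of a filter tree (a leaf counts as 1)."""
    if isinstance(node, Or):
        return 1 + max(depth(node.left), depth(node.right))
    return 1


doc_ids = [f"doc-{i}" for i in range(200)]  # made-up IDs
chained_depth = depth(chain_or(doc_ids))
flat_depth = depth(ContainsAny("document_id", tuple(doc_ids)))
```

For 200 IDs the chained filter nests 200 levels deep, while the `contains_any` style stays a single flat condition regardless of how many IDs it carries.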

Let me know how it goes!


SHL185416 · 2026-03-27T06:46:06Z

SHL185416
Mar 27, 2026

Dify 1.13.0: Only one child chunk returned when multiple documents in the same knowledge base use parent-child chunking and share identical child chunks

I’ve run into the same issue:

In Dify 1.13.0 and 1.13.2, if several documents in the same knowledge base use parent-child chunking and contain identical child chunks, only one of those child chunks is returned during retrieval.
However, knowledge bases migrated from Dify 1.9.0 to 1.13.0 do not exhibit this problem.

Dify 1.13.0/1.13.2 + newly-created knowledge base: only one child chunk returned

Problem Screenshot

Both a.txt and b.txt contain the term 客座率 (passenger load factor), but only the chunk from b.txt is returned.

Dify 1.13.0 + Migrated Knowledge Base: Multiple child chunks returned correctly
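The symptom is what you would see if results were deduplicated by chunk text alone. The sketch below is purely hypothetical and not taken from Dify's source; the function names and data layout are invented to show how the choice of deduplication key changes the outcome.

```python
def dedupe_by_content(chunks):
    """Hypothetical: key on text only, so identical text from different files collapses."""
    seen = {}
    for chunk in chunks:
        seen[chunk["content"]] = chunk  # a later chunk overwrites an earlier one
    return list(seen.values())


def dedupe_by_document_and_content(chunks):
    """Hypothetical: key on (document, text), so each file keeps its own chunk."""
    seen = {}
    for chunk in chunks:
        seen[(chunk["document"], chunk["content"])] = chunk
    return list(seen.values())


hits = [
    {"document": "a.txt", "content": "客座率"},
    {"document": "b.txt", "content": "客座率"},
]
collapsed = [c["document"] for c in dedupe_by_content(hits)]
separate = [c["document"] for c in dedupe_by_document_and_content(hits)]
```

Keyed on text alone, only the b.txt chunk survives, which resembles the first screenshot. Keyed on document plus text, both files are returned. Whether Dify actually behaves this way would need to be confirmed against the code.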

