Do you need to file an issue?
Describe the bug
When using GraphRAG chunking with prepend_metadata: true, metadata (e.g., title) is prepended only for the first document in a chunk; it is dropped for every subsequent document when multiple small documents are grouped into a single chunk.
This happens when individual document contents are smaller than the configured chunk size, causing GraphRAG to merge multiple documents into one chunk.
Current behavior:
- Multiple documents with small content are combined into a single chunk.
- Metadata (such as title) is prepended only once, corresponding to the first document.
- Content from subsequent documents appears in the same chunk without their associated metadata.
- As a result, chunk-to-document attribution becomes ambiguous or incorrect.
Expected behavior:
One of the following (or an equivalent deterministic behavior):
- Each document’s metadata should be prepended before its respective content, even if multiple documents share the same chunk.
OR
- Documents should not be merged into a single chunk when prepend_metadata: true, ensuring metadata consistency.
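To make the difference concrete, here is a minimal sketch in Python. It does not call GraphRAG's chunker or reproduce its code. `Doc`, `chunk_prepend_once`, and `chunk_prepend_per_doc` are hypothetical names, tokens are approximated by whitespace-split words, the `title: ...` header format is an assumption, and every document is assumed to be smaller than the chunk size.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    text: str


def chunk_prepend_once(docs, size):
    """Mimic the reported behavior: small docs are merged and only the
    first document's metadata is prepended to each chunk."""
    chunks, tokens, header = [], [], None
    for doc in docs:
        words = doc.text.split()
        # Close the current chunk if this document would overflow it.
        if tokens and len(tokens) + len(words) > size:
            chunks.append(f"{header}\n{' '.join(tokens)}")
            tokens, header = [], None
        if header is None:
            header = f"title: {doc.title}"  # assumed header format
        tokens.extend(words)
    if tokens:
        chunks.append(f"{header}\n{' '.join(tokens)}")
    return chunks


def chunk_prepend_per_doc(docs, size):
    """Sketch of the first expected behavior: each document's metadata is
    prepended before its own content, even inside a shared chunk."""
    chunks, parts, used = [], [], 0
    for doc in docs:
        n = len(doc.text.split())
        if parts and used + n > size:
            chunks.append("\n".join(parts))
            parts, used = [], 0
        parts.append(f"title: {doc.title}\n{doc.text}")
        used += n
    if parts:
        chunks.append("\n".join(parts))
    return chunks
```

With three two-word documents titled A, B, and C and a chunk size of 4, the first function produces a chunk that contains B's text under A's title only, while the second keeps a `title: B` line in front of B's text.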
Steps to reproduce
No response
Expected Behavior
No response
GraphRAG Config Used
requests_per_minute: 200 # auto = 1 rpm. set to null to disable rate limiting
### Input settings ###
input:
  storage:
    type: file # or blob
    base_dir: PLACEHOLDER
  file_pattern: PLACEHOLDER
  metadata:
    - title
chunks:
  size: PLACEHOLDER
  overlap: PLACEHOLDER
  encoding_model: cl100k_base
  group_by_columns: [id]
  prepend_metadata: true
Logs and screenshots
No response
Additional Information
- GraphRAG Version:
- Operating System:
- Python Version:
- Related Issues:
@natoverse