Description
When I run the Pipeline() in a loop over multiple documents, each run creates a Chunk node with an id property of ":1" and an index of 1. This causes problems, because the ids are no longer unique across documents.
For example, when the lexical graph gets created, a Chunk node with an id of ":1" has a NEXT_NODE relation to every Chunk node that has an id of ":2".
After running the pipeline with 4 documents, it looks like this:
The same issue occurs with FROM_CHUNK: an entity that is supposed to have a relation like (n:Entity)-[:FROM_CHUNK]->(c:Chunk {id: ":1", index: "1"}) to one chunk actually has that relation to every document's chunk with an index of 1.
Is there any workaround for this?
I'm guessing this issue would be solved if I could somehow pass a document-specific id_prefix, so that each chunk gets a unique id?
neo4j-graphrag-python/src/neo4j_graphrag/experimental/components/lexical_graph.py
Lines 78 to 79 in bc6dd9c
def chunk_id(self, chunk_index: int) -> str:
    return f"{self.config.id_prefix}:{chunk_index}"
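To illustrate the collision and the proposed fix, here is a minimal standalone sketch. It does not call the library: `chunk_id` below is a free-function copy of the format string quoted above, while `document_prefix` and the document names are hypothetical and exist only for this example. It assumes the prefix is empty by default, which is what ids like ":1" suggest.

```python
import hashlib


def chunk_id(id_prefix: str, chunk_index: int) -> str:
    # Same format string as the quoted method, written as a free function.
    return f"{id_prefix}:{chunk_index}"


def document_prefix(document_path: str) -> str:
    # Hypothetical helper: a short, stable prefix derived from the document path.
    return hashlib.sha1(document_path.encode("utf-8")).hexdigest()[:8]


# Placeholder document names, one per pipeline run.
documents = ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]
chunks_per_document = 3

# Empty prefix: every document produces the same ids (":0", ":1", ":2"),
# so chunks from different documents collapse onto the same id values.
shared_ids = {
    chunk_id("", index)
    for _ in documents
    for index in range(chunks_per_document)
}

# Document-specific prefix: each (document, index) pair gets its own id.
unique_ids = {
    chunk_id(document_prefix(path), index)
    for path in documents
    for index in range(chunks_per_document)
}

print(len(shared_ids), len(unique_ids))
```

With the empty prefix, the 12 chunks share only 3 distinct ids, which matches the behaviour described above. With a per-document prefix, all 12 ids are distinct. Whether the pipeline lets me set that prefix per run is exactly what I'm asking.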
Additional info:
I use v1.2.0.
I have a standard pipeline setup that has these components.
pipe = Pipeline()
# skipping the config code
pipe.add_component(text_splitter, "splitter")
pipe.add_component(embedder, "chunk_embedder")
pipe.add_component(schema_builder, "schema")
pipe.add_component(extractor, "extractor")
pipe.add_component(writer, "writer")
pipe.add_component(resolver, "resolver")