Pipeline creates Chunks with duplicate ids when executed multiple times

When I run `Pipeline()` in a loop over multiple documents, each run creates a Chunk node with an `id` property of `":1"` and an index of `1`. This causes problems because the ids are no longer unique across documents.

For example, when the lexical graph is created, a Chunk node with an id of `":1"` gets a NEXT_NODE relationship to every Chunk node that has an id of `":2"`.

After running the pipeline with 4 documents, it looks like this:

<img width="895" alt="Neo4j_Aura" src="https://github.com/user-attachments/assets/0d767ee6-6333-4dfa-9322-ed5b5588f8bf">

The same issue occurs with FROM_CHUNK: an entity that is supposed to have a single relationship like `(n:Entity)-[:FROM_CHUNK]->(c:Chunk {id: ":1", index: "1"})` is instead connected to the index-1 chunk of every document.
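To make the fan-out concrete, here is a small standalone simulation in plain Python. It is not the library's write query: it assumes relationships are attached by matching nodes on their `id` property, which is my reading of the behaviour described above. The helper names `build_chunks` and `link_by_id` are made up for illustration.

```python
def build_chunks(prefixes, chunks_per_doc=3):
    """One node per (document, index); ids follow the '<prefix>:<index>' shape."""
    return [
        {"doc": doc, "index": i, "id": f"{prefix}:{i}"}
        for doc, prefix in enumerate(prefixes)
        for i in range(chunks_per_doc)
    ]


def link_by_id(chunks, source_id, target_id):
    """Pair every node whose id is source_id with every node whose id is target_id."""
    sources = [c for c in chunks if c["id"] == source_id]
    targets = [c for c in chunks if c["id"] == target_id]
    return [(s["doc"], t["doc"]) for s in sources for t in targets]


# Four documents, all with an empty prefix: every document has a ":1" and a ":2".
shared = build_chunks(["", "", "", ""])
shared_links = link_by_id(shared, ":1", ":2")
print(len(shared_links))  # 16 links instead of the intended 4

# Four documents with distinct (hypothetical) prefixes: ids no longer collide.
unique = build_chunks(["doc0", "doc1", "doc2", "doc3"])
unique_links = link_by_id(unique, "doc0:1", "doc0:2")
print(len(unique_links))  # 1 link, within a single document
```

With an empty prefix, the match on `":1"` and `":2"` pairs chunks across documents, which is the same shape as the graph in the screenshot.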

Is there any workaround for this?
I'm guessing this would be solved if I could somehow pass a document-specific `id_prefix`, so that each chunk gets a unique id.

https://github.com/neo4j/neo4j-graphrag-python/blob/bc6dd9c7b3f8fcfffb9ed360648ea80c6cbb17dc/src/neo4j_graphrag/experimental/components/lexical_graph.py#L78-L79 
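A minimal sketch of what I have in mind, assuming chunk ids are built as `<id_prefix>:<chunk_index>` as the linked lines suggest. `SketchConfig` and `prefix_for` are hypothetical stand-ins, not part of the library, and I have not verified a supported way to pass a per-document prefix into the pipeline in v1.2.0.

```python
import uuid
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the lexical graph config; only id_prefix matters here."""
    id_prefix: str = ""


def chunk_id(config: SketchConfig, chunk_index: int) -> str:
    # Same "<prefix>:<index>" shape as the linked lines.
    return f"{config.id_prefix}:{chunk_index}"


def prefix_for(document_path: str) -> str:
    # Deterministic: the same path always maps to the same prefix,
    # so re-running one document would reuse its ids rather than add new ones.
    return uuid.uuid5(uuid.NAMESPACE_URL, document_path).hex


documents = ["reports/a.pdf", "reports/b.pdf"]  # example paths, not from my setup

# Empty prefix: chunk 1 of every document gets the same id.
default_ids = [chunk_id(SketchConfig(), 1) for _ in documents]

# Per-document prefix: chunk 1 of each document gets its own id.
prefixed_ids = [chunk_id(SketchConfig(id_prefix=prefix_for(d)), 1) for d in documents]

print(default_ids)             # [':1', ':1']
print(len(set(prefixed_ids)))  # 2
```

A random `uuid.uuid4()` per run would also avoid collisions, but a prefix derived from the document path keeps ids stable when the same document is processed again.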

Additional info:
I use v1.2.0.
I have a standard pipeline setup with these components:
```python
pipe = Pipeline()
# skipping the config code
pipe.add_component(text_splitter, "splitter")
pipe.add_component(embedder, "chunk_embedder")
pipe.add_component(schema_builder, "schema")
pipe.add_component(extractor, "extractor")
pipe.add_component(writer, "writer")
pipe.add_component(resolver, "resolver")
```

The lines referenced by the `lexical_graph.py` permalink above:

```python
def chunk_id(self, chunk_index: int) -> str:
    return f"{self.config.id_prefix}:{chunk_index}"
```
