Parallel Label Propagation suffering from stale reads

LP needs to find the partition key for each connected node in order to calculate the new partition key. The partition is loaded first from the current set of calculated partitions and falls back to the partition property at load-time and finally to the node id. The current set of calculated partitions is thread-local and not shared. If separate threads calculate the partition keys for two nodes that are connected, at least one thread reads outdated data.

```cypher
CREATE (a:L {partition: 1})
CREATE (b:L {partition: 2})
CREATE (a)-[:X]->(b)
CREATE (b)-[:X]->(a)
```

While Thread1 calculates `a`, it will assign partition 2 from `b` to `a`.
While Thread2 calculates `b`, it will assign partition 1 from `a` to `b`.
When the individual results are merged, the result is  `(a {partition: 2})` and `(b {partition: 1})` which is wrong.

If those are run sequentially instead, the result is either `(a {partition: 1})` and `(b {partition: 1})` or `(a {partition: 2})` and `(b {partition: 2})`, depending on which node you calculate first.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Parallel Label Propagation suffering from stale reads #270

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Parallel Label Propagation suffering from stale reads #270

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions