Skip to content
This repository was archived by the owner on Apr 22, 2020. It is now read-only.
This repository was archived by the owner on Apr 22, 2020. It is now read-only.

Parallel Label Propagation suffering from stale reads #270

@knutwalker

Description

@knutwalker

LP needs to find the partition key for each connected node in order to calculate the new partition key. The partition is loaded first from the current set of calculated partitions and falls back to the partition property at load-time and finally to the node id. The current set of calculated partitions is thread-local and not shared. If separate threads calculate the partition keys for two nodes that are connected, at least one thread reads outdated data.

CREATE (a:L {partition: 1})
CREATE (b:L {partition: 2})
CREATE (a)-[:X]->(b)
CREATE (b)-[:X]->(a)

While Thread1 calculates a, it will assign partition 2 from b to a.
While Thread2 calculates b, it will assign partition 1 from a to b.
When the individual results are merged, the result is (a {partition: 2}) and (b {partition: 1}) which is wrong.

If those are run sequentially instead, the result is either (a {partition: 1}) and (b {partition: 1}) or (a {partition: 2}) and (b {partition: 2}), depending on which node you calculate first.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions