[EdwwgWgR] Stream schema results in batches of batchSize #435
Conversation
Overall, it looks fine. I just had two small concerns, plus some further clean-up that could be done if you like.
Resolved review threads on common/src/main/java/apoc/export/cypher/formatter/AbstractCypherFormatter.java (three threads, two marked outdated).
@@ -388,11 +389,10 @@ private long countArtificialUniques(Iterable<Node> n) {
     private long getArtificialUniques(Node node, long artificialUniques) {
         Iterator<Label> labels = node.getLabels().iterator();
         boolean uniqueFound = false;
-        while (labels.hasNext()) {
+        while (labels.hasNext() && !uniqueFound) {
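As a standalone illustration of the fix above, here is a minimal sketch of the short-circuiting loop. This is not APOC's actual code: the label lists and the `uniqueLabels` set are hypothetical stand-ins for the node labels and the labels covered by unique constraints.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class UniqueCount {
    // Count nodes that need an artificial unique key, i.e. nodes where no
    // label has a unique constraint. The `&& !uniqueFound` condition stops
    // scanning a node's labels as soon as one unique label is found.
    static long countArtificialUniques(List<List<String>> nodesLabels, Set<String> uniqueLabels) {
        long artificialUniques = 0;
        for (List<String> labels : nodesLabels) {
            Iterator<String> it = labels.iterator();
            boolean uniqueFound = false;
            while (it.hasNext() && !uniqueFound) { // short-circuit: the fix
                uniqueFound = uniqueLabels.contains(it.next());
            }
            if (!uniqueFound) {
                artificialUniques++; // no unique label on this node
            }
        }
        return artificialUniques;
    }
}
```

Without the extra condition the result is the same, but every label of every node is visited even after a unique label has been found, which is the (mostly performance) bug discussed above.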
Was this bug fix only a speed-up or can we add a test for it?
It was both, I guess. I added it for the speed, although the bug wouldn't have affected the correctness of the outputted queries. I can add a test for it though :)
Force-pushed from f86d73e to 7fade28.
The nodes and relationships weren't streaming in batches, which caused an out-of-memory error as APOC tried to hold all the nodes/rels in memory.
Differences:
- As we take nodes sequentially, there may be more groups of unwinds: previously all nodes of a label were found and emitted in one unwind, whereas now they may end up in different statements due to batching (similar for rels).
- There was a check for rels that assumed we knew all rels and could optimise whether or not to filter on id; however, a batch only knows about that batch and not all rels, so that assumption no longer holds.
- Schema will stream before moving on to nodes. By the looks of it this was already intended; the call to accept the batch was just missed.
- There was a bug around the counting of needed unique labels, as the loop didn't stop when it hit true (found unique). This has been corrected.
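The batching idea behind the first and third points above can be sketched as follows. This is a minimal, hypothetical illustration rather than APOC's actual implementation: the entity iterator is consumed in chunks of batchSize and each chunk is handed to a consumer (echoing the "accept the batch" call) as soon as it is full, so nothing requires holding all nodes/rels in memory at once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class Batcher {
    // Consume `source` in chunks of `batchSize`, passing each full chunk to
    // `accept` immediately instead of materialising the whole stream first.
    static <T> void streamInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> accept) {
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                accept.accept(batch);               // emit as soon as the batch is full
                batch = new ArrayList<>(batchSize); // start a fresh batch
            }
        }
        if (!batch.isEmpty()) {
            accept.accept(batch); // don't drop the trailing partial batch
        }
    }
}
```

Because entities are taken in arrival order, two nodes with the same label can fall into different batches, and hence into different UNWIND statements, which is exactly the behavioural difference noted in the first bullet.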
I didn't add an out-of-memory test as I felt it would just slow our testing down. But I did test this manually on a graph with roughly 2.5 million nodes and 2 million rels (2x bigger than the db this was reported to be failing on). The version without the fix doesn't work; the version with the fix does :)