[EdwwgWgR] Stream schema results in batches of batchSize #435
Conversation
Overall, it looks fine. I just had two small concerns, plus some further clean-up that could be done if you like.
Resolved review threads on common/src/main/java/apoc/export/cypher/formatter/AbstractCypherFormatter.java (three threads, two marked outdated).
@@ -388,11 +389,10 @@ private long countArtificialUniques(Iterable<Node> n) {
     private long getArtificialUniques(Node node, long artificialUniques) {
         Iterator<Label> labels = node.getLabels().iterator();
         boolean uniqueFound = false;
-        while (labels.hasNext()) {
+        while (labels.hasNext() && !uniqueFound) {
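As a standalone illustration of the fix above, here is a minimal sketch of the short-circuiting loop. This is not APOC's actual code: the label lists and the `uniqueLabels` set are hypothetical stand-ins for the node labels and the labels covered by unique constraints.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class UniqueCount {
    // Count nodes that need an artificial unique key, i.e. nodes where no
    // label has a unique constraint. The `&& !uniqueFound` condition stops
    // scanning a node's labels as soon as one unique label is found.
    static long countArtificialUniques(List<List<String>> nodesLabels, Set<String> uniqueLabels) {
        long artificialUniques = 0;
        for (List<String> labels : nodesLabels) {
            Iterator<String> it = labels.iterator();
            boolean uniqueFound = false;
            while (it.hasNext() && !uniqueFound) { // short-circuit: the fix
                uniqueFound = uniqueLabels.contains(it.next());
            }
            if (!uniqueFound) {
                artificialUniques++; // no unique label on this node
            }
        }
        return artificialUniques;
    }
}
```

Without the extra condition the result is the same, but every label of every node is visited even after a unique label has been found, which is the (mostly performance) bug discussed above.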
Was this bug fix only a speed-up or can we add a test for it?
It was both, I guess. I added it for the speed, although the bug wouldn't have affected the correctness of the outputted queries. I can add a test for it though :)
Force-pushed from f86d73e to 7fade28.
The nodes and relationships weren't streaming in batches, which caused an out-of-memory error as APOC tried to hold all the nodes/rels in memory.
Differences:
- As we take nodes sequentially, there may be more groups of unwinds: previously all nodes of a label were found and emitted in one unwind, whereas now they may end up in different statements due to batching (similar for rels).
- There was a check for rels that assumed we knew all rels and could optimise whether or not to filter on id; however, a batch only knows about that batch and not all rels, so that assumption no longer holds.
- Schema will stream before moving on to nodes. By the looks of it this was already intended; the call to accept the batch was just missed.
- There was a bug around the counting of needed unique labels, as the loop didn't stop when it hit true (found unique). This has been corrected.
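The batching idea behind the first and third points above can be sketched as follows. This is a minimal, hypothetical illustration rather than APOC's actual implementation: the entity iterator is consumed in chunks of batchSize and each chunk is handed to a consumer (echoing the "accept the batch" call) as soon as it is full, so nothing requires holding all nodes/rels in memory at once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class Batcher {
    // Consume `source` in chunks of `batchSize`, passing each full chunk to
    // `accept` immediately instead of materialising the whole stream first.
    static <T> void streamInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> accept) {
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                accept.accept(batch);               // emit as soon as the batch is full
                batch = new ArrayList<>(batchSize); // start a fresh batch
            }
        }
        if (!batch.isEmpty()) {
            accept.accept(batch); // don't drop the trailing partial batch
        }
    }
}
```

Because entities are taken in arrival order, two nodes with the same label can fall into different batches, and hence into different UNWIND statements, which is exactly the behavioural difference noted in the first bullet.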
I didn't add an out-of-memory test as I felt it would just slow our testing down. But I did test this manually on a graph with roughly 2.5 million nodes and 2 million rels (2x bigger than the db this was reported to be failing on). The version without the fix doesn't work; the version with the fix does :)