[EdwwgWgR] Stream schema results in batches of batchSize #435

Merged
merged 2 commits on Jun 13, 2023

Conversation

gem-neo4j
Contributor

The nodes and relationships weren't streaming in batches, which caused an out-of-memory error because APOC tried to hold all the nodes/rels in memory.

Differences:

As we take nodes sequentially, there may be more groups of UNWINDs: previously all nodes of a given label were found and emitted in one UNWIND, whereas now they may end up in different statements due to batching (similar for rels)
There was a check for rels that assumed we knew all rels and could optimize on whether or not to filter on id; however, a batch only knows about that batch and not all rels, so that assumption no longer holds
Schema will stream before moving on to nodes; by the looks of it this was already intended, the call to accept the batch was just missing
There was a bug around the counting of needed unique labels, as the loop didn't stop once it found a unique one; this has been corrected
I didn't add an out-of-memory test as I felt it would just slow our testing down. But I did test this manually on a graph of roughly 2.5 million nodes and 2 million rels (2x bigger than the db this was reported to be failing on). The version without the fix didn't work, the version with the fix did :)
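
For context, the shape of the fix is the usual batching pattern: emit each chunk as soon as it reaches batchSize instead of accumulating everything first, so only one batch is held in memory at a time. A minimal sketch of that pattern follows (not the actual APOC code; streamInBatches and the sink consumer are made up for illustration):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.function.Consumer;

    class BatchingSketch {
        // Hand each chunk of batchSize items to the sink as soon as it is full,
        // so at most one batch is ever held in memory at a time.
        static <T> void streamInBatches(Iterator<T> items, int batchSize, Consumer<List<T>> sink) {
            List<T> batch = new ArrayList<>(batchSize);
            while (items.hasNext()) {
                batch.add(items.next());
                if (batch.size() == batchSize) {
                    sink.accept(batch);                 // emit the full batch immediately
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
            if (!batch.isEmpty()) {
                sink.accept(batch);                     // emit the final partial batch
            }
        }
    }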

Contributor

@Lojjs Lojjs left a comment


Overall, it looks fine. I just had two small concerns and some further clean-up that could be done if you like.

@@ -388,11 +389,10 @@ private long countArtificialUniques(Iterable<Node> n) {
     private long getArtificialUniques(Node node, long artificialUniques) {
         Iterator<Label> labels = node.getLabels().iterator();
         boolean uniqueFound = false;
-        while (labels.hasNext()) {
+        while (labels.hasNext() && !uniqueFound) {
Contributor


Was this bug fix only a speed-up or can we add a test for it?

Contributor Author


It was both, I guess; I added it for the speed, although the bug wouldn't have affected the correctness of the outputted queries. I can add a test for it though :)
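
For reference, here is a hedged sketch of why the early exit is only a speed-up, assuming (as the reply above suggests) the count depends only on whether any of the node's labels has a unique constraint, not on how many; hasUniqueConstraint is a hypothetical stand-in for the real schema lookup:

    private long getArtificialUniques(Node node, long artificialUniques) {
        Iterator<Label> labels = node.getLabels().iterator();
        boolean uniqueFound = false;
        // Once uniqueFound flips to true it can never flip back, so stopping the
        // loop at that point changes only the amount of work done, not the result.
        while (labels.hasNext() && !uniqueFound) {
            if (hasUniqueConstraint(labels.next())) { // hypothetical schema lookup
                uniqueFound = true;
            }
        }
        if (!uniqueFound) {
            artificialUniques++; // no unique label found, so an artificial unique is needed
        }
        return artificialUniques;
    }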
