Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[master < T521-FL] Fix nested FOREACH variable shadowing bug #725

Merged
merged 16 commits into from
Jan 25, 2023

Conversation

brunos252
Copy link
Contributor

@brunos252 brunos252 commented Dec 21, 2022

[master < Task] PR

  • Check, and update documentation if necessary
  • Update changelog
  • Provide the full content or a guide for the final git message

@brunos252 brunos252 self-assigned this Dec 21, 2022
@brunos252 brunos252 linked an issue Dec 22, 2022 that may be closed by this pull request
@brunos252
Copy link
Contributor Author

As mentioned in Slack, this fix causes the following test to fail:
FOREACH (i IN [1] | FOREACH (j IN [3,4] | CREATE (i {prop: j})));
Expected output is to make 2 nodes, actual output is that nothing happens.
Neo4j has this functionality which also makes 2 nodes.

while this fix solves the issue linked, which is to be able to run:
foreach (i in range(0,100)| create (n:NODE {i: i})); match (n) with n order by n.id with collect(n) as n_ FOREACH (i in range(0, size(n_) - 2) | FOREACH (node1 in [n_[i]] | FOREACH (node2 in [n_[i+1]] | FOREACH (j in [1] | MERGE (node1)-[:foo]->(node2)))));

Personally I think the first foreach query doesn't make sense and should throw RedeclareVariableError, similar to the query MATCH (n) CREATE (n);
But aside from that, this fix allows us to use foreach with variables outside its scope, so for the second query we see node1. But to make the first query work, we would need to limit FOREACH's scope at the same time.

Copy link
Contributor

@antaljanosbenjamin antaljanosbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are on the good path, however I think there is one thing that can be confusing: the difference between the following two queries:

  • FOREACH (i IN [1] | FOREACH (j IN [3,4] | CREATE (i {prop: j})));
  • FOREACH (i IN [1] | FOREACH (j IN [3,4] | CREATE (o {prop: j})));

The first one won't create any nodes, but the second one will create 2 nodes. How could we phrase that properly for the users? In my opinion we cannot explain this behavior in a way that users can understand. As I see there are two ways to make this easier to understand for the users:

  1. Make shadowing a actual error. This is basically removing the "multi-scope" functionality we introduced with FOREACH.
  2. We should clearly check whether we are introducing a new identifier (e.g.: CREATE (i {prop:1}) introduces i) or want to reference and existing one (e.g.: CREATE (n {prop:j}) references j, not introducing it). Every time we want to create an identifier, we only operate on the local scope, but when we want to reference a variable, then we should consider the parent scopes also. After Wednesday I can have a brief look and help you if you need help.

Even though I am against of releasing this in the current state (it will create a lot of confusion and angry users in my opinion), @gitbuda what do you think?

@brunos252
Copy link
Contributor Author

In the current state, shadowing does throw an error, that we are trying to redeclare a variable. It is intuitive to me that you wouldn't use a variable that you use for iteration inside the foreach, same as using the iterator in a for loop.

Copy link
Contributor

@antaljanosbenjamin antaljanosbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was checking this PR on my local machine and somehow I messed up the pull and I was checking the wrong state of the PR.

src/query/frontend/semantic/symbol_generator.cpp Outdated Show resolved Hide resolved
@gitbuda
Copy link
Member

gitbuda commented Jan 9, 2023

Since like we are back on track here, I think we have time here 😄

@antaljanosbenjamin
Copy link
Contributor

Two more comment that popped into my mind: the functions HasSymbolLocalScope and GetOrCreateSymbolLocalScope and the whole "multiple scope" approach. My arguments:

  1. As we want to prevent shadowing, we shouldn't care whether the identifier is declared in our local scope or not.
  2. Furthermore we shouldn't care about scopes anymore. We introduced multi scopes just to support shadowing, but now we want to prevent users to shadow identifiers => Remove them, the code will be easier to understand/reason about.
  3. As we have a bug here, we might have other bugs that we just cannot came up with yet, therefore eliminating the possibility of bugs reduces the chance of keeping a bug alive.

These are only my thoughts, not a decision from team/somebody, so please feel free to comment/argue them.

@brunos252
Copy link
Contributor Author

I would agree that we don't need shadowing then, but we would need to see about edge cases, what are the arguments for having it. Do you maybe have some examples where shadowing is used?

@antaljanosbenjamin
Copy link
Contributor

The only reason we did allow shadowing is being as close as Neo4j as possible. I would be surprised if it would change anything on the expressiveness of (the our subset of) openCypher, so I think we can be aggressive on this: remove and see if anybody complains about this.

The very only reason I can think of is when a user is debugging a query and tries to nest foreaches, then it might be easier to copy-paste the inner part, but that doesn't worth keeping this (bugphrone) architecture alive in my opinion.

@brunos252
Copy link
Contributor Author

brunos252 commented Jan 16, 2023

I removed a total of 3 tests that were failing after removing local scope, coincidentally all related to FOREACH.

  1. From unit test query_semantic:
FOREACH (i in [1] | CREATE (n)) RETURN n;
FOREACH (i in [1] | CREATE (n)) RETURN i;

Note: I assumed [1] because I didn't find what i iterates over, but it is irrelevant.
Since we deleted local scope, we can now return symbols from within FOREACH.

  1. From unit test query_plan:
FOREACH(i in [1, 2] | CREATE (n)) FOREACH (j in [1, 2] | CREATE (n))

@antaljanosbenjamin

@brunos252
Copy link
Contributor Author

One inconvenience is, for:

FOREACH (i in [1, 2, 3] | CREATE (n)) RETURN n;

query returns only one node, the last one created, while all three are created.

@brunos252 brunos252 added v2.5.2 and removed v2.5.1 labels Jan 19, 2023
@gitbuda gitbuda added the bug bug label Jan 19, 2023
@gitbuda gitbuda changed the title [master < T521-FL] nested FOREACH variable shadowing bug [master < T521-FL] Fix nested FOREACH variable shadowing bug Jan 19, 2023
@antoniofilipovic
Copy link
Contributor

[master < Task] PR

  • Check, and update documentation if necessary
  • Update changelog
  • Provide the full content or a guide for the final git message

@brunos252 don't forget to link to provide a link to docs PR if there is need for one and changelog PR so I can merge this.

Copy link
Contributor

@antoniofilipovic antoniofilipovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check what is with removed test, otherwise seems like changes Benjo requested are here and implemented.

@antoniofilipovic antoniofilipovic merged commit 34dd47e into master Jan 25, 2023
@antoniofilipovic antoniofilipovic deleted the T521-FL-nested-foreach-bug branch January 25, 2023 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants