Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trouble with DML/overlays and ensure_source_rvar #3030

Closed
msullivan opened this issue Oct 12, 2021 · 0 comments · Fixed by #6223
Closed

Trouble with DML/overlays and ensure_source_rvar #3030

msullivan opened this issue Oct 12, 2021 · 0 comments · Fixed by #6223
Assignees
Labels
bug compiler under discussion Not ready for implementation
Projects

Comments

@msullivan
Copy link
Member

msullivan commented Oct 12, 2021

In #3023 I added an xfailed insert test. A simplified version of it is

WITH                                                                      
     F := (INSERT Subordinate {name := "!"}),                             
     Z := enumerate((F,)),                                                
SELECT Z.1.0; 

and it returns an empty set. The issue turns out to be:

  1. no source rvar for Z.1.0 gets advertised by enumerate, so the compiler generates a new_primitive_rvar and joins it in
  2. we don't detect the insert as a dml_source of Z.1.0, and so the overlay isn't applied. This is because get_nearest_dml just looks down the main path.

I think this particular case might be easy to fix by making sure that enumerate can preserve source rvars for its elements, but there are plenty of operations that won't be able to preserve source rvars, like:

WITH                                                                      
     F := (INSERT Subordinate {name := "!"}),                             
SELECT [F][0];

Part 2 could be addressed by taking a wider view of finding dml sources; perhaps by grabbing any dml source in the ir, at all. But this made me realize that there is a more general problem here.
Consider

WITH                                                                      
     A := (SELECT UpdateTest FILTER .name = 'update-test2'),              
     B := (UPDATE A SET { comment := "foo" })                             
SELECT ((SELECT {A, B} {name, comment}));

We will output

[
  {"comment": "second", "name": "update-test2"},
  {"comment": "foo", "name": "update-test2"}
]

One object is from the select, and reflects the old data, and one from the update, and reflects the new.

If we try something where we lose track of the source rvars, though:

WITH                                                                      
     A := (SELECT UpdateTest FILTER .name = 'update-test2'),              
     B := (UPDATE UpdateTest FILTER .name = 'update-test2' SET { comment := "foo" })                                                               
SELECT assert_exists((SELECT {A, B} {name, comment}));

... then we trip an assertion. But if we remove that assertion, then we get

[
  {"comment": "second", "name": "update-test2"},
  {"comment": "second", "name": "update-test2"}
]

with both objects reflecting the old data, since we don't use the overlay.
But if we did arrange to use the overlay, then both objects would reflect the new data, which is also wrong!

Getting this right in cases like this seems pretty tricky. Possible approaches:

  1. Some sort of dynamic provenance tracking
  2. Reusing materialization to track any data that might be output

I'm not sure if option 2 scales to dealing with links that are also changed, and the like, though.

This issue seems very tricky to get right.

msullivan added a commit that referenced this issue Oct 12, 2021
msullivan added a commit that referenced this issue Oct 13, 2021
msullivan added a commit that referenced this issue Oct 13, 2021
@1st1 1st1 added this to To do in 1.0 via automation Oct 22, 2021
msullivan added a commit that referenced this issue May 24, 2022
Many of these were found by putting a trivial access policy on
std::Object and running the test suite.

 * Disable overlays when compiling type rewrites: they aren't
   really in scope.
 * Work around issue #3030 by looking through assert_exists
   when looking for DML.
 * Fix a bad interaction with materialization that can generate
   invalid queries
 * Fix an issue with UNLESS CONFLICT by compiling constraints
   without query rewrites enabled to avoid introducing extra
   dependencies that broke our constraint key detection
 * Fix ordering in some tests that broke when the
   std::Object policy perturbed their ordreing
msullivan added a commit that referenced this issue May 24, 2022
Many of these were found by putting a trivial access policy on
std::Object and running the test suite.

 * Disable overlays when compiling type rewrites: they aren't
   really in scope.
 * Work around issue #3030 by looking through assert_exists
   when looking for DML.
 * Fix a bad interaction with materialization that can generate
   invalid queries
 * Fix an issue with UNLESS CONFLICT by compiling constraints
   without query rewrites enabled to avoid introducing extra
   dependencies that broke our constraint key detection
 * Fix ordering in some tests that broke when the
   std::Object policy perturbed their ordreing
@msullivan msullivan added bug under discussion Not ready for implementation compiler labels Sep 20, 2022
msullivan added a commit that referenced this issue Sep 28, 2023
Because of how casts are inserted in the func code, this required some
tweaking to casts:
 - The empty array cast needs to be able to look through a Set in
   order to be efficient
 - The empty set to object cast needs to be able to look through a
   Set in order to not generate an IR cast that messes things up
   because it doesn't provide a source and thus causes the #3030
   overlay issue to pop up.
msullivan added a commit that referenced this issue Sep 28, 2023
As we've discussed (and, tragically, recommended to users), we can
rewrite the conditionals into for loops:
    for b in COND union (
      {
	(for _ in (select () filter b) union (IF_BRANCH)),
	(for _ in (select () filter not b) union (ELSE_BRANCH)),
      }
    )

The main fiddly part is preserving the correct
cardinality/multiplicity inference. I did this by adding a
card_inference_override field to Set that specifies another set that
should determine the cardinality.

I don't love this, but I haven't thought of a cleaner approach that
doesn't give up the benefits of the desugaring approach.

We need more testing but I wanted to get something up for people to
look at / we can catch up on testing after the feature freeze if
needed.

Fixes #4437.

* Support writing a bare {} in the else branch

Because of how casts are inserted in the func code, this required some
tweaking to casts:
 - The empty array cast needs to be able to look through a Set in
   order to be efficient
 - The empty set to object cast needs to be able to look through a
   Set in order to not generate an IR cast that messes things up
   because it doesn't provide a source and thus causes the #3030
   overlay issue to pop up.
msullivan added a commit that referenced this issue Oct 3, 2023
Queries like:
```
with noobs := {
  ((insert Person { name := "foo" }), "bar"),
  ((insert Person { name := "spam" }), "eggs"),
},
select noobs;
```
would return an empty set, since (for annoying reasons that could
possibly be avoided in this simple case but not in the general case),
we need to do an `ensure_source_rvar` on `noobs`, but
`get_nearest_dml_stmt` is highly limited and can only find one DML
statement. This means we can't produce the proper overlay that
includes both Persons.

Upgrade `get_nearest_dml_stmt` to be able to collect all of the
relevant DML sources.

Fixes #6057. That bug was just a special case of #3030; this fix thus
represents a combination of forward progress on #3030 as well as some
lateral movement: as discussed in that issue, some queries are now
*differently* wrong (see test_edgeql_update_union_overlay_02, which
previously would have returned the old value twice instead of the
new value twice). I am going to close #3030 in favor of an
updated #6222 once this is merged.
1.0 automation moved this from To do to Done Oct 4, 2023
msullivan added a commit that referenced this issue Oct 4, 2023
)

Queries like:
```
with noobs := {
  ((insert Person { name := "foo" }), "bar"),
  ((insert Person { name := "spam" }), "eggs"),
},
select noobs;
```
would return an empty set, since (for annoying reasons that could
possibly be avoided in this simple case but not in the general case),
we need to do an `ensure_source_rvar` on `noobs`, but
`get_nearest_dml_stmt` is highly limited and can only find one DML
statement. This means we can't produce the proper overlay that
includes both Persons.

Upgrade `get_nearest_dml_stmt` to be able to collect all of the
relevant DML sources.

Fixes #6057. That bug was just a special case of #3030; this fix thus
represents a combination of forward progress on #3030 as well as some
lateral movement: as discussed in that issue, some queries are now
*differently* wrong (see test_edgeql_update_union_overlay_02, which
previously would have returned the old value twice instead of the
new value twice). I am going to close #3030 in favor of an
updated #6222 once this is merged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug compiler under discussion Not ready for implementation
Projects
1.0
  
Done
Development

Successfully merging a pull request may close this issue.

1 participant