Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some bugs involving union and coalescing of optional values #6590

Merged
merged 5 commits into from Dec 13, 2023

Conversation

msullivan
Copy link
Member

There are a number of places that improperly handle a set that has
multiple NULL values in it:

  • INSERT on single values
  • UPDATE on single values
  • assert_single

The assert_single case is a long standing bug that can be triggered
without much trouble. The INSERT/UPDATE bugs are new, and stem from
the new ?? on DML, since I think it that was the first thing in the
compiler backend where extra NULLS might be produced for something
with single cardinality.

Fix this by adding null checks at the consumer sites. One of the
consumer sites is set_as_subquery, and I modified update to always use
that.
I also added a helper function to add the null checks, and I'll be
going back and changing lots of other code to use the helper.

We could also flip this around a bit and try saying that we shouldn't
ever produce extra NULLs in a single set, and try to stamp it out on
the producer side, though that seems a little less consistent to me?

Fixes #6438.

There are a number of places that improperly handle a set that has
multiple NULL values in it:
 * INSERT on single values
 * UPDATE on single values
 * assert_single

The assert_single case is a long standing bug that can be triggered
without much trouble. The INSERT/UPDATE bugs are new, and stem from
the new ?? on DML, since I think it that was the first thing in the
compiler backend where extra NULLS might be produced for something
with single cardinality.

Fix this by adding null checks at the consumer sites. One of the
consumer sites is set_as_subquery, and I modified update to always use
that.
I also added a helper function to add the null checks, and I'll be
going back and changing lots of other code to use the helper.

We could also flip this around a bit and try saying that we shouldn't
ever produce extra NULLs in a single set, and try to stamp it out on
the producer side, though that seems a little less consistent to me?

Fixes #6438.
@elprans
Copy link
Member

elprans commented Dec 12, 2023

We could also flip this around a bit and try saying that we shouldn't
ever produce extra NULLs in a single set, and try to stamp it out on
the producer side, though that seems a little less consistent to me?

Why? I have the opposite feeling that stamping out all callers could be a more error-prone whack-a-mole?

@msullivan
Copy link
Member Author

We could also flip this around a bit and try saying that we shouldn't
ever produce extra NULLs in a single set, and try to stamp it out on
the producer side, though that seems a little less consistent to me?

Why? I have the opposite feeling that stamping out all callers could be a more error-prone whack-a-mole?

There is sort of a general question of "when is having NULLs output by a producer permitted" that needs to be answered.

The answer can't be "never", because at the very least we have our optional wrappers, but more generally we want some operations to always produce NULL and for us to be able to elide the optional wrapper in that case. (Object property references, parameters, json casts are current operations where this happens).

  1. In this PR, I took the approach of "whenever the set could be empty", since it seemed an easy and consistent rule. But you're right that it might be whack-a-mole.
  2. Another possibility is "also whenever the set could be empty, but single sets must actually output at most one row whether it is NULL or not". I think I had it in my head that this is a less consistent rule than option 1, but honestly it probably makes more sense. I think this was obeyed by everything until the conditional DML work added constructs that were implemented with unions but might have their cardinality overridden to be single.
  3. Or "no extra NULLs ever, even for multi sets". This is plausible, but would require inserting NULL checks in multi operations in cases where we could not want to insert NULL checks for single operations: We don't want a NULL check for <str><json>$0 but this rule would require a NULL check in <str>{<json>$0, <json>$1}

Alright I have convinced myself that option 2 is the best.
I'll revise the PR.

@msullivan msullivan merged commit 54ca8f2 into master Dec 13, 2023
22 checks passed
@msullivan msullivan deleted the nullables branch December 13, 2023 19:58
aljazerzen pushed a commit that referenced this pull request Dec 15, 2023
There are a number of places that improperly handle a set that has
multiple NULL values in it:
 * INSERT on single values produced by ??
 * UPDATE on single values produced by ??
 * assert_single
 * limit/offset

The assert_single and limit case is a long standing bug that can be triggered
without much trouble. The INSERT/UPDATE bugs are new, and stem from
the new ?? on DML, since I think it that was the first thing in the
compiler backend where extra NULLS might be produced for something
with single cardinality.

Fix the long-standing cases by adding null checks, but fix the
coalesce cases by adding a null check at the producer side.
The rule then, is that there can be NULLs in any produced set
that can be empty, but a single set has to return at most one row
(so it can't have extra NULLs).


Fixes #6438.
msullivan added a commit that referenced this pull request Dec 19, 2023
…#6590)

The original 4.x/5.x version had a lot to do with DML coalescing.

There are a number of places that improperly handle a set that has
multiple NULL values in it:
 * assert_single
 * limit/offset

The assert_single and limit case is a long standing bug that can be triggered
without much trouble.

Fix the long-standing cases by adding null checks, but fix the
coalesce cases by adding a null check at the producer side.
The rule then, is that there can be NULLs in any produced set
that can be empty, but a single set has to return at most one row
(so it can't have extra NULLs).
msullivan added a commit that referenced this pull request Dec 20, 2023
…#6590)

The original 4.x/5.x version had a lot to do with DML coalescing.

There are a number of places that improperly handle a set that has
multiple NULL values in it:
 * assert_single
 * limit/offset

The assert_single and limit case is a long standing bug that can be triggered
without much trouble.

Fix the long-standing cases by adding null checks, but fix the
coalesce cases by adding a null check at the producer side.
The rule then, is that there can be NULLs in any produced set
that can be empty, but a single set has to return at most one row
(so it can't have extra NULLs).
aljazerzen pushed a commit that referenced this pull request Dec 21, 2023
There are a number of places that improperly handle a set that has
multiple NULL values in it:
 * INSERT on single values produced by ??
 * UPDATE on single values produced by ??
 * assert_single
 * limit/offset

The assert_single and limit case is a long standing bug that can be triggered
without much trouble. The INSERT/UPDATE bugs are new, and stem from
the new ?? on DML, since I think it that was the first thing in the
compiler backend where extra NULLS might be produced for something
with single cardinality.

Fix the long-standing cases by adding null checks, but fix the
coalesce cases by adding a null check at the producer side.
The rule then, is that there can be NULLs in any produced set
that can be empty, but a single set has to return at most one row
(so it can't have extra NULLs).


Fixes #6438.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Coalesce DML express return more than one Element
3 participants