Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support DML in IF/THEN/ELSE expressions #6181

Merged
merged 5 commits into from
Sep 28, 2023
Merged

Support DML in IF/THEN/ELSE expressions #6181

merged 5 commits into from
Sep 28, 2023

Conversation

msullivan
Copy link
Member

@msullivan msullivan commented Sep 27, 2023

As we've discussed (and, tragically, recommended to users), we can
rewrite the conditionals into for loops:

    for b in COND union (
      {
	(for _ in (select () filter b) union (IF_BRANCH)),
	(for _ in (select () filter not b) union (ELSE_BRANCH)),
      }
    )

The main fiddly part is preserving the correct
cardinality/multiplicity inference. I did this by adding a
card_inference_override field to Set that specifies another set that
should determine the cardinality.

I don't love this, but I haven't thought of a cleaner approach that
doesn't give up the benefits of the desugaring approach.

We need more testing but I wanted to get something up for people to
look at / we can catch up on testing after the feature freeze if
needed.

Fixes #4437.

As we've discussed (and, tragically, recommended to users), we can
rewrite the conditionals into for loops:
    for b in COND union (
      {
	(for _ in (select () filter b) union (IF_BRANCH)),
	(for _ in (select () filter not b) union (ELSE_BRANCH)),
      }
    )

The main fiddly part is preserving the correct
cardinality/multiplicity inference. I did this by adding a
card_inference_override field to Set that specifies another set that
should determine the cardinality.

I don't love this, but I haven't thought of a cleaner approach that
doesn't give up the benefits of the desugaring approach.

We need more testing but I wanted to get something up for people to
look at / we can catch up on testing after the feature freeze if
needed.

Fixes #4437.
Because of how casts are inserted in the func code, this required some
tweaking to casts:
 - The empty array cast needs to be able to look through a Set in
   order to be efficient
 - The empty set to object cast needs to be able to look through a
   Set in order to not generate an IR cast that messes things up
   because it doesn't provide a source and thus causes the #3030
   overlay issue to pop up.
@msullivan
Copy link
Member Author

msullivan commented Sep 28, 2023

Follow-up tasks:

  • Support DML in coalescing ??
  • Don't generate conflict CTEs for potential conflicts between arms if the IF?
  • Optimize out an empty set from the union?

Comment on lines 310 to 322
def _compile_dml_ifelse(
ir: irast.Set, *, ctx: context.ContextLevel) -> irast.Set:
"""Transform an IF/ELSE that contains DML into FOR loops

The basic approach is to extract the pieces from the if/then/else and
rewrite them into:
for b in COND union (
{
(for _ in (select () filter b) union (IF_BRANCH)),
(for _ in (select () filter not b) union (ELSE_BRANCH)),
}
)
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach. At first it seemed really hacky, but when thinking about it, this will probably compile to very similar SQL as it would if implemented iterator expressions in pg compiler for IfElse too.

Can you post some generated SQL?

Copy link
Member Author

@msullivan msullivan Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is select if <bool>$0 then (insert X { foo := "!" }) else {};:

    WITH 
    "iter~1" AS 
        ((
            FROM LATERAL (
                SELECT/*<pg.SelectStmt at 0x7fe9b4544e10>*/
                    ($1)::bool AS "expr~12_value~1",
                    edgedb.uuid_generate_v4() AS "expr~12_identity~1"
            ) AS "bool~2"
            SELECT/*<pg.SelectStmt at 0x7fe9b447e7d0>*/
                "bool~2"."expr~12_identity~1" AS "bool_identity~1",
                "bool~2"."expr~12_value~1" AS "bool_value~1"
        )),
    "iter~4" AS 
        ((
            FROM "iter~1" AS "iter~3"
            CROSS JOIN LATERAL (
                FROM LATERAL (
                    SELECT/*<pg.SelectStmt at 0x7fe9b44a93d0>*/
                        "iter~3"."bool_identity~1" AS "bool_identity~2"
                    WHERE
                        ("iter~3"."bool_identity~1" IS NOT NULL)
                ) AS "expr-15~2"
                SELECT/*<pg.SelectStmt at 0x7fe9b44ab750>*/
                    edgedb.uuid_generate_v4() AS "tuple_identity~1"
                WHERE
                    ("iter~3"."bool_value~1" AND 
                     ("expr-15~2"."bool_identity~2" IS NOT NULL))
            ) AS "tuple<>~2"
            SELECT/*<pg.SelectStmt at 0x7fe9b447f310>*/
                "iter~3"."bool_identity~1" AS "bool_identity~3",
                "tuple<>~2"."tuple_identity~1" AS "tuple_identity~2"
        )),
    "ins_contents~1" AS 
        ((
            FROM "iter~4" AS "iter~6"
            CROSS JOIN LATERAL (
                SELECT/*<pg.SelectStmt at 0x7fe9b44cb590>*/
                    ($2)::text AS "expr~1_value~1"
            ) AS "str_foo~2"
            CROSS JOIN LATERAL (
                FROM LATERAL (
                    FROM LATERAL (
                        FROM LATERAL (
                            SELECT/*<pg.SelectStmt at 0x7fe9b4532fd0>*/
                                edgedb.uuid_generate_v1mc() AS "expr~3_value~1"
                            WHERE
                                ("iter~6"."tuple_identity~2" IS NOT NULL)
                        ) AS "q~1"
                        SELECT/*<pg.SelectStmt at 0x7fe9b450aa10>*/
                            "q~1"."expr~3_value~1" AS "expr~3_value~2"
                    ) AS "expr-3~2"
                    SELECT/*<pg.SelectStmt at 0x7fe9b44e6ad0>*/
                        "expr-3~2"."expr~3_value~2" AS "expr~3_value~3"
                ) AS "expr-4~2"
                SELECT/*<pg.SelectStmt at 0x7fe9b44e5250>*/
                    "expr-4~2"."expr~3_value~3" AS "expr~4_value~1"
            ) AS "uuid_id~2"
            SELECT/*<pg.SelectStmt at 0x7fe9b44c8cd0>*/
                "str_foo~2"."expr~1_value~1" AS "foo",
                "uuid_id~2"."expr~4_value~1" AS id,
                "iter~6"."tuple_identity~2" AS "tuple_identity~3"
        )),
    "ins~1" AS 
        (INSERT INTO edgedbpub."default::X" AS "X~1"
            ("foo", id)
            ((
                FROM "ins_contents~1" AS "ins_contents~2"
                SELECT/*<pg.SelectStmt at 0x7fe9b44e5210>*/
                    "foo",
                    id
            ))
            RETURNING
                "X~1".id AS "X_identity~1"),
    "m~1" AS 
        ((
            FROM "iter~4" AS "iter~7"
            CROSS JOIN "ins~1" AS "ins~2"
            JOIN "ins_contents~1" AS "ins_contents~2"
                ON (("ins~2"."X_identity~1" = "ins_contents~2".id) AND 
                 ("iter~7"."tuple_identity~2" = "ins_contents~2"."tuple_identity~3"))
            SELECT/*<pg.SelectStmt at 0x7fe9b450a310>*/
                "iter~7"."tuple_identity~2" AS "tuple_identity~4",
                "ins~2"."X_identity~1" AS "X_value~1"
        ))
    FROM (
        FROM LATERAL (
            FROM "iter~1" AS "iter~2"
            CROSS JOIN LATERAL (
                FROM "iter~4" AS "iter~5"
                CROSS JOIN LATERAL (
                    FROM LATERAL (
                        FROM LATERAL (
                            FROM "m~1" AS "m~2"
                            SELECT/*<pg.SelectStmt at 0x7fe9b454ec50>*/
                                "m~2"."X_value~1" AS "X_value~2",
                                "iter~5"."tuple_identity~2" AS "tuple_identity~5"
                            WHERE
                                ("iter~5"."tuple_identity~2" = "m~2"."tuple_identity~4")
                        ) AS "expr-10~2"
                        SELECT/*<pg.SelectStmt at 0x7fe9b44b3590>*/
                            "expr-10~2"."X_value~2" AS "expr~10_value~1",
                            json_build_object('id', "expr-10~2"."X_value~2") AS "expr~10_serialized~1"
                        WHERE
                            ("iter~5"."tuple_identity~2" = "expr-10~2"."tuple_identity~5")
                    ) AS "expr-11~2"
                    SELECT/*<pg.SelectStmt at 0x7fe9b44b0e10>*/
                        "expr-11~2"."expr~10_value~1" AS "expr~11_value~1",
                        "expr-11~2"."expr~10_serialized~1" AS "expr~11_serialized~1"
                ) AS "expr-16~2"
                SELECT/*<pg.SelectStmt at 0x7fe9b4544990>*/
                    "expr-16~2"."expr~11_value~1" AS "expr~16_value~1",
                    "expr-16~2"."expr~11_serialized~1" AS "expr~16_serialized~1"
                WHERE
                    ("iter~2"."bool_identity~1" = "iter~5"."bool_identity~3")
            ) AS "expr-17~2"
            SELECT/*<pg.SelectStmt at 0x7fe9b447d810>*/
                "expr-17~2"."expr~16_value~1" AS "expr~17_value~1",
                "expr-17~2"."expr~16_serialized~1" AS "expr~17_serialized~1"
        ) AS "expr-18~2"
        SELECT/*<pg.SelectStmt at 0x7fe9b447cf90>*/
            to_json("expr-18~2"."expr~17_serialized~1") AS "expr~18_serialized~1"
    ) AS "aggw~1"
    SELECT/*<pg.SelectStmt at 0x7fe9b447c590>*/
        COALESCE(json_agg("expr~18_serialized~1"), '[]')

iter~1 is the outer iter that computes the condition and iter~4 uses it to generate an iterator that fires once if the condition is true and zero times otherwise.

The big suboptimality in this generated code is the iterators get joined into the main result query, when really it shouldn't need to be.

This is because only DML and the IN clauses of DML-containing FORs get compiled into CTEs, and a FOR might contain both a DML and some non-DML that references the iterator, like:

for x in expr union (x, (insert Foo { x := x}))

It should be possible to do an optimization to elide joining in the iterator CTEs in the main body when they aren't needed, though. We could probably do something hacky where these desugaring passes set a flag in the iterator, but it should also be possible to do an analysis to see if the iterator is used outside of DML in the body. (But probably that optimization isn't happening for 4.x)

@msullivan msullivan merged commit ea947dd into master Sep 28, 2023
22 checks passed
@msullivan msullivan deleted the if-else-cond branch September 28, 2023 20:13
msullivan added a commit that referenced this pull request Sep 29, 2023
The basic approach is to extract the pieces from the ?? and
rewrite them into:
    for optional x in (LHS,) union (
      {
	x.0,
	(for _ in (select () filter not exists x) union (RHS)),
      }
    )

Optional for is needed because the LHS needs to be bound in a for
in order to get put in a CTE and only executed once, but the RHS
needs to be dependent on the LHS being empty.

We hackily wrap the LHS in a 1-ary tuple and then project it back
out because the OPTIONAL FOR implementation doesn't properly
handle object-type iterators. OPTIONAL FOR relies on having a
non-NULL identity ref but objects use their actual id, which
will be NULL.

Follow-up to #6181. Like #6181, needs more testing.

One of the tests I wrote failed because of #6057, which I'm working on
fixing also.
msullivan added a commit that referenced this pull request Sep 30, 2023
The basic approach is to extract the pieces from the ?? and
rewrite them into:
    for optional x in (LHS,) union (
      {
	x.0,
	(for _ in (select () filter not exists x) union (RHS)),
      }
    )

Optional for is needed because the LHS needs to be bound in a for
in order to get put in a CTE and only executed once, but the RHS
needs to be dependent on the LHS being empty.

We hackily wrap the LHS in a 1-ary tuple and then project it back
out because the OPTIONAL FOR implementation doesn't properly
handle object-type iterators. OPTIONAL FOR relies on having a
non-NULL identity ref but objects use their actual id, which
will be NULL.

Follow-up to #6181. Like #6181, needs more testing.

One of the tests I wrote failed because of #6057, which I'm working on
fixing also.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider allowing DML in conditionals
3 participants