Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use @> for std::contains with array parameters #4776

Merged
merged 4 commits into from Mar 31, 2023

Conversation

stealth-toucan
Copy link
Contributor

At the moment, EdgeDB uses array_position in std::contains when the haystack is an array. This change modifies std::contains to use the @> operator; this improves performance in the presence of indexes that support the array_ops operator class.

@edgedb-cla
Copy link

edgedb-cla bot commented Dec 5, 2022

All commit authors signed the Contributor License Agreement.
CLA signed

@msullivan
Copy link
Member

This looks reasonable.

Did you test if it seems to help with indexes?

@elprans
Copy link
Member

elprans commented Dec 7, 2022

Yeah @> is part of array_ops operator class which is indexable by GIN. So, this wouldn't immediately help with the default b-tree index type we currently use, but will help once @vpetrovykh's work on exposing additional index types lands.

@stealth-toucan
Copy link
Contributor Author

Support for GIN indexes was requested accordingly, see my comment in #4691. The initial edb/lib/std/80-indexes.edgeql did not account for this use case. You may also want to consider BRIN indexes.

@msullivan
Copy link
Member

Another question here is: will postgres use a GIN index in response to a @> inside a function call, or will we need to special case it in our compile to take advantage of this. I guess postgres can probably inline the function? Would you be able to test that? Otherwise we'll need to take a different approach here

@stealth-toucan
Copy link
Contributor Author

stealth-toucan commented Dec 8, 2022

When needle is an array, postgres will inline the function and take advantage of the index. When needle is an array element, postgres will fall back to a sequential scan; this has to do with postgres' criteria for inlining strict functions.

Unfortunately, I don't see a way to finagle inlining using SQL alone, but inlining can be achieved by making the function non-strict (SET impl_is_strict := false;). Technically we can write a strict function for this, but postgres will refuse to recognize it as such for the purposes of inlining because of decisions made many years ago.

All of the above have been experimentally verified by manipulating the postgres backend of a live EdgeDB instance.

How would you like to proceed?

@stealth-toucan
Copy link
Contributor Author

Perhaps this can be dealt with in two phases. Would it be possible to initially merge the variant of std::contains where the needle is an array, until a better solution can be found for the array element case? This variant will be inlined by postgres without modification and GIN indexes (if any) will be used.

@msullivan
Copy link
Member

Sorry for not following up sooner. Thank you very much for your analysis and PR, it made this very easy on my side.

What I am going to do is:

  • Put up another PR that fixes the one downside that SET impl_is_strict := false; has (that it means the function can't be used in constraints/indexes)
  • Add SET impl_is_strict := false; to the element-wise std::contains and merge this PR

@msullivan
Copy link
Member

Oh, wait, actually, there is one snag: the behavior @> on arrays in postgres is to check each element for membership individually, not to check whether it is a subsequence, which is inconsistent with the behavior of std::contains on str and bytes.

@stealth-toucan is the array-containing version important to your use cases if the element version exists?

It seems like a pretty weird function, so my inclination is that it's not as important to support directly.
We could also include it under the name contains_all or some such, though.

msullivan added a commit that referenced this pull request Mar 30, 2023
Wrap them using a CASE statement.

Motivation here is to make it so that #4776 can set `impl_is_strict :=
false` on `std::contains` for arrays without breaking anything.
msullivan added a commit that referenced this pull request Mar 30, 2023
Wrap them using a CASE statement.

Motivation here is to make it so that #4776 can set `impl_is_strict :=
false` on `std::contains` for arrays without breaking anything.
@msullivan msullivan self-assigned this Mar 30, 2023
@msullivan
Copy link
Member

I dropped the std::contains(haystack: array<anytype>, needle: array<anytype>) version because it is inconsistent with the existing behavior of contains. It could probably be introduced with a name like contains_all, but I haven't done that because it seems a little marginal (and because I don't want to spend time writing tests for a new function right now :P).

@stealth-toucan if having that version of the function is important to you, could you please file another issue (or even open a new PR, with tests)?

@msullivan
Copy link
Member

msullivan commented Mar 31, 2023

Oh, annoyingly making contains not strict actually does still cause some more problems, because scalar constraints will be apparently still be evaluated on NULL values. I think there we actually want to always filter NULLs out, since it would be pretty nonsensical to be able to write a scalar constraint that coalesces or uses ?= or whatnot.

It's also a problem with pointer constraints but there I think we do want it to understand

@msullivan
Copy link
Member

Alright, I ended up actually needing to do some stuff to get it to work

@msullivan msullivan requested a review from elprans March 31, 2023 19:50
edb/edgeql/compiler/stmtctx.py Outdated Show resolved Hide resolved
Co-authored-by: Elvis Pranskevichus <elvis@magic.io>
@msullivan msullivan merged commit be06d06 into edgedb:master Mar 31, 2023
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants