Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array comparisons behavior changed #3476

Closed
beikov opened this issue Apr 5, 2022 · 10 comments
Closed

Array comparisons behavior changed #3476

beikov opened this issue Apr 5, 2022 · 10 comments

Comments

@beikov
Copy link

beikov commented Apr 5, 2022

I'm testing with H2 2.1.210

    create table table_with_boolean_arrays (
       id bigint not null,
        the_array boolean array,
        primary key (id)
    )
    INSERT 
    INTO
        table_with_boolean_arrays
        (id, the_array) 
    VALUES
        ( 1 , ARRAY[false,null,true] )

Querying like this

SELECT * FROM table_with_boolean_arrays t WHERE the_array = ARRAY[false,null,true]

gives no results. Note that the same works on PostgreSQL just fine and in H2 1.4 also when changing from boolean array to the untyped array.

I noticed that switching to the distinct operator makes this work

SELECT * FROM table_with_boolean_arrays t WHERE the_array IS NOT DISTINCT FROM ARRAY[false,null,true]

So it seems null semantics are taken into account although they shouldn't.

@katzyn
Copy link
Contributor

katzyn commented Apr 6, 2022

From the SQL Standard, part 2, section 4.10.4 Collection comparison and assignment:

In the case of comparison of two arrays C and D, the elements are compared pairwise in element order. C = D is True if and only if C and D have the same cardinality and every pair of elements are equal.

NULL is neither equal nor not equal to NULL, so you can't get TRUE from such comparison in standard-compliant database system, but you can use a distinct predicate (see General rules of <distinct predicate> in the Standard).

H2 doesn't emulate incompatibilities with the Standard in that area in its compatibility modes. Collection data types from PostgreSQL definitely have some and yours at least consistent, with row value data type result of comparison in PostgreSQL depends on source of arguments, they are processed differently in literal = literal and literal = column.

@katzyn katzyn closed this as completed Apr 6, 2022
@beikov
Copy link
Author

beikov commented Apr 6, 2022

For reference, it doesn't seem to matter to PostgreSQL if the array is a literal or comes from a column. The following is true as well SELECT array[1, NULL, 1] = array[1, NULL, 1]

Not sure what you mean by "consistent with row value". The following results in a NULL result SELECT row(1, NULL, 1) = row(1, NULL, 1), so it seems NULL = NULL is only true in PostgreSQL when it is part of an array.

@beikov
Copy link
Author

beikov commented Apr 6, 2022

I just had a chat with some folks from the PostgreSQL community and they seem to be sure that PostgreSQL is behaving SQL standard compliant. Let me cite from the discussion

I have the following:
Let V1 be the FIRST VALUE and let V2 be the SECOND VALUE in an application of the General Rules of this Subclause. NOTE 378 — This Subclause is invoked implicitly wherever the word identical is used of two values.
Case:
If V1 and V2 are both the null value, then V1 is identical to V2.
If V1 is the null value and V2 is not the null value, or if V1 is not the null value and V2 is the null value, then V1 is not identical to V2.

Identical definition is on section 9.10 from SQL Standard 2016 part 2

and the quote about identical arrays is on section 4.10.2

So overall, it seems to me this is in fact a bug in H2

@katzyn
Copy link
Contributor

katzyn commented Apr 6, 2022

NULL is really identical to NULL:

If V1 and V2 are both the null value, then V1 is identical to V2.

But NULL is neither equal nor not equal to NULL, equality comparison between NULL values returns an UNKNOWN (NULL) value:

If at least one of XV and YV is the null value, then
X <comp op> Y
is Unknown.

To check array equality we need to check equality of its elements:

and every pair of elements are equal.

So, behavior of H2 seems so be correct, H2 returns UNKNOWN / NULL as expected.
@lukaseder
Could you confirm that expectation?


Situation with row values in PostgreSQL is even more complicated:

CREATE TYPE D AS(A INT, B INT);
CREATE TABLE T(V D);
INSERT INTO T VALUES (ROW(1, NULL));
SELECT V, (1, NULL::INT), V = (1, NULL::INT), (1, NULL::INT) = (1, NULL::INT) FROM T;
  v        | row       | ?column? | ?column? 
-----------+-----------+----------+----------
 (1, null) | (1, null) | t        | null

Of course, someone may say that this data type is not a real row value data type (PostgreSQL supports row value literals, but doesn't support standard-compliant row data type definitions) and this vendor-specific syntax may produce a data type with different comparison rules, but it looks weird anyway.

H2 handles them in consistent way:

CREATE TABLE T(V ROW(A INT, B INT));
INSERT INTO T VALUES ROW(1, NULL);
SELECT V, (1, NULL), V = (1, NULL), (1, NULL) = (1, NULL) FROM T;
> V             | ROW (1, NULL) | V = ROW (1, NULL) | UNKNOWN
> --------------+---------------+-------------------+--------
> ROW (1, null) | ROW (1, null) | null              | null

@lukaseder
Copy link
Contributor

Thanks for the ping, @katzyn. I can't access the linked slack channel by @beikov. Weird, why use a closed (to domains other than the ones listed) slack channel rather than the usual mailing lists?

Anyway. ISO/IEC 9075-2:2016(E) 8.2 <comparison predicate> GR 1) b) ii) is quite clear. H2 is right, PostgreSQL and HSQLDB are both wrong. I've posted a message to the pgsql-bugs list:
https://www.postgresql.org/message-id/CAB4ELO7afJgQfZoQfqfMBA7Zk1AdWRkZ9mUN5jpTZupurQTRsA%40mail.gmail.com

Whether that's worth fixing in PostgreSQL is one thing, but I think H2 is doing it right, and there's no need to take action here.

@lukaseder
Copy link
Contributor

Weird, why use a closed (to domains other than the ones listed) slack channel rather than the usual mailing lists?

For the record, that isn't criticism of you, @beikov, just my surprise that such closed channels even exist.

@beikov
Copy link
Author

beikov commented Apr 6, 2022

slack channel rather than the usual mailing lists?

It's not closed, but Slack requires that people are invited. You can join here: https://postgres-slack.herokuapp.com/

@beikov
Copy link
Author

beikov commented Apr 6, 2022

By the way, the null behavior regarding composite types is actually documented: https://www.postgresql.org/docs/current/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON

@lukaseder
Copy link
Contributor

Yeah, I guess the array comparison behaviour should be documented there as well

@katzyn
Copy link
Contributor

katzyn commented Apr 6, 2022

Thanks.

Their explanation of that behavior is the same as for row values. I don't think that ordered index with ordered nulls inside composite values really cannot be used for standard-compliant comparison operator (in H2 it can, both distinct and equality with their very different rules about nulls work with indexes too), but I understand why they don't want to touch that logic.

How implementations of JPA or other persistence technologies should deal with these deviations between database systems is an interesting question. But semantics of NULL is very different between SQL and the most of regular programming languages by itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants