Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PostgreSQL array created using array internal type name #659

Merged

Conversation

3 participants
@guyco33
Copy link
Contributor

commented Apr 23, 2019

Fixing failure when selecting Postgres Array column that was created with native type name (eg: _int4)

Postgres: create table t (a _int4);
Presto: select a from t;

io.prestosql.spi.PrestoException: No array dimensions for ARRAY type: JdbcTypeHandle{jdbcType=2003, jdbcTypeName=_int4, columnSize=10, decimalDigits=0}
	at io.prestosql.plugin.postgresql.PostgreSqlClient.lambda$null$0(PostgreSqlClient.java:248)
	at java.util.Optional.orElseThrow(Optional.java:290)
	at io.prestosql.plugin.postgresql.PostgreSqlClient.lambda$toPrestoType$1(PostgreSqlClient.java:248)

Explanation:
pg_attribute.attndims is 0 when creating columns with the native type names

create table t (a _int4); --> pg_attribute.attndims = 0 for column a
create table t (a int[]); --> pg_attribute.attndims = 1 for column a

Fix potential issue for #317

@cla-bot cla-bot bot added the cla-signed label Apr 23, 2019

@guyco33 guyco33 force-pushed the guyco33:fix_support_for_postgersql_arrays branch from 22731d2 to 9d40b66 Apr 25, 2019

@findepi

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

@guyco33 i checked in Postgres 11.2 and confirmed your findings:

test=# create table t(a int4[], b _int4);
CREATE TABLE
test=# \d t
                  Table "public.t"
 Column |   Type    | Collation | Nullable | Default
--------+-----------+-----------+----------+---------
 a      | integer[] |           |          |
 b      | integer[] |           |          |
test=# SELECT attname,  attndims FROM pg_attribute att JOIN pg_class tbl ON tbl.oid = att.attrelid WHERE tbl.relname = 't';
 attname  | attndims
----------+----------
...
 a        |        1
 b        |        0

However

@findepi

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

@guyco33 for now, let's get #668 in.

let's discuss whether we need this change or not.
i found an old thread (https://www.postgresql.org/message-id/8C5B026B51B6854CBE88121DBF097A8651DB95%40ehost010-33.exch010.intermedia.net)
and i am under impression that we don't and this is on Postgres side.

But maybe there is something newer that I should refer to?
Especially -- is there a bug/issue in Postgres tracking undesirable difference in behavior between eg int4[] and _int4?

@guyco33

This comment has been minimized.

Copy link
Contributor Author

commented Apr 25, 2019

  • if i understand correctly and int4[], _int4 should be treated equivalent in Postgres, they should also have the same attndims.
    Is this a bug in Postgres? Is it reported?

@findepi Maybe attndims has other internal purpose. For example to be able to reproduce the original DDL statement.

@guyco33

This comment has been minimized.

Copy link
Contributor Author

commented Apr 25, 2019

let's discuss whether we need this change or not.
i found an old thread (https://www.postgresql.org/message-id/8C5B026B51B6854CBE88121DBF097A8651DB95%40ehost010-33.exch010.intermedia.net)
and i am under impression that we don't and this is on Postgres side.

I think that our main motivation should be to map all Postgres array columns although they might be created with native names - I guess that there are plenty of automated tools that construct Postgres tables from their native names and we might loose them without this change (I encountered lots of Postgres sources with such tables)

But maybe there is something newer that I should refer to?
Especially -- is there a bug/issue in Postgres tracking undesirable difference in behavior between eg int4[] and _int4?

From https://www.postgresql.org/docs/11/catalog-pg-attribute.html we can learn that attndims is not enforced:
Presently, the number of dimensions of an array is not enforced, so any nonzero value effectively means “it's an array”.

@findepi

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

@martint

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

@findepi

This comment has been minimized.

Copy link
Member

commented Apr 26, 2019

And a follow up: https://www.postgresql.org/message-id/32678.1556228173%40sss.pgh.pa.us

In particular:

the Postgres type system doesn't distinguish
arrays of different numbers of dimensions, ie, int4[][] is not
really different from int4[]. So you can't attach any very strong
meaning to attndims = 2 vs attndims = 1.

I therefore propose that we go with #682

@findepi

This comment has been minimized.

Copy link
Member

commented Apr 26, 2019

@guyco33 let's discuss under #682

@findepi
Copy link
Member

left a comment

Now that we agreed to the way forward in #682 (comment), I feel this is a good change -- let's treat arrays with undeclared dimensions as single-dimension arrays.

I have some review comments.
Also, please rebase to accommodate for my #687.

Please change the commit message to something like

Support PostgreSQL array created using array internal type name
" JOIN pg_class tbl ON tbl.oid = att.attrelid " +
" JOIN pg_namespace ns ON tbl.relnamespace = ns.oid " +
"WHERE ns.nspname = ? " +
"AND tbl.relname = ? " +
"AND att.attndims > 0 ";
"AND CASE WHEN att.attndims = 0 AND attyp.typcategory = 'A' THEN 1 ELSE att.attndims END > 0 ";

This comment has been minimized.

Copy link
@findepi

findepi May 7, 2019

Member

Would this be following be enough

Suggested change
"AND CASE WHEN att.attndims = 0 AND attyp.typcategory = 'A' THEN 1 ELSE att.attndims END > 0 ";
"AND attyp.typcategory = 'A' ";

(if not sufficient, then this should be AND (attyp.typcategory = 'A' OR att.attndims > 0), but I would strongly not have such an OR until I understand why one would be needed)

?

This comment has been minimized.

Copy link
@guyco33

guyco33 May 21, 2019

Author Contributor

Seems that your suggestion is sufficient unless there are cases that attndims is NULL.
Anyway, selecting COALESCE(att.attndims,1) will cover this as well.

@@ -198,13 +198,14 @@ protected ResultSet getTables(Connection connection, Optional<String> schemaName
throws SQLException
{
String sql = "" +
"SELECT att.attname, att.attndims " +
"SELECT att.attname, CASE WHEN att.attndims = 0 AND attyp.typcategory = 'A' THEN 1 ELSE att.attndims END AS attndims " +

This comment has been minimized.

Copy link
@findepi

findepi May 7, 2019

Member

Would that work?

Suggested change
"SELECT att.attname, CASE WHEN att.attndims = 0 AND attyp.typcategory = 'A' THEN 1 ELSE att.attndims END AS attndims " +
"SELECT att.attname, max(att.attndims, 1) " +

This comment has been minimized.

Copy link
@guyco33

guyco33 May 21, 2019

Author Contributor

Should be greatest(att.attndims, 1)

@@ -279,6 +279,19 @@ public void testArray()
.execute(getQueryRunner(), prestoCreateAsSelect("test_array_parameterized_varchar_unicode"));

This comment has been minimized.

Copy link
@findepi

findepi May 7, 2019

Member

Please rebase. You will need to remove testInternalArray.

@@ -279,6 +279,19 @@ public void testArray()
.execute(getQueryRunner(), prestoCreateAsSelect("test_array_parameterized_varchar_unicode"));
arrayVarcharUnicodeDataTypeTest(TestPostgreSqlTypeMapping::postgresArrayDataType)
.execute(getQueryRunner(), postgresCreateAndInsert("tpch.test_array_parameterized_varchar_unicode"));

JdbcSqlExecutor jdbcSqlExecutor = new JdbcSqlExecutor(postgreSqlServer.getJdbcUrl());
jdbcSqlExecutor.execute("CREATE TABLE tpch.test_arr(text_arr _text, int_arr _int4)");

This comment has been minimized.

Copy link
@findepi

findepi May 7, 2019

Member

Use DataTypeTest with arrayDataType(integerDataType(), "_int4")

This comment has been minimized.

Copy link
@guyco33

guyco33 May 22, 2019

Author Contributor

Is there away to use DataTypeTest with a predefined table?
_int4 can't be created by CTAS from presto.

This comment has been minimized.

Copy link
@findepi

findepi May 23, 2019

Member

Please see how TestPostgreSqlTypeMapping#postgresCreateAndInsert is used. When using this method, the table is created and populated directly in PostgreSQL.

This comment has been minimized.

Copy link
@guyco33

guyco33 May 24, 2019

Author Contributor

Got it. Thanks!

@guyco33 guyco33 changed the title Fix support for PostgreSQL arrays Support PostgreSQL array created using array internal type name May 22, 2019

@guyco33 guyco33 force-pushed the guyco33:fix_support_for_postgersql_arrays branch from 9d40b66 to bd6c8c0 May 22, 2019

@guyco33 guyco33 force-pushed the guyco33:fix_support_for_postgersql_arrays branch from bd6c8c0 to c9e057a May 24, 2019

@findepi findepi merged commit d8f0650 into prestosql:master Jun 4, 2019

2 checks passed

Travis CI - Pull Request Build Passed
Details
verification/cla-signed
Details
@findepi

This comment has been minimized.

Copy link
Member

commented Jun 4, 2019

Merged, thanks!

@findepi findepi referenced this pull request Jun 4, 2019

Closed

Release notes for 314 #879

2 of 6 tasks complete

@findepi findepi added this to the 314 milestone Jun 6, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.