Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map PostgreSQL JSON, JSONB to Presto JSON #81

Merged
merged 1 commit into from Feb 23, 2019

Conversation

@guyco33
Copy link
Member

guyco33 commented Jan 27, 2019

No description provided.

@findepi findepi requested a review from electrum Jan 27, 2019
@guyco33 guyco33 force-pushed the guyco33:add_support_for_postgres_json branch from d80ab2c to 2e0f193 Feb 1, 2019
@ebyhr

This comment has been minimized.

Copy link
Contributor

ebyhr commented Feb 4, 2019

Whereas I could SELECT json from postgresql using this commit, CTAS failed with below message.
→Resolved by appending ?stringtype=unspecified to connection-url

2019-02-04T20:42:16.812+0545	ERROR	remote-task-callback-19	io.prestosql.execution.StageStateMachine	Stage 20190204_145716_00013_4tx9w.1 failed
io.prestosql.spi.PrestoException: Batch entry 0 INSERT INTO "test"."public"."tmp_presto_c00e7b22b456488693c2b5c1a50793d2" VALUES ('{"customer":"John Doe","items":{"product":"Beer","qty":6}}') was aborted: ERROR: column "c1" is of type jsonb but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 83  Call getNextException to see other errors in the batch.
	at io.prestosql.plugin.jdbc.JdbcPageSink.finish(JdbcPageSink.java:186)
	at io.prestosql.operator.TableWriterOperator.finish(TableWriterOperator.java:193)
	at io.prestosql.operator.Driver.processInternal(Driver.java:397)
	at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
	at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
	at io.prestosql.operator.Driver.processFor(Driver.java:276)
	at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
	at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
	at io.prestosql.$gen.Presto_null__testversion____20190204_145200_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO "test"."public"."tmp_presto_c00e7b22b456488693c2b5c1a50793d2" VALUES ('{"customer":"John Doe","items":{"product":"Beer","qty":6}}') was aborted: ERROR: column "c1" is of type jsonb but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 83  Call getNextException to see other errors in the batch.
	at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:148)
	at org.postgresql.core.ResultHandlerDelegate.handleError(ResultHandlerDelegate.java:50)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2184)
	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:481)
	at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:840)
	at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1538)
	at io.prestosql.plugin.jdbc.JdbcPageSink.finish(JdbcPageSink.java:178)
	... 12 more
Caused by: org.postgresql.util.PSQLException: ERROR: column "c1" is of type jsonb but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.
  Position: 83
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
	... 16 more
@findepi

This comment has been minimized.

Copy link
Member

findepi commented Feb 4, 2019

@cla-bot cla-bot bot added the cla-signed label Feb 4, 2019
@prestosql prestosql deleted a comment from cla-bot bot Feb 4, 2019
@prestosql prestosql deleted a comment from cla-bot bot Feb 4, 2019
@prestosql prestosql deleted a comment from cla-bot bot Feb 4, 2019
@guyco33

This comment has been minimized.

Copy link
Member Author

guyco33 commented Feb 5, 2019

@ebyhr I already encountered this issue when I first used it to CTAS in postgres catalog and solved it by appending ?stringtype=unspecified to all postgres jdbc urls in the catalog config.
When CTAS into hive catalog I got different error: Unsupported Hive type: json but it makes sense since there is no Json type in hive.

@ebyhr

This comment has been minimized.

Copy link
Contributor

ebyhr commented Feb 5, 2019

Sorry, my update's notication might have not been sent you. I tried same thing as @guyco33 commented, it works in my environment too.

@guyco33 guyco33 force-pushed the guyco33:add_support_for_postgres_json branch 3 times, most recently from 5085008 to a9bad83 Feb 11, 2019
@guyco33

This comment has been minimized.

Copy link
Member Author

guyco33 commented Feb 13, 2019

@findepi Now after merging #109 the pushdown predicate in QueryBuilder (https://github.com/prestosql/presto/pull/109/files#diff-88e8a0a25b51f372e2a50ab027085b33) picks also JSON types (previously it was prevented by QueryBuilder#isAcceptedType) and it fails since JSON is not orderable. I think to make JSON orderable or to skip QueryBuilder#toPredicate for types that are not orderable

Copy link
Member

findepi left a comment

Some initial comments.
We will need @electrum review for in/out conversions around JsonType, but please apply my feedback first. Please do all new changes as fixups, so i can re-review (i'll let you know when you can squash them).

/**
* The stack representation for JSON objects must have the keys in natural sorted order.
*/
public class JsonType

This comment has been minimized.

Copy link
@findepi

findepi Feb 13, 2019

Member

I recall @electrum's comment saying the JsonType should not be moved to SPI

prestodb/presto#11913 (comment)

(i didn't review this class, i think there were some changes that would require review)

This comment has been minimized.

Copy link
@guyco33

guyco33 Feb 14, 2019

Author Member

@findepi Should I create a new JsonType in SPI and use the signature to correlate them ?


private static ColumnMapping jsonColumnMapping()
{
return ColumnMapping.sliceMapping(JSON, (resultSet, columnIndex) -> jsonParse(utf8Slice(resultSet.getString(columnIndex))), varcharWriteFunction());

This comment has been minimized.

Copy link
@findepi

findepi Feb 13, 2019

Member

I assume we don't want to push down JSON predicates. Otherwise this requires extensive testing that this does not change query semantics, while this is likely not very needed feature.

To disable pushdown, use ColumnMapping.sliceMapping overload which takes UnaryOperator<Domain> pushdownConverter and pass domain -> Domain.all(domain.getType()) as the converter. Or better, rebase on #225 and use DISABLE_PUSHDOWN constant.

This comment has been minimized.

Copy link
@guyco33

guyco33 Feb 14, 2019

Author Member

I disabled pushdown and SELECT * FROM test_json where json_column = json'{"x":123}' still fails on Domain type must be orderable

This comment has been minimized.

Copy link
@guyco33

guyco33 Feb 14, 2019

Author Member

@findepi It works while skipping builder.add(toPredicate(column.getColumnName(), domain, column, accumulator)) in io.prestosql.plugin.jdbc.QueryBuilder when domain type is not orderable:

private List<String> toConjuncts(JdbcClient client, ConnectorSession session, List<JdbcColumnHandle> columns, TupleDomain<ColumnHandle> tupleDomain, List<TypeAndValue> accumulator)
{
    ImmutableList.Builder<String> builder = ImmutableList.builder();
    for (JdbcColumnHandle column : columns) {
        Domain domain = tupleDomain.getDomains().get().get(column);
        if (domain != null) {
            domain = pushDownDomain(client, session, column, domain);
            if (domain.getType().isOrderable()) {
                builder.add(toPredicate(column.getColumnName(), domain, column, accumulator));
            }
        }
    }
    return builder.build();
}

This comment has been minimized.

Copy link
@findepi

findepi Feb 14, 2019

Member

I filed a ticket -- #238

return DataType.dataType(
"json",
JSON,
value -> format("JSON'%s'", value),

This comment has been minimized.

Copy link
@findepi

findepi Feb 13, 2019

Member
Suggested change
value -> format("JSON'%s'", value),
value -> format("JSON '%s'", value),

This comment has been minimized.

Copy link
@guyco33

guyco33 Feb 13, 2019

Author Member

done

This comment has been minimized.

Copy link
@findepi

findepi Feb 14, 2019

Member

This actually applies to in a few places in your test queries. Also, by a convention, we always uppercase type name in type constructors like JSON '....' rather than json '...' (or json'...'.)

Copy link
Member

findepi left a comment

PIng me when you address all my comments (or when you feel blocked).


private static UnaryOperator<Domain> jsonPushdown()
{
return domain -> Domain.all(VARCHAR);

This comment has been minimized.

Copy link
@findepi

findepi Feb 14, 2019

Member

replace jsonPushdown() method with io.prestosql.plugin.jdbc.ColumnMapping#DISABLE_PUSHDOWN (static-import it)

here you're changing domain's type. Why so?

This comment has been minimized.

Copy link
@findepi

findepi Feb 14, 2019

Member

I see the reason now. Can you check if #238 is sufficient? I didn't test that change with your code.

This comment has been minimized.

Copy link
@guyco33

guyco33 Feb 15, 2019

Author Member

Your fix works for me. Thanks!

@@ -136,7 +138,12 @@ public WriteMapping toWriteMapping(Type type)

private static ColumnMapping jsonColumnMapping()
{
return ColumnMapping.sliceMapping(JSON, (resultSet, columnIndex) -> jsonParse(utf8Slice(resultSet.getString(columnIndex))), jsonWriteFunction(), domain -> Domain.all(domain.getType()));
return ColumnMapping.sliceMapping(JSON, (resultSet, columnIndex) -> jsonParse(utf8Slice(resultSet.getString(columnIndex))), jsonWriteFunction(), jsonPushdown());

This comment has been minimized.

Copy link
@findepi

findepi Feb 14, 2019

Member

nit: format with each arg on separate line, for readability

ColumnMapping.sliceMapping(
    JSON,
     (resultSet, columnIndex) -> jsonParse(utf8Slice(resultSet.getString(columnIndex))), 
     jsonWriteFunction(),
    jsonPushdown());

This comment has been minimized.

Copy link
@guyco33

guyco33 Feb 15, 2019

Author Member

done

@findepi

This comment has been minimized.

Copy link
Member

findepi commented Feb 14, 2019

Thanks for the fixups, that was helpful. You can squash what you have so far.
Then we should move JsonType back to where it was. I will defer to @electrum on that one.

Copy link
Member

findepi left a comment

Please squash the fixups you have so far.

JSON,
(resultSet, columnIndex) -> jsonParse(utf8Slice(resultSet.getString(columnIndex))),
jsonWriteFunction(),
domain -> Domain.all(domain.getType()));

This comment has been minimized.

Copy link
@findepi

findepi Feb 15, 2019

Member
Suggested change
domain -> Domain.all(domain.getType()));
DISABLE_PUSHDOWN);

This comment has been minimized.

Copy link
@guyco33

guyco33 Feb 15, 2019

Author Member

done

@guyco33 guyco33 force-pushed the guyco33:add_support_for_postgres_json branch from 00c2319 to 6748f0b Feb 15, 2019
@electrum

This comment has been minimized.

Copy link
Member

electrum commented Feb 15, 2019

Sorry for the confusion and delay on this PR. We'd like to get this in, but there are two main things to resolve:

  1. We shouldn't need to add the JSON type to the SPI. The intended way for connectors to access types is by looking them up via TypeManager. The name is available by StandardTypes.JSON. Instance checks can be converted to type.getTypeSignature().getBase().equals(StandardTypes.JSON).

  2. For constructing the object value for the JSON type, we should use the JSON type constructor. We need to expose this to the SPI. I will update #245 to do that after the massive function change #196 is landed, since it will conflict.

Please update the commit title to "Map PostgreSQL JSON to Presto JSON"

@guyco33 guyco33 changed the title map postgres json to presto json Map PostgreSQL JSON to Presto JSON Feb 15, 2019
@electrum

This comment has been minimized.

Copy link
Member

electrum commented Feb 15, 2019

For the JSON type constructor, it will likely be several weeks before the function and constructor PRs will land, so we can move forward on this PR using the duplicated code (and clean it up later).

We do still need to revert the part about moving JsonType into the SPI.

To get access to TypeManager:

  • Add it as a constructor parameter to PostgreSqlClient.
  • Change JdbcConnectorFactory to bind it in Guice:
Bootstrap app = new Bootstrap(
        binder -> binder.bind(TypeManager.class).toInstance(context.getTypeManager()),
        new JdbcModule(catalogName),
        module);

Then in the PostgreSqlClient constructor:

this.jsonType = typeManager.getType(new TypeSignature(StandardTypes.JSON));
@electrum

This comment has been minimized.

Copy link
Member

electrum commented Feb 15, 2019

@guyco33 Thanks for your work on this so far. Please let me know when you've updated the PR so that we can do a final review and merge. Feel free to ping me on Slack if you have any questions.

@@ -72,6 +72,7 @@
import static io.prestosql.spi.type.RealType.REAL;
import static io.prestosql.spi.type.SmallintType.SMALLINT;
import static io.prestosql.spi.type.StandardTypes.ARRAY;
import static io.prestosql.spi.type.StandardTypes.JSON;

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

The comparison below needs to be against JsonType.JSON

Copy link
Member

electrum left a comment

A few minor comments, otherwise this looks good. Please address the comments and squash into a single commit.


public class PostgreSqlClient
extends BaseJdbcClient
{
protected static Type jsonType;

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

Make this protected final (no static) since it is initialized in the constructor

This comment has been minimized.

Copy link
@guyco33

guyco33 Feb 18, 2019

Author Member

I need it to be static since it used in private static ColumnMapping jsonColumnMapping()

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

Make that method non-static. Initializing a static field from a constructor is problematic.

This comment has been minimized.

Copy link
@guyco33

guyco33 Feb 18, 2019

Author Member

Right, got it, thanks!

.addRoundTrip(jsonDataType(), "[]");

dataTypeTest.execute(getQueryRunner(), prestoCreateAsSelect("presto_test_json"));
dataTypeTest.execute(getQueryRunner(), postgresCreateAndInsert("tpch.postgresql_test_json2"));

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

Why the 2 at the end? I don't see any other usages of this name

This comment has been minimized.

Copy link
@guyco33

guyco33 Feb 18, 2019

Author Member

tpch.postgresql_test_json used to be there before :) Fixed

try {
assertQuery(
"SELECT COLUMN_NAME,DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'tpch' AND TABLE_NAME = 'test_json'",
"VALUES ('key','varchar(5)'),('json_column','json'),('jsonb_column','json')"); //

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

Remove //

jdbcSqlExecutor.execute("CREATE TABLE tpch.test_json(key varchar(5), json_column json, jsonb_column jsonb)");
try {
assertQuery(
"SELECT COLUMN_NAME,DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'tpch' AND TABLE_NAME = 'test_json'",

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

Nit: add space after , and lowercase column and table names (but keep SQL keywords in uppercase)

@@ -291,6 +329,15 @@ private void testUnsupportedDataType(String databaseDataType)
Function.identity());
}

public static DataType<String> jsonDataType()
{
return DataType.dataType(

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

Nit: static import dataType and identity

return DataType.dataType(
"json",
JSON,
value -> format("JSON '%s'", value),

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

This doesn't work if the value has single quotes, but that's probably fine for this test

This comment has been minimized.

Copy link
@findepi

findepi Feb 18, 2019

Member
value -> {
    checkArgument(!value.contains("'"));
    return format("JSON '%s'", value);
}
@@ -291,6 +329,15 @@ private void testUnsupportedDataType(String databaseDataType)
Function.identity());
}

public static DataType<String> jsonDataType()

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

Make this private since it's not used elsewhere

@@ -260,6 +263,15 @@ else if (DOUBLE.equals(type)) {
row.add(doubleValue);
}
}
else if (type.getTypeSignature().getBase().equals(StandardTypes.JSON)) {

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

You can use JsonType.JSON here since it is available to test code

row.add(null);
}
else {
row.add(JsonFunctions.jsonParse(utf8Slice(stringValue)).toStringUtf8());

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

Nit: static import jsonParse

@@ -235,6 +236,9 @@ else if (type instanceof DecimalType) {
else if (type.getTypeSignature().getBase().equals("ObjectId")) {
return value;
}
else if (type.getTypeSignature().getBase().equals(StandardTypes.JSON)) {

This comment has been minimized.

Copy link
@electrum

electrum Feb 18, 2019

Member

Same, this can use JsonType.JSON

@guyco33 guyco33 force-pushed the guyco33:add_support_for_postgres_json branch from a58aed2 to 8b27ae4 Feb 18, 2019
Copy link
Member

electrum left a comment

This is ready to merge. GitHub is showing a merge conflict, please rebase and then I'll merge it.

@@ -90,6 +116,13 @@ protected ResultSet getTables(Connection connection, String schemaName, String t
@Override
public Optional<ColumnMapping> toPrestoType(ConnectorSession session, JdbcTypeHandle typeHandle)
{
if (typeHandle.getJdbcType() == Types.OTHER) {

This comment has been minimized.

Copy link
@electrum

electrum Feb 23, 2019

Member

I don't think we need to check the JDBC type here. The switch on the type name should be sufficient.

@guyco33 guyco33 force-pushed the guyco33:add_support_for_postgres_json branch from 8b27ae4 to 653ee44 Feb 23, 2019
@electrum electrum merged commit 1f70a1b into prestosql:master Feb 23, 2019
2 checks passed
2 checks passed
Travis CI - Pull Request Build Passed
Details
verification/cla-signed
Details
@findepi

This comment has been minimized.

Copy link
Member

findepi commented Feb 23, 2019

@guyco33 congrats!

@electrum electrum mentioned this pull request Feb 24, 2019
4 of 6 tasks complete
@findepi findepi changed the title Map PostgreSQL JSON to Presto JSON Map PostgreSQL JSON, JSONB to Presto JSON Jun 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants
You can’t perform that action at this time.