Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add suppport for creating table with MergeTree engine. #7135

Closed
wants to merge 1 commit into from

Conversation

wgzhao
Copy link
Member

@wgzhao wgzhao commented Mar 3, 2021

@cla-bot cla-bot bot added the cla-signed label Mar 3, 2021
@wgzhao wgzhao mentioned this pull request Mar 3, 2021
10 tasks
docs/src/main/sphinx/connector/clickhouse.rst Show resolved Hide resolved
docs/src/main/sphinx/connector/clickhouse.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/connector/clickhouse.rst Show resolved Hide resolved
ClickHouseTableProperties.getPartitionBy(tableProperties).ifPresent(value -> tableOptions.add("PARTITION BY " + value));
ClickHouseTableProperties.getSampleBy(tableProperties).ifPresent(value -> tableOptions.add("SAMPLE BY " + value));
}
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should validate the order by, primary key, partition by, sample by are not present if engine is not "mergretree".
Otherwise we silently ignore table properties explicitly requested by the user.

Other option would be to pass everything to ClickHouse and let CH do the validation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I means I will add required property (order by) and optional properties (if present) for MergeTree engine.
But for Log engine, all above properties are not required, so i just ignore them.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

A question about testing.
If ClickHouse passes the testAddColumn and testDropColumn defined in superclass then instead of over-riding them add new methods with more descriptive names (e.g. testDropColumn -> testDropMergeTreeColumn, testAddColumn -> testAddMergeTreeColumn.

In case the superclass tests fail I'd suggest to add a SkipException with the reason. This way if the base test class is ever improved in future all connectors will get the improved test coverage for free.

@wgzhao wgzhao force-pushed the ck_support_mergetree branch 2 times, most recently from 7e9cee7 to 60c7c38 Compare March 8, 2021 01:42
false),
new PropertyMetadata<>(
ORDER_BY_PROPERTY,
"columns to be the sorting key. Required",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Required for tables using MergeTree engine property."

But I think it's OK to drop "Required" and "Optional" here and just describe this in docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These description of table property comes from office documentation.
Ok, I change them.

ClickHouseTableProperties.getPrimaryKey(tableProperties).ifPresent(value -> tableOptions.add("PRIMARY KEY " + value));
ClickHouseTableProperties.getPartitionBy(tableProperties).ifPresent(value -> tableOptions.add("PARTITION BY " + value));
ClickHouseTableProperties.getSampleBy(tableProperties).ifPresent(value -> tableOptions.add("SAMPLE BY " + value));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relating to #7135 (comment),

what happens if i issue

CREATE TABLE clickhouse.default.t(a int) 
WITH (partition_by = ARRAY['x'])

?

I would expect some kind of failure (either from the connector, or from ClickHouse), because there is no column x.

Then, what happens if i issue

CREATE TABLE clickhouse.default.t(a int, x int) 
WITH (engine = 'LOG', partition_by = ARRAY['x'])

?

I would expect some kind of failure (either from the connector, or from ClickHouse), because i asked for the table to be partitioned, but it cannot be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CREATE TABLE clickhouse.default.t(a int) 
WITH (partition_by = ARRAY['x'])
CREATE TABLE clickhouse.default.t(a int, x int) 
WITH (engine = 'LOG', partition_by = ARRAY['x'])

Yes, the above sqls will both create successfully, because the default table engine Log will ignore all other properties.

we can pass all table property regardless table engine, it will be kind of failure from ClickHouse side。
So we take this approach?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execute the following sql in ClickHouse directly

create table default.t (a int)  engine = Log order by a;

It print the following errors:

ClickHouse exception, code: 36, host: 10.60.242.112, port: 18123; Code: 36, e.displayText() = DB::Exception: 
Engine Log doesn't support PARTITION_BY, PRIMARY_KEY, ORDER_BY or SAMPLE_BY clauses. 
Currently only the following engines have support for the feature: [MergeTree, ReplicatedVersionedCollapsingMergeTree, 
ReplacingMergeTree, ReplicatedSummingMergeTree, ReplicatedAggregatingMergeTree, ReplicatedCollapsingMergeTree, ReplicatedGraphiteMergeTree, ReplicatedMergeTree, 
ReplicatedReplacingMergeTree, VersionedCollapsingMergeTree, SummingMergeTree,
GraphiteMergeTree, CollapsingMergeTree, AggregatingMergeTree] (version 20.3.5.21 (official build))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified codes:

    @Override
    protected String createTableSql(RemoteTableName remoteTableName, List<String> columns, ConnectorTableMetadata tableMetadata)
    {
        ImmutableList.Builder<String> tableOptions = ImmutableList.builder();
        Map<String, Object> tableProperties = tableMetadata.getProperties();
        ClickHouseEngineType engine = ClickHouseTableProperties.getEngine(tableProperties);
        tableOptions.add("ENGINE = " + engine.getEngineType());
        if (engine == ClickHouseEngineType.MERGETREE && formatProperty(ClickHouseTableProperties.getOrderBy(tableProperties)).isEmpty()) {
            // order_by property is required
            throw new TrinoException(INVALID_TABLE_PROPERTY, format("The property of %s is required for table engine %s", ClickHouseTableProperties.ORDER_BY_PROPERTY, engine.getEngineType()));
        }
        formatProperty(ClickHouseTableProperties.getOrderBy(tableProperties)).ifPresent(value -> tableOptions.add("ORDER BY " + value));
        formatProperty(ClickHouseTableProperties.getPrimaryKey(tableProperties)).ifPresent(value -> tableOptions.add("PRIMARY KEY " + value));
        formatProperty(ClickHouseTableProperties.getPartitionBy(tableProperties)).ifPresent(value -> tableOptions.add("PARTITION BY " + value));
        ClickHouseTableProperties.getSampleBy(tableProperties).ifPresent(value -> tableOptions.add("SAMPLE BY " + value));

        return format("CREATE TABLE %s (%s) %s", quoted(remoteTableName), join(", ", columns), join(" ", tableOptions.build()));
    }

@findepi
Copy link
Member

findepi commented Mar 10, 2021

The build status seems related

2021-03-09T03:13:40.7966599Z [ERROR] testDifferentEngine(io.trino.plugin.clickhouse.TestClickHouseConnectorTest)  Time elapsed: 0.124 s  <<< FAILURE!
2021-03-09T03:13:40.7969084Z java.lang.AssertionError: 
2021-03-09T03:13:40.7969522Z 
2021-03-09T03:13:40.7969910Z Expecting message:
2021-03-09T03:13:40.7970589Z   <"The property of order_by is required for table engine MergeTree()">
2021-03-09T03:13:40.7971210Z to match regex:
2021-03-09T03:13:40.7971716Z   <"order_by is required for MergeTree\(\)">
2021-03-09T03:13:40.7972212Z but did not.
2021-03-09T03:13:40.7972475Z 
2021-03-09T03:13:40.7979705Z Throwable that failed the check:

@wgzhao wgzhao closed this Mar 10, 2021
@wgzhao wgzhao reopened this Mar 10, 2021
@findepi
Copy link
Member

findepi commented Mar 12, 2021

@electrum @kokosing @martint @losipiuk cc @trinodb/maintainers

I have a question how should we model table properties that are arbitrary expessions.

ORDER BY expr
[PARTITION BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]

Those expressions can be column names, in which case user could expect the connector to apply proper quoting.
But of course, we don't know whether expression is a column name, since it's a CH's expression.
Actually a n-m could refer be "column n minus m", or could be "column "n-m"".

We could reduce risk for confusion by naming the table properties informatively, eg partition_by_expressions instead of partition_by, although this feels a bit too verbose.

I am therefore inclined to continue with the approach presented here, using shorter table property names, eg partition_by.

Thoughts?

@wgzhao
Copy link
Member Author

wgzhao commented Mar 12, 2021

@findepi
My understanding is that when a user uses clickhouse connector, we can assume that the user has some understanding of clickhouse, like how to create a table on the clickhouse side, so in trino, if we keep the syntax or keywords similar to clickhouse, isn't it easier for the user to understand?

2021-03-06
1. Changes engine property to enum class
2. Validates required property

2021-03-08
1. Add test cases for table properties

2021-03-09
1. move ClickHouseTableProperties#formatPropety to ClickHouse
2. Add all passed table properties regardless of table engine
3. more tests

2021-03-10
1. Fixed excepted message for test
@findepi
Copy link
Member

findepi commented Mar 23, 2021

Merged, thanks!

@findepi findepi closed this Mar 23, 2021
@wgzhao wgzhao deleted the ck_support_mergetree branch March 23, 2021 13:00
@findepi findepi mentioned this pull request Apr 7, 2021
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants