Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TableStatistics in QueryInputMetadata for Join Query #12808

Merged
merged 1 commit into from
Aug 1, 2019

Conversation

viczhang861
Copy link
Contributor

@viczhang861 viczhang861 commented May 15, 2019

Table stats availablity is important for cost-based optimization
of join query. Add it to QueryInputMetadata class.

Test is added to show example of TableStatisitcs in json format.

@viczhang861 viczhang861 changed the title [WIP]Add TableStatistics in QueryInputMetadata for Join Query Add TableStatistics in QueryInputMetadata for Join Query May 15, 2019
Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nits

@@ -35,26 +36,23 @@
private final String table;
private final List<Column> columns;
private final Optional<Object> connectorInfo;
private final Optional<TableStatistics> tableStats;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: simply statistics

@@ -58,15 +63,15 @@ private static Column createColumn(ColumnMetadata columnMetadata)
return new Column(columnMetadata.getName(), columnMetadata.getType().toString());
}

private Input createInput(TableMetadata table, TableHandle tableHandle, Set<Column> columns)
private Input createInput(TableMetadata table, TableHandle tableHandle, Set<Column> columns, Optional<TableStatistics> tableStats)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: statistics

@@ -85,13 +114,20 @@ public Void visitTableScan(TableScanNode node, Void context)
columns.add(createColumn(metadata.getColumnMetadata(session, tableHandle, columnHandle)));
}

inputs.add(createInput(metadata.getTableMetadata(session, tableHandle), tableHandle, columns));
Optional<TableStatistics> tableStats = Optional.empty();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: statistics

Optional<TableStatistics> tableStats = Optional.empty();

if (context.extractStats()) {
Constraint<ColumnHandle> constraint = new Constraint<>(node.getCurrentConstraint());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inline this


private static class Context
{
private boolean extractStats;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extractStatistics

@@ -78,6 +80,7 @@ public int hashCode()
return Objects.hash(min, max);
}

@JsonValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's annotate getMin and getMax with the @JsonProperty to keep this class Json parsable

@arhimondr arhimondr assigned rschlussel and viczhang861 and unassigned arhimondr May 17, 2019

if (context.extractStats()) {
Constraint<ColumnHandle> constraint = new Constraint<>(node.getCurrentConstraint());
tableStats = Optional.of(metadata.getTableStatistics(session, tableHandle, constraint));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's log the stats only for the columns references in the table scan

@viczhang861 viczhang861 force-pushed the stats branch 2 times, most recently from d41d3a1 to ae9ede6 Compare May 21, 2019 00:29
@@ -589,6 +589,23 @@ public TableStatistics getTableStatistics(ConnectorSession session, ConnectorTab
return hiveStatisticsProvider.getTableStatistics(session, ((HiveTableHandle) tableHandle).getSchemaTableName(), columns, columnTypes, partitions);
}

@Override
public TableStatistics getTableStatistics(ConnectorSession session, ConnectorTableHandle tableHandle, Constraint<ColumnHandle> constraint, Set<ColumnHandle> desiredColumns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's don't change the SPI. You can simply request the stats using the old SPI, but before storing them in the Input, filter out the stats that are not referenced in the TableScanNode recreating the TableStatistics object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting stats for only the columns needed requires more reworking in hive connector. By asking of filtering the stats for not used columns i just wanted to optimize the storage for the Input objects.

@viczhang861
Copy link
Contributor Author

Rebase

Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comment

@arhimondr arhimondr removed their assignment Jul 11, 2019
Record table statistics in QueryCompletedEvent for join query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants