Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hide table layouts from engine #363

Merged
merged 15 commits into from Mar 9, 2019

Conversation

5 participants
@martint
Copy link
Member

martint commented Mar 2, 2019

In preparation for https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations, this change removes support for multiple layouts. It's a feature that's not used by any known connector, complicates the work of the optimizer and just gets in the way. In the future, we may add this functionality again but in a different form.

@cla-bot cla-bot bot added the cla-signed label Mar 2, 2019

@martint martint referenced this pull request Mar 2, 2019

Open

Allow connectors to participate in query optimization #18

4 of 15 tasks complete

@martint martint force-pushed the martint:tl-wip3 branch from 402630b to 9b3c617 Mar 2, 2019

@martint martint changed the title [WIP] Hide table layouts from engine Hide table layouts from engine Mar 2, 2019

@dain

dain approved these changes Mar 2, 2019

Copy link
Member

dain left a comment

One comment, but otherwise looks good.

// Used during predicate refinement over multiple passes of predicate pushdown
// TODO: think about how to get rid of this in new planner
private final TupleDomain<ColumnHandle> currentConstraint;

private final TupleDomain<ColumnHandle> enforcedConstraint;

public static TableScanNode newInstance(

This comment has been minimized.

@dain

dain Mar 2, 2019

Member

This is a strange design, which I think predates this change. Maybe add a comment on why this factory is different from the constructor with the same signature.

This comment has been minimized.

@martint

martint Mar 2, 2019

Author Member

Ah, thanks for pointing it out. I need to see if I can get rid of it. Without it, two of the constructors had the same shape (but ended up doing different things). I may be able to consolidate them now that we don’t have table layouts.

This comment has been minimized.

@martint

martint Mar 3, 2019

Author Member

It turns out I can't. I tried making the static factory method the one that's used to deserialized from json, but it needs to set some fields to null, which is disallowed by the full constructor. Oh well.. I added a comment to clarify.

This comment has been minimized.

@kokosing

kokosing Mar 5, 2019

Member

But we used to have such constructor and it used to work. What was wrong with it?

@martint martint force-pushed the martint:tl-wip3 branch from 9b3c617 to aab3854 Mar 2, 2019

@kokosing
Copy link
Member

kokosing left a comment

Can you please refresh my memory and explain why TableLayout* were introduced in the first place and why it no longer holds?


import static java.util.Objects.requireNonNull;

public final class TableHandle
{
private final ConnectorId connectorId;
private final ConnectorTableHandle connectorHandle;
private final ConnectorTransactionHandle transaction;
private final Optional<ConnectorTableLayoutHandle> layout;

This comment has been minimized.

@kokosing

kokosing Mar 4, 2019

Member

To me it looks like to much of information to be stored in TableHandle. To me TableHandle represents an handle to a table, same way like a column handle. It is expected to be lightweight. What you have here is designed for very specific use case, hence maybe a new entity should be created for that.

Notice that you are planning to put even more information here (aggregations, joins, projections), the more you add here the less "table"-like it becomes, instead it becomes a more generic "relation" (no longer a table). Also notice that TableHandle is used for insert (perfect fit usage) and now with your changes it becomes questionable.

This comment has been minimized.

@martint

martint Mar 4, 2019

Author Member

To me it looks like to much of information to be stored in TableHandle. To me TableHandle represents an handle to a table, same way like a column handle. It is expected to be lightweight. What you have here is designed for very specific use case, hence maybe a new entity should be created for that.

Storing the TableLayout is only temporary. I plan to eventually get rid of them entirely, but didn't do so yet because it requires backward incompatible changes to the SPI.

But at the end of the day, yes, the plan is for TableHandle to potentially represent more than raw tables. The object doesn't need to be heavyweight -- it's up to the connector to decide what it holds on to internally (it could be just an id to an stored definition, etc). As to whether it becomes more "relation"-like, that's not much of a concern. In fact, the SQL spec treats everything as some form of a table:

A table is a collection of zero or more rows where each row is a sequence of one or more column values. 

[...] 

A table is either a base table, a derived table, or a transient table.

[...]

A derived table is a table derived directly or indirectly from one or more other tables by the evaluation of an expression, 
such as a <joined table>, <data change delta table>, <query expression>, or <table expression>. A <query expression> 
can contain an optional <order by clause>

Also notice that TableHandle is used for insert (perfect fit usage) and now with your changes it becomes questionable.

That's ok, because whether a table can be inserted into is decided at analysis time. So you'd never end up with a TableHandle to insert into that represents something that can't be inserted into. BTW, did you know the SQL spec allows for inserting into things that are more complex that simple tables? For example, there's an entire section describing "updatable views".

This comment has been minimized.

@kokosing

kokosing Mar 5, 2019

Member

I see your point. Anyway, I still think that TableHandle was a simple wrapper around ConnectorTableHandle and there are still places where it is used as such. Now, they will get a complex type, so they would need to at least check if the layout is empty there.

Storing the TableLayout is only temporary. I plan to eventually get rid of them entirely, but didn't do so yet because it requires backward incompatible changes to the SPI.

But there will be something else that will replace TableLayout. So TableHandle won't become any simpler.

So you'd never end up with a TableHandle to insert into that represents something that can't be inserted into.

Having separate a type would make expressive in the code that such case is impossible (without knowing the actual flow).

BTW, did you know the SQL spec allows for inserting into things that are more complex that simple tables?

Mind blowing. Anyway, I don't think that TableHandle that is suited for read will match that.

@martint

This comment has been minimized.

Copy link
Member Author

martint commented Mar 4, 2019

Can you please refresh my memory and explain why TableLayout* were introduced in the first place and why it no longer holds?

It's a feature we added a few years ago for a very specific use case at FB to allow connectors to provide multiple physical organizations of a table for the engine to use during optimization. No one else is using it. Since it makes it hard to introduce new features like complex operation pushdown, we're removing it for now. We may reintroduce it later in some other shape (materialized query tables/materialized views, etc).

@kokosing

This comment has been minimized.

Copy link
Member

kokosing commented Mar 4, 2019

Would you mind to sort commits, that way it will be easier to review.

@martint martint force-pushed the martint:tl-wip3 branch from aab3854 to e6e6e7a Mar 4, 2019

@martint

This comment has been minimized.

Copy link
Member Author

martint commented Mar 4, 2019

Would you mind to sort commits, that way it will be easier to review.

done

@kokosing
Copy link
Member

kokosing left a comment

LGTM for all commits but Hide TableLayouts from engine


import static java.util.Objects.requireNonNull;

public final class TableHandle
{
private final ConnectorId connectorId;
private final ConnectorTableHandle connectorHandle;
private final ConnectorTransactionHandle transaction;
private final Optional<ConnectorTableLayoutHandle> layout;

This comment has been minimized.

@kokosing

kokosing Mar 5, 2019

Member

I see your point. Anyway, I still think that TableHandle was a simple wrapper around ConnectorTableHandle and there are still places where it is used as such. Now, they will get a complex type, so they would need to at least check if the layout is empty there.

Storing the TableLayout is only temporary. I plan to eventually get rid of them entirely, but didn't do so yet because it requires backward incompatible changes to the SPI.

But there will be something else that will replace TableLayout. So TableHandle won't become any simpler.

So you'd never end up with a TableHandle to insert into that represents something that can't be inserted into.

Having separate a type would make expressive in the code that such case is impossible (without knowing the actual flow).

BTW, did you know the SQL spec allows for inserting into things that are more complex that simple tables?

Mind blowing. Anyway, I don't think that TableHandle that is suited for read will match that.

@@ -29,15 +29,22 @@

public class TableLayoutResult
{
private final TableHandle newTableHandle;

This comment has been minimized.

@kokosing

kokosing Mar 5, 2019

Member

why newTableHanlde and not just tableHandle?

This comment has been minimized.

@martint

martint Mar 5, 2019

Author Member

It's just to indicate that this is the new table handle to be used to refer to the result of pushing the predicates into the table scan. It's named that way for clarity, but it doesn't matter in the medium term since I'm planning to remove this class altogether once the replacement from https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations is in place and we give connectors some time to migrate.

{
this.newTableHandle = newTable;

This comment has been minimized.

@kokosing

kokosing Mar 5, 2019

Member

rnn

@@ -90,20 +90,14 @@ public PickTableLayout(Metadata metadata, SqlParser parser)
{
return ImmutableSet.of(
checkRulesAreFiredBeforeAddExchangesRule(),

This comment has been minimized.

@kokosing

kokosing Mar 5, 2019

Member

is this still required? I would replace with comment in PlanOptimizers. There is no reason to prevent this class to be invoked later.

This comment has been minimized.

@martint

martint Mar 5, 2019

Author Member

Probably not. I'll remove it.

Show resolved Hide resolved .../io/prestosql/sql/planner/iterative/rule/PushPredicateIntoTableScan.java
// Used during predicate refinement over multiple passes of predicate pushdown
// TODO: think about how to get rid of this in new planner
private final TupleDomain<ColumnHandle> currentConstraint;

private final TupleDomain<ColumnHandle> enforcedConstraint;

public static TableScanNode newInstance(

This comment has been minimized.

@kokosing

kokosing Mar 5, 2019

Member

But we used to have such constructor and it used to work. What was wrong with it?

@martint

This comment has been minimized.

Copy link
Member Author

martint commented Mar 5, 2019

But we used to have such constructor and it used to work. What was wrong with it?

Before, we had two constructors:

@JsonCreator
public TableScanNode(
        @JsonProperty("id") PlanNodeId id,
        @JsonProperty("table") TableHandle table,
        @JsonProperty("outputSymbols") List<Symbol> outputs,
        @JsonProperty("assignments") Map<Symbol, ColumnHandle> assignments,
        @JsonProperty("layout") Optional<TableLayoutHandle> tableLayout)

and

public TableScanNode(
        PlanNodeId id,
        TableHandle table,
        List<Symbol> outputs,
        Map<Symbol, ColumnHandle> assignments)

After removing table layouts, both constructors have the same signature, but do something different.

@martint

This comment has been minimized.

Copy link
Member Author

martint commented Mar 5, 2019

I see your point. Anyway, I still think that TableHandle was a simple wrapper around ConnectorTableHandle and there are still places where it is used as such. Now, they will get a complex type, so they would need to at least check if the layout is empty there.

It will still be a simple wrapper around ConnectorTableHandle once we finish removing table layouts from the SPI. But we can'd do that quite yet -- we need to 1) add the new APIs and mechanism for predicate pushdown 2) have a grace period for connectors to adapt (or do it automatically, if possible)

But there will be something else that will replace TableLayout. So TableHandle won't become any simpler.

No, there won't. It will be just TableHandles wrapping ConnectorTableHandles. It's up to connectors to decide what they need to track in a ConnectorTableHandle based on what can be pushed down into the connector.

Having separate a type would make expressive in the code that such case is impossible (without knowing the actual flow).

That's really hard to do in general. It forces entire parallel hierarchies and APIs to represent similar objects with different constraints (e.g., we'd need separate APIs to resolve a table during analysis for each possible use so we can get a table handle for insert, a table handle for delete, etc.). Also, where do we stop? Would we have one of each expression kind for each SQL type to prevent misuse? It's a fine balance, and I think the current concepts provide good separation between action (expression/plan node type), object (columns, tables, arguments) and attributes (physical or logical properties) without requiring a NxMxR explosion of valid combinations.

@martint martint force-pushed the martint:tl-wip3 branch 2 times, most recently from 858edcb to 6773bb3 Mar 5, 2019

@martint martint referenced this pull request Mar 7, 2019

Open

[WIP] Complex pushdown #402

@sopel39
Copy link
Member

sopel39 left a comment

well done!

Show resolved Hide resolved .../java/io/prestosql/sql/planner/optimizations/MetadataQueryOptimizer.java Outdated
Show resolved Hide resolved presto-main/src/main/java/io/prestosql/metadata/Metadata.java
Show resolved Hide resolved presto-main/src/main/java/io/prestosql/metadata/MetadataManager.java
Show resolved Hide resolved presto-main/src/main/java/io/prestosql/metadata/MetadataManager.java
Show resolved Hide resolved presto-main/src/main/java/io/prestosql/metadata/TableHandle.java
@@ -25,15 +25,22 @@

public class TableLayoutResult
{
private final TableHandle newTableHandle;

This comment has been minimized.

@sopel39

sopel39 Mar 7, 2019

Member

Why do we need this field since we still have layout?

This comment has been minimized.

@martint

martint Mar 7, 2019

Author Member

Layout doesn't have a reference to the ConnectorTableHandle that's needed to construct the new TableHandle. I could add that, but it seemed conceptually cleaner to do it this way.

Show resolved Hide resolved presto-main/src/main/java/io/prestosql/split/SplitManager.java
Show resolved Hide resolved ...java/io/prestosql/sql/planner/optimizations/HashGenerationOptimizer.java
Show resolved Hide resolved ...st/java/io/prestosql/operator/BenchmarkScanFilterAndProjectOperator.java

martint added some commits Mar 2, 2019

Rename listTableLayouts method
More accurately, constructs a set of alternate plans with the
filter pushed into the table scan based on available table layouts.
@dain

dain approved these changes Mar 7, 2019

@@ -144,12 +145,14 @@ public static RowExpression translate(
FunctionRegistry functionRegistry,
TypeManager typeManager,
Session session,
boolean optimize)
boolean optimize,
Map<Symbol, Integer> layout)

This comment has been minimized.

@dain

dain Mar 7, 2019

Member

I'd put this below type

}
});

Block block = Utils.nativeValueToBlock(expectedType, result);

This comment has been minimized.

@dain

dain Mar 7, 2019

Member

add comment // convert result from stack type to Type ObjectValue

martint added some commits Feb 16, 2019

Simplify unconditional PickLayout
It's just selecting a layout for the raw table scan, so
no need to go through the logic for pushing a predicate,
etc.
Move test to distributed query tests
This test relies on a layout being picked during planning. For
LocalQueryRunner this only happens because of a synthetic rule
(PickLayout w/o predicate) that attaches a layout to the table scan.
So, in a sense, it's just a coincidence that it works.

In distributed execution, the job is done by AddExchanges, so
we want to make sure we're testing that behavior.

@martint martint force-pushed the martint:tl-wip3 branch from 6773bb3 to c705d2d Mar 7, 2019

@martint martint force-pushed the martint:tl-wip3 branch 2 times, most recently from 2eebec6 to f10d5f2 Mar 8, 2019

@martint martint force-pushed the martint:tl-wip3 branch from f10d5f2 to c9d4d04 Mar 9, 2019

martint added some commits Feb 27, 2019

Hide TableLayouts from engine
This change hides table layouts from the engine as a first-class
concept. We keep the SPI as is for backward compatibility for now.

When predicates are pushed into a table scan by PickLayout (now
PushPredicateIntoTableScan) or AddExchanges, we now replace the
table handle associated with the table scan with a new one that
contains the reference to the ConnectorTableLayoutHandle under
the covers.
Make PushPredicateIntoTableScan top level rule
It now contains a single rule, so no point in
having it return a rule set.
Remove unnecessary expression translations
In order to translate expression to row expressions, the
code was first replacing all symbol references with field
references for the corresponding ordinal inputs.

This is unnecessary, as the translation can be done on demand
as the expression is translated to a row expression.
Remove interpreted page processors
They were only being used in tests. The engine no longer
relies on them for query execution.
Fix expression type
The inferred type of the former expression is INTEGER, which
doesn't match the signature of combineHash function call.
Rename analyzeExpressionsWithSymbols method
There's no longer a conflict with analyzeExpressionsWithInputs so
simplify the name
Inline analyzeExpressions method
There's only one caller, so no need for an extra indirection.

@martint martint force-pushed the martint:tl-wip3 branch from c9d4d04 to feb70d7 Mar 9, 2019

@martint martint merged commit 2a23e10 into prestosql:master Mar 9, 2019

1 check passed

verification/cla-signed
Details
@@ -252,10 +251,10 @@ public Object evaluate()
return visitor.process(expression, new NoPagePositionContext());
}

public Object evaluate(int position, Page page)
public Object evaluate(SymbolResolver inputs)

This comment has been minimized.

@rongrong

rongrong Mar 22, 2019

Contributor

What's the difference between this evaluate and the optimize now? If you provide SymbolResolver, using optimize would be just fine, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.