New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding presto-pinot with pushdown working #13504
Conversation
9ebfaa7
to
ae9bb18
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be unblocked by #13521
ab074c2
to
563b1a5
Compare
I think this PR is ready enough to start to be reviewed. It depends on #13521. It also subsumes #13413 and has implemented all of the changes suggested in there. (cc: @haibow) I know it is kind of big .. it has almost no changes to the presto-main except:
The build will fail until https://github.com/agrawaldevesh/presto-pinot-driver can be pushed to mavencentral and the presto-pinot-driver version upgraded to 0.1.0. I am working with @wenleix on that. I am also happy to meet in person and give a code overview. But a good place to start is from the PinotConnectorPlanOptimizer.java file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High-level comments only. The dependency looks much cleaner!
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnectorId.java
Outdated
Show resolved
Hide resolved
...ot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnectorPlanOptimizer.java
Outdated
Show resolved
Hide resolved
...ot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnectorPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-plugin/src/main/java/com/facebook/presto/pinot/PinotPlugin.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnector.java
Outdated
Show resolved
Hide resolved
...-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotPageSourceProvider.java
Outdated
Show resolved
Hide resolved
...-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotPageSourceProvider.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotSplit.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConfig.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnection.java
Outdated
Show resolved
Hide resolved
presto-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnection.java
Outdated
Show resolved
Hide resolved
...to-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnectorFactory.java
Outdated
Show resolved
Hide resolved
...to-pinot/presto-pinot-lib/src/main/java/com/facebook/presto/pinot/PinotConnectorFactory.java
Outdated
Show resolved
Hide resolved
563b1a5
to
de579bf
Compare
It is ok to have many top-level modules |
de579bf
to
0bca794
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @agrawaldevesh! A couple of high level comments to get started:
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotBrokerPageSource.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotBrokerPageSource.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotBrokerPageSource.java
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotBrokerPageSource.java
Show resolved
Hide resolved
0bca794
to
284b4de
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished reviewing classes PinotExpression
, PinotFilterExpressionConverter
, PinotProjectExpressionConverter
, and PinotQueryGenerator
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotExpression.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGenerator.java
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGenerator.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGenerator.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGenerator.java
Outdated
Show resolved
Hide resolved
...-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGeneratorContext.java
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotConnectorPlanOptimizer.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotSegmentPageSource.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotSessionProperties.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotConfig.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/test/java/com/facebook/presto/pinot/MetadataUtil.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PushdownUtils.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PushdownUtils.java
Outdated
Show resolved
Hide resolved
284b4de
to
34e8204
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrawaldevesh, there are still some amount of comments not addressed? Could you go through those in https://github.com/prestodb/presto/pull/13504/files ? Make sure to open each file and exam the comment.
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotExpression.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many nits and small coding style to be fixed. I only skimmed through the files rather than digging into each individual. @agrawaldevesh, could do a thorough exam on all the files to make sure the coding style is right based on the existing comments? Thanks!
...-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGeneratorContext.java
Outdated
Show resolved
Hide resolved
...-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGeneratorContext.java
Outdated
Show resolved
Hide resolved
...-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGeneratorContext.java
Outdated
Show resolved
Hide resolved
...-pinot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotQueryGeneratorContext.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotBrokerPageSource.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotMetricsStat.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotSessionProperties.java
Outdated
Show resolved
Hide resolved
import java.util.List; | ||
import java.util.Optional; | ||
|
||
public class PushdownUtils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call this PinotPushdownUtils
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm ... something to discuss about: I want to use these for presto-aresdb connector too. We originally had them in presto-spi as a part of our fork. Is there a designated common module that connectors can share code in ? Or I can always have the presto-aresdb-toolkit depend on presto-pinot-toolkit but that would look weird.
It would be funny to create another module named presto-pinot-aresdb-utils to house these :)
But I can rename them to something pinot specific for now but this is generic.
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PushdownUtils.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PushdownUtils.java
Outdated
Show resolved
Hide resolved
9661d49
to
1830602
Compare
@highker ... I just did a pass over the entire patch and I think you will find that a lot of your common nits have been addressed :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments.
public Set<ConnectorPlanOptimizer> getConnectorPlanOptimizers() | ||
public Set<ConnectorPlanOptimizer> getLogicalPlanOptimizers() | ||
{ | ||
return ImmutableSet.of(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way is to add a getStage()
method in ConnectorPlanOptimizer
but current design is fine given we won't (probably shouldn't) have more than two stages.
...t-toolkit/src/test/java/com/facebook/presto/pinot/query/TestPinotConnectorPlanOptimizer.java
Outdated
Show resolved
Hide resolved
* A simplified table scan matcher for multiple-connector support. | ||
* The goal is to test plan structural matching rather than table scan details | ||
* <p> | ||
* Copied from com.facebook.presto.sql.planner.optimizations.TestConnectorOptimization.SimpleTableScanMatcher |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment might not be necessary?
...t-toolkit/src/test/java/com/facebook/presto/pinot/query/TestPinotConnectorPlanOptimizer.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
@Test | ||
public void testFilterSplitting() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testPartialPredicatePushdown
?
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Outdated
Show resolved
Hide resolved
presto-pinot-toolkit/src/main/java/com/facebook/presto/pinot/PinotErrorCode.java
Show resolved
Hide resolved
return handleIn(specialForm, true, context); | ||
case AND: | ||
case OR: | ||
return derived(format("(%s %s %s)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to put parenthesis around sub clauses:
format("( (%s) %s (%s))", ...)
...ot-toolkit/src/main/java/com/facebook/presto/pinot/query/PinotFilterExpressionConverter.java
Show resolved
Hide resolved
int aggregations = 0; | ||
boolean groupByExists = false; | ||
|
||
for (AggregationColumnNode expression : computeAggregationNodes(node).orElseThrow(() -> new PinotException(PINOT_UNSUPPORTED_EXPRESSION, Optional.empty(), "Unsupported aggregation node " + node))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of lengthy computeAggregationNodes orElseThrow
, can we just throw inside computeAggregationNode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failure is related. Could you fix it really quick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agrawaldevesh, could you update your commit message body with the following text? (Basically I just wrap your words into 70 chars per line and removed some out-of-date statement)
Adding presto-pinot with pushdown
This presto-pinot connector works as follows:
* It builds upon the new plan connector optimizer framework to push the
maximal query to the pinot system.
* By default, aggregation and limit queries are send to the pinot broker,
while just filter queries are send to the servers parallelly. This is
useful when joining against another (say hive) table wherein the join
needs to be parallelized and segments from pinot (fetched directly from
the pinot servers) need to be joined with hive splits. If you wish to
not send any queries to the broker, please set the catalog session
property prefer_broker_queries to false.
* You can disable parallel queries (useful for low latency deployment)
we do at Uber, by setting the catalog property
forbid_segmnt_queries=false.
* The connector needs a way to talk to the controller (to fetch the
brokers and the table schema). You can either provide the controller
URI or the URI of the Muttley proxy (an RPC proxy mechanism we use at
Uber). It is also possible to use the pinot-rest-proxy instead of
going to the pinot-broker (again something we use for our test pinot
clusters at Uber).
In addition the pinot connector would split the filter expressions such
that expressions that can be pushed down to pinot are done so, and those
that aren't are handled in presto.
It handles the full range of pinot PQL and respects pinot's weirdnesses
to guarantee that all results are still in the expected SQL semantics.
Usage:
Place the catalog file like so in the etc/catalog directory. The schema
name is immaterial and it can be anything. (since pinot does not have a
notion of schema). Example catalog files can be found below:
$ cat local_run/etc/catalog/pinotnotparallel.properties
connector.name=pinot
pinot.controller-url=controller_host:9000
pinot.scan-parallelism-enabled=false
$ cat local_run/etc/catalog/pinotparallel.properties
connector.name=pinot
pinot.controller-url=controller_host:9000
pinot.scan-parallelism-enabled=true
1830602
to
e8b3174
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
e8b3174
to
af6bb6a
Compare
This presto-pinot connector works as follows: * It builds upon the new plan connector optimizer framework to push the maximal query to the pinot system. * By default, aggregation and limit queries are send to the pinot broker, while just filter queries are send to the servers parallelly. This is useful when joining against another (say hive) table wherein the join needs to be parallelized and segments from pinot (fetched directly from the pinot servers) need to be joined with hive splits. If you wish to not send any queries to the broker, please set the catalog session property prefer_broker_queries to false. * You can disable parallel queries (useful for low latency deployment) we do at Uber, by setting the catalog property forbid_segment_queries=false. * The connector needs a way to talk to the controller (to fetch the brokers and the table schema). You can either provide the controller URI or the URI of the Muttley proxy (an RPC proxy mechanism we use at Uber). It is also possible to use the pinot-rest-proxy instead of going to the pinot-broker (again something we use for our test pinot clusters at Uber). In addition the pinot connector would split the filter expressions such that expressions that can be pushed down to pinot are done so, and those that aren't are handled in presto. It handles the full range of pinot PQL and respects pinot's weirdnesses to guarantee that all results are still in the expected SQL semantics. Usage: See the newly added connector/pinot.rst documentation file.
af6bb6a
to
95b935b
Compare
Tested and fixed. Good to be merged in. |
…o's date_trunc This will only be called by Presto connector (prestodb/presto#13504) and it has identically semantics to presto's SQL date_trunc, albeit with a timezone specialization. Please see details on presto's date_trunc function here: https://prestodb.github.io/docs/current/functions/datetime.html This is needed so that the presto's date_trunc invocations can be faithfully translated as is to this new function. Its a new function so that it is trivial to roll out to harmlessly without a lot of regression testing. Without this function, we cannot handle timezones nor week truncations in the existing pinot's dateTimeConvert function. It basically copies the PrestoDB code: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/DateTimeFunctions.java, (specializations of date_trunc for TIMESTAMP and TIMESTAMP_WITH_TIME_ZONE I am even checking in the zone-index.properties used by presto to ensure that even the time zones are 1:1 b/w presto and this function. (sync'd to the latest prestodb/presto repo) Understanding this UDF requires knowledge of the joda-time API. I am not documenting this heavily since it is a copy of the Presto UDF.
…o's date_trunc (#4740) This will only be called by Presto connector (prestodb/presto#13504) and it has identically semantics to presto's SQL date_trunc, albeit with a timezone specialization. Please see details on presto's date_trunc function here: https://prestodb.github.io/docs/current/functions/datetime.html This is needed so that the presto's date_trunc invocations can be faithfully translated as is to this new function. Its a new function so that it is trivial to roll out to harmlessly without a lot of regression testing. Without this function, we cannot handle timezones nor week truncations in the existing pinot's dateTimeConvert function. It basically copies the PrestoDB code: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/DateTimeFunctions.java, (specializations of date_trunc for TIMESTAMP and TIMESTAMP_WITH_TIME_ZONE I am even checking in the zone-index.properties used by presto to ensure that even the time zones are 1:1 b/w presto and this function. (sync'd to the latest prestodb/presto repo) Understanding this UDF requires knowledge of the joda-time API. I am not documenting this heavily since it is a copy of the Presto UDF.
…o's date_trunc (apache#4740) This will only be called by Presto connector (prestodb/presto#13504) and it has identically semantics to presto's SQL date_trunc, albeit with a timezone specialization. Please see details on presto's date_trunc function here: https://prestodb.github.io/docs/current/functions/datetime.html This is needed so that the presto's date_trunc invocations can be faithfully translated as is to this new function. Its a new function so that it is trivial to roll out to harmlessly without a lot of regression testing. Without this function, we cannot handle timezones nor week truncations in the existing pinot's dateTimeConvert function. It basically copies the PrestoDB code: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/DateTimeFunctions.java, (specializations of date_trunc for TIMESTAMP and TIMESTAMP_WITH_TIME_ZONE I am even checking in the zone-index.properties used by presto to ensure that even the time zones are 1:1 b/w presto and this function. (sync'd to the latest prestodb/presto repo) Understanding this UDF requires knowledge of the joda-time API. I am not documenting this heavily since it is a copy of the Presto UDF.
…o's date_trunc (apache#4740) This will only be called by Presto connector (prestodb/presto#13504) and it has identically semantics to presto's SQL date_trunc, albeit with a timezone specialization. Please see details on presto's date_trunc function here: https://prestodb.github.io/docs/current/functions/datetime.html This is needed so that the presto's date_trunc invocations can be faithfully translated as is to this new function. Its a new function so that it is trivial to roll out to harmlessly without a lot of regression testing. Without this function, we cannot handle timezones nor week truncations in the existing pinot's dateTimeConvert function. It basically copies the PrestoDB code: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/operator/scalar/DateTimeFunctions.java, (specializations of date_trunc for TIMESTAMP and TIMESTAMP_WITH_TIME_ZONE I am even checking in the zone-index.properties used by presto to ensure that even the time zones are 1:1 b/w presto and this function. (sync'd to the latest prestodb/presto repo) Understanding this UDF requires knowledge of the joda-time API. I am not documenting this heavily since it is a copy of the Presto UDF.
Adding presto-pinot with pushdown
This presto-pinot connector works as follows:
maximal query to the pinot system.
while just filter queries are send to the servers parallelly. This is
useful when joining against another (say hive) table wherein the join
needs to be parallelized and segments from pinot (fetched directly from
the pinot servers) need to be joined with hive splits. If you wish to
not send any queries to the broker, please set the catalog session
property prefer_broker_queries to false.
we do at Uber, by setting the catalog property
forbid_segment_queries=false.
brokers and the table schema). You can either provide the controller
URI or the URI of the Muttley proxy (an RPC proxy mechanism we use at
Uber). It is also possible to use the pinot-rest-proxy instead of
going to the pinot-broker (again something we use for our test pinot
clusters at Uber).
In addition the pinot connector would split the filter expressions such
that expressions that can be pushed down to pinot are done so, and those
that aren't are handled in presto.
It handles the full range of pinot PQL and respects pinot's weirdnesses
to guarantee that all results are still in the expected SQL semantics.
Usage: See the newly added connector/pinot.rst documentation file.