Indexed columns predicate push down to plugin level query #1097

Closed
alexliu68 opened this Issue Mar 4, 2014 · 6 comments

Contributor

Currently, a predicate on an indexed column is not pushed down to the connector level. We should provide an API to push it down to the connector. Hive and Pig have an indexed-column predicate push-down feature that pushes the query down to use connector-level indexes, e.g. Cassandra's secondary indexes on columns.

Contributor

Do you need access to the whole spectrum of possible predicates on indexed columns, or only range predicates? Currently we pass range predicate information to the connectors when generating partitions, and splits are then generated from those partitions.

Contributor

Take a Cassandra table as an example:

CREATE TABLE test(key_id int primary key, b int );
CREATE INDEX index_b on test(b);

Presto query: select * from test where b = 100;

Currently, Presto retrieves all partitions based on the primary key key_id. The following is the final query pushed down to the Cassandra connector:

select * from test where token(key_id) > [start_token] and token(key_id) < [end_token]

It then filters the results by b = 100.

If we could push the indexed-column predicate down to Cassandra, we would only need to select the partitions with the following query:

select * from test where b = 100 and token(key_id) > [start_token] and token(key_id) < [end_token]
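
To make the idea concrete, here is a rough sketch of how the per-split query could be assembled once the `b = 100` predicate is available to the connector. The class `SplitQuerySketch` and the method `buildSplitQuery` are made-up names for illustration, not part of Presto or the Cassandra driver, and the token values are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: class and method names are hypothetical,
// not Presto or Cassandra driver APIs.
final class SplitQuerySketch {
    // Builds the CQL text for one split: equality predicates on indexed
    // columns first, then the token range on the partition key.
    static String buildSplitQuery(String table, String partitionKey,
                                  Map<String, Integer> indexedEquals,
                                  String startToken, String endToken) {
        StringJoiner where = new StringJoiner(" AND ");
        for (Map.Entry<String, Integer> e : indexedEquals.entrySet()) {
            where.add(e.getKey() + " = " + e.getValue());
        }
        where.add("token(" + partitionKey + ") > " + startToken);
        where.add("token(" + partitionKey + ") < " + endToken);
        return "SELECT * FROM " + table + " WHERE " + where;
    }

    public static void main(String[] args) {
        Map<String, Integer> pushed = new LinkedHashMap<>();
        pushed.put("b", 100); // the predicate from the Presto query
        // Placeholder token bounds; real splits would supply their own.
        System.out.println(buildSplitQuery("test", "key_id", pushed, "-100", "100"));
    }
}
```

With an empty map this degenerates to the token-range-only query that is issued today. A real connector would bind values as parameters rather than concatenate them into the query string.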

Can you elaborate on how range predicate push down works?

Contributor

I believe the range predicate push down already does this for you, but it may have to be supported properly in the Cassandra connector. If you go to the ConnectorSplitManager class, you will see that when you call getPartitions, a TupleDomain is provided that defines all of the range predicates that Presto was able to extract or infer from the query syntax. At this point the Cassandra connector should be able to generate partitions that are aware of these ranges, and then generate only splits that respect these predicates (so as not to produce unnecessary data).
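
As a rough illustration of what the connector would do with that information, the sketch below separates per-column constraints into those the storage layer can evaluate (indexed columns) and those left for the engine to filter afterwards. It deliberately uses a plain map as a stand-in and does not use the real TupleDomain API; `PredicateSplitSketch`, `Result`, and `partition` are hypothetical names, and only equality constraints are modeled.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the per-column constraints a connector receives;
// this is NOT Presto's TupleDomain API.
final class PredicateSplitSketch {
    // 'pushed' holds constraints the connector can apply natively;
    // 'residual' holds the rest, which the engine must still filter on.
    static final class Result {
        final Map<String, Object> pushed = new LinkedHashMap<>();
        final Map<String, Object> residual = new LinkedHashMap<>();
    }

    static Result partition(Map<String, Object> equalityConstraints,
                            Set<String> indexedColumns) {
        Result r = new Result();
        for (Map.Entry<String, Object> e : equalityConstraints.entrySet()) {
            if (indexedColumns.contains(e.getKey())) {
                r.pushed.put(e.getKey(), e.getValue());
            } else {
                r.residual.put(e.getKey(), e.getValue());
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Map<String, Object> constraints = new LinkedHashMap<>();
        constraints.put("b", 100); // indexed column from the example table
        constraints.put("c", 7);   // made-up non-indexed column
        Result r = partition(constraints, Collections.singleton("b"));
        System.out.println("pushed=" + r.pushed + " residual=" + r.residual);
    }
}
```

Keeping the residual set explicit matters because anything the connector does not enforce still has to be filtered by the engine for the result to be correct.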

smorin commented Mar 4, 2014

Is anyone working on a MySQL connector, or does anyone know of someone working on one? Is there any information on building connectors?

Contributor

Ahh, I will check out the TupleDomain object and update the Cassandra connector.

Looks like this will fix #1033.

@cberner cberner closed this Apr 20, 2015