Predicate Pushdown for JDBC connector is too limited #4874

mtrbean · 2016-03-25T09:29:11Z

From reading com.facebook.presto.plugin.jdbc.QueryBuilder it appears that predicate pushdown only works for BIGINT, DOUBLE and BOOLEAN column types. Is there any reason why other types e.g. varchar (string) are not supported?

The text was updated successfully, but these errors were encountered:

dain · 2016-03-25T15:19:04Z

IIRC, MySQL doesn't support proper character encodings so you get different results depending on if Presto or MySQL executes the filter. I don't know if the same is true for Postgres, so it is might be possible to enable it for just that. Also, there may be some work around for the MySQL issues.

mtrbean · 2016-03-25T17:53:43Z

Not sure exactly what it means, but MySQL does support unicode character encoding for one thing so it will be a very big win for people who have set their database up with sane / correct encoding. We have several big tables in MySQL and it's very painful that presto has to to table scan in most queries.

How about filtering on date / timestamp column?

electrum · 2016-03-25T20:56:26Z

We're adding support for those columns in #4842.

electrum · 2016-03-25T21:07:27Z

We will need to test the character encoding stuff to see how it works. For example, if MySQL is using a non-Unicode charset like latin1, and you compare against a Unicode character, what does the MySQL JDBC driver when calling setString() with that Unicode value? As long as the predicate returns false, that's correct behavior, since it would be impossible for the MySQL column to hold that value.

MySQL also does case insensitive comparisons by default, but that just means they will return extra rows that the Presto engine will filter out.

The only issue is if there cases where we push down a filter that incorrectly filters out rows which should be returned. If you have ideas on how this could happen, please share them.

One potential issue is trailing whitespace, such as with char columns, since those are space padded to the declared length. However, if I remember correctly, trailing space is ignored for comparisons, so it should work.

mtrbean · 2016-03-25T22:37:34Z

Thanks! Good to know that this is coming soon. You are right that trailing space is ignored when doing comparisons with char columns.

Does #4842 support other column types as well?

g-r-u · 2017-07-02T03:13:14Z

I'm not sure where the fix for this issue is on the roadmap, but I can say that this is the one deal-breaker feature missing from Presto for my company. I think it might be useful to allow the user to designate full query pushdown to a JDBC connector like the following:

SELECT a.customer_id,
       a.customer_name,
       b.sales_count,
       b.sales_total
FROM hive.web.customers a
   INNER JOIN (
      SELECT customer_id,
             COUNT(sale_id) AS sales_count,
             SUM(sale_amount) AS sales_total
      FROM postgresql.web.sales
      WHERE sale_status = 'C'
      GROUP BY customer_id) AS b
    ON a.customer_id = b.customer_id
WHERE a.customer_id >= 256
ORDER BY b.sales_total DESC;

In this case, the entire sub-query would be pushed down as-is to the connector, and only the requested results would be returned for the Join. You could even require some kind of explicit annotation or query hint to the Join clause such as "INNER JOIN PUSHDOWN (sub-query)". This might also help with resolving #4839 in that the aggregation happens entirely on the target database before being shipped to Presto for joining.

Correct me if I'm wrong, but I think this is currently not the case, and that Presto usually just sends the connector a query like the following in Join scenario like the one above: "SELECT * FROM postgresql.web.sales". Apache Drill commits this same cardinal sin, and this is not feasible when dealing with large data sets that can't easily be moved over the wire.

Edit: Cleaned up the sample query.

electrum · 2017-07-02T06:48:23Z

Have you tried creating a view and selecting from it? That should allow a complex query to be executed in the remote database (assuming a view satisfies your requirements).

antonioromero-pm · 2017-10-07T07:55:25Z

interesting -- Drill has this same weakness? I thought they had done a better job.

I can confirm that the view as workaround is successful. But I too expected something a little more robust from pushdown for Presto. Don't want to set a team to go build that myself :/ but may have to.

guangyy · 2017-11-29T07:29:44Z

Hi @electrum @aromero-pm ,
Could you elaborate how creating views can help? My understanding of view is that it creates a shortcut of subquery, and by the time of execution, it replace the view with its real subquery. If that is the case, how does the push down works? Thanks!

electrum · 2017-11-29T08:07:04Z

The view needs to be created in the remote database, allowing that database to execute the query (such as an aggregation) rather than executing it in Presto. A Presto view wouldn’t help as those are identical to writing a subquery.

guangyy · 2017-11-29T23:32:26Z

Thanks @electrum , that makes sense.

g-r-u · 2017-12-22T18:01:24Z

I think creating a view on the target database would be a good workaround for some use cases, but what I'm really after as a long-term enhancement would be a sort of "pass-through" SQL feature where you can use the target DB's native SQL syntax and have it sent as-is to the target. Presto would then retrieve the result set and pass it on to the next step in the execution plan. This would be a very powerful feature for large enterprises where you sometimes only have a SELECT-only user on some Oracle database that has the data you need, but you can't just have a view created on it. It also helps Presto leverage the target DB's optimizer to speed up its own execution as opposed to just "SELECT *" queries hitting a 100GB Oracle table.

antonioromero-pm · 2017-12-22T19:17:52Z

This kind of pass-through feature would be terrific, actually. I used to work on a data federation/virtualization product (Composite Information Server, which was acquired by Cisco a few years ago, then just recently sold off to Tibco) and we had a feature like this that was a terrific get-out-of-jail-free card for exactly the circumstance you describe. We had great pushdown optimization that could figure out pretty good SQL for foreign databases, but there were always those edge cases you needed a way to handle, and you didn't necessarily have the option of creating a view on the foreign system. We called them "packaged queries." You described the schema you expected back, wrote the foreign SQL expression, and you were off to the races. If Presto had that, it would provide some relief.

…

-Antonio

On Fri, Dec 22, 2017 at 10:01 AM, g-r-u ***@***.***> wrote: I think creating a view on the target database would be a good workaround for some use cases, but what I'm really after as a long-term enhancement would be a sort of "pass-through" SQL feature where you can use the target DB's native SQL syntax and have it sent as-is to the target. Presto would then retrieve the result set and pass it on to the next step in the execution plan. This would be a very powerful feature for large enterprises where you sometimes only have a SELECT-only user on some Oracle database that has the data you need, but you can't just have a view created on it. It also helps Presto leverage the target DB's optimizer to speed up its own execution as opposed to just "SELECT *" queries hitting a 100GB Oracle table. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#4874 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AdSK_sXSFD6ownPE1q96VnecFYoMujCcks5tC-6KgaJpZM4H4pel> .

findepi · 2018-03-29T20:43:42Z

From reading com.facebook.presto.plugin.jdbc.QueryBuilder it appears that predicate pushdown only works for BIGINT, DOUBLE and BOOLEAN column types. Is there any reason why other types e.g. varchar (string) are not supported?

Can I assume that this issue is about pushdown for other types that those mentioned as already supported? If so, then it can be closed (quite some time ago).
Please create specific issues for any other requirements/expectations towards pushdown (or find, they probably already exist)

findepi closed this as completed Mar 29, 2018

tooptoop4 mentioned this issue Apr 1, 2020

Allow connectors to participate in query optimization trinodb/trino#18

Open

26 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Predicate Pushdown for JDBC connector is too limited #4874

Predicate Pushdown for JDBC connector is too limited #4874

mtrbean commented Mar 25, 2016

dain commented Mar 25, 2016

mtrbean commented Mar 25, 2016

electrum commented Mar 25, 2016

electrum commented Mar 25, 2016

mtrbean commented Mar 25, 2016

g-r-u commented Jul 2, 2017 •

edited

Loading

electrum commented Jul 2, 2017 via email

antonioromero-pm commented Oct 7, 2017

guangyy commented Nov 29, 2017

electrum commented Nov 29, 2017 via email

guangyy commented Nov 29, 2017

g-r-u commented Dec 22, 2017

antonioromero-pm commented Dec 22, 2017 via email

findepi commented Mar 29, 2018

Predicate Pushdown for JDBC connector is too limited #4874

Predicate Pushdown for JDBC connector is too limited #4874

Comments

mtrbean commented Mar 25, 2016

dain commented Mar 25, 2016

mtrbean commented Mar 25, 2016

electrum commented Mar 25, 2016

electrum commented Mar 25, 2016

mtrbean commented Mar 25, 2016

g-r-u commented Jul 2, 2017 • edited Loading

electrum commented Jul 2, 2017 via email

antonioromero-pm commented Oct 7, 2017

guangyy commented Nov 29, 2017

electrum commented Nov 29, 2017 via email

guangyy commented Nov 29, 2017

g-r-u commented Dec 22, 2017

antonioromero-pm commented Dec 22, 2017 via email

findepi commented Mar 29, 2018

g-r-u commented Jul 2, 2017 •

edited

Loading