Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Predicate Pushdown for JDBC connector is too limited #4874

Closed
mtrbean opened this issue Mar 25, 2016 · 14 comments
Closed

Predicate Pushdown for JDBC connector is too limited #4874

mtrbean opened this issue Mar 25, 2016 · 14 comments

Comments

@mtrbean
Copy link

mtrbean commented Mar 25, 2016

From reading com.facebook.presto.plugin.jdbc.QueryBuilder it appears that predicate pushdown only works for BIGINT, DOUBLE and BOOLEAN column types. Is there any reason why other types e.g. varchar (string) are not supported?

@dain
Copy link
Contributor

dain commented Mar 25, 2016

IIRC, MySQL doesn't support proper character encodings so you get different results depending on if Presto or MySQL executes the filter. I don't know if the same is true for Postgres, so it is might be possible to enable it for just that. Also, there may be some work around for the MySQL issues.

@mtrbean
Copy link
Author

mtrbean commented Mar 25, 2016

Not sure exactly what it means, but MySQL does support unicode character encoding for one thing so it will be a very big win for people who have set their database up with sane / correct encoding. We have several big tables in MySQL and it's very painful that presto has to to table scan in most queries.

How about filtering on date / timestamp column?

@electrum
Copy link
Contributor

We're adding support for those columns in #4842.

@electrum
Copy link
Contributor

We will need to test the character encoding stuff to see how it works. For example, if MySQL is using a non-Unicode charset like latin1, and you compare against a Unicode character, what does the MySQL JDBC driver when calling setString() with that Unicode value? As long as the predicate returns false, that's correct behavior, since it would be impossible for the MySQL column to hold that value.

MySQL also does case insensitive comparisons by default, but that just means they will return extra rows that the Presto engine will filter out.

The only issue is if there cases where we push down a filter that incorrectly filters out rows which should be returned. If you have ideas on how this could happen, please share them.

One potential issue is trailing whitespace, such as with char columns, since those are space padded to the declared length. However, if I remember correctly, trailing space is ignored for comparisons, so it should work.

@mtrbean
Copy link
Author

mtrbean commented Mar 25, 2016

Thanks! Good to know that this is coming soon. You are right that trailing space is ignored when doing comparisons with char columns.

Does #4842 support other column types as well?

@g-r-u
Copy link

g-r-u commented Jul 2, 2017

I'm not sure where the fix for this issue is on the roadmap, but I can say that this is the one deal-breaker feature missing from Presto for my company. I think it might be useful to allow the user to designate full query pushdown to a JDBC connector like the following:

SELECT a.customer_id,
       a.customer_name,
       b.sales_count,
       b.sales_total
FROM hive.web.customers a
   INNER JOIN (
      SELECT customer_id,
             COUNT(sale_id) AS sales_count,
             SUM(sale_amount) AS sales_total
      FROM postgresql.web.sales
      WHERE sale_status = 'C'
      GROUP BY customer_id) AS b
    ON a.customer_id = b.customer_id
WHERE a.customer_id >= 256
ORDER BY b.sales_total DESC;

In this case, the entire sub-query would be pushed down as-is to the connector, and only the requested results would be returned for the Join. You could even require some kind of explicit annotation or query hint to the Join clause such as "INNER JOIN PUSHDOWN (sub-query)". This might also help with resolving #4839 in that the aggregation happens entirely on the target database before being shipped to Presto for joining.

Correct me if I'm wrong, but I think this is currently not the case, and that Presto usually just sends the connector a query like the following in Join scenario like the one above: "SELECT * FROM postgresql.web.sales". Apache Drill commits this same cardinal sin, and this is not feasible when dealing with large data sets that can't easily be moved over the wire.

Edit: Cleaned up the sample query.

@electrum
Copy link
Contributor

electrum commented Jul 2, 2017 via email

@antonioromero-pm
Copy link

interesting -- Drill has this same weakness? I thought they had done a better job.

I can confirm that the view as workaround is successful. But I too expected something a little more robust from pushdown for Presto. Don't want to set a team to go build that myself :/ but may have to.

@guangyy
Copy link

guangyy commented Nov 29, 2017

Hi @electrum @aromero-pm ,
Could you elaborate how creating views can help? My understanding of view is that it creates a shortcut of subquery, and by the time of execution, it replace the view with its real subquery. If that is the case, how does the push down works? Thanks!

@electrum
Copy link
Contributor

electrum commented Nov 29, 2017 via email

@guangyy
Copy link

guangyy commented Nov 29, 2017

Thanks @electrum , that makes sense.

@g-r-u
Copy link

g-r-u commented Dec 22, 2017

I think creating a view on the target database would be a good workaround for some use cases, but what I'm really after as a long-term enhancement would be a sort of "pass-through" SQL feature where you can use the target DB's native SQL syntax and have it sent as-is to the target. Presto would then retrieve the result set and pass it on to the next step in the execution plan. This would be a very powerful feature for large enterprises where you sometimes only have a SELECT-only user on some Oracle database that has the data you need, but you can't just have a view created on it. It also helps Presto leverage the target DB's optimizer to speed up its own execution as opposed to just "SELECT *" queries hitting a 100GB Oracle table.

@antonioromero-pm
Copy link

antonioromero-pm commented Dec 22, 2017 via email

@findepi
Copy link
Contributor

findepi commented Mar 29, 2018

From reading com.facebook.presto.plugin.jdbc.QueryBuilder it appears that predicate pushdown only works for BIGINT, DOUBLE and BOOLEAN column types. Is there any reason why other types e.g. varchar (string) are not supported?

Can I assume that this issue is about pushdown for other types that those mentioned as already supported? If so, then it can be closed (quite some time ago).
Please create specific issues for any other requirements/expectations towards pushdown (or find, they probably already exist)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants