Predicate Pushdown for JDBC connector is too limited #4874
IIRC, MySQL doesn't support proper character encodings, so you get different results depending on whether Presto or MySQL executes the filter. I don't know if the same is true for Postgres, so it might be possible to enable it for just that. Also, there may be some workaround for the MySQL issues.
Not sure exactly what that means, but MySQL does support Unicode character encoding, for one thing, so this would be a very big win for people who have set up their database with a sane/correct encoding. We have several big tables in MySQL, and it's very painful that Presto has to do a table scan in most queries. How about filtering on date/timestamp columns?
We're adding support for those columns in #4842.
We will need to test the character encoding stuff to see how it works. For example, if MySQL is using a non-Unicode charset like latin1 and you compare against a Unicode character, what does the MySQL JDBC driver do when calling MySQL? MySQL also does case-insensitive comparisons by default, but that just means it will return extra rows that the Presto engine will filter out. The only issue is if there are cases where we push down a filter that incorrectly filters out rows which should be returned. If you have ideas on how this could happen, please share them. One potential issue is trailing whitespace, such as with
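The rule above (the remote side may return extra rows, but must never drop rows Presto would keep) can be checked with a small simulation. This is a hedged sketch: `remoteCompare` is a hypothetical stand-in for a collation that ignores case and trailing spaces (roughly how older MySQL `_ci` collations with PAD SPACE behave), not MySQL's actual algorithm, and `prestoCompare` assumes exact, binary string comparison on the Presto side.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PushdownSafety {
    // Hypothetical remote collation: ignores case and trailing spaces.
    // A simplified model, not MySQL's real comparison algorithm.
    static int remoteCompare(String a, String b) {
        return trimTrailingSpaces(a).compareToIgnoreCase(trimTrailingSpaces(b));
    }

    static String trimTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    // Assumed Presto-side semantics: exact, binary comparison.
    static int prestoCompare(String a, String b) {
        return a.compareTo(b);
    }

    // Filter pushed down to the remote database, then re-applied by Presto.
    static List<String> withPushdown(List<String> rows, Predicate<String> remote, Predicate<String> presto) {
        return rows.stream().filter(remote).filter(presto).collect(Collectors.toList());
    }

    // Filter evaluated only by Presto after a full table scan.
    static List<String> withoutPushdown(List<String> rows, Predicate<String> presto) {
        return rows.stream().filter(presto).collect(Collectors.toList());
    }

    // Pushdown is safe for this data only if both plans return the same rows.
    static boolean safe(List<String> rows, Predicate<String> remote, Predicate<String> presto) {
        return withPushdown(rows, remote, presto).equals(withoutPushdown(rows, presto));
    }

    public static void main(String[] args) {
        List<String> rows = List.of("abc", "ABC", "abc ", "ab");

        // Equality: the remote side also returns "ABC" and "abc ", which Presto
        // then filters out, so the final result is unchanged.
        boolean eqSafe = safe(rows,
                r -> remoteCompare(r, "abc") == 0,
                r -> prestoCompare(r, "abc") == 0);

        // Less-than: the remote side drops "abc" (trailing space ignored) and
        // "ABC" (case ignored), both of which Presto considers < "abc ".
        boolean ltSafe = safe(rows,
                r -> remoteCompare(r, "abc ") < 0,
                r -> prestoCompare(r, "abc ") < 0);

        System.out.println("equality safe: " + eqSafe);
        System.out.println("less-than safe: " + ltSafe);
    }
}
```

Running `main` prints `equality safe: true` and `less-than safe: false`. Under this model, a looser equality only adds rows that Presto discards, while a looser ordering can silently drop rows, which suggests range predicates on string columns are the riskier case to push down.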
Thanks! Good to know that this is coming soon. You are right that trailing space is ignored when doing comparisons with Does #4842 support other column types as well?
I'm not sure where the fix for this issue is on the roadmap, but I can say that this is the one deal-breaker feature missing from Presto for my company. I think it might be useful to allow the user to designate full query pushdown to a JDBC connector like the following:
In this case, the entire sub-query would be pushed down as-is to the connector, and only the requested results would be returned for the join. You could even require some kind of explicit annotation or query hint on the join clause, such as "INNER JOIN PUSHDOWN (sub-query)". This might also help with resolving #4839, in that the aggregation would happen entirely on the target database before being shipped to Presto for joining. Correct me if I'm wrong, but I think this is currently not the case, and that in a join scenario like the one above Presto usually just sends the connector a query like "SELECT * FROM postgresql.web.sales". Apache Drill commits this same cardinal sin, and this is not feasible when dealing with large data sets that can't easily be moved over the wire. Edit: Cleaned up the sample query.
Have you tried creating a view and selecting from it? That should allow a complex query to be executed in the remote database (assuming a view satisfies your requirements).
Interesting -- Drill has this same weakness? I thought they had done a better job. I can confirm that the view workaround is successful. But I too expected something a little more robust from pushdown in Presto. I don't want to set a team to go build that myself :/ but may have to.
Hi @electrum @aromero-pm,
The view needs to be created in the remote database, allowing that database to execute the query (such as an aggregation) rather than executing it in Presto.
A Presto view wouldn’t help as those are identical to writing a subquery.
Thanks @electrum, that makes sense.
I think creating a view on the target database would be a good workaround for some use cases, but what I'm really after as a long-term enhancement would be a sort of "pass-through" SQL feature where you can use the target DB's native SQL syntax and have it sent as-is to the target. Presto would then retrieve the result set and pass it on to the next step in the execution plan. This would be a very powerful feature for large enterprises where you sometimes only have a SELECT-only user on some Oracle database that has the data you need, but you can't just have a view created on it. It also helps Presto leverage the target DB's optimizer to speed up its own execution, as opposed to just "SELECT *" queries hitting a 100GB Oracle table.
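One way such a pass-through could work is to treat the native SQL as an opaque derived table: the remote optimizer plans and runs the whole statement, and Presto only wraps a projection (and, optionally, further pushed-down filters) around it. The sketch below assumes that approach; `PassThroughSketch.wrap` is a hypothetical helper, not an existing Presto API, and the `web.sales`, `region` and `amount` names are made up for illustration. It uses ANSI double-quoted identifiers, which some dialects (e.g. MySQL without `ANSI_QUOTES`) do not accept, and which are case-sensitive in databases such as Oracle.

```java
import java.util.List;
import java.util.StringJoiner;

public class PassThroughSketch {
    // Hypothetical helper: wrap user-supplied native SQL as a derived table so the
    // remote database plans and executes it, while the caller selects only the
    // columns it needs and may append extra filters with JDBC '?' placeholders.
    static String wrap(String nativeSql, List<String> columns, String extraWhere) {
        String inner = nativeSql.strip();
        // A trailing ';' is not allowed inside a subquery.
        if (inner.endsWith(";")) {
            inner = inner.substring(0, inner.length() - 1).strip();
        }
        StringJoiner select = new StringJoiner(", ");
        for (String column : columns) {
            // ANSI-style quoting; embedded double quotes are doubled.
            select.add("\"" + column.replace("\"", "\"\"") + "\"");
        }
        StringBuilder sql = new StringBuilder()
                .append("SELECT ").append(select)
                .append(" FROM (").append(inner).append(") t");
        if (extraWhere != null && !extraWhere.isBlank()) {
            sql.append(" WHERE ").append(extraWhere);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        String sql = wrap(
                "SELECT region, SUM(amount) AS total FROM web.sales GROUP BY region;",
                List.of("region", "total"),
                "\"total\" > ?");
        System.out.println(sql);
    }
}
```

This prints `SELECT "region", "total" FROM (SELECT region, SUM(amount) AS total FROM web.sales GROUP BY region) t WHERE "total" > ?`, so the aggregation runs remotely and only one row per region crosses the wire. Wrapping rather than forwarding the text verbatim keeps the outer query under Presto's control, at the cost of requiring the native SQL to be valid as a subquery.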
This kind of pass-through feature would be terrific, actually. I used to work on a data federation/virtualization product (Composite Information Server, which was acquired by Cisco a few years ago, then just recently sold off to Tibco) and we had a feature like this that was a terrific get-out-of-jail-free card for exactly the circumstance you describe. We had great pushdown optimization that could figure out pretty good SQL for foreign databases, but there were always those edge cases you needed a way to handle, and you didn't necessarily have the option of creating a view on the foreign system. We called them "packaged queries." You described the schema you expected back, wrote the foreign SQL expression, and you were off to the races. If Presto had that, it would provide some relief.
-Antonio
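The "packaged query" idea (declare the expected schema, supply the foreign SQL) could be modeled roughly as below. This is a hypothetical sketch, not a Presto or Composite API: `PackagedQuery` and `Column` are made-up names, and the check assumes the actual column labels and `java.sql.Types` codes would come from the driver, e.g. via `PreparedStatement.getMetaData()`, which not every driver can answer before the statement is executed.

```java
import java.sql.Types;
import java.util.List;
import java.util.Optional;

public class PackagedQuerySketch {
    // Hypothetical column descriptor: a label plus a java.sql.Types code.
    record Column(String name, int jdbcType) {}

    // Hypothetical packaged query: opaque foreign SQL plus the schema the user
    // declares for it, so the planner knows the output without parsing the SQL.
    record PackagedQuery(String foreignSql, List<Column> declared) {

        // Compares the declared schema with what the driver reports. In a real
        // connector, `actual` might be built from ResultSetMetaData
        // (getColumnLabel / getColumnType); here it is passed in directly.
        Optional<String> mismatch(List<Column> actual) {
            if (actual.size() != declared.size()) {
                return Optional.of("expected " + declared.size()
                        + " columns, got " + actual.size());
            }
            for (int i = 0; i < declared.size(); i++) {
                Column want = declared.get(i);
                Column got = actual.get(i);
                // Labels compared case-insensitively, since many databases fold case.
                if (!want.name().equalsIgnoreCase(got.name()) || want.jdbcType() != got.jdbcType()) {
                    return Optional.of("column " + (i + 1) + ": expected " + want + ", got " + got);
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        PackagedQuery q = new PackagedQuery(
                "SELECT region, SUM(amount) AS total FROM web.sales GROUP BY region",
                List.of(new Column("region", Types.VARCHAR), new Column("total", Types.NUMERIC)));

        // Simulated driver metadata with upper-case labels, as a database that
        // folds unquoted identifiers to upper case might report them.
        List<Column> reported = List.of(new Column("REGION", Types.VARCHAR), new Column("TOTAL", Types.NUMERIC));
        System.out.println(q.mismatch(reported).orElse("schema matches"));
    }
}
```

Declaring the schema up front is what lets the engine plan around SQL it cannot parse; the runtime check then catches a foreign query that drifts from its declaration instead of producing silently mis-typed rows.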
(In reply to g-r-u's comment of Fri, Dec 22, 2017 at 10:01 AM.)
Can I assume that this issue is about pushdown for types other than those mentioned as already supported? If so, then it could have been closed quite some time ago.
From reading com.facebook.presto.plugin.jdbc.QueryBuilder, it appears that predicate pushdown only works for BIGINT, DOUBLE and BOOLEAN column types. Is there any reason why other types, e.g. VARCHAR (string), are not supported?