Non equality predicates in outer join (v3) #5088
Branch updated from 2240993 to 8fa487a
Have you benchmarked this? Without compiling the join expressions, this may be a big performance hit for queries that push the filter into the join.
No, I did not. I expect performance will not be great, though the penalty applies only to queries that could not execute at all before this change.
@martint, friendly ping. Please take a look at the interpreted version so we can proceed with the changes to the class generators.
@losipiuk, the general approach is fine. I'm a bit concerned about potential performance problems. For instance, in InMemoryJoinHash.getNextJoinPositionFrom (https://github.com/prestodb/presto/pull/5088/files#diff-cc58d9f8668f9f278dae5330c9e4d7a6R137), if that code path is hit both by executions that have a filter function and by ones that don't, branch prediction may not work well, which could hurt performance for regular joins.
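To make the branch-prediction concern concrete, here is a minimal Java sketch of walking a chain of build-side positions with an optional filter. The class `FilteredPositionChain`, the `positionLinks` array and the `IntPredicate` filter are hypothetical simplifications, not Presto's actual `InMemoryJoinHash`; the point is only that the `filterFunction == null` check sits inside the per-candidate loop.

```java
import java.util.function.IntPredicate;

// Hypothetical, simplified model of a hash-join lookup; NOT Presto's InMemoryJoinHash.
public class FilteredPositionChain {
    // positionLinks[p] is the next build-side position in the same hash chain, or -1 at the end.
    private final int[] positionLinks;
    // Optional non-equi join filter on a build position; null for a plain equi-join.
    private final IntPredicate filterFunction;

    FilteredPositionChain(int[] positionLinks, IntPredicate filterFunction) {
        this.positionLinks = positionLinks;
        this.filterFunction = filterFunction;
    }

    // Returns the next position after currentPosition that satisfies the filter, or -1 if none does.
    int getNextJoinPositionFrom(int currentPosition) {
        int position = positionLinks[currentPosition];
        while (position != -1) {
            // This check runs for every candidate position. If the same compiled code
            // serves both filtered and unfiltered joins, its branch profile mixes both cases.
            if (filterFunction == null || filterFunction.test(position)) {
                return position;
            }
            position = positionLinks[position];
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] links = {1, 2, 3, -1}; // a single chain: 0 -> 1 -> 2 -> 3
        FilteredPositionChain plain = new FilteredPositionChain(links, null);
        FilteredPositionChain evenOnly = new FilteredPositionChain(links, p -> p % 2 == 0);
        System.out.println(plain.getNextJoinPositionFrom(0));    // 1
        System.out.println(evenOnly.getNextJoinPositionFrom(0)); // 2
        System.out.println(evenOnly.getNextJoinPositionFrom(2)); // -1
    }
}
```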
@martint, it would be nice to benchmark that, but it seems hard to do without actually running a query set on a real cluster.
Branch updated from 8fa487a to 6460857
@martint, I separated the implementations (commit 195a56a). This patch uses inheritance.
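A hypothetical sketch of what splitting the implementations by inheritance can look like: the base class walks the chain without any filter check, and a subclass overrides the walk to apply the filter. The class and method names are illustrative assumptions, not the code in commit 195a56a.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch of separate unfiltered/filtered lookups; not Presto code.
public class SplitLookupSources {
    static class JoinLookup {
        // positionLinks[p] is the next position in the same hash chain, or -1 at the end.
        protected final int[] positionLinks;

        JoinLookup(int[] positionLinks) {
            this.positionLinks = positionLinks;
        }

        // Unfiltered version: every chained position is a match, so there is no filter branch.
        int getNextJoinPositionFrom(int currentPosition) {
            return positionLinks[currentPosition];
        }
    }

    static class FilteredJoinLookup extends JoinLookup {
        private final IntPredicate filterFunction;

        FilteredJoinLookup(int[] positionLinks, IntPredicate filterFunction) {
            super(positionLinks);
            this.filterFunction = filterFunction;
        }

        // Filtered version: skip chained positions that fail the non-equi filter.
        @Override
        int getNextJoinPositionFrom(int currentPosition) {
            int position = positionLinks[currentPosition];
            while (position != -1 && !filterFunction.test(position)) {
                position = positionLinks[position];
            }
            return position;
        }
    }

    public static void main(String[] args) {
        int[] links = {1, 2, 3, -1}; // a single chain: 0 -> 1 -> 2 -> 3
        JoinLookup plain = new JoinLookup(links);
        JoinLookup filtered = new FilteredJoinLookup(links, p -> p % 2 == 0);
        System.out.println(plain.getNextJoinPositionFrom(0));    // 1
        System.out.println(filtered.getNextJoinPositionFrom(0)); // 2
    }
}
```

One caveat of this design: a call site typed as the base class sees two receiver types when both subclasses are in use, so the dispatch itself can become polymorphic.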
Add support for non-equi conjuncts in the join condition of outer joins. The supported case is a conjunct that can be resolved as a whole using only symbols of the inner join table. Such a conjunct is pushed down as a filter to the inner source relation.
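The pushdown rule relies on a property of outer joins: in `outer LEFT JOIN inner ON outer.key = inner.key AND p(inner)`, a conjunct `p` that references only inner-side columns just removes inner rows from consideration, so filtering the inner relation first yields the same result. The Java sketch below checks that equivalence on toy data with a naive nested-loop join; `Row`, `leftJoin` and `leftJoinWithPushdown` are hypothetical names, not Presto APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration of LEFT JOIN semantics; not Presto code.
public class OuterJoinPushdown {
    // A row with a join key and one payload column.
    static final class Row {
        final int key;
        final int value;

        Row(int key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // outer LEFT JOIN inner ON outer.key = inner.key AND innerOnly(inner),
    // evaluated with nested loops; unmatched outer rows are padded with null.
    static List<String> leftJoin(List<Row> outer, List<Row> inner, Predicate<Row> innerOnly) {
        List<String> result = new ArrayList<>();
        for (Row o : outer) {
            boolean matched = false;
            for (Row i : inner) {
                if (o.key == i.key && innerOnly.test(i)) {
                    result.add(o.value + ":" + i.value);
                    matched = true;
                }
            }
            if (!matched) {
                result.add(o.value + ":null");
            }
        }
        return result;
    }

    // Same join with the inner-only conjunct pushed down as a filter on the inner relation.
    static List<String> leftJoinWithPushdown(List<Row> outer, List<Row> inner, Predicate<Row> innerOnly) {
        List<Row> filteredInner = inner.stream().filter(innerOnly).collect(Collectors.toList());
        return leftJoin(outer, filteredInner, row -> true);
    }

    public static void main(String[] args) {
        List<Row> outer = List.of(new Row(1, 10), new Row(2, 20));
        List<Row> inner = List.of(new Row(1, 100), new Row(1, 5), new Row(2, 3));
        Predicate<Row> valueAbove50 = r -> r.value > 50;
        System.out.println(leftJoin(outer, inner, valueAbove50));             // [10:100, 20:null]
        System.out.println(leftJoinWithPushdown(outer, inner, valueAbove50)); // [10:100, 20:null]
    }
}
```

Placing the same predicate in a WHERE clause instead would drop the unmatched row `20:null`, which is why the rule applies to conjuncts of the ON clause.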
The previous message was misleading because deciding whether a conjunct is supported is more complex than checking whether it is an equality comparison. There are cases where specific equality conjuncts are not supported and cases where other types of conjuncts are supported.
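One way to read the classification rules described in these commit messages, as a hypothetical Java sketch: a conjunct that uses only inner-side symbols is pushed down regardless of its type, an equality whose operands come from opposite sides becomes an equi-join clause, and everything else is unsupported. Symbol names such as `a.k`/`b.k`, the `Conjunct` shape and the `Handling` values are assumptions for illustration, not Presto's planner API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical classification rules reconstructed from the commit messages; not Presto's planner code.
public class ConjunctClassifier {
    enum Kind { EQUALITY, OTHER }

    enum Handling { EQUI_JOIN_CLAUSE, PUSH_DOWN_TO_INNER, UNSUPPORTED }

    static final class Conjunct {
        final Kind kind;
        final Set<String> leftOperand;  // symbols of the left operand (all symbols for OTHER)
        final Set<String> rightOperand; // symbols of the right operand (empty for OTHER)

        Conjunct(Kind kind, Set<String> leftOperand, Set<String> rightOperand) {
            this.kind = kind;
            this.leftOperand = leftOperand;
            this.rightOperand = rightOperand;
        }
    }

    static Handling classify(Conjunct c, Set<String> outerSymbols, Set<String> innerSymbols) {
        Set<String> all = new HashSet<>(c.leftOperand);
        all.addAll(c.rightOperand);
        // Any conjunct, equality or not, that uses only inner symbols can be pushed down.
        if (innerSymbols.containsAll(all)) {
            return Handling.PUSH_DOWN_TO_INNER;
        }
        // An equality is a join clause only if one operand uses only outer symbols
        // and the other uses only inner symbols.
        if (c.kind == Kind.EQUALITY && !c.leftOperand.isEmpty() && !c.rightOperand.isEmpty()) {
            boolean outerEqualsInner = outerSymbols.containsAll(c.leftOperand) && innerSymbols.containsAll(c.rightOperand);
            boolean innerEqualsOuter = innerSymbols.containsAll(c.leftOperand) && outerSymbols.containsAll(c.rightOperand);
            if (outerEqualsInner || innerEqualsOuter) {
                return Handling.EQUI_JOIN_CLAUSE;
            }
        }
        return Handling.UNSUPPORTED;
    }

    public static void main(String[] args) {
        Set<String> outer = Set.of("a.k", "a.x");
        Set<String> inner = Set.of("b.k", "b.x", "b.y");
        // a.k = b.k
        System.out.println(classify(new Conjunct(Kind.EQUALITY, Set.of("a.k"), Set.of("b.k")), outer, inner)); // EQUI_JOIN_CLAUSE
        // b.x > 5
        System.out.println(classify(new Conjunct(Kind.OTHER, Set.of("b.x"), Set.of()), outer, inner)); // PUSH_DOWN_TO_INNER
        // b.x = b.y
        System.out.println(classify(new Conjunct(Kind.EQUALITY, Set.of("b.x"), Set.of("b.y")), outer, inner)); // PUSH_DOWN_TO_INNER
        // a.x + b.y = b.x
        System.out.println(classify(new Conjunct(Kind.EQUALITY, Set.of("a.x", "b.y"), Set.of("b.x")), outer, inner)); // UNSUPPORTED
        // a.x < b.x
        System.out.println(classify(new Conjunct(Kind.OTHER, Set.of("a.x", "b.x"), Set.of()), outer, inner)); // UNSUPPORTED
    }
}
```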
This patch adds more generic support for arbitrary conjuncts in the join condition of OUTER joins. Limitations:
- no support in the generated ProbeSource/PagesHashStrategy
- at least one conjunct must be equality-based
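With this more generic support, the remaining conjuncts are evaluated as a filter function on each candidate pair found through the hash lookup. A minimal interpreted Java sketch, assuming hypothetical `Row` and `leftOuterHashJoin` names (not Presto's LookupSource or probe code), shows why at least one equality conjunct is required (it provides the hash key) and how outer semantics are kept: a probe row whose candidates all fail the filter is still emitted with nulls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical interpreted sketch of an outer hash join with a residual filter; not Presto code.
public class OuterHashJoinWithFilter {
    static final class Row {
        final int key;   // equality join key
        final int value; // payload column referenced by the filter

        Row(int key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // probe LEFT JOIN build ON probe.key = build.key AND filterFunction(probe, build)
    static List<String> leftOuterHashJoin(List<Row> probe, List<Row> build, BiPredicate<Row, Row> filterFunction) {
        // Hash the build side on the equality key; without an equality conjunct there is nothing to hash on.
        Map<Integer, List<Row>> hashTable = new HashMap<>();
        for (Row b : build) {
            hashTable.computeIfAbsent(b.key, k -> new ArrayList<>()).add(b);
        }
        List<String> result = new ArrayList<>();
        for (Row p : probe) {
            boolean matched = false;
            // Candidates share the key; the filter may reference columns from both sides.
            for (Row b : hashTable.getOrDefault(p.key, Collections.emptyList())) {
                if (filterFunction.test(p, b)) {
                    result.add(p.value + ":" + b.value);
                    matched = true;
                }
            }
            // Outer semantics: a probe row with no candidate passing the filter is emitted null-padded.
            if (!matched) {
                result.add(p.value + ":null");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> probe = List.of(new Row(1, 10), new Row(2, 20), new Row(3, 30));
        List<Row> build = List.of(new Row(1, 5), new Row(1, 15), new Row(2, 25));
        System.out.println(leftOuterHashJoin(probe, build, (p, b) -> p.value < b.value)); // [10:15, 20:25, 30:null]
        System.out.println(leftOuterHashJoin(probe, build, (p, b) -> p.value > b.value)); // [10:5, 20:null, 30:null]
    }
}
```

Unlike the inner-only pushdown from the first commit, the filter here may compare columns from both sides, such as `probe.value < build.value`.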
Branch updated from 6460857 to 1a57590
I rebased this onto master.
@losipiuk, what parts changed compared to the last version? Are the changes mostly in "Separate filtering/non-filtering LookupSource implementations"?
Can you remove the hotspot_pidXXXX.log files from this commit: edfdfa7? GitHub can't handle them and refuses to show the diff.
Branch updated from 1a57590 to 34675f9
@martint, apart from simple conflict resolutions, the changes are in the "Separate filtering/non-filtering LookupSource implementations" commit. It addresses your concern about branching within [...]. I also removed the hotspot logs. Sorry about that :)
Looking at this a bit closer, we may not actually need to split the two implementations. In JoinCompiler, we load the InMemoryJoinHash in an isolated classloader for each query, so profile pollution shouldn't happen. In any case, if we weren't doing that, this implementation wouldn't help much: the "loop" is not part of the LookupSource, so the call would still be virtual once there are multiple implementations of that interface. Sorry about the churn.
@martint, thanks for the clarification. Please let me know if I am getting this right. Was [...]? Or is the latter problem solved with some other classloader mangling? Or [...]? Sorry for the possibly dumb questions, but I wanted to get a better understanding of this.
Branch updated from 34675f9 to 6d274df
I removed the commit splitting the implementations. Is this good to merge?
Yes, the IsolatedClass mechanism exists specifically to avoid virtual calls and profile pollution. It's a simple way to generate a class by copying the bytecode of an existing class.
It doesn't matter. The entire join operator is generated in the same way, so the loops should be specialized for each query.
Cool, thanks :)
BTW @martint, we discovered a bug in Presto (I think I introduced it, but I did not check the git log ;) ). Queries like this one:
should be failing.
Merged, thanks!
Is there any plan to improve support for non-equality predicates in outer joins? I tested it and the performance is actually worse than with equi-joins. I simply put a
Also, even though the right side of the JOIN has only one row, the JOIN has a big performance impact on the query:
Thanks, looking forward to it.
@losipiuk, thanks, I'll look into it. I hope it can be merged soon.
Supersedes #4950