Use code generation for joins with filter function #5810
@erichwang, can you review?
constructor.comment("check if join filter function is passed only if field is generated");
if (joinFilterFunctionFieldOptional.isPresent()) {
    constructor.append(invokeStatic(Preconditions.class, "checkArgument", void.class, ImmutableList.of(boolean.class, Object.class),
It seems weird that we are adding checkArgument calls to the compiled code, since we already dictate exactly how this is used.
Yeah, somewhat.
We generate the constructor, but the call to it happens elsewhere. If this check is not present, the code will fail when the generated PagesHashStrategy.applyFilterFunction is called.
I can drop it if you like. Please confirm either way.
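To make the trade-off above concrete, here is a hand-written Java analogue of what the generated constructor check does. It is a sketch only: `SketchPagesHashStrategy`, the `hasFilterFunctionField` flag, and the two-argument `JoinFilterFunction` are hypothetical simplifications, not the actual generated class or Presto's interface. It shows where a missing function would surface with and without the check.

```java
import java.util.Optional;

// Hypothetical, simplified stand-in for the join filter function type.
interface JoinFilterFunction {
    boolean filter(int leftPosition, int rightPosition);
}

// Hand-written analogue of the generated PagesHashStrategy (names are illustrative).
final class SketchPagesHashStrategy {
    private final boolean hasFilterFunctionField;    // did code generation emit the field?
    private final JoinFilterFunction filterFunction; // null when no function was supplied

    SketchPagesHashStrategy(boolean hasFilterFunctionField, Optional<JoinFilterFunction> filterFunction) {
        // Plays the role of the emitted Preconditions.checkArgument call: if the field
        // was generated, a function must be supplied. Fail here, at construction time.
        if (hasFilterFunctionField && !filterFunction.isPresent()) {
            throw new IllegalArgumentException("join filter function field generated but no function passed");
        }
        this.hasFilterFunctionField = hasFilterFunctionField;
        this.filterFunction = filterFunction.orElse(null);
    }

    boolean applyFilterFunction(int leftPosition, int rightPosition) {
        if (!hasFilterFunctionField) {
            throw new IllegalStateException("no join filter function was generated");
        }
        // Without the constructor check, a missing function would only surface here,
        // as a NullPointerException far from the code that built the object.
        return filterFunction.filter(leftPosition, rightPosition);
    }
}
```

The check costs one branch per constructed object, not per row, which is why keeping it mainly buys an earlier, clearer failure.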
@martint, can you take a look at this too?
Branch updated from 9155409 to 1d292ac.
@martint, @erichwang: comments addressed.
getFilterFunctionMethod.getBody()
        .append(invokeStatic(Optional.class, "empty", Optional.class))
        .ret(Optional.class);
.append(constantBoolean(false))
.append(constantFalse().ret())
@losipiuk Make sure you see this comment: #5810 (comment). It's automatically hidden because the line was overwritten in a later commit. @martint, @erichwang and I had a further discussion about where JoinFilterFunction should go. It appears that it would make sense to put it in LookupJoinOperator (instead of PagesHashStrategy or LookupSource). We always instantiate
I'm removing myself from the reviewers for now, because @erichwang originally wanted my help reviewing the code generation in this pull request. However, it appears that this pull request first requires quite a bit of additional work on other parts. @erichwang can always add me back later if needed.
After thinking about this a bit more, I think the LookupSource is actually the right place to do the filtering. Ideally, from my point of view, the filtering would happen in the LookupJoinOperator, but doing so would require some way to lift the values out of the LookupSource and ProbeSource to do the filtering there. In lieu of that, I think the current approach is the next best option.

The main thing that sticks out to me here is that the PagesHashStrategy compiles in an optional filter function, and we still have to deal with the Optional even after the code is compiled (which seems silly). What do you think about having the filter function simply default to a result of true if it does not exist? Then we can get rid of the hasFilterFunction method and always assume that we have something that works. Any concerns about the performance impact here?

Finally, a lot of work has been done here for the InMemoryHashJoin path of the code. We should eventually add some of this work to the IndexJoin path, which also uses the LookupSource. Based on my investigation, I think we can do this easily by adjusting the implementations of IndexedData to also take the filter function and pass the probe data to it.
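The "default to true" proposal can be sketched as follows: resolve the Optional once at construction time and substitute an always-true filter, so the per-row path calls the filter unconditionally and no hasFilterFunction check is needed. All names here (`JoinFilterFunction.ALWAYS_TRUE`, `FilteringLookupSketch`, `passesFilter`) and the two-argument signature are hypothetical illustrations, not Presto's actual API.

```java
import java.util.Optional;

// Hypothetical, simplified filter signature; the real interface may take more arguments.
interface JoinFilterFunction {
    boolean filter(int leftPosition, int rightPosition);

    // Used when the join has no filter: every candidate pair passes.
    JoinFilterFunction ALWAYS_TRUE = (leftPosition, rightPosition) -> true;
}

// Illustrative lookup-side holder of the filter (not the real LookupSource).
final class FilteringLookupSketch {
    private final JoinFilterFunction filterFunction;

    FilteringLookupSketch(Optional<JoinFilterFunction> filterFunction) {
        // Deal with the Optional exactly once, outside the per-row path.
        this.filterFunction = filterFunction.orElse(JoinFilterFunction.ALWAYS_TRUE);
    }

    // No hasFilterFunction() branch: there is always something callable.
    boolean passesFilter(int leftPosition, int rightPosition) {
        return filterFunction.filter(leftPosition, rightPosition);
    }
}
```

On the performance question, the cost is one interface call per candidate match even for joins without a filter. Whether the JIT inlines it (for example, at a call site that only ever sees one implementation) is something to measure rather than assume.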
@losipiuk Yes.
Well, @fmeiser, I don't know. It seems that what you suggest should work.
I may be a bit biased here, as I have limited resources to work on that :). @dain, your call.
@losipiuk I would have
Branch updated from 59fcccd to d9d2ff7.
Branch updated from d9d2ff7 to 593eb3e.
@dain, it should be good to go. Squashed and rebased.
Merged, thanks!
This PR extends the non-equality-outer-joins work. It adds an execution flow for joins with a filter function, using a compiled version of PagesHashStrategy.
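As a conceptual illustration of why the filter has to run inside the join rather than after it, here is a toy left outer hash join on an equality key with an extra non-equality filter. It is a plain-Java sketch under simplifying assumptions (rows are `int[]{key, value}`, output rows are strings), not the PR's code-generated path; `PairFilter` and `FilteredHashJoinSketch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical non-equality condition evaluated on candidate (probe, build) pairs.
interface PairFilter {
    boolean test(int[] probeRow, int[] buildRow);
}

final class FilteredHashJoinSketch {
    // LEFT JOIN ... ON probe.key = build.key AND filter(probe, build); rows are {key, value}.
    static List<String> leftOuterJoin(List<int[]> probe, List<int[]> build, PairFilter filter) {
        // Build phase: bucket build rows by the equi-join key (column 0).
        Map<Integer, List<int[]>> table = new HashMap<>();
        for (int[] row : build) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }

        List<String> out = new ArrayList<>();
        for (int[] p : probe) {
            boolean matched = false;
            // The filter only runs on pairs that already share the key.
            for (int[] b : table.getOrDefault(p[0], Collections.emptyList())) {
                if (filter.test(p, b)) {
                    out.add(p[0] + ":" + p[1] + "|" + b[1]);
                    matched = true;
                }
            }
            if (!matched) {
                // Outer join semantics: if every candidate fails the filter, the probe row
                // still appears once, null-extended. A post-join filter would drop it instead.
                out.add(p[0] + ":" + p[1] + "|null");
            }
        }
        return out;
    }
}
```

For example, with probe rows `{1, 10}`, `{2, 20}`, build rows `{1, 5}`, `{1, 15}`, `{2, 30}`, and the filter `build.value < probe.value`, the join yields `1:10|5` and `2:20|null`.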