Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use code generation for joins with filter function #5810

Closed

Conversation

losipiuk
Copy link
Contributor

@losipiuk losipiuk commented Aug 8, 2016

This PR extends the non-equality-outer-joins. It adds flow for execution joins with filter function using compiled version of PagesHashStrategy.

@martint
Copy link
Contributor

martint commented Aug 8, 2016

@erichwang, can you review?


constructor.comment("check if join filter function is passed only if field is generated");
if (joinFilterFunctionFieldOptional.isPresent()) {
constructor.append(invokeStatic(Preconditions.class, "checkArgument", void.class, ImmutableList.of(boolean.class, Object.class),
Copy link
Contributor

@erichwang erichwang Aug 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems weird that we are adding checkArgs into the compiled code since we are already dictating the exact usage of this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - somewhat.
We generate the constructor. But the call is elsewhere. If this check is not present the code will fail when generated PagesHashStrategy.applyFilterFunction is called.
I can drop it if you like. Please confirm this or that way.

@erichwang
Copy link
Contributor

@martint, can you take a look at this too?

@ghost ghost added the CLA Signed label Aug 13, 2016
@losipiuk losipiuk force-pushed the non-equality-outer-joins-compiled branch from 9155409 to 1d292ac Compare August 16, 2016 13:01
@losipiuk
Copy link
Contributor Author

@martint , @erichwang comments addressed


getFilterFunctionMethod.getBody()
.append(invokeStatic(Optional.class, "empty", Optional.class))
.ret(Optional.class);
.append(constantBoolean(false))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.append(constantFalse().ret())

@haozhun
Copy link
Contributor

haozhun commented Aug 18, 2016

@losipiuk Make sure you see this comment: #5810 (comment). It's automatically hidden because the line is overwritten in a later commit.

@martint, @erichwang and I had a further discussion about where JoinFilterFunction should be put. It appears like it would make sense to put it in LookupJoinOperator (instead of PagesHashStrategy or LookupSource). We always instantiate LookupJoinOperator through JoinProbeCompiler.internalCompileJoinOperatorFactory. Therefore, having a virtual callsite in LookupJoinOperator should be ok. This way, it will benefit InMemoryJoinHash, generated LookupSource, and IndexLookupSource in one shot.

@haozhun
Copy link
Contributor

haozhun commented Aug 18, 2016

I'm removing myself from the reviewers for now because @erichwang originally wanted my help to look at code generation in this pull request. But it appears like this pull request requires quite some additional work on the other parts first. @erichwang can always add me back as necessary sometime later.

@haozhun haozhun removed their assignment Aug 18, 2016
@ghost ghost added the CLA Signed label Aug 18, 2016
@erichwang
Copy link
Contributor

After thinking about this a bit more, I think actually the LookupSource is the right place to do the filtering. In an ideal world for me, the filtering would happen in the LookupJoinOperator, but to do so would require that we have some way to lift the values out of the LookupSource and ProbeSource to do some filtering. In lieu of that, I think the current approach is the next best option.

The main thing that sticks out to me here is that the PagesHashStrategy compiles an optional filter function in and we have to deal with the optional even after the code is compiled (which seems silly). What do you guys think about having the filter function just default to a result of true if it does not exist? Then we can get rid of the "hasFilterFunction" method and just always assume that we have something that works. Any concerns on the performance impact here?

Finally, a lot of work has been done here for the InMemoryHashJoin path of the code. We should eventually get some of this work added to the IndexJoin path which also uses the LookupSource. Based on my investigation, I think we can do this easily by adjusting the implementations of IndexedData to also take the filter function and passing the probe data to it.

@fmeiser
Copy link
Contributor

fmeiser commented Sep 12, 2016

@losipiuk Yes.
It must not be modified all implementations, when you provide a class FilteringJoinProbe. Then the JoinProbeFactory can decide whether it implements the filter with a FilteringJoinProbe, the implementation supports a filter or compile the filter into the JoinProbe.
The generated code should consist of not more than one ifand some variables to call FilterFunction?

@ghost ghost added the CLA Signed label Sep 12, 2016
@losipiuk
Copy link
Contributor Author

Well @fmeiser. I don't know. It seems that what you say should work.
Though I am not 100% sure that it may be much nicer.
You would need to make changes to:

  • LookupJoinOperator.joinCurrentPosition so the call to lookupSource.getNextJoinPosition(joinPosition, probe.getPosition(), probe.getPage()); is proxied via JoinProbe.
  • we would need to pass build pages to JoinProbeFactory and currently LookupJoinOperatorhas no direct access to those.
  • The changes to generated JoinProbe would probably be not that huge, yet still.

I may be biased here a bit as I have limited resources to work on that :).
I would rather merge the good-enough solution. There is always place for improvements.

@dain your call

@ghost ghost added the CLA Signed label Sep 12, 2016
@dain
Copy link
Contributor

dain commented Sep 13, 2016

@losipiuk I would have InMemoryJoinHash take Optional<JoinFilterFunctionVerifier> and then remove EmptyJoinFilterFunctionVerifier. In the constructor unwrap Optional into a JoinFilterFunctionVerifier and do a null check in the loop. Add a comment that we unwrap for performance reasons.

@ghost ghost added the CLA Signed label Sep 13, 2016
@dain dain assigned losipiuk and unassigned dain Sep 13, 2016
@ghost ghost added the CLA Signed label Sep 13, 2016
@losipiuk losipiuk force-pushed the non-equality-outer-joins-compiled branch from 59fcccd to d9d2ff7 Compare September 15, 2016 10:32
@ghost ghost added the CLA Signed label Sep 15, 2016
@losipiuk
Copy link
Contributor Author

@dain. It should be good to go. Squashed and rebased.
I restarted travis build as one of the jobs failed. It looked unrelated.
Let's wait until it completes.

@losipiuk losipiuk assigned dain and unassigned losipiuk Sep 15, 2016
@ghost ghost added the CLA Signed label Sep 15, 2016
@dain
Copy link
Contributor

dain commented Sep 16, 2016

Merged, thanks!

@dain dain closed this Sep 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants