New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use RexExecutor to evaluate projections and filters #198

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
2 participants
@rmetzger
Contributor

rmetzger commented Mar 23, 2014

Hi,

I was asking on the mailing list if there is a way to use the existing code-generation infrastructure to evaluate projections and filters: https://groups.google.com/forum/#!topic/optiq-dev/-97ONPmDTB4.

This pull request contains my first approach to change the Rex* components to allow that.

My change includes the following:

  • Decoupling of code generation, compilation and execution (to allow efficient execution). I therefore introduced the RexExecutable class that allows to reduce() and execute() (given a DataContext)
  • I also changed the DataContext interface to allow direct access (using an int-index) to the data (instead of using a HashMap)
  • The most important change is that I introduced a isExternal-flag on the RexInputRef which indicates that the Ref refers to a variable accessible from the DataContext.

I'm very happy about any feedback regarding my pull request. I know its certainly not the cleanest approach to implement this feature. But I didn't want to spend too much time working on it without any feedback. (Maybe adding something like a RexContextRef is cleaner? (I probably have to shuttle through replacing all InputRefs by RexContextRefs then))

I know that the pull request is a bit messed up (4 commits and I accidentally changed some code formattings while working with (/fighting against) the checkstyle plugin. Once I got feedback, I'll clean those things up and update the PR (using force push).

@julianhyde

This comment has been minimized.

Show comment
Hide comment
@julianhyde

julianhyde Mar 23, 2014

Owner

I like the general idea of what you are doing. It might help someone write an interpreter (i.e. implementations of FilterRel, ProjectRel etc. that don't require code generation). Maybe that's your goal too.

A further step could be to create an interpreter for a RexProgram. It computes several expressions at the same time, allows common sub-expressions, short-cutting if there is a filter expression. That interpreter could be made quite efficient.

Regarding RexInputRef.isExternal. Have you looked at other sub-types of RexVariable, e.g. RexSlot and RexDynamicParam? Their type is a big clue to the implementor that it should implement them in a different way than looking in the input row.

Is it really what you want, to store data values in DataContext? It's neither type-safe nor thread-safe. Would it be better if we generated a method and you could invoke it with parameters? (The code-generator could be smart enough to convert variable references into parameters.) We can look at other means for getting values in and values out of the generated code, too.

That said, I'm OK with the idea of adding numbered slots into DataContext, if it's really what you want.

Thanks for submitting an early draft for feedback. It's generally useful feature, and I think we can get this into a shape where I'd accept it.

Owner

julianhyde commented Mar 23, 2014

I like the general idea of what you are doing. It might help someone write an interpreter (i.e. implementations of FilterRel, ProjectRel etc. that don't require code generation). Maybe that's your goal too.

A further step could be to create an interpreter for a RexProgram. It computes several expressions at the same time, allows common sub-expressions, short-cutting if there is a filter expression. That interpreter could be made quite efficient.

Regarding RexInputRef.isExternal. Have you looked at other sub-types of RexVariable, e.g. RexSlot and RexDynamicParam? Their type is a big clue to the implementor that it should implement them in a different way than looking in the input row.

Is it really what you want, to store data values in DataContext? It's neither type-safe nor thread-safe. Would it be better if we generated a method and you could invoke it with parameters? (The code-generator could be smart enough to convert variable references into parameters.) We can look at other means for getting values in and values out of the generated code, too.

That said, I'm OK with the idea of adding numbered slots into DataContext, if it's really what you want.

Thanks for submitting an early draft for feedback. It's generally useful feature, and I think we can get this into a shape where I'd accept it.

@rmetzger

This comment has been minimized.

Show comment
Hide comment
@rmetzger

rmetzger Mar 24, 2014

Contributor

Hi Julian,
thank you very much for the feedback!
I'm currently using the code to evaluate the conditions in FilterRel and the projections in ProjectRel (and it's working pretty good).
I guess the other projects (Drill, Lingual) implemented their own infrastructure to evaluate these.

I stumbled across the RexProgam. If I saw it correctly, its used with the CalcRel.
I'll look into the CalcRel at a later stage. My current goal is to get some simple TPC-H queries running (and the JDBC interface).

For the DataContext: I think its impossible to achieve type-safety for generated code. Generating methods that receive their runtime values via parameters is certainly doable. I'm just not sure how difficult it would be to invoke these (dynamic) methods. I can probably use reflection for that, but I guess this approach has no advantage over my current approach.
I think thread-safety is not a problem since I create a special DataContext for this purpose (and one for each thread).
I added the numbered slots into the DataContext to allow very fast lookups. A hashmap is okay, but I would need one lookup to store and one lookup to retrieve the data for each value in each tuple. With an integer-index, I can use an array for that (avoiding the hashing and hash-collisions).

Please give me a few days to rework the RexInputRef.isExternal, I'm not working full-time on the SQL-related stuff right now.

Contributor

rmetzger commented Mar 24, 2014

Hi Julian,
thank you very much for the feedback!
I'm currently using the code to evaluate the conditions in FilterRel and the projections in ProjectRel (and it's working pretty good).
I guess the other projects (Drill, Lingual) implemented their own infrastructure to evaluate these.

I stumbled across the RexProgam. If I saw it correctly, its used with the CalcRel.
I'll look into the CalcRel at a later stage. My current goal is to get some simple TPC-H queries running (and the JDBC interface).

For the DataContext: I think its impossible to achieve type-safety for generated code. Generating methods that receive their runtime values via parameters is certainly doable. I'm just not sure how difficult it would be to invoke these (dynamic) methods. I can probably use reflection for that, but I guess this approach has no advantage over my current approach.
I think thread-safety is not a problem since I create a special DataContext for this purpose (and one for each thread).
I added the numbered slots into the DataContext to allow very fast lookups. A hashmap is okay, but I would need one lookup to store and one lookup to retrieve the data for each value in each tuple. With an integer-index, I can use an array for that (avoiding the hashing and hash-collisions).

Please give me a few days to rework the RexInputRef.isExternal, I'm not working full-time on the SQL-related stuff right now.

@julianhyde

This comment has been minimized.

Show comment
Hide comment
@julianhyde

julianhyde Mar 24, 2014

Owner

I now think you're right to use RexInputRef. If you're implementing ProjectRel and FilterRel, then you'll want to access "fields of the incoming record", and that is precisely what RexInputRef is for.

I don't think you should add isExternal. I think you should call RexToLixTranslator.translateProjects with a custom implementation of InputGetter.

Also, this isn't such a big deal, but I think we can avoid making changes to DataContext. Rather than storing all of the input fields in separately, we can store the array, and the array only needs to be accessed once. In your code:

RelDataType inputRowType;
DataContext dataContext;
Object[] slots = new Object[inputRowType.getFieldCount()];
dataContext.set("inputRecord", slots);

Inside generated code:

final DataContext root;
Integer x = (Integer) ((Object[]) root.get("inputRecord"))[0];
Integer y = (Integer) ((Object[]) root.get("inputRecord"))[1];
Integer result = x + y;

and later we can optimize by generating

final Object[] slots = (Object[]) root.get("inputRecord");
Owner

julianhyde commented Mar 24, 2014

I now think you're right to use RexInputRef. If you're implementing ProjectRel and FilterRel, then you'll want to access "fields of the incoming record", and that is precisely what RexInputRef is for.

I don't think you should add isExternal. I think you should call RexToLixTranslator.translateProjects with a custom implementation of InputGetter.

Also, this isn't such a big deal, but I think we can avoid making changes to DataContext. Rather than storing all of the input fields in separately, we can store the array, and the array only needs to be accessed once. In your code:

RelDataType inputRowType;
DataContext dataContext;
Object[] slots = new Object[inputRowType.getFieldCount()];
dataContext.set("inputRecord", slots);

Inside generated code:

final DataContext root;
Integer x = (Integer) ((Object[]) root.get("inputRecord"))[0];
Integer y = (Integer) ((Object[]) root.get("inputRecord"))[1];
Integer result = x + y;

and later we can optimize by generating

final Object[] slots = (Object[]) root.get("inputRecord");
@rmetzger

This comment has been minimized.

Show comment
Hide comment
@rmetzger

rmetzger Apr 1, 2014

Contributor

Hi Julian, thank you very much for your guidance!
I updated the pull request to reflect your suggestions.

I would prefer to do the optimizations you've suggested at a later point since I'm still in a "prototyping" phase for my implementation and I want to focus on other parts of the implementation.

Contributor

rmetzger commented Apr 1, 2014

Hi Julian, thank you very much for your guidance!
I updated the pull request to reflect your suggestions.

I would prefer to do the optimizations you've suggested at a later point since I'm still in a "prototyping" phase for my implementation and I want to focus on other parts of the implementation.

@julianhyde julianhyde changed the title from [Draft] Use RexExecutor to evaluate projections and filters to Use RexExecutor to evaluate projections and filters Apr 1, 2014

@julianhyde julianhyde closed this in 7cbd302 Apr 1, 2014

@julianhyde

This comment has been minimized.

Show comment
Hide comment
@julianhyde

julianhyde Apr 1, 2014

Owner

Looks good -- I have committed to master. Glad to see that using InputGetter did the trick.

Let me know whether you approve of my "tidying up" in the following commit. In particular removing RexExecutor.rexBuilder.

Owner

julianhyde commented Apr 1, 2014

Looks good -- I have committed to master. Glad to see that using InputGetter did the trick.

Let me know whether you approve of my "tidying up" in the following commit. In particular removing RexExecutor.rexBuilder.

@julianhyde

This comment has been minimized.

Show comment
Hide comment
@julianhyde

julianhyde Apr 1, 2014

Owner

By the way, best of luck with the Stratosphere incubation!

Owner

julianhyde commented Apr 1, 2014

By the way, best of luck with the Stratosphere incubation!

@rmetzger

This comment has been minimized.

Show comment
Hide comment
@rmetzger

rmetzger Apr 3, 2014

Contributor

Thanks for merging my pull request. Your cleanup is fine. And yeah, the incubation is very exciting.

Contributor

rmetzger commented Apr 3, 2014

Thanks for merging my pull request. Your cleanup is fine. And yeah, the incubation is very exciting.

ldming pushed a commit to ldming/mycalcite that referenced this pull request Sep 13, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment