Incorporate WorkProcessor in operators #12096

Closed
1 of 3 tasks
sopel39 opened this issue Dec 18, 2018 · 3 comments

@sopel39
Contributor

sopel39 commented Dec 18, 2018

Tracking issue for the effort to:

  • support cross-operator lazy pages (starting from source operators)
  • clean up and simplify the contract between operators via WorkProcessor pipelines
  • provide base for further improvements (e.g: on stack rows without Page materialization, Graal)

The advantage of cross-operator lazy pages is that we can avoid IO when queries are highly selective. This requires that significant processing happens in the source stage, which is increasingly the case with improvements like CBO ("broadcast joins") or grouped execution.
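To make the IO saving concrete, here is a minimal sketch of the lazy-loading idea. All names (`LazyColumnSketch`, `LazyColumn`, `sumMatching`) are hypothetical and a `long[]` stands in for a column; this is not the engine's actual Page or lazy block implementation.

```java
import java.util.function.Supplier;

// Hypothetical illustration only: a long[] stands in for a column of a Page.
final class LazyColumnSketch {
    /** A column whose values are read (the "IO") only on first access. */
    static final class LazyColumn {
        private final Supplier<long[]> loader;
        private long[] values; // null until the column is loaded

        LazyColumn(Supplier<long[]> loader) {
            this.loader = loader;
        }

        long[] get() {
            if (values == null) {
                values = loader.get();
            }
            return values;
        }

        boolean isLoaded() {
            return values != null;
        }
    }

    /**
     * Sums the payload column for rows whose key equals the wanted key.
     * The payload column is loaded only if at least one row matches.
     */
    static long sumMatching(LazyColumn key, LazyColumn payload, long wanted) {
        long[] keys = key.get();
        long sum = 0;
        for (int i = 0; i < keys.length; i++) {
            if (keys[i] == wanted) {
                sum += payload.get()[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        LazyColumn key = new LazyColumn(() -> new long[] {1, 2, 3});
        LazyColumn payload = new LazyColumn(() -> new long[] {10, 20, 30});
        // No key matches 42, so the payload loader never runs.
        System.out.println(sumMatching(key, payload, 42));
        System.out.println(payload.isLoaded());
    }
}
```

With a highly selective filter, most pages never touch the payload columns, which is where the avoided IO would come from.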

Stages are:

  • Stage 1
      • base PageProcessor on WorkProcessor
  • Stage 2
      • internally base ScanFilterAndProject on WorkProcessor. The pipeline would look as follows:

        split singleton -> [flatMap] -> pages source
                        -> [transform] -> page processor
                        -> [transform] -> merge pages

        or, if the split is cursor based:

        split singleton -> [flatMap] -> cursor source -> [transform] -> merge pages

      • internally base FilterAndProject on WorkProcessor. The pipeline would look as follows:

        page buffer -> [transform] -> page processor -> [transform] -> [merge pages]

  • Stage 3
      • create an interface for operators that are based on WorkProcessor pipelines
      • create a standardized abstract operator class for operators that are internally based on WorkProcessor pipelines
      • combine operators that are based on WorkProcessors via a dedicated "gluing" operator
      • base TopNOperator on WorkProcessor pipelines (fast data exploration!)
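
As a rough illustration of the Stage 2 pipeline shape, the sketch below chains `flatMap` and `transform` on a heavily simplified, hypothetical `MiniProcessor` interface. It is not the real WorkProcessor: it models only pull-based chaining and ignores yielding, blocking, and memory accounting. The "merge pages" step is left out because it needs a stateful transformation that this simplified interface cannot express.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for WorkProcessor; a List<Integer> stands in for a Page.
final class MiniPipelineSketch {
    interface MiniProcessor<T> {
        /** Returns the next element, or null when the processor is finished. */
        T next();

        /** One-to-one transformation, applied lazily on each pull. */
        default <R> MiniProcessor<R> transform(Function<T, R> fn) {
            return () -> {
                T input = next();
                return input == null ? null : fn.apply(input);
            };
        }

        /** Expands each element into a nested processor and drains it lazily. */
        default <R> MiniProcessor<R> flatMap(Function<T, MiniProcessor<R>> fn) {
            MiniProcessor<T> outer = this;
            return new MiniProcessor<R>() {
                private MiniProcessor<R> inner;

                @Override
                public R next() {
                    while (true) {
                        if (inner == null) {
                            T input = outer.next();
                            if (input == null) {
                                return null;
                            }
                            inner = fn.apply(input);
                        }
                        R result = inner.next();
                        if (result != null) {
                            return result;
                        }
                        inner = null; // inner exhausted; pull the next outer element
                    }
                }
            };
        }

        static <T> MiniProcessor<T> of(List<T> elements) {
            Iterator<T> iterator = elements.iterator();
            return () -> iterator.hasNext() ? iterator.next() : null;
        }
    }

    /** Builds: split singleton -> [flatMap] -> pages source -> [transform] -> page processor. */
    static List<List<Integer>> run() {
        MiniProcessor<String> split = MiniProcessor.of(List.of("split-0"));
        MiniProcessor<List<Integer>> pages = split.flatMap(
                s -> MiniProcessor.of(List.of(List.of(1, 2, 3), List.of(4, 5, 6))));
        // The "page processor" here is a filter that keeps even values.
        MiniProcessor<List<Integer>> processed = pages.transform(page -> {
            List<Integer> kept = new ArrayList<>();
            for (int value : page) {
                if (value % 2 == 0) {
                    kept.add(value);
                }
            }
            return kept;
        });
        List<List<Integer>> output = new ArrayList<>();
        for (List<Integer> page = processed.next(); page != null; page = processed.next()) {
            output.add(page);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Nothing is read from the source until the final loop pulls from `processed`, which is the property that lets lazy pages flow across operator boundaries.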
@nezihyigitbasi
Contributor

provide base for further improvements (e.g: on stack rows without Page materialization, Graal)

Can you give details about the "Graal" plans?

@sopel39
Contributor Author

sopel39 commented Dec 22, 2018

Can you give details about the "Graal" plans?

WorkProcessor provides a transformation method, WorkProcessor#transform.
Suppose you have a chain of Page transformations, e.g.:

WorkProcessor<Page> processor1 = ...;
WorkProcessor<Page> processor2 = processor1.transform(transformation1);
WorkProcessor<Page> processor3 = processor2.transform(transformation2);
...

Such a chain of Page transformations can be compiled into a tight loop that doesn't materialize intermediate results. Please check out the paper http://www.vldb.org/pvldb/vol4/p539-neumann.pdf and the project https://hyper-db.de/.
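
To illustrate the difference, here is a hand-written sketch (not generated code) of a filter followed by a projection, first with materialized intermediate results and then fused into one loop. The class and method names are hypothetical, and a `long[]` stands in for a single-column Page.

```java
import java.util.Arrays;

// Hypothetical illustration: a long[] stands in for a single-column Page.
final class FusedLoopSketch {
    /** Filter step: allocates a new "page" holding only even values. */
    static long[] filterEven(long[] page) {
        long[] out = new long[page.length];
        int count = 0;
        for (long value : page) {
            if (value % 2 == 0) {
                out[count++] = value;
            }
        }
        return Arrays.copyOf(out, count);
    }

    /** Projection step: allocates a new "page" with every value multiplied by 10. */
    static long[] timesTen(long[] page) {
        long[] out = new long[page.length];
        for (int i = 0; i < page.length; i++) {
            out[i] = page[i] * 10;
        }
        return out;
    }

    /** Materializing variant: every transformation produces an intermediate page. */
    static long sumMaterialized(long[] page) {
        long sum = 0;
        for (long value : timesTen(filterEven(page))) {
            sum += value;
        }
        return sum;
    }

    /** Fused variant: one loop, values stay in local variables. */
    static long sumFused(long[] page) {
        long sum = 0;
        for (long value : page) {
            if (value % 2 == 0) {
                sum += value * 10;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] page = {1, 2, 3, 4};
        System.out.println(sumMaterialized(page) == sumFused(page));
    }
}
```

Both variants compute the same result; the fused one is the shape of loop that compiling the chain is meant to produce automatically.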

To generate such a tight loop, one can extend WorkProcessor#transform so that it generates optimized bytecode (using the existing airlift bytecode framework), e.g.:

static WorkProcessor<Page> transform(
    WorkProcessor<Page> processor,
    Transformation<Page, Page> transformation)
{
  ...
  if (transformation instanceof BytecodeRowTransformation) {
    // generate tight loop
  } else {
    // proceed with intermediate pages materialization
  }
}

interface BytecodeRowTransformation extends Transformation<Page, Page> {
  BytecodeExpression generateTransformation(BytecodeTransformationContext context);
}

interface BytecodeTransformationContext {
  ..
  // transformation result bytecode 
  BytecodeExpression needsMoreData();
  BytecodeExpression producedResult();
  ..
  // input row channels getter bytecode
  BytecodeExpression getChannel(int channel);
  BytecodeExpression isNull(int channel);
  ..
  // output row channel bytecode setters
  void defineChannel(int channel, Supplier<BytecodeExpression> definition);
  void defineIsNull(int channel, Supplier<BytecodeExpression> definition);
  ..
}

BytecodeRowTransformation#generateTransformation would generate the bytecode of the transformation, using BytecodeTransformationContext to consume input and produce output within the generated code.
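
The `instanceof` dispatch can be sketched without any bytecode generation. In the sketch below, plain lambdas stand in for generated code, a `long[]` stands in for a Page, and every name (`FusingTransformSketch`, `Transformation`, `RowTransformation`, `chain`) is hypothetical. It also pairs two transformations directly rather than going through WorkProcessor#transform, which is an assumption made to keep the example short.

```java
// Hypothetical names throughout; lambdas stand in for generated bytecode.
final class FusingTransformSketch {
    /** Page-at-a-time transformation; a long[] stands in for a Page. */
    interface Transformation {
        long[] process(long[] page);
    }

    /** Row-at-a-time transformation that can be fused with its neighbours. */
    interface RowTransformation extends Transformation {
        long processRow(long value);

        @Override
        default long[] process(long[] page) {
            long[] out = new long[page.length];
            for (int i = 0; i < page.length; i++) {
                out[i] = processRow(page[i]);
            }
            return out;
        }
    }

    /** Chains two transformations, fusing them when both are row-level. */
    static Transformation chain(Transformation first, Transformation second) {
        if (first instanceof RowTransformation && second instanceof RowTransformation) {
            RowTransformation a = (RowTransformation) first;
            RowTransformation b = (RowTransformation) second;
            // Fused path: one pass per page, no intermediate page between a and b.
            return (RowTransformation) value -> b.processRow(a.processRow(value));
        }
        // Fallback path: the intermediate page is materialized.
        return page -> second.process(first.process(page));
    }

    public static void main(String[] args) {
        RowTransformation plusOne = value -> value + 1;
        RowTransformation timesTwo = value -> value * 2;
        Transformation fused = chain(plusOne, timesTwo);
        System.out.println(java.util.Arrays.toString(fused.process(new long[] {1, 2, 3})));
    }
}
```

Because the fused result is itself a `RowTransformation`, a longer chain of row-level steps keeps collapsing into a single loop, while any page-level step falls back to materialization.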

However, generating bytecode is cumbersome and error-prone. Truffle/Graal provides a nice abstraction for creating highly performant interpreters, which we could also use to generate maintainable and readable WorkProcessor transformations (tutorial on using Truffle: http://cesquivias.github.io/blog/2014/12/02/writing-a-language-in-truffle-part-2-using-truffle-and-graal/). In that case we wouldn't be using BytecodeExpression but much friendlier classes and annotations mixed with normal type-safe Java code, e.g.:

interface TruffleRowTransformation extends Transformation<Page, Page> {
  TruffleNode generateTransformation(TruffleTransformationContext context);
}

interface TruffleTransformationContext {
  ..
  // similar methods as in BytecodeTransformationContext, but using truffle node classes
}

Some notes:

  1. WorkProcessor transformations are functional, so one could actually create a language interpreter for them, e.g:
transform(
  transform(
    processor,
    context -> python transformation),
  context -> java transformation)
  2. The Truffle/Graal and WorkProcessor abstractions enable us to use other languages for transformations (e.g. Python). For instance, we could implement table functions that are written in non-Java languages but are JIT-compiled into a tight loop together with Java code.

This is just a draft, and I still need to play more with Truffle/Graal to work out the details.

@stale

stale bot commented Dec 24, 2020

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

@stale stale bot added the stale label Dec 24, 2020
@stale stale bot closed this as completed Dec 31, 2020