Add parallelism control #228

Open
je-ik opened this issue Dec 21, 2017 · 1 comment
Comments

@je-ik
Contributor

je-ik commented Dec 21, 2017

After removing explicit partitioning, we currently have no explicit control over the parallelism of executed operators. This affects both batch and stream processing. There must be a way to hint to the translator that a certain operation should run with more or less parallelism than its input. Options are:

  • add a method to set the parallelism of an operator on the executor, e.g.
      Executor executor = ...;
      executor.withParallelism("OPERATOR_NAME", 100).submit(flow);
  • add downstream parallelism hint to shuffle operators, e.g.
      ReduceByKey.of(...)
          .keyBy(...)
          ....
          .withHint(Parallelism.of(100));
  • some other option?
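
To make the first option concrete, here is a minimal sketch of how name-based overrides could be stored and resolved, falling back to the input's parallelism when no override is set. The `ParallelismOverrides` class and its `resolve` method are hypothetical names used only for illustration; they are not part of the existing API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: per-operator parallelism overrides kept on the executor side. */
final class ParallelismOverrides {
  // Operator name -> explicitly requested parallelism.
  private final Map<String, Integer> byOperatorName = new HashMap<>();

  /** Registers an explicit parallelism for the named operator. */
  ParallelismOverrides withParallelism(String operatorName, int parallelism) {
    if (parallelism < 1) {
      throw new IllegalArgumentException("parallelism must be >= 1");
    }
    byOperatorName.put(operatorName, parallelism);
    return this;
  }

  /** Returns the override if one was registered, otherwise the input's parallelism. */
  int resolve(String operatorName, int inputParallelism) {
    return byOperatorName.getOrDefault(operatorName, inputParallelism);
  }
}
```

With this shape, a translator would call `resolve` once per operator, so operators without an override keep inheriting parallelism from their input.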
@dmvk
Contributor

dmvk commented Dec 21, 2017

I think we should never set explicit parallelism. Instead, we should hint the operator with a percentage estimate of the increase or decrease in data size, so that we can decide the parallelism based on the input data.
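
One way such a hint could translate into a partition count is sketched below. Everything here is an assumption made for illustration: the `SizeHintParallelism` class, the `derive` method, the idea that the translator knows the input size in bytes, and the target number of bytes per partition.

```java
/** Hypothetical sketch: derive parallelism from input size and a size-change hint. */
final class SizeHintParallelism {

  /**
   * @param inputBytes estimated size of the operator's input (assumed to be known)
   * @param sizeChangePercent hinted change in data size, e.g. 50 for +50%, -90 for -90%
   * @param targetBytesPerPartition assumed amount of data one partition should handle
   */
  static int derive(long inputBytes, int sizeChangePercent, long targetBytesPerPartition) {
    if (inputBytes < 0 || targetBytesPerPartition < 1 || sizeChangePercent < -100) {
      throw new IllegalArgumentException("invalid size, target or hint");
    }
    // Integer arithmetic avoids floating-point rounding surprises;
    // multiplyExact fails loudly instead of silently overflowing.
    long estimatedOutputBytes =
        Math.multiplyExact(inputBytes, 100L + sizeChangePercent) / 100L;
    // Ceiling division: partitions needed to hold the estimated output.
    long partitions =
        (estimatedOutputBytes + targetBytesPerPartition - 1) / targetBytesPerPartition;
    // Always run with at least one partition.
    return Math.toIntExact(Math.max(1L, partitions));
  }
}
```

The point of this design is that the hint describes the data rather than the cluster, so the same flow can scale with its input without code changes.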
