LocalThreadExecutor can use more threads for output plugins #232

Open
frsyuki opened this Issue Jul 8, 2015 · 2 comments

frsyuki commented Jul 8, 2015

Some input plugins don't support parallel processing, and others can't run on multiple threads in certain situations (e.g. when the data source is a single big file).

In those cases, we want to use more threads than the number of input threads.
Since v0.6.0, the number of threads is controlled by the executor plugin. Executor plugins can use different numbers of threads for inputs and outputs. The idea here is to use a larger number of threads for output.

One difficulty in the implementation is dispatching pages to output plugins. The dispatch must be deterministic. For example, use round-robin per input task:

  • input task 1: page 1 -> output task 1, page 2 -> output task 2, page 3 -> output task 3, ...
  • input task 2: page 1 -> output task 2, page 2 -> output task 3, page 3 -> output task 4, ...
  • input task 3: page 1 -> output task 3, page 2 -> output task 4, page 3 -> output task 5, ...
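
The round-robin mapping in the list above could be sketched roughly as follows. This is only an illustration, not Embulk code: the function name `output_task_for` is hypothetical, and wrapping around with a modulo once the last output task is reached is an assumption (the list above does not say what happens past the final output task).

```python
def output_task_for(input_task: int, page: int, num_output_tasks: int) -> int:
    """Return the 1-based output task for a given 1-based input task and page.

    Hypothetical helper: each input task starts at the output task with its
    own number and advances by one per page. Wrapping with a modulo is an
    assumption, not something stated in the issue.
    """
    if input_task < 1 or page < 1 or num_output_tasks < 1:
        raise ValueError("input_task, page and num_output_tasks must be >= 1")
    # Convert to 0-based offsets, add them, wrap, then convert back to 1-based.
    return ((input_task - 1) + (page - 1)) % num_output_tasks + 1


if __name__ == "__main__":
    # Reproduce the three rows of the list, assuming 8 output tasks.
    for input_task in (1, 2, 3):
        row = [output_task_for(input_task, page, 8) for page in (1, 2, 3)]
        print(f"input task {input_task}: {row}")
```

Because the result depends only on the input task number, the page number, and the number of output tasks, the same page always goes to the same output task across runs, which is the determinism the issue asks for.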
daledude commented Sep 10, 2015

Is this only for the file input, or is it a general design for providing parallelism for all plugins?


frsyuki commented Sep 11, 2015

This is a general design that works for all plugins.

@frsyuki frsyuki added the new feature label Sep 15, 2015

@frsyuki frsyuki referenced this issue Dec 1, 2015

Closed

Encryption #344

frsyuki added a commit that referenced this issue Dec 24, 2015

LocalExecutorPlugin runs more output threads by scattering input pages
This change implements #232.

LocalExecutorPlugin creates N times more output tasks and runs them in
parallel if the number of input tasks is less than the min_threads option
(N > 1). The default min_threads is the same as the number of CPU cores.

This behavior is deterministic as long as the min_threads option is not
changed.
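
As a rough illustration of the commit message above, the multiplier N might be derived like this. The commit message only says that N > 1 when there are fewer input tasks than min_threads; choosing N by ceiling division is an assumption made for this sketch, and `scatter_count` and `output_task_count` are hypothetical names, not Embulk APIs.

```python
def scatter_count(input_tasks: int, min_threads: int) -> int:
    """Return the multiplier N applied to the number of output tasks.

    Assumption: N is the smallest integer such that
    input_tasks * N >= min_threads. The commit message does not state
    the exact formula.
    """
    if input_tasks < 1 or min_threads < 1:
        raise ValueError("input_tasks and min_threads must be >= 1")
    if input_tasks >= min_threads:
        return 1  # enough input tasks already; no scattering
    return -(-min_threads // input_tasks)  # ceiling division


def output_task_count(input_tasks: int, min_threads: int) -> int:
    """Total output tasks: N times the number of input tasks."""
    return input_tasks * scatter_count(input_tasks, min_threads)


if __name__ == "__main__":
    # A single big input file on a machine where min_threads is 8.
    print(scatter_count(1, 8), output_task_count(1, 8))
```

Under this assumption the result depends only on the number of input tasks and on min_threads, which matches the note that the behavior stays deterministic as long as min_threads is not changed.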