LocalThreadExecutor can use more threads for output plugins #232

frsyuki opened this Issue Jul 8, 2015 · 2 comments




frsyuki commented Jul 8, 2015

Some input plugins don't support parallel processing. Others can't run with multiple threads in certain situations (e.g., when the data source is a single big file).

In those cases, we want to use more threads than the number of input threads.
Since v0.6.0, the number of threads is controlled by the executor plugin, and executor plugins can use different numbers of threads for inputs and outputs. The idea here is to use a larger number of threads for outputs.

The difficulty in implementing this is dispatching pages to output plugins: the dispatch must be deterministic. For example, use round-robin per input task:

  • input task 1: page 1 -> output task 1, page 2 -> output task 2, page 3 -> output task 3, ...
  • input task 2: page 1 -> output task 2, page 2 -> output task 3, page 3 -> output task 4, ...
  • input task 3: page 1 -> output task 3, page 2 -> output task 4, page 3 -> output task 5, ...
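A minimal Java sketch of the round-robin rule in the list above. It assumes 0-based indices internally and that the output task index wraps around modulo the total number of output tasks (the list does not say what happens past the last output task). `RoundRobinDispatch` and `outputTaskFor` are hypothetical names, not part of the executor plugin API.

```java
// Hypothetical sketch of per-input-task round-robin dispatch; not an existing API.
public class RoundRobinDispatch {
    // Returns the 0-based output task for a page. Each input task starts at an
    // offset equal to its own index, then advances one output task per page.
    // Assumption: indices wrap modulo outputTaskCount.
    static int outputTaskFor(int inputTaskIndex, long pageIndex, int outputTaskCount) {
        if (outputTaskCount <= 0) {
            throw new IllegalArgumentException("outputTaskCount must be positive");
        }
        return (int) ((inputTaskIndex + pageIndex) % outputTaskCount);
    }

    public static void main(String[] args) {
        int outputTaskCount = 6;
        // Print the same table as the list above, using 1-based numbers for display.
        for (int input = 0; input < 3; input++) {
            StringBuilder line = new StringBuilder("input task " + (input + 1) + ":");
            for (long page = 0; page < 3; page++) {
                int out = outputTaskFor(input, page, outputTaskCount);
                line.append(" page ").append(page + 1)
                    .append(" -> output task ").append(out + 1).append(",");
            }
            System.out.println(line);
        }
    }
}
```

The assignment depends only on the input task index, the page index and the output task count, so re-running the same job with the same task counts sends every page to the same output task.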

Is this only for file input, or is it a general design for parallelism across all plugins?

frsyuki commented Sep 11, 2015

This is a general design that works for all plugins.

frsyuki added the new feature label Sep 15, 2015
frsyuki referenced this issue Dec 1, 2015

Encryption #344

frsyuki added a commit that referenced this issue Dec 24, 2015
LocalExecutorPlugin runs more output threads by scattering input pages
This change implements #232.

LocalExecutorPlugin creates N times more output tasks and runs them in
parallel if the number of input tasks is less than the min_threads option
(N > 1). The default min_threads is the number of CPU cores.
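The commit message does not say how N is computed or how pages are scattered, so the following Java sketch is only one plausible reading. It assumes N = ceil(min_threads / input task count) when there are fewer input tasks than min_threads (otherwise N = 1), and that each input task owns a contiguous block of N output tasks that it fills round-robin. `ScatterPlan`, `scatterCount` and `outputTaskFor` are hypothetical names; only `Runtime.availableProcessors()` is a real JDK call, used here for the CPU-core default.

```java
// Hypothetical sketch of "N times more output tasks"; not LocalExecutorPlugin's code.
public class ScatterPlan {
    // N: output tasks per input task.
    // Assumption: N = ceil(minThreads / inputTaskCount) if inputTaskCount < minThreads, else 1.
    static int scatterCount(int inputTaskCount, int minThreads) {
        if (inputTaskCount <= 0) {
            throw new IllegalArgumentException("inputTaskCount must be positive");
        }
        if (inputTaskCount >= minThreads) {
            return 1; // enough input tasks already; no scattering
        }
        return (minThreads + inputTaskCount - 1) / inputTaskCount; // integer ceiling
    }

    // 0-based output task for a page: input task i owns output tasks
    // [i * n, i * n + n) and cycles through them one page at a time.
    static int outputTaskFor(int inputTaskIndex, long pageIndex, int n) {
        return inputTaskIndex * n + (int) (pageIndex % n);
    }

    public static void main(String[] args) {
        // Default min_threads = number of CPU cores, per the commit message.
        int minThreads = Runtime.getRuntime().availableProcessors();
        int inputTaskCount = 1; // e.g. a single big input file
        int n = scatterCount(inputTaskCount, minThreads);
        System.out.println("min_threads=" + minThreads + ", N=" + n
                + ", output tasks=" + inputTaskCount * n);
        for (long page = 0; page < 4; page++) {
            System.out.println("page " + (page + 1) + " -> output task "
                    + (outputTaskFor(0, page, n) + 1));
        }
    }
}
```

In this sketch the mapping is a pure function of min_threads, the input task count and the page index, which is what keeps the dispatch deterministic.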

This behavior is deterministic as long as the min_threads option is not