LocalThreadExecutor can use more threads for output plugins #232

Open
frsyuki opened this Issue Jul 8, 2015 · 2 comments


@frsyuki
Contributor
frsyuki commented Jul 8, 2015

Some input plugins don't support parallel processing, and others can't run on multiple threads in certain situations (e.g. when the data source is a single big file).

In those cases, we want to use more threads than the number of input threads.
Since v0.6.0, the number of threads is controlled by the executor plugin. Executor plugins can use different numbers of threads for inputs and outputs. The idea here is to use a larger number of threads for output.

The difficulty in implementing this is dispatching pages to output plugins. The dispatching must be deterministic. For example, use round-robin per input task:

  • input task 1: page 1 -> output task 1, page 2 -> output task 2, page 3 -> output task 3, ...
  • input task 2: page 1 -> output task 2, page 2 -> output task 3, page 3 -> output task 4, ...
  • input task 3: page 1 -> output task 3, page 2 -> output task 4, page 3 -> output task 5, ...
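
The round-robin example above can be sketched roughly as follows. This is only an illustration, not Embulk's actual code: the class `RoundRobinDispatch`, the method `outputTaskFor`, and the wrap-around with a modulo once the last output task is reached are all assumptions made for the sketch. Indexes are zero-based in the code and printed one-based to match the list.

```java
// Hypothetical sketch of deterministic round-robin dispatch per input task.
// None of these names come from Embulk; they are for illustration only.
final class RoundRobinDispatch {
    // Returns the zero-based output task index for a page.
    // Assumption: dispatch wraps around when it passes the last output task.
    static int outputTaskFor(int inputTaskIndex, int pageIndex, int outputTaskCount) {
        if (outputTaskCount <= 0) {
            throw new IllegalArgumentException("outputTaskCount must be positive");
        }
        // Each input task starts at its own offset, then advances one
        // output task per page. The result depends only on the arguments,
        // so repeated runs dispatch pages identically.
        return Math.floorMod(inputTaskIndex + pageIndex, outputTaskCount);
    }

    public static void main(String[] args) {
        int outputTaskCount = 5;
        for (int input = 0; input < 3; input++) {
            StringBuilder line = new StringBuilder("input task " + (input + 1) + ":");
            for (int page = 0; page < 3; page++) {
                int output = outputTaskFor(input, page, outputTaskCount);
                line.append(" page ").append(page + 1)
                    .append(" -> output task ").append(output + 1);
            }
            System.out.println(line);
        }
    }
}
```

Because the target is a pure function of the input task index and the page's position within that task, the mapping does not depend on thread scheduling.
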
@daledude

Is this only for the file input, or is it a general design for parallelism across all plugins?

@frsyuki
Contributor
frsyuki commented Sep 11, 2015

This is a general design that works for all plugins.

@frsyuki frsyuki added the new feature label Sep 15, 2015
@frsyuki frsyuki referenced this issue Dec 1, 2015
Closed

Encryption #344

@frsyuki frsyuki added a commit that referenced this issue Dec 24, 2015
@frsyuki frsyuki LocalExecutorPlugin runs more output threads by scattering input pages
This change implements #232.

LocalExecutorPlugin creates N times more output tasks and runs them in
parallel if the number of input tasks is less than the min_threads option
(N > 1). The default min_threads is the same as the number of CPU cores.

This behavior is deterministic as long as min_threads option is not
changed.
e1c57d7
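
One way the factor N from the commit message might be derived is sketched below. The commit only states that N > 1 when the number of input tasks is less than min_threads; the ceiling division used here, the class `ScatterCount`, and the method `scatterCount` are assumptions for illustration and may not match the committed code.

```java
// Hypothetical sketch: choosing how many output tasks to create per input task.
// The formula is an assumption; the commit message does not spell it out.
final class ScatterCount {
    static int scatterCount(int inputTaskCount, int minThreads) {
        if (inputTaskCount > 0 && inputTaskCount < minThreads) {
            // Ceiling of minThreads / inputTaskCount, so that
            // inputTaskCount * N is at least minThreads.
            return (minThreads + inputTaskCount - 1) / inputTaskCount;
        }
        // Enough input tasks already: no scattering.
        return 1;
    }

    public static void main(String[] args) {
        // The commit says min_threads defaults to the number of CPU cores.
        int minThreads = Runtime.getRuntime().availableProcessors();
        int inputTaskCount = 1; // e.g. a single big file
        int n = scatterCount(inputTaskCount, minThreads);
        System.out.println("output tasks: " + (inputTaskCount * n));
    }
}
```

Under this assumption N depends only on the input task count and min_threads, which is consistent with the note that the behavior stays deterministic as long as min_threads is not changed.
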