Pooling tasks issue #16

Closed
zatsepinvl opened this issue Apr 25, 2019 · 7 comments

@zatsepinvl

Hi! I have noticed that the Pooler implementation assumes it handles only one task at a time:
https://github.com/piercus/step-function-worker/blob/master/lib/pooler.js#L71
'pool should not be called when task on going'

What is the reason for this approach?

@piercus
Owner

piercus commented Apr 25, 2019

@zatsepinvl Thanks for your question.

You can run multiple tasks at a time using the worker's concurrency option:

var StepFunctionWorker = require('step-function-worker');

var worker = new StepFunctionWorker({
  activityArn : '<activity-ARN>',
  workerName : 'workerName',
  fn : fn, // your task handler, defined elsewhere
  concurrency : 2 // default is 1
});

You can increase it (e.g. concurrency: 100) to run many tasks in parallel.

1 worker = multiple poolers = multiple parallel tasks.

1 pooler = 0 or 1 task

Horizontal scaling (parallel tasking) is managed by the number of poolers per worker, which is set by the worker's concurrency parameter; see https://github.com/piercus/step-function-worker/blob/master/lib/worker.js#L88
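To make the "1 pooler = 0 or 1 task" relation concrete, here is a minimal simulation in plain Node.js. It is an illustrative model, not the library's code: `FakeActivity`, `runWorker` and the `stats` counters are hypothetical stand-ins, and no AWS calls are made. It only captures the idea that a pooler stops polling while it holds a task, so the number of poolers caps the number of parallel tasks.

```javascript
// Illustrative model of "1 pooler = 0 or 1 task"; FakeActivity is a stand-in
// for the AWS activity long poll, not part of step-function-worker.
class FakeActivity {
  constructor(taskCount) {
    this.queue = Array.from({ length: taskCount }, (_, id) => ({ id }));
  }

  // Resolves with the next task, or null once the queue is empty.
  async getActivityTask() {
    return this.queue.shift() || null;
  }
}

class Pooler {
  constructor(activity, fn, stats) {
    this.activity = activity;
    this.fn = fn;
    this.stats = stats;
    this.task = null; // a pooler holds 0 or 1 task
  }

  async run() {
    for (;;) {
      const task = await this.activity.getActivityTask();
      if (task === null) return;
      this.task = task; // the pooler is now "taken"
      this.stats.inFlight++;
      this.stats.maxInFlight = Math.max(this.stats.maxInFlight, this.stats.inFlight);
      try {
        await this.fn(task); // no new poll until this task is done
      } finally {
        this.stats.inFlight--;
        this.task = null;
      }
    }
  }
}

// Hypothetical helper: concurrency = number of poolers = max parallel tasks.
async function runWorker(activity, fn, concurrency) {
  const stats = { inFlight: 0, maxInFlight: 0 };
  const poolers = Array.from({ length: concurrency }, () => new Pooler(activity, fn, stats));
  await Promise.all(poolers.map((pooler) => pooler.run()));
  return stats;
}
```

For example, `runWorker(new FakeActivity(10), () => new Promise((r) => setTimeout(r, 10)), 2)` resolves with `maxInFlight` equal to 2: ten tasks are available, but only the two poolers can each hold one.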

The word "pooler" comes from the concept of long polling, which is used to request tasks from an AWS Step Functions activity.

The pooler's role is to ask the AWS Step Functions activity for a task. Once a task is assigned to the pooler, the pooler is "taken" and should not call the "pool" method again while that task is running.
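The "taken" rule can be pictured as a small guard on the pooler. This is a hypothetical sketch, not the code in lib/pooler.js: only the error message is copied from the assertion quoted at the top of this issue, while `getTask`, `release()` and the state handling are assumptions made for illustration.

```javascript
// Hypothetical sketch of the "taken" rule; only the error text mirrors
// the assertion quoted from lib/pooler.js.
class Pooler {
  constructor(getTask) {
    this.getTask = getTask; // stand-in for one long-poll request to the activity
    this.task = null;
  }

  // Ask the activity for a task; refused while a task is assigned.
  async pool() {
    if (this.task !== null) {
      throw new Error('pool should not be called when task on going');
    }
    this.task = await this.getTask(); // null when the long poll times out
    return this.task;
  }

  // Called once the assigned task has succeeded or failed.
  release() {
    this.task = null;
  }
}
```

With this guard, a second `pool()` call before `release()` rejects, which is exactly the 0-or-1 relation between a pooler and a task.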

Any suggestions are welcome.

@zatsepinvl
Author

zatsepinvl commented Apr 25, 2019

Great explanation, thanks!

My issue is about the following case.

Imagine that there are about 10k step function executions running at the same time. Every execution includes an activity task. Handling each task may involve waiting a long time for user input (up to the activity task timeout). So, if the pooler-to-task relation is 1 to 0 or 1, the actual number of concurrent executions is limited by the worker's concurrency configuration. Is it right?

What is the maximum value of concurrency?

And more generally, can it be implemented as 1 pooler to 0 or n tasks so execution concurrency is limited only by runtime performance capabilities?

@piercus
Owner

piercus commented Apr 26, 2019

Is it right?

Yes, it is right.

What is the maximum value of concurrency?

Each pooler keeps an HTTP request open, so concurrency is limited by the maximum number of concurrent HTTP requests your environment allows.

And more generally, can it be implemented as 1 pooler to 0 or n tasks so execution concurrency is limited only by runtime performance capabilities?

I agree that we need to change the design.

Use case info

I'd like to know more about your use case. step-function-worker is useful when the processing does not "fit" into a Lambda function. In my personal use cases, the processing was CPU-intensive and I could not run 10k tasks in parallel, so I used very low concurrency values (1, 2 or 3 at most). Can you please explain your use case in more detail? Do you prefer activities over Lambda, or is there a specific reason that makes Lambda unusable in your use case?

Proposal

Here is a proposal for a new design; please confirm that it addresses your concerns.

Current

  • The worker's concurrency parameter limits 2 different things:
    • The number of parallel pooling requests made to the AWS Step Functions activity
    • The number of parallel tasks
  • It is not possible to have 'unlimited' parallel tasks
  • The default number of parallel tasks is 1
  • It is not possible to have more tasks than the maximum number of HTTP connections
  • A task is a 0-1 child of a pooler

Expected

  • 2 different concurrency parameters should be used:
    • concurrency should be marked as deprecated and replaced by taskConcurrency
    • taskConcurrency (default 1, for backward compatibility) will control the number of concurrent tasks
    • poolConcurrency (default 1) will control the number of parallel poolers
  • It is possible to have 'unlimited' parallel tasks (by setting taskConcurrency: null)
  • The default number of parallel tasks is 1
  • It is possible to have as many parallel tasks as needed
  • Tasks and poolers are 0-n children of the worker (worker.tasks and worker.poolers)
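A rough sketch of how the proposed design could decouple poolers from tasks. Only `taskConcurrency`, `poolConcurrency`, `worker.tasks` and `worker.poolers` come from the proposal; `getTask`, `fn`, `maxTasks`, the slot-reservation logic and the fake in-memory queue are assumptions for illustration, and this is not the 3.0 implementation. Each polling loop reserves a task slot before issuing a request, hands any task it receives to the worker, and immediately polls again.

```javascript
// Hypothetical sketch of the proposed design, not the 3.0 code.
class Worker {
  constructor({ getTask, fn, poolConcurrency = 1, taskConcurrency = 1 }) {
    this.getTask = getTask; // assumed: one long poll, resolves to a task or null
    this.fn = fn; // task handler, may return a promise
    this.poolConcurrency = poolConcurrency;
    this.taskConcurrency = taskConcurrency; // null = unlimited
    this.tasks = []; // 0..n running tasks
    this.poolers = []; // 0..n polling loops (their promises)
    this.polling = 0; // polls in flight, each reserving a task slot
    this.waiters = []; // loops blocked until a slot frees up
    this.maxTasks = 0; // peak concurrent tasks, for the demo
  }

  hasCapacity() {
    return this.taskConcurrency === null ||
      this.tasks.length + this.polling < this.taskConcurrency;
  }

  nextRelease() {
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  wakeAll() {
    this.waiters.splice(0).forEach((wake) => wake());
  }

  async runPooler() {
    for (;;) {
      // Never poll for a task there is no slot for.
      while (!this.hasCapacity()) await this.nextRelease();
      this.polling++;
      const task = await this.getTask();
      this.polling--;
      if (task === null) {
        this.wakeAll(); // the reserved slot is free again
        return; // demo only: a real pooler would keep polling
      }
      this.startTask(task); // hand off, then go straight back to polling
    }
  }

  startTask(task) {
    this.tasks.push(task);
    this.maxTasks = Math.max(this.maxTasks, this.tasks.length);
    Promise.resolve()
      .then(() => this.fn(task))
      .catch(() => {}) // a real worker would report the failure
      .then(() => {
        this.tasks.splice(this.tasks.indexOf(task), 1);
        this.wakeAll();
      });
  }

  // Resolves once all polling loops have stopped and all tasks have finished.
  async start() {
    for (let i = 0; i < this.poolConcurrency; i++) {
      this.poolers.push(this.runPooler());
    }
    await Promise.all(this.poolers);
    while (this.tasks.length > 0) await this.nextRelease();
  }
}

// Demo helpers: a fake in-memory queue instead of AWS.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const fakeQueue = (n) => {
  const items = Array.from({ length: n }, (_, id) => ({ id }));
  return async () => items.shift() || null;
};

const worker = new Worker({
  getTask: fakeQueue(10),
  fn: () => sleep(20), // e.g. waiting for user input
  poolConcurrency: 1,
  taskConcurrency: null, // unlimited
});
worker.start().then(() => console.log(worker.maxTasks)); // 10
```

Here a single pooler ends up holding 10 tasks at once. Reserving the slot before polling (the `polling` counter) keeps several poolers from each receiving a task when only one slot is left; with `taskConcurrency: 3` and `poolConcurrency: 2`, `maxTasks` never exceeds 3.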

Next step

  • Please confirm or comment on this proposal
  • If you want to open a PR for this, I would be glad to review it, but please add unit tests
  • Otherwise I will do this redesign myself, but I can't start working on it before mid-May 2019

@zatsepinvl
Author

Use case info

My use case is a chat-bot. I have a step function that describes the conversation flow and an activity that acts as a message dispatcher. The dispatcher is responsible for listening to user messages and responding to them. The process can be described like this: [task] Some question for the user to fill in -> [activity] (dispatcher pools the task -> sends the message from the task to the user -> waits for the response -> returns the user's response as the result of the task). Assuming the chat-bot is used by 1M users, 10k active executions is quite a reasonable figure. In this case, the number of active executions is the number of users the service can serve at once.
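Because each open task in this flow is mostly waiting rather than computing, the dispatcher side can be sketched as a map of pending answers. This is a hypothetical illustration, assuming a promise-returning task handler and at most one open question per user; `Dispatcher`, `sendMessage`, `handleTask` and `onUserMessage` are invented names, not part of step-function-worker, and the exact handler signature the library expects is not shown here.

```javascript
// Hypothetical dispatcher; sendMessage and the webhook wiring are
// assumptions, not part of step-function-worker.
class Dispatcher {
  constructor(sendMessage) {
    this.sendMessage = sendMessage; // delivers a message to the chat platform
    this.pending = new Map(); // userId -> resolve() of the open task
  }

  // Activity task handler: ask the question, then wait for the answer.
  // The task stays open (and keeps its slot) until the user replies.
  // Assumes at most one open question per user.
  handleTask({ userId, question }) {
    return new Promise((resolve) => {
      this.pending.set(userId, resolve);
      this.sendMessage(userId, question);
    });
  }

  // Called when a user message arrives; returns false if no task is waiting.
  onUserMessage(userId, text) {
    const resolve = this.pending.get(userId);
    if (!resolve) return false;
    this.pending.delete(userId);
    resolve({ answer: text }); // becomes the activity task output
    return true;
  }
}
```

Each entry in `pending` costs almost no CPU, but under the pre-3.0 design each one would also pin a pooler, which is why 10k waiting users would require a concurrency of at least 10k.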

Proposal

The expected design looks appropriate for my use case.

@piercus
Owner

piercus commented May 20, 2019

Hello @zatsepinvl

I have released a 3.0 alpha: https://github.com/piercus/step-function-worker/releases/tag/v3.0-alpha

Does it address your concerns?

@zatsepinvl
Author

Thank you! Looks appropriate.

piercus added a commit that referenced this issue Oct 19, 2021
BREAKING CHANGE: this is a breaking change
Redesign the concurrency architecture for 3.0 following #16
piercus added a commit that referenced this issue Oct 19, 2021
@piercus
Owner

piercus commented Oct 19, 2021

@zatsepinvl It's been a while, but for your information, v3.0 was released today.

Thank you for your help on this.
