Worker count for a code package? #64

Open
calebjclark opened this issue Apr 11, 2013 · 8 comments

@calebjclark

We are following Iron's suggestion of pushing high-volume tasks to a queue on IronMQ and then firing up workers to process those messages. However, there is no easy way to see how many workers are running or queued for a specific code package. The only way we could find is to request all tasks and loop through them, counting how many contain the code_id we're looking for.

This doesn't seem very efficient, either for us or for Iron. What would you recommend? We don't want to fire up a new worker whenever there are messages in IronMQ, because we will have a high volume of messages that each require a very small amount of processing.

@treeder (Contributor) commented Apr 11, 2013

Hi @calebjclark, we will soon have a feature made just for this (scaling workers based on queue sizes), but for now, would a scheduled worker do the trick? Like schedule a worker every minute and have it run until the queue is empty. If the queue grows, you may get more than one running at once (if the first one doesn't finish within a minute), which is good, though, as it will go through the queue faster. Sort of an auto-scaling hack.
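
A minimal sketch of that drain-until-empty worker in Python, using the standard library's queue as a stand-in for IronMQ (the real IronMQ client calls will differ):

```python
import queue

# Stand-in for an IronMQ queue; swap in the real IronMQ client in production.
mq = queue.Queue()

def process(message):
    print("processing", message)  # application-specific work goes here

# The scheduled worker: runs every minute, drains the queue, then exits.
def run():
    while True:
        try:
            msg = mq.get_nowait()  # grab the next message if there is one
        except queue.Empty:
            break                  # queue is empty: let this run die
        process(msg)
        mq.task_done()             # acknowledge (with IronMQ: delete the message)
```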

@calebjclark (Author)

No. Our volume is going to be very uneven. We'll likely have long stretches of time with no tasks and then suddenly need to fire up 10 or 20 workers to handle the volume. A scheduled worker would do little for us during the stretches of no work and underperform during heavy volume.

@treeder (Contributor) commented Apr 11, 2013

How about a master/slave pattern (very common)? The scheduled worker, the master, starts every minute, checks the queue info endpoint to get the queue size, then spawns some number of slave workers. If the number is small, it queues up just one slave; if it's large, it queues up a bunch.
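
A sketch of that master in Python, assuming hypothetical queue_info_size() and spawn_slave() helpers standing in for the IronMQ queue-info and IronWorker task-queuing calls (not the actual client APIs), with one slave per 100 queued messages as in the example further down:

```python
import math

MSGS_PER_SLAVE = 100  # one slave per 100 queued messages; tune per workload

def queue_info_size():
    """Hypothetical: hit the IronMQ queue info endpoint, return the size."""
    ...

def spawn_slave():
    """Hypothetical: queue one slave task via the IronWorker API."""
    ...

# The master, scheduled to run every minute.
def run_master():
    size = queue_info_size() or 0
    n_slaves = math.ceil(size / MSGS_PER_SLAVE)  # 200 msgs -> 2, 1000 -> 10
    for _ in range(n_slaves):
        spawn_slave()
```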

@calebjclark (Author)

It seems that under sustained volume, if the master executes the logic you describe with no knowledge of how many slaves are already running, it's only a matter of time before the ratio approaches one worker per message.

@carimura (Contributor)

A simple slave counter in IronCache?
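
A sketch of that idea in Python, assuming hypothetical cache_increment() and cache_get() wrappers over IronCache's atomic increment (not the actual client API): each slave bumps a shared counter on start and decrements it on exit, and the master subtracts the running count before spawning.

```python
RUNNING_KEY = "slaves_running"  # counter key in IronCache

def cache_increment(key, amount):
    """Hypothetical wrapper over IronCache's atomic increment."""
    ...

def cache_get(key):
    """Hypothetical wrapper: current counter value, or None if unset."""
    ...

def drain_queue():
    """The while-the-queue-is-not-empty loop sketched earlier."""
    ...

# Slave side: bracket the work so the counter survives failures.
def run_slave():
    cache_increment(RUNNING_KEY, 1)
    try:
        drain_queue()
    finally:
        cache_increment(RUNNING_KEY, -1)

# Master side: only spawn enough slaves to cover the shortfall.
def slaves_to_spawn(queue_size, msgs_per_slave=100):
    needed = -(-queue_size // msgs_per_slave)  # ceiling division
    running = cache_get(RUNNING_KEY) or 0
    return max(needed - running, 0)
```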

@treeder (Contributor) commented Apr 11, 2013

I don't think it would ever get to one worker per message unless you had a bad scheme for doing this. I don't know your use case, but here's a simple example:

Let's say you spawn one worker for every 100 messages in queue. Then:

time 0, queue size 0: no slaves
time 60, queue size 500: +5 slaves = 5 slaves
# if the queue keeps growing even with 5 workers cranking through it, you must have a lot of messages coming in, so case A:
time 120, queue size 1000: +10 slaves = 15 slaves
# if 5 workers can reduce the queue, then case B:
time 120, queue size 200: +2 slaves = 7 slaves
# now either 7 workers can eat through the queue or they can't, but let's say they can, so they all die off and we're at 0 slaves by the time the next master runs:
time 180, queue size 0: +0 slaves = 0 slaves

And so on, and so on.

@featalion

+1 to Travis' last solution. One note: have your slaves loop with a "while the queue is not empty" condition rather than taking a fixed N messages and exiting, but spawn slaves based on the number of messages per worker, as Travis recommended. That way you get automatic scaling on the worker side. The number of messages per worker (and therefore the number of slaves to launch) depends on how long each message takes to process. If the tasks are really heavy, you may hit your concurrency limit (which depends on the plan you're using). So if you can predict the average high number of messages and the time to process each one, you can calculate the maximum concurrency you'll need during message-queue spikes (and keep in mind that it's an average high, so adding a safety factor to the concurrency is probably a good idea).

@featalion

Concurrency calculation example:

  • Assume the queue's average spike level is 3000 msgs/s
  • Assume one worker can process 10 msgs/s
  • Assume your master worker is scheduled to run every minute (60 s)
  • Assume a safety factor of 0.15 (15%) for concurrency

Then peak concurrency:

concurrency = (1 + 0.15) * 3000 / 10 = 345

Each worker handles 10 msgs/s * 60 s = 600 messages per master run, so the number of slaves to spawn each minute is:

N_slaves = N_msgs_now_in_queue / 600

Of course, you can extend your master worker with more complex prediction. IronCache would be a pretty good place to store that kind of information, and a free account will be enough ;)
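
The same arithmetic as a quick Python sketch, with the assumed numbers above as inputs (the 4200 queued messages is just a hypothetical snapshot):

```python
# Assumed inputs from the example above.
spike_msgs_per_sec = 3000   # average spike inflow
worker_msgs_per_sec = 10    # throughput of one worker
master_period_sec = 60      # the master runs once a minute
safety_factor = 0.15        # 15% headroom

# Peak concurrency needed to keep up with the spike.
peak_concurrency = (1 + safety_factor) * spike_msgs_per_sec / worker_msgs_per_sec
print(peak_concurrency)     # 345.0

# Messages one worker gets through per master run.
msgs_per_worker = worker_msgs_per_sec * master_period_sec
print(msgs_per_worker)      # 600

# Slaves to spawn for the current queue depth (hypothetical snapshot).
n_msgs_now_in_queue = 4200
n_slaves = -(-n_msgs_now_in_queue // msgs_per_worker)  # ceiling division
print(n_slaves)             # 7
```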
