
Support "baskets": long term scheduler backends #53

Open

epwalsh opened this issue Jan 23, 2020 · 2 comments

Comments
@epwalsh
Member

epwalsh commented Jan 23, 2020

The current system for handling tasks with a far-off ETA (essentially what Python Celery does as well) is to have workers consume them as usual but use the delay_for async function to postpone execution until the ETA is reached, which could be a while.
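
To make that concrete, here is a minimal sketch of the hold-and-sleep pattern using tokio::time::sleep (the current name for what tokio 0.2 called delay_for); Task, execute, and run_when_due are placeholders for illustration, not actual crate APIs:

```rust
use std::time::{SystemTime, UNIX_EPOCH};
use tokio::time::{sleep, Duration};

// Placeholder task type and handler, just for illustration; the real
// crate has its own task representation.
struct Task {
    name: String,
    eta_unix: u64, // ETA as seconds since the Unix epoch
}

async fn execute(task: &Task) {
    println!("executing {}", task.name);
}

// The current approach in a nutshell: the worker holds the task and
// sleeps until the ETA, occupying a prefetch slot for the whole wait.
async fn run_when_due(task: Task) {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs();
    if task.eta_unix > now {
        sleep(Duration::from_secs(task.eta_unix - now)).await;
    }
    execute(&task).await;
}

#[tokio::main]
async fn main() {
    // An ETA of 0 is already in the past, so this runs immediately.
    run_when_due(Task { name: "send_email".into(), eta_unix: 0 }).await;
}
```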

In the meantime that task is still taking up resources on a worker and, more importantly, is undermining the backpressure mechanism - the prefetch_count configuration setting - because in order to keep consuming other tasks while it holds onto delayed ones, the worker has to tell the broker to increase its channel's prefetch_count behind the scenes.

If it didn't do this and the worker kept receiving tasks with a far-off ETA, the initial prefetch_count would soon be reached and the broker would stop sending tasks to this worker. Unless more workers were spun up, messages would then pile up on the broker since it has nowhere to send them, so no new tasks (even those without a future ETA) could be executed until the worker works through some of the tasks it is holding onto.

So we choose the lesser of two evils: increasing the prefetch_count. In other words, the worker says "hey, thanks for this task, but I can't do anything with it right now, so do you want to just give me another one?". And that's all fine unless there are a ton of tasks with a far-off ETA, in which case the worker will keep taking in more of them until it runs out of memory.
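
The bookkeeping behind that bump looks roughly like the sketch below; Broker and set_prefetch_count are hypothetical stand-ins for whatever the broker layer exposes (on AMQP this would boil down to re-issuing basic.qos on the worker's channel):

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// Hypothetical broker handle; `set_prefetch_count` stands in for an
// AMQP basic.qos call on the worker's channel.
pub struct Broker {
    pub base_prefetch: u16,
}

impl Broker {
    pub fn set_prefetch_count(&self, count: u16) {
        println!("basic.qos prefetch_count={}", count);
    }
}

// Number of tasks currently parked on this worker waiting for their ETA.
static DELAYED_IN_FLIGHT: AtomicU16 = AtomicU16::new(0);

// Each time the worker decides to hold onto a delayed task, it bumps the
// channel's prefetch_count so the broker keeps sending it other tasks.
pub fn on_delayed_task_received(broker: &Broker) {
    let delayed = DELAYED_IN_FLIGHT.fetch_add(1, Ordering::SeqCst) + 1;
    broker.set_prefetch_count(broker.base_prefetch + delayed);
}

// When a delayed task finally runs (or is revoked), the count comes back down.
pub fn on_delayed_task_finished(broker: &Broker) {
    let delayed = DELAYED_IN_FLIGHT.fetch_sub(1, Ordering::SeqCst) - 1;
    broker.set_prefetch_count(broker.base_prefetch + delayed);
}
```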

The solution is to offload those tasks - the ones with a far-off ETA - somewhere else: someplace where the chance of running out of memory or storage space is a lot lower, and where additional memory or storage is a lot cheaper.

Any traditional database works well for this as long as you can index by the ETA, which you should be able to do with pretty much any database since the ETA can be represented as an integer or float. Then workers (or a dedicated worker solely for this purpose) just need to poll the database occasionally for tasks that are due soon, as in the sketch below.
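
For instance, the storage side could be a single table indexed on the ETA. The schema and poll query below are an illustrative, Postgres-flavored sketch kept in Rust string constants; none of the names come from the crate:

```rust
// Illustrative schema: one table of serialized tasks, indexed on the ETA so
// "what is due by now?" is a cheap range scan.
pub const CREATE_BASKET_TABLE: &str = r#"
    CREATE TABLE delayed_tasks (
        id       TEXT   PRIMARY KEY,
        eta_unix BIGINT NOT NULL,
        payload  BYTEA  NOT NULL
    );
    CREATE INDEX delayed_tasks_eta_idx ON delayed_tasks (eta_unix);
"#;

// Poll query a worker (or a dedicated scheduler process) would run on an
// interval, re-publishing whatever comes back to the normal queue.
pub const POLL_DUE_TASKS: &str = r#"
    SELECT id, payload
    FROM delayed_tasks
    WHERE eta_unix <= $1
    ORDER BY eta_unix
    LIMIT 100;
"#;
```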

@epwalsh
Member Author

epwalsh commented Jan 29, 2020

I think we could call these long term scheduler backends "baskets", because they are like a basket that we put tasks aside in for later.
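
One way to picture such a backend's interface, purely as a sketch (the trait, its method names, and the async_trait dependency are assumptions for illustration, not anything settled in this issue):

```rust
use async_trait::async_trait;

/// Hypothetical "basket" backend: a long-term store that tasks with a
/// far-off ETA get parked in, then drained as they come due.
#[async_trait]
pub trait Basket: Send + Sync {
    type Error;

    /// Put a serialized task aside until its ETA (seconds since the epoch).
    async fn put(&self, eta_unix: u64, payload: Vec<u8>) -> Result<(), Self::Error>;

    /// Take out every task whose ETA is at or before `now_unix`, so the
    /// caller can re-publish it to the normal queue.
    async fn take_due(&self, now_unix: u64) -> Result<Vec<Vec<u8>>, Self::Error>;
}
```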

@epwalsh epwalsh changed the title from "Support long term scheduler backends" to "Support "baskets": long term scheduler backends" on Jan 29, 2020
@edulix

edulix commented Dec 14, 2023

Any update on this?
