Latest Honest Status #226

ja-gooding · 2022-09-18T22:18:33Z

What are the honest truths of the state of this project and its unit-tested capabilities for HA?

As far as I am concerned, "django-celery-beat" is a lost cause. It has had issues with basic functionality, especially High Availability, since 2013 and even earlier: celery/celery#1495 (comment).

It will never be fixed, and I doubt it can even be improved, because I'm not sure it was engineered from the start with that in mind.

I am looking for a project that lets me schedule periodic tasks through the Django ORM with second and millisecond accuracy, just like "django-celery-beat", while also offering optional, modern high availability (HA) / fault tolerance for when the primary beat mechanism fails or needs to move to a different node (e.g., Docker Swarm, k8s pods) for whatever reason (software, hardware, maintenance, disaster recovery, etc.).

Please let me know if this isn't the right project or if one does not exist.

sibson · 2022-09-23T00:31:46Z

Per the license, https://github.com/sibson/redbeat/blob/main/LICENSE#L145, this software is provided "AS IS". I personally don't use the cluster mode; that support was added by others. I can't comment on how they are using it, so I can only assume it's meeting their needs. HA design is complex: both your application and your infrastructure need to be designed to achieve HA across the scenarios you identify. I'd suggest reviewing the code for yourself to determine whether it meets your needs. If you discover issues and are able to provide reasonable patches, I will integrate them.

ja-gooding · 2022-09-23T00:45:27Z

Thank you for the response. I’m an academic researcher and will get back to you on closing this issue out after a more formal evaluation.

chenseanxy · 2022-09-23T14:56:10Z

Hi! I've been running redbeat as a non-critical task scheduler in production for about a year now, and we found redbeat to be generally capable and stable. We were able to scale up to tens of thousands of tasks, with 100+ tasks dispatched every second off a 1-core instance, with a stand-by beat instance waiting on locks in case the main beat instance fails.
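To make the "stand-by waiting on locks" idea concrete, here is a minimal in-memory sketch of an expiring lock shared by two beat instances. It is only an illustration under stated assumptions: `TTLLock`, `tick`, and `simulate` are hypothetical names, time advances in whole-second steps, and none of this is redbeat's actual code (redbeat keeps its lock in Redis).

```python
class TTLLock:
    """In-memory stand-in for an expiring lock (hypothetical, not redbeat's code)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0

    def acquire(self, who, now):
        """Take the lock only if it is free or has expired."""
        if self.owner is None or now >= self.expires_at:
            self.owner = who
            self.expires_at = now + self.ttl
            return True
        return False

    def extend(self, who, now):
        """Renew the expiry, but only for the current, unexpired owner."""
        if self.owner == who and now < self.expires_at:
            self.expires_at = now + self.ttl
            return True
        return False


def tick(lock, who, now):
    """One scheduler loop: renew the lock if we hold it, otherwise try to take it."""
    return lock.extend(who, now) or lock.acquire(who, now)


def simulate(ttl, primary_dies_at, steps):
    """Return the lock owner recorded at each one-second step."""
    lock = TTLLock(ttl)
    holders = []
    for now in range(steps):
        if now < primary_dies_at:
            tick(lock, "primary", now)  # the primary ticks only while alive
        tick(lock, "standby", now)      # the stand-by keeps trying every step
        holders.append(lock.owner)
    return holders
```

With `simulate(ttl=3, primary_dies_at=4, steps=10)`, the primary's last renewal happens at step 3, so the stand-by cannot take over until step 6. During steps 4 and 5 nothing is dispatched, which is the failover gap a lock timeout implies.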

We aren't using clustered Redis for simplicity's sake, but we had a way to automatically refill the tasks if the Redis instance got restarted, and it got us most of the way towards high availability despite the single Redis instance.

If you have lots of task dispatches, monitor how long each tick takes and tune the redbeat_lock_timeout parameter accordingly, so that your instance can get through all the tasks within the timeout.
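As a rough sketch of that tuning advice, the helper below turns measured tick durations into a candidate timeout. The function name, the safety factor, and the formula are assumptions made for illustration, not something redbeat provides or recommends; `max_loop_interval` stands for whatever you have configured as Celery's `beat_max_loop_interval`.

```python
import math


def suggest_lock_timeout(tick_durations, max_loop_interval, safety_factor=2.0):
    """Suggest a lock timeout in whole seconds from observed tick durations.

    Hypothetical heuristic: the lock should outlive the slowest observed tick
    plus the longest sleep between ticks, with some headroom on top.
    """
    if not tick_durations:
        raise ValueError("need at least one measured tick duration")
    worst_tick = max(tick_durations)
    return math.ceil((worst_tick + max_loop_interval) * safety_factor)
```

For example, the result could be assigned to `app.conf.redbeat_lock_timeout` at startup. Measuring the tick durations themselves (for instance by timing the scheduler loop in your logs) is left out here.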

Also, millisecond accuracy is likely a myth in Celery: by the time a worker receives the task and starts executing it, tens of milliseconds will likely have gone by.

sibson closed this as not planned · May 18, 2023
