Multiple tasks do not generate singleton locks #38

Open
denisemauldin opened this issue Mar 5, 2021 · 12 comments

Comments

@denisemauldin

I'm running two EC2 instances connected to one Redis cache. Each EC2 instance runs a celery beat and 4 workers, with celery 4.4.7 and celery-singleton 0.3.1.
I'm trying to use celery-singleton to only queue/run one copy of the task from whichever celery beat schedules first.
I have 14 tasks configured with base=Singleton and lock_expiry > task time.
I see only 10 singleton lock keys in my Redis.
Out of the 14 tasks, 9 of them run twice.
Out of the 14 tasks, 3 have parameters. One of those runs once, the other 2 run twice.

One of the tasks is configured with @app.task(base=Singleton, lock_expiry=60, raise_on_duplicate=True) and it still runs two tasks and does not raise an exception.

What could be going wrong so that my 14 configured CELERY_BEAT_SCHEDULE tasks are generating 23 tasks and 10 singleton locks? Why isn't raise_on_duplicate triggering?
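
For reference, a stripped-down sketch of how these are wired up (task and schedule names here are illustrative, not the real ones):

# tasks.py (illustrative; the real tasks do more work)
from celery import Celery
from celery_singleton import Singleton

app = Celery("myapp", broker="redis://my-redis:6379/0")

@app.task(base=Singleton, lock_expiry=60, raise_on_duplicate=True)
def sync_records():
    ...  # finishes well within the 60-second lock_expiry

# settings.py
CELERY_BEAT_SCHEDULE = {
    "sync-records-every-minute": {
        "task": "myapp.tasks.sync_records",
        "schedule": 60.0,
    },
    # ...13 more entries along these lines
}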

@steinitzu
Owner

Did you find a solution?
If still an issue, could you provide a minimal example to reproduce?

@denisemauldin
Author

I did not find a solution. Reproduction will probably be hard. If you don't have any thoughts on what troubleshooting paths I can go down then you can close this issue.

@jtkaufman737

I'm also in a difficult position to repro, but I have something that at face value seems to be a failure of singletons to block duplicate tasks...

@denisemauldin
Author

We ended up setting an additional Redis cache key ourselves for the tasks that absolutely must not run as duplicates.
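
Roughly this pattern, simplified (the key name and timeout are made up, do_the_work is a placeholder, and `app` is the project's Celery app):

import redis
from celery_singleton import Singleton

r = redis.Redis(host="my-redis", port=6379, db=1)

@app.task(base=Singleton)
def critical_task():
    # Extra guard on top of celery-singleton: only proceed if we can claim the
    # marker key; it auto-expires after 10 minutes in case the worker dies
    # without cleaning up.
    if not r.set("critical_task:running", "1", nx=True, ex=600):
        return  # another copy already holds the key
    try:
        do_the_work()  # placeholder for the real task body
    finally:
        r.delete("critical_task:running")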

@jtkaufman737

@steinitzu it appears that I can reproduce a function that should be a singleton running more than once simultaneously, but before I go further with a reproducible example I want to make sure this isn't the intended behavior - I see functions that should be singletons running simultaneously across separate celery pods. I can't really tell from the docs whether that would be intended for separate pods... any guidance appreciated

@steinitzu
Owner

Hey @jtkaufman737
What do you mean by pods exactly?

The locking is all coordinated centrally in redis. So this shouldn't happen regardless of how many processes you have dispatching jobs or how many workers you have.
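
If you want to sanity-check that, queue the same singleton task twice while the first run is still queued or running. The second delay() should hand back the existing task's AsyncResult (same id) instead of queueing a new run, something like:

from myapp.tasks import my_singleton_task  # hypothetical singleton task

first = my_singleton_task.delay()
second = my_singleton_task.delay()  # duplicate while the first is still queued/running
print(first.id, second.id, first.id == second.id)  # expect True if the lock is working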

@jtkaufman737

Hi, thanks so much for following up, and sorry for the confusing term - we use several containers for our various celery tasks to handle how many we have. It's just Docker, but for some reason my team calls them pods. Not sure why lol. But I take it from your answer that it shouldn't matter as long as they all communicate back to one Redis instance.

@jtkaufman737

For whatever it's worth, I've been able to reproduce this a few times now by adding logging to the beginning and end of a function to mark start/end, and seeing the "FUNCTION STARTING" log multiple times consecutively without the "FUNCTION END" log ever appearing. I know that's not that helpful without a minimal example for you to look at, but it's a proprietary app; I'll work on getting something more demoable. At this point I've been able to trigger this behavior a few times, at least three yesterday (although that was out of many, many runs, so it seems intermittent).

@steinitzu
Owner

I would also log the task ID for this. Check if the tasks have the same ID or different IDs.

This would rule out any issue with celery itself where 2+ workers might consume the same job in parallel.
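
Something along these lines, with bind=True so the request is available (the task body is just a placeholder, `app` is your Celery app):

import logging
from celery_singleton import Singleton

logger = logging.getLogger(__name__)

@app.task(base=Singleton, bind=True)
def slow_compose_v2(self):
    logger.warning("FUNCTION STARTING id=%s", self.request.id)
    try:
        ...  # actual work goes here
    finally:
        logger.warning("FUNCTION END id=%s", self.request.id)

If the overlapping "STARTING" lines show different ids, the duplicate was dispatched as a separate task; if they share an id, the same job was delivered twice.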

@jtkaufman737

jtkaufman737 commented Jun 24, 2021

I think it shouldn't be that... it looks like we already have UUID logging enabled, and one of the things I captured was like this:

celery-slow_2  | [2021-06-22 20:49:40,963: ERROR/ForkPoolWorker-1] Celery job "sde.tasks.slow_compose_v2" (a1abe82f-df12-4405-b18c-98ad4433f1ee) starting: args=[], kwargs={}
celery-slow_2  | [2021-06-22 20:49:40,963: WARNING/ForkPoolWorker-1] SLOW COMPOSE V2 STARTING AT TIME 20:49:40 > > > > > > >  > > > > > > > >  > >
celery-slow_1  | [2021-06-22 20:50:09,576: ERROR/ForkPoolWorker-1] Celery job "sde.tasks.slow_compose_v2_stream" (10a28cef-05ed-44b6-b1f6-9a1df4b51a6f) finished: retval=None
celery-slow_1  | [2021-06-22 20:50:09,580: ERROR/ForkPoolWorker-1] Celery job "sde.tasks.slow_compose_v2" (da2b2885-635d-4726-b14d-e217ed4de7d8) starting: args=[], kwargs={}
celery-slow_1  | [2021-06-22 20:50:09,581: WARNING/ForkPoolWorker-1] SLOW COMPOSE V2 STARTING AT TIME 20:50:09 > > > > > > >  > > > > > > > >  > >

I see that in both of the places where it says "Celery job "sde.tasks.slow_compose_v2" starting", there are different UUIDs and they are in different containers. I'll keep working on a minimal example where you can see this in action. That is, unless I'm misinterpreting what I'm seeing here - I figured it was worth sharing just in case.

@tony
Contributor

tony commented Mar 13, 2022

For those who run into issues troubleshooting this, I've added a request to add logging: #45

This would improve troubleshooting / debuggability and the info provided with questions and bug reports.

The reason is that it's not known for sure whether it's a genuine ID mismatch (logging would help you identify the disparity) or something else - there are too many possibilities.

@juspence

In our case the duplicate tasks had the same UUID. We finally fixed this issue by changing our Celery settings:

# Set a global 1-hour task timeout. You may override this on individual tasks by decorating it with:
# @app.task(soft_time_limit=<TIME_IN_SECONDS>)
CELERY_TASK_SOFT_TIME_LIMIT = 3600

# Some tasks have their time limits overridden as described above; track the longest timeout value we use.
# CELERY_SINGLETON_LOCK_EXPIRY and the Redis visibility timeout must never be less than this value.
CELERY_LONGEST_SOFT_TIME_LIMIT = 10800

# Expire locks after three hours, which is the longest task time limit.
# https://github.com/steinitzu/celery-singleton#app-configuration
CELERY_SINGLETON_LOCK_EXPIRY = CELERY_LONGEST_SOFT_TIME_LIMIT

# Prevent duplicate task runs when a task takes more than 1 hour
# Needed in addition to above because we use late acknowledgments
# https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html#redis-caveats
CELERY_BROKER_TRANSPORT_OPTIONS = {"visibility_timeout": CELERY_LONGEST_SOFT_TIME_LIMIT}

The above is needed if you set CELERY_TASK_ACKS_LATE = True, which causes long-running tasks to remain in the queue until they finish, instead of "until they start".

If a task takes longer than CELERY_TASK_SOFT_TIME_LIMIT (default 1 hour?), then either the CELERY_SINGLETON_LOCK_EXPIRY timeout may be reached (default 1 hour?) or the Redis visibility_timeout may be reached (default 1 hour), and the task will be picked up and executed again by another worker.
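
For completeness, here is roughly how this interacts with an individual task in our setup (the task name is illustrative, and `app` is the project's Celery app):

CELERY_TASK_ACKS_LATE = True  # tasks stay on the queue until they finish, not until they start

# A long-running task that overrides the global 1-hour soft limit; its runtime
# has to stay under CELERY_LONGEST_SOFT_TIME_LIMIT, and therefore under the
# singleton lock expiry and the Redis visibility timeout configured above.
@app.task(base=Singleton, soft_time_limit=CELERY_LONGEST_SOFT_TIME_LIMIT)
def nightly_rebuild():
    ...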

So technically this issue can be closed, since it's a bug in our Celery settings and not in celery-singleton's logic. But I think it would be helpful to document the above, since this took quite a while to uncover.
