Multiple tasks do not generate singleton locks #38

Open
denisemauldin opened this issue Mar 5, 2021 · 12 comments

Comments

@denisemauldin

I'm running two EC2 instances connected to one Redis cache. Each EC2 instance runs a celery beat and 4 workers, with celery 4.4.7 and celery-singleton 0.3.1.
I'm trying to use celery-singleton to only queue/run one copy of the task from whichever celery beat schedules first.
I have 14 tasks configured with base=Singleton and lock_expiry > task time.
I see only 10 singleton lock keys in my Redis.
Out of the 14 tasks, 9 of them run twice.
Out of the 14 tasks, 3 have parameters. One of those runs once, the other 2 run twice.

One of the tasks is configured with @app.task(base=Singleton, lock_expiry=60, raise_on_duplicate=True) and it still runs two tasks and does not raise an exception.

What could be going wrong so that my 14 configured CELERY_BEAT_SCHEDULE tasks are generating 23 tasks and 10 singleton locks? Why isn't raise_on_duplicate triggering?
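
For reference, a stripped-down sketch of how these are wired up (task and schedule names here are illustrative, not the real ones):

# tasks.py (illustrative; the real tasks do more work)
from celery import Celery
from celery_singleton import Singleton

app = Celery("myapp", broker="redis://my-redis:6379/0")

@app.task(base=Singleton, lock_expiry=60, raise_on_duplicate=True)
def sync_records():
    ...  # finishes well within the 60-second lock_expiry

# settings.py
CELERY_BEAT_SCHEDULE = {
    "sync-records-every-minute": {
        "task": "myapp.tasks.sync_records",
        "schedule": 60.0,
    },
    # ...13 more entries along these lines
}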

@steinitzu
Owner

Did you find a solution?
If still an issue, could you provide a minimal example to reproduce?

@denisemauldin
Author

I did not find a solution. Reproduction will probably be hard. If you don't have any thoughts on what troubleshooting paths I can go down then you can close this issue.

@jtkaufman737

I'm also in a difficult position to repro, but I have something that at face value seems to be a failure of singletons to block duplicate tasks...

@denisemauldin
Author

We ended up setting an additional Redis cache key ourselves for the tasks that absolutely must not run as duplicates.
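
Roughly this pattern, simplified (the key name and timeout are made up, do_the_work is a placeholder, and `app` is the project's Celery app):

import redis
from celery_singleton import Singleton

r = redis.Redis(host="my-redis", port=6379, db=1)

@app.task(base=Singleton)
def critical_task():
    # Extra guard on top of celery-singleton: only proceed if we can claim the
    # marker key; it auto-expires after 10 minutes in case the worker dies
    # without cleaning up.
    if not r.set("critical_task:running", "1", nx=True, ex=600):
        return  # another copy already holds the key
    try:
        do_the_work()  # placeholder for the real task body
    finally:
        r.delete("critical_task:running")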

@jtkaufman737

@steinitzu it appears that I can reproduce a function that should be a singleton running more than once simultaneously, but before I go further with a reproducible example I want to make sure this isn't the intended behavior - I see functions that should be singletons running simultaneously across separate celery pods. I can't really tell from the docs whether that would be intended for separate pods... any guidance appreciated

@steinitzu
Owner

Hey @jtkaufman737
What do you mean by pods exactly?

The locking is all coordinated centrally in redis. So this shouldn't happen regardless of how many processes you have dispatching jobs or how many workers you have.
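
If you want to sanity-check that, queue the same singleton task twice while the first run is still queued or running. The second delay() should hand back the existing task's AsyncResult (same id) instead of queueing a new run, something like:

from myapp.tasks import my_singleton_task  # hypothetical singleton task

first = my_singleton_task.delay()
second = my_singleton_task.delay()  # duplicate while the first is still queued/running
print(first.id, second.id, first.id == second.id)  # expect True if the lock is working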

@jtkaufman737

Hi, thanks so much for following up, and sorry for the confusing term - we use several containers for our various celery tasks to handle how many we have. It's just Docker, but for some reason my team calls them pods. Not sure why lol. But I take it from your answer that it shouldn't matter as long as they all communicate back to one Redis instance.

@jtkaufman737

For whatever it's worth, I've been able to reproduce this a few times now by adding logging to the beginning and end of a function to mark start/end, and seeing the "FUNCTION STARTING" log multiple times consecutively without the "FUNCTION END" log ever appearing. I know that's not that helpful without a minimal example for you to look at, but it's a proprietary app; I'll work on getting something more demoable. At this point I've been able to trigger this behavior a few times, at least three yesterday (although that was out of many, many runs, so it seems intermittent).

@steinitzu
Owner

I would also log the task ID for this. Check if the tasks have the same ID or different IDs.

This would rule out any issue with celery itself where 2+ workers might consume the same job in parallel.
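
Something along these lines, with bind=True so the request is available (the task body is just a placeholder, `app` is your Celery app):

import logging
from celery_singleton import Singleton

logger = logging.getLogger(__name__)

@app.task(base=Singleton, bind=True)
def slow_compose_v2(self):
    logger.warning("FUNCTION STARTING id=%s", self.request.id)
    try:
        ...  # actual work goes here
    finally:
        logger.warning("FUNCTION END id=%s", self.request.id)

If the overlapping "STARTING" lines show different ids, the duplicate was dispatched as a separate task; if they share an id, the same job was delivered twice.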

@jtkaufman737

jtkaufman737 commented Jun 24, 2021

I think it shouldn't be that... it looks like we already have UUID logging enabled, and one of the things I captured was like this:

celery-slow_2  | [2021-06-22 20:49:40,963: ERROR/ForkPoolWorker-1] Celery job "sde.tasks.slow_compose_v2" (a1abe82f-df12-4405-b18c-98ad4433f1ee) starting: args=[], kwargs={}
celery-slow_2  | [2021-06-22 20:49:40,963: WARNING/ForkPoolWorker-1] SLOW COMPOSE V2 STARTING AT TIME 20:49:40 > > > > > > >  > > > > > > > >  > >
celery-slow_1  | [2021-06-22 20:50:09,576: ERROR/ForkPoolWorker-1] Celery job "sde.tasks.slow_compose_v2_stream" (10a28cef-05ed-44b6-b1f6-9a1df4b51a6f) finished: retval=None
celery-slow_1  | [2021-06-22 20:50:09,580: ERROR/ForkPoolWorker-1] Celery job "sde.tasks.slow_compose_v2" (da2b2885-635d-4726-b14d-e217ed4de7d8) starting: args=[], kwargs={}
celery-slow_1  | [2021-06-22 20:50:09,581: WARNING/ForkPoolWorker-1] SLOW COMPOSE V2 STARTING AT TIME 20:50:09 > > > > > > >  > > > > > > > >  > >

I see that in both of the places where it says "Celery job "sde.tasks.slow_compose_v2" starting", there are different UUIDs and they are in different containers. I'll keep working on a minimal example where you can see this in action. That is, unless I'm misinterpreting what I'm seeing here - I figured it was worth sharing just in case.

@tony
Contributor

tony commented Mar 13, 2022

For those who run into issues troubleshooting this, I've added a request to add logging: #45

This would improve troubleshooting / debuggability and the info provided with questions and bug reports.

The reason is that it's not known for sure whether it's a genuine ID mismatch (logging would help you identify the disparity) or something else - there are too many possibilities.

@juspence

In our case the duplicate tasks had the same UUID. We finally fixed this issue by changing our Celery settings:

# Set a global 1-hour task timeout. You may override this on individual tasks by decorating it with:
# @app.task(soft_time_limit=<TIME_IN_SECONDS>)
CELERY_TASK_SOFT_TIME_LIMIT = 3600

# Some tasks have their time limits overridden as described above; track the longest timeout value we use.
# CELERY_SINGLETON_LOCK_EXPIRY and the Redis visibility timeout must never be less than this value.
CELERY_LONGEST_SOFT_TIME_LIMIT = 10800

# Expire locks after three hours, which is the longest task time limit.
# https://github.com/steinitzu/celery-singleton#app-configuration
CELERY_SINGLETON_LOCK_EXPIRY = CELERY_LONGEST_SOFT_TIME_LIMIT

# Prevent duplicate task runs when a task takes more than 1 hour
# Needed in addition to above because we use late acknowledgments
# https://docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html#redis-caveats
CELERY_BROKER_TRANSPORT_OPTIONS = {"visibility_timeout": CELERY_LONGEST_SOFT_TIME_LIMIT}

The above is needed if you set CELERY_TASK_ACKS_LATE = True, which causes long-running tasks to remain in the queue until they finish, instead of "until they start".

If a task takes longer than CELERY_TASK_SOFT_TIME_LIMIT (default 1 hour?), then either the CELERY_SINGLETON_LOCK_EXPIRY timeout may be reached (default 1 hour?) or the Redis visibility_timeout may be reached (default 1 hour), and the task will be picked up and executed again by another worker.
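
For completeness, here is roughly how this interacts with an individual task in our setup (the task name is illustrative, and `app` is the project's Celery app):

CELERY_TASK_ACKS_LATE = True  # tasks stay on the queue until they finish, not until they start

# A long-running task that overrides the global 1-hour soft limit; its runtime
# has to stay under CELERY_LONGEST_SOFT_TIME_LIMIT, and therefore under the
# singleton lock expiry and the Redis visibility timeout configured above.
@app.task(base=Singleton, soft_time_limit=CELERY_LONGEST_SOFT_TIME_LIMIT)
def nightly_rebuild():
    ...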

So technically this issue can be closed, since it's a bug in our Celery settings and not in celery-singleton's logic. But I think it would be helpful to document the above, since this took quite a while to uncover.
