Jobs run concurrently with the same signature #46
The job signature is only used when you create a new job, and it only prevents creation if there's already a 'new' job with that signature in place. In other words, if a job is already running, you can still queue up a new job with the same signature. Do you have multiple queue executors configured? Or is it that you only want the job to be added once the current one finishes? For the second case, I generally have the job create a new version of itself when it finishes.
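To make the signature rule above concrete, here is a minimal in-memory sketch in Python. It is not the module's code: the `Job` and `Queue` classes, the status strings, and the choice to return the existing job rather than reject the request are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    signature: str
    status: str = "New"  # hypothetical status values: "New", "Running", "Complete"


@dataclass
class Queue:
    jobs: list = field(default_factory=list)

    def queue_job(self, signature: str) -> Job:
        # Only a job that is still waiting ("New") blocks a duplicate.
        for job in self.jobs:
            if job.signature == signature and job.status == "New":
                return job  # assumption: hand back the existing job
        # A running job with the same signature does NOT block a new one.
        job = Job(signature)
        self.jobs.append(job)
        return job
```

Under this model, queueing the same signature twice yields one job, but once that job is marked as running a third request creates a second job, which matches the behaviour described in the comment above.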
I have it creating another job after it finishes. The queue executors are (I assume) the default ones described in the readme.md (our sites are running on CWP). If you run the job queue task multiple times via the CLI, it will continue to spawn multiple instances of the same job, even if one is already being run. The service should check that the same job is not currently being executed. It's very misleading to see one job 'running' when there are often multiple instances of the same job being run.
Hmm, something sounds amiss. The way the executors are set up is to
getNextPendingJob has an explicit check that looks for jobs executing on $queue (i.e. the jobs have a status of either INIT or RUNNING) - if it finds an already running job on that queue, it returns false (and the previously mentioned job loop does not execute). Now - one thing that does stand out as a possible issue is if, on your job completing and creating a new version of itself, it creates a new job and adds it, which will then get picked up by that processJobQueue loop - and create a new job, which immediately executes, and so on. Could that be what's happening? Or are you actually having two (or three / four / five) instances of the job executing concurrently?
I wouldn't be able to elaborate on the internals, but it's fairly simple to test: if you create a new job that takes a while (say 30 seconds), and the call
This is particularly frustrating for me as I have to deal with multiple APIs which can take a varied amount of time to complete. They're fairly large jobs that will start to have unpredictable behavior if they're running at the same time.
Yep cool - will try and run up a reproduction of the issue.
Just adding a note that I've been able to reproduce something that maybe is similar to your issue, but not consistently. Have set up 4 instances of the "DummyQueuedJob" via the queuedjob admin, each with a different run duration - this actually means they have separate signatures. Starting the run from one cli, it starts normally
From another console, it behaves as expected by doing
But on the 4th or 5th execution, it picks up the same job
Now, the sendmail error in there gives me a bit of a hint as to what might be happening, am going to look into it further to see if my guess is correct, but I think what's happening is that the job health check is detecting the running job as being stalled (correctly or not), then pausing it. The subsequent attempts to process the queue are then picking up the job as it has been paused by the system and set for re-starting; what I'm not sure just yet is if it's pausing it incorrectly, or whether that's a legitimate pause. A couple questions about your jobs
My hunch is that your
Am just trying to put together another test class to confirm, and figure out if there's a "nice" way to handle the situation.
Okay, looks to be related to commit ba94a35 - essentially, the first call to
The issue is that the QueuedJobService doesn't know anything about the processing queues (by design - the processors could be the dev/task based one, but could be gearman or similar), so it can't tell whether they're still alive other than by the job's LastProcessedStep (derived from the currentStep value in the job) incrementing. As an 'interim' measure to prevent the issue in the current release, it might be simplest to introduce a time based delay for job health checks; i.e. only check every X (5? 10?) minutes for job health. This would at least mean a greater duration before the potential for the job to be paused/restarted. A more complete solution is a little more difficult and would probably involve introducing additional DB fields, which I'm not sure is the best way just yet. Will that time delay for health checks work for you? It'd be a configurable variable, defaulting to 2 minutes or something close to what it is now, but it would let you set it high given your scenario.
An interim solution for issue #46. The job queue health check can now be configured to extend the duration before the health check is run. Configure QueuedJobService: health_check_mins: {int} for the number of minutes to skip on checking. The check is done by taking the current time's minute modulo this value, and if that returns 0, then the health check is performed.
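The modulo check described in that commit message can be sketched as follows. This is a Python illustration, not the module's PHP: the function name `should_run_health_check` and the handling of values below 1 are assumptions.

```python
from datetime import datetime


def should_run_health_check(now: datetime, health_check_mins: int) -> bool:
    """Return True when the current minute is a multiple of health_check_mins."""
    if health_check_mins < 1:
        # Assumption: values below 1 are treated as invalid in this sketch.
        raise ValueError("health_check_mins must be at least 1")
    return now.minute % health_check_mins == 0
```

With a value of 5 the check passes at minutes 0, 5, 10 and so on. A side effect of this approach is that a value that does not divide 60 evenly (7, say) gives an uneven gap across the hour boundary, because minute 0 always passes.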
Okay you can try https://github.com/silverstripe-australia/silverstripe-queuedjobs/tree/fix-health-check-time, and add some config to your project yml like
to set it to run the health check only every 5th minute.
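The config snippet referred to in this comment did not survive in the thread. Going by the key named in the commit message and the "every 5th minute" remark, it was presumably something along these lines. The file it belongs in and any surrounding YAML headers are not stated in the thread, so treat this as an assumed reconstruction rather than the original.

```yaml
# Assumed reconstruction: run the job health check only every 5th minute
QueuedJobService:
  health_check_mins: 5
```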
Cool - I'll give this a try to see if it helps the situation. I'll also give you some code so you can reproduce the issue as well. I'm hoping this will help, as we've got 6 or so jobs that have to be run manually because they aren't executed by the queued job module.
Better yet - you can simply make your job set $this->currentStep = -1 in the
Just a quick note that we hit this issue too and overcame it by setting $this->currentStep = -1 in the constructor. Thanks.
It's a lazy hack, but we managed to avoid issues by spacing out our jobs and throwing more CPU at it. health_check_mins: 5 didn't seem to do much.
Need to validate if this affects SS4
I have several jobs that I don't want to run at the same time. Even though each job returns a static signature from its getSignature() function, they continue to run concurrently.
I'm setting isComplete = true when it completes, and I'm stepping through the currentSteps and have set the totalSteps correctly.