-
Notifications
You must be signed in to change notification settings - Fork 802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restarting the server may stop the recurring jobs forever #131
Comments
Cool workaround! Curious, have you tried decoupling the agenda jobs from the server (using workers approach) and seeing if that resolves the problem with having to cancel everything on shutdown for things to work on restart? |
Simply, using I started using agenda with the code snippet I wrote here, then I used the decoupling approach "jobWorkers" proposed by agenda documentation, and that didn't help solving my issue either, |
Very interesting, thanks for the insights! I am going to add this in this evening and will report back if I uncover anything worth noting. Thanks again @Abdelhady |
You are welcome, & I hope it works well with you. |
@Abdelhady thank you for reporting this. Can you show the state of your job collection (ie. db.agendaJobs.find().toArray()) to see if they are stuck in a lock state? |
I have many non-recurring jobs in the db.agendaJobs.find({repeatInterval: { $exists: true, $ne: null }}).pretty() To reproduce the issue, I've commented out the workaround mentioned earlier and downed the interval from 30 min to only 4 seconds to save my time :) (but don't worry, the job makes an almost trivial query on local db which should take only milliseconds), & here is the recurring job after reproducing the issue: {
"_id" : ObjectId("54d0c194e569d8a528e8560d"),
"data" : null,
"lastFinishedAt" : ISODate("2015-02-03T12:41:55.785Z"),
"lastModifiedBy" : null,
"lastRunAt" : ISODate("2015-02-03T12:41:55.781Z"),
"lockedAt" : ISODate("2015-02-03T12:41:56.132Z"),
"name" : "Example Job",
"nextRunAt" : ISODate("2015-02-03T12:41:59.781Z"),
"priority" : 0,
"repeatInterval" : "4 seconds",
"type" : "single"
} I think you are right, it is stuck in a lock state. |
I'm having the same issue. I'm also using worker process to process the jobs, but also that process needs to be restarted when code is updated. Is the above workaround the only way to solve the issue? |
Unfortunately yes, for the most part. You can also unlock all locked jobs on startup, which seems to fix the issue. Temporary Workaround: var Agenda = require('agenda');
var agenda = new Agenda({db: { address: 'mongodb://localhost:27017/my-test-db'}});
agenda._db.update({ lockedAt : { $exists : true } }, { $set : { lockedAt : null } }, function(e, numUnlocked) {
if(e) {
throw e;
}
console.log("Unlocked " + numUnlocked + " jobs.");
agenda.define('say hi', function(job, done) {
console.log('Hi!!!!');
done();
});
agenda.every('1 second', 'say hi');
agenda.start();
}); Unfortunately I'm not going to have time to look at this until next week sometime. |
Haven't found time to investigate all that's going on, but for what it's worth, using the worker approach with forever -w to restart server on changes appears to be working perfectly for me. |
@MikkoLehtinen I've summed up the 2 possible workarounds till now in this blog post |
@Abdelhady For these jobs that are permanently locked, you're not setting an extremely long lock lifetime when initializing Agenda or the job definitions right? I started by trying to figure out how / where these lock times were set and how long they were good for until they get invalidated. The main piece that handles how long they should be locking for is the I figured that it was a situation where these lock lifetimes weren't being respected, so I dove a bit deeper. I started looking at the main "loop" that gets fired when processing jobs. This method basically runs when This method gets called in one of two ways. The first way this happens is when you save a job and the The other way is the normal setInterval method. This won't move the job to the top of the queue and begin processing, instead what it does is loop through the available job definitions and runs Sorry for the braindump here, I've got a bit more code to read in this area. Just wanted to make sure that we're not running with extremely long lock times on your end for the time being. ...and right before I posted this, it does look like this gets stuck with this code (aka multiple .every() jobs being set). This supports something I noticed earlier with Actual breaking example: var Agenda = require('./lib/agenda.js');
var agenda = new Agenda({db: { address: 'mongodb://localhost:27017/my-test-db'}});
agenda.define('say hi', function(job, done) {
console.log('Hi!!!!');
setTimeout(done, 5000)
});
agenda.every('10 seconds', 'say hi');
agenda.every('10 seconds', 'say hi');
agenda.every('10 seconds', 'say hi');
agenda.every('10 seconds', 'say hi');
agenda.start(); |
After working out how Agenda pulls in jobs for processing, I noticed that the _findAndLockNextJob method could return multiple jobs, which ends up breaking jobs with the same definition. The query grabs and updates the list of jobs whose lock time is passed or should be ran, meaning if you have multiple of the same jobs running and shut down the server, it will end up never running those jobs again. I do need to pick @rschmukler's brain about how to properly get tests for these methods, as it seems that the methods in question are private and unreachable using Mocha. Fixes agenda#131
After working out how Agenda pulls in jobs for processing, I noticed that the _findAndLockNextJob method could return multiple jobs, which ends up breaking jobs with the same definition. The query grabs and updates the list of jobs whose lock time is passed or should be ran, meaning if you have multiple of the same jobs running and shut down the server, it will end up never running those jobs again. I do need to pick @rschmukler's brain about how to properly get tests for these methods, as it seems that the methods in question are private and unreachable using Mocha. Fixes agenda#131
@Abdelhady @BlakePetersen Hey, could you guys possibly check out the newest version of Agenda on the master branch and see if it fixed these issues for you? I'm adding additional tests for this specific case tonight / tomorrow, but master should fix the issues you've been having. |
I've updated my packages to use the latest version the case of "multiple .every() jobs being set" is not my case here, because I have only 1 recurring job, & even when I was having only this job (without the single jobs) the issue was still there too. I think we should re-open it again or something. |
Whelp, fixing bugs while not fixing bugs! agenda.start()
graceful = ()->
agenda.stop ()->
process.exit 0
process.on 'SIGTERM', graceful
process.on 'SIGINT' , graceful
agenda.define 'my job', (job, done) ->
console.log 'Job running! Will finish in 5 seconds.'
setTimeout(done, 5000)
agenda.every '10 seconds', 'my job' Would this be a fair use-case for you that triggers the bug? When I tried something like this it didn't get stuck for me (pre-patch). I tried to restart the server while the job was mid-run, as I'd assumed it was a lock time issue. Do you know if you're restarting the server while a job is in a finished state? |
@rschmukler I found the same problem. when server stop. it calls the unlockJobs. sometimes the lockedAt is set to null, sometime not. And I found the root cause is in jobProcessing(). it pop up the job in the jobQueue, but it does not add it to the _runningJobs immediately as the nextRunAt > now. if server stop at this time. it will not unlock this job.(it is not in jobQueue or _runningJobs but it is locked).
|
I have an odd problem. None of my recurring tasks are locked, but still they haven't been executed after the server restarted. |
is this already fixed? i'm also having the issue with 0.7 series. |
I'm also having issues with the latest build. |
👍 |
I'm currently having this issue as well. Re: @droppedoncaprica's fix. It looks like function removeStaleJobs(callback) {
agenda._collection.update({
lockedAt: {
$exists: true
}
}, {
$set: {
lockedAt: null
}
}, {
multi: true
}, callback);
}
agenda.on('ready', function() {
removeStaleJobs((e, r) => {
if (e) {
console.error("Unable to remove stale jobs. Starting anyways.");
}
agenda.start();
});
}); |
@pcorey Thanks , I tried your method but it still does not work for me . I am on node 4.5.0
|
Issue seems to be resolved on latest release |
I have the same problem. agenda@0.9.0, node@7.5.0 My workaround of this problem is deleting of agenda collection at start time. It's not important for me the persistence, but I'm afraid to rely on graceful exit. |
I'm starting my job every 30 minutes, but sometimes when restarting the server, the job stops working forever until I remove it from the
agendaJobs
collection manually, & I tried smaller periods for testing like 10 seconds, & it is now a consistent behavior, my code is like this:And as a workaround, I decided to cancel all recurring jobs when stopping the server like this:
With this workaround, now restarting the server doesn't affect any of the recurring jobs,
but, I don't know why I have to cancel all the recurring jobs in order to be safe from such a situation?!
The text was updated successfully, but these errors were encountered: