
Restarting the server may stop the recurring jobs forever #131

Closed
Abdelhady opened this issue Jan 1, 2015 · 24 comments · Fixed by #151

Comments

@Abdelhady

I'm starting my job every 30 minutes, but sometimes after restarting the server the job stops working forever, until I manually remove it from the agendaJobs collection. I tried smaller intervals for testing, like 10 seconds, and the behavior is now consistently reproducible. My code looks like this:

agenda.start()
graceful = ()->
    agenda.stop ()->
        process.exit 0

process.on 'SIGTERM', graceful
process.on 'SIGINT' , graceful

agenda.define 'my job', my_job_function
agenda.every '10 seconds', 'my job'

And as a workaround, I decided to cancel all recurring jobs when stopping the server like this:

graceful = ()->
    agenda.cancel repeatInterval: { $exists: true, $ne: null }, (err, numRemoved)->
        agenda.stop ()->
            process.exit 0

With this workaround, restarting the server no longer affects any of the recurring jobs.

But I don't understand why I have to cancel all the recurring jobs just to be safe from such a situation.

@BlakePetersen

Cool workaround!

Curious, have you tried decoupling the agenda jobs from the server (using the workers approach) to see if that resolves the problem of having to cancel everything on shutdown for things to work on restart?

@Abdelhady (Author)

Simply using jobWorkers didn't help.

I started using agenda with the code snippet I wrote here, then tried the decoupling approach ("jobWorkers") proposed by the agenda documentation, and that didn't solve my issue either. I was about to switch to an alternative, but the agenda-ui feature made me keep trying until I figured out the workaround above :)

@BlakePetersen

Very interesting, thanks for the insights! I'm going to try this out this evening and will report back if I uncover anything worth noting. Thanks again @Abdelhady

@Abdelhady (Author)

You're welcome, and I hope it works well for you.

@rschmukler (Collaborator)

@Abdelhady thank you for reporting this. Can you show the state of your job collection (i.e. db.agendaJobs.find().toArray()) so we can see if the jobs are stuck in a locked state?

@Abdelhady (Author)

I have many non-recurring jobs in agendaJobs, so I'll fetch only the recurring ones, like this:

db.agendaJobs.find({repeatInterval: { $exists: true, $ne: null }}).pretty()

To reproduce the issue, I've commented out the workaround mentioned earlier and reduced the interval from 30 minutes to only 4 seconds to save time :) (but don't worry, the job runs an almost trivial query against a local db, which should take only milliseconds). Here is the recurring job after reproducing the issue:

{
    "_id" : ObjectId("54d0c194e569d8a528e8560d"),
    "data" : null,
    "lastFinishedAt" : ISODate("2015-02-03T12:41:55.785Z"),
    "lastModifiedBy" : null,
    "lastRunAt" : ISODate("2015-02-03T12:41:55.781Z"),
    "lockedAt" : ISODate("2015-02-03T12:41:56.132Z"),
    "name" : "Example Job",
    "nextRunAt" : ISODate("2015-02-03T12:41:59.781Z"),
    "priority" : 0,
    "repeatInterval" : "4 seconds",
    "type" : "single"
}

I think you're right; it's stuck in a locked state.
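
For reference, a query along these lines should list only the jobs currently holding a lock (assuming the default agendaJobs collection):

db.agendaJobs.find({ lockedAt: { $exists: true, $ne: null } }).pretty()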

@kosmikko

kosmikko commented Feb 4, 2015

I'm having the same issue. I'm also using a worker process to handle the jobs, but that process also needs to be restarted when the code is updated. Is the above workaround the only way to solve this?

@Albert-IV (Collaborator)

Unfortunately yes, for the most part. You can also unlock all locked jobs on startup, which seems to fix the issue.

Temporary Workaround:

var Agenda = require('agenda');
var agenda = new Agenda({db: { address: 'mongodb://localhost:27017/my-test-db'}});

// Clear any stale locks left over from the previous process.
agenda._db.update({ lockedAt : { $exists : true } }, { $set : { lockedAt : null } }, function(e, numUnlocked) {
  if(e) {
    throw e;
  }

  console.log("Unlocked " + numUnlocked + " jobs.");

  agenda.define('say hi', function(job, done) {
    console.log('Hi!!!!');
    done();
  });
  agenda.every('1 second', 'say hi');
  agenda.start();
});

Unfortunately I'm not going to have time to look at this until next week sometime.

@BlakePetersen

I haven't found time to investigate everything that's going on, but for what it's worth, using the worker approach with forever -w to restart the server on changes appears to be working perfectly for me.

@Abdelhady (Author)

@MikkoLehtinen I've summed up the two workarounds so far in this blog post

@Albert-IV (Collaborator)

@Abdelhady For these jobs that are permanently locked: you're not setting an extremely long lock lifetime when initializing Agenda or in the job definitions, right?

I started by trying to figure out how and where these lock times are set, and how long they're good for before they get invalidated.

The main piece that controls how long jobs should stay locked is the lockLifetime. These values are set at the Agenda level, not stored in the job data (here when being initialized, here when setting up a job definition, as well as here to set it manually later).
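
To make that concrete, here's a minimal sketch of those three places, based on Agenda's documented options (the values are just illustrative):

var Agenda = require('agenda');

// 1. When initializing the Agenda instance:
var agenda = new Agenda({
  db: { address: 'mongodb://localhost:27017/my-test-db' },
  defaultLockLifetime: 10000 // ms, applies to all definitions
});

// 2. When setting up a job definition:
agenda.define('quick job', { lockLifetime: 10000 }, function(job, done) {
  done();
});

// 3. Set manually later:
agenda.defaultLockLifetime(10000);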

I figured it was a situation where these lock lifetimes weren't being respected, so I dove a bit deeper and started looking at the main "loop" that fires when processing jobs. This method runs when agenda.start() is called and is then set to run on an interval based on the processEvery property.

This method gets called in one of two ways. The first is when you save a job whose nextRunAt is calculated to be before the current time; the job is passed in directly so it gets put on top of the job queue.

The other is the normal setInterval path. This doesn't move a job to the top of the queue and begin processing; instead, it loops through the available job definitions and runs Agenda._findAndLockNextJob on each one.
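
In rough pseudocode (identifiers as named in this thread, not Agenda's exact source), the shape of it is something like:

// Rough sketch only: the two ways processJobs gets entered.
function processJobs(extraJob) {
  if (extraJob) {
    // Path 1: a just-saved job whose nextRunAt is already in the
    // past is handed in directly and jumps to the top of the queue.
    jobQueue.unshift(extraJob);
  } else {
    // Path 2: the processEvery interval tick; find and lock the
    // next runnable job for each known definition.
    Object.keys(definitions).forEach(function(name) {
      agenda._findAndLockNextJob(name, definitions[name], function(err, job) {
        if (job) jobQueue.push(job);
      });
    });
  }
  // ...jobs are then popped off jobQueue and run.
}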

Sorry for the braindump here; I've got a bit more code to read in this area. Just wanted to make sure we're not dealing with extremely long lock times on your end for the time being.

...and right before I posted this: it does look like Agenda gets stuck with the code below (i.e. multiple .every() jobs being set for the same definition). This supports something I noticed earlier about _findAndLockNextJob possibly returning multiple jobs (which processJobs doesn't necessarily support at the moment). I'm not 100% sure on that, but after playing with it, it seems like it might have fixed it.

Actual breaking example:

var Agenda = require('./lib/agenda.js');
var agenda = new Agenda({db: { address: 'mongodb://localhost:27017/my-test-db'}});

agenda.define('say hi', function(job, done) {
  console.log('Hi!!!!');
  setTimeout(done, 5000);
});
agenda.every('10 seconds', 'say hi');
agenda.every('10 seconds', 'say hi');
agenda.every('10 seconds', 'say hi');
agenda.every('10 seconds', 'say hi');

agenda.start();

Albert-IV added a commit to Albert-IV/agenda that referenced this issue Feb 11, 2015
After working out how Agenda pulls in jobs for processing, I noticed that the _findAndLockNextJob method could return multiple jobs, which ends up breaking jobs with the same definition. The query grabs and updates the list of jobs whose lock time has passed or that are due to run, meaning that if you have multiple of the same job running and shut down the server, those jobs will never run again.

I do need to pick @rschmukler's brain about how to properly get tests
for these methods, as it seems that the methods in question are
private and unreachable using Mocha.

Fixes agenda#131
@Albert-IV (Collaborator)

@Abdelhady @BlakePetersen Hey, could you check out the newest version of Agenda on the master branch and see if it fixes these issues for you?

I'm adding additional tests for this specific case tonight / tomorrow, but master should fix the issues you've been having.

@Abdelhady (Author)

I've updated my packages to use the latest version (Release 0.6.28), but the issue still exists.

The "multiple .every() jobs being set" case isn't mine; I have only one recurring job, and even when I had only that job (without the single jobs) the issue was still there.

I think we should re-open this.

Albert-IV reopened this Feb 15, 2015
@Albert-IV (Collaborator)

Whelp, fixing bugs while not fixing bugs!

agenda.start()
graceful = ()->
    agenda.stop ()->
        process.exit 0

process.on 'SIGTERM', graceful
process.on 'SIGINT' , graceful

agenda.define 'my job', (job, done) ->
    console.log 'Job running!  Will finish in 5 seconds.'
    setTimeout(done, 5000)

agenda.every '10 seconds', 'my job'

Would this be a fair use-case for you that triggers the bug? When I tried something like this, it didn't get stuck for me (pre-patch). I restarted the server while the job was mid-run, as I'd assumed it was a lock-time issue. Do you know if you're restarting the server while the job is in a finished state?

@yufengyw

@rschmukler I found the same problem. When the server stops, it calls unlockJobs; sometimes lockedAt gets set to null, sometimes not. I found the root cause in jobProcessing(): it pops the job off the jobQueue, but it doesn't add it to _runningJobs immediately when nextRunAt > now. If the server stops at this moment, it won't unlock this job (it is in neither jobQueue nor _runningJobs, but it is locked):

var job = jobQueue.pop();
if (job.attrs.nextRunAt < now) {
  runOrRetry();
} else {
  // Until this timeout fires, the job is tracked nowhere in memory.
  setTimeout(runOrRetry, job.attrs.nextRunAt - now);
}

@elssar

elssar commented Jul 3, 2015

I have an odd problem. None of my recurring tasks are locked, but they still haven't executed since the server restarted. lockedAt is false for all of them.

@bonesoul

Is this already fixed? I'm also having this issue with the 0.7 series.

@shaunymca

I'm also having issues with the latest build.

@mirkods

mirkods commented Mar 9, 2016

👍

@pcorey

pcorey commented May 24, 2016

I'm currently having this issue as well.

Re: @droppedoncaprica's fix: it looks like _db has been renamed to _collection. Also, this should probably be a multi Mongo update, in case you have multiple recurring jobs that need to be unlocked. Lastly, _collection isn't available until the "ready" event fires:

function removeStaleJobs(callback) {
    agenda._collection.update({
        lockedAt: {
            $exists: true
        }
    }, {
        $set: {
            lockedAt: null
        }
    }, {
        multi: true
    }, callback);
}

agenda.on('ready', function() {
  removeStaleJobs((e, r) => {
      if (e) {
          console.error("Unable to remove stale jobs. Starting anyways.");
      }
      agenda.start();
  });
});

@ashutosh-akss

ashutosh-akss commented Aug 29, 2016

@pcorey Thanks, I tried your method but it still doesn't work for me.

I'm on node 4.5.0:

function removeStaleJobs(callback) {
    agenda._collection.update({
        lockedAt: {
            $exists: true
        }
    }, {
        $set: {
            lockedAt: null
        }
    }, {
        multi: true
    }, callback);
}

agenda.on('ready', function() {
    console.log("AGENDA Service starting");
    removeStaleJobs((e, r) => {
        console.log("REMOVED STALE JOBSSSS");
        if (e) {
            console.error("Unable to remove stale jobs. Starting anyways.");
        }
        agenda.start();
    });
});

@kfiroo

kfiroo commented Nov 18, 2016

The issue seems to be resolved in the latest release.

@ghost

ghost commented Feb 15, 2017

I have the same problem. agenda@0.9.0, node@7.5.0.

My workaround is to delete the agenda collection at startup. Persistence isn't important to me, and I'm afraid to rely on a graceful exit.
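
A minimal sketch of that wipe-on-startup approach, assuming the default agendaJobs collection name and the mongodb 2.x driver:

var Agenda = require('agenda');
var MongoClient = require('mongodb').MongoClient;
var url = 'mongodb://localhost:27017/my-test-db';

MongoClient.connect(url, function(err, db) {
  if (err) throw err;
  // Drop every persisted job so nothing stays stuck in a locked state.
  db.collection('agendaJobs').deleteMany({}, function(err) {
    if (err) throw err;
    var agenda = new Agenda({ db: { address: url } });
    agenda.define('say hi', function(job, done) {
      console.log('Hi!');
      done();
    });
    agenda.every('10 seconds', 'say hi');
    agenda.start();
  });
});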

@simison (Member)

simison commented Jun 22, 2017

I'm trying to triage issues, so I'll close this; please post follow-ups here: #410

This might still be an issue. If anyone still experiences it (or has a solution!), please give a +1 at #410 so that we'll know better.
