
Chronos Arithmetic Exception /Zero #614

Open
Saurabh2004in opened this issue Jan 9, 2016 · 7 comments

Comments

@Saurabh2004in

Hi,

I am getting the exception below and am curious to know what is causing it.

```
[2016-01-07 13:48:02,153] INFO Loading jobs (org.apache.mesos.chronos.scheduler.jobs.JobScheduler:601)
[2016-01-07 13:48:02,240] INFO Registering jobs:55 (org.apache.mesos.chronos.scheduler.jobs.JobUtils$:74)
[2016-01-07 13:48:02,259] ERROR Loading tasks or jobs failed. Exiting. (org.apache.mesos.chronos.scheduler.jobs.JobScheduler:605)
java.lang.ArithmeticException: / by zero
        at org.apache.mesos.chronos.scheduler.jobs.JobUtils$.calculateSkips(JobUtils.scala:157)
        at org.apache.mesos.chronos.scheduler.jobs.JobUtils$.skipForward(JobUtils.scala:119)
        at org.apache.mesos.chronos.scheduler.jobs.JobUtils$.makeScheduleStream(JobUtils.scala:107)
        at org.apache.mesos.chronos.scheduler.jobs.JobScheduler$$anonfun$6.apply(JobScheduler.scala:146)
        at org.apache.mesos.chronos.scheduler.jobs.JobScheduler$$anonfun$6.apply(JobScheduler.scala:146)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at org.apache.mesos.chronos.scheduler.jobs.JobScheduler.registerJob(JobScheduler.scala:146)
        at org.apache.mesos.chronos.scheduler.jobs.JobUtils$.loadJobs(JobUtils.scala:75)
        at org.apache.mesos.chronos.scheduler.jobs.JobScheduler.liftedTree1$1(JobScheduler.scala:602)
        at org.apache.mesos.chronos.scheduler.jobs.JobScheduler.onElected(JobScheduler.scala:597)
        at org.apache.mesos.chronos.scheduler.jobs.JobScheduler$$anon$3.isLeader(JobScheduler.scala:568)
        at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:644)
        at org.apache.curator.framework.recipes.leader.LeaderLatch$9.apply(LeaderLatch.java:640)
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
```

It looks like `calculateSkips` in JobUtils.scala is throwing the exception. I just want to find out whether this is a Chronos bug or something related to the job's schedule expression.

```scala
/**
 * Calculates the number of skips needed to bring the job start into the future
 */
protected def calculateSkips(dateTime: DateTime, jobStart: DateTime, period: Period): Int = {
  // If the period is at least a month, we have to actually add the period to the date
  // until it's in the future because a month-long period might have different seconds
  if (period.getMonths >= 1) {
    var skips = 0
    var newDate = new DateTime(jobStart)
    while (newDate.isBefore(dateTime)) {
      newDate = newDate.plus(period)
      skips += 1
    }
    skips
  } else {
    Seconds.secondsBetween(jobStart, dateTime).getSeconds / period.toStandardSeconds.getSeconds
  }
}
```
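To see how the else branch can fail, here is a minimal reproduction as a sketch. It assumes `java.time` in place of the Joda-Time types (`DateTime`, `Period`, `Seconds`) that Chronos uses, so it runs without extra dependencies, and `SkipsRepro`/`naiveSkips` are hypothetical names. A period such as `PT0M` has zero whole seconds, so the integer division throws `ArithmeticException: / by zero`, matching the top of the stack trace.

```scala
import java.time.{Duration, OffsetDateTime}
import scala.util.Try

// Sketch: reproduces the else-branch arithmetic of calculateSkips with java.time
// instead of Joda-Time (an assumption made so it runs without dependencies).
object SkipsRepro {
  // Whole seconds since the job start divided by the period length in seconds.
  // JVM integer division throws ArithmeticException("/ by zero") when the
  // divisor is 0, which is what a PT0M or PT0S period yields.
  def naiveSkips(now: OffsetDateTime, jobStart: OffsetDateTime, period: Duration): Long =
    Duration.between(jobStart, now).getSeconds / period.getSeconds

  def main(args: Array[String]): Unit = {
    val start = OffsetDateTime.parse("2015-08-28T14:04:54+08:00")
    val now = OffsetDateTime.parse("2016-01-07T13:48:02Z")
    println(Try(naiveSkips(now, start, Duration.parse("PT1H")))) // a Success with the hour count
    println(Try(naiveSkips(now, start, Duration.parse("PT0M")))) // Failure(java.lang.ArithmeticException: / by zero)
  }
}
```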

@jordmoz

jordmoz commented Mar 1, 2016

I'm seeing this as well.

@xargstop

xargstop commented Apr 25, 2016

I get this issue too, and none of our Chronos instances can restart.

The cause is that Chronos allows a job with a run interval of 0 to be created, e.g.:

"schedule":"R0/2015-08-28T14:04:54.000+0800/PT0M"

The exception is then triggered when the jobs are reloaded from ZooKeeper, for example on restart.

I deleted the jobs with such a config, and Chronos restarted successfully.
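Given that cause, a small check can flag such jobs before a restart. This is a hypothetical helper, not part of Chronos: it only inspects the period (the last `/`-separated field of an ISO 8601 repeating-interval schedule) and treats it as zero-length when every number in it is zero or it contains no numbers at all, which covers `PT0M`, `PT0S` and a bare `P`.

```scala
// Hypothetical helper, not part of Chronos: flags ISO 8601 repeating-interval
// schedules (R<n>/<start>/<period>) whose period has zero length.
object ScheduleCheck {
  private val Digits = """\d+""".r

  // The period is the last "/"-separated field of the schedule string.
  def periodOf(schedule: String): String = schedule.split("/", -1).last

  // Zero-length when the period starts with "P" and every number in it is made
  // only of zeros (or there are no numbers at all, as in a bare "P").
  def hasZeroPeriod(schedule: String): Boolean = {
    val period = periodOf(schedule)
    period.startsWith("P") && Digits.findAllIn(period).forall(_.forall(_ == '0'))
  }
}
```

For example, `ScheduleCheck.hasZeroPeriod("R0/2015-08-28T14:04:54.000+0800/PT0M")` is true, while a schedule ending in `PT60S` or `P1D` is not flagged. Digits in the start timestamp are ignored because only the period field is examined.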

@xtazz
Copy link

xtazz commented Jun 24, 2016

@gongaiguo how do you delete these jobs without Chronos running?

@Saurabh2004in
Author

Saurabh2004in commented Jun 27, 2016

I made the else branch return zero; we don't need to skip any time if the interval is zero.
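The change described above might look roughly like this sketch. It assumes `java.time` instead of Joda-Time, covers only the else branch (the month-based loop stays as it is), and `GuardedSkips` is a hypothetical name. It only prevents the exception: a job with a zero interval is still loaded and scheduled.

```scala
import java.time.{Duration, OffsetDateTime}

// Sketch of the described patch for the else branch only, using java.time in
// place of Joda-Time; GuardedSkips is a hypothetical name.
object GuardedSkips {
  def calculateSkips(now: OffsetDateTime, jobStart: OffsetDateTime, period: Duration): Long = {
    val periodSeconds = period.getSeconds
    // A period under one second (including PT0S/PT0M) has 0 whole seconds:
    // return 0 skips instead of dividing by zero.
    if (periodSeconds == 0L) 0L
    else Duration.between(jobStart, now).getSeconds / periodSeconds
  }
}
```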

@Saurabh2004in
Author

The same patch is applied in #692.

@xargstop

xargstop commented Jul 8, 2016

@xtazz I deleted them from ZooKeeper.

@bfoussier

bfoussier commented Jul 20, 2016

Hi,

I ran into this problem while doing HA tests. When Chronos restarts, it reloads the jobs stored in ZooKeeper (the job was { "schedule": "R//P", "name": "create-volume-flocker-demo", "command"...}) and fails.

I applied the fix proposed in #692, and now Chronos loops infinitely:

```
[2016-07-20 09:21:49,968] INFO Calling next for stream: R/2016-07-18T09:38:44.236Z/PT0S, jobname: create-volume-flocker-demo (org.apache.mesos.chronos.scheduler.jobs.JobScheduler:509)
[2016-07-20 09:21:49,968] INFO JobNotificationObserver does not handle JobSkipped(ScheduleBasedJob(R/2016-07-18T09:38:44.236Z/PT0S,create-volume-flocker-demo,docker volume create -d flocker --name apache_vol_2_staging -o size=45GB,PT60S,0,0,,,,2,,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),2016-07-18T09:38:44.236Z) (org.apache.mesos.chronos.scheduler.jobs.JobsObserver$:27)
[2016-07-20 09:21:49,968] INFO JobStats does not handle JobSkipped(ScheduleBasedJob(R/2016-07-18T09:38:44.236Z/PT0S,create-volume-flocker-demo,docker volume create -d flocker --name apache_vol_2_staging -o size=45GB,PT60S,0,0,,,,2,,,,,,false,0.1,256.0,128.0,false,0,ListBuffer(),ListBuffer(),false,root,null,,ListBuffer(),true,ListBuffer(),false,false,ListBuffer()),2016-07-18T09:38:44.236Z) (org.apache.mesos.chronos.scheduler.jobs.JobsObserver$:27)
[2016-07-20 09:21:49,968] INFO tail: R/2016-07-18T09:38:44.236Z/PT0S now: 2016-07-20T09:21:48.145Z (org.apache.mesos.chronos.scheduler.jobs.JobScheduler:563)
```

and then it starts over for the same job:

```
[2016-07-20 09:21:49,968] INFO Calling next for stream: R/2016-07-18T09:38:44.236Z/PT0S, jobname: create-volume-flocker-demo (org.apache.mesos.chronos.scheduler.jobs.JobScheduler:509)
```

Do I need another fix?
Does the fix proposed in #692 prevent corrupted data from being stored in ZooKeeper?
Is my data in ZooKeeper corrupted, and should I erase it?
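The repeated JobSkipped lines appear consistent with a zero period: adding `PT0S` to a start time in the past never reaches `now`, so every "next" occurrence is still in the past, gets skipped, and the same instant comes back. The following hedged sketch computes the next run but refuses to iterate in that case. `NextRun.nextRun` is hypothetical (not Chronos code), uses `java.time`, and works at millisecond precision.

```scala
import java.time.{Duration, OffsetDateTime}

// Hypothetical sketch, not Chronos code: returns the first start + k * period
// that is at or after `now` (millisecond precision), or None when the period is
// zero, negative or under 1 ms, e.g. PT0S, which can never reach `now`.
object NextRun {
  def nextRun(start: OffsetDateTime, period: Duration, now: OffsetDateTime): Option[OffsetDateTime] = {
    val step = period.toMillis
    if (!start.isBefore(now)) Some(start) // first run is still in the future
    else if (step <= 0L) None             // every occurrence would equal `start`
    else {
      val elapsed = Duration.between(start, now).toMillis
      val skips = (elapsed + step - 1) / step // ceiling division
      Some(start.plus(period.multipliedBy(skips)))
    }
  }
}
```

With the start and `now` values from the log above, a one-day period yields 2016-07-20T09:38:44.236Z, while `PT0S` yields `None` instead of an endless series of skipped runs.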


5 participants