Updating a job from scheduled to dependent leaves the job continuing to fire on original schedule #627

Closed
tyrannasaurusbanks opened this issue Feb 2, 2016 · 4 comments

@tyrannasaurusbanks

Hi Guys,

I'm seeing some behaviour which I don't think is intended or appropriate, so I thought I'd raise an issue here to see if it merits further investigation (or if I've got the wrong end of the stick).

It appears that updating a job from scheduled to dependent causes the original schedule to remain intact, resulting in the job firing twice: triggered once by the schedule, and once by the parent job completing successfully.

Scenario
I add the following 2 scheduled jobs:

$ curl -v -L -H "Content-Type: application/json" -X POST -d @testscheduledA.json http://localhost:8080/scheduler/iso8601
$ curl -v -L -H "Content-Type: application/json" -X POST -d @testscheduledB.json http://localhost:8080/scheduler/iso8601

testscheduledA.json

{
  "schedule": "R/2016-01-20T15:40:00Z/PT24H",
  "name": "test_A",
  "epsilon": "PT30M",
  "command": "echo 'FOO' >> /tmp/JOB1_OUT",
  "owner": "bob@airbnb.com",
  "async": false
}

testscheduledB.json

{
  "schedule": "R/2016-01-20T15:41:00Z/PT24H",
  "name": "test_B",
  "epsilon": "PT30M",
  "command": "echo 'FOO' >> /tmp/JOB1_OUT",
  "owner": "bob@airbnb.com",
  "async": false
}

I then update test_B to be a dependent job of test_A:

$ curl -v -L -H "Content-Type: application/json" -X POST -d @testdependentB.json http://localhost:8080/scheduler/dependency

testdependentB.json

{
  "name": "test_B",
  "epsilon": "PT30M",
  "command": "echo 'FOO' >> /tmp/JOB1_OUT",
  "owner": "bob@airbnb.com",
  "async": false,
  "parents": [
    "test_A"
  ]
}

Result
The test_A job executes at the correct time and kicks off test_B after it succeeds. But test_B then executes a second time, exactly 1 minute after test_A, which matches the original schedule set for test_B.
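
To make the 1 minute figure concrete, here is a minimal sketch that reads the two schedule strings from the job definitions above and computes the gap between their start times. The `Schedule` case class and `parse` helper are hypothetical, written only for this illustration; they are not Chronos's own parser and only handle the simple `R/<start>/<period>` form used here.

```scala
import java.time.{Duration, Instant}

object ScheduleOffset {
  // Hypothetical holder for the two parts of the schedule we care about.
  final case class Schedule(start: Instant, period: Duration)

  // Splits "R/<start>/<period>" and parses the start time and the period.
  def parse(s: String): Schedule = {
    val parts = s.split("/")
    require(parts.length == 3 && parts(0).startsWith("R"), s"unexpected schedule: $s")
    Schedule(Instant.parse(parts(1)), Duration.parse(parts(2)))
  }

  val a: Schedule = parse("R/2016-01-20T15:40:00Z/PT24H") // test_A
  val b: Schedule = parse("R/2016-01-20T15:41:00Z/PT24H") // test_B

  // Both jobs repeat every 24 hours, so this gap is the same on every run.
  val offset: Duration = Duration.between(a.start, b.start)

  def main(args: Array[String]): Unit = println(offset) // prints PT1M
}
```

Because both periods are PT24H, a leftover test_B schedule would keep firing one minute after test_A's scheduled time every day, which is consistent with what I observed.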

[Screenshot: screen shot 2016-02-02 at 3 43 34 pm]

As soon as I update test_B to be dependent, I can no longer see any mention of the schedule in the job state, either through the Chronos REST API or by manually inspecting the state in the attached ZooKeeper, so from my point of view this isn't the expected behaviour.

I've had a look at the API resources in the code base (specifically DependentJobResource and JobScheduler) but can't see anything obvious. Could someone shed some light on what's going on here?

Setup
I'm using chronos-2.4.0 and mesos-0.24.1, installed via yum, and running Chronos with the following command:

java -Djava.library.path=/usr/local/lib:/usr/lib64:/usr/lib -Djava.util.logging.SimpleFormatter.format=%2$s %5$s%6$s%n -Xmx512m -cp /usr/bin/chronos org.apache.mesos.chronos.scheduler.Main --master 172.17.0.16:5050 --zk_hosts zk://172.17.0.19:2181 --mail_server xxx:25 --mail_from xxx@xxx.com --failure_retry 900000 --http_credentials xxx:xxx

The output of java -version is:

openjdk version "1.8.0_65"
OpenJDK Runtime Environment (build 1.8.0_65-b17)
OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)

Thanks for your time and the awesome tool, chronos rocks,

Ioan

@Califax
Contributor

Califax commented Feb 11, 2016

Looking at the code in DependentJobResource.scala:

val parents = jobGraph.parentJobs(newJob)
oldJob match {
  case j: DependencyBasedJob =>
    val newParentNames = parents.map(_.name)
    val oldParentNames = jobGraph.parentJobs(j).map(_.name)

    if (newParentNames != oldParentNames) {
      oldParentNames.foreach(jobGraph.removeDependency(_, oldJob.name))
      newParentNames.foreach(jobGraph.addDependency(_, newJob.name))
    }
    jobScheduler.removeSchedule(j)
  case j: ScheduleBasedJob =>
    parents.foreach(p => jobGraph.addDependency(p.name, newJob.name))
}

It seems the jobScheduler.removeSchedule(j) call should be in the case where the old job was a ScheduleBasedJob, not in the case where it was a DependencyBasedJob.
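
A rough, self-contained sketch of the branching I have in mind is below. Everything in it is a simplified stand-in: the job case classes are trimmed down, and `activeSchedules` and `dependencies` are hypothetical in-memory replacements for jobScheduler and jobGraph, so this only illustrates the idea and is not the actual Chronos code or a finished patch.

```scala
import scala.collection.mutable

object UpdateToDependent {
  // Simplified stand-ins for the Chronos job types (assumption: most fields omitted).
  sealed trait Job { def name: String }
  final case class ScheduleBasedJob(name: String, schedule: String) extends Job
  final case class DependencyBasedJob(name: String, parents: Set[String]) extends Job

  // Hypothetical in-memory stand-ins for jobScheduler and jobGraph.
  val activeSchedules: mutable.Set[String] = mutable.Set.empty
  val dependencies: mutable.Set[(String, String)] = mutable.Set.empty // (parent, child)

  def update(oldJob: Job, newJob: DependencyBasedJob): Unit = oldJob match {
    case j: DependencyBasedJob =>
      // Already dependency based: only the parent edges may need to change.
      if (j.parents != newJob.parents) {
        j.parents.foreach(p => dependencies.remove((p, j.name)))
        newJob.parents.foreach(p => dependencies.add((p, newJob.name)))
      }
    case j: ScheduleBasedJob =>
      // Previously schedule based: drop the old schedule, then add the edges.
      activeSchedules.remove(j.name)
      newJob.parents.foreach(p => dependencies.add((p, newJob.name)))
  }
}
```

With test_B registered in `activeSchedules`, calling `update(ScheduleBasedJob("test_B", "R/2016-01-20T15:41:00Z/PT24H"), DependencyBasedJob("test_B", Set("test_A")))` in this model leaves no schedule entry for test_B and a single test_A to test_B edge, so the job would only be triggered by its parent.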

@Califax
Contributor

Califax commented Feb 16, 2016

I fixed this in the following PR:
#635

@tyrannasaurusbanks
Author

One step ahead! Thanks Califax!

@Califax
Contributor

Califax commented Mar 29, 2016

Should be marked as closed now.
