Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancel job functionality #96

Merged
merged 3 commits into from Apr 3, 2015
Merged

Conversation

petro-rudenko
Copy link
Contributor

No description provided.

@@ -217,6 +217,26 @@ class WebApi(system: ActorSystem,
}
}
} ~
// DELETE /jobs/<jobId>
// Stop the current job. All other jobs submited with this spark context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It stops the job with the specified jobId right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, jobserver job id (that returned after job submitted to jobserver).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but, what I'm trying to say is that the documentation says "current job" , there might be several jobs currently working, you just want to cancel one.

@velvia
Copy link
Contributor

velvia commented Mar 19, 2015

@petro-rudenko This looks great so far.

Can we add a JobKilled status message, to be sent to the statusActor, when a job is killed? This should also result in a DAO/Database update. Basically, after a job is killed, if I look up a job using the jobs route, I should see that the status was "killed".

Thanks!

@petro-rudenko
Copy link
Contributor Author

Take a look. The problem is that cancelJobGroup is an asynchronous operation which will cause eventually a SparkException in job thread. So it will seems like a failed job.

@@ -195,5 +196,20 @@ abstract class JobManagerSpec extends JobSpecBase(JobManagerSpec.getNewSystem) {
case JobResult(_, result: Int) => result should equal (1 + 2 + 3)
}
}

it("should be able to cancel running job") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for what happens when the job is not running?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will just call sparkContext.cancelJobGroup(NONEXISTING_GROUP_ID) which is safe (will not raise any exception) and will do actually nothing. In WebAPI it will actually handle this:
https://github.com/petro-rudenko/spark-jobserver/blob/cancel_job3/job-server/src/spark.jobserver/WebApi.scala#L228

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'm saying there's no test for that scenario, the user should receive a message that no such job exists right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, if you try to delete a job that already finished what would be the result?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zeitos It seems from the code that deleting a nonexistant job would get the user the NotFound message. I think the same thing if the job already finished.

@velvia
Copy link
Contributor

velvia commented Mar 24, 2015

@petro-rudenko Thanks. Hmm... so if we merged it now, then a killed job will show up as a job that errored out, is that right? Could that be tested in the unit test?

I think this is important enough to merge in soon, and we can deal with the status issues later.

@petro-rudenko
Copy link
Contributor Author

I don't quite understand this line. When i send cancelJobGroup command to spark eventually in job future thread there would be sparkException and the logic will come there. Also i cannot access to statusActor from JobManagerActorSpec to test that i'll get this message.

@velvia
Copy link
Contributor

velvia commented Mar 26, 2015

@petro-rudenko hm, not sure, I'll have to review the logic.
As far as statusActor, the way it works is that you don't access statusActor directly, but rather subscribe to it. The parent who created the JobManagerActor automatically subscribes to the StatusActor messages, so all you have to do is check in the test body that you are getting status messages. If you look at the other tests in JobManagerActorSpec, you see tons of expectMsg(....) or expectMsgClass(...) to check status updates. Hope that helps!

Basically, in line 209 of your JobManagerSpec, in the test case that you wrote, instead of expectNoMsg(), I would think now that you would want something like expectMsg(JobKilled). Unless you are getting the SparkException and the job dies first....

LMK if you'd like a hangout or something to sort stuff out.

@petro-rudenko
Copy link
Contributor Author

Ok got it. JobKilled would be first, because cancelJobGroup is an asynchronous operation and theoretically it's even possible that cancelJobGroup was called, but job was in some millisseconds to finish, so we'll get JobFinished result.

@v-gerasimov
Copy link

I really need this functionality with my work. What about merging this pull request?

@velvia
Copy link
Contributor

velvia commented Apr 3, 2015

@petro-rudenko I think will merge this. Don't see any reasons not to, just minor lingering issues that can be addressed in the future. Thanks!

velvia added a commit that referenced this pull request Apr 3, 2015
@velvia velvia merged commit e5cd03f into spark-jobserver:master Apr 3, 2015
@noorul
Copy link
Contributor

noorul commented Apr 24, 2015

Will this get back ported to earlier versions?

@velvia
Copy link
Contributor

velvia commented Apr 24, 2015

Probably won't have time for it myself, but feel free to submit a PR for
this, it's fairly easy once you locate the original PR.

On Fri, Apr 24, 2015 at 2:09 AM, Noorul Islam K M notifications@github.com
wrote:

Will this get back ported to earlier versions?


Reply to this email directly or view it on GitHub
#96 (comment)
.

If you are free, you need to free somebody else.
If you have some power, then your job is to empower somebody else.
--- Toni Morrison

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants