Cancel job functionality #96
Conversation
@@ -217,6 +217,26 @@ class WebApi(system: ActorSystem,
}
}
} ~
// DELETE /jobs/<jobId>
// Stop the current job. All other jobs submitted with this Spark context
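The hunk above adds a DELETE route for job cancellation. As a rough illustration of the decision logic such a handler needs (decoupled from Spray routing), here is a hypothetical sketch; the status names and JSON bodies are assumptions based on this discussion, not the actual jobserver code:

```scala
// Hypothetical sketch of the DELETE /jobs/<jobId> decision logic.
// JobStatus and the response strings are illustrative assumptions.
sealed trait JobStatus
case object Running  extends JobStatus
case object Finished extends JobStatus

def cancelResponse(status: Option[JobStatus]): (Int, String) = status match {
  case None           => (404, """{"status": "ERROR", "result": "No such job ID"}""")
  case Some(Finished) => (404, """{"status": "ERROR", "result": "Job already finished"}""")
  case Some(Running)  =>
    // The real handler would issue sc.cancelJobGroup(jobId) here; cancellation
    // is asynchronous, so a 200 only means the kill was *requested*.
    (200, """{"status": "KILLED"}""")
}
```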
nit: It stops the job with the specified jobId right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the jobserver job ID (the one returned after the job is submitted to the jobserver).
Yes, but what I'm trying to say is that the documentation says "current job"; there might be several jobs currently running, and you only want to cancel one of them.
@petro-rudenko This looks great so far. Can we add a JobKilled status message, to be sent to the statusActor, when a job is killed? This should also result in a DAO/Database update. Basically, after a job is killed, if I look up a job using the jobs route, I should see that the status was "killed". Thanks!
Take a look. The problem is that cancelJobGroup is an asynchronous operation which will eventually cause a SparkException in the job thread, so it will look like a failed job.
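To make the asynchrony concrete, here is a hedged sketch of how a cancelled job surfaces as a failure (assuming a local SparkContext; the job-group ID and timings are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{ExecutionContext, Future}

// Sketch: the job runs in its own thread under a job group, and
// cancelJobGroup() asynchronously makes the running action throw a
// SparkException in that thread.
val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("cancel-demo"))
implicit val ec: ExecutionContext = ExecutionContext.global

val jobId = "demo-job-1"  // illustrative group ID
val job = Future {
  sc.setJobGroup(jobId, "cancellable demo job", interruptOnCancel = true)
  sc.parallelize(1 to 1000000).map { i => Thread.sleep(1); i }.count()
}

Thread.sleep(500)          // let the job start
sc.cancelJobGroup(jobId)   // asynchronous: returns immediately
// `job` now fails with a SparkException ("Job ... cancelled"), which is
// why, without a JobKilled status, it would look like an errored job.
```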
@@ -195,5 +196,20 @@ abstract class JobManagerSpec extends JobSpecBase(JobManagerSpec.getNewSystem) {
case JobResult(_, result: Int) => result should equal (1 + 2 + 3)
}
}

it("should be able to cancel running job") {
Can you add a test for what happens when the job is not running?
It will just call sparkContext.cancelJobGroup(NONEXISTING_GROUP_ID), which is safe (it will not raise any exception) and will actually do nothing. WebApi actually handles this case here:
https://github.com/petro-rudenko/spark-jobserver/blob/cancel_job3/job-server/src/spark.jobserver/WebApi.scala#L228
Yes; I'm saying there's no test for that scenario. The user should receive a message that no such job exists, right?
Also, if you try to delete a job that has already finished, what would be the result?
@zeitos It seems from the code that deleting a nonexistent job would return a NotFound message to the user. I think the same happens if the job has already finished.
@petro-rudenko Thanks. Hmm... so if we merged it now, a killed job would show up as a job that errored out, is that right? Could that be tested in the unit test? I think this is important enough to merge in soon, and we can deal with the status issues later.
I don't quite understand this line. When I send the cancelJobGroup command to Spark, eventually there will be a SparkException in the job future thread and the logic will end up there. Also, I cannot access the statusActor from JobManagerActorSpec to test that I'll get this message.
@petro-rudenko hm, not sure, I'll have to review the logic. Basically, in line 209 of your JobManagerSpec, in the test case that you wrote, instead of LMK if you'd like a hangout or something to sort stuff out.
OK, got it. JobKilled would come first, because cancelJobGroup is an asynchronous operation; theoretically it's even possible that cancelJobGroup was called but the job was some milliseconds from finishing, so we'd get a JobFinished result.
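The race described above can be sketched in plain Scala. This is a hypothetical resolution rule based on the discussion, not the jobserver implementation; the status names mirror the messages mentioned in this thread:

```scala
// Sketch of the race: cancelJobGroup() only *requests* cancellation, so the
// job thread may still complete normally before the cancel takes effect.
sealed trait FinalStatus
case object JobKilled     extends FinalStatus
case object JobFinished   extends FinalStatus
case object JobErroredOut extends FinalStatus

def resolveStatus(killRequested: Boolean,
                  outcome: Either[Throwable, Any]): FinalStatus =
  outcome match {
    case Right(_)                 => JobFinished   // finished before the cancel landed
    case Left(_) if killRequested => JobKilled     // SparkException caused by our cancel
    case Left(_)                  => JobErroredOut // genuine failure
  }
```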
I really need this functionality for my work. What about merging this pull request?
@petro-rudenko I think we will merge this. I don't see any reason not to, just minor lingering issues that can be addressed in the future. Thanks!
Will this get backported to earlier versions?
Probably won't have time for it myself, but feel free to submit a PR for it.