Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-25550] Hard kill #141

Merged
merged 10 commits into from Oct 26, 2015

Conversation

Projects
None yet
6 participants
@jglick
Copy link
Member

commented Jun 17, 2015

JENKINS-25550

  • basic implementation
  • remove special case handling in DurableTaskStep and (cf. #176) AbstractSynchronousNonBlockingStepExecution
  • make second abort attempt forcibly stop just running steps
  • confirmation UI
  • some problem with zombie builds coming back to life after restart

@reviewbybees

@oleg-nenashev

This comment has been minimized.

Copy link
Member

commented Jun 17, 2015

If I understand correctly, this thing does not abort the flow execution, hence there may be collisions between the hanging step and the newly launched one if they somehow use a shared resource.

@jglick

This comment has been minimized.

Copy link
Member Author

commented Jun 17, 2015

This only happens after the flow execution has been “softly” aborted twice and failed to exit, generally due to a failure of a StepExecution.stop to respond reliably. There are probably undesirable effects associated with a hard kill in some circumstances, but the alternative is very bad. A future revision of CpsFlowExecution.abort might do a better job of aborting steps within the language.

@oleg-nenashev

This comment has been minimized.

Copy link
Member

commented Jun 25, 2015

I have a weak 👎 regarding such solution.
What about adding "Hard kill" action with a confirmation dialog, which would allow users to abort the build if they are sure and take the responsibility for all effects, which may happen?

@jglick

This comment has been minimized.

Copy link
Member Author

commented Jun 27, 2015

That might be a better UI, assuming the option appeared only after a regular Executor.stop had been run at least once and (say) fifteen seconds had elapsed without the build ending on its own.

@jglick jglick changed the title [JENKINS-25550] Hard kill [WiP] [JENKINS-25550] Hard kill Jun 27, 2015

JenkinsRuleExt.waitForCompletion(b);
r.assertBuildStatus(Result.ABORTED, b);
}
public static class Zombie extends AbstractStepImpl {

This comment has been minimized.

Copy link
@KostyaSha

KostyaSha Jul 7, 2015

Member

add new line. With @Test without new lines this block of code looks very unparsable by eyes.

This comment has been minimized.

Copy link
@jglick

jglick Aug 11, 2015

Author Member

Prefer to keep all the lines relating to this one test case together.

@jglick jglick changed the title [WiP] [JENKINS-25550] Hard kill [JENKINS-25550] Hard kill Aug 4, 2015

@jglick jglick referenced this pull request Aug 13, 2015

Merged

[JENKINS-29875] New API for long lived synchronous steps #176

4 of 4 tasks complete

jglick added some commits Oct 21, 2015

Merge branch 'master' into kill-JENKINS-25550
Conflicts:
	CHANGES.md
	aggregator/src/test/java/org/jenkinsci/plugins/workflow/WorkflowRunTest.java
StepExecution.stop implementations should not call StepContext.onFail…
…ure if they have already notified something which ought to terminate and produce a callback.

Rather the Workflow engine should be responsible for harder kills.
For some implementations it makes sense to immediately fail the step because we are not expecting any further events.
Strengthening test coverage to demonstrate that the build really comp…
…letes as ABORTED after one Executor.interrupt call.
Switching to a multitiered kill system.
Clicking the stop button just sends a polite interrupt (StepExecution.stop).
After 15s, if the build is still running, it also shows a hyperlink allowing StepContext.onFailure.
After another 15s, if the build is *still* running, you can do the hard kill.
@reviewbybees

This comment has been minimized.

Copy link

commented Oct 22, 2015

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

getContext().onFailure(cause);
}
FilePath workspace = getWorkspace();
if (workspace != null) {

This comment has been minimized.

Copy link
@amuniz

amuniz Oct 23, 2015

Member

Under what conditions could this be false?

This comment has been minimized.

Copy link
@jglick

jglick Oct 23, 2015

Author Member

That the slave is offline.

(This code was already there, of course, just reindented.)

@@ -65,7 +65,6 @@ public void stop(Throwable cause) throws Exception {
if (task != null) {
task.cancel(true);
}
getContext().onFailure(cause);

This comment has been minimized.

Copy link
@amuniz

amuniz Oct 23, 2015

Member

Where is this called now? In doTerm? So any AbstractSynchronousNonBlockingStepExecution is now virtually not cancellable without requiring a click on doTerm link, right?

This comment has been minimized.

Copy link
@jglick

jglick Oct 23, 2015

Author Member

The regular abort button should call task.cancel(true) which ought to wind up calling Thread.interrupt, as SynchronousNonBlockingStepTest still demonstrates. The doTerm fallback should only be necessary if that thread ignores interrupts (or catches it but then retries its task, etc.).

@amuniz

This comment has been minimized.

Copy link
Member

commented Oct 23, 2015

🐝

jglick added a commit that referenced this pull request Oct 26, 2015

@jglick jglick merged commit 5aca627 into jenkinsci:master Oct 26, 2015

1 check passed

Jenkins This pull request looks good
Details

@jglick jglick deleted the jglick:kill-JENKINS-25550 branch Oct 26, 2015

@cnmcavoy

This comment has been minimized.

Why isn't this >=? If there has been more than 3 interruptions, there is no way for someone to "undo" one and get back to 3 to hard kill a job.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.