flux-wreck cancel: fall back to kill -9 if job is not pending #1385

Merged
garlick merged 3 commits into flux-framework:master from grondo:issue#1379 on Mar 24, 2018

5 participants
@grondo
Contributor

grondo commented Mar 23, 2018

This addresses issue #1379 with a couple of minor changes to flux wreck cancel. When sched is loaded and the cancel request fails with EINVAL, cancel now falls back to the equivalent of flux wreck kill -9 on the job.

I also added a -f, --force option which allows flux wreck cancel to work even if sched is not loaded, mainly so the code could be tested in t2000-wreck.t.
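The decision logic described above can be modeled roughly as follows. This is a Python sketch, not the Lua code in the PR: `cancel`, `sched_cancel`, and `send_kill` are hypothetical stand-ins for the command handler, the sched.cancel RPC, and the wreck.N.kill event. It also assumes that a request to an unloaded sched fails with ENOSYS and that cancel reports an error in that case unless --force is given; neither detail is stated in this thread.

```python
import errno

SIGKILL = 9


def cancel(job_id, sched_cancel, send_kill, force=False):
    """Model of the cancel fallback; returns a label for the action taken.

    sched_cancel(job_id) returns 0 on success or an errno value
    (hypothetical stand-in for the sched.cancel RPC).
    send_kill(job_id, signal) stands in for publishing wreck.<id>.kill.
    """
    rc = sched_cancel(job_id)
    if rc == 0:
        # Job was pending and the scheduler cancelled it.
        return "cancelled"
    if rc == errno.EINVAL:
        # Scheduler says the job is not pending: fall back to kill -9.
        send_kill(job_id, SIGKILL)
        return "killed"
    if rc == errno.ENOSYS and force:
        # Assumption: sched not loaded shows up as ENOSYS; --force kills anyway.
        send_kill(job_id, SIGKILL)
        return "killed"
    raise RuntimeError("cancel failed: " + errno.errorcode.get(rc, str(rc)))
```

Passing the RPC and the event publisher in as callables keeps the sketch testable without a running Flux instance.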

elseif err == "Invalid argument" then
prog:log ("Sending SIGKILL to %d\n", id)
local rc, err = f:sendevent ({signal = 9}, "wreck.%d.kill", id)
if not rc then self:die ("signal: %s\n", err) end

@dongahn

dongahn Mar 23, 2018

Contributor

Nice work!

If the sched.cancel RPC responds with EINVAL whenever the target job is essentially not pending (including running and completed jobs), hopefully the kill -9 fallback has no side effects.

@grondo

grondo Mar 23, 2018

Author Contributor

Yeah, I was lazy and did not check the real state of the job here. The wreck.N.kill event is ignored if the job isn't running.
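To illustrate why the unconditional fallback is harmless, here is a simplified Python model of the receiving side. The name `on_kill_event` and the job record layout are hypothetical, and the immediate transition to "complete" is a simplification; the only behavior taken from this thread is that the kill event is ignored unless the job is running.

```python
def on_kill_event(job, signal):
    """Apply a kill event to a job record (a dict with a 'state' key).

    Hypothetical model: jobs that are not running are returned unchanged,
    so sending the event to a completed or pending job has no effect.
    """
    if job["state"] != "running":
        return job  # event ignored
    # Simplification: treat the job as complete once its tasks are signaled.
    return {**job, "state": "complete", "exit": f"exited with signal {signal}"}
```

Under this model, a second cancel on an already completed job sends another event but changes nothing.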

@dongahn

dongahn Mar 23, 2018

Contributor

OK. Great. LGTM otherwise.

@grondo

grondo Mar 23, 2018

Author Contributor

Thanks for the review @dongahn!

@grondo

Contributor Author

grondo commented Mar 23, 2018

Example:

$ flux submit -n 32 sleep 100
submit: Submitted jobid 1
$ flux submit -n 32 sleep 100
submit: Submitted jobid 2
$ flux wreck ls
    ID NTASKS STATE                    START      RUNTIME    RANKS COMMAND
     1     32 running    2018-03-23T14:56:24       3.490s    [0-1] sleep
     2     32 submitted  2018-03-23T14:56:27       0.000s      nil sleep
$ flux wreck cancel 2
$ flux wreck cancel 1
wreck: Sending SIGKILL to 1
$ flux wreck status 1
Job 1 status: complete
task[0-31]: exited with signal 9
$ flux wreck status 2
Job 2 status: cancelled
@coveralls


coveralls commented Mar 23, 2018

Coverage Status

Coverage decreased (-0.04%) to 78.781% when pulling 69c228b on grondo:issue#1379 into 4223879 on flux-framework:master.

@codecov-io


codecov-io commented Mar 23, 2018

Codecov Report

Merging #1385 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1385      +/-   ##
==========================================
- Coverage   78.49%   78.49%   -0.01%     
==========================================
  Files         162      162              
  Lines       29741    29741              
==========================================
- Hits        23345    23344       -1     
- Misses       6396     6397       +1

Impacted Files                    Coverage Δ
src/common/libflux/keepalive.c    86.66% <0%> (-6.67%) ⬇️
src/common/libflux/request.c      87.17% <0%> (-1.29%) ⬇️
src/common/libutil/base64.c       95.07% <0%> (-0.71%) ⬇️
src/broker/overlay.c              74.14% <0%> (-0.32%) ⬇️
src/common/libflux/message.c      81.36% <0%> (+0.11%) ⬆️
src/common/libutil/dirwalk.c      94.28% <0%> (+0.71%) ⬆️
src/common/libflux/rpc.c          94.21% <0%> (+0.82%) ⬆️

grondo added some commits Mar 23, 2018

wreck: kill running jobs in flux-wreck cancel
Fall back to kill -9 for running jobs in flux-wreck cancel.

Fixes issue #1379

wreck: add -f, --force option to flux-wreck cancel
Add -f, --force option to flux-wreck cancel to force kill with SIGKILL
running jobs even when sched is not loaded.

t/t2000-wreck: test flux-wreck cancel -f
Test --force option of flux-wreck cancel.

@grondo grondo force-pushed the grondo:issue#1379 branch from f534f87 to 69c228b Mar 23, 2018

@garlick

Member

garlick commented Mar 23, 2018

LGTM. Will merge after travis finishes.

@garlick garlick merged commit 7fd643e into flux-framework:master Mar 24, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls Coverage decreased (-0.04%) to 78.781%

@grondo grondo deleted the grondo:issue#1379 branch Apr 26, 2018

@grondo grondo referenced this pull request May 10, 2018

Closed

0.9.0 Release #1479
