Add job preemption details to Summit User Guide #67

Closed
jack-morrison opened this issue Sep 13, 2019 · 4 comments · Fixed by #82

@jack-morrison
Contributor

No description provided.

@chrisfuson
Contributor

Jobs must be submitted to the killable queue. Jobs in bins 4 and 5 become eligible for preemption (by jobs in bins 1 and 2) once they have run for their bin's standard maximum walltime. The maximum walltime for the killable queue is 24 hours. For example, a bin 5 job in the killable queue can request a walltime of 24 hours; it becomes eligible for preemption by a job in bin 1 or bin 2 after 2 hours of runtime.
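The rule above can be sketched as a small eligibility check. This is a minimal illustration, not the scheduler's actual configuration: the function names are hypothetical, the 2-hour bin 5 limit is inferred from the example, and the 6-hour bin 4 limit is an assumption (taken from the Summit bin table of the time) that should be checked against the user guide.

```python
# Hypothetical sketch of the killable-queue preemption rule.
# ASSUMPTION: bin 4's standard max walltime is 6 h; bin 5's 2 h is implied
# by the example in the comment above.
STANDARD_MAX_WALLTIME_H = {4: 6.0, 5: 2.0}

# Maximum walltime a killable-queue job may request.
KILLABLE_MAX_WALLTIME_H = 24.0

# Only jobs in these bins may preempt killable-queue jobs.
PREEMPTING_BINS = {1, 2}


def can_request(bin_id: int, walltime_h: float) -> bool:
    """True if a killable-queue job in bin_id may request walltime_h hours."""
    return (bin_id in STANDARD_MAX_WALLTIME_H
            and 0 < walltime_h <= KILLABLE_MAX_WALLTIME_H)


def is_preemptible(bin_id: int, elapsed_h: float, preemptor_bin: int) -> bool:
    """True if a running killable job can be preempted by a job in preemptor_bin."""
    if bin_id not in STANDARD_MAX_WALLTIME_H or preemptor_bin not in PREEMPTING_BINS:
        return False
    # Eligible once the job has run for its bin's standard max walltime.
    return elapsed_h >= STANDARD_MAX_WALLTIME_H[bin_id]


# The example above: a bin 5 job requesting 24 hours.
assert can_request(5, 24.0)
assert not is_preemptible(5, 1.5, preemptor_bin=1)  # still inside its 2 h
assert is_preemptible(5, 2.0, preemptor_bin=2)      # 2 h of runtime reached
assert not is_preemptible(5, 3.0, preemptor_bin=3)  # bin 3 cannot preempt
```

Keeping the per-bin limits in one table makes it easy to see that the killable queue only extends the requestable walltime; the preemption threshold still comes from each bin's standard limit.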

Killable queue jobs requesting fewer than 91 nodes will be rejected at submission time by the submit filter.

jack-morrison self-assigned this Sep 23, 2019
@jack-morrison
Contributor Author

jack-morrison commented Sep 23, 2019

> Killable queue jobs requesting fewer than 91 nodes will be rejected at submission time by the submit filter.

@chrisfuson, why would we want to exclude bin 5 jobs from being preemptible?
This doesn't seem to be in place:

```
summit login5$ bsub -q killable -W 10 -J test -nnodes 1 -PSTF007 -Is $SHELL
Job <649324> is submitted to queue <killable>.
<<Waiting for dispatch ...>>
<<Starting on batch4>>
```

@chrisfuson
Contributor

@jack-morrison node requests below 91 are eligible. This should be bins 4 and 5.

```
summit-login5 237> bsub -q killable -W 10 -J test -nnodes 92 -PSTF007 -Is $SHELL

Batch job not submitted. Please address the following and resubmit:

** Requested nodes (92) over limit. Please resubmit with a node count below 91.

Please contact help@olcf.ornl.gov if you need assistance.

Request aborted by esub. Job not submitted.
summit-login5 238>
```

@jack-morrison
Contributor Author

Ah, so it's requests over 91 nodes that the filter rejects. Got it 👍
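The node-count check the thread settles on can be sketched as follows. This is a hypothetical illustration, not the actual esub filter: the function name and error text are made up, and treating 91 as the inclusive upper bound is an assumption, since the esub message rejects 92 but asks for "a node count below 91", leaving the handling of exactly 91 nodes unstated.

```python
from typing import Optional

# ASSUMPTION: 91 nodes is the largest accepted killable-queue request
# (the top of the bin 4/bin 5 range); the thread does not pin down 91 itself.
MAX_KILLABLE_NODES = 91


def check_killable_request(nnodes: int) -> Optional[str]:
    """Return an error message if the request would be rejected, else None.

    Hypothetical stand-in for the submit filter; the message wording is
    illustrative and not the real esub output.
    """
    if nnodes < 1:
        return "Node count must be at least 1."
    if nnodes > MAX_KILLABLE_NODES:
        return (f"Requested nodes ({nnodes}) over the killable-queue "
                f"limit of {MAX_KILLABLE_NODES}.")
    return None


print(check_killable_request(1))   # accepted, like the 1-node job above
print(check_killable_request(92))  # rejected, like the 92-node request
```

A 1-node request returns `None`, consistent with the job accepted earlier in the thread, and a 92-node request returns an error, matching the esub rejection.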
