cli: non-service jobs on job restart -reschedule #19147

Merged
merged 4 commits into from Nov 29, 2023

Conversation

lgfa29
Contributor

@lgfa29 lgfa29 commented Nov 22, 2023

The -reschedule flag stops allocations and assumes the Nomad scheduler will create new allocations to replace them. But this is only true for service jobs.

Restarting non-service jobs with the -reschedule flag causes the command to loop forever waiting for the allocations to be replaced, which never happens.

Allocations for system jobs may be replaced by triggering an evaluation after each stop to cause the reconciler to run again.

Batch and sysbatch jobs should not be allowed to be rescheduled as they are expected to run to completion unless stopped.

Closes #19043

Member

@jrasell jrasell left a comment

LGTM, although the test failure seems like it could be related and there is a merge conflict 😬

The `-reschedule` flag stops allocations and assumes the Nomad scheduler
will create new allocations to replace them. But this is only true for
service jobs.

Restarting non-service jobs with the `-reschedule` flag causes the
command to loop forever waiting for the allocations to be replaced,
which never happens.

Allocations for system jobs may be replaced by triggering an evaluation
after each stop to cause the reconciler to run again.

Batch and sysbatch jobs should not be allowed to be rescheduled as they
are expected to run to completion unless stopped.
Member

@Juanadelacuesta Juanadelacuesta left a comment

LGTM :)

Batch jobs are able to be rescheduled without further steps, but
sysbatch jobs never receive a replacement allocation so they are still
not allowed to be rescheduled.
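
For readers skimming the thread, the per-job-type behavior that the commits converge on can be sketched roughly as below. This is an illustrative sketch only, not the code from this PR: `planReschedule`, `rescheduleAction`, the two action constants, and `errRescheduleNotSupported` are hypothetical names, and the real command works with Nomad's API types rather than plain strings.

```go
package main

import (
	"errors"
	"fmt"
)

// rescheduleAction is a hypothetical type describing what the CLI has to do
// for each allocation when `job restart -reschedule` is used.
type rescheduleAction int

const (
	// actionStopOnly: stop the allocation and let the scheduler replace it.
	actionStopOnly rescheduleAction = iota
	// actionStopAndEvaluate: stop the allocation, then trigger an evaluation
	// so the reconciler runs again and places a replacement.
	actionStopAndEvaluate
)

// errRescheduleNotSupported is a hypothetical error for job types whose
// allocations are never replaced by the scheduler.
var errRescheduleNotSupported = errors.New("job type cannot be rescheduled")

// planReschedule maps a job type to the action described in this thread:
// service and batch jobs are replaced without further steps, system jobs
// need an evaluation after each stop, and sysbatch jobs are rejected.
func planReschedule(jobType string) (rescheduleAction, error) {
	switch jobType {
	case "service", "batch":
		return actionStopOnly, nil
	case "system":
		return actionStopAndEvaluate, nil
	case "sysbatch":
		return 0, errRescheduleNotSupported
	default:
		return 0, fmt.Errorf("unknown job type %q", jobType)
	}
}
```

Rejecting sysbatch jobs up front, instead of stopping allocations and waiting, is what avoids the endless wait described in the PR description.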
nvanthao pushed a commit to nvanthao/nomad that referenced this pull request Mar 1, 2024
The `-reschedule` flag stops allocations and assumes the Nomad scheduler
will create new allocations to replace them. But this is only true for
service and batch jobs.

Restarting non-service jobs with the `-reschedule` flag causes the
command to loop forever waiting for the allocations to be replaced,
which never happens.

Allocations for system jobs may be replaced by triggering an evaluation
after each stop to cause the reconciler to run again.

Sysbatch jobs should not be allowed to be rescheduled as they are never
replaced by the scheduler.
Labels
backport/1.5.x: backport to 1.5.x release line
backport/1.6.x: backport to 1.6.x release line
Development

Successfully merging this pull request may close these issues.

Can't reschedule "system" type job
3 participants