Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 698 - Implement nomad restart JOBID and nomad run -restart jobfile #3949

Closed
wants to merge 2 commits into from

Conversation

maihde
Copy link
Contributor

@maihde maihde commented Mar 7, 2018

Implements nomad restart JOBID and nomad run -restart jobfile.nomad. Also includes one commit for a minor comment error.

@maihde
Copy link
Contributor Author

maihde commented May 14, 2018

I haven't heard anything about this PR. This functionality seems to be desired by many people...are the deficiencies that need to be addressed before this can be merged?

@groggemans
Copy link
Contributor

@maihde thx for implementing this great feature! Looking at the travis logs it seems there are a few tests that still need to be updated.

@dadgar Any other stuff that needs to be changed before this can be merged?

This feature is useful because there are instances where you wish to
restart the existing job definition, triggering any update/canary logic
that exists.

There are two ways to restart a job:

`nomad run -restart jobfile.nomad`

If the job already exists, a restart/update will be triggered even if
the job definition is identical to the previous version.  If the job
does not exist it will be registered and executed.

`nomad restart jobid`

This will restart an existing job.  This creates a new job version and
triggers the update/canary logic.
@maihde
Copy link
Contributor Author

maihde commented May 29, 2018

@groggemans thanks for the feedback! I've updated the PR to resolve the tests that were failing due to this change. The remaining failed tests seem to also be failing on master so I haven't done anything to resolve those.

@shantanugadgil
Copy link
Contributor

This is quite a useful feature indeed, and I hope it gets merged soon.
Also, would it be possible to change the title, to help in searches and such!
Cheers,
Shantanu

@maihde maihde changed the title Issue 698 Issue 698 - Implement nomad restart JOBID and nomad run -restart jobfile Jun 1, 2018
@maihde
Copy link
Contributor Author

maihde commented Jun 1, 2018

@shantanugadgil I updated the title, thanks for the feedback and support!

@42wim
Copy link
Contributor

42wim commented Jun 22, 2018

Any comments on this PR from the hashicorp folks? @dadgar

@maihde
Copy link
Contributor Author

maihde commented Jul 13, 2018

@dadgar is there anything I need to do to help get this merged into the mainline? Thanks!

@dadgar
Copy link
Contributor

dadgar commented Jul 17, 2018

Hey @maihde,

Awesome work on this and sorry for the long delay giving you a response. This is definitely a feature the team wants and we actually planned on implementing in 0.9.X. Unfortunately the PR as it is doesn't quite address all the points we would want. The issues I see:

  1. Extensibility

a) Force
It would be nice to allow the operator to be able to do $ nomad job restart -force <job> in which the update stanza is ignored.

b) Rolling Restart
For jobs that do not define a update stanza, ability to do a rolling restart: $ nomad job restart -rolling-interval=30s -rolling-parallel=2 <job>

* Both CLI syntaxes are just spitball ideas

  1. Shadow Restart Field

From a project perspective, the job struct should define the users intention on what should be run. The restart field is a transition but is not the desired state, and thus shouldn't be in the job spec.

From a technical perspective, this works currently because any time the user submits a new job the restart field is cleared. In future releases of Nomad, there will likely be the job and a set of job variables that are interpolated in to define the final job. Thus we can imagine that the job doesn't change every time the user makes a change (changes a variable). Thus we want to capture the restart outside of the job spec.

This is part of the reason getting you a review has taken a while since we didn't quite have the mechanism to do this until more recently: https://github.com/hashicorp/nomad/blob/master/nomad/structs/structs.go#L5669


As for next steps, unfortunately the PR as it is can't be merged and the delta between what we would like is quite large. If this is something you are interested in taking on let me know and we can potentially collaborate on an RFC before more code is written. Alternatively, given the amount of work, the team is also happy to implement this in 0.9.X.

Thanks for the patience,
Alex

@maihde
Copy link
Contributor Author

maihde commented Jul 17, 2018 via email

@the-maldridge
Copy link

Any news on if this made it into 0.9? Prometheus managed by nomad is a great example of something that needs a periodic kick to read new config files, which nomad can't currently do.

@preetapan
Copy link
Member

@maihde @the-maldridge We definitely appreciate your work. I am closing this PR since its outdated, and also based on @dadgar's comment above. Job restart is slated for 0.9.2 or 0.9.3. We are also implementing allocation restart/task restart before that in the next point release after 0.9 is out.

@preetapan preetapan closed this Feb 23, 2019
@the-maldridge
Copy link

Thanks for the update, I'll keep an eye out for 0.9.x.

@jpclarks
Copy link

@preetapan Do you have any updates on a release date for job restart?

@endocrimes
Copy link
Contributor

@jpclarks nomad alloc stop supports restarting individual allocations, but we didn't yet get to implementing it across all allocations in a job.

@jpclarks
Copy link

Thank you. I noticed the recent addition of support for alloc restart, but we really need the ability to restart jobs as described in #698.

@endocrimes
Copy link
Contributor

It's still on the roadmap - but in the meantime you should be able to use the api to write a rolling job restarter until we get it into nomad core with the alloc restart api

@github-actions
Copy link

github-actions bot commented Feb 5, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants