
Add ability to restart running tasks/jobs #698

Open
supernomad opened this Issue Jan 22, 2016 · 32 comments

@supernomad

commented Jan 22, 2016

I would love the ability to restart tasks: at the very least an entire job, but preferably single allocations. This is very useful when a particular allocation or job gets into a bad state.

I am thinking something like nomad restart <job> or nomad alloc-restart <alloc-id>.

One of my specific use cases is that I have a cluster of RabbitMQ nodes, and at some point one of the nodes gets partitioned from the rest of the cluster. I would like to restart that specific node (an allocation in Nomad parlance), or be able to perform a rolling restart of the entire cluster (a job in Nomad parlance).

Does this sound useful?

@dadgar

Contributor

commented Jan 22, 2016

It's not a bad idea! In the meantime, if you just want to restart the job you can stop it and then run it again.
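
For reference, a minimal sketch of that stop-and-run workaround (the job name and file are hypothetical):

    # Stop the running job, then re-submit it from its job file.
    # Note: the gap between the two commands means downtime.
    nomad stop example
    nomad run example.nomad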

@mkabischev


commented Feb 6, 2016

I think it would be a good feature. Right now I can stop and then run the job, but it won't be graceful.

@gpaggi


commented Apr 19, 2016

+1
Another use case: most of our services read their configuration either from static files or Consul, and when any of the properties change the services need to be rolling-restarted.
Stopping and starting the job would cause a service interruption, and a blue/green deployment for a configuration change is a bit overkill.

@supernomad did you get a chance to look into it?

@jtuthehien


commented May 24, 2016

+1 for this feature

@c4milo

Contributor

commented Jun 14, 2016

This is much needed in order to effectively reload configurations without downtime. As mentioned above, blue/green doesn't really scale well when you have too many tasks, and it is somewhat unpredictable since it depends on the specific app being deployed playing well with multiple versions of itself running at the same time.

@liclac


commented Jul 14, 2016

I'd very much like to see this, for a slightly different use case:

I have something running as a system job (in this case, a wrapper script that essentially does docker pull ... && docker run ...; it needs to mount a host directory to work, which is a workaround for #150). To roll out an update, I currently need to change a dummy environment variable, or Nomad won't know anything changed.
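
A minimal sketch of that dummy-variable trick (the task name, driver, and variable name are hypothetical):

    task "puller" {
      driver = "raw_exec"
      env {
        # Bump this value on each rollout so Nomad sees the job as changed
        # and schedules new allocations.
        DEPLOY_NONCE = "2016-07-14-1"
      }
      # ... wrapper script config elided ...
    }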

@mohitarora


commented Aug 22, 2016

+1

@dennybaa


commented Sep 15, 2016

Why not? Please add it; it should be trivial.

@jippi

Contributor

commented Sep 27, 2016

👍 on this feature as well :)

@dadgar added the enhancement label and removed the thinking label Sep 27, 2016

@xyzjace


commented Jan 16, 2017

👍 For us, too.

@ashald


commented Jan 26, 2017

We would be happy to see this feature as well. Sometimes... services just need a manual restart. :( Would be nice if it was possible to restart individual tasks or task groups.

@rokka-n


commented Jan 26, 2017

Having a rolling "restart" option is a very valid use case for tasks/jobs.

@jippi

Contributor

commented Jan 26, 2017

What I've done as a hack is to have a keyOrDefault inline template{} stanza in the task stanza for each of these keys, simply writing them to some random temp file:

  • apps/${NOMAD_JOB_NAME}
  • apps/${NOMAD_JOB_NAME}/${NOMAD_TASK_NAME}
  • apps/${NOMAD_JOB_NAME}/${NOMAD_TASK_NAME}/${NOMAD_ALLOC_INDEX}
  • apps/${NOMAD_ALLOC_NAME}

Each template gets change_mode = "restart" or "signal" with the appropriate change_signal value.

So I can do a manual rolling restart of any Nomad task by simply changing or creating one of those Consul keys in my cluster programmatically... at my own pace, for a controlled restart too :)

Writing to Consul KV apps/${NOMAD_JOB_NAME} will restart all tasks in the job.
Writing to Consul KV apps/${NOMAD_JOB_NAME}/${NOMAD_TASK_NAME} will restart all instances of that task within the job.
Writing to Consul KV apps/${NOMAD_JOB_NAME}/${NOMAD_TASK_NAME}/${NOMAD_ALLOC_INDEX} will restart one specific task index within the job.
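
As a rough illustration, one such template stanza might look like this (a sketch; the destination path is an assumption based on the description above):

    template {
      # Render the per-job trigger key into a throwaway file; any change to
      # the key's value re-renders the file, which restarts the task.
      data        = "{{ keyOrDefault (printf \"apps/%s\" (env \"NOMAD_JOB_NAME\")) \"\" }}"
      destination = "local/restart-trigger"
      change_mode = "restart"
    }

Triggering a restart is then a single Consul write, e.g. consul kv put apps/myjob "$(date)" (the job name is hypothetical).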

@ashald


commented Jan 26, 2017

@jippi that's super smart! Thanks, I guess I'll use that for the time being. :)

But that level of control is something that would be great to see in Nomad's native API.

P.S.: That reminds me of my hack/workaround to secure any resource in Nginx (e.g., the Nomad API) using Consul ACL tokens with auth_request against some read-only API endpoints. :D

@pznamensky


commented Aug 29, 2017

Would be useful for us too.

@dansteen


commented Sep 6, 2017

This would also be useful for the new deployment stuff. The ability to re-trigger a deployment would be great.

@JewelPengin


commented Sep 6, 2017

Throwing in my +1, but also my non-Consul-based brute-force way:

    # Base URL of the Nomad HTTP API (placeholders as given).
    export NOMAD_ADDR=http://[server-ip]:[admin-port]

    # Fetch the job, set the first task group's count to 0, and re-submit it.
    curl $NOMAD_ADDR/v1/job/:jobId | jq '.TaskGroups[0].Count = 0 | {"Job": .}' | curl -X POST -d @- $NOMAD_ADDR/v1/job/:jobId

    # Give the allocations a moment to stop.
    sleep 5

    # Scale back up to 1 (or whatever count is needed).
    curl $NOMAD_ADDR/v1/job/:jobId | jq '.TaskGroups[0].Count = 1 | {"Job": .}' | curl -X POST -d @- $NOMAD_ADDR/v1/job/:jobId

It requires the jq binary to be installed (which I would highly recommend anyway), but it will first grab the job, modify the task group count to 0, post it back to apply the update, then do it all over again with the count back at 1 (or whatever number is needed).

Again, kinda brute force and not as elegant as @jippi's, but it works if I need to get something done quickly.

@danielwpz


commented Sep 14, 2017

Really useful feature! Please do it :D

@sullivanchan


commented Sep 19, 2017

I have done some verification following @jippi's suggestion, with data = "{{ key apps/app1/app1/${NOMAD_ALLOC_INDEX} }}" in the template stanza, but the job start is always pending. It seems environment variables are only available via {{ env "ENV_VAR" }} (see https://www.nomadproject.io/docs/job-specification/template.html#inline-template). I want to know how to interpolate an environment variable into the key string. Does anybody have the same question?

@mildred

Contributor

commented Sep 19, 2017

This is a standard Go template:

          {{keyOrDefault (printf "apps/app1/app1/%s" (env "NOMAD_ALLOC_INDEX")) ""}}
@mildred

Contributor

commented Sep 19, 2017

I suggest you use keyOrDefault instead of just key; plain key will prevent your service from starting unless the key exists in Consul.
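
A minimal illustration of the difference (the key path is illustrative):

    {{/* Blocks rendering, and therefore task startup, until the key exists: */}}
    {{ key "apps/app1/app1/0" }}

    {{/* Renders immediately, using the default when the key is missing: */}}
    {{ keyOrDefault "apps/app1/app1/0" "" }}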

@vtorhonen


commented Feb 22, 2018

As a workaround I've been using Nomad's meta stanza to control restarts. Meta keys are populated as environment variables in tasks, so whenever a meta block is changed all related tasks (or task groups) are restarted. Meta blocks can be defined at the top level of the job, per task group, or per task.
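
As a rough sketch of where meta can live (job, group, and task names are illustrative):

    job "some-job" {
      meta {
        # Job-level: changing this value makes the whole job look updated.
        restarted_at = "never"
      }
      group "web" {
        meta {
          # Group-level: affects only this task group.
          restarted_at = "never"
        }
        task "app" {
          meta {
            # Task-level: affects only this task.
            restarted_at = "never"
          }
        }
      }
    }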

For example, to restart all tasks in all task groups you could run this:

$ nomad inspect some-job | \
jq --arg d "$(date)" '.Job.Meta={restarted_at: $d}' | \
curl -X POST -d @- nomad.service.consul:4646/v1/jobs

This honors the update stanza as well.

@maihde

Contributor

commented Mar 2, 2018

I have made a first pass at implementing this; you can find my changes here.

Basically, I've added a -restart flag to nomad run. For example:

nomad run -restart myjob.nomad

When the -restart flag is applied it triggers an update, the same as if you had changed the meta block, so you get the benefits of canaries and rolling restarts without having to actually change the job file.

If there is agreement that this implementation is going down the right path, I will go to the trouble of writing tests and making sure it works for the system scheduler, parameterized jobs, etc.

@jovandeginste


commented Mar 2, 2018

Why not implement this without the need for a plan? Basically, nomad restart myjobname (which should use the current plan)

As a sysop, I sometimes need to force a restart of a job, but I don't have the plan (and don't want to go through nomad inspect | parse)

@rkettelerij

Contributor

commented Mar 2, 2018

Agreeing with @jovandeginste here. A restart shouldn't need a job definition in my opinion, since the job definition is already known inside Nomad.

@jovandeginste


commented Mar 2, 2018

I do see the case for re-submitting an existing job with a plan that may or may not have changed while always forcing a restart (of the whole job) on submission. So both are interesting options.

@maihde

Contributor

commented Mar 4, 2018

I just pushed code to my fork that adds support for nomad restart JOBID in addition to nomad run -restart JOBFILE. This new code should address the request from @jovandeginste.

@rkettelerij

Contributor

commented Mar 24, 2018

@maihde looks great; are you planning to open a PR from your fork?

@maihde

Contributor

commented Mar 29, 2018

Here it is (#3949)

@marcosnils


commented Aug 17, 2018

It's not a bad idea! In the meantime, if you just want to restart the job you can stop it and then run it again.

@dadgar Is there a way to do this without incurring downtime? Stopping and running the job won't honor the update stanza.

@maihde

Contributor

commented Aug 17, 2018

@marcosnils the workaround I've used is placing something in the meta stanza that can be changed, as described in this post:

#698 (comment)

Of course this is kind of annoying, hence the reason I made the pull request that adds the restart behavior directly.
