
Rolling restart of pods #13488

Open
ghodss opened this Issue Sep 2, 2015 · 72 comments


@ghodss
Member

ghodss commented Sep 2, 2015

kubectl rolling-update is useful for incrementally deploying a new replication controller. But if you have an existing replication controller and want to do a rolling restart of all the pods that it manages, you are forced to do a no-op update to an RC with a new name and the same spec. It would be useful to be able to do a rolling restart without needing to change the RC or to give the RC spec, so anyone with access to kubectl could easily initiate a restart without worrying about having the spec locally, making sure it's the same/up to date, etc. This could work in a few different ways:

  1. A new command, kubectl rolling-restart that takes an RC name and incrementally deletes all the pods controlled by the RC and allows the RC to recreate them.
  2. Same as 1, but instead of deleting each pod, the command iterates through the pods and issues some kind of "restart" command to each pod incrementally (does this exist? is this a pattern we prefer?). The advantage of this one is that the pods wouldn't get unnecessarily rebalanced to other machines.
  3. kubectl rolling-update with a flag that lets you specify an old RC only, and it follows the logic of either 1 or 2.
  4. kubectl rolling-update with a flag that lets you specify an old RC only, and it auto-generates a new RC based on the old one and proceeds with normal rolling update logic.

All of the above options would need the MaxSurge and MaxUnavailable options recently introduced (see #11942), along with readiness checks along the way, to make sure the restart is done without taking down all the pods.
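
For illustration, a rough sketch of what option 1 amounts to by hand today (the RC name my-rc, the app=my-rc selector, and the 5-second poll are placeholders, not a proposed implementation):

# Delete the RC's pods one at a time and let the RC recreate them,
# waiting for the RC to report full readiness before moving on.
for pod in $(kubectl get pods -l app=my-rc -o name); do
  kubectl delete "$pod"
  until [ "$(kubectl get rc my-rc -o jsonpath='{.status.readyReplicas}')" = \
          "$(kubectl get rc my-rc -o jsonpath='{.spec.replicas}')" ]; do
    sleep 5
  done
done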

@nikhiljindal @kubernetes/kubectl

@nikhiljindal

Member

nikhiljindal commented Sep 2, 2015

cc @ironcladlou @bgrant0607

What's the use case for restarting the pods without any changes to the spec?

Note that there won't be any way to roll back the change if pods start failing when they are restarted.

@ghodss

Member

ghodss commented Sep 2, 2015

Whenever services get into some wedged or undesirable state (connections maxed out and now stalled, bad internal state, etc.). It's usually one of the first troubleshooting steps if a service is seriously misbehaving.

If the first pod fails as it is restarted, I would expect the rollout to either stop or keep retrying to start that pod.

@smarterclayton

Contributor

smarterclayton commented Sep 2, 2015

Also, a rolling restart with no spec change reallocates pods across the cluster.

However, I would also like the ability to do this without rescheduling the pods. That could be a rolling label change, but it may pick up new dynamic config or clear the local file state.

@ghodss

Member

ghodss commented Sep 2, 2015

@smarterclayton Is that like my option 2 listed above? Though why would labels be changed?

@bgrant0607

Member

bgrant0607 commented Sep 2, 2015

Re. wedged: That's what liveness probes are for.

Re. rebalancing: see #12140

If we did support this, I'd lump it with #9043 -- the same mechanism is required.

@ghodss

Member

ghodss commented Sep 2, 2015

I suppose this would more be for a situation where the pod is alive and responding to checks but still needs to be restarted. One example is a service with an in-memory cache or internal state that gets corrupted and needs to be cleared.

I feel like asking for an application to be restarted is a fairly common use case, but maybe I'm incorrect.

@bgrant0607

Member

bgrant0607 commented Sep 4, 2015

Corruption would typically affect just one pod, which could simply be killed and replaced by the RC.

The other case mentioned offline was to re-read configuration. That's dangerous to do implicitly, because restarts for any reason would cause containers to load the new configuration. It would be better to do a rolling update to push a new versioned config reference (e.g. in an env var) to the pods. This is similar to what motivated #1353.
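
A minimal sketch of that approach with the tooling available here (the RC names, version label, and CONFIG_VERSION variable are all placeholders):

# Copy the RC manifest, then edit the copy: rename it to my-app-v2, bump a
# version label in both the selector and the pod template, and point
# CONFIG_VERSION at the new config.
kubectl get rc my-app-v1 -o yaml > my-app-v2.yaml
# After editing, roll the pods over to the new RC:
kubectl rolling-update my-app-v1 -f my-app-v2.yaml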

@gmarek

Member

gmarek commented Sep 9, 2015

@bgrant0607 have we decided that we don't want to do this?

@bgrant0607

Member

bgrant0607 commented Sep 9, 2015

@gmarek Nothing, for now. Too many things are underway already.

@bgrant0607 bgrant0607 removed their assignment Sep 9, 2015

@gmarek

Member

gmarek commented Sep 10, 2015

Can we have a post-v1.1 milestone (or something) for the stuff that we deem important but lack the people to fix straight away?

@Glennvd


Glennvd commented Dec 1, 2015

I would be a fan of this feature as well; you don't want to be forced to switch tags for every minor update you want to roll out.

@mbmccoy


mbmccoy commented Dec 31, 2015

I'm a fan of this feature. Use case: Easily upgrade all the pods to use a newly-pushed docker image (with imagePullPolicy: Always). I currently use a bit of a hacky solution: Rolling-update with or without the :latest tag on the image name.

@mbmccoy


mbmccoy commented Jan 8, 2016

Another use case: Updating secrets.

@ericuldall


ericuldall commented Apr 5, 2016

I'd really like to see this feature. We run Node apps on Kubernetes and currently have certain use cases where we restart pods to clear in-app pseudo-caching.

Here's what I'm doing for now:

kubectl get pod | grep 'pod-name' | cut -d " " -f1 - | xargs -n1 -P 10 kubectl delete pod

This deletes pods 10 at a time and works well in a replication controller setup. It does not address concerns like pod allocation or new pods failing to start, but it's a quick solution when needed.

@jonaz


jonaz commented Apr 25, 2016

I would really like to be able to do a rolling restart.
The main reason is that we feed env variables into pods using a ConfigMap, and if we change the config we need to restart the consumers of that ConfigMap.

@paunin


paunin commented May 10, 2016

Yes, there are a lot of cases where you really want to restart a pod/container without any changes inside: configs, caches, reconnecting to external services, etc. I really hope this feature gets developed.

@paunin


paunin commented May 10, 2016

A small workaround (I use Deployments and I want to change configs without real changes to the image/pod); see the sketch below:

  • create a ConfigMap
  • create a Deployment with an ENV variable (you will use it as an indicator for your deployment) in any container
  • update the ConfigMap
  • update the Deployment (change this ENV variable)

k8s will see that the definition of the Deployment has changed and will start the process of replacing the pods.
PS:
if someone has a better solution, please share
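
A compact version of steps 3 and 4, assuming a Deployment named my-deployment, a ConfigMap manifest my-configmap.yaml, and CONFIG_INDICATOR as the indicator variable (all of these names are placeholders):

# Step 3: push the new ConfigMap contents.
kubectl apply -f my-configmap.yaml
# Step 4: bump the indicator env var so the Deployment spec changes and k8s rolls the pods.
kubectl set env deployment/my-deployment CONFIG_INDICATOR="$(date +%s)"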

@Lasim


Lasim commented May 10, 2016

Thank you @paunin

@wombat


wombat commented Jul 28, 2016

@paunin That's exactly the case where we need it currently: we have to change ConfigMap values that are very important to the services and need to be rolled out to the containers within minutes to a few hours. If no deployment happens in the meantime, the containers will all fail at the same time and we will have partial downtime of at least a few seconds.

@bgrant0607

Member

bgrant0607 commented Jun 5, 2018

@gjcarneiro Did you have a RESTART_xxx env var in the configuration you applied, or not? If not, then I'd expect apply to ignore the extra env var in the live state.

cc @apelisse

@bgrant0607 bgrant0607 changed the title Rolling restart of pods belonging to a replication controller Rolling restart of pods Jun 5, 2018

@apelisse

Member

apelisse commented Jun 7, 2018

@gjcarneiro Yeah, the problem with @mattdodge's script is that it uses apply, so the change will be saved in the lastApplied annotation. The script could be fixed by using patch or another method to update the deployment.

@brycesteinhoff


brycesteinhoff commented Jul 19, 2018

Would love to have this feature. It seems very basic and needed.

@alcohol


alcohol commented Jul 26, 2018

No progress here nor on #22368, le sigh :-(

Can anyone recommend a quick and dirty solution to restart a DaemonSet after the mounted ConfigMap has been updated (name is still identical)?

@KIVagant


KIVagant commented Jul 26, 2018

@alcohol, check this one: #13488 (comment)

@alcohol


alcohol commented Jul 26, 2018

Thanks for the tip :-)

@br3ndonland br3ndonland referenced this issue Sep 2, 2018

Merged

Deploy to cloud and implement search #48

@megakoresh


megakoresh commented Sep 7, 2018

Openshift has the concept of deployment triggers, which trigger a rollout on an image change, a webhook, or a configuration change. That would be a very good feature to have in Kubernetes, as would manual rollouts, of course.

Furthermore, a Docker repository has history, so there is no reason why rollback couldn't work: a pod spawned from .spec.template can use the image:tag@digest format when pulling images for containers. Rollback would use the digest of the previous rollout.

@realfresh


realfresh commented Sep 25, 2018

Not sure if I'm understanding correctly. Just in case this helps anybody.

It seems that if you update the value of a label under the pod > template > metadata, then a rolling update takes place after you run kubectl apply -f file.yaml.

So you can always have a label for your version, and whenever you want to do a rolling update, change the version and apply the file.

@gjcarneiro


gjcarneiro commented Sep 25, 2018

Sure, downside is that next time you want to do a deploy, you do kubectl apply -f some.yaml, right? Normally, if nothing changes in some.yaml then nothing gets restarted, that's one of the nicest things about Kubernetes.

But imagine what happens after you change a label to restart a Deployment. At the next normal software deployment you do kubectl apply -f some.yaml as usual, but because the yaml file doesn't contain the same label, the Deployment will get needlessly restarted.

@joelittlejohn


joelittlejohn commented Sep 25, 2018

@gjcarneiro If you don't apply when you make a change, the kubectl.kubernetes.io/last-applied-configuration annotation will not get updated, so the next apply will not cause another restart.

I'm strongly in favour of adding a rolling restart command to kubectl, but in the meantime I'm using the following (based on solutions above):

kubectl patch deployment mydeployment -p '{"spec":{"template":{"spec":{"containers":[{"name":"mycontainer","env":[{"name":"RESTART_","value":"'$(date +%s)'"}]}]}}}}'

Parameterize this and add it as a function in your .bashrc and it's a good interim solution.
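
For example, a possible .bashrc helper along those lines (the function name and arguments are arbitrary):

# Usage: kube-restart <deployment> <container>
kube-restart() {
  local deployment="$1" container="$2"
  kubectl patch deployment "$deployment" -p \
    '{"spec":{"template":{"spec":{"containers":[{"name":"'"$container"'","env":[{"name":"RESTART_","value":"'"$(date +%s)"'"}]}]}}}}'
}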

@gjcarneiro


gjcarneiro commented Sep 25, 2018

Ah, cool, I didn't know that, thanks!

I don't need the bash alias; at my company we made our own web interface for managing Kubernetes using Python+aiohttp, and it already uses patch. I thought about open-sourcing it, just been lazy...

@matti


matti commented Sep 25, 2018

Looks like people are repeating the same workaround solutions in this thread - please read the full thread before posting here

@Macmee


Macmee commented Oct 9, 2018

@joelittlejohn I ran your macro and it DID trigger a reboot of my pods, but they all restarted at the same time. I thought this would trigger a rolling reboot, no?

@joelittlejohn


joelittlejohn commented Oct 9, 2018

@Macmee It depends on the configuration of your deployment. The above command just changes the deployment; the pods are then updated according to the rollout strategy defined by your deployment. This is just like any other change to the deployment.

The only way this will replace all pods at the same time is if your .spec.strategy.rollingUpdate.maxUnavailable allows it.
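
A quick way to check that setting, and to tighten it if needed before triggering the restart (the deployment name and values are illustrative, and this assumes the strategy type is already RollingUpdate, the default):

# Show the current rolling update constraints.
kubectl get deployment mydeployment -o jsonpath='{.spec.strategy.rollingUpdate}'
# Replace at most one pod at a time, with at most one extra pod during the roll.
kubectl patch deployment mydeployment -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":1,"maxSurge":1}}}}'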

@japzio


japzio commented Oct 19, 2018

We also kind of need this feature. One use case on our side is that we use spring-cloud-config-server, backed by an SCM, for our Spring Boot apps. When we change a configuration property, the Spring Boot app needs to be restarted to be able to fetch the new config, so we also need this kind of graceful restart trigger without having to do a redeployment.

@gsf


gsf commented Oct 19, 2018

@japzio As suggested by Helm, putting a checksum of the ConfigMap in the pod template annotations is a good way to handle that case.
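
Outside of Helm, a rough shell equivalent of that trick might look like this (the ConfigMap and Deployment names, and the annotation key, are placeholders):

# Stamp a hash of the ConfigMap contents into the pod template so the
# Deployment only rolls when the config actually changes.
SUM=$(kubectl get configmap my-config -o yaml | sha256sum | cut -d' ' -f1)
kubectl patch deployment my-deployment -p '{"spec":{"template":{"metadata":{"annotations":{"checksum/config":"'"$SUM"'"}}}}}'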

@bholagabbar-mt


bholagabbar-mt commented Dec 6, 2018

Has there been any update on this? We're looking to have this feature too. @bgrant0607 @nikhiljindal

@bgrant0607

Member

bgrant0607 commented Dec 8, 2018

@bholagabbar-mt What is your use case?

cc @kow3ns @janetkuo

@bholagabbar-mt


bholagabbar-mt commented Dec 11, 2018

@bgrant0607 @kow3ns @janetkuo The use case for our systems is manifold.

  1. Secrets updating - I'm sure you realise that there are many companies, such as mine, that have built their own abstractions over Kubernetes. We have our own container management system that is orchestrated over Kubernetes, so the Helm secret suggestion and others are not applicable. To reload secrets from ConfigMaps in the dev cluster, we have to force-kill the pods, resulting in a few seconds of downtime. This should not be the case, and it is a real use case for a rolling restart.

  2. This is a bit more involved, but the overall scope, as someone suggested, is to fix abnormal behaviour. We have 4-5 heavy Java apps running on the Play framework. We are encountering a situation where the memory consumption of our Java pods rises linearly, and the pods are restarted when the memory limit is reached. This is a documented Java issue, with a Stack Overflow question and a Kubernetes issue associated with it. A rolling restart of all the pods over a 3-4 hour period would reset the memory consumption and allow our apps to function smoothly without spikes.

Hopefully this is convincing enough that this feature can be taken up by someone for development.

@montanaflynn


montanaflynn commented Dec 11, 2018

@bholagabbar-mt just change an environment variable and it'll trigger a rolling deploy:

kubectl patch deployment mydeployment -p '{"spec":{"template":{"spec":{"containers":[{"name":"mycontainer","env":[{"name":"LAST_MANUAL_RESTART","value":"'$(date +%s)'"}]}]}}}}'

@bholagabbar-mt


bholagabbar-mt commented Dec 11, 2018

@montanaflynn This is perfect. We integrated this change into our systems today itself and it runs fine. Thanks a ton!

@huzhengchuan

Contributor

huzhengchuan commented Dec 28, 2018
