
Adding pod restart during release upgrade/rollback #1648

Merged
merged 10 commits on Dec 16, 2016

Conversation

@nmakhotkin
Contributor

Added a --restart flag to the Helm client for restarting pods of resources that don't update their pods when a patch is sent. It may be useful when the container images in a resource need to be updated. Resources such as ReplicationController, PetSet, DaemonSet, and ReplicaSet do not restart their pods. This patch is intended to fix that.

By default, pods are not restarted.
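
For illustration, the proposed flag would be passed to the existing upgrade/rollback commands roughly like this (the release, chart, and revision are placeholders, not values from this PR; see the later discussion about renaming the flag):

# Hypothetical invocations; release/chart/revision are placeholders.
helm upgrade --restart my-release ./my-chart
helm rollback --restart my-release 2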

Nikolay Mahotkin and others added 2 commits December 6, 2016 12:10
 * Added pod restarting for
   - ReplicationController
   - DaemonSet
   - PetSet
 * Added pod restart for ReplicaSet
 * Added --restart flag for helm CLI for upgrade/rollback commands
 * By default, restart is false
@k8s-ci-robot

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.


If you have questions or suggestions related to this bot's behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

@k8s-ci-robot added the cncf-cla: no label on Dec 7, 2016
@k8s-reviewable

This change is Reviewable

@nmakhotkin
Contributor Author

Just signed the CLA for contributing here.

@k8s-ci-robot added the cncf-cla: yes label and removed the cncf-cla: no label on Dec 7, 2016
@nmakhotkin
Contributor Author

Hello Helm team! Could you please take a look?

@technosophos
Member

We're having some problems with the CI system, so we're spread a little thin at the moment.

We have been talking with the Kubernetes team about this feature for a little while. The plan, up until this PR came in, was to wait until upstream had addressed this issue. That is, we were going to let Kubernetes implement this behavior.

I like the way you've implemented it, though. And it makes a nice alternative to telling people they just have to wait.

If you could run gofmt on this and then re-update the PR, I'll read through it more carefully.

  * Run gofmt on all changed files
    so they formatted properly now
@nmakhotkin
Contributor Author

Thanks for the response, @technosophos. I just updated my change by running gofmt -w on all changed files. I noticed it changed the 4-space indentation to tabs.

@yvespp

yvespp commented Dec 13, 2016

Maybe this could be implemented using a post-upgrade hook?

The hook would run a bash script like this:

#!/bin/bash

# Select the release's pods by label (APP_LABEL holds a selector such as app=my-app)
# and delete them one at a time so their controllers recreate them.
pods=$(kubectl get pods -l "${APP_LABEL}" -o name)

for pod in $pods; do
  kubectl delete "$pod"
  sleep 10
done

The label (and maybe the namespace?) would have to be passed into the hook via env vars.

@technosophos
Member

@nmakhotkin Sorry about this, but can you rebase now? Looks like when Kube 1.5 support got merged, this led to a conflict.

We just finally got CircleCI running again now, so it will catch these errors much faster than my manual testing.

@mikejk8s

A question regarding this, since feedback was requested:

Once this is added, if someone runs --restart on an update that contains objects like PetSets/DaemonSets and also regular pods, will the regular pods all restart as well or will it follow the update strategy rules, like 30% unavailable, etc?

It may not relate at all, just trying to get an understanding.

@bacongobbler
Member

Once this is added, if someone runs --restart on an update that contains objects like PetSets/DaemonSets and also regular pods, will the regular pods all restart as well or will it follow the update strategy rules, like 30% unavailable, etc?

From the OP and within the code, it seems to apply only to those resources, so the regular pods are not restarted.

Resources such as ReplicationController, PetSet, DaemonSet, and ReplicaSet do not restart their pods. This patch is intended to fix that.

Considering that it's an opt-in flag disabled by default, it's a 👍 from me. Definitely a nicer alternative than waiting for upstream to implement this, and once they do, we can remove/disable this feature flag within Tiller by checking the target API version.

@nmakhotkin
Contributor Author

@technosophos oh, ok, will rebase it soon.

@mikejk8s as @bacongobbler said, regular pods are not restarted. Only pods belonging to PetSets, DaemonSets, RCs, and ReplicaSets are restarted.
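
Conceptually, for those controller types the flag amounts to deleting the controller's pods by label selector after the patch is applied, so the controller recreates them from the updated spec. A rough manual equivalent for a single controller (the selector and namespace below are placeholders, not values from this PR):

# Rough manual equivalent of the proposed behavior for one controller;
# the label selector and namespace are placeholders.
kubectl delete pods -l app=my-daemonset -n my-namespace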

@technosophos
Member

There is a use case that we want to make sure we capture:

When I change a configmap (or secret) for a deployment (or any other type), I would like those pods to get restarted.

This is actually a very common issue the Helm community runs into, and your code would solve it (if we add deployment support).

I've poked around, and it looks like at some point, Kubernetes itself will handle these cases. But we may be 6 months or more out. So it makes sense to me to integrate this feature today, even if we are considering it a stop-gap to be removed if/when Kubernetes does this itself.

nmakhotkin and others added 5 commits December 14, 2016 23:20
 * Added pod restarting for
   - ReplicationController
   - DaemonSet
   - PetSet
 * Added pod restart for ReplicaSet
 * Added --restart flag for helm CLI for upgrade/rollback commands
 * By default, restart is false
  * Run gofmt on all changed files
    so they formatted properly now
@adamreese
Member

kind := target.Mapping.GroupVersionKind.Kind

client, _ := c.ClientSet()
switch kind {

I think you can replace this switch with a helper from kubectl

	labels, err := c.MapBasedSelectorForObject(target.Object)

@nmakhotkin
Contributor Author

I found that the implementation of this method doesn't cover the DaemonSet and StatefulSet cases. Moreover, it also returns labels for generic pods, which I tried to prevent. So we still have to have something like a switch over the needed kinds.

@chancez

chancez commented Dec 15, 2016

This seems like something a hook could handle, as mentioned above; why does this need to be built in? It seems there was never any response to that.

I think adding large features that may be subsumed by core functionality should be very carefully considered, since removing them may be difficult, and future features in K8s core that replace them may also have compatibility issues.

I'm not a fan of this, since it's not really "restarting" in the traditional sense and might give people the wrong idea of what this does. If anything, it should be named delete-managed-pods-after-upgrade or something.

@nmakhotkin
Contributor Author

I see the tests are failing now. Will fix them soon. (Unfortunately, I just forgot to fix the tests)

@nmakhotkin
Contributor Author

@chancez I think the main case here is being able to upgrade a release using the Helm client as a library (in Go, for example), not as a shell client.

@technosophos
Member

@chancez Yeah, your concerns are some of the same that the core team talked about. So I should use this as an opportunity to explain clearly what our thinking is (and I'll CC @michelleN @prydonius and @adamreese in case they want to correct anything I say).

The problem divides into three parts:

  • Kubernetes has a frustratingly inconsistent set of behaviors attached to when things get restarted. IIRC, Deployments, DaemonSets, and ReplicationControllers each behave differently when the resource gets patched/applied.
  • Kubernetes does not currently manage the "loose" relationships between ConfigMaps or Secrets and the pods that mount them. If a ConfigMap is updated, for example, there is no way to have the pods automatically restarted.
  • Helm sort of presents a high-level abstraction (app/release) that Kubernetes itself doesn't have. So as app devs we might understand that we need to restart one thing when another, seemingly orthogonal thing gets updated.

To me, ideally Kubernetes will solve the first two. And really the third one could be resolved with hooks. @adamreese suggested that another solution to the second one would be a controller that watched ConfigMaps like this one: https://github.com/fabric8io/configmapcontroller

But the demand for this feature has been really high. I think I've been asked for it more than any other feature. Up until this PR came in, I've told people to write hooks to handle the situation. But the problem here is that it puts the onus on the chart developer to implement this, and in a way that is relatively redundant from chart to chart. In spite of me telling chart devs to do this since the Alpha.1 days, I have yet to see anyone actually do so.

So along came @nmakhotkin's PR here, and we looked at it as a way of meeting the requests of a lot of users without adding a burden to the chart devs. And it's optional behavior.

It might end up being a temporary fix if Kubernetes ends up meeting most of our requirements. In that case, we'll begin by issuing a deprecation warning on the client, then ignore the restart flag in Tiller. Though we won't be able to remove the restart logic entirely until Helm 3. So @chancez is right that we are making a pretty big commitment here.

Finally, having thought about @chancez's last point, I think I agree that --restart might not be the right term. But I'd favor --recreate-pods over the lengthier proposed alternative.

On a technical note, this PR seems to be progressing well @nmakhotkin. Thank you for taking on what is probably turning out to be more work than you initially thought.

@nmakhotkin
Contributor Author

@technosophos Thanks for the explanations.

I agree on renaming this flag to --recreate-pods. It is semantically more appropriate and totally reflects what it does.

Is there a chance to get my PR merged?

@chancez

chancez commented Dec 16, 2016

Another concern I have after reading the code: it seems to simply delete every pod effectively simultaneously, without waiting for new pods to come up. This seems like it will certainly cause downtime for services when using this functionality. Is this intended? In what use cases is this acceptable, when the purpose described is effectively to reload configuration? Am I misunderstanding the code?
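
For comparison, a gentler "soft restart" along the lines of this concern might delete one pod at a time and wait for its replacement to become ready before continuing. A minimal kubectl sketch of the idea (the selector is a placeholder, and kubectl wait is a newer convenience than what was available at the time of this PR):

#!/bin/bash
# Sketch of a "soft restart": delete managed pods one at a time and wait
# for the controller to bring a replacement back to Ready before continuing.
# SELECTOR is a placeholder, e.g. app=my-app.
SELECTOR="app=my-app"

for pod in $(kubectl get pods -l "$SELECTOR" -o name); do
  kubectl delete "$pod"   # with modern kubectl defaults, waits until the old pod is gone
  kubectl wait --for=condition=Ready pods -l "$SELECTOR" --timeout=120s
done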

@technosophos
Member

I agree with @chancez, but I also think we can tackle that in a follow-up PR. I'm going to...

a. Merge this
b. Open a separate issue for the soft restart

Again, thanks for all of the feedback and cycles on this issue. It does not escape me that this issue is addressing the tension between today's immediate needs and how things really ought to be.

@nmakhotkin
Contributor Author

@technosophos thanks for merging this, but... it seems I didn't actually rename the option flag. I had already prepared a new commit, but this PR already got merged.

@thomastaylor312
Contributor

thomastaylor312 commented Dec 16, 2016

@nmakhotkin I just saw that when I rebased into my current PR. I could probably cherry-pick your commit into my PR (#1693) as a fix unless we want a separate PR

@thomastaylor312
Contributor

cc @technosophos on ☝️

@thedebugger

Related k8s issue: kubernetes/kubernetes#22368
