
Adding pod restart during release upgrade/rollback #1648

Merged
merged 10 commits on Dec 16, 2016

Conversation

@nmakhotkin
Contributor

Added a --restart flag to the Helm client for restarting pods of resources that don't update their pods when a patch is sent. It may be useful when the container images in a resource need to be updated. Resources such as ReplicationController, PetSet, DaemonSet, and ReplicaSet do not restart their pods. This patch is intended to fix that.

By default, pods are not restarted.
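
For illustration, the proposed flag would be passed to the existing upgrade/rollback commands roughly like this (the release, chart, and revision are placeholders, not values from this PR; see the later discussion about renaming the flag):

# Hypothetical invocations; release/chart/revision are placeholders.
helm upgrade --restart my-release ./my-chart
helm rollback --restart my-release 2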

Nikolay Mahotkin and others added 2 commits December 6, 2016 12:10
 * Added pod restarting for
   - ReplicationController
   - DaemonSet
   - PetSet
 * Added pod restart for ReplicaSet
 * Added --restart flag for helm CLI for upgrade/rollback commands
 * By default, restart is false
@k8s-ci-robot

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

Once you've signed, please reply here (e.g. "I signed it!") and we'll verify. Thanks.


If you have questions or suggestions related to this bot's behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

@k8s-ci-robot added the cncf-cla: no label on Dec 7, 2016
@k8s-reviewable

This change is Reviewable

@nmakhotkin
Contributor Author

Just signed the CLA for contributing here.

@k8s-ci-robot added the cncf-cla: yes label and removed the cncf-cla: no label on Dec 7, 2016
@nmakhotkin
Contributor Author

Hello Helm team! Could you please take a look?

@technosophos
Member

We're having some problems with the CI system, so we're spread a little thin at the moment.

We have been talking with the Kubernetes team about this feature for a little while. The plan, up until this PR came in, was to wait until upstream had addressed this issue. That is, we were going to let Kubernetes implement this behavior.

I like the way you've implemented it, though. And it makes a nice alternative to telling people they just have to wait.

If you could run gofmt on this and then re-update the PR, I'll read through it more carefully.

  * Run gofmt on all changed files
    so they formatted properly now
@nmakhotkin
Contributor Author

Thanks for the response, @technosophos. I just updated my change by running gofmt -w on all changed files. I noticed it changed the 4-space indentation to tabs.

@yvespp

yvespp commented Dec 13, 2016

Maybe this could be implemented using a post-upgrade hook?

The hook would run a bash script like this:

#!/bin/bash

# Select the release's pods by label (APP_LABEL holds a selector such as app=my-app)
# and delete them one at a time so their controllers recreate them.
pods=$(kubectl get pods -l "${APP_LABEL}" -o name)

for pod in $pods; do
  kubectl delete "$pod"
  sleep 10
done

The label (and maybe the namespace?) would have to be passed into the hook via env vars.

@technosophos
Member

@nmakhotkin Sorry about this, but can you rebase now? Looks like when Kube 1.5 support got merged, this led to a conflict.

We just finally got CircleCI running again now, so it will catch these errors much faster than my manual testing.

@mikejk8s

A question regarding this, since feedback was requested:

Once this is added, if someone runs --restart on an update that contains objects like PetSets/DaemonSets and also regular pods, will the regular pods all restart as well or will it follow the update strategy rules, like 30% unavailable, etc?

It may not relate at all, just trying to get an understanding.

@bacongobbler
Member

Once this is added, if someone runs --restart on an update that contains objects like PetSets/DaemonSets and also regular pods, will the regular pods all restart as well or will it follow the update strategy rules, like 30% unavailable, etc?

From the OP and within the code, it seems to apply only to those resources, so the regular pods are not restarted.

Resources such as ReplicationController, PetSet, DaemonSet, and ReplicaSet do not restart their pods. This patch is intended to fix that.

Considering that it's an opt-in flag disabled by default, it's a 👍 from me. Definitely a nicer alternative than waiting for upstream to implement this, and once they do, we can remove/disable this feature flag within Tiller by checking the target API version.

@nmakhotkin
Contributor Author

@technosophos oh, ok, will rebase it soon.

@mikejk8s as @bacongobbler said, regular pods are not restarted. Only pods belonging to PetSets, DaemonSets, RCs, and ReplicaSets are restarted.
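
Conceptually, for those controller types the flag amounts to deleting the controller's pods by label selector after the patch is applied, so the controller recreates them from the updated spec. A rough manual equivalent for a single controller (the selector and namespace below are placeholders, not values from this PR):

# Rough manual equivalent of the proposed behavior for one controller;
# the label selector and namespace are placeholders.
kubectl delete pods -l app=my-daemonset -n my-namespace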

@technosophos
Member

There is a use case that we want to make sure we capture:

When I change a configmap (or secret) for a deployment (or any other type), I would like those pods to get restarted.

This is actually a very common issue the Helm community runs into, and your code would solve it (if we add deployment support).

I've poked around, and it looks like at some point, Kubernetes itself will handle these cases. But we may be 6 months or more out. So it makes sense to me to integrate this feature today, even if we are considering it a stop-gap to be removed if/when Kubernetes does this itself.

nmakhotkin and others added 5 commits December 14, 2016 23:20
 * Added pod restarting for
   - ReplicationController
   - DaemonSet
   - PetSet
 * Added pod restart for ReplicaSet
 * Added --restart flag for helm CLI for upgrade/rollback commands
 * By default, restart is false
  * Run gofmt on all changed files
    so they formatted properly now
@adamreese
Member

kind := target.Mapping.GroupVersionKind.Kind

client, _ := c.ClientSet()
switch kind {

I think you can replace this switch with a helper from kubectl

	labels, err := c.MapBasedSelectorForObject(target.Object)

@nmakhotkin
Contributor Author

I found that the implementation of this method doesn't cover the DaemonSet and StatefulSet cases. Moreover, it also returns labels for generic pods, which I tried to prevent. So we still have to have something like a switch over the needed kinds.

@chancez

chancez commented Dec 15, 2016

This seems like something a hook could handle, as mentioned above; why does this need to be built in? It seems there was never any response to that.

I think adding large features that may be subsumed by core functionality should be very carefully considered, since removing them may be difficult, and future features in K8s core that replace them may also have compatibility issues.

I'm not a fan of this, since it's not really "restarting" in the traditional sense and might give people the wrong idea of what this does. If anything, it should be named delete-managed-pods-after-upgrade or something.

@nmakhotkin
Contributor Author

I see the tests are failing now. Will fix them soon. (Unfortunately, I just forgot to fix the tests)

@nmakhotkin
Contributor Author

@chancez I think the main case here is being able to upgrade a release using the Helm client as a library (in Go, for example), not as a shell client.

@technosophos
Member

@chancez Yeah, your concerns are some of the same that the core team talked about. So I should use this as an opportunity to explain clearly what our thinking is (and I'll CC @michelleN @prydonius and @adamreese in case they want to correct anything I say).

The problem divides into three parts:

  • Kubernetes has a frustratingly inconsistent set of behaviors attached to when things get restarted. IIRC, Deployments, DaemonSets, and ReplicationControllers each behave differently when the resource gets patched/applied.
  • Kubernetes does not currently manage the "loose" relationships between ConfigMaps or Secrets and the pods that mount them. If a ConfigMap is updated, for example, there is no way to have the pods automatically restarted.
  • Helm sort of presents a high-level abstraction (app/release) that Kubernetes itself doesn't have. So as app devs we might understand that we need to restart one thing when another, seemingly orthogonal thing gets updated.

To me, ideally Kubernetes will solve the first two. And really the third one could be resolved with hooks. @adamreese suggested that another solution to the second one would be a controller that watched ConfigMaps like this one: https://github.com/fabric8io/configmapcontroller

But the demand for this feature has been really high. I think I've been asked for it more than any other feature. Up until this PR came in, I've told people to write hooks to handle the situation. But the problem here is that it puts the onus on the chart developer to implement this, and in a way that is relatively redundant from chart to chart. In spite of me telling chart devs to do this since the Alpha.1 days, I have yet to see anyone actually do so.

So along came @nmakhotkin's PR here, and we looked at it as a way of meeting the requests of a lot of users without adding a burden to the chart devs. And it's optional behavior.

It might end up being a temporary fix if Kubernetes ends up meeting most of our requirements. In that case, we'll begin by issuing a deprecation warning on the client, then ignore the restart flag in Tiller. Though we won't be able to remove the restart logic entirely until Helm 3. So @chancez is right that we are making a pretty big commitment here.

Finally, having thought about @chancez's last point, I think I agree that --restart might not be the right term. But I'd favor --recreate-pods over the lengthier proposed alternative.

On a technical note, this PR seems to be progressing well @nmakhotkin. Thank you for taking on what is probably turning out to be more work than you initially thought.

@nmakhotkin
Contributor Author

@technosophos Thanks for the explanations.

I agree on renaming this flag to --recreate-pods. It is semantically more appropriate and totally reflects what it does.

Is there a chance to get my PR merged?

@chancez

chancez commented Dec 16, 2016

Another concern I have after reading the code: it seems to simply delete every pod effectively simultaneously, without waiting for new pods to come up. This seems like it will certainly cause downtime for services when using this functionality. Is this intended? In what use cases is this acceptable, when the purpose described is effectively to reload configuration? Am I misunderstanding the code?
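
For comparison, a gentler "soft restart" along the lines of this concern might delete one pod at a time and wait for its replacement to become ready before continuing. A minimal kubectl sketch of the idea (the selector is a placeholder, and kubectl wait is a newer convenience than what was available at the time of this PR):

#!/bin/bash
# Sketch of a "soft restart": delete managed pods one at a time and wait
# for the controller to bring a replacement back to Ready before continuing.
# SELECTOR is a placeholder, e.g. app=my-app.
SELECTOR="app=my-app"

for pod in $(kubectl get pods -l "$SELECTOR" -o name); do
  kubectl delete "$pod"   # with modern kubectl defaults, waits until the old pod is gone
  kubectl wait --for=condition=Ready pods -l "$SELECTOR" --timeout=120s
done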

@technosophos
Member

I agree with @chancez, but I also think we can tackle that in a follow-up PR. I'm going to...

a. Merge this
b. Open a separate issue for the soft restart

Again, thanks for all of the feedback and cycles on this issue. It does not escape me that this issue is addressing the tension between today's immediate needs and how things really ought to be.

@nmakhotkin
Contributor Author

@technosophos thanks for merging this, but... it seems I didn't actually rename the option flag. I had already prepared a new commit, but this PR already got merged.

@thomastaylor312
Contributor

thomastaylor312 commented Dec 16, 2016

@nmakhotkin I just saw that when I rebased into my current PR. I could probably cherry-pick your commit into my PR (#1693) as a fix unless we want a separate PR

@thomastaylor312
Contributor

cc @technosophos on ☝️

@thedebugger

Related k8s issue: kubernetes/kubernetes#22368
