
Facilitate ConfigMap rollouts / management #22368

Open
bgrant0607 opened this Issue Mar 2, 2016 · 112 comments

Comments

@bgrant0607 (Member) commented Mar 2, 2016

To do a rolling update of a ConfigMap, the user needs to create a new ConfigMap, update a Deployment to refer to it, and delete the old ConfigMap once no pods are using it. This is similar to the orchestration Deployment does for ReplicaSets.

One solution could be to add a ConfigMap template to Deployment and do the management there.

Another could be to support garbage collection of unused ConfigMaps, which is the hard part. That would be useful for Secrets and maybe other objects, also.
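
To make the manual workflow described above concrete, here is a minimal sketch using hypothetical names (my-config-v1/v2, my-app); the exact commands are illustrative:

# 1. Create a new, uniquely named ConfigMap
kubectl create configmap my-config-v2 --from-file=config.properties

# 2. Point the Deployment's pod template at my-config-v2 (e.g. change the
#    volume's configMap.name), which triggers a rolling update
kubectl edit deployment my-app

# 3. Once no pods reference the old ConfigMap any more, delete it by hand
kubectl delete configmap my-config-v1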

cc @kubernetes/sig-apps-feature-requests

@bgrant0607 (Member) commented Mar 23, 2016

cc @pmorie

@thockin (Member) commented Mar 23, 2016

This is one approach. I still want to write a demo using the live-update feature of ConfigMap volumes to do rollouts without restarts. It's a little scarier, but I do think it's useful.

bgrant0607 added this to the next-candidate milestone Mar 23, 2016

@bgrant0607 (Member) commented Mar 23, 2016

@thockin Live update is a different use case than what's discussed here.

@therc (Contributor) commented Mar 23, 2016

I think live updates without restarts might fall under my issue, #20200.

@bgrant0607 (Member) commented Mar 25, 2016

@caesarxuchao @lavalamp: We should consider this issue as part of implementing cascading deletion.

@bgrant0607 (Member) commented Mar 25, 2016

Ref #9043 re. in-place rolling updates.

@lavalamp (Member) commented Mar 25, 2016

Yeah I think it should be trivial to set a parent for a config map so it automatically gets cleaned up.

(Why not just add a configmap template section to deployment anyway? Seems like a super common thing people will want to do.)

@caesarxuchao (Member) commented Mar 25, 2016

@lavalamp, I guess you mean we can set replica sets as the parent of a config map, and delete the config map when all the replica sets are deleted?

@bgrant0607 (Member) commented Mar 25, 2016

@bgrant0607 (Member) commented Mar 30, 2016

@kargakis (Member) commented Mar 30, 2016

Recent discussion:
https://groups.google.com/forum/#!topic/google-containers/-em3So0KBnA

Thinking out loud: in OpenShift we have the concept of triggers. For example, when an image tag is referenced by a DeploymentConfig and there is a new image for that tag, we detect it via a controller loop and update the DeploymentConfig by resolving the tag to the full spec of the image (thus triggering a new deployment, since it's a template change). Could we possibly do something similar here? A controller loop watches for ConfigMap changes and triggers a new deployment (we would also need to support redeployments of the same thing, since there is no actual template change involved - maybe by adding an annotation to the pod template?).
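
A minimal sketch of that last idea, forcing a rollout by bumping a pod-template annotation (the deployment name and annotation key are hypothetical):

kubectl patch deployment my-app -p \
  '{"spec":{"template":{"metadata":{"annotations":{"config-revision":"2"}}}}}'

Any change to the pod template, including an annotation, makes the Deployment controller create a new ReplicaSet and roll the pods.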

@bgrant0607 (Member) commented Mar 30, 2016

Fundamentally, there need to be multiple ConfigMap objects if we're going to have some pods referring to the new one and others referring to the old one(s), just as with ReplicaSets.

@rata (Member) commented Mar 30, 2016

(I posted the original mail in the thread on the google group)

I think doing it via a deployment is the best way, too, because if you have a syntax error or whatever in the config, the new pods hopefully won't start and the deployment can be rolled back (or, in less common cases I suspect, you can even do a canary deployment of a config change).

But I'm not sure whether the ConfigMap should be updated in place, as you propose, or whether it should be a different one (for kube internals, at least). If you do a config update with a syntax error, a pod is taken down during the deployment, a new one comes up and fails, and now there is no easy way to roll back because the ConfigMap has been updated. So you probably need to update the ConfigMap again and do another deploy. If it is a different ConfigMap, IIUC, the rollback can be done easily.

@rata (Member) commented Apr 1, 2016

Sorry to bother again, but can this be tagged for milestone v1.3 and, maybe, a lower priority?

@rata (Member) commented Apr 6, 2016

@bgrant0607 ping?

@thockin (Member) commented Apr 6, 2016

What work is needed, if we agree Deployment is the best path?


bgrant0607 modified the milestones: v1.3, next-candidate Apr 6, 2016

@bgrant0607 (Member) commented Apr 6, 2016

@rata Sorry, I get zillions of notifications every day. Are you volunteering to help with the implementation?

@thockin We need to ensure that the parent/owner on ConfigMap is set to the referencing ReplicaSet(s) when we implement cascading deletion / GC.

@rata (Member) commented Apr 6, 2016

@bgrant0607 no problem! I can help, yes. Not sure I can get time from work, and I'm quite busy with university, but I'd love to help and can probably find some time. I've never really dealt with the kube code (I've only done a very simple patch), but I'd love to do it :)

Also, I guess that a ConfigMap can have several owners, right? I think right now it can be used by several RSs, and that should be taken into account when doing the cascading deletion/GC (although maybe this is obvious).

Any pointers on where to start? Is someone willing to help with this?

PS: @bgrant0607 sorry for the delay, it was midnight here when I got your answer :)

@pmorie (Member) commented Apr 6, 2016

@rata

> But I'm not sure if the configmap should be updated, as you propose, or if it should be a different one (for kube internals, at least).

If we manage kube internals with deployments, we have to find the right thing to do for both user-consumed configs and internals.

> Also, I guess that a ConfigMap can have several owners, right?

@bgrant0607 I also have the same Q here -- I think we will need to reference-count configmaps / secrets since they can be referred to from pods owned by multiple different controllers.

> I think right now it can be used in several RSs and that should be taken into account when doing the cascade deletion/GC (although maybe is something obvious).

Cascading deletion has its own issues: #23656 and #19054

@caesarxuchao (Member) commented Apr 6, 2016

@rata, I'm working on cascading deletion and am putting together a PR that adds the necessary API, including the "OwnerReferences". I'll cc you there.

@rata (Member) commented Apr 6, 2016

@caesarxuchao thanks!

@rata (Member) commented Apr 6, 2016

On Wed, Apr 06, 2016 at 09:37:25AM -0700, Paul Morie wrote:

> @rata
>
>> But I'm not sure if the configmap should be updated, as you propose, or if it should be a different one (for kube internals, at least).
>
> If we manage kube internals with deployments, we have to find the right thing to do for both user-consumed configs and internals.

Sure, but I guess the same approach should work for both, right?

I imagine, for example, the "internals" ConfigMaps using a name suffixed with the kube version or commit hash.

This way, when you upgrade, the old ConfigMap becomes orphaned and should be deleted, right? Or am I missing something?

I think this can work for both.

>> I think right now it can be used in several RSs and that should be taken into account when doing the cascade deletion/GC (although maybe it's something obvious).
>
> Cascading deletion has its own issues: #23656 and #19054

Ohh, thanks!

@bgrant0607 (Member) commented Apr 10, 2016

@caesarxuchao @rata We'll likely need a custom mechanism in Deployment to ensure that a referenced ConfigMap is owned by the generated ReplicaSets that reference it.

@ianlewis (Contributor) commented May 21, 2018

Neither of the above solutions creates a new ConfigMap for each change, and so they won't work with rollbacks. I think that's a blocker for any proper implementation.

https://github.com/aabed/kubernetes-configmap-rollouts
https://github.com/fabric8io/configmapcontroller

@alvis commented May 21, 2018

There has been no progress on kubernetes/community#1163 since late 2017 😞

@shyiko commented May 21, 2018

@mfojtik (Contributor) commented May 22, 2018

FYI, I made a PoC controller that will trigger a rollout based on changes in ConfigMap and Secret data:

Code: https://github.com/mfojtik/k8s-trigger-controller
Demo: https://youtu.be/SRDsRZwAdlA

/cc @tnozicka

@aabed commented May 22, 2018

@ianlewis
Can you please elaborate more? I'm really interested in enhancing my implementation.

@alcohol commented Jul 26, 2018

No progress here nor on #13488, le sigh :-(

Can anyone recommend a quick and dirty solution to restart a DaemonSet after the mounted ConfigMap has been updated (name is still identical)?

@timoreimann (Contributor) commented Jul 26, 2018

@alcohol kubectl deleting the DaemonSet's pods should do the trick, I guess.
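
For example (the label selector is hypothetical); the DaemonSet controller recreates the pods, which then mount the updated ConfigMap contents:

kubectl delete pod -l app=my-daemon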

@gmichels commented Jul 26, 2018

Here is a quick solution depending on your Helm knowledge: make it a Helm chart and use the ConfigMap checksum in an annotation as described in https://github.com/helm/helm/blob/master/docs/charts_tips_and_tricks.md
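
The snippet from that document looks roughly like this (a sketch; the template path is whatever your chart uses). The pod-template annotation changes whenever the ConfigMap content changes, which triggers a rollout:

kind: Deployment
spec:
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}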

@alcohol commented Jul 26, 2018

Thanks for the tips :-)

@kfox1111 commented Jul 26, 2018

Still waiting for a real fix to this issue too. :(

@llarsson commented Jul 27, 2018

@jhgoodwin commented Aug 15, 2018

@llarsson Would you be willing to describe your process with more detail?

@kfox1111 commented Aug 15, 2018

That triggers a rollout, but doesn't handle the issue of ConfigMap (and Secret) lifecycle. The ConfigMaps used by Deployments/DaemonSets currently need to be named with a hash in them so that rollbacks, partial roll-forwards, and pod failures work properly. Otherwise, rolling back will use the new ConfigMap, not the old one, which isn't what the user intended. But when naming with a hash, the lifecycle of the ConfigMaps has to be maintained manually. "When is it safe to delete foo-68b329da9893e34099c7d8ad5cb9c940?" Well, when the ReplicaSet that references it can no longer be rolled back to.

IMO, the lifecycle of a ConfigMap/Secret referenced by a Deployment/DaemonSet should begin at the moment the ReplicaSet for the new version is created (a snapshotted ConfigMap/Secret), track the life of the RS, and then end when the RS is deleted. I would like the ConfigMap/Secret to be as atomic as the pod's container images are.
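
For illustration, the hash-in-the-name pattern means the Deployment's pod template points at a content-addressed ConfigMap (an excerpt sketch; names are taken from the example above):

spec:
  template:
    spec:
      volumes:
      - name: config
        configMap:
          name: foo-68b329da9893e34099c7d8ad5cb9c940  # content hash baked into the name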

@llarsson commented Aug 15, 2018

Agreed. Anything other than actual support for this functionality ultimately means we have to resort to a brittle hack that fails to handle lifecycles correctly.

@jhgoodwin, the hackish approach I outlined is essentially to:

  1. Make your DaemonSet yaml file a template, and place a marker in it for where the hash of the ConfigMap will wind up (e.g. CONFIG_HASH) in the Spec of the actual Pods.
  2. Deploy not via kubectl apply directly (won't be useful, since the marker will then not change), but rather by wrapping it in a script like the dead-simple (and untested!) one below.
  3. Enjoy the rolling upgrade, since updates to the Spec of the Pods will trigger that behavior.

(This is not too far from what Helm will do as well, I might add...)

Dumb script:

#!/bin/bash
set -e
value=$(md5sum configmap.yml | cut -f 1 -d ' ')
sed "s/CONFIG_HASH/$value/g" daemonset-template.yml > deamonset-rendered.yml
kubectl apply -f daemonset-rendered.yml

It's neither pretty nor clever with lifecycles, but it does trigger a rolling upgrade behavior. 😄

If you run a sidecar that wakes up every minute or so, checks the md5 of the ConfigMap and restarts the DaemonSet if the hash is different than last time, you can hack your way to automatic upgrades that way, too.
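
A minimal sketch of that sidecar idea (untested; the ConfigMap name and label selector are hypothetical, and it assumes the sidecar has kubectl plus RBAC to read ConfigMaps and delete pods):

#!/bin/bash
# Poll the ConfigMap and bounce the DaemonSet pods whenever its contents change.
last=""
while true; do
  current=$(kubectl get configmap my-config -o yaml | md5sum | cut -f 1 -d ' ')
  if [ -n "$last" ] && [ "$current" != "$last" ]; then
    # The DaemonSet controller recreates the pods, which pick up the new config
    kubectl delete pod -l app=my-daemon
  fi
  last="$current"
  sleep 60
done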

@kfox1111 commented Aug 15, 2018

This problem has been around for a very long time. How do we start to make traction on it?

@pentago commented Sep 13, 2018

Can't believe that 2 years passed without resolving this critical issue.

@kfox1111 commented Sep 13, 2018

I agree. As I start encouraging more users to use K8s, it's going to become increasingly important.

The two issues marked above don't really solve the problem; they also work around it. I don't want to signal a process in the container that a ConfigMap changed. I need my ConfigMaps to be atomic for the lifetime of that ReplicaSet, i.e. immutable within the ReplicaSet. Without this, you lose one of the big selling points of immutable infrastructure, as an immutable container can behave very differently if a config file is changed.

@enisoc (Member) commented Sep 13, 2018

We're still working on this, and the plan is still to encourage an "immutable ConfigMap" pattern for rolling out changes (e.g. put a hash of the contents in the name and don't edit in-place).

On the client side, we intend to facilitate this pattern with kustomize ConfigMap generation, which I believe is working today. However, we need a server-side change to do proper cleanup (garbage collection) of immutable ConfigMaps that aren't used anymore.
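
For reference, the kustomize side of that pattern looks roughly like this (a sketch; file and ConfigMap names are illustrative):

# kustomization.yaml
configMapGenerator:
- name: my-config
  files:
  - config.properties

kustomize appends a hash of the contents to the generated ConfigMap's name and rewrites the references in your workloads, so editing config.properties and re-applying produces a new ConfigMap and a new rollout.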

Progress has been slow on the server-side component because GC of unused ConfigMaps is a hard problem. The initial proposal was to track ConfigMap references so they could be cleaned up as soon as they're no longer needed. However, there were performance concerns with tracking references explicitly, and more importantly it's not possible to be 100% sure that a given ConfigMap is no longer needed due to the eventually-consistent nature of k8s API objects.

@janetkuo has written a new proposal that accepts we can't be 100% sure and therefore introduces a grace period -- once we think it's no longer needed, you'll have a (configurable) window of time to prove us wrong before we actually delete it. Unused ConfigMaps that opt into this feature will go away eventually, though not at the earliest possible time.

The full proposal would be a rather large additional component, however, so we're starting by testing out these ideas with GC of finished Jobs (it's much easier to tell if a Job is finished than to tell if anything references a ConfigMap).
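
For those curious, the Job experiment relies on a TTL after the Job finishes; a sketch of what that looks like (the field was alpha around that time, so treat this as illustrative rather than as the final API):

apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  ttlSecondsAfterFinished: 300  # clean up the Job and its pods 5 minutes after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox
        command: ["echo", "done"]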

@pentago commented Sep 13, 2018

Wow, bummer. Didn't know it was that complex. The main reason I'm interested in this is that when one uses a TLS cert provisioner such as cert-manager, web servers are unable to figure out that the cert has been renewed, so the served cert expires if the Deployment wasn't updated since the certificate was renewed.

@kfox1111 commented Sep 13, 2018

I still think the easier way to solve it is to snapshot the ConfigMap/Secret and make it part of the lifecycle of the Deployment/DaemonSet. I.e., it's owned by a particular ReplicaSet and is created/deleted at the same time. Nothing else should reference it, so nothing should complain when it's deleted. Garbage collection is easy, as it's 1:1 with ReplicaSet creation/deletion events.

@rickypai (Contributor) commented Sep 13, 2018

@enisoc thank you for the communication and for elaborating on the complexity of the issue. Client-side enforcement of immutability seems to be the easiest solution today. We've done the same by adding the hash of the content to the names of ConfigMaps to facilitate rollouts, but we still run into the issue of dangling ConfigMaps that need to be cleaned up manually. I look forward to the progress made on the server side.

@enisoc (Member) commented Sep 13, 2018

@kfox1111 wrote:

> I still think the easier way to solve it is to snapshot the ConfigMap/Secret and make it part of the lifecycle of the Deployment/DaemonSet. I.e., it's owned by a particular ReplicaSet and is created/deleted at the same time. Nothing else should reference it, so nothing should complain when it's deleted. Garbage collection is easy, as it's 1:1 with ReplicaSet creation/deletion events.

We did briefly consider this alternative and I was initially a fan of it myself. It was a while ago, but as I recall, some of our concerns with this option were:

  1. If the snapshotting happens inside Deployment (e.g. you put the "source" ConfigMap name in the Deployment Pod template and the controller does the rest), that means Deployment needs to (a) watch ConfigMaps for changes, and (b) reach into and understand Pod contents to change ConfigMap names in all the appropriate places. Both of these represent sprawl of Deployment's responsibilities beyond the abstraction layer where it was intended to sit. In addition, to make the immutable ConfigMap pattern universal, we would need to implement this in all workload controllers, as well as help custom controllers implement the right semantics.

  2. If Deployments automatically react to ConfigMap changes by creating new snapshots and triggering rollouts, this makes it dangerous to share a source ConfigMap across multiple Deployments. If you update the ConfigMap, you would have no way to independently control when a given Deployment gets updated, and you could not roll back one Deployment without affecting the others.

  3. If we let you put the source ConfigMap name in the Deployment Pod template, but ultimately real Pods refer to the snapshotted copies, we create a disconnect between declared intent in the template and reality in the Pod. This "hidden magic" would make it harder to reason about the system. It also defeats the declarative nature of rollbacks -- if the source ConfigMap name is the same before and after, what do you change in the Deployment Pod template to indicate you want to roll back to a particular snapshot?

  4. If, instead, we implement snapshotting outside Deployment, that outside thing would need to modify the Deployment spec so it no longer matches the canonical version applied by the user. It would also need to understand and reach into the inner workings of Deployment (e.g. ReplicaSets and their Pods) to link the lifecycle of the ConfigMap to those pieces as suggested. This is again a leakage of abstraction layers and would be difficult to generalize to other core and custom workload controllers.

The plan that @janetkuo proposed, combined with the kustomize feature, should make it possible to facilitate immutable-style ConfigMap rollouts with no changes at all to workload controllers, letting us preserve strong abstraction boundaries, which are critical to managing complexity in a system as messy as this.

@kfox1111 commented Sep 13, 2018

Snapshotting, in a way, already happens in Deployment: when you change the deployment, a snapshot is made as a ReplicaSet. This would be similar, in that when that happens, it snapshots the referenced ConfigMaps/Secrets at the same time and references them in the ReplicaSet. It wouldn't have to "watch" anything; it would only snapshot at the time the deployment was updated.

Triggering a new rollout after updating a ConfigMap would be a null/unchanged edit: the Deployment controller would see that the ConfigMap changed at the time of the edit and do an update.

Automatically reacting to ConfigMap/Secret changes is an orthogonal issue. It doesn't have to do that, and the garbage-collection approach doesn't do that either, I think? (Not saying it wouldn't be a nice feature to have.)

The point of the snapshot would be that it isn't shared with any pods of another deployment's ReplicaSets, only the one it was snapshotted for; the ConfigMap is only for that one ReplicaSet. That allows proper roll forward/backward and easy lifecycle management. It has the drawback that it could cause multiple versions of the same ConfigMap to be stored in etcd if you do a rolling upgrade without changing the ConfigMap. I think that is probably a reasonable space/complexity tradeoff though.

Yeah, the issue you raise in point 3 is relevant, though we have "magic" in other places, like the volumeClaimTemplates in StatefulSets. A flag saying the volume is "snapshotted: true" in the Deployment, and ReplicaSets having snapshotted ConfigMap names, doesn't feel super heavy to me. It also isn't too much different from editing Deployments causing randomly named ReplicaSets to spring into existence. "Magic" :)

It feels very weird to me that Deployment covers half of a lifecycle and then expects the rest of the lifecycle to be managed outside of the Deployment. The container provides immutable infrastructure, but then we build Deployments on top of that which "provide rolling upgrades", yet don't do so properly because ConfigMaps are not immutable. Then we have to layer something that conceptually does the same thing ("lets you do rolling upgrades") at a higher level on top of Deployments to really get immutable rolling upgrades. That would be difficult to explain to users and hard for them to understand. Yes, it could be done up at the kustomize/helm/ansible/whatever level, but that feels too high to me, when Deployment's stated goal is the same but incomplete. Doing it in Deployment also means we're not repeating the same needs in a lot of different tools.

Basically, saying "if you want immutable rolling upgrades (a best practice), you must choose between kustomize/helm/ansible/whatever and implement it there, with some convention like tagging your ConfigMaps specifically as garbage-collected" seems very wrong to me.

We should either use kustomize/helm/ansible to completely drive the workflow of managing ReplicaSets and skip Deployments, or have Deployments do all the common things so the other tooling doesn't have to.
