KEP-27: Detailed Control for Pod Restarts #1449
Conversation
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
This is a great addition for improving stateful deployments of operators. Just have a comment regarding the backwards compatibility but happy to ship this as-is.
```yaml
forcePodRestart: "false"
```

> The default value for this parameter would be `true` to keep backwards compatibility. The general behavior should stay
I'm all for keeping backwards compatibility. On the other hand I feel that enabling forced pod restarts is a feature that users have to opt into, i.e. the default should be `false`. Forced pod restarts enable updating config maps on parameter updates, with the caveat of changing some assumptions regarding pod lifetimes. As such, the default should be what would be the expected behavior: pods not restarting.
I'm not sure. I think the pod restarting is something that you "usually" want. There may be applications that are "ConfigMap-aware" and re-read config files without a pod restart. For these, a default of `false` would be OK, but I think most applications will require the pod restarts for most of the parameters.
Having `false` as the default will probably lead to weird issues for operator developers, where they wonder why they've updated a parameter but the running application keeps running with an old value.
This sounds like something that would be useful to set as a "default" for all parameters, but I'm not sure adding that complexity would be good.
@zmalik what would you say should be the default?
@ANeumann82 before we decide how to change the current behavior, I would like to understand better why it's there in the first place.
Do we have a record of the initial motivation/discussion/KEP for this `kudo.dev/last-plan-execution-uid` feature? It seems like you're implying that this was an intentional mechanism put in place to force pod restarts on any parameter change or upgrade. Did I read that right?
Or perhaps you are just describing the current effect, and this was not a designed feature but just an accident - i.e. we were trying to mimic what kustomize was doing?
Given the above lack of information it is hard to judge, but I have to say that in my opinion this feature is surprising in the negative sense:
- it is not documented anywhere, as far as I can tell,
- it changes the default behaviour of k8s statefulsets and deployments,
- the name of the annotation does not provide any hint about its (implied?) intent - for example, when I first saw this annotation being changed very recently (when debugging the issue described in mesosphere/kudo-cassandra-operator#78, "Bump to kudo v0.11.1"), it was far from obvious to me why this was happening,
- there is a serious scoping issue - it affects all statefulsets/deployments in an operator, regardless of whether they actually use the parameter at all. Yes, there are ways to work around that by triggering a special plan, but this also has issues.
Adding a `forcePodReload` flag builds more complexity on an unstable foundation, which I think is exactly what we should be avoiding while we're still in the v0... stage.
In general I'd vote for breaking compatibility in this case and getting rid of this feature altogether. We did break compatibility for more minor reasons in the past.
If an operator developer wants a certain statefulset/deployment to restart on a parameter change, this can be achieved in another way: either using the tool you mentioned, or by us providing a hash function that an operator developer can use, as seen below, to explicitly declare which parameters require a restart:
```yaml
spec:
  template:
    metadata:
      annotations:
        config-values-hash: {{ hash(
          .Params.SOMETHING_THAT_AFFECTS_THIS_POD,
          .Params.SOMETHING_ELSE_THAT_AFFECTS_THIS_POD,
          ...
        ) }}
```
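The suggested template-side `hash(...)` function could behave roughly like this sketch (Python used only for illustration; `params_hash` is a hypothetical name, not an existing KUDO helper):

```python
import hashlib

def params_hash(*values: str) -> str:
    # Hypothetical helper: a stable hash over the parameter values
    # an operator developer lists explicitly in the template.
    h = hashlib.sha256()
    for v in values:
        h.update(v.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") != ("a", "bc")
    return h.hexdigest()[:16]

# A change in any listed parameter changes the annotation value,
# which in turn makes the StatefulSet controller roll the pods.
assert params_hash("3", "512m") == params_hash("3", "512m")
assert params_hash("3", "512m") != params_hash("5", "512m")
```

The separator byte matters: without it, moving characters between adjacent parameters would collide to the same hash.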
To give this discussion some context: the feature (though not documented) is very much intentional. Originally we used stakater/Reloader, but it came with its own issues, so we decided to implement our own solution, which right now is applied to all templates and surely could be smarter by updating/restarting only the things that are affected.
@porridge Thanks for that comment! I think @zen-dog answered some of the questions about the origin of this issue.
I kind of like your approach with the hash, although I don't like putting it directly into the statefulset configuration: if we think about an operator with >150 parameters, the call to `hash(PARAM1, PARAM2, ..., PARAMN)` would get really long, and it would require an operator developer to touch at least two or more places when adding a parameter.
I've added a second proposal to the document that takes up the idea of a hash calculation based on parameters, but with the hash-params defined on the parameters themselves; please have a look.
Nice work, I left a few questions and naming suggestions.
LGTM! But let's get a few more eyes on it before we proceed.
Similar to another KEP, I would love to discuss alternatives first.
Changing params requires a plan to execute... I'm not sure I understand the need to annotate the param. Aren't we talking about what should trigger a plan? Perhaps I'm missing something.
> This KEP describes addition of a flag for parameters that controls the forced pod reloading
>
> ## Motivation
This motivation is incomplete... it describes how something will affect another thing, but it doesn't encompass the motivation of why a user would want to do that. Without the details of the motivation, it is challenging to understand whether this is the best solution.
we need to know the use cases which are in and out of scope for this KEP... is this:
- to restart a stuck pod?
- to restart pods based on new config?
- is this in a step, or in a plan, or...?
I've added a more detailed motivation section.
About the second comment:
- No - I'm not sure how a pod can be stuck, but it's not about that.
- Yes, basically.
- It affects the whole plan. The effect for pod restarts is calculated based on the changed parameters: if all changed parameters have `forcePodRestart` set to `false`, then the Pods will not automatically be restarted.
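The rule described here can be sketched as follows (illustrative Python, not actual KUDO code; the parameter-attribute shape is an assumption):

```python
def pods_need_restart(changed_params: dict) -> bool:
    # Proposed rule: restart unless *every* changed parameter
    # explicitly opted out with forcePodRestart: "false".
    # The flag defaults to "true" for backwards compatibility.
    return any(
        attrs.get("forcePodRestart", "true") != "false"
        for attrs in changed_params.values()
    )

# NODE_COUNT opts out, JVM_OPTS keeps the default -> restart happens
assert pods_need_restart({"NODE_COUNT": {"forcePodRestart": "false"},
                          "JVM_OPTS": {}})
# every changed parameter opted out -> no restart
assert not pods_need_restart({"NODE_COUNT": {"forcePodRestart": "false"}})
```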
> ### Goals
>
> Make it possible for an operator developer to specify that a parameter will *not* automatically restart all pods from
> a stateful set or a deployment.
wow.. this went a different way... the goal is to be able to change a param and NOT have pods restart...
Is this only when a plan is being triggered? Or?
Yes, exactly. This is calculated when a plan is triggered, and it affects whether the `currentPlanUID` is written into `spec.template.spec.metadata.attributes`.
```yaml
- name: NODE_COUNT
  description: "Number of Cassandra nodes."
  default: "3"
  forcePodRestart: "false"
```
is this only for pods? how do we know that? what if there is control on a statefulset or deployment or...?
I'm questioning the name `forcePodRestart`.
It's generally for statefulsets and deployments, yes. It would affect all resources that create pods, so jobs as well - although since they don't usually keep pods running, it won't really change behavior there.
I'm open to other suggestions with regard to naming, but `forcePodRestart` was the most apt name I could come up with.
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
I like
…es, cleanup Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
This is awesome. Some nits in-line, and can I also ask you to include the references provided by @zen-dog somewhere in the text? These things are precious.
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
While I welcome the fine-grained control this KEP offers over restarting pods, I can't help but wonder if we're putting too high of a micromanagement burden on the operator developers. With big operators (>100 parameters, dozens of configMaps/secrets/statefulSets), an operator developer has to keep in mind all the dependency chains and update them accordingly.
KUDO, among other things, is supposed to be this higher-level abstraction layer that provides the magic glue that makes developing and operating complex applications easier. In this sense, I'd like to explore the alternative proposal that builds a dependency graph of used resources and parameters. We don't have to make a 100% generic solution to support all cases. Instead, we should concentrate on the 80/20 case at hand and limit it to k8s Pods, Jobs, StatefulSets, Deployments, ConfigMaps, and Secrets. This will already solve our own issues and probably most of our users'. And if more fine-grained control is necessary later, we can always reiterate.
> If multiple parameters are changed in one update, the `forcePodRestart` flags of all attributes are `OR`ed: If at least one
> parameter has the `forcePodRestart` set to `true`, the pods will execute a rolling restart.
>
> This solution would be very easy to implement, but may be hard to explain and requires intimate knowledge of KUDO internals
Well, if this is hard to explain, then `{{ .ParamGroups.MAIN_GROUP.hash }}` and managing all the `GROUP`s might be even harder 😉
This is looking good!
but I feel we need a bit of discussion, as the motivation covers only one use case: scaling the sts/deployments up and down. And for that, we are introducing another dimension in parameters (groups) along with the existing ones, which might make the developer experience not so pleasant.
I would really like to talk about all the use cases where developers might use them, and whether we can solve them in a less intrusive way. I see two use cases in the motivation, but the use case where a parameter update should only update the `configmap` and not the `sts/deployment` is already possible by having a specific plan that only updates configmaps.
> Allow the operator developer to define groups of parameters that can then be used to trigger a restart of pods from
> a deployment or stateful set.
>
> Add a list of `groups` that each parameter is part of:
so with this, operator developers will have to define which plan each parameter might trigger + define all the groups each parameter belongs to?
In the easiest case, to replicate the current behavior, the operator developer would just have to add:

```yaml
spec:
  template:
    metadata:
      annotations:
        config-values-hash: {{ .ParamGroupHashes.ALL }}
```

without defining any custom groups.
But to define custom control, yes, the operator developer would have to add all required parameters to a custom group.
It doesn't affect which plans a parameter is used in, though.
> a rolling restart of Pods, and sometimes the restart is unwanted and can negatively affect the state of the application
> the operator is controlling.
>
> One example would be the `replica` count of a StatefulSet: An increase here should only start new Pods and not restart
do we have more examples other than `replica`? I feel we just have this one specific use case of scaling sts/deployments up and down.
In the case of the other example, where an application regularly re-reads the config files, the trigger plan of the CM shouldn't include the sts/deployment in any of its tasks. That won't trigger any update.
> instead of the `last-plan-execution-uid` in a deployment or stateful set to trigger the reload of the pods:
>
> ```yaml
> spec:
> ```
let's take the example of Cassandra. If a statefulset is using `MAIN_GROUP` like this:

```yaml
annotations:
  config-hash: {{ .ParamGroups.MAIN_GROUP.hash }}
```

and a `configmap` uses the replica count to generate the configuration, as Cassandra does:

```yaml
seeds: "{{- range $i, $node := until (int (min 2 .Params.NODE_COUNT)) -}}
```

then that would still trigger a rolling restart of the stateful set with this solution. Even though the content of `seeds` is the same when scaling up, it might be different when scaling down.
So we will be asking operator developers to restrict the use of the parameters they need to generate configuration?
That would depend on whether the `NODE_COUNT` param is in the `MAIN_GROUP`.
In the case of Cassandra, I would argue that `NODE_COUNT` would not be in the `MAIN_GROUP`, even though it is used in the specific config map.
Which is actually an interesting argument against the automatic dependency graph...
> I would argue that NODE_COUNT would not be in the MAIN_GROUP, even though it is used in the specific config map.

that's buggy: what if the seeds content has actually changed and we really do need a rolling restart?

> Which is actually an interesting argument against the automatic dependency graph...

I see it as an argument against the current approach 🤔
If the dependency graph is implemented correctly, it will know that the seeds are the same, so there is no need to restart; and if they aren't the same, then it is totally valid to restart all pods.
I guess that depends on how the dependency graph is built. At the moment, the idea is to determine which parameters are used in which resource - and if any parameter changed, we need to restart.
Maybe it'd be possible to fully ignore the parameters in the dependency graph and only look at the rendered resources: if the rendered resource is the same as the currently deployed one, we don't need to restart.
The problem with this is the comparison, as K8s adds all the additional fields. But we do something similar with the three-way merge when applying resources, so it might be possible.
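A minimal sketch of that rendered-resource comparison, assuming we simply drop the server-populated fields before hashing (the exact field list is illustrative; a real implementation would need the merge machinery KUDO already uses):

```python
import copy
import hashlib
import json

def rendered_hash(resource: dict) -> str:
    # Hash a rendered resource after dropping fields the API server
    # fills in, so only template-driven content affects the result.
    r = copy.deepcopy(resource)
    r.pop("status", None)
    meta = r.get("metadata", {})
    for noisy in ("resourceVersion", "uid", "creationTimestamp", "managedFields"):
        meta.pop(noisy, None)
    # canonical JSON so key order does not change the hash
    return hashlib.sha256(json.dumps(r, sort_keys=True).encode()).hexdigest()

deployed = {"kind": "ConfigMap",
            "metadata": {"name": "cfg", "resourceVersion": "42"},
            "data": {"seeds": "node-0,node-1"}}
rerendered = {"kind": "ConfigMap",
              "metadata": {"name": "cfg"},
              "data": {"seeds": "node-0,node-1"}}
# identical rendered content -> same hash -> no restart needed
assert rendered_hash(deployed) == rendered_hash(rerendered)
```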
looks like my reject was reset... I would like to review today before we merge this
@kensipe Your reject wasn't reset - you're still marked as "Changes Requested" and the PR is blocked. I just re-requested your review because I changed so much :)
I think I'm a fan of a combination of ideas surfaced in this KEP discussion...
I like the label or annotation with a hash of the params used...
I like the idea of that being auto-generated based on the params applied to that template (for params that are "included").
Then any applied change that has a diff in the hash would require a restart (I would prefer this to be separate logic... the controller reconciliation should pick up on the diff and restart, or another controller for params).
I don't understand why the attribute `forcePodRestart` needs to include `force`. Can we just have `restart`? Or `podRestart`?
The grouping of params is an interesting idea... but can we do better? Can we have a convention-over-configuration approach?
> An operator would still want to configure if and when pod restarts may be required, as a pod can be aware of changing
> ConfigMaps.
>
> Calculating the dependency graph would be a very complex undertaking, and may not even be possible in all cases. The
I'm not sure this is true... I'm not sure what you may have in mind for a dependency graph, but it is fairly easy to get a list of tasks for a plan and a list of resources for those tasks, and it should be easy to get all the params used by those resources (we do this for package verification). Then we could have an attribute on params which indicates that a param is to be excluded (inclusion by default) from a triggered update. The automatic hash would be the collection of all "included" params for all resources of that plan. I think I like this approach more. It would be great to reduce the burden on the operator dev. I'm not a fan (as of yet) of the "grouping" of params.
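The "collect included params per plan" idea might look roughly like this (a sketch; the substring-based usage detection and the `exclude` attribute are assumptions made for illustration, not existing KUDO behavior):

```python
import hashlib

def plan_param_hash(plan_templates, params):
    # Collect every non-excluded parameter that any template in the
    # plan references, then hash the (name, value) pairs.
    used = set()
    for template in plan_templates:
        for name, attrs in params.items():
            # naive usage check: does the template mention .Params.<name>?
            if ".Params." + name in template and not attrs.get("exclude", False):
                used.add((name, attrs["value"]))
    return hashlib.sha256(repr(sorted(used)).encode()).hexdigest()

templates = ["replicas: {{ .Params.NODE_COUNT }}",
             "mem: {{ .Params.MEMORY }}"]
a = plan_param_hash(templates, {
    "NODE_COUNT": {"value": "3", "exclude": True},   # opted out
    "MEMORY": {"value": "512m"}})
b = plan_param_hash(templates, {
    "NODE_COUNT": {"value": "5", "exclude": True},   # changed, but excluded
    "MEMORY": {"value": "512m"}})
assert a == b  # the excluded param's change does not alter the hash
```

A production version would need real template parsing rather than a substring check (e.g. `.Params.NODE` would falsely match `.Params.NODE_COUNT` here).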
Yeah, I've talked with @zen-dog about the complexity of building this dependency graph, and it seems doable, although I feel it might be a bit complex. But I'll expand on this idea a bit more.
we should add another approach which is parameter-free: we can use the tasks' rendered resources to build a dependency graph. If the rendered content of a CM (resource) is the same, we know we don't need to update all dependent resources. But if the content of the CM has changed, we need to update the dependent resources.
All resources can have an annotation with the hash of the rendered resources on which they depend, and KUDO just updates the hash on each update.
When I say rendered resources, I mean a rendered version of the templates, for example https://github.com/kudobuilder/operators/blob/master/repository/kafka/operator/templates/health-check.yaml, where we can control the content - because with Kubernetes objects, any small status update will change the hash.
That sounds workable. I have some thoughts about resources that are deployed by different plans, but that should be solvable. I'll flesh that out and add it to the KEP.
@zmalik I've updated the KEP with your proposal, and I think that looks like the best solution so far.
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
keps/0027-pod-restart-controls.md
> ## Proposal 2 - Dependency Graph
>
> KUDO will analyze the operator and build a dependency graph of resources:
what about, instead of analyzing, providing developers a way to register the dependencies?

```yaml
spec:
  template:
    metadata:
      annotations:
        health-check-cm: {{ .TemplatesHash.healthcheck.yaml }}
        bootstrap-script: {{ .TemplatesHash.bootstrap.sh.yaml }}
```

On each update, we just update the hashes where they are used, which makes the "dependency" approach easier.
WDYT?
That would be easier on us, but prone to getting out of sync with real dependencies. Analyzing actual ones means we have a single source of truth.
Hmmm, I don't think the analysis would be too hard in these cases, so I don't think that part makes it more compelling.
If we were to go with this proposal, we'd at least have to replace any dots and special chars:

```yaml
bootstrap-script: {{ .TemplatesHash.bootstrap-sh-yaml }}
```

This might get more annoying if we at some point allow resources to be in subdirectories.
Pro:
- It would remove the requirement to have an annotation in a configmap/secret when that resource should not be included in the hash
- It would make it very clear which resources trigger a pod restart

Con:
- It would require operator developers to add new resources to the annotations.
- To replicate the current behavior, an operator developer would have to add all used resources in annotations.

I'm very undecided. My gut feeling likes the dependency variant better.
keps/0027-pod-restart-controls.md
> - Each deployment, stateful set and batch job will be analysed and contains a list of required resources (ConfigMaps and Secrets).
> - When a resource is deployed, KUDO calculates a hash from all dependent resources and adds that to the spec template annotation.
>
> The resources required by a top-level resource may not necessarily be deployed in the same plan as the resource itself. This can lead to an update of a required resource in a plan that does not deploy the top-level resource, and vice versa. To correctly calculate the hash of the required resources, this needs to be done in different ways:
> The resources required by a top-level resource may not necessarily be deployed in the same plan as the resource itself

I think it's reasonable to simply document this as a current restriction (unless we actually need this as of today). E.g. pipe-tasks can also only be referenced within the same plan, and so far nobody has complained 😉
I'd leave that as an implementation option. I think it'd be worth it to make it "correct" and have no restrictions, but if it turns out to be a hard problem to implement, we can go with the restriction.
keps/0027-pod-restart-controls.md
> - A config map that is used by a stateful set or deployment may not necessarily require a pod restart:
>   - In the Cassandra backup/restore branch, there is a sidecar container that waits for a backup command to be triggered. It reads the mounted config map every time the action is triggered, therefore removing the need to restart the full pod when the config map changes.
>   - If this config map were to be included in the hash calculation, it could trigger a restart of pods the next time the stateful set is deployed.
> - A solution would be to allow a special annotation in config maps and secrets that lets KUDO skip this resource in the hash calculation.
Simple to explain and implement 👍
My vote goes to the second proposal!
I think the newer proposal is worth pursuing. More risky, but it would indeed put much less burden on the operator developer.
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Signed-off-by: Andreas Neumann <aneumann@mesosphere.com>
Signed-off-by: Andreas Neumann aneumann@mesosphere.com
Relates to #1424