Deployment enters creation hot-loop when rs field is mutated by API server #57167
Comments
Another hack to prevent the hot loop is to call the code at kubernetes/pkg/api/pod/util.go, lines 231 to 252 (commit 0b9efae).
This is a fragile check, and means deployments can already encounter this in the face of admission plugin modifications, not just because of this specific field.
Agree, but there's no better existing way to check it. This is also documented here (re: mutating webhooks): https://github.com/kubernetes/website/pull/6650/files#diff-50fc51cb7d01e2cae2085d75b41e9ce8R324
Seems like a perfect use case for generation... on creation, record the generation of the deployment the replicaset is being created from. The new replicaset generation gets set to 1. If the replicaset spec is modified, its generation is incremented and can no longer be assumed to match the deployment spec.
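As a rough sketch of that proposal (using a hypothetical annotation key; this is not how the deployment controller works today), assuming the parent's generation is recorded when the ReplicaSet is created:

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// deploymentGenerationKey is a hypothetical annotation key used only for
// this illustration; the real controller does not record this.
const deploymentGenerationKey = "example.com/created-from-deployment-generation"

// newReplicaSetFor sketches the proposal: at creation time, stamp the
// Deployment's current generation onto the new ReplicaSet.
func newReplicaSetFor(d *appsv1.Deployment) *appsv1.ReplicaSet {
	return &appsv1.ReplicaSet{
		ObjectMeta: metav1.ObjectMeta{
			Name:      d.Name + "-<hash>",
			Namespace: d.Namespace,
			Annotations: map[string]string{
				deploymentGenerationKey: fmt.Sprintf("%d", d.Generation),
			},
		},
		Spec: appsv1.ReplicaSetSpec{
			Selector: d.Spec.Selector,
			Template: d.Spec.Template,
		},
	}
}

// specsStillInSync is the check the proposal implies: the ReplicaSet spec has
// never been mutated (generation still 1) and the Deployment spec has not
// changed since the ReplicaSet was created from it.
func specsStillInSync(d *appsv1.Deployment, rs *appsv1.ReplicaSet) bool {
	return rs.Generation == 1 &&
		rs.Annotations[deploymentGenerationKey] == fmt.Sprintf("%d", d.Generation)
}

func main() {
	d := &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{Name: "web", Generation: 3}}
	rs := newReplicaSetFor(d)
	rs.Generation = 1 // set by the API server on creation
	fmt.Println(specsStillInSync(d, rs)) // true until either spec is mutated
}
```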
What if the replicaset is created manually and then adopted by the deployment? What if the user decided to roll back to a previous version of the deployment (using the replicaset as history)? This will break both use cases. What's more, the deployment should only
it could record the name and generation it adopted
it could record the name and generation it rolled back to
That doesn't seem right. I'd expect things auto-modifying scale to be controlling the top of the object chain, otherwise scale changes would be lost when rolling out the next version of the deployment.
I’m pretty sure that if deployments are completely broken when people very reasonably try to initialize / default fields on a pod spec, then deployments may not be sufficiently well designed. I think it’s reasonable to convert a tag to an image sha when a rs is created. It’s also reasonable to set resources, or add annotations to a pod template, or turn a config map ref into a copied sha. I will note we don’t have this option with StatefulSets, so callers may still have to solve this via direct mutation of the set in some cases, and we could argue consistency matters between Deployment and StatefulSet more than flexibility on RS.
But the RS already has generation=X set before it's adopted. How does the deployment use its own generation to compare with the adopted RS's generation?
When rolling back, the deployment's template gets updated and then its generation++ (say it's N). Who will update RS's generation to make it match N?
Scale change has never been recorded in workloads. It's a fundamental design decision made early on. Rollouts are only triggered by template updates, and rollbacks never touch anything except templates. When users roll back, their workloads won't be scaled, nor will anything like the rollout strategy be updated. This is implemented in the other workloads APIs too.

Re: revision comparison: we solved this revision comparison problem before with DaemonSet. We implemented templateGeneration, which is only increased when the template is updated, and then labeled the child resource with the parent's templateGeneration. This was needed for DaemonSet at the time because DaemonSet had no history object yet and we couldn't compare its template with its pods. templateGeneration was then deprecated with the introduction of the history object (ControllerRevision), and for consistency.

Another downside of templateGeneration is that when a user wants to delete and recreate a Deployment with orphan-adoption (kubectl delete deployment --cascade=false), the user needs to manually set the templateGeneration of the Deployment, otherwise the Deployment's history is messed up. Also, with templateGeneration, when the user rolls back a Deployment, the Deployment needs to update its own spec (applying the template from an old ReplicaSet), and then update that ReplicaSet's (it's the new ReplicaSet now, after the rollback) label with the Deployment's updated templateGeneration. This can't be done atomically. Then how does the Deployment find its new ReplicaSet if the label update fails?

Another open question wrt mutating webhooks and workloads: if a webhook controller is updated, should it trigger rollouts? For example, if an RS is now mutated differently on creation, should the Deployment start a new rollout? If it should, the generation approach won't work either.
Is there a reason not to initialize/default this at pod-level or deployment-level? If a webhook mutates an rs created by a deployment, it changes the deployment history too. It may have some side effects on rollback; is this a concern too? I wish we could simply diff the rs and the deployment with strategic merge.
IIUC Initializers only run at creation, so doing this at the Deployment level would miss updates. For tasks like resolving tag->digest, you want this to happen at a cut-point in the deployment process (to give strong consistency across replication), so per-Pod resolution is not much better than imagePullPolicy: Always.
Correct. Now the challenge is that STS and DS don't have this same intermediate object, so any solution for them is going to have to be on the deployment itself, which breaks apply (in some cases).
OTOH,
No, alpha fields should not be persisted in any object.
@janetkuo Are RollBacks not deprecated in |
We deprecated this field not because we didn't want to support rollback, but because we didn't want the controller to mutate its own spec.
I wrote a doc to discuss the intersection of Deployment & mutating admission controllers: https://goo.gl/1JEEhS
I came back to this, but this also means you can't toggle a cluster to enable a feature gate without potentially causing your deployments to go into a hot loop (for this and the other DropDisabledTemplateFields cases, of which we now have a lot more).
/kind bug
What Happened
A Deployment enters a create-new-ReplicaSet hot loop.
In the Deployment's spec.template, the emptyDir volume has sizeLimit: "0".
In its ReplicaSet's spec.template, the sizeLimit field has been cleared.
This will happen when you create a Deployment that specifies volumes.EmptyDir in 1.7.0 - 1.7.5, and then upgrade the cluster to >= 1.8.0 with LocalStorageCapacityIsolation disabled.
Root Cause
Some background information:
1. volumes.EmptyDir.sizeLimit was introduced in 1.7.0. It's an optional field, but was incorrectly typed as resource.Quantity.
2. It was changed to *resource.Quantity later (Change SizeLimit to a pointer #50163) in 1.8.0 and backported to 1.7.6.
3. sizeLimit is set to nil if the LocalStorageCapacityIsolation feature isn't enabled: https://github.com/kubernetes/kubernetes/blob/v1.8.0/pkg/api/pod/util.go#L242 (this fix will soon be cherrypicked to the next 1.7.x release). A simplified sketch of this clearing behavior is shown below.
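For illustration, a simplified stand-in for the clearing that the linked pkg/api/pod/util.go code performs; the real code consults the cluster's feature gates, whereas here the gate is just a bool parameter:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// dropDisabledEmptyDirSizeLimit mimics, in simplified form, what the API
// server does on every create/update: if the LocalStorageCapacityIsolation
// feature is disabled, the sizeLimit of every emptyDir volume is cleared.
func dropDisabledEmptyDirSizeLimit(spec *corev1.PodSpec, localStorageEnabled bool) {
	if localStorageEnabled {
		return
	}
	for i := range spec.Volumes {
		if spec.Volumes[i].EmptyDir != nil {
			spec.Volumes[i].EmptyDir.SizeLimit = nil
		}
	}
}

func main() {
	zero := resource.MustParse("0")
	spec := &corev1.PodSpec{
		Volumes: []corev1.Volume{{
			Name: "cache",
			VolumeSource: corev1.VolumeSource{
				EmptyDir: &corev1.EmptyDirVolumeSource{SizeLimit: &zero},
			},
		}},
	}
	dropDisabledEmptyDirSizeLimit(spec, false)
	fmt.Println(spec.Volumes[0].EmptyDir.SizeLimit) // <nil> when the gate is off
}
```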
If you create a Deployment that specifies volumes.EmptyDir in 1.7.0 - 1.7.5, it will incorrectly set sizeLimit to "0" by default, because of 1 above.

If you then upgrade the cluster to 1.8.0, the sizeLimit: "0" in the ReplicaSet will be cleared, because of 3 above.

The Deployment then cannot find its new ReplicaSet because of the template change, and it keeps creating new ReplicaSets, which still end up with a different template after creation; the sketch below shows why the comparison never matches.
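A minimal sketch to make the mismatch concrete, assuming the controller's check boils down to a semantic deep-equal of the two pod templates (the real controller uses EqualIgnoreHash, which also strips the pod-template-hash label first):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
	"k8s.io/apimachinery/pkg/api/resource"
)

// podTemplate builds an otherwise-identical template whose only variable
// part is the emptyDir sizeLimit pointer.
func podTemplate(sizeLimit *resource.Quantity) corev1.PodTemplateSpec {
	return corev1.PodTemplateSpec{
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{Name: "app", Image: "nginx:1.13"}},
			Volumes: []corev1.Volume{{
				Name: "cache",
				VolumeSource: corev1.VolumeSource{
					EmptyDir: &corev1.EmptyDirVolumeSource{SizeLimit: sizeLimit},
				},
			}},
		},
	}
}

func main() {
	zero := resource.MustParse("0")
	deploymentTemplate := podTemplate(&zero) // persisted by a 1.7.0-1.7.5 cluster
	rsTemplate := podTemplate(nil)           // sizeLimit cleared by the 1.8 API server

	// The Deployment controller never finds a ReplicaSet whose template
	// matches its own, so it keeps creating new ones.
	fmt.Println(apiequality.Semantic.DeepEqual(deploymentTemplate, rsTemplate)) // false
}
```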
Solution
A possible solution is to implement Create() in dry-run mode, and have Deployments use the dry-run-created ReplicaSet template (instead of the Deployment template) to compare against and find the current ReplicaSet; a sketch of this idea follows below. This is a long-term solution.

A possible short-term solution is to implement a hack that clears the Deployment's volumes.EmptyDir.sizeLimit in the same way it is cleared on ReplicaSets. The code here should do the trick: https://github.com/kubernetes/kubernetes/blob/release-1.8/pkg/registry/extensions/deployment/strategy.go#L90-L91, except that the Deployment needs to be updated to trigger this cleanup code.
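Server-side dry-run did not exist when this issue was filed, but as a sketch of that long-term idea using today's client-go (function names and flow here are illustrative, not the actual controller code):

```go
package main

import (
	"context"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// templateAfterAdmission runs a dry-run create of the candidate ReplicaSet so
// the API server applies defaulting, field dropping, and mutating admission,
// then returns the object as it would actually be persisted.
func templateAfterAdmission(ctx context.Context, c kubernetes.Interface, rs *appsv1.ReplicaSet) (*appsv1.ReplicaSet, error) {
	return c.AppsV1().ReplicaSets(rs.Namespace).Create(ctx, rs,
		metav1.CreateOptions{DryRun: []string{metav1.DryRunAll}})
}

// findCurrentReplicaSet compares existing ReplicaSets against the dry-run
// result instead of the raw Deployment template, so server-side mutations
// no longer cause a mismatch.
func findCurrentReplicaSet(dryRun *appsv1.ReplicaSet, existing []*appsv1.ReplicaSet) *appsv1.ReplicaSet {
	for _, rs := range existing {
		if apiequality.Semantic.DeepEqual(dryRun.Spec.Template, rs.Spec.Template) {
			return rs
		}
	}
	return nil
}

func main() {
	fmt.Println("sketch only; wire up a real clientset to use templateAfterAdmission")
}
```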
Workaround
For someone who hit this issue, updating the Deployment will trigger https://github.com/kubernetes/kubernetes/blob/release-1.8/pkg/registry/extensions/deployment/strategy.go#L90-L91 and thus solve the problem automatically.
@kubernetes/sig-apps-bugs @liggitt