ProgressDeadlineExceeded not set outside of Deployment rollouts #106054
@wking: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label and provide further guidance.
Working through Code Search results, looking for
Summarizing:
Ok, so
That's not what I'd expected, and it seems to be overloading "available" a bit, but I can wrap my head around it as "we have exactly the available, updated pods we want". So with a history like:
Currently 4 gets the
For now I can see one issue with the inconsistency of the Progressing condition (I want to dive into the impact of this feature more later). Meaning of the old behaviour:
Meaning of the new behaviour:
So the difference with the new behaviour is that we cannot infer whether we have a healthy rollout when some pods are disrupted, but we have a better signal that the pods are disrupted. Also, in either case we can always resolve to
*edit: ability to progress is achieved if there was a complete deployment at some point in time for this rollout (revision)
One thing to note is that since this is fixing a rare behaviour (deployment pods being disrupted after a successful rollout), the things this would break would also be rare. For example, it could lead to race conditions in consumers which are polling the deployment status.
You can take a look at this case here: status_check.go#L308 & status_check.go#L220
You can observe the same polling issue here. Since it is waiting for a rollout, and this observation could be skipped, we might not get to this part (resource_kubernetes_deployment.go#L254) even though the rollout happened. I know it is very unlikely to happen, but once it happens it will be very hard to figure out what went wrong. This polling pattern is very common (as also seen in your examples) and could potentially cause problems (in rare cases).
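To make the race concrete, here is a minimal sketch of that polling pattern (hypothetical namespace and Deployment names; not code from the linked files). A consumer that polls for the completed state can miss it entirely if a disruption flips the status back between two polls:

```sh
# Poll until the Deployment looks "complete". If a post-rollout disruption
# can drop availableReplicas again, the complete state may come and go
# entirely inside one sleep interval, and this loop never observes it.
while true; do
  have="$(kubectl -n my-namespace get deployment my-deployment \
    -o jsonpath='{.status.availableReplicas}')"
  want="$(kubectl -n my-namespace get deployment my-deployment \
    -o jsonpath='{.spec.replicas}')"
  if [ "${have:-0}" -eq "${want}" ]; then
    echo "rollout complete"
    break
  fi
  sleep 5  # the race window: a flap inside this sleep is never observed
done
```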
External disruption isn't that rare, although unrecoverable external disruption may be. For example, if you are keeping up with the tip of an OpenShift distribution channel, you are draining and rebooting every node in your cluster every week.
That's a pretty thin race, where the disruption occurred within one polling period of the successful rollout. But sure, I'm open to alternatives. If there's no fixing
So to reiterate, the issue is to have an option, as a cluster admin, to look at any Deployment and see if any of its replicas lost Availability, and to have an easier time reading disrupted/misbehaving deployments.
This could indeed be solved by a new condition, called for example
Although, one problem is that the condition would be set to
Do any of these options sound feasible? Also, as a workaround you can currently set
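Purely for illustration of the shape such a dedicated condition could take (the `ReplicasAvailable` type and its reason below are hypothetical placeholders, not names proposed in this thread):

```console
$ kubectl -n my-namespace get deployment my-deployment \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
Available=True (MinimumReplicasAvailable)
Progressing=True (NewReplicaSetAvailable)
ReplicasAvailable=False (ReplicaUnavailable)
```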
@atiratree wrt adding conditions I'd like you to keep in mind this work: kubernetes/enhancements#2833
@soltysh since we are trying to be analogous to the Deployment
@wking I am trying to document the current behaviour in more detail here: kubernetes/website#31226 Regarding the proposed changes to the
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/lifecycle rotten
/remove-lifecycle rotten
We need to make some progress on giving users deployment-level summarization of useful level-driven transitions (that match the user-driven intent of making changes). A deployment or replicaset in steady state that is failing to get back to an available state for longer than a certain reasonable period should definitely be summarized as a condition, but I agree that changing Progressing is probably not the right place, because Progressing is about the creation of a new replica set (update), not an existing replica set. Certainly Available=False Reason=(because of creating new replicas) is a proxy for that, but I don't see the progress deadline as having a role there, because we can aggressively update deployment status when we can't create replicas. We might want to have the reason change to something minReadySeconds-related, but that is already associated with Available.
/lifecycle stale
I don't think we want the deployment controller to panic on short-term scheduling issues, and
/remove-lifecycle stale
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
I'd floated #93933 earlier with a PR attempting to address this, and more recently opened rhbz#1983823. Here's a third try at pitching this proposal.
/sig apps
Proposal
When a Deployment has a single underlying ReplicaSet that had previously completed, but which now lacks the target count of ready replicas, and the underlying ReplicaSet fails to make progress for more than `progressDeadlineSeconds`, the Deployment controller should set `Progressing=False` with `ProgressDeadlineExceeded`.
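Under this proposal, a previously-complete Deployment that stays disrupted past the deadline would eventually report something like the following. This is a sketch of the intended end state with placeholder names, not current behavior (today the controller keeps reporting `Progressing=True` with `NewReplicaSetAvailable` in this situation, as the Reproducer section shows):

```console
$ kubectl -n my-namespace get deployment my-deployment \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
Available=True (MinimumReplicasAvailable)
Progressing=False (ProgressDeadlineExceeded)
```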
Benefits
Deployment owners get a clear signal that the Deployment controller (via the delegated ReplicaSet controller) is struggling to recover the desired state. In situations where the Deployment controller continues to be unable to make progress, additional disruption will eventually push us into `Available=False`. But `Progressing=False` with `ProgressDeadlineExceeded` is a way the Deployment controller can request assistance before things get bad enough to go `Available=False`.

Downsides
Maybe someone depends on the current behavior and would be broken by the proposed pivot. Feedback welcome, if anyone can think of a use case that might be vulnerable.
Context
The only `Progressing` condition in kubernetes/kubernetes is `DeploymentProgressing`, so we don't have to worry about changes to the Deployment controller's handling being inconsistent with other core controllers.

The dev-facing godoc for Deployment's `Progressing` is, and has been since it landed:

> DeploymentProgressing means the deployment is progressing. Progress for a deployment is considered when a new replica set is created or adopted, and when new pods scale up or old pods scale down. Progress is not estimated for paused deployments or when progressDeadlineSeconds is not specified.

"when new pods scale up or old pods scale down" is the fuzzy bit where this proposal is working.
From the user-facing Deployment docs:
This last is a bit vague. For example, if the Deployment is completing a rollout and the final Pod becomes ready for min ready seconds, the wording on that last condition is satisfied, but ~~the Deployment actually uses that event to transition to `Progressing=False`~~ (edit: actually, `ProgressDeadlineExceeded` is the only `Progressing=False` case). Anyhow, it's this last entry where this proposal is working, as the Deployment controller attempts to pass along information about how well the target ReplicaSet controller is doing at reconciling the desired state.

From later in the user-facing Deployment docs:
This same "stuck trying to deploy its newest ReplicaSet" is the situation I'm concerned with. And again the docs are a bit vague. "the minimum required new replicas are available" seems like it's about `Available` despite belonging to a sentence discussing `Progressing`. That line landed with the original `ProgressDeadlineExceeded` docs, so it's pretty old. But "your Deployment may get stuck" is the situation I'm trying to detect.

Reproducer
Using OpenShift's `oc`, which in this case is a fairly thin shim around `kubectl`, on a 1.21.1 cluster.

Looking at a happy, leveled deployment:
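A sketch of the kind of check (assuming the `router-default` Deployment in `openshift-ingress`, the ingress-router example mentioned under Alternatives below):

```console
$ oc -n openshift-ingress get deployment router-default
$ oc -n openshift-ingress get deployment router-default \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
```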
Making it impossible for that deployment to get new Pods:
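One way to arrange this, as an assumption rather than the thread's elided method: cordon every node, so replacement Pods for the existing ReplicaSet have nowhere to schedule (no pod-template change, so no new rollout starts):

```console
$ for node in $(oc get nodes -o name); do oc adm cordon "${node#node/}"; done
```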
Disrupt the workload by deleting a Pod (e.g. maybe we're draining a node in preparation to reboot):
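For example (placeholder Pod name; pick any one of the Deployment's Pods):

```console
$ oc -n openshift-ingress get pods
$ oc -n openshift-ingress delete pod router-default-<hash>-<suffix>
```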
Checking back in on the Deployment:
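A sketch of re-querying the conditions, this time including the messages (same placeholder names as above):

```console
$ oc -n openshift-ingress get deployment router-default \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'
```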
It has gone `Progressing=True`, and while the target ReplicaSet is `Available=True`, ~~the "has successfully progressed" in the `message` is a bit weird (I'd expect something about why we weren't `Progressing=False`, like "is working to create additional pods")~~ (edit: actually, `ProgressDeadlineExceeded` is the only `Progressing=False` case, and you can see above that we were `Progressing=True` with the same `reason` and `message` in the leveled case too).

And after the 10m (default) `progressDeadlineSeconds`:

So no change there, despite going more than 10m without progress. The proposal is to adjust this result to be `Progressing=False` with `ProgressDeadlineExceeded`.

Dropping down into the ReplicaSet:
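A sketch of that inspection (ReplicaSet names carry a generated hash, so list first; ReplicaSets only define a `ReplicaFailure` condition type, so the conditions query typically prints nothing here):

```console
$ oc -n openshift-ingress get replicasets
$ oc -n openshift-ingress get replicaset router-default-<hash> \
    -o jsonpath='{.status.conditions}'
```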
So the Deployment controller is definitely not getting a lot of help from the ReplicaSet controller.
Alternatives
Prometheus alerts
In my OpenShift cluster, I do have `KubePodNotReady` firing with:

But not all clusters will have Prometheus/Alertmanager installed. And if this was a sufficient guard for this situation, we wouldn't have needed `ProgressDeadlineExceeded` at all. Another benefit of `ProgressDeadlineExceeded` over the alerts is that `progressDeadlineSeconds` is a Deployment-specific knob, and having Deployment-specific alerts watching over the shoulder of a quiet Deployment controller seems pretty heavy, compared to making the Deployment controller a bit more forthcoming.

Looking over the Deployment controller's shoulder
Deployment owners can work around the Deployment controller's current behavior by reaching around to find ReplicaSets. And then work around the ReplicaSet controller's current behavior (no conditions at all!) by reaching around to find Pods. And then see whether there are long-stuck Pods or other issues. But again, if this was a sufficient guard for this situation, we wouldn't have needed `ProgressDeadlineExceeded` at all. We grew `ProgressDeadlineExceeded` for the rollout case, because it's more convenient and reliable to have this watching code once in the Deployment controller, where all Deployment owners can benefit from the central analysis.
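A sketch of that reach-around (placeholder names; assumes the Deployment's `matchLabels` selector can be reshaped directly into an equality label selector):

```sh
ns=my-namespace
deploy=my-deployment
# Pull the Deployment's selector and reshape {"k":"v",...} into k=v,... for -l.
selector="$(kubectl -n "$ns" get deployment "$deploy" \
  -o jsonpath='{.spec.selector.matchLabels}' | tr -d '{}" ' | tr ':' '=')"
# Walk Deployment -> ReplicaSets -> Pods by hand, looking for stuck Pods.
kubectl -n "$ns" get replicasets -l "$selector"
kubectl -n "$ns" get pods -l "$selector" -o wide
```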
Saying `Progressing` as a whole is rollout-specific

Another internally-consistent approach would be to say "`Progressing` is just about Deployment rollouts and the direct operands of the Deployment controller" and to explicitly exclude downstream operands like the Pods operated on by the ReplicaSet controller. In this case, the Deployment controller would not stay `Progressing=True` after the rollout completed. But while this is internally-consistent, it doesn't seem all that helpful for Deployment owners, who would then need to walk the whole controller stack to see how their workload was doing.

Having each controller be responsible for reporting any concerning behavior for the workload up to higher levels of the controller stack scales more easily, because each controller only needs to understand how its direct operands will report issues. In this vein, it would certainly be possible for the Deployment controller to delegate lack-of-progress detection to the ReplicaSet controller, and just pass it up the stack if/when the ReplicaSet controller reported it on the target ReplicaSet.
Ignoring issues as long as `Available=True`

`Available=False` is a pretty unambiguous signal, but depending on the workload, this can be pretty serious. For the ingress-router example I picked for my reproducer, it means "none of your cluster workloads have functional ingress anymore", which is about as bad as it gets and certainly in the midnight-admin-page space for some clusters. On the other hand, signals that get Deployment owners involved earlier on, when it's clear that reconciliation/recovery is having issues, but before the existing pods have all been deleted, allow for calmer, working-hours intervention.