ProgressDeadlineExceeded not set outside of Deployment rollouts #106054

Closed
wking opened this issue Nov 1, 2021 · 23 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps.

Comments

@wking
Contributor

wking commented Nov 1, 2021

I'd floated #93933 earlier with a PR attempting to address this, and more recently opened rhbz#1983823. Here's a third try at pitching this proposal.

/sig apps

Proposal

When a Deployment has a single underlying ReplicaSet that had previously completed, but which now lacks the target count of ready replicas, and that underlying ReplicaSet fails to make progress for more than progressDeadlineSeconds, the Deployment controller should set Progressing=False with reason ProgressDeadlineExceeded.
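
For concreteness, here is a rough Go sketch of the proposed check. This is not the actual controller code; the helper wiring (shouldMarkStalled, the lastProgress timestamp) is hypothetical and only illustrates the condition described above.

package deployutil

import (
    "time"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
)

// shouldMarkStalled is a hypothetical helper: it returns true when a Deployment
// whose single, up-to-date ReplicaSet previously completed now lacks ready
// replicas and has made no progress for more than progressDeadlineSeconds.
func shouldMarkStalled(d *appsv1.Deployment, rs *appsv1.ReplicaSet, lastProgress, now time.Time) bool {
    if d.Spec.ProgressDeadlineSeconds == nil || d.Spec.Paused {
        return false // progress is not estimated without a deadline or while paused
    }
    desired := int32(1)
    if d.Spec.Replicas != nil {
        desired = *d.Spec.Replicas
    }
    if rs.Status.ReadyReplicas >= desired {
        return false // the ReplicaSet still has the target count of ready replicas
    }
    deadline := time.Duration(*d.Spec.ProgressDeadlineSeconds) * time.Second
    return now.Sub(lastProgress) > deadline
}

// stalledCondition is the condition the proposal would have the Deployment
// controller set in that case (message wording is illustrative).
func stalledCondition() appsv1.DeploymentCondition {
    return appsv1.DeploymentCondition{
        Type:    appsv1.DeploymentProgressing,
        Status:  corev1.ConditionFalse,
        Reason:  "ProgressDeadlineExceeded",
        Message: "ReplicaSet has not regained its desired count of ready replicas within the progress deadline.",
    }
}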

Benefits

Deployment owners get a clear signal that the Deployment controller (via the delegated ReplicaSet controller) is struggling to recover the desired state. In situations where the Deployment controller continues to be unable to make progress, additional disruption will eventually push us into Available=False. But Progressing=False with ProgressDeadlineExceeded is a way the Deployment controller can request assistance before things get bad enough to go Available=False.

Downsides

Maybe someone depends on the current behavior and would be broken by the proposed pivot. Feedback welcome, if anyone can think of a use case that might be vulnerable.

Context

The only Progressing condition in kubernetes/kubernetes is DeploymentProgressing, so we don't have to worry about changes to the Deployment controller's handling being inconsistent with other core controllers.

The dev-facing godoc for Deployment's Progressing condition is, and has been since it landed:

Progressing means the deployment is progressing. Progress for a deployment is considered when a new replica set is created or adopted, and when new Pods scale up or old Pods scale down. Progress is not estimated for paused deployments or when progressDeadlineSeconds is not specified.

"when new pods scale up or old pods scale down" is the fuzzy bit where this proposal is working.

From the user-facing Deployment docs:

Kubernetes marks a Deployment as progressing when one of the following tasks is performed:

  • The Deployment creates a new ReplicaSet.
  • The Deployment is scaling up its newest ReplicaSet.
  • The Deployment is scaling down its older ReplicaSet(s).
  • New Pods become ready or available (ready for at least MinReadySeconds).

This last is a bit vague. For example, if the Deployment is completing a rollout and the final Pod becomes ready for min ready seconds, the wording on that last condition is satisfied, but the Deployment actually uses that event to transition to Progressing=False (edit: actually, ProgressDeadlineExceeded is the only Progressing=False case). Anyhow, it's this last entry where this proposal is working, as the Deployment controller attempts to pass along information about how well the target ReplicaSet controller is doing at reconciling the desired state.

From later in the user-facing Deployment docs:

Your Deployment may get stuck trying to deploy its newest ReplicaSet without ever completing. This can occur due to some of the following factors:

  • Insufficient quota
  • Readiness probe failures
  • ...

One way you can detect this condition is to specify a deadline parameter in your Deployment spec: (.spec.progressDeadlineSeconds). .spec.progressDeadlineSeconds denotes the number of seconds the Deployment controller waits before indicating (in the Deployment status) that the Deployment progress has stalled.
...
Type=Progressing with Status=True means that your Deployment is either in the middle of a rollout and it is progressing or that it has successfully completed its progress and the minimum required new replicas are available (see the Reason of the condition for the particulars - in our case Reason=NewReplicaSetAvailable means that the Deployment is complete).

This same "stuck trying to deploy its newest ReplicaSet" is the situation I'm concerned with. And again the docs are a bit vague. "the minimum required new replicas are available" seems like it's about Available despite belonging to a sentence discussing Progressing. That line landed with the original ProgressDeadlineExceeded docs, so it's pretty old. But "your Deployment may get stuck" is the situation I'm trying to detect.
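
To make the consumer side concrete, here is a minimal client-go sketch of how a Deployment owner reads the Progressing condition and distinguishes the documented reasons. The namespace and name come from the reproducer below; everything else is standard client-go, and error handling is trimmed.

package main

import (
    "context"
    "fmt"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a client from the default kubeconfig location.
    cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    d, err := client.AppsV1().Deployments("openshift-ingress").Get(context.TODO(), "router-default", metav1.GetOptions{})
    if err != nil {
        panic(err)
    }
    for _, c := range d.Status.Conditions {
        if c.Type != appsv1.DeploymentProgressing {
            continue
        }
        switch {
        case c.Status == corev1.ConditionFalse && c.Reason == "ProgressDeadlineExceeded":
            fmt.Println("deployment is stalled:", c.Message)
        case c.Status == corev1.ConditionTrue && c.Reason == "NewReplicaSetAvailable":
            fmt.Println("rollout previously completed:", c.Message)
        default:
            fmt.Println("rollout in progress:", c.Reason, c.Message)
        }
    }
}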

Reproducer

Using OpenShift's oc, which in this case is a fairly thin shim around kubectl, on a 1.21.1 cluster:

$ oc version
Client Version: 4.9.0-0.nightly-arm64-2021-07-08-160356
Server Version: 4.8.18
Kubernetes Version: v1.21.1+6438632

Looking at a happy, leveled deployment:

$ oc -n openshift-ingress get -o json deployment router-default | jq '{spec: (.spec | {replicas, progressDeadlineSeconds}), status: (.status | {availableReplicas, conditions})}'
{
  "spec": {
    "replicas": 2,
    "progressDeadlineSeconds": 600
  },
  "status": {
    "availableReplicas": 2,
    "conditions": [
      {
        "lastTransitionTime": "2021-11-01T17:03:12Z",
        "lastUpdateTime": "2021-11-01T17:03:12Z",
        "message": "Deployment has minimum availability.",
        "reason": "MinimumReplicasAvailable",
        "status": "True",
        "type": "Available"
      },
      {
        "lastTransitionTime": "2021-11-01T17:00:02Z",
        "lastUpdateTime": "2021-11-01T17:03:12Z",
        "message": "ReplicaSet \"router-default-84648d65b6\" has successfully progressed.",
        "reason": "NewReplicaSetAvailable",
        "status": "True",
        "type": "Progressing"
      }
    ]
  }
}

Making it impossible for that deployment to get new Pods:

$ oc get -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' -l node-role.kubernetes.io/worker= nodes | while read NODE; do oc adm cordon "${NODE}"; done
node/ci-ln-8wswzbb-72292-wckxp-worker-a-7rdzb cordoned
node/ci-ln-8wswzbb-72292-wckxp-worker-b-qkdbx cordoned
node/ci-ln-8wswzbb-72292-wckxp-worker-c-kp8qs cordoned

Disrupt the workload by deleting a Pod (e.g. maybe we're draining a node in preparation to reboot):

$ oc -n openshift-ingress get pods | grep router
router-default-84648d65b6-bjxvn   1/1     Running   0          30m
router-default-84648d65b6-mr4cg   1/1     Running   0          30m
$ oc -n openshift-ingress delete pod router-default-84648d65b6-bjxvn
pod "router-default-84648d65b6-bjxvn" deleted

Checking back in on the Deployment:

$ oc -n openshift-ingress get -o json deployment router-default | jq '{spec: (.spec | {replicas, progressDeadlineSeconds}), status: (.status | {availableReplicas, conditions})}'
{
  "spec": {
    "replicas": 2,
    "progressDeadlineSeconds": 600
  },
  "status": {
    "availableReplicas": 1,
    "conditions": [
      {
        "lastTransitionTime": "2021-11-01T17:03:12Z",
        "lastUpdateTime": "2021-11-01T17:03:12Z",
        "message": "Deployment has minimum availability.",
        "reason": "MinimumReplicasAvailable",
        "status": "True",
        "type": "Available"
      },
      {
        "lastTransitionTime": "2021-11-01T17:00:02Z",
        "lastUpdateTime": "2021-11-01T17:03:12Z",
        "message": "ReplicaSet \"router-default-84648d65b6\" has successfully progressed.",
        "reason": "NewReplicaSetAvailable",
        "status": "True",
        "type": "Progressing"
      }
    ]
  }
}

It is still Progressing=True, and while the Deployment remains Available=True, the "has successfully progressed" message is a bit weird (I'd expect something about why we weren't Progressing=False, like "is working to create additional pods") (edit: actually, ProgressDeadlineExceeded is the only Progressing=False case, and you can see above that we were Progressing=True with the same reason and message in the leveled case too).

And after the 10m (default) progressDeadlineSeconds:

$ sleep 600
$ oc -n openshift-ingress get -o json deployment router-default | jq '{spec: (.spec | {replicas, progressDeadlineSeconds}), status: (.status | {availableReplicas, conditions})}'
{
  "spec": {
    "replicas": 2,
    "progressDeadlineSeconds": 600
  },
  "status": {
    "availableReplicas": 1,
    "conditions": [
      {
        "lastTransitionTime": "2021-11-01T17:03:12Z",
        "lastUpdateTime": "2021-11-01T17:03:12Z",
        "message": "Deployment has minimum availability.",
        "reason": "MinimumReplicasAvailable",
        "status": "True",
        "type": "Available"
      },
      {
        "lastTransitionTime": "2021-11-01T17:00:02Z",
        "lastUpdateTime": "2021-11-01T17:03:12Z",
        "message": "ReplicaSet \"router-default-84648d65b6\" has successfully progressed.",
        "reason": "NewReplicaSetAvailable",
        "status": "True",
        "type": "Progressing"
      }
    ]
  }
}

So no change there, despite going more than 10m without progress. The proposal is to adjust this result to be Progressing=False with ProgressDeadlineExceeded.

Dropping down into the ReplicaSet:

$ oc -n openshift-ingress get -o json replicaset router-default-84648d65b6 | jq .status
{
  "availableReplicas": 1,
  "fullyLabeledReplicas": 2,
  "observedGeneration": 1,
  "readyReplicas": 1,
  "replicas": 2
}

So the Deployment controller is definitely not getting a lot of help from the ReplicaSet controller.

Alternatives

Prometheus alerts

In my OpenShift cluster, I do have KubePodNotReady firing with:

Pod openshift-ingress/router-default-84648d65b6-bwxcv has been in a non-ready state for longer than 15 minutes.

But not all clusters will have Prometheus/Alertmanager installed. And if this was a sufficient guard for this situation, we wouldn't have needed ProgressDeadlineExceeded at all. Another benefit of ProgressDeadlineExceeded over the alerts is that progressDeadlineSeconds is a Deployment-specific knob, and having Deployment-specific alerts watching over the shoulder of a quiet Deployment controller seems pretty heavy, compared to making the Deployment controller a bit more forthcoming.

Looking over the Deployment controller's shoulder

Deployment owners can work around the Deployment controller's current behavior by reaching around to find ReplicaSets. And then work around the ReplicaSet controller's current behavior (no conditions at all!) by reaching around to find Pods. And then see whether there are long-stuck Pods or other issues. But again, if this was a sufficient guard for this situation, we wouldn't have needed ProgressDeadlineExceeded at all. We grew ProgressDeadlineExceeded for the rollout case, because it's more convenient and reliable to have this watching code once in the Deployment controller, where all Deployment owners can benefit from the central analysis.
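
As a sketch of what that workaround looks like today with client-go (simplified: it lists Pods via the Deployment's selector directly rather than walking the ReplicaSets and ownerReferences, and the helper name is made up):

package workaround

import (
    "context"
    "time"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// findStuckPods returns the names of Pods behind the Deployment that have been
// unready for longer than unreadyFor. This is the kind of check the Deployment
// owner has to reimplement while the Deployment itself still reports
// Progressing=True with NewReplicaSetAvailable.
func findStuckPods(ctx context.Context, client kubernetes.Interface, namespace, deployment string, unreadyFor time.Duration) ([]string, error) {
    d, err := client.AppsV1().Deployments(namespace).Get(ctx, deployment, metav1.GetOptions{})
    if err != nil {
        return nil, err
    }
    selector, err := metav1.LabelSelectorAsSelector(d.Spec.Selector)
    if err != nil {
        return nil, err
    }
    pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector.String()})
    if err != nil {
        return nil, err
    }
    var stuck []string
    now := time.Now()
    for _, pod := range pods.Items {
        for _, c := range pod.Status.Conditions {
            if c.Type == corev1.PodReady && c.Status != corev1.ConditionTrue && now.Sub(c.LastTransitionTime.Time) > unreadyFor {
                stuck = append(stuck, pod.Name)
            }
        }
    }
    return stuck, nil
}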

Saying Progressing as a whole is rollout-specific

Another internally-consistent approach would be to say "Progressing is just about Deployment rollouts and the direct operands of the Deployment controller" and to explicitly exclude downstream operands like the Pods operated on by the ReplicaSet controller. In this case, the Deployment controller would not stay Progressing=True after the rollout completed. But while this is internally-consistent, it doesn't seem all that helpful for Deployment owners, who would then need to walk the whole controller stack to see how their workload was doing.

Having each controller be responsible for reporting any concerning behavior for the workload up to higher levels of the controller stack scales more easily, because each controller only needs to understand how its direct operands will report issues. In this vein, it would certainly be possible for the Deployment controller to delegate lack-of-progress detection to the ReplicaSet controller, and just pass it up the stack if/when the ReplicaSet controller reported it on the target ReplicaSet.

Ignoring issues as long as Available=True

Available=False is a pretty unambiguous signal, but depending on the workload, this can be pretty serious. For the ingress-router example I picked for my reproducer, it means "none of your cluster workloads have functional ingress anymore", which is about as bad as it gets and certainly in the midnight-admin-page space for some clusters. On the other hand, signals that get Deployment owners involved earlier, when it's clear that reconciliation/recovery is having issues but before the existing pods have all been deleted, allow for calmer, working-hours intervention.

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 1, 2021
@k8s-ci-robot
Contributor

@wking: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking
Contributor Author

wking commented Nov 1, 2021

Working through Code Search results, looking for DeploymentProgressing (will miss some users, e.g. folks who are doing things like oc wait --for=condition=Progressing=False deployment/router-default, because it will not follow the Go -> CLI transition):

Summarizing:

  • I don't see any consumers who would be hurt by this proposal yet, and there are a few that I expect would be helped.
  • Progressing=True with reason NewReplicaSetAvailable is the happy case, and has been for years! I'd naively expected the Deployment to transition back to Progressing=False once it achieved its desired state, with Progressing=True being reserved for "we are making progress in reconciling towards a target we haven't reached yet". I need to go mull this over...

@wking
Contributor Author

wking commented Nov 2, 2021

Ok, so Progressing=True with NewReplicaSetAvailable is the documented happy state and I'm just slow:

If you satisfy the quota conditions and the Deployment controller then completes the Deployment rollout, you'll see the Deployment's status update with a successful condition (Status=True and Reason=NewReplicaSetAvailable).

That's not what I'd expected, and it seems to be overloading "available" a bit, but I can wrap my head around it as "we have exactly the available, updated pods we want". So with a history like:

  1. Deployment is happy with spec A.
  2. Owner patches the Deployment to spec B.
  3. Deployment controller begins progressing towards B.
  4. Down to one ReplicaSet, only one Pod to go...
  5. Hooray, we made it! NewReplicaSetAvailable.
  6. Oops, someone deleted one of my pods. No matter, only one Pod to go...

Currently 4 gets the progressDeadlineSeconds and ProgressDeadlineExceeded guards around "hrm, I seem to be stuck", but 6 (which is exactly the same current situation but with a different history of having previously achieved spec B) currently does not. I'm proposing 4 and 6 both get the same progressDeadlineSeconds and ProgressDeadlineExceeded guards. I can start back in on the search results to see if we can find anyone vulnerable to that pivot...

@atiratree
Member

atiratree commented Nov 2, 2021

For now I can see one issue with the inconsistency of the Progressing condition (I want to dive into the impact of this feature more later).

meaning of the old behaviour:

  • Progressing=True with NewReplicaSetAvailable with all available pods is saying: we know this rollout has an ability to progress and we have 100% availability
  • Progressing=True with NewReplicaSetAvailable with unavailable pods is saying: we know this rollout has an ability to progress even though we are not on 100% availability anymore
  • Progressing=False with ProgressDeadlineExceeded is saying: we know this rollout does not have an ability to progress

meaning of the new behaviour:

  • Progressing=True with NewReplicaSetAvailable with all available pods : we know this rollout has an ability to progress and we have 100% availability
  • Progressing=False with ProgressDeadlineExceeded is saying: we do not know if this rollout has an ability to progress (it might) and some pods are unavailable

So the difference with the new behaviour is that we cannot infer if we have a healthy rollout when some pods are disrupted but we have a better signal that the pods are disrupted.

Also, in either case we can always resolve to NewReplicaSetAvailable after ProgressDeadlineExceeded. So we will have the old behaviour before the deadline and the new behaviour after the deadline, which could get quite confusing semantically, since you would always need to check the time of your deadline before consulting the status to know the real meaning.

*edit: ability to progress is achieved if there was a complete deployment at some point in time for this rollout (revision)

@atiratree
Member

One thing to note is that since this is fixing a rare behaviour (deployment pods are disrupted after successful rollout), the things this would break would also be rare.

For example, it could lead to race conditions in consumers that poll the deployment status.

  1. poller checks if the rollout is done
  2. rollout completes
  3. pod is disrupted -> rollout waiting for deployment
  4. poller checks if the rollout is done
  5. rollout waiting for deployment
    (steps 4 and 5 repeat for some amount of time)
  6. rollout exceeds its progress deadline and sets ProgressDeadlineExceeded
  7. poller times out with ProgressDeadlineExceeded
  8. rollout completes and the disruption disappears
  9. poller couldn't execute the logic after a successful rollout, as it could before

You can take a look at this case here: status_check.go#L308 & status_check.go#L220

The Kubernetes provider for Terraform treats ProgressDeadlineExceeded as a non-retryable error in waitForDeploymentReplicasFunc. But they only call it after creating or updating the Deployment, so they likely don't care about changes to out-of-rollout Progressing handling.

You can observe the same polling issue here. Since it is waiting for a rollout and this observation could be skipped, we might not get to this part (resource_kubernetes_deployment.go#L254) even though the rollout happened. I know it is very unlikely to happen, but once it happens it will be very hard to figure out what went wrong.

This polling pattern is very common (as also seen in your examples) and could potentially cause problems (in rare cases).
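
A minimal sketch of that polling pattern (hypothetical consumer code, not taken from kubectl or the Terraform provider) shows the race: the loop treats ProgressDeadlineExceeded as terminal, so under the proposal a post-rollout disruption that trips the deadline would abort logic that used to run after a successful rollout.

package poller

import (
    "context"
    "fmt"
    "time"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// waitForRollout polls until the Deployment reports a completed rollout, and
// returns an error as soon as it sees ProgressDeadlineExceeded.
func waitForRollout(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()
    for {
        d, err := client.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
        if err != nil {
            return err
        }
        for _, c := range d.Status.Conditions {
            // Today this only fires during a stuck rollout; under the proposal it
            // could also fire for post-rollout disruption, aborting the wait.
            if c.Type == appsv1.DeploymentProgressing && c.Status == corev1.ConditionFalse && c.Reason == "ProgressDeadlineExceeded" {
                return fmt.Errorf("deployment %s/%s exceeded its progress deadline", namespace, name)
            }
        }
        desired := int32(1)
        if d.Spec.Replicas != nil {
            desired = *d.Spec.Replicas
        }
        if d.Status.UpdatedReplicas == desired && d.Status.AvailableReplicas == desired {
            return nil // rollout done; the consumer's post-rollout logic runs after this
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-ticker.C:
        }
    }
}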

@wking
Contributor Author

wking commented Nov 17, 2021

One thing to note is that since this is fixing a rare behaviour (deployment pods are disrupted after successful rollout)...

External disruption isn't that rare, although unrecoverable external disruption may be. For example, if you are keeping up with the tip of an OpenShift distribution channel, you are draining and rebooting every node in your cluster every week.

@wking
Contributor Author

wking commented Nov 17, 2021

poller couldn't execute the logic after a successful rollout, as it could before

That's a pretty thin race, where the disruption occurred within one polling period of the successful rollout. But sure, I'm open to alternatives. If there's no fixing Progressing's current overloading of "I previously leveled the current target" and "I'm currently working to level the current target", can we mint a new condition type pair to decouple? Because saying "meh, I guess we can never represent non-rollout disruption in Deployment conditions" seems like it is leaving a sizeable usability hole on the table.

@atiratree
Member

atiratree commented Nov 25, 2021

Deployment owners get a clear signal that the Deployment controller (via the delegated ReplicaSet controller) is struggling to recover the desired state. In situations where the Deployment controller continues to be unable to make progress, additional disruption will eventually push us into Available=False. But Progressing=False with ProgressDeadlineExceeded is a way the Deployment controller can request assistance before things get bad enough to go Available=False.

So to reiterate, the issue is to give cluster admins the option to look at any Deployment and see whether any of its replicas have lost availability, and to make disrupted/misbehaving deployments easier to read.

But sure, I'm open to alternatives. If there's no fixing Progressing's current overloading of "I previously leveled the current target" and "I'm currently working to level the current target", can we mint a new condition type pair to decouple?

This could indeed be solved by a new condition called, for example, CompletelyAvailable, FullyAvailable, etc. It would tell us if we lost any pod, regardless of the maxUnavailable value, and indicate a potential problem. It could also check whether all replicas are updated. It is up for discussion.

Although, one problem is that the condition would be set to False any time a new rollout is happening. Is that what we want when we are looking for potential problems with our deployment? We could start checking this after the rollout completes/fails, but then I am not so sure what the name of the condition should be. So when triggering a new rollout you would get either ReplicaSetFailedCreate, ProgressDeadlineExceeded, or False in our new condition after the rollout.

Do any of these options sound feasible?

Also, as a workaround, you can currently set maxUnavailable: 0 and maxSurge: x for your own deployments and monitor Available, although this would be a problem for workloads whose pods can't share the same node.
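
For the record, a sketch of that workaround expressed with the Go API types (the equivalent YAML is just the strategy stanza on the Deployment spec): with maxUnavailable: 0, losing any pod drops availableReplicas below the minimum, so Available=False becomes the disruption signal, while a non-zero maxSurge (1 here, as an example) keeps rollouts possible.

package workaround

import (
    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
)

// zeroUnavailableStrategy returns the suggested rollout strategy: never allow
// an unavailable pod during rollouts, and surge by one instead.
func zeroUnavailableStrategy() appsv1.DeploymentStrategy {
    maxUnavailable := intstr.FromInt(0)
    maxSurge := intstr.FromInt(1)
    return appsv1.DeploymentStrategy{
        Type: appsv1.RollingUpdateDeploymentStrategyType,
        RollingUpdate: &appsv1.RollingUpdateDeployment{
            MaxUnavailable: &maxUnavailable,
            MaxSurge:       &maxSurge,
        },
    }
}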

@soltysh
Contributor

soltysh commented Nov 26, 2021

@atiratree wrt adding conditions, I'd like you to keep in mind this work: kubernetes/enhancements#2833

@atiratree
Member

atiratree commented Jan 6, 2022

@atiratree wrt adding conditions I'd like you to keep in mind this work kubernetes/enhancements#2833

@soltysh since we are trying to be analogous to the Deployment Progressing condition in other workloads, it probably doesn't make sense to bring new behaviours to the KEP at the moment

@atiratree
Member

@wking I am trying to document the current behaviour in more detail here: kubernetes/website#31226

Regarding the proposed changes to the Progressing condition or introducing a new condition, I think it makes sense to wait for more input / interest from the community before committing to such a change.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 6, 2022
@atiratree
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 11, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 10, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 9, 2022
@atiratree
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 5, 2022
@smarterclayton
Contributor

We need to make some progress on giving users deployment-level summarization of useful level-driven transitions (that match the user-driven intent of making changes).

A deployment or replicaset in steady state that is failing to get back to an available state for longer than a certain reasonable period should definitely be summarized as a condition, but I agree that changing Progressing is probably not the right place, because Progressing is about the creation of a new replica set (update), not an existing replica set. Certainly Available=False Reason=(because of creating new replicas) is a proxy for that, but I don't see progress deadline as having a role there, because we can aggressively update deployment status when we can't create replicas. We might want to have the reason change to something minReadySeconds related, but that is already associated with available.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2023
@wking
Contributor Author

wking commented Jan 25, 2023

... but I don't see progress deadline as having a role there because we can aggressively update deployment status when we can't create replicas...

I don't think we want the deployment controller to panic on short-term scheduling issues, and progressDeadlineSeconds is already owner input on how long we wait before we decide lack-of-progress is suspiciously slow. But 🤷 I'll take what I can get here, because it's not great to have owners either ignoring the issue or reaching around the deployment controller to perform their own checks.

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 25, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 25, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 24, 2023
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
