
Field minReadySeconds forces some replicas to wait more than predefined threshold #101319

Closed
rafaellima opened this issue Apr 21, 2021 · 7 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/apps: Categorizes an issue or PR as relevant to SIG Apps.

Comments

@rafaellima

What happened:

According to the official Kubernetes Deployment docs, the field .spec.minReadySeconds is defined as follows:

specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing, for it to be considered available

When I create a new Deployment with minReadySeconds set (say, 60 seconds), more than one replica, and a readiness probe configured, I expect each Pod to become available 60 seconds after its readiness probe first succeeds. Instead, what I observe is that once the first Pod is marked available, every remaining Pod waits an additional minReadySeconds before being considered available.

I started looking at the source code to understand this behavior and found a piece of code in the ReplicaSet controller that could be causing the issue. From what I can tell, it checks the availability of all Pods and, if any Pod is not yet available, re-enqueues the ReplicaSet via rsc.queue.AddAfter with minReadySeconds as the delay. This implies that if only one Pod is available, all the others are only re-checked a full minReadySeconds later.
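To make the timing consequence concrete, here is a minimal simulation (not the actual controller code; function names are illustrative) of a sync loop that re-enqueues the whole ReplicaSet after a flat minReadySeconds whenever any Pod is still unavailable:

```go
package main

import "fmt"

// allAvailableAt returns the time (in seconds) at which a simulated sync
// loop reports every pod available. Each pod passes its readiness probe at
// readyTimes[i] and is available once it has been Ready for minReadySeconds.
// The loop syncs at t=0 and, while any pod is unavailable, re-enqueues
// itself a full minReadySeconds later -- mirroring the coarse
// rsc.queue.AddAfter behavior described above.
func allAvailableAt(readyTimes []int, minReadySeconds int) int {
	t := 0
	for {
		allAvail := true
		for _, r := range readyTimes {
			if t < r+minReadySeconds {
				allAvail = false
			}
		}
		if allAvail {
			return t
		}
		t += minReadySeconds // re-enqueue after the full interval
	}
}

func main() {
	// Four pods pass their readiness probes at t=0s, 2s, 4s, 6s.
	// Ideally all would be available by t=66s; the coarse re-enqueue
	// only re-checks at t=120s.
	fmt.Println(allAvailableAt([]int{0, 2, 4, 6}, 60)) // 120
}
```

With readiness staggered by a few seconds, the last three Pods are ready for availability at t=62s, 64s, and 66s, yet the next sync only happens at t=120s.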

What you expected to happen:

Each replica of a Deployment should become available minReadySeconds after its own transition to Ready, independently of the other replicas.

How to reproduce it (as minimally and precisely as possible):

Use this deployment manifest:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
spec:
  replicas: 4
  minReadySeconds: 60
  selector:
    matchLabels:
      app: echoserver
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      containers:
        - name: echoserver
          image: gcr.io/google_containers/echoserver:1.10
          ports:
            - containerPort: 8080
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: 8080
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
            initialDelaySeconds: 0

Run the following commands:

$ kubectl apply -f deployment.yaml
deployment.apps/echoserver created
$ kubectl rollout status deployment echoserver

This produces the following output:

Waiting for deployment "echoserver" rollout to finish: 0 of 4 updated replicas are available...
Waiting for deployment "echoserver" rollout to finish: 0 of 4 updated replicas are available...
Waiting for deployment "echoserver" rollout to finish: 0 of 4 updated replicas are available...
Waiting for deployment "echoserver" rollout to finish: 0 of 4 updated replicas are available...
Waiting for deployment "echoserver" rollout to finish: 0 of 4 updated replicas are available...
Waiting for deployment "echoserver" rollout to finish: 1 of 4 updated replicas are available...
deployment "echoserver" successfully rolled out

The gap between the last two lines is exactly the value defined in minReadySeconds, even though the remaining Pods were eligible to become available well before that.
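For illustration only (this is not a proposed patch, and nextResyncDelay is a made-up name): the coarse wait could be avoided by re-enqueueing after the shortest remaining per-pod wait rather than a flat minReadySeconds, so each Pod is re-checked as soon as it qualifies:

```go
package main

import "fmt"

// nextResyncDelay returns the shortest remaining wait (in seconds) until
// some currently-unavailable pod satisfies minReadySeconds, given each
// pod's Ready time and the current time. Re-enqueueing with this delay
// would re-check each pod as soon as it becomes eligible.
// Returns -1 if every pod is already available.
func nextResyncDelay(readyTimes []int, minReadySeconds, now int) int {
	best := -1
	for _, r := range readyTimes {
		if remaining := r + minReadySeconds - now; remaining > 0 {
			if best == -1 || remaining < best {
				best = remaining
			}
		}
	}
	return best
}

func main() {
	// Pods became Ready at t=0s, 2s, 4s, 6s; it is now t=60s.
	// The first pod just became available; the next one qualifies in 2s.
	fmt.Println(nextResyncDelay([]int{0, 2, 4, 6}, 60, 60)) // 2
}
```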

Anything else we need to know?:

The issue is reproducible with kind on my local machine, and it also occurs in our production environment, which runs on Linux at a cloud provider.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:09:28Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-22T22:54:21Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: kind cluster
  • OS (e.g: cat /etc/os-release): OSX - Big Sur 11.2.3
  • Kernel (e.g. uname -a):
Darwin Rafaels-MBP 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64 x86_64
  • Install tools: kubectl and kind
@rafaellima rafaellima added the kind/bug Categorizes issue or PR as related to a bug. label Apr 21, 2021
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 21, 2021
@k8s-ci-robot
Contributor

@rafaellima: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 21, 2021
@rafaellima rafaellima changed the title Field minReadySeconds forces PODS to wait more than predefined threshold Field minReadySeconds forces some replicas to wait more than predefined threshold Apr 21, 2021
@neolit123
Member

/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 21, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now, please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 19, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sclevine

Another data point: we are running into this issue in our production environment on EKS (v1.24.17-eks-43840fb), also with new deployments that have minReadySeconds set.
