
Field minReadySeconds forces some replicas to wait more than predefined threshold #101319

Closed
rafaellima opened this issue Apr 21, 2021 · 7 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
  • sig/apps: Categorizes an issue or PR as relevant to SIG Apps.

Comments

@rafaellima

What happened:

According to the official Kubernetes Deployment docs, the field .spec.minReadySeconds is defined as follows:

specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing, for it to be considered available

When I create a new Deployment with minReadySeconds set (say, 60 seconds), more than one replica, and a readiness probe configured, I expect each Pod to become available 60 seconds after its readiness probe first succeeds. Instead, what I observe is that once the first Pod is marked available, every remaining Pod waits an additional minReadySeconds before being considered available.

I started looking at the source code to understand this behavior and found a piece of code in the ReplicaSet controller that could be causing the issue. From what I can tell, it checks the availability of all Pods and, if any Pod is not yet available, re-enqueues the ReplicaSet via rsc.queue.AddAfter with minReadySeconds as the delay. This implies that if only one Pod is available, all the others are only re-checked a full minReadySeconds later.
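To make the timing consequence concrete, here is a minimal simulation (not the actual controller code; function names are illustrative) of a sync loop that re-enqueues the whole ReplicaSet after a flat minReadySeconds whenever any Pod is still unavailable:

```go
package main

import "fmt"

// allAvailableAt returns the time (in seconds) at which a simulated sync
// loop reports every pod available. Each pod passes its readiness probe at
// readyTimes[i] and is available once it has been Ready for minReadySeconds.
// The loop syncs at t=0 and, while any pod is unavailable, re-enqueues
// itself a full minReadySeconds later -- mirroring the coarse
// rsc.queue.AddAfter behavior described above.
func allAvailableAt(readyTimes []int, minReadySeconds int) int {
	t := 0
	for {
		allAvail := true
		for _, r := range readyTimes {
			if t < r+minReadySeconds {
				allAvail = false
			}
		}
		if allAvail {
			return t
		}
		t += minReadySeconds // re-enqueue after the full interval
	}
}

func main() {
	// Four pods pass their readiness probes at t=0s, 2s, 4s, 6s.
	// Ideally all would be available by t=66s; the coarse re-enqueue
	// only re-checks at t=120s.
	fmt.Println(allAvailableAt([]int{0, 2, 4, 6}, 60)) // 120
}
```

With readiness staggered by a few seconds, the last three Pods are ready for availability at t=62s, 64s, and 66s, yet the next sync only happens at t=120s.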

What you expected to happen:

Each replica of a Deployment should become available minReadySeconds after its own transition to Ready, independently of the other replicas.

How to reproduce it (as minimally and precisely as possible):

Use this deployment manifest:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
spec:
  replicas: 4
  minReadySeconds: 60
  selector:
    matchLabels:
      app: echoserver
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      containers:
        - name: echoserver
          image: gcr.io/google_containers/echoserver:1.10
          ports:
            - containerPort: 8080
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: 8080
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
            initialDelaySeconds: 0

Run the following commands:

$ kubectl apply -f deployment.yaml
deployment.apps/echoserver created
$ kubectl rollout status deployment echoserver

This produces the following output:

Waiting for deployment "echoserver" rollout to finish: 0 of 4 updated replicas are available...
Waiting for deployment "echoserver" rollout to finish: 0 of 4 updated replicas are available...
Waiting for deployment "echoserver" rollout to finish: 0 of 4 updated replicas are available...
Waiting for deployment "echoserver" rollout to finish: 0 of 4 updated replicas are available...
Waiting for deployment "echoserver" rollout to finish: 0 of 4 updated replicas are available...
Waiting for deployment "echoserver" rollout to finish: 1 of 4 updated replicas are available...
deployment "echoserver" successfully rolled out

The gap between the last two lines is exactly the value defined in minReadySeconds, even though the remaining Pods were eligible to become available well before that.
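For illustration only (this is not a proposed patch, and nextResyncDelay is a made-up name): the coarse wait could be avoided by re-enqueueing after the shortest remaining per-pod wait rather than a flat minReadySeconds, so each Pod is re-checked as soon as it qualifies:

```go
package main

import "fmt"

// nextResyncDelay returns the shortest remaining wait (in seconds) until
// some currently-unavailable pod satisfies minReadySeconds, given each
// pod's Ready time and the current time. Re-enqueueing with this delay
// would re-check each pod as soon as it becomes eligible.
// Returns -1 if every pod is already available.
func nextResyncDelay(readyTimes []int, minReadySeconds, now int) int {
	best := -1
	for _, r := range readyTimes {
		if remaining := r + minReadySeconds - now; remaining > 0 {
			if best == -1 || remaining < best {
				best = remaining
			}
		}
	}
	return best
}

func main() {
	// Pods became Ready at t=0s, 2s, 4s, 6s; it is now t=60s.
	// The first pod just became available; the next one qualifies in 2s.
	fmt.Println(nextResyncDelay([]int{0, 2, 4, 6}, 60, 60)) // 2
}
```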

Anything else we need to know?:

The issue is reproducible with kind on my local machine, and it also occurs in our production environment, which runs on Linux at a cloud provider.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.9", GitCommit:"9dd794e454ac32d97cde41ae10be801ae98f75df", GitTreeState:"clean", BuildDate:"2021-03-18T01:09:28Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-22T22:54:21Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: kind cluster
  • OS (e.g: cat /etc/os-release): OSX - Big Sur 11.2.3
  • Kernel (e.g. uname -a):
Darwin Rafaels-MBP 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64 x86_64
  • Install tools: kubectl and kind
@rafaellima rafaellima added the kind/bug Categorizes issue or PR as related to a bug. label Apr 21, 2021
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 21, 2021
@k8s-ci-robot
Contributor

@rafaellima: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 21, 2021
@rafaellima rafaellima changed the title Field minReadySeconds forces PODS to wait more than predefined threshold Field minReadySeconds forces some replicas to wait more than predefined threshold Apr 21, 2021
@neolit123
Member

/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 21, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now, please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 19, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sclevine

Another data point: we are running into this issue in our production environment on EKS (v1.24.17-eks-43840fb), also with new deployments that have minReadySeconds set.
