[WIP] Add blog article about minReadySeconds for StatefulSets and maxSurge for DaemonSets #35440

ravisantoshgudimetla

SIG Apps is excited to promote two features to GA in this release:

  • minReadySeconds for StatefulSets
  • maxSurge for DaemonSet pods

k8s-ci-robot added the cncf-cla: yes and size/XS labels on Jul 27, 2022
k8s-ci-robot added the area/blog, language/en, and sig/docs labels on Jul 27, 2022
@netlify

netlify bot commented Jul 27, 2022

Pull request preview available for checking

Built without sensitive environment variables

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 794299f |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/62f6d72b2a53020008c652a4 |
| 😎 Deploy Preview | https://deploy-preview-35440--kubernetes-io-main-staging.netlify.app |

@sftim
sftim commented Jul 27, 2022

/retitle [WIP] Add blog article about minReadySeconds for StatefulSets and maxSurge for DaemonSets

k8s-ci-robot changed the title from "Sig-apps 1.25 GA features" to "[WIP] Add blog article about minReadySeconds for StatefulSets and maxSurge for DaemonSets" on Jul 27, 2022
k8s-ci-robot added the do-not-merge/work-in-progress label on Jul 27, 2022
@sftim
sftim commented Jul 29, 2022

Related to #35538

@atiratree
and related to #35539

@katcosgrove
Hi from the Comms team! Just a reminder that the Ready to Review deadline for feature blogs is Tuesday, August 16. You will also be assigned a publication date. Is there anything we can do to help you right now?

k8s-ci-robot added the size/M label and removed the size/XS label on Aug 12, 2022

### MinReadySeconds for StatefulSets
`minReadySeconds` ensures that the statefulset workload is `Ready` for the given number of seconds before calling the
pod `Available`. The notion of being `Ready` and `Available` is quiet important for workloads. For example, some workloads like Prometheus with multiple instances of Alertmanager should be considered `Available` only when the Alertmanager's state transfer is complete. `minReadySeconds` also helps when using loadbalancers with cloud providers. Since the pod should be `Ready` for the given number of seconds, it provides buffer time to prevent killing pods in rotation before new pods show up.

Suggested change
pod `Available`. The notion of being `Ready` and `Available` is quiet important for workloads. For example, some workloads like Prometheus with multiple instances of Alertmanager should be considered `Available` only when the Alertmanager's state transfer is complete. `minReadySeconds` also helps when using loadbalancers with cloud providers. Since the pod should be `Ready` for the given number of seconds, it provides buffer time to prevent killing pods in rotation before new pods show up.
pod `Available`. The notion of being `Ready` and `Available` is quite important for workloads. For example, some workloads like Prometheus with multiple instances of Alertmanager should be considered `Available` only when the Alertmanager's state transfer is complete. `minReadySeconds` also helps when using loadbalancers with cloud providers. Since the pod should be `Ready` for the given number of seconds, it provides buffer time to prevent killing pods in rotation before new pods show up.


I know the last sentence says it, but maybe we should mention specifically rollouts? We discussed something similar in the docs PR: #35539 (comment)
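
To make the `Ready` versus `Available` distinction in this thread concrete, here is a minimal Python sketch of the rule the draft describes: a pod counts as `Available` only once it has been `Ready` for at least `minReadySeconds`. The function name `is_available` and its parameters are hypothetical. It illustrates the semantics only and is not the StatefulSet controller's implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_available(ready_since: Optional[datetime], now: datetime, min_ready_seconds: int) -> bool:
    """Return True if a pod that became Ready at `ready_since` counts as Available at `now`."""
    if ready_since is None:
        # A pod that is not Ready can never be Available.
        return False
    # Available means Ready for at least min_ready_seconds.
    return now - ready_since >= timedelta(seconds=min_ready_seconds)


# Hypothetical timestamps, for illustration only.
now = datetime(2022, 8, 16, 12, 0, 0, tzinfo=timezone.utc)
print(is_available(now - timedelta(seconds=5), now, 10))   # False: Ready for 5s, 10s required
print(is_available(now - timedelta(seconds=30), now, 10))  # True: Ready for 30s, 10s required
```

During a rollout, this waiting period is what keeps the controller from moving on to the next pod before the new one has stayed `Ready` long enough.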


**Authors:** Ravi Gudimetla (Apple), Filip Krepensky (Red Hat), Maciej Szulik (Red Hat)

This blog describes the two features namely `minReadySeconds for StatefulSets` and `maxSurge for Daemonsets` that sig-apps is happy to graduate to stable in 1.25

Suggested change
This blog describes the two features namely `minReadySeconds for StatefulSets` and `maxSurge for Daemonsets` that sig-apps is happy to graduate to stable in 1.25
This blog describes the two features namely `minReadySeconds for StatefulSets` and `maxSurge for DaemonSets` that sig-apps is happy to graduate to stable in 1.25


You need to download and install a kubectl version greater than v1.22.0.

Specify a value for `minReadySeconds` for any StatefulSet and you check if pods are available or not by checking

Suggested change
Specify a value for `minReadySeconds` for any StatefulSet and you check if pods are available or not by checking
Specify a value for `minReadySeconds` for any StatefulSet and check if pods are available or not by inspecting
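
As a rough illustration of that step, the Python sketch below builds a StatefulSet manifest with `.spec.minReadySeconds` set and compares `.status.availableReplicas` against the desired replica count. The object name `web`, the image `registry.example/web:1.0`, and both helper functions are placeholders invented for this example. The draft does not say which field or command it intends readers to use, so reading `.status.availableReplicas` is an assumption here.

```python
import json


def statefulset_with_min_ready(name: str, image: str, replicas: int, min_ready_seconds: int) -> dict:
    """Build a minimal apps/v1 StatefulSet manifest with minReadySeconds set."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            # A pod must stay Ready this long before it counts as Available.
            "minReadySeconds": min_ready_seconds,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


def all_available(statefulset: dict) -> bool:
    """Compare status.availableReplicas with the desired replica count."""
    desired = statefulset["spec"].get("replicas", 1)
    available = statefulset.get("status", {}).get("availableReplicas", 0)
    return available >= desired


manifest = statefulset_with_min_ready("web", "registry.example/web:1.0", replicas=3, min_ready_seconds=10)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, the printed output could be piped to `kubectl apply -f -`. `all_available` would then be called on the object as read back from the cluster, since a freshly built manifest has no `status`.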



### MaxSurge for DaemonSets
`MaxSurge` allows a daemonset workload to run multiple instances of the same pod on a node during rollout to minimize the downtime of the daemonset to other consumers. Kubernetes system-level components like CNI, CSI are typically run as daemonsets. These components can have impact on the availablity of the workloads if those daemonsets go down momentarily during the upgrades. The feature allows daemonset pods to surge, there by ensuring zero-downtime for the daemonsets.

Suggested change
`MaxSurge` allows a daemonset workload to run multiple instances of the same pod on a node during rollout to minimize the downtime of the daemonset to other consumers. Kubernetes system-level components like CNI, CSI are typically run as daemonsets. These components can have impact on the availablity of the workloads if those daemonsets go down momentarily during the upgrades. The feature allows daemonset pods to surge, there by ensuring zero-downtime for the daemonsets.
`MaxSurge` allows a daemonset workload to run multiple instances of the same pod on a node during rollout to minimize the downtime of the daemonset to other consumers. Kubernetes system-level components like CNI, CSI are typically run as daemonsets. These components can have impact on the availability of the workloads if those daemonsets go down momentarily during the upgrades. The feature allows daemonset pods to surge, thereby ensuring zero-downtime for the daemonsets.

### MaxSurge for DaemonSets

Set the update strategy to `RollingUpdate` and specify a value for `.spec.updateStrategy.rollingUpdate.maxSurge`


and observe a faster rollout and a higher number of pods running at the same time in the next rollout

What about adding the observe part as well in some form?
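
One way to try the configuration discussed in this thread is to generate a DaemonSet manifest with surge enabled. The Python sketch below is only an illustration: the name `node-agent`, the image, and the helper `daemonset_with_surge` are placeholders, and pairing `maxSurge` with `maxUnavailable: 0` is one possible choice, not something the draft prescribes.

```python
import json
from typing import Union


def daemonset_with_surge(name: str, image: str, max_surge: Union[int, str]) -> dict:
    """Build a minimal apps/v1 DaemonSet manifest that surges during rolling updates."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "updateStrategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {
                    # An absolute number (1) or a percentage string ("10%").
                    "maxSurge": max_surge,
                    # Keep the old pod running until its replacement is up.
                    "maxUnavailable": 0,
                },
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


surging = daemonset_with_surge("node-agent", "registry.example/node-agent:2.0", max_surge=1)
print(json.dumps(surging["spec"]["updateStrategy"], indent=2))
```

After applying such a manifest and changing the pod template, watching the pods on a single node during the rollout should briefly show the old and the new pod side by side.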

slug: "sig-apps features graduating to stable in 1.25"
---

**Authors:** Ravi Gudimetla (Apple), Filip Krepensky (Red Hat), Maciej Szulik (Red Hat)

Suggested change
**Authors:** Ravi Gudimetla (Apple), Filip Krepensky (Red Hat), Maciej Szulik (Red Hat)
**Authors:** Ravi Gudimetla (Apple), Filip Krepinsky (Red Hat), Maciej Szulik (Red Hat)

@katcosgrove
Hi from Comms! Your assigned publication date is September 16. Thank you!

@sftim
sftim commented Aug 22, 2022

@ravisantoshgudimetla it'd be great to get this PR ready for review. Would you like help with that?

soltysh left a comment

I'd start with a simple use case describing the need for both of these features, and also add that this effort aligns the higher-level workload controllers with each other, since Deployments already support both of these features. Only then go into the details of how this works and how it is implemented.

## What problems do these features solve?

### MinReadySeconds for StatefulSets
`minReadySeconds` ensures that the statefulset workload is `Ready` for the given number of seconds before calling the

Always write resource names in upper camel case:

Suggested change
`minReadySeconds` ensures that the statefulset workload is `Ready` for the given number of seconds before calling the
`minReadySeconds` ensures that the StatefulSet workload is `Ready` for the given number of seconds before calling the



### MaxSurge for DaemonSets
`MaxSurge` allows a daemonset workload to run multiple instances of the same pod on a node during rollout to minimize the downtime of the daemonset to other consumers. Kubernetes system-level components like CNI, CSI are typically run as daemonsets. These components can have impact on the availablity of the workloads if those daemonsets go down momentarily during the upgrades. The feature allows daemonset pods to surge, there by ensuring zero-downtime for the daemonsets.

Suggested change
`MaxSurge` allows a daemonset workload to run multiple instances of the same pod on a node during rollout to minimize the downtime of the daemonset to other consumers. Kubernetes system-level components like CNI, CSI are typically run as daemonsets. These components can have impact on the availablity of the workloads if those daemonsets go down momentarily during the upgrades. The feature allows daemonset pods to surge, there by ensuring zero-downtime for the daemonsets.
`MaxSurge` allows a DaemonSet workload to run multiple instances of the same pod on a node during rollout to minimize the downtime of the DaemonSet to other consumers. Kubernetes system-level components like CNI, CSI are typically run as DaemonSets. These components can have impact on the availability of the workloads if those DaemonSets go down momentarily during the upgrades. The feature allows DaemonSet pods to surge, thereby ensuring zero-downtime for the DaemonSets.

### MaxSurge for DaemonSets
`MaxSurge` allows a daemonset workload to run multiple instances of the same pod on a node during rollout to minimize the downtime of the daemonset to other consumers. Kubernetes system-level components like CNI, CSI are typically run as daemonsets. These components can have impact on the availablity of the workloads if those daemonsets go down momentarily during the upgrades. The feature allows daemonset pods to surge, there by ensuring zero-downtime for the daemonsets.

Please note that the usage of `HostPort` in conjunction with `MaxSurge` in daemonsets is not allowed as daemonset pods are tied to a single node and two active pods cannot share the same port on the same node.

Suggested change
Please note that the usage of `HostPort` in conjunction with `MaxSurge` in daemonsets is not allowed as daemonset pods are tied to a single node and two active pods cannot share the same port on the same node.
Please note that the usage of `HostPort` in conjunction with `MaxSurge` in DaemonSets is not allowed as DaemonSet pods are tied to a single node and two active pods cannot share the same port on the same node.


And through the rest of the docs as well...
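
For the `HostPort` restriction quoted above, a small pre-flight check can flag the conflict before a manifest is submitted. The Python sketch below is a hypothetical helper (`host_port_conflict` and the sample manifest are invented for illustration). It only inspects regular containers and is not the API server's validation logic.

```python
def uses_host_port(pod_spec: dict) -> bool:
    """Return True if any container port in the pod spec sets hostPort."""
    return any(
        bool(port.get("hostPort"))
        for container in pod_spec.get("containers", [])
        for port in container.get("ports", [])
    )


def surge_enabled(daemonset: dict) -> bool:
    """Return True if the DaemonSet's rolling update sets a non-zero maxSurge."""
    rolling = daemonset["spec"].get("updateStrategy", {}).get("rollingUpdate", {})
    return rolling.get("maxSurge", 0) not in (0, "0%")


def host_port_conflict(daemonset: dict) -> bool:
    """Two surging pods on one node cannot both bind the same host port."""
    return surge_enabled(daemonset) and uses_host_port(daemonset["spec"]["template"]["spec"])


# Hypothetical manifest: surge enabled and a container bound to host port 9100.
agent = {
    "spec": {
        "updateStrategy": {"type": "RollingUpdate", "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0}},
        "template": {
            "spec": {
                "containers": [
                    {"name": "agent", "ports": [{"containerPort": 9100, "hostPort": 9100}]}
                ]
            }
        },
    }
}
print(host_port_conflict(agent))  # True
```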


### MaxSurge for DaemonSets

The `DaemonSet` controller creates the additional pods based on the value given in `.spec.updateStrategy.rollingUpdate.maxSurge`. The additional pods would run on the same node where the old daemonset pod is running till the old pod gets killed. This value cannot be `0` when `MaxUnavailable` is 0. The default value is 0.

Suggested change
The `DaemonSet` controller creates the additional pods based on the value given in `.spec.updateStrategy.rollingUpdate.maxSurge`. The additional pods would run on the same node where the old daemonset pod is running till the old pod gets killed. This value cannot be `0` when `MaxUnavailable` is 0. The default value is 0.
The DaemonSet controller creates the additional pods (above the desired number resulting from DaemonSet spec) based on the value given in `.spec.updateStrategy.rollingUpdate.maxSurge`. The additional pods would run on the same node where the old daemonset pod is running till the old pod gets killed. This value cannot be `0` when `MaxUnavailable` is 0. The default value is 0.

or something similar where we'd explicitly call out the fact that .maxSurge is above the usual replicas.
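
The constraint in the quoted paragraph (`maxSurge` cannot be 0 when `maxUnavailable` is 0) can be sketched as a small check. This Python example handles absolute integer values only. The function name `rolling_update_errors` is hypothetical, and the default of 1 for `maxUnavailable` comes from the DaemonSet API defaults and is not stated in the draft.

```python
from typing import List


def rolling_update_errors(max_surge: int = 0, max_unavailable: int = 1) -> List[str]:
    """Return the problems with a DaemonSet rolling update setting (integers only)."""
    errors = []
    if max_surge < 0:
        errors.append("maxSurge must not be negative")
    if max_unavailable < 0:
        errors.append("maxUnavailable must not be negative")
    if max_surge == 0 and max_unavailable == 0:
        # With neither surge nor unavailability allowed, no pod could ever be replaced.
        errors.append("maxSurge cannot be 0 when maxUnavailable is 0")
    return errors


print(rolling_update_errors())                                  # [] (defaults: no surge, one pod unavailable)
print(rolling_update_errors(max_surge=1, max_unavailable=0))    # [] (surge first, nothing goes unavailable)
print(rolling_update_errors(max_surge=0, max_unavailable=0))    # ['maxSurge cannot be 0 when maxUnavailable is 0']
```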


### MinReadySeconds for StatefulSets

You need to download and install a kubectl version greater than v1.22.0.

Irrelevant, I'd drop it.

@atiratree
After a discussion with @ravisantoshgudimetla, I have started a new PR, #36763, that tries to address the changes suggested here, so that hopefully we can publish it soon.

@soltysh
soltysh commented Sep 15, 2022

/close

@k8s-ci-robot
@soltysh: Closed this PR.

In response to this:

/close
