Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1867608: ds/machine-config-daemon: Set maxUnavailable 10% #1992

Merged
merged 1 commit into from Oct 21, 2020

Conversation

sdodson
Copy link
Member

@sdodson sdodson commented Aug 8, 2020

This daemonset is not critical to availability like SDN or DNS pods therefore
allow maxUnavailable to scale with cluster size. This roughly increases rollout
speed by 10x on a 250 node cluster.

How to verify this, provision a cluster with 20 or more hosts. Confirm that updates
to the machine-config-daemonset allows for up to 10% to update at once but otherwise
operates normally.

@sdodson
Copy link
Member Author

sdodson commented Aug 8, 2020

This is similar to others I'm updating as well, but for some reason this one feels riskier than those just given the criticality of MCO but I don't think it really is. Feedback welcome though.

openshift/cluster-monitoring-operator#902
openshift/cluster-node-tuning-operator#149

@sdodson
Copy link
Member Author

sdodson commented Aug 8, 2020

/retest

@sdodson
Copy link
Member Author

sdodson commented Aug 10, 2020

/retitle Bug 1867608: ds/machine-config-daemon: Set maxUnavailable 10%

@sdodson
Copy link
Member Author

sdodson commented Aug 10, 2020

/retest

@openshift-ci-robot openshift-ci-robot changed the title ds/machine-config-daemon: Set maxUnavailable 10% Bug 1867608: ds/machine-config-daemon: Set maxUnavailable 10% Aug 10, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. label Aug 10, 2020
@openshift-ci-robot
Copy link
Contributor

@sdodson: This pull request references Bugzilla bug 1867608, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1867608: ds/machine-config-daemon: Set maxUnavailable 10%

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Aug 10, 2020
@kikisdeliveryservice
Copy link
Contributor

/test e2e-aws

@runcom
Copy link
Member

runcom commented Sep 1, 2020

@sdodson given how critical the MCO is, what do you think if we defer this to the next release to give it more and more time to soak? the issue is known (around timing) but it'll make us more comfortable and find us ready to debug if anything goes south. Would that work?

@cgwalters
Copy link
Member

/approve
Agree with the rationale here - in fact most of the time the MCD is effectively idle I believe. We watch for ssh events from journald but I'm not thinking offhand of anything else we react to other than when the MCC changes the desiredConfig annotation.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 1, 2020
@sdodson
Copy link
Member Author

sdodson commented Sep 1, 2020

Yeah, I'm fine with deferring this as long as we're open to backporting to 4.6.z before it transitions to maintenance phase. This is just an optimization.

@sdodson
Copy link
Member Author

sdodson commented Sep 1, 2020

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. and removed bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Sep 1, 2020
@openshift-ci-robot
Copy link
Contributor

@sdodson: This pull request references Bugzilla bug 1867608, which is invalid:

  • expected the bug to target the "4.6.0" release, but it targets "4.7.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Sep 1, 2020
@openshift-ci-robot
Copy link
Contributor

@sdodson: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-metal-ipi ef0f01a5630edbbadfbec17690be49f94237d6bc link /test e2e-metal-ipi
ci/prow/okd-e2e-aws ef0f01a5630edbbadfbec17690be49f94237d6bc link /test okd-e2e-aws
ci/prow/e2e-ovn-step-registry ef0f01a5630edbbadfbec17690be49f94237d6bc link /test e2e-ovn-step-registry
ci/prow/e2e-aws ef0f01a5630edbbadfbec17690be49f94237d6bc link /test e2e-aws
ci/prow/e2e-upgrade ef0f01a5630edbbadfbec17690be49f94237d6bc link /test e2e-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sdodson
Copy link
Member Author

sdodson commented Oct 20, 2020

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Oct 20, 2020
@openshift-ci-robot
Copy link
Contributor

@sdodson: This pull request references Bugzilla bug 1867608, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

This daemonset is not critical to availability like SDN or DNS pods therefore
allow maxUnavailable to scale with cluster size. This roughly increases rollout
speed by 10x on a 250 node cluster.
Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 20, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

12 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit a135604 into openshift:master Oct 21, 2020
@openshift-ci-robot
Copy link
Contributor

@sdodson: All pull requests linked via external trackers have merged:

Bugzilla bug 1867608 has been moved to the MODIFIED state.

In response to this:

Bug 1867608: ds/machine-config-daemon: Set maxUnavailable 10%

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sdodson
Copy link
Member Author

sdodson commented Nov 19, 2020

This has sat in master for a month now and as far as I know there's been no ill effect.
/cherry-pick release-4.6

@openshift-cherrypick-robot

@sdodson: new pull request created: #2240

In response to this:

This has sat in master for a month now and as far as I know there's been no ill effect.
/cherry-pick release-4.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants