Bug 1986453: Check for API server and node versions skew #2658

Merged
merged 1 commit into from Jul 27, 2021

QiWang19
Member

@QiWang19 QiWang19 commented Jul 2, 2021

ref: https://issues.redhat.com/browse/OCPNODE-595
replaces #2552

- What I did

Check for version skew between the API server and the nodes.
When a node's kubelet version is skewed too far from the Kube API server version, update the status with a message saying so, but do not force
Upgradeable=False, per the enhancement https://github.com/openshift/enhancements/pull/762/files
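
The behavior described above might be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `condition` type, the `upgradeableCondition` function, and the `AsExpected` placeholder reason are all hypothetical. Only the `KubeletSkewUnsupported` reason and the message wording are modeled on the log output quoted later in this thread.

```go
package main

import "fmt"

// condition is a simplified, hypothetical stand-in for an operator status
// condition. It is not the real OpenShift API type.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// upgradeableCondition reports unsupported kubelet skew through Reason and
// Message only. Status stays "True" in both branches, so the skew check by
// itself never forces Upgradeable=False.
func upgradeableCondition(skewUnsupported bool, minKubelet string) condition {
	c := condition{Type: "Upgradeable", Status: "True", Reason: "AsExpected"}
	if skewUnsupported {
		c.Reason = "KubeletSkewUnsupported"
		c.Message = fmt.Sprintf(
			"One or more nodes have an unsupported kubelet version skew. "+
				"Upgrade all nodes so that they have a kubelet version of at least %s.",
			minKubelet)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", upgradeableCondition(true, "1.19.1"))
}
```

The point of the sketch is that the warning travels in the reason and message fields while the condition status itself is left alone.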

- How to verify it

- Description for the changelog

@QiWang19
Member Author

QiWang19 commented Jul 2, 2021

/assign @sinnykumari

Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

some initial questions/comments

(6 review threads on pkg/operator/status.go, outdated and resolved)
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

a few more comments

/hold

(4 review threads on pkg/operator/status.go, outdated and resolved)
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 2, 2021
@kikisdeliveryservice
Contributor

@QiWang19 I know that this PR is based on someone else's previous PR, but I think we can refine it. What most of my comments are asking for is: using that work as a baseline, how can we make this into an understandable & actionable status for a user. Also keep in mind a pool can be large, so the info needs to be easy to consume if we have, say, 50 nodes that have unsupported skew.

Users can get into this situation because a pool is paused, so it did not upgrade to match the apiserver and is still at an older version. They will likely have to let those pools upgrade (at a minimum, to a supported skew) before they can initiate another cluster-wide upgrade (which would move the kube-apiserver even further away). So we need to think about telling them the state in a meaningful way, and also give them some hint about what they need to do to remedy it.

Happy to discuss further if you have any questions. =)
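
One way to keep the status easy to consume for a large pool is to report a count and a few example nodes rather than every node. The sketch below is only an illustration of that idea: `summarizeSkew`, its parameters, and the message wording are hypothetical and are not taken from this PR. It compares minor versions only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// summarizeSkew returns a one-line summary of the nodes whose kubelet minor
// version trails the API server minor version by more than maxSkew.
// At most maxNames node names are listed (maxNames is assumed to be >= 1),
// so the message stays short even when many nodes are affected.
// It returns "" when no node has unsupported skew.
func summarizeSkew(apiMinor int, kubeletMinors map[string]int, maxSkew, maxNames int) string {
	var skewed []string
	for node, minor := range kubeletMinors {
		if apiMinor-minor > maxSkew {
			skewed = append(skewed, node)
		}
	}
	if len(skewed) == 0 {
		return ""
	}
	sort.Strings(skewed) // map iteration order is random; sort for stable output

	shown := skewed
	if len(shown) > maxNames {
		shown = shown[:maxNames]
	}
	msg := fmt.Sprintf("%d of %d nodes have an unsupported kubelet version skew (for example: %s)",
		len(skewed), len(kubeletMinors), strings.Join(shown, ", "))
	if rest := len(skewed) - len(shown); rest > 0 {
		msg += fmt.Sprintf(" and %d more", rest)
	}
	return msg
}

func main() {
	nodes := map[string]int{"worker-0": 18, "worker-1": 18, "worker-2": 21, "master-0": 21}
	fmt.Println(summarizeSkew(21, nodes, 2, 1))
}
```

With 50 skewed nodes the message would still be a single line, and the user could be pointed at `oc get nodes` for the full list.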

@kikisdeliveryservice kikisdeliveryservice changed the title Check for API server and node versions skew [OCPNODE-595] Check for API server and node versions skew Jul 3, 2021
@QiWang19 QiWang19 force-pushed the no-skew-check branch 4 times, most recently from 4cf72ac to 3664ebe Compare July 6, 2021 20:25
@QiWang19
Member Author

QiWang19 commented Jul 6, 2021

@kikisdeliveryservice Thanks for the explanation. I have cleaned up some reviews. PTAL.

@QiWang19
Member Author

QiWang19 commented Jul 7, 2021

/retest

@QiWang19
Member Author

QiWang19 commented Jul 8, 2021

/retest

@QiWang19
Member Author

QiWang19 commented Jul 9, 2021

/retest

@QiWang19
Member Author

@kikisdeliveryservice Thanks for the explanation. I have cleaned up some reviews. Could you PTAL?

Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

a few more comments

(2 review threads on pkg/operator/status.go, outdated and resolved)
@kikisdeliveryservice
Contributor

Just noting that when this is done, we need to get the commits updated so Qi is a co-author.

@QiWang19
Member Author

A few comments.

Can we also see a must gather since there've been so many changes to this PR?

Thanks!

must-gather.tar.gz
must-gather attached. apiserver version is 1.21.1, kubelet version is 1.18.0.

// namespaces/openshift-machine-config-operator/pods/machine-config-operator-bbb5555d6-z5rvz/machine-config-operator/machine-config-operator/logs/current.log
2021-07-27T03:55:19.559426141Z I0727 03:55:19.559244       1 start.go:43] Version: 4.9.0-0.ci.test-2021-07-27-033800-ci-ln-04clxvb-latest (Raw: machine-config-daemon-4.6.0-202006240615.p0-975-g9f036fe9-dirty, Hash: 9f036fe9b49b6de2eacc991c37678587e14c0386)
2021-07-27T03:55:19.562886709Z I0727 03:55:19.562779       1 leaderelection.go:243] attempting to acquire leader lease openshift-machine-config-operator/machine-config...
2021-07-27T03:57:15.259786604Z I0727 03:57:15.259685       1 leaderelection.go:253] successfully acquired lease openshift-machine-config-operator/machine-config
2021-07-27T03:57:16.311510738Z I0727 03:57:16.311439       1 operator.go:262] Starting MachineConfigOperator
2021-07-27T03:57:22.603593629Z I0727 03:57:22.603512       1 status.go:325] kubelet skew status: KubeletSkewUnsupported, status reason: KubeletSkewUnsupported
2021-07-27T03:57:22.603811222Z I0727 03:57:22.603689       1 event.go:282] Event(v1.ObjectReference{Kind:"", Namespace:"", Name:"machine-config", UID:"4753add8-791d-4b39-8ab8-3ecfbc169be2", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'KubeletSkewUnsupported' One or more nodes have an unsupported kubelet version skew. Please see `oc get nodes` for details and upgrade all nodes so that they have a kubelet version of at least 1.19.1.
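
The numbers in this log (API server 1.21.1, kubelet 1.18.0, required kubelet of at least 1.19.1) are consistent with a minimum kubelet version that sits two minor versions behind the API server and keeps its patch number. The following sketch reproduces that arithmetic under that assumption. The `version` type, `parseVersion`, `minKubelet`, and the `maxSkew` value of 2 are hypothetical names and choices for illustration, not the code in pkg/operator/status.go, and build suffixes such as `+abc123` are not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a minimal major.minor.patch triple (hypothetical helper type).
type version struct{ major, minor, patch int }

// parseVersion parses strings such as "1.21.1" or "v1.21.1".
func parseVersion(s string) (version, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("expected major.minor.patch, got %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("invalid version %q: %w", s, err)
		}
		n[i] = v
	}
	return version{n[0], n[1], n[2]}, nil
}

// less reports whether v is older than o.
func (v version) less(o version) bool {
	if v.major != o.major {
		return v.major < o.major
	}
	if v.minor != o.minor {
		return v.minor < o.minor
	}
	return v.patch < o.patch
}

// minKubelet returns the oldest kubelet version treated as supported:
// maxSkew minor versions behind the API server, same patch number.
func minKubelet(api version, maxSkew int) version {
	return version{api.major, api.minor - maxSkew, api.patch}
}

func main() {
	api, err := parseVersion("1.21.1")
	if err != nil {
		panic(err)
	}
	kubelet, err := parseVersion("1.18.0")
	if err != nil {
		panic(err)
	}
	minVer := minKubelet(api, 2)
	fmt.Printf("minimum kubelet %d.%d.%d, unsupported skew: %t\n",
		minVer.major, minVer.minor, minVer.patch, kubelet.less(minVer))
}
```

Run with the values from the must-gather, this prints `minimum kubelet 1.19.1, unsupported skew: true`, which matches the event message above.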

@rphillips
Contributor

/hold cancel
/lgtm

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 27, 2021
@QiWang19 QiWang19 changed the title [OCPNODE-595] Check for API server and node versions skew Bug 1986453: Check for API server and node versions skew Jul 27, 2021
@openshift-ci openshift-ci bot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Jul 27, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 27, 2021

@QiWang19: This pull request references Bugzilla bug 1986453, which is invalid:

  • expected the bug to target the "4.9.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1986453: Check for API server and node versions skew

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jul 27, 2021
@QiWang19
Member Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jul 27, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 27, 2021

@QiWang19: This pull request references Bugzilla bug 1986453, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.9.0) matches configured target release for branch (4.9.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (schoudha@redhat.com), skipping review request.

In response to this:

/bugzilla refresh

@kikisdeliveryservice
Contributor

/test e2e-agnostic-upgrade

@rphillips
Contributor

I tested this PR locally. It reports the skew and goes back to no skew as expected.

Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

one last thing otherwise looks good

(1 review thread on pkg/operator/status.go, outdated and resolved)
Co-authored-by: Qi Wang <qiwan@redhat.com>
Signed-off-by: Qi Wang <qiwan@redhat.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 27, 2021
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Thanks for all of the work on this. I think we've gotten it to a good state.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 27, 2021
@rphillips
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 27, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 27, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kikisdeliveryservice, QiWang19, rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Contributor

openshift-ci bot commented Jul 27, 2021

@QiWang19: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-disruptive | 61c86bf43e472ef41b55fc826940026bebabb547 | link | /test e2e-aws-disruptive |
| ci/prow/okd-e2e-aws | 1471d2c | link | /test okd-e2e-aws |
| ci/prow/e2e-aws-upgrade-single-node | 1471d2c | link | /test e2e-aws-upgrade-single-node |
| ci/prow/e2e-aws-workers-rhel7 | 1471d2c | link | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-ovn-step-registry | 1471d2c | link | /test e2e-ovn-step-registry |
| ci/prow/e2e-aws-techpreview-featuregate | 1471d2c | link | /test e2e-aws-techpreview-featuregate |
| ci/prow/e2e-aws-serial | 1471d2c | link | /test e2e-aws-serial |
| ci/prow/e2e-vsphere-upgrade | 1471d2c | link | /test e2e-vsphere-upgrade |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 9c07edd into openshift:master Jul 27, 2021
@openshift-ci
Contributor

openshift-ci bot commented Jul 27, 2021

@QiWang19: All pull requests linked via external trackers have merged:

Bugzilla bug 1986453 has been moved to the MODIFIED state.

In response to this:

Bug 1986453: Check for API server and node versions skew
