Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1952174: status: Report old versions while rolling out new #274

Conversation

Miciah
Copy link
Contributor

@Miciah Miciah commented May 18, 2021

computeOperatorStatusVersions: Delete receiver

  • pkg/operator/controller/status/controller.go (Reconcile): Delete receiver in call to computeOperatorStatusVersions.
    (computeOperatorStatusVersions): Delete unused receiver.
  • pkg/operator/controller/status/controller_test.go (TestComputeOperatorStatusVersions): Adjust call to computeOperatorStatusVersions.

status: Report old versions while rolling out new

For each operand, keep reporting the operand's old version in the clusteroperator status.versions value until the new version of the operand is fully rolled out, so that the clusteroperator doesn't report the new version before the operand has finished upgrading. Likewise, keep reporting the operator's old version until all operands are updated to their respective new versions.

  • pkg/operator/controller/status/controller.go (Reconcile): Use the new operatorState fields and the new computeOldVersions, computeNewVersions, and computeCurrentVersions functions to compute versions maps. Pass these versions maps to computeOperatorProgressingCondition. Pass the current versions map to computeOperatorStatusVersions.
    (operatorState): Add dnsDaemonSet, nodeResolverDaemonSet, dnsPodList, and nodeResolverPodList fields.
    (getOperatorState): Initialize the new fields in operatorState.
    (computeOperatorStatusVersions): Rewrite to take the new versions map and return a slice of OperandVersion.
    (computeOperatorProgressingCondition): Replace the OperandVersion slice and string parameter values with parameters for the old, new, and current versions maps. Use the versions maps to compute the Progressing status condition.
    (computeOldVersions): New function. Convert the OperandVersion slice value from the clusteroperator's status.versions field into a versions map.
    (computeNewVersions): New function. Convert the versions from the reconciler into a versions map.
    (computeCurrentVersions): New function. Use the old and new versions maps and dns and node-resolver daemonsets and pods to compute the current versions map.
  • pkg/operator/controller/status/controller_test.go (TestComputeOperatorProgressingCondition): Adjust for changes to computeOperatorProgressingCondition.
    (TestComputeOperatorStatusVersions): Delete test.
    (TestComputeCurrentVersions): New test. Verify that computeCurrentVersions behaves as expected.
  • go.mod: Add k8s.io/kubectl dependency for the "k8s.io/kubectl/pkg/util/podutils".IsPodAvailable function, used in computeCurrentVersions.
  • go.sum:
  • vendor/*: Regenerate.

Follow-up to #269.

* pkg/operator/controller/status/controller.go (Reconcile): Delete receiver
in call to computeOperatorStatusVersions.
(computeOperatorStatusVersions): Delete unused receiver.
* pkg/operator/controller/status/controller_test.go
(TestComputeOperatorStatusVersions): Adjust call to
computeOperatorStatusVersions.
@openshift-ci openshift-ci bot added the bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. label May 18, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 18, 2021

@Miciah: This pull request references Bugzilla bug 1952174, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @quarterpin

In response to this:

Bug 1952174: status: Report old versions while rolling out new

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label May 18, 2021
@openshift-ci openshift-ci bot requested a review from quarterpin May 18, 2021 03:11
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 18, 2021

@Miciah: This pull request references Bugzilla bug 1952174, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @quarterpin

In response to this:

Bug 1952174: status: Report old versions while rolling out new

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from candita and rfredette May 18, 2021 03:12
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 18, 2021
@Miciah Miciah force-pushed the BZ1952174-status-report-old-versions-while-rolling-out-new branch 3 times, most recently from 84f2683 to f62d668 Compare May 19, 2021 20:30
namespace corev1.Namespace

// haveDNS indicates whether the "default" dnses.openshift.operator.io
// CRD exists.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// CRD exists.
// CR exists.

total nit, don't feel obligated to fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, couldn't help feeling obligated to fix "CRD" here and above, as well as the API group name.

k8s.io/api v0.21.0
k8s.io/apimachinery v0.21.0
k8s.io/client-go v0.21.0
k8s.io/kubectl v0.21.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first this made me a little nervous!
But then I remembered that go mod is smart enough to only import what we actually need from kubectl.

@Miciah Miciah force-pushed the BZ1952174-status-report-old-versions-while-rolling-out-new branch from f62d668 to c2b2577 Compare May 20, 2021 14:01
@Miciah
Copy link
Contributor Author

Miciah commented May 21, 2021

/test e2e-upgrade

For each operand, keep reporting the operand's old version in the
clusteroperator status.versions value until the new version of the operand
is fully rolled out, so that the clusteroperator doesn't report the new
version before the operand has finished upgrading.  Likewise, keep
reporting the operator's old version until all operands are updated to
their respective new versions.

This commit fixes bug 1952174.

https://bugzilla.redhat.com/show_bug.cgi?id=1952174

* pkg/operator/controller/status/controller.go (Reconcile): Use the new
operatorState fields and the new computeOldVersions, computeNewVersions,
and computeCurrentVersions functions to compute versions maps.  Pass these
versions maps to computeOperatorProgressingCondition.  Pass the current
versions map to computeOperatorStatusVersions.
(operatorState): Add dnsDaemonSet, nodeResolverDaemonSet, dnsPodList, and
nodeResolverPodList fields.
(getOperatorState): Initialize the new fields in operatorState.
(computeOperatorStatusVersions): Rewrite to take the new versions map and
return a slice of OperandVersion.
(computeOperatorProgressingCondition): Replace the OperandVersion slice and
string parameter values with parameters for the old, new, and current
versions maps.  Use the versions maps to compute the "Progressing" status
condition.
(computeOldVersions): New function.  Convert the OperandVersion slice value
from the clusteroperator's status.versions field into a versions map.
(computeNewVersions): New function.  Convert the versions from the
reconciler into a versions map.
(computeCurrentVersions): New function.  Use the old and new versions maps
and dns and node-resolver daemonsets and pods to compute the current
versions map.
* pkg/operator/controller/status/controller_test.go
(TestComputeOperatorProgressingCondition): Adjust for changes to
computeOperatorProgressingCondition.
(TestComputeOperatorStatusVersions): Delete test.
(TestComputeCurrentVersions): New test.  Verify that computeCurrentVersions
behaves as expected.
* go.mod: Add k8s.io/kubectl dependency for the
"k8s.io/kubectl/pkg/util/podutils".IsPodAvailable function, used in
computeCurrentVersions.
* go.sum:
* vendor/*: Regenerate.
@Miciah
Copy link
Contributor Author

Miciah commented May 21, 2021

Reviewing the latest run of the e2e-upgrade CI job, e2e-upgrade/openshift-e2e-test/artifacts/junit/e2e-intervals_20210521-033733.json shows that the upgrade went from 04:29:04 to 04:33:11:

        {
            "level": "Warning",
            "locator": "clusteroperator/dns",
            "message": "condition/Progressing status/True reason/Upgrading kube-rbac-proxy to \"registry.build02.ci.openshift.org/ci-op-d7g7114b/stable@sha256:3869910c1e208b125bdecd4ac2d8b2cae42efe221c704491b86aa9b18ce95a65\".\nUpgrading operator to \"4.8.0-0.ci.test-2021-05-21-025413-ci-op-d7g7114b-latest\".\nUpgrading coredns to \"registry.build02.ci.openshift.org/ci-op-d7g7114b/stable@sha256:bcdefdbcee8af1e634e68a850c52fe1e9cb31364525e30f5b20ee4eacb93c3e8\".\nUpgrading openshift-cli to \"registry.build02.ci.openshift.org/ci-op-d7g7114b/stable@sha256:7c886e097d93dec6633844c20611937b028d283ddcd11fb2c33d89233dd00abe\".",
            "from": "2021-05-21T04:29:04Z",
            "to": "2021-05-21T04:33:11Z"
        },

And looking at e2e-upgrade/openshift-e2e-test/artifacts/e2e.log, it looks like the reported version for the node-resolver operand was updated at 04:29:37:

May 21 04:29:37.557 I clusteroperator/dns versions: openshift-cli registry.build02.ci.openshift.org/ci-op-d7g7114b/stable-initial@sha256:7c886e097d93dec6633844c20611937b028d283ddcd11fb2c33d89233dd00abe -> registry.build02.ci.openshift.org/ci-op-d7g7114b/stable@sha256:7c886e097d93dec6633844c20611937b028d283ddcd11fb2c33d89233dd00abe

And then the reported versions for the coredns and kube-rbac-proxy operands as well as for the operator itself were updated at 04:33:11:

May 21 04:33:11.128 I clusteroperator/dns versions: kube-rbac-proxy registry.build02.ci.openshift.org/ci-op-d7g7114b/stable-initial@sha256:3869910c1e208b125bdecd4ac2d8b2cae42efe221c704491b86aa9b18ce95a65 -> registry.build02.ci.openshift.org/ci-op-d7g7114b/stable@sha256:3869910c1e208b125bdecd4ac2d8b2cae42efe221c704491b86aa9b18ce95a65, operator 4.8.0-0.ci.test-2021-05-21-025011-ci-op-d7g7114b-initial -> 4.8.0-0.ci.test-2021-05-21-025413-ci-op-d7g7114b-latest, coredns registry.build02.ci.openshift.org/ci-op-d7g7114b/stable-initial@sha256:bcdefdbcee8af1e634e68a850c52fe1e9cb31364525e30f5b20ee4eacb93c3e8 -> registry.build02.ci.openshift.org/ci-op-d7g7114b/stable@sha256:bcdefdbcee8af1e634e68a850c52fe1e9cb31364525e30f5b20ee4eacb93c3e8

So versions were updated at 04:29:37 and 04:33:11, which fall within the period of 04:29:04 to 04:33:11 during which the operator indicated it was progressing, which is as expected.

I'll push a fix for #274 (comment) and hope for another good CI run.

@Miciah Miciah force-pushed the BZ1952174-status-report-old-versions-while-rolling-out-new branch from c2b2577 to 1b3e6d6 Compare May 21, 2021 06:33
@Miciah
Copy link
Contributor Author

Miciah commented May 24, 2021

In the latest run of the e2e-upgrade CI job, e2e-intervals_20210521-212654.json shows that the upgrade went from 22:17:33 to 22:21:36:

        {
            "level": "Warning",
            "locator": "clusteroperator/dns",
            "message": "condition/Progressing status/True reason/Upgrading operator to \"4.8.0-0.ci.test-2021-05-21-202215-ci-op-k590kbmz-latest\".\nUpgrading coredns to \"registry.build02.ci.openshift.org/ci-op-k590kbmz/stable@sha256:bcdefdbcee8af1e634e68a850c52fe1e9cb31364525e30f5b20ee4eacb93c3e8\".\nUpgrading openshift-cli to \"registry.build02.ci.openshift.org/ci-op-k590kbmz/stable@sha256:9c0914ba655514ee0a144b255f34c9ed7c5c66a7d90e04024355b9edb080dca9\".\nUpgrading kube-rbac-proxy to \"registry.build02.ci.openshift.org/ci-op-k590kbmz/stable@sha256:3869910c1e208b125bdecd4ac2d8b2cae42efe221c704491b86aa9b18ce95a65\".",
            "from": "2021-05-21T22:17:33Z",
            "to": "2021-05-21T22:21:36Z"
        },

And looking at e2e.log, it looks like the reported version for the node-resolver operand was updated at 22:18:24:

May 21 22:18:24.678 I clusteroperator/dns versions: openshift-cli registry.build02.ci.openshift.org/ci-op-k590kbmz/stable-initial@sha256:9c0914ba655514ee0a144b255f34c9ed7c5c66a7d90e04024355b9edb080dca9 -> registry.build02.ci.openshift.org/ci-op-k590kbmz/stable@sha256:9c0914ba655514ee0a144b255f34c9ed7c5c66a7d90e04024355b9edb080dca9

And then the reported versions for the coredns and kube-rbac-proxy operands as well as for the operator itself were updated at 22:21:36:

May 21 22:21:36.818 I clusteroperator/dns versions: operator 4.8.0-0.ci.test-2021-05-21-201759-ci-op-k590kbmz-initial -> 4.8.0-0.ci.test-2021-05-21-202215-ci-op-k590kbmz-latest, coredns registry.build02.ci.openshift.org/ci-op-k590kbmz/stable-initial@sha256:bcdefdbcee8af1e634e68a850c52fe1e9cb31364525e30f5b20ee4eacb93c3e8 -> registry.build02.ci.openshift.org/ci-op-k590kbmz/stable@sha256:bcdefdbcee8af1e634e68a850c52fe1e9cb31364525e30f5b20ee4eacb93c3e8, kube-rbac-proxy registry.build02.ci.openshift.org/ci-op-k590kbmz/stable-initial@sha256:3869910c1e208b125bdecd4ac2d8b2cae42efe221c704491b86aa9b18ce95a65 -> registry.build02.ci.openshift.org/ci-op-k590kbmz/stable@sha256:3869910c1e208b125bdecd4ac2d8b2cae42efe221c704491b86aa9b18ce95a65

So versions were updated at 22:18:24 and 22:21:36, which fall within the period of 22:17:33 to 22:21:36 during which the operator indicated it was progressing, which is as expected.

@Miciah
Copy link
Contributor Author

Miciah commented May 24, 2021

/test e2e-upgrade

@sgreene570
Copy link
Contributor

looks good! Thanks for the in-depth explanation in #274 (comment)

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 24, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 24, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah, sgreene570

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit f264908 into openshift:master May 24, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 24, 2021

@Miciah: All pull requests linked via external trackers have merged:

Bugzilla bug 1952174 has been moved to the MODIFIED state.

In response to this:

Bug 1952174: status: Report old versions while rolling out new

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants