Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-24061: Keep CSI operators progressing=true during DaemonSet rollout #1734

Merged
merged 1 commit into from
May 9, 2024

Conversation

dobsonj
Copy link
Member

@dobsonj dobsonj commented May 7, 2024

https://issues.redhat.com/browse/OCPBUGS-24061

Patching the ClusterCSIDriver causes the progressing condition to flip-flop between true and false during DaemonSet rollout. For example:

$ oc -n openshift-cluster-storage-operator get event --sort-by='.lastTimestamp' --watch

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"globalMaxSnapshotsPerBlockVolume": 5}}}}'

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
1s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

We should also be evaluating UpdatedNumberScheduled and NumberAvailable and wait until they reach DesiredNumberScheduled to avoid reporting progressing=false until the rollout is complete. This is consistent with what we already do in the CSO deployment controller.

After making these changes, and testing a vmware-vsphere-csi-driver-operator build with this change vendored:

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

/cc @openshift/storage @RomanBednar

@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels May 7, 2024
@openshift-ci openshift-ci bot requested a review from RomanBednar May 7, 2024 22:14
@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label May 7, 2024
@openshift-ci-robot
Copy link

@dobsonj: This pull request references Jira Issue OCPBUGS-24061, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

https://issues.redhat.com/browse/OCPBUGS-24061

Patching the ClusterCSIDriver causes the progressing condition to flip-flop between true and false during DaemonSet rollout. For example:

$ oc -n openshift-cluster-storage-operator get event --sort-by='.lastTimestamp' --watch

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"globalMaxSnapshotsPerBlockVolume": 5}}}}'

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
1s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

We should also be evaluating UpdatedNumberScheduled and NumberAvailable and wait until they reach DesiredNumberScheduled to avoid reporting progressing=false until the rollout is complete. This is consistent with what we already do in the CSO deployment controller.

After making these changes, and testing a vmware-vsphere-csi-driver-operator build with this change vendored:

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

/cc @openshift/storage @RomanBednar

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Contributor

openshift-ci bot commented May 7, 2024

@dobsonj: GitHub didn't allow me to request PR reviews from the following users: openshift/storage.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

https://issues.redhat.com/browse/OCPBUGS-24061

Patching the ClusterCSIDriver causes the progressing condition to flip-flop between true and false during DaemonSet rollout. For example:

$ oc -n openshift-cluster-storage-operator get event --sort-by='.lastTimestamp' --watch

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"globalMaxSnapshotsPerBlockVolume": 5}}}}'

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
1s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

We should also be evaluating UpdatedNumberScheduled and NumberAvailable and wait until they reach DesiredNumberScheduled to avoid reporting progressing=false until the rollout is complete. This is consistent with what we already do in the CSO deployment controller.

After making these changes, and testing a vmware-vsphere-csi-driver-operator build with this change vendored:

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

/cc @openshift/storage @RomanBednar

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

case daemonSet.Status.NumberUnavailable > 0:
case daemonSet.Status.NumberUnavailable > 0,
daemonSet.Status.UpdatedNumberScheduled < desiredNumber,
daemonSet.Status.NumberAvailable < desiredNumber:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand from the API docs, the comparison daemonSet.Status.NumberAvailable < desiredNumber is the same as daemonSet.Status.NumberUnavailable > 0.

Also, I think daemonSet.Status.UpdatedNumberScheduled < desiredNumber will always evaluate to true before daemonSet.Status.NumberAvailable < desiredNumber.

If this is true, we could drop daemonSet.Status.NumberAvailable < desiredNumber and still get the same behavior.

Makes sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea: if we split this case we could return more helpful messages, likeWaiting for X nodes to have pods running and available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense, thanks @bertinatto :)

I pushed a new version, it may be a little too verbose, but let me know what you think:

0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 0 out of 5 scheduled"
0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 0 out of 5 scheduled" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 1 out of 5 scheduled"
0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 1 out of 5 scheduled" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 2 out of 5 scheduled"
1s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 2 out of 5 scheduled" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 3 out of 5 scheduled"
0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 3 out of 5 scheduled" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 4 out of 5 scheduled"
0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 4 out of 5 scheduled" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

Copy link
Member Author

@dobsonj dobsonj May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think it's better not to update the condition once for every node like my example above, because a 1000 node cluster would end up with 1000 status changes for a single DaemonSet rollout.

I pushed a simpler version that just says how many pods it will update:

0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update 5 node pods"
0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update 5 node pods" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
0s          Normal   OperatorStatusChanged                            deployment/cluster-storage-operator              Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

Copy link
Contributor

openshift-ci bot commented May 8, 2024

@dobsonj: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jsafrane
Copy link
Contributor

jsafrane commented May 9, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 9, 2024
Copy link
Contributor

openshift-ci bot commented May 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dobsonj, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 9, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit dc3020f into openshift:master May 9, 2024
4 checks passed
@openshift-ci-robot
Copy link

@dobsonj: Jira Issue OCPBUGS-24061: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-24061 has not been moved to the MODIFIED state.

In response to this:

https://issues.redhat.com/browse/OCPBUGS-24061

Patching the ClusterCSIDriver causes the progressing condition to flip-flop between true and false during DaemonSet rollout. For example:

$ oc -n openshift-cluster-storage-operator get event --sort-by='.lastTimestamp' --watch

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"globalMaxSnapshotsPerBlockVolume": 5}}}}'

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
1s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

We should also be evaluating UpdatedNumberScheduled and NumberAvailable and wait until they reach DesiredNumberScheduled to avoid reporting progressing=false until the rollout is complete. This is consistent with what we already do in the CSO deployment controller.

After making these changes, and testing a vmware-vsphere-csi-driver-operator build with this change vendored:

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

/cc @openshift/storage @RomanBednar

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-robot
Copy link
Contributor

Fix included in accepted release 4.16.0-0.nightly-2024-05-14-095225

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants