OCPBUGS-24061: Keep CSI operators progressing=true during DaemonSet rollout #1734

dobsonj · 2024-05-07T22:14:16Z

https://issues.redhat.com/browse/OCPBUGS-24061

Patching the ClusterCSIDriver causes the progressing condition to flip-flop between true and false during DaemonSet rollout. For example:

$ oc -n openshift-cluster-storage-operator get event --sort-by='.lastTimestamp' --watch

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"globalMaxSnapshotsPerBlockVolume": 5}}}}'

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
1s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

We should also evaluate UpdatedNumberScheduled and NumberAvailable and wait until both reach DesiredNumberScheduled, so that we do not report progressing=false before the rollout is complete. This is consistent with what we already do in the CSO deployment controller.

After making these changes and testing a vmware-vsphere-csi-driver-operator build with this change vendored, Progressing stays True until the rollout finishes:

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

/cc @openshift/storage @RomanBednar

openshift-ci-robot · 2024-05-07T22:14:23Z

@dobsonj: This pull request references Jira Issue OCPBUGS-24061, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.16.0) matches configured target version for branch (4.16.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2024-05-07T22:14:25Z

@dobsonj: GitHub didn't allow me to request PR reviews from the following users: openshift/storage.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bertinatto · 2024-05-08T17:47:50Z

pkg/operator/csi/csidrivernodeservicecontroller/csi_driver_node_service_controller.go

-	case daemonSet.Status.NumberUnavailable > 0:
+	case daemonSet.Status.NumberUnavailable > 0,
+		daemonSet.Status.UpdatedNumberScheduled < desiredNumber,
+		daemonSet.Status.NumberAvailable < desiredNumber:


From what I understand from the API docs, the comparison daemonSet.Status.NumberAvailable < desiredNumber is the same as daemonSet.Status.NumberUnavailable > 0.

Also, I think daemonSet.Status.UpdatedNumberScheduled < desiredNumber will always evaluate to true before daemonSet.Status.NumberAvailable < desiredNumber.

If this is true, we could drop daemonSet.Status.NumberAvailable < desiredNumber and still get the same behavior.

Makes sense?
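The first point, and the simplification it allows, can be checked mechanically. This assumes the controller keeps `NumberUnavailable` equal to `DesiredNumberScheduled - NumberAvailable`, which is what the two field descriptions in the API docs imply. The sketch below enumerates small, consistent status values. `equivalent`, `fullCheck`, and `reducedCheck` are hypothetical helpers, not library-go functions.

```go
package main

import "fmt"

// equivalent reports whether "NumberAvailable < desired" and
// "NumberUnavailable > 0" agree for every consistent status with
// desired <= maxDesired, assuming NumberUnavailable = desired - NumberAvailable.
func equivalent(maxDesired int32) bool {
	for desired := int32(0); desired <= maxDesired; desired++ {
		for available := int32(0); available <= desired; available++ {
			unavailable := desired - available // assumed invariant
			if (available < desired) != (unavailable > 0) {
				return false
			}
		}
	}
	return true
}

// fullCheck is the three-clause condition from the first revision of the diff.
func fullCheck(desired, updated, available, unavailable int32) bool {
	return unavailable > 0 || updated < desired || available < desired
}

// reducedCheck drops the NumberAvailable clause, as suggested in review.
func reducedCheck(desired, updated, unavailable int32) bool {
	return unavailable > 0 || updated < desired
}

func main() {
	fmt.Println("equivalent for up to 50 nodes:", equivalent(50))

	// Compare both checks over every consistent status for up to 10 nodes.
	same := true
	for desired := int32(0); desired <= 10; desired++ {
		for updated := int32(0); updated <= desired; updated++ {
			for available := int32(0); available <= desired; available++ {
				unavailable := desired - available
				if fullCheck(desired, updated, available, unavailable) !=
					reducedCheck(desired, updated, unavailable) {
					same = false
				}
			}
		}
	}
	fmt.Println("reduced check matches full check:", same)
}
```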

Just an idea: if we split this case, we could return more helpful messages, like "Waiting for X nodes to have pods running and available".

It makes sense, thanks @bertinatto :)

I pushed a new version. It may be a little too verbose, but let me know what you think:

0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 0 out of 5 scheduled"
0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 0 out of 5 scheduled" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 1 out of 5 scheduled"
0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 1 out of 5 scheduled" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 2 out of 5 scheduled"
1s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 2 out of 5 scheduled" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 3 out of 5 scheduled"
0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 3 out of 5 scheduled" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 4 out of 5 scheduled"
0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update node pods, 4 out of 5 scheduled" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

Actually, I think it's better not to update the condition once per node, as in my example above, because a 1000-node cluster would end up with 1000 status changes for a single DaemonSet rollout.
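The cost difference is easy to quantify. Every distinct message is a separate status update, and each one shows up as a separate OperatorStatusChanged event. The sketch below is a simplified model that assumes one status sync per updated node. The two message formats are copied from the events in this thread. `distinctMessages` is a hypothetical helper.

```go
package main

import "fmt"

// perNodeMessage embeds the running count, so the text changes on every node.
func perNodeMessage(updated, desired int) string {
	return fmt.Sprintf("Waiting for DaemonSet to update node pods, %d out of %d scheduled", updated, desired)
}

// fixedMessage only mentions the total, so it stays constant during the rollout.
func fixedMessage(_, desired int) string {
	return fmt.Sprintf("Waiting for DaemonSet to update %d node pods", desired)
}

// distinctMessages counts how many different messages a rollout over
// `desired` nodes would produce, assuming one status sync per updated node
// while UpdatedNumberScheduled < desired.
func distinctMessages(desired int, msg func(updated, desired int) string) int {
	seen := map[string]bool{}
	for updated := 0; updated < desired; updated++ {
		seen[msg(updated, desired)] = true
	}
	return len(seen)
}

func main() {
	for _, n := range []int{5, 1000} {
		fmt.Printf("%d nodes: per-node=%d fixed=%d\n",
			n, distinctMessages(n, perNodeMessage), distinctMessages(n, fixedMessage))
	}
}
```

For 5 nodes the per-node format yields five messages (0 through 4 out of 5), the same count seen in the verbose events. The fixed format yields one message regardless of cluster size.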

I pushed a simpler version that just says how many pods it will update:

0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update 5 node pods"
0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to update 5 node pods" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
0s Normal OperatorStatusChanged deployment/cluster-storage-operator Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
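The behavior in these events can be modeled with a switch that gives each waiting state its own message. This is a hypothetical sketch, not the merged library-go code. `dsStatus` is a local stand-in for a subset of `appsv1.DaemonSetStatus`, and `generation` stands in for the DaemonSet's `metadata.generation`. The count in "update 5 node pods" is assumed to be `DesiredNumberScheduled`, an inference from the events rather than from the diff.

```go
package main

import "fmt"

// dsStatus mirrors a subset of appsv1.DaemonSetStatus (an assumption for
// this sketch); ObservedGeneration models the "act on changes" case.
type dsStatus struct {
	ObservedGeneration     int64
	DesiredNumberScheduled int32
	UpdatedNumberScheduled int32
	NumberUnavailable      int32
}

// progressingMessage returns whether the DaemonSet is still rolling out and
// a message naming the step it is waiting on.
func progressingMessage(generation int64, s dsStatus) (bool, string) {
	desired := s.DesiredNumberScheduled
	switch {
	case s.ObservedGeneration < generation:
		// The DaemonSet controller has not yet seen the new spec.
		return true, "Waiting for DaemonSet to act on changes"
	case s.UpdatedNumberScheduled < desired:
		// Fixed total rather than a running count, so the message does not
		// change (and trigger a status update) on every node.
		return true, fmt.Sprintf("Waiting for DaemonSet to update %d node pods", desired)
	case s.NumberUnavailable > 0:
		// All pods are updated but some are not yet available.
		return true, "Waiting for DaemonSet to deploy node pods"
	default:
		return false, ""
	}
}

func main() {
	steps := []dsStatus{
		{1, 5, 5, 0}, // new generation 2 not yet observed
		{2, 5, 0, 0}, // rollout started
		{2, 5, 5, 1}, // all updated, one pod not yet available
		{2, 5, 5, 0}, // rollout complete
	}
	for _, s := range steps {
		ok, msg := progressingMessage(2, s)
		fmt.Printf("progressing=%v %q\n", ok, msg)
	}
}
```

Because the update case is checked before the availability case, the message moves from "update 5 node pods" to "deploy node pods" once, which matches the order in the events.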


openshift-ci · 2024-05-08T21:05:42Z

@dobsonj: all tests passed!

jsafrane · 2024-05-09T11:36:46Z

/lgtm

openshift-ci · 2024-05-09T11:39:13Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dobsonj, jsafrane

Needs approval from an approver in each of these files:

~~pkg/operator/csi/OWNERS~~ [jsafrane]

openshift-ci-robot · 2024-05-09T11:42:40Z

@dobsonj: Jira Issue OCPBUGS-24061: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

openshift/vmware-vsphere-csi-driver-operator#230 is open

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-24061 has not been moved to the MODIFIED state.

In response to this:

https://issues.redhat.com/browse/OCPBUGS-24061

Patching the ClusterCSIDriver causes the progressing condition to flip-flop between true and false during DaemonSet rollout. For example:

$ oc -n openshift-cluster-storage-operator get event --sort-by='.lastTimestamp' --watch

$ oc patch clustercsidriver/csi.vsphere.vmware.com --type=merge -p '{"spec":{"driverConfig":{"vSphere":{"globalMaxSnapshotsPerBlockVolume": 5}}}}'

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
1s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

We should also be evaluating UpdatedNumberScheduled and NumberAvailable and wait until they reach DesiredNumberScheduled to avoid reporting progressing=false until the rollout is complete. This is consistent with what we already do in the CSO deployment controller.

After making these changes, and testing a vmware-vsphere-csi-driver-operator build with this change vendored:

0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing message changed from "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
0s          Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from True to False ("VSphereCSIDriverOperatorCRProgressing: All is well\nSHARESCSIDriverOperatorCRProgressing: All is well")

/cc @openshift/storage @RomanBednar

openshift-merge-robot · 2024-05-14T16:57:35Z

Fix included in accepted release 4.16.0-0.nightly-2024-05-14-095225

openshift-ci-robot added the jira/severity-moderate and jira/valid-reference labels May 7, 2024

openshift-ci bot requested a review from RomanBednar May 7, 2024 22:14

openshift-ci-robot added the jira/valid-bug label May 7, 2024

dobsonj added a commit to dobsonj/vmware-vsphere-csi-driver-operator that referenced this pull request May 7, 2024

DO NOT MERGE: testing openshift/library-go#1734

6bdb160

dobsonj mentioned this pull request May 7, 2024

OCPBUGS-24061: Keep CSI operators progressing=true during DaemonSet rollout openshift/vmware-vsphere-csi-driver-operator#230

Merged

bertinatto reviewed May 8, 2024

dobsonj force-pushed the OCPBUGS-24061 branch from 86e04f3 to ce1e1c9 May 8, 2024 20:36

OCPBUGS-24061: Keep CSI operators progressing=true during DaemonSet rollout

1e09506

dobsonj force-pushed the OCPBUGS-24061 branch from ce1e1c9 to 1e09506 May 8, 2024 20:50

openshift-ci bot assigned jsafrane May 9, 2024

openshift-ci bot added the lgtm label May 9, 2024

openshift-ci bot added the approved label May 9, 2024

openshift-merge-bot bot merged commit dc3020f into openshift:master May 9, 2024
4 checks passed
