New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-23120: [IBM ROKS] cluster-storage-operator does not set upgradeable=True #417
Conversation
@dobsonj: This pull request references Jira Issue OCPBUGS-23120, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@dobsonj: GitHub didn't allow me to request PR reviews from the following users: openshift/storage, jeffnowicki. Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dobsonj The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
@@ -218,6 +230,14 @@ func (dsrc *driverStarterCommon) sync(ctx context.Context, syncCtx factory.SyncC | |||
return err | |||
} | |||
if !shouldRun { | |||
// If shouldRun is false, then the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch will be executed for all CSI drivers that do not belong to the current platform, i.e. for most of these:
cluster-storage-operator/pkg/operator/operator_starter.go
Lines 203 to 214 in 813fcbe
csioperatorclient.GetAWSEBSCSIOperatorConfig(false), | |
csioperatorclient.GetGCPPDCSIOperatorConfig(), | |
csioperatorclient.GetOpenStackCinderCSIOperatorConfig(clients, ssr.eventRecorder), | |
csioperatorclient.GetOVirtCSIOperatorConfig(clients, ssr.eventRecorder), | |
csioperatorclient.GetManilaOperatorConfig(clients, ssr.eventRecorder), | |
csioperatorclient.GetVMwareVSphereCSIOperatorConfig(), | |
csioperatorclient.GetAzureDiskCSIOperatorConfig(), | |
csioperatorclient.GetAzureFileCSIOperatorConfig(), | |
csioperatorclient.GetSharedResourceCSIOperatorConfig(false), | |
csioperatorclient.GetAlibabaDiskCSIOperatorConfig(), | |
csioperatorclient.GetIBMVPCBlockCSIOperatorConfig(), | |
csioperatorclient.GetPowerVSBlockCSIOperatorConfig(false), |
You should check if this loop actually matched any CSI driver operator and set Upgradeable=true
after the loop.
Or, you can always unconditionally create one StorageOperatorrUpgradeable=true
. If any ClusterCSIDriver then reports *Upgradeable=False
, ClusterOperatorStatusController will do an union where one *Upgradeable=false
trumps any number of *Upgradeable=true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I moved the setUpgradeableTrue
call to after the loop, if none of these controllers have (ever) started.
I'm hesitant to set Upgradeable=true unconditionally because then in the most common case, where at least one controller is started, we may introduce a brief window of time where it's set to true but will soon change to false. Maybe this is benign, but I would like to avoid that flip-flop and set it only if we know those controllers will not run.
c530671
to
e67877c
Compare
I built e67877c and pushed the image to |
/label acknowledge-critical-fixes-only |
@dobsonj: once the present PR merges, I will cherry-pick it on top of release-4.14 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@dobsonj: This pull request references Jira Issue OCPBUGS-23120, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Just one failure in e2e-ibmcloud-csi:
/retest |
@dobsonj: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
5d86084
into
openshift:master
@dobsonj: Jira Issue OCPBUGS-23120: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-23120 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@dobsonj: #417 failed to apply on top of branch "release-4.14":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/jira cherry-pick |
[ART PR BUILD NOTIFIER] This PR has been included in build ose-cluster-storage-operator-container-v4.15.0-202311131732.p0.g5d86084.assembly.stream for distgit cluster-storage-operator. |
Fix included in accepted release 4.15.0-0.nightly-2023-11-14-041944 |
https://issues.redhat.com/browse/OCPBUGS-23120
Problem
The upgradeable condition used to be set by SnapshotCRDController, but that was removed in fa9af3a
DefaultStorageClassController only sets upgradeable=True if unsupportedPlatformError.
IBM VPC returns supportedbyCSIError instead.
The current code assumes if supportedbyCSIError, then csidriveroperator controller will eventually set it.
The problem is if shouldRunController returns false for a driver with supportedbyCSIError. Then csidriveroperator does not start and nothing will set the condition. shouldRunController always returns false for IBM ROKS because of this StatusFilter.
Solution
DefaultStorageClassController has no way to tell if shouldRunController will eventually return false. So in this PR, CSIDriverStarter sets the condition if no controller is started, because we know at that point no other controller will set the condition.
/cc @openshift/storage @jeffnowicki