Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STOR-1334: update storage operator to read featuregates from API on standalone OCP #368

Merged
merged 2 commits into from May 29, 2023

Conversation

deads2k
Copy link
Contributor

@deads2k deads2k commented May 8, 2023

The previous method of using the featuregate object and a potentially out of date hardcoded list is left for hypershift until hypershift is ready to switch.

This allows coordination of featuregates via openshift/api and vendoring to openshift/cluster-config-operator. Every operator on standalone kcp consumes from oc get featuregates status.

@openshift-ci openshift-ci bot requested review from bertinatto and gnufied May 8, 2023 19:04
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 8, 2023

@deads2k: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-ovn-vsphere b8b52e1 link false /test e2e-ovn-vsphere
ci/prow/e2e-hypershift-ovn-conformance b8b52e1 link false /test e2e-hypershift-ovn-conformance
ci/prow/e2e-azure-file-csi b8b52e1 link false /test e2e-azure-file-csi

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jsafrane
Copy link
Contributor

jsafrane commented May 9, 2023

cc @openshift/storage

Comment on lines +41 to +46
// By default, this will exit(0) the process if the featuregates ever change to a different set of values.
featureGateAccessor := featuregates.NewFeatureGateAccess(
desiredVersion, missingVersion,
clients.ConfigInformers.Config().V1().ClusterVersions(), clients.ConfigInformers.Config().V1().FeatureGates(),
controllerConfig.EventRecorder,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will exit(0) trigger a new leader election? That will slow down feature enablement considerably, without any indication that the operator is Progressing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will exit(0) trigger a new leader election? That will slow down feature enablement considerably, without any indication that the operator is Progressing.

This is true. It's a scenario that can only happen once in the history of a cluster. If it becomes more common, I can see motivation to provide an alternative hook closing a channel or some such.

@jsafrane
Copy link
Contributor

/lgtm
/approve

If we find out that exit() is bad, we can always refactor it back to Informer.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 15, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 15, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2023
@jsafrane
Copy link
Contributor

/label px-approved
/label docs-approved

@openshift-ci openshift-ci bot added px-approved Signifies that Product Support has signed off on this PR docs-approved Signifies that Docs has signed off on this PR labels May 15, 2023
@jsafrane
Copy link
Contributor

Note to QE: we should test that this PR does not introduce any regression in how SharedResource CSI driver operator is started after TechPreviewNoUpgrade is set. Apart from longer delay between setting TechPreviewNoUpgrade and the driver installed, few minutes max.

@jsafrane
Copy link
Contributor

jsafrane commented May 16, 2023

/retitle STOR-1334: update storage operator to read featuregates from API on standalone OCP

Adding a story for visibility on our board for QE.

@openshift-ci openshift-ci bot changed the title update storage operator to read featuregates from API on standalone OCP STOR-1334: update storage operator to read featuregates from API on standalone OCP May 16, 2023
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented May 16, 2023

@deads2k: This pull request references STOR-1334 which is a valid jira issue.

In response to this:

The previous method of using the featuregate object and a potentially out of date hardcoded list is left for hypershift until hypershift is ready to switch.

This allows coordination of featuregates via openshift/api and vendoring to openshift/cluster-config-operator. Every operator on standalone kcp consumes from oc get featuregates status.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 16, 2023
@duanwei33
Copy link

QE are starting pre-merge test.

@duanwei33
Copy link

duanwei33 commented May 16, 2023

Simply verified with the PR:

  1. When I enable the TechPreviewNoUpgrade, we see enabled featuregates are listed as below:
$ oc get featuregate cluster -o yaml
apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  creationTimestamp: "2023-05-16T09:56:43Z"
  generation: 2
  name: cluster
  resourceVersion: "65285"
  uid: 164d3780-364d-49f1-bf9e-e33c67c94397
spec:
  featureSet: TechPreviewNoUpgrade
status:
  featureGates:
  - enabled:
    - name: AdmissionWebhookMatchConditions
    - name: AzureWorkloadIdentity
    - name: BuildCSIVolumes
    - name: CSIDriverSharedResource
    - name: DynamicResourceAllocation
    - name: ExternalCloudProvider
    - name: ExternalCloudProviderAzure
    - name: ExternalCloudProviderGCP
    - name: InsightsConfigAPI
    - name: MachineAPIProviderOpenStack
    - name: MatchLabelKeysInPodTopologySpread
    - name: NodeSwap
    - name: OpenShiftPodSecurityAdmission
    - name: PDBUnhealthyPodEvictionPolicy
    - name: RetroactiveDefaultStorageClass
    version: 4.14.0-0.test-2023-05-16-094624-ci-ln-5c4g47k-latest
  1. SharedResource CSI Driver is installed successfully:
$ oc get clustercsidrivers csi.sharedresource.openshift.io
NAME                              AGE
csi.sharedresource.openshift.io   25m

$ oc -n openshift-cluster-csi-drivers get pod | grep share
shared-resource-csi-driver-node-mxgs6                  2/2     Running   0          21m
shared-resource-csi-driver-node-r2b2h                  2/2     Running   2          21m
shared-resource-csi-driver-node-rblr4                  2/2     Running   2          21m
shared-resource-csi-driver-operator-5b64bd68c4-d4wr7   1/1     Running   0          15m
shared-resource-csi-driver-webhook-d586db8bb-gjbkh     1/1     Running   0          19m
shared-resource-csi-driver-webhook-d586db8bb-hqwqj     1/1     Running   0          15m

And this just take few minutes to install the sharedresource CSI Driver in my test.

@duanwei33
Copy link

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label May 16, 2023
@jsafrane
Copy link
Contributor

/payload 4.14 nightly blocking

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 26, 2023

@jsafrane: trigger 7 job(s) of type blocking for the nightly release of OCP 4.14

  • periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-serial
  • periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-ovn-ipv6
  • periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-sdn-bm

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2f897d70-fbdc-11ed-8a11-c724b3d87d78-0

@dgoodwin
Copy link
Contributor

/payload 4.14 ci blocking

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 29, 2023

@dgoodwin: trigger 4 job(s) of type blocking for the ci release of OCP 4.14

  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-azure-sdn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/bb0db310-fe1c-11ed-8a29-5bb2a9d67f98-0

@dgoodwin
Copy link
Contributor

/label jira/valid-bug

Bringing in via merge queue.

@openshift-ci openshift-ci bot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label May 29, 2023
@openshift-merge-robot openshift-merge-robot merged commit d91b59b into openshift:master May 29, 2023
11 of 14 checks passed
@@ -269,7 +268,8 @@ func shouldRunController(cfg csioperatorclient.CSIOperatorConfig, infrastructure
return true, nil
}

if !featureGateEnabled(fg, cfg.RequireFeatureGate) {
knownFeatures := sets.New[configv1.FeatureGateName](fg.KnownFeatures()...)
Copy link
Contributor

@sjenning sjenning May 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is panicing on the hypershift control plane and blocking ci release stream

E0530 00:05:42.329947       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 820 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x26a62a0?, 0x475c5c0})
	k8s.io/apimachinery@v0.27.1/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0xc000699408, 0x1, 0x2545ca0?})
	k8s.io/apimachinery@v0.27.1/pkg/util/runtime/runtime.go:49 +0x75
panic({0x26a62a0, 0x475c5c0})
	runtime/panic.go:884 +0x213
github.com/openshift/cluster-storage-operator/pkg/operator/csidriveroperator.shouldRunController({{0x2b15110, 0x1f}, {0x2ade262, 0x6}, {0x2ae8cf2, 0xc}, 0x0, {0x0, 0x0, 0x0}, ...}, ...)
	github.com/openshift/cluster-storage-operator/pkg/operator/csidriveroperator/driverstarter_standalone.go:271 +0x157

https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.ci/release/4.14.0-0.ci-2023-05-29-185629

hypershift conformance test failed on this PR catching the break, but the test is not blocking yet and it merged anyway

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-storage-operator/368/pull-ci-openshift-cluster-storage-operator-master-e2e-hypershift-ovn-conformance/1655648223684988928/artifacts/e2e-hypershift-ovn-conformance/dump/artifacts/hostedcluster-b82afc77cb6ba7dcaec8/cluster-scoped-resources/config.openshift.io/clusteroperators/storage.yaml

Because openshift/release#39783 has not yet merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR qe-approved Signifies that QE has signed off on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants