Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature gate for disabling the MHC controller #1151

Merged
merged 4 commits into from Jul 14, 2023

Conversation

slintes
Copy link
Member

@slintes slintes commented Jul 10, 2023

There is a plan to let the Node Healthcheck Operator, which is an optional day 2 operator today, also implement the MHC controller. For being able to test the POC in NHC, we need to be able to disable MAO's MHC controller. So for now this feature gate is just for dev usage. The actual transition of the MHC controller from MAO to NHC will be discussed and planned in the future.

ECOPROJECT-1423

There is a plan to let the Node Healthcheck Operator, which is an
optional day 2 operator today, also implement the MHC controller. For
being able to test the POC in NHC, we need to be able to disable MAO's
MHC controller. So for now this feature gate is just for dev usage.
The actual transition of the MHC controller from MAO to NHC will be
discussed and planned in the future.

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
@@ -382,7 +369,7 @@ func (optr *Operator) maoConfigFromInfrastructure() (*OperatorConfig, error) {
return nil, err
}

featureGate, err := getFeatureGate(optr.featureGateLister)
featureGate, err := optr.osClient.ConfigV1().FeatureGates().Get(context.Background(), "cluster", metav1.GetOptions{})
Copy link
Member Author

@slintes slintes Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't understand why tbh, but the existing getFeatureGate() function didn't work in the unit test at least

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By using the direct client you're skipping the caching that should be happening as part of the informer. Looking at this, we are using the old way of listing feature gates here, we should update it to use the new feature gate access.

You can see how that works for openshift/cluster-cloud-controller-manager-operator#249

Would be good to update to use this new method here now.

Especially as the way you're calculating whether this feature should be on/off is a little unconventional

@openshift-ci openshift-ci bot requested review from elmiko and JoelSpeed July 10, 2023 17:19
@@ -32,10 +33,9 @@ import (
)

const (
deploymentName = "machine-api-controllers"
targetNamespace = "test-namespace"
hcControllerName = "machine-healthcheck-controller"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was unused

@@ -382,7 +369,7 @@ func (optr *Operator) maoConfigFromInfrastructure() (*OperatorConfig, error) {
return nil, err
}

featureGate, err := getFeatureGate(optr.featureGateLister)
featureGate, err := optr.osClient.ConfigV1().FeatureGates().Get(context.Background(), "cluster", metav1.GetOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By using the direct client you're skipping the caching that should be happening as part of the informer. Looking at this, we are using the old way of listing feature gates here, we should update it to use the new feature gate access.

You can see how that works for openshift/cluster-cloud-controller-manager-operator#249

Would be good to update to use this new method here now.

Especially as the way you're calculating whether this feature should be on/off is a little unconventional


const (
// DeployMHCControllerFeatureGateName is the name of the feature gate for enabling the MHC controller
DeployMHCControllerFeatureGateName = "MachineAPIOperatorDeployMHCController"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feature gates need to live in config/v1 in openshift/api

Comment on lines 17 to 36
func IsDeployMHCControllerEnabled(fg *v1.FeatureGate) bool {
deployMHCControllerFeatureGate := v1.FeatureGateName(DeployMHCControllerFeatureGateName)
if fg != nil && fg.Spec.CustomNoUpgrade != nil {
for _, enabled := range fg.Spec.CustomNoUpgrade.Enabled {
if enabled == deployMHCControllerFeatureGate {
klog.V(2).Info("MHC controller enabled by feature gate")
return true
}
}
for _, disabled := range fg.Spec.CustomNoUpgrade.Disabled {
if disabled == deployMHCControllerFeatureGate {
klog.V(2).Info("MHC controller disabled by feature gate")
return false
}
}
}
// switch to false once NHC is the default!
klog.V(4).Info("MHC controller enabled (default)")
return true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The featuregateaccess pattern gives you a way to check if a specific feature gate is enabled or not. We should be using that rather than using our own logic, it helps to co-ordinate between multiple components and versions. I realise we are doing this for testing initially but I suspect at some point we may end up evolving this so we should probably get it right first time.

You can read more in this FAQ

Signed-off-by: Marc Sluiter <msluiter@redhat.com>
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
@@ -314,6 +314,24 @@ rules:
- patch
- delete

- apiGroups:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needed for the getting the controller ref for the recorder for the feature gate accessor...

@@ -68,6 +68,10 @@ spec:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_NAME
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needed for the getting the controller ref for the recorder for the feature gate accessor...

@slintes
Copy link
Member Author

slintes commented Jul 12, 2023

I finally got the feature gate accessor working, thanks for the links to the FAQ and the cluster-cloud-controller-manager-operator PR, that was helpful. Successfully tested manually using OCP 4.14.0-0.ci-2023-07-12-081430 as base.

Just for reference, here the related PRs for adding the feature gate to the api and cluster config operator:
openshift/api#1523
openshift/cluster-config-operator#327

@slintes
Copy link
Member Author

slintes commented Jul 12, 2023

/retest

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 12, 2023

@slintes: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-operator ce3fe45 link true /test e2e-aws-operator
ci/prow/e2e-nutanix ce3fe45 link false /test e2e-nutanix
ci/prow/e2e-vsphere-ovn-upgrade ce3fe45 link false /test e2e-vsphere-ovn-upgrade
ci/prow/e2e-gcp-ovn ce3fe45 link false /test e2e-gcp-ovn
ci/prow/e2e-gcp-operator ce3fe45 link false /test e2e-gcp-operator
ci/prow/e2e-vsphere-ovn ce3fe45 link false /test e2e-vsphere-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One quick question otherwise LGTM assuming we can get the E2E green

@@ -39,6 +41,9 @@ const (
// 5ms, 10ms, 20ms, 40ms, 80ms, 160ms, 320ms, 640ms, 1.3s, 2.6s, 5.1s, 10.2s, 20.4s, 41s, 82s
maxRetries = 15
maoOwnedAnnotation = "machine.openshift.io/owned"

releaseVersionEnvVariableName = "RELEASE_VERSION"
releaseVersionUnknownValue = "unknown"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same value as the raw version string when it's not initialised?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think it's fine to leave as that for now, lets see if anything breaks!

@slintes
Copy link
Member Author

slintes commented Jul 13, 2023

several e2e tests failed with an etcd issue, let's try again before digging more into it...

/retest

@JoelSpeed
Copy link
Contributor

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 13, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 13, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 13, 2023
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 8f0f041 and 2 for PR HEAD ce3fe45 in total

@slintes
Copy link
Member Author

slintes commented Jul 14, 2023

/retest

@openshift-merge-robot openshift-merge-robot merged commit 62edc82 into openshift:master Jul 14, 2023
22 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants