New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add feature gate for disabling the MHC controller #1151
Add feature gate for disabling the MHC controller #1151
Conversation
There is a plan to let the Node Healthcheck Operator, which is an optional day 2 operator today, also implement the MHC controller. For being able to test the POC in NHC, we need to be able to disable MAO's MHC controller. So for now this feature gate is just for dev usage. The actual transition of the MHC controller from MAO to NHC will be discussed and planned in the future. Signed-off-by: Marc Sluiter <msluiter@redhat.com>
pkg/operator/operator.go
Outdated
@@ -382,7 +369,7 @@ func (optr *Operator) maoConfigFromInfrastructure() (*OperatorConfig, error) { | |||
return nil, err | |||
} | |||
|
|||
featureGate, err := getFeatureGate(optr.featureGateLister) | |||
featureGate, err := optr.osClient.ConfigV1().FeatureGates().Get(context.Background(), "cluster", metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand why tbh, but the existing getFeatureGate()
function didn't work in the unit test at least
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By using the direct client you're skipping the caching that should be happening as part of the informer. Looking at this, we are using the old way of listing feature gates here, we should update it to use the new feature gate access.
You can see how that works for openshift/cluster-cloud-controller-manager-operator#249
Would be good to update to use this new method here now.
Especially as the way you're calculating whether this feature should be on/off is a little unconventional
@@ -32,10 +33,9 @@ import ( | |||
) | |||
|
|||
const ( | |||
deploymentName = "machine-api-controllers" | |||
targetNamespace = "test-namespace" | |||
hcControllerName = "machine-healthcheck-controller" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was unused
pkg/operator/operator.go
Outdated
@@ -382,7 +369,7 @@ func (optr *Operator) maoConfigFromInfrastructure() (*OperatorConfig, error) { | |||
return nil, err | |||
} | |||
|
|||
featureGate, err := getFeatureGate(optr.featureGateLister) | |||
featureGate, err := optr.osClient.ConfigV1().FeatureGates().Get(context.Background(), "cluster", metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By using the direct client you're skipping the caching that should be happening as part of the informer. Looking at this, we are using the old way of listing feature gates here, we should update it to use the new feature gate access.
You can see how that works for openshift/cluster-cloud-controller-manager-operator#249
Would be good to update to use this new method here now.
Especially as the way you're calculating whether this feature should be on/off is a little unconventional
|
||
const ( | ||
// DeployMHCControllerFeatureGateName is the name of the feature gate for enabling the MHC controller | ||
DeployMHCControllerFeatureGateName = "MachineAPIOperatorDeployMHCController" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feature gates need to live in config/v1 in openshift/api
func IsDeployMHCControllerEnabled(fg *v1.FeatureGate) bool { | ||
deployMHCControllerFeatureGate := v1.FeatureGateName(DeployMHCControllerFeatureGateName) | ||
if fg != nil && fg.Spec.CustomNoUpgrade != nil { | ||
for _, enabled := range fg.Spec.CustomNoUpgrade.Enabled { | ||
if enabled == deployMHCControllerFeatureGate { | ||
klog.V(2).Info("MHC controller enabled by feature gate") | ||
return true | ||
} | ||
} | ||
for _, disabled := range fg.Spec.CustomNoUpgrade.Disabled { | ||
if disabled == deployMHCControllerFeatureGate { | ||
klog.V(2).Info("MHC controller disabled by feature gate") | ||
return false | ||
} | ||
} | ||
} | ||
// switch to false once NHC is the default! | ||
klog.V(4).Info("MHC controller enabled (default)") | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The featuregateaccess pattern gives you a way to check if a specific feature gate is enabled or not. We should be using that rather than using our own logic, it helps to co-ordinate between multiple components and versions. I realise we are doing this for testing initially but I suspect at some point we may end up evolving this so we should probably get it right first time.
You can read more in this FAQ
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
Signed-off-by: Marc Sluiter <msluiter@redhat.com>
@@ -314,6 +314,24 @@ rules: | |||
- patch | |||
- delete | |||
|
|||
- apiGroups: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed for the getting the controller ref for the recorder for the feature gate accessor...
@@ -68,6 +68,10 @@ spec: | |||
fieldRef: | |||
apiVersion: v1 | |||
fieldPath: metadata.namespace | |||
- name: POD_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed for the getting the controller ref for the recorder for the feature gate accessor...
I finally got the feature gate accessor working, thanks for the links to the FAQ and the cluster-cloud-controller-manager-operator PR, that was helpful. Successfully tested manually using OCP Just for reference, here the related PRs for adding the feature gate to the api and cluster config operator: |
/retest |
@slintes: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One quick question otherwise LGTM assuming we can get the E2E green
@@ -39,6 +41,9 @@ const ( | |||
// 5ms, 10ms, 20ms, 40ms, 80ms, 160ms, 320ms, 640ms, 1.3s, 2.6s, 5.1s, 10.2s, 20.4s, 41s, 82s | |||
maxRetries = 15 | |||
maoOwnedAnnotation = "machine.openshift.io/owned" | |||
|
|||
releaseVersionEnvVariableName = "RELEASE_VERSION" | |||
releaseVersionUnknownValue = "unknown" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same value as the raw version string when it's not initialised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand how/when this is used tbh, I copied it from here: https://github.com/openshift/cluster-cloud-controller-manager-operator/blob/43facae32a53c73ad567129150db6865673bdf1d/pkg/controllers/status.go#L224
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think it's fine to leave as that for now, lets see if anything breaks!
several e2e tests failed with an etcd issue, let's try again before digging more into it... /retest |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
62edc82
into
openshift:master
There is a plan to let the Node Healthcheck Operator, which is an optional day 2 operator today, also implement the MHC controller. For being able to test the POC in NHC, we need to be able to disable MAO's MHC controller. So for now this feature gate is just for dev usage. The actual transition of the MHC controller from MAO to NHC will be discussed and planned in the future.
ECOPROJECT-1423