Set clusterID label on machine #608

alexander-demicev · 2020-06-04T10:16:43Z

Set the clusterID label on the machine from the controller. Currently we require the user to set the clusterID label manually; this PR uses the Infrastructure resource to set the label instead.

enxebre · 2020-06-04T10:26:46Z

Thanks!
Please let's make sure to reference the upcoming revendor PRs here.
This will require updating docs to drop the requirement to manually add the label.

/approve

openshift-ci-robot · 2020-06-04T10:27:02Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [enxebre]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

JoelSpeed · 2020-06-04T10:34:28Z

Given we are introducing webhooks to validate provider specs for Machine/MachineSet, why don't we add this as a default/validation item in the webhook rather than in the controller? I would have thought the controller behaviour shouldn't be changing and it should still expect it to be set

enxebre · 2020-06-04T10:46:50Z

> Given we are introducing webhooks to validate provider specs for Machine/MachineSet, why don't we add this as a default/validation item in the webhook rather than in the controller? I would have thought the controller behaviour shouldn't be changing and it should still expect it to be set

The need for this label is mainly for AWS, so that it can filter instances by name and clusterID when no instanceID is available yet: https://github.com/openshift/cluster-api-provider-aws/blob/master/pkg/actuators/machine/reconciler.go#L358-L371
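To make the lookup described above concrete, here is a minimal sketch of what "filter by name and clusterID" could look like. It assumes the common convention of a `Name` tag plus a `kubernetes.io/cluster/<clusterID>` tag with the value `owned`; the `filter` type and `instanceFilters` function are hypothetical stand-ins, not the provider's actual code or the AWS SDK types.

```go
package main

import "fmt"

// filter mirrors the name/values shape of an EC2 DescribeInstances filter.
// It is a local stand-in for illustration, not the AWS SDK type.
type filter struct {
	Name   string
	Values []string
}

// instanceFilters builds the filters needed to find an instance when only the
// machine name and the cluster ID are known (no instance ID yet).
// Assumption: instances carry a Name tag and a cluster ownership tag.
func instanceFilters(machineName, clusterID string) []filter {
	return []filter{
		{Name: "tag:Name", Values: []string{machineName}},
		{Name: "tag:kubernetes.io/cluster/" + clusterID, Values: []string{"owned"}},
	}
}

func main() {
	for _, f := range instanceFilters("worker-0", "mycluster-abc12") {
		fmt.Printf("%s=%v\n", f.Name, f.Values)
	}
}
```

Without the clusterID, the name filter alone could match instances that belong to another cluster in the same account, which is why the label has to be available before an instance ID exists.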

This is just an implementation detail we as devs chose; it's not a legitimate user-input API field. The clusterID is implicit by design and is available at any time from the context and client where the machine is being created. Therefore I don't think this should be exposed as user input at all, or defaulted by a webhook at creation/update time. The controller should be able to discover and pass the clusterID through with or without a webhook in front of it. We could even eventually drop the label and use a different mechanism to pass the clusterID if we choose to, and this should be transparent to machine creation validation/defaulting.
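As a rough illustration of the controller discovering the clusterID and passing it through, the sketch below copies the infrastructure name onto the machine's labels. The `infrastructure` and `machine` types are hypothetical stand-ins for `configv1.Infrastructure` and the Machine object, the label key is assumed to be `machine.openshift.io/cluster-api-cluster`, and the choice to leave an existing label value untouched is an assumption of this sketch, not something taken from the PR diff.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterIDLabel is the label key assumed here for the cluster ID.
const clusterIDLabel = "machine.openshift.io/cluster-api-cluster"

// infrastructure is a hypothetical stand-in for configv1.Infrastructure;
// only the field relevant to this sketch is modelled.
type infrastructure struct {
	InfrastructureName string
}

// machine is a hypothetical stand-in for the Machine object.
type machine struct {
	Name   string
	Labels map[string]string
}

// setClusterIDLabel copies the cluster ID from the infrastructure object onto
// the machine. Assumption: a label that is already present is left untouched.
func setClusterIDLabel(m *machine, infra *infrastructure) error {
	if infra == nil || infra.InfrastructureName == "" {
		return errors.New("infrastructure name is not available")
	}
	if m.Labels == nil {
		m.Labels = map[string]string{}
	}
	if _, ok := m.Labels[clusterIDLabel]; !ok {
		m.Labels[clusterIDLabel] = infra.InfrastructureName
	}
	return nil
}

func main() {
	m := &machine{Name: "worker-0"}
	infra := &infrastructure{InfrastructureName: "mycluster-abc12"}
	if err := setClusterIDLabel(m, infra); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.Labels[clusterIDLabel])
}
```

Because the value comes from the cluster itself, the user never has to supply it, and the mechanism could later change without touching the user-facing API.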

Danil-Grigorev

LGTM, just an improvement

Danil-Grigorev · 2020-06-04T14:32:54Z

pkg/controller/machine/controller.go

+	infra := &configv1.Infrastructure{}
+	infraName := client.ObjectKey{Name: globalInfrastuctureName}
+
+	if err := r.Client.Get(ctx, infraName, infra); err != nil {
+		return err
+	}


This could be substituted with

machine-api-operator/pkg/controller/vsphere/util.go

Lines 51 to 65 in 60eb822

func getInfrastructure(c runtimeclient.Reader) (*configv1.Infrastructure, error) {
	if c == nil {
		return nil, errors.New("no API reader -- will not fetch infrastructure config")
	}

	infra := &configv1.Infrastructure{}
	infraName := runtimeclient.ObjectKey{Name: globalInfrastuctureName}

	if err := c.Get(context.Background(), infraName, infra); err != nil {
		return nil, err
	}

	return infra, nil
}

With added tests from pkg/controller/machine/machine_controller_test.go you'll essentially cover both.

I don't want to import a function from the vsphere utils, but creating a shared utils package in the future makes sense. Anyway, that's outside the scope of this PR.
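For reference, a shared helper along the lines discussed above could be written against a narrow interface so that both controllers can reuse it and tests can pass in a fake. This is only a sketch: `infraGetter`, `fakeGetter` and the `infrastructure` type are hypothetical and stand in for the controller-runtime reader and `configv1.Infrastructure`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Name of the cluster-wide Infrastructure object (identifier spelled as in the
// snippets above).
const globalInfrastuctureName = "cluster"

// infrastructure is a hypothetical stand-in for configv1.Infrastructure.
type infrastructure struct {
	InfrastructureName string
}

// infraGetter is a hypothetical, narrowed-down reader interface.
type infraGetter interface {
	GetInfrastructure(ctx context.Context, name string) (*infrastructure, error)
}

// getInfrastructure fetches the cluster-wide infrastructure object and
// rejects a nil reader up front, as the vsphere helper does.
func getInfrastructure(c infraGetter) (*infrastructure, error) {
	if c == nil {
		return nil, errors.New("no API reader -- will not fetch infrastructure config")
	}
	return c.GetInfrastructure(context.Background(), globalInfrastuctureName)
}

// fakeGetter is an in-memory implementation for tests.
type fakeGetter map[string]*infrastructure

func (f fakeGetter) GetInfrastructure(_ context.Context, name string) (*infrastructure, error) {
	infra, ok := f[name]
	if !ok {
		return nil, fmt.Errorf("infrastructure %q not found", name)
	}
	return infra, nil
}

func main() {
	fake := fakeGetter{globalInfrastuctureName: {InfrastructureName: "mycluster-abc12"}}
	infra, err := getInfrastructure(fake)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(infra.InfrastructureName)
}
```

Keeping the helper in a neutral package would let the machine controller and the vSphere code share one implementation and one set of tests.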

Danil-Grigorev · 2020-06-04T14:33:59Z

pkg/controller/machine/controller.go

@@ -91,6 +92,8 @@ const (
 	unknownInstanceState = "Unknown"

 	skipWaitForDeleteTimeoutSeconds = 60 * 5
+
+	globalInfrastuctureName = "cluster"


Same here, VSphere util already exposes this variable:

machine-api-operator/pkg/controller/vsphere/util.go

Line 18 in 60eb822

globalInfrastuctureName = "cluster"

alexander-demicev · 2020-06-04T19:44:52Z

/retest

The need for this label is mainly for AWS, so that it can filter instances by name and clusterID when no instanceID is available yet: https://github.com/openshift/cluster-api-provider-aws/blob/master/pkg/actuators/machine/reconciler.go#L358-L371 This is just an implementation detail we as devs chose; it's not a legitimate user-input API field. The clusterID is implicit by design and is available at any time from the context and client where the machine is being created. Therefore I don't think this should be exposed as user input at all, or defaulted by a webhook at creation/update time. The controller should be able to discover and pass the clusterID through with or without a webhook in front of it. We could even eventually drop the label and use a different mechanism to pass the clusterID if we choose to, and this should be transparent to machine creation validation/defaulting. The burden on the user of setting this label will be removed by openshift#608

enxebre · 2020-06-10T14:22:20Z

/retest
PTAL @Danil-Grigorev

Danil-Grigorev · 2020-06-10T14:30:38Z

/lgtm

openshift-bot · 2020-06-10T16:40:31Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-06-10T17:06:23Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-06-10T18:24:25Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-06-10T19:15:46Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-06-10T20:07:56Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-06-10T20:20:44Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-06-10T20:33:49Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-06-10T21:13:15Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci-robot · 2020-06-10T22:54:46Z

@alexander-demichev: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-azure | `75857f1` | link | `/test e2e-azure` |
| ci/prow/e2e-azure-operator | `75857f1` | link | `/test e2e-azure-operator` |
| ci/prow/e2e-aws-scaleup-rhel7 | `75857f1` | link | `/test e2e-aws-scaleup-rhel7` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-bot · 2020-06-11T00:53:47Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-06-11T01:19:46Z

/retest

Please review the full test history for this PR and help us cut down flakes.

- Add configv1 to scheme to automatically get clusterId from infrastructure resource introduced in PR: openshift/machine-api-operator#608

- Add missing configv1.AddToScheme for getting ClusterId label from Infrastructure resource, added in: openshift/machine-api-operator#608

- Revendor includes MAO api change from: openshift/machine-api-operator#608


…nd machineSet via webhook

With openshift#608 we removed the burden on the user of setting the clusterID label on machines. As elaborated in openshift#608 (comment), the motivation is that this is an implementation detail that users shouldn't care about. However, because the labels are used by the machineSet to determine ownership, the change introduced above might result in edge scenarios where the machineSet and machine labels have different values. This would result in machines becoming orphaned and the machineSet recreating new instances. Bad. Therefore we now choose to remove the burden from users by enforcing the label value via webhooks and keeping the old behaviour in the backend, to avoid any chance of breaking existing environments where bad input might have been set, as in https://bugzilla.redhat.com/show_bug.cgi?id=1857175.

This tries to fix the following scenario: we set ms.Spec.Selector.MatchLabels[MachineClusterIDLabel] if it's not present. It's not present, so we set it to the correct value. If there happens to be a bad label in `ms.Spec.Template.Labels`, this would result in a mismatch. Follow-up for openshift#608, openshift#644 and openshift#653.
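The mismatch scenario above can be reproduced with a few lines. The sketch below is not the webhook code: `matches` and `enforceClusterID` are hypothetical helpers, the label key is assumed to be `machine.openshift.io/cluster-api-cluster`, and "enforcing" is read here as overwriting whatever value was supplied.

```go
package main

import "fmt"

// clusterIDLabel is the label key assumed here for the cluster ID.
const clusterIDLabel = "machine.openshift.io/cluster-api-cluster"

// matches reports whether every selector key/value pair is present in labels,
// which is how a matchLabels selector decides ownership.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// enforceClusterID returns a copy of labels with the cluster ID label set to
// clusterID, overwriting any previous value.
func enforceClusterID(labels map[string]string, clusterID string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out[clusterIDLabel] = clusterID
	return out
}

func main() {
	// The selector was defaulted to the correct cluster ID...
	selector := map[string]string{clusterIDLabel: "mycluster-abc12"}
	// ...but the template still carries a bad, user-supplied value.
	template := map[string]string{clusterIDLabel: "stale-id"}

	fmt.Println(matches(selector, template))
	fmt.Println(matches(selector, enforceClusterID(template, "mycluster-abc12")))
}
```

The first check prints `false`: machines created from the template would not match the selector, so the machineSet would keep creating replacements. After the same value is enforced on both sides, the second check prints `true`.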

openshift-ci-robot requested review from elmiko and enxebre June 4, 2020 10:17

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2020

Danil-Grigorev reviewed Jun 4, 2020

Set clusterID label on machine

75857f1

alexander-demicev force-pushed the clusterid branch from 2fa72f3 to 75857f1 Compare June 4, 2020 14:51

enxebre mentioned this pull request Jun 5, 2020

Drop clusterID defaulting label #610

Merged

openshift-ci-robot assigned Danil-Grigorev Jun 10, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2020

Danil-Grigorev approved these changes Jun 10, 2020

openshift-merge-robot merged commit 9a69f85 into openshift:master Jun 11, 2020

alexander-demicev deleted the clusterid branch June 11, 2020 10:42

This was referenced Jun 15, 2020

Revendor MAO openshift/cluster-api-provider-aws#330

Merged

Revendor MAO openshift/cluster-api-provider-gcp#98

Merged

Revendor MAO openshift/cluster-api-provider-azure#141

Merged

Danil-Grigorev mentioned this pull request Jul 14, 2020

Add metrics openshift/cluster-api-provider-openstack#106

Merged

Danil-Grigorev added a commit to Danil-Grigorev/cluster-api-provider-baremetal that referenced this pull request Jul 14, 2020

Revendor MAO with metrics integration

e743024

- Revendor includes MAO api change from: openshift/machine-api-operator#608

Danil-Grigorev added a commit to Danil-Grigorev/cluster-api-provider-baremetal that referenced this pull request Jul 15, 2020

Revendor MAO with metrics integration

7db4c59

- Revendor includes MAO api change from: openshift/machine-api-operator#608

Danil-Grigorev added a commit to Danil-Grigorev/cluster-api-provider-baremetal that referenced this pull request Jul 15, 2020

Revendor MAO with metrics integration

d0d9b1a

- Revendor includes MAO api change from: openshift/machine-api-operator#608

enxebre mentioned this pull request Jul 16, 2020

Bug 1857175: enforce clusterID label via webhook and preserve old behaviour in the backend #644

Merged

enxebre mentioned this pull request Jul 27, 2020

Set default clusterID labels only on machine #659

Merged
