Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set clusterID label on machine #608

Merged
merged 1 commit into from Jun 11, 2020

Conversation

alexander-demicev
Copy link
Contributor

Set clusterID label on the machine using controller. Currently, we require the user to set clusterID label manually, this PR introduces usage of infrastructure resource for setting label.

@enxebre
Copy link
Member

enxebre commented Jun 4, 2020

Thanks!
Please let's make sure to reference the upcoming revendor PRs here.
This will require updating docs to drop the requirement to manually add the label.

/approve

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 4, 2020
@JoelSpeed
Copy link
Contributor

Given we are introducing webhooks to validate provider specs for Machine/MachineSet, why don't we add this as a default/validation item in the webhook rather than in the controller? I would have thought the controller behaviour shouldn't be changing and it should still expect it to be set

@enxebre
Copy link
Member

enxebre commented Jun 4, 2020

Given we are introducing webhooks to validate provider specs for Machine/MachineSet, why don't we add this as a default/validation item in the webhook rather than in the controller? I would have thought the controller behaviour shouldn't be changing and it should still expect it to be set

The need for this label is mainly for AWS to be able use it to filter instances by name and clusterID when there's no an instanceID available yet https://github.com/openshift/cluster-api-provider-aws/blob/master/pkg/actuators/machine/reconciler.go#L358-L371

This is just a detail we as devs choose to implement this way, it's not a legitimately user input API field. The clusterID is by design implicit and available on the context and client that the machine is being created at any time. Therefore I don't think this should be exposed as user input at all or defaulted by a webhook at creation/updating time. The controller should be able to discover and pass the clusterID through with or without a webhook at the front. We could even eventually drop the label and use a different mechanism to pass the clusterID if we choose to and this should be transparent to machine creation validation/defaulting.

Copy link
Contributor

@Danil-Grigorev Danil-Grigorev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just an improvement

Comment on lines +474 to +479
infra := &configv1.Infrastructure{}
infraName := client.ObjectKey{Name: globalInfrastuctureName}

if err := r.Client.Get(ctx, infraName, infra); err != nil {
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be substituted with

func getInfrastructure(c runtimeclient.Reader) (*configv1.Infrastructure, error) {
if c == nil {
return nil, errors.New("no API reader -- will not fetch infrastructure config")
}
infra := &configv1.Infrastructure{}
infraName := runtimeclient.ObjectKey{Name: globalInfrastuctureName}
if err := c.Get(context.Background(), infraName, infra); err != nil {
return nil, err
}
return infra, nil
}

With added tests from pkg/controller/machine/machine_controller_test.go you'll essentially cover both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to import a function from vsphere utils, but making a shared utils package in future makes sense. Anyway, it's not the scope of this PR

@@ -91,6 +92,8 @@ const (
unknownInstanceState = "Unknown"

skipWaitForDeleteTimeoutSeconds = 60 * 5

globalInfrastuctureName = "cluster"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, VSphere util already exposes this variable:

globalInfrastuctureName = "cluster"

@alexander-demicev
Copy link
Contributor Author

/retest

enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jun 5, 2020
The need for this label is mainly for AWS to be able use it to filter instances by name and clusterID when there's no an instanceID available yet https://github.com/openshift/cluster-api-provider-aws/blob/master/pkg/actuators/machine/reconciler.go#L358-L371

This is just a detail we as devs choose to implement this way, it's not a legitimately user input API field. The clusterID is by design implicit and available on the context and client that the machine is being created at any time. Therefore I don't think this should be exposed as user input at all or defaulted by a webhook at creation/updating time. The controller should be able to discover and pass the clusterID through with or without a webhook at the front. We could even eventually drop the label and use a different mechanism to pass the clusterID if we choose to and this should be transparent to machine creation validation/defaulting.
The burden for the user to set this label will be fixed by openshift#608
@enxebre
Copy link
Member

enxebre commented Jun 10, 2020

/retest
PTAL @Danil-Grigorev

@Danil-Grigorev
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 10, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

7 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Copy link
Contributor

@alexander-demichev: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-azure 75857f1 link /test e2e-azure
ci/prow/e2e-azure-operator 75857f1 link /test e2e-azure-operator
ci/prow/e2e-aws-scaleup-rhel7 75857f1 link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 9a69f85 into openshift:master Jun 11, 2020
@alexander-demicev alexander-demicev deleted the clusterid branch June 11, 2020 10:42
Danil-Grigorev added a commit to Danil-Grigorev/cluster-api-provider-openstack that referenced this pull request Jul 14, 2020
- Add configv1 to scheme to automatically get clusterId from infrastructure resource introduced in PR: openshift/machine-api-operator#608
Danil-Grigorev added a commit to Danil-Grigorev/cluster-api-provider-baremetal that referenced this pull request Jul 14, 2020
- Add missing configv1.AddToScheme for getting ClusterId label from Infrastructure resource, added in: openshift/machine-api-operator#608
Danil-Grigorev added a commit to Danil-Grigorev/cluster-api-provider-baremetal that referenced this pull request Jul 14, 2020
Danil-Grigorev added a commit to Danil-Grigorev/cluster-api-provider-baremetal that referenced this pull request Jul 15, 2020
- Add missing configv1.AddToScheme for getting ClusterId label from Infrastructure resource, added in: openshift/machine-api-operator#608
Danil-Grigorev added a commit to Danil-Grigorev/cluster-api-provider-baremetal that referenced this pull request Jul 15, 2020
Danil-Grigorev added a commit to Danil-Grigorev/cluster-api-provider-baremetal that referenced this pull request Jul 15, 2020
enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 16, 2020
…nd machineSet via webhook

With openshift#608 we dropped the burden from the user to set the clusterID label on machines.
As elaborated in openshift#608 (comment) the motivation is that this is an implementation detail that users shouldn't care about.

However as the labels are used by machineSet to determine ownership, the change introduced above might result in edge scenarios where the machineSet and machine label has a different value. This would result in machines going orphan and the machineSet recreating new instances. Bad.

Therefore we choose now to remove the burden from users by enforcing the label value via webhhooks and keeping the old behaviour in the backend to avoid any chance of breaking existing environments where bad input might have been set as in https://bugzilla.redhat.com/show_bug.cgi?id=1857175.
enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 16, 2020
…nd machineSet via webhook

With openshift#608 we dropped the burden from the user to set the clusterID label on machines.
As elaborated in openshift#608 (comment) the motivation is that this is an implementation detail that users shouldn't care about.

However as the labels are used by machineSet to determine ownership, the change introduced above might result in edge scenarios where the machineSet and machine label has a different value. This would result in machines going orphan and the machineSet recreating new instances. Bad.

Therefore we choose now to remove the burden from users by enforcing the label value via webhhooks and keeping the old behaviour in the backend to avoid any chance of breaking existing environments where bad input might have been set as in https://bugzilla.redhat.com/show_bug.cgi?id=1857175.
enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 16, 2020
…nd machineSet via webhook

With openshift#608 we dropped the burden from the user to set the clusterID label on machines.
As elaborated in openshift#608 (comment) the motivation is that this is an implementation detail that users shouldn't care about.

However as the labels are used by machineSet to determine ownership, the change introduced above might result in edge scenarios where the machineSet and machine label has a different value. This would result in machines going orphan and the machineSet recreating new instances. Bad.

Therefore we choose now to remove the burden from users by enforcing the label value via webhhooks and keeping the old behaviour in the backend to avoid any chance of breaking existing environments where bad input might have been set as in https://bugzilla.redhat.com/show_bug.cgi?id=1857175.
enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 16, 2020
…nd machineSet via webhook

With openshift#608 we dropped the burden from the user to set the clusterID label on machines.
As elaborated in openshift#608 (comment) the motivation is that this is an implementation detail that users shouldn't care about.

However as the labels are used by machineSet to determine ownership, the change introduced above might result in edge scenarios where the machineSet and machine label has a different value. This would result in machines going orphan and the machineSet recreating new instances. Bad.

Therefore we choose now to remove the burden from users by enforcing the label value via webhhooks and keeping the old behaviour in the backend to avoid any chance of breaking existing environments where bad input might have been set as in https://bugzilla.redhat.com/show_bug.cgi?id=1857175.
enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 16, 2020
…nd machineSet via webhook

With openshift#608 we dropped the burden from the user to set the clusterID label on machines.
As elaborated in openshift#608 (comment) the motivation is that this is an implementation detail that users shouldn't care about.

However as the labels are used by machineSet to determine ownership, the change introduced above might result in edge scenarios where the machineSet and machine label has a different value. This would result in machines going orphan and the machineSet recreating new instances. Bad.

Therefore we choose now to remove the burden from users by enforcing the label value via webhhooks and keeping the old behaviour in the backend to avoid any chance of breaking existing environments where bad input might have been set as in https://bugzilla.redhat.com/show_bug.cgi?id=1857175.
enxebre added a commit to enxebre/machine-api-operator that referenced this pull request Jul 27, 2020
This tries to fix the following scenario:
We set ms.Spec.Selector.MatchLabels[MachineClusterIDLabel] if it's not present. It's not present and we set it to the correct value. If there happens to be a bad label in `ms.Spec.Template.Labels` this would result in a miss match.

Follow for
openshift#608,
openshift#644 and
openshift#653.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants