Allow join command to run in idempotent mode #555

csrwng · 2019-01-22T19:34:50Z

Introduces a flag to the JoinCluster function that specifies whether to run in idempotent mode. If true, creates that fail because the resource already exists will not fail the entire operation.

gyliu513 · 2019-01-23T02:14:28Z

pkg/kubefed2/join.go

@@ -374,13 +378,17 @@ func createFederatedCluster(fedClientset *fedclient.Clientset, joiningClusterNam
 		return fedCluster, nil
 	}

-	return fedClientset.CoreV1alpha1().FederatedClusters(federationNamespace).Create(fedCluster)
+	fedCluster, err := fedClientset.CoreV1alpha1().FederatedClusters(federationNamespace).Create(fedCluster)
+	if idempotent && errors.IsAlreadyExists(err) {


How about creating a function named IsIdempotentMode to replace idempotent && errors.IsAlreadyExists(err)?

/cc @xunpan

Something like

if ignoreIfIdempotent(idempotent, err) { ... }

?
imho, I would think that it would obscure the fact that we're only ignoring the error when the item already exists. But I can make the change if you feel strongly about it.
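For reference, the helper floated in this thread might look roughly like the sketch below. It is kept self-contained, so `errAlreadyExists` and `isAlreadyExists` are hypothetical stand-ins for the `errors.IsAlreadyExists` check used in the diff, and `ignoreIfIdempotent` is only the name suggested above, not a function that exists in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyExists is a hypothetical stand-in for the API server's
// "already exists" status error.
var errAlreadyExists = errors.New("already exists")

// isAlreadyExists stands in for the errors.IsAlreadyExists check in the diff.
func isAlreadyExists(err error) bool {
	return errors.Is(err, errAlreadyExists)
}

// ignoreIfIdempotent reports whether err should be swallowed: only when
// idempotent mode is on and the failure is "already exists".
func ignoreIfIdempotent(idempotent bool, err error) bool {
	return idempotent && isAlreadyExists(err)
}

func main() {
	wrapped := fmt.Errorf("creating federated cluster: %w", errAlreadyExists)
	fmt.Println(ignoreIfIdempotent(true, wrapped))
	fmt.Println(ignoreIfIdempotent(false, wrapped))
	fmt.Println(ignoreIfIdempotent(true, errors.New("forbidden")))
}
```

The helper name hides which error is being ignored, which is the readability concern raised in the reply above.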

xunpan · 2019-01-23T06:03:04Z

@csrwng
Thanks for your PR. Could you please explain more about your use case?

For federation, the kubefed2 join operation automatically creates the related resources needed to make the joined cluster work. To reverse the join operation, we unjoin the cluster, and the related resources should then be deleted automatically.

So my question is: in what case are the related resources not cleaned up, yet the join still needs to be forced through?

marun · 2019-01-23T13:31:32Z

@csrwng It appears that you intend to vendor fedv2 and call the Join function. I would caution you that federation is alpha software, and the join command is neither extensively tested nor provides any guarantee of interface stability. You may want to consider exposing the idempotent flag in the kubefed2 binary instead (and I would be supportive of that addition), since that is likely to be a better supported interface.

Also, is there a reason the join command shouldn't just default to idempotency?

csrwng · 2019-01-23T15:08:42Z

Hi @xunpan and @marun thanks for taking a look.
Yes, as @marun implied, we aim to invoke the join function from a controller that is part of OpenShift Hive. The use case is that when you create a new cluster through Hive, we automatically join that cluster to the Hive host. If any step in the join operation fails, we need to be able to retry the operation until it succeeds. We didn't want to re-write the same join operation inside our controller and thought it'd be better to vendor the federation code to do so.

@marun to your question as to why join isn't idempotent by default. I think it should be. However, I didn't know whether changing the way it works would be acceptable to existing users of the federation CLI. Please let me know if it's ok to make it the default and I'll rework this PR.

As for the reverse function (unjoin), we need to be able to invoke it, but only the part that affects the host cluster. When a Hive cluster is deleted, we would like to remove the local FederatedCluster, secret, and the cluster registry Cluster. However, we really don't care to do anything in the target cluster as it may not be accessible anymore and all we want to do is to delete it. Not sure if the host part can be broken out from the target part inside kubefed2.

xunpan · 2019-01-24T08:20:15Z

pkg/kubefed2/join.go

 }

 // createFederationNamespace creates the federation namespace in the cluster
 // associated with clusterClientset, if it doesn't already exist.
 func createFederationNamespace(clusterClientset client.Interface, federationNamespace,
-	joiningClusterName string, dryRun bool) (*corev1.Namespace, error) {
+	joiningClusterName string, dryRun, idempotent bool) (*corev1.Namespace, error) {
 	federationNS := &corev1.Namespace{


If idempotent is not used in the function, please do not add it to the argument list.

xunpan · 2019-01-24T08:41:18Z

I think the default behavior should be the current one rather than the idempotent one.
- If the system already has resources with the same names that were not created by federation, the command line makes the error easy to see.
- Our unjoin should clean up everything, and join should report an error if anything has not been cleaned up.
I agree with marun. If we add this flag, we'd better expose it on the command line.
However, if we enhance it, we need to make the log messages meaningful. E.g.
if a resource exists, the Infof message should say that the resource was not created by the current command-line operation but already exists in the system.

csrwng · 2019-01-24T15:48:54Z

Thank you for your review so far. I've added a commit to expose the flag in the CLI and log when items already exist.

marun · 2019-01-24T22:08:55Z

@csrwng FYI the ci job is showing a go vet failure.

csrwng · 2019-01-24T23:04:24Z

Thanks! fix pushed

marun · 2019-01-24T23:20:58Z

@xunpan I think fedv2 should follow the example of kubectl apply. join could create resources if they don't exist, and not complain if resources already exist in the desired form. An error should only be reported if creation was not successful or an existing resource was not of the desired form (this latter characteristic would require implementation).

xunpan · 2019-01-25T03:42:37Z

I think that is fine if we check whether an existing resource is not of the desired form. We should make sure all related resources are in the desired state. Otherwise, it is not good to reuse resources that are unknown to kubefed2.

csrwng · 2019-01-26T02:13:35Z

@xunpan @marun should the join then behave like an apply and not only skip, but also update existing resources with their desired state?

marun · 2019-01-27T17:04:18Z

@csrwng I think apply-like behavior should be the default. I think it would make sense to retain the existing behavior with an optional flag that returns an error if any of the resources already exist.
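The apply-like behavior discussed here can be illustrated with a minimal sketch. It assumes an in-memory `store` in place of a real clientset; `store`, `ensure`, and `errNotFound` are hypothetical names for illustration, and only the `errorOnExisting` flag name comes from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the API "not found" error.
var errNotFound = errors.New("not found")

// store is a hypothetical in-memory stand-in for a typed clientset.
type store map[string]string

func (s store) get(name string) (string, error) {
	v, ok := s[name]
	if !ok {
		return "", errNotFound
	}
	return v, nil
}

// ensure creates name with the desired spec, or brings an existing entry to
// the desired spec. With errorOnExisting set, an existing entry is an error.
func ensure(s store, name, desired string, errorOnExisting bool) (string, error) {
	existing, err := s.get(name)
	switch {
	case err == nil && errorOnExisting:
		return "", fmt.Errorf("%s already exists", name)
	case err == nil && existing == desired:
		return "unchanged", nil
	case err == nil:
		s[name] = desired
		return "updated", nil
	case errors.Is(err, errNotFound):
		s[name] = desired
		return "created", nil
	default:
		return "", err
	}
}

func main() {
	s := store{}
	fmt.Println(ensure(s, "cluster1", "v1", false)) // first join
	fmt.Println(ensure(s, "cluster1", "v1", false)) // retried join
	fmt.Println(ensure(s, "cluster1", "v2", false)) // desired state changed
	fmt.Println(ensure(s, "cluster1", "v2", true))  // strict mode
}
```

Because a repeated call converges on the same state, a caller can retry the whole operation after a partial failure.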

csrwng · 2019-01-28T14:59:50Z

Thank you @marun. I’ll submit an update for that.

csrwng · 2019-02-04T20:47:19Z

@marun @xunpan I have updated the join operation to behave like apply, added a flag for errorOnExisting, and unit tests for the modified functions.

marun · 2019-02-05T23:24:07Z

pkg/kubefed2/join.go

+		case err == nil && errorOnExisting:
+			return nil, fmt.Errorf("secret %s already exists in host cluster", secretName)
+		case err == nil:
+			existingSecret.Data = v1Secret.Data


I'm a bit worried at the prospect of overwriting a secret. I guess in this case the user is providing the name of the secret, but is that sufficient?

@marun sorry I'm just getting back to this PR today. I still think that overwriting the secret is the right thing to do since the service account on the target cluster could have been (re-)created and you have a different token. To mitigate risk of overwriting stuff we could merge the Data map, but not sure that is a better result.

To be clear, I'm worried because join can be configured with an arbitrary secret name, and a secret is a generic type rather than federation-specific. It would be entirely possible, then, for join to unintentionally override an arbitrary secret. Maybe an existing secret should result in an error unless the name is the default (i.e. not an arbitrary name provided by the user)?

That sounds fine. I'll change it. I should hopefully submit an updated (hopefully cleaner) PR today.
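The guard agreed on in this exchange could be sketched as follows. This is an illustration only: `checkExistingSecret` and the `userProvidedName` parameter are hypothetical, since the thread does not show how the final code distinguishes a default secret name from a user-supplied one.

```go
package main

import "fmt"

// checkExistingSecret decides what to do when a secret with the requested
// name already exists in the host cluster. It returns nil when overwriting
// the secret's data is acceptable.
//
// userProvidedName is a hypothetical parameter: true when the secret name
// was passed in by the user rather than generated by join.
func checkExistingSecret(secretName string, userProvidedName, errorOnExisting bool) error {
	if errorOnExisting || userProvidedName {
		// An arbitrary, user-chosen name may belong to an unrelated secret,
		// so refuse to overwrite it.
		return fmt.Errorf("secret %s already exists in host cluster", secretName)
	}
	return nil
}

func main() {
	fmt.Println(checkExistingSecret("generated-name", false, false))
	fmt.Println(checkExistingSecret("my-secret", true, false))
	fmt.Println(checkExistingSecret("generated-name", false, true))
}
```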

marun · 2019-02-05T23:26:16Z

@csrwng My apologies for limiting my comment to the testing. I think the implementation is sound. There's more repetition than I'd like (since the creation/update logic is mostly common across types) but I think that is an optimization that can be pursued separately if at all.

csrwng · 2019-02-12T19:59:57Z

@marun finally made the changes to the tests and squashed, ptal

csrwng · 2019-02-18T17:11:20Z

@marun bump

marun · 2019-02-19T06:21:24Z

pkg/kubefed2/join.go

@@ -367,13 +373,27 @@ func registerCluster(crClientset *crclient.Clientset, clusterNamespace, host, jo
 		return cluster, nil
 	}

-	return crClientset.ClusterregistryV1alpha1().Clusters(clusterNamespace).Create(cluster)
+	existingCluster, err := crClientset.ClusterregistryV1alpha1().Clusters(clusterNamespace).Get(cluster.Name, metav1.GetOptions{})


(No action required) There seems to be a considerable amount of repetition across all types in ensuring the desired form. Is there room for creating a parameterizable helper instead of just duplicating the code? There is already prior art in the tree for using controller-runtime's generic client, so it's not necessary to use a strongly-typed client. This would allow the testing, too, to be parameterized rather than repetitively defined.
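One possible shape for such a helper is sketched below. It does not use controller-runtime; instead it assumes the per-type operations are passed in as closures, and it relies on Go type parameters, which postdate this PR. `resourceOps`, `ensureResource`, `item`, and `mapOps` are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the API "not found" error.
var errNotFound = errors.New("not found")

// resourceOps bundles the per-type operations. In real code these closures
// would wrap a typed or generic client (an assumption of this sketch).
type resourceOps[T any] struct {
	get    func(name string) (T, error)
	create func(obj T) (T, error)
	update func(existing, desired T) (T, error)
}

// ensureResource applies the same get/create/update decision for any type T.
func ensureResource[T any](ops resourceOps[T], name string, desired T, errorOnExisting bool) (T, error) {
	var zero T
	existing, err := ops.get(name)
	switch {
	case err == nil && errorOnExisting:
		return zero, fmt.Errorf("%s already exists", name)
	case err == nil:
		return ops.update(existing, desired)
	case errors.Is(err, errNotFound):
		return ops.create(desired)
	default:
		return zero, err
	}
}

// item is a toy resource; mapOps backs resourceOps with a plain map.
type item struct{ name, spec string }

func mapOps(m map[string]item) resourceOps[item] {
	return resourceOps[item]{
		get: func(n string) (item, error) {
			v, ok := m[n]
			if !ok {
				return item{}, errNotFound
			}
			return v, nil
		},
		create: func(obj item) (item, error) { m[obj.name] = obj; return obj, nil },
		update: func(_, desired item) (item, error) { m[desired.name] = desired; return desired, nil },
	}
}

func main() {
	// The same table of cases can be reused for every resource type.
	cases := []struct {
		label           string
		initial         map[string]item
		errorOnExisting bool
	}{
		{"missing resource", map[string]item{}, false},
		{"existing resource", map[string]item{"c1": {"c1", "old"}}, false},
		{"existing resource, strict", map[string]item{"c1": {"c1", "old"}}, true},
	}
	for _, tc := range cases {
		got, err := ensureResource(mapOps(tc.initial), "c1", item{"c1", "new"}, tc.errorOnExisting)
		fmt.Printf("%s: got=%+v err=%v\n", tc.label, got, err)
	}
}
```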

marun · 2019-02-19T06:59:26Z

pkg/kubefed2/kubefed2_suite_test.go

+	testenv = &test.TestEnvironment{CRDs: crds}
+
+	var err error
+	config, err = testenv.Start()


I'm afraid I'm a hard no on using kubebuilder's test library in this repo. It takes ~60s to execute this suite on a fast machine due to the costly setup involved. I would prefer that the tests migrate to test/e2e and target the framework in use there. That will allow sharing of both fixture execution and maintenance.

The tests in test/e2e can be run in 'managed' (similar to kb's test library) or 'unmanaged' (targeting a deployed federation) modes. 'managed' mode was the starting point for federation testing, but its utility is diminished by how cheap it is to deploy a full federation with minikube or kind. In unmanaged mode - enabled by providing a -kubeconfig argument - tests run against an actual federation (via the helm chart or script deployment). Past the initial deployment the cost of fixture is nearly zero so tests can be re-run much more quickly than against managed fixture, with the added benefit of having a cluster that can be accessed and manipulated during test debugging by tools like kubectl.

@marun if ok with you I’ll just remove the tests. They test non-public functions in this package and don’t make sense to move to an e2e type test. They are really meant as unit tests which is why using fake clients made the most sense to me initially. In e2e maybe we can increase coverage for the join operation in a different PR. Sound ok?

I'm ok with that.

I don't think a test that requires a ton of faking or a kube api server makes for a good unit test. One is expensive to write and maintain, the other to run. That's why we've purposely blurred the lines between integration and e2e in our e2e package, to reduce the cost of both maintaining and running tests that require api interaction. If you wanted to expose those non-public functions to enable testing, I'd be fine with that too.

Thanks, done.

marun · 2019-02-19T20:01:05Z

pkg/kubefed2/join.go

+	default:
+		fedCluster, err = fedClientset.CoreV1alpha1().FederatedClusters(federationNamespace).Create(fedCluster)
+		if err != nil {
+			glog.V(2).Infof("Could not created federated cluster %s due to %v", fedCluster.Name, err)


nit: s/created/create/

Thanks, fixed.

marun · 2019-02-19T20:02:00Z

@xunpan @gyliu513 This lgtm. Would one of you be able to review and merge once you're satisfied?

Adds an additional flag 'error-on-existing' that causes the join command to fail when existing resources are found.

marun · 2019-02-20T16:51:11Z

@shashidharatd PTAL when you have a chance.

shashidharatd · 2019-02-22T02:40:35Z

This LGTM. It has already undergone elaborate review and is as expected by other reviewers. Thanks @csrwng for doing this!
/lgtm

xunpan · 2019-02-22T03:31:51Z

pkg/kubefed2/join.go

+			glog.V(2).Infof("Could not create cluster role binding for service account: %s in joining cluster: %s due to: %v",
+				saName, clusterName, err)
+			return err
+		}
 	}


To me, this can be simplified with:

	case err == nil:
		err = clientset.RbacV1().ClusterRoleBindings().Delete(existingBinding.Name, &metav1.DeleteOptions{})
		...
		fallthrough
	default:
		_, err = clientset.RbacV1().ClusterRoleBindings().Create(binding)
		...

Reasons:
- more branches in testing
- join is not a frequent operation, so update vs. delete and re-create is not critical.

@marun

@csrwng Is there a reason you wanted to update an existing binding if possible instead of always deleting?

@marun my thinking was that, in the most common case, an existing rolebinding would point to the same roleref, so an update would be just fine. The delete/recreate, in my mind, only covers an edge case.
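The trade-off in this exchange can be made concrete with a small sketch. In Kubernetes RBAC the roleRef of an existing binding cannot be changed, so a different roleRef requires deleting and re-creating the binding, while a matching roleRef allows a plain update. The `binding` type and `reconcileBinding` function below are hypothetical, trimmed-down stand-ins, and a map takes the place of the clientset.

```go
package main

import "fmt"

// binding is a hypothetical, trimmed-down stand-in for a ClusterRoleBinding.
type binding struct {
	name     string
	roleRef  string
	subjects []string
}

// reconcileBinding brings the stored binding to the desired form and
// returns the action it took.
func reconcileBinding(store map[string]binding, desired binding) string {
	existing, found := store[desired.name]
	switch {
	case !found:
		store[desired.name] = desired
		return "created"
	case existing.roleRef != desired.roleRef:
		// roleRef is immutable, so the edge case needs delete + create.
		delete(store, desired.name)
		store[desired.name] = desired
		return "recreated"
	default:
		// Common case: same roleRef, so updating the subjects is enough.
		existing.subjects = desired.subjects
		store[desired.name] = existing
		return "updated"
	}
}

func main() {
	store := map[string]binding{}
	desired := binding{name: "join-binding", roleRef: "role-a", subjects: []string{"sa-1"}}
	fmt.Println(reconcileBinding(store, desired))

	desired.subjects = []string{"sa-2"}
	fmt.Println(reconcileBinding(store, desired))

	desired.roleRef = "role-b"
	fmt.Println(reconcileBinding(store, desired))
}
```

Always deleting first, as suggested in the review comment, would remove the `default` branch at the cost of briefly having no binding at all.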

csrwng · 2019-02-25T15:56:38Z

@marun btw, I think I need your approval again

marun · 2019-02-25T19:08:46Z

/approve

k8s-ci-robot · 2019-02-25T19:08:57Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, marun

Needs approval from an approver in each of these files:

~~OWNERS~~ [marun]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 22, 2019

k8s-ci-robot requested review from irfanurrehman and marun January 22, 2019 19:35

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 22, 2019

csrwng force-pushed the idempotent_join branch from 17f9903 to ab1eb86 Compare January 22, 2019 19:43

k8s-ci-robot requested a review from xunpan January 23, 2019 02:14

gyliu513 reviewed Jan 23, 2019

xunpan suggested changes Jan 24, 2019

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 24, 2019

csrwng force-pushed the idempotent_join branch from 9fd1fc9 to 923ae27 Compare January 24, 2019 15:52

csrwng force-pushed the idempotent_join branch from 923ae27 to 571ae4d Compare January 24, 2019 23:04

csrwng force-pushed the idempotent_join branch from 571ae4d to fcdb652 Compare February 4, 2019 20:45

k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 4, 2019

csrwng force-pushed the idempotent_join branch 2 times, most recently from 85ef9c3 to a7f8968 Compare February 4, 2019 20:56

marun reviewed Feb 5, 2019

csrwng force-pushed the idempotent_join branch from a7f8968 to c43129e Compare February 12, 2019 19:58

csrwng force-pushed the idempotent_join branch 2 times, most recently from 2c2f8fe to a6ab6a5 Compare February 14, 2019 16:20

marun suggested changes Feb 19, 2019

csrwng force-pushed the idempotent_join branch from a6ab6a5 to 606eda4 Compare February 19, 2019 19:40

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 19, 2019

marun reviewed Feb 19, 2019

marun approved these changes Feb 19, 2019

Make join behave like an apply

145f95e

Adds an additional flag 'error-on-existing' that causes the join command to fail when existing resources are found.

csrwng force-pushed the idempotent_join branch from 606eda4 to 145f95e Compare February 19, 2019 20:58

k8s-ci-robot assigned shashidharatd Feb 22, 2019

k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 22, 2019

xunpan reviewed Feb 22, 2019

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 25, 2019

k8s-ci-robot merged commit 6b40588 into kubernetes-retired:master Feb 25, 2019

This was referenced Apr 10, 2019

kubefed2: some suggestion about kubefed2 command #513

Closed

It should not be possible to join a cluster multiple times #749

Closed

kubefed2 enable succeeds if federation of the type already enabled #744

Merged

xunpan mentioned this pull request Apr 11, 2019

helper functions for idempotent operations #753

Closed
