Allow join command to run in idempotent mode #555
Allow join command to run in idempotent mode #555
Conversation
17f9903
to
ab1eb86
Compare
pkg/kubefed2/join.go
Outdated
@@ -374,13 +378,17 @@ func createFederatedCluster(fedClientset *fedclient.Clientset, joiningClusterNam | |||
return fedCluster, nil | |||
} | |||
|
|||
return fedClientset.CoreV1alpha1().FederatedClusters(federationNamespace).Create(fedCluster) | |||
fedCluster, err := fedClientset.CoreV1alpha1().FederatedClusters(federationNamespace).Create(fedCluster) | |||
if idempotent && errors.IsAlreadyExists(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about create a function named as IsIdempotentMode
to replace idempotent && errors.IsAlreadyExists(err)
?
/cc @xunpan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like
if ignoreIfIdempotent(idempotent, err) {
...
}
?
imho, I would think that it would obscure the fact that we're only ignoring the error when the item already exists. But I can make the change if you feel strongly about it.
@csrwng For federation, kubefed2 join operation creates related resources automatically to make joined cluster work. When reverse the join operation, we unjoin the cluster. The related resource should be deleted automatically. So my question is, what is the case that the related resources is not cleaned up but still need to installation forcely? |
@csrwng It appears that you intend to vendor fedv2 and call the Also, is there a reason the |
Hi @xunpan and @marun thanks for taking a look. @marun to your question as to why join isn't idempotent by default. I think it should be. However, I didn't know whether changing the way it works would be acceptable to existing users of the federation CLI. Please let me know if it's ok to make it the default and I'll rework this PR. As for the reverse function (unjoin), we need to be able to invoke it, but only the part that affects the host cluster. When a Hive cluster is deleted, we would like to remove the local FederatedCluster, secret, and the cluster registry Cluster. However, we really don't care to do anything in the target cluster as it may not be accessible anymore and all we want to do is to delete it. Not sure if the host part can be broken out from the target part inside kubefed2. |
pkg/kubefed2/join.go
Outdated
} | ||
|
||
// createFederationNamespace creates the federation namespace in the cluster | ||
// associated with clusterClientset, if it doesn't already exist. | ||
func createFederationNamespace(clusterClientset client.Interface, federationNamespace, | ||
joiningClusterName string, dryRun bool) (*corev1.Namespace, error) { | ||
joiningClusterName string, dryRun, idempotent bool) (*corev1.Namespace, error) { | ||
federationNS := &corev1.Namespace{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is not used in the function, please do not add idempotent
in argument list.
|
Thank you for your review so far. I've added a commit to expose the flag in the CLI and log when items already exist. |
9fd1fc9
to
923ae27
Compare
@csrwng FYI the ci job is showing a go vet failure. |
923ae27
to
571ae4d
Compare
Thanks! fix pushed |
@xunpan I think fedv2 should follow the example of |
I think it is fine if we checking |
@csrwng I think apply-like behavior should be the default. I think it would make sense to retain the existing behavior with an optional flag that returns an error if any of the resources already exist. |
Thank you @marun. I’ll submit an update for that. |
571ae4d
to
fcdb652
Compare
85ef9c3
to
a7f8968
Compare
pkg/kubefed2/join.go
Outdated
case err == nil && errorOnExisting: | ||
return nil, fmt.Errorf("secret %s already exists in host cluster", secretName) | ||
case err == nil: | ||
existingSecret.Data = v1Secret.Data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried at the prospect of overwriting a secret. I guess in this case the user is providing the name of the secret, but is that sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marun sorry I'm just getting back to this PR today. I still think that overwriting the secret is the right thing to do since the service account on the target cluster could have been (re-)created and you have a different token. To mitigate risk of overwriting stuff we could merge the Data map, but not sure that is a better result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I'm worried because join can be configured with an arbitrary secret name, and a secret is a generic type rather than federation-specific. It would be entirely possible, then, for join to unintentionally override an arbitrary secret. Maybe an existing secret should result in an error unless the name is the default (i.e. not an arbitrary name provided by the user)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds fine. I'll change it. I should hopefully submit an updated (hopefully cleaner) PR today.
@csrwng My apologies for limiting my comment to the testing. I think the implementation is sound. There's more repetition than I'd like (since the creation/update logic is mostly common across types) but I think that is an optimization that can be pursued separately if at all. |
a7f8968
to
c43129e
Compare
@marun finally made the changes to the tests and squashed, ptal |
2c2f8fe
to
a6ab6a5
Compare
@marun bump |
@@ -367,13 +373,27 @@ func registerCluster(crClientset *crclient.Clientset, clusterNamespace, host, jo | |||
return cluster, nil | |||
} | |||
|
|||
return crClientset.ClusterregistryV1alpha1().Clusters(clusterNamespace).Create(cluster) | |||
existingCluster, err := crClientset.ClusterregistryV1alpha1().Clusters(clusterNamespace).Get(cluster.Name, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(No action required) There seems to be a considerable amount of repetition across all types in ensuring the desired form. Is there room for creating a parametizable helper instead of just duplicating the code? There is already prior art in the tree for using controller-runtime's generic client, so it's not necessary to use a strongly-typed client. This would allow the testing, too, to be parametized rather than repetitively defined.
pkg/kubefed2/kubefed2_suite_test.go
Outdated
testenv = &test.TestEnvironment{CRDs: crds} | ||
|
||
var err error | ||
config, err = testenv.Start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I'm a hard no on using kubebuilder's test library in this repo. It takes ~60s to execute this suite on a fast machine due to the costly setup involved. I would prefer that the tests migrate to test/e2e and target the framework in use there. That will allow sharing of both fixture execution and maintenance.
The tests in test/e2e
can be run in 'managed' (similar to kb's test library) or 'unmanaged' (targeting a deployed federation) modes. 'managed' mode was the starting point for federation testing, but it's utility is diminished by how cheap it is to deploy a full federation with minikube or kind. In unmanaged mode - enabled by providing a -kubeconfig
argument - tests run against an actual federation (via the helm chart or script deployment). Past the initial deployment the cost of fixture is nearly zero so tests can be re-run much more quickly than against managed fixture, with the added benefit of having a cluster that can be accessed and manipulated during test debugging by tools like kubectl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marun if ok with you I’ll just remove the tests. They test non-public functions in this package and don’t make sense to move to an e2e type test. They are really meant as unit tests which is why using fake clients made the most sense to me initially. In e2e maybe we can increase coverage for the join operation in a different PR. Sound ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with that.
I don't think a test that requires a ton of faking or a kube api server makes for a good unit test. One is expensive to write and maintain, the other to run. That's why we've purposely blurred the lines between integration and e2e in our e2e package, to reduce the cost of both maintaining and running tests that require api interaction. If you wanted to expose those non-public functions to enable testing, I'd be fine with that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
a6ab6a5
to
606eda4
Compare
pkg/kubefed2/join.go
Outdated
default: | ||
fedCluster, err = fedClientset.CoreV1alpha1().FederatedClusters(federationNamespace).Create(fedCluster) | ||
if err != nil { | ||
glog.V(2).Infof("Could not created federated cluster %s due to %v", fedCluster.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/created/create/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
Adds an additional flag 'error-on-existing' that causes the join command to fail when existing resources are found.
606eda4
to
145f95e
Compare
@shashidharatd PTAL when you have a chance. |
This LGTM has already undergone elaborate review and is as expected by other reviewers. Thanks @csrwng for doing this ! |
glog.V(2).Infof("Could not create cluster role binding for service account: %s in joining cluster: %s due to: %v", | ||
saName, clusterName, err) | ||
return err | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this can be simplified with:
case err == nil:
err = clientset.RbacV1().ClusterRoleBindings().Delete(existingBinding.Name, &metav1.DeleteOptions{})
...
fallthrough
default:
_, err = clientset.RbacV1().ClusterRoleBindings().Create(binding)
...
Reasons:
- more branches in testing
join
is not frequent operation. update vs. delete&update is not critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@csrwng Is there a reason you wanted to update an existing binding if possible instead of always deleting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marun my thinking was that the most common use case is that if there was an existing rolebinding that it would point to the same roleref, which means an update would be just ok. The delete/recreate in my mind is just to cover an edge case.
@marun btw, I think I need your approval again |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: csrwng, marun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Introduces a flag to the JoinCluster function that specifies whether to run in idempotent mode. If true, creates that fail because the resource already exists will not fail the entire operation.