Skip to content
This repository has been archived by the owner on Apr 25, 2023. It is now read-only.

Allow join command to run in idempotent mode #555

Merged
merged 1 commit into from Feb 25, 2019

Conversation

csrwng
Copy link
Contributor

@csrwng csrwng commented Jan 22, 2019

Introduces a flag to the JoinCluster function that specifies whether to run in idempotent mode. If true, creates that fail because the resource already exists will not fail the entire operation.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 22, 2019
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 22, 2019
@@ -374,13 +378,17 @@ func createFederatedCluster(fedClientset *fedclient.Clientset, joiningClusterNam
return fedCluster, nil
}

return fedClientset.CoreV1alpha1().FederatedClusters(federationNamespace).Create(fedCluster)
fedCluster, err := fedClientset.CoreV1alpha1().FederatedClusters(federationNamespace).Create(fedCluster)
if idempotent && errors.IsAlreadyExists(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about create a function named as IsIdempotentMode to replace idempotent && errors.IsAlreadyExists(err)?

/cc @xunpan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

if ignoreIfIdempotent(idempotent, err) {
...
}

?
imho, I would think that it would obscure the fact that we're only ignoring the error when the item already exists. But I can make the change if you feel strongly about it.

@xunpan
Copy link
Contributor

xunpan commented Jan 23, 2019

@csrwng
Thanks for your PR. Could you please explain more about your use case?

For federation, kubefed2 join operation creates related resources automatically to make joined cluster work. When reverse the join operation, we unjoin the cluster. The related resource should be deleted automatically.

So my question is, what is the case that the related resources is not cleaned up but still need to installation forcely?

@marun
Copy link
Contributor

marun commented Jan 23, 2019

@csrwng It appears that you intend to vendor fedv2 and call the Join function. I would caution you that federation is alpha software, and the join command is neither extensively tested nor provides any guarantee of interface stability. You may want to consider exposing the idempotent flag in the kubefed2 binary instead (and I would be supportive of that addition), since that is likely to be a better supported interface.

Also, is there a reason thejoin command shouldn't just default to idempotency?

@csrwng
Copy link
Contributor Author

csrwng commented Jan 23, 2019

Hi @xunpan and @marun thanks for taking a look.
Yes, as @marun implied, we aim to invoke the join function from a controller that is part of OpenShift Hive. The use case is that when you create a new cluster through Hive, we automatically join that cluster to the Hive host. If any step in the join operation fails, we need to be able to retry the operation until it succeeds. We didn't want to re-write the same join operation inside our controller and thought it'd be better to vendor the federation code to do so.

@marun to your question as to why join isn't idempotent by default. I think it should be. However, I didn't know whether changing the way it works would be acceptable to existing users of the federation CLI. Please let me know if it's ok to make it the default and I'll rework this PR.

As for the reverse function (unjoin), we need to be able to invoke it, but only the part that affects the host cluster. When a Hive cluster is deleted, we would like to remove the local FederatedCluster, secret, and the cluster registry Cluster. However, we really don't care to do anything in the target cluster as it may not be accessible anymore and all we want to do is to delete it. Not sure if the host part can be broken out from the target part inside kubefed2.

}

// createFederationNamespace creates the federation namespace in the cluster
// associated with clusterClientset, if it doesn't already exist.
func createFederationNamespace(clusterClientset client.Interface, federationNamespace,
joiningClusterName string, dryRun bool) (*corev1.Namespace, error) {
joiningClusterName string, dryRun, idempotent bool) (*corev1.Namespace, error) {
federationNS := &corev1.Namespace{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not used in the function, please do not add idempotent in argument list.

@xunpan
Copy link
Contributor

xunpan commented Jan 24, 2019

  1. I think default behavior should be current one instead of idempotent one.

    • if system exists some same name resources but not created by federation, it is easy to get error from command line.
    • our unjoin should clean all things and join should report error if anything is not cleaned.
  2. I agree with marun. If we add this flag, we'd better export it in command line.
    However, if we enhance it, we need to make log message meaningful. E.g.
    If a resource exists, Infof message should tell that the resource is not created by current command line operation but exists in the system already.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 24, 2019
@csrwng
Copy link
Contributor Author

csrwng commented Jan 24, 2019

Thank you for your review so far. I've added a commit to expose the flag in the CLI and log when items already exist.

@marun
Copy link
Contributor

marun commented Jan 24, 2019

@csrwng FYI the ci job is showing a go vet failure.

@csrwng
Copy link
Contributor Author

csrwng commented Jan 24, 2019

Thanks! fix pushed

@marun
Copy link
Contributor

marun commented Jan 24, 2019

@xunpan I think fedv2 should follow the example of kubectl apply. join could create resources if they don't exist, and not complain if resources already exist in the desired form. An error should only be reported if creation was not successful or an existing resource was not of the desired form (this latter characteristic would require implementation).

@xunpan
Copy link
Contributor

xunpan commented Jan 25, 2019

I think it is fine if we checking an existing resource was not of the desired. We should make sure all related resources in desired status. Or else, it is not good to reuse any resources that is unknown from kubefed2

@csrwng
Copy link
Contributor Author

csrwng commented Jan 26, 2019

@xunpan @marun should the join then behave like an apply and not only skip, but also update existing resources with their desired state?

@marun
Copy link
Contributor

marun commented Jan 27, 2019

@csrwng I think apply-like behavior should be the default. I think it would make sense to retain the existing behavior with an optional flag that returns an error if any of the resources already exist.

@csrwng
Copy link
Contributor Author

csrwng commented Jan 28, 2019

Thank you @marun. I’ll submit an update for that.

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 4, 2019
@csrwng
Copy link
Contributor Author

csrwng commented Feb 4, 2019

@marun @xunpan I have updated the join operation to behave like apply, added a flag for errorOnExisting, and unit tests for the modified functions.

@csrwng csrwng force-pushed the idempotent_join branch 2 times, most recently from 85ef9c3 to a7f8968 Compare February 4, 2019 20:56
case err == nil && errorOnExisting:
return nil, fmt.Errorf("secret %s already exists in host cluster", secretName)
case err == nil:
existingSecret.Data = v1Secret.Data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried at the prospect of overwriting a secret. I guess in this case the user is providing the name of the secret, but is that sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marun sorry I'm just getting back to this PR today. I still think that overwriting the secret is the right thing to do since the service account on the target cluster could have been (re-)created and you have a different token. To mitigate risk of overwriting stuff we could merge the Data map, but not sure that is a better result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I'm worried because join can be configured with an arbitrary secret name, and a secret is a generic type rather than federation-specific. It would be entirely possible, then, for join to unintentionally override an arbitrary secret. Maybe an existing secret should result in an error unless the name is the default (i.e. not an arbitrary name provided by the user)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds fine. I'll change it. I should hopefully submit an updated (hopefully cleaner) PR today.

@marun
Copy link
Contributor

marun commented Feb 5, 2019

@csrwng My apologies for limiting my comment to the testing. I think the implementation is sound. There's more repetition than I'd like (since the creation/update logic is mostly common across types) but I think that is an optimization that can be pursued separately if at all.

@csrwng
Copy link
Contributor Author

csrwng commented Feb 12, 2019

@marun finally made the changes to the tests and squashed, ptal

@csrwng csrwng force-pushed the idempotent_join branch 2 times, most recently from 2c2f8fe to a6ab6a5 Compare February 14, 2019 16:20
@csrwng
Copy link
Contributor Author

csrwng commented Feb 18, 2019

@marun bump

@@ -367,13 +373,27 @@ func registerCluster(crClientset *crclient.Clientset, clusterNamespace, host, jo
return cluster, nil
}

return crClientset.ClusterregistryV1alpha1().Clusters(clusterNamespace).Create(cluster)
existingCluster, err := crClientset.ClusterregistryV1alpha1().Clusters(clusterNamespace).Get(cluster.Name, metav1.GetOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(No action required) There seems to be a considerable amount of repetition across all types in ensuring the desired form. Is there room for creating a parametizable helper instead of just duplicating the code? There is already prior art in the tree for using controller-runtime's generic client, so it's not necessary to use a strongly-typed client. This would allow the testing, too, to be parametized rather than repetitively defined.

testenv = &test.TestEnvironment{CRDs: crds}

var err error
config, err = testenv.Start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid I'm a hard no on using kubebuilder's test library in this repo. It takes ~60s to execute this suite on a fast machine due to the costly setup involved. I would prefer that the tests migrate to test/e2e and target the framework in use there. That will allow sharing of both fixture execution and maintenance.

The tests in test/e2e can be run in 'managed' (similar to kb's test library) or 'unmanaged' (targeting a deployed federation) modes. 'managed' mode was the starting point for federation testing, but it's utility is diminished by how cheap it is to deploy a full federation with minikube or kind. In unmanaged mode - enabled by providing a -kubeconfig argument - tests run against an actual federation (via the helm chart or script deployment). Past the initial deployment the cost of fixture is nearly zero so tests can be re-run much more quickly than against managed fixture, with the added benefit of having a cluster that can be accessed and manipulated during test debugging by tools like kubectl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marun if ok with you I’ll just remove the tests. They test non-public functions in this package and don’t make sense to move to an e2e type test. They are really meant as unit tests which is why using fake clients made the most sense to me initially. In e2e maybe we can increase coverage for the join operation in a different PR. Sound ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with that.

I don't think a test that requires a ton of faking or a kube api server makes for a good unit test. One is expensive to write and maintain, the other to run. That's why we've purposely blurred the lines between integration and e2e in our e2e package, to reduce the cost of both maintaining and running tests that require api interaction. If you wanted to expose those non-public functions to enable testing, I'd be fine with that too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 19, 2019
default:
fedCluster, err = fedClientset.CoreV1alpha1().FederatedClusters(federationNamespace).Create(fedCluster)
if err != nil {
glog.V(2).Infof("Could not created federated cluster %s due to %v", fedCluster.Name, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/created/create/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

@marun
Copy link
Contributor

marun commented Feb 19, 2019

@xunpan @gyliu513 This lgtm. Would one of you be able to review and merge once you're satisfied?

Adds an additional flag 'error-on-existing' that causes the join command
to fail when existing resources are found.
@marun
Copy link
Contributor

marun commented Feb 20, 2019

@shashidharatd PTAL when you have a chance.

@shashidharatd
Copy link
Contributor

This LGTM has already undergone elaborate review and is as expected by other reviewers. Thanks @csrwng for doing this !
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 22, 2019
glog.V(2).Infof("Could not create cluster role binding for service account: %s in joining cluster: %s due to: %v",
saName, clusterName, err)
return err
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this can be simplified with:

case err == nil:
    err = clientset.RbacV1().ClusterRoleBindings().Delete(existingBinding.Name, &metav1.DeleteOptions{})
    ...
    fallthrough
default:
    _, err = clientset.RbacV1().ClusterRoleBindings().Create(binding)
    ...

Reasons:

  1. more branches in testing
  2. join is not frequent operation. update vs. delete&update is not critical.

@marun

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csrwng Is there a reason you wanted to update an existing binding if possible instead of always deleting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marun my thinking was that the most common use case is that if there was an existing rolebinding that it would point to the same roleref, which means an update would be just ok. The delete/recreate in my mind is just to cover an edge case.

@csrwng
Copy link
Contributor Author

csrwng commented Feb 25, 2019

@marun btw, I think I need your approval again

@marun
Copy link
Contributor

marun commented Feb 25, 2019

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng, marun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 25, 2019
@k8s-ci-robot k8s-ci-robot merged commit 6b40588 into kubernetes-retired:master Feb 25, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants