
uninstall on missing install directory #746

Closed
akostadinov opened this issue Nov 28, 2018 · 21 comments

@akostadinov

When a user installs a cluster but then deletes the directory created by the installer, there is no easy way to remove the cluster.

I think all of the necessary metadata should already exist inside the cluster, so it should be possible for the user to uninstall a cluster just by pointing the installer at the target cluster.

Version

7e7c26f

@dgoodwin
Contributor

If you clone and build hiveutil from https://github.com/openshift/hive (see the bottom), you can then scrub the AWS resources by tags; this is the same code the installer uses when you still have your metadata.json.

It would be nice for openshift-install to expose this functionality in the event you've lost your metadata, though.
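
To get a sense of what such a tag-based scrub has to touch, the resources carrying the cluster tag can be listed with the AWS CLI. This is only a sketch of the same tag query, not hive's implementation; the cluster name, region, and tag below are placeholders borrowed from the examples later in this thread:

aws resourcegroupstaggingapi get-resources \
    --region us-east-1 \
    --tag-filters Key=kubernetes.io/cluster/wking,Values=owned \
    --query 'ResourceTagMappingList[].ResourceARN'

Note that not every AWS service exposes its tags through this API, so the list may be incomplete.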

@akostadinov
Author

It would be nice for openshift-install to expose this functionality in the event you've lost your metadata, though.

Exactly. Cleaning up can be as user-friendly as the installation, IMO.

@wking
Member

wking commented Nov 28, 2018

Pushing the metadata into the cluster would address the "I've lost my metadata.json but kept my kubeconfig" use case. Is that a thing? I'd expect folks blowing away their metadata.json would have done so with rm -rf "${INSTALL_DIR}", which would have removed their kubeconfig as well. Or are folks copying the kubeconfig somewhere safe first (but not copying their metadata.json)? Or did you want to push it into the cluster and allow unauthenticated clients to retrieve it?

@dgoodwin
Contributor

That would just leave the "my cluster is also broken" use case, in which case ideally it would be awesome to have something like openshift-install destroy cluster --platform=aws --uuid=clusteruuid

@wking
Member

wking commented Nov 28, 2018

That would just leave the "my cluster is also broken" use case...

If we're addressing that, why bother with pushing metadata.json into the cluster? Just use this approach regardless of whether the cluster is alive?

... in which case ideally it would be awesome to have something like openshift-install destroy cluster --platform=aws --uuid=clusteruuid

You also need to know the region (although we can assume the user has that configured in ~/.aws/config or the other usual channels). And you currently need to know the name as well, although once we pivot kubernetes.io/cluster/... to use the UUID you'll just need the region and UUID on AWS. You'll still need the cluster name and libvirt URI (but not the UUID) on libvirt. And it's not all that far from there to:

openshift-install destroy cluster --metadata='{"platform": "libvirt", "clusterName": "wking", "uri": "qemu+tcp://192.168.122.1/system"}'

and:

openshift-install destroy cluster --metadata='{"platform": "aws", "region": "us-east-1", "clusterID": "fb038bc9-b005-4fc8-996e-0d4968595937"}'

You can already get pretty close to that with:

echo '{"clusterName": "wking", "aws": {"region": "us-east-1", "identifier": [{"tectonicClusterID": "fb038bc9-b005-4fc8-996e-0d4968595937"}, {"kubernetes.io/cluster/wking": "owned"}]}}' >metadata.json
openshift-install destroy cluster

We'd just need to add the option and simplify the metadata.json layout.
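
For readability, that one-liner writes the equivalent of the following (same fields as above; the exact metadata.json schema may of course change between installer releases):

{
  "clusterName": "wking",
  "aws": {
    "region": "us-east-1",
    "identifier": [
      {"tectonicClusterID": "fb038bc9-b005-4fc8-996e-0d4968595937"},
      {"kubernetes.io/cluster/wking": "owned"}
    ]
  }
}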

CC @abhinavdahiya

@dgoodwin
Contributor

Indeed that is close, it's just not great UX. I'm sure we can live with it internally, but it's not the best foot forward to show a customer when they inevitably look to do this.

@wking
Member

wking commented Nov 28, 2018

Indeed that is close, it's just not great UX.

I'm open to UX improvements, but aside from the platform string, the remaining information needed is fundamentally different for each platform. Did you want per-platform subcommands with positional arguments?

openshift-install destroy cluster aws CLUSTER_ID [REGION]

(pulling the region from the usual places if unspecified),

openshift-install destroy cluster libvirt CLUSTER_NAME [URI]

(pulling the URI from LIBVIRT_DEFAULT_URI if URI is unset), etc.?
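
For reference, the values those "usual places" would provide can be checked directly; a quick sketch (the AWS CLI call reads the configured profile, and neither command is installer behavior):

aws configure get region      # region configured in ~/.aws/config for the active profile
echo "$LIBVIRT_DEFAULT_URI"   # libvirt URI the tooling would fall back to, if set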

@jianlinliu
Contributor

One more question: if the user has even lost the CLUSTER_ID, how do they proceed with the destroy? Search for it in the AWS instance tags? Is there any other way to get the cluster ID, such as an oc command?

Today I hit another UX issue.

  1. Create a cluster with 'qe-jialiu' as OPENSHIFT_INSTALL_CLUSTER_NAME, using --dir ./test1.
  2. Try to create another cluster with --dir ./test2, but with the same OPENSHIFT_INSTALL_CLUSTER_NAME setting.
  3. Step 2 fails, saying that a 'qe-jialiu' IAM resource already exists.
  4. Try to clean up the cluster with --dir ./test2.
  5. Find that cluster 1 (installed in step 1) is also broken; it seems cluster 1's Route 53 records were removed along with cluster 2's.
    Is it possible to tie all of a cluster's resources together via some unique ID, so that destroying one cluster does not interfere with another?

@jianlinliu
Contributor

One more question: if the user has even lost the CLUSTER_ID, how do they proceed with the destroy? Search for it in the AWS instance tags? Is there any other way to get the cluster ID, such as an oc command?

It seems I could run "oc get machineset -n openshift-cluster-api -o yaml" to get the tectonicClusterID, though I am not sure whether that is the correct way.
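
To avoid scrolling through the full YAML, the same lookup can be narrowed down, for example (assuming the MachineSets in that namespace carry the tectonicClusterID tag, as above):

oc get machineset -n openshift-cluster-api -o yaml | grep tectonicClusterID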

@wking
Member

wking commented Dec 18, 2018

@jianlinliu, the correct way to get the cluster ID is from the ClusterVersion object. You'll still need the cluster name and the AWS region for an AWS deletion, though, and those don't live in ClusterVersion.
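
For example, something like the following should print it, assuming the usual ClusterVersion object named "version":

oc get clusterversion version -o jsonpath='{.spec.clusterID}'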

@wking
Member

wking commented Dec 18, 2018

And for multiple clusters in one account with the same name, that's an open issue as well: #762.

@akostadinov
Author

Reading this and related issues, I think the best approach would be for the installer to have a discovery mode that detects signs of clusters in a particular account/region based on the tags it sets during provisioning. That way, when we end up with a cloud account full of old cluster pieces, the installer can be used to discover them and clean them up properly.
Doing this manually is hell.
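
As a rough illustration of what such discovery could already look like, the cluster IDs still tagged in an account/region can be enumerated with the AWS CLI. This is just a sketch using the tag key from the metadata.json example earlier in the thread, not installer functionality:

aws resourcegroupstaggingapi get-tag-values --region us-east-1 --key tectonicClusterID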

@sferich888
Contributor

@crawford
Contributor

crawford commented Feb 1, 2019

The above article is the recommended procedure for recovering the cluster metadata.

/close

@openshift-ci-robot
Contributor

@crawford: Closing this issue.

In response to this:

The above article is the recommended procedure for recovering the cluster metadata.

/close


@tnozicka
Contributor

tnozicka commented Feb 5, 2019

/reopen

Manually messing with AWS resources is not a solution, just a workaround. I didn't have to mess with AWS resources to create a cluster, so I shouldn't have to do it to destroy one. The installer asks for ~4 inputs when creating a cluster; asking for those again, or listing existing clusters and letting me choose one to delete, would be the appropriate counterpart.

@openshift-ci-robot
Contributor

@tnozicka: Reopened this issue.

In response to this:

/reopen

Manually messing with AWS resources is not a solution, just a workaround. I didn't have to mess with AWS resources to create a cluster, so I shouldn't have to do it to destroy one. The installer asks for ~4 inputs when creating a cluster; asking for those again, or listing existing clusters and letting me choose one to delete, would be the appropriate counterpart.


@eparis
Member

eparis commented Feb 19, 2019

Closing this issue. I have updated the kbase article with another way to get the clusterID without going to AWS if the cluster is still running. If the cluster is no longer running, getting the clusterID from AWS is the only option, as the UUID is generated at install time.
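
If only the AWS resources are left, the clusterID can usually be read back from the tectonicClusterID tag on any surviving resource. A sketch with the AWS CLI (assuming tagged EC2 resources still exist; the region is a placeholder):

aws ec2 describe-tags --region us-east-1 --filters Name=key,Values=tectonicClusterID --query 'Tags[].Value' --output text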

@eparis eparis closed this as completed Feb 19, 2019
@akostadinov
Author

Available cluster names and cluster IDs can be discovered by the installer. There is no reason to ask the user to find them manually. Ideally there should be a mode where all of a cluster's resources are removed by name (without the cluster ID). For test clusters this is the most useful approach to avoid stale resources that can break a new install.

@sferich888
Contributor

@akostadinov How does the installer do this?

@akostadinov
Author

It does not, presently. My point was that instead of asking the user to discover cluster names and IDs, it would be more user-friendly to make the installer able to discover them.
