
uninstall on missing install directory #746

Closed
akostadinov opened this issue Nov 28, 2018 · 21 comments

@akostadinov

When a user installs a cluster but then deletes the directory created by the installer, there is no easy way to remove the cluster.

I think all of the necessary metadata should already exist inside the cluster, so it should be possible for the user to uninstall a cluster just by pointing the installer at the target cluster.

Version

7e7c26f

@dgoodwin
Contributor

If you clone and build hiveutil from https://github.com/openshift/hive (see the bottom), you can then scrub the AWS resources by tags; this is the same code the installer uses when you still have your metadata.json.

It would be nice for openshift-install to expose this functionality in the event you've lost your metadata, though.
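
To get a sense of what such a tag-based scrub has to touch, the resources carrying the cluster tag can be listed with the AWS CLI. This is only a sketch of the same tag query, not hive's implementation; the cluster name, region, and tag below are placeholders borrowed from the examples later in this thread:

aws resourcegroupstaggingapi get-resources \
    --region us-east-1 \
    --tag-filters Key=kubernetes.io/cluster/wking,Values=owned \
    --query 'ResourceTagMappingList[].ResourceARN'

Note that not every AWS service exposes its tags through this API, so the list may be incomplete.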

@akostadinov
Author

It would be nice for openshift-install to expose this functionality in the event you've lost your metadata, though.

Exactly. Cleaning up can be as user-friendly as the installation, IMO.

@wking
Member

wking commented Nov 28, 2018

Pushing the metadata into the cluster would address the "I've lost my metadata.json but kept my kubeconfig" use case. Is that a thing? I'd expect folks blowing away their metadata.json would have done so with rm -rf "${INSTALL_DIR}", which would have removed their kubeconfig as well. Or are folks copying the kubeconfig somewhere safe first (but not copying their metadata.json)? Or did you want to push it into the cluster and allow unauthenticated clients to retrieve it?

@dgoodwin
Contributor

That would just leave the "my cluster is also broken" use case, in which case ideally it would be awesome to have something like openshift-install destroy cluster --platform=aws --uuid=clusteruuid

@wking
Member

wking commented Nov 28, 2018

That would just leave the "my cluster is also broken" use case...

If we're addressing that, why bother with pushing metadata.json into the cluster? Just use this approach regardless of whether the cluster is alive?

... in which case ideally it would be awesome to have something like openshift-install destroy cluster --platform=aws --uuid=clusteruuid

You also need to know the region (although we can assume the user has that configured in ~/.aws/config or the other usual channels). And you currently need to know the name as well, although once we pivot kubernetes.io/cluster/... to use the UUID you'll just need the region and UUID on AWS. You'll still need the cluster name and libvirt URI (but not the UUID) on libvirt. And it's not all that far from there to:

openshift-install destroy cluster --metadata='{"platform": "libvirt", "clusterName": "wking", "uri": "qemu+tcp://192.168.122.1/system"}'

and:

openshift-install destroy cluster --metadata='{"platform": "aws", "region": "us-east-1", "clusterID": "fb038bc9-b005-4fc8-996e-0d4968595937"}'

You can already get pretty close to that with:

echo '{"clusterName": "wking", "aws": {"region": "us-east-1", "identifier": [{"tectonicClusterID": "fb038bc9-b005-4fc8-996e-0d4968595937"}, {"kubernetes.io/cluster/wking": "owned"}]}}' >metadata.json
openshift-install destroy cluster

We'd just need to add the option and simplify the metadata.json layout.
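
For readability, that one-liner writes the equivalent of the following (same fields as above; the exact metadata.json schema may of course change between installer releases):

{
  "clusterName": "wking",
  "aws": {
    "region": "us-east-1",
    "identifier": [
      {"tectonicClusterID": "fb038bc9-b005-4fc8-996e-0d4968595937"},
      {"kubernetes.io/cluster/wking": "owned"}
    ]
  }
}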

CC @abhinavdahiya

@dgoodwin
Contributor

Indeed that is close, it's just not great UX. I'm sure we can live with it internally, but it's not the best foot forward to show a customer when they inevitably look to do this.

@wking
Member

wking commented Nov 28, 2018

Indeed that is close, it's just not great UX.

I'm open to UX improvements, but aside from the platform string, the remaining information needed is fundamentally different for each platform. Did you want per-platform subcommands with positional arguments?

openshift-install destroy cluster aws CLUSTER_ID [REGION]

(pulling the region from the usual places if unspecified),

openshift-install destroy cluster libvirt CLUSTER_NAME [URI]

(pulling the URI from LIBVIRT_DEFAULT_URI if URI is unset), etc.?
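
For reference, the values those "usual places" would provide can be checked directly; a quick sketch (the AWS CLI call reads the configured profile, and neither command is installer behavior):

aws configure get region      # region configured in ~/.aws/config for the active profile
echo "$LIBVIRT_DEFAULT_URI"   # libvirt URI the tooling would fall back to, if set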

@jianlinliu
Contributor

One more question: if the user has even lost the CLUSTER_ID, how do they proceed with the destroy? Search for it in the AWS instance tags? Is there any other way to get the cluster ID, such as an oc command?

Today I hit another UX issue.

  1. Create a cluster with 'qe-jialiu' as OPENSHIFT_INSTALL_CLUSTER_NAME, using --dir ./test1.
  2. Try to create another cluster with --dir ./test2, but with the same OPENSHIFT_INSTALL_CLUSTER_NAME setting.
  3. Step 2 fails, saying that a 'qe-jialiu' IAM resource already exists.
  4. Try to clean up the cluster with --dir ./test2.
  5. Find that cluster 1 (installed in step 1) is also broken; it seems cluster 1's Route 53 records were removed along with cluster 2's.
    Is it possible to tie all of a cluster's resources together via some unique ID, so that destroying one cluster does not interfere with another?

@jianlinliu
Contributor

One more question: if the user has even lost the CLUSTER_ID, how do they proceed with the destroy? Search for it in the AWS instance tags? Is there any other way to get the cluster ID, such as an oc command?

It seems I could run "oc get machineset -n openshift-cluster-api -o yaml" to get the tectonicClusterID, though I am not sure whether that is the correct way.
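
To avoid scrolling through the full YAML, the same lookup can be narrowed down, for example (assuming the MachineSets in that namespace carry the tectonicClusterID tag, as above):

oc get machineset -n openshift-cluster-api -o yaml | grep tectonicClusterID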

@wking
Member

wking commented Dec 18, 2018

@jianlinliu, the correct way to get the cluster ID is from the ClusterVersion object. You'll still need the cluster name and the AWS region for an AWS deletion, though, and those don't live in ClusterVersion.
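
For example, something like the following should print it, assuming the usual ClusterVersion object named "version":

oc get clusterversion version -o jsonpath='{.spec.clusterID}'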

@wking
Member

wking commented Dec 18, 2018

And for multiple clusters in one account with the same name, that's an open issue as well: #762.

@akostadinov
Author

Reading this and related issues, I think the best approach would be for the installer to have a discovery mode that detects signs of clusters in a particular account/region based on the tags it sets during provisioning. That way, when we end up with a cloud account full of old cluster pieces, the installer can be used to discover them and clean them up properly.
Doing this manually is hell.
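
As a rough illustration of what such discovery could already look like, the cluster IDs still tagged in an account/region can be enumerated with the AWS CLI. This is just a sketch using the tag key from the metadata.json example earlier in the thread, not installer functionality:

aws resourcegroupstaggingapi get-tag-values --region us-east-1 --key tectonicClusterID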

@sferich888
Contributor

@crawford
Contributor

crawford commented Feb 1, 2019

The above article is the recommended procedure for recovering the cluster metadata.

/close

@openshift-ci-robot
Contributor

@crawford: Closing this issue.

In response to this:

The above article is the recommended procedure for recovering the cluster metadata.

/close


@tnozicka
Contributor

tnozicka commented Feb 5, 2019

/reopen

Manually messing with AWS resources is not a solution, just a workaround. I didn't have to mess with AWS resources to create a cluster, so I shouldn't have to do it to destroy one. The installer asks for ~4 inputs when creating a cluster; asking for those again, or listing existing clusters and letting me choose one to delete, would be the appropriate counterpart.

@openshift-ci-robot
Contributor

@tnozicka: Reopened this issue.

In response to this:

/reopen

Manually messing with AWS resources is not a solution, just a workaround. I didn't have to mess with AWS resources to create a cluster, so I shouldn't have to do it to destroy one. The installer asks for ~4 inputs when creating a cluster; asking for those again, or listing existing clusters and letting me choose one to delete, would be the appropriate counterpart.


@eparis
Member

eparis commented Feb 19, 2019

Closing this issue. I have updated the kbase article with another way to get the clusterID without going to AWS if the cluster is still running. If the cluster is no longer running, getting the clusterID from AWS is the only option, as the UUID is generated at install time.
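
If only the AWS resources are left, the clusterID can usually be read back from the tectonicClusterID tag on any surviving resource. A sketch with the AWS CLI (assuming tagged EC2 resources still exist; the region is a placeholder):

aws ec2 describe-tags --region us-east-1 --filters Name=key,Values=tectonicClusterID --query 'Tags[].Value' --output text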

@eparis eparis closed this as completed Feb 19, 2019
@akostadinov
Author

Available cluster names and cluster IDs can be discovered by the installer. There is no reason to ask the user to find them manually. Ideally there should be a mode where all of a cluster's resources are removed by name (without the cluster ID). For test clusters this is the most useful approach to avoid stale resources that can break a new install.

@sferich888
Contributor

@akostadinov How does the installer do this?

@akostadinov
Author

It does not, presently. My point was that instead of asking the user to discover cluster names and IDs, it would be more user-friendly to make the installer able to discover them.
