
Need an API call for "teardown all external resources" #4630

Closed
zmerlynn opened this issue Feb 19, 2015 · 22 comments
Assignees
Labels
area/cloudprovider
area/teardown
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/backlog: Higher priority than priority/awaiting-more-evidence.
sig/api-machinery: Categorizes an issue or PR as relevant to SIG API Machinery.
sig/cli: Categorizes an issue or PR as relevant to SIG CLI.
sig/cluster-lifecycle: Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@zmerlynn
Member

See #4627 / #4530: these are both the wrong approach, as also noted in #4411 (comment). We need to delete these things on the master, prior to deleting the VM itself. For system add-ons, this is basically the API hook necessary for #3579 cleanup, but it's also required for any user services that were created.

@zmerlynn zmerlynn added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Feb 19, 2015
@zmerlynn
Member Author

cc @roberthbailey @jlowdermilk

@fgrzadkowski
Contributor

cc @jszczepkowski

Can you please explain what this API would look like? I'm not sure that widening the API of a central component is the right approach.
If an external resource is tightly tied to a resource (e.g. a service), it would be strange to have a separate API call just to remove its external parts (e.g. the load balancer). It may also leave the resource (service) in an inconsistent state.
Additionally, you will still need something like kube-down.sh to clean up other things (e.g. remove machines), so what's the benefit?

@fgrzadkowski
Contributor

Just to be clear: I think that removal of internal services should be handled by the master itself, but I see no reason why we need an API call to delete external resources for user-defined services.

@zmerlynn
Member Author

First, the meta-point: kube-down.sh isn't the only setup and teardown "client". (GKE is another, and more might come along.) That was the meta-point for #3579 as well. To the extent that we can push this logic onto the server, we should.

The second meta-point: today ELBs are the biggest issue, but there are discussions around, say, firewall rules, and those would also need teardown. Or the xyz cloud provider's network widget. The point is, that shell script is going to keep creeping.

As for what the API looks like, we were envisioning something like "kubectl clusterteardown", which makes one API call to the master (call it teardown, or destroy, or destroyExternalResources, or whatever) that could even just be a loop in Go very similar to the rejected PR #4530. (And yes, we'd make one addition to kube-down.sh, right before the VM itself is annihilated.)

The API should probably have best-effort semantics, since the cluster is going down anyway.
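For illustration only, a rough sketch of what such a best-effort loop could look like, written against today's k8s.io/cloud-provider interface rather than anything from 2015 or from PR #4530; the function name and the idea of passing in a pre-fetched service list are assumptions, not anything that shipped:

```go
// Illustrative sketch: delete the cloud load balancer behind every
// LoadBalancer-type service, logging and skipping errors (best effort,
// since the cluster is going away anyway). Not the actual PR #4530 code.
package teardown

import (
	"context"
	"log"

	v1 "k8s.io/api/core/v1"
	cloudprovider "k8s.io/cloud-provider"
)

func teardownExternalLoadBalancers(ctx context.Context, cloud cloudprovider.Interface, clusterName string, services []*v1.Service) {
	lb, supported := cloud.LoadBalancer()
	if !supported {
		return // this provider has no load balancers to tear down
	}
	for _, svc := range services {
		if svc.Spec.Type != v1.ServiceTypeLoadBalancer {
			continue
		}
		if err := lb.EnsureLoadBalancerDeleted(ctx, clusterName, svc); err != nil {
			log.Printf("teardown: could not delete load balancer for %s/%s: %v", svc.Namespace, svc.Name, err)
		}
	}
}
```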

@zmerlynn
Member Author

I just saw your next comment. Why a distinction between user services and add-ons here? They both represent external resources owned by cluster services. Some might be owned by the user, so you could argue that we need policy bits like "don't delete on teardown", but that doesn't mean the client should delete them.

@zmerlynn
Member Author

Also, to be clear, the API can just outright delete the services, too. I don't see a reason it has to delete the underlying resources rather than the services themselves, since it's running in the shutdown path. If we really want to be delicate, we can terminate all objects (#1535) first.

@jszczepkowski jszczepkowski self-assigned this Feb 25, 2015
@jszczepkowski
Contributor

I'll be happy to work on this. I hope no one is working on it already.

@roberthbailey roberthbailey added this to the v1.0 milestone Mar 2, 2015
jszczepkowski added a commit to jszczepkowski/kubernetes that referenced this issue Mar 4, 2015
Implementation of master call "/teardown", which removes all external resources used by the kubernetes cluster (currently, external load balancers are removed).
 Related to kubernetes#4630.
@alex-mohr
Contributor

We need to support deletion of clusters, both for GKE and for e.g. e2e tests. The master will create and delete various CloudProvider objects, so it should own those objects. We need a way to (a) cease accepting new objects, (b) change the desired state of existing objects to does-not-exist, (c) block until the CloudProvider reconciler (or equivalent) finishes actually deleting all of those, then (d) the master can be deleted.

Given the master knows what it created and has code to delete such things (for whatever version of k8s it's running), we should use the master itself to clean up a cluster that needs to be deleted, not require some out-of-band tool to do so.
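Purely as a sketch, the (a) through (d) sequence above could be expressed as a hypothetical Go contract; none of these methods exist in Kubernetes, they only name the steps:

```go
// Hypothetical only: an interface naming the teardown steps described above.
// Nothing here exists in Kubernetes; it is a sketch of the proposed flow.
package teardown

import "context"

type ClusterTeardown interface {
	// (a) stop admitting new objects that would create external resources.
	EnterLameDuck(ctx context.Context) error
	// (b) set the desired state of existing external resources to does-not-exist.
	MarkExternalResourcesDeleted(ctx context.Context) error
	// (c) block until the CloudProvider reconciler has actually removed them.
	WaitForExternalResourcesGone(ctx context.Context) error
}

// RunTeardown executes steps (a) through (c); only after it returns without
// error is it safe for the caller to perform (d), deleting the master itself.
func RunTeardown(ctx context.Context, t ClusterTeardown) error {
	if err := t.EnterLameDuck(ctx); err != nil {
		return err
	}
	if err := t.MarkExternalResourcesDeleted(ctx); err != nil {
		return err
	}
	return t.WaitForExternalResourcesGone(ctx)
}
```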

@bgrant0607
Member

Discussion is occurring in #5025.

@brendandburns
Contributor

I don't think that this makes the 1.0 cut.

@brendandburns brendandburns modified the milestones: v1.0-bubble, v1.0 Mar 23, 2015
@alex-mohr
Contributor

I don't think that this makes the 1.0 cut.

@brendandburns Without this, we orphan resources in GCE on cluster delete. And if, e.g., a user spins up a new cluster with the same name, there will be all sorts of fun from dangling rules. I think this falls under operational reliability.

@lavalamp
Member

I don't think a 'protected' field is necessary for this. Today we have RBAC, finalizers, and GC. I think a client w/ super admin powers could delete all namespaces and then wait for the namespace count to go to 0. (There's probably a corner case or two around the default and kube-system namespaces that this would turn up.)
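A minimal client-go sketch of that approach, assuming a modern client-go (well after this thread) and an admin credential; it sidesteps the corner case mentioned above by skipping the namespaces that, as the following comments note, cannot actually be deleted:

```go
// Illustrative only: delete every non-protected namespace and poll until the
// rest are gone, relying on finalizers and GC to clean up their contents.
package teardown

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// These namespaces cannot be deleted, which is the corner case noted above.
var protected = map[string]bool{"default": true, "kube-system": true, "kube-public": true}

func drainNamespaces(ctx context.Context, cs kubernetes.Interface) error {
	for {
		nsList, err := cs.CoreV1().Namespaces().List(ctx, metav1.ListOptions{})
		if err != nil {
			return err
		}
		remaining := 0
		for _, ns := range nsList.Items {
			if protected[ns.Name] {
				continue
			}
			remaining++
			// Delete is asynchronous; the namespace disappears once its
			// contents have been finalized and garbage collected.
			_ = cs.CoreV1().Namespaces().Delete(ctx, ns.Name, metav1.DeleteOptions{})
		}
		if remaining == 0 {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(5 * time.Second):
		}
	}
}
```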

@zmerlynn
Member Author

I might be missing it, but I haven't found a way to delete default and kube-system. If you come up with one, I'll close the bug. :)

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 1, 2018
@roberthbailey
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2018
@roberthbailey
Contributor

It still isn't possible to delete the following namespaces: kube-system, kube-public, default. So we can't rely on finalizers / GC for resources in those namespaces.

We also don't have a way to put the apiserver into a lame duck mode to prevent new namespaces from being created during cluster teardown.

@bgrant0607 bgrant0607 mentioned this issue Jan 22, 2018
@bgrant0607
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jan 23, 2018
@neolit123
Member

Seems like an FR for api-machinery that can eventually land in kubectl (sig-cli).
sig-cluster-lifecycle tools can adopt it via client-go if they need to.

/sig cli api-machinery

@k8s-ci-robot k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Sep 3, 2020
@thockin thockin closed this as completed Aug 20, 2022