AWS NLB linger after they're orphaned #1718

Closed
voor opened this issue May 7, 2020 · 20 comments · Fixed by #3648
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.
  • priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@voor (Member) commented May 7, 2020

/kind bug

What steps did you take and what happened:
Workload clusters can create Network Load Balancers instead of Classic ELBs by adding the annotation service.beta.kubernetes.io/aws-load-balancer-type: nlb to a Service (read more). We need to identify those Load Balancers by tag and delete them as well when the associated workload cluster is destroyed.

Referenced here in a conversation on the topic.
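
For illustration, the kind of Service that produces one of these NLBs looks roughly like the sketch below (not taken from this issue; the name, selector, and ports are made up). The annotation is the only AWS-specific part:

```go
// Minimal sketch of a Service that makes the AWS cloud provider create an NLB
// instead of a Classic ELB. Everything except the annotation is hypothetical.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "example-app", // hypothetical name
			Namespace: "default",
			Annotations: map[string]string{
				// This annotation switches the provisioned load balancer from a Classic ELB to an NLB.
				"service.beta.kubernetes.io/aws-load-balancer-type": "nlb",
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "example-app"},
			Ports: []corev1.ServicePort{
				{Name: "http", Port: 80, TargetPort: intstr.FromInt(8080)},
			},
		},
	}
	fmt.Println("service", svc.Name, "would get an NLB from the cloud provider")
}
```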

What did you expect to happen:
NLBs are deleted alongside the cluster.

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Cluster-api-provider-aws version: v0.5.2
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@k8s-ci-robot added the kind/bug label May 7, 2020

@vincepri (Member) commented May 7, 2020

/milestone v0.5.x
/help

@k8s-ci-robot (Contributor)

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/milestone v0.5.x
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added this to the v0.5.x milestone May 7, 2020
@k8s-ci-robot added the help wanted label May 7, 2020

@bagnaram (Contributor)

/assign

@bagnaram (Contributor)

It is probably safe to assume that we will need an additional elbv2 service to handle the cleanup of the NLB resources.
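
To make this concrete, here is a rough sketch of what an elbv2-based cleanup could look like (this is not the eventual CAPA implementation; the cluster name is a placeholder and pagination and error handling are simplified):

```go
// Rough sketch of an elbv2-based cleanup: list v2 load balancers, read their
// tags, and delete the ones carrying the cloud provider's
// kubernetes.io/cluster/<cluster-name> ownership tag.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/elbv2"
)

func main() {
	clusterName := "my-workload-cluster" // placeholder
	svc := elbv2.New(session.Must(session.NewSession()))

	// Note: DescribeLoadBalancers is paginated; a real implementation would follow NextMarker.
	out, err := svc.DescribeLoadBalancers(&elbv2.DescribeLoadBalancersInput{})
	if err != nil {
		log.Fatal(err)
	}

	for _, lb := range out.LoadBalancers {
		tags, err := svc.DescribeTags(&elbv2.DescribeTagsInput{
			ResourceArns: []*string{lb.LoadBalancerArn},
		})
		if err != nil {
			log.Fatal(err)
		}
		for _, td := range tags.TagDescriptions {
			for _, tag := range td.Tags {
				// The cloud provider typically tags the load balancers it creates
				// with kubernetes.io/cluster/<cluster-name>.
				if aws.StringValue(tag.Key) == "kubernetes.io/cluster/"+clusterName {
					fmt.Println("deleting orphaned NLB:", aws.StringValue(lb.LoadBalancerArn))
					if _, err := svc.DeleteLoadBalancer(&elbv2.DeleteLoadBalancerInput{
						LoadBalancerArn: lb.LoadBalancerArn,
					}); err != nil {
						log.Fatal(err)
					}
				}
			}
		}
	}
}
```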

@randomvariable (Member) commented May 18, 2020

The issue is that you essentially need to drain the workload cluster of Services so that the cloud provider tears down the NLBs. It's not specifically a CAPA problem IMHO; otherwise you add logic to CAPA to handle resources created by the cloud provider integration. Worth adding to the agenda for the meeting, as I'm a bit wary of crossing responsibility boundaries here.

@detiber (Member) commented May 18, 2020

@randomvariable that is a good point. We could remove the current Classic ELB cleanup we do in favor of moving the core functionality into core Cluster API, where we could delete all Services with Type=LoadBalancer prior to deletion of a given Cluster. That would then cover any similar issues that arise with other infrastructure providers as well.
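
For comparison, a rough sketch of that alternative, assuming a client-go client against the workload cluster (this is not existing Cluster API code; the kubeconfig path is a placeholder):

```go
// Sketch: before deleting a Cluster, delete every Service of type LoadBalancer
// in the workload cluster so the cloud provider removes the ELBs/NLBs it created.
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client against the workload cluster (placeholder kubeconfig path).
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/workload-kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	svcs, err := client.CoreV1().Services(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, svc := range svcs.Items {
		if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
			continue
		}
		// Deleting the Service lets the in-cluster cloud provider tear down its load balancer.
		if err := client.CoreV1().Services(svc.Namespace).Delete(context.TODO(), svc.Name, metav1.DeleteOptions{}); err != nil {
			log.Fatal(err)
		}
		log.Printf("deleted Service %s/%s (type LoadBalancer)", svc.Namespace, svc.Name)
	}
}
```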

@randomvariable (Member)

That could be good. I did suggest to @nckturner, @justinsb, and @andrewsykim that we could use the test framework to set up CI for the cloud provider repo. Having CAPI take care of auto-deleting Services of type LoadBalancer would be a neat trick.

@randomvariable (Member)

We forgot to discuss this in yesterday's meeting. I'll file an issue to Cluster API and add it to the agenda.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Nov 12, 2020

@voor (Member, Author) commented Nov 12, 2020

/remove-lifecycle stale
/remove-lifecycle frozen

@k8s-ci-robot removed the lifecycle/stale label Nov 12, 2020

@voor (Member, Author) commented Nov 12, 2020

/lifecycle frozen

@k8s-ci-robot added the lifecycle/frozen label Nov 12, 2020
@randomvariable modified the milestones: v0.6.x, Next Mar 11, 2021

@sedefsavas (Contributor)

Closing in favor of kubernetes-sigs/cluster-api#3075

@richardcase (Member)

@sedefsavas - this issue is impacting some customers, so I'm going to reopen it with this plan in mind:

/reopen
/assign
/lifecycle active

@k8s-ci-robot (Contributor)

@richardcase: Reopened this issue.

In response to this:

@sedefsavas - this issue is impacting some customers, so I'm going to reopen it with this plan in mind:

/reopen
/assign
/lifecycle active

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the lifecycle/active, needs-priority, and needs-triage labels and removed the lifecycle/frozen label Jun 9, 2022

@richardcase (Member)

/priority critical-urgent

@k8s-ci-robot added the priority/critical-urgent label and removed the needs-priority label Jun 9, 2022

@richardcase (Member)

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label Jun 9, 2022

@richardcase (Member)

Just tested this scenario and it does occur. The deletion of the cluster gets stuck because of:

E0609 15:49:16.022022      14 awsmanagedcontrolplane_controller.go:292] controller/awsmanagedcontrolplane "msg"="error deleting network for AWSManagedControlPlane" "error"="failed to detach internet gateway \"igw-0f81d9e12a5a97bf2\": DependencyViolation: Network vpc-0b06fcdbbc37ab172 has some mapped public address(es). Please unmap those public address(es) before detaching the gateway.\n\tstatus code: 400, request id: 65dc0fa0-584f-4256-baf5-a2aac2d2dde4" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="AWSManagedControlPlane" "name"="capi-managed-test-control-plane" "namespace"="default"
I0609 15:49:16.022130      14 recorder.go:103] events "msg"="Warning" "message"="Failed to detach Internet Gateway \"igw-0f81d9e12a5a97bf2\" from VPC \"vpc-0b06fcdbbc37ab172\": DependencyViolation: Network vpc-0b06fcdbbc37ab172 has some mapped public address(es). Please unmap those public address(es) before detaching the gateway.\n\tstatus code: 400, request id: 65dc0fa0-584f-4256-baf5-a2aac2d2dde4" "object"={"kind":"AWSManagedControlPlane","namespace":"default","name":"capi-managed-test-control-plane","uid":"adefde7f-760d-453d-b81e-cde2461ccdd6","apiVersion":"controlplane.cluster.x-k8s.io/v1beta1","resourceVersion":"20624"} "reason"="FailedDetachInternetGateway"

And the load balancer still exists.

@sedefsavas (Contributor)

Looks like we only clean up the CCM-created ELBs but not the NLBs.

@richardcase (Member)

Looks like we only clean up the CCM-created ELBs but not the NLBs.

@sedefsavas - spot on :)

@richardcase (Member)

/milestone v1.5.0
