[release-1.5] feat: external load balancer garbage collection (part 2) - new gc service #3632

richardcase · 2022-07-29T11:28:32Z

This is a manual cherry-pick of #3610

What type of PR is this?

/kind feature

What this PR does / why we need it:

This introduces a new garbage collection service which will be used later by the controllers to do clean up of AWS resources for workload clusters that were created via the CCM. Initially, we support deleting load balancers (classic elb, nlb, alb), target groups and security groups.

The service is design to be called by the infra cluster reconcilers when a cluster is delete. The entry point is ReconcileDelete into the service.

When a cluster is deleted the cluster controllers will call ReconcileDelete. The first job is to determine if the workload cluster being deleted should be garbage collected. A cluster will be garbage collected if either of these are true:

the aws.cluster.x-k8s.io/external-resource-gc annotation is absent
the aws.cluster.x-k8s.io/external-resource-gc annotation exists and its value is set to true

If a cluster is to be garbage collected then we need to identify the AWS resources that where created by the CCM in the workload cluster. This is done by using the AWS resource tagging api and getting resources with the Kubernetes cluster owned label (i.e. kubernetes.io/cluster/[CLUSTERNAME]). The resources that are returned are converted and then the list of them is passed to the defined cleanup functions in order. The order of the functions matter because some AWS resources cannot be delete before others. For example, target groups can only be deleted after load balancers. The cleanup function will then iterate over the resources and decide if it wants to handle that resource and delete it from AWS.

The service will be use by a later PR to the controllers and so is initially unused.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Relates #1718

Special notes for your reviewer:

This part 2 of 4 of changes (i.e. a stack) to implement the garbage collection. This is work to split up the original pr #3518

Checklist:

squashed commits
adds unit tests

k8s-ci-robot · 2022-07-29T11:28:40Z

@richardcase: This issue is currently awaiting triage.

If CAPA/CAPI contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sedefsavas · 2022-07-29T12:18:19Z

/lgtm

sedefsavas · 2022-07-29T13:18:15Z

/test ?

k8s-ci-robot · 2022-07-29T13:18:16Z

@sedefsavas: The following commands are available to trigger required jobs:

/test pull-cluster-api-provider-aws-build
/test pull-cluster-api-provider-aws-build-release-1-5
/test pull-cluster-api-provider-aws-e2e-blocking-release-1-5
/test pull-cluster-api-provider-aws-test-release-1-5
/test pull-cluster-api-provider-aws-verify-release-1-5

The following commands are available to trigger optional jobs:

/test pull-cluster-api-provider-aws-apidiff-release-1-5
/test pull-cluster-api-provider-aws-e2e-conformance-release-1-5
/test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts-release-1-5
/test pull-cluster-api-provider-aws-e2e-eks-release-1-5
/test pull-cluster-api-provider-aws-e2e-release-1-5

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-provider-aws-apidiff-release-1-5
pull-cluster-api-provider-aws-build
pull-cluster-api-provider-aws-build-release-1-5
pull-cluster-api-provider-aws-e2e-blocking-release-1-5
pull-cluster-api-provider-aws-test-release-1-5
pull-cluster-api-provider-aws-verify-release-1-5

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sedefsavas · 2022-07-29T13:19:34Z

We might need to backport a linter update: #3617

sedefsavas · 2022-07-29T21:11:18Z

#3637 should fix it.

This introduces a new garbage collection service which will be used later by the controllers to do clean up of AWS resources for child/tenant clusters that where created via the CCM. Initially we support clearing up load balancers (classic elb, nlb, alb) and security groups. It works in 2 phases depending on if a child cluster is being created/update (i.e. **Reconcile**) or deleted (i.e. **ReconcileDelete**). When a cluster is created the cluster controllers will call **Reconcile**. Its purpose is to deteremine if the cluster should be garbage collected and if it should then it needs marked. If the gc feature is enabled we will operate an opt-out model, so all clusters will be garbage collected unless they exlicitly opt out. To opt out the infra cluster must have the `aws.cluster.x-k8s.io/external-resource-gc` annotation with a value of false. Otherwise a cluster is marked as requiring gc by adding the `awsexternalresourcegc.infrastructure.cluster.x-k8s.io` finalizer to the infra cluster. When a cluster is deleted the cluster controllers will call **ReconcileDelete**. The first job is to identify the AWS resources that where created by the CCM in the child/tenant cluster. This is done by using the AWS resource tagging api and getting resources with the kubernetes cluster owned label. The resources that are returned are then put into buckets for each AWS resource type (i.e. ec2, elasticloadbalancing) and then these buckets of resource ARNs are passed to a function for that specific AWS service which will do teh actual API calls to clear up the AWS resources. The reason we use buckets is that the order in which you delete services can matter, for example load balancers must be deleted before target groups. The service will be use by a later change to the controllers. Signed-off-by: Richard Case <richard.case@outlook.com>

Changed the gc service based on the lateset proposal update & review feedback. It now uses a model where "the user decides whether to enable the gc at any time before they delete the cluster.". This translates into the `ReconcileDelete` now: - Checks to see ig the cluster should be garbage collected by looking at the gc annotation (if it exists). - Does resource cleanup Signed-off-by: Richard Case <richard.case@outlook.com>

The the cleanup functions now accepted the full list of AWS resources and then they filter which resources they want to delete themselves. Signed-off-by: Richard Case <richard.case@outlook.com>

richardcase · 2022-08-01T10:33:10Z

/test pull-cluster-api-provider-aws-e2e-blocking-release-1-5

richardcase · 2022-08-01T10:33:28Z

/test pull-cluster-api-provider-aws-verify-release-1-5

Ankitasw · 2022-08-01T10:54:32Z

@richardcase some issue with multitenancy test case 🤔 ?

richardcase · 2022-08-01T15:50:29Z

/test pull-cluster-api-provider-aws-e2e-blocking-release-1-5

richardcase · 2022-08-03T12:46:09Z

All passing now :)

/approve

k8s-ci-robot · 2022-08-03T12:46:22Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardcase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [richardcase]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Skarlso · 2022-08-03T12:58:31Z

/lgtm

k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 29, 2022

k8s-ci-robot requested review from Ankitasw and sedefsavas July 29, 2022 11:28

k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 29, 2022

richardcase changed the title ~~[release-1.5] 1718 gc new service 1 5~~ [release-1.5] feat: external load balancer garbage collection (part 2) - new gc service Jul 29, 2022

k8s-ci-robot assigned sedefsavas Jul 29, 2022

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2022

richardcase added 3 commits August 1, 2022 11:04

refactor: cleanup functions filter resource

a65ccb2

The the cleanup functions now accepted the full list of AWS resources and then they filter which resources they want to delete themselves. Signed-off-by: Richard Case <richard.case@outlook.com>

richardcase force-pushed the 1718_gc_new_service_1-5 branch from 74c8471 to a65ccb2 Compare August 1, 2022 10:11

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2022

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 3, 2022

k8s-ci-robot assigned Skarlso Aug 3, 2022

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 3, 2022

k8s-ci-robot merged commit 6e634b9 into kubernetes-sigs:release-1.5 Aug 3, 2022

dlipovetsky mentioned this pull request Oct 11, 2022

[release-1.5] fix: use v1beta1 version in EKS user kubeconfig #3763

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[release-1.5] feat: external load balancer garbage collection (part 2) - new gc service #3632

[release-1.5] feat: external load balancer garbage collection (part 2) - new gc service #3632

richardcase commented Jul 29, 2022

k8s-ci-robot commented Jul 29, 2022

sedefsavas commented Jul 29, 2022

sedefsavas commented Jul 29, 2022

k8s-ci-robot commented Jul 29, 2022

sedefsavas commented Jul 29, 2022

sedefsavas commented Jul 29, 2022

richardcase commented Aug 1, 2022

richardcase commented Aug 1, 2022

Ankitasw commented Aug 1, 2022

richardcase commented Aug 1, 2022

richardcase commented Aug 3, 2022

k8s-ci-robot commented Aug 3, 2022

Skarlso commented Aug 3, 2022

[release-1.5] feat: external load balancer garbage collection (part 2) - new gc service #3632

[release-1.5] feat: external load balancer garbage collection (part 2) - new gc service #3632

Conversation

richardcase commented Jul 29, 2022

This is a manual cherry-pick of #3610

k8s-ci-robot commented Jul 29, 2022

sedefsavas commented Jul 29, 2022

sedefsavas commented Jul 29, 2022

k8s-ci-robot commented Jul 29, 2022

sedefsavas commented Jul 29, 2022

sedefsavas commented Jul 29, 2022

richardcase commented Aug 1, 2022

richardcase commented Aug 1, 2022

Ankitasw commented Aug 1, 2022

richardcase commented Aug 1, 2022

richardcase commented Aug 3, 2022

k8s-ci-robot commented Aug 3, 2022

Skarlso commented Aug 3, 2022