AWS security group (not created by kubernetes) deleted when deleting ELB #62204

Closed
pmahoney-raise opened this issue Apr 6, 2018 · 10 comments · Fixed by #74311
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@pmahoney-raise

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

I have a Kubernetes (v1.7) cluster in AWS. I created a Service of type LoadBalancer and used the annotation service.beta.kubernetes.io/aws-load-balancer-extra-security-groups to apply an extra security group to the ELB that gets created.

The extra security group was created outside of Kubernetes, with the expectation that it is not owned by Kubernetes and will be managed independently.

I deleted the Service resource. Kubernetes then deleted the ELB and my extra security group.

What you expected to happen:

I expect my extra security group to not be deleted.

How to reproduce it (as minimally and precisely as possible):

  1. In AWS, create a security group that is otherwise unused.
  2. In Kubernetes, create a Service of type LoadBalancer that includes the annotation service.beta.kubernetes.io/aws-load-balancer-extra-security-groups, referencing the previously created security group.
  3. Watch an ELB be created with that security group attached.
  4. Delete the Service from Kubernetes.
  5. Watch the ELB be deleted (as expected) and the security group be deleted (unexpected).
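
For reference, a minimal sketch of such a Service created with client-go (not from the original report; the names, security group ID, and selector are placeholders, and the API calls assume a recent client-go):

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the default kubeconfig and build a clientset.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name: "repro-elb",
			Annotations: map[string]string{
				// Placeholder ID of a security group created outside Kubernetes
				// and managed independently.
				"service.beta.kubernetes.io/aws-load-balancer-extra-security-groups": "sg-0123456789abcdef0",
			},
		},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "demo"},
			Ports: []corev1.ServicePort{{
				Port:       80,
				TargetPort: intstr.FromInt(8080),
			}},
		},
	}

	// Creating the Service makes the AWS cloud provider provision an ELB with
	// both its own security group and the one from the annotation.
	if _, err := client.CoreV1().Services("default").Create(context.TODO(), svc, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```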

Anything else we need to know?:

It seems a workaround may be to ensure the security group is in use by at least one other resource within AWS so that the deletion attempt will fail with a DependencyViolation. The deletion process will eventually time out, if I understand the code correctly.

I've linked to v1.7.16, though I don't see that the behavior differs on master, so I believe the bug is there as well.

Environment:

  • Kubernetes version (use kubectl version): v1.7.16
  • Cloud provider or hardware configuration: AWS
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Apr 6, 2018
@pmahoney-raise
Author

@kubernetes/sig-aws-bugs

@k8s-ci-robot k8s-ci-robot added sig/aws and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 6, 2018
@k8s-ci-robot
Contributor

@pmahoney-raise: Reiterating the mentions to trigger a notification:
@kubernetes/sig-aws-bugs

In response to this:

@kubernetes/sig-aws-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@akonkol

akonkol commented May 3, 2018

I have also experienced this same issue. Luckily we have a very strict IAM policy for k8s in AWS and did not give the controller access to delete the extra-security-groups we specified in the annotation. There also does not seem to be a try/catch around deleting security groups when deleting ELBs.

The flow looks like the following:

  1. Fetch security groups associated with ELB
  2. For each security group, try to delete it; if a deletion fails, do not move on to the remaining security groups.

In this scenario the security group that is created by k8s for this ELB does not get deleted and is left behind.
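
A rough, self-contained sketch of that flow (illustrative only, not the actual aws.go code; the stub simulates our IAM policy denying the delete):

```go
package main

import (
	"errors"
	"fmt"
)

// deleteSecurityGroup stands in for the EC2 DeleteSecurityGroup call; here it
// simulates the IAM policy denying deletion of the externally managed group.
func deleteSecurityGroup(sgID string) error {
	if sgID == "sg-extra" {
		return errors.New("UnauthorizedOperation: not authorized to perform this operation")
	}
	fmt.Println("deleted", sgID)
	return nil
}

// deleteLoadBalancerSecurityGroups mirrors the flow described above: delete
// every security group attached to the ELB, stopping at the first failure.
func deleteLoadBalancerSecurityGroups(sgIDs []string) error {
	for _, sgID := range sgIDs {
		if err := deleteSecurityGroup(sgID); err != nil {
			// Any group later in the list (including the one Kubernetes
			// created for this ELB) is never reached and is left behind.
			return err
		}
	}
	return nil
}

func main() {
	err := deleteLoadBalancerSecurityGroups([]string{
		"sg-extra", // from the extra-security-groups annotation (delete is denied)
		"sg-k8s",   // created by Kubernetes for this ELB (never attempted)
	})
	fmt.Println("result:", err)
}
```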

Here are some relevant logs.
sg-01173d992404ffa5a is the k8s-generated SG.
sg-0a31cc1e0e04e741b is the security group specified in the service.beta.kubernetes.io/aws-load-balancer-extra-security-groups annotation.
We are using version 1.9.6.

May 02 19:49:00 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:00.812217    8566 service_controller.go:763] Service has been deleted infra/some-service-pub
May 02 19:49:00 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:00.812567    8566 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"infra", Name:"some-service-pub", UID:"546996ec-4e3e-11e8-ae79-0261c564fb34", APIVersion:"v1", ResourceVersion:"22833", FieldPath:""}): type: 'Normal' reason: 'DeletingLoadBalancer' Deleting load balancer
May 02 19:49:00 ip-10-49-19-149 kube-controller-manager[8566]: W0502 19:49:00.862476    8566 aws.go:3687] Multiple security groups for load balancer: "a546996ec4e3e11e8ae790261c564fb3"
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:01.359131    8566 aws.go:3770] Removing rule for traffic from the load balancer (sg-0a31cc1e0e04e741b) to instance (sg-00c70d400c6f61de0)
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:01.394788    8566 aws.go:2615] Comparing sg-0a31cc1e0e04e741b to sg-01173d992404ffa5a
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:01.394811    8566 aws.go:2615] Comparing sg-0a31cc1e0e04e741b to sg-0a31cc1e0e04e741b
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:01.394823    8566 aws.go:2811] Removing security group ingress: sg-00c70d400c6f61de0 [{
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]:   IpProtocol: "-1",
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]:   UserIdGroupPairs: [{
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]:       GroupId: "sg-0a31cc1e0e04e741b"
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]:     }]
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]: }]
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:01.675108    8566 aws.go:4005] Ignoring DependencyViolation while deleting load-balancer security group (sg-0074355be8fc583dc), assuming because LB is in process of deleting
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]: E0502 19:49:01.818714    8566 service_controller.go:776] Failed to process service infra/some-service-pub. Retrying in 5s: error while deleting load balancer security group (sg-0a31cc1e0e04e741b): "UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: LeO7XNBflhfzC5oCKpcpQFg3l10rnXgzBVydOfInBJU2d5kTBaXPQWr6873Rui3aMPltlcyqxVeGz7VDElrJ9dQkBwH-Cg1Mqe54hIzN2EGa-WH4_SdJBX3UMZW0axCsKrm0FM3zn4xcKzG2AnV9DMzjB3B-Ui2yErwCN
May 02 19:49:01 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:01.819173    8566 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"infra", Name:"some-service-pub", UID:"546996ec-4e3e-11e8-ae79-0261c564fb34", APIVersion:"v1", ResourceVersion:"22833", FieldPath:""}): type: 'Warning' reason: 'DeletingLoadBalancerFailed' Error deleting load balancer (will retry): error while deleting load balancer security group (sg-0a31cc1e0e04e741b): "UnauthorizedOperation: You are not authorized to perform this operation.
uR36c3IfHYQzFr3A0DEKuzn_Qph_oYjqVPoUkZa-Rs5HBn-91NkaReL2j_f153P0kXtrK76oOSMj3ebA-3o3W76xxbwJG-7Fe6AUgscBeLqWNdeGqVzN4gfj7pces9fQD6pqBGSwUWOEIz-DVvugRI_hvMFKoRPhsVYA2gv_ssftthxvz_bkJ0sx3bN_N793V17mNs7hkrKgCEFEKF8zYxlyjKeZ2c-hI2eEm6pfoBDey_cVxw1-chj_Fi4bEfIG5-FORaBxJPR_1_1w\n\tstatus code: 403, request id: f744c985-227d-4bd0-ac44-719d6a1b000e"
May 02 19:49:02 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:02.235351    8566 aws.go:3770] Removing rule for traffic from the load balancer (sg-01173d992404ffa5a) to instance (sg-00c70d400c6f61de0)
May 02 19:49:02 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:02.273909    8566 aws.go:2615] Comparing sg-01173d992404ffa5a to sg-01173d992404ffa5a
May 02 19:49:02 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:49:02.273934    8566 aws.go:2811] Removing security group ingress: sg-00c70d400c6f61de0 [{
May 02 19:49:02 ip-10-49-19-149 kube-controller-manager[8566]:   IpProtocol: "-1",
May 02 19:49:02 ip-10-49-19-149 kube-controller-manager[8566]:   UserIdGroupPairs: [{
May 02 19:49:02 ip-10-49-19-149 kube-controller-manager[8566]:       GroupId: "sg-01173d992404ffa5a"
May 02 19:49:02 ip-10-49-19-149 kube-controller-manager[8566]:     }]
May 02 19:49:02 ip-10-49-19-149 kube-controller-manager[8566]: }]
May 02 19:51:05 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:51:05.614824    8566 service_controller.go:763] Service has been deleted infra/some-service-pub
May 02 19:51:05 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:51:05.615155    8566 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"infra", Name:"some-service-pub", UID:"546996ec-4e3e-11e8-ae79-0261c564fb34", APIVersion:"v1", ResourceVersion:"22833", FieldPath:""}): type: 'Normal' reason: 'DeletingLoadBalancer' Deleting load balancer
May 02 19:51:05 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:51:05.656225    8566 aws.go:3947] Load balancer already deleted: a546996ec4e3e11e8ae790261c564fb3
May 02 19:51:05 ip-10-49-19-149 kube-controller-manager[8566]: I0502 19:51:05.656286    8566 event.go:218] Event(v1.ObjectReference{Kind:"Service", Namespace:"infra", Name:"some-service-pub", UID:"546996ec-4e3e-11e8-ae79-0261c564fb34", APIVersion:"v1", ResourceVersion:"22833", FieldPath:""}): type: 'Normal' reason: 'DeletedLoadBalancer' Deleted load balancer

@2rs2ts
Contributor

2rs2ts commented May 3, 2018

I believe the solution should be for k8s to try to delete only the security groups that it has ownership tags for.
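
A minimal sketch of that idea (illustrative only, not the fix that eventually landed in #74311; the tag key and helpers are assumptions):

```go
package main

import "fmt"

// securityGroup is a simplified stand-in for the EC2 security group type.
type securityGroup struct {
	ID   string
	Tags map[string]string
}

// ownedByCluster reports whether the group carries the cluster ownership tag
// (kubernetes.io/cluster/<name> = "owned"); the real cloud provider's tag
// handling is more involved, this only shows the idea.
func ownedByCluster(sg securityGroup, clusterName string) bool {
	return sg.Tags["kubernetes.io/cluster/"+clusterName] == "owned"
}

// groupsSafeToDelete keeps only the groups Kubernetes owns, so a group that
// was created outside the cluster and attached via the extra-security-groups
// annotation is never deleted along with the ELB.
func groupsSafeToDelete(groups []securityGroup, clusterName string) []string {
	var keep []string
	for _, sg := range groups {
		if ownedByCluster(sg, clusterName) {
			keep = append(keep, sg.ID)
		}
	}
	return keep
}

func main() {
	groups := []securityGroup{
		{ID: "sg-k8s", Tags: map[string]string{"kubernetes.io/cluster/mycluster": "owned"}},
		{ID: "sg-extra", Tags: map[string]string{}}, // externally managed, no ownership tag
	}
	fmt.Println(groupsSafeToDelete(groups, "mycluster")) // [sg-k8s]
}
```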

@rljohnsn

Another solution could be, while assembling the list of security groups to delete, to subtract the set specified in the aws-load-balancer-extra-security-groups annotation.
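
Sketched out, that could look roughly like this (illustrative only, not the cloud-provider implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// The annotation value is a comma-separated list of security group IDs.
const extraSGAnnotation = "service.beta.kubernetes.io/aws-load-balancer-extra-security-groups"

// groupsToDelete removes the IDs listed in the annotation from the set of
// groups attached to the ELB, leaving only the groups Kubernetes itself added.
func groupsToDelete(attached []string, annotations map[string]string) []string {
	extra := map[string]bool{}
	for _, raw := range strings.Split(annotations[extraSGAnnotation], ",") {
		id := strings.TrimSpace(raw)
		if id != "" {
			extra[id] = true
		}
	}
	var keep []string
	for _, id := range attached {
		if !extra[id] {
			keep = append(keep, id)
		}
	}
	return keep
}

func main() {
	annotations := map[string]string{extraSGAnnotation: "sg-extra"}
	fmt.Println(groupsToDelete([]string{"sg-k8s", "sg-extra"}, annotations)) // [sg-k8s]
}
```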

@marciogarcianubeliu

A "workaround" for this issue is to add the SG as a source in the nodes SG

@dylanrhysscott

I'm hitting the same issue and it's causing cluster networking problems: Kubernetes deletes a shared SG, which breaks my API access. I have added the shared ownership tag to the security group, but it is still removed. Any ideas?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 3, 2018
@2rs2ts
Contributor

2rs2ts commented Dec 3, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 3, 2018
@campee

campee commented Jan 10, 2019

This issue is affecting me as well.
