failed to delete targetGroup: timed out waiting for the condition #3037
From the logs, it seems you are deleting the Ingress. The controller will first delete the LoadBalancer and then keep retrying deletion of the TargetGroup. Could you check CloudTrail for why the TargetGroup deletion failed? If it's because the LoadBalancer was not deleted, please check the AWS tags on the LoadBalancer. If the LoadBalancer using that TargetGroup is still around, one possible cause is that the AWS tags on the LoadBalancer were removed by some external process (or manually) other than the controller; the controller only deletes a LoadBalancer if it thinks the LB was created by it, which it determines by checking the AWS tags. If there is no LoadBalancer using that TargetGroup, it could be an ELBv2 bug in their eventual-consistency model. |
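The ownership check described above can be sketched as a pure function: the controller compares the resource's AWS tags against the values it would have written at creation time, and skips deletion on any mismatch. This is an illustrative sketch, not the controller's actual code; the tag keys `elbv2.k8s.aws/cluster` and `ingress.k8s.aws/stack` are assumptions based on the controller's tagging scheme.

```python
# Sketch of the ownership check: the controller only deletes a LoadBalancer
# whose AWS tags say it was created by this controller for this cluster.
# Tag keys below are assumptions for illustration.
CLUSTER_TAG = "elbv2.k8s.aws/cluster"  # assumed tag key
STACK_TAG = "ingress.k8s.aws/stack"    # assumed tag key

def is_owned_by_controller(tags: dict, cluster_name: str, stack_id: str) -> bool:
    """Return True only if the resource's tags mark it as controller-managed."""
    return tags.get(CLUSTER_TAG) == cluster_name and tags.get(STACK_TAG) == stack_id

# If an external process strips a tag, the check fails, the controller skips
# deleting the LoadBalancer, and the dependent TargetGroup deletion times out.
tags_intact = {CLUSTER_TAG: "eks-1-20", STACK_TAG: "dongdgy/django-app"}
tags_stripped = {STACK_TAG: "dongdgy/django-app"}  # cluster tag removed externally

print(is_owned_by_controller(tags_intact, "eks-1-20", "dongdgy/django-app"))    # True
print(is_owned_by_controller(tags_stripped, "eks-1-20", "dongdgy/django-app"))  # False
```

You can inspect the actual tags on a LoadBalancer with `aws elbv2 describe-tags --resource-arns <lb-arn>` and compare them against what the controller expects.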
Hi @M00nF1sh, thank you for your response. As requested, I checked the following things.
I think the tags are correct, but can you confirm which specific tags it actually looks for? Also, I think it's trying to delete the target group before deleting the LB listeners (not the load balancer). |
Same issue here.

Describe the bug

{"level":"info","ts":1676524227.1348596,"logger":"controllers.ingress","msg":"successfully built model","model":"{\"id\":\"dongdgy/django-app\",\"resources\":{}}"}
{"level":"info","ts":1676524227.8399084,"logger":"controllers.ingress","msg":"deleting securityGroup","securityGroupID":"sg-049319ea8a45797f1"}
{"level":"error","ts":1676524348.9529638,"logger":"controller-runtime.manager.controller.ingress","msg":"Reconciler error","name":"django-app","namespace":"dongdgy","error":"failed to delete securityGroup: timed out waiting for the condition"}
{"level":"info","ts":1676525348.9535875,"logger":"controllers.ingress","msg":"successfully built model","model":"{\"id\":\"dongdgy/django-app\",\"resources\":{}}"}
{"level":"info","ts":1676525349.8079634,"logger":"controllers.ingress","msg":"deleting securityGroup","securityGroupID":"sg-049319ea8a45797f1"}
{"level":"error","ts":1676525471.042075,"logger":"controller-runtime.manager.controller.ingress","msg":"Reconciler error","name":"django-app","namespace":"dongdgy","error":"failed to delete securityGroup: timed out waiting for the condition"}
{"level":"info","ts":1676526471.0429358,"logger":"controllers.ingress","msg":"successfully built model","model":"{\"id\":\"dongdgy/django-app\",\"resources\":{}}"}
{"level":"info","ts":1676526471.7085032,"logger":"controllers.ingress","msg":"deleting securityGroup","securityGroupID":"sg-049319ea8a45797f1"}
{"level":"error","ts":1676526592.7582066,"logger":"controller-runtime.manager.controller.ingress","msg":"Reconciler error","name":"django-app","namespace":"dongdgy","error":"failed to delete securityGroup: timed out waiting for the condition"}
{"level":"info","ts":1676527592.7585614,"logger":"controllers.ingress","msg":"successfully built model","model":"{\"id\":\"dongdgy/django-app\",\"resources\":{}}"}
{"level":"info","ts":1676527593.424313,"logger":"controllers.ingress","msg":"deleting securityGroup","securityGroupID":"sg-049319ea8a45797f1"}
{"level":"error","ts":1676527714.5189643,"logger":"controller-runtime.manager.controller.ingress","msg":"Reconciler error","name":"django-app","namespace":"dongdgy","error":"failed to delete securityGroup: timed out waiting for the condition"}

The security group created by the ALB is referenced by the EKS control plane security group as follows, so the ingress deletion failed because the SG has a dependent object. Here are the CloudTrail logs:

{
"eventVersion": "1.08",
"userIdentity": {
"type": "AssumedRole",
"principalId": "AROA2RRFIHV62HYHNH25V:1676517067916154489",
"arn": "arn:aws:sts::123456:assumed-role/AmazonEKSLoadBalancerControllerRole-eks-1-20/1676517067916154489",
"accountId": "123456",
"accessKeyId": "ASIA2RRFIHV673SALUCG",
"sessionContext": {
"sessionIssuer": {
"type": "Role",
"principalId": "AROA2RRFIHV62HYHNH25V",
"arn": "arn:aws:iam::123456:role/AmazonEKSLoadBalancerControllerRole-eks-1-20",
"accountId": "123456",
"userName": "AmazonEKSLoadBalancerControllerRole-eks-1-20"
},
"webIdFederationData": {
"federatedProvider": "arn:aws:iam::123456:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/123456",
"attributes": {}
},
"attributes": {
"creationDate": "2023-02-16T03:11:08Z",
"mfaAuthenticated": "false"
}
}
},
"eventTime": "2023-02-16T03:35:59Z",
"eventSource": "ec2.amazonaws.com",
"eventName": "DeleteSecurityGroup",
"awsRegion": "us-east-1",
"sourceIPAddress": "123456",
"userAgent": "elbv2.k8s.aws/v2.4.4 aws-sdk-go/1.42.27 (go1.18.6; linux; amd64)",
"errorCode": "Client.DependencyViolation",
"errorMessage": "resource sg-049319ea8a45797f1 has a dependent object",
"requestParameters": {
"groupId": "sg-049319ea8a45797f1"
},
"responseElements": null,
"requestID": "d2229571-6c6b-42bf-a93a-bbd939124a4f",
"eventID": "6d7adc5b-5748-4d22-acee-864f1fdf2b11",
"readOnly": false,
"eventType": "AwsApiCall",
"managementEvent": true,
"recipientAccountId": "123456",
"eventCategory": "Management",
"tlsDetails": {
"tlsVersion": "TLSv1.2",
"cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
"clientProvidedHostHeader": "ec2.us-east-1.amazonaws.com"
}
}

After deleting the rule manually, the ingress can be deleted.

Environment
Version: 1.20

Is this issue already documented on https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/deploy/upgrade/migrate_v1_v2/ as follows?

When the security-groups annotation isn't used: a managed SecurityGroup will be created and attached to the ALB. This SecurityGroup will be preserved. The AWSALBIngressController didn't add any description for that inbound rule, while the AWSLoadBalancerController will use elbv2.k8s.aws/targetGroupBinding=shared for that inbound rule. You'll need to manually add the elbv2.k8s.aws/targetGroupBinding=shared description to that inbound rule so that the AWSLoadBalancerController can delete it when you delete your Ingress. The following shell pipeline can be used to update the rules automatically. Replace $REGION and $SG_ID with your own values. After running it, change DryRun: true to DryRun: false to have it actually update your security group: |
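The pipeline referenced in the migration doc isn't reproduced in this thread. As a sketch of the transformation it performs, the following operates on rule data shaped like the `IpPermissions` field of `aws ec2 describe-security-groups` output: it adds the shared-rule description to any ingress rule that lacks one, with a dry-run flag so nothing is changed until you opt in. This is an illustration under those assumptions, not the original pipeline; the actual update would go through `ec2:UpdateSecurityGroupRuleDescriptionsIngress`.

```python
import copy
import json

SHARED_DESC = "elbv2.k8s.aws/targetGroupBinding=shared"

def annotate_rules(ip_permissions, dry_run=True):
    """Add the shared-rule description to every ingress rule entry lacking one.

    Returns the updated permissions list; with dry_run=True the caller would
    not send the result to ec2:UpdateSecurityGroupRuleDescriptionsIngress.
    """
    updated = copy.deepcopy(ip_permissions)
    for perm in updated:
        # Both CIDR ranges and SG-to-SG pairs can carry a Description field.
        for entry in perm.get("IpRanges", []) + perm.get("UserIdGroupPairs", []):
            if not entry.get("Description"):
                entry["Description"] = SHARED_DESC
    if dry_run:
        print("DryRun: would send", json.dumps(updated))
    return updated

# Example rule shaped like describe-security-groups output (IDs are made up).
rules = [{"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
          "UserIdGroupPairs": [{"GroupId": "sg-0123456789abcdef0"}]}]
annotated = annotate_rules(rules, dry_run=True)
```

Running with `dry_run=True` first mirrors the doc's `DryRun: true` advice: inspect the printed payload, then flip the flag once it looks right.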
After adding the description to the rule, the auto-created SG is deleted, but the ingress is still stuck.

LOGS

{"level":"info","ts":1676542266.5007129,"logger":"controllers.ingress","msg":"deleting loadBalancer","arn":"arn:aws:elasticloadbalancing:us-east-1:123456:loadbalancer/app/a7b3dc48-dongdgy-testapp-5620/3e4c4ea48a816a42"}
{"level":"info","ts":1676542266.5845156,"logger":"controllers.ingress","msg":"deleted loadBalancer","arn":"arn:aws:elasticloadbalancing:us-east-1:123456:loadbalancer/app/a7b3dc48-dongdgy-testapp-5620/3e4c4ea48a816a42"}
{"level":"info","ts":1676542266.5846033,"logger":"controllers.ingress","msg":"deleting securityGroup","securityGroupID":"sg-01b8891afec5c8cae"}
{"level":"info","ts":1676542283.356179,"logger":"controllers.ingress","msg":"deleted securityGroup","securityGroupID":"sg-01b8891afec5c8cae"}
{"level":"info","ts":1676542283.3562045,"logger":"controllers.ingress","msg":"successfully deployed model","ingressGroup":"dongdgy/test-app"}
{"level":"info","ts":1676542283.3562562,"logger":"backend-sg-provider","msg":"No ingress found, backend SG can be deleted","SG ID":"sg-03a3ef9a0e380bb70"}
{"level":"info","ts":1676542283.35627,"logger":"backend-sg-provider","msg":"No ingress found, backend SG can be deleted","SG ID":"sg-03a3ef9a0e380bb70"} |
@rushikesh-outbound If so, you are likely hit by this bug, where you created the service. You should avoid creating two services with the same |
@hitsub2 |
The ingress still cannot be deleted and is stuck. The load balancer controller output is as follows when trying to delete the ingress:

{"level":"info","ts":1676597388.777729,"logger":"backend-sg-provider","msg":"No ingress found, backend SG can be deleted","SG ID":"sg-03a3ef9a0e380bb70"}
{"level":"info","ts":1676597388.7777433,"logger":"backend-sg-provider","msg":"No ingress found, backend SG can be deleted","SG ID":"sg-03a3ef9a0e380bb70"}
{"level":"error","ts":1676597509.8300896,"logger":"controller-runtime.manager.controller.ingress","msg":"Reconciler error","name":"test-app","namespace":"dongdgy","error":"failed to delete securityGroup: timed out waiting for the condition"}

Here are the logs from CloudTrail:

{
"eventVersion": "1.08",
"userIdentity": {
"type": "AssumedRole",
"principalId": "AROA2RRFIHV62HYHNH25V:1676595144596144559",
"arn": "arn:aws:sts::123456:assumed-role/AmazonEKSLoadBalancerControllerRole-eks-1-20/1676595144596144559",
"accountId": "123456",
"accessKeyId": "ASIA2RRFIHV64E77JXPQ",
"sessionContext": {
"sessionIssuer": {
"type": "Role",
"principalId": "AROA2RRFIHV62HYHNH25V",
"arn": "arn:aws:iam::123456:role/AmazonEKSLoadBalancerControllerRole-eks-1-20",
"accountId": "123456",
"userName": "AmazonEKSLoadBalancerControllerRole-eks-1-20"
},
"webIdFederationData": {
"federatedProvider": "arn:aws:iam::123456:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/123456",
"attributes": {}
},
"attributes": {
"creationDate": "2023-02-17T00:52:24Z",
"mfaAuthenticated": "false"
}
}
},
"eventTime": "2023-02-17T01:31:49Z",
"eventSource": "ec2.amazonaws.com",
"eventName": "DeleteSecurityGroup",
"awsRegion": "us-east-1",
"sourceIPAddress": "123456",
"userAgent": "elbv2.k8s.aws/v2.4.4 aws-sdk-go/1.42.27 (go1.18.6; linux; amd64)",
"errorCode": "Client.DependencyViolation",
"errorMessage": "resource sg-03a3ef9a0e380bb70 has a dependent object",
"requestParameters": {
"groupId": "sg-03a3ef9a0e380bb70"
},
"responseElements": null,
"requestID": "1fc5038c-dfc1-4c8e-8444-9115277226fa",
"eventID": "48426b0d-113c-4615-9c75-ad7a2b2df94d",
"readOnly": false,
"eventType": "AwsApiCall",
"managementEvent": true,
"recipientAccountId": "123456",
"eventCategory": "Management",
"tlsDetails": {
"tlsVersion": "TLSv1.2",
"cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
"clientProvidedHostHeader": "ec2.us-east-1.amazonaws.com"
}
}

The SG has two ENIs, and each ENI is bound to two other SGs. |
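A `Client.DependencyViolation` like the one in the CloudTrail event above means something still references the security group, most often ENIs. As a sketch of the diagnostic step, the following filters data shaped like the `NetworkInterfaces` field of `aws ec2 describe-network-interfaces` output for interfaces still attached to the group (the ENI IDs here are made up for illustration):

```python
def enis_blocking_deletion(network_interfaces, sg_id):
    """Return the IDs of ENIs still attached to the given security group.

    Each ENI must be detached or deleted (or the SG removed from it)
    before DeleteSecurityGroup can succeed.
    """
    return [eni["NetworkInterfaceId"]
            for eni in network_interfaces
            if any(g["GroupId"] == sg_id for g in eni.get("Groups", []))]

# Example data shaped like describe-network-interfaces output.
enis = [
    {"NetworkInterfaceId": "eni-aaa", "Groups": [{"GroupId": "sg-03a3ef9a0e380bb70"}]},
    {"NetworkInterfaceId": "eni-bbb", "Groups": [{"GroupId": "sg-other"}]},
]
print(enis_blocking_deletion(enis, "sg-03a3ef9a0e380bb70"))  # ['eni-aaa']
```

The equivalent one-off check on live data is `aws ec2 describe-network-interfaces --filters Name=group-id,Values=<sg-id>`.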
Hi @M00nF1sh, sorry for the long delay in response. Yes, I used However, we are planning to shift from nginx to ALB ingress controllers, so this may not be a blocker for us right now. Also, it's happening only when destroying the resources. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
Any update on this issue? We had this on an NLB |
I also encountered the same thing. The ingress was not completely deleted, then the creation was applied again, and deletion was performed again during creation. At that point you can see that the target group cannot be deleted normally. My EKS version is 1.23. |
When I re-applied the ingress and the associated svc, and then manually used kubectl delete svc to delete the svc associated with the ingress, the target group could be successfully deleted. |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned |
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Describe the bug
Sometimes aws-load-balancer-controller is stuck deleting the load balancers and target groups it has created. I think it may be because it tries to delete the target group before deleting the load balancer or listeners. In that case, it's obvious that the target groups will not be deleted, because they are still associated with the listeners.
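If the ordering hypothesis above is right, a teardown has to delete dependents before the resources they reference: listeners before target groups and before the load balancer, and the load balancer before its security group. A minimal dependency-ordered sketch (the resource names and edges are illustrative, not the controller's actual model):

```python
# Map each resource to the resources that reference it and must go first.
DEPENDENTS = {
    "listener": [],                       # nothing references a listener
    "target-group": ["listener"],         # listeners forward to target groups
    "load-balancer": ["listener"],        # listeners are attached to the LB
    "security-group": ["load-balancer"],  # the LB uses the security group
}

def deletion_order(dependents):
    """Return a safe deletion order: dependents first, dependencies last."""
    order, seen = [], set()

    def visit(resource):
        if resource in seen:
            return
        seen.add(resource)
        for dep in dependents.get(resource, []):
            visit(dep)          # delete everything referencing us first
        order.append(resource)  # then it is safe to delete this resource

    for resource in dependents:
        visit(resource)
    return order

print(deletion_order(DEPENDENTS))  # listeners first, security group last
```

Deleting a target group while a listener still forwards to it fails for the same reason the security-group deletions in this thread fail: the API rejects deleting a resource that is still in use.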
Most of the time it works correctly, but not always.
Steps to reproduce
Expected outcome
The load balancer, target groups, and other resources must be deleted on every attempt.
Environment
Additional Context:
Snapshot of the logs from aws-load-balancer-controller: